Apart from its own native file formats, DataWarrior also reads and writes TAB-delimited and comma-separated text files as well as SD-files, which are the de-facto industry standard for exchanging chemical information. In addition to reading data from files, data may be pasted from the clipboard or retrieved from databases. After reading data from any source, DataWarrior analyses every column to understand the kind of data it contains, i.e. whether it contains numerical and/or category data, whether the column contains empty values, and more. It also checks for correlations and creates default views and filters.
If DataWarrior was installed correctly, every file type discussed in this section should have a proper icon assigned, and double-clicking a file's icon should result in DataWarrior opening the file. This section explains the interaction with files and the clipboard.
When you save data from DataWarrior in order to open it later in the same application, a native DataWarrior file ending with .dwar is the preferred file type. In addition to the plain data, .dwar files may contain the following kinds of information:
To open a native DataWarrior file, choose from the menu or just double-click an icon representing a .dwar file.
A DataWarrior Template file contains the complete configuration of views and filters as they were when the Template file was saved. If you want to store the current state of views and filters of an open DataWarrior window in order to restore it later with the same or another dataset, you may save a Template file. To re-apply a previously stored template to an open DataWarrior window, choose from the menu. You may then select either a .dwat or a .dwar file. In both cases the template will be read from the file, and all views and filters will be replaced by new ones as defined in the file.
DataWarrior version 4.0 and above support recording, editing and replaying entire workflows. These may be stored as part of a native DataWarrior file or can be exported into a dedicated macro file. Similar to templates you may run a macro by opening a dedicated macro file with from the menu.
By creating a self-organizing map (SOM), DataWarrior can position chemical molecules or other objects on a two-dimensional area in such a way that any object's closest neighbours in the plane are the objects most similar to it in the dataset. A calculated SOM is actually a 2-dimensional grid of reference vectors, each of which resembles one or more molecules/objects of the dataset. Once these reference vectors are calculated, the objects are assigned one by one to the reference vector that is most similar to the object. If one intends to map a second set of objects from an external file to a previously calculated SOM, then these vectors must be available. For that reason they can be saved as a SOM file, which can later be used to map external objects, effectively creating compatible 2-dimensional object coordinates.
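The assignment of objects to reference vectors can be sketched as follows. This is a minimal illustration and not DataWarrior's implementation: it assumes that objects and reference vectors are plain numerical vectors compared by Euclidean distance, whereas DataWarrior may use chemical descriptors and its own similarity measure. All function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equally long numerical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def best_matching_unit(som, vector):
    """Return the (x, y) grid position of the reference vector closest to `vector`.

    `som` is a list of rows; every row is a list of reference vectors.
    """
    best_xy, best_dist = None, math.inf
    for y, row in enumerate(som):
        for x, ref in enumerate(row):
            dist = euclidean(ref, vector)
            if dist < best_dist:
                best_xy, best_dist = (x, y), dist
    return best_xy


def map_objects(som, objects):
    """Assign every named object to its most similar reference vector."""
    return {name: best_matching_unit(som, vec) for name, vec in objects.items()}


# A tiny 2x2 grid of previously calculated (here: made-up) reference vectors.
saved_som = [
    [[0.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [1.0, 1.0]],
]

# Objects from a second dataset, mapped onto the same grid.
external = {"a": [0.1, 0.2], "b": [0.9, 0.8], "c": [0.8, 0.1]}
coordinates = map_objects(saved_som, external)
```

Because the external objects are placed with the same reference vectors as the original dataset, the resulting grid coordinates are directly comparable between both sets, which is the reason for keeping the vectors in a file.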
A .dwaq file or Query File does not contain any data. It rather contains a database query that is performed when the file is opened. Moreover, it may contain the template information needed to construct certain views and filter settings after the query result data has been retrieved. Query files are used if data in a database is frequently changing or to confidentially communicate new results, e.g. via e-mail. To open a .dwaq file, select from the menu, or double-click the icon representing the file.
SD-Files are the de-facto industry standard for exchanging chemical structures and associated
alpha-numerical information. The format was developed and published by Molecular Design Ltd. (MDL).
The most widely used version is version 2, which has limited support for stereochemistry:
A so-called chiral flag defines for the entire molecule whether it is a racemate or a single enantiomer.
With version 2 SD-files it is not possible to define epimers, mixtures of diastereomers, etc.
In order to tackle the deficiencies, MDL introduced an updated concept
From the menu, select and use the dialog window to select the SD-file(s) (the file extension is .sdf) to import. DataWarrior reads the entire content of the SD-File, displays rows in the Table View, creates default 2D- and 3D-Views and a Structure View, and generates a structure index (FragFp descriptor), which is needed internally for some structure-related tasks. While the indexing process is underway and its progress bar is visible in the status area, these functions, e.g. sub-structure search, are not yet available.
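For orientation, the layout of a version 2 SD-file can be sketched with a small reader. This is an illustrative sketch only and says nothing about how DataWarrior parses files. It assumes the usual V2000 layout: every record consists of a molfile block ending with `M  END`, followed by data items introduced by `> <FieldName>` lines, and is terminated by a `$$$$` line. The chiral flag is assumed to sit in columns 13 to 15 of the counts line (the fourth line of the molfile block). The function names are hypothetical.

```python
import re

# Data header lines look like "> <FieldName>" (optionally with extra tokens).
FIELD_RE = re.compile(r"^>.*<([^>]+)>")


def parse_record(lines):
    """Split one SD-file record into molfile block, chiral flag and data fields."""
    end = next((i for i, l in enumerate(lines) if l.startswith("M  END")),
               len(lines) - 1)
    molblock = lines[:end + 1]
    counts = molblock[3] if len(molblock) > 3 else ""
    fields, name = {}, None
    for line in lines[end + 1:]:
        match = FIELD_RE.match(line)
        if match:
            name = match.group(1)
            fields[name] = []
        elif name is not None:
            if line.strip() == "":
                name = None          # a blank line closes the data item
            else:
                fields[name].append(line)
    return {
        "name": molblock[0].strip() if molblock else "",
        "chiral_flag": counts[12:15].strip() == "1",
        "molblock": "\n".join(molblock),
        "fields": {k: "\n".join(v) for k, v in fields.items()},
    }


def parse_sdf(text):
    """Return one dictionary per record of an SD-file given as a string."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if current:
                records.append(parse_record(current))
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):   # tolerate a missing final "$$$$"
        records.append(parse_record(current))
    return records
```

Each returned dictionary corresponds to one table row: the molfile block carries the structure, and every data field becomes a column value.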
TAB-delimited and comma-separated text files ('.txt' and '.csv') are among the most portable file formats because they can be created by many programs. In these text files each line represents a row, and all fields within the row are separated by TABs or commas. If one or more columns of the text file contain chemical structures in SMILES format, DataWarrior automatically recognizes them and creates an additional column with chemical structures for every SMILES-containing column. From the menu, select and choose
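The row and field structure of such files can be illustrated with a short Python sketch. It is not DataWarrior's reader. It assumes, as a simplification, that a file is TAB-delimited whenever its first line contains a TAB and comma-separated otherwise; the function name is hypothetical.

```python
import csv
import io


def read_delimited(text):
    """Split TAB-delimited or comma-separated text into a list of rows.

    Assumption: the first line decides the delimiter (TAB if present, else comma).
    """
    lines = text.splitlines()
    first_line = lines[0] if lines else ""
    delimiter = "\t" if "\t" in first_line else ","
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return [row for row in reader if row]   # skip completely empty lines


tab_rows = read_delimited("Name\tSmiles\nethanol\tCCO\n")
csv_rows = read_delimited('Name,Value\n"a,b",1\n')
```

Using the `csv` module rather than a plain `split` keeps quoted fields that contain the delimiter intact.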
In the standard DataWarrior installation, the menu contains two submenus with direct access to some example files. The option covers various files with chemical structures and related data, e.g. known drugs, pKa values, bioactive compounds, and other datasets of interest. provides examples that illustrate non-chemistry related aspects of DataWarrior. Depending on the installation, further submenus may provide quick access to files in user defined directories.
If you copy tabular data from any text editor or spreadsheet application, you may paste it directly into DataWarrior. This will open the data as if it were loaded from a text file. By analyzing the data, DataWarrior will try to determine whether a header row is present. If it believes that there is none, it will generate default column names.
In most cases DataWarrior will correctly predict whether the clipboard content starts with a header row. If it fails because of insufficient clues, one may use one of the options to hint that a header row is present or not.
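How such a guess might work can be sketched with a simple heuristic. This is purely an assumption for illustration; the rules DataWarrior actually applies are not described here. The sketch treats the first row as a header if none of its cells is empty or numeric and at least one column contains only numbers below it. The function names and the default column naming are hypothetical.

```python
def is_number(value):
    """True if the string can be read as a floating point number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def guess_has_header(rows):
    """Heuristic guess whether the first row holds column names."""
    if len(rows) < 2:
        return False
    first, rest = rows[0], rows[1:]
    # A header cell is expected to be non-empty text.
    if any(cell.strip() == "" or is_number(cell) for cell in first):
        return False
    # Text on top of a purely numerical column is a strong clue.
    for col in range(len(first)):
        values = [r[col] for r in rest if col < len(r) and r[col].strip()]
        if values and all(is_number(v) for v in values):
            return True
    return False   # insufficient clues: assume there is no header


def split_header(rows):
    """Return (column names, data rows), generating default names if needed."""
    if guess_has_header(rows):
        return rows[0], rows[1:]
    width = max((len(r) for r in rows), default=0)
    return [f"Column {i + 1}" for i in range(width)], rows
```

A table that consists of text columns only gives this heuristic nothing to work with, which is the kind of situation where an explicit hint from the user is needed.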
In the following example some data was selected within a spreadsheet application and then copied to the clipboard with Ctrl-C.
After switching to DataWarrior and choosing (Ctrl-V), DataWarrior responds by displaying the clipboard's content in a new window. It has recognized that the column named "Smiles" contains valid SMILES codes and has automatically created an additional column with chemical structures from the SMILES strings. It also created two graphical default views and, since the data now contains chemical structures, a dedicated structure view.
Depending on your particular version DataWarrior is able to directly retrieve data from a variety of databases. These include:
At Actelion the following additional database options exist: