DataWarrior User Manual

General Concepts


DataWarrior is an interactive, chemistry-aware, multi-purpose data visualization and analysis program. It works on any kind of tabular data, assuming that rows contain objects or cases and columns contain the associated properties and values. DataWarrior provides interactive views to visualize data, discover correlations, and extract hidden knowledge from large data sets. Data filters allow focusing dynamically on specific data subsets. Chemical intelligence is woven into the program so that chemical structures can be handled as easily as numerical values: data can be filtered on structural motifs, views are chemistry-aware, molecular properties can be predicted from chemical structures, and specialized cheminformatics methods explore the relationship between chemical structure and measured properties. DataWarrior supports multiple file types and allows merging data from files with data from the clipboard or from databases. DataWarrior is freely available for Linux, Macintosh and Windows, and its complete source code is downloadable under the GNU General Public License.
This section explains general concepts and important keywords.


Main View Area and Detail Area

The Main View Area shows one or more Main Views, each displaying data rows either as a data table, in a 2- or 3-dimensional interactive chart, or in another kind of view. Typically, multiple Main Views show the same (sub-)set of data, but display the data in different ways. Multiple Main Views can coexist either stacked on top of each other or side by side.

Individual Main Views can be maximized or closed by clicking the respective buttons in the view's header area. A right mouse click within the header area opens a popup menu for creating new views or pictures of the view. A Main View can be moved to another position by dragging its header area to the middle or any edge of another view.

While Main Views show multiple rows, the Detail Area displays the content of one row in detail. As the mouse moves over a Main View, the Detail Area is continuously updated to reveal the details of the row or marker underneath the mouse pointer.


Table View

The Table View serves as a direct window to the data contained in a DataWarrior file. Through it, rows can be sorted, data of individual cells can be modified, and display options for individual columns can be changed. The Table View cannot be removed, nor can there be more than one Table View. Columns may display non-alphanumeric content like molecules or reactions. The column order may be changed and less important columns may be hidden. A column-specific popup menu is accessible from the column header. More...


2D-View

This view visualizes table data in a 2-dimensional chart, e.g. as a scatter plot, bar or pie chart, box plot, etc. It is dynamically linked with the Detail Area such that moving the mouse over the view updates the row details according to the mouse position. Marker colors, shapes, and sizes are customizable and may depend on a column of your data. A right mouse click inside the view reveals a popup menu for customizing view settings. The xyz button in the view's header area opens a popup with controls for assigning data columns to axes and for dynamically zooming into each axis. More...


3D-View

The 3D-View is similar to the 2D-View, but it displays markers or bars in a three-dimensional Cartesian coordinate system. The left mouse button can be used to select row markers or to change the Current Row. Pressing and dragging the right mouse button over empty space rotates the view. Pressing the same mouse button over a marker opens a popup menu for changing view-specific settings. More...


Form View

Form Views present your data like a book with one page per data row. A Design Mode lets you tailor the form layout, and an Edit Mode lets you directly manipulate the data shown. Form Views are particularly useful if some table cells reference HTML or picture detail information. Such detail information is not shown in the Table View. Instead, the Table View displays a small number in a colored square indicating that more information is available. More...


Filters

Filters hide rows from the main views so that you can temporarily focus on the remaining visible rows. Various kinds of filters are available, but they all share certain characteristics. Filters always apply to all views, i.e. they hide the same rows from all views at the same time. If a row is excluded by one or more filters, it is not visible in any view. Usually, a filter is associated with one column of the data, and the kind of data within that column determines which filter types can be created and used for it. The effect of a filter can be inverted by selecting the Invert button. A Disable button temporarily switches off the filtering effect without removing the filter's settings. The status panel displays the number of visible rows, the number of selected rows, and the total number of rows.

Available filter types are:

Sliders filter rows based on numerical or date values. They define an allowed or forbidden data range by defining minimum and maximum values. Pressing the Ctrl key while dragging the slider caps reduces sensitivity by a factor of 20 to allow fine-tuning.

Text filters may be used to hide rows that match or don't match a query in a certain way. Supported modes include a sub-string search, an exact match, and a regular expression search. If multiple comma-separated text snippets are given, then all rows that match none of the individual query strings are hidden.

Category filters hide certain categories of the data. Rows are hidden if they don't belong to any of the checked categories. If some rows belong to multiple categories at the same time, then a <multiple categories> option is offered. If this option is checked, rows belonging to multiple categories remain visible.

A Category Browser lets you quickly browse through the different categories within a dataset. This can be done stepwise by clicking the 'previous' and 'next' buttons or more rapidly by dragging the slider. With the Animate button, a Category Browser can be configured to automatically cycle through all categories one by one.

Row list filters are used to hide all members of a selected row list or all rows that do not belong to that row list.

Chemical structure filters may be used to show or hide compounds that contain a given substructure or that are similar to a given molecule. More...

Two specialized filters allow immediate filtering based on multiple structures at once. The SSS list filter runs a parallel sub-structure search on multiple query structures and hides rows whose molecules don't contain any of the listed query fragments. Fragment lists can be loaded from a file, edited manually, or combined by drag and drop.

A Similarity structure list filter may be used to hide rows whose molecules are not similar to any of the listed structures. Any currently available descriptor may serve as the similarity criterion.
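The text filter behaviour described in the list above can be sketched in a few lines of Python. This is only an illustration of the matching rule, not DataWarrior's implementation: the function name `text_filter_keeps`, the mode names, case-sensitive matching, and the handling of an empty query are all assumptions made for this example.

```python
import re


def text_filter_keeps(cell: str, query: str, mode: str = "contains") -> bool:
    """Return True if a row with this cell value stays visible.

    Hypothetical helper: a row is kept if it matches at least one of the
    comma-separated snippets, i.e. it is hidden only if it matches none.
    """
    # Split the query into snippets; surrounding blanks are ignored (assumption).
    snippets = [s.strip() for s in query.split(",") if s.strip()]
    if not snippets:
        return True  # assumption: an empty query hides nothing

    if mode == "contains":      # sub-string search
        return any(s in cell for s in snippets)
    if mode == "equals":        # exact match
        return any(s == cell for s in snippets)
    if mode == "regex":         # regular expression search
        return any(re.search(s, cell) is not None for s in snippets)
    raise ValueError(f"unknown mode: {mode}")


names = ["aspirin", "ibuprofen", "paracetamol"]
kept = [n for n in names if text_filter_keeps(n, "asp, ibu")]
```

With the query `asp, ibu`, the first two names are kept because each contains one of the snippets, while the third matches neither snippet and would be hidden.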

By default, any visible graphical view also serves as an implicit filter. Clicking the xyz button opens a popup that allows assigning data columns to view axes and zooming into particular axes. When individual row markers are zoomed out of the view, they also disappear from other views. This influence on the visibility in other views can be changed in the view's General View Options. When a graphical view is hidden behind other views, its filtering effect is disabled until the view becomes visible again.

One graphical view may also be defined to serve as an explicit filter, which changes its row selection behaviour. If rows are chosen in an explicitly filtering view, the global row selection is not changed. Instead, all rows not chosen are hidden from all other views. This way one may quickly restrict the row visibility in other views to a chosen custom set. The chosen rows are drawn in purple, while globally selected rows are still drawn in the usual grayish blue.

Depending on the data, DataWarrior shows a reasonable set of default filters. One can always add more filters to the filter panel or remove filters from it. Multiple active filters accumulate their effects, which means that a row is hidden if it fails the filter condition of at least one filter.
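The way filters accumulate, together with the Invert and Disable buttons, can be expressed as a small model. The sketch below is a simplified illustration under stated assumptions, not DataWarrior code: the `Filter` class, its field names, and the example columns `mw` and `class` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Filter:
    """Hypothetical model of one filter in the filter panel."""
    condition: Callable[[Row], bool]  # True if the row passes the filter
    inverted: bool = False            # models the Invert button
    enabled: bool = True              # False models the Disable button

    def keeps(self, row: Row) -> bool:
        if not self.enabled:
            return True               # a disabled filter hides nothing
        passed = self.condition(row)
        return not passed if self.inverted else passed


def visible_rows(rows: List[Row], filters: List[Filter]) -> List[Row]:
    # A row stays visible only if every filter keeps it.
    return [r for r in rows if all(f.keeps(r) for f in filters)]


rows = [
    {"name": "A", "mw": 180.2, "class": "acid"},
    {"name": "B", "mw": 420.5, "class": "base"},
    {"name": "C", "mw": 310.0, "class": "acid"},
]
mw_slider = Filter(lambda r: 150 <= r["mw"] <= 350)   # slider-like range filter
acids_only = Filter(lambda r: r["class"] == "acid")   # category-like filter

both_active = [r["name"] for r in visible_rows(rows, [mw_slider, acids_only])]

# Invert the slider and disable the category filter.
mw_slider.inverted = True
acids_only.enabled = False
after_toggle = [r["name"] for r in visible_rows(rows, [mw_slider, acids_only])]
```

With both filters active, rows A and C remain. After inverting the range filter and disabling the category filter, only row B remains, because it is the only row outside the range.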


Visible Rows

The Visible Rows are those rows that are currently visible in the main view area. Typically, the set of Visible Rows is a subset of all rows of the dataset. Their number is always displayed in the status area. The visibility of rows is primarily affected by filter settings. However, a visible main view may also hide some rows, e.g. through zooming or if the data row has empty values. This usually hides the affected rows from other visible views as well, unless the hiding view is configured to not contribute to global row hiding. Main views that are hidden behind other views don't contribute to global row hiding. Thus, the number of Visible Rows may change when main views change visibility.
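The rule for which views contribute to global row hiding can be illustrated as follows. This is a minimal sketch under assumptions: the function `globally_visible_rows` and the dictionary keys `on_screen`, `contributes`, and `hidden_rows` are invented for this example and do not correspond to DataWarrior internals.

```python
def globally_visible_rows(row_ids, hidden_by_filters, views):
    """Return the row ids that remain visible in all views.

    A view adds its locally hidden rows (e.g. zoomed out or with empty
    values) to the global set only if it is on screen and configured to
    contribute to global row hiding.
    """
    hidden = set(hidden_by_filters)
    for view in views:
        if view["on_screen"] and view["contributes"]:
            hidden |= view["hidden_rows"]
    return [r for r in row_ids if r not in hidden]


views = [
    {"name": "2D View", "on_screen": True, "contributes": True, "hidden_rows": {3}},
    {"name": "3D View", "on_screen": False, "contributes": True, "hidden_rows": {1}},
]

# The 3D view is hidden behind another view, so its zoom state is ignored.
before = globally_visible_rows(range(5), {0}, views)

# Bringing the 3D view to the front changes the number of Visible Rows.
views[1]["on_screen"] = True
after = globally_visible_rows(range(5), {0}, views)
```

In this example, row 0 is hidden by a filter and row 3 by the 2D view. Row 1 disappears only once the 3D view comes to the front, so the count of Visible Rows drops from three to two.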


Selected Rows

All main views except for the Form View allow selecting or deselecting rows with the mouse. The row selection is shared among all views. Thus, when you change the row selection in one view, the selection in all other views is updated simultaneously. The number of (visible) selected rows is always displayed in the status area.


Current Row

With a mouse click, one row can be defined as the Current Row. As with the Selected Rows, the Current Row is shared among all views. Most views mark the Current Row with a red frame or show it in red color. The Form View always shows the Current Row. Various display modes use the Current Row as the reference for comparison when using color to show similarities or when constructing neighbourship graphs, etc.


Row Lists

A Row List is a named subset of all rows. Row Lists may be used for various purposes in DataWarrior, e.g. as the subject of a filter, to affect marker sizes, shapes or colors, to highlight a group of rows, to permanently store a current selection, to define a data subset for a data analysis method, etc. Defined Row Lists are automatically saved within DataWarrior files. The List menu contains functionality to create and manipulate Row Lists. You may, for instance, create a few Row Lists from different sets of selected rows and afterwards add more rows to some of those lists with the Add Selected To item from the List menu. This way one may manually assign rows to self-defined categories. Alternatively, one may add individual rows to a list by pressing the right mouse button on any row in any view and selecting the respective list name from the Add Row To List popup menu item.
The Add Column From Row Lists... menu item creates a new table column that contains, for every row, the names of all lists of which the row is a member. This way list membership can be materialized and used for filtering, view customization, or other purposes.
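What it means to materialize list membership as a column can be sketched in Python. This illustrates the idea only: the function `column_from_row_lists`, the representation of Row Lists as sets of row indices, and the "; " separator are assumptions for this example, not the format DataWarrior necessarily writes.

```python
def column_from_row_lists(row_count, row_lists, separator="; "):
    """Build one cell per row naming every Row List the row belongs to.

    row_lists maps a list name to the set of row indices it contains
    (hypothetical representation).
    """
    column = []
    for row in range(row_count):
        names = [name for name, members in row_lists.items() if row in members]
        column.append(separator.join(names))
    return column


row_lists = {
    "actives": {0, 2},
    "soluble": {2, 3},
}
membership = column_from_row_lists(4, row_lists)
```

Here row 2 belongs to both lists and receives both names, while row 1 belongs to neither and gets an empty cell. Such a column could then be used like any other category column, for example in a category filter.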


Templates

A Template comprises all DataWarrior display settings. It includes information about the existing views and their arrangement, and all view-specific settings like marker colors, font sizes, shown labels, etc. It also contains information about active filters and their settings, column aliases, and the layout of existing form views. Templates are part of a native DataWarrior file and may also be stored as a stand-alone file to be re-applied later to another similar or updated data set. Templates may also be associated with a database query, such that query results are immediately shown in a predefined way.


Continue with Loading Data...