DataWarrior User Manual

General Concepts


DataWarrior is an interactive, chemistry-aware, multi-purpose data visualization and analysis program. It works on any kind of tabular data, assuming that rows represent objects or cases and columns contain their associated properties and values. DataWarrior provides interactive views to visualize data, to discover correlations, and to extract hidden knowledge from large data sets. Data filters allow focusing on specific data subsets in a dynamic way. Chemical intelligence is woven into the program, so that chemical structures can be handled as easily as numerical values. Data can be filtered on structural motifs, views are chemistry-aware, molecule properties can be predicted from chemical structures, and specialized cheminformatics methods explore the relationship between chemical structure and measured properties. DataWarrior supports multiple file types and allows merging data from files with data from the clipboard or from databases. DataWarrior is freely available for Linux, Macintosh and Windows, and its complete source code is accessible under the GNU General Public License.
This section explains general concepts and important keywords.


Main View Area and Detail Area

The Main View Area shows one or multiple Main Views, each displaying data rows either as a data table, in a 2- or 3-dimensional interactive chart, or in another type of view. Typically, multiple Main Views show the same (sub-)set of data, but display it in different ways. Multiple Main Views can coexist, either stacked on top of each other or side by side.

Typical DataWarrior window with relevant areas shown in distinct colors.

The + button in any Main View's title area lets you create a new Main View or a high-resolution picture of the view. Other buttons let you configure, maximize, or close the view. Main Views can be moved to another position by dragging their header area onto the center or any edge of another view.

While Main Views show many rows at a time, the Detail Area displays all information of one particular row. When the mouse moves within a Main View, the Detail Area is continuously updated to show the details of the row or marker under the mouse pointer.


Table View

The Table View serves as a direct window to the data contained in a DataWarrior file. Through it, rows can be sorted, data can be modified, and display options for individual columns can be changed. The Table View cannot be removed, and there cannot be a second Table View. Columns may contain and display non-alphanumeric content such as molecules or reactions. The visible column order may be changed, and columns can be added, grouped, hidden, or removed. A right mouse click on a column header opens a column-specific popup menu.


2D-View

This view visualizes table data in a 2-dimensional chart, e.g. as a scatter plot, bar or pie chart, or box plot. The view type and other view-related properties can be chosen from the popup menu that opens when the wrench button in the title area is clicked. Colors, marker shapes and sizes, connection lines, labels, and font sizes are customizable. The xy button lets you assign columns to axes and zoom into the data along individual axes.

The view is dynamically linked with the Detail Area. When the mouse moves over a marker or over a bar or pie segment, the corresponding row data is displayed in the Detail Area. A right mouse click then opens a popup menu giving access to functionality related to this particular row.


3D-View

The 3D-View is similar to the 2D-View; however, it displays markers or bars in a freely rotatable three-dimensional Cartesian coordinate system. The left mouse button selects row markers or changes the Reference Row. Dragging with the right mouse button pressed rotates the view. Again, the wrench and xyz buttons give access to view-related properties and axis assignments.


Form View

Form Views present your data like a book with one page per data row. A Design Mode lets you tailor the form layout, i.e. define which column data is displayed where on the form. An Edit Mode lets you modify the displayed data directly. Form Views are particularly useful if rows contain conformers or if some of their cells reference HTML or picture detail information. Neither conformers nor referenced detail information are displayed in the Table View; only the Detail Area and Form Views show this information.


Filters

Filters hide rows from all main views to temporarily focus on a subset of the data. Various kinds of filters are available, but they all share certain characteristics. Usually, all filters apply to all views, which means that a row hidden by one or more filters is not visible in any view. Views, however, can be configured to ignore filters. Usually, a filter is associated with one data column, and the kind of data determines which filter types can be used for that column. The effect of a filter can be inverted with its Invert button. A Disable button temporarily switches off the filtering effect without removing the filter's settings. Some filters have animation options that automatically change the visible data subset over time, either stepwise or smoothly. The Status Area at the bottom of any DataWarrior window always displays the number of visible rows, the number of selected rows, and the total number of rows.

These filter types exist:

Sliders filter rows based on numerical or date values. They define an allowed or forbidden data range through minimum and maximum values. Pressing Ctrl while dragging allows fine-tuning. After clicking a numerical value, you can edit it with the keyboard. Red bars indicate that the corresponding column contains empty values, which the slider's numerical range cannot include.

Text filters hide rows that match, or that don't match, a query. Supported match types include a substring search, an exact match, and a regular expression search. If multiple comma-separated text snippets are given, all rows that match none of the individual query strings are hidden.

Category filters hide certain categories of the data. Rows are hidden if they don't belong to any of the checked categories. If some rows belong to multiple categories at the same time, a <multiple categories> option is offered. If this option is checked, rows belonging to multiple categories are not hidden.

A Category Browser lets you quickly browse through the different categories of a dataset. This can be done stepwise by clicking the 'previous' and 'next' buttons, or more rapidly by dragging the slider. Using the Animate button, Category Browsers can be configured to automatically cycle through all categories one by one.

Row List filters hide either all members of a selected Row List or all rows that do not belong to that list.

Chemical structure filters show or hide compounds that contain a given substructure or that are similar to a given molecule.

Two specialized filters allow immediate filtering on multiple structures at once. The SSS list filter runs a parallel substructure search with multiple query structures and hides rows whose molecules contain none of the listed query fragments. Fragment lists can be loaded from a file, edited manually, or combined by drag and drop.

The similarity structure list filter hides rows whose molecules are not similar to any of the listed structures. Any currently available descriptor may serve as the similarity criterion.

Similar to the structure filter, the reaction filter offers similarity-based and substructure-based reaction filtering. Once a query reaction is drawn, rows can be filtered independently on the similarity of the reaction center and of the periphery. Both modes, reaction substructure and reaction similarity filtering, require reactions to be mapped, i.e. all reactant atoms that also exist on the product side must be mapped to the respective product atoms.

A retron is a structural pattern, i.e. a substructure on the product side, that is created by the reaction. The retron filter offers an easy way to quickly search a reaction collection for such patterns. It does not need any atom mapping, neither in the query nor in the searched reaction collection.
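The match rules of the slider, text, and category filters described above can be read as simple per-row predicates. The following is a minimal sketch of that logic, not DataWarrior's implementation: the function names and the mode strings "contains", "equals", and "regex" are hypothetical, and it assumes that an empty text query hides nothing, that the comma split applies to every match mode, and that matching is case-sensitive.

```python
import re
from typing import Optional, Set


def slider_accepts(value: Optional[float], low: float, high: float) -> bool:
    """Slider filter (hypothetical model): keep a row only if its value lies in [low, high].

    Empty cells (None) can never lie inside the numerical range,
    so rows with empty values are hidden.
    """
    return value is not None and low <= value <= high


def text_accepts(cell: str, query: str, mode: str = "contains") -> bool:
    """Text filter (hypothetical model) with comma-separated snippets.

    A row is kept if it matches at least one snippet and hidden if it
    matches none of them. An empty query hides nothing (assumption).
    """
    if mode not in ("contains", "equals", "regex"):
        raise ValueError(f"unknown mode: {mode}")
    snippets = [s.strip() for s in query.split(",") if s.strip()]
    if not snippets:
        return True
    for snippet in snippets:
        if mode == "contains" and snippet in cell:
            return True
        if mode == "equals" and snippet == cell:
            return True
        if mode == "regex" and re.search(snippet, cell):
            return True
    return False


def category_accepts(row_categories: Set[str], checked: Set[str],
                     keep_multiple: bool = False) -> bool:
    """Category filter (hypothetical model): hide rows in none of the checked categories.

    keep_multiple models the <multiple categories> option: when it is set,
    rows that belong to more than one category are never hidden.
    """
    if keep_multiple and len(row_categories) > 1:
        return True
    return bool(row_categories & checked)


print(slider_accepts(None, 0.0, 10.0))        # False: empty values are hidden
print(text_accepts("benzene", "benz, tolu"))  # True: the first snippet matches
```

How the real category filter treats multi-category rows when the option is unchecked is not specified here; the sketch simply falls back to the "belongs to any checked category" rule.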

By default, any visible graphical view also serves as an implicit filter. Clicking the xyz button opens a popup that lets you assign data columns to view axes and zoom into particular axes. When individual row markers are zoomed out of the view, they also disappear from other views. This influence on the visibility in other views can be changed in the view's General View Options. When a graphical view is hidden behind other views, its filtering effect is disabled until the view becomes visible again.

One of the graphical views may be chosen to serve as an explicit filter. This view then behaves differently when rows are surrounded with the mouse lasso: instead of globally selecting these rows, the view shows them locally in purple. All non-purple rows then disappear from all other views. This offers a quick and flexible way to show selectable data subsets in all other views.

Depending on the data, DataWarrior shows a reasonable set of default filters. You can always add more filters with Edit->New Filter... or remove filters by clicking their close button. Filters can be rearranged by grabbing their name and dragging them to a new position.

Multiple active filters combine their effects: a row is hidden if it fails the condition of at least one filter, so only rows that pass all active filters remain visible.
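The way Invert, Disable, and multiple filters interact can be summarized in a few lines. This is a minimal sketch under the assumption that each filter is a per-row test; the `Filter` class, its fields, and the example columns `name` and `mw` are hypothetical and do not correspond to DataWarrior classes.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class Filter:
    """Hypothetical filter model: a per-row test plus Invert and Disable switches."""
    accepts: Callable[[Row], bool]
    inverted: bool = False
    disabled: bool = False

    def passes(self, row: Row) -> bool:
        if self.disabled:  # a disabled filter keeps its settings but hides nothing
            return True
        result = self.accepts(row)
        return not result if self.inverted else result


def visible_rows(rows: List[Row], filters: List[Filter]) -> List[Row]:
    # Failing a single filter hides a row; only rows passing all filters stay visible.
    return [row for row in rows if all(f.passes(row) for f in filters)]


# Toy data for illustration only.
rows = [{"name": "A", "mw": 150.0}, {"name": "B", "mw": 450.0}, {"name": "C", "mw": None}]
mw_filter = Filter(lambda r: r["mw"] is not None and r["mw"] <= 300.0)
not_b = Filter(lambda r: r["name"] == "B", inverted=True)

print([r["name"] for r in visible_rows(rows, [mw_filter, not_b])])  # ['A']
mw_filter.disabled = True
print([r["name"] for r in visible_rows(rows, [mw_filter, not_b])])  # ['A', 'C']
```

Disabling the molecular weight filter brings row C back, while the inverted name filter keeps hiding row B, which shows that each filter contributes independently to the combined result.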


Visible Rows

The Visible Rows are those rows that are currently visible in the main view area. Typically, the set of Visible Rows is a subset of all rows of the dataset. Their number is always displayed in the Status Area. The visibility of rows is primarily affected by filter settings. However, a visible main view may also hide some rows, e.g. through zooming or because a row has empty values. This usually hides the affected rows from other visible views as well, unless the hiding view is configured not to contribute to global row hiding. Main views that are hidden behind other views don't contribute to global row hiding. Thus, the number of Visible Rows may change when main views change their visibility.
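The rules above can be condensed into a small set calculation: start with the rows that pass all filters, then remove rows hidden by any view that is currently shown and contributes to global row hiding. The sketch below is a hypothetical model for illustration; the `View` class and its fields are assumptions, not DataWarrior's API, and the view names are invented examples.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class View:
    """Hypothetical main view model for illustrating global row hiding."""
    name: str
    hidden_rows: Set[int] = field(default_factory=set)  # e.g. zoomed out or empty values
    is_shown: bool = True                                # False while covered by other views
    contributes_to_global_hiding: bool = True


def compute_visible_rows(filter_passing: Set[int], views: List[View]) -> Set[int]:
    """Rows passing all filters, minus rows hidden by any shown, contributing view."""
    visible = set(filter_passing)
    for view in views:
        if view.is_shown and view.contributes_to_global_hiding:
            visible -= view.hidden_rows
    return visible


passing = {0, 1, 2, 3, 4}
views = [
    View("zoomed scatter plot", hidden_rows={1}),
    View("covered box plot", hidden_rows={2}, is_shown=False),
    View("opted-out bar chart", hidden_rows={3}, contributes_to_global_hiding=False),
]
print(sorted(compute_visible_rows(passing, views)))  # [0, 2, 3, 4]
views[1].is_shown = True  # bringing the box plot to the front hides row 2 as well
print(sorted(compute_visible_rows(passing, views)))  # [0, 3, 4]
```

The second call illustrates why the count of Visible Rows can change when a view is merely brought to the front, without any filter being touched.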


Selected Rows

All main views except the Form View let you select or deselect rows with the mouse. The row selection is shared among all views. Thus, when you change the row selection in one view, the selection in all other views is updated simultaneously. The number of (visible) selected rows is always displayed in the Status Area.


Reference Row

With a mouse click, one row can be defined as the Reference Row. As with the Selected Rows, the Reference Row is shared among all views. Most views mark the Reference Row with a red frame or show it in red. The Form View always shows the Reference Row. Various view features compare all other rows with the Reference Row, e.g. when color is used to show similarities, when compound similarity is assigned to an axis, or when neighbourship graphs are constructed. Views may be configured to automatically zoom and move such that the Reference Row is displayed in the view's center.
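To illustrate what comparing all rows with the Reference Row means, here is a minimal sketch that scores every row against one reference row. It assumes fingerprints given as sets of set bit positions and uses the Tanimoto coefficient, a common fingerprint similarity measure; DataWarrior's own descriptors and similarity calculations are not reproduced here, and the fingerprints and function names are hypothetical.

```python
from typing import Dict, FrozenSet


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of set bits."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_to_reference(fingerprints: Dict[int, FrozenSet[int]],
                            reference_row: int) -> Dict[int, float]:
    """Compare every row with the Reference Row (1.0 = identical fingerprints)."""
    ref = fingerprints[reference_row]
    return {row: tanimoto(fp, ref) for row, fp in fingerprints.items()}


# Hypothetical toy fingerprints keyed by row index.
fingerprints = {
    0: frozenset({1, 2, 3}),
    1: frozenset({1, 2, 4}),
    2: frozenset({7}),
}
print(similarity_to_reference(fingerprints, reference_row=0))
# {0: 1.0, 1: 0.5, 2: 0.0}
```

Values like these could then drive a color scale or serve as an axis, which is the kind of per-row comparison the features above rely on.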


Row Lists

A Row List is a named subset of all rows. In DataWarrior, Row Lists may be used for various purposes, e.g. to filter rows, to affect marker sizes, shapes or colors, to highlight a group of rows, to store the current row selection, or to define a data subset for a data analysis method. Row Lists are automatically saved within DataWarrior files. The List menu contains the functionality to create and manipulate Row Lists.
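Conceptually, a Row List behaves like a named set of row indices. The following minimal sketch makes that concrete; the `RowLists` class and its method names are hypothetical and not part of DataWarrior. Its `filter_rows` method mirrors the Row List filter, which hides either all non-members or all members of a list.

```python
from typing import Dict, Iterable, List, Set


class RowLists:
    """Hypothetical model: Row Lists as named sets of row indices."""

    def __init__(self) -> None:
        self._lists: Dict[str, Set[int]] = {}

    def create_from_selection(self, name: str, selected_rows: Iterable[int]) -> None:
        # Store a snapshot of the current row selection under a name.
        self._lists[name] = set(selected_rows)

    def members(self, name: str) -> Set[int]:
        return set(self._lists[name])

    def filter_rows(self, rows: Iterable[int], name: str,
                    hide_members: bool = False) -> List[int]:
        """Row List filter: hide all non-members, or all members if hide_members is True."""
        members = self._lists[name]
        return [row for row in rows if (row in members) != hide_members]


lists = RowLists()
lists.create_from_selection("actives", [1, 3])
print(lists.filter_rows(range(5), "actives"))                     # [1, 3]
print(lists.filter_rows(range(5), "actives", hide_members=True))  # [0, 2, 4]
```

Storing a copy of the selection means later selection changes do not alter the list, which matches the idea of saving a selection for later use.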


Templates

A Template comprises all DataWarrior display settings. It includes information about the existing views and their arrangement, and all view-specific settings such as marker colors, font sizes, and shown labels. It also covers active filters and their settings, column aliases, and the layout of existing Form Views. Templates are part of a native DataWarrior file and may also be stored as a stand-alone file to be re-applied later to another similar or updated data set. Templates may also be associated with a database query, such that query results are immediately shown in a predefined way.


Tasks and Macros

DataWarrior is primarily a program for working interactively with data. Every single interactively performed action is called a task. Tasks may be simple ones, like opening a file or changing a filter setting, or more complex ones that require defining various parameters, e.g. to launch a 2-dimensional scaling algorithm. By performing multiple tasks in sequence, one can solve rather complex data analysis problems.

To support automation and reproducibility, DataWarrior allows recording interactive task sequences as macros, which can be saved to file, edited, and executed at a later time on different or updated data or by a different person.
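The idea of recording tasks together with their parameters and replaying them on other data can be sketched with a simple command log. This is a hypothetical illustration of the concept only: the `Task` and `MacroRecorder` classes, the task names `keep_above` and `sort_by`, and the toy data are assumptions, and DataWarrior's actual macro format and task names are not shown here.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]


@dataclass
class Task:
    name: str
    params: Dict[str, Any]


class MacroRecorder:
    """Hypothetical recorder: logs each task with its parameters for later replay."""

    def __init__(self, handlers: Dict[str, Callable[..., Rows]]):
        self.handlers = handlers
        self.macro: List[Task] = []
        self.recording = False

    def perform(self, data: Rows, name: str, **params: Any) -> Rows:
        if self.recording:
            self.macro.append(Task(name, dict(params)))
        return self.handlers[name](data, **params)

    def replay(self, data: Rows) -> Rows:
        # Run the recorded tasks in their original order on new data.
        for task in self.macro:
            data = self.handlers[task.name](data, **task.params)
        return data


handlers = {
    "keep_above": lambda rows, column, threshold: [r for r in rows if r[column] > threshold],
    "sort_by": lambda rows, column: sorted(rows, key=lambda r: r[column]),
}
recorder = MacroRecorder(handlers)
recorder.recording = True
batch1 = [{"id": 1, "value": 7.2}, {"id": 2, "value": 5.1}, {"id": 3, "value": 6.4}]
result = recorder.perform(batch1, "keep_above", column="value", threshold=6.0)
result = recorder.perform(result, "sort_by", column="value")
recorder.recording = False

batch2 = [{"id": 4, "value": 8.0}, {"id": 5, "value": 6.5}, {"id": 6, "value": 4.9}]
print([r["id"] for r in recorder.replay(batch2)])  # [5, 4]
```

Because each recorded task stores its parameters, the same analysis can be reproduced on an updated data set without repeating the interactive steps.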

