DataWarrior Features

General

  • Interactive data visualization and analysis
  • Built-in chemical intelligence
  • Realtime data filtering on alphanumerical and chemical criteria
  • Prediction of molecular properties from the chemical structure
  • Dedicated cheminformatics modules support drug discovery
  • Installation contains user manual and many example files
  • Runs on Linux, Macintosh (with Retina display support), and Windows
  • Computationally demanding algorithms use all processor cores

Files

  • Reads and writes its own native text-based file formats
  • Imports TAB-delimited txt, csv, and sdf (versions 2 & 3) files; interprets SMILES codes
  • Imports from clipboard content
  • Exports TAB-delimited txt and sdf (versions 2 & 3) files
  • Flexible file merge and append options

Views

  • Table view with columns containing alphanumerical or chemical information
  • Versatile graphical 2D-view for scatter plots, bar & pie charts, box plots, ...
  • Graphical freely rotatable 3D-view for scatter plots & bar charts
  • Dedicated chemical structure view with optional alphanumerical data
  • Form-based view with form designer and form-based data editing
  • Multiple views are shown side by side or are stacked on top of each other
  • Views can be highly customized to reveal multiple dimensions of the data

Filter Types

  • Text filters with support for regular expressions
  • Data range sliders for numerical and date columns
  • Category filters with individually selectable categories
  • Category browser to manually or automatically switch categories
  • Substructure filter with flexible query features and real-time filtering
  • Filtering on various shades of compound similarity
  • Special filters screen against lists of compounds or substructures
  • Reaction filtering by similarity, reaction sub-structure, and retrons
  • Filter animations allow for dynamic graphical views
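
The way a text filter and a range slider combine can be illustrated with a minimal Python sketch. This is not DataWarrior code: the function names `regex_filter` and `range_filter` and the row layout are hypothetical and chosen only for illustration. A row stays visible only if it passes every active filter.

```python
import re

def regex_filter(rows, column, pattern):
    """Keep rows whose text in `column` matches the regular expression."""
    compiled = re.compile(pattern)
    return [row for row in rows if compiled.search(str(row[column]))]

def range_filter(rows, column, low, high):
    """Keep rows whose numeric value in `column` lies within [low, high]."""
    return [row for row in rows if low <= row[column] <= high]

# Small example table (hypothetical layout): name and molecular weight
rows = [
    {"name": "aspirin", "mw": 180.16},
    {"name": "caffeine", "mw": 194.19},
    {"name": "ibuprofen", "mw": 206.28},
]

# Apply the text filter first, then the range filter on its result
visible = range_filter(regex_filter(rows, "name", r"in"), "mw", 190.0, 210.0)
print([row["name"] for row in visible])  # prints ['caffeine']
```

Because each filter only removes rows, the order in which the filters are applied does not change the final result.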

Data Analysis

  • Data pivoting and reverse pivoting
  • Calculation of new columns from custom expressions
  • Principal Component Analysis
  • Self Organizing Maps
  • T-distributed stochastic neighbor embedding (t-SNE)
  • Uniform manifold approximation and projection (UMAP)
  • Calculation and display of statistical parameters
  • Creation and manipulation of persistent row lists for many purposes
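
As a rough illustration of what pivoting and reverse pivoting mean, the following Python sketch turns a long table (one row per measurement) into a wide one (one entry per compound) and back. It is an assumption-laden sketch, not DataWarrior's implementation: the functions `pivot` and `reverse_pivot`, the column names, and all values are made up for the example.

```python
def pivot(rows, key, category, value):
    """Long to wide: collect one dict of category -> value per key."""
    wide = {}
    for row in rows:
        wide.setdefault(row[key], {})[row[category]] = row[value]
    return wide

def reverse_pivot(wide, key, category, value):
    """Wide to long: emit one row per (key, category) pair."""
    return [
        {key: k, category: c, value: v}
        for k, columns in wide.items()
        for c, v in columns.items()
    ]

# Hypothetical measurements, one row per compound and assay
long_rows = [
    {"compound": "cpd-1", "assay": "IC50", "value": 0.5},
    {"compound": "cpd-1", "assay": "logP", "value": 2.1},
    {"compound": "cpd-2", "assay": "IC50", "value": 1.2},
]

wide = pivot(long_rows, "compound", "assay", "value")
print(wide["cpd-1"])  # prints {'IC50': 0.5, 'logP': 2.1}

# Reverse pivoting restores the original long table
print(reverse_pivot(wide, "compound", "assay", "value") == long_rows)  # prints True
```

Note that a compound without a value for some assay (here `cpd-2` and `logP`) simply has no entry in the wide form, so the round trip does not invent empty rows.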

Cheminformatics

  • Fast substructure & compound similarity filtering (see descriptors)
  • Calculation of physico-chemical properties like MW, logP, logS, tPSA
  • Calculation of druglikeness, flexibility, complexity, atom/ring counts, etc.
  • Detection of toxicity risk factors for four toxicity categories
  • Enumeration of combinatorial libraries with predefined or custom design
  • De-novo structure creation using an evolutionary algorithm with flexible fitness criteria
  • Principal component analysis and self organizing maps on chemical descriptors
  • 2-dimensional scaling algorithm using chemical and pharmacophore similarities
  • Automatic and semi-automatic creation of structure-activity-relationship tables
  • Scaffold analysis (ring systems or Murcko scaffolds)
  • Search & Replace functionality on chemical structure columns
  • Comparison of two structure files to reveal overlap of similar structures
  • 2D atom coordinate generation with unified scaffold orientation
  • Activity cliff analysis
  • Generation of drug-like or natural-product-like random molecules
  • Diverse subset selection and compound clustering
  • Consistently uses MDL's concept of Enhanced Stereo Recognition
  • Generation of conformers with MMFF94 energy minimization
  • Conformation explorer with raytracer for photo-realistic molecule images
  • Comprehensive support for chemical reactions, reads Biovia databases & reaction SMILES
  • Machine learning using chemical descriptors: Applicability check & missing value prediction
  • PheSA superpositioning of conformers (PHarmacophore Enhanced Shape Alignment)
  • Protein-ligand docking with pose scoring and interactive visualization

Descriptors

  • FragFp: fragment dictionary based binary fingerprint (analogous to MDL keys)
  • PathFp: linear atom strands normalized, hashed, binary (analogous to Daylight fingerprints)
  • SphereFp: canonical circular fragments, hashed, binary
  • SkelSpheres: canonical circular fragments & skeletons, stereo perception, hashed, counts
  • OrgFunctions: synthetically accessible organic functionality in similarity tree
  • Flexophore: pharmacophore similarity considering diverse conformers and PDB statistics
  • RxnFp: reaction similarity, reaction center similarity, reaction periphery similarity
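
To make the idea of a hashed binary fingerprint more concrete, here is a minimal Python sketch. It does not reproduce any of the descriptors above: the fragment strings, the CRC32 hash, the 512-bit length, and the function names `hashed_fingerprint` and `tanimoto` are all assumptions chosen for illustration. The Tanimoto coefficient used for comparison is a common similarity measure for binary fingerprints; whether a given DataWarrior descriptor uses exactly this measure is not stated here.

```python
import zlib

def hashed_fingerprint(fragments, n_bits=512):
    """Set one bit per fragment; the bit index is a hash of the fragment text."""
    fingerprint = 0
    for fragment in fragments:
        bit = zlib.crc32(fragment.encode("utf-8")) % n_bits
        fingerprint |= 1 << bit
    return fingerprint

def tanimoto(fp_a, fp_b):
    """Shared bits divided by bits set in either fingerprint."""
    union = bin(fp_a | fp_b).count("1")
    if union == 0:
        return 0.0  # convention for two empty fingerprints
    return bin(fp_a & fp_b).count("1") / union

# Hand-written bit patterns: one shared bit, three bits in the union
print(tanimoto(0b1100, 0b0110))  # 1 / 3

# Hypothetical fragment labels standing in for real substructure fragments
fp_a = hashed_fingerprint(["C-C", "C-O", "C=O"])
fp_b = hashed_fingerprint(["C-C", "C-O", "C-N"])
print(tanimoto(fp_a, fp_b))
```

Hashing lets arbitrary fragments share a fixed-length bit vector at the cost of occasional collisions, where two different fragments set the same bit. A count-based descriptor would store how often each fragment occurs instead of a single bit.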

Databases

  • All chemical structures in Wikipedia can be downloaded and searched locally.
  • Fast structure and target search in ChEMBL database with result retrieval.
  • Structure, price and package size search in Enamine building block database.
  • Substructure, similarity, author, and year search in the Crystallography Open Database (COD) with retrieval of 3D crystal structures.
  • Direct access to Oracle, PostgreSQL, MySQL, SQL-Server using custom SQL queries.
  • Customized search and retrieval from any database using self-developed plugins.

Automation

  • (Almost) any sequence of tasks can be recorded as a macro.
  • Macros can be created or edited interactively without scripting.
  • Macros make it possible to share or repeat complex tasks on updated or different data.