Downloadable Data Files

The DataWarrior installation already comes with various sample data files. These include FDA-approved drugs, compound collections with physico-chemical properties or measured pKa values, kinase ligands, and a few files with non-chemical content to illustrate various program features.

This page provides download links to larger data files that are not included in the DataWarrior installers, either because they would significantly increase the installer size or because they may not be of general interest.


Organic Subset of the Crystallography Open Database (COD)

The recent upgrade of DataWarrior introduced the capability of generating conformers. The algorithm combines self-organization with a rule-based approach. The latter relies on statistical data derived from a large number of diverse, three-dimensional organic structures taken from a crystallographic database. The de facto standard source for crystallographic structures of organic molecules would be the Cambridge Structural Database (CSD). Its license, however, does not permit deriving geometrical statistics and publishing them as part of an open-source package. Luckily, there is an alternative: the Crystallography Open Database (COD).

The COD consists of one CIF file per structure. Saulius Grazulis published Perl scripts that facilitate the conversion of these files into SD-files with correct stoichiometry. We used his program codcif2sdf, which in turn employs OpenBabel for the final step of deducing bond orders and creating the SD-file records. Since OpenBabel output sometimes contains wrong bond orders, charges, and unpaired electrons, we applied substantial automated error correction and plausibility checking. In the end, the process yielded 90155 purely organic structures (103.3 MByte, from the COD snapshot of July 20, 2015).
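To give a rough idea of what a "purely organic" check on SD-file records could look like, here is a minimal Python sketch. It is not the procedure that produced the file offered here. It assumes V2000 records, and the element set and the requirement of at least one carbon atom are assumptions made only for this illustration. The function names are hypothetical.

```python
# Illustrative sketch only: NOT the error correction or filtering used for
# the COD subset. Assumes V2000 molfile records inside the SD-file.

# Assumption: which elements count as "organic" is chosen for illustration.
ORGANIC_ELEMENTS = {"H", "B", "C", "N", "O", "F", "Si", "P", "S",
                    "Cl", "Se", "Br", "I"}


def split_sdf_records(text):
    """Split SD-file text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records


def atom_symbols(record):
    """Return the element symbols from the atom block of a V2000 record."""
    lines = record.splitlines()
    # Line 4 of a molfile is the counts line; its first 3 columns hold the atom count.
    atom_count = int(lines[3][0:3])
    # In each atom line, columns 32-34 (1-based) hold the element symbol.
    return [line[31:34].strip() for line in lines[4:4 + atom_count]]


def is_purely_organic(record):
    """True if the record contains carbon and only elements from the assumed set."""
    symbols = atom_symbols(record)
    return "C" in symbols and all(s in ORGANIC_ELEMENTS for s in symbols)
```

Applied to the text of an SD-file, `[r for r in split_sdf_records(text) if is_purely_organic(r)]` would keep only the records that pass this simple element test. A real pipeline would additionally have to validate bond orders, charges, and unpaired electrons, as described above.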

DataWarrior with the organic subset of the COD


DrugBank Version 5.0.3 (Subset in DataWarrior format)

The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug data (i.e., chemical, pharmacological, and pharmaceutical) with comprehensive drug target information (i.e., sequence, structure, and pathway). The database contains 8250 drug entries, including 2016 FDA-approved small molecule drugs, 229 FDA-approved biotech (protein/peptide) drugs, 94 nutraceuticals, and over 6000 experimental drugs. This DataWarrior file is a subset of DrugBank 5.0.3, downloaded from https://www.drugbank.ca.

DrugBank is offered to the public as a freely available resource. Use and re-distribution of the data, in whole or in part, for commercial purposes (including internal use) require a license. We ask that users who download significant portions of the database cite the DrugBank paper in any resulting publications.

Citing DrugBank: Wishart DS, Knox C, Guo AC, Shrivastava S, Hassanali M, Stothard P, Chang Z, Woolsey J. DrugBank: a comprehensive resource for in silico drug discovery and exploration. Nucleic Acids Res. 2006 Jan 1;34(Database issue):D668-72. PMID: 16381955.

DataWarrior showing general information about the Vitamin E entry of DrugBank