DataWarrior User Manual

Chemistry in 3D

The biological properties of a chemical substance depend largely on its 3-dimensional structure, i.e. on the interaction potential of its atoms, their geometrical orientation, and the flexibility of the molecule. Typically, a molecule has not one but many low-energy conformers, and to understand the biological potential of a compound one needs to investigate its conformer structures in detail. The Flexophore descriptor was designed to cover all representative conformers of a molecule and even to consider its flexibility. Calculating similarities between molecules using the Flexophore is easy and allows detecting molecules whose conformers have a high potential to interact with a target protein in a similar way. Nevertheless, it does not reveal any insights into the 3-dimensional nature of a compound.

DataWarrior has a built-in conformer generator and forcefield-based energy minimization. Together they allow generating diverse, low-energy conformers, which can be explored within DataWarrior, exported for use in other software packages, or even rendered to yield photo-realistic images. Within DataWarrior three views may show conformers: First, the detail area automatically includes a 3D-molecule viewer if a structure column has associated conformer information. Second, the form view may contain a form item that shows conformers. Third, the conformer explorer is dedicated to showing conformers.

Generating Conformers

This functionality creates one or multiple conformers for every structure within a DataWarrior document. Various algorithms for the conformer generation and subsequent energy minimization are available. To create conformers for the current data window's molecules, select Generate Conformers... from the Chemistry menu. A dialog lets you define options for the conformer generator.

Conformer Generator Options

Structure column: A column containing chemical structures for which to generate the conformers.

Algorithm: Most of the algorithms that can be selected here share the same general procedure to generate conformers, a rule-based assembly of self-organized rigid fragments:
First, DataWarrior locates all freely rotatable bonds of the molecule that are not part of a ring. Cutting these bonds yields a set of more or less rigid fragments. For each of these fragments a self-organization based algorithm creates one or multiple fragment conformers.
In a second step, the local environment of every rotatable bond is used to assign the bond to a specific bond class. Using bond classes, DataWarrior assigns preferred torsion angles to every rotatable bond. These are taken from experimental, i.e. crystallographic, data. In addition, torsion angle likelihoods are used, which reflect how often a particular torsion angle is found in X-ray data.
In a third step, the fragments are assembled using the preferred torsions for every rotatable bond. A collision check determines whether the combination of torsions causes any atom collisions. If no collision occurs, the conformer is accepted and a new permutation is chosen. Otherwise, the algorithm creates a rule describing the torsion combination that led to the collision. These rules are considered when constructing new conformers.
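The accept-or-learn-a-rule loop of the third step can be illustrated with a small sketch. This is not DataWarrior's implementation: the function names, the `collides` callback and the torsion table are hypothetical, and a rule is simplified here to a forbidden pair of (bond, angle) choices.

```python
import random

def assemble(torsions, collides, max_conformers, max_tries=1000, seed=0):
    """Pick torsion combinations, accept collision-free ones, learn rules from clashes.

    torsions: one list of candidate angles per rotatable bond (hypothetical input).
    collides: callable returning the set of clashing bond index pairs for a combination.
    """
    rng = random.Random(seed)
    rules = set()     # forbidden ((bond_i, angle_i), (bond_j, angle_j)) pairs
    seen = set()      # combinations that were already tried
    accepted = []
    for _ in range(max_tries):
        if len(accepted) >= max_conformers:
            break
        combo = tuple(rng.choice(options) for options in torsions)
        if combo in seen:
            continue
        seen.add(combo)
        # Skip combinations that violate a rule learned from an earlier collision.
        if any(combo[i] == a and combo[j] == b for (i, a), (j, b) in rules):
            continue
        clashes = collides(combo)
        if clashes:
            for i, j in clashes:
                rules.add(((i, combo[i]), (j, combo[j])))
        else:
            accepted.append(combo)
    return accepted, rules

def toy_collides(combo):
    # Assumed toy geometry: bonds 0 and 1 clash when both are at 60 degrees.
    return {(0, 1)} if combo[0] == 60 and combo[1] == 60 else set()

conformers, rules = assemble([[60, 180, 300]] * 3, toy_collides, max_conformers=30)
```

Once the rule for bonds 0 and 1 has been learned, the remaining combinations that contain it are rejected without running another collision check, which is the point of keeping rules at all.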
Potentially, the number of constructible conformers may be very high, depending on the number of rotatable bonds, the number of torsions per bond, and the number of self-organized fragment conformers. Therefore, a selectable algorithm determines how torsion angles are permuted, how atom collisions are handled, and to which extent likely torsions are preferred:

  • Random, low energy bias: This strategy randomly selects a new set of torsions and fragments for every new conformer. However, a weighted random method is used, giving more likely torsion angles and better scoring fragments a higher chance of being selected than the less likely ones. This is a well-balanced strategy leading to diverse low-energy conformers.
  • Pure random: The degrees of freedom are selected randomly, neglecting any likelihoods. This produces the most diverse conformers, but not necessarily low-energy ones.
  • Adaptive collision avoidance, low energy bias: This strategy starts and works like the low energy biased random strategy until a set of torsion angles causes atom collisions. Then, for every rotatable bond it is determined to which extent its current rotation state contributes to the atom collisions. With a weighted random approach one of the rotatable bonds is chosen to be modified next, such that the next conformer has a high likelihood of escaping the collision.
  • Systematic, low energy bias: The starting point for this algorithm is the conformer that uses the most likely option for every degree of freedom, i.e. the most likely torsion angle for every rotatable bond and the best scoring fragment conformer wherever there are multiple choices. For the next conformer only that degree of freedom is changed which causes the smallest drop in overall likelihood. This way, the most likely conformers are produced first, but the initial diversity may not be very high.
  • Self organized: This algorithm does not use the general procedure described above. It applies a self-organization approach to the entire molecule. For that, all atoms are initialized with random coordinates. Then a list of constraints is determined as follows: Distance constraints define preferred distances between any two atoms. Plane constraints group atoms that should share the same plane. Other constraints handle preferred torsions, stereochemistry, and atoms on a straight line. In a kind of minimization procedure, constraints are randomly picked and their atoms relocated in space to better meet the constraint. This algorithm works best with highly constrained, i.e. rigid, structures like bridged ring systems.
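The difference between the pure random, the low energy biased random and the systematic strategies can be sketched in a few lines. The torsion table, its likelihood values and all function names are hypothetical. The systematic variant is simplified to a brute-force ranking of all combinations by the product of their likelihoods, which is only feasible for very few degrees of freedom and is not how DataWarrior enumerates them.

```python
import itertools
import math
import random

# Hypothetical torsion table: per rotatable bond, (angle in degrees, likelihood).
TORSIONS = [
    [(60, 0.2), (180, 0.7), (300, 0.1)],
    [(0, 0.5), (180, 0.5)],
]

def pure_random(torsions, rng):
    """Every angle of a bond has the same chance; likelihoods are ignored."""
    return tuple(rng.choice(options)[0] for options in torsions)

def biased_random(torsions, rng):
    """Weighted random choice: likely angles are picked more often."""
    picks = []
    for options in torsions:
        angles = [angle for angle, _ in options]
        weights = [weight for _, weight in options]
        picks.append(rng.choices(angles, weights=weights, k=1)[0])
    return tuple(picks)

def systematic(torsions):
    """Yield all combinations, the most likely one first (simplified ranking)."""
    combos = itertools.product(*torsions)
    ranked = sorted(combos, key=lambda combo: -math.prod(w for _, w in combo))
    for combo in ranked:
        yield tuple(angle for angle, _ in combo)

rng = random.Random(1)
samples = [biased_random(TORSIONS, rng) for _ in range(2000)]
first_systematic = next(systematic(TORSIONS))
```

In the biased samples the 180 degree state of the first bond dominates, while `pure_random` would visit all three states about equally often.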

Initial torsion: Since using torsion angles derived from crystallographic data introduces a certain bias, and in order to mimic the construction principles of other conformer generators, DataWarrior's conformer generator may also forgo using experimental torsion data. Instead it may use six torsion angles per rotatable bond in 60 degree steps. In this case all six torsion angles are considered equally likely. Of course, conformers constructed this way are often far off the local energy minimum and should probably be energy minimized afterwards.
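A short sketch shows what such a uniform torsion set looks like and how quickly the number of torsion combinations grows. The function name is hypothetical, and starting the angle series at 0 degrees is an assumption; the text above only states six angles in 60 degree steps.

```python
def uniform_torsions(rotatable_bonds, step=60):
    """Build (angle, likelihood) options with equal likelihood for every bond."""
    angles = list(range(0, 360, step))   # assumed to start at 0 degrees
    weight = 1.0 / len(angles)           # all angles equally likely
    return [[(angle, weight) for angle in angles] for _ in range(rotatable_bonds)]

table = uniform_torsions(4)
# Six options per bond and four bonds give 6**4 torsion combinations.
combinations = len(table[0]) ** len(table)
```

With four rotatable bonds this already yields 1296 combinations before any collision check, which is why a maximum conformer count and a selection strategy are needed.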

Minimize energy: Rule based assembled or self-organized conformers like those created by the above algorithms may still suffer from angle strains, slight atom collisions or suboptimal torsions, because the local environment of a particular molecule may not be well represented by more general rules that were used for the construction. To reduce strain and minimize energies these conformers can be further optimized by applying one of these forcefields:

  • MMFF94s+ forcefield: This is an optimized version of the MMFF94s forcefield, which addresses known, unrealistic torsion parameterizations of the original MMFF94s implementation. The torsion angle analysis and corrections were done by Joel Wahl. A peer-reviewed publication is in progress.
  • MMFF94s forcefield: The Merck Molecular Force Field 94 is a widely used and well-known forcefield based on the MM3 forcefield. It is parameterized to be applicable to a wide range of organic compounds. The implementation that DataWarrior uses was ported from the RDKit to Java and validated by Daniel Bergmann and Paolo Tosco, who had earlier developed the MMFF94 implementation in C++ for the RDKit as well.
  • Idorsia forcefield: This forcefield is based on the MM2 forcefield. Its implementation in Java was developed by Joel Freyss. It is also universally applicable and mainly used for in-house purposes at Idorsia.
  • Don't minimize: This option passes through all conformers as they are generated by the construction algorithm.

Write into file: When this option is selected, generated conformers are exported into a compound file rather than added to the current dataset.

    File type: The most widely supported format is probably the SD-file version 2, while the most compact file format is certainly a native DataWarrior file.

    Max. conformer count: The number of generated conformers will be limited to the number defined here.

    Remove small fragments: If this option is selected, then all unconnected fragments except for the largest one are removed from the molecule before conformers are generated. This is particularly advisable if a forcefield minimization is used, which may take a very long time to optimize the relative positions of unconnected fragments.
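Keeping only the largest fragment amounts to finding the largest connected component of the molecular graph. The following sketch assumes that atoms are numbered from 0 and that "largest" means the highest atom count; the function name and the input format are hypothetical, and DataWarrior's own criterion may differ.

```python
from collections import defaultdict

def largest_fragment(atom_count, bonds):
    """Return the sorted atom indexes of the largest connected fragment."""
    adjacency = defaultdict(list)
    for a, b in bonds:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen = set()
    best = []
    for start in range(atom_count):
        if start in seen:
            continue
        # Depth-first search collects all atoms connected to the start atom.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            atom = stack.pop()
            component.append(atom)
            for neighbour in adjacency[atom]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        if len(component) > len(best):
            best = component
    return sorted(best)

# Sodium acetate without hydrogens: atoms 0-3 form the acetate, atom 4 is the sodium ion.
kept = largest_fragment(5, [(0, 1), (1, 2), (1, 3)])
```

Here `kept` contains the four acetate atoms, and the unconnected counter-ion is dropped.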

    Neutralize remaining fragment: If this option and the Remove small fragments option are both selected, then after the removal of all small fragments DataWarrior tries to neutralize all charged atoms of the remaining fragment through protonation or deprotonation. If quaternary nitrogens cannot be deprotonated, then DataWarrior tries to deprotonate acidic atoms to achieve a neutral overall charge.

    Skip compounds with more than NN stereo isomers: If a molecule contains undefined stereo centers, then the conformer generator randomly constructs stereo isomers. Depending on the purpose, molecules with a high potential number of isomers, and therefore an even higher number of representative conformers, may pose a problem, e.g. for virtual screening, where they cause high computation time and a low probability that found hits really represent the correct stereo isomer. This option lets you simply skip such molecules.
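The isomer count grows exponentially with the number of undefined stereo centers: each one can take two configurations, so n of them allow at most 2^n stereo isomers (symmetry can reduce the real number). A minimal sketch of such a threshold check, with hypothetical function names and not necessarily the way DataWarrior counts isomers:

```python
def max_stereo_isomers(undefined_centers):
    """Upper bound: two configurations per undefined stereo center."""
    return 2 ** undefined_centers

def should_skip(undefined_centers, limit):
    """Skip the compound if it may have more stereo isomers than the limit."""
    return max_stereo_isomers(undefined_centers) > limit

skip_five = should_skip(5, limit=16)   # up to 32 isomers, above the limit
skip_four = should_skip(4, limit=16)   # up to 16 isomers, still allowed
```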

    Create proper protonation state: This option is available only if the ChemAxon pKa-Plugin is installed and DataWarrior was configured to find it. If this option is selected, then the pKa-values of basic and acidic atoms are determined using the ChemAxon method, and these atoms are protonated or deprotonated to reflect their natural state at the given pH value. If the value in the +- text field is different from '0', then this defines a pH-range. In this case DataWarrior may produce more than one protonation state if the pKa of one or more of the basic or acidic atoms falls into this pH-range.
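How a pH-range can lead to several protonation states is sketched below. This is not the ChemAxon method: the pKa values are given rather than predicted, the numbers are illustrative only, and all names are hypothetical. Each pKa refers to the protonated form of a site, so a site is treated as protonated when the pH is below its pKa, and both states are enumerated when the pKa lies within the pH-range.

```python
import itertools

def site_states(pka, ph, delta):
    """Protonation flags (True = protonated) to consider for one site."""
    if delta > 0 and abs(pka - ph) <= delta:
        return [True, False]     # pKa inside the pH-range: keep both states
    return [pka > ph]            # otherwise only the dominant state

def protonation_states(pkas, ph, delta=0.0):
    """Enumerate all combinations of per-site protonation flags."""
    per_site = [site_states(pka, ph, delta) for pka in pkas]
    return list(itertools.product(*per_site))

# Illustrative values: a carboxylic acid, an amine and an imidazole-like site.
pkas = [4.8, 9.5, 7.0]
single = protonation_states(pkas, ph=7.4)
ranged = protonation_states(pkas, ph=7.4, delta=1.0)
```

Without a range only one state results: acid deprotonated, amine protonated, third site deprotonated. With a range of 1.0 the third site falls inside the window, so two states are produced.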

    Exploring Conformers of a Molecule

    DataWarrior has a built-in conformer explorer that lets you inspect multiple conformers of the same molecule. To open the conformer explorer, select Explore conformers of 'Structure'... from the popup menu that appears when you right-click any structure or marker within any main view. Conformers may be shown with a small delay, because one set of conformers is generated immediately. You may use the mouse wheel for zooming and moving conformers within the screen plane. Conformers may be rotated using the right mouse button. A right mouse click on a conformer opens a context-sensitive popup menu with options for rendering, molecular surface display, and distance and angle measurement.

    Conformer Explorer with superposed conformers

    The bottom panel of the conformer explorer shows controls to regenerate conformers with different settings, to separate individual conformers, to display molecular surfaces, to save conformers into a file, and to define which atoms are superposed. To change these atoms, click the Superpose... button, select the atoms that shall be superposed, and close the dialog.

    Conformer Explorer with separated conformers and molecular surfaces

    To generate new conformers within the conformer explorer, select the maximum number of conformers, the algorithm to be used, and a forcefield for the energy minimization. Then press the Generate button. The available algorithms are explained in the previous section.

    Photorealistic Rendering

    If you select the Photo-Realistic Image... item from the popup menu of any of DataWarrior's 3D molecule viewers, then a dialog opens that lets you calculate a photo-realistic image using the professional quality ray-tracer Sunflow, which is part of the DataWarrior installation. The dialog lets you choose various options.

    • Image size: This is the size of the created image in pixels.
    • Environment: This option contains some predefined lighting, color and material conditions, such as bright sun and black background.
    • Move and zoom to fill image: If this option is selected, the molecule is rotated automatically to expose its largest possible silhouette to the camera. Furthermore, it is zoomed and moved to just about fill the image. If this option is not selected, DataWarrior tries to mimic the perspective and zoom state of the conformer panel. Since the rendering concepts of the ray-tracer and the conformer viewer are different, the original perspective will be similar, but not necessarily exactly reproduced.
    As soon as the render dialog is closed, a new window opens, in which all available processor cores are used to render the molecule. Once the image is complete, you may save it to a file or copy it to the clipboard by selecting the appropriate option from a popup menu. The following picture shows an example taken from the Crystallography Open Database.

    Photorealistic image of COD entry 2230709,
