DataWarrior User Manual

Chemistry in 3D


The biological properties of a chemical substance depend largely on its 3-dimensional structure, i.e. on the interaction potential of its atoms, their geometrical orientation and the flexibility of the molecule. Typically, a molecule has not one but many low energy conformers, and to understand the biological potential of a compound one needs to investigate its conformer structures in detail. The Flexophore descriptor was designed to cover all representative conformers of a molecule and even to consider its flexibility. Calculating similarities between molecules using the Flexophore is easy and makes it possible to detect molecules whose conformers have a high potential to interact with a target protein in a similar way. Nevertheless, it does not reveal any insights into the 3-dimensional nature of a compound.

DataWarrior has a built-in conformer generator and forcefield-based energy minimization, which together allow generating diverse, low energy conformers. These can be explored within DataWarrior, exported for use in other software packages, or even rendered to yield photo-realistic images. Within DataWarrior there are three views that may show conformers: First, the detail area automatically includes a 3D-molecule viewer if a structure column has associated conformer information. Second, the form view may contain a form item that shows conformers. Third, the conformer explorer shows several conformers of one molecule in parallel.


Generating Conformers

This functionality creates one or multiple conformers for every structure within a DataWarrior document. Various algorithms for the conformer generation and subsequent energy minimization are available. To create conformers for the current data window's molecules, select Generate Conformers... from the Chemistry menu. A dialog lets you define various options for the conformer generator.

Conformer Generator Options

Structure column: A column containing chemical structures for which to generate the conformers.

Algorithm: Most of the algorithms that can be selected here share the same general procedure to generate conformers, a rule-based assembly of self-organized rigid fragments: First, DataWarrior locates all freely rotatable bonds of the molecule that are not part of a ring. Cutting these bonds yields a set of more or less rigid fragments. For each of these fragments a self-organization based algorithm creates one or multiple fragment conformers. Then, the local environment of every rotatable bond is classified. The rotatable bond class is then used to assign preferred torsion angles to the bond. Bond environment specific torsion angles are taken from a list precompiled from crystallographic data. They are also associated with likelihoods reflecting how often a particular torsion angle is found in X-ray data.
The fragment conformers are then reassembled using preferred torsions for every rotatable bond. A collision check determines whether the combination of torsions causes any atom collisions. If no collision occurs, the conformer is accepted and a new permutation is chosen. Otherwise, the algorithm creates a rule about the torsion combination that led to the collision. These rules are taken into account when rotatable bonds and torsion angles are selected for subsequent conformers.
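
As a rough illustration of the collision check, the following Python sketch flags pairs of atoms that come closer than a distance threshold. The function names, the 2.0 Å threshold and the data layout are assumptions made for this example; DataWarrior's actual check is not described in this manual and may well differ, for instance by using atom-specific radii.

```python
import math
from itertools import combinations

# Hypothetical threshold: atoms that are not excluded and are closer than
# this are treated as colliding. The value is illustrative only.
MIN_DISTANCE = 2.0  # angstrom


def find_collisions(coords, excluded_pairs, min_distance=MIN_DISTANCE):
    """Return index pairs of atoms that are closer than min_distance.

    coords         -- list of (x, y, z) tuples, one per atom
    excluded_pairs -- atom index pairs that are allowed to be close,
                      e.g. directly bonded atoms
    """
    excluded = {frozenset(pair) for pair in excluded_pairs}
    collisions = []
    for i, j in combinations(range(len(coords)), 2):
        if frozenset((i, j)) in excluded:
            continue  # bonded atoms are expected to be close
        if math.dist(coords[i], coords[j]) < min_distance:
            collisions.append((i, j))
    return collisions


def accept_conformer(coords, excluded_pairs):
    """Accept a conformer only if no atom pair collides."""
    return not find_collisions(coords, excluded_pairs)
```

In this sketch, a four-atom chain that is stretched out would be accepted, while the same chain folded back so that its two end atoms are 1.5 Å apart would be rejected.
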
Potentially, the number of constructible conformers may be very high, depending on the number of rotatable bonds, the number of torsions per bond and the number of self-organized fragment conformers. Therefore, a selectable algorithm determines how torsion angles are permuted, how atom collisions are handled and to what extent likely torsions are preferred:

  • Adaptive collision avoidance, low energy bias: This strategy starts and works like the random, low energy bias strategy described below until a set of torsion angles causes atom collisions. Then, for every rotatable bond it is determined to what extent its current rotation state contributes to the atom collisions. With a weighted random approach one of the rotatable bonds is chosen to be modified next, such that the next conformer has a high likelihood of escaping the collision.
  • Systematic, low energy bias: The starting point for this algorithm is the conformer that uses the most likely option for every degree of freedom, i.e. the most likely torsion angle for every rotatable bond and the best scoring fragment conformer wherever multiple choices exist. For the next conformer only that degree of freedom is changed that causes the smallest drop in overall likelihood. This way, the most likely conformers are produced first, but the initial diversity may not be very high.
  • Random, low energy bias: This strategy randomly selects a new set of torsions and fragments for every new conformer. However, a weighted random method is used, giving more likely torsion angles and better scoring fragments a higher chance of being selected than the less likely ones. This is a well balanced strategy leading to diverse low energy conformers.
  • Pure random: The degrees of freedom are selected randomly, neglecting any likelihoods. This produces the most diverse conformers, but not necessarily low energy ones.
  • Self organized: This algorithm does not use the general procedure described above. It applies a self-organization approach to the entire molecule. For that, all atoms are initialized with random coordinates. Then a list of constraints is determined as follows: Distance constraints define preferred distances between any two atoms. Plane constraints group atoms that should share the same plane. Other constraints handle preferred torsions, stereochemistry and atoms on a straight line. In a kind of minimization procedure, constraints are randomly picked and their atoms relocated in space to better meet the constraint. This algorithm works best with highly constrained, i.e. rigid, structures like bridged ring systems.
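
The difference between the "random, low energy bias" and "pure random" strategies listed above can be sketched in a few lines of Python. The torsion table, its angles and likelihoods, and the function name are invented for this example and do not come from DataWarrior; the sketch also ignores fragment conformers and collision handling.

```python
import random

# Hypothetical torsion table: for each rotatable bond, candidate angles in
# degrees paired with a relative likelihood. All numbers are made up.
TORSION_OPTIONS = {
    "bond_a": [(60, 0.5), (180, 0.4), (300, 0.1)],
    "bond_b": [(0, 0.7), (180, 0.3)],
}


def pick_torsions(options, rng, use_likelihoods=True):
    """Choose one torsion angle per rotatable bond.

    With use_likelihoods=True, angles are drawn in proportion to their
    likelihood (low energy bias). Otherwise every angle has the same
    chance of being drawn (pure random).
    """
    chosen = {}
    for bond, candidates in options.items():
        angles = [angle for angle, _ in candidates]
        weights = [w for _, w in candidates] if use_likelihoods else None
        chosen[bond] = rng.choices(angles, weights=weights, k=1)[0]
    return chosen


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed so that runs are repeatable
    print(pick_torsions(TORSION_OPTIONS, rng))
    print(pick_torsions(TORSION_OPTIONS, rng, use_likelihoods=False))
```

With the weighted variant, the 60 degree option of bond_a is drawn about five times as often as the 300 degree option over many conformers, whereas the unweighted variant treats both alike.
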

Algorithm: Conformers assembled by rules or created by self-organization, like those produced by the above algorithms, may still suffer from angle strain, slight atom collisions or suboptimal torsions, because the local environment of a particular molecule may not be well represented by the more general rules that were used for the construction. To minimize strain and energy, these conformers can be optimized with a forcefield:

  • MMFF94 forcefield: The Merck Molecular Force Field 94 is a widely used and well known forcefield based on the MM3 forcefield. It is parameterized to be applicable to a wide range of organic compounds. The implementation that DataWarrior uses was ported from RDKit to Java and validated by Daniel Bergmann and Paolo Tosco, who earlier had developed the MMFF94 implementation in C++ for RDKit as well.
  • Actelion forcefield: This forcefield is based on the MM2 forcefield. It is also universally applicable and mainly used for in-house purposes at Actelion.

Write into file: When this option is selected, generated conformers are exported into a compound file rather than added to the current dataset.

File type: The most widely supported format is probably the SD-file version 2.

Max. conformer count: The number of generated conformers is limited to the number defined here.

Remove small fragments: If this option is selected, all unconnected fragments except for the largest one are removed from the molecule before conformers are generated. This is particularly advisable if a forcefield minimization is used, because optimizing the relative positions of non-connected fragments may take a very long time.
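
What removing small fragments amounts to can be sketched as a search for the largest connected component of the bond graph. The function name and the representation of a molecule as an atom count plus a list of bonded index pairs are assumptions for this example, not DataWarrior's internal data model. When two fragments have the same size, this sketch keeps the one found first.

```python
from collections import defaultdict


def largest_fragment(atom_count, bonds):
    """Return the sorted atom indices of the largest connected fragment.

    atom_count -- number of atoms, indexed 0 .. atom_count - 1
    bonds      -- list of (atom_index, atom_index) pairs
    """
    neighbors = defaultdict(set)
    for a, b in bonds:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    best = []
    for start in range(atom_count):
        if start in seen:
            continue
        # Depth-first search collects every atom reachable from `start`.
        fragment, stack = [], [start]
        seen.add(start)
        while stack:
            atom = stack.pop()
            fragment.append(atom)
            for nxt in neighbors[atom]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(fragment) > len(best):
            best = fragment
    return sorted(best)


if __name__ == "__main__":
    # Sodium acetate, heavy atoms only: acetate is atoms 0-3, sodium is atom 4.
    print(largest_fragment(5, [(0, 1), (1, 2), (1, 3)]))
```

For a salt such as the sodium acetate example, only the acetate atoms remain, so a subsequent minimization does not have to position the counter ion relative to the rest of the molecule.
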


Exploring Conformers of a Molecule

DataWarrior has a built-in conformer explorer that lets you inspect up to nine conformers of the same molecule in parallel. To generate and interactively explore a molecule's conformers, select Explore conformers of 'Structure'... from the popup menu that appears when you right-click any structure or marker within any main view. The conformer explorer window may open with a small delay, because one set of conformers is generated immediately.

Conformer Explorer

The conformer explorer shows up to nine different conformers, which were created using the adaptive collision avoidance strategy described above with subsequent MMFF94 forcefield minimization. Conformers can be rotated and zoomed with the left and right mouse buttons, respectively. A click on the right mouse button opens a popup menu that lets you change the render mode from ball and sticks to a few others. The Measurements menu lets you measure atom distances, bond angles or torsion angles.
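
For exported conformers, the three kinds of measurement can be reproduced from atom coordinates with standard vector geometry. The following Python sketch shows one common way to compute them; the function names are chosen for this example, and the sign convention of the torsion angle is that of the usual atan2 formula, which is not necessarily the one DataWarrior displays.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def distance(p1, p2):
    """Distance between two atoms."""
    return math.dist(p1, p2)


def bond_angle(p1, p2, p3):
    """Angle in degrees at atom p2 between the directions to p1 and p3."""
    v1, v2 = _sub(p1, p2), _sub(p3, p2)
    cos_angle = _dot(v1, v2) / (math.hypot(*v1) * math.hypot(*v2))
    # Clamp to [-1, 1] to guard against rounding errors before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))


def torsion_angle(p1, p2, p3, p4):
    """Signed torsion angle in degrees around the p2-p3 bond."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)  # normals of the two planes
    x = _dot(n1, n2)
    y = math.hypot(*b2) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))
```

With this formula, four atoms in a cis arrangement give a torsion of 0 degrees and a trans arrangement gives 180 degrees.
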

To generate a fresh set of conformers within the conformer explorer, select the desired algorithm and press the Create Conformers button at the bottom of the window. The available algorithms are explained in some detail in the previous section.


Photorealistic Rendering

If you select the Photo-Realistic Image... item from the popup menu of any of DataWarrior's 3D molecule viewers, a dialog opens that lets you calculate a photo-realistic image using the professional quality ray-tracer Sunflow, which is part of the DataWarrior installation. The dialog offers the following options:

  • Image size: This is the size of the created image in pixels.
  • Environment: This option offers some predefined lighting, color and material conditions, such as bright sun or black background.
  • Move and zoom to fill image: If this option is selected, the molecule is rotated automatically to expose its largest possible silhouette to the camera. Furthermore, it is zoomed and moved to just about fill the image. If this option is not selected, DataWarrior tries to mimic the perspective and zoom state of the conformer panel. Since the rendering concepts of the ray-tracer and the conformer viewer are different, the resulting perspective will be similar to the original one, but not necessarily an exact reproduction.

As soon as the render dialog is closed, a new window opens in which all available processor cores are busy rendering the molecule. Once the image is completed, you may save it to a file or copy it to the clipboard by selecting the appropriate option from a popup menu. The following picture shows an example taken from the Crystallography Open Database.

Photorealistic image of COD entry 2230709,
catena-Poly[[(2,2'-dimethyl-4,4'-bi-1,3-thiazole-N,N')cadmium]-di-bromido]