Java Expression Parser (JEP)


All common arithmetic operators are supported. Boolean operators are also fully supported. Boolean expressions evaluate to either 1 (true) or 0 (false).

A "Check" entry indicates that the operator can be used with the specific type of variable. Refer to the grammar section for detailed information about operator precedence.

Description                   Operator  Double  String
Power                         ^         Check
Boolean Not                   !         Check
Unary Plus, Unary Minus       +x, -x    Check
Modulus                       %         Check
Division                      /         Check
Multiplication                *         Check
Addition, Subtraction         +, -      Check   Check (only +)
Less or Equal, More or Equal  <=, >=    Check
Less Than, Greater Than       <, >      Check
Not Equal, Equal              !=, ==    Check   Check
Boolean And                   &&        Check
Boolean Or                    ||        Check

Standard Functions

Each of the following alphanumerical standard functions can be applied to objects of the types indicated.

Description                             Function              Double    String
Sine                                    sin(x)                Check
Cosine                                  cos(x)                Check
Tangent                                 tan(x)                Check
Arc Sine                                asin(x)               Check
Arc Cosine                              acos(x)               Check
Arc Tangent                             atan(x)               Check
Arc Tangent (with 2 parameters)         atan2(y, x)           Check
Hyperbolic Sine                         sinh(x)               Check
Hyperbolic Cosine                       cosh(x)               Check
Hyperbolic Tangent                      tanh(x)               Check
Inverse Hyperbolic Sine                 asinh(x)              Check
Inverse Hyperbolic Cosine               acosh(x)              Check
Inverse Hyperbolic Tangent              atanh(x)              Check
Natural Logarithm                       ln(x)                 Check
Logarithm base 10                       log(x)                Check
Exponential                             exp(x)                Check
Absolute Value / Magnitude              abs(x)                Check
Integer Value                           int(x)                Check
Rounded Value                           round(x, scale)       Check
Random number (between 0 and 1)         rand()
Modulus                                 mod(x, y) = x % y     Check
Square Root                             sqrt(x)               Check
Sum                                     sum(x, y, z)          Check     Check
If                                      if(condition, x, y)   Check
String representation                   str(x)                Check     Check
String length                           len(s)                          Check
String s1 contains s2 (returns 1 or 0)  contains(s1, s2)                Check
Binomial coefficients                   binom(n, i)           integers

Biology And Chemistry Functions

In addition to the standard functions, DataWarrior provides a few special functions that are useful in the context of DataWarrior, drug discovery, and chem- or bioinformatics. These functions may calculate values from alphanumerical data, as the standard functions above do, or they may work on special data types such as chemical structures, reactions or descriptors. The available special functions are listed and explained below.

Description                 Function
Ligand Efficiency (HTS)     ligeff1(ra, conc in μmol/l, structure)
Ligand Efficiency (IC50)    ligeff2(ic50 in nmol/l, structure)
Chemical Similarity (A)     chemsim(descriptor, idcode)
Chemical Similarity (B)     chemsim(descriptor1, descriptor2)
Frequency of Occurrence     frequency(s, column-name)

The ligeff1() and ligeff2() functions calculate ligand efficiencies as relative free binding energy in kcal/mol per non-H atom. The first function, ligeff1(), requires the remaining activity of an HTS result, whereas the second, ligeff2(), works on IC50 values. Ligand efficiency values are a much more reasonable basis for selecting leads from an HTS campaign than remaining activities. Selecting as leads those compounds whose remaining activity falls below a certain threshold carries an implicit, strong bias towards compounds of high molecular weight, which ligand efficiencies avoid. During lead optimization, too, one should compare target affinities based on ligand efficiencies rather than on pure IC50 values.
"For the purposes of HTS follow-up, we recommend considering optimizing the hits or leads with the highest ligand efficiencies rather than the most potent..." (Ref.: A. L. Hopkins et al., Drug Disc. Today, 9 (2004), pp. 430-431).

To give an example: a compound with 30 non-H atoms (400 MW) that binds with Kd=10 nM has a ligand efficiency of 0.36 kcal/mol per non-H atom. Another compound with 38 non-H atoms (500 MW) and the same ligand efficiency would have a roughly 100-fold higher activity, with Kd=0.106 nM. Let us assume an HTS screening revealed two hit compounds A and B with equal activities of IC50=10 nM, but different molecular weights of 400 and 500, respectively. Based on activities alone, both compounds look equally attractive. Consider, however, that synthetically introducing a new group with 8 non-H atoms into compound A would match compound B in terms of weight, but would increase the activity by a factor of about 100 if the ligand efficiency can be maintained. It then becomes clear that compound A is by far the more attractive alternative.
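The arithmetic behind this example can be retraced with a few lines of Java. This is only a sketch: it assumes the constants quoted for ligeff1() (R=1.986 cal/(mol*K), T=300K), takes Kd in mol/l, and uses a hypothetical helper kdFromEfficiency() that is not part of DataWarrior.

```java
class LigandEfficiencyExample {
    // Assumption: same constants as quoted for ligeff1(), with R converted to kcal/(mol*K).
    static final double R = 1.986e-3;
    static final double T = 300.0;

    /** Kd in mol/l of a compound with the given ligand efficiency (kcal/mol per non-H atom). */
    static double kdFromEfficiency(double ligandEfficiency, int nonHAtoms) {
        double dG = ligandEfficiency * nonHAtoms;   // free binding energy in kcal/mol
        return Math.exp(-dG / (R * T));             // inverts dG = -RT * ln(Kd)
    }

    public static void main(String[] args) {
        double le = 0.36;                           // ligand efficiency from the example
        double kdA = kdFromEfficiency(le, 30);      // compound with 30 non-H atoms
        double kdB = kdFromEfficiency(le, 38);      // same efficiency, 8 atoms more
        System.out.printf("Kd (30 atoms): %.2f nM%n", kdA * 1e9);
        System.out.printf("Kd (38 atoms): %.3f nM%n", kdB * 1e9);
        System.out.printf("Activity gain: %.0f-fold%n", kdA / kdB);
    }
}
```

With these constants the 38-atom compound comes out at about 0.1 nM, and the gain from eight additional atoms at constant ligand efficiency is on the order of two log units, in line with the figures quoted above (small differences stem from rounding the efficiency to 0.36).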

The remaining activity ra supplied to the ligeff1() function should be roughly between 0 and 100. The second parameter to this function is the assay concentration conc of the potential inhibitor in μmol/l. The third parameter is the molecular structure, from which the number of non-hydrogen atoms is determined automatically. To avoid misinterpretations, one should understand how the ligeff1() function works:
1) ra values below 1.0 are set to 1.0. Those above 99.0 are set to 99.0.
2) IC50 values are calculated from these range-limited ra values as ic50 = conc / (100/ra - 1.0).
3) Assuming that the ic50 values are equivalent to the Kd, the free energy of ligand binding is calculated as dG = -RT * ln(ic50) with R = 1.986 cal/(mol*K) and T = 300 K.
4) The ligand efficiency is then calculated as ligeff = dG / N, where N is the number of non-hydrogen atoms.
This calculation has two consequences. Calculated ic50 values cover about 4 log units, and the values at the lower and upper ends of this range carry the highest uncertainty. The noisier the screening, the higher the uncertainty of the ic50 values and hence of the ligand efficiency values, especially at both ends of the scale.
If one can use the second function ligeff2() based on measured IC50 values, one avoids the error potential of the calculation of IC50 values from remaining activities. Ligand efficiency values from this function are therefore much more reliable and only contain the error margin of the original IC50 value.
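
The four steps described for ligeff1(), and the shortcut taken by ligeff2(), can be sketched in Java as follows. This is not DataWarrior's implementation. The class and method names are hypothetical, the atom count is passed in directly instead of being derived from a structure, and the conversion of concentrations to mol/l before taking the logarithm is an assumption chosen to be consistent with the worked example (Kd=10 nM and 30 non-H atoms giving roughly 0.36).

```java
class LigandEfficiencySketch {
    // Constants from the description; R converted from cal to kcal per (mol*K).
    static final double R_KCAL = 1.986e-3;
    static final double T_KELVIN = 300.0;

    /** Step 3: free energy in kcal/mol, treating the IC50 (assumed in mol/l) as Kd. */
    static double freeEnergy(double ic50Molar) {
        return -R_KCAL * T_KELVIN * Math.log(ic50Molar);
    }

    /** Ligand efficiency from a measured IC50 in nmol/l (hypothetical signature). */
    static double ligeff2(double ic50Nanomolar, int nonHAtoms) {
        return freeEnergy(ic50Nanomolar * 1e-9) / nonHAtoms;
    }

    /** Ligand efficiency from remaining activity (%) at an assay concentration in umol/l. */
    static double ligeff1(double ra, double concMicromolar, int nonHAtoms) {
        // Step 1: limit ra to the range 1.0 ... 99.0.
        double limited = Math.max(1.0, Math.min(99.0, ra));
        // Step 2: estimate the IC50 (still in umol/l) from the limited ra.
        double ic50Micromolar = concMicromolar / (100.0 / limited - 1.0);
        // Steps 3 and 4: free energy divided by the number of non-H atoms.
        return freeEnergy(ic50Micromolar * 1e-6) / nonHAtoms;
    }

    public static void main(String[] args) {
        System.out.printf("ligeff2(10 nM, 30 atoms)     = %.3f%n", ligeff2(10.0, 30));
        System.out.printf("ligeff1(50%%, 10 uM, 30 atoms) = %.3f%n", ligeff1(50.0, 10.0, 30));
    }
}
```

Because of the range limit in step 1, any ra at or below 1.0 gives the same result as ra = 1.0, and any ra at or above 99.0 the same as ra = 99.0, which is where the span of about 4 log units for the estimated ic50 comes from.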

The chemsim() function calculates similarities between two chemical structures or reactions. This function is available in two variations:

A) Use syntax A to calculate the similarities of one column's chemistry objects against one reference compound or reaction. The first parameter of this function defines the kind of similarity to be calculated. It must be the name of a descriptor column from the popup menu. The second parameter is the idcode of the reference structure. The following example calculates the 3D-pharmacophore similarity of the compounds in column Structure to pyridine (gFx@@eJf`@@@ is the idcode of pyridine).

Example: chemsim(PP3DMM2_of_Structure,"gFx@@eJf`@@@")

B) Alternatively, you may use syntax B to calculate the similarities between two different columns containing chemistry objects. In this case you may need to calculate chemical descriptors first. Be aware that the descriptors supplied to the chemsim() function need to be of the same type. This example calculates the similarities between a Reactant and a Product column.

Example: chemsim(FragFp_of_Reactant,FragFp_of_Product)


Grammar

In the grammar below, operators are ordered from lowest to highest precedence (from top to bottom).

Start ::= ( Expression ( <EOF> | <SEMI> ) | ( <EOF> | <SEMI> ) )
Expression ::= AssignExpression
    | OrExpression
AssignExpression ::= ( Variable <ASSIGN> Expression )
OrExpression ::= AndExpression ( ( <OR> AndExpression ) )*
AndExpression ::= EqualExpression ( ( <AND> EqualExpression ) )*
EqualExpression ::= RelationalExpression ( ( <NE> RelationalExpression ) | ( <EQ> RelationalExpression ) )*
RelationalExpression ::= AdditiveExpression ( ( <LT> AdditiveExpression ) | ( <GT> AdditiveExpression ) | ( <LE> AdditiveExpression ) | ( <GE> AdditiveExpression ) )*
AdditiveExpression ::= MultiplicativeExpression ( ( <PLUS> MultiplicativeExpression ) | ( <MINUS> MultiplicativeExpression ) )*
MultiplicativeExpression ::= UnaryExpression ( ( PowerExpression ) | ( <MUL> UnaryExpression ) | ( <DOT> UnaryExpression ) | ( <CROSS> UnaryExpression ) | ( <DIV> UnaryExpression ) | ( <MOD> UnaryExpression ) )*
UnaryExpression ::= ( <PLUS> UnaryExpression )
    | ( <MINUS> UnaryExpression )
    | ( <NOT> UnaryExpression )
    | PowerExpression
PowerExpression ::= UnaryExpressionNotPlusMinus ( ( <POWER> UnaryExpression ) )?
UnaryExpressionNotPlusMinus ::= AnyConstant
    | ( Function | Variable )
    | <LRND> Expression <RRND>
    | ListExpression
ListExpression ::= ( <LSQ> Expression ( <COMMA> Expression )* <RSQ> )
Variable ::= ( Identifier )
Function ::= ( Identifier <LRND> ArgumentList <RRND> )
ArgumentList ::= ( Expression ( <COMMA> Expression )* )?
Identifier ::= ( <IDENTIFIER1> | <IDENTIFIER2> )
AnyConstant ::= ( <STRING_LITERAL> | RealConstant )
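
To illustrate how the nesting of these rules produces operator precedence, here is a minimal recursive-descent evaluator in Java with one method per grammar level. It is a sketch and not JEP's parser. It covers numeric constants and parentheses only, leaving out variables, functions, strings, lists, assignment, implicit multiplication and the DOT and CROSS operators, and the class name PrecedenceSketch is made up for this example. Comparisons and Boolean operators return 1.0 or 0.0, as described for Boolean expressions above.

```java
class PrecedenceSketch {
    private final String src;
    private int pos = 0;

    private PrecedenceSketch(String src) { this.src = src.replaceAll("\\s+", ""); }

    /** Evaluates a purely numeric expression; throws on leftover or malformed input. */
    static double eval(String expression) {
        PrecedenceSketch p = new PrecedenceSketch(expression);
        double value = p.or();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("Unexpected input at " + p.pos);
        return value;
    }

    // Consumes the token if the remaining input starts with it.
    private boolean eat(String token) {
        if (src.startsWith(token, pos)) { pos += token.length(); return true; }
        return false;
    }

    private static double bool(boolean b) { return b ? 1.0 : 0.0; }

    private double or() {                       // OrExpression (lowest precedence)
        double v = and();
        while (eat("||")) { double r = and(); v = bool(v != 0 || r != 0); }
        return v;
    }
    private double and() {                      // AndExpression
        double v = equal();
        while (eat("&&")) { double r = equal(); v = bool(v != 0 && r != 0); }
        return v;
    }
    private double equal() {                    // EqualExpression
        double v = relational();
        while (true) {
            if (eat("!=")) v = bool(v != relational());
            else if (eat("==")) v = bool(v == relational());
            else return v;
        }
    }
    private double relational() {               // RelationalExpression; try <= before <
        double v = additive();
        while (true) {
            if (eat("<=")) v = bool(v <= additive());
            else if (eat(">=")) v = bool(v >= additive());
            else if (eat("<")) v = bool(v < additive());
            else if (eat(">")) v = bool(v > additive());
            else return v;
        }
    }
    private double additive() {                 // AdditiveExpression
        double v = multiplicative();
        while (true) {
            if (eat("+")) v += multiplicative();
            else if (eat("-")) v -= multiplicative();
            else return v;
        }
    }
    private double multiplicative() {           // MultiplicativeExpression (subset)
        double v = unary();
        while (true) {
            if (eat("*")) v *= unary();
            else if (eat("/")) v /= unary();
            else if (eat("%")) v %= unary();
            else return v;
        }
    }
    private double unary() {                    // UnaryExpression
        if (eat("+")) return unary();
        if (eat("-")) return -unary();
        if (eat("!")) return bool(unary() == 0);
        return power();
    }
    private double power() {                    // PowerExpression: exponent is a UnaryExpression
        double base = primary();
        return eat("^") ? Math.pow(base, unary()) : base;
    }
    private double primary() {                  // constants and parentheses only
        if (eat("(")) {
            double v = or();
            if (!eat(")")) throw new IllegalArgumentException("Missing ')' at " + pos);
            return v;
        }
        int start = pos;
        while (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
        if (start == pos) throw new IllegalArgumentException("Number expected at " + pos);
        return Double.parseDouble(src.substring(start, pos));
    }
}
```

Read this way, PrecedenceSketch.eval("2+3*4") gives 14.0, eval("-2^2") gives -4.0 because the power binds more tightly than the unary minus, and eval("2^3^2") gives 512.0 because the exponent is itself a UnaryExpression, which makes the power operator right-associative.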