SARpy (SAR in Python) to model biological endpoints

What is this about?

SAR (Structure-Activity Relationship) models typically rely on expert-written rules that check for the presence of specific structural fragments, called Structural Alerts (SAs), already known to be responsible for the property under investigation.

SARpy (SAR in Python) is a new ad hoc approach that automatically generates SAR models by extracting the relevant rules from data, without any a priori knowledge. The algorithm generates substructures of arbitrary complexity and automatically selects which fragments become SAs based on their predictive performance on a training set.
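To make the "generate fragments, then keep the ones that predict well" idea concrete, here is a minimal toy sketch. It is not SARpy's algorithm or API. As a loudly labeled simplification, "fragments" are plain substrings of SMILES strings rather than real chemical substructures, and a fragment is kept as an alert when it occurs in enough active training molecules and never (or rarely) in inactive ones. The training data, `substrings`, `select_alerts`, and the `min_support` / `min_precision` thresholds are all hypothetical. The code is written for Python 3.

```python
from collections import Counter

# Hypothetical toy training set: SMILES strings with binary activity labels (1 = active).
TRAIN = [
    ("c1ccccc1N(=O)=O", 1),
    ("Cc1ccc(cc1)N(=O)=O", 1),
    ("CCN(=O)=O", 1),
    ("CCO", 0),
    ("CCCO", 0),
    ("c1ccccc1O", 0),
]

def substrings(smiles, min_len=3, max_len=8):
    """All distinct substrings of a SMILES string.

    Toy stand-in for structural fragmentation: substrings of a SMILES string
    are not guaranteed to be chemically meaningful substructures.
    """
    return {smiles[i:j]
            for i in range(len(smiles))
            for j in range(i + min_len, min(i + max_len, len(smiles)) + 1)}

def select_alerts(train, min_support=2, min_precision=1.0):
    """Keep fragments found in >= min_support actives whose training precision is high enough."""
    hits_active, hits_total = Counter(), Counter()
    for smiles, label in train:
        for frag in substrings(smiles):   # each fragment counted once per molecule
            hits_total[frag] += 1
            hits_active[frag] += label
    alerts = [f for f in hits_total
              if hits_active[f] >= min_support
              and hits_active[f] / hits_total[f] >= min_precision]
    # Rank by number of actives covered, then prefer longer (more specific) fragments.
    return sorted(alerts, key=lambda f: (-hits_active[f], -len(f), f))

alerts = select_alerts(TRAIN)
print(alerts[0])  # N(=O)=O
```

On this toy data the top-ranked fragment is the nitro group, the only length-7 substring shared by all three actives and absent from every inactive. A real implementation would fragment molecular graphs and use proper substructure matching instead of string operations.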

Building a model requires giving SARpy a training set of molecular structures, expressed in SMILES notation, together with their experimental binary activity labels. SARpy then generates the rules automatically in three steps:

The resulting model should then be validated on an external test set.
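The page does not say which statistics are used for this external validation. As an assumption, the sketch below computes common binary-classification measures (sensitivity, specificity, accuracy) from experimental labels and model predictions on a test set; `binary_metrics` and the example labels are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix statistics for binary labels (1 = active/toxic, 0 = inactive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    nan = float("nan")
    return {
        "TP": tp, "TN": tn, "FP": fp, "FN": fn,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,  # actives correctly flagged
        "specificity": tn / (tn + fp) if tn + fp else nan,  # inactives correctly cleared
        "accuracy": (tp + tn) / len(pairs) if pairs else nan,
    }

# Hypothetical external test set: experimental labels vs. model predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(binary_metrics(y_true, y_pred))
```

Reporting sensitivity and specificity separately, rather than accuracy alone, shows whether a rule set misses toxic compounds or over-flags safe ones.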

Using the model means applying the rules to an unknown molecule to produce its class label: the model tags the compound as toxic when one or more SAs are present, and as non-toxic if no SA is found. Conversely, the user can ask SARpy to generate rules characterizing non-toxic substances and use them to assign molecules to the non-toxic class more reliably.
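The prediction rule just described can be sketched in a few lines. This is not SARpy's API: `predict` and the alert list are hypothetical, and, as a toy simplification, an alert "matches" when it appears as a substring of the SMILES string. Because SMILES is not unique (the same molecule can be written in several ways), substring matching can miss real matches; a real tool would use substructure search.

```python
def predict(smiles, alerts):
    """Return ('toxic', matched) if at least one alert matches, else ('non-toxic', []).

    Matching is a plain substring test on the SMILES string: a toy stand-in
    for real substructure search.
    """
    matched = [a for a in alerts if a in smiles]
    return ("toxic" if matched else "non-toxic"), matched

# Hypothetical alerts written as SMILES fragments (a nitro group, an aniline-like fragment).
ALERTS = ["N(=O)=O", "c1ccccc1N"]

for smi in ["CCN(=O)=O", "CCO", "c1ccccc1N"]:
    print(smi, predict(smi, ALERTS))
```

Returning the matched alerts alongside the label keeps the prediction explainable: the user sees which fragment triggered the toxic class. The same logic could be reused with a list of rules learned for the non-toxic class, with the class labels swapped.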

Distribution

SARpy is distributed as an open-source application under the GNU GPLv3; it requires Python 2.7.x.

There is a manual describing both versions.

Users interested in the published models developed with SARpy can refer to the VEGA website: www.vega-qsar.eu/

Applications

SARpy has been used to create the published models for Ames test mutagenicity, ready biodegradability, and carcinogenicity.

The relevant publications are: