Statistical contribution to the virtual multicriteria optimisation of combinatorial molecules libraries and to the validation and application of QSAR models
This thesis develops an integrated methodology, based on the desirability index and QSAR models, for the virtual optimisation of molecules. Statistical and algorithmic tools are proposed to search huge collections of compounds obtained by combinatorial chemistry for the most promising candidates.
First, once the drugability properties of interest have been precisely defined, QSAR models are developed to model the relationship between the properties to be optimised and the chemical descriptors of the molecules. The literature on QSAR models is reviewed, and the statistical tools used to validate the models and to analyse their fit and predictive power are detailed.
Even when a QSAR model has been validated and appears highly predictive, we emphasise the importance of assessing extrapolation by defining the model's applicability domain, and of quantifying the prediction error for each individual molecule. Indeed, QSAR models are often applied on a massive scale to predict drugability properties for libraries of new compounds, with no attention paid to the reliability of each individual prediction.
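As an illustration of what an applicability-domain check can look like, the sketch below uses a simple bounding-box rule: a molecule is considered inside the domain only if each of its descriptors lies within the range seen in the training set. This rule is an assumption chosen for brevity and is not necessarily the definition used in the thesis; the function names and the `tolerance` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def descriptor_ranges(training: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return the (min, max) of each descriptor over the training molecules."""
    columns = list(zip(*training))
    return [(min(col), max(col)) for col in columns]


def in_applicability_domain(
    molecule: Sequence[float],
    ranges: Sequence[Tuple[float, float]],
    tolerance: float = 0.0,
) -> bool:
    """Bounding-box check (an assumed, simplistic rule).

    `tolerance` widens each range by that fraction of its width, so
    tolerance=0.0 means strictly no extrapolation on any descriptor.
    """
    for value, (low, high) in zip(molecule, ranges):
        margin = tolerance * (high - low)
        if value < low - margin or value > high + margin:
            return False
    return True


# Hypothetical training descriptors: three molecules, two descriptors each.
training_set = [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
ranges = descriptor_ranges(training_set)
inside = in_applicability_domain([2.5, 12.0], ranges)
outside = in_applicability_domain([4.0, 12.0], ranges)
```

A prediction for a molecule flagged as outside the domain would then be treated as an extrapolation rather than trusted at face value.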
Then, a desirability index measures the compromise between the multiple estimated drugability properties and makes it possible to rank the molecules of the combinatorial library in order of preference. The propagation of the models' prediction error to the desirability index is quantified by a confidence interval that can be constructed under general conditions for linear regression, PLS regression or regression tree models. This fills an important gap in the desirability index literature, which treats the index as exact.
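To make the idea of error propagation concrete, here is a minimal sketch of a first-order (delta method) confidence interval for a desirability index. It rests on several assumptions that are not taken from the thesis: each property uses a linear "larger is better" desirability function, the index is the geometric mean of the individual desirabilities, and the prediction errors of the different properties are independent and approximately normal. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def linear_desirability(y: float, low: float, high: float) -> Tuple[float, float]:
    """Return (d, dd/dy) for a linear 'larger is better' desirability."""
    if y <= low:
        return 0.0, 0.0
    if y >= high:
        return 1.0, 0.0
    return (y - low) / (high - low), 1.0 / (high - low)


def desirability_with_ci(
    predictions: Sequence[float],
    std_errors: Sequence[float],
    bounds: Sequence[Tuple[float, float]],
    z: float = 1.96,
) -> Tuple[float, float, Tuple[float, float]]:
    """Geometric-mean index, its delta-method standard error, and a CI.

    Assumes independent prediction errors across properties.
    """
    k = len(predictions)
    parts = [linear_desirability(y, lo, hi) for y, (lo, hi) in zip(predictions, bounds)]
    index = math.prod(d for d, _ in parts) ** (1.0 / k)
    if index == 0.0:
        # The first-order approximation is degenerate when any d_i is zero.
        return 0.0, 0.0, (0.0, 0.0)
    variance = 0.0
    for (d, slope), se in zip(parts, std_errors):
        gradient = index / (k * d) * slope  # dD/dy_i for the geometric mean
        variance += (gradient * se) ** 2
    std_error = math.sqrt(variance)
    interval = (max(0.0, index - z * std_error), min(1.0, index + z * std_error))
    return index, std_error, interval


# Hypothetical example: two predicted properties with their standard errors.
index, std_error, interval = desirability_with_ci(
    predictions=[4.0, 9.0],
    std_errors=[1.0, 0.5],
    bounds=[(0.0, 10.0), (0.0, 10.0)],
)
```

In this example the index is about 0.6 with a standard error of roughly 0.077, so the interval is far from negligible; treating the index as exact would hide that spread.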
Finally, a new, efficient algorithm (WEALD) is proposed to virtually screen the combinatorial library and retain the molecules with the highest desirability indexes.
Each explored molecule is checked for membership in the applicability domain of every QSAR model.
In addition, the uncertainty of the desirability index of each explored molecule is taken into account by gathering the molecules that cannot be distinguished from the optimal one because of the propagated prediction error of the QSAR models. The desirability of these molecules is not significantly smaller than that of the optimal molecule found by WEALD.
This constitutes another important improvement in the use of the desirability index as a tool for comparing solutions in a multicriteria optimisation problem.
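The gathering of molecules that cannot be distinguished from the optimum can be sketched as follows. This is not the WEALD algorithm itself, whose details are not given here; it is a simplified stand-in that compares each candidate with the best one through a one-sided z-test on the difference of desirability indexes, assuming independent and approximately normal errors. The function name and the critical value are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Candidate = Tuple[str, float, float]  # (name, desirability index, standard error)


def near_optimal_set(candidates: Sequence[Candidate], z_crit: float = 1.645) -> List[str]:
    """Names of candidates whose index is not significantly below the best.

    Uses a one-sided z-test on D_best - D_i with independent errors
    (an assumption; predictions from a shared model are correlated).
    """
    best = max(candidates, key=lambda c: c[1])
    kept = []
    for name, index, std_error in candidates:
        pooled = math.sqrt(std_error ** 2 + best[2] ** 2)
        if pooled == 0.0:
            indistinguishable = index == best[1]
        else:
            indistinguishable = (best[1] - index) / pooled < z_crit
        if indistinguishable:
            kept.append(name)
    return kept


# Hypothetical screened molecules with their indexes and standard errors.
screened = [("A", 0.80, 0.05), ("B", 0.75, 0.05), ("C", 0.50, 0.05)]
shortlist = near_optimal_set(screened)
```

Here molecule B stays on the shortlist alongside the best molecule A, while C is clearly worse and is dropped.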
This integrated methodology has been developed in the context of lead optimisation and is illustrated on a real combinatorial library provided by Eli Lilly and Company. This is the main application of the thesis. Nevertheless, because the results on the uncertainty of the desirability index hold under general conditions, they can be applied to any multicriteria optimisation problem, as often arises in industry.
School: Université catholique de Louvain
Source Type: Master's Thesis
Keywords: desirability quantitative structure activity relationship screening algorithm uncertainty propagation multicriteria optimisation lead delta method combinatorial library
Date of Publication: 01/07/2008