Machine learning approach predicts crystallinity

One of the biggest barriers to studying the structures of molecules is obtaining them in crystalline form for X-ray diffraction. Now, Richard Cooper and Jerome Wicker at the University of Oxford, UK, have developed a machine learning approach to predict whether a small organic molecule will crystallise. Since crystallinity is vital both for determining structures and for delivering many drugs, this work could help chemists judge which molecules are worth the effort of crystallising.

Machine learning involves building algorithms that learn from data, and it has previously been used to predict the solubilities and melting points of materials. Cooper and Wicker set out to test whether simple two-dimensional information, such as atom types, bond types and molecular volume, could be used to predict whether a material would crystallise.

Data sets were obtained from the Cambridge Crystallographic Data Centre (CCDC) and ZINC, a database of commercially available chemical compounds. The model was trained and tested on a set of molecular properties to determine which were most significant in predicting crystallinity. The rotatable bond count and ⁰χᵛ, a molecular connectivity index that gives an indirect measure of 3D volume, proved to be the key variables and produced a model that was 80% accurate.
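For readers unfamiliar with ⁰χᵛ, the sketch below computes it from the standard Kier–Hall definition: each non-hydrogen atom gets a valence delta δᵛ = (Zᵛ − h)/(Z − Zᵛ − 1), where Z is the atomic number, Zᵛ the number of valence electrons and h the number of attached hydrogens, and ⁰χᵛ is the sum of 1/√δᵛ over those atoms. Larger molecules contain more atoms and so tend to give larger values, which is why the index tracks volume indirectly. This is not the authors' code; the article does not say which software computed the descriptor, and the `HeavyAtom` representation is a hypothetical simplification that ignores charges and other special cases.

```python
import math
from dataclasses import dataclass


@dataclass
class HeavyAtom:
    """Hypothetical minimal description of a non-hydrogen atom."""
    atomic_number: int      # Z
    valence_electrons: int  # Z^v
    hydrogens: int          # h, number of attached hydrogen atoms


def valence_delta(atom: HeavyAtom) -> float:
    """Kier-Hall valence delta: (Z^v - h) / (Z - Z^v - 1)."""
    core_electrons = atom.atomic_number - atom.valence_electrons - 1
    return (atom.valence_electrons - atom.hydrogens) / core_electrons


def chi0v(atoms: list[HeavyAtom]) -> float:
    """Zeroth-order valence connectivity index: sum of delta_v ** -0.5."""
    return sum(1.0 / math.sqrt(valence_delta(a)) for a in atoms)


# Ethanol, CH3-CH2-OH, listing heavy atoms only
ethanol = [
    HeavyAtom(atomic_number=6, valence_electrons=4, hydrogens=3),  # CH3, delta_v = 1
    HeavyAtom(atomic_number=6, valence_electrons=4, hydrogens=2),  # CH2, delta_v = 2
    HeavyAtom(atomic_number=8, valence_electrons=6, hydrogens=1),  # OH,  delta_v = 5
]
print(round(chi0v(ethanol), 4))  # 2.1543
```

For ethanol this gives 1 + 1/√2 + 1/√5 ≈ 2.1543. For first-row atoms such as C, N and O the denominator is 1, so δᵛ reduces to the valence electron count minus attached hydrogens.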

⁰χᵛ was found to give the highest predictive accuracy in determining crystallisation propensity

‘The analysis tells us whether a material should crystallise, and therefore when to expend effort trying to obtain a crystalline sample,’ explains Cooper. The model could also indicate whether changing a small feature, such as a functional group, might make a molecule more or less likely to crystallise.

Crystallography experts put the work into context. Simon Coles, Director of the UK National Crystallography Service, says ‘many areas of science are on the verge of a new age – we have been collecting individual datasets for decades and can now apply informatics-based approaches across these collections, not only to observe trends and derive rules but also to predict.’ Pete Wood, a scientist at the CCDC, says ‘the likelihood of crystallinity, or crystallisability, of small molecules is of great significance in the pharmaceutical industry as the majority of small molecule drugs are delivered in the crystalline state.’

In future, Cooper and Wicker hope to incorporate other variables into the model, such as temperature and solvent, and are currently testing it on a range of materials on the ‘edge of crystallinity’ to gain more insight into the mechanisms that determine whether these materials crystallise.