A database of quantum chemical results and some clever algorithms can be used to predict atomisation energies

Quantum chemical approximations don’t always need to start from scratch, says an international team of researchers. With many results already known, a little artificial intelligence can go a long way towards predicting the rest. Drawing on a database of quantum chemical results for over 7000 molecules, their program could give the atomisation energies of unfamiliar molecules to within 1%, and in a billionth of the time required for a full approximation.

Fully solving Schrödinger’s equation is impossible for all but the smallest and simplest chemical systems, and accurate approximations are time-consuming. However, in the past few years, powerful computing has become so ubiquitous that thousands of density functional theory (DFT) calculations can be run within days, says Anatole von Lilienfeld of Argonne National Laboratory, Illinois. He and his colleagues argue that with such information at hand, algorithms similar to Amazon’s book recommendation program can predict the properties and behaviours of ‘zillions’ of molecules.


Algorithms can now be used to predict the atomisation energy of huge numbers of molecules

To prove the point, the team developed a program for finding molecular atomisation energies. It defines a molecule’s elements and configuration as a matrix, with a row and column for each atom. Where an atom’s row meets its own column, the number represents the potential energy of the atom isolated from the molecule, and where the rows and columns of different atoms meet, that entry signifies the Coulomb repulsion between the two nuclear charges.  
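
As a rough illustration of such a matrix, the Python sketch below assumes the form commonly called the Coulomb matrix: diagonal entries of 0.5 Z^2.4 (a fit to the energies of isolated atoms) and off-diagonal entries of Z_i Z_j / |R_i - R_j|. The function name, the use of atomic units and the water-like geometry are assumptions made for illustration, not details given in the article.

```python
import math

def coulomb_matrix(charges, coords):
    """Build a Coulomb-style matrix with one row and column per atom.

    charges: nuclear charges Z_i (for example 1 for H, 8 for O)
    coords:  (x, y, z) nuclear positions, assumed here to be in bohr
    """
    n = len(charges)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Diagonal: assumed polynomial fit to the isolated atom's energy.
                matrix[i][j] = 0.5 * charges[i] ** 2.4
            else:
                # Off-diagonal: Coulomb repulsion between nuclei i and j.
                separation = math.dist(coords[i], coords[j])
                matrix[i][j] = charges[i] * charges[j] / separation
    return matrix

# Water-like example geometry (illustrative values, not from the article).
water_charges = [8, 1, 1]
water_coords = [(0.0, 0.0, 0.0), (1.43, 1.11, 0.0), (-1.43, 1.11, 0.0)]
cm = coulomb_matrix(water_charges, water_coords)
```

The matrix is symmetric because the repulsion between atoms i and j is the same in either direction, and it depends only on nuclear charges and positions, which is what lets molecules of different composition be compared.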

To harness artificial intelligence for quantum chemistry, Albert Bartók-Pártay of the University of Cambridge, UK, says that the community needs to ‘represent atomic configurations in a way that can be fed into the machinery’. This paper ‘describes a really pretty way of doing this’, he says.

Von Lilienfeld’s team trained the algorithm on a subset of molecules in the database, comparing their matrices to find ‘distances’ between molecules, each distance being a measure of how much two molecules’ matrices differ. The heavy computing work was in developing a landscape of distances. Once that was complete, an unknown molecule could be assigned a place in the landscape according to its atoms and configuration. To estimate an unknown molecule’s atomisation energy, the distances between it and all the known molecules gave weights for how much each known atomisation energy should contribute to the estimate.
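
One common way to turn distances into weighted contributions is kernel ridge regression, and the Python sketch below shows that idea on toy data. It is not the team’s code: the Frobenius distance between equal-sized matrices, the Gaussian kernel, the `sigma` and `ridge` values, all function names and the made-up matrices and energies are assumptions chosen for illustration.

```python
import math

def matrix_distance(a, b):
    """Frobenius distance between two equal-sized square matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))

def gaussian_kernel(distance, sigma):
    """Similarity that decays from 1 towards 0 as the distance grows."""
    return math.exp(-distance ** 2 / (2.0 * sigma ** 2))

def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def train(matrices, energies, sigma, ridge):
    """Fit one coefficient per known molecule: (K + ridge * I) alpha = E."""
    n = len(matrices)
    kernel = [[gaussian_kernel(matrix_distance(matrices[i], matrices[j]), sigma)
               + (ridge if i == j else 0.0)
               for j in range(n)] for i in range(n)]
    return solve_linear(kernel, energies)

def predict(new_matrix, matrices, alphas, sigma):
    """Sum the known molecules' contributions, weighted by similarity."""
    return sum(alpha * gaussian_kernel(matrix_distance(new_matrix, m), sigma)
               for alpha, m in zip(alphas, matrices))

# Toy data: made-up 2x2 matrices and energies, for illustration only.
known = [[[1.0, 0.5], [0.5, 1.0]],
         [[2.0, 0.7], [0.7, 1.0]],
         [[3.0, 0.9], [0.9, 2.0]]]
known_energies = [-100.0, -150.0, -230.0]
alphas = train(known, known_energies, sigma=2.0, ridge=1e-8)
estimate = predict([[2.0, 0.6], [0.6, 1.0]], known, alphas, sigma=2.0)
```

The expensive step is `train`, which compares every known molecule with every other; `predict` only needs one pass over the known molecules, which is consistent with the fast predictions the researchers describe.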

The researchers found that with a landscape of more than 5000 molecules, the error for predicting atomisation energies of new molecules drops below 10 kcal/mol, approaching the 5 kcal/mol accuracy of hybrid DFT. ‘Calculating a molecule’s atomisation energy using hybrid DFT would take on average one hour on a single CPU,’ says von Lilienfeld. ‘With machine learning, it’s milliseconds.’

Christopher Handley of the University of Warwick, UK, says that, unlike previous studies, the new method ‘allows for the development of models that are applicable to other molecules, rather than just different conformations of the same molecules’.

Von Lilienfeld’s team is optimistic that, with improvements, they could extend their machine learning approach to tackle more complex problems such as chemical reactions or the design of drugs, catalysts and other purpose-built materials.

Kate McAlpine