Dataset with millions of entries set to help AI find new drugs


Researchers in Russia have put together the world’s largest dataset to date for training deep neural network models. The dataset contains around six million conformations of about one million drug-like molecules.

From a computational point of view, forecasting the biological activity of a potential drug long before it is synthesised in a lab requires details such as conformation energies and Hamiltonian matrix parameters. Density functional theory (DFT) can predict such parameters, but these quantum chemical calculations tend to be time-consuming and computationally expensive. Machine learning, however, offers a way to cut the computational cost of DFT.

Frustrated by the lack of suitable datasets for training machine learning models, the team set out to fill this gap and ultimately reduce the computational costs of medicinal chemistry. They began with a training set of 100,000 molecules with 436,581 conformations and used DFT to calculate their conformation energies and Hamiltonian coefficients. This training set was significantly larger than the datasets used to train publicly available deep neural network models. The researchers then tested neural network models trained on these DFT data against test sets containing different molecules, and found that the models performed much better when trained on larger datasets.
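The article does not spell out how the test sets were built. As a minimal sketch, assuming every conformation record carries the identifier of its parent molecule, a split that keeps test molecules entirely out of training could look like this in Python. The function name `split_by_molecule` and the `(molecule_id, conformation_id)` record layout are hypothetical and are not taken from the nablaDFT code.

```python
import random
from collections import defaultdict


def split_by_molecule(records, test_fraction=0.2, seed=0):
    """Split conformation records so that no molecule appears in both sets.

    records: list of (molecule_id, conformation_id) tuples; this layout is a
    hypothetical stand-in, not the format used by the nablaDFT dataset.
    """
    # Group every conformation under its parent molecule.
    by_molecule = defaultdict(list)
    for mol_id, conf_id in records:
        by_molecule[mol_id].append((mol_id, conf_id))

    # Shuffle molecule IDs reproducibly, then hold out a fraction of molecules
    # (not conformations), so all conformations of a molecule stay together.
    mol_ids = sorted(by_molecule)
    random.Random(seed).shuffle(mol_ids)
    n_test = max(1, round(len(mol_ids) * test_fraction))
    test_ids = set(mol_ids[:n_test])

    train, test = [], []
    for mol_id in mol_ids:
        target = test if mol_id in test_ids else train
        target.extend(by_molecule[mol_id])
    return train, test


# Toy usage: 10 hypothetical molecules with 1 to 4 conformations each.
records = [(f"mol{m}", c) for m in range(10) for c in range(m % 4 + 1)]
train, test = split_by_molecule(records)
print(len(train), len(test))
```

Splitting by molecule rather than by conformation matters because conformations of the same molecule are highly similar; mixing them across the two sets would make test scores look better than a model's performance on genuinely new molecules.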

The team made the code publicly available to encourage other researchers to use and develop the dataset, which they hope will aid future quantum chemistry studies.

References

K Khrabrov et al, Phys. Chem. Chem. Phys., 2022, 24, 25853 (DOI: 10.1039/d2cp03966d)

Additional information

The code and links to the full dataset and its parts can be found at https://github.com/AIRI-Institute/nablaDFT
