An artificial intelligence (AI) model is able to fill in missing or incorrectly placed atoms – such as hydrogen – in the crystal structures of inorganic materials. Refining atomic positions in this way may allow chemists to better simulate material structures or help design new materials, such as superconductors.

Inpainting method

Source: © Paul Scherrer Institute PSI/Giovanni Pizzi

The generative AI model has been trained to preserve a known crystal structure (blue, black and red spheres) and insert only the missing hydrogen atoms (blurred on the left, white spheres on the right).

Determining the positions of hydrogen and other light atoms in crystalline materials is challenging because these atoms scatter x-rays only weakly, so techniques such as x-ray powder diffraction can yield inaccurate structures. Neutron or synchrotron diffraction experiments are more accurate, but they require access to large facilities and more material, and are expensive to run.
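How weakly hydrogen scatters x-rays compared with heavier atoms can be put into rough numbers. This is a minimal sketch, assuming that at zero scattering angle an atom's x-ray form factor equals its electron count (the atomic number Z for a neutral atom) and that scattered intensity scales with the square of the form factor. Real form factors fall off with scattering angle and are affected by bonding, so this is only an order-of-magnitude comparison; the element table is a small illustrative subset.

```python
# Rough estimate: at zero scattering angle an atom's x-ray form factor equals its
# electron count (Z for a neutral atom), and scattered intensity scales as Z**2.
ATOMIC_NUMBERS = {"H": 1, "Li": 3, "O": 8, "Cu": 29}

def relative_xray_intensity(element: str, reference: str = "H") -> float:
    """Forward-scattering intensity of `element` relative to `reference`."""
    return (ATOMIC_NUMBERS[element] / ATOMIC_NUMBERS[reference]) ** 2

for el in ("Li", "O", "Cu"):
    print(f"{el}: about {relative_xray_intensity(el):.0f} times hydrogen")
# Li: about 9 times hydrogen
# O: about 64 times hydrogen
# Cu: about 841 times hydrogen
```

On this crude measure a single oxygen atom outscatters a hydrogen atom by a factor of 64, which is why the hydrogen signal is easily lost next to heavier atoms.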

‘I remember once I wanted to compare the results of our predictions of the crystal structure of cellulose with experiments,’ says Artem Oganov at Skolkovo Institute of Science and Technology in Russia, who wasn’t involved in the new work. Although cellulose is the most abundant polymer on Earth, he says, its crystal structure still had missing hydrogen atoms.

‘If we don’t know where the atoms sit, there’s no way for us to simulate the material,’ says Timo Reents, a PhD student at the PSI Centre for Scientific Computing, Theory and Data in Switzerland, who developed the model. Differences in atomic positions can affect a material’s properties, including its thermal or electrical conductivity, its vibrational spectra and, in hydride materials, its superconductivity.

Artificial intelligence (AI) is an umbrella term often incorrectly used to encompass a variety of connected but simpler processes.

AI is the ability of machines and computer programmes to perform tasks that typically only humans could do, such as reasoning, responding to feedback and decision making.

Generative AI is a newer variant of AI that analyses and detects patterns in training datasets to generate original text, images and videos in response to requests from users. ChatGPT, Microsoft Copilot, Google Gemini and, more recently, X’s Grok are all examples of chatbots that use generative AI.

Neural networks are interconnected arrays of artificial neurons, loosely modelled on biological brains, that identify, analyse and learn from statistical patterns in data.

Machine learning is a subset of AI that allows machines to learn from datasets and make predictions about new data without being explicitly programmed to do so. Machine learning models generally improve their performance as they receive more data.

Deep learning is an enhanced type of machine learning that uses neural networks with many layers to analyse complex data from very large datasets. Applications of deep learning include speech recognition, image generation and translation.

Large language models, or LLMs, are a type of deep learning model trained on large amounts of text to understand and generate language. LLMs learn patterns in text by predicting the next word in a sequence, and these models are now able to write prose, analyse text from the internet and hold dialogues with users.

The Swiss team has now built on Microsoft’s MatterGen – a generative AI model that can propose new inorganic materials – to better position atoms in crystal structures. Reents likens the team’s work to using AI tools to remove unwanted bystanders from photos: such tools draw on their training on similar images to fill in the erased area with whatever is likely to be there.

A similar process is used by the team at PSI to fill in missing or incorrectly placed atoms in crystal structures. ‘We know the [positions of] heavy atoms, we know the unit cell shape,’ says Reents, ‘and we want to use this host structure [to] predict the hydrogen positions, which is kind of this missing part in the image.’
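The ‘host structure’ Reents describes can be pictured in code as a known crystal with one element’s sites set aside, leaving the unit cell and the heavy atoms untouched. The sketch below is illustrative only: the `Crystal` class and `split_host` function are hypothetical names, not part of MatterGen or the PSI team’s released model, and the toy cell’s coordinates are invented. The element to hide is a parameter, and the function also reports how many sites were removed, on the assumption that the composition of the material is known.

```python
from dataclasses import dataclass

@dataclass
class Crystal:
    lattice: list[list[float]]      # three unit-cell vectors, in angstrom
    species: list[str]              # element symbol of each site
    frac_coords: list[list[float]]  # fractional coordinates of each site

def split_host(crystal: Crystal, hidden: str = "H") -> tuple[Crystal, int]:
    """Return the host structure with every `hidden` site removed, plus how many
    sites were removed. The unit cell (lattice) is kept unchanged."""
    keep = [i for i, s in enumerate(crystal.species) if s != hidden]
    host = Crystal(
        lattice=crystal.lattice,
        species=[crystal.species[i] for i in keep],
        frac_coords=[crystal.frac_coords[i] for i in keep],
    )
    return host, len(crystal.species) - len(keep)

# Toy cubic cell with invented coordinates (not a real material).
toy = Crystal(
    lattice=[[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]],
    species=["O", "H", "H"],
    frac_coords=[[0.0, 0.0, 0.0], [0.24, 0.0, 0.0], [0.0, 0.24, 0.0]],
)
host, n_missing = split_host(toy)
print(host.species, n_missing)  # ['O'] 2
```

Removing the hidden atoms from a structure whose full answer is already known is also a simple way to produce pairs of ‘incomplete input’ and ‘correct completion’ for training.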

Timo Reents and Giovanni Pizzi

Source: © Paul Scherrer Institute PSI/Mahir Dzambegovic

Timo Reents (left) and Giovanni Pizzi have taught an artificial intelligence system to find missing positions of hydrogen atoms in crystal structures

The generative AI model works by placing random ‘noise’ at the unknown positions within the crystal and then refining these positions step by step until it arrives at a low-energy structure. The team trained the model by artificially removing the hydrogen atoms from known crystal structures in an inorganic structural database. The training data included over 800 structures calculated with density functional theory (DFT), each with up to 20 atoms per unit cell.
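The inpainting idea can be illustrated with a toy version of a denoising loop. This is a minimal sketch under loud assumptions: a real diffusion model such as MatterGen uses a trained neural network to decide how to move the atoms at each step, whereas here a hypothetical soft-sphere repulsion (`repulsion_grad`) stands in for that network, the cell is assumed to be cubic, and all parameter values are arbitrary. What it does show is the inpainting structure: the unknown sites start as random noise, only they are updated, the noise injected at each step shrinks to zero, and the host atoms are never touched.

```python
import math
import random

def min_image(df: float) -> float:
    """Wrap a fractional-coordinate difference into [-0.5, 0.5) (periodic cell)."""
    return df - math.floor(df + 0.5)

def repulsion_grad(f_i, others, a, r0):
    """Gradient of the toy energy sum(max(0, r0 - d)**2) with respect to atom i's
    Cartesian position. HYPOTHETICAL stand-in for a trained denoising network."""
    g = [0.0, 0.0, 0.0]
    for f_j in others:
        delta = [a * min_image(f_i[k] - f_j[k]) for k in range(3)]  # cubic cell assumed
        d = math.sqrt(sum(c * c for c in delta))
        if 0.0 < d < r0:
            coef = -2.0 * (r0 - d) / d
            for k in range(3):
                g[k] += coef * delta[k]
    return g

def inpaint(host_coords, n_missing, a=4.0, r0=1.5, steps=200, lr=0.05,
            sigma_max=0.05, seed=0):
    """Place n_missing atoms in a cubic cell of edge a (angstrom) around fixed host atoms."""
    rng = random.Random(seed)
    # Unknown sites start as pure noise: uniform random fractional coordinates.
    unknown = [[rng.random() for _ in range(3)] for _ in range(n_missing)]
    for t in range(steps):
        sigma = sigma_max * (1 - (t + 1) / steps)  # injected noise shrinks to zero
        updated = []
        for i, f_i in enumerate(unknown):
            others = host_coords + [f for j, f in enumerate(unknown) if j != i]
            g = repulsion_grad(f_i, others, a, r0)
            # Gradient step (converted to fractional units) plus noise, wrapped into the cell.
            updated.append([(f_i[k] - lr * g[k] / a + rng.gauss(0.0, sigma)) % 1.0
                            for k in range(3)])
        unknown = updated
    # host_coords is only read, never modified: the known atoms are the "unerased" part.
    return unknown

host = [[0.0, 0.0, 0.0], [0.5, 0.5, 0.5]]  # toy host sites, invented for illustration
h_sites = inpaint(host, n_missing=2)
print(h_sites)
```

Keeping the host fixed while only the masked sites evolve is what distinguishes inpainting from generating a whole crystal from scratch. Because this toy energy only pushes atoms apart, it will not reproduce real chemical bonding; that knowledge is what the trained model supplies.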

Expanding the dataset to materials with unit cells containing up to 40 atoms allowed the Swiss team to test the model on thousands of structures. In around 85% of cases, the model predicted the same crystal structure as the reference, and in a further 12% it predicted structures that were more stable.
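Taken together, those figures mean that in roughly 97% of test cases the model returned either the known structure or a more stable one. A tally like this could be produced along the lines of the sketch below, which is a hypothetical illustration rather than the team’s evaluation code. It assumes that each prediction has already been compared with the reference structure by some structure-matching tool (pymatgen’s `StructureMatcher` is one widely used option) and that energies for both structures are available in eV/atom; the `tol` margin and the four test cases are invented.

```python
from collections import Counter

def classify(matches_reference: bool, e_pred: float, e_ref: float,
             tol: float = 1e-3) -> str:
    """Label one prediction. Energies are assumed to be in eV/atom; `tol` is a
    hypothetical margin for calling a structure 'more stable'."""
    if matches_reference:
        return "same"
    if e_pred < e_ref - tol:
        return "more stable"
    return "other"

def outcome_fractions(results):
    """results: iterable of (matches_reference, e_pred, e_ref) tuples."""
    results = list(results)
    counts = Counter(classify(*r) for r in results)
    return {label: counts[label] / len(results)
            for label in ("same", "more stable", "other")}

# Four invented test cases, purely for illustration.
results = [
    (True, -5.20, -5.20),
    (True, -4.10, -4.10),
    (False, -3.31, -3.30),
    (False, -6.00, -6.05),
]
print(outcome_fractions(results))
# {'same': 0.5, 'more stable': 0.25, 'other': 0.25}
```

A small energy margin avoids counting numerical noise as a genuinely more stable structure; its size is a judgement call.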

Reents says that the model is now publicly available. He also notes that it is ‘hydrogen agnostic’, meaning it can also be used to predict the positions of other atoms, such as lithium or sodium.

Pierre-Paul De Breuck, a computational materials scientist at Ruhr-Universität Bochum in Germany, thinks that this model will be useful to flag and correct errors in existing crystal structures. ‘Crystallographers can [also] use it as a fast, physically grounded starting point for refining ambiguous x-ray structures, instead of relying on “chemically sensible” guesses,’ he adds.

However, De Breuck notes that the DFT simulations that the model is trained on are usually done at 0 K. ‘X-ray experiments are performed at finite temperature, so lattice parameters and atomic positions can differ somewhat from what the model has learned.’

Oganov does not regard the model itself as a breakthrough, but he still thinks that ‘targeting a problem that has been an old pain in the neck for the whole community’ is invaluable. ‘Hydrogen has the same rights as other atoms,’ he adds.