How to use large language models in chemistry


Five ways that chemists can use GPT-4 and other generative AI tools

Artificial intelligence (AI) has the potential to transform the way we work. For some, it presents endless opportunities, but for others it is something to approach with trepidation.

GPT-4, launched in March 2023, is the newest of OpenAI’s large language models (LLMs) and uses deep learning to generate human-like, conversational text; it is the model behind tools like ChatGPT and the digital image generator DALL-E, to name but a few.

Increasingly, researchers are exploring the benefits and limitations of applying LLMs to chemistry. And while we may not yet be at the point where LLMs can help with every aspect of chemistry, there are some key areas where they could bring considerable benefit to the average chemist’s day-to-day work.

‘It’s not going to replace any chemists but I think chemists who are using it can get some small advantages and start doing things with a lower effort,’ says Andrew White, an associate professor and computational chemist at the University of Rochester in the US.

Research assistance

Generative AI tools such as ChatGPT and Perplexity are powerful search engines and can be extremely helpful for identifying relevant sources and generating a list of potential research topics. They can also provide summaries of key articles and research papers and answer technical questions about the content, helping you understand the main points without having to read them in their entirety.

‘Literature research is much better with a tool like Perplexity,’ says White.

‘You can ask questions like “how could I manufacture carbon nanotubes at a realistic scale?”, and it pulls up articles that talk about how to do scale up for carbon nanotubes,’ he adds.

Andrés Bran, a PhD student at École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland, agrees that the summarisation capabilities of generative AI systems are valuable.

‘Instead of reading three articles you could potentially copy and paste all the content there and ask it questions about them, which is extremely useful in any field, not only chemistry,’ he explains.
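
As a rough illustration of that copy-and-paste workflow, the Python sketch below assembles several article texts and a question into a single prompt. The function name build_literature_prompt, the article labels and the wording of the instructions are all assumptions made for this example; the resulting string would still need to be pasted into, or sent to, whichever tool you use.

```python
def build_literature_prompt(articles, question):
    """Combine pasted article texts and a question into one prompt string.

    `articles` maps a short label (hypothetical, chosen by the user)
    to the pasted text of that article.
    """
    if not articles:
        raise ValueError("at least one article is required")

    sections = []
    for label, text in articles.items():
        # Delimit each article so an answer can point back to its source.
        sections.append(f"[Article: {label}]\n{text.strip()}")

    # Assumed instruction wording; adjust it to suit the tool and the task.
    instructions = (
        "Answer the question using only the articles below. "
        "Name the article that supports each statement, and say so "
        "if the articles do not contain the answer."
    )
    return "\n\n".join([instructions, *sections, f"Question: {question.strip()}"])


if __name__ == "__main__":
    prompt = build_literature_prompt(
        {
            "Paper A": "Placeholder text of the first article.",
            "Paper B": "Placeholder text of the second article.",
        },
        "How could I manufacture carbon nanotubes at a realistic scale?",
    )
    print(prompt)
```

Asking the model to name the supporting article for each statement makes it easier to check an answer against the original papers.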

Science communication (particularly if English is not your native language)

For Kan Hatakeyama-Sato, from the department of chemistry at Waseda University in Tokyo, Japan, one of the biggest advantages of tools like ChatGPT is their ability to translate text into English: ‘I can write my manuscript in Japanese and GPT can translate to English very efficiently, as if I was a native speaker,’ he says.

White agrees, saying it can be prompted to improve and assist with writing in English if the user requires it: ‘something like, “how can I improve this paragraph to sound like a native English speaker? Please explain all the edits.” Things like that really help you learn.’

Philippe Schwaller, an assistant professor leading the Laboratory of Artificial Chemical Intelligence (LIAC) at EPFL, believes in the power of GPT-4 for science communication, in general.

‘You can use it to simplify a message that you have in a very academic format and make it for a broader target audience, you could also ask it to write a tweet or a LinkedIn article – it can really help bring your science to the people,’ he tells Chemistry World.

Translating chemical names

The ability of LLMs to convert chemical names into different formats can be hit and miss. Kan Hatakeyama-Sato and co-workers tested GPT-4’s ability to name a compound from a Smiles string and vice versa. While GPT-4 could correctly produce the Smiles string of simple molecules, it struggled with more complex structures – and translating in the other direction was even less successful. When Chemistry World tried the same with ChatGPT, it could accurately name a molecule from its Smiles string, but was less good translating the other way.
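
Because these translations are unreliable, it can help to screen any Smiles string an LLM returns before using it. The Python sketch below is a minimal syntax check written for this purpose, not a method used in the work described above: the function name smiles_sanity_problems is hypothetical, and it only looks at bracket balance and ring-closure pairing. It cannot tell whether a string describes the intended molecule, so a cheminformatics toolkit such as RDKit, plus a comparison against a trusted database, would be a more thorough check.

```python
DIGITS = "0123456789"


def smiles_sanity_problems(smiles):
    """Return a list of basic syntax problems found in a Smiles string.

    Only bracket balance and ring-closure pairing are checked; valence
    and chemical identity are not.
    """
    problems = []
    depth = 0            # nesting level of branch parentheses
    in_bracket = False   # inside [..], where digits are not ring closures
    open_rings = set()   # ring labels seen an odd number of times so far
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            if in_bracket:
                problems.append("nested '['")
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                problems.append("unmatched ']'")
            in_bracket = False
        elif not in_bracket:
            if ch == "(":
                depth += 1
            elif ch == ")":
                if depth == 0:
                    problems.append("unmatched ')'")
                else:
                    depth -= 1
            elif ch == "%":
                # Two-digit ring-closure label, e.g. %12
                label = smiles[i + 1:i + 3]
                if len(label) == 2 and all(c in DIGITS for c in label):
                    open_rings ^= {int(label)}
                    i += 2
                else:
                    problems.append("malformed '%' ring closure")
            elif ch in DIGITS:
                # Toggle: the first occurrence opens a ring, the second closes it.
                open_rings ^= {int(ch)}
        i += 1

    if in_bracket:
        problems.append("unclosed '['")
    if depth > 0:
        problems.append("unclosed '('")
    if open_rings:
        labels = ", ".join(str(n) for n in sorted(open_rings))
        problems.append(f"unpaired ring closure label(s): {labels}")
    return problems


if __name__ == "__main__":
    for candidate in ["c1ccccc1", "C1CCCCC", "CC(=O"]:
        print(candidate, smiles_sanity_problems(candidate))
```

An empty list means only that no obvious syntax problem was found; a string can pass this check and still be the wrong molecule.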

Neither GPT-4 nor ChatGPT had much success translating a chemical name to its Smiles string – but ChatGPT proved effective at naming compounds from their Smiles string

Asking ChatGPT to describe the structure of a molecule from its Iupac name also had mixed results. Even when it can correctly explain the logic behind the name, its drawing skills leave much to be desired.

Editing

The writing capabilities of the likes of ChatGPT have been much debated in recent months. However, White says he prefers to use it for editing text.

‘For example, I might have a document that has citations at the bottom but they’re in MLA format and I want them in Nature BibTeX format; I’ll just say, “could you please rewrite them?” And maybe it makes a mistake or two but it’s a much better starting spot than writing it all over myself,’ he explains.

‘Or you could say “take this abstract and rewrite it and write an impact statement”. It’s a really nice way to get over that little starting barrier.’

Bran says he finds the AI useful for mulling over ideas and, like White, for making light revisions to an existing draft of text. ‘I’m not going to guarantee that the [revisions are] always great, but they’re usually useful,’ he says.
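
Since a converted reference list may contain ‘a mistake or two’, a quick automated check can catch some of them before a manual read-through. The Python sketch below flags BibTeX entries that lack certain fields. The function name find_incomplete_entries and the choice of author, title and year as required fields are assumptions for this example, and the regular expressions are a rough heuristic, not a full BibTeX parser.

```python
import re

# Assumed minimum set of fields for this sketch; journals may require more.
REQUIRED_FIELDS = ("author", "title", "year")


def find_incomplete_entries(bibtex_text, required=REQUIRED_FIELDS):
    """Map each BibTeX citation key to the required fields it lacks."""
    missing = {}
    # Locate each "@type{key," header; an entry's body is taken to run
    # from its header to the next header (or to the end of the text).
    headers = list(re.finditer(r"@(\w+)\s*\{\s*([^,\s]+)\s*,", bibtex_text))
    for n, header in enumerate(headers):
        end = headers[n + 1].start() if n + 1 < len(headers) else len(bibtex_text)
        body = bibtex_text[header.end():end]
        absent = []
        for field in required:
            # The lookbehind stops "title" from matching inside "booktitle".
            pattern = rf"(?i)(?<![\w-]){re.escape(field)}\s*="
            if not re.search(pattern, body):
                absent.append(field)
        if absent:
            missing[header.group(2)] = absent
    return missing


if __name__ == "__main__":
    # Made-up placeholder entries, not real references.
    sample = """
    @article{example2020,
      author = {Example, A.},
      title = {A placeholder title},
      year = {2020}
    }
    @inproceedings{example2021,
      author = {Example, B.},
      booktitle = {Placeholder proceedings}
    }
    """
    print(find_incomplete_entries(sample))
```

A check like this only confirms that fields are present; whether the authors, titles and years are correct still has to be verified against the original sources.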

Writing code

For new and well-seasoned coders alike, LLMs can be a useful tool.

‘Maybe I wrote some code for one simulation engine and I want to change it to another simulation engine – so I might ask “can you take this code for GROMACS and rewrite it for CHARMM”. [LLMs are] really good at these kinds of conversions,’ says White.

They can also be used to help identify errors in code and suggest possible improvements.

‘There’s a tool called Copilot – it’s like GPT in your code editor – so as you’re writing code it makes suggestions,’ White explains. ‘Students using that learn to code so much faster,’ he adds, saying that previously it might have taken months for a chemistry student to be able to do cheminformatics but now they are able to start programming in a meaningful way in a matter of weeks.

To increase productivity

In a study published in Science in July, researchers assigned occupation-specific, incentivised writing tasks to over 450 college-educated professionals. In the half who were randomly exposed to ChatGPT, the average time to complete the task was reduced by 40% and output quality rose by 18%. It was found to be particularly helpful for those with relatively weak writing skills.

LLMs are also capable of writing emails, summarising meetings, creating action items from a transcript and querying databases, all of which could save busy chemistry professionals from the mundane parts of their jobs, freeing them up for the parts they enjoy the most.

Things to be aware of

There is a risk of ‘hallucinations’ – tools like ChatGPT are trained to give an answer every time, so if you ask them impossible questions, they will still try to give you an answer, which may be based on false or nonsensical information
Don’t assume the answer you are given is always correct – even if it sounds convincing it is always worth double checking the information provided (any AI system is only as good as the data it is trained on)
The quality of the output depends on the quality of the input – the more detail you can provide in your query, the more accurate the response will be
AI tools need open data so cannot access information in papers that are not open access

Julia Robinson

Julia joined the Chemistry World team as Science correspondent in May 2023. She previously spent eight years leading the clinical and science content at The Pharmaceutical Journal, the official journal of the Royal Pharmaceutical Society, a membership body for pharmacists.