AI assistance

Source: © Andriy Onufriyenko/Getty Images

Could AI research assistants one day become commonplace when refining and testing hypotheses?

Artificial intelligence (AI) research labs Google DeepMind and Futurehouse have now released AI research assistants that can generate scientific hypotheses, design experiments and analyse data. The researchers behind these systems think that they could help accelerate scientific discovery, for example, by identifying clinically approved drugs to treat unrelated conditions or diseases.

AI tools have so far helped scientists analyse large amounts of data, predict the structure of proteins or even design small molecule drugs that are starting to enter clinical trials. ‘[But] how do we teach AI systems to think in the manner that scientists do?’ says Vivek Natarajan at Google DeepMind. Current interactive AI tools – such as ChatGPT – often produce quick responses, which ‘is not how science works’, he says. ‘[Scientific thinking] is slower, it’s much more rigorous, deliberate, structured, and it’s a process that often happens for a much longer period of time.’

To address this challenge, Google DeepMind developed Co-Scientist – an AI assistant purposefully built to work alongside scientists. DeepMind released the programme in February last year in a preprint. Non-profit Futurehouse unveiled its own version, Robin, a few months later, and both organisations have now published their findings in Nature.1,2

Co-Scientist builds on Google’s AI assistant Gemini and uses several AI programmes or ‘agents’ to respond to a research goal proposed by a user. The agents then generate initial ideas, search existing scientific literature to evaluate hypotheses and iterate this process several times to refine ideas. This is similar to how Google’s AlphaGo decides which move to make next in the board game Go, says Natarajan. Scientists are then able to carry out the suggested experiments, using the results to get Co-Scientist to generate better hypotheses.

What is AI?

Artificial intelligence (AI) is an umbrella term often incorrectly used to encompass a variety of connected but simpler processes.

AI is the ability of machines and computer programmes to perform tasks that typically only humans could do, such as reasoning, responding to feedback and decision making.

Generative AI is a newer variant of AI that analyses and detects patterns in training datasets to generate original text, images and videos in response to requests from users. ChatGPT, Microsoft Copilot, Google Gemini and more recently X’s Grok are all examples of chatbots that use generative AI.

Neural networks are an interconnected array of artificial neurons, akin to biological brains, that identify, analyse and learn from statistical patterns in data.

Machine learning is a subset of AI that allows machines to learn from datasets and make predictions based on new data, without programmers explicitly instructing them to do so. Machine learning models improve their performance as they receive more data.

Deep learning is an enhanced type of machine learning that uses neural networks with many layers to analyse complex data from very large datasets. Applications of deep learning include speech recognition, image generation and translation.

Large language models or LLMs are a type of deep learning model trained on large amounts of data to understand and generate language. LLMs learn patterns in text by predicting the next word in a sequence and these models are now able to write prose, analyse text from the internet and hold dialogues with users.

Natarajan explains that the team has worked with hundreds of scientists across the globe to see how the scientific community might use the tool. ‘Most of the scientists [we speak to] are quite surprised at where AI is, what it is coming up with [and] how fast it is responding,’ he says.

For example, the team worked with microbiologists at Imperial College London, UK, who were researching how bacteria transfer certain genes that allow them to develop antimicrobial resistance. ‘When we ran the [Co-Scientist] system for a couple of days, it essentially recapitulated their entire research journey and made the exact same predictions,’ says Natarajan.

Tiago Dias da Costa, who led the research at Imperial, says that Co-Scientist had ‘independently generated a hypothesis that closely matched the mechanism we had uncovered through years of experimental work’, but ‘it did not replace the experimental discovery process’.

‘The value of Co-Scientist was not that it simply “gave us the answer” but that it demonstrated how AI can help generate, prioritise and refine biologically meaningful hypotheses,’ he says. He stresses that ‘AI-generated hypotheses only become discoveries when they are tested rigorously in the lab’.

Co-Scientist will become the hypothesis-generating tool within Google’s broader Gemini for Science programme, which will be available to researchers in ‘the next few weeks and months’, says Natarajan. However, he thinks that ‘there’s still a few leaps that are needed before we have a system that can do what some of the great scientists of the past [have done] – like coming up with a true original breakthrough [or] paradigm shifting theory’.

Repurposing existing drugs to find new ways to treat disease

Futurehouse’s Robin works in a similar way to Co-Scientist. ‘A person can just input the name of a disease – any disease and no other context – and Robin will extensively search the literature and come up with experiments that model different components,’ says Michaela Hinks, who helped develop Futurehouse’s programme.

She adds that Robin will suggest existing drugs that scientists can then test in the lab. Inputting the experimental data into Robin allows the system to generate a refined set of hypotheses. What makes Robin stand out, says Hinks, is that the system only uses three agents. She explains that other similar tools were ‘just adding complexity for the sake of being fancy’. ‘I think systems should be as simple as possible to do the job.’

The team tested Robin by attempting to find treatments for dry age-related macular degeneration (dAMD) – a sight loss condition that currently has no cure and affects millions of people in the US.

Hinks explains that Robin generated a novel way to treat the disease by enhancing the destruction of certain eye cells. Robin then proposed an experiment to measure the destruction of these cells, as well as molecules that might treat the condition, with biological tests revealing two possibilities. These were Ripasudil – a clinically approved drug to treat glaucoma – and a circadian clock modulator, which was an unexpected result.

‘I think that this was probably a good testbed to use for this system to see if it could recapitulate the fact that human scientists have actually thought about doing this,’ says Derek Lowe, a US-based drug discovery chemist and Chemistry World columnist. He adds that when he searched for scientific papers with Ripasudil and dAMD, he found lots of results.

Hinks adds that Robin generates hypotheses by accessing the open-source literature ‘to make sure what the model proposes is grounded in what we already know’.

‘[Being able to generate hypotheses] is only going to be useful in cases where you have a lot of very good, reliable data to feed into these things,’ says Lowe, ‘because otherwise it’s going to spit out all kinds of stuff that’s not going to pan out. But join the club, we humans do that too,’ he adds.

‘Any machine learning system is absolutely dependent on the highest quality data that you can give it – both positive and negative results – and experimental observations as well,’ says Lowe. He adds that the scientific literature is currently ‘a bit of a mess’ and that the influx of research papers – increasingly ones that include AI-generated results or text – will not help the situation. Lowe thinks that these tools will not be able to answer fundamental scientific questions. ‘You’re not going to get all the answers by doing this because that would presuppose that all the answers are there in the literature.’

‘I don’t think that your standard academic or industrial drug discovery scientists are going to start using [these tools] immediately,’ says Lowe. ‘These are the equivalent, to me, of a laboratory instrument that still has soldered wires and duct tape and things hanging out of it.’ He thinks that such tools have a long way to go before a regular end user can benefit from them.