Realistic costs and diverse suggestions make Chematica more insightful

The developers of Chematica, a computer program that can predict organic synthesis routes, have significantly improved the way it chooses from the options it discovers to present users with better and more diverse results. The speed at which the software makes these comparisons has also greatly improved, making the software more appealing as a planning and investigative tool.

Chematica performs retrosynthetic analysis: given a target compound, it suggests how it could have been made by reacting simpler components, and how those could have been made, and so on. Repeating the process eventually leads to the kinds of starting materials you can order in a catalogue. Backtracking from those reagents to the target molecule gives a synthesis route. The software looks for all of the possible alternatives at each retrosynthetic step, so rather than discovering a single synthesis, it builds a network of routes.
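The recursive idea described above can be sketched in a few lines of Python. This is only a toy illustration, not Chematica's code: the compound names, the `RETRO_RULES` table and the `CATALOGUE` set are invented, and the sketch assumes the rules contain no cycles.

```python
# Toy retrosynthesis: every name and reaction below is made up for illustration.
# Maps a product to the alternative sets of precursors that could make it.
RETRO_RULES = {
    "target": [("A", "B"), ("C",)],
    "A": [("a1", "a2")],
    "C": [("B", "c1")],
}
# Compounds assumed to be purchasable from a catalogue.
CATALOGUE = {"B", "a1", "a2", "c1"}


def routes(compound):
    """Return every route to `compound` as a list of (product, precursors) steps."""
    if compound in CATALOGUE:
        return [[]]  # one route with no steps: just buy it
    found = []
    for precursors in RETRO_RULES.get(compound, []):
        # Combine every way of making each precursor.
        partial = [[]]
        for p in precursors:
            partial = [done + sub for done in partial for sub in routes(p)]
        # Steps are stored in forward order: precursors first, product last.
        found.extend(r + [(compound, precursors)] for r in partial)
    return found


all_routes = routes("target")
for r in all_routes:
    print(" ; ".join(f"{' + '.join(pre)} -> {prod}" for prod, pre in r))
```

Even this tiny rule table yields more than one route, which hints at how quickly the network grows when every step has many alternatives.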

The wealth of alternatives proved to be an embarrassment of riches. ‘In the beginning we were very happy when it found one route. And then it started finding not one, but a thousand, or a hundred thousand routes. Saying “look, they’re all correct” doesn’t really solve a practical problem,’ comments Bartosz Grzybowski, the program’s inventor, from the Polish Academy of Sciences and Ulsan National Institute of Science and Technology, South Korea. Instead, Chematica ranks the possible routes by the end cost of the target molecule, and can show a variety of alternatives rather than just the cheapest few, which may be variations on a theme. Now, Grzybowski’s team has significantly improved how Chematica makes these selections.

Good timing

Accounting for the imperfect yields of reactions had the biggest impact, implicitly penalising linear synthesis pathways and favouring convergent ones where molecules are assembled in chunks. ‘For the examples shown, the approach taken by Chematica does not differ too much from the retrosynthetic analysis made by a trained organic chemist,’ says Mariola Tortosa, a researcher in organic synthesis at the Autonomous University of Madrid, Spain. ‘One feature that I found particularly attractive and impressive is the fact that the program can identify the optimal timing to use the most expensive reagents.’
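A small calculation shows why imperfect yields favour convergent routes and late use of expensive reagents. The cost model below is a deliberately simple assumption, not Chematica's scoring function: each step consumes one unit of each input, and the cost of a unit of product is the summed input cost divided by the fractional yield. The fragment prices and the 80% yield are invented.

```python
# Hypothetical cost model, not Chematica's actual scoring function.
def step(yield_fraction, *input_costs):
    """Cost per unit of product when one unit of each input is consumed."""
    return sum(input_costs) / yield_fraction


def linear(costs, y):
    """Add fragments one at a time: ((f0 + f1) + f2) + f3 ..."""
    total = step(y, costs[0], costs[1])
    for c in costs[2:]:
        total = step(y, total, c)
    return total


def convergent(costs, y):
    """Build two halves separately, then join them: (f0 + f1) + (f2 + f3)."""
    left = step(y, costs[0], costs[1])
    right = step(y, costs[2], costs[3])
    return step(y, left, right)


equal = [1.0, 1.0, 1.0, 1.0]
for y in (1.0, 0.8):
    print(f"yield {y:.0%}: linear {linear(equal, y):.2f}, convergent {convergent(equal, y):.2f}")

# Same linear route, with the one expensive fragment used first or last.
print(f"expensive first: {linear([10.0, 1.0, 1.0, 1.0], 0.8):.2f}")
print(f"expensive last:  {linear([1.0, 1.0, 1.0, 10.0], 0.8):.2f}")
```

At perfect yield the two layouts cost the same in this model. Once yields drop, material introduced early is lost at every later step, so the linear route becomes dearer than the convergent one, and an expensive fragment is cheapest when it enters last.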


Source: © Bartosz Grzybowski/Ulsan National Institute of Science and Technology

Top-scoring syntheses of unsymmetrical triarylamine proposed by Chematica under different yield scenarios

Chematica’s definition of chemical novelty has also been improved. The code can vary its shortlist by penalising the use of similar reactions in consecutive suggestions, but it previously had a specific idea of what made two reactions alike. In the new version, two reactions count as similar if they make the same product and share a reagent with more than four carbon atoms, a broader and more realistic criterion. The team tested this by having Chematica develop syntheses for trans-whisky lactone, and found that the software gave three very different approaches.
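The similarity rule quoted above is simple enough to sketch. In the Python below, `similar` follows the stated criterion (same product, shared reagent with more than four carbons), while the `Reaction` type, the greedy `diverse_shortlist` selection and the flat per-hit `penalty` are assumptions made for illustration. The article does not describe how Chematica applies its penalties, and the routes and costs are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    product: str
    reagents: frozenset  # of (name, carbon_count) pairs


def similar(r1, r2):
    """Same product and at least one shared reagent with more than four carbons."""
    if r1.product != r2.product:
        return False
    return any(carbons > 4 for _, carbons in r1.reagents & r2.reagents)


def diverse_shortlist(routes, k, penalty):
    """Greedily pick k routes, adding `penalty` per reaction that resembles
    a reaction in an already chosen route (an assumed scheme)."""
    chosen, remaining = [], list(routes)
    while remaining and len(chosen) < k:
        def penalised(route):
            cost, reactions = route
            hits = sum(
                1
                for r in reactions
                for _, picked in chosen
                for p in picked
                if similar(r, p)
            )
            return cost + penalty * hits

        best = min(remaining, key=penalised)
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Three invented one-step routes to the same product "T".
routes = [
    (10, [Reaction("T", frozenset({("X", 6), ("Y", 2)}))]),
    (11, [Reaction("T", frozenset({("X", 6), ("Z", 1)}))]),  # shares X (6 carbons)
    (13, [Reaction("T", frozenset({("W", 5), ("Y", 2)}))]),  # shares only Y (2 carbons)
]
print([cost for cost, _ in diverse_shortlist(routes, 2, penalty=0)])
print([cost for cost, _ in diverse_shortlist(routes, 2, penalty=5)])
```

With no penalty the shortlist holds the two cheapest routes, which differ only in a small reagent. With the penalty switched on, the second slot goes to the dearer but chemically distinct route.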

The software is also significantly faster than its previous version, which in severe cases could take thousands of seconds to rank the syntheses and select the best. ‘Who is willing to wait three hours to do this kind of selection? It was quite slow,’ says Grzybowski. Tomasz Badowski, a mathematician specialising in algorithm theory who worked on the project, designed superior algorithms to cut the search time to less than a second, a four-order-of-magnitude improvement.


Source: © Bartosz Grzybowski/Ulsan National Institute of Science and Technology

Top-scoring syntheses of trans-whisky lactone proposed by Chematica without (a) and with (b) the application of diversity penalties

The software’s combination of speed, realism, and variety should allow organic chemists to approach the synthesis problem in new ways. ‘It gives you a realistic route to basically slide the dial and say, right, I care about yield, or no, actually I care about steps, or I care about the cost,’ says Lee Cronin, who investigates automation in chemistry at the University of Glasgow, UK. ‘I think that research like this is going to put the practical chemist back in control of their time, so they can spend it doing more fascinating chemistry and making bigger molecules.’ The sentiment is shared by Tortosa: ‘As tools like this keep improving, we won’t need to worry about things that are somehow repetitive and we will have more time to focus on creativity, on understanding fundamental mechanistic aspects and to search for original synthetic transformations. As an academic this is a very important factor.’

In 2017, Merck KGaA of Darmstadt, Germany, bought out the company set up to develop the software, and it has since released a commercial version, named Synthia. These improvements to Chematica will be included in a future release.