The AI-driven retrosynthesis planner Chematica can now not only analyse data and learn from existing reaction patterns but also identify new synthetic routes that may be counterintuitive, even to expert chemists. A new feature has made it possible to unlock a large number of previously unreported strategic reaction sequences – known as tactical combinations – which are particularly useful for the synthesis of complex molecules. The software was tested by preparing a small natural product.

When planning the synthesis of a structurally challenging molecule, it can actually be a good idea to choose a combination of reactions that first creates an even larger molecule and then trims it to create a less complex structure. ‘This uphill and apparently unproductive step sets the scene for an ensuing, extremely elegant simplification,’ says Bartosz Grzybowski of the Polish Academy of Sciences and Ulsan National Institute of Science and Technology, South Korea, one of the developers of Chematica. But such two-step tactical combinations are not that easy to find. ‘Spotting such sequences is very difficult because we, chemists, are not trained to envision steps that would “complexify” the structure,’ Grzybowski says. ‘In organic chemistry courses, we are taught to simplify the target, and simplify more, until we reach something we can buy. Thinking uphill in structural complexity is just not very intuitive. No wonder that only some 500 such sequences have been catalogued so far.’ Now, the team has discovered many new tactical combinations.

Grzybowski and his team have spent 15 years encoding around 75,000 reaction rules into Chematica, so they decided to adapt the program to create two-step combinations and fish out the ones that met certain criteria, including an increase followed by a decrease in complexity and the absence of one-step bypasses that would give the same result. It was also important that the first reaction in the sequence enabled the second one, creating a real synthetic strategy. ‘It took quite a while to design the right algorithm and survey close to 1 billion sequences, but in many ways it was bound to work – and it did,’ Grzybowski says. ‘We show some 46,000 unprecedented sequences based on suitable reaction classes and close to 5 million based on specific reaction variants. This is arguably the first-ever example of how a machine identifies new chemical knowledge.’
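The filtering idea can be illustrated with a toy sketch. This is not Chematica's algorithm or data model: the molecule representation (a set of feature labels), the complexity score (the number of features), and the `Rule`, `apply` and `tactical_combinations` names are all assumptions made purely for illustration. The sketch keeps a two-step sequence only if complexity rises and then falls, the second step cannot run without the first, and no single rule reaches the same result.

```python
from dataclasses import dataclass
from itertools import permutations
from typing import FrozenSet, List, Optional, Sequence, Tuple

# Hypothetical toy model: a molecule is just a set of feature labels.
Molecule = FrozenSet[str]


@dataclass(frozen=True)
class Rule:
    """Hypothetical reaction rule acting on feature labels."""
    name: str
    requires: Molecule  # features that must be present for the rule to fire
    removes: Molecule   # features consumed by the rule
    adds: Molecule      # features created by the rule


def apply(rule: Rule, mol: Molecule) -> Optional[Molecule]:
    """Return the product, or None if the rule is not applicable."""
    if not rule.requires <= mol:
        return None
    return (mol - rule.removes) | rule.adds


def complexity(mol: Molecule) -> int:
    """Toy complexity score (assumption): the number of features."""
    return len(mol)


def tactical_combinations(start: Molecule, rules: Sequence[Rule]) -> List[Tuple[str, str]]:
    """Enumerate ordered rule pairs and keep those meeting all criteria."""
    hits = []
    for first, second in permutations(rules, 2):
        mid = apply(first, start)
        if mid is None:
            continue
        end = apply(second, mid)
        if end is None:
            continue
        # Criterion 1: complexity goes up in step one, down in step two.
        if not (complexity(mid) > complexity(start) and complexity(end) < complexity(mid)):
            continue
        # Criterion 2: the first step enables the second one.
        if apply(second, start) is not None:
            continue
        # Criterion 3: no one-step bypass gives the same result.
        if any(apply(rule, start) == end for rule in rules):
            continue
        hits.append((first.name, second.name))
    return hits


# Illustrative, made-up data.
START: Molecule = frozenset({"core", "side_chain"})
RULES = [
    Rule("build", frozenset({"core"}), frozenset(), frozenset({"temp_ring", "handle"})),
    Rule("trim", frozenset({"temp_ring"}),
         frozenset({"core", "temp_ring", "handle"}), frozenset({"product_motif"})),
    Rule("direct", frozenset({"side_chain"}), frozenset({"side_chain"}), frozenset({"other"})),
]
# Hypothetical one-step route to the same product as build + trim.
SHORTCUT = Rule("shortcut", frozenset({"core"}), frozenset({"core"}), frozenset({"product_motif"}))

if __name__ == "__main__":
    print(tactical_combinations(START, RULES))
    print(tactical_combinations(START, RULES + [SHORTCUT]))
```

In this made-up data set, only the build-then-trim pair survives the filters, and adding the hypothetical `shortcut` rule removes it again, because a one-step bypass then gives the same product. A real survey would run such checks over encoded reaction rules and a proper structural complexity measure, neither of which is reproduced here.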

Tactical planning

To make the newly discovered tactical combinations available to other scientists, the team created the Strategist web app, where users can query suitable sequences within seconds. ‘Of course, all this knowledge is also in Chematica,’ says Grzybowski. ‘The program now really thinks and can produce synthetic plans to really difficult targets.’ He and his colleagues used the upgraded software to plan the synthesis of the platelet aggregation inhibitor imperanene and were able to shorten the synthetic route from eight steps to just three steps plus two protections by choosing a tactical combination that included an addition and a reduction reaction. ‘We actually show – by cooking – how one of the newly discovered sequences shortens the synthesis of the small natural product by about a half,’ Grzybowski says.

Schemes showing the syntheses of medicinally relevant molecules and natural products designed autonomously by the Chematica program with the use of the TC collection

Source: © 2019 Ewa P Gajewska et al/Published by Elsevier Inc.

Imperanene, a platelet aggregation inhibitor, was successfully synthesised following the tactical combination route recommended by Chematica. The top portion shows Chematica’s cost planning route for the drug; the middle shows the corresponding, experimentally executed plan (with conditions and yields given next to the reaction arrows). The bottom portion shows the shortest literature-reported route to the drug.

Richmond Sarpong of the University of California, Berkeley, US, who was not involved in the study, believes that Chematica’s upgrade will help scientists identify routes that are not immediately obvious from a retrosynthetic perspective. ‘These tactical combinations should aid and inspire human designers as to how to best construct a compound by considering disconnections that may lead back to structurally more complex compounds. In the end, these non-intuitive disconnections may provide a more efficient overall synthesis,’ he says. ‘The ability to more easily recognise more tactical combinations will be a significant addition to how synthetic organic chemistry is practised. This work is a positive start as to how to do this with computers.’

Grzybowski admits that chemists could have discovered some of the new strategic combinations too. ‘But they didn’t,’ he says. ‘Besides, they would have never discovered 5 million of them. In terms of scope and speed of discovery, the machine is unbeatable.’