New tool seeks to answer the question: can an organic chemist’s intuition be quantified?

Organic synthesis is often heralded as more art than science. An organic chemist’s eye for complexity, the ability to break structures down into simpler forms, is honed and nurtured over decades. But is it possible to take this seemingly intangible skill and quantify it, putting a simple number on how complex a chemical structure actually is?

Process chemists Martin Eastgate and Jun Li at Bristol-Myers Squibb (BMS) in the US have developed a tool to do just that. It generates a unique index they have termed a molecule’s ‘current complexity’, which also accounts for changes over time due to the impact of new technologies.1

According to Eastgate, the idea for the tool was born out of their recent attempts to synthesise the JAK2 inhibitor BMS-911543. A new transformation radically changed their ability to prepare the compound, enabling an eight-step rather than a 19-step synthesis.2 ‘When I reflected on what we had achieved, the molecule no longer looked as tough as it once had,’ says Eastgate. ‘The beauty of the structure remained, but my perception of the challenge it posed had altered.’

The index is based on a community's perception of complexity, within the context of current technology

This led to an interest in understanding how the process of making molecules at BMS had evolved and how process advances had made structures simpler. ‘It was clear that no one could address these questions accurately, so we decided to approach this problem ourselves,’ says Eastgate. The ensuing research resulted in the current complexity index, which was based on an analysis of collective intelligence from a group of 18 synthetic chemists asked to rank 40 molecules in terms of their perceived complexity. The data obtained from the chemists’ intuition was then refined by considering a large series of intrinsic and extrinsic factors and applying a Bayesian regression model to determine the five factors that had the greatest impact on the complexity of a structure. These are: (i) the structure’s molecular topological index (as proposed by Randić);3 (ii) the number of stereogenic centres established in the synthesis; (iii) the number of heteroatoms on and in aromatic rings; (iv) the number of steps; and (v) the ideality of the route (as defined by Phil Baran in 2010).4 Factors (i) and (iii) are intrinsic and unchangeable, whereas the others are extrinsic variables reflecting advances that occur over time. From this was established an easily comprehensible 1–10 rating scale, with 1 being the most complex and 10 the least complex.
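To make the regression step more concrete, here is a minimal sketch of how a consensus score from a panel of raters might be regressed onto five descriptors using a conjugate Bayesian linear model. It is not the authors’ implementation and does not reproduce their factor-selection step. Everything in it is an assumption for illustration: the function name `fit_bayesian_linear`, the prior and noise variances, the simulated weights and their signs, and above all the descriptor values and rater scores, which are random placeholders rather than data from the study.

```python
import numpy as np


def fit_bayesian_linear(X, y, noise_var=1.0, prior_var=10.0):
    """Return the posterior mean and covariance of the regression weights.

    Assumes a zero-mean Gaussian prior (variance prior_var) on every weight
    and Gaussian rating noise with a known variance noise_var.
    """
    n_features = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(n_features) / prior_var
    cov = np.linalg.inv(precision)
    mean = cov @ (X.T @ y) / noise_var
    return mean, cov


# Placeholder data only: random numbers standing in for real descriptors.
rng = np.random.default_rng(0)
n_molecules, n_raters = 40, 18
factor_names = [
    "topological index",
    "stereocentres set in route",
    "aromatic heteroatoms",
    "number of steps",
    "route ideality",
]
descriptors = rng.normal(size=(n_molecules, len(factor_names)))

# Hypothetical weights, used purely to simulate what raters might say.
simulated_weights = np.array([-0.8, -0.6, -0.4, -0.9, 0.7])
rater_scores = (
    5.5
    + (descriptors @ simulated_weights)[:, None]
    + rng.normal(scale=1.0, size=(n_molecules, n_raters))
)

# Collapse the panel to one consensus score per molecule.
consensus = rater_scores.mean(axis=1)

# Add an intercept column, fit, and clip predictions onto the 1-10 scale.
X = np.column_stack([np.ones(n_molecules), descriptors])
weights, weight_cov = fit_bayesian_linear(X, consensus)
index = np.clip(X @ weights, 1.0, 10.0)

# Report each weight with its posterior standard deviation.
labels = ["intercept"] + factor_names
for name, w, sd in zip(labels, weights, np.sqrt(np.diag(weight_cov))):
    print(f"{name:>28s}: {w:+.2f} +/- {sd:.2f}")
```

In a sketch like this, the posterior standard deviations give a rough sense of how firmly a panel of this size pins down each factor’s weight, which is one reason a larger pool of raters and molecules would be expected to help.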

Strychnine has become easier to make over time

As an example, Eastgate and Li examined strychnine. Their current complexity index illustrated how advances in techniques and technologies have reduced the complexity of strychnine from a score of 2.14, for Robert Woodward’s original synthesis,5 to 3.75, for Chris Vanderwal’s in 2011.6

Convincing the community

The idea of assigning a number to the complexity of a molecular structure has been around for some time, with chemistry titans such as Robert Woodward and EJ Corey failing to find a suitable solution. As pointed out by Johann Gasteiger, an expert in cheminformatics at the University of Erlangen-Nürnberg, Germany, ‘even with the advent of computers, no system has found broad acceptance among the organic community’. Gasteiger feels the reasons for this are manifold, not least community resistance, as many organic chemists consider ‘synthesis design as an “art” where computers should not have a place’.

This view is backed by organic chemist Scott Snyder of the Scripps Research Institute in the US, who says ‘it could be compared to deciding which painting is superior or which piece of music is more pleasing to the ear.’ However, Gasteiger surmises that the wide distribution of merit values assigned by the chemists to the same molecule, even within Li and Eastgate’s study, clearly shows that a more ‘unbiased approach to the definition of the complexity of organic molecules is urgently needed.’

Eastgate is convinced that this model is on its way to meeting this need because, unlike previous efforts such as that of Peter Ertl and Ansgar Schuffenhauer,7 the concept of current complexity accounts for the fact that our perception of complexity is not fixed over time. It’s this aspect that appeals most to Snyder, as he feels it ‘allows for the inclusion of advances in the field, something which past approaches have not considered or handled with particular aplomb.’

The assessment tool is still in its early stages of development and currently only stands as a proof-of-method. ‘It would be fantastic to develop a web-based game as a ranking mechanism,’ says Eastgate. ‘Increasing both the number of raters and the molecules assessed would improve the quality of our model.’

The tool is already in use at BMS, but the ultimate aim would be to incorporate this method into a synthetic route design engine to assess the potential impact a proposed synthetic approach could have on the complexity of the system in question. 

Accounting for the passage of time may be what makes this model unique. And, as Snyder points out, time will also evaluate the model’s usefulness, ‘but for now it is a provocative and interesting method of assessment of complexity.’