Smart machines could soon outpace even the best organic chemist

I’ve had a principle for some time that every working scientist should have an ‘elevator pitch’ ready to describe their job. That’s an old phrase from Hollywood, where someone gets a chance to pitch a movie idea in the time of an elevator ride with some big studio executive. For a scientist, the elevator pitch is a compressed description of what they’re doing, or what they do in general. It’s a particularly relevant idea for people in industry, where the question ‘What is it you do around here, anyway?’ can be asked at any time – in that case, not only should you have an answer ready, but it would do you good to make sure that the same answer would occur to other people anyway. Ideally, it should involve something that you’re doing that is unique to your abilities, something that would make you hard to replace.

These thoughts have come to mind recently as I read papers about machine learning, artificial intelligence and other advanced computing applications. As people are doubtless aware, such software is coming increasingly close to what have been traditionally thought of as human functions. This has been underway for quite a while, and we’re already used to seeing once-human problems outsourced to machines – things such as ‘How do I get to this restaurant from here?’ or ‘When was the last time I had any messages about topic X or from person Y?’ We’re all pretty comfortable with that, and not too many are bothered about losing the once-human distinction of being the best in the world at chess. But it’s a bit different when the machine comes after what we think of as key intellectual functions of our own jobs.

For organic chemists, I think that the first software to cause major uneasiness will be retrosynthetic planning. I’ve seen papers recently that convince me that programs to work out synthetic routes to new compounds are getting quite a bit more capable. Chemists have been trying to realise this idea for more than 50 years, but now that it might be coming true, we may find ourselves wondering if we’d really thought through the consequences. There are chemists whose elevator pitch consists at least partly of being able to come up with such ideas. Now what?

Even if the current software is not quite up to replacing humans at coming up with syntheses, I think that the proverbial handwriting is on the proverbial wall. The next generation will be, and the next generation is coming. The synthetic organic literature is incomprehensibly large, and no one has truly been able to keep up with it for many years now. Only machines can deal with the chemical literature as it exists today (and they do). A retrosynthesis program knows all the reactions, never forgets them and never misplaces them. Another huge stack of literature lands on the digital doorstep and the program does not curse, groan or head out for a drink: it adds everything to its grand scheme of All Organic Chemistry and waits patiently for more.

The real eye-openers are the latest versions of machine-learning software, first demonstrated for games like Go. In contrast to the recent world-beating programs that had a database of every known human strategy (already enough to beat the best human champions), the latest generation skips that step entirely. Instead, the program runs through countless simulations and test cases, learning the rules for itself as it goes. In the case of Go, such programs not only far surpass human performance, they also far surpass the human-mimic programs that came just before them, deploying strategies that have literally never been seen before. This is a higher level of abstraction: instead of dealing passively with huge piles of data according to the rules we have discovered, these programs move the process of discovering such rules into their domains as well.

If this really does work for something like organic synthesis, we can expect at some point to have such programs propose routes that may not even make much sense to us. At first that will be because they will be invoking known reactions that we might not have heard of. But if we also start to teach them general chemical principles – or, more likely, dump the entire chemical literature into them so they can figure those principles out for themselves – the programs may well infer reactions that don’t yet exist, but are about to. And at that point, a lot of us are going to need a new elevator pitch!