Integrating coding into chemistry helps science progress faster

High-throughput chemists like my team and me generally notice inefficiency the most. If a bench chemist runs maybe eight LC–MS samples, spread evenly through an average day, they quickly get used to a 20-second workflow imperfection and tend not to complain. But when a high-throughput chemist processes 96 samples from a well plate at once, small limitations accumulate to become a major annoyance or even a complete blocker. At one point in my career, I was asked to work with a particular workflow to sort out my LC–MS data. The process involved remoting into multiple computers and performing several lengthy workarounds to combine our files and export high-throughput data in batch. Since we could only export our analytical data slowly, if we noticed a new side-product afterwards, we had to weigh up the benefit of identifying the side-product yield patterns across the plate against the time spent rerunning the export process. Since the benefits were often not immediate, information was regularly lost. We could all see that if you were designing software, you wouldn’t do it this way. But we were chemists, not software designers.

However, a year before this, I had tried my hand at a free introductory coding course. It was run by an acquaintance from my undergraduate days, and as a chemist in my first industry job, I thought it seemed like a great idea to do alongside my work. Although I didn’t think I had much reason to code, it looked like it just might come in useful one day.

I was right. By the time I was shown the convoluted LC–MS workflow, I could recall very little of the Python I had learned. But the main thing the course had taught me was that coding a solution was possible. I knew that Python could fix our analytical export, so I figured out what I wanted to do, and googled almost every line until I had built a process that worked. It was written poorly and failed to take advantage of many helpful open-source packages that I wasn’t yet aware of, but it was good enough. The new script meant my team and I were free to capture all the information we identified.


Since dipping my toe into the broad sphere of data, coding and chemoinformatics, I have discovered it is a great place to be, though quite a different community to synthetic chemistry. As a whole, data chemists are adamant about the need to be open and push for progress, even more so than the already quite open and progressive pharma synthetic chemistry community. In the data world beyond chemistry, there is a wide effort to distribute learning for free, since leading a free public course is a badge of honour. This made it possible for me to start learning while I had a full-time chemistry job. However, when an organic chemist friend and I attended our first chemistry data conference, we found a big culture difference: everybody left and went home at 5pm on the dot! We caught one of the organisers, who definitely deserved a beer by then, and the three of us went to the pub on our own. It was very different to the typical late nights in the bar after organic chemistry conferences.

‘Everyone seems to be going into computers these days,’ a colleague remarked to me recently. However, there’s actually no need to ‘go’ anywhere. Chemists are in a great position to keep a foot in both camps, as I did initially. I surveyed some colleagues about whether they would be interested in taking a beginner Python course, expecting the answers to be largely negative. Instead, around 80% of them shrewdly said yes. I’m glad, as this will make it easier for them to use time-saving scripts written by programming professionals. I’ve now had so many queries about how to get into coding that I created a webpage of resources rather than repeat myself a hundred times.

But now it is time for me to make a move – transitioning full time into working with data strategy, while remaining very much embedded in synthetic chemistry. I am passionate about chemical data, code and workflows ‘because of’ rather than ‘despite’ being a synthetic organic chemist. Chemistry knowledge is a key strength in this role too. Many of us have seen digital lab tools that were clearly made without ever consulting lab chemists – with a little collaboration we can change that. My goal is that excellent synthetic chemistry can be done both faster and to a higher quality, enabling chemists to get more out of their experiments as well as freeing their time from the things they don’t want to do. We should be reducing their workload, not adding to it! Data chemists are making huge strides towards the goal of getting faster, better science done, and I can’t wait to be a part of that.