Building data systems to break down research silos

The right data ecosystem could make it a lot easier for researchers to cross discipline boundaries

As I had lunch with my fellow synthetic organic chemists in my student days, we chatted through a tough chemoselective reduction. We had a few ideas for either homogeneous catalysts or the organic chemists’ generic heterogeneous option, ‘palladium on carbon’. Our new PhD student, fresh from a chemical engineering Master’s, walked in and suggested a specialist supported catalyst, tuned just right to provide the perfect reducing power for our tricky carbon–carbon double bond while retaining a halogen elsewhere. The oldest synthetic student looked straight at him and summarily dismissed the idea by saying, ‘yeah, we don’t do that here’. This was baffling – but he was undeniably correct: we didn’t do that there. How much better we could have been if we had accepted ideas from outside our rigid experience!

History is replete with cases of pioneering scientists achieving success after refusing to be bound by the confines of a single discipline. Even within synthetic organic chemistry, working flexibly across boundaries has caused previously shunned technologies like photochemistry, electrochemistry and Design of Experiments to rise in popularity. Nowadays, leaders also spend much time and heartache trying to push interdisciplinarity, despite a system that seems bent on rejecting the notion. Academics grouse that they can’t find the right journals, that university leaders don’t understand the different publication metrics of differing fields, and that it’s tougher to recruit outside the safety of a single discipline.

Interdisciplinary working is probably a lot easier in a large company. And, to be fair, pharma and agro chemists may have a simpler problem since our most diverse disciplines at least have human health or plant health in common. For R&D in chemistry, at least in the UK, pioneers look to the benefits of knitting ties between disparate teams. In fact I’m so used to the word ‘silo’ being used negatively that I was surprised to hear tech professionals using it as a standard term for different teams in software development. It speaks to how these isolated groups are wilfully, if not intentionally, created. Some part of human insecurity makes us resistant to sharing information.

Communication barriers

Sometimes there are rational reasons for declining to communicate. Certainly, some marketed tools have deliberately low interoperability, as a low-subtlety attempt to ensnare customers into the vendor’s data ecosystem. But as a reaction optimisation chemist, I see cases regularly where I need to interrogate other chemists’ raw data – a high-level report won’t cut the mustard. I’m looking for side-products that help me understand the previous reaction holistically, while to the original team the most important outcomes are the product yield and biological data. At its worst, when teams cannot or do not share their results directly, we end up reducing quality or repeating work.

A good data ecosystem, whether public or private, can help with this. In synthetic chemistry, we rarely approach a synthesis completely from scratch. Chemists look to their own and colleagues’ experience, and to literature prior art. However, even searching for similar molecules is a challenge. A search for your exact product or transformation isn’t likely to return results, and any it does return may be too few to represent cutting-edge chemistry. On the other hand, Markush structures brutally diminished to their simplest components lack the exact problem’s nuance. Additionally, it’s difficult to decide what actually determines ‘closest precedent’ in chemistry. Will these conditions used for a pendant alkyl ether also work when there’s a potentially less stable methyl ester in that position? Searching for electronically similar groups by generic structure is also tricky, for example if that ester were replaced with a trifluoromethyl group. The same problems occur with company electronic lab notebooks – except sometimes users report that these are even harder to search.

The greater its data capabilities and the stronger the links between its teams, the closer an institution is to being able to semi-automate this search. Firstly, a shared schema designed for all users’ needs allows the most disparate of chemistry teams to contribute and benefit. Secondly, an automated system that runs through all of the close matches, cutting-edge literature conditions and steric and electronic similarities is a tantalising alternative to the many manual queries otherwise needed to cover them. It’s plausible for a chemist to design their intended reaction, and for their data ecosystem to reply, ‘did you know your process chemistry colleague tried almost the same transformation, and your formulation colleague found the product dissolves well in propanol?’
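
For readers who think in code, here is a minimal Python sketch of what such a semi-automated lookup might look like. Everything in it is an assumption made for illustration: the ReactionRecord schema, the hand-written feature tags standing in for real structural fingerprints, the example records and the similarity threshold are all invented, and a real system would lean on a cheminformatics toolkit rather than plain sets. It scores each colleague's record against a planned reaction using Tanimoto (Jaccard) similarity and returns the closest precedents first.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionRecord:
    """One entry in a hypothetical schema shared by every team."""
    team: str            # who recorded it, e.g. "process chemistry"
    transformation: str  # short label for the work carried out
    features: frozenset  # illustrative tags, not real fingerprints
    note: str = ""       # observations beyond yield, e.g. solubility


def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def closest_precedents(query, records, threshold=0.5):
    """Return (score, record) pairs scoring at least threshold, best first."""
    scored = [(tanimoto(query, r.features), r) for r in records]
    hits = [(s, r) for s, r in scored if s >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


# Invented example records; none of this is real experimental data.
records = [
    ReactionRecord(
        "process chemistry",
        "alkene reduction",
        frozenset({"C=C", "aryl halide", "methyl ester"}),
        "tried almost the same transformation",
    ),
    ReactionRecord(
        "formulation",
        "solubility screen",
        frozenset({"aryl halide", "methyl ester", "alkane"}),
        "product dissolves well in propanol",
    ),
    ReactionRecord(
        "medicinal chemistry",
        "amide coupling",
        frozenset({"amine", "carboxylic acid"}),
    ),
]

# The reaction a chemist is planning, described with the same tags.
query = frozenset({"C=C", "aryl halide", "alkyl ether"})

for score, record in closest_precedents(query, records, threshold=0.1):
    print(f"{score:.2f}  {record.team}: {record.note}")
```

The threshold is the interesting design choice here. Set it high and only near-identical reactions come back; set it low and the formulation colleague's solubility note surfaces too, at the cost of more noise. Deciding which features count towards ‘closest precedent’ remains the hard chemical question, and this sketch does not attempt to answer it.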

Communicating results directly is the most memorable way to share information, but of course nobody can know every reaction and every project within even one company. A quick, automated reference might initially sound like an impersonal alternative. But building a cross-project, cross-function, and cross-institution data ecosystem naturally helps silos merge – and solves the main problem of knowing who to ask in the first place.