Chemists saved the trouble of converting old pictures of chemical structures into computer-readable format

Ned Stafford/Hamburg, Germany

A software tool which automatically converts old pictures of chemical structures into computer-readable format promises to solve the most tedious problem plaguing chemical bibliographers. But it’s got competition. 

For years, the images contained in old scientific journals and patents have had to be redrawn digitally and entered manually into databases, so that computers can search them. Now, Germany’s Fraunhofer Institute for Algorithms and Scientific Computing (SCAI) says it has developed software that automates this process. 

But SCAI’s claim to have achieved a world first is disputed by Peter Johnson, a chemist at the University of Leeds in the UK. In an interview with Chemistry World, Johnson challenged SCAI to compare its conversion software with the software he helped to develop: CLiDE (Chemical Literature Data Extraction), marketed by Toronto-based SimBioSys Inc.

Marc Zimmermann, deputy head of SCAI’s Bioinformatics Department in Sankt Augustin near Bonn, said computerised indexing of chemical structure images is one of the toughest challenges facing research record-keepers. Chemists can easily classify structures from old pictures, but to a computer, the images are just an accumulation of pixels. The extraction and conversion software developed by SCAI, called ’chemoCR’ (chemical Compound Reconstruction), is much faster than redrawing structures digitally by hand, Zimmermann claimed.

Zimmermann said chemoCR is able to detect, extract, and convert scanned images of chemical structures if the images are clear. SCAI recently announced a 50-50 strategic partnership with InfoChem GmbH, a chemoinformatics firm in Munich, to further develop and market chemoCR, though it will take three more years of fine-tuning before the software is ready to market. ’We are already talking to nearly every big pharma company,’ Zimmermann added.

SCAI stumbled onto the project while looking for chemical structure recognition software. The institute tested CLiDE, but was not satisfied, mainly because the software did not learn from its mistakes. ’We had some ideas on how to improve their software, but they were not really interested,’ Zimmermann said, adding that SCAI then decided to develop its own software.

But Johnson, leader of the group that developed CLiDE, says SCAI only tested a cut-down ’Lite’ version of SimBioSys’ software. SCAI was advised to buy the full CLiDE version, but didn’t, Johnson said; he also disputes Zimmermann’s claim that SCAI was open to working with SimBioSys to improve the CLiDE software. He added that large pharmaceutical companies and major scientific publishers have been satisfied with CLiDE.

At the beginning of 2006, SimBioSys handed further development of CLiDE back to Keymodule Ltd, a firm of which Johnson is chief scientific officer. Johnson said the company will soon release a new version of the software, ’CLiDE Pro.’

In a challenge that appeared to be aimed at SCAI, he said: ’We would welcome a head-to-head comparison of CLiDE Pro with any other system which purports to do the same job. We do not welcome competitors who spread misinformation about our system rather than publishing details of their own work.’

Richard Kidd, who leads the RSC’s Project Prospect, has analysed a demo version of CLiDE. He agrees that it is hard to develop consistently accurate chemical structure recognition software, partly because structures in publications often include other information crammed on to the image. Unless software can do the work with a minimum of human supervision and double-checking, it would usually be less time-consuming to simply draw the images from scratch, Kidd said. He believes the results of even the best recognition software in the future will need some human supervision: ’It would never be 100 per cent, I don’t think. But it could get good enough to be worthwhile.’
