Collaboration provides tools to help chemists tag their own compounds online

In an effort to make the internet’s mass of chemical data easier to search through, the RSC and US-based company ChemZoo are developing tools to help chemists label their own compounds with a standard computer-readable tag. 

The collaborative project, announced in December 2008, aims to help researchers share information on chemical structures and data for free online, and may impact on closed subscription databases such as those offered by the American Chemical Society (ACS).

The standard way to represent chemical structures using a string of text, the International Chemical Identifier (InChI), was developed several years ago by chemistry’s governing body, the International Union of Pure and Applied Chemistry (Iupac), together with the US government’s National Institute of Standards and Technology (NIST). The idea was that InChI, which is generated by an algorithm from a chemical structure, would serve as a single public format for identifying structures and - if every molecule were tagged with its own InChI - a basis for sharing chemical information on the web.

’InChIs enable people to look up and find information on a particular chemical very quickly,’ explains Steve Heller, an expert in chemical information systems and guest researcher at NIST who helped develop the format. ’Chemical information is all disconnected on the Web right now, and this lets us organise it and link it all together.’

Yet despite its significance, the InChI system is still unused by, or even unknown to, many chemists. ’[Chemists] don’t necessarily see an immediate value to InChIs yet,’ says Antony Williams of ChemSpider, the online searchable repository of chemical structures that ChemZoo hosts. But he says such tags are slowly becoming more mainstream: Wikipedia, for instance, is starting to use them, as are other commercial and public databases.

In the hope of provoking more enthusiasm for the format, the RSC and ChemZoo are working to provide a free ’resolver’ to turn any InChI into a shorter 25-letter code (the ’InChI key’), also developed by Iupac and NIST, which is friendlier to search engines. 
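
To make the idea concrete, here is a minimal Python sketch of what such a resolver does in principle: shorten a long InChI string to a fixed-length code, then map that code back to the full identifier. It is an illustration only. The names `toy_key` and `ToyResolver` are hypothetical, and the hashing step is a stand-in that does not reproduce the real InChI key algorithm defined by Iupac and NIST.

```python
import hashlib


def toy_key(inchi: str, length: int = 25) -> str:
    """Shorten an InChI string to a fixed-length code of uppercase letters.

    ASSUMPTION: this is NOT the real InChI key algorithm. It only shows
    the general idea of hashing a long identifier into a short,
    search-engine-friendly one.
    """
    digest = hashlib.sha256(inchi.encode("utf-8")).digest()  # 32 bytes
    if not 1 <= length <= len(digest):
        raise ValueError("length must be between 1 and 32")
    # Map each of the first `length` digest bytes onto a letter A-Z.
    return "".join(chr(ord("A") + byte % 26) for byte in digest[:length])


class ToyResolver:
    """Hypothetical in-memory resolver: short key -> full InChI."""

    def __init__(self):
        self._by_key = {}

    def deposit(self, inchi: str) -> str:
        """Store an InChI and return the short key it is filed under."""
        key = toy_key(inchi)
        self._by_key[key] = inchi
        return key

    def resolve(self, key: str):
        """Return the InChI filed under `key`, or None if it is unknown."""
        return self._by_key.get(key)


resolver = ToyResolver()
ethanol = "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3"
ethanol_key = resolver.deposit(ethanol)
print(ethanol_key, "->", resolver.resolve(ethanol_key))
```

Because the short code is a hash, it cannot be converted back into a structure by calculation alone. A lookup service of this kind, holding deposited compounds, is what lets a search on the short code lead back to the full InChI.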

’Really we’re just putting in a simple bit of plumbing and encouraging deposition of new compounds [using InChI and the InChI key] to enable links between them to be made,’ says Richard Kidd, RSC’s informatics manager. ’A site that has new compounds deposited, and that will grow to harness the InChI collections around the world, will be an important central resource for locating information on a compound.’ That site, Williams hopes, will be ChemSpider, a service offered by ChemZoo that currently hosts over 21 million chemical structures, sourced from some 150 databases.

Clash of standards

There is disagreement over what impact the collaboration could have on current gold standards in managing chemical information, such as the ACS’s subscription-only Chemical Abstracts Service (CAS), which allots compounds a CAS number and catalogues them using its own proprietary informatics platform.

CAS, based in Columbus, Ohio, holds some 40 million organic and inorganic substances in its registry - roughly double ChemSpider’s existing database. The database is authoritative and guarantees quality, as data is deposited and error-checked by paid employees. But the information is not openly accessible - indeed, it is a major revenue generator for ACS, reportedly producing some $250 million in 2007.

Williams thinks that InChIs could eventually disrupt CAS by allowing public online searching of compounds by structure or substructure (rather than by typing in chemical names) - something that only the CAS registry and other proprietary services such as Elsevier’s Beilstein database offer at the moment.

’There are publishers [such as the RSC, Elsevier, and Nature Publishing Group] who are willingly adopting InChIs and seeing their value to the community,’ Williams tells Chemistry World, ’but ACS is also seeing the value to not joining in.’ 

But Heller and others are not sure that adding InChI tagging to the ChemSpider database will significantly affect CAS. CAS contains information not available elsewhere on the internet, and a searcher typing in an InChI to track down a chemical structure could simply be referred to CAS, or another database or journal to which they don’t subscribe. 

’It’s not intended to replace in any way the high-quality curated service that CAS offers,’ says RSC’s Kidd. ’But the lack of an open chemical identifier and service to use it is a real barrier to the development of shared chemical resources across the Web’.

ACS’s Glenn Ruskin comments that the RSC/ChemZoo initiative is one of several activities focused on developing standards for communication of chemical information, which ’may prove useful to some sectors of the chemical enterprise’.

A beta version of the InChI resolver is expected to be on display in March, at the ACS’s annual meeting in Salt Lake City, Utah. This prototype will enable scientists to search ChemSpider’s collection of chemical structures and associated information, and to deposit their own structures there.

Rebecca Trager, US Correspondent for Research Day USA