A new open access database of microbial natural products has launched online. The Natural Products Atlas (NPAtlas) is free to use and contains more than 24,000 chemical structures. The tool is based on Fair data principles, making the information within it easier to search and use in secondary analysis.
‘Despite the fact that we’ve made advancements in a lot of areas of data-driven science, there still isn’t a central repository to record all of the chemistry that’s come from the microbial environment, in a public and open format,’ says the project’s leader Roger Linington, a natural products chemist based at Simon Fraser University, Canada.
The NPAtlas provides referenced information on source organisms, as well as compound names, isolations and total syntheses. Linington explains that, unlike many commercial databases, all of the data is downloadable and released under a Creative Commons licence that allows it to be used without restriction. ‘We wanted to make sure that this platform was genuinely open access and that anyone in the world who wanted to use it for any downstream applications was free to do so,’ he says.
To compile the data, dozens of researchers from around the world have painstakingly sifted through decades’ worth of literature. The compounds currently included in the database have been extracted from 10,481 articles from more than 300 journals. One major challenge facing Linington’s team is that chemical structures are rarely published in machine-readable forms. This meant that they had to manually process thousands of articles before they could train machine learning tools to analyse the text in paper titles and abstracts and so accelerate the search.
Marcel Jaspars, a natural products chemist based at the University of Aberdeen, says Linington’s team has done a ‘tremendous job’ in setting up the database. ‘It is a community effort, well thought out, curated to a very high standard and will allow other scientists to contribute and benefit,’ says Jaspars. ‘The future of natural product chemistry lies in this type of open access database that adhere to the Fair principles.’ These principles specify that data should be findable, accessible, interoperable and reusable.
The NPAtlas also provides tools that allow researchers to visualise how natural product compounds relate to one another in chemical space. This allows users to study links between molecules with shared substructures and functional groups.
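As a rough illustration of how links based on shared substructures can be computed, the sketch below scores pairs of compounds by Tanimoto (Jaccard) similarity and keeps the pairs above a cut-off as edges in a network. It is not the NPAtlas's own method, which is not described here: the compound names, the hand-written substructure labels, the `similarity_edges` helper and the 0.4 threshold are all assumptions made for the example.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity: shared features / all features."""
    if not a and not b:
        return 0.0  # two empty feature sets are treated as unrelated
    return len(a & b) / len(a | b)


# Hypothetical compounds with hand-written substructure labels.
# These are illustrative placeholders, not records from the NPAtlas.
compounds = {
    "compound_A": frozenset({"macrolactone", "hydroxyl", "methyl_ether"}),
    "compound_B": frozenset({"macrolactone", "hydroxyl", "amine"}),
    "compound_C": frozenset({"indole", "amide"}),
}


def similarity_edges(features, threshold=0.4):
    """Return (name1, name2, score) for every pair scoring >= threshold."""
    edges = []
    for (n1, f1), (n2, f2) in combinations(features.items(), 2):
        score = tanimoto(f1, f2)
        if score >= threshold:
            edges.append((n1, n2, score))
    return edges


edges = similarity_edges(compounds)
for n1, n2, score in edges:
    print(f"{n1} -- {n2}: {score:.2f}")
```

In practice the feature sets would more likely be molecular fingerprints generated by a cheminformatics toolkit than hand-written labels, but the pairwise comparison and thresholding follow the same pattern.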
‘One of the great things about natural products chemistry is the wealth of available data, and the community’s work to build datasets covering specific environments or compound classes,’ says Guy Jones, the executive editor of the Royal Society of Chemistry’s chemistry databases. ‘Roger and the team’s work to both collect these together and elucidate the links between them are really interesting, especially the tools to model and visualise connections of NPAtlas.’
Jones says that his team is already looking to make use of the Fair data contained in the NPAtlas. ‘We’re planning to add the data from the service to ChemSpider, our public chemistry resource, to further assist with linking natural product resources together,’ he says.
Linington is keen to expand the database further. He explains that unearthing compounds reported in patents, older papers and articles written in languages other than English is a challenge that his team is looking at. The team is also actively trying to add more taxonomic information, to allow greater understanding of the biological relationships between organisms and the compounds they produce.
Linington and his team are also planning to further develop the database’s search function. ‘At the moment, there’s no way for computers to directly query the web interface – human users can use the web interface to do whatever queries they want. But there’s no mechanism to allow other systems to do the same kind of thing,’ says Linington. ‘So that’s a structural change, which will greatly improve those Fair principles, in particular the interoperability question.’
Reference
J A van Santen et al, ACS Cent. Sci., 2019, DOI: 10.1021/acscentsci.9b00806