Ninety-four researchers have undertaken a colossal challenge: manually reviewing over 16,000 papers to extract information on perovskite solar cells. Now, they have launched an open-access database with all this information. This massive collection of data could accelerate the discovery of photoactive materials and contribute to a better understanding of how these solar cells work.

Perovskite-based solar cells have become a hot research area. In just a decade, their efficiency at transforming sunlight into electricity has grown from a modest 5% to more than 25%. The photoactive ingredient in these devices – the perovskite – is often sandwiched between other layers to collect and transport electric charge, which influences performance, as do the different fabrication methods.

‘When I first encountered perovskites in 2015, you could follow everything that happened,’ explains lead author Jesper Jacobsson, from the Helmholtz Centre for Materials and Energy in Berlin, Germany. ‘However, now this field has skyrocketed, and it’s hard to keep up with thousands of papers published each year.’ Jacobsson dreamed of an efficient way to gather data following the FAIR principles (findable, accessible, interoperable, reusable) that guide scientific data management, because being able to filter and find the right information makes comparisons easier. ‘Organisation gets you behind the noise, and this sparks new discoveries,’ he says.

Soon, Jacobsson set about making his dream come true. ‘It wasn’t easy: a simple search for “perovskite solar” on the Web of Science captures thousands of papers,’ he says. So, he assembled a team and divided the workload. ‘Each author got around 200 references, as well as instructions and templates to properly collect the relevant data,’ he says. Given how information is described in academic papers and their supplementary files, all this work had to be carried out manually, which took over two years.

The first protocol included 100 fields for each perovskite-based device, although an improved version includes 400. These fields cover the number of perovskite layers, deposition method, material composition and, of course, performance metrics. ‘We gathered data for over 42,000 devices,’ says Jacobsson. ‘Sadly, information is often underreported; researchers only describe the best perovskite devices.’
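To give a sense of what one entry per device makes possible, here is a minimal sketch in Python. It is not the project's actual schema or code: the field names, the helper function and the sample values are all hypothetical placeholders chosen for illustration, and only a handful of the hundreds of fields are shown.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass
class DeviceRecord:
    """One device entry. Field names are hypothetical, not the real protocol."""
    reference_doi: str
    perovskite_composition: str
    n_perovskite_layers: int
    deposition_method: str
    efficiency_percent: Optional[float]  # None when the paper did not report it


def median_efficiency_by_method(records):
    """Group records by deposition method and return the median efficiency of each."""
    grouped = {}
    for record in records:
        if record.efficiency_percent is None:
            continue  # skip underreported devices instead of guessing a value
        grouped.setdefault(record.deposition_method, []).append(record.efficiency_percent)
    return {method: median(values) for method, values in grouped.items()}


# Placeholder entries with made-up values; none of this comes from the database.
records = [
    DeviceRecord("10.0000/example-1", "MAPbI3", 1, "spin-coating", 15.0),
    DeviceRecord("10.0000/example-2", "FAPbI3", 1, "spin-coating", 17.0),
    DeviceRecord("10.0000/example-3", "MAPbI3", 1, "evaporation", None),
    DeviceRecord("10.0000/example-4", "MAPbI3", 2, "evaporation", 12.0),
]

summary = median_efficiency_by_method(records)
print(summary)
```

Keeping unreported values as explicit blanks, rather than dropping the whole device, means an entry can still contribute to comparisons on the fields that were reported.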

Nevertheless, other experts believe it’s a tremendous advancement. ‘This database [is] an important step to advance research in perovskites,’ says Jovana Milic from the University of Fribourg, Switzerland. She believes that facilitating comparisons is ‘important for further advancing these systems and understanding the factors that determine … performance.’

Derya Baran from King Abdullah University of Science and Technology, Saudi Arabia, says the database is ‘impressive’. She was given exclusive access to it, to study its potential. ‘It allows you to access each reference when you click on the data [points] and shows you the progress of [each] technology.’ Baran expects the collaboration between researchers and end-users to enhance the platform. Milic agrees: ‘The [perovskite] research community will greatly benefit from this effort and … will contribute to the initiative [to] further develop it.’

These advances should also encourage transparency and reproducibility, according to Baran. Moreover, ‘to create a larger library, researchers should consider other important aspects … like stability, cost and even life-cycle analysis,’ says Baran. Reporting only one-off record-breaking devices would defeat the purpose, so the database may need systems to assess the quality of submissions. Although users can already report ‘fishy’ data points, Jacobsson agrees that additional efforts in moderation and peer review could enhance the overall quality of the database.

Both Milic and Baran recognise the importance of meeting the FAIR principles for data collection, as well as creating standardised procedures for analysing devices and reporting results. ‘It would be cool if journals created FAIR systems from the beginning,’ says Baran. The last parts of FAIR – interoperability and reuse of digital assets – could trigger applications for this database in machine learning. ‘Our data could feed algorithms to extract conclusions faster,’ explains Jacobsson.

Jacobsson adds that supporting open-access projects is important for the whole community. ‘We were lucky to count on Horizon 2020 funding,’ he notes. Furthermore, the newly launched database is just the beginning. ‘Open access also means our source code is free, and this same model could find uses in different fields like batteries, light-emitting devices and more,’ he points out.