A new, publicly available database catalogues the thousands of proteins encoded by genes in the human genome whose function remains a mystery. Dubbed the ‘Unknome’ database, the repository aims to promote more rapid exploration of understudied proteins. It assigns a ‘knownness’ score to each of these proteins and ranks them based on factors such as function, conservation across species and subcellular compartmentalisation.

Developed by Matthew Freeman at the University of Oxford, Sean Munro at the MRC Laboratory of Molecular Biology in Cambridge, and colleagues, Unknome includes every protein cluster that has at least one protein from humans or any of 11 model organisms.

The research team investigated a subset of proteins in the database, focusing on 260 human genes about which almost nothing is known and which have comparable genes in flies. They found that a majority contribute to essential functions influencing fertility, development, tissue growth, protein quality control and resilience to stress. ‘The results suggest that, despite decades of detailed study, there are thousands of fly genes that remain to be understood at even the most basic level, and the same is clearly true for the human genome,’ the authors concluded.

Munro said that the neglect of these proteins is unwarranted. ‘Our database provides a powerful, versatile and efficient platform to identify and select important genes of unknown function for analysis, thereby accelerating the closure of the gap in biological knowledge that the unknome represents,’ he said. The Unknome team expressed concern that the roles of thousands of human proteins remain unclear while research continues to focus on proteins that are already well understood.

The creators of the Unknome database hope that, unlike other databases, it will shrink rather than grow over time.