Navigating the literature torrent

It’s humanly impossible to filter and read everything worthwhile – let’s embrace assistance

I don’t know about you, but I look at too many scientific papers every month. I have myself partly to blame for this, since I’m the person who set up all those RSS feeds to scroll through new abstracts. Now, it’s also true that I write a blog that discusses new developments in chemistry and drug discovery, so I always have my eyes open for that. But even without these tools and these motivations, I still have research projects and research areas that I need to keep current in. And these alone would still put too many papers in front of me.

[Image: A kayaker in white water going over a waterfall. Source: © Pedro Castellano Photography/Getty Images]

It would be easy to blame all this on a proliferation of low-quality papers (and/or low-quality journals). Very easy, because there really is a proliferation of those, and it’s getting worse every year. And as long as there is continued demand from researchers, and publishers can profit from the proceeds, there’s no reason for it to stop. Whatsit Letters, Whatsit Communications, Journal of Whatsit, Whatsit Reviews, along with regional versions and Nano-, Green- and other fashionable derivatives. They can’t all be good!

But the problem is not as simple as that. I don’t have any lousy journals in my RSS feeds, and I still can’t keep up. There are too many papers that are still worth reading. There are individual proteins, cellular processes, and chemical transformations for which it is already beyond human ability to gain command of the existing literature, and ‘every day the paper boy brings more’, as Pink Floyd warned us so many years ago. No one can read all this stuff, and if anyone is so boastful as to claim otherwise, you should smile at them and put a little red mark by them in your memory so you can discount the other outlandish things they might try to tell you.

Just scrolling past the abstracts with any degree of attention is more than enough work. Every so often you’ll have a look at some of the full texts, of course, and even with a practiced eye that’s going to take some time and effort. I will confess to throwing many of these into my (overloaded) literature management software as indexed PDFs, continuing a tradition that I remember from the 1980s: that is, feeling as if you’d somehow absorbed something of a paper’s content just by photocopying it and stapling the pages together.

What to do? As with so many other areas, we are already leaning on the help of software and will have to do so even more. I don’t know how I ever worked without being able to do full-text searches and to sort papers into areas of interest (I well remember professors of mine 40 years ago attempting this with index cards, and I’m glad I missed out on those joys myself). But we’re also turning software agents loose to more or less read papers for us and fetch the most interesting parts to drop into our laps, and these will (need to) continue to become more capable and accurate. Automatically rating the results by novelty and overall quality would be a welcome next step, but one that I assume will have to be trained at least partially according to the preferences of each user. Novelty is in the eye of the beholder, as a look at your submitted manuscripts’ referee reports will illustrate.

It would seem not such a great leap past that for the software to somehow understand the papers and to start drawing conclusions from them. There are people who will tell you that’s already happening, or is just about to, but I’m not ready to go that far. Not yet. I think that large language models can do an excellent job of faking it, in some areas even a useful job of it. But to steal a line from Mark Twain, the difference between real comprehension and current artificial intelligence capabilities is like the difference between lightning and a lightning-bug. No, that part is still up to us, but that doesn’t mean that we can’t already get more help in staying oriented in the whitewater flow of new publications. We need whatever help we can get!

Derek Lowe is an organic chemist working in drug discovery. His perennially popular blog 'In the pipeline' offers an insider's perspective on the pharma industry.