Curating data into pictures has nuances that are easy to overlook

There are ways to balance the benefits of holistic data without causing information overload

Modern chemistry increasingly involves exploring large sets of results – which brings the problem of how to display them. I came to know data visualisation through working as a high-throughput experimentation chemist, where I designed experiments to evaluate parameter space (and had to contend with numerous axes showing numerous variables), usually to cajole stubborn reactions into working well. Many other areas of chemistry make great use of visualisation, including complex assays run by discovery chemists and biologists, chemical space portrayals for ligand or drug discovery, and maps of water contaminants across the country. For larger teams, dashboards showing real-time metrics are gaining in popularity as a source of automated, at-a-glance information on project completion or instrument maintenance.

Image: Thoughtfully gathered data deserves careful visualisation (Source: © Getty Images)

I didn’t initially realise how much the way I depict data matters, relative to other aspects of the report. For some time, my biggest mistake was to present the data without any particular emphasis, in a well-intentioned effort to allow readers to form their own conclusions. It was a mistake because I had carefully designed the parameters and laboured over my experiment and dataset for some hours, yet I was expecting readers to spot the same conclusions in a few minutes.

Visualisation is one of those strange areas where chemistry meets psychology, and getting the balance right feels like both an art and a science. How easily a reader understands a chart depends not just on the number of elements within it, but on how they are shown: how bright and bold (or otherwise) they are; the effects of colour, shape and relative sizing; and more. Although all data points may be of equal importance, in the report some values or trends will be more interesting than others, and we can help the reader see those quickly, without imparting so much bias as to obscure any conclusions. Pulling trends out of the data without committing the sin of obscuring other possibilities, or leading readers down a false path, has been harder than I expected.

The human mind is excellent at pattern-spotting, but its abilities diminish when there is too much going on. A typical person can generally hold about seven conceptual items in mind at once, a limit tied to the psychological concept of cognitive load. Of course, the actual number varies depending on all manner of things, and probably in most meetings at least one of those mental items is reserved for paying attention to everybody else’s words and body language. For data visualisation, we just need to remember that seeing too many things at once is confusing.

Despite cognitive load, I’ve always found benefit in showing the entire dataset. Although journals by and large only seem to want to see yields, lab chemists of any subdiscipline prefer more holistic data when optimising a reaction. While the experimental chemist certainly prospers from knowing the side-product profile where available, readers may also benefit from this knowledge. I recall working on a bromination with some discovery chemists – we did not know in advance, but they were very interested to see a bromodecarboxylated side-product that one of the alternative brominating agents excelled at producing. Going down the publication-style route of showing only yields and maybe regioselectivity in a table would have meant they were unable to test this material in their assay. I would argue that we would all be better served by seeing more data about what actually happened in literature reactions, especially any readers trying to reuse those conditions for themselves.

Luckily, there are many ways to work with the tension between cognitive load and the benefit of showing complex information. Unlike the curator, readers cannot spend hours on a dataset. But when the curator tells stories with visualisation, readers can build understanding – without even knowing they’re doing it. In this way, data points collect and crescendo into concepts, and once a reader has a mental model of a concept, it collapses to be just one of the seven items they can hold in mind simultaneously. Another cognitive load-reducing trick is to give charts titles that reflect a main conclusion. Instead of ‘yield versus base and temperature’, a more helpful title could be ‘hydrolysis is diminished for electron-rich pyridines’. When many different details matter, I like to show multiple charts for a single dataset.

I was surprised at first to see how much more insightful my reports were when I thought hard about data visualisation. My co-workers could do more with the results, faster, and reuse the information more effectively. It shows the importance of detail in what is sometimes a forgotten part of experimental reports. Even for small substrate tables or assay results, I’d now really encourage anyone to think about how their information is displayed, and whether there’s a way to make it easier for the reader to discern trends. The standard ‘way we’ve always done things’ is sometimes not the best.