Using x-rays to probe biological molecules has revolutionised science. Clare Sansom looks at a century of progress

The International Year of Crystallography, celebrated throughout 2014, stems from the discovery almost exactly a hundred years ago that the diffraction of x-rays by crystals could yield information about the structures of the atoms and molecules within (see Crystal clear). Once this principle had been demonstrated, scientists began to use x-rays to study the structure of organic and biological as well as inorganic materials. The first bio-organic substances to yield diffraction patterns were fibrous ones such as wool and silk, and, a little later, fibres of DNA. William Astbury, who worked on fibre diffraction at the University of Leeds, UK, obtained ‘beautiful’ DNA diffraction patterns decades before Rosalind Franklin, but found them uninteresting; in any case, he could not have solved the structure, as no theory of helical diffraction had yet been published.

It is, however, only a tiny fraction of biological macromolecules that are regular enough in structure to be tackled by fibre diffraction, and early investigations of biological structures were also hindered by nearly complete ignorance of their chemical make-up. The first protein crystals had been reported in 1840, when the nature of so-called ‘blood crystals’ – of earthworm haemoglobin – was still completely unknown. Although the name ‘protein’ was coined in the 19th century, it was not until 1926 that US chemist James Sumner was able to prove that enzymes were proteins. He achieved this through purifying and crystallising urease, and thus, simultaneously, obtained the first crystals of an enzyme.

In the decades before modern techniques of molecular biology it was almost impossible to obtain sufficient quantities of pure protein for crystallisation experiments. The only proteins that pioneering structural biologists could work with were those that were available in relatively large quantities from, for instance, blood and egg white, and the digestive enzymes that could be obtained from slaughterhouses. Neil Isaacs, emeritus professor of chemistry at the University of Glasgow in the UK, remembers using ‘what we would now call bucket chemistry’ to produce large enough protein samples for crystallisation experiments as late as the 1970s.

Crystallising the problem

Nasa / Science Photo Library

The most important step towards protein crystallography occurred almost by accident. In 1934, John Philpot, an English biochemist working in Uppsala, Sweden, left a sample of the digestive enzyme pepsin in a fridge while he went skiing. On his return, he found that the protein had formed ‘unusually large’ crystals. He sent these to the crystallographer J D Bernal (known as ‘Sage’), then working in Cambridge, UK, who described them as ‘the best protein crystals [he had] ever seen’. Another crucial breakthrough came when Bernal realised that the delicate crystals became disordered as they dried out. He mounted a single crystal in a capillary tube with some of its mother liquor on his x-ray camera and obtained a definite pattern of spots. ‘Sage was ecstatic … imagining how much information … would be unlocked if only those photographs could be interpreted in detail,’ writes Bernal’s biographer, Andrew Brown. Bernal and his PhD student, Dorothy Crowfoot, published this work in a 1936 Nature paper titled just ‘X-ray photographs of crystalline pepsin’. Crowfoot went on to obtain x-ray diffraction photographs of another small protein, insulin, and to achieve fame and a Nobel prize as Dorothy Hodgkin.

An x-ray diffraction pattern from a protein crystal, or, indeed, from any other crystal, consists of a regularly spaced pattern of x-ray spots. Transforming these into a three-dimensional structure of the molecule requires information about the amplitude and phase of each spot. Measuring intensities yields the amplitudes, but phases cannot be measured directly. A solution to this ‘phase problem’ is necessary before a structure can be obtained from crystals: at first this was isomorphous replacement, which involves comparing a diffraction pattern from a native crystal with one from the protein bound to so-called ‘heavy atoms’, generally electron-rich metal ions. This technique is still in use in protein crystallography today.
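The phase problem can be illustrated with a toy calculation. The sketch below is not a crystallographic method in its own right: it assumes a made-up one-dimensional 'unit cell' containing two atoms, and the function names `structure_factors` and `density_from` are hypothetical, chosen only for this illustration. It shows that the density is recovered when both amplitudes and phases are known, but not when only the amplitudes survive the measurement.

```python
import cmath
import math


def structure_factors(density):
    """Discrete Fourier transform of a 1D density: one complex value per 'spot'."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * math.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]


def density_from(factors):
    """Inverse transform: rebuild the density from complex structure factors."""
    n = len(factors)
    return [
        (sum(f * cmath.exp(-2j * math.pi * h * x / n) for h, f in enumerate(factors)) / n).real
        for x in range(n)
    ]


# Toy one-dimensional "unit cell" with two atoms of different weight (made-up values).
true_density = [0.0, 5.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0]

factors = structure_factors(true_density)
amplitudes = [abs(f) for f in factors]      # measurable: square root of each spot's intensity
phases = [cmath.phase(f) for f in factors]  # lost in a real diffraction experiment

# Amplitudes plus phases give back the original density.
with_phases = density_from([cmath.rect(a, p) for a, p in zip(amplitudes, phases)])

# Amplitudes alone (every phase set to zero) give a map with the atoms misplaced.
without_phases = density_from([complex(a, 0.0) for a in amplitudes])

print([round(v, 3) for v in with_phases])
print([round(v, 3) for v in without_phases])
```

In this toy case the zero-phase map places its strongest peak at the origin rather than at either atom, which is why a method such as isomorphous replacement is needed to estimate the missing phases.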

Austrian Max Perutz had begun his studies of the blood protein haemoglobin in the 1930s, as a PhD student under Lawrence Bragg in Cambridge; John Kendrew, one of Perutz’s first doctoral students, chose the related protein myoglobin for his own structural work. It took over twenty years for isomorphous replacement to yield low resolution structures of first myoglobin and then haemoglobin. And Perutz and Kendrew’s elation when the structures were finally solved turned to disappointment as they showed none of the regularity of the recently-solved double helix of DNA, and provided no immediate insights into the mechanism of action of the molecules. One important point was proved, however: both structures included eight obvious ‘rods’ with the proportions of the alpha helix that Linus Pauling had predicted some years earlier.

Guy Selby-Lowndes / Science Photo Library

John Kendrew worked on the structures of myoglobin (pictured) and haemoglobin

In 1963, the International Union of Crystallography commissioned Paul Ewald, one of the pioneers of crystallographic methods, to edit a collection of papers celebrating the 50th anniversary of the discipline. One paper was devoted to the structures of biologically important macromolecules, but it could still feature only two: haemoglobin and myoglobin. The paper’s author, US crystallographer Ralph Wyckoff, merely commented optimistically that ‘Probably before long we will have a correspondingly detailed knowledge of ribonuclease and perhaps of other globular protein crystals’.

Throughout the later 1960s and the 1970s structures began to trickle out, in parallel with improvements in crystallographic techniques. Hodgkin brought her ‘insulin project’ and crystals to Oxford when she moved there from Cambridge in 1936; it took her over 30 years to solve the structure. A few years after that achievement, she wrote ‘I used to say that the evening I developed the first x-ray photograph I took of insulin in 1935 was the most exciting moment of my life. But the Saturday afternoon in late July 1969, when we realised that the insulin electron density map was interpretable, runs that moment very close.’ Many of the young scientists in her group had not been born when she grew the first insulin crystals.

Obtaining enough of a given protein to purify and crystallise was only one of many hurdles faced by the pioneers of protein crystallography. At least Hodgkin and her co-workers knew the amino acid sequence of insulin, as it was the first protein sequence to be determined, by Fred Sanger in 1951; Perutz and Kendrew had had no sequence for myoglobin or haemoglobin. Additionally, little was known about the amino acids themselves; a list of almost 800 small-molecule structures published by the Cambridge Crystallographic Data Centre (CCDC) in 1965 included only 10 of the 20 amino acids found in proteins.

Digital advances

John Helliwell, emeritus professor of chemistry at the University of Manchester, UK, who began his career in structural biology in the 1970s, remembers the methods used then as ‘very haphazard and primitive’. The structural biologists’ equivalent of the CCDC, the Protein Data Bank (PDB), was created in 1971; the first operational version of the data bank, published two years later, included nine sets of atomic coordinates for proteins. Everyone working in the field knew this list off by heart, and probably knew the principal scientists personally.

Protein crystallography developed alongside, and benefited greatly from, two revolutions that transformed science in the second half of the 20th century. The first, molecular biology, has already been touched on; the second, no less important, was in computing. Until the development of Fortran in the 1950s, scientific programs had had to be written in inaccessible machine code or assembly language. The first suites of programs for crystal structure determination were written in Fortran in the 1960s and 1970s. George Sheldrick’s Shelx package, which was first published in 1976, is still widely used for both small molecules and proteins.

Then, too, computer graphics – if they existed at all – could generate only primitive, stick-like models of molecules. Perutz, Kendrew, Hodgkin and other early protein crystallographers employed artists to illustrate their papers. As late as the 1980s, complex molecular graphics needed powerful and therefore expensive machines. ‘At one time, the universities of York, Sheffield and Leeds in England shared an Evans and Sutherland computer that could do high resolution molecular graphics, and it was trundled across the north of England every few weeks,’ remembers Isaacs. Rather cheaper Silicon Graphics machines and Alwyn Jones’ program Frodo brought graphics into protein crystallography labs, and then, in the mid-1990s, Rasmol was born. This user-friendly little program put interactive images of complex molecular structures onto the desktop of every bench biochemist and every student. It has been superseded many times, but biochemists and programmers still acknowledge a debt to its UK computer scientist author, Roger Sayle.

More power

RGB Ventures / SuperStock / Alamy

The European Synchrotron Radiation Facility in Grenoble produces over 20% of the protein structures solved across the world

The introduction of powerful synchrotron x-ray sources did not impinge on protein crystallographers at first, as Helliwell remembers. ‘I first heard about the idea of using synchrotron radiation for protein crystallography from Dorothy Hodgkin when I was a student, but most researchers thought that fragile protein crystals would not stand the intensity of this radiation,’ he recalls. ‘It took about five years even to agree to try protein crystals in synchrotron beams, but the technique rapidly became established and most high-impact protein crystallography is now performed at synchrotrons.’ Helliwell later pioneered instruments and methods for macromolecular crystallography using synchrotron radiation at Daresbury in the UK, and transferred these developments to the European Synchrotron Radiation Facility in Grenoble, France. The UK’s current synchrotron, Diamond at Harwell in Oxfordshire, now has 29 active beamlines, each providing an intense beam of radiation with slightly different characteristics of intensity and wavelength; five have been optimised for macromolecular crystallography.

Even before the first structures were solved, visionary scientists looked forward to a time when structures of biomolecules might have medical relevance, particularly in drug discovery. Perutz’s hope that the haemoglobin structure might immediately lead to treatments for anaemia came to nothing, but some other early structures were more helpful. In particular, the structure of the enzyme dihydrofolate reductase, first solved in 1982, has proved useful in the development of drugs for cancer and for some infectious diseases. This enzyme makes a co-factor in nucleic acid metabolism, so inhibiting its action puts a brake on cell proliferation.

Structure-based drug discovery has evolved alongside protein crystallography since the 1980s. This is now considered an essential tool, but its most stunning success to date may have occurred fairly early on. The human immunodeficiency virus (HIV) was discovered to be the cause of Aids in 1985, and its tiny genome sequenced a few years afterwards. This was found to contain genes for just three enzymes, reverse transcriptase, protease and integrase, each of which was a target for known or potential antiviral drugs. The protease attracted the attention of crystallographers because of its small size and simplicity, and because of similarities with a well-understood protease: pepsin. ‘Determining the structure of this enzyme was a huge success because it revealed it to be a dimer of two identical subunits … which paved the way for drug design,’ says Polish crystallographer Mariusz Jaskolski, a co-author of several papers from one of the three groups that published the structure almost simultaneously.


Proteins were sent on Nasa missions to space in an attempt to grow crystals

The first anti-Aids drugs in the clinic, however, were targeted not against the protease but against reverse transcriptase, a larger enzyme that was very difficult to crystallise. Scientists were keen to solve its structure in order to develop more selective and therefore less toxic inhibitors. At that time – the 1990s – researchers with particularly intractable proteins sometimes sent samples up in the Space Shuttle to see whether crystals would grow better in zero-gravity conditions. David Stuart’s group at the University of Oxford tried this with HIV reverse transcriptase, which prompted a perhaps predictable tabloid headline: ‘Madness: Boffins want to put Aids in space’. They obtained no crystals, but crystallographers learned a great deal about crystal growth methods from these and similar experiments.

When HIV protease entered the PDB in 1989, the database held about 350 structures and the development of a rigorous classification system had begun to seem essential. Two hierarchical databases of protein structure were developed almost simultaneously: Cath at University College London and Scop at the University of Cambridge. Cath was the brainchild of Janet Thornton, now director of the European Bioinformatics Institute at Hinxton near Cambridge. ‘Cath’s relatively simple classification scheme allows crystallographers to find similar structures in the PDB even without an obvious evolutionary relationship between the proteins, and it has stood the test of time,’ says Thornton.

The Nobel prize factory

In total, 24 of the 42 scientists who have been awarded Nobel prizes for discoveries related to crystallography have achieved this recognition for work related to the structures of biological macromolecules. And although the US is the country with the highest number of Nobel-winning crystallographers, one institution – the UK Medical Research Council’s Laboratory of Molecular Biology (LMB) in Cambridge, established in 1962 with Perutz as director – stands head and shoulders above all others: seven of the 24 structural biology laureates were based there. Kiyoshi Nagai, who joined Perutz at the LMB from Japan as a post-doc in 1981 and has stayed there ever since, stresses the value of the laboratory’s collegiate atmosphere. ‘Max took an immense interest in his students’ and young scientists’ work, and took care to show how much he valued their opinions,’ he remembers.

The most recent LMB scientist to be awarded the Nobel prize was Venki Ramakrishnan, who was recognised in 2009 with Thomas Steitz from the US and Ada Yonath from Israel for elucidating the complex structure of the ribosome, the ‘molecular machine’ that catalyses protein synthesis (see Biology’s Nobel molecule factory). ‘It took many people – not just three – at least 10 years to solve the first ribosome structure, from a thermophilic bacterium,’ says Ramakrishnan. ‘We can now obtain structures at different points throughout the protein synthesis cycle, and each “snapshot” tells us more about how the machine works. But we have at least 15 years’ more work ahead of us.’

The PDB, founded from a list of just seven structures, today holds over 100,000: it is fitting that this landmark was reached during May 2014, nearly halfway through the International Year of Crystallography. Progress is continuing to accelerate, not least in solving structures of previously intractable membrane proteins. These include the G-protein coupled receptors that are the targets of perhaps half of all prescription drugs on the market today, and for which Robert Lefkowitz and Brian Kobilka were awarded the 2012 Nobel prize (see A signal honour). And, as the second century of crystallography dawns, the discipline may even be beginning to leave crystals behind.

The British Crystallographic Association’s conference in April 2014 included a structural biology programme with the strapline ‘Pushing the limits: Faster, smaller, slower, larger’. Solving the structures of ribosomes and other ‘large’, complex structures often involves techniques such as electron microscopy that are complementary to x-ray crystallography. Typically, atomic resolution x-ray structures of subunits are ‘docked’ into lower resolution electron micrographs of the entire complex. Bacterial respiratory complexes and secretion systems are among the structures that have yielded to this approach, but it may soon become unnecessary as electron microscopy – which does not need crystals – approaches atomic resolution.

At the other end of the scale, free-electron lasers that produce pulses of x-rays that are powerful enough to cut through steel can now reveal structures from crystals too small to be seen in a light microscope. A typical x-ray pulse from a free-electron laser is more than a billion times as powerful as that from a synchrotron source, but only lasts for a few femtoseconds. ‘The pulses are so short that the crystals diffract before they are “cooked” by radiation: we call it “diffraction before destruction”,’ says Henry Chapman, director of the Center for Free Electron Science in Hamburg, Germany. ‘We will use this technique to track biomolecular reactions with “snapshots” at different time points, and may even obtain structures from single molecules: the smallest “crystals” of all.’

During the 80 years since Bernal and Crowfoot’s first protein diffraction patterns, structural biology has matured beyond anything they could have anticipated. Crystallography’s second century will doubtless yield even more insights into the structures that define and maintain life. The International Year of Crystallography is giving the giants of the field the recognition they deserve, but, as Isaacs says, protein crystallographers ‘owe an enormous amount to an enormous number of people from many different disciplines’, and most of these are bound to remain unknown.

 Clare Sansom is a science writer based in London, UK