Sunday 27 November 2016

A molecule for Christmas. How cells wrap their genes.

Thanks to the pioneering work of scientists from Gregor Mendel and Friedrich Miescher to Watson and Crick, together with the many scientists making up the Human Genome Organization, we all now know that there are around 20,000 genes that make us what we are, and that they are wrapped up inside the nucleus of every cell [there are some cell types that have no nuclei, but they won't concern us here]. In the spirit of Christmas, when for some of us wrapping gifts is a bit of a nightmare, while for others it is something they love to do, I am going to look at one of Nature's most impressive examples of wrapping. I am referring to the way in which 6 billion base pairs of DNA (two copies of our 3 billion base pair genome), measuring about 2 metres in length, are packaged inside a nucleus whose diameter is around 6 millionths of a metre! Just stop and think about the challenge for a minute! I think this is one of Nature's most incredible "achievements". The image (top left) shows a classic light microscopy slide of onion root tip cells, showing the familiar organisation of the chromosomes during cell division. Before we see how the work of biochemists and structural biologists has shed light on this core feature of genetics, let's get a feel for the "scale" of the problem.

The metrics. Using Watson and Crick's model for the "average" dimensions of B-form DNA, the length of one copy of the genome is the length of a base pair multiplied by the number of base pairs: (0.34 × 10⁻⁹ m) × (3 × 10⁹) ≈ 1 m. A diploid nucleus holds two copies, so the total length of DNA in one nucleus is about 2 m. The diameter of a DNA double helix, by comparison, is about 2.1 × 10⁻⁹ m. Again, we have another challenging set of measurements: one copy of the genome is approximately half a billion times longer than it is wide! [For comparison, a human hair of length 5 cm is only 500 times longer than it is wide.] If we calculate the volume of the genome (assuming, for simplicity, that it is a continuous double helix), using these ancient formulae for all calculations:

$$V_{\text{cylinder}} = \pi r^{2} h \qquad\qquad V_{\text{sphere}} = \frac{4}{3}\pi r^{3}$$

we obtain a volume of approximately 7 × 10⁻¹⁸ m³ for the two copies in a diploid nucleus (a cylinder of radius 1.05 × 10⁻⁹ m and length 2 m). [The accepted (mean) dimensions of cells, nuclei and nucleic acids are given in the table below.] Treating the cell nucleus as a sphere of radius 3 × 10⁻⁶ m gives a volume of around 0.11 × 10⁻¹⁵ m³, close to the tabulated mean of 0.12 × 10⁻¹⁵ m³. I realise this is an over-simplification, but it does show that there is enough space to accommodate the genome as a simple Watson and Crick duplex (it would fill only about 6% of the nucleus). However, this isn't how it is done, since a continuous, long cylinder needs, at the very least, to be bent to pack it into a spherical nucleus. Just think of the Ikea flat pack concept: Ikea can break furniture down into parts and reassemble it later, whereas the nucleus has to keep the genome intact (or at least its chromosomes)! The key to genome packaging in nuclei is the nucleosome: the molecule of the month!
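For readers who like to check the arithmetic, here is a minimal Python sketch of the genome metrics. It simply plugs in the values quoted in the post (0.34 × 10⁻⁹ m per base pair, 3 × 10⁹ base pairs per genome copy, a 2.1 × 10⁻⁹ m helix diameter and the tabulated nuclear volume of 0.12 × 10⁻¹⁵ m³); the variable and function names are my own, and treating DNA as a smooth cylinder and the nucleus as a sphere is the same over-simplification as in the text.

```python
import math

# Values quoted in the post (names are illustrative, not from any library)
RISE_PER_BP_M = 0.34e-9        # length of one base pair along B-DNA (m)
HAPLOID_BP = 3e9               # base pairs in one copy of the genome
DIPLOID_BP = 2 * HAPLOID_BP    # a diploid nucleus holds two copies
HELIX_DIAMETER_M = 2.1e-9      # diameter of the double helix (m)
NUCLEUS_VOLUME_M3 = 0.12e-15   # tabulated mean nuclear volume (m^3)


def cylinder_volume(radius: float, height: float) -> float:
    """V = pi * r^2 * h"""
    return math.pi * radius ** 2 * height


def sphere_volume(radius: float) -> float:
    """V = 4/3 * pi * r^3"""
    return 4.0 / 3.0 * math.pi * radius ** 3


haploid_length = RISE_PER_BP_M * HAPLOID_BP   # about 1 m
diploid_length = RISE_PER_BP_M * DIPLOID_BP   # about 2 m
aspect_ratio = haploid_length / HELIX_DIAMETER_M
dna_volume = cylinder_volume(HELIX_DIAMETER_M / 2, diploid_length)

print(f"one genome copy:   {haploid_length:.2f} m")
print(f"two genome copies: {diploid_length:.2f} m")
print(f"length/width of one copy: {aspect_ratio:.2e}")
print(f"DNA volume (two copies): {dna_volume:.2e} m^3")
# Volume scales with r^3, so doubling the radius gives eight times the volume
for r in (1.5e-6, 3.0e-6):
    print(f"sphere of radius {r:.1e} m: {sphere_volume(r):.2e} m^3")
print(f"DNA as a fraction of the tabulated nucleus: {dna_volume / NUCLEUS_VOLUME_M3:.1%}")
```

Because a sphere's volume scales with r³, a nucleus of radius 1.5 µm has only one eighth of the volume of one with radius 3 µm, which is why it matters whether a quoted nuclear size is a radius or a diameter.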

The nucleosome is the name given to the fundamental packaging unit of eukaryotic chromosomes. It is built around a protein core, the "histone octamer": an eight-subunit cylinder/sphere made up of two copies each of the histone proteins H2A, H2B, H3 and H4, around which 146 base pairs of DNA are wrapped, with histone H1 providing an organising "cap" [see left]. If we now estimate the volume occupied by the genome in nucleosomal units (with the dimensions shown in the figure), we can simply divide the genome size by 150 (the number of base pairs wrapped around each nucleosome, rounded to simplify the calculation). That gives 6,000,000,000 / 150 = 40,000,000 possible nucleosomes. Each nucleosome has a radius of about 5 nm and a height of about 6 nm; treated as a solid cylinder (πr²h), that is a volume of roughly 470 × 10⁻²⁷ m³, so 40,000,000 of them add up to about 1.9 × 10⁻¹⁷ m³, or roughly 16% of the nuclear volume. If instead we use the smaller tabulated nucleosome volume of 94 × 10⁻²⁷ m³ (see the table below), the total is approximately 4 × 10⁻¹⁸ m³, which suggests that chromosomes occupy a little above 3% of the total nuclear volume. If we use the base pair volume for this calculation, 6 × 10⁹ base pairs occupy about 6 × 10⁻¹⁸ m³, or 5% of the nuclear volume (2.5% for a single haploid copy of the genome). The basic published metrics for an average human cell (excluding sperm and oocytes) are taken from the Harvard-hosted BioNumbers web site. In my book, this is pretty impressive packing!
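The nucleosome arithmetic can be sketched the same way. This is a rough estimate under the post's assumptions (6 × 10⁹ base pairs, 150 base pairs per nucleosome, a solid cylinder of radius 5 nm and height 6 nm, and the tabulated volumes for a nucleosome, a base pair and the nucleus); all names are illustrative. It prints the cylinder-based total alongside the totals based on the tabulated volumes, so you can see how sensitive the percentage is to the figure you choose.

```python
import math

GENOME_BP = 6e9                  # base pairs in a diploid human nucleus
BP_PER_NUCLEOSOME = 150          # rounded length of DNA per nucleosome
NUCLEOSOME_RADIUS_M = 5e-9       # approximate nucleosome radius (m)
NUCLEOSOME_HEIGHT_M = 6e-9       # approximate nucleosome height (m)
TABULATED_NUCLEOSOME_M3 = 94e-27 # nucleosome volume from the table (m^3)
BP_VOLUME_M3 = 1e-27             # base pair volume from the table (m^3)
NUCLEUS_VOLUME_M3 = 0.12e-15     # nucleus volume from the table (m^3)

n_nucleosomes = GENOME_BP / BP_PER_NUCLEOSOME

# Treat each nucleosome as a solid cylinder of the stated dimensions
cylinder_each = math.pi * NUCLEOSOME_RADIUS_M ** 2 * NUCLEOSOME_HEIGHT_M

estimates = {
    "nucleosomes as solid cylinders": n_nucleosomes * cylinder_each,
    "nucleosomes, tabulated volume": n_nucleosomes * TABULATED_NUCLEOSOME_M3,
    "naked base pairs": GENOME_BP * BP_VOLUME_M3,
}

print(f"number of nucleosomes: {n_nucleosomes:,.0f}")
for label, volume in estimates.items():
    share = volume / NUCLEUS_VOLUME_M3
    print(f"{label}: {volume:.2e} m^3 = {share:.1%} of the nucleus")
```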


Component      Volume (m³)
Cell           3.00 × 10⁻¹⁵
Nucleus        0.12 × 10⁻¹⁵
Nucleosome     94.00 × 10⁻²⁷
Base pair      1.00 × 10⁻²⁷


If you are not so confident with unit conversions, remember that when a unit is cubed, you must cube the power of ten as well. For example, since 1 mm = 1 × 10⁻³ m:
1 mm³ = (1 × 10⁻³ m)³ = 1 × 10⁻⁹ m³
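The same rule can be written as a tiny Python helper (a hypothetical function of my own, not from any library): convert the length unit to metres, then cube that factor. The 120 µm³ and 94 nm³ examples are simply the table's nucleus and nucleosome volumes rewritten in more convenient units.

```python
# Length of one unit expressed in metres ("um" stands for micrometres)
TO_METRES = {"m": 1.0, "mm": 1e-3, "um": 1e-6, "nm": 1e-9}


def volume_to_cubic_metres(value: float, unit: str) -> float:
    """Convert a volume given in unit^3 to m^3 by cubing the length factor."""
    return value * TO_METRES[unit] ** 3


for value, unit in ((1, "mm"), (120, "um"), (94, "nm")):
    print(f"{value} {unit}^3 = {volume_to_cubic_metres(value, unit):.2e} m^3")
```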


Let's now shift the emphasis from metrics back to molecular cell biology. The acquisition of a nucleus by a prokaryotic cell is considered to be one of the landmarks of evolution. It is generally accepted that cells without (or rather before) nuclei, the prokaryotes, preceded those in which the genome is sequestered in a membrane-bound nucleus (shown left): the eukaryotes. A question that is surely worth a tutorial discussion is whether prokaryotes have histones and, if they do, whether they are anything like eukaryotic histones (see the structural information below). Moreover, was the appearance of histones (or their molecular predecessors) a first step on the way to compartmentalisation of the genome? For molecular biologists, with access to vast amounts of genomic data, the first port of call might well be the NCBI (or similar). Searching for the term "histone" (amongst the protein coding sequences) throws up over three quarters of a million hits! If I now perform a simple BLAST search (listing, say, 1000 sequences) with the terms appropriate for human Histone H2A [Histone H2A Homo sapiens] (you can read about this in an earlier post here), I obtain a set of almost identical sequences in species ranging from human through Drosophila to sea anemones.

Clearly, one immediate conclusion is that once histones (H2A in this case) appeared on earth, they remained essentially unchanged (at the primary structural level) over a very long stretch of evolution. The same results are obtained for the other core histone octamer components (but don't just take my word for it, try it for yourselves, it's free! I am writing a detailed post on how to BLAST, which should follow over the holidays). All of this suggests that the early histones hit upon a solution that was, and remains, fit for purpose, but it also makes me ask: "why 4 different histones?" If I now perform a pairwise BLAST between human Histones H2A and H2B, I get (what to me was at first, and actually still is) an unexpected result. Whilst both proteins have a net positive charge, they are quite different in terms of primary structure. I won't show the alignment: it is too weak (less than 25% similarity), but it is worth looking at these relatively short sequences, in both of which at least one residue in five is a positively charged (Lys or Arg) side chain.

Histone H2B (20 K and 8 R residues)
 
mpepaksapa pkkgskkavt kaqkkdgkkr krsrkesysi yvykvlkqvh pdtgisskam
gimnsfvndi feriageasr lahynkrsti tsreiqtavr lllpgelakh avsegtkavt
kytssk

Histone H2A (14 K and 12 R residues)
 
msgrgkqggk arakaktrss raglqfpvgr vhrllrkgny aervgagapv ylaavleylt
aeilelagna ardnkktrii prhlqlairn deelnkllgk vtiaqggvlp niqavllpkk
teshhkakgk
 
The sequences above are worth considering for a general discussion of protein primary structure and possibly suggest how primary structure might drive tertiary structure (but I will leave this for tutors).
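If you would rather not count residues by hand, a few lines of Python will do it. This sketch assumes the sequences exactly as printed above (in the usual blocks of ten); `clean` and `composition` are hypothetical helpers of my own, and the "net charge" is only a crude count of Lys + Arg minus Asp + Glu, ignoring histidine, the chain termini and pH.

```python
H2B = """mpepaksapa pkkgskkavt kaqkkdgkkr krsrkesysi yvykvlkqvh pdtgisskam
gimnsfvndi feriageasr lahynkrsti tsreiqtavr lllpgelakh avsegtkavt
kytssk"""

H2A = """msgrgkqggk arakaktrss raglqfpvgr vhrllrkgny aervgagapv ylaavleylt
aeilelagna ardnkktrii prhlqlairn deelnkllgk vtiaqggvlp niqavllpkk
teshhkakgk"""


def clean(seq: str) -> str:
    """Drop the spaces and line breaks of the blocks-of-ten layout."""
    return "".join(seq.split()).upper()


def composition(seq: str) -> dict:
    """Count basic (K, R) and acidic (D, E) residues in a protein sequence."""
    s = clean(seq)
    counts = {aa: s.count(aa) for aa in "KRDE"}
    basic = counts["K"] + counts["R"]
    acidic = counts["D"] + counts["E"]
    return {
        "length": len(s),
        **counts,
        "basic_fraction": basic / len(s),
        # crude net charge: ignores His, the termini and pH effects
        "net_charge": basic - acidic,
    }


for name, seq in (("H2B", H2B), ("H2A", H2A)):
    c = composition(seq)
    print(f"{name}: {c['length']} aa, K={c['K']}, R={c['R']}, "
          f"D+E={c['D'] + c['E']}, basic={c['basic_fraction']:.0%}, "
          f"net={c['net_charge']:+d}")
```

On these sequences the count gives 20 Lys and 8 Arg in H2B (126 residues) and 14 Lys and 12 Arg in H2A (130 residues), so roughly a fifth of each protein is basic.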

So we now have to consider the histone octamer as a more complex unit, rather than just a uniform sphere with a net positively charged surface. And we have to consider the co-evolution of the histones H2A, H2B, H3 and H4, since they are similar at one level but quite distinct at another. As the saying (often attributed to Flaubert) goes, the devil is in the detail. From experience, however, the next thing I want to know is whether the three-dimensional structures of these histones are distinct or effectively the same. This is very similar to one of the challenges that faced Watson and Crick. They reasoned that DNA is likely to be a (structurally) homogeneous polymer, yet it is made up of 4 different bases; in fact, two classes of 2 bases: the purines (adenine and guanine) and the pyrimidines (cytosine and thymine). Combining this with Chargaff's rules, they reasoned that purines pair with pyrimidines: specifically, A with T and G with C. Are we observing a similar phenomenon? Whilst the precise chemistries of the nucleotide bases are different, when it comes to packing 3 billion base pairs into a set of chromosomes, the double helical base pairs G:C and A:T can be considered effectively the same in terms of their molecular envelope and surface properties. Maybe the same logic applies to the histones? The image at the top left is of the histone fold (follow the blue helices in H3). This structure is found in all of the core histones, as well as in some other gene regulatory proteins. It is now well known that the core histones possess a conserved 3D structure; in fact, the sequence variations hide almost superimposable atomic configurations. As you can see from the figure, the formation of mixed dimers is facilitated by the complementarity of the sequences of H3 and H4, combined with this structural "identity". The two histones lock together via a so-called molecular handshake (shown top left).
I think this is an excellent example of how structural biology and genetics combine to make biochemical sense. I would encourage you to look for yourself at these structures and sequences in more detail via the various NCBI/PubMed resources. 

To summarise then, the histones that make up the core particle are individually highly conserved at the primary structure level (e.g. human and yeast H2As), but not between each other (e.g. human H3 and H4). However, all four core histones share the same three-dimensional fold, and moreover, they form heterodimers. Furthermore, it has also been shown that HU, the nucleic acid condensing protein from bacteria, shares a similar fold to the histones (see the figure on the RHS).


I now want to look at the work from Cambridge (UK), including contributions from Aaron Klug's group at the Laboratory of Molecular Biology, which included Tim Richmond, who moved his lab to Zurich over 20 years ago. Long before his landmark structure of the yeast RNA Pol II enzyme complex, the Nobel laureate Roger Kornberg, together with Jean Thomas, was also at Cambridge and made key contributions to the nucleosome field. The beautiful image on the left is taken from Tim Richmond's lab web pages. The coloured histones form the "medulla", and the 146 base pair duplex is wrapped around them: the "cortex". In many protein-nucleic acid interactions, it is the sequence specificity that attracts attention: in the case of the nucleosome, it is the relative indifference of the core histones to DNA sequence, combined with their strong affinity for the DNA, that attracts my (and many others') interest.

[A nice topic for tutorials is to discuss the relationship between specificity and affinity in macromolecular interactions. Compare histones with antibodies, for example. I have posted on these concepts earlier, here]

The nucleosome makes over 120 direct protein-DNA interactions and several hundred indirect, water-mediated ones. The direct protein-DNA interactions are topologically "patchy" over the surface of the histone octamer: they arise from both helix-displayed side chains and loop amino acids. The challenge facing the octamer is to curve the DNA as shown left (again from the Richmond lab) in a DNA-sequence-independent way. It seems that central to achieving this is a mixture of electrostatic "salt bridges" and hydrogen bonds between the many basic side chains and the DNA, but also important are the interactions between main-chain amides and the phosphate backbone of DNA. I should emphasise this point: when a polypeptide makes interactions with a ligand (DNA here) via its main-chain atoms (as opposed to its amino acid side-chain atoms), there can be no genetically encoded contribution to these interactions other than the indirect contribution that the primary structure makes to the final protein fold (see the discussion above).

This is a critical observation that helps explain how the many nucleosomes distributed throughout our genome act as largely sequence-independent DNA-binding units. Although nucleosomes do have a level of DNA sequence preference, they are capable of binding to almost any DNA sequence. Another important element of this sequence insensitivity is the use of water-mediated (indirect) interactions (see left).

Finally, we come to Histone H1, which is often called the linker histone. How does this "outsider" histone fit into the scheme of things? Imagine a gold necklace in which every link represents a core particle. Now throw the necklace in the air and let it drop. The chain will adopt many "random" shapes or configurations. If you now take two adjacent links between your thumb and forefinger and move them up and down, there is a clear level of "rigidity": the two links are constrained, partly because they are made of a hard metal, but also because the hole in the middle of each link is mostly filled by parts of the two adjacent links. So there is a level of conformational restriction, but there is also quite a large element of freedom across the whole chain, which makes it impossible to predict how a necklace will land when it is dropped on the floor. The same result would be obtained, using a pair of molecular tweezers, if you repeated the experiment with a chain of nucleosomes in the absence of Histone H1. This histone provides a scaffolding function: it guides the nucleosome particles into a higher-order arrangement that ultimately forms the chromosome. In the interests of brevity, I shall return to the higher-order packing (and unpacking) of chromatin in a future post. For now, we can envisage the 146 base pairs of DNA spooled around the octamer, with the linker histone H1 imposing a constraint that promotes the packaging of the nucleosomes themselves.
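To make the necklace analogy a little more concrete, here is a toy two-dimensional "necklace drop" in Python. It is an illustration under loudly stated assumptions, not a model of chromatin: each link is a unit-length segment, each joint bends by a random angle up to some maximum (an arbitrary 30 degrees for "stiff" joints, any angle at all for a freely jointed chain), and the chain length, number of drops and random seed are my own choices. It says nothing about how H1 actually works; it only shows that restricting each joint changes the average shape of the chain while individual drops remain unpredictable.

```python
import math
import random
import statistics


def drop_chain(n_links: int, max_bend: float, rng: random.Random) -> float:
    """Drop a 2D chain of unit-length links; return its end-to-end distance.

    Each joint bends by a random angle in [-max_bend, +max_bend]. With
    max_bend = pi the chain is freely jointed; a smaller max_bend stands in
    for links that restrict each other's movement (a toy assumption).
    """
    x = y = 0.0
    heading = rng.uniform(-math.pi, math.pi)
    for _ in range(n_links):
        x += math.cos(heading)
        y += math.sin(heading)
        heading += rng.uniform(-max_bend, max_bend)
    return math.hypot(x, y)


rng = random.Random(2016)  # fixed seed so the "drops" are reproducible
results = {}
for label, max_bend in (("freely jointed", math.pi), ("stiff joints", math.pi / 6)):
    drops = [drop_chain(50, max_bend, rng) for _ in range(2000)]
    results[label] = drops
    print(f"{label:15s} mean {statistics.mean(drops):5.1f}  "
          f"min {min(drops):5.1f}  max {max(drops):5.1f}  (in link lengths)")
```

Compare the minimum and maximum for the stiff chain: every joint is constrained, yet each drop still lands differently, which is the combination of local rigidity and global unpredictability described above.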

In conclusion, I hope you will all agree that the nucleosome is an ideal molecular structure for Christmas. I shall return to the theme on Boxing Day, when I shall tell you about unwrapping the genome! In the meantime, some of you may like to read this thoughtful post where the author (and the commentators) consider the volume of information contained in our genome, from a computational perspective.