Tuesday, 3 October 2017

I got rhythm: and so have you?

The Nobel Committee in Stockholm yesterday awarded the Prize in Physiology or Medicine to three Biologists who discovered the genetic basis of our own personal "cellular clocks". You may recall that in 2014 the Prize was awarded for work on the brain's very own "satnav" system, which makes sure we know where we are going and where we have been. The work on "Circadian Rhythms" by Jeffrey Hall, Michael Rosbash (Brandeis University, near Boston) and Michael Young (Rockefeller University, New York) provides an insight into our connection with night and day. Through their work, we can now appreciate how life on Earth is related to the movements of the sun, the moon and the Universe in general.

There are some words you will need to be familiar with to understand this post. Diurnal is derived from the Latin for "of the day" or "daily" (diurnalis, from dies, day), which goes back to the ancient (Proto-Indo-European) root dyeu-, "to shine". Nocturnal, as I think you will know, refers to the night. Finally, circadian (combined here with rhythm) comes from circa (about or around) and dies (day); it is an adjective that describes a process that occurs on a cycle of roughly 24 hours. Some species are awake in the day and sleep at night, like us (I will come back to the problems that shift work and long plane journeys can cause), and others are nocturnal, like bats: sleeping all day and foraging at night. I hope you can see here how evolutionary adaptation is linked to the motion of the planets. Let's look at what they discovered, before I return to the planets!

In 1971, the great US geneticist Seymour Benzer and his graduate student Ronald Konopka isolated a mutation in the fruit fly (Drosophila) that altered its pattern of behaviour. More specifically, the mutant flies had an elongated circadian rhythm of about 29 hours instead of 24. They named this gene period, or per for short. This landmark discovery paved the way for the work honoured this week by the three Nobel Laureates above. The per gene turns out to encode a protein that regulates its own production and destruction: levels of the messenger RNA (mRNA) that encodes the PER protein peak during the night and drop in the daytime (as you can see graphically, top left: the x-axis is in hours). However, the PER protein is produced in the cytoplasm of the cell, whereas the genes are, of course, in the nucleus. It was Young who, in 1994, identified the timeless gene, encoding the protein TIM. TIM binds specifically to PER and the two proteins enter the nucleus together, where PER can now act to block the activity of its own gene. A further gene identified by Young, this time called doubletime, encodes the protein DBT, which delays the accumulation of PER and so fine-tunes the whole process, helping the oscillation match the 24-hour cycle more closely. In humans, the network of genes and proteins involved in regulating the body's circadian rhythms is a little more complex than I have described, with the usual collection of enzymes that add and remove phosphate groups (kinases and their counterparts, the phosphatases) ensuring that the levels of PER proteins are exquisitely balanced. Finally, the "steady state" levels of the PER proteins are subject to controlled proteolysis in a manner reminiscent of the control of the cell cycle itself by cyclins, which Sir Tim Hunt told us all about when he visited the UTC.
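If you like to see how a protein that switches off its own gene can produce a rhythm, here is a toy model. It is an assumption on my part, not the laureates' model: it uses the classic Goodwin negative-feedback oscillator, with three variables standing in for per mRNA, cytoplasmic PER and nuclear PER, and every parameter is arbitrary (time is in arbitrary units, not hours). A well-known limitation of this toy is that, with equal decay rates, it only oscillates when repression is very steep (a Hill coefficient greater than 8), so n = 12 is chosen for illustration rather than realism.

```python
def goodwin(a=100.0, n=12, dt=0.01, t_end=200.0):
    """Euler-integrate a toy Goodwin negative-feedback loop.

    m: per-like mRNA, p: cytoplasmic PER-like protein, r: nuclear repressor.
    All decay rates are 1 and all parameters are illustrative only.
    Returns the sample times and the nuclear repressor trace.
    """
    m, p, r = 0.1, 0.1, 0.1
    times, trace = [], []
    for i in range(int(round(t_end / dt))):
        dm = a / (1.0 + r ** n) - m  # transcription, switched off by nuclear PER
        dp = m - p                   # translation in the cytoplasm
        dr = p - r                   # entry into the nucleus (with TIM, in flies)
        m, p, r = m + dt * dm, p + dt * dp, r + dt * dr
        times.append((i + 1) * dt)
        trace.append(r)
    return times, trace


def peak_times(times, values):
    """Times of strict local maxima in a sampled trace."""
    return [times[i] for i in range(1, len(values) - 1)
            if values[i - 1] < values[i] > values[i + 1]]


if __name__ == "__main__":
    t, r = goodwin()
    half = len(t) // 2  # ignore the initial transient
    peaks = peak_times(t[half:], r[half:])
    gaps = [later - earlier for earlier, later in zip(peaks, peaks[1:])]
    print(f"{len(peaks)} peaks after the transient; "
          f"mean period {sum(gaps) / len(gaps):.2f} time units")
```

Lowering n to, say, 4 should make the rhythm die away, a hint of why real clocks rely on extra ingredients such as the phosphorylation and transport steps described above.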

I hope you can appreciate that the roles of the genes and proteins involved in maintaining the correct 24-hour clock in cells have now been firmly established by an elegant combination of genetics and biochemistry. Seymour Benzer, who was originally a physicist with interests similar to the young Einstein, set the Nobel Laureates on a journey that has linked Biological Evolution to our place in the Universe. The day on Earth lasts 24 hours, which is determined by the rotation of the Earth on its axis as it orbits the sun (although, if you live in Australia, night and day are the opposite way round to ours). Given that a solar day on Mars is just over 39 minutes longer than our own, we might expect any life forms on other planets to have evolved to match their own planet's solar cycle.
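To put that 39-minute difference in perspective, here is a back-of-the-envelope calculation. It assumes a mean Martian solar day (sol) of 24 h 39 min 35.244 s and asks how quickly a clock rigidly locked to 24 hours would drift out of step with Martian day and night; the function names are my own.

```python
EARTH_DAY_S = 24 * 3600                    # 86 400 s
MARS_SOL_S = 24 * 3600 + 39 * 60 + 35.244  # mean Martian solar day (assumed value)


def drift_minutes_per_sol() -> float:
    """Minutes a strict 24 h clock falls behind the Martian sun each sol."""
    return (MARS_SOL_S - EARTH_DAY_S) / 60.0


def sols_to_antiphase() -> float:
    """Sols until the 24 h clock is 12 h out of step (its 'day' is Mars's night)."""
    return (12 * 3600) / (MARS_SOL_S - EARTH_DAY_S)


if __name__ == "__main__":
    print(f"{drift_minutes_per_sol():.1f} min per sol; "
          f"fully out of phase after about {sols_to_antiphase():.0f} sols")
```

So an Earth-bound body clock that could not be reset would have its "day" falling in the Martian night after roughly 18 sols.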

Finally, beyond the fundamental importance of the molecular basis of circadian rhythms, why might they be important for our health and well-being? As I mentioned above, when we fly in the face of our diurnal nature, the PER regulatory system needs to adapt: this happens when you fly a long distance, or when you adopt the working habits of a badger and work shifts. Many people choose to work permanent nights, and must therefore re-configure their body clock. Try it once when you are not forced to! [I should mention that the PER regulatory network interacts with light sensors (cryptochromes), thereby providing the cell (and the body) with valuable cues that fine-tune the body clock.] It is also becoming clear that the action of certain drugs varies with the body clock. If you are interested, you can read these "rapid response" articles that appeared just after the announcement:

The Biochemical Society
The New York Times
The Nobel Foundation

Thursday, 28 September 2017

Crambin: my molecule for September 2017

This month I have picked a protein I first heard about from a crystallographer visiting Sheffield around 1981. What interested me was that the structure had been refined to a very high resolution (less than one angstrom, in the early 1980s!). What also interested me was that this was a protein with no known function. So what was the point of all that effort? The molecule itself is pretty unremarkable (see the image left). It contains two alpha helices oriented into a Y-shape, linked by a short, constrained loop. The N- and C-termini seem to have considerable freedom, with only a small amount of beta sheet, but, perhaps surprisingly, the protein crystallises very easily.

The molecular envelope (shown right) reveals crambin to be a globular molecule with a well-defined shape. Although we know nothing of its function, the surface of the long helix is exposed for interaction, and the N and C termini could re-fold around a small ligand or another macromolecule. But there is no evidence of a metal ion or of any significant space for a small ligand or substrate. After all, it contains fewer than 50 amino acids (46, in fact), which also makes it an ideal candidate for NMR spectroscopy.

The protein was initially isolated from Abyssinian cabbage (or kale) and is now known to belong to the family of plant toxins called thionins. [The source of proteins used by biochemists would make a nice Blog post for the future!] The key to the stability of the terminal segments is (as the name thionin suggests) the disulphide bond. The sequence of crambin is shown below, with the Cys residues highlighted. If you look at the representation shown left, you can see the yellow sulphurs and the small network of disulphide bonds that contribute to the stability of the structure. This is a feature of many extracellular proteins, including immunoglobulins. The analysis of structures at such high resolution provides a molecular framework for defining the precise geometry of such bonds, and provides a nice experimental opportunity to address the role of disulphides in the stability and folding pathways of proteins in general. Try mapping the bonds from the structure onto the sequence. You will immediately see that primary structures must be considered in three dimensions in order to appreciate fully the significance of sequence conservation!
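If you would like to try the mapping exercise programmatically, here is a small sketch. It assumes the 46-residue crambin sequence usually quoted for PDB entry 1CRN and the disulphide pairing Cys3-Cys40, Cys4-Cys32 and Cys16-Cys26; natural crambin is a mixture of sequence variants, so please check both against the deposited entry before relying on them. The script simply confirms that each partner is a Cys and reports how far apart the two residues are in the sequence.

```python
# Assumed sequence (1CRN) and disulphide pairs; verify against the PDB entry.
CRAMBIN = "TTCCPSIVARSNFNVCRLPGTPEAICATYTGCIIIPGATCPGDYAN"
DISULPHIDES = [(3, 40), (4, 32), (16, 26)]  # 1-based residue numbers


def map_disulphides(sequence, pairs):
    """Return (i, j, separation) for each Cys-Cys pair, checking both are Cys."""
    mapped = []
    for i, j in pairs:
        # Convert 1-based residue numbers to 0-based string indices.
        if sequence[i - 1] != "C" or sequence[j - 1] != "C":
            raise ValueError(f"residues {i} and {j} are not both Cys")
        mapped.append((i, j, j - i))
    return mapped


if __name__ == "__main__":
    cys_positions = [k + 1 for k, aa in enumerate(CRAMBIN) if aa == "C"]
    print("length:", len(CRAMBIN), "Cys at:", cys_positions)
    for i, j, sep in map_disulphides(CRAMBIN, DISULPHIDES):
        print(f"Cys{i}-Cys{j}: {sep} residues apart in sequence")
```

Notice that Cys3 and Cys4, right at the N-terminus, are tied to residues 40 and 32 near the other end of the chain: residues that are far apart in sequence end up side by side in the fold.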

The possible application of this plant product in cancer treatment is being investigated, but remains at an early stage. One other point I would like to draw your attention to is a comparison between methods of structure determination. X-ray crystallography "prefers" proteins of several hundred or more amino acids in each polypeptide, whereas Nuclear Magnetic Resonance (NMR) spectroscopy "prefers" proteins with molecular weights below 20 000 (fewer than about 200 amino acids). These rules aren't hard and fast, but working within them does significantly improve the probability of obtaining a high resolution data set (required to fix the positions of side-chain atoms). The structural representation on the right was obtained by NMR. NMR structure determination generates an "ensemble" of structures that are consistent with the spectral data (see here for an introduction to protein NMR). The first thing you notice is that some parts of the protein are better defined than others. In X-ray crystallography, any significant "flexibility" in a protein structure usually prevents the assignment of electron density in that region, which may mean that this section of the protein is not included in the deposited structural file (this is usually pointed out in the publication). In the case of crambin, the NMR structure suggests that the N and C termini are pretty rigid. This is a consequence of the disulphide bonding, which explains why the protein is so compact and probably why it is such an amenable molecule for obtaining high resolution atomic data.
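One simple way to put a number on which parts of an NMR ensemble are "better defined" is the root-mean-square fluctuation (RMSF) of each residue about its mean position across the models. The sketch below assumes the models have already been superposed onto a common frame and that you have one representative atom (say the Cα) per residue; the tiny coordinate set in the example is invented purely to show the calculation and is not taken from the crambin entry.

```python
import math


def per_residue_rmsf(models):
    """Root-mean-square fluctuation of each residue across an ensemble.

    models: list of models; each model is a list of (x, y, z) tuples, one per
    residue, assumed already superposed on a common reference frame.
    """
    n_models = len(models)
    rmsf = []
    for i in range(len(models[0])):
        coords = [model[i] for model in models]
        mean = [sum(c[k] for c in coords) / n_models for k in range(3)]
        msd = sum(sum((c[k] - mean[k]) ** 2 for k in range(3))
                  for c in coords) / n_models
        rmsf.append(math.sqrt(msd))
    return rmsf


if __name__ == "__main__":
    # Invented toy ensemble: residue 0 never moves, residue 1 wanders.
    ensemble = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],
    ]
    print(per_residue_rmsf(ensemble))
```

Plotted along the sequence, low RMSF picks out the well-defined core and high RMSF the floppy regions that, in a crystal, might have left no interpretable electron density.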

So finally, why has so much effort been invested by structural biologists in a molecule of such poorly defined function? This is an important issue in Science in general: what should we (as taxpayers, versus, say, drug companies) spend our money on? Molecules like crambin can help establish the fundamental principles of protein structure. Some medically important molecules may be difficult to purify, may be unstable or may yield poor diffraction (in the case of X-ray crystallography), or may be difficult to solubilise and show poor spectral resolution (for NMR). The insight we gain from "well-behaved" proteins can help us fill in the gaps for molecules that we can easily recognise as being of societal value. Moreover, workhorses like crambin can help us push the envelope of techniques like X-ray crystallography and NMR, which may then make it more likely that we can interpret the data from proteins that are less well-behaved! One final point is that structure determination alone can rarely determine the function of a molecule. It may be that we need to solve the structures of the entire human proteome before we are ready to comprehensively link structure and function in Biology!

Wednesday, 23 August 2017

The challenges of making RNA in bacteria. A late summer selection of molecules

This month is a little later than I had hoped, mainly due to my choice of molecule. I decided I wanted to get more familiar with the (expanding number of) regulatory proteins involved in the control of gene transcription in bacteria; partly out of a research interest, and partly because it allows me to combine my interest in language (or more specifically alphabets) and Science. The down-side is that I will have to replace all of my "a"s with αs and my "b"s with βs etc., which is always a little clunky on my free Blogging software! The molecule I am focusing on is RNA Polymerase (Pol), which I have covered earlier. However, this time I am going to take a look at the transcription factors, or "known associates", of this multi-subunit protein complex that bridges the gap between information and function. The genomes of most prokaryotes contain between 2 000 and 5 000 protein-coding genes, together with a few hundred genes that encode functional RNAs. This is all information; but in order to "translate" from nucleic acid speak (Nu-speak: sorry George!) to the language of amino acids and proteins (Pep-speak?), the ribosome is required. However, a limited number of RNA species combine an information mode with function, such as the hammerhead ribozyme, which can catalyse specific RNA cleavage in the absence of any proteins (a good future molecular candidate perhaps?). The nucleotide sequence of a ribozyme is written in essentially the same alphabet as the genome (apart from an additional oxygen atom on each sugar, and uracil in place of thymine), and it also determines the ribozyme's three-dimensional fold, and therefore its biological function.

You may be interested to know the source of the images used in this post. I have chosen, where possible, to include the beautiful models created and exhibited at the Pingry Biomolecular Modelling Project web site, which is just one of the incredibly impressive initiatives at the Pingry School: more information on this ground-breaking collaboration between staff at the Milwaukee School of Engineering (MSOE University) and the students and teachers at the Pingry School can be found here. On the right is an image of the components of RNA Pol in the early stages of transcriptional initiation. I hope you will agree with me that these models capture both structure and function in a beautiful and informative way.

The Basics

RNA Pols, in their simplest forms (let's leave the bacteriophage enzymes aside for now), comprise two α subunits, a β subunit and a variant of β called β-prime (written β'). (There are some enzymes in which the β-type subunits are fused, but these are only occasional exceptions.) This hetero-tetrameric "apo-enzyme" then associates with a number of "regulators" to form the "holo-enzyme", the most important being the σ subunit, which determines the DNA sequence specificity of the enzyme and hence which promoters are transcribed. The image below the reaction scheme shows the promoter sequences recognised by the RNA Pol holoenzyme, with the -10 and -35 elements (recognised by the σ factor) highlighted. As we shall see below, the σ subunit comes in a number of different "flavours". The reaction catalysed by RNA Pols is shown below: it is important to remember that, although I am discussing sequence-specific DNA binding, RNA Pols are catalysts. The DNA acts as the template, the ribonucleoside triphosphates (NTPs) are the substrates, and RNA (together with pyrophosphate) is the product.
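To make the template/substrate/product distinction concrete, here is a toy sketch in which the DNA template strand (written 3'→5') is read and a complementary RNA is built 5'→3'. It is purely illustrative and assumes simple base pairing with U in place of T, ignoring initiation, σ, pausing and termination. Each new phosphodiester bond releases one pyrophosphate (PPi), and because the first nucleotide keeps its 5'-triphosphate, a transcript of n nucleotides releases n − 1 PPi.

```python
# Base pairing used when reading the DNA template strand to build RNA.
TEMPLATE_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def transcribe(template_3to5: str) -> tuple[str, int]:
    """Return (RNA written 5'->3', number of PPi released).

    template_3to5: the DNA template strand written 3'->5', so the RNA is
    built in the same left-to-right order, one NTP at a time:
        (NMP)n + NTP -> (NMP)n+1 + PPi
    """
    rna = "".join(TEMPLATE_TO_RNA[base] for base in template_3to5.upper())
    ppi_released = max(len(rna) - 1, 0)  # the first NTP keeps its triphosphate
    return rna, ppi_released


if __name__ == "__main__":
    coding = "ATGCCA"  # made-up coding (non-template) strand, 5'->3'
    template = "".join({"A": "T", "T": "A", "G": "C", "C": "G"}[b] for b in coding)
    rna, ppi = transcribe(template)
    print(rna, ppi)  # the RNA matches the coding strand, with U for T
```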

The prefixes apo and holo are derived from the Greek, meaning "away from" and "complete" respectively, and are used frequently by Biochemists to describe a protein lacking a key component, such as a co-factor (apo), as compared with the fully functional molecule (holo): apo-haemoglobin lacks the haem, for example. Which brings me to the inevitable glossary: an essential set of definitions of terms, symbols and concepts needed to understand gene transcription. For those of you who are unfamiliar with the idiosyncrasies of the Greek alphabet, I have also included my suggested (phonetic) pronunciations: remember, when discussing Science, it really helps if you feel confident about pronouncing some of the rather ludicrous terms!

The Greek alphabet and my advice on pronunciation! [A "hard" consonant, e.g. the first and last G in gang, is written gg, while the soft G in German is written as a single j. Where there is no ambiguity, e.g. the letter D, it is shown as a single d. If the vowel is drawn out, like the two Es in meet, it is again doubled.]

α (alff-a)
β (bee-ta (UK), bayta (USA))
γ (ggamm-a)
δ (delt-a)
ε (ep-ssee-lon)
ζ (zee-ta)
η (ee-ta (UK), ay-ta (USA))
θ (thee-ta (UK), sometimes tha-yta (USA))
ι (eye-oh-ta)
κ (kapp-a)
λ (lamm-da)
μ (mew)
ν (new)
ξ (k-ss-eye)
ο (oh-mee-kron)
π (p-eye, or for English readers pie!)
ρ (row)
σ (ssigg-ma)
τ (torr)
υ (up-ssee-lon)
φ (ff-eye)
χ (kai, or k-eye [not kee])
ψ (p-ss-eye, as in psychology)
ω (oh-mee-ga (UK) or oh-may-ga (USA))

A short glossary


Apoenzyme: an incomplete molecule, which usually requires a coenzyme (such as FAD), an additional protein (such as σ) or an RNA molecule for full function
Holoenzyme: a complete molecule, usually incorporating an essential coenzyme (such as FAD), an additional protein (such as σ) or an RNA molecule, and expressing full biological function

Operator: the term given to the repressor (or activator) binding site that flanks or overlaps a promoter. In effect, the regulatory region extends the promoter sequence in one direction (or possibly both).

Promoter: a stretch of double-stranded DNA to which an RNA Pol binds and which, through a series of orchestrated molecular interactions, marks the initiation point for the transcription of a particular gene or group of genes. In bacteria, the key DNA sequence comes in two sections: the -10 "box" (about six base pairs, consensus TATAAT) and the -35 "box" (consensus TTGACA), both of which are recognised by the σ factor (itself associated with the apo-enzyme form of RNA Pol); in some promoters the α subunits also contact an AT-rich "UP" element further upstream. The numbers give the approximate distance in base pairs upstream (hence the negative sign) of the nucleotide that forms the 5' end of the transcript. The diagram below should help explain these concepts.
Ribosomes: a multi-component molecular machine comprising rRNA and polypeptides in the form of two "subunits", referred to by their sedimentation properties in an analytical ultracentrifuge. The 30S (small) and 50S (large) subunits co-assemble during the initiation of protein synthesis in the presence of initiation factors, aminoacylated tRNAs, mRNA and an energy supply. You can read more here.
Transcription: the catalytic, template-directed synthesis of RNA from double-stranded DNA. The products are a range of RNAs, including messenger, transfer and ribosomal RNAs, and the enzyme may be a single species, such as bacterial RNA Polymerase, or one of several dedicated enzymes, such as RNA Pol II in eukaryotes, which catalyses mRNA biosynthesis.
Translation: the biosynthesis of polypeptide chains from mRNA templates via the ribosome. Each ribosome can accommodate virtually any mRNA, and several ribosomes translating the same mRNA at once form an aggregate called a polysome.
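The two-box layout in the Promoter entry can be made concrete with a naive scanner. This is a sketch under simplifying assumptions of my own: it uses the textbook σ70 consensus boxes TTGACA (-35) and TATAAT (-10), allows at most one mismatch per box by default, and accepts spacers of 15 to 19 base pairs between the boxes (17 is the usual optimum). Real promoters rarely match the consensus this closely, so serious tools score sequences with position weight matrices instead; the function names here are hypothetical.

```python
MINUS_35 = "TTGACA"  # textbook sigma-70 consensus for the -35 box
MINUS_10 = "TATAAT"  # textbook sigma-70 consensus for the -10 box


def mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def find_promoters(seq: str, max_mismatch: int = 1, spacers=range(15, 20)):
    """Return (start_35, start_10, spacer, total_mismatches) for each candidate.

    Positions are 0-based offsets into seq; only one strand is scanned.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(MINUS_35) + 1):
        mm35 = mismatches(seq[i:i + 6], MINUS_35)
        if mm35 > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box10 = seq[j:j + 6]
            if len(box10) < 6:
                break  # ran off the end of the sequence
            mm10 = mismatches(box10, MINUS_10)
            if mm10 <= max_mismatch:
                hits.append((i, j, spacer, mm35 + mm10))
    return hits


if __name__ == "__main__":
    # A made-up sequence containing a perfect consensus promoter.
    demo = "GC" + "TTGACA" + "A" * 17 + "TATAAT" + "GCGC"
    print(find_promoters(demo))  # prints [(2, 25, 17, 0)]
```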

Sigma factors

One of the many returns on our collective investment in genome sequencing has been the insight gained into which genes are essential for cell growth and reproduction. Not surprisingly, the genes encoding the polypeptides that make up RNA Pols are essential for cell viability. However, while all bacteria possess the genes encoding the α (rpoA), β and β' (rpoB and rpoC) subunits and the major σ factor, σ70 (rpoD or sigA), there are some other regulatory factors that seem to confer advantages in regulating gene expression and are likely to add to the physiological versatility of the organisms in which they are expressed. In the well-studied bacterium E. coli, in addition to σ70, we find the following σ factors:

σ19 (fecI) - regulates the fec genes for ferric citrate (iron) transport
σ24 (rpoE) - the extracytoplasmic (envelope) and extreme heat stress factor
σ28 (rpoF) - the flagellar factor
σ32 (rpoH) - the heat shock factor, which is activated when the bacteria are exposed to heat. Some of the enzymes expressed upon activation of σ32 are chaperones, proteases and DNA-repair enzymes.
σ38 (rpoS) - the starvation/stationary phase sigma factor
σ54 (rpoN) - the nitrogen-limitation factor.

Before (L) and After (R)

σ factors interact with the RNA Pol apoenzyme to generate the holoenzyme and, in doing so, provide the enzyme with the capacity to recognise the -10 and -35 elements of a promoter (see figure and scheme above). The "before and after" images (LHS) show the location of the (orange) σ factor in the complex, and how its elongated shape facilitates recognition of the -10 and -35 elements (the promoter is the blue and pale green duplex above the RNA Pol). The initiation of transcription of all constitutive genes requires only the RNA Pol holoenzyme, as in the "before" image. Once the DNA at the transcriptional start site has been opened and NTPs are available, RNA synthesis begins, the σ factor dissociates (the "after" image) and the elongation phase of transcription gets underway. The role of the σ factor is primarily to "target" the catalytic apparatus: by replacing the house-keeping σ factor with any of the above σ variants, selective sets of genes can be expressed in response to one or more environmental cues. Pretty straightforward, I think you'll agree. This principle of combining a core function, in this case RNA synthesis, with a variety of targeting polypeptides (in this case σ subunits) is a common strategy used in Biology, with antibodies being a well-known example.

Anti-sigma factors

The potency of σ factors has led to the evolution of antagonistic molecules, called anti-sigma factors. In some organisms, σ factors need to be attenuated [slowed] (or even abrogated [stopped]): this can be achieved by the expression of anti-sigmas. Again, the logic is pretty simple. A σ factor can be held in complex with an anti-sigma until an environmental cue triggers its release. Through an induced conformational switch, such as a pH transition or the binding of a small molecule to the anti-sigma component, the two components (see the image of the T4 phage anti-sigma-σ complex, RHS) dissociate and the σ factor is free to promote targeted transcription.
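The hold-and-release logic can be put into numbers with a simple 1:1 binding equilibrium. This is an assumption-laden sketch: it treats σ and anti-σ as forming a single 1:1 complex with dissociation constant Kd, ignores competition with the RNA Pol core, uses made-up concentrations, and models the conformational switch crudely as a large increase in Kd.

```python
import math


def free_sigma(sigma_total: float, anti_total: float, kd: float) -> float:
    """Free sigma for sigma + anti-sigma <-> complex (all in the same units).

    Solves Kd = [sigma][anti]/[complex] together with mass conservation
    exactly (the quadratic 'tight binding' solution).
    """
    b = sigma_total + anti_total + kd
    bound = (b - math.sqrt(b * b - 4.0 * sigma_total * anti_total)) / 2.0
    return sigma_total - bound


if __name__ == "__main__":
    # Hypothetical values (e.g. micromolar): 1 unit of sigma, 2 of anti-sigma.
    before = free_sigma(1.0, 2.0, kd=0.001)  # tight complex: sigma held
    after = free_sigma(1.0, 2.0, kd=100.0)   # after a hypothetical switch
    print(f"free sigma before cue: {before:.3f}, after cue: {after:.3f}")
```

With these invented numbers almost all of the σ is sequestered before the cue and almost all of it is free afterwards, which is the switch-like behaviour described above.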

Repressors

These molecules have a special place in the history of Molecular Genetics. The work of Jacob and Monod (see an earlier post on RNA Pol) in the early 1960s laid the foundations for our understanding of gene regulation in prokaryotes and higher organisms. At the centre of their logic was the concept of the repressor, which was later defined in molecular terms as a protein molecule (although it can also be an RNA molecule) that interferes with transcription. The mode of action of repressors can be simply described as creating a road block in the path of a promoter-bound RNA Pol, but since this simple concept was proposed, genetic, structural and kinetic studies have shown that repressors can inhibit RNA Pol by a variety of mechanisms, which do not always involve simply blocking the path of the RNA Pol or competing for a specific sequence at or around the promoter. In fact, some repressors (including the lambda repressor shown left) are able to act as both repressors and activators of RNA Pol mediated transcription, and this forms the basis of the "plot" of the remarkable work from Mark Ptashne's laboratory, whose short book on this topic is a "must read" for all Molecular Biology students. Since most repressors do not form stable interactions with RNA Pols (although this is not meant to be a dogmatic statement), I will not discuss them further in this post.

Termination factor ρ, which is shown on the right, is responsible for terminating RNA Pol mediated transcription, but once again ρ acts like a classical repressor, recognising a specific RNA termination sequence of around 70 nucleotides that signals the end of the road for RNA Pol: the ρ protein does not form a stable complex with RNA Pol. Bacteria like E. coli invest significant energy in synthesising this hexameric homo-polymeric protein, and it is essential for viability in many bacteria. In fact, transcription of about half of the genes in E. coli is terminated via ρ, while the remainder are said to be ρ-independent, or alternatively utilise the proteins τ or NusA.

The ω and δ factors

These are both bona fide components of the RNA Pol holoenzyme. ω seems to be involved in chaperoning and stabilising the interactions of the β' subunit. Unlike ρ, it is dispensable, in that ω knockouts survive; but it does seem to improve the net efficiency of transcription: I expect growth rates in ω knockouts are lower than in wild-type strains. δ is also formally enshrined in the RNA Pol holoenzyme and, like ω, the gene encoding this factor is not essential, but its removal from a genome does give rise to some strange morphological changes in growing cells (abnormal elongation in particular). A complete understanding of the roles of these two factors in transcription remains to be elucidated, but both primary structures are highly conserved amongst the bacteria that carry them, and a number of groups are currently looking at the functions of these accessory factors during infections by pathogenic bacteria.

I want to close by mentioning a growing number of transcriptional regulators that bind to DNA (or act indirectly via σ factors) and thereby modulate RNA Pol transcription in response to a redox signal, mediated by an iron-sulphur (Fe-S) cluster buried in the heart of the protein or sometimes in a flexible subdomain. One such regulator is SoxR (containing a [2Fe-2S] cluster, shown in red-yellow on the LHS). I think you can see how SoxR might distort the DNA and so modulate transcription initiation. Environmental signals such as reactive oxygen species and NO trigger gene expression events that ultimately lead to the elaboration of processes that defend the cell against this metabolic challenge. The Wbl (WhiB-like) proteins are a class of gene regulators originally identified in Streptomyces strains, but which are also found amongst the Mycobacteria (think TB). The main reason for including them, however, is that they have become a major area of interest for one of my colleagues at Sheffield, Professor Jeff Green, and because I really like the story emerging from his lab (see a review here) that connects redox sensing and the control of gene expression, which may have wider implications for a number of prokaryotes and may modulate the mode of action of some antibiotics.

In summary, RNA Pol is the extraordinary focal point for gene regulation in bacteria, and we are learning more every day about the plethora of polypeptides and RNAs that influence its activity. I hope this has given you a flavour of the structure and function of this area of Molecular Biology. At some point, when I am brave enough, I'll look at the eukaryotic RNA Pols!

Wednesday, 28 June 2017

Luciferase versus GFP: A lighthearted molecule for July

The summer brings out the best in most of us (based purely on the evidence of the greater number of smiling people on the platform when I catch the train!). So, I thought about choosing a molecule that reflected this mood. I could have looked amongst the proteins that are the targets of psychoactive drugs, or I could have gone for the sunlight-capturing molecules involved in photosynthesis. In January 2016, I discussed a number of photo-activated proteins after a thrilling seminar from the Biochemist Tomas Carrel. See here. This month I have chosen the enzyme luciferase, a key element in the generation of light in insects such as the glow worm and the firefly (Photinus pyralis). I hope you agree that these creatures (notwithstanding the general unpopularity of most insects) make most people smile!

The name luciferase has its origins in the Latin for "bringer of light" (think lucid or elucidate). You might be familiar with the Biblical archangel Lucifer, who defied God and went on to establish an alternative post-mortem retreat for those of a slightly unorthodox disposition. As Mark Twain (or J.M. Barrie?) famously commented: I'd choose Heaven for the climate and Hell for the company! I assume Lucifer lit up the general conversation in Hades? Alternatively, if you have read any Charles Dickens or Arthur Conan Doyle, you will know that a "lucifer" was the nickname for a match. Let's now have a look at how luciferase generates light and how the properties of the enzyme have been incorporated into a biological detection technology that is used in both discovery and diagnostic modes. The reaction catalysed by firefly luciferase is as follows:

luciferin + ATP → luciferyl adenylate + PPi

luciferyl adenylate + O2 → oxyluciferin + AMP + light

Light is produced because the reaction forms oxyluciferin in an electronically excited state. The reaction releases a photon of light as oxyluciferin returns to the ground state (in this case, the "quantum mechanical" state of a system having the lowest possible energy; the expression is also used in Biochemistry to define the lowest free energy state of the substrate(s) in an enzyme-catalysed reaction, usually with respect to the transition state and the products of the reaction). Firefly luciferase generates light from luciferin in a multistep process. First, D-luciferin is adenylated by ATP to form luciferyl adenylate and pyrophosphate. Following this "activation" by ATP, luciferyl adenylate is oxidized by molecular oxygen to form a dioxetanone ring. A decarboxylation reaction yields the excited state of oxyluciferin, which can tautomerize between the keto and enol forms (at a given pH and temperature, many carbonyl compounds tend to interconvert between these two forms: you can read more here). The reaction finally emits light as oxyluciferin returns to the ground state. [I shall return to the important topic of the "excitation" of molecules and its importance in biological systems in a separate post.]

This is quite a complicated phenomenon if you don't have a background in undergraduate Biochemistry, Chemistry or Biophysics, so don't worry if it leaves you a little baffled. Think of the dyes that colour your clothes, or a bright blue copper sulphate solution. Visible and UV light can re-organise the electrons in some molecules. As a result, the molecule absorbs a portion of the visible spectrum, which gives a solution of the molecule its very specific colour. The process involves light energy, in the form of photons, re-organising specific electrons in the molecule, followed by their return to the "unexcited" state, which can be accompanied by the release of heat or by the emission of light as fluorescence or phosphorescence. There are specific "pathways" described in quantum mechanics that account for these phenomena and for why different molecules follow one pathway rather than another, or none at all! I shall attempt to write a post on these important phenomena in the near future, since they are particularly important in the mechanism of photosynthesis.

The protein molecule (shown left, from a dinoflagellate) comprises two major structural units. The blue (mainly) beta barrel sits beneath the alpha-helical arrangement, with the adenylate and the chromophore positioned at the junction of the two domains. On binding the reactants, the domains come together to exclude water, which increases the half-life of the "excited" state of the oxyluciferin. The details vary a little from species to species, and this leads to variation in the wavelength of the emitted light. One proposed mechanism is that the colour of the emitted light depends on whether the product is in the keto or enol form: red light would be emitted from the keto form of oxyluciferin, and green light from the enol form. This is not proven, but the logic relates to the well-established connection between resonance structures and the energetics of absorption of light in the visible and UV spectrum. There are some other ideas, and although a consensus hasn't yet been reached, all mechanisms will probably connect the local (molecular) environment with the stabilisation of the excited state (see below RHS).
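The link between colour and energy mentioned above comes down to one formula, E = hc/λ. This sketch converts a wavelength into photon energy; the two wavelengths in the example are illustrative assumptions (around 560 nm for the yellow-green glow usually quoted for firefly luciferase, and 620 nm as a representative red) rather than measured values for the keto and enol forms.

```python
HC_EV_NM = 1239.84198       # Planck constant x speed of light, in eV nm
KJ_PER_MOL_PER_EV = 96.485  # 1 eV per molecule expressed in kJ/mol


def photon_energy(wavelength_nm: float) -> tuple[float, float]:
    """Return (energy in eV, energy in kJ per mole of photons)."""
    ev = HC_EV_NM / wavelength_nm
    return ev, ev * KJ_PER_MOL_PER_EV


if __name__ == "__main__":
    for label, nm in [("yellow-green", 560.0), ("red", 620.0)]:
        ev, kj = photon_energy(nm)
        print(f"{label:12s} {nm:.0f} nm: {ev:.2f} eV, {kj:.0f} kJ/mol")
```

The red photon carries roughly 0.2 eV (about 20 kJ/mol) less energy, so any mechanism for the colour change has to account for a shift in the energy gap of that order.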

You may wish to compare the properties of luciferases with those of naturally fluorescent proteins such as the Green Fluorescent Protein (GFP for short). Can you think of the biological advantages to an organism of emitting light? A useful exercise might be to compare and contrast the applications of these proteins in contemporary experimental molecular cell biology. Can you find glow worms and fireflies in the UK? Take a look at the survey.

Key points from the Blog. There are naturally occurring proteins (and small molecules) that have fascinating optical properties. Such properties are sometimes related to their requirement for energy to drive unfavourable reactions (DNA repair, photosynthetic electron transfer). In some cases, the natural glow of a firefly or the bright fluorescence of marine organisms has evolved for reasons that are not entirely understood. However, such beautiful natural phenomena attract Biochemists, and they can lead to technologies that unlock hidden secrets in the behaviour of cells. Luciferases are used in a wide range of diagnostic and research methods, and I hope you agree with me that they are incredible molecules. However, I think GFP currently holds the prize as the most important optical probe in contemporary biology.

Tuesday, 6 June 2017

Fighting back! Vancomycin (plus) my molecule for June

This month I decided to wait for the release of a paper that the press announced as an example of magic! Since all magic can be explained by science, and science is therefore not magic, I thought it would be appropriate to set the record straight. Dale Boger's group at The Scripps Research Institute in California published a series of chemical modifications to the "last resort" antibiotic vancomycin that not only improve its potency, but also appear to make significant inroads into reducing the emergence of "resistance". The excellent final figure in their recent publication in the Proceedings of the National Academy of Sciences is shown below. It reveals the complexity of the molecule and identifies the chemical groups associated with its antibiotic properties. But first let me provide some background to vancomycin.

Vancomycin was discovered in the same year that Watson and Crick discovered the double helical nature of DNA (1953), around 24 years after Fleming published the discovery of Penicillin. By 1953, resistance to penicillin treatment had become a real clinical issue, particularly in Staphylococcus aureus (recall MRSA). With the discovery of vancomycin (the vanquisher!), it looked like an alternative treatment for resistant strains was now in sight. In fact the drug was fast-tracked into hospitals and was in use just five years after its discovery.

Vancomycin is a naturally occurring heptapeptide, originally isolated from the organism Amycolatopsis orientalis, which was first identified by the Harvard-trained organic chemist Edmund Kornfeld at Eli Lilly, working with soil samples collected from the jungles of Borneo by missionary workers! We now know that this seven amino acid peptide is made by non-ribosomal peptide synthetases (NRPS), after which it is chemically modified in a complex series of secondary metabolism steps. The aromatic rings belong to modified phenylglycine and tyrosine residues, which are chlorinated and glycosylated. The sequence of amino acids is

(1) Leucine; (2) Tyrosine (modified by hydroxylation); (3) Asparagine; (4) Glycine (modified by phenyl-hydroxylation); (5) Glycine (modified by phenyl-hydroxylation); (6) Tyrosine (modified by hydroxylation); (7) Glycine (modified by addition of a dihydroxylated benzene ring)

starting from the bottom RH corner and proceeding clockwise in the structural diagram at the top.

As you might imagine, this is too short a sequence to be synthesised as a heptapeptide on the ribosome (what is the shortest polypeptide to be synthesised in a mature form via the ribosome, and what are the constraints on chain length?). The role of vancomycin, a complex secondary metabolite, in the physiology of Amycolatopsis orientalis is presumably, as with other antibiotics, to serve as a defence against bacterial threats to its survival. But it also reveals that complex carbohydrates etc. can be introduced into microbial polypeptide chains in a way that we usually associate with proteins in much more complex organisms. The 7 modules of vancomycin (centred on each amino acid) are generated and "finalised" for function by a set of enzymes that utilise ATP to provide the necessary energy through an adenylate intermediate. You can read more here about these enzymes and their genes.
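To make the "one module per amino acid" logic concrete, here is a toy Python model of an NRPS assembly line. It is a sketch under simplifying assumptions, not a model of the real vancomycin synthetase: each hypothetical module activates one residue as an aminoacyl-adenylate (one ATP, cleaved to AMP and pyrophosphate) and then adds it to the growing chain. The residue labels are my own shorthand (Tyr* for the modified tyrosines, Hpg for hydroxyphenylglycine, Dpg for dihydroxyphenylglycine), and the tailoring steps (chlorination, glycosylation, ring cross-linking) are left out entirely.

```python
# Toy model of a non-ribosomal peptide synthetase (NRPS) assembly line.
# Simplifying assumptions: one module per residue; each module spends one ATP
# on adenylation; tailoring (chlorination, glycosylation, cross-linking) ignored.

# Shorthand labels (hypothetical, for illustration), in the order listed above.
VANCOMYCIN_MODULES = ["Leu", "Tyr*", "Asn", "Hpg", "Hpg", "Tyr*", "Dpg"]


def run_nrps(modules):
    """Return the linear peptide (as a string) and the ATP spent on adenylation."""
    chain = []
    atp_used = 0
    for residue in modules:
        # Adenylation: residue + ATP -> residue-AMP + pyrophosphate
        atp_used += 1
        # Thiolation and condensation: the activated residue joins the chain
        chain.append(residue)
    return "-".join(chain), atp_used


peptide, atp = run_nrps(VANCOMYCIN_MODULES)
print(peptide, atp)  # Leu-Tyr*-Asn-Hpg-Hpg-Tyr*-Dpg 7
```

The point of the sketch is the bookkeeping: because each residue needs its own dedicated module, a heptapeptide needs seven modules and seven ATP-dependent activations before any tailoring begins.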

In my mind there are always three main questions that need addressing when trying to rationalise the mode of action of antibiotics. The first is pretty well understood and relates to the target (or targets). The second is less clear: the mechanism of killing versus growth arrest. The third relates to the likelihood and mechanism of resistance. In the case of vancomycin and its synthetic derivatives, it would appear that there are three targets. The biosynthesis of the bacterial cell wall is essential for the normal growth of most bacteria. Vancomycin is a bivalent inhibitor of this process, interfering with two distinct enzymatic steps (shown in blue at 6 o'clock and 12 o'clock in the diagram). In addition, the positively charged moiety (bottom LHS) is thought to disrupt the cell membrane. The combination of these three weapons not only reduces the minimal inhibitory concentration (MIC) of vancomycin, but also massively reduces the emergence of resistance in the target organism. So the Boger group have made significant progress in taking a natural product and improving it, in the great traditions of natural product chemistry and contemporary organic synthesis. It is now up to the molecular biologists to explain how cell death occurs and, ultimately, the mechanisms of resistance, which will emerge: it is a question of when and not if!
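Why should three targets suppress resistance so strongly? A back-of-envelope sketch, assuming (hypothetically) that escaping each target needs its own independent mutation, is that the per-cell frequencies multiply. The escape frequency (1 in 10^8 per target) and the population size (10^10 cells) below are made-up round numbers, and the model ignores resistance genes acquired by horizontal transfer, which is how clinical vancomycin resistance often arises.

```python
import math

# Back-of-envelope model: resistance to a multi-target drug needs an escape
# mutation at every target. Assumes the mutations arise independently, so
# their per-cell frequencies multiply. All numbers are hypothetical.


def resistance_frequency(per_target_frequencies):
    """Probability that a single cell carries escape mutations for every target."""
    return math.prod(per_target_frequencies)


def expected_resistant_cells(population, per_target_frequencies):
    """Expected number of pre-existing fully resistant cells in a population."""
    return population * resistance_frequency(per_target_frequencies)


POPULATION = 1e10  # hypothetical bacterial load
PER_TARGET = 1e-8  # hypothetical escape frequency for one target

for n_targets in (1, 2, 3):
    cells = expected_resistant_cells(POPULATION, [PER_TARGET] * n_targets)
    print(f"{n_targets} target(s): ~{cells:.3g} resistant cells expected")
```

With one target, around 100 resistant cells would be expected before treatment even starts; with three, the expectation falls to roughly 10^-14, i.e. effectively none.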

Monday, 8 May 2017

Molecule of the Month of May 2017: Adhirons

As usual, I had a molecule almost ready to go, until I started reading an interesting thesis from a member of (Professor) Mike McPherson's research group at the University of Leeds. Without disclosing any secrets, the thesis centred on a class of molecules called adhirons (see the left-hand image of the Leeds group's structure, from the RCSB PDB). I thought I would put my reading to good use and share my enthusiasm for these remarkable polypeptides. On the one hand they offer an excellent opportunity to challenge the primacy of antibodies in high affinity, selective protein binding; on the other, they offer considerable insight into the evolutionary relationships between primary structure and the "encoded" tertiary interactions between protein surfaces. And as you know, it is the network of protein:protein interactions that expands the functionality of many eukaryotic genomes. Since "Molecule of the Month" aims to introduce you to a mixture of "Old Chestnuts" and "Young Turks", I also try to select molecules for the lessons they can teach about the relationship between protein structure and function in an evolutionary context.

Adhirons and antibodies have some things in common and some significant differences. I have written about IgG before, and rather than give details, I shall cite their general properties for comparative purposes here, since antibodies form a major part of any Biology curriculum in High School and above. The first point to make is that adhirons are not found in Nature. However, they are molecules that have been considerably "informed" by Nature. They are derivatives of a group of protease inhibitors called cystatins (not to be confused with the cardiovascular drugs known as statins), a class of cysteine protease inhibitors which are related to the stefins, studied in Sheffield in (Professor) Jon Waltho's structural biology group in the 1990s. The development of NMR for the determination of protein structure has been helped significantly by the existence in Nature of small polypeptides (fewer than 100 amino acids, i.e. low molecular weight) that exist to keep hydrolytic enzymes, including proteases, nucleases and amylases, in check. Such proteins (Tendamistat, Bovine Pancreatic Trypsin Inhibitor, cystatins etc.) proved to be workhorses for method development and helped the interpretation of the complex spectra derived from sophisticated NMR experiments. You can read Nobel laureate Kurt Wüthrich's overview of the development of NMR in structural biology here.

The adhiron used in Mike's group is derived from a plant cystatin family called the phytocystatins. It was selected for its simplicity (no awkward post-translational modifications) and its intrinsic physical properties. It is stable at elevated temperatures (around 100 degrees C); it is extremely soluble (solutions of hundreds of mg/ml are not uncommon, whereas many proteins drop out of solution above 1 mg/ml!); it has no sensitive disulphide bonds (which can lead to redox problems in use); and it is relatively low in molecular weight and monomeric, which simplifies every application and makes it that much more cost-effective. You can read more about the background to the development of adhirons here.

Let me compare adhirons to antibodies in respect of their general structure and properties. Both molecules comprise two elements: a variable region (not really a domain) and a constant domain. In IgG, the constant domain (Fc) provides the scaffolding, or support structure, upon which the variable (Fv) domains sit, comfortably awaiting the "arrival" of a complementary antigen (the variable domains are paired in most antibodies, which gives them their bivalent binding property). This structure is usually represented as a Y shape: the V is the variable region and the tail is the constant region, which interacts with other proteins (such as Fc receptors). In adhirons, the scaffold function is provided by regular secondary structure elements: a beta sheet and an alpha helix. And, as with the Fv region in IgGs, the selective binding function is found on a series of "loops". Time for a little diversion to explain the apparent paradox between Emil Fischer's elegant "lock and key" model and the use of flexible "loops" in high affinity (sub-nanomolar) recognition between proteins (the figure above shows a benzene ring in the active site of an imaginary enzyme... or a nut in a spanner!).

When two perfectly complementary, rigid bodies come together, they first make a good "fit". This is the equivalent of placing the correct "spanner" on a "nut" (as shown above LHS), or a "Phillips" screwdriver into a suitable screw (such screws were invented by Henry F. Phillips over 80 years ago in Oregon and remain the most popular today). However, the nice fit must be stabilised in order for the interaction to be "productive". So the plumber or carpenter must push the screwdriver into the screw head, or hold the spanner perpendicular to the nut (usually), before applying the force needed to turn it. This is the lock and key approach. As an amateur craftsman I have in my toolbox fixed width spanners and a selection of screwdrivers to match the screw sizes. I also have an adjustable spanner (above, RHS) to deal with nuts that vary in diameter. The adjustable spanner has a mechanism for "fine tuning" the width of the spanner to fit a selection of nuts. This is an induced fit spanner! The latter "design" of spanner allows for an "evolutionary" change in the diameter of the nut or, in the case of an enzyme, the availability of lactose versus glucose. Sometimes we encode an enzyme for every substrate and sometimes we have enzymes that can accommodate substrates of varying dimensions. I am thinking here of short, medium and long chain acyl CoA dehydrogenases, for example.

It is less intuitive to appreciate how two flexible surfaces can unite and organise rigidly (which is what enzyme active sites have to do most of the time). But think of a mixture of oil and water: although the two liquids are "mobile", the interface is very sharp. Or think of pushing your hand into a soft glove. The other way I think of this challenging phenomenon is to consider protein folding during translation. As a polypeptide chain emerges from the ribosome it must find and adopt its tertiary structure pretty quickly (a whole generation, from birth to division, takes less than 30 minutes in E. coli!). So a disordered region meeting a ligand or another protein surface will harness the forces of entropy and enthalpy in the same way as the folding polypeptide. The scaffold in antibodies and adhirons serves to facilitate this, and experimental observations have shown that this strategy is really successful in other settings, including RNA binding and sequence-specific DNA recognition. It is this approach to molecular recognition that is adopted by adhirons, and it translates into dissociation constants between adhirons and their targets that are often in the tens of picomolar range. [As a rule of thumb, you can estimate a dissociation constant (recall: the lower the molarity, the higher the affinity) from the average physiological concentration of the ligand: substrates for enzymes have Kds of a few mM, coenzymes are found at tens of micromolar, and individual proteins and nucleic acids have Kds in the nM range.]
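The rule of thumb follows from the simple single-site binding equation, in which the fraction of sites occupied is θ = [L]/(Kd + [L]): when the ligand concentration equals Kd, half the sites are filled, so binding responds to changes in concentration around that level. Here is a minimal Python sketch, assuming no ligand depletion (free ligand roughly equal to total ligand) and using illustrative concentrations rather than measured ones.

```python
def fraction_bound(ligand_molar, kd_molar):
    """Fraction of binding sites occupied at equilibrium (single-site model).

    theta = [L] / (Kd + [L]); assumes the free ligand concentration is
    approximately the total (no depletion). Both arguments in mol/L.
    """
    return ligand_molar / (kd_molar + ligand_molar)


# Illustrative (hypothetical) values in the ranges quoted in the text.
examples = [
    ("substrate at its Kd (2 mM, Kd 2 mM)", 2e-3, 2e-3),
    ("protein partner (10 nM, Kd 1 nM)", 10e-9, 1e-9),
    ("adhiron target (1 nM, Kd 50 pM)", 1e-9, 50e-12),
]
for label, conc, kd in examples:
    print(f"{label}: {fraction_bound(conc, kd):.2f} of sites occupied")
```

The three cases give roughly 0.50, 0.91 and 0.95: a binder with a picomolar Kd is already close to saturation at nanomolar concentrations.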

The variation in antibody specificity arises from the expression of a diverse set of variable regions that are linked to the constant domain (I won't discuss the mechanism here, but it is something you should read about). Adhirons that specifically bind to many biological targets (although I expect not all with the same affinity) can be generated in vitro by randomising the loop regions that form the "spanner in the works" when cystatins block cysteine proteases. This can be achieved experimentally using an error-prone DNA polymerase, or by replacing the loop-coding regions of the gene with synthetic, randomised DNA. The resulting "library" of mutants can then be screened against the target: a favourite technique for this is known as phage display, and you can read the details here. In this way, high affinity, highly selective adhirons can be generated to recognise target molecules with all of the advantages of a monoclonal antibody, but at a much lower cost (in view of the relatively simple methodology). The combination of a flexible interaction surface supported by a stable and relatively invariant scaffold domain represents an interesting evolutionary concept that Nature has exploited on several occasions. (One such scaffold and loop structure, part of thioredoxin, a widely distributed redox protein, is shown top left: the loop region is shown in blue.) Adhirons are not in the mainstream of molecular recognition reagents just yet, and the technologies for detection etc. that have grown up around antibodies are certainly not about to become obsolete, but these molecules certainly seem to me to offer some very significant opportunities for research and diagnostics in the future.
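Randomising loops creates enormous sequence diversity very quickly, which is why library size matters for in vitro selection. The sketch below assumes every randomised position can be any of the 20 amino acids with equal probability (real codon schemes are not perfectly uniform); the library size of 10^10 clones and the numbers of randomised positions are hypothetical round numbers. The fraction of possible sequences actually present is estimated with the standard sampling approximation 1 - exp(-N/V), for N clones and V possible variants.

```python
import math

AMINO_ACIDS = 20  # assumes all 20 residues are equally likely at each position


def theoretical_diversity(randomised_positions):
    """Number of distinct sequences when each position can be any amino acid."""
    return AMINO_ACIDS ** randomised_positions


def fraction_sampled(library_size, diversity):
    """Expected fraction of all possible sequences present at least once.

    Treats clones as uniform random draws with replacement, so a given
    sequence is missing with probability ~exp(-N/V); expm1 keeps precision
    when N/V is tiny.
    """
    return -math.expm1(-library_size / diversity)


LIBRARY_SIZE = 1e10  # hypothetical number of independent clones

for positions in (4, 6, 9, 18):
    v = theoretical_diversity(positions)
    print(f"{positions:2d} randomised positions: {v:.2e} possible variants, "
          f"{fraction_sampled(LIBRARY_SIZE, v):.2g} of them sampled")
```

Short loops are covered completely, but nine positions are only about 2% sampled and eighteen a vanishing fraction. Selection does not need every sequence, only enough good binders in the library, which is why sampling a tiny corner of sequence space can still work.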

Summary of Key Points

First of all, I have picked a group of proteins (the adhirons) that have been derived by exposing a naturally occurring protease inhibitor to mutation in vitro. This results in a "library" of small proteins from which variants that make very strong interactions with the target molecule can be selected. This feature is very similar to the highly specific recognition associated with antibodies. Adhirons make these very strong interactions with other proteins through a set of small polypeptide loops that have no rigid structure in solution, so it is perhaps surprising that such flexible structures are so good at it. Adhirons, like antibodies, comprise a stable scaffold upon which a set of flexible loops can specify strong intermolecular recognition. In fact we are beginning to appreciate that the strong interactions that stabilise substrates in the active sites of proteins seem to be less rigid (i.e. less lock and key and more induced fit) than we anticipated. Finally, this arrangement of molecules and cells, with one part changing and one part constant, seems to have been used on a number of occasions in Biology. The changeable region brings diversity of function and the constant region brings economy of function. Perhaps this could be discussed in class?