Predicting Protein Function from DNA Sequences

Identifying the function of a protein from its sequence alone is possible using a genetics-based approach. The underlying principle is coarse-grained: it involves detecting the presence of mRNA encoding a protein of unknown function.

The temporal expression of RNA

The expression pattern of a single gene or multiple genes can be investigated using a technique called a microarray. In this technique, nucleic acid molecules of known sequence (typically DNA oligonucleotides or cDNA) are ‘spotted’ onto a solid surface. These act as probes capable of binding RNA, or cDNA copied from it, that carries the complementary sequence.

If the complementary sequence is present in the sample that is applied to the microarray, hybridization of the probe and the labelled sample will produce a signal. This is commonly a fluorescent signal, which reports the binding event and can be measured at each spot.

Cells change their gene expression profiles in response to external and internal changes. This allows RNA to be compared before and after a cellular event, such as heat stress or oxygen depletion. A microarray will reveal any change in the gene expression profile of the cell in response to the event.

Genes that demonstrate a significant change in expression (seen as a change in the fluorescence intensity of the hybridized spot), or those that are expressed exclusively in the sample before or after stress, can be verified by another molecular technique, called northern blot analysis.
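The before-and-after comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a real analysis pipeline: the gene names, intensity values, pseudocount and two-fold threshold are all assumptions chosen for the example, and real microarray data would also need normalization and statistical testing.

```python
import math

# Hypothetical fluorescence intensities (arbitrary units) for each probe,
# measured before and after a stress such as heat shock.
unstressed = {"geneA": 200.0, "geneB": 1500.0, "geneC": 50.0, "geneD": 800.0}
stressed = {"geneA": 1600.0, "geneB": 1450.0, "geneC": 0.0, "geneD": 100.0}


def log2_fold_changes(before, after, pseudocount=1.0):
    """Return log2(after / before) per gene; the pseudocount avoids log(0)."""
    return {
        gene: math.log2((after[gene] + pseudocount) / (before[gene] + pseudocount))
        for gene in before
    }


def changed_genes(before, after, threshold=1.0):
    """Genes whose absolute log2 fold change meets the threshold (2-fold by default)."""
    changes = log2_fold_changes(before, after)
    return sorted(gene for gene, lfc in changes.items() if abs(lfc) >= threshold)


candidates = changed_genes(unstressed, stressed)
print(candidates)
```

In this made-up data set, geneC is detected only before the stress, which corresponds to the "expressed exclusively" case in the text. Genes flagged this way would be the candidates carried forward for verification.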

Northern blotting

Northern blot analysis first involves separating the RNA present in the two samples (stressed and unstressed) by size. This is achieved by gel electrophoresis. Subsequently, the separated RNA molecules are transferred to a nitrocellulose membrane and immobilized. Labelled nucleic acid probes are then applied to the membrane.

The probe hybridizes wherever it finds a complementary sequence on the membrane. The signal from the label is then detected and quantified, for example by autoradiography or by fluorescence or chemiluminescence imaging. Sequencing of this RNA will reveal its identity and possible function.

Gene knockout studies

A further variation of this technique is to observe the outcome of removing a gene encoding a protein of unknown function. The change in function (or phenotype) of the cell can be used to deduce what role the protein may play. This can be carried out using gene knockout studies, in which the gene of interest is silenced.

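Silencing can be achieved by employing antisense RNA, a molecule that is complementary to the protein-encoding mRNA of interest. Binding of antisense RNA to the mRNA prevents its translation into the corresponding protein. A closely related process, in which short double-stranded RNAs direct the destruction of the matching mRNA, is called RNA interference (RNAi). Because these methods reduce expression rather than delete the gene, they are often described as gene knockdown.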
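The complementarity that antisense silencing relies on can be shown with a short Python sketch. The mRNA fragment below is made up for illustration, and the function name `antisense_rna` is a hypothetical helper; designing a working antisense molecule involves many considerations beyond base pairing.

```python
# Watson-Crick pairing rules for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def antisense_rna(mrna):
    """Return the antisense strand (reverse complement), written 5' to 3'."""
    return "".join(RNA_COMPLEMENT[base] for base in reversed(mrna.upper()))


target = "AUGGCCAUUGUA"  # made-up mRNA fragment, written 5' to 3'
antisense = antisense_rna(target)
print(antisense)  # UACAAUGGCCAU
```

Taking the reverse complement of the antisense strand returns the original mRNA fragment, which is a quick check that the pairing rules were applied consistently.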

Alternatively, antibodies that bind to the protein of interest can be employed. This prevents the protein from functioning in the cell, and the resultant phenotype can be observed. An example of a knockout study involves the biosynthetic pathway of heparan sulphate (HS), a glycosaminoglycan found on proteoglycans, which regulates many essential biological processes.

In mice, the genes encoding Ext1 or Ext2 (the enzymes that regulate the synthesis of heparan sulphate) can be knocked out, giving Ext1-/- or Ext2-/- animals, and the effects observed. These enzymes catalyze the addition of the first sugar (GlcNAc) to the four-sugar linker on HS. In the absence of either gene, the murine embryos fail to undergo gastrulation. This suggests that HS synthesis is essential for early embryonic development.

Two-hybrid screening

Further probing of function can be achieved by the process of two-hybrid screening. This technique tests for binding between two proteins; related variants test for binding of a single protein to DNA. It is particularly useful when the protein encoded by the unknown gene is thought to interact with a partner protein of known function. A successful interaction allows researchers to infer the function of the unknown protein.

Studying conserved sequences

Another example of a genetics-based approach is to compare the codons encoding the binding region of an unknown protein with those of proteins of known function.

If an amino acid is essential for the function of a protein, then there is selective pressure to maintain that amino acid (or group of amino acids) in the catalytic site. Examples of this pressure are seen in the ABC transporter family, where several conserved sequences of amino acids (motifs), such as the Walker A and B motifs, the Q- and H-loops, and the C-motif, have been retained.
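Searching a sequence for a conserved motif can be sketched with a regular expression. The example below assumes the widely cited Walker A consensus G-x(4)-G-K-[S/T], where x is any residue; the protein fragment is made up, and `find_walker_a` is a hypothetical helper name. A match only suggests a possible nucleotide-binding site and is not proof of function.

```python
import re

# Walker A (P-loop) consensus: G, any four residues, G, K, then S or T.
WALKER_A = re.compile(r"G.{4}GK[ST]")


def find_walker_a(protein):
    """Return (1-based start position, matched text) for each non-overlapping match."""
    return [(m.start() + 1, m.group()) for m in WALKER_A.finditer(protein.upper())]


# Made-up protein fragment used only for illustration.
query = "MKLTVGPSGSGKSTLLRAIAG"
hits = find_walker_a(query)
print(hits)  # [(6, 'GPSGSGKS')]
```

A consensus pattern like this is deliberately loose, so short matches can occur by chance. In practice, motif hits are usually weighed together with other evidence, such as the presence of the other conserved motifs listed above.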

An example of conservation of specific amino acids is the aromatic amino acids in the A-loop, which interact with the adenine ring of ATP and are essential for its binding. In the case of catalytically essential amino acids, the ‘catalytic triad’ present in the active site of some hydrolase and transferase enzymes (e.g. proteases and amidases) is highly conserved. Its recurrence in these enzymes illustrates the process of convergent evolution.

Convergent evolution describes the process by which two unrelated organisms or molecules independently evolve similar characteristics. In this context, the hydrolases and transferases employ similar chemistry, and so they use similar amino acids. Therefore, despite protease enzymes such as papain (a cysteine protease) and chymotrypsin (a serine protease) being evolutionarily unrelated, a catalytic triad is found in both.

The individual amino acids of the triad are not always identical, but are similar in property. Triads typically contain an acidic residue, such as Glu or Asp, a base, such as His, and a nucleophile, such as Ser, Thr or Cys.
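The acid, base and nucleophile scheme can be expressed as a small lookup in Python. This is only a sketch of the classification given in the text: the role table and the helper names `role_of` and `is_canonical_triad` are assumptions made for illustration. The two test cases use the well-known triads of chymotrypsin (Ser, His, Asp) and papain (Cys, His, Asn).

```python
# Roles taken from the text: acid (Glu/Asp), base (His), nucleophile (Ser/Thr/Cys).
ROLES = {
    "acid": {"D", "E"},
    "base": {"H"},
    "nucleophile": {"S", "T", "C"},
}


def role_of(residue):
    """Return the role of a one-letter residue code, or None if it fits no role."""
    for role, members in ROLES.items():
        if residue in members:
            return role
    return None


def is_canonical_triad(residues):
    """True if the residues supply exactly one acid, one base and one nucleophile."""
    roles = [role_of(res) for res in residues.upper()]
    return sorted(str(role) for role in roles) == sorted(ROLES)


print(is_canonical_triad("SHD"))  # chymotrypsin: Ser, His, Asp
print(is_canonical_triad("CHN"))  # papain: Cys, His, Asn
```

The papain triad does not pass this strict check because its third residue is Asn rather than an acidic residue. This shows why the scheme is best read as describing similar properties rather than a fixed set of residues.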

Mutations of structurally important amino acids can also be used to infer function. It is assumed that if a mutation or group of mutations is destabilizing, a compensatory mutation will occur elsewhere in the protein in order to retain its structure and function. This occurs commonly in antibiotic proteins produced by bacteria.

Sources

  • https://www.ncbi.nlm.nih.gov/books/NBK6301/
  • http://arep.med.harvard.edu/johnson/predict/protein_main.html
  • https://www.ncbi.nlm.nih.gov/pubmed/12406214
  • https://www.sciencedirect.com/science/article/pii/S2001037015000070

Last Updated: Jan 4, 2019

Written by

Hidaya Aliouche

Hidaya is a science communications enthusiast who has recently graduated and is embarking on a career in science and medical copywriting. She has a B.Sc. in Biochemistry from The University of Manchester. She is passionate about writing and is particularly interested in microbiology, immunology, and biochemistry.
