Bioinformatics
Bioinformatics is the use of mathematical and informational techniques to
solve biological problems, usually by creating or using computer programs,
mathematical models or both. One of the main applications of bioinformatics
is the mining and analysis of the data gathered in genome projects.
Other applications are sequence alignment, protein structure prediction,
the modeling of metabolic networks, morphometrics and virtual evolution.
Computer scripting languages such as Perl and Python are often used to
interface with biological databases and parse output from bioinformatics
programs. Communities of bioinformatics programmers have set up free,
open-source projects such as BioPerl, BioRuby and Biopython, which develop
and distribute shared programming tools and objects (as program modules)
that make bioinformatics easier.
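As a minimal sketch of the kind of parsing script described above, the following Python reads sequence records in the FASTA text format using only the standard library. The choice of FASTA, the function name parse_fasta and the example records are assumptions made for illustration; they are not taken from any particular project, and toolkits such as Biopython provide more complete parsers.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {record_id: sequence} dictionary."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the identifier is the first word after '>'.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence line: append it to the record currently being read.
            records[current_id] += line.upper()
    return records


# Hypothetical example input, not real database records.
example = """>seq1 hypothetical example record
ACGTAC
GTTA
>seq2
ggcc
"""
sequences = parse_fasta(example.splitlines())
for name, seq in sequences.items():
    print(name, len(seq), seq)
```

Returning a plain dictionary keeps the sketch short; a script for large files would more likely yield one record at a time instead of holding everything in memory.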
Since the Epstein-Barr virus was sequenced in 1984, the DNA sequences of more
and more organisms have been stored in electronic databases. These data are
analyzed to identify genes that code for proteins, as well as regulatory
sequences.
A comparison of genes within a species or between different species can show
similarities between protein functions, or relations between species
(phylogenetic trees). With the growing amount of data, it has become
impractical to analyze DNA sequences manually. Today, computer programs are
used to find similar sequences in the genomes of dozens of organisms, within
billions of nucleotides. These programs can compensate for mutations (exchanged, deleted
or inserted bases) in the DNA sequence. A variant of this sequence alignment
is used in the sequencing process itself. Shotgun sequencing (which was
used, for example, by Celera Genomics to sequence the human genome) does not
yield a sequential list of nucleotides, but instead the sequences of
thousands of small DNA fragments (each about 600 nucleotides long). The ends
of these fragments overlap and, when aligned correctly, make up the complete
genome. Shotgun sequencing is very fast, but the task of reassembling the
fragments is quite complicated. In the case of the
Human Genome Project (launched in 1990, with a working draft announced in 2000), it took several months on a supercomputer
array to align them correctly.
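The overlap-and-merge idea behind fragment assembly can be illustrated with a small greedy sketch in Python. It assumes error-free fragments read from a single strand and exact overlaps of at least min_len bases, which real assemblers cannot assume; the function names and the toy fragments are hypothetical.

```python
from typing import List


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_len bases exists.
    """
    max_len = min(len(a), len(b))
    for k in range(max_len, min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: List[str], min_len: int = 3) -> List[str]:
    """Repeatedly merge the pair of fragments with the longest exact overlap."""
    reads = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return reads  # nothing left to merge: these are the contigs
        # Join the two reads, writing the shared bases only once.
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)


# Toy fragments (hypothetical), given out of order.
fragments = ["GTTAGC", "ATGCGTAC", "GTACGTTA"]
contigs = greedy_assemble(fragments)
print(contigs)
```

Comparing every pair of fragments in every round is far too slow for real data, and a greedy choice can be misled by repeated sequences, which is one reason the actual reassembly task is as demanding as the text describes.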
Protein structure prediction is another important application of
bioinformatics. The amino acid sequence of a protein, the so-called primary
structure, can be easily determined from the sequence of the gene that codes
for it. However, the protein can only function correctly if it is folded in
a very specific way (that is, if it has the correct secondary, tertiary and
quaternary structure). Predicting this folding from the amino acid sequence
alone is quite difficult. Several methods for
computer predictions of protein folding are currently (2001) under
development.
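Deriving the primary structure from a gene sequence can be sketched in a few lines of Python. The sketch assumes a contiguous coding sequence (no introns), reading frame 0 and the standard genetic code; the function name translate and the example sequences are illustrative only. Predicting how the resulting chain folds is the hard part and is not attempted here.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence into one-letter amino acid codes.

    Translation starts at position 0 and stops at the first stop codon.
    Codons containing unknown bases are written as 'X'.
    """
    seq = dna.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*":  # stop codon
            break
        protein.append(aa)
    return "".join(protein)


print(translate("ATGAAAGGGTTTTGA"))
```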
One of the key principles in bioinformatics is homology. In the genomic
branch of bioinformatics, homology is used to predict the function of a
gene. If gene A is homologous to gene B, whose function is known, gene A
is likely to have a similar function. In the structural branch of
bioinformatics, homology is used to determine which parts of the protein are
important in structure formation and interaction with other proteins. In a
technique called homology modelling, this information is used to predict the
structure of a protein once the structure of a homologous protein is known.
Despite many attempts, this is currently the only way to predict protein
structures with some reliability.
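The function-prediction use of homology can be sketched in Python as an annotation transfer: score the query gene against genes of known function and report the best match. The sketch uses a Needleman-Wunsch global alignment score as a stand-in for homology, which is an assumption (high similarity suggests homology but does not prove it). The scoring values, gene names, sequences and functions are all hypothetical.

```python
from typing import Dict, Tuple


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Needleman-Wunsch global alignment score, keeping two rows of the table."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Best of: align the two bases, gap in b, gap in a.
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def predict_function(query: str,
                     annotated: Dict[str, Tuple[str, str]]) -> Tuple[str, str, int]:
    """Return (gene name, function, score) for the best-scoring annotated gene."""
    scores = {
        name: global_alignment_score(query, seq)
        for name, (seq, _function) in annotated.items()
    }
    best = max(scores, key=scores.get)
    return best, annotated[best][1], scores[best]


# Hypothetical annotated genes: name -> (sequence, known function).
known = {
    "geneB": ("ATGGCGTACGTTAGC", "hypothetical kinase"),
    "geneC": ("ATGTTTCCCGGGAAA", "hypothetical transporter"),
}
query = "ATGGCGTACGTAGC"  # geneB with one base deleted
print(predict_function(query, known))
```

A practical version would also require the best score to exceed some threshold before transferring the annotation, since the best match in a small database may still be unrelated.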
There are many other applications of bioinformatics. Computer simulations of
cellular subsystems, such as the networks of metabolites and enzymes that
make up metabolism, signal transduction pathways and gene networks, can help
to both analyze and visualize the complex connections within these cellular
processes. Morphometrics is used to analyze pictures of
embryos to track and to predict the fate of cell clusters during
morphogenesis. Artificial life or virtual evolution attempts to understand
evolutionary processes via the computer simulation of simple (artificial)
life forms. Another application is the automatic search for genes and
regulatory sequences within a genome. Not all of the nucleotides within a
genome belong to genes. Within the genomes of higher organisms, large parts
of the DNA, often called junk DNA, do not serve any obvious purpose. Bioinformatics
helps to bridge the gap between genome and proteome projects, for example in
the use of DNA sequence for protein identification.
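One simple form of automatic gene search is scanning for open reading frames (ORFs). The Python sketch below assumes that a candidate gene starts with ATG and ends at the first in-frame stop codon, and it looks only at the forward strand; real gene finders use far richer signals. The function name find_orfs, the min_codons threshold and the example sequence are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3) -> List[Tuple[int, int, str]]:
    """Return (start, end, sequence) for ORFs on the forward strand.

    An ORF here runs from an ATG to the first in-frame stop codon, inclusive.
    min_codons counts every codon in the ORF, including start and stop.
    Positions are 0-based and end is exclusive.
    """
    seq = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, seq[start:end]))
                start = None  # close it and keep scanning this frame
    return sorted(orfs)


# Toy sequence (hypothetical) containing two short ORFs in different frames.
genome = "CCATGAAATTTTAGGGATGCCCTGA"
for start, end, orf in find_orfs(genome):
    print(start, end, orf)
```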
In summary, the genome projects gave us long lists of letters; with
bioinformatics, we can determine the words, grammar, sentences and, finally,
their meaning.
This content from Wikipedia is licensed under the GNU Free Documentation License.