DNA and Evolution

Bacterial DNA
Mutation and Evolution
• DNA carries the genetic instructions forward in time. You might consider life from DNA’s point of view: it goes through many generations of individual organisms, slowly changing, occasionally having a major shift, and sometimes dying out.
• DNA is affected by two forces: changes caused by random mutations of several kinds, and natural selection. To paraphrase a well-known quote*: random mutation proposes new DNA, and natural selection disposes of unsuccessful ideas.
* Ludovico Ariosto (Italian poet, 1474-1533): “Man proposes and God disposes.”
Evolution by Natural Selection
• The basic principle of natural selection is very simple: an organism that survives and reproduces better than other organisms increases its share of the next generation.
– This is especially clear from the DNA point of view: there will be more copies of DNA that causes its host organisms to produce more offspring.
– It is important to realize that this is a long-term thing: evolutionary success requires each generation to produce more offspring (on average). More concretely: if you have 10 children and none of them reproduce, your evolutionary fitness is zero, because your DNA stops moving forward in time with your children.
– Evolutionary fitness: the ability to survive and reproduce.
• Malthusian principle: there’s not enough room for all descendants. Resources are always limited, which implies that some individuals die without reproducing. Survival is affected by how well an individual is adapted to the environment, that is, how well the individual’s genes match the needs and challenges imposed by the environment it lives in.
– Adaptations are specific traits that help the organism function in its environment.
– What works well under one set of conditions may not work when conditions change.
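As a rough illustration of the "share of the next generation" idea, the sketch below tracks the frequency of a DNA variant whose carriers leave slightly more offspring than everyone else. The function names, the 5% reproductive advantage, and the 1% starting share are assumptions chosen for illustration; none of them come from the slides.

```python
def next_share(share, fitness_variant, fitness_others):
    """Share of the variant in the next generation under simple haploid selection."""
    variant = share * fitness_variant
    others = (1.0 - share) * fitness_others
    return variant / (variant + others)


def shares_over_time(start_share, fitness_variant, fitness_others, generations):
    """Return the variant's share at generation 0, 1, ..., generations."""
    shares = [start_share]
    for _ in range(generations):
        shares.append(next_share(shares[-1], fitness_variant, fitness_others))
    return shares


if __name__ == "__main__":
    # Hypothetical numbers: the variant starts at 1% and leaves 5% more offspring.
    history = shares_over_time(0.01, 1.05, 1.00, 200)
    for generation in (0, 50, 100, 200):
        print(generation, round(history[generation], 3))
```

Even a small advantage compounds over many generations, while a variant whose carriers leave no offspring (fitness 0) drops to a share of zero in one step.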
Neutral Mutations
• Many mutations have very little or no effect on the organism:
– mutations in intergenic regions
– synonymous mutations: affect the gene’s DNA but not the amino acid sequence
– amino acid changes in non-critical regions of the protein
• Neutral mutations: do not affect the evolutionary fitness of the organism.
– Many genetic polymorphisms (small differences between different strains of the same species) are (probably) neutral. Polymorphisms are very useful for strain identification and for genetic mapping.
• The concept of pre-adaptive mutations: DNA variations that are neutral, or have only a small effect on fitness under “normal” conditions, suddenly become very useful when conditions change.
• Mutations with a strong negative effect on fitness are quickly weeded out: the organism can’t survive or reproduce. But neutral (or “nearly neutral”) mutations can stay in a population for a long time.
Evolutionary Conservation
• Most of gene annotation is a matter of evolutionary reasoning:
– Gene A in a new species probably has the same function as the similar gene B in another species, because both species need to solve the same problem and they are related by evolutionary descent from a common ancestor.
– We look at gene sequences and other features and make a decision that the differences we see are not important, and therefore assign the same function to both genes.
• Basic principle: what is being selected is function, that is, how well the genes work in the organism as it lives its life. DNA sequences are conserved to the degree that changes in them affect function. Most function is based on how well enzymes and other proteins do their job.
– Protein sequence is more conserved than DNA sequence. Thus most of our sequence homology searches are conducted with protein sequences.
– Three-dimensional shape, the key to enzyme function, is conserved better than protein sequence. It is quite possible to produce the same structure with completely different amino acids. Unfortunately, it is very difficult to search 3-D structures, mainly because there is no good way to determine how an amino acid sequence will fold up. This is the “protein folding problem”, one of the major unsolved problems in bioinformatics.
More on Conservation
• Genes are more conserved than intergenic regions. Being very loose here, a “gene” can be considered any part of the DNA that is transcribed.
– However, there are some functional regions in DNA that are not transcribed but which are conserved: the origin of replication and gene control regions, for example.
• Protein-coding portions of genes are conserved more than untranslated regions.
• The middles of proteins are conserved more than the ends. It can be hard to pinpoint the translation start of a gene because it is not well conserved between species.
• The amino acids that make up the active site of the enzyme are the most conserved of all, often being identical across large evolutionary distances.
Mutations
• Any change in the DNA sequence of an organism is a mutation.
• Mutations are the source of the altered versions of genes that provide the raw material for evolution.
• A central tenet of biology is that the flow of information from DNA to protein is one way. DNA cannot be altered in a directed way by changing the environment. Only random DNA changes occur.
• Some terminology: the genotype is the organism’s genetic constitution: at bottom, the sequence of its DNA. The phenotype is the physical characteristics of the organism: its appearance, biochemistry, reactions to the environment, etc.
– Before DNA sequencing, the genotype was deduced from the phenotypes of parents and offspring.
– The point of genome annotation is to deduce the phenotype that will result from a given genotype.
• Most mutations have no effect on the organism, especially among the eukaryotes, because a large portion of the DNA is not in genes and thus does not affect the organism’s phenotype.
• Of the mutations that do affect the phenotype, the most common effect is lethality, because most genes are necessary for life.
Base Change Mutations
• The simplest mutations are base changes, where one base is converted to another. (Also called “substitutions” or “point mutations”.) These can be classified as either:
– “transitions”, where one purine is changed to another purine (A -> G, for example), or one pyrimidine is changed to another pyrimidine (T -> C, for example).
– “transversions”, where a purine is substituted for a pyrimidine, or a pyrimidine is substituted for a purine. For example, A -> C.
• Transitions are more common than transversions, because they are easier to create, and because transitions often have less drastic effects than transversions.
• Base change mutations are the cause of single nucleotide polymorphisms (SNPs). Mapping SNPs is the current best way to locate human disease genes.
• Base change mutations are the most common mutations, and they are the easiest to handle for statistics and evolutionary studies.

Base substitution table (each row sums to 1):
      A     C     G     T
A    0.6   0.1   0.2   0.1
C    0.1   0.6   0.1   0.2
G    0.2   0.1   0.6   0.1
T    0.1   0.2   0.1   0.6
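The transition/transversion distinction can be sketched in a few lines of Python. The table values are copied from the slide; reading them as substitution probabilities between bases is an assumption (the table is symmetric, so the row/column orientation does not matter), and the function names are made up for this example.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Values copied from the slide's table; interpreting them as
# per-base substitution probabilities is an assumption.
SUBSTITUTION = {
    "A": {"A": 0.6, "C": 0.1, "G": 0.2, "T": 0.1},
    "C": {"A": 0.1, "C": 0.6, "G": 0.1, "T": 0.2},
    "G": {"A": 0.2, "C": 0.1, "G": 0.6, "T": 0.1},
    "T": {"A": 0.1, "C": 0.2, "G": 0.1, "T": 0.6},
}


def classify_base_change(old, new):
    """Label a single base change as a transition or a transversion."""
    if old == new:
        return "no change"
    pair = {old, new}
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"      # purine <-> purine or pyrimidine <-> pyrimidine
    return "transversion"        # purine <-> pyrimidine


def change_totals(base):
    """Sum the table values for transitions and transversions away from `base`."""
    totals = {"transition": 0.0, "transversion": 0.0}
    for new, value in SUBSTITUTION[base].items():
        kind = classify_base_change(base, new)
        if kind != "no change":
            totals[kind] += value
    return totals


if __name__ == "__main__":
    for old, new in (("A", "G"), ("T", "C"), ("A", "C")):
        print(old, "->", new, classify_base_change(old, new))
    print(change_totals("A"))
```

In this table each base has one possible transition (value 0.2) and two possible transversions (0.1 each), so any single transition is twice as likely as any single transversion.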
Base Change Causes
• Base changes occur naturally as errors in replication: the wrong base gets inserted.
– DNA polymerase has an editing function that detects most errors, then backs up, removes the wrong base and puts in the proper base.
– Enzymes that replicate RNA don’t have the editing function, so their error rate is 100 times that of DNA polymerase, causing the high mutation rate of RNA viruses.
• Various chemical changes in a base can cause mutation. For instance, the spontaneous loss of the amino group on cytosine converts it to uracil (which will pair with A, not G).
• Environmental chemicals that attach bulky groups onto bases (alkylating agents) can cause the bases to be misread by DNA polymerase.
Phenotypic Effects of Base Changes
• Mutations can be classified according to their effects on the protein (or mRNA) produced by the gene that is mutated.
1. Silent mutations (synonymous mutations). Since the genetic code is degenerate, several codons produce the same amino acid. In particular, third-base changes often have no effect on the amino acid sequence of the protein. These mutations affect the DNA but not the protein. They are therefore usually neutral mutations, which should have no effect on the organism’s phenotype.
2. Missense mutations. Missense mutations substitute one amino acid for another. Some missense mutations have very large effects, while others have minimal or no effect. It depends on where the mutation occurs in the protein’s structure, and how big a change in the type of amino acid it is.
3. Nonsense mutations convert an amino acid codon into a stop codon. The effect is to shorten the resulting protein. Sometimes this has only a little effect, as the ends of proteins are often relatively unimportant to function. However, nonsense mutations often result in completely non-functional proteins.
4. Sense mutations are the opposite of nonsense mutations. Here, a stop codon is converted into an amino acid codon. Since DNA outside of protein-coding regions contains an average of 3 stop codons per 64 codons, the translation process usually stops after producing a slightly longer protein.
• Base changes can also affect RNA initiation, splicing and termination.
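The four categories above can be sketched as a lookup against the standard genetic code. This is a minimal illustration, not an annotation tool: the function name is made up for the example, codons are written as DNA (T rather than U), and a change from one stop codon to another is reported as "silent" by assumption.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(old_codon, new_codon):
    """Classify a base change by its effect on the encoded amino acid."""
    old_aa = CODON_TABLE[old_codon]
    new_aa = CODON_TABLE[new_codon]
    if old_aa == new_aa:
        return "silent"      # same amino acid (or stop stays stop)
    if new_aa == "*":
        return "nonsense"    # amino acid codon -> stop codon
    if old_aa == "*":
        return "sense"       # stop codon -> amino acid codon
    return "missense"        # one amino acid -> a different amino acid


if __name__ == "__main__":
    examples = [("GAA", "GAG"), ("GAA", "GTA"), ("TGG", "TGA"), ("TAA", "CAA")]
    for old, new in examples:
        print(old, "->", new, classify_codon_change(old, new))
    stops = sum(1 for aa in CODON_TABLE.values() if aa == "*")
    print("stop codons:", stops, "of", len(CODON_TABLE))
```

The last line of the demo counts the stop codons in the table, which is where the "3 stop codons per 64" figure comes from.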
More on Substitution
• In addition to synonymous mutations, some amino acid changes are “conservative” in that they have little or no effect on the protein’s function.
– For example, isoleucine and valine are both hydrophobic and readily substitute for each other.
– Other amino acid substitutions are very unlikely: leucine (hydrophobic) for aspartic acid (hydrophilic and charged). This would be a non-conservative substitution.
– Some amino acids play unique roles: cysteines form disulfide bridges, prolines induce kinks in the chain, etc.
– However, some amino acids are critical for active sites and cannot be substituted.
• Tables of substitution frequencies for all pairs of amino acids have been generated.
[Figure: BLOSUM62 table. Numbers on the diagonal indicate the likelihood of the amino acid staying the same. The off-diagonal numbers are relative substitution frequencies.]
Indels
• Another simple type of mutation is the gain or loss of one or a few bases. These mutations are called indels, which is short for “insertion/deletion”.
– When comparing two species it isn’t easy to tell whether an insertion occurred in one species or a deletion occurred in the other.
• Indels are thought to be generated when the DNA polymerase slips forward or backward on the template DNA it is copying.
– This occurs most easily in repeated sequences, but can occur anywhere.
• A second cause of short indels is chemical- or radiation-induced loss of the base portion of the nucleotide. The DNA polymerase often skips right over these sugar/phosphate stumps, leaving a missing base in the resulting DNA chain.
Frameshifts and Reversions
• Translation occurs codon by codon, examining nucleotides in groups of 3. If a nucleotide or two is added or removed, the grouping of the codons is altered. This is a frameshift mutation, where the reading frame of the ribosome is altered.
• Frameshift mutations result in all amino acids downstream from the mutation site being completely different from wild type. These proteins are generally non-functional.
• A reversion is a second mutation that reverses the effects of an initial mutation, bringing the phenotype back to wild type (or almost).
– Frameshift mutations sometimes have “second site reversions”, where a second frameshift downstream from the first frameshift reverses the effect.
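A frameshift and a second-site reversion can be demonstrated on a toy gene. The sketch below is only an illustration: the 24-base sequence, the positions of the inserted and deleted bases, and the helper names are all invented for this example.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position; "*" marks a stop.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(
        product("TCAG", repeat=3),
        "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
    )
}


def translate(dna):
    """Read codon by codon from position 0 and stop at the first stop codon."""
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def insert_base(dna, position, base):
    """Return `dna` with `base` inserted before index `position`."""
    return dna[:position] + base + dna[position:]


def delete_base(dna, position):
    """Return `dna` with the base at index `position` removed."""
    return dna[:position] + dna[position + 1:]


# Hypothetical wild-type gene: ATG GCT GAA CGT CTG AAA GGT TAA
WILD_TYPE = "ATGGCTGAACGTCTGAAAGGTTAA"
FRAMESHIFT = insert_base(WILD_TYPE, 6, "C")   # +1 base after the second codon
REVERTANT = delete_base(FRAMESHIFT, 12)       # -1 base further downstream

if __name__ == "__main__":
    for name, dna in (("wild type", WILD_TYPE),
                      ("frameshift", FRAMESHIFT),
                      ("revertant", REVERTANT)):
        print(f"{name:10s} {translate(dna)}")
```

In the revertant only the amino acids between the two mutation sites differ from wild type; everything downstream of the second site is back in the original reading frame, which is why the phenotype can return to wild type "or almost".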
DNA Replication
• How DNA makes copies of itself.
• Involves an enzyme: DNA polymerase.
• In bacteria, replication starts at a single point, the origin of replication (ori), and proceeds in both directions around the circle, meeting on the opposite side.
• The DNA double helix unwinds into 2 separate strands, and a new strand is built on each old one. Thus, each new DNA molecule consists of 1 old strand plus 1 new strand. This is called “semi-conservative” replication.
• DNA polymerase makes the new strands, using the old strands as a template, with normal base pairing: A with T, and G with C.
• The energy for this comes from the nucleotide precursors. They all have 3 phosphates on them, like ATP, and 2 of the phosphates are removed to make the DNA.
• DNA polymerase always adds new bases to the 3’ end of the new DNA strand. This makes it necessary to synthesize one strand (the lagging strand) in short pieces, then join them together.
– The pieces are called Okazaki fragments. They are synthesized starting with RNA primers that are degraded and replaced with DNA in the final product.
– Joining DNA pieces is done with DNA ligase.
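Template-directed synthesis and semi-conservative replication can be sketched with strings. This is a deliberately simplified model (no origin, primers, or Okazaki fragments); the function names and the convention of writing both strands 5'->3' are assumptions made for the example.

```python
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def new_strand(template):
    """Build the strand that pairs with `template`.

    Both strands are written 5'->3', so the new strand is the
    reverse complement of the template.
    """
    return "".join(PAIRING[base] for base in reversed(template))


def replicate(duplex):
    """Semi-conservative replication of a (top, bottom) duplex.

    Each daughter duplex keeps one old strand and gains one newly made strand.
    """
    top, bottom = duplex
    daughter_1 = (top, new_strand(top))        # old top + new bottom
    daughter_2 = (new_strand(bottom), bottom)  # new top + old bottom
    return daughter_1, daughter_2


if __name__ == "__main__":
    parent = ("ATGGCA", new_strand("ATGGCA"))
    for daughter in replicate(parent):
        print(daughter)
```

Both daughters carry the same sequence as the parent, but each contains exactly one of the parent's original strands.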
Recombination
• Recombination is the breaking and rejoining of 2 DNA molecules, usually at homologous regions (= sequence is the same).
– Also called crossing-over.
– You end up with a DNA molecule that has parts from 2 parental molecules.
• DNA metabolism in all organisms includes enzymes that catalyze recombination.
– Recombination seems to be essential to long-term survival: you can remove bad mutations, and you can combine several good ones together in the same organism.
• In bacteria, DNA must be circular to replicate.
– If a linear piece of DNA recombines with the circular chromosome, there must be 2 crossovers to exchange a part of the DNA and keep the chromosome circular.
– If 2 circles recombine, the result is a single larger circle. The smaller circle has become integrated into the larger circle.
Sources of New DNA
• Bacteria reproduce by binary fission: replicating their DNA, then splitting in half. Each cell has only 1 parent, and there is no regular sexual process.
• Horizontal gene transfer, bringing in DNA from another species, is quite common: estimated 15% of genes.
• Bacteria have 3 main ways of bringing in new DNA:
– conjugation: direct transfer of DNA between 2 cells (although not necessarily of the same species)
– transduction: transfer of DNA between cells using a bacteriophage (virus) as an intermediate
– transformation: the cell takes up DNA molecules from the environment
Mutation Caused by Recombination
• Most recombination simply breaks and reattaches DNA sequences from 2 parents without changing them.
– However, one possible outcome of the recombination event, “gene conversion”, causes only a very short stretch of DNA to be altered: as if a very short region of the DNA from parent A is altered to be like parent B.
• Recombination within a single DNA molecule can also occur, if the two regions of the DNA are similar.
– If the matching regions are inverted relative to each other, recombination inverts the area between them.
– If the matching regions are in the same orientation, the whole region can be deleted.
• Misalignment during recombination: unequal crossing over can cause genes to duplicate into tandem arrays. Very common in eukaryotes, but also happens in bacteria.
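The two outcomes of recombination within a single molecule can be sketched on a single strand written 5'->3'. This is a toy model under stated assumptions: the matching regions are exact copies (or exact reverse complements), the function name and arguments are invented for the example, and for direct repeats the result keeps one copy of the repeat and drops the DNA between the repeats along with the second copy.

```python
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence of the opposite strand, written 5'->3'."""
    return "".join(PAIRING[base] for base in reversed(seq))


def recombine_within(dna, first, second, length, inverted):
    """Recombine two matching regions that start at `first` and `second`.

    inverted=True : the regions are reverse complements of each other,
                    and the DNA between them is inverted.
    inverted=False: the regions are in the same orientation, and the DNA
                    between them (plus one repeat copy) is deleted.
    """
    left = dna[:first]
    repeat_a = dna[first:first + length]
    middle = dna[first + length:second]
    repeat_b = dna[second:second + length]
    right = dna[second + length:]
    if inverted:
        if repeat_b != reverse_complement(repeat_a):
            raise ValueError("regions are not inverted copies of each other")
        return left + repeat_a + reverse_complement(middle) + repeat_b + right
    if repeat_b != repeat_a:
        raise ValueError("regions are not direct copies of each other")
    return left + repeat_a + right


if __name__ == "__main__":
    direct = "TT" + "GATTC" + "AAAC" + "GATTC" + "GG"
    flipped = "TT" + "GATTC" + "AAAC" + reverse_complement("GATTC") + "GG"
    print(recombine_within(direct, 2, 11, 5, inverted=False))
    print(recombine_within(flipped, 2, 11, 5, inverted=True))
```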
Transposable Elements
• Transposable elements are DNA sequences that move from place to place in the genome. Unlike genes, transposable elements don’t have a fixed location on the chromosome.
• Transposable elements are essentially parasites. In general they don’t contribute to the evolutionary fitness of the organism.
• Most of the genes in an organism are necessary, at least under some circumstances, for the organism’s survival. Genes avoid being destroyed by random mutations because individuals with mutated genes are less fit: they don’t survive or reproduce as well as unmutated individuals.
• Transposable elements avoid being destroyed by increasing their numbers by enough to keep some functional copies present even if some are destroyed.
– However, too much increase in numbers will kill the organism, because sometimes transposable elements insert within a gene, inactivating it.
More Transposable Elements
• Two basic types: those that are strictly DNA, and those that replicate through an RNA intermediate.
• Most bacterial TEs are DNA only.
• Most common type: Insertion Sequences (IS)
– roughly 1-3 kbp long, containing a transposase gene, and bounded by short (10-40 bp) inverted repeats
– many different families, not well conserved across species
• Transposons are longer TEs, usually composed of 2 IS elements and a gene(s) in between, often an antibiotic resistance gene.
• RNA transposable elements are called retrotransposons in eukaryotes.
• In bacteria, the common RNA TE is a “group II intron”.
– When transcribed into messenger RNA they can splice themselves out without the need for proteins.
– Group II introns contain a gene for reverse transcriptase, which copies the RNA back into DNA at a new location in the genome.
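The inverted repeats that bound an insertion sequence can be checked with a reverse-complement comparison. The sketch below assumes perfect repeats and uses an invented toy element; real IS repeats are often imperfect, so treat this as an illustration of the idea rather than a detection method.

```python
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence of the opposite strand, written 5'->3'."""
    return "".join(PAIRING[base] for base in reversed(seq))


def has_terminal_inverted_repeats(element, repeat_length):
    """True if the last `repeat_length` bases are the reverse complement
    of the first `repeat_length` bases (perfect match assumed)."""
    if repeat_length <= 0 or 2 * repeat_length > len(element):
        raise ValueError("repeat_length does not fit inside the element")
    left = element[:repeat_length]
    right = element[-repeat_length:]
    return right == reverse_complement(left)


if __name__ == "__main__":
    # Hypothetical element: 10 bp repeat + short stand-in for the interior + inverted copy.
    repeat = "GGTCTGATCA"
    interior = "ATGAAACCC"
    element = repeat + interior + reverse_complement(repeat)
    print(has_terminal_inverted_repeats(element, 10))
```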
Integrons
• Recently discovered in Gram-negative bacteria. Involved in the spread of antibiotic resistance.
• Contain a gene for integrase, a recombination site called attI, a strong transcription promoter, and a set of gene “cassettes” that code for drug resistance.
– The most common type also has a sulfonamide resistance gene (sulI) at the 3’ end.
• Cassettes exist as small DNA circles that don’t replicate or get transcribed. They contain a corresponding att site. When the att sites are aligned, integrase catalyzes a recombination event and incorporates them into the integron (or removes them).
• Found in variable locations in the genome.
Lysogenic Bacteriophage
• Bacteriophage (phage) are bacterial viruses: DNA (or RNA) surrounded by a protein coat, but with no internal metabolic activity.
• Most bacteriophage enter the cell, hijack its machinery to reproduce themselves, and then kill the cell by lysing it (breaking it open). This is called the lytic cycle.
• Some phage have the ability to insert themselves into the bacterial genome and remain there, inactive, for many generations: the lysogenic cycle.
– First described in phage lambda.
– The inserted phage chromosome is called the prophage.
• When conditions get harsh, the phage DNA comes out of the chromosome and enters the normal lytic pathway. It reproduces and kills the host cell.
• Sometimes the prophage is inactivated by mutation and becomes a permanent part of the chromosome.
Chromosome Breaks
• DNA sometimes breaks due to mechanical stress, ionizing radiation, or chemical attack.
• Most organisms contain enzymes that reassemble broken DNA molecules, a process called non-homologous end joining.
• If there is more than one break, ends are joined randomly, which can lead to a rearranged genome.
– This breaks up blocks of genes over evolutionary time.