Molecular cytogenetics
Samuel Murray
Karyotyping; limitations
• You only see abnormalities when they involve millions of nucleotides!
  Resolution: ~500 bands, i.e. ~1 band per 6 million nucleotides
• It is time-consuming and difficult to automate, and interpretation is subjective.
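The resolution figure above is a single division. A minimal sketch, assuming the round genome size of 3 billion bases quoted later in these slides:

```python
# Assumption: round haploid genome size of 3 billion bases, as quoted later in the slides.
GENOME_SIZE_BP = 3_000_000_000
BANDS_PER_KARYOTYPE = 500  # approximate banding resolution from the slide

# Average number of nucleotides represented by one visible band.
bp_per_band = GENOME_SIZE_BP / BANDS_PER_KARYOTYPE
print(f"~{bp_per_band / 1e6:.0f} million nucleotides per band")
```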
FISH; limitations
• You have to know where to look!
• It is time-consuming, relatively expensive, and difficult to automate
Molecular cytogenetics in the postgenomic era:
Array-based comparative genomic hybridization
[Figure labels: FISH, karyotyping, array CGH; total genome, high resolution]
BACkground
BAC (bacterial artificial chromosome) clones were widely used and precisely mapped in the Human Genome Project. In array CGH, these clones are used as targets on a microarray.
[Figure: genome browser view showing BAC clones, chromosome band, and genes; http://genome.ucsc.edu]
Array CGH procedure
[Figure: patient DNA and reference DNA are hybridized overnight to a microarray slide with spotted BAC clones; each clone reports gain, loss, or no change. Example shown: male vs. female hybridization, with autosomal, chr. X, and chr. Y clones.]
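The gain / loss / no change readout of the procedure above is commonly expressed as a log2 ratio of patient signal to reference signal per clone. The slides do not give a formula, so the following is only a sketch: the function names, the example intensities, and the 0.3 calling threshold are all illustrative assumptions, not values from the source.

```python
import math

def log2_ratio(patient_signal, reference_signal):
    """log2 of patient over reference intensity for one spotted clone."""
    return math.log2(patient_signal / reference_signal)

def call_clone(ratio, threshold=0.3):
    """Classify one clone. The threshold is a hypothetical cut-off for illustration."""
    if ratio >= threshold:
        return "gain"
    if ratio <= -threshold:
        return "loss"
    return "no change"

# Hypothetical intensities (patient, reference) for three clones.
clones = {
    "clone_equal": (1000.0, 1000.0),  # 2 copies vs 2 copies
    "clone_gain": (1500.0, 1000.0),   # 3 copies vs 2 copies
    "clone_loss": (500.0, 1000.0),    # 1 copy vs 2 copies
}
calls = {name: call_clone(log2_ratio(p, r)) for name, (p, r) in clones.items()}
print(calls)
```

With two reference copies, a single-copy gain has an ideal ratio of log2(3/2) and a single-copy loss log2(1/2), which is why a symmetric threshold around zero is a common starting point.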
From subtelomeric screening to tiling resolution
Subtelomeric screening: 80 BACs (Veltman et al. AJHG 2002)
1 Mb screening: 3500 BACs (Vissers et al. AJHG 2003)
Tiling resolution screening: 32400 BACs (de Vries et al. AJHG 2005)
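The link between clone count and resolution can be checked with a rough calculation. This sketch assumes clones spread evenly over a 3 billion base genome (the round figure used elsewhere in these slides). The 80-BAC subtelomeric set is left out because it targets chromosome ends rather than the whole genome.

```python
# Assumption: round genome size and perfectly even clone spacing.
GENOME_SIZE_BP = 3_000_000_000

def mean_spacing_kb(n_clones):
    """Average distance between clone midpoints, in kilobases."""
    return GENOME_SIZE_BP / n_clones / 1000

for label, n_clones in [("1 Mb screening", 3500), ("Tiling resolution", 32400)]:
    print(f"{label}: {n_clones} BACs, ~{mean_spacing_kb(n_clones):.0f} kb apart")
```

The two results, roughly 0.9 Mb and roughly 90 kb, agree with the "1 Mb" and "~50-100 kb" labels used in the slides.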
Karyotyping vs. array CGH
Karyotyping: ~5-10 Mb resolution
Array CGH: ~50-100 kb resolution
100x increased resolution
Array CGH: advantages
• You can identify copy number alterations throughout the genome at a very high resolution!
  array CGH ~ 100,000 nucleotides
  karyotyping ~ 5,000,000 nucleotides
• Results can be mapped onto the human genome, allowing detailed genotype-phenotype studies!
• The procedure is rapid and can be automated.
Improve identification by increasing resolution
Vissers et al. HMG 2005
Increased-coverage SNP arrays unravel CNVs
[Figure: detail of chr. 8 on a 32k BAC array and a 100k SNP array, showing a 192 kb deletion that is a known CNV (Yu et al. AJHG 2002)]
Are all Arrays Applicable?
There are many considerations in designing an array-based CGH assay.
In an ideal situation, the array should
a) cover the entire genome,
b) have a high resolution, and
c) be cost-effective.
Most importantly, it should be Fit for Purpose.
Design Particulars
1. ‘Fit for Purpose’ = Application
2. CGH MUST outperform karyotype
3. With NGS added, the array does not need to be super dense…
4. 40K probes = 100-125 kb resolution
5. 100k coverage allows for good SNP coverage… we are not doing GWAS or WGS
Compromise
1) 180K arrays are costly to perform
2) >180k density is overkill
3) Array formats limit patient access to large centralised institutes
4) TAT (turnaround time) is a consideration
5) CGH is ONLY 1 COMPONENT required for accurate diagnosis and patient stratification
6) Platforms are stagnant… inflexible
Require:
Typical Array Platforms
Flexible Single Patient Alternatives
Next-Generation DNA Sequencing Technology
DNA Has Only Two Jobs
• It serves as a store of information
  – Ensuring that information is passed on to each new cell upon division (and to the next generation)
• It directs the synthesis of proteins
  – Which are necessary to carry out the functions of a living organism
DNA’s Structure Explains How it Accomplishes Both Jobs
DNA Serves as a Store of Information
[Figure: paired bases along the double helix]
Adenine (A) always binds to Thymine (T)
Cytosine (C) always binds to Guanine (G)
As a double helix, each DNA molecule contains a copy of itself: either strand can be rebuilt from the other by base pairing
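The pairing rule above (A with T, C with G) is enough to rebuild one strand from the other. A minimal sketch; the function names are illustrative:

```python
# Watson-Crick pairing as stated on the slide: A-T and C-G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Base-by-base partner strand, read in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand.upper())

def reverse_complement(strand):
    """Partner strand written in its own 5'-to-3' direction."""
    return complement(strand)[::-1]

original = "ATCGTGAC"
partner = complement(original)
# Pairing the partner again recovers the original: the 'copy of itself'.
print(partner, complement(partner) == original)
```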
DNA’s Structure Explains How it Accomplishes Both Jobs
DNA Directs the Synthesis of Proteins
• It is the order of the bases that provides the instructions for protein synthesis
• One stretch of DNA directs the synthesis of one type of protein, another stretch directs the synthesis of another type of protein
We call a stretch of DNA that directs the synthesis of one particular protein a Gene
[Figure: Gene #1, Gene #2, Gene #3 along a DNA strand]
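How the order of bases spells out a protein can be illustrated with the standard genetic code, in which each group of three bases (a codon) names one amino acid. The slides do not go into the code itself, so this is only a sketch: it reads the coding strand directly and skips the RNA intermediate and everything else a real gene involves.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(coding_dna):
    """Translate a coding-strand DNA string into one-letter amino acids.

    Stops at the first stop codon ('*'); ignores a trailing partial codon.
    """
    seq = coding_dna.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        amino_acid = CODON_TABLE[seq[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("ATGGCCAAATGA"))  # a made-up four-codon example
```

Changing a single base can change the amino acid a codon names, which is one way a different order of bases yields a different protein.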
DNA Sequencing
• You have 3 billion bases arrayed in a unique order, with ~20,000 genes that direct the synthesis of all the proteins that comprise you
• “Sequencing” DNA is simply the elucidation of the order of the bases in an organism’s DNA strand
• The unique order of your bases greatly influences your health, e.g.
  – Which diseases you are more, or less, prone to
  – How you will react to different medications
• A human “genome” is about 3 feet of DNA
  – And each of your cells contains two copies of your genome, about 6 feet in total
A Human DNA Sequence
atcgtgactgattaccaggatcctagcggatcctactgacctgacgtacgtaatgcagtggtcaggttgttcaactcgatgactagaatatatccaggaaaatccctgggaaaaattgggccctac
gtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtcgtaacgtacgtac
ggtactggtaacgtgaggtcaggttgttcaactcatccaggaaaatccctgggaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacag
atagacagattgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcatcgtgactgattaccaggatcta
ctagaagaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgaa
tcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcatccaggaaaatccctgggaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgttt
ccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtcgtaacgtactagaatatatccaggaaaatccctgggaaaaattgg
aacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtcgtaacgtacgtagccctac
gtaccgtcggtactggtaacgtgaggtcaggttgttcaactcatccaggattagatccgtagatcgtaggaaatatctcggataattaacagatacacacccttagaccatttaaatccctgggaaa
aattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtc
gtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcatcgtgactgattaccaggatcctagcggatcctactgacctgacgtacgtaatgcagtggtcaggttgttcaactcgat
gactagaatatatccaggaaaatccctgggaaaaattgggccctacgtgtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcatccaggaaaatccctgggaaaaattg
ggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtcgtaa
cgacgtttccaggctacacacacactgacagatagacagattcaaattcagtcccttgaatcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcatcgtgact
gattaccaggatcctagcggatcctactgacctgacgtacgtaatgcagtggtcaggttgttcaactcgatgagaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttcca
ggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcat
ccaggaaaatccctgggaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgt
aggcccttgaatcttggcagtcgtaacgtactagaatatatccaggaaaatccctgggaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacact
gacagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcatatatccaggaaaatccctgg
gaaaaattggctacgtaccgtattaactaggatctccgatggtacccattaagacacccaaaataggtaacaggtagacatattgatacccatagaggatagatttaggacgttgcaaattcagtcg
gtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggtt
gttcaactcatccaggaaaatccctgggaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgac
ttggaactgtaggcccttgaatcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcatcgtgactgattaccaggatcctagcggatcctactgacctgacgta
cgtaatgcagtggtcaggttgttcaactcgatgagaaaaattgggccctacgtaccgtaacgttgtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaa
ctgtaggcccttgaatcttggcagtcgtaacgtactagaatatatccaggaaaatccctgggaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacac
actgacagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcatccaggaaaatcc
ctgggaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccagctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgaatctt
ggcagtcgtaacgtacgtacgggtactggtaacgtgaggtcaggttgttcaactcatcgtgactgattaggaaaatccctgggaaaaattgggccctacgtaccgtaacgttgcaaattcagtcgg
tacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgt
tcaactcatccaggaaaatccctgggaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgactt
ggaactgtaggcccttgaatcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcatcgtgactgattaccaggatcctagcggatcctactgacctgacgtac
gtaatgcagtggtcaggttgttcaactcgatgagaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgtta
tvtgacttggaactgtaggcccttgaatcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcatccaggaaaatccctgggaaaaattgggccctacgtaccg
taacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtcgtaacgtactagaatatatc
caggaaaatccctgggaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgta
ggcccttgaatcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcatccaggaaaatccctgggaaaaattgggccctacgtaccgtaacgttgcaaattcag
tcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgacgtacggtactggtaacgtgaggtcaggttgttcaactcatccagga
aaatccctgggaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggccct
tgaatcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcatcgtgactgattaccaggatcctagcggatcctactgacctgacgtacgtaatgcagtggtcag
gttgttcaactcgatgactagaatatatccaggaaaatccctgggaaaaattgggccctacgtgtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcatccaggaaaatcc
ctgggaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggcgccttgaat
cttggcagtcgtaacgtactagaatatatccaggaaaatccctgggaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagaca
gattgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcatccaggaaaatccctgggaaaaattggg
ccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaactgtaggcacacacacactgacagagacaga
ttgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtcgtaacgtacgtacggtactggtaacgtgaggtcaggttgttcaactcatccaggaaaatccctgggaaaaattgggcc
ctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgcagatagacagattgtcgtgttatvtgacttggaactgtaggcccttgaatcttggcagtcgtaacgtacg
tacggtactggtaacgtgagagtcaggttgttcaactcatcgtgactgattaccaggatcctagcggatcctactgacctgacgtacgtaatgcagtggtcaggttgttcaactcgatgagaaaaat
tgggccctacgtaccgtaacgttgcaaattcagtcggtctagcggatcctactgacctgacgtacgtaatgcagtggtcaggttgttcaactcgatgactagaatatatccaggaaaatccctggac
tcatccaggaaaatccctgggaaaaattgggccctacgtaccgtaacgttgcaaattcagtcggtacgtttccaggctacacacacactgacagatagacagattgtcgtgttatvtgacttggaac
• ~1/1,000,000th of the Human Genome
• Interspersed with genes
• And “polymorphisms” which differ between people
  – Are sometimes important, influencing traits
  – or medically important characteristics
Sequencing DNA
• Early techniques were developed in the 1970s
• A variety of approaches now exist
• The biggest limitation to sequencing is that the genome is big
  – So carrying out these reactions for an entire genome is slow and expensive
Sanger Sequencing: 1975
Next Generation Sequencing
• Takes advantage of miniaturization to engage in massively parallel analysis
  – Essentially carrying out millions of sequencing reactions simultaneously, one in each of ~10 million tiny wells
• Sophisticated computer analysis of huge amounts of information allows “assembly” of a given sequence
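The idea of “assembly” can be shown with a toy example: short reads that overlap at their ends are merged into a longer sequence. This is a deliberately naive greedy sketch with made-up reads. Production assemblers and aligners use far more robust methods, and nothing here reflects a specific tool.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b (0 if below min_len)."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads

# Three hypothetical overlapping reads.
contigs = greedy_assemble(["GTGACTGA", "CTGATTACC", "ATCGTGAC"])
print(contigs)
```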
Illumina Genome Analyzer
Richard K. Wilson
Data Analysis Pipeline
Images → Intensities → Reads → Alignments
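The last two steps of the pipeline above can be sketched in a few lines: turning per-cycle intensities into a read (base calling) and locating that read in a reference (alignment). Everything here is a simplifying assumption made for illustration: the channel order, the made-up intensity values, and exact-match alignment, which real aligners replace with error-tolerant indexed search. Image analysis is skipped entirely.

```python
CHANNELS = "ACGT"  # assumed channel order: one intensity per base per cycle

def call_bases(intensities):
    """For each sequencing cycle, call the base with the brightest channel."""
    read = []
    for cycle in intensities:
        brightest = max(range(len(CHANNELS)), key=lambda c: cycle[c])
        read.append(CHANNELS[brightest])
    return "".join(read)

def align(read, reference):
    """Return every 0-based position where the read matches the reference exactly."""
    hits = []
    start = reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits

# Hypothetical data: a short reference and five cycles of (A, C, G, T) intensities.
reference = "ATCGTGACTGATTACCAGGATCC"
intensities = [
    (0.1, 0.0, 0.8, 0.1),
    (0.0, 0.1, 0.1, 0.9),
    (0.1, 0.1, 0.7, 0.0),
    (0.9, 0.0, 0.1, 0.1),
    (0.1, 0.8, 0.0, 0.1),
]
read = call_bases(intensities)
positions = align(read, reference)
print(read, positions)
```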
Sample IN……Simple OUT
-Not Discovery
-Not Foundation Medicine n=2500 genes
NGS Data Output
Whole human genome sequencing requires ~30x coverage
Somatic: ~500x coverage
• Uneven Coverage - Poisson distribution of small DNA reads
  – Flexible probe adjustment
• Sequencing Errors - machine/chemistry (~1% per 30 Mbp)
• Systematic Biases - some regions are harder to sequence
  – Flexible probe adjustment
• Alignment Problems - gaps, repeats, etc.
  – Targeted Sequencing
• Quality Factor - additional data/metadata
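The coverage figures and the Poisson remark above can be made concrete. Average coverage is (number of reads × read length) / genome size, and if reads land at random, the depth at any one base is approximately Poisson-distributed around that average. This is a sketch under idealized assumptions: the 150-base read length is a hypothetical example value, and real data is more uneven than Poisson because of the systematic biases listed above.

```python
import math

GENOME_SIZE_BP = 3_000_000_000  # round figure used in these slides

def reads_needed(coverage, read_length_bp, genome_size_bp=GENOME_SIZE_BP):
    """Number of reads needed to reach a given average coverage."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)

def poisson_cdf(k, mean):
    """P(depth <= k) when depth follows a Poisson distribution with the given mean."""
    term = math.exp(-mean)  # P(depth == 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # P(depth == i) from P(depth == i - 1)
        total += term
    return total

n_reads = reads_needed(coverage=30, read_length_bp=150)  # read length is an assumption
frac_below_10x = poisson_cdf(9, 30)  # share of bases seen fewer than 10 times at 30x
print(f"{n_reads:,} reads; fraction of bases below 10x: {frac_below_10x:.2e}")
```

Even at 30x average coverage some positions are covered only a few times by chance, which is one reason the average has to sit well above the minimum depth needed to make a confident call.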
Bioinformatics of Deep Sequencing
The Basics.