Molecular cytogenetics
Samuel Murray

Karyotyping; limitations
• You only see abnormalities when they involve millions of nucleotides! Resolution: ~500 bands, ~1 band per 6 million nucleotides
• It is time-consuming, difficult to automate, interpretation is subjective.

FISH; limitations
• You have to know where to look!
• It is time-consuming, relatively expensive, difficult to automate

Molecular cytogenetics in the postgenomic era: Array-based comparative genomic hybridization
[Figure labels: FISH; Karyotyping; array CGH; Total genome; High resolution]

BACkground
BAC clones are widely used and precisely mapped in the Human Genome Project. In array CGH, these clones are used as targets on a microarray.
[Figure labels: BAC clones; Chromosome Band; Genes. Source: http://genome.ucsc.edu]

Array CGH procedure
[Figure labels: Patient DNA; Reference DNA; Hybridize overnight; Microarray slide with spotted BAC clones; gain; loss; no change; Male vs. Female: autosomal, chr.X, chrY]

From subtelomeric screening to tiling resolution
• Subtelomeric screening: 80 BACs (Veltman et al. AJHG 2002)
• 1 Mb screening: 3500 BACs (Vissers et al. AJHG 2003)
• Tiling resolution screening: 32400 BACs (de Vries et al. AJHG 2005)

Karyotyping vs. array CGH
• Karyotyping: ~5-10 Mb resolution
• Array CGH: ~50-100 kb resolution
• 100x increased resolution

Array CGH: advantages
• You can identify copy number alterations throughout the genome at a very high resolution!
  – array CGH ~100,000 nucleotides
  – karyotyping ~5,000,000 nucleotides
• Results can be mapped onto the human genome, allowing detailed genotype-phenotype studies!
• The procedure is rapid and can be automated.

Improve identification by increasing resolution
Vissers et al. HMG 2005

Increased coverage SNP arrays unravels CNVs
[Figure labels: Detail chr. 8; 32k BACs; 100k SNPs; 192 kb del, known CNV]
Yu et al. AJHG 2002

Are all Arrays Applicable
There are many considerations in designing an array based CGH assay.
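One such consideration is how the number of targets translates into resolution. The following is a rough Python sketch, not part of the original slides. It assumes a haploid genome of about 3 billion bases and evenly spaced, non-overlapping targets (real BAC and oligo arrays are neither), and the function name `average_spacing_bp` is hypothetical.

```python
# Rough estimate of average spacing between targets on a genome-wide assay.
# Assumptions (not from the slides): evenly spaced, non-overlapping targets
# and a haploid genome size of about 3 billion bases.

GENOME_SIZE_BP = 3_000_000_000


def average_spacing_bp(n_targets, genome_size_bp=GENOME_SIZE_BP):
    """Return the average number of bases per target (hypothetical helper)."""
    if n_targets <= 0:
        raise ValueError("n_targets must be positive")
    return genome_size_bp / n_targets


# Target counts quoted on the slides.
designs = {
    "karyotype (500 bands)": 500,
    "1 Mb screening (3500 BACs)": 3500,
    "tiling resolution (32400 BACs)": 32400,
}

for name, n_targets in designs.items():
    spacing_kb = average_spacing_bp(n_targets) / 1000
    print(f"{name}: ~{spacing_kb:.0f} kb per target")

# prints:
# karyotype (500 bands): ~6000 kb per target
# 1 Mb screening (3500 BACs): ~857 kb per target
# tiling resolution (32400 BACs): ~93 kb per target
```

Under these assumptions, 500 bands give one band per 6 million nucleotides, and 32,400 BACs give a spacing of roughly 93 kb, in line with the ~50-100 kb resolution quoted for tiling arrays.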
In an ideal situation, the array should a) cover the entire genome, b) have a high resolution, and c) be cost-effective. Most importantly, it should be Fit for Purpose.

Design Particulars
1. ‘Fit for Purpose’ = Application
2. CGH MUST outperform Karyotype
3. Due to NGS addition it does not need to be super dense…
4. 40K coverage = 100-125kb coverage
5. 100k coverage allows for good SNP coverage…we are not doing GWAS or WGS

Compromise
1) Costly to perform 180K
2) >180k density is overkill
3) Array formats limit patient access to Large Centralised Institutes
4) TAT is a consideration
5) CGH is ONLY 1 COMPONENT required for accurate Diagnosis and patient stratification
6) Platforms are Stagnant….Inflexible

Require:
Typical Array Platforms
Flexible Single Patient Alternatives

Next-Generation DNA Sequencing Technology

DNA Has Only Two Jobs
• It serves as a store of information
  – Ensuring that information is passed on to each new cell upon division (and the next generation)
• It directs the synthesis of proteins
  – Which are necessary to carry out the functions of a living organism

DNA’s Structure Explains How it Accomplishes Both Jobs
DNA Serves as a Store of Information
[Figure labels: A C T G / T A G C]
• Adenine (A) always binds to Thymine (T)
• Cytosine (C) always binds to Guanine (G)
As a double helix, each DNA molecule contains a copy of itself

DNA’s Structure Explains How it Accomplishes Both Jobs
DNA Directs the Synthesis of Proteins
[Figure labels: A C T G]
• It is the order of the bases that provides the instructions for protein synthesis
• One stretch of DNA directs the synthesis of one type of protein, another stretch directs the synthesis of another type of protein
We call a stretch of DNA that directs the synthesis of one particular protein a Gene
[Figure labels: Gene #1; Gene #2; Gene #3]

DNA Sequencing
• You have 3 billion bases arrayed in a unique order, with ~20,000 genes that direct the synthesis of all the proteins that comprise you
• “Sequencing” DNA is simply the elucidation of the order of the
bases in an organism’s DNA strand
• The unique order of your bases greatly influences your health, e.g.
  – What disease you are more - or less - prone to
  – How you will react to different medications
• A human “genome” is about 6 feet of DNA
  – And each of your cells contains two copies of your genome

A Human DNA Sequence
atcgtgactgattaccaggatcctagcggatcctactgacctgacgtacgtaatgcagtggtcaggttgttcaactcgatgactagaatatatccaggaaaatccctgggaaaaattgggccctac
[The slide fills the rest of the page with further sequence in the same form.]
• ~1/1,000,000th of the Human Genome
• Interspersed with genes
• And “polymorphisms” which differ between people
  – Are sometimes important, influencing traits
  – or medically important characteristics

Sequencing DNA
• Early techniques were developed in the 1970’s
• A variety of approaches now exist
• The biggest limitation to sequencing is that the genome is big
  – So carrying out these reactions for an entire genome is slow and expensive

Sanger Sequencing: 1975

Next Generation Sequencing
• Takes advantage of miniaturization to engage in massively parallel analysis
  – Essentially carrying out millions of sequencing reactions simultaneously in each of 10 million tiny wells
• Sophisticated computer analysis of huge amounts of information allows “assembly” of a given sequence

Illumina Genome Analyzer
Richard K.
Wilson

Data Analysis Pipeline
[Figure labels: Images; Intensities; Reads; Alignments]

Sample IN……Simple OUT
- Not Discovery
- Not Foundation Medicine n=2500 genes

NGS Data Output
• Whole Human Genome Sequencing Requires ~30x Coverage
• Somatic ~500x Coverage
• Uneven Coverage - Poisson distribution of small DNA reads
  – Flexible probe adjustment
• Sequencing Errors - machine/chemistry (~1% per 30Mbp)
• Systematic Biases - some regions are harder to sequence
  – Flexible probe adjustment
• Alignment Problems - gaps, repeats, etc.
  – Targeted Sequencing
• Quality Factor - additional data/metadata

Bioinformatics of Deep Sequencing
The Basics.
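
The coverage figures on the NGS Data Output slide can be made concrete with a small calculation. The Python sketch below is an illustration under stated assumptions, not the pipeline from the slides: it assumes reads land uniformly at random (so per-base depth follows a Poisson distribution), a 3 billion base genome, and a hypothetical 100 bp read length. All function names are invented for this example.

```python
import math

# Assumptions (not from the slides): uniform random read placement,
# a 3 billion base genome, and 100 bp reads.
GENOME_SIZE_BP = 3_000_000_000
READ_LENGTH_BP = 100


def mean_coverage(n_reads, read_length_bp, genome_size_bp):
    """Average depth: total sequenced bases divided by genome size."""
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(target_coverage, read_length_bp, genome_size_bp):
    """Number of reads required to reach a target average depth."""
    return math.ceil(target_coverage * genome_size_bp / read_length_bp)


def prob_depth_below(k, mean_depth):
    """P(depth < k) for a base whose depth is Poisson(mean_depth).

    Terms are computed in log space so large depths do not overflow.
    """
    if mean_depth <= 0:
        raise ValueError("mean_depth must be positive")
    return sum(
        math.exp(-mean_depth + i * math.log(mean_depth) - math.lgamma(i + 1))
        for i in range(k)
    )


# Reads needed for ~30x whole-genome coverage under these assumptions.
print(reads_needed(30, READ_LENGTH_BP, GENOME_SIZE_BP))  # prints 900000000

# Fraction of bases expected to fall below 10x at two average depths.
for depth in (15, 30):
    print(depth, prob_depth_below(10, depth))
```

With these assumptions, 30x coverage of a 3 billion base genome needs 900 million 100 bp reads. Real depth is more uneven than the Poisson model predicts, because of the systematic biases and alignment problems listed on the slide.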