Uncovering mutational and DNA repair processes in the search for cis-regulatory mutations in cancer genomes
Prince of Wales Clinical School
Dr Jason Wong
Senior Lecturer & ARC Future Fellow
Head, Bioinformatics and Integrative Genomics
Lowy Cancer Research Centre

Somatic mutations in cancer
Lawrence et al. Nature 2013

Identifying cancer driver mutations
• “Function” of the mutation
  - Is the mutated gene important?
  - Does the mutation alter the function of the gene?
  BUT, the function of non-protein-coding genes/regions is difficult to define.
• Recurrence of the mutation
  - Is the mutation present in lots of samples?
  - Is the gene mutated in lots of samples?
  BUT, the more samples we sequence, the more recurrently mutated genes we find.

How do mutations form and accumulate?
[Figure labels: DNA lesion formation; replication and translesion DNA synthesis; DNA repair proteins; lesion recognition and repair; formation of stable mutation]

Mutagenic processes
• Exogenous factors (e.g. UV light)
• Replication errors
• Endogenous factors (AID)
• Viruses/retrotransposons
• DNA repair failure
These ultimately lead to mutations.

Non-coding genomic regions
• How to define functional non-coding regions?
• What happens if we get mutations in these non-coding regions?
Gregory Nat Rev Genetics 2005

How much functional DNA is there in the human genome?
ENCODE suggests that as much as 80% of the genome is “functional”.
However, probably only ~8% is truly important, i.e.:
• Protein coding
• Non-coding RNA
• Regulatory sequences
Rands et al PLoS Genetics 2014

What are gene regulatory regions?
[Figure labels: TSS; standard histone; nucleosome-free region; genic region; H2A.Z histone; post-translational modification]

How to find regulatory sequences?
• Transcription factor ChIP-seq
• Histone ChIP-seq
• DNase-seq
[Figure tracks: TF ChIP-seq (FLI1, ERG, LMO2, SCL, GATA2, LYL1, RUNX1); histone ChIP-seq (H3K27ac, H3K4me1, H3K4me3); DNase-seq; RNA-seq]
Beck D… Wong JWH*, Pimanda JE*. Blood 2013

Do cis-regulatory mutations exist?
[Figure labels: wild-type; mutant]
Review: Poulos RC, Sloane MA, Hesson LB, Wong JWH (2015) Oncotarget 6(32):32509-25

How do we find these mutations?
1. Whole cancer genome sequencing data.
2. Cell type (ideally sample) specific cis-regulatory data.

OncoCis - Annotation of cis-regulatory mutations in cancer

Unique features of OncoCis
1. Cell type specific epigenetic annotations.
2. Ability to assess TF motif creating mutations.
3. Integration of gene expression information.

Cell types available in OncoCis
Cell/tissue type | Cell line name | Description
Lung | A549 | Alveolar basal epithelial adenocarcinoma
Prostate | LNCaP | Prostate epithelial adenocarcinoma
Liver | HepG2 | Hepatocellular epithelial carcinoma
Blood | K562 | Chronic myelogenous leukemia
Blood | CD34 | CD34+ mobilised hematopoietic stem/progenitor cells
Breast | HMEC | Normal human mammary epithelial cells
Melanocytes | Melano | Normal foreskin melanocytes
Cervical | HeLa | Cervical epithelial adenocarcinoma
Colon | HCT116 | Colon epithelial carcinoma
Pancreas | PANC-1 | Pancreatic epithelioid carcinoma
Astrocyte | NHA | Normal human astrocytes
Osteoblast | Osteo | Normal human osteoblasts
Mesenchymal stem cell | MSC | Human mesenchymal stem cell, differentiated from ES cells
Neural progenitor cell | NPC | Human neural progenitor cells, differentiated from ES cells
Embryonic stem cell | ESC | Human embryonic stem cells, undifferentiated
DNase I, H3K4me1, H3K4me3 and H3K27ac for all cell types from ENCODE or Epigenome Atlas

Annotation of TERT promoter mutations

Analysis of 17 whole breast cancer genomes
Annotated using Human Mammary Epithelial Cell line (HMEC) data

Integration with expression
18 mutations, each of which:
1. Shows a significant change in expression relative to other samples without the mutation
2. Is within a DHS and at least 1 histone mark
3. Is conserved
4. Creates or removes a motif

Validation of CDK6 mutation
[Figure: panel across samples with PD4107a labelled; reporter constructs SV/luc, SV/luc/CDK6wt and SV/luc/CDK6mut; p = 0.013]
Perera D… Wong JWH (2014) Genome Biol 15:485
powcs.med.unsw.edu.au/oncocis

How many mutations identified by OncoCis are truly functional?

COLO-829 cell line
• Metastatic melanoma cell line from a 45-year-old male.
• Matched “normal” cell line from B cells available (COLO-829BL).
• One of the first cell lines to be whole genome sequenced by the Sanger Institute (Pleasance et al Nature 2010).

Assessing the function of promoter mutations in COLO-829 malignant melanoma cell line
[Schematic: luciferase reporter assay. Promoter (wild-type or mutant); luciferase reporter; luciferase protein; substrate; light?]
4 out of 23 promoters with mutations tested showed a significant change in promoter activity.

NDUFB9 promoter mutation is recurrent in malignant melanoma

No evidence that NDUFB9 promoter mutations are functional in vivo
[Figure annotation: n.s.]

6 out of 16 “non-functional” promoter mutations were also recurrent!
GDAP1 (14%), PES1 (8%), STK19 (8%), WDR3 (3%), GPATCH2L (3%), BLCAP (3%)
Poulos RC, Thoms JAI, Shah A, Beck D, Pimanda JE, Wong JWH (2015) Mol Cancer Res 13:1218-1226

Promoter mutations are frequent but how important are they in cancer?
Weinhold et al. (2014) Nat Genetics 46 1160-1165
Fredriksson et al. (2014) Nat Genetics 46 1258-1263
Melton et al. (2015) Nat Genetics 47 710-716
• There are quite a few recurrent mutations in gene promoters (18 in >5% of cancers).
• But TERT promoter mutations are exceptional: they are the only ones with strong links to function.
• Therefore, there is no strong evidence that promoter mutations are a major player in cancer.

Systematic analysis of cis-regulatory mutations
• Somatic point mutations from 1,161 whole cancer genome sequenced samples across 14 cancer types from the ICGC, TCGA and various other public datasets.
• Annotated mutations using ENCODE/Epigenome Atlas data from 14 cell lines/tissues.
[Figure: the 14 cancer types: Medulloblastoma, CLL, Astrocytoma, Liver, Lymphoma, Pancreatic, Breast, Renal, Prostate, Melanoma, Lung, Colon, Ovarian, Esophageal]
[Figure: mutation density ratio (DHS / 1 kb DHS flank) for Melanoma, Astrocytoma, Lung, Ovarian, Esophageal, Prostate, Pancreatic, Medulloblastoma, Breast, Liver, Lymphoma, Renal, CLL, Colon]
[Schematic: promoter/enhancer with flank, DHS centre (150 bp), flank; DNase I endonuclease. DHS = DNase I Hypersensitive Site]

Identifying underlying causes for increased promoter mutation density

Melanoma
Variable | OR | 95% CI | p-value | adj. OR | adj. 95% CI | adj. p-value
DNase I coverage | 1.51 | 1.45,1.56 | <2E-16 | 1.56 | 1.50,1.63 | <2E-16
Gene expression | 1.08 | 1.04,1.13 | 0.00032 | - | - | -
GC content | 1.04 | 1.00,1.08 | 0.0891 | 0.87 | 0.23,3.26 | 0.8313
Replication timing | 1.03 | 0.99,1.08 | 0.112 | 0.92 | 0.88,0.97 | 0.0005
Proportion rare SNP | 0.97 | 0.95,1.00 | 0.078 | 0.98 | 0.95,1.01 | 0.2042
Cancer gene | 0.90 | 0.69,1.16 | 0.432 | 0.81 | 0.62,1.05 | 0.1272
Conservation (GERP) | 0.85 | 0.82,0.89 | 7.69E-14 | 0.88 | 0.84,0.82 | 1.22E-08

Ovarian Cancer
Variable | OR | 95% CI | p-value | adj. OR | adj. 95% CI | adj. p-value
DNase I coverage | 1.25 | 1.17,1.32 | 5.07E-13 | 1.23 | 1.15,1.31 | 3.11E-09
Gene expression | 1.38 | 1.14,1.67 | 0.00101 | - | - | -
GC content | 1.10 | 1.02,1.18 | 0.0149 | 0.35 | 0.0,3.6 | 0.3717
Replication timing | 0.99 | 0.92,1.06 | 0.682 | 0.91 | 0.85,0.99 | 0.0185
Proportion rare SNP | 1.00 | 0.95,1.03 | 0.7867 | 0.99 | 0.95,1.02 | 0.7067
Cancer gene | 1.34 | 0.89,1.95 | 0.133 | 1.25 | 0.82,1.80 | 0.2693
Conservation (GERP) | 0.95 | 0.88,1.02 | 0.13 | 0.97 | 0.90,1.05 | 0.4975

Lung Cancer
Variable | OR | 95% CI | p-value | adj. OR | adj. 95% CI | adj. p-value
DNase I coverage | 1.09 | 1.03,1.16 | 0.00321 | 1.15 | 1.07,1.22 | 3.01E-05
Gene expression | 1.03 | 0.97,1.10 | 0.376 | - | - | -
GC content | 1.18 | 1.11,1.27 | 1.18E-06 | 1.36 | 0.16,11.63 | 0.7769
Replication timing | 0.87 | 0.82,0.93 | 2.10E-05 | 0.81 | 0.75,0.87 | 3.76E-10
Proportion rare SNP | 1.00 | 0.97,1.04 | 0.888 | 0.98 | 0.93,1.03 | 0.5618
Cancer gene | 1.21 | 0.82,1.72 | 0.321 | 1.17 | 0.79,1.67 | 0.4089
Conservation (GERP) | 0.91 | 0.85,0.98 | 0.00867 | 0.92 | 0.85,0.98 | 0.0133

Mutation density is dictated by chromatin accessibility

Regions with increased mutation rates coincide with transcription factor binding sites.
Digital genomic footprinting: Neph et al. Nature 2012

Differential NER is responsible for increased promoter mutation density
Use mutations from genomes of people without NER (i.e. xeroderma pigmentosum, XPC-/-)
Whole SCC genomes of XPC wild-type and XPC-/- patients from Zheng et al. (2014) Cell Reports 9:1228
[Figure axes: Mutations/Mb; Repair (normalised read count)]

Mutational signatures
Use mutation patterns to determine which mutagen has caused the cancer
- Currently there are 30 signatures identified.
- Signature 1 is associated with aging, signature 7 with UV exposure, etc.
- The mutagen underlying many signatures is still unknown.
Alexandrov et al. (2013) Nature 500:415-421
http://cancer.sanger.ac.uk/cosmic/signatures
[Figures: Melanoma mutation signatures; Lung cancer mutation signatures; y-axis Mutations (%)]

Transcription initiation is necessary for impaired NER

What about enhancers?
Andersson et al Nature 2014
[Figure: panels for Promoters (Top 25% DHS), Ubiquitous enhancers (n=200), Enhancers (Matched DHS) and Permissive enhancers (Matched DHS) in melanoma, lung and ovarian cancer; significance marks include ***, * and p=0.37]

[Schematic labels: GG-NER machinery; lesion recognition by XPC is occluded; lesion recognition and repair; transcription preinitiation complex; UV light; DNA damage; enhancer; TSS; promoter]
• Active transcription initiation inhibits NER.
• Promoter mutation hotspots are present in cancers where NER is necessary to repair DNA lesions.
• Is the mechanism a potential source of cancer-causing mutations? Or are these mutations well tolerated?
Perera D*, Poulos RC*, Shah A, Beck D, Pimanda JE, Wong JWH (2016) Nature 532: 259-263

CTCF binding sites are highly mutated in cancer
[Figure labels: cohesin complex; CTCF]
Ong & Corces Nature Rev Genetics 2014

CTCF binding sites are also highly mutated in skin cancers

Skin cancers form CTCF motif mutations at specific and unique positions
Other cancers:
~45% Oesophageal adenocarcinoma
~20% Hepatocellular carcinoma
~15% Gastric adenocarcinoma
~3% Colorectal adenocarcinoma
Katainen et al Nature Genetics 2014

CTCF mutations only accumulate at cohesin loops

Allele specific CTCF binding in COLO-829
[Figure tracks: COLO829 mutations; WGS; CTCF ChIP-seq (rep1); CTCF ChIP-seq (rep2); IgG ChIP-seq; DNase-seq; H3K27ac (Melanocytes). Read counts shown, in slide order: WT 8, mutant 10; WT 77, mutant 6]

Allele specific CTCF binding in COLO-829
[Figure: y-axis from 0.0 to 1.0; groups Motif (n=12) and Peak (non-motif) (n=92); significance ***]

Is the loss of CTCF binding important in melanoma?
[Figure tracks: CTCF loop anchors; mutations; CTCF ChIP-seq; DNase-seq; neighbourhood genes]

Genes in mutated neighbourhoods
n=47 skin cancers
[Figure: # mutated samples (axis 0 to 6) per gene: ASB8, COL2A1, PFKM, SENP1, TMEM106C, VDR, IRF8, RASD2, CACUL1, CNDP2, FAM69C, GLI2, GPR37, GRIN2B, IFNK, IGSF9, IKBKB, MOB3B, NANOS1, PLAT, PPP1R1C, PRLHR, SLAMF9, SMOC1, SSFA2]
Poulos RC, Thoms JAI, Guan YF, Unnikrishnan A, Pimanda JE & Wong JWH Cell Reports 2016.
Hnisz et al. Science (2016) 351:1454-1458
Ji et al. Cell Stem Cell (2015) 18:262-275

Summary: Part 1
• Impaired nucleotide excision repair (NER) results in mutation hotspots at active promoters and CTCF/cohesin binding sites.
• This is most obvious in melanoma because repair of UV-induced lesions depends on NER.
• Is this a mechanism that generally contributes to cancer development, or is it largely a passenger event?
• Are CTCF binding site mutations in other cancers (with signature 17) also caused by a similar mechanism?

Why is there a lack of promoter mutations in colorectal cancer?
[Figure: Colon adenocarcinoma; y-axis Mutations/Mb (0 to 50); x-axis Distance from DHS (bp), -10000 to 10000; promoter DHS marked]

DNA methylation driven mutagenicity

DNA methylation underlies decreased promoter mutations in colorectal cancers

Replication timing has a different impact on mutation rates in different types of CRC

Repair of DNA methylation is dependent on MMR and TDG
[Schematic: a methylated CpG (me-CG / GC-me) undergoes deamination to TG / GC; labelled repair routes are base excision repair (thymine DNA glycosylase) and mismatch repair (replication dependent); DNMT is also labelled]

Modelling mutation probability based on epigenetic factors

How can understanding mutational processes help?

Summary: Part 2
• DNA methylation drives the CpG mutation accumulation rate in colorectal cancers.
• Data show that mismatch repair normally repairs most mCpG-driven mismatches.
• Investigate how mutations in other cancers are driven by this mCpG phenomenon.
• Develop cancer type specific models to improve cancer driver mutation prediction.

Acknowledgements
All the research groups that have made their data publicly available: TCGA, ICGC, Sanger Institute, ENCODE, FANTOM5, etc.
Bioinformatics and Integrative Genomics: Rebecca Poulos, Dilmi Perera, Anushi Shah, Diego Chacon, Regina Ryan, Felix Ma
Stem Cell Research Group: A/Prof. John Pimanda, Dr Julie Thoms, Dr Ashwin Unnikrishnan, Yi Fang Guan
Centre for Health Technologies, UTS: Dr Dominik Beck
Intersect Pty Ltd.
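
Supplementary sketch: mutation density ratio
The talk reports a mutation density ratio (DHS / 1 kb DHS flank) around a 150 bp DHS centre. The Python below is only a minimal illustration of how such a ratio could be computed, not the pipeline used in the published analysis. It assumes mutations and DHS centres are given as integer coordinates on a single chromosome, and that the flank means 1 kb immediately either side of the 150 bp centre window. The function names `count_in` and `density_ratio` are hypothetical.

```python
from bisect import bisect_left


def count_in(sorted_pos, start, end):
    """Count positions p with start <= p < end in a sorted list."""
    return bisect_left(sorted_pos, end) - bisect_left(sorted_pos, start)


def density_ratio(dhs_centres, mutations, centre_bp=150, flank_bp=1000):
    """Mutations per bp in DHS centres divided by mutations per bp in flanks.

    Assumption (not from the slides): each flank is flank_bp long and
    sits directly next to the centre window on either side.
    Returns None when the flanks contain no mutations.
    """
    muts = sorted(mutations)
    half = centre_bp // 2
    centre_hits = 0
    flank_hits = 0
    for c in dhs_centres:
        lo, hi = c - half, c + half
        centre_hits += count_in(muts, lo, hi)
        flank_hits += count_in(muts, lo - flank_bp, lo)
        flank_hits += count_in(muts, hi, hi + flank_bp)
    if flank_hits == 0:
        return None
    # (centre_hits / centre_len) / (flank_hits / flank_len), rearranged
    # so that only one division is performed.
    centre_len = centre_bp * len(dhs_centres)
    flank_len = 2 * flank_bp * len(dhs_centres)
    return (centre_hits * flank_len) / (flank_hits * centre_len)


# Toy example: 3 mutations in the centre, 2 in the flanks, 1 outside.
toy = density_ratio([10000], [9950, 10000, 10050, 9000, 10500, 15000])
print(toy)  # 20.0
```

A ratio above 1 indicates more mutations per base in the DHS centre than in its flanks. A real analysis would also need to handle multiple chromosomes, overlapping DHSs and per-sample mutation burden.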