A Simple Model-Based Approach to Inferring and Visualizing Cancer Mutation Signatures


Somatic (non-inherited) mutations are acquired throughout life in cells across the body. These mutations can be caused, for example, by DNA replication errors or by exposure to environmental mutagens such as tobacco smoke. Some of these mutations can lead to cancer. Different cancers, and even different instances of the same cancer, can show distinctive patterns of somatic mutations. These patterns have become known as “mutation signatures”. For example, C > A mutations are frequent in lung cancers, whereas C > T and CC > TT mutations are frequent in skin cancers. Each mutation signature may be associated with a specific kind of carcinogen, such as tobacco smoke or ultraviolet light. Identifying mutation signatures therefore has the potential to identify new carcinogens and to yield new insights into the mechanisms and causes of cancer. In this paper, we introduce new statistical tools for tackling this important problem. These tools provide more robust and interpretable mutation signatures than previous approaches, as we demonstrate by applying them to large-scale cancer genomic data.
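The “distinctive patterns” described above are commonly summarized by counting a sample's single-base substitutions in a small set of classes. The Python below is a minimal sketch under one assumption: it follows the widespread convention of reporting each substitution relative to the pyrimidine (C or T) of the mutated base pair, so a G > T change on the reported strand counts as C > A. It illustrates a typical preprocessing step only and is not the statistical model introduced in this paper. The names `substitution_class`, `mutation_spectrum` and `CLASSES`, and the toy input, are hypothetical. Doublet changes such as CC > TT are not handled.

```python
from collections import Counter

# Watson-Crick complement of each DNA base.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# The six pyrimidine-referenced single-base substitution classes.
CLASSES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]


def substitution_class(ref: str, alt: str) -> str:
    """Map a single-base substitution (ref -> alt) to one of the six classes.

    A change from a purine (A or G) is the same event as the complementary
    change on the opposite strand, so both bases are complemented.
    """
    ref, alt = ref.upper(), alt.upper()
    if ref not in COMPLEMENT or alt not in COMPLEMENT or ref == alt:
        raise ValueError(f"not a single-base substitution: {ref}>{alt}")
    if ref in ("A", "G"):
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{ref}>{alt}"


def mutation_spectrum(mutations):
    """Return the fraction of a sample's substitutions falling in each class."""
    counts = Counter(substitution_class(ref, alt) for ref, alt in mutations)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no substitutions")
    return {c: counts[c] / total for c in CLASSES}


if __name__ == "__main__":
    # Hypothetical toy sample: G>T folds into C>A and G>A folds into C>T.
    sample = [("C", "A"), ("G", "T"), ("C", "T"), ("G", "A"), ("C", "A")]
    print(mutation_spectrum(sample))
    # {'C>A': 0.6, 'C>G': 0.0, 'C>T': 0.4, 'T>A': 0.0, 'T>C': 0.0, 'T>G': 0.0}
```

Folding each change onto the pyrimidine strand halves the twelve possible base changes to six, because a mutation cannot be attributed to one DNA strand from the sequence change alone.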


Published in: A Simple Model-Based Approach to Inferring and Visualizing Cancer Mutation Signatures. PLoS Genet 11(12): e1005657. doi:10.1371/journal.pgen.1005657
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1005657


