Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies


Genome-Wide Association Studies (GWAS) can reveal genetic-phenotypic relationships, but have limitations. To control false positives, population structure and kinship are incorporated in a fixed and random effect Mixed Linear Model (MLM). However, because of the confounding between population structure, kinship, and quantitative trait nucleotides (QTNs), MLM leads to false negatives, missing some potentially important discoveries. Here, we present a new method, Fixed and random model Circulating Probability Unification (FarmCPU). FarmCPU performs marker tests with associated markers as covariates in a fixed effect model and optimization on the associated covariate markers in a random effect model separately. This process enables efficient computation, removes the confounding, prevents model over-fitting, and controls false positives simultaneously. FarmCPU controls false positives as well as MLM with reductions in both false negatives and computing times. Researchers will not only be able to analyze big data, but will also have greater success with fewer mistakes when mapping genes of interest.
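The fixed-effect step described above, testing each marker while previously associated markers sit in the model as covariates, can be illustrated with a small sketch. This is not the authors' implementation: the function name `fixed_effect_scan`, the simulated genotypes, and the use of ordinary least squares t-statistics are all assumptions made for illustration, and the random-effect optimization of the covariate markers is not shown.

```python
import numpy as np


def fixed_effect_scan(y, markers, covariate_idx):
    """Hypothetical helper: t-statistic per marker from y ~ 1 + covariates + marker."""
    n, m = markers.shape
    # Intercept plus the markers already treated as associated (covariates).
    base = np.column_stack([np.ones(n), markers[:, covariate_idx]])
    t_stats = np.full(m, np.nan)
    for j in range(m):
        if j in covariate_idx:
            continue  # already in the model, so it is not tested again
        X = np.column_stack([base, markers[:, j]])
        beta, _, rank, _ = np.linalg.lstsq(X, y, rcond=None)
        if rank < X.shape[1]:
            continue  # marker is collinear with the covariates
        resid = y - X @ beta
        sigma2 = resid @ resid / (n - X.shape[1])
        cov = sigma2 * np.linalg.inv(X.T @ X)
        t_stats[j] = beta[-1] / np.sqrt(cov[-1, -1])
    return t_stats


# Simulated data (assumption): marker 3 is causal, marker 7 is a noisy copy of it.
rng = np.random.default_rng(0)
n, m = 200, 10
markers = rng.integers(0, 3, size=(n, m)).astype(float)
flip = rng.random(n) < 0.2
markers[:, 7] = markers[:, 3]
markers[flip, 7] = rng.integers(0, 3, size=int(flip.sum()))
y = 2.0 * markers[:, 3] + rng.normal(size=n)

t_marginal = fixed_effect_scan(y, markers, [])      # no covariates
t_conditional = fixed_effect_scan(y, markers, [3])  # marker 3 as covariate
```

In this toy setting, the linked marker 7 looks strongly associated when tested alone, and its signal shrinks once marker 3 is included as a covariate. That is the intuition behind using associated markers as covariates, not a reproduction of FarmCPU itself.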


Published in: Iterative Usage of Fixed and Random Effect Models for Powerful and Efficient Genome-Wide Association Studies. PLoS Genet 12(2): e1005767. doi:10.1371/journal.pgen.1005767
Category: Research Article
