An Empirical Bayes Mixture Model for Effect Size Distributions in Genome-Wide Association Studies

We describe in detail the implications of a particular mixture model (a scale mixture of two normals) for effect size distributions from genome-wide genotyping data. Parameters from this model can be used for estimation of the non-null proportion, the probability of replication in de novo samples, the local false discovery rate, power for detecting non-null loci, and proportion of variance explained from additive effects. Here, we fit this model by minimizing discrepancies with nonparametric estimates from a resampling-based algorithm. We examine the effects of linkage disequilibrium (LD) on effect sizes and parameter estimates, both analytically and in simulations. We validate this approach using meta-analysis test statistics (“z-scores”) from two large GWAS, one for Crohn’s disease and the other for schizophrenia. We demonstrate that for these studies a scale mixture of two normal distributions generally fits empirical replication effect sizes well, providing an excellent fit for the schizophrenia effect sizes but underestimating the tails of the distribution for Crohn’s disease.
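As a rough illustration of how such a model yields a local false discovery rate, the sketch below assumes that z-scores follow a zero-mean mixture of a narrow "null" normal and a wider "non-null" normal, and applies the standard empirical Bayes definition: the null share of the mixture density at a given z. The function names and parameter values (`PI0`, `SD_NULL`, `SD_NONNULL`) are hypothetical choices for illustration, not estimates from the paper, and the paper's own parameterization and fitting procedure may differ.

```python
import math


def normal_pdf(z, sd):
    """Density of a zero-mean normal with standard deviation sd."""
    return math.exp(-0.5 * (z / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(z, pi0, sd_null, sd_nonnull):
    """Two-component scale mixture: pi0 * null + (1 - pi0) * non-null."""
    return pi0 * normal_pdf(z, sd_null) + (1.0 - pi0) * normal_pdf(z, sd_nonnull)


def local_fdr(z, pi0, sd_null, sd_nonnull):
    """Posterior probability that a z-score came from the null component."""
    return pi0 * normal_pdf(z, sd_null) / mixture_pdf(z, pi0, sd_null, sd_nonnull)


# Hypothetical parameter values, chosen only for illustration.
PI0 = 0.95         # assumed null proportion
SD_NULL = 1.0      # assumed null standard deviation
SD_NONNULL = 2.0   # assumed non-null standard deviation (wider than null)

if __name__ == "__main__":
    for z in (0.0, 2.0, 4.0, 6.0):
        fdr = local_fdr(z, PI0, SD_NULL, SD_NONNULL)
        print(f"z = {z:4.1f}  local fdr = {fdr:.4f}")
```

Under these assumptions the local false discovery rate falls as |z| grows, because the wider non-null component accounts for a growing share of the density in the tails.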

Published in: An Empirical Bayes Mixture Model for Effect Size Distributions in Genome-Wide Association Studies. PLoS Genet 11(12): e1005717. doi:10.1371/journal.pgen.1005717
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1005717
