Prediction of Complex Human Traits Using the Genomic Best Linear Unbiased Predictor


Despite important advances from Genome-Wide Association Studies (GWAS), for most complex human traits and diseases a sizable proportion of genetic variance remains unexplained and prediction accuracy (PA) is usually low. Evidence suggests that PA can be improved using Whole-Genome Regression (WGR) models, in which phenotypes are regressed on hundreds of thousands of variants simultaneously. The Genomic Best Linear Unbiased Predictor (G-BLUP, a ridge-regression type method) is a commonly used WGR method and has shown good predictive performance when applied to plant and animal breeding populations. However, breeding and human populations differ greatly in a number of factors that can affect the predictive performance of G-BLUP. Using theory, simulations, and real data analysis, we study the performance of G-BLUP when applied to data from related and unrelated human subjects. Under perfect linkage disequilibrium (LD) between markers and QTL, the prediction R-squared (R²) of G-BLUP asymptotically reaches the trait heritability. Under imperfect LD between markers and QTL, however, the prediction R² of G-BLUP has a much lower upper bound. We show that the minimum decrease in prediction accuracy caused by imperfect LD between markers and QTL is given by (1−b)², where b is the regression of marker-derived genomic relationships on those realized at causal loci. For pairs of related individuals, within-family disequilibrium makes the patterns of realized genomic similarity similar across the genome; b is therefore close to one, inducing only a small decrease in R². With distantly related individuals, however, b reaches very low values, imposing a very low upper bound on prediction R². Our simulations suggest that, for the analysis of data from unrelated individuals, the asymptotic upper bound on R² may be of the order of 20% of the trait heritability. We show how PA can be enhanced with the use of variable selection or differential shrinkage of estimates of marker effects.
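To make the quantities in the abstract concrete, the following is a minimal sketch of G-BLUP on simulated genotypes, not the authors' implementation. It assumes genotypes coded 0/1/2, a relationship matrix built from centered and scaled markers, and a known heritability `h2` that sets the ridge penalty to (1−h²)/h². All function names (`genomic_relationships`, `gblup_predict`, `relationship_regression`) and the simulation settings are hypothetical choices made for illustration. The last function estimates b as the slope from regressing off-diagonal marker-derived relationships on those computed at causal loci.

```python
import numpy as np


def genomic_relationships(X):
    """Relationship matrix from an n x p genotype matrix coded 0/1/2."""
    sd = X.std(axis=0)
    keep = sd > 0                      # drop monomorphic markers
    Z = (X[:, keep] - X[:, keep].mean(axis=0)) / sd[keep]
    return Z @ Z.T / Z.shape[1]        # mean diagonal equals 1


def gblup_predict(G, y, train, test, h2):
    """Predict phenotypes of `test` individuals from `train` individuals."""
    lam = (1.0 - h2) / h2              # assumed residual/genetic variance ratio
    mu = y[train].mean()
    K = G[np.ix_(train, train)] + lam * np.eye(len(train))
    alpha = np.linalg.solve(K, y[train] - mu)
    return mu + G[np.ix_(test, train)] @ alpha


def relationship_regression(G_markers, G_causal):
    """Slope b of marker-derived on causal-locus relationships (off-diagonals)."""
    iu = np.triu_indices_from(G_markers, k=1)
    gm, gq = G_markers[iu], G_causal[iu]
    return np.cov(gm, gq)[0, 1] / np.var(gq, ddof=1)


# Toy simulation (assumed settings): 200 unrelated individuals, 500 markers,
# the first 50 markers act as causal loci, heritability 0.5.
rng = np.random.default_rng(0)
n, p, q, h2 = 200, 500, 50, 0.5
X = rng.binomial(2, 0.3, size=(n, p)).astype(float)
g = X[:, :q] @ rng.normal(size=q)
g = (g - g.mean()) / g.std() * np.sqrt(h2)
y = g + rng.normal(scale=np.sqrt(1.0 - h2), size=n)

G = genomic_relationships(X)           # all markers, causal loci included
G_causal = genomic_relationships(X[:, :q])
train, test = np.arange(150), np.arange(150, n)

y_hat = gblup_predict(G, y, train, test, h2)
r2 = np.corrcoef(y_hat, y[test])[0, 1] ** 2
b = relationship_regression(G, G_causal)
b_same = relationship_regression(G_causal, G_causal)
print(f"prediction R2 = {r2:.3f}, b = {b:.3f}, (1 - b)^2 = {(1 - b) ** 2:.3f}")
```

Because the simulated individuals are unrelated and the markers are independent, this toy setting is only a rough stand-in for the distantly related case described above. Replacing the simulated `X` with real genotypes from families would be the natural way to see b move toward one.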

Published in: Prediction of Complex Human Traits Using the Genomic Best Linear Unbiased Predictor. PLoS Genet 9(7): e1003608. doi:10.1371/journal.pgen.1003608
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1003608
