
Robust Prediction of Expression Differences among Human Individuals Using Only Genotype Information


Many genetic variants that are significantly correlated with gene expression changes across human individuals have been identified, but the ability of these variants to predict expression in unseen individuals has rarely been evaluated. Here, we devise an algorithm that, given training expression and genotype data for a set of individuals, predicts the expression of genes in unseen test individuals given only their genotype in the local genomic vicinity of the predicted gene. Notably, the resulting predictions are remarkably robust in that they agree well between the training and test sets, even when the training and test sets consist of individuals from distinct populations. Thus, although the overall number of genes that can be predicted is relatively small, as expected from our choice to ignore effects such as environmental factors and trans sequence variation, the robust nature of the predictions means that the identity of the predictable genes, and the quantitative degree to which each can be predicted, are known in advance. We also present an extension that incorporates heterogeneous types of genomic annotations to differentially weight the importance of the various genetic variants, and we show that assigning higher weights to variants with particular annotations, such as proximity to genes and high regional G/C content, can further improve the predictions. Finally, genes that are successfully predicted have, on average, higher expression and more variability across individuals, providing insight into the characteristics of the types of genes that can be predicted from their cis genetic variation.
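The abstract does not say which prediction model is used, so the following is only a minimal sketch of the general setup it describes: learn from training individuals with both genotype and expression, then predict expression for held-out individuals from their cis genotype alone, optionally weighting variants by annotation. Nearest-neighbour averaging is used here purely as a stand-in technique. The function names, the simulated data, and the annotation weights are all illustrative assumptions, not the authors' implementation.

```python
import random
from statistics import mean


def weighted_distance(g1, g2, weights):
    """Weighted Manhattan distance between two genotype vectors (0/1/2 allele counts)."""
    return sum(w * abs(a - b) for a, b, w in zip(g1, g2, weights))


def knn_predict(train_genotypes, train_expression, test_genotype, k=3, weights=None):
    """Predict expression as the mean over the k most similar training individuals."""
    if weights is None:
        weights = [1.0] * len(test_genotype)  # unweighted: every SNP counts equally
    ranked = sorted(
        range(len(train_genotypes)),
        key=lambda i: weighted_distance(train_genotypes[i], test_genotype, weights),
    )
    return mean(train_expression[i] for i in ranked[:k])


def pearson(xs, ys):
    """Pearson correlation, used to score predictions against held-out expression."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0


def simulate(n, n_snps, causal_snp, rng):
    """Toy data (an assumption): expression depends on one cis SNP plus Gaussian noise."""
    genotypes = [[rng.choice([0, 1, 2]) for _ in range(n_snps)] for _ in range(n)]
    expression = [g[causal_snp] + rng.gauss(0, 0.3) for g in genotypes]
    return genotypes, expression


rng = random.Random(0)
train_g, train_e = simulate(200, 10, causal_snp=0, rng=rng)
test_g, test_e = simulate(100, 10, causal_snp=0, rng=rng)

# Hypothetical annotation weights: up-weight the SNP assumed to lie closest to the gene.
annotation_weights = [5.0] + [1.0] * 9

unweighted = [knn_predict(train_g, train_e, g, k=5) for g in test_g]
weighted = [knn_predict(train_g, train_e, g, k=5, weights=annotation_weights) for g in test_g]

r_unweighted = pearson(unweighted, test_e)
r_weighted = pearson(weighted, test_e)
print(f"held-out correlation, unweighted: {r_unweighted:.2f}")
print(f"held-out correlation, weighted:   {r_weighted:.2f}")
```

Scoring on individuals that were never used for training is what makes it possible to compare training and test performance, which is the sense of robustness the abstract refers to. In this toy setting the up-weighted SNP is the causal one by construction, so the comparison between the two correlations only illustrates the mechanics of annotation weighting and says nothing about real data.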


Published in: Robust Prediction of Expression Differences among Human Individuals Using Only Genotype Information. PLoS Genet 9(3): e1003396. doi:10.1371/journal.pgen.1003396
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1003396





Article published in the journal PLOS Genetics, 2013, Issue 3