Gene-Based Testing of Interactions in Association Studies of Quantitative Traits


Various methods have been developed for identifying gene–gene interactions in genome-wide association studies (GWAS). However, most methods use individual markers as the testing unit, and the large number of such tests drastically erodes statistical power. In this study, we propose novel gene-based interaction tests of quantitative traits that confer advantages in both statistical power and biological interpretation. The framework of gene-based gene–gene interaction (GGG) tests combines marker-based interaction tests between all pairs of markers in two genes to produce a gene-level test for interaction between the two. The tests are based on an analytical formula we derive for the correlation between marker-based interaction tests due to linkage disequilibrium. We propose four GGG tests that extend the following P value combining methods: minimum P value, extended Simes procedure, truncated tail strength, and truncated P value product. Extensive simulations point to correct type I error rates for all tests and show that the two truncated tests are more powerful than the other tests when the markers involved in the underlying interaction are not directly genotyped and when there are multiple underlying interactions. We applied our tests to pairs of genes that exhibit a protein–protein interaction to test for gene-level interactions underlying lipid levels, using genotype data from the Atherosclerosis Risk in Communities study. We identified five novel interactions that are not evident from marker-based interaction testing and successfully replicated one of these interactions, between SMAD3 and NEDD9, in an independent sample from the Multi-Ethnic Study of Atherosclerosis. We conclude that our GGG tests show improved power to identify gene-level interactions in existing, as well as emerging, association studies.
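As a rough illustration of the combining step described above, the following Python sketch turns a set of marker-pair interaction P values into a single gene-level P value using a truncated P value product. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the truncation threshold `tau`, and the Monte Carlo calibration are hypothetical choices made for illustration, and the correlation matrix `corr` is supplied by the caller rather than computed from the analytical linkage disequilibrium formula, which is not reproduced here.

```python
import math
import random


def truncated_log_product(pvals, tau=0.05):
    """Log of the product of all P values at or below tau (0.0 if none qualify).

    More negative values indicate stronger combined evidence.
    """
    return sum(math.log(max(p, 1e-300)) for p in pvals if p <= tau)


def cholesky(corr):
    """Lower-triangular factor L with L * L^T = corr (corr must be positive definite)."""
    n = len(corr)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(corr[i][i] - s)
            else:
                L[i][j] = (corr[i][j] - s) / L[j][j]
    return L


def gene_level_pvalue(pvals, corr, stat=truncated_log_product, n_sim=2000, seed=0):
    """Monte Carlo gene-level P value for one gene pair.

    pvals: P values of the marker-pair interaction tests between the two genes.
    corr:  assumed correlation matrix between those tests (hypothetical input).
    """
    rng = random.Random(seed)
    L = cholesky(corr)
    n = len(pvals)
    observed = stat(pvals)
    hits = 0
    for _ in range(n_sim):
        # Draw correlated null z-scores, then convert to two-sided P values.
        e = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [sum(L[i][k] * e[k] for k in range(i + 1)) for i in range(n)]
        null_p = [math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z]
        if stat(null_p) <= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


# Toy example: 2 markers in gene A x 2 markers in gene B = 4 pairwise tests.
toy_corr = [
    [1.0, 0.4, 0.3, 0.1],
    [0.4, 1.0, 0.1, 0.3],
    [0.3, 0.1, 1.0, 0.4],
    [0.1, 0.3, 0.4, 1.0],
]
toy_pvals = [0.004, 0.03, 0.20, 0.65]
print(gene_level_pvalue(toy_pvals, toy_corr))
```

Passing `stat=min` instead would give a minimum P value variant under the same simulated null, since a smaller statistic counts as stronger evidence in both cases.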

Published in: Gene-Based Testing of Interactions in Association Studies of Quantitative Traits. PLoS Genet 9(2): e1003321. doi:10.1371/journal.pgen.1003321
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1003321
