Relationship Estimation from Whole-Genome Sequence Data

The determination of the relationship between a pair of individuals is a fundamental application of genetics. Previously, we and others have demonstrated that identity-by-descent (IBD) information generated from high-density single-nucleotide polymorphism (SNP) data can greatly improve the power and accuracy of genetic relationship detection. Whole-genome sequencing (WGS) marks the final step in increasing genetic marker density by assaying all single-nucleotide variants (SNVs), and thus has the potential to further improve relationship detection by enabling more accurate detection of IBD segments and more precise resolution of IBD segment boundaries. However, WGS introduces new complexities that must be addressed in order to achieve these improvements in relationship detection. To evaluate these complexities, we estimated genetic relationships from WGS data for 1490 known pairwise relationships among 258 individuals in 30 families along with 46 population samples as controls. We identified several genomic regions with excess pairwise IBD in both the pedigree and control datasets using three established IBD methods: GERMLINE, fastIBD, and ISCA. These spurious IBD segments produced a 10-fold increase in the rate of detected false-positive relationships among controls compared to high-density microarray datasets. To address this issue, we developed a new method to identify and mask genomic regions with excess IBD. This method, implemented in ERSA 2.0, fully resolved the inflated cryptic relationship detection rates while improving relationship estimation accuracy. ERSA 2.0 detected all 1st through 6th degree relationships, and 55% of 9th through 11th degree relationships in the 30 families. We estimate that WGS data provides a 5% to 15% increase in relationship detection power relative to high-density microarray data for distant relationships. Our results identify regions of the genome that are highly problematic for IBD mapping and introduce new software to accurately detect 1st through 9th degree relationships from whole-genome sequence data.
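
The abstract does not say how ERSA 2.0 identifies or masks regions with excess IBD, so the following Python sketch illustrates only the general idea and should not be read as the published algorithm. It assumes fixed-size genomic windows, a simple "mean plus three standard deviations" cutoff on the number of control pairs sharing IBD in each window, and half-open base-pair intervals; the function names, the window size, and the toy data are all hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev

# A segment is a half-open interval (start_bp, end_bp) on one chromosome.

def window_counts(segments, window):
    """Count how many IBD segments overlap each fixed-size window."""
    counts = Counter()
    for start, end in segments:
        for w in range(start // window, (end - 1) // window + 1):
            counts[w] += 1
    return counts

def excess_windows(counts, n_windows, z=3.0):
    """Flag windows whose count exceeds mean + z * SD (assumed cutoff)."""
    values = [counts.get(w, 0) for w in range(n_windows)]
    cutoff = mean(values) + z * pstdev(values)
    return {w for w, v in enumerate(values) if v > cutoff}

def mask_segment(segment, masked, window):
    """Return the pieces of a segment that lie outside masked windows."""
    start, end = segment
    pieces, current = [], None
    for w in range(start // window, (end - 1) // window + 1):
        lo, hi = max(start, w * window), min(end, (w + 1) * window)
        if w in masked:
            if current is not None:
                pieces.append(current)
                current = None
        elif current is None:
            current = (lo, hi)
        else:
            current = (current[0], hi)
    if current is not None:
        pieces.append(current)
    return pieces

# Toy control data: ten pairs share IBD in one window, two elsewhere.
WINDOW = 100
controls = [(500, 600)] * 10 + [(0, 100), (1000, 1100)]
masked = excess_windows(window_counts(controls, WINDOW), n_windows=20)

# A pedigree segment spanning the flagged window is split around it.
kept = mask_segment((450, 700), masked, WINDOW)
print(masked, kept)
```

In this sketch, segments from unrelated controls define the mask, which is then applied to every pair before relationship estimation. A real implementation would also need to handle multiple chromosomes, genetic rather than physical distance, and the effect of masking on expected segment-length distributions, none of which is attempted here.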

Published in: Relationship Estimation from Whole-Genome Sequence Data. PLoS Genet 10(1): e1004144. doi:10.1371/journal.pgen.1004144
Category: Research Article
DOI: https://doi.org/10.1371/journal.pgen.1004144
