Repetitive Elements May Comprise Over Two-Thirds of the Human Genome
Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has been previously identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo “clouds”). We show here that P-clouds predicts >840 Mbp of additional repetitive sequences in the human genome, thus suggesting that 66%–69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect fragments of different sizes of the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity for even moderately long fragments, in contrast to P-clouds, which has good sensitivity down to small fragment sizes (∼25 bp). Because short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this uncertainty. We further developed “element-specific” P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and using them we identified ∼100 Mbp of previously unannotated human elements. ESP estimates of new MIR sequences are in good agreement with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome annotation approaches and suggest that the human genome consists of substantially more repetitive sequence than previously believed.
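The P-clouds idea summarized above (group high-abundance oligonucleotides that lie close together in sequence space into "clouds", then annotate every genomic position covered by a cloud member) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the k-mer length, the thresholds `core_min` and `outer_min`, the single-substitution neighbourhood, the `max_radius` limit, the greedy breadth-first expansion, forward-strand-only counting, and the toy genome are all hypothetical choices made for illustration.

```python
import random
from collections import Counter, deque

def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer on the forward strand only (a simplification)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def one_mismatch_neighbours(kmer: str):
    """Yield every DNA k-mer that differs from `kmer` by exactly one substitution."""
    for i, base in enumerate(kmer):
        for alt in "ACGT":
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]

def build_clouds(counts: Counter, core_min: int, outer_min: int, max_radius: int = 2):
    """Hypothetical greedy cloud building: seed each cloud with the most abundant
    unassigned k-mer counted >= core_min, then absorb, breadth-first, k-mers reachable
    in <= max_radius single substitutions whose counts are >= outer_min."""
    assigned, clouds = set(), []
    for seed, count in counts.most_common():
        if count < core_min:
            break  # remaining k-mers are too rare to seed a cloud
        if seed in assigned:
            continue
        cloud = {seed}
        assigned.add(seed)
        queue = deque([(seed, 0)])
        while queue:
            current, dist = queue.popleft()
            if dist == max_radius:
                continue
            for nb in one_mismatch_neighbours(current):
                if nb not in assigned and counts[nb] >= outer_min:
                    assigned.add(nb)
                    cloud.add(nb)
                    queue.append((nb, dist + 1))
        clouds.append(cloud)
    return clouds

def annotate(seq: str, k: int, clouds) -> list:
    """Mark every base covered by at least one k-mer that belongs to any cloud."""
    members = set().union(*clouds)
    mask = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in members:
            mask[i:i + k] = [True] * k
    return mask

# Hypothetical toy genome: random background with 20 exact copies of a 20-bp
# "repeat" and 10 copies of a variant that carries one shared substitution.
rng = random.Random(0)
REPEAT = "ACGTTGCAGGCTAGCTTACG"
VARIANT = REPEAT[:10] + "A" + REPEAT[11:]  # C -> A at position 10

parts, repeat_positions = [], set()
for unit in [REPEAT] * 20 + [VARIANT] * 10:
    parts.append("".join(rng.choice("ACGT") for _ in range(150)))
    start = sum(map(len, parts))
    repeat_positions.update(range(start, start + len(unit)))
    parts.append(unit)
parts.append("".join(rng.choice("ACGT") for _ in range(150)))
genome = "".join(parts)

K = 8
clouds = build_clouds(kmer_counts(genome, K), core_min=15, outer_min=3)
mask = annotate(genome, K, clouds)
background = [i for i in range(len(genome)) if i not in repeat_positions]
frac_repeat = sum(mask[i] for i in repeat_positions) / len(repeat_positions)
frac_background = sum(mask[i] for i in background) / len(background)
print(f"{len(clouds)} clouds; repeat covered: {frac_repeat:.2f}; "
      f"background covered: {frac_background:.3f}")
```

With these settings the ten variant copies are still fully annotated, because their k-mers spanning the substitution are abundant enough (`outer_min=3`) to join the clouds seeded by the exact k-mers. Raising `outer_min` above 10 would leave the substituted base unannotated, a small-scale picture of why a cloud of related oligos can be more sensitive than exact matching to a single sequence.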
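The abstract notes that short fragments carry a high intrinsic probability of being false positives. A back-of-the-envelope null model makes that length dependence concrete. The sketch below is an assumption-laden illustration, not the probabilistic annotation used in the paper: it assumes independent, uniformly distributed bases, a consensus of about 300 bp (roughly the length of an Alu), a genome of about 3.1 Gbp scanned on both strands, and a mismatch allowance of about 10% of the fragment length, and it gives a union-bound (upper) estimate of how many genomic positions would match by chance. Real genomes, with biased base composition and low-complexity sequence, can behave quite differently.

```python
from math import comb

# Assumed, illustrative parameters (not taken from the paper).
GENOME_LEN = 3_100_000_000   # approximate human genome size in bp
CONSENSUS_LEN = 300          # roughly the length of an Alu consensus

def neighbourhood_size(length: int, max_mismatches: int) -> int:
    """Count DNA strings of `length` within `max_mismatches` substitutions of a fixed string."""
    return sum(comb(length, i) * 3 ** i for i in range(max_mismatches + 1))

def expected_chance_hits(frag_len: int, max_mismatches: int,
                         consensus_len: int = CONSENSUS_LEN,
                         genome_len: int = GENOME_LEN) -> float:
    """Union-bound estimate of genome positions that match some window of the
    consensus by chance, assuming iid uniform bases and scanning both strands."""
    n_windows = consensus_len - frag_len + 1
    # Probability that one random genomic window lies within the allowed mismatch
    # distance of at least one consensus window (union bound, capped at 1).
    p_window = min(1.0, n_windows * neighbourhood_size(frag_len, max_mismatches) / 4 ** frag_len)
    return 2 * genome_len * p_window

for frag_len in (15, 25, 50, 100):
    mismatches = frag_len // 10  # assumed ~10% divergence allowance
    print(f"{frag_len:>3} bp, <= {mismatches} mismatches: "
          f"~{expected_chance_hits(frag_len, mismatches):.3g} chance hits")
```

Under these assumptions the estimate drops by many orders of magnitude between 15 bp and 50 bp, one reason why a hit to a short fragment is better reported with a probability than as a yes/no call.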
Published in: Repetitive Elements May Comprise Over Two-Thirds of the Human Genome. PLoS Genet 7(12): e1002384 (2011). https://doi.org/10.1371/journal.pgen.1002384
Category: Research Article