Characterising and Predicting Haploinsufficiency in the Human Genome


Haploinsufficiency, wherein a single functional copy of a gene is insufficient to maintain normal function, is a major cause of dominant disease. Human disease studies have identified several hundred haploinsufficient (HI) genes. We have compiled a map of 1,079 haplosufficient (HS) genes by systematic identification of genes unambiguously and repeatedly compromised by copy number variation among 8,458 apparently healthy individuals and contrasted the genomic, evolutionary, functional, and network properties between these HS genes and known HI genes. We found that HI genes are typically longer and have more conserved coding sequences and promoters than HS genes. HI genes exhibit higher levels of expression during early development and greater tissue specificity. Moreover, within a probabilistic human functional interaction network HI genes have more interaction partners and greater network proximity to other known HI genes. We built a predictive model on the basis of these differences and annotated 12,443 genes with their predicted probability of being haploinsufficient. We validated these predictions of haploinsufficiency by demonstrating that genes with a high predicted probability of exhibiting haploinsufficiency are enriched among genes implicated in human dominant diseases and among genes causing abnormal phenotypes in heterozygous knockout mice. We have transformed these gene-based haploinsufficiency predictions into haploinsufficiency scores for genic deletions, which we demonstrate to better discriminate between pathogenic and benign deletions than consideration of the deletion size or numbers of genes deleted. These robust predictions of haploinsufficiency support clinical interpretation of novel loss-of-function variants and prioritization of variants and genes for follow-up studies.
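The abstract does not say how per-gene probabilities are combined into a score for a deletion. As a minimal illustrative sketch, and not necessarily the authors' method, one simple possibility is to treat the per-gene predictions as independent and report the probability that at least one deleted gene is haploinsufficient. The function name `deletion_hi_score` and the example probabilities are hypothetical.

```python
from math import prod


def deletion_hi_score(gene_probs):
    """Probability that at least one deleted gene is haploinsufficient.

    ASSUMPTION (not from the paper): per-gene probabilities are treated as
    independent, so the score is 1 - product(1 - p_i) over the deleted genes.
    """
    probs = list(gene_probs)
    for p in probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    # An empty deletion (no genes hit) yields a score of 0.0.
    return 1.0 - prod(1.0 - p for p in probs)


# Hypothetical inputs: a small deletion hitting one high-probability gene
# versus a larger deletion hitting five low-probability genes.
small_deletion = deletion_hi_score([0.9])
large_deletion = deletion_hi_score([0.05] * 5)
print(small_deletion > large_deletion)
```

Under this assumption, a score of this kind can rank a small deletion above a larger one, which matches the abstract's point that deletion size and gene count alone are weaker discriminators.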


Published in: Characterising and Predicting Haploinsufficiency in the Human Genome. PLoS Genet 6(10): e1001154. doi:10.1371/journal.pgen.1001154
Category: Research Article




Tags: Genetics, Reproductive Medicine

Article published in: PLOS Genetics, 2010, Issue 10