GERV: A Statistical Method for Generative Evaluation of Regulatory Variants for Transcription Factor Binding

This webpage provides data for our manuscript on predicting the effects of non-coding variants on transcription factor binding:

Zeng H., Hashimoto T., Kang D. D., Gifford D. K. (2015) "GERV: A Statistical Method for Generative Evaluation of Regulatory Variants for Transcription Factor Binding". Bioinformatics [link]

Motivation: The majority of disease-associated variants identified in genome-wide association studies (GWAS) reside in noncoding regions of the genome with regulatory roles. Thus, being able to interpret the functional consequence of a variant is essential for identifying causal variants in GWAS analyses.

Results: We present GERV (Generative Evaluation of Regulatory Variants), a novel computational method for predicting regulatory variants that affect transcription factor binding. GERV learns a k-mer based generative model of transcription factor binding from ChIP-seq and DNase-seq data, and scores variants by computing the change in predicted ChIP-seq reads between the reference and alternate alleles. The k-mers learned by GERV capture more sequence determinants of transcription factor binding than a motif-based approach alone, including both a transcription factor’s canonical motif and associated co-factor motifs. We show that GERV outperforms existing methods in predicting SNPs associated with allele-specific binding. GERV correctly predicts a validated causal variant among linked SNPs, and prioritizes the variants previously reported to modulate the binding of FOXA1 in breast cancer cell lines. Thus, GERV provides a powerful approach for functionally annotating and prioritizing causal variants for experimental follow-up analysis.
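
The scoring idea described above (compare a model's predicted signal for the reference allele against the alternate allele) can be illustrated with a toy sketch. This is not the GERV implementation: it assumes each k-mer carries a single scalar weight and that the predicted signal is simply the sum of those weights, whereas the actual model is trained on ChIP-seq and DNase-seq data. The function names, the weight table, and the scoring formula are all hypothetical and chosen only for illustration.

```python
from typing import Dict, Iterator


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def predicted_signal(seq: str, weights: Dict[str, float], k: int) -> float:
    """Toy predicted signal: sum of per-k-mer weights (unseen k-mers count as 0).

    Assumption: a scalar weight per k-mer; this is a simplification,
    not the model described in the paper.
    """
    return sum(weights.get(km, 0.0) for km in kmers(seq, k))


def variant_score(ref_seq: str, pos: int, alt: str,
                  weights: Dict[str, float], k: int) -> float:
    """Toy signal change (alt minus ref) for a single-base substitution.

    pos is a 0-based index into ref_seq.
    """
    if len(alt) != 1 or not 0 <= pos < len(ref_seq):
        raise ValueError("expected a single-base substitution inside ref_seq")
    alt_seq = ref_seq[:pos] + alt + ref_seq[pos + 1:]
    # Only k-mers overlapping the variant can differ between the two alleles,
    # so it is enough to compare the window that covers those k-mers.
    lo = max(0, pos - k + 1)
    hi = min(len(ref_seq), pos + k)
    return (predicted_signal(alt_seq[lo:hi], weights, k)
            - predicted_signal(ref_seq[lo:hi], weights, k))
```

Under these assumptions, a substitution that destroys highly weighted k-mers yields a negative score and one that creates them yields a positive score; variants could then be ranked by the magnitude of the change.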

To run GERV

Source code and documentation: GitHub

Genome files needed to run the model: hg19, mm10


Trained Models

The GERV models trained for NFKB, CTCF, FOS, JUND, MAX and MYC, as used in the paper, can be accessed here

The GERV models trained on 84 TF ChIP-seq experiments from the ENCODE project: here (12 GB)


Other supplementary data for the paper

GERV predictions for Allele-Specific Binding (ASB) SNPs and two negative sets for NFKB, CTCF, FOS, JUND, MAX and MYC: here

GERV and deltaSVM scores for the breast cancer associated variant set (AVS), intersected with 1KG SNPs: FOXA1_AVS_SCORE_GERV_DeltaSVM.zip


Contact

For questions or to request additional data, please contact Haoyang Zeng (haoyangz@mit.edu) or David Gifford (gifford@mit.edu).


Last updated 1/20/2017.