Gang Fang, Majda Haznadar, Wen Wang, Haoyu Yu, Michael Steinbach,
Timothy R. Church, William S. Oetting, Brian Van Ness and Vipin Kumar
Last updated: 04/19/2012
Abstract:
There has been increased interest in discovering combinations of
single-nucleotide polymorphisms (SNPs) that are strongly associated with
a phenotype even when each SNP individually has little effect. Efficient
approaches have been proposed for searching genome-wide datasets for
two-locus combinations. However, for high-order combinations, existing
methods either adopt a brute-force search, which can handle only a small
number of SNPs, or use a heuristic search, which may miss informative
combinations. In addition, existing approaches lack statistical power
because they rely on statistics with many degrees of freedom and test a
huge number of hypotheses during the combinatorial search. Due to these
challenges, high-order combinations have mostly been studied either on
simulated data of order up to three or on real datasets covering a small
number of genes. Thus, functional interactions in high-order
combinations have not been systematically explored. We leverage
discriminative-pattern-mining algorithms from the data-mining community
to search for high-order combinations in case-control datasets. The
substantially improved efficiency and scalability demonstrated on
synthetic and real datasets with several thousand SNPs allow us to
study several important mathematical and statistical properties of
SNP combinations of order as high as eleven. We further explore
functional interactions in high-order combinations and reveal a general
connection, supported by multiple datasets, between the increase in
discriminative power of a combination over its subsets and the
functional coherence among the genes comprising the combination.
Finally, we study in detail several significant high-order combinations
discovered from a lung-cancer dataset and a kidney-transplant-rejection
dataset to provide novel insights into these complex diseases.
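The two central quantities in the abstract, the discriminative power of a SNP combination in a case-control dataset and the increase of that power over the combination's subsets, can be illustrated with a small sketch. It rests on assumptions this page does not state: each SNP is encoded as a genotype 0/1/2, a combination is a set of (SNP, genotype) items, and discriminative power is taken to be the absolute difference in support between cases and controls, one common measure in discriminative pattern mining. The HSC code may use a different statistic, and the function names below are hypothetical, not part of HSC.

```python
from itertools import combinations
from typing import Dict, Sequence

# Hypothetical encoding: a combination maps a SNP index to a required genotype
# (0, 1 or 2 copies of the minor allele); each sample is a list of genotypes.
Pattern = Dict[int, int]
Samples = Sequence[Sequence[int]]


def support(pattern: Pattern, samples: Samples) -> float:
    """Fraction of samples that carry every (SNP, genotype) item in the pattern."""
    if not samples:
        return 0.0
    hits = sum(all(s[snp] == g for snp, g in pattern.items()) for s in samples)
    return hits / len(samples)


def discriminative_power(pattern: Pattern, cases: Samples, controls: Samples) -> float:
    """Assumed measure: absolute difference in support between cases and controls."""
    return abs(support(pattern, cases) - support(pattern, controls))


def gain_over_subsets(pattern: Pattern, cases: Samples, controls: Samples) -> float:
    """Power of the pattern minus the power of its best proper, non-empty subset.

    A single-item pattern has no proper non-empty subsets, so its gain equals its power.
    """
    items = list(pattern.items())
    best_subset = max(
        (
            discriminative_power(dict(sub), cases, controls)
            for k in range(1, len(items))
            for sub in combinations(items, k)
        ),
        default=0.0,
    )
    return discriminative_power(pattern, cases, controls) - best_subset


if __name__ == "__main__":
    # Toy data: SNP 0 and SNP 1 each have equal support in cases and controls,
    # but genotype 1 at both SNPs occurs only in cases.
    cases = [[1, 1], [1, 1], [0, 0], [0, 0]]
    controls = [[1, 0], [0, 1], [1, 0], [0, 1]]
    pair = {0: 1, 1: 1}
    print(discriminative_power({0: 1}, cases, controls))  # → 0.0
    print(discriminative_power(pair, cases, controls))  # → 0.5
    print(gain_over_subsets(pair, cases, controls))  # → 0.5
```

In the toy data neither SNP alone separates cases from controls, yet the pair does, so all of its discriminative power is a gain over its subsets. Checking every subset is exponential in the order of the combination, so this brute-force check is only meant for illustration, not as a search strategy for order-eleven combinations.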
Codes:
MATLAB code to discover high-order disease-associated SNP combinations: HSC version 0.1 (instructions in read.txt)
Correspondence: Gang Fang (gangfang cs umn edu) and Vipin Kumar (kumar cs umn edu)