Data Mining for Biomedical Informatics Group

SHEAR: Sample heterogeneity estimation and assembly by reference


SHEAR is a tool for next-generation sequencing data analysis that predicts structural variants (SVs), accounts for heterogeneous variants by estimating their representative percentages, and generates personal genomic sequences for downstream analysis. By building on structural variant detection algorithms, SHEAR also offers a stronger ability to handle difficult structural variant types and improved computational efficiency.
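As a rough illustration of what a variant's "percentage" in a heterogeneous sample can mean, the sketch below computes the fraction of reads at a breakpoint that support the variant rather than the reference. This is an assumption made for illustration only: the function name `variant_fraction` and the read counts are hypothetical, and SHEAR's actual estimator is not described here.

```python
def variant_fraction(variant_reads: int, reference_reads: int) -> float:
    """Fraction of reads at a site that support the variant allele.

    Hypothetical helper: a simple read-count ratio, not SHEAR's estimator.
    """
    total = variant_reads + reference_reads
    if total == 0:
        raise ValueError("no reads cover this site")
    return variant_reads / total


# Made-up counts: 30 reads support the variant, 90 support the reference.
fraction = variant_fraction(30, 90)
print(f"estimated variant fraction: {fraction:.2f}")
```

A ratio like this only makes sense when both read counts come from the same site and are counted with the same criteria.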

SDC: Subspace differential coexpression


SDC is a tool for identifying differential coexpression patterns in case-control gene expression data. We extend differential coexpression analysis by defining a subspace differential coexpression pattern, i.e., a set of genes that are coexpressed in a relatively large percentage of samples in one class, but in a much smaller percentage of samples in the other class. We propose a general approach, based on an association analysis framework, that allows exhaustive yet efficient discovery of subspace differential coexpression patterns.
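The definition above can be sketched in a few lines, assuming the expression data have already been reduced to, for each sample, the set of genes flagged as expressed. That binarization, the function names, and the toy data are assumptions for illustration and not necessarily how SDC represents its input.

```python
from typing import List, Set


def coexpression_support(samples: List[Set[str]], genes: Set[str]) -> float:
    """Fraction of samples in which every gene in `genes` is flagged."""
    if not samples:
        return 0.0
    return sum(1 for s in samples if genes <= s) / len(samples)


def support_difference(cases: List[Set[str]],
                       controls: List[Set[str]],
                       genes: Set[str]) -> float:
    """Support in cases minus support in controls for one gene set."""
    return (coexpression_support(cases, genes)
            - coexpression_support(controls, genes))


# Toy data: each sample is the set of genes flagged as expressed.
cases = [{"g1", "g2", "g3"}, {"g1", "g2", "g3", "g4"}, {"g1", "g2", "g3"}, {"g4"}]
controls = [{"g1", "g2", "g3"}, {"g1"}, {"g2", "g4"}, {"g3"}]

pattern = {"g1", "g2", "g3"}
print(coexpression_support(cases, pattern))          # support among cases
print(coexpression_support(controls, pattern))       # support among controls
print(support_difference(cases, controls, pattern))  # difference between classes
```

A gene set with a large positive or negative difference is a candidate pattern in this simplified view; it covers only a subset of the samples in the class where it is frequent, which is what "subspace" refers to.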

RAP: Range-support association patterns


Range-support Association Pattern (RAP) is a novel approach based on association pattern analysis for discovering constant-row biclusters in gene expression data. Unlike traditional association pattern discovery approaches, RAP works with real-valued data sets without discretizing them. RAP discovers small, highly coherent biclusters, as opposed to the large blocks discovered by traditional biclustering approaches.
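To make "constant-row bicluster" concrete, the sketch below checks whether each selected gene (row) has nearly the same value across the selected samples (columns), working directly on real values with no discretization. The tolerance test used here (value range divided by the largest absolute value) is a hypothetical coherence criterion chosen for illustration; it is not RAP's range-support measure.

```python
from typing import List, Sequence


def is_constant_row_bicluster(matrix: List[List[float]],
                              rows: Sequence[int],
                              cols: Sequence[int],
                              max_relative_range: float) -> bool:
    """True if every selected row is nearly constant over the selected columns.

    Hypothetical criterion: (max - min) / max(|min|, |max|) must not exceed
    `max_relative_range` for each row.
    """
    for r in rows:
        values = [matrix[r][c] for c in cols]
        lo, hi = min(values), max(values)
        scale = max(abs(lo), abs(hi))
        if scale == 0:
            continue  # an all-zero row is trivially constant
        if (hi - lo) / scale > max_relative_range:
            return False
    return True


# Toy expression matrix: rows are genes, columns are samples.
expression = [
    [2.0, 2.1, 2.0, 9.0],
    [5.0, 5.2, 5.1, 0.3],
    [1.0, 7.0, 3.0, 4.0],
]

print(is_constant_row_bicluster(expression, [0, 1], [0, 1, 2], 0.1))
print(is_constant_row_bicluster(expression, [0, 1, 2], [0, 1, 2], 0.1))
```

Each row may sit at a different level (about 2 for the first gene, about 5 for the second); only the variation within a row matters for a constant-row bicluster.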

SMP: SupMaxPair


SMP aims at the efficient discovery of discriminative patterns from biological data with high density and high dimensionality (e.g., gene expression data and SNP data). It especially targets patterns with relatively low support but high discriminative power (as measured by, e.g., odds ratio, information gain, or p-value), and thereby complements existing discriminative pattern mining algorithms.
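The idea of a pattern with low support but high discriminative power can be illustrated with the odds ratio, one of the measures named above. The sketch below is a generic odds-ratio calculation on a 2x2 table with made-up counts; the function name and the zero-cell correction are choices made for this example and say nothing about how the tool itself scores or searches for patterns.

```python
def odds_ratio(cases_with: int, cases_total: int,
               controls_with: int, controls_total: int) -> float:
    """Odds ratio of a pattern between cases and controls.

    If any cell of the 2x2 table is zero, 0.5 is added to every cell
    (the Haldane-Anscombe correction) to avoid division by zero.
    """
    a = cases_with                      # cases containing the pattern
    b = cases_total - cases_with        # cases without it
    c = controls_with                   # controls containing the pattern
    d = controls_total - controls_with  # controls without it
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    return (a * d) / (b * c)


# Made-up counts: the pattern occurs in 10 of 100 cases and 1 of 100 controls.
# Its overall support is only 11 of 200 samples, yet it separates the classes.
print(odds_ratio(10, 100, 1, 100))
```

A pattern like this one would be missed by a miner that prunes everything below, say, 10% overall support, which is the gap low-support discriminative pattern mining is meant to fill.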

ETI: Error-tolerant itemsets


Traditional association mining algorithms use a strict definition of support that requires every item in a frequent itemset to occur in each supporting transaction. In real-life data sets, this limits the recovery of frequent patterns, because random noise and other errors in the data fragment them. We implemented a suite of algorithms to discover approximate frequent itemsets in the presence of noise.
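The difference between strict and error-tolerant support can be sketched as follows. The relaxed definition used here, where a transaction still counts if it is missing at most a given fraction of the itemset's items, is one simple possibility assumed for illustration; the function names are hypothetical and the algorithms in the suite may use other definitions.

```python
from typing import Iterable, List, Set


def strict_support(transactions: List[Set[str]], itemset: Iterable[str]) -> int:
    """Number of transactions that contain every item in the itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= t)


def tolerant_support(transactions: List[Set[str]], itemset: Iterable[str],
                     max_missing_fraction: float) -> int:
    """Number of transactions missing at most a fraction of the items.

    Assumed relaxation: the allowed number of missing items is
    floor(max_missing_fraction * len(itemset)).
    """
    items = set(itemset)
    allowed_missing = int(max_missing_fraction * len(items))
    return sum(1 for t in transactions if len(items - t) <= allowed_missing)


# Toy data: noise has removed one item from three of the four "true" supporters.
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"b", "e"},
]
itemset = {"a", "b", "c", "d"}

print(strict_support(transactions, itemset))          # only the complete transaction
print(tolerant_support(transactions, itemset, 0.25))  # also counts near-complete ones
```

Under the strict definition the itemset looks rare because noise has fragmented it; allowing one missing item out of four recovers the transactions that almost contain it.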