Project Summary
The use of multidisciplinary, scientific evidence-based practice (EBP) guidelines during hospitalization can help low-income and minority populations regain and maintain health, thus reducing rehospitalization. However, EBP guidelines may not be equally effective across all populations. The national mandate for all health professionals to implement interoperable electronic health records (EHRs) by 2015 provides an opportunity to reuse EHR data to address new research questions that explore patterns of patient characteristics and resources, EBP interventions (actions of health professionals in treating the patient), and improvement in health, e.g., the effectiveness of multidisciplinary EBP during hospitalization and follow-up. This exploratory project focuses on groups of patients who (1) share a particular condition, e.g., severe sepsis and septic shock, or diabetes or diabetic complications, (2) were hospitalized for this condition or related complications, (3) were treated as outpatients in a clinic during a succeeding period of time, and (4) have an identifiable outcome, e.g., rehospitalization, emergency room (ER) visits, death related to the condition, or condition under control without rehospitalization. These patients will be analyzed to understand the differences between patients with the same condition but different outcomes, with the goals of (1) evaluating whether EBP guidelines made a difference and (2) discovering interventions that lead to improved outcomes and may need to be added to EBP guidelines. Achieving these goals requires the development of new analysis techniques for deriving insights into health outcomes from EHR data.
The algorithms and approaches developed in this project will advance health informatics by enabling researchers to extract, from the relatively raw and unorganized mass of data in an EHR, a higher-level view of the evolution of the patient's health and treatment over time, and to use that information to analyze the differences between patients with favorable and unfavorable health outcomes. More specifically, new techniques and tools will be developed to (1) create patient and intervention profiles that summarize important characteristics of patients, their environment, and their treatment, (2) find groups (clusters) and patterns in these profiles, and (3) use the profiles, clusters, and patterns to analyze the differences in outcomes between patients with a common health condition.
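As a rough illustration of the profile, group, and compare steps listed above, the following minimal Python sketch collapses raw event rows into one profile per patient and then compares outcome rates across groups. It is not the project's method: the row layout `(patient_id, kind, value)`, the function names, and the intervention and outcome labels are all hypothetical, and grouping patients by identical intervention sets is only a crude stand-in for the clustering techniques the project proposes to develop.

```python
from collections import defaultdict


def build_profiles(events):
    """Collapse raw (patient_id, kind, value) rows into one profile per patient.

    The row layout is a hypothetical simplification of EHR data.
    """
    profiles = defaultdict(lambda: {"interventions": set(), "outcome": None})
    for patient_id, kind, value in events:
        if kind == "intervention":
            profiles[patient_id]["interventions"].add(value)
        elif kind == "outcome":
            profiles[patient_id]["outcome"] = value
    return dict(profiles)


def outcome_rate_by_group(profiles, bad_outcome):
    """Group patients by intervention set and return, for each group,
    the fraction of patients whose outcome equals `bad_outcome`."""
    groups = defaultdict(list)
    for profile in profiles.values():
        key = frozenset(profile["interventions"])
        groups[key].append(profile["outcome"] == bad_outcome)
    return {key: sum(flags) / len(flags) for key, flags in groups.items()}


# Hypothetical example rows; labels are placeholders, not clinical data.
events = [
    ("p1", "intervention", "intervention_a"),
    ("p1", "outcome", "controlled"),
    ("p2", "intervention", "intervention_a"),
    ("p2", "outcome", "rehospitalized"),
    ("p3", "intervention", "intervention_a"),
    ("p3", "intervention", "intervention_b"),
    ("p3", "outcome", "controlled"),
]
profiles = build_profiles(events)
rates = outcome_rate_by_group(profiles, bad_outcome="rehospitalized")
```

A difference in rates between groups in such a sketch would only suggest where to look; it would not by itself show that an intervention caused the outcome.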
Achieving these goals poses significant challenges. For instance, EHR data in its original form is, for research and analysis purposes, mostly in a relatively unorganized and low-level format. Flowsheets, for example, which contain primarily nursing documentation, have numerous rows of data representing patient assessments and results as well as laboratory and other diagnostic tests. This necessitates the extraction and summarization of the information relevant to the task. Because time plays such an important role in this data, extracting useful features from the data across time is critical. However, the time series involved are often irregular. More generally, not all patients have the same set of information, and information is not available at regular intervals. Furthermore, data may need to be viewed at multiple temporal resolutions, e.g., a sudden increase in blood pressure versus a gradual but noisy increase over several years. Additional complexities arise from population substructure, differences in the types of features, the need to incorporate knowledge of prior dependencies among features, and incomplete and missing data. This project will address these challenges. Success in these efforts will advance data mining in the areas of classification, clustering, and pattern mining, as well as various types of temporal data analysis, including trend, change point, and anomaly detection.
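To make the point about irregular sampling and multiple temporal resolutions concrete, here is a minimal Python sketch, under stated assumptions, of two features computed from one irregularly sampled series: a long-term least-squares trend and the largest short-term increase. The function names, the two-day window, and the example readings are hypothetical, and these simple summaries are placeholders for the trend and change point techniques the project proposes, not those techniques themselves.

```python
from datetime import datetime, timedelta


def trend_slope(obs):
    """Least-squares slope (value units per day) of a time-sorted list of
    (timestamp, value) pairs. Works with unevenly spaced timestamps."""
    t0 = obs[0][0]
    xs = [(t - t0).total_seconds() / 86400.0 for t, _ in obs]
    ys = [v for _, v in obs]
    n = len(obs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0  # a single time point carries no trend information
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def max_jump(obs, window):
    """Largest increase between any two observations that are at most
    `window` apart. Assumes `obs` is sorted by timestamp."""
    best = 0.0
    for i, (t_i, v_i) in enumerate(obs):
        for t_j, v_j in obs[i + 1:]:
            if t_j - t_i > window:
                break  # later observations are even further away
            best = max(best, v_j - v_i)
    return best


# Hypothetical, irregularly spaced readings (day offset, value).
start = datetime(2010, 1, 1)
readings = [(start + timedelta(days=d), v)
            for d, v in [(0, 120.0), (10, 122.0), (11, 150.0), (400, 135.0)]]

long_term = trend_slope(readings)                    # coarse resolution
short_term = max_jump(readings, timedelta(days=2))   # fine resolution
```

Computing both features from the same series lets a sudden rise (a large `short_term` value) be distinguished from a slow drift (a nonzero `long_term` slope with small jumps), without first resampling the data onto a regular grid.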
The novel pattern mining approaches proposed for this project will help generate the insights biomedical researchers need to make progress in understanding a number of serious health problems and avoiding poor outcomes. Such progress is likely to advance personalized health care and thus has the potential to improve human health and reduce health care costs. Beyond health applications, this work has broad and immediate applications to any complex system for which creating a comprehensive predictive model for complex entities is often unrealistic, at least in the near future. For such systems, the best that can be hoped for is to identify specific patterns that provide insight into the current or future state of an entity or system with respect to specific conditions of interest. Examples include transportation and energy systems, business and government organizations, ecosystems, sophisticated machinery, and computer and network systems. The creation of the proposed frameworks and algorithms will also directly train a number of graduate and undergraduate students in data mining and its use in analyzing health data. The results of this project will be presented at conferences and published in journals in computer science, as well as in those of domains related to the target applications.