Research Projects

EHR-based Pharmacogenomics
This line of research develops informatics approaches to extract phenotypic data (drug exposure and drug response) from EHRs for pharmacogenomics research. It involves natural language processing, machine learning, and data mining technologies. Currently we are working on extracting medication information from clinical notes and modeling patients' drug exposure status based on longitudinal EHR data. We are collaborating with clinical teams to investigate the pharmacogenomics of multiple drugs, including warfarin, irinotecan, and tacrolimus. This work is funded by PGRN and VESPA grants.
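As a rough illustration of what extracting medication information from a clinical note can look like, here is a minimal rule-based sketch. It is not the project's actual system: the drug list, the unit and frequency vocabularies, and the names `Medication` and `extract_medications` are all assumptions made for this example, and a real pipeline would rely on much larger lexicons and statistical models.

```python
import re
from typing import List, NamedTuple, Optional


class Medication(NamedTuple):
    """One medication mention (hypothetical structure for illustration)."""
    drug: str
    dose: Optional[float]
    unit: Optional[str]
    frequency: Optional[str]


# Assumed mini-lexicon: only the three drugs named in the project description.
DRUGS = ("warfarin", "irinotecan", "tacrolimus")

# Drug name, optionally followed by "<number> <unit>" and a frequency term.
# The unit and frequency lists are illustrative, not exhaustive.
PATTERN = re.compile(
    r"\b(?P<drug>" + "|".join(DRUGS) + r")\b"
    r"(?:\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g)\b)?"
    r"(?:\s+(?P<freq>daily|bid|tid|qhs|weekly)\b)?",
    re.IGNORECASE,
)


def extract_medications(note: str) -> List[Medication]:
    """Return every drug mention in the note with any dose and frequency found."""
    results = []
    for match in PATTERN.finditer(note):
        dose = float(match.group("dose")) if match.group("dose") else None
        unit = match.group("unit").lower() if match.group("unit") else None
        freq = match.group("freq").lower() if match.group("freq") else None
        results.append(Medication(match.group("drug").lower(), dose, unit, freq))
    return results


if __name__ == "__main__":
    note = "Patient started on Warfarin 5 mg daily. Continue tacrolimus 0.5mg bid."
    for med in extract_medications(note):
        print(med)
```

Structured records like these, collected across a patient's notes over time, are the kind of input that a longitudinal drug exposure model could then consume.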

Cancer Epidemiologic Studies
The specific aim of this funded study is to develop an automated informatics approach that extracts both fine-grained cancer findings and general clinical information from electronic medical records, and to use the extracted data to conduct cancer-related epidemiological studies.

Recognition and Disambiguation of Clinical Abbreviations
This funded project develops a framework that can 1) recognize abbreviations in clinical text; 2) build sense inventories of clinical abbreviations; 3) disambiguate abbreviations based on context; and 4) encode abbreviations in real time to remove ambiguity at the time of entry.
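To make the sense inventory and context-based disambiguation steps concrete, here is a minimal sketch that picks the sense whose cue words overlap most with the surrounding sentence. It is an assumption-laden toy, not the project's method: the inventory `SENSE_INVENTORY`, its cue words, and the function `disambiguate` are hypothetical, and a real system would learn sense profiles from annotated or automatically labeled clinical text.

```python
import re
from typing import Dict, List, Optional, Set

# Hypothetical sense inventory: abbreviation -> sense -> context cue words.
# The cue words are hand-picked for illustration only.
SENSE_INVENTORY: Dict[str, Dict[str, Set[str]]] = {
    "ra": {
        "rheumatoid arthritis": {"joint", "joints", "methotrexate", "swelling", "stiffness"},
        "right atrium": {"echo", "echocardiogram", "ventricle", "dilated", "cardiac"},
        "room air": {"oxygen", "saturation", "sat", "breathing", "o2"},
    },
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def disambiguate(abbreviation: str, sentence: str) -> Optional[str]:
    """Return the sense with the largest cue-word overlap, or None if undecided."""
    senses = SENSE_INVENTORY.get(abbreviation.lower())
    if not senses:
        return None  # abbreviation is not in the inventory
    context = set(tokenize(sentence))
    best = max(senses, key=lambda sense: len(senses[sense] & context))
    if not senses[best] & context:
        return None  # no cue word matched, so the context gives no evidence
    return best


if __name__ == "__main__":
    print(disambiguate("RA", "Oxygen saturation 97% on RA."))
    print(disambiguate("RA", "RA with joint swelling, on methotrexate."))
```

Returning `None` when no cue word matches leaves the abbreviation unresolved instead of guessing, which seems the safer default when the result is written back into a clinical record.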

Basic Methods of NLP and Text Mining
We are interested in developing new algorithms and systems in the following NLP and text mining areas: grammar induction from clinical text, statistical parsing, and topic modeling using Latent Dirichlet Allocation.

Literature Mining of Nutrition Studies
Nutrition plays an important role in disease prevention and treatment. This project extracts gene/nutrition/disease knowledge from PubMed articles to facilitate personalized nutrition.
