Research

My primary interests are in the application of computing technologies to biomedical problems, particularly in creating structured, computer-accessible representations of biomedical and radiological knowledge that enable humans and machines to manage and analyze massive data sets. I am especially interested in computational methods for integrating diverse clinical and imaging data, data mining, information extraction from medical reports, and machine learning methods that improve physician diagnostic accuracy.

Large biomedical databases are helping catalyze research in the basic sciences. Likewise, massive clinical data sets in hospital repositories are potentially useful in research, teaching, and process improvement. Making the best use of these clinical data requires a combination of appropriate knowledge representation and effective techniques for data mining and analysis. To access, manage, and analyze the massive amounts of data in these resources effectively, structured representations of knowledge are needed for annotation and information summarization.

I am interested in developing the informatics methods and infrastructure that make it possible to discover new information from biomedical data. My current efforts are in four areas. First, I am interested in techniques for knowledge representation and knowledge discovery from biomedical databases. I am working with a team of researchers to create the National Center for Biomedical Ontology, which aims to create tools that enable researchers to access and interpret biomedical data using ontologies. I am a participant on the subcommittee for the RadLex project, whose aim is to create a lexicon that provides a uniform structure for capturing, indexing, and retrieving a variety of radiology information sources. I am also participating in the Cancer Biomedical Informatics Grid (caBIG) project of the National Cancer Institute, which aims to identify and develop shared, open standards, tools, and data sources for exchanging biomedical data throughout the cancer research community. In addition, I am working on a project at Stanford called Radbank, which aims to link all radiology and pathology reports in the medical enterprise. I am interested in developing techniques for feature extraction and data mining in this large database, and methods that could uncover the significance of particular combinations of findings related to disease. As part of this work, I am also interested in novel user interfaces to biomedical data, computational architectures for managing the diverse data and knowledge in radiology, and linking this resource with external databases.

Second, I am working on a project to link ontologies with geometrical models of anatomy. The goal is to integrate anatomic knowledge with geometrical models to predict the consequences of penetrating injury. Geometrical models generally contain no intrinsic information that a computer can use for reasoning tasks, such as inferring consequences of injury. By integrating knowledge into geometrical models, we can create representations that are computable and can be used in intelligent applications.

Third, I am interested in developing statistical natural language processing (NLP) methods to extract and summarize information in radiology reports and published articles. Such methods can be useful for healthcare process improvement and quality assessment. I have been applying them to discover articles in the literature that contain pharmacogenetics data (to streamline the process of curating the literature in a pharmacogenetics database), and to analyze a database of ultrasound reports, categorizing each as a positive deep vein thrombosis (DVT) report, a negative DVT report, or a non-DVT study in order to measure the utilization rate of DVT ultrasound.

Finally, I am interested in using probabilistic modeling methods, specifically Bayesian networks, to model the uncertainties that relate radiology findings to diagnoses and to build diagnostic decision support tools that improve radiologist diagnostic accuracy. In mammography, there is a spectrum of diagnostic accuracy among practitioners that is attributed to differences in experience. We believe that greater consistency among radiologists may be attained by using a decision support tool (in addition to education). I have implemented a Bayesian network that relates BI-RADS descriptors of findings seen on mammography to breast disease diagnoses. Preliminary results suggest that our Bayesian network accurately estimates the probability that imaging findings represent various pathophysiologic conditions. We have also demonstrated that it can help radiologists identify breast biopsies that are discordant with mammography and discover cases where biopsy sampling error may have occurred.
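
As a rough illustration of the kind of inference such a tool performs, the Python sketch below computes the probability of each diagnosis given a set of observed descriptors. It is not the network described above: it assumes the simplest possible structure (a single diagnosis node with conditionally independent descriptor nodes), and the descriptor names, the two diagnosis categories, the function name posterior, and every probability are hypothetical values chosen only for the example.

```python
# Toy Bayesian network: one diagnosis node with descriptor nodes as children.
# ASSUMPTION: all names and probabilities below are hypothetical illustrations,
# not values from any study or from the network described in the text.

# Prior probability of each diagnosis.
PRIOR = {"benign": 0.9, "malignant": 0.1}

# Conditional probability tables: P(descriptor value | diagnosis).
CPT = {
    "mass_margin": {
        "benign": {"circumscribed": 0.8, "spiculated": 0.2},
        "malignant": {"circumscribed": 0.2, "spiculated": 0.8},
    },
    "calcification": {
        "benign": {"absent": 0.7, "present": 0.3},
        "malignant": {"absent": 0.4, "present": 0.6},
    },
}


def posterior(findings):
    """Return P(diagnosis | findings) by enumerating over diagnoses.

    `findings` maps a descriptor name to its observed value. Descriptors
    that are not observed are simply left out of the product.
    """
    unnormalized = {}
    for diagnosis, prior in PRIOR.items():
        p = prior
        for descriptor, value in findings.items():
            p *= CPT[descriptor][diagnosis][value]
        unnormalized[diagnosis] = p
    total = sum(unnormalized.values())
    return {dx: p / total for dx, p in unnormalized.items()}


if __name__ == "__main__":
    observed = {"mass_margin": "spiculated", "calcification": "present"}
    for diagnosis, prob in posterior(observed).items():
        print(f"{diagnosis}: {prob:.3f}")
```

A real decision support network would have many more descriptors and diagnoses, dependencies among descriptors, and probabilities estimated from data or expert knowledge; the sketch only shows how priors and conditional probabilities combine into a posterior estimate.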