The Data Coordinating Center
The Data Coordinating Center (DCC) provides services to members of the
School of Medicine.
These services focus on the data management needs of
ongoing and new research projects. We specialize in
the planning, development, management, and operation of systems that
ensure that these projects achieve their goals in a
technologically modern environment.
About Us
We are a service center of the School of Medicine that is based in its
Department of Health Research and Policy. The DCC was launched formally
in March 2002 to fulfill the needs of research in the Stanford
University School of Medicine.
We are best reached via email. Please send email to
hrpdcc@lists.stanford.edu with a
brief description of your project and we will get back to you.
Organization
The DCC is headed by
Balasubramanian Narasimhan and
overseen by a senior advisory board consisting of
- Mark A. Hlatky,
Professor, Department of Health Research and Policy, Professor of Cardiovascular
Medicine
- Phil Lavori, Professor
(Research) of Health Research and Policy
- Ronald Levy, Professor of Medicine, Chief, Division of Oncology
- Richard A. Olshen, Professor
and Chief, Division of Biostatistics, Department of Health Research and Policy
- Thomas Quertermous,
William G. Irwin Professor in Cardiovascular Medicine
- Robert J. Tibshirani, Professor
and Chair, Department of Health Research and Policy and Professor of Statistics
by courtesy
- Alice Whittemore,
Professor and Associate Chair, Department of Health Research and Policy
Professor Olshen chairs the board.
People
The current staff consists of
Services
The DCC provides several services to School of Medicine research activities,
contingent upon the availability of sufficient resources on both DCC and project
ends. These services can be divided broadly into several categories.
Planning
DCC personnel work with investigators to plan for their needs as they pertain to
data. The ideal engagement occurs at the time a project is being conceived or
a grant application is being written, rather than later, although we realize
this may not be possible for some projects. The planning phase involves
- Determining the duration and requirements of the project
- Determining how the investigators would interact with project data. A
guiding principle is that data that are stored are meant to be retrieved
conveniently!
- Estimating the resources needed for a project in terms of hardware,
software and personnel
- Charting a timeline for execution of the deliverables with specified
milestones
- Reviewing the involvement of the DCC after the project funding ceases
Infrastructure
The DCC infrastructure enables us to provide the
following technological capabilities to clients.
- Modern database management with entry and access over the Web when
necessary with rigorous attention to the quality of the data
- Establishment and maintenance of databases that can be scaled to increase
in size seamlessly as they evolve in terms of hardware, algorithms, and needs
of particular projects
- Development of tools for rapid prototyping of data forms and relationships
among sources of data, including porting data from laboratory instrumentation
to integrated central databases
- Provision of access to various secondary databases, including extensive
links to publicly available databases
Security
The DCC provides investigators with a secure environment to store
their data. Attention is paid to both physical security (locks and keys) and
to the security of data (on site and off-site backups, encryption, authentication,
role-based access). The DCC uses Secure Sockets Layer (SSL) connections when
providing researchers with role-based access to the data over the Web. Security
procedures are continuously monitored and upgraded as necessary. The DCC works
with the Stanford University Privacy and Data Security Officer to comply with
all HIPAA regulations as they
pertain to a project.
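A minimal sketch of what a role-based access check might look like follows. The role and permission names (DATA_ENTRY_CLERK, ANALYST, INVESTIGATOR, READ_SUMMARY, ENTER_DATA, EXPORT_DATA) and the class name are hypothetical and chosen only for illustration; they do not describe the DCC's actual configuration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: all roles, permissions, and grants here are assumptions.
class RoleBasedAccessSketch {
    enum Permission { READ_SUMMARY, ENTER_DATA, EXPORT_DATA }
    enum Role { DATA_ENTRY_CLERK, ANALYST, INVESTIGATOR }

    // Each role maps to the set of permissions it has been granted.
    private static final Map<Role, Set<Permission>> GRANTS = new EnumMap<>(Role.class);
    static {
        GRANTS.put(Role.DATA_ENTRY_CLERK, EnumSet.of(Permission.ENTER_DATA));
        GRANTS.put(Role.ANALYST, EnumSet.of(Permission.READ_SUMMARY));
        GRANTS.put(Role.INVESTIGATOR, EnumSet.allOf(Permission.class));
    }

    // Deny by default: a role with no entry in GRANTS is allowed nothing.
    static boolean isAllowed(Role role, Permission permission) {
        Set<Permission> granted = GRANTS.get(role);
        return granted != null && granted.contains(permission);
    }

    public static void main(String[] args) {
        System.out.println("Analyst may export: "
                + isAllowed(Role.ANALYST, Permission.EXPORT_DATA));
        System.out.println("Investigator may export: "
                + isAllowed(Role.INVESTIGATOR, Permission.EXPORT_DATA));
    }
}
```

The deny-by-default check means that a newly added role has no access until permissions are granted to it explicitly.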
Science and Education
The DCC has close connections with the Division of Biostatistics,
the Department of Statistics and the
Department of Genetics. The
DCC brings developments in computer-intensive statistical inference to bear upon
our data. Our collaborations have resulted in production of widely used tools
such as enhancements to CARTR (Olshen et al.), and original development of
SAM (Tibshirani, Narasimhan,
et al.) and PAM (Hastie,
Tibshirani, Narasimhan, et al.).
DCC personnel are also involved in education of colleagues and investigators.
Our staff have given seminars and classroom lectures on data management and
security as opportunities permit.
Current Projects
The DCC is involved in the following projects.
- SAPPHIRe
- The Stanford Asian Pacific Program in Hypertension and Insulin Resistance
(SAPPHIRe) is part of the Family Blood Pressure Program (FBPP) network and
is funded by NHLBI. The first phase of this project was a collaboration among
Stanford University, Hawaii and Taiwan. In 2000, this project entered a second
follow-up phase. Dr. Thomas Quertermous,
William G. Irwin Professor in Cardiovascular Medicine, is the principal investigator.
The DCC is involved in data entry, management and reporting for this project.
The initial application used a Sybase database with a Perl/CGI interface.
The application was completely rewritten and ported to a modern Java/Oracle
interface and in the process a number of enhancements and new features were
added to this project.
- PIMA
- This is an NIH-funded clinical trial headed by Dr. Bryan Myers, Professor
and Chief, Division of Nephrology, Stanford University School of Medicine,
in collaboration with Robert G. Nelson of NIH. In this study of the Pima
Indian population in Arizona, individuals are enrolled in a randomized, controlled
trial of losartan plus standard care versus standard care over several years.
The DCC is involved in building data entry and reporting systems for the entire
project.
- NOPain
- This study deals with the manipulation of the nitric oxide synthase pathway
in arterial disease using L-arginine. It consists of two parts, one a dose-ranging
study and another a randomized controlled clinical trial. The DCC is involved
in designing the data entry systems and reporting systems for the entire project.
Dr. John Cooke, Professor of Medicine and Director, Section of Vascular Medicine,
Stanford University School of Medicine, is the principal investigator.
- Genetic Determinants of PAD
- This is a large study of the genetic determinants that increase the propensity
of an individual to develop hemodynamically significant atherosclerosis in
the arteries of a lower extremity. Through these efforts investigators will
also examine the interactions of genetic determinants with known risk factors
for atherosclerosis. The principal investigator is Dr. John P. Cooke, with co-PI
Dr. Thomas Quertermous. This project therefore dovetails well with the SAPPHIRe
and NOPain projects and with the Reynolds Center in that DCC technologies
brought to bear upon the earlier projects will enable our work here. Expertise
at finding SNPs, as in SAPPHIRe and the Reynolds Center, will figure here,
and so, too, will microarray analysis. This project will be somewhat different
from the others in that genotyping will be done in the Cardiovascular Research
Center on the Stanford Campus proper. Our approach via the Web will once again
prove important.
- CHIPCSD
- The Children's Health Initiative (CHI) funded a project for creating a pediatric
cardiac surgery database. The goal is to build a database that is geared both
to research and to patient care. Our main contact is
Dr. Daniel Bernstein, Professor and Chief,
Division of Cardiology in the Department
of Pediatrics, Stanford University School of Medicine. The project is currently
under development and is expected to go live sometime in June 2003.
- Hypoxic Cytotoxins
- This project consists of four sub-projects each dealing with a different
aspect of cytotoxic drug treatment for cancer. Project 1 seeks to design,
synthesize and further develop several series of small-molecule drugs for
each of the other projects. Project 2 will develop new prodrugs that become
activated to cytotoxic anticancer drugs by the nonpathogenic obligate anaerobe
C. sporogenes, genetically engineered to express the prodrug-activating enzymes.
Project 3 aims to develop an improved analog of the hypoxia-selective cytotoxin
tirapazamine (TPZ), and Project 4 hopes to find drugs that are preferentially
toxic to cells expressing the hypoxia-inducible transcription factor HIF-1α.
This is an effort led by Dr. Martin Brown, Professor
of Radiation Oncology, in collaboration with researchers in New Zealand.
The project is currently under development and expected to go live some time
in June 2003.
- The Reynolds Center at Stanford
- The aim of the Donald
W. Reynolds Cardiovascular Clinical Research Center at Stanford University
is to provide better care for patients with heart disease through the application
of modern genetic approaches.
Dr. Mark Hlatky
is Director of the Center, which has a strong collaboration with Kaiser Research in Oakland. Projects seek
to utilize the techniques of modern molecular biology to identify genes for
which abnormalities predispose to heart disease in a specific way. These genes
will then be examined for unique mutations that can serve as markers to track
disease in larger populations. The project is large in scope and consists of
several subprojects.
The DCC is involved in many activities of the Reynolds Center. We have built
systems for recruitment, scheduling clinic visits, generating reports and result
letters, clinical visit data collection, barcode generation, and sample tracking.
As the analysis phase of the project ramps up, the DCC is the place where the
final summary data will reside. Systems are under development to tailor reports
to authorized users of the data for scientific analysis.
- Prospective Randomized Study of Elective Colon and Rectal
Surgery, With and Without Mechanical Bowel Preparation
- This study is undertaken with the leadership of Drs. Mark Welton
and Andrew Shelton of the Department of Surgery. The goal is to
compare rates of infectious complications and rates at which bowel
re-attachments separate in elective colon and rectal surgery, with
and without mechanical cleansing (purging) of the bowel. Again,
we in the DCC work with the investigators to design forms and to
enable entering data over the Web, as well as to
archive the data successfully for future purposes.
- Dr. Ronald Levy's Lymphoma Program
- The Levy Lab, under the direction of Dr. Ronald Levy, has been studying the treatment
of non-Hodgkin's lymphomas, with the aims of improving therapy for these cancers, understanding their
pathogenesis, and studying normal lymphocyte biology. Research in monoclonal antibody
therapies and tumor vaccines is ongoing. The DCC will work with investigators
to design forms and to enable entering data over the Web, as well as to
archive the data successfully for future purposes.
- TA: Viral and Host Mechanisms
- Pathophysiology of transplant coronary vasculopathy focusing on the role of diabetes and CMV infection. Noninvasive diagnosis of cardiac allograft rejection; pathobiology of graft rejection.
Technologies
A founding principle of the DCC was that it would bring developments
in Free Software,
Open Source Software and Web technologies
to bear on its activities. The following are some of the software and tools
used at the DCC.
- GNU/Linux
- The DCC servers are all GNU/Linux-based systems. GNU/Linux systems provide
us with a solid, secure and stable environment at a fraction of the cost
of other platforms. In particular, we use a
hardened version of Linux developed here
at Stanford University.
- Oracle
- We use Oracle as our core database software. Oracle is the premier database
program and has solid support on the Linux platform.
- Java
- As most of the services provided by the DCC are Web-based, we make extensive
use of Java technology from Sun Microsystems.
- Apache
- We run the excellent Apache as our web server with SSL enabled.
- Jakarta Project tools
- The Jakarta project is the source for most of our development tools. At
the DCC, we use
- Ant, the Java-based
tool for building web applications
- Tomcat,
the well-known servlet container
- XML, the eXtensible Markup Language, with tools such as the Xerces parser and the Xalan style sheet processor. In particular, we also use SVG, Batik, and FOP.
- ECS,
the Element Construction Set for generating dynamic Web pages.
- Log4j,
a logging library for Java
- POI, a Java library
for generating OLE documents on the fly
- REGEXP,
a regular expression library for dynamically validating form fields
and building indigenous tools. We also use the
GNU Regexp library.
- Taglibs,
a useful library of custom tags for use with JavaServer Pages
- Struts,
a model-view-controller framework for constructing web applications
with servlets and JavaServer Pages
- GNU
- We use a number of Free Software Foundation tools such as Autoconf and Emacs (with JDE).
- R
- For statistical analysis, we use R, a modern statistical environment for
data analysis. Using remote connection packages, we use R at the backend
for generating statistical analysis and plots.
- Tigris
- We use design tools such as
ArgoUML, a UML design tool with cognitive
support.
- DataVision
- We use DataVision for generating reports.
We are also evaluating and testing a number of other software packages, such as JBoss and Eclipse, as well as commercial tools that
may not have exact open source equivalents.
Of course, there are situations in which no existing software fits the need.
In such cases the DCC has developed indigenous tools to fill the gap.
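As an illustration of the regular-expression validation of form fields mentioned in the list above, here is a minimal sketch. It uses the standard java.util.regex package rather than the Jakarta or GNU Regexp libraries, and the field formats (a subject ID such as SF-0042 and a visit date written as YYYY-MM-DD) are assumptions made for the example, not the formats used in DCC applications.

```java
import java.util.regex.Pattern;

// Illustrative only: the field names and formats below are hypothetical.
class FormFieldValidatorSketch {
    // Two uppercase letters, a hyphen, then exactly four digits.
    private static final Pattern SUBJECT_ID = Pattern.compile("[A-Z]{2}-\\d{4}");

    // Checks the shape of a YYYY-MM-DD date only; it does not reject
    // impossible calendar dates such as February 31.
    private static final Pattern VISIT_DATE =
            Pattern.compile("\\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])");

    static boolean isValidSubjectId(String value) {
        return value != null && SUBJECT_ID.matcher(value.trim()).matches();
    }

    static boolean isValidVisitDate(String value) {
        return value != null && VISIT_DATE.matcher(value.trim()).matches();
    }

    public static void main(String[] args) {
        System.out.println("SF-0042 valid: " + isValidSubjectId("SF-0042"));
        System.out.println("2003-13-01 valid: " + isValidVisitDate("2003-13-01"));
    }
}
```

Compiling each pattern once as a static constant avoids recompiling it for every submitted form.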
Last modified: August 1, 2004