Welcome to the Wong Lab

A) OVERVIEW
Our group develops methods and software for analyzing data from high-throughput genomics projects. Particular interests include the analysis of gene expression profiles and cis-regulatory sequences. We work closely with collaborating laboratories on investigations in cancer and developmental biology. We develop and enhance tools in exploratory data analysis, multivariate analysis, information theory, machine learning, Monte Carlo, graph theory, and linear and nonlinear differential equations, and apply them to problems in computational and systems biology.

B) METHODOLOGICAL RESEARCH

1) Microarray analysis
The current generation of oligonucleotide expression arrays for human or mouse contains enough probe sets to measure the expression levels of almost all mammalian genes. Since the probe sequences are now available from NetAffx, we have the opportunity to identify, for every probe on the array, all transcripts that have the potential to hybridize to the probe. With this information, we are developing deconvolution algorithms that aim to correct for cross-hybridizing signal contributed by non-target transcripts, thereby allowing more accurate estimates of the levels of low-concentration transcripts.
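The deconvolution idea can be sketched with a small linear model. This is an illustration under stated assumptions, not the lab's algorithm: the affinity matrix, the concentrations, and the function name `deconvolve` are all hypothetical, the data are noise-free, and plain least squares clipped at zero stands in for whatever estimator is actually used.

```python
import numpy as np

# Hypothetical affinity matrix: rows are probes, columns are transcripts.
# Entry [i, j] is the assumed signal produced on probe i per unit
# concentration of transcript j (off-target entries model cross-hybridization).
affinity = np.array([
    [1.0, 0.3, 0.0],
    [0.9, 0.0, 0.1],
    [0.0, 1.0, 0.2],
    [0.1, 0.8, 0.0],
    [0.0, 0.2, 1.0],
    [0.2, 0.0, 0.9],
])

# Made-up true concentrations; the second transcript is the low-abundance one.
true_conc = np.array([5.0, 0.5, 2.0])

# Simulated noise-free probe intensities under the linear mixing assumption.
observed = affinity @ true_conc


def deconvolve(affinity, observed):
    """Estimate transcript concentrations by least squares, clipped at zero."""
    estimate, _residuals, _rank, _singular_values = np.linalg.lstsq(
        affinity, observed, rcond=None
    )
    return np.clip(estimate, 0.0, None)


estimated = deconvolve(affinity, observed)

# Naive reading: take the intensity of one "target" probe per transcript
# and ignore the contribution of non-target transcripts.
naive = observed[[0, 2, 4]]

print("true     :", true_conc)
print("naive    :", naive)
print("estimated:", estimated)
```

In this toy setting the naive reading of the low-abundance transcript is inflated by signal from the other two, while the joint fit separates the contributions. With real, noisy data a non-negative or regularized solver would likely be a better choice than clipping.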
We are also developing methods for handling non-expression arrays. For example, we work with the Murray lab to analyze tag-probe arrays from yeast competition experiments. We have an ongoing collaboration with Affymetrix to develop methods for analyzing data from their 100 SNP arrays, both for linkage and association studies and for detecting chromosomal copy number aberrations in cancer cells.

2) Cis-regulatory analysis and comparative genomics
We are interested in understanding the co-expression patterns observed in microarray gene expression experiments. To this end we develop methods for identifying regulatory sequence elements in the upstream or intronic regions of co-expressed genes. We build probabilistic motifs for transcription factor binding sites and design statistical methods to assess the significance of recurring motifs and of combinatorial patterns of such motifs. The effective use of information from multiple genomes to support these analyses is a major research goal. We are working with several collaborators who are generating expression profiles and transcription factor binding location data to study mammalian gene regulation.
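A probabilistic motif is commonly represented as a position weight matrix. The sketch below shows one standard way to build and use such a matrix; it is not the lab's software. The binding sites, the test sequence, the uniform background, and the pseudocount are hypothetical choices made only for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log2-odds position weight matrix from aligned sites.

    Returns one dict per motif position, mapping each base to its score.
    """
    length = len(sites[0])
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(
                (column.count(base) + pseudocount) / total / background
            )
            for base in BASES
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one sequence window."""
    return sum(column[base] for column, base in zip(pwm, window))


def best_hit(pwm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    hits = [
        (score_window(pwm, sequence[start:start + width]), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(hits)


# Hypothetical aligned binding sites and a hypothetical upstream sequence.
sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
upstream = "CCGATTGACTCAGGCT"

pwm = build_pwm(sites)
score, start = best_hit(pwm, upstream)
print(f"best window starts at {start}: {upstream[start:start + len(pwm)]}")
```

Assessing whether a hit like this is significant requires a null model for the scores, for example by scoring shuffled or background sequences, which this sketch leaves out.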

3) Statistical learning and computation
In addition to employing existing statistical and computing methods in biological applications, we continue to develop new methods in these core methodological areas. For example, we have developed a nonlinear cost function for support-vector machine (SVM) type classifiers. Our method, called psi-learning, exploits recent advances in Difference-of-Convex optimization and has a provably better asymptotic rate than the standard SVM. In the unsupervised learning area, we have recently introduced Tight-Clustering, an algorithm that overcomes the combinatorial search complexity to deliver very tight and stable clusters of sizes 5-50. Our experience with gene expression studies suggests that these tight clusters are more suitable for biological interpretation than those from standard hierarchical and K-means clustering.
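To show what a Difference-of-Convex cost looks like, the sketch below compares the standard SVM hinge loss with a bounded ramp-shaped loss written as the difference of two convex hinge functions. The exact form of the psi function is not given in this text, so the particular loss here, and the function names, are assumptions chosen only to illustrate the decomposition.

```python
def hinge_loss(margin):
    """Standard SVM hinge loss: convex and unbounded as the margin decreases."""
    return max(0.0, 1.0 - margin)


def ramp_loss(margin):
    """Assumed ramp-shaped loss, written as a difference of two convex terms.

    convex_part  = 2 * max(0, 1 - margin)
    concave_part = 2 * max(0, -margin)   (subtracted)
    The result is 0 for margin >= 1, 2 for margin < 0, and linear in between.
    """
    convex_part = 2.0 * max(0.0, 1.0 - margin)
    subtracted_part = 2.0 * max(0.0, -margin)
    return convex_part - subtracted_part


# Compare the two losses on a few margins, including a badly misclassified point.
for margin in (2.0, 0.5, 0.0, -3.0):
    print(f"margin={margin:5.1f}  hinge={hinge_loss(margin):4.1f}  "
          f"ramp={ramp_loss(margin):4.1f}")
```

Because the ramp loss is capped, a badly misclassified point contributes no more than any other error, whereas its hinge loss keeps growing. The price is non-convexity, which is where Difference-of-Convex optimization comes in.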
One of our long-term objectives is to incorporate more biological knowledge into the computational analysis of genomics data. Despite heroic efforts by genome databases, most of this knowledge is still embedded as text in the primary literature and is therefore difficult to use in computational analysis. We are beginning to investigate approaches that allow more efficient use of this knowledge. For example, we have designed the knowledge-management software GeneNotes. This program creates a database to keep track of relations, sentences and paragraphs highlighted during online browsing of abstracts or primary literature. Using natural language processing techniques, the program can draw the researcher’s attention to key phrases and sentences, and can automatically detect binary relations. It can operate in a learning mode to improve its prediction accuracy. The software also provides tools for visualizing and interpreting the information captured by the database. It can interact with our expression analysis software to enable novel approaches to studying the biological background of co-expressed gene clusters.
Our group has a longstanding interest in Monte Carlo simulation and has contributed to the development of several useful algorithms in this area: data augmentation, sequential imputation, dynamic weighting and evolutionary MC. Recently we introduced a dual-space approach that uses the energy-temperature duality to design a sampler capable of overcoming steep energy barriers while simultaneously providing estimates of Boltzmann averages and the density of states. We are applying this “equi-energy sampling” approach to the study of protein folding using simplified energy functions.
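The link between the density of states and Boltzmann averages can be written out with the standard statistical-mechanics identities below. The notation is an assumption, not taken from this page: $\Omega(E)$ is the density of states, $T$ the temperature in units where $k_B = 1$, $Z(T)$ the partition function, and $u(E)$ any quantity that depends on the state only through its energy.

```latex
\begin{align}
  Z(T) &= \sum_{E} \Omega(E)\, e^{-E/T}, \\
  \langle u \rangle_T &= \frac{1}{Z(T)} \sum_{E} u(E)\, \Omega(E)\, e^{-E/T}.
\end{align}
```

Once $\Omega(E)$ has been estimated, averages of this kind can be evaluated at any temperature without further sampling, which is one reason an estimate of the density of states is useful.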

C) BIOLOGICAL INTEREST
The major areas of biology of interest to our laboratory include developmental genomics and signal transduction networks. We are as interested in advancing biological understanding as in developing computational analysis methods. We have initiated an effort to study the transcriptional program during early embryonic development. Our approach uses the in vitro development of mouse embryonic stem cells and microarray profiling of FACS-purified cells. Furthermore, we have initiated investigations of Hox gene regulation. Our interest in finding regulatory elements through comparative genomics meshes well with these projects. In the area of signal transduction, we are working with Perrimon on the use of RNA interference to investigate the network of interactions among kinases and phosphatases in Drosophila. We are also investigating the use of novel experimental strategies, such as periodic signal input, for the study of genetic networks.
     
 
  Wong Lab, Bio-X program, Stanford University | James Clark Center, 318 Campus Drive, Stanford, CA 94305-4065