Publications
Journal Papers

D. Varodayan, D. Chen, M. Flierl, and B. Girod,
"Wyner-Ziv coding of video with unsupervised motion vector learning",
Signal Processing: Image Communication, Vol. 23, No. 5, June 2008.
[Abstract] [Paper]

Distributed source coding theory has long promised a new method of encoding video that is much lower in complexity than conventional methods.
In the distributed framework, the decoder is tasked with exploiting the redundancy of the video signal.
Among the difficulties in realizing a practical codec has been the problem of motion estimation at the decoder.
In this paper, we propose a technique for unsupervised learning of forward motion vectors during the decoding of a frame with reference to its previous reconstructed frame.
The technique, described for both pixel-domain and transform-domain coding, is an instance of the Expectation Maximization algorithm.
The performance of our transform-domain motion learning video codec improves as GOP size grows.
It is better than using motion-compensated temporal interpolation by 0.5 dB when GOP size is 2, and by even more when GOP size is larger.
It performs within about 0.25 dB of a codec that knows the motion vectors through an oracle, but is hundreds of orders of magnitude less complex than a corresponding brute-force decoder motion search approach would be.

D. Carpenter, T. Bell, D. Chen, D. Ng, C. Baran, B. Reinisch, and I. Galkin,
"Proton cyclotron (PC) echoes and a new resonance observed by the RPI instrument on the IMAGE satellite",
Journal of Geophysical Research, Vol. 112, August 2007.
[Abstract] [Paper]

At various altitudes in the plasmasphere, sounder pulses from the Radio Plasma Imager (RPI) instrument on the IMAGE satellite can couple strongly to protons, a process revealed in echo time delay versus frequency forms that arrive at multiples of the local proton cyclotron period tp.
Lower-altitude (<4000 km) versions of two of these proton cyclotron (PC) forms were previously observed in the topside ionosphere.
A new resonance, apparently confined to altitudes above ~7000 km, was observed at a frequency ~15% above the electron cyclotron frequency fce.
We believe that PC echoes and the new resonance are driven by a variety of mechanisms, but only exceptionally strong echoes in the whistler-mode domain are discussed in detail.
Those echoes indicate that peak excitation of the protons occurs as a transient event at the beginning of each rf pulse.
We infer that there is spatial bunching of accelerated protons during the initial formation of an electron sheath around the positive-voltage antenna element.
The gyrating protons then produce a series of electrostatic pulses at multiples of tp.
The most efficient proton excitation is expected to occur at frequencies near and below the proton plasma frequency fpp.
In contrast to these echoes, the discrete PC echoes above fce and near fZ show evidence of thermal-mode wave propagation at the rf frequency of the sounder pulses, while the new resonance above fce suggests the existence of a ringing phenomenon in the plasma that is unique to altitudes above ~7000 km.

Conference Papers

D. Chen, S. Tsai, R. Vedantham, R. Grzeszczuk, and B. Girod,
"Streaming mobile augmented reality on mobile phones",
International Symposium on Mixed and Augmented Reality (ISMAR), Orlando, Florida, October 2009.
[Abstract] [Paper] [Presentation] [Video]

Continuous recognition and tracking of objects in live video captured on a mobile device enables real-time user interaction.
We demonstrate a streaming mobile augmented reality system with 1 second latency.
User interest is automatically inferred from camera movements, so the user never has to press a button.
Our system is used to identify and track book and CD covers in real time on a phone's viewfinder.
Efficient motion estimation is performed at 30 frames per second on a phone, while fast search through a database of 20,000 images is performed on a server.

S. Tsai, D. Chen, G. Takacs, V. Chandrasekhar, J. Singh, and B. Girod,
"Location coding for mobile image retrieval",
International Mobile Multimedia Communications Conference (MobiMedia), London, England, September 2009.
[Abstract] [Paper] [Presentation]

For mobile image retrieval, efficient data transmission can be achieved by sending only the query features.
Each query feature is composed of a descriptor and a location in the image.
The former is used to find candidate matching images using a "bag-of-words" approach, while the latter is used in a geometric consistency check to map features in the query image to corresponding features in the database image.
We investigate how to compress the location information and how lossy compression affects the geometric consistency check.
The location information is converted into a location histogram, and a context-based arithmetic coding method with location refinement is then proposed to code the histogram.
The effects of lossily compressing the location information are evaluated empirically in terms of the errors in corresponding features and the error of the estimated geometric transformation model.
From our experiments, rates at ~5.1 bits per feature can achieve errors comparable to lossless coding.
The proposed scheme achieves a 12.5x rate reduction compared to the floating point representation, and 2.8x rate reduction compared to a fixed point representation.
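The first step named in this abstract, turning feature locations into a location histogram, can be sketched as follows. This is a minimal illustration, assuming square spatial bins of a hypothetical side length `block_size`; the function name and the sample coordinates are made up for the example, and the context-based arithmetic coder and location refinement are not shown.

```python
from collections import Counter

def location_histogram(locations, block_size):
    """Count features per square spatial bin of side block_size pixels.

    Hypothetical helper: the bin shape and size are assumptions,
    not the settings used in the paper.
    """
    return Counter(
        (int(x // block_size), int(y // block_size)) for x, y in locations
    )

# Made-up (x, y) feature locations in pixels.
features = [(3.2, 4.9), (5.0, 1.1), (17.6, 2.3), (18.1, 30.4)]
hist = location_histogram(features, block_size=8)

for bin_index, count in sorted(hist.items()):
    print(bin_index, count)
```

The quantization is lossy because the position of a feature inside its bin is discarded, which is why the effect on the geometric consistency check has to be evaluated.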

V. Chandrasekhar, D. Chen, Z. Li, G. Takacs, S. Tsai, R. Grzeszczuk, and B. Girod,
"Low-rate image retrieval with tree histogram coding",
International Mobile Multimedia Communications Conference (MobiMedia), London, England, September 2009.
[Abstract] [Paper] [Presentation]

To perform image retrieval using a mobile device equipped with a camera, the mobile captures an image, transmits data wirelessly to a server, and the server replies with the associated database image information.
Query data compression is crucial for low-latency retrieval over a wireless network.
For fast retrieval from large databases, Scalable Vocabulary Trees (SVT) are commonly employed.
In this work, we propose using distributed image matching where corresponding Tree-Structured Vector Quantizers (TSVQ) are stored on both the mobile device and the server.
By quantizing feature descriptors using an optimally pruned TSVQ on the mobile device and transmitting just a tree histogram, we achieve very low bitrates without sacrificing recognition accuracy.
We carry out tree pruning optimally using the BFOS algorithm and design criteria for trading off classification-error-rate and bitrate effectively.
For the well known ZuBuD database, we achieve 96% accuracy with only ~1000 bits per image.
By extending accurate image recognition to such extremely low bitrates, we can open the door to new applications on mobile networked devices.

V. Chandrasekhar, G. Takacs, D. Chen, S. Tsai, R. Grzeszczuk, and B. Girod,
"CHoG: compressed histogram of gradients",
IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Miami, Florida, June 2009.
[Abstract] [Paper] [Presentation]

Establishing visual correspondences is an essential component of many computer vision problems, and is often done with robust, local feature-descriptors.
Transmission and storage of these descriptors are of critical importance in the context of mobile distributed camera networks and large indexing problems.
We propose a framework for computing low bit-rate feature descriptors with a 20x reduction in bit rate.
The framework is low complexity and has significant speed-up in the matching stage.
We represent gradient histograms as tree structures which can be efficiently compressed.
We show how to efficiently compute distances between descriptors in their compressed representation, eliminating the need for decoding.
We perform a comprehensive performance comparison with SIFT, SURF, and other low bit-rate descriptors and show that our proposed CHoG descriptor outperforms existing schemes.

S. Tsai, D. Chen, J. Singh, and B. Girod,
"Image-based retrieval with a camera-phone", Technical Demo,
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, April 2009.
[Abstract] [Paper] [Presentation] [Video]

Image-based retrieval with a camera-phone is gaining audience in mobile applications such as virtual city tour guides, movie poster recognition for trailer previews, and CD cover identification for music shopping.
The hassle of typing and keying in text is relieved by imaging the object the user wishes to query.
The camera-phone uploads the query image or query data extracted from the image to a server over a wireless connection.
Using the received query information, the server quickly and reliably identifies the matching object from a large database of object images, and returns the information the user desires back to the camera-phone.

M. Makar, C.-L. Chang, D. Chen, S. Tsai, and B. Girod,
"Compression of image patches for local feature extraction",
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Taipei, Taiwan, April 2009.
[Abstract] [Paper] [Presentation]

Local features are widely used for content-based image retrieval and object recognition.
We present an efficient method for encoding digital images suitable for local feature extraction.
First, we find the patches in the image corresponding to the detected features.
Then, we extract these patches at their characteristic scale and orientation and encode them for efficient transmission.
A Discrete Cosine Transform (DCT) with adaptive block size is used for patch compression.
We compare this method to directly compressing feature descriptors using transform coding.
Experimental results show the superior performance of our technique.
Image patches can be compressed to rates around 55 bits/patch (18x compression relative to uncompressed SIFT feature descriptors) and still achieve good image matching performance.

D. Chen, S. Tsai, V. Chandrasekhar, G. Takacs, J. Singh, and B. Girod,
"Tree histogram coding for mobile image matching",
IEEE Data Compression Conference (DCC), Snowbird, Utah, March 2009.
[Abstract] [Paper] [Presentation] [Video]

For mobile image matching applications, a mobile device captures a query image, extracts descriptive features, and transmits these features wirelessly to a server.
The server recognizes the query image by comparing the extracted features to its database and returns information associated with the recognition result.
For slow links, query feature compression is crucial for low-latency retrieval.
Previous image retrieval systems transmit compressed feature descriptors, which is well suited for pairwise image matching.
For fast retrieval from large databases, however, scalable vocabulary trees are commonly employed.
In this paper, we propose a rate-efficient codec designed for tree-based retrieval.
By encoding a tree histogram, our codec can achieve a more than 5x rate reduction compared to sending compressed feature descriptors.
By discarding the order amongst a list of features, histogram coding requires 1.5x lower rate than sending a tree node index for every feature.
A statistical analysis is performed to study how the entropy of encoded symbols varies with tree depth and the number of features.
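Why discarding the order of features saves rate can be seen from a rough counting argument. The sketch below assumes every feature falls into one of `n_leaves` tree nodes with equal probability, which is a simplification and not the paper's statistical model; the function names and the values of `n_features` and `n_leaves` are hypothetical. The 1.5x figure quoted above comes from the paper's codec, not from this calculation.

```python
from math import comb, log2

def ordered_bits(n_features: int, n_leaves: int) -> float:
    """Bits to send one node index per feature, preserving feature order."""
    return n_features * log2(n_leaves)

def unordered_bits(n_features: int, n_leaves: int) -> float:
    """Bits to identify only the histogram, i.e. a multiset of node indices.

    The number of multisets of size n drawn from k nodes is C(n + k - 1, n).
    """
    return log2(comb(n_features + n_leaves - 1, n_features))

# Hypothetical sizes chosen only for illustration.
n_features, n_leaves = 500, 10_000
print(ordered_bits(n_features, n_leaves))
print(unordered_bits(n_features, n_leaves))
```

The unordered count is never larger than the ordered one, because each histogram corresponds to at least one ordered list of node indices.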

V. Chandrasekhar, G. Takacs, D. Chen, S. Tsai, J. Singh, and B. Girod,
"Transform coding of image feature descriptors",
SPIE Visual Communications and Image Processing (VCIP), San Jose, California, January 2009.
[Abstract] [Paper] [Presentation]

We investigate transform coding to efficiently store and transmit SIFT and SURF image descriptors.
We show that image and feature matching algorithms are robust to significantly compressed features.
We achieve near-perfect image matching for both SIFT and SURF using ~2 bits/dimension.
When applied to SIFT and SURF, this provides a 16x compression relative to conventional floating point representation.
We establish a strong correlation between MSE and matching error for feature points and images.
Feature compression enables many applications that may not otherwise be possible, especially on mobile devices.
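The 16x figure follows from simple arithmetic. The sketch below assumes that the conventional floating point representation uses 32 bits per dimension and uses the standard descriptor sizes of 128 dimensions for SIFT and 64 for SURF; the coded rate is taken as exactly 2 bits per dimension, although the abstract gives it only approximately.

```python
FLOAT_BITS_PER_DIM = 32   # assumption: single-precision float per dimension
CODED_BITS_PER_DIM = 2    # the abstract states "~2 bits/dimension"
DIMENSIONS = {"SIFT": 128, "SURF": 64}

def compression_ratio(dims: int) -> float:
    """Ratio of uncompressed to transform-coded descriptor size."""
    raw_bits = dims * FLOAT_BITS_PER_DIM
    coded_bits = dims * CODED_BITS_PER_DIM
    return raw_bits / coded_bits

for name, dims in DIMENSIONS.items():
    # Prints the raw size in bits, the coded size in bits, and the ratio.
    print(name, dims * FLOAT_BITS_PER_DIM, dims * CODED_BITS_PER_DIM,
          compression_ratio(dims))
```

Because the ratio is computed per dimension, it is the same for both descriptors regardless of their length.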

D. Chen, S. Tsai, V. Chandrasekhar, G. Takacs, J. Singh, and B. Girod,
"Robust image retrieval using multiview scalable vocabulary trees",
SPIE Visual Communications and Image Processing (VCIP), San Jose, California, January 2009.
[Abstract] [Paper] [Presentation] [Video]

Content-based image retrieval using a Scalable Vocabulary Tree (SVT) built from local scale-invariant features is an effective method of fast search through a database.
An SVT built from fronto-parallel database images, however, is ineffective at classifying query images that suffer from perspective distortion.
In this paper, we propose an efficient server-side extension of the single-view SVT to a set of multiview SVTs that may be simultaneously employed for image classification.
Our solution results in significantly better retrieval performance when perspective distortion is present.
We also develop an analysis of how perspective increases the distance between matching query-database feature descriptors.

S. Tsai, D. Chen, J. Singh, and B. Girod,
"Rate-efficient, real-time CD cover recognition on a camera-phone", Technical Demo,
ACM Multimedia (ACM MM), Vancouver, British Columbia, Canada, October 2008.
[Abstract] [Paper] [Video]

Automatic CD cover recognition has interesting applications for comparison shopping and music sampling.
We demonstrate real-time CD cover recognition using a cameraphone.
By snapping a picture of a CD cover with her cameraphone, a user can conveniently retrieve information related to the CD.
Robust image feature extraction is applied to overcome the image distortions in the query photo.
To limit the amount of data transmitted over a wireless network, we compress the query image or features extracted from the query image.
On the database side, fast and reliable image matching against a database of 10,000 CD covers is accomplished using a scalable vocabulary tree.

D. Varodayan, D. Chen, and B. Girod,
"Network image coding for multicast",
IEEE International Workshop on Multimedia Signal Processing (MMSP), Queensland, Australia, October 2008.
[Abstract] [Paper] [Presentation]

We consider a new problem in network image coding for multicast.
In a multihop mesh network, structured as a directed graph, all nodes decode and display reconstructions of the image (at possibly different qualities).
Each node may also perform transcoding before transmitting data downstream in the network.
The problem is the design of the coding and transcoding schemes to deliver the best image quality over the network.
For a network with diamond topology, we show that multiple description coding combined with Wyner-Ziv transcoding is often superior to other methods.
We argue further that the benefits are magnified for larger networks containing one or more diamond subnets.
Our image coding experiments demonstrate that multiple description coding with Wyner-Ziv transcoding outperforms single description coding or multiple description coding with conventional transcoding, for both a diamond network and a two-hop mesh network with four branches.

D. Chen, V. Chandrasekhar, G. Takacs, J. Singh, and B. Girod,
"Color restoration for objects of interest using robust image features",
IEEE International Workshop on Multimedia Signal Processing (MMSP), Queensland, Australia, October 2008.
[Abstract] [Paper] [Presentation]

Illumination distortion due to uncontrolled lighting can severely degrade the color appearance of a photo.
Frequently, the desired colors for objects in a newly taken query image are found in a previously stored database image.
Then, the goal is to change the colors in the query image to match the colors in the database image.
This paper presents a color restoration system that automatically retrieves a database image which matches the query image, even if the two images are taken from different viewpoints and under different illuminations.
Robust features enable both accurate retrieval from the database and efficient sampling of the color differences between the query and database images.
A spatially varying color mismatch model is generated, and the colors of the query image are effectively restored.

D. Chen, D. Varodayan, M. Flierl, and B. Girod,
"Wyner-Ziv coding of multiview images with unsupervised learning of disparity and Gray code",
IEEE International Conference on Image Processing (ICIP), San Diego, California, October 2008.
[Abstract] [Paper] [Presentation]

Wyner-Ziv coding of multiview images avoids communications between source cameras.
To achieve good compression performance, the decoder must relate the source and side information images.
Since correlation between the two images is exploited at the bit level, it is desirable to map small Euclidean distances between coefficients into small Hamming distances between bitwise codewords.
This important mapping property is not achieved with the binary code but can be achieved with the Gray code.
Comparing the two mappings, it is observed that the Gray code offers a substantial benefit for unsupervised learning of unknown disparity but provides limited advantage if disparity is known.
Experimental results with multiview images demonstrate the Gray code achieves PSNR gains of 2 dB over the binary code for unsupervised learning of disparity.
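The mapping property described in this abstract can be checked on a small example. This is a minimal sketch, assuming the standard binary-reflected Gray code and 8-bit coefficient indices; both are assumptions for illustration, since the abstract does not state the quantizer or bit depth used in the paper.

```python
def gray_encode(n: int) -> int:
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def hamming(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")

# Worst-case Hamming distance between neighbouring 8-bit values
# (Euclidean distance 1) under each mapping.
worst_binary = max(hamming(v, v + 1) for v in range(255))
worst_gray = max(
    hamming(gray_encode(v), gray_encode(v + 1)) for v in range(255)
)
print(worst_binary, worst_gray)  # prints "8 1"
```

Under the plain binary code the step from 127 to 128 flips all eight bits, whereas under the Gray code every step of one flips exactly one bit.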

D. Chen, D. Varodayan, M. Flierl, and B. Girod,
"Wyner-Ziv coding of multiview images with unsupervised learning of two disparities",
IEEE International Conference on Multimedia and Expo (ICME), Hannover, Germany, June 2008.
[Abstract] [Paper] [Presentation]

Wyner-Ziv coding of multiview images is an attractive solution because it avoids communications between individual cameras.
To achieve good rate-distortion performance, the Wyner-Ziv decoder must reliably estimate the disparities between the multiview images.
For the scenario where two reference images exist at the decoder, we propose a codec that effectively performs unsupervised learning of the two disparities between an image being Wyner-Ziv coded and the two reference images.
The proposed two-disparity decoder disparity-compensates the two reference images and generates side information more accurately than an existing one-disparity decoder.
Experimental results with real multiview images demonstrate that the proposed codec achieves PSNR gains of 1-5 dB over the one-disparity codec.

D. Chen, D. Varodayan, M. Flierl, and B. Girod,
"Distributed stereo image coding with improved disparity and noise estimation",
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Las Vegas, Nevada, March 2008.
[Abstract] [Paper] [Presentation]

Distributed coding of correlated grayscale stereo images is effectively addressed by a recently proposed codec that learns block-wise disparity at the decoder.
Based on the Slepian-Wolf theorem, one image can be transmitted at a rate approaching the conditional entropy if the other image is referenced as side information at the decoder.
This paper improves the methods in the decoder design by refining disparity estimates to pixel resolution, generating more accurate initial disparity estimates, and modeling noise as a nonstationary random field.
The new decoder enables up to an additional 9 percent bit rate savings for lossless coding.
When the rate is insufficient for lossless reconstruction, the new decoder improves PSNR and significantly reduces visually unpleasant blocking artifacts.
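The conditional entropy bound from the Slepian-Wolf theorem can be illustrated on a toy source. The sketch below assumes a single uniform bit X and side information Y that differs from X with probability 0.1; this joint distribution is hypothetical and much simpler than the stereo image and noise models in the paper.

```python
from math import log2

def conditional_entropy(joint):
    """H(X|Y) in bits, where joint[x][y] = P(X = x, Y = y)."""
    n_y = len(joint[0])
    p_y = [sum(row[y] for row in joint) for y in range(n_y)]
    h = 0.0
    for row in joint:
        for y, p_xy in enumerate(row):
            if p_xy > 0:
                h -= p_xy * log2(p_xy / p_y[y])
    return h

# Hypothetical joint distribution: uniform X, Y flips X with probability 0.1.
joint = [[0.45, 0.05],
         [0.05, 0.45]]

# About 0.469 bits, compared with H(X) = 1 bit without side information.
print(round(conditional_entropy(joint), 3))
```

In this toy case the side information at the decoder lowers the achievable rate from 1 bit to roughly 0.47 bits per symbol, even though the encoder never sees Y.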

D. Chen and C. Musat,
"Vis3D: A new data viewer with fast volume rendering",
American Geophysical Union Annual Conference (AGU AC), San Francisco, California, December 2007.
[Abstract] [Presentation]

Accurate, fast, and convenient visualization of large data sets is a recurring need in seismic data processing and analysis.
Modern data viewers should exploit state-of-the-art volume rendering technologies to effectively generate images with short time delays and maintain viewer interactivity.
A new data viewer called Vis3D is presented which uses a wavelet-based multi-resolution framework to achieve fast volume rendering.
Each data set is converted into a multi-resolution pyramid, and the resolution most appropriate for the available computing resources is chosen.

A. Mavlankar, D. Chen, S. Zakhary, M. Flierl, and B. Girod,
"Noise processing for simple Laplacian pyramid synthesis based on dual frame reconstruction",
Picture Coding Symposium (PCS), Lisbon, Portugal, November 2007.
[Abstract] [Paper] [Presentation]

The Laplacian pyramid (LP) provides a frame expansion.
Thus, there exist infinitely many synthesis operators which achieve perfect reconstruction in the absence of quantization.
However, if the subbands are quantized in the open-loop mode then the dual frame synthesis operator, which is the pseudo-inverse of the analysis operator, minimizes the mean squared error (MSE) in the reconstruction.
Note that this requires modification of the conventional simple synthesis scheme.
For the open-loop mode, we propose novel quantization noise processing at the encoder that allows us to achieve the same performance as dual frame reconstruction and yet retain the simple synthesis scheme at the decoder.
This has the advantage that the decoder can be simple in structure as well as be agnostic of whether the encoder was open-loop or closed-loop and achieves minimum MSE reconstruction for both cases.
Experimental results show a gain of around 1 dB with the dual frame reconstruction compared to the simple synthesis operator.
Furthermore, experiments confirm that this gain can also be obtained by retaining the simple synthesis operator and performing the proposed quantization noise processing at the encoder.
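The claim that the dual frame synthesis operator minimizes MSE under open-loop quantization can be illustrated with a generic frame, not an actual Laplacian pyramid. The sketch below assumes a hypothetical 3x2 analysis operator `A` and white, uncorrelated quantization noise of unit variance on the coefficients; under that assumption the reconstruction noise power equals the squared Frobenius norm of the synthesis operator. All names are made up for the example.

```python
from fractions import Fraction as F

# Hypothetical analysis operator: 2 signal samples -> 3 coefficients.
A = [[F(1), F(0)],
     [F(0), F(1)],
     [F(1), F(1)]]

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(row) for row in zip(*X)]

def inverse_2x2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def noise_gain(S):
    """Squared Frobenius norm: output noise power for unit white input noise."""
    return sum(v * v for row in S for v in row)

At = transpose(A)
dual = matmul(inverse_2x2(matmul(At, A)), At)   # pseudo-inverse (A^T A)^-1 A^T
simple = [[F(1), F(0), F(0)],                   # a left inverse that ignores
          [F(0), F(1), F(0)]]                   # the third coefficient

identity = [[F(1), F(0)], [F(0), F(1)]]
# Both operators reconstruct perfectly when there is no quantization.
assert matmul(dual, A) == identity and matmul(simple, A) == identity

print(noise_gain(dual), noise_gain(simple))  # prints "4/3 2"
```

Both synthesis operators are exact without quantization, but the pseudo-inverse passes less of the coefficient noise into the reconstruction, which is the behaviour the abstract attributes to dual frame reconstruction.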

D. Chen, R. Clapp, and B. Biondi,
"Data-fusion of volumes, visualization of paths, and revision of viewing sequences in Ricksep",
American Geophysical Union Annual Conference (AGU AC), San Francisco, California, December 2006.
[Abstract] [Paper] [Presentation]

D. Chen, A. Chang, D. Bates, M. Pivi, and T. Raubenheimer,
"Single-bunch electron cloud effects in the GLC/NLC US-Cold and Tesla low emittance transport lines",
European Particle Accelerator Conference (EPAC), Lucerne, Switzerland, July 2004.
[Abstract] [Paper]

This paper examines the severity of the electron cloud effects in the Low Emittance Transport (LET) of linear colliders including the Bunch-Compressor System (BCS) and Beam Delivery System (BDS).
We examine the electron cloud effects in the normal-conducting GLC/NLC or X-Band, and the super-conducting US-Cold and TESLA linear collider designs through the use of specially developed computer simulation codes.
An estimate of the critical cloud density is given for the BDS and BCS of the X-Band collider.

Other Presentations

"Streaming mobile augmented reality",
Bay Area Vision Meeting (BAVM), August 2009.
[Presentation]

"Image-based retrieval using a camera phone",
Stanford Computer Forum Annual Meeting (CFAM), April 2009.
[Presentation]

"CD cover recognition on cell phones",
Guest Lecture, EE 368 Digital Image Processing taught by Prof. Bernd Girod, Stanford University, May 2008.
[Presentation]

"Distributed stereo image coding with improved disparity and noise estimation",
Stanford Computer Forum Annual Meeting (CFAM), April 2008.
[Presentation]

"Distributed image coding with improved estimation of disparity and noise",
Final Presentation, Received Best Project Award, EE 398B Image Communications II taught by Prof. Bernd Girod, Stanford University, May 2007.
[Presentation]