
Stanford University
My research at Stanford University has focused on video streaming with interactive region-of-interest (IRoI). The user can control pan/tilt/zoom while watching the video. Alternatively, the system can choose the RoI for presentation, relieving the user of some of the navigation burden.

High-spatial-resolution digital imaging sensors are becoming more widespread. High-spatial-resolution video can also be stitched together from the views of multiple cameras, as implemented by Hewlett-Packard in its video conferencing product Halo [1]. However, delivering such high-resolution content is challenging because of the limited resolution of display panels and the limited data rate available for communication. Imagine that a client constrained by one of these factors requests a high-spatial-resolution video from the server. One approach would be to stream a spatially downsampled version of the entire video scene that suits the client's display window resolution or available data rate. With this approach, however, the user cannot watch a local region-of-interest (RoI) at the highest captured resolution. We propose a video delivery system that enables virtual pan/tilt/zoom while the video is playing, so that the server can adapt and stream only the relevant regions of the video. Such a system benefits from spatial-random-access-enabled video compression, which allows arbitrary RoIs to be accessed within the compressed bit-stream. A single encoding produces a comprehensive bit-stream, and the relevant parts of this bit-stream can be served to different clients depending on their individual RoIs. We first proposed such a video coding scheme in [2] and proposed improvements to it in [3] and [4].
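To make spatial random access concrete, here is a minimal sketch of how a server might decide which parts of the bit-stream to send for a requested RoI. It assumes, purely for illustration, that each frame is coded as a uniform grid of independently decodable tiles at a single resolution; the names `Rect` and `tiles_for_roi` are hypothetical. The real partitioning and resolution layers are determined by the coding scheme in [2], [3], and [4] (for instance, [2] studies the choice of slice size).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in pixel coordinates (x, y = top-left corner)."""
    x: int
    y: int
    w: int
    h: int


def tiles_for_roi(roi: Rect, frame_w: int, frame_h: int,
                  tile_w: int, tile_h: int) -> list[tuple[int, int]]:
    """Return (row, col) indices of every tile that overlaps the RoI.

    Assumes a uniform grid of independently decodable tiles of size
    tile_w x tile_h (tiles on the right/bottom edge may be smaller).
    """
    # Clip the RoI to the frame so out-of-bounds requests are harmless.
    x0, y0 = max(0, roi.x), max(0, roi.y)
    x1 = min(frame_w, roi.x + roi.w)  # exclusive right edge
    y1 = min(frame_h, roi.y + roi.h)  # exclusive bottom edge
    if x0 >= x1 or y0 >= y1:
        return []  # RoI lies entirely outside the frame
    first_col, last_col = x0 // tile_w, (x1 - 1) // tile_w
    first_row, last_row = y0 // tile_h, (y1 - 1) // tile_h
    return [(r, c)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

For a 3840x2160 frame split into 480x270 tiles (an 8x8 grid), a 640x360 RoI with its top-left corner at (500, 300) overlaps only 4 of the 64 tiles, so only those portions of the bit-stream would need to be sent to that client.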

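The figure below illustrates the user interface, which allows the user to control the RoI while watching the video. The display screen at the client's side is sub-divided into two areas: the RoI display area, which shows the currently selected RoI, and a thumbnail overview of the entire video scene.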
Zoom can be controlled smoothly with the mouse scroll wheel. The RoI can be translated by holding down the left mouse button and moving the mouse. As shown in the figure below, a rectangle overlaid on the thumbnail indicates the location of the RoI. The color and size of the rectangle vary with the zoom factor.


[Figure: User interface]
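The mouse interactions described above can be sketched as a small client-side controller. This is a hypothetical illustration, not the actual client: the class `PanTiltZoom`, the per-notch zoom factor, the grab-and-drag direction, and the assumption that the RoI keeps the scene's aspect ratio are all choices made for this sketch.

```python
from dataclasses import dataclass


def clamp(v: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, v))


@dataclass
class RoIState:
    cx: float    # RoI center, full-resolution scene pixels
    cy: float
    zoom: float  # 1.0 shows the whole scene; larger values magnify


class PanTiltZoom:
    """Hypothetical controller for wheel zoom and drag panning.

    Assumes the RoI keeps the scene's aspect ratio, so its size in scene
    pixels is (scene_w / zoom, scene_h / zoom).
    """

    def __init__(self, scene_w: float, scene_h: float,
                 max_zoom: float, step: float = 1.1):
        self.scene_w, self.scene_h = scene_w, scene_h
        self.max_zoom = max_zoom  # e.g. scene_w / display_w for native resolution
        self.step = step          # zoom multiplier per wheel notch (assumed)
        self.state = RoIState(scene_w / 2, scene_h / 2, 1.0)

    def roi_size(self) -> tuple[float, float]:
        return self.scene_w / self.state.zoom, self.scene_h / self.state.zoom

    def _keep_inside(self) -> None:
        # Shift the center so the RoI never extends past the scene borders.
        w, h = self.roi_size()
        self.state.cx = clamp(self.state.cx, w / 2, self.scene_w - w / 2)
        self.state.cy = clamp(self.state.cy, h / 2, self.scene_h - h / 2)

    def on_scroll(self, notches: int) -> None:
        # Positive notches zoom in, negative zoom out; a small per-notch
        # factor is what makes the zoom feel smooth.
        z = self.state.zoom * self.step ** notches
        self.state.zoom = clamp(z, 1.0, self.max_zoom)
        self._keep_inside()

    def on_drag(self, dx: float, dy: float, display_w: float) -> None:
        # Grab-and-drag convention (assumed): the content follows the mouse,
        # so the RoI moves opposite to the mouse motion. Display pixels are
        # converted to scene pixels at the current zoom.
        scale = self.roi_size()[0] / display_w
        self.state.cx -= dx * scale
        self.state.cy -= dy * scale
        self._keep_inside()

    def thumbnail_rect(self, thumb_w: float) -> tuple[float, float, float, float]:
        # (x, y, w, h) of the rectangle overlaid on a thumb_w-wide thumbnail.
        k = thumb_w / self.scene_w
        w, h = self.roi_size()
        return ((self.state.cx - w / 2) * k, (self.state.cy - h / 2) * k, w * k, h * k)
```

Because the overlay rectangle is derived from the same RoI state that drives the request to the server, the thumbnail and the RoI display area stay consistent; mapping the zoom factor to a rectangle color would be a one-line addition.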


We are currently designing the system to work over practical packet-switched networks such as the Internet. Our research involves optimizing the coding scheme and developing algorithms for packet scheduling and for pre-fetching data from the server [5,6], so that the delivered video has good quality and interaction has low latency.
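Pre-fetching requires guessing where the RoI will be a few frames from now. As a deliberately simple stand-in, the sketch below extrapolates the RoI center at constant velocity from recent observations. This is not the predictor from [5] or the video-analysis approach from [6]; the class name `LinearRoIPredictor`, the history length, and the motion model are assumptions chosen for illustration.

```python
from collections import deque


class LinearRoIPredictor:
    """Constant-velocity extrapolation of the RoI center (illustrative only).

    The history length and the motion model are assumptions of this sketch.
    """

    def __init__(self, history: int = 5):
        # Most recent (frame_index, cx, cy) observations; older ones drop out.
        self.samples: deque = deque(maxlen=history)

    def observe(self, frame: int, cx: float, cy: float) -> None:
        self.samples.append((frame, cx, cy))

    def predict(self, frame_ahead: int) -> tuple[float, float]:
        if not self.samples:
            raise ValueError("no RoI observations yet")
        f1, x1, y1 = self.samples[-1]
        f0, x0, y0 = self.samples[0]
        if f1 == f0:
            return x1, y1  # no motion information: assume the RoI stays put
        # Average velocity over the history window, in pixels per frame.
        vx = (x1 - x0) / (f1 - f0)
        vy = (y1 - y0) / (f1 - f0)
        dt = frame_ahead - f1
        return x1 + vx * dt, y1 + vy * dt
```

Combined with the current zoom factor, the predicted center defines a predicted RoI whose parts of the bit-stream the client could request before the user actually navigates there, hiding part of the network latency.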

We have recently proposed [7,8] exploiting overlaps among the RoIs of a peer population to employ application-layer multicast as an efficient and scalable delivery mechanism. Notable challenges include adapting the overlay topology on the fly as RoIs change, meeting stringent latency constraints due to the interactive nature of the system, and coping with the limited bandwidth of the server hosting the IRoI video session.
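The benefit of RoI overlap can be illustrated with an idealized count. In the hypothetical sketch below, each peer's RoI maps to a set of tile indices, each tile becomes its own multicast group, and the server sends one copy per distinct tile while peers relay it. This ignores overlay construction, peer upload limits, and churn, which are exactly the issues addressed in [7,8]; the function names are illustrative.

```python
from collections import defaultdict


def multicast_groups(peer_tiles: dict[str, set[tuple[int, int]]]) -> dict[tuple[int, int], set[str]]:
    """Map each tile to the set of peers whose current RoI needs it."""
    groups: dict[tuple[int, int], set[str]] = defaultdict(set)
    for peer, tiles in peer_tiles.items():
        for tile in tiles:
            groups[tile].add(peer)
    return dict(groups)


def server_uploads(peer_tiles: dict[str, set[tuple[int, int]]]) -> tuple[int, int]:
    """Tile streams the server must send: (unicast, idealized multicast)."""
    unicast = sum(len(tiles) for tiles in peer_tiles.values())
    # With ideal multicast, the server sends each distinct tile once and
    # peers relay it to everyone else in that tile's group.
    multicast = len(multicast_groups(peer_tiles))
    return unicast, multicast
```

With three peers whose RoIs all include tile (1, 2), five unicast tile streams shrink to three server streams under this idealization. Whenever an RoI changes, the affected peer leaves some groups and joins others, which is why the overlay must adapt on the fly.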

For a comprehensive overview of the proposed system, the reader may refer to [9,10].

Demos:


[a] Aditya Mavlankar recently built a demonstrator at Deutsche Telekom Research Laboratories in Berlin, Germany. The demonstrator shows interactive viewing of a soccer game. The view of the entire soccer playfield was obtained by stitching together the views from multiple cameras. The RoI can be chosen to focus conveniently on any part of the playfield. An automatic mode is also provided, in which the system tracks the ball and chooses the RoI. The automatic mode relieves the user of the navigation burden while still allowing the user to change the zoom factor. Download video of demo.

[b] Head tracking for finer selection of the RoI in automatic mode: a camera placed under the TV screen tracks the user's head. In the automatic mode described in [a] above, the user can shift the RoI to the right or to the left by moving his/her head. Download video of demo.
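A head position reported by such a camera could be mapped to a horizontal RoI shift roughly as follows. Everything here is an assumption for illustration: the hypothetical `head_shift` function, the dead zone that ignores small unintentional movements, the linear gain, and the convention that the camera image is already mirrored so that a positive displacement means the viewer moved to their right.

```python
def head_shift(head_x: float, neutral_x: float, cam_w: float, roi_w: float,
               dead_zone: float = 0.05, gain: float = 1.0) -> float:
    """Horizontal RoI shift, in scene pixels, from the viewer's head position.

    head_x / neutral_x: current and resting head x-coordinates in the
    tracking camera's image (assumed mirrored, so positive means the viewer
    moved to their right). cam_w: camera image width in pixels.
    roi_w: current RoI width in scene pixels.
    """
    d = (head_x - neutral_x) / cam_w  # displacement as a fraction of image width
    if abs(d) <= dead_zone:
        return 0.0  # ignore small, unintentional head movements
    # Start the shift at zero at the dead-zone edge to avoid a sudden jump.
    d = d - dead_zone if d > 0 else d + dead_zone
    return gain * d * roi_w
```

The returned shift would be added to the x-coordinate of the automatically chosen RoI and then clamped to the playfield, so the ball-following behavior stays in charge and the head only fine-tunes the framing.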

[c] ViewXtreme (now called ClassX) is a lecture video recording and publishing system that Aditya Mavlankar co-invented with Piyush Agrawal and Prof. Bernd Girod. A short video demonstrating the benefits of the system can be seen here. Roughly a couple of hundred students watch lecture videos using the ClassX system every quarter.

References:


[1] Halo: Video Conferencing Product by Hewlett-Packard

[2] Aditya Mavlankar, Pierpaolo Baccichet, David Varodayan, and Bernd Girod, "Optimal Slice Size for Streaming Regions of High Resolution Video with Virtual Pan/Tilt/Zoom Functionality," Proc. of 15th European Signal Processing Conference (EUSIPCO), Poznan, Poland, Sept. 2007 ([paper], [presentation]) (Best Student Paper Award)

[3] Aditya Mavlankar and Bernd Girod, "Background Extraction and Long-Term Memory Motion-Compensated Prediction for Spatial-Random-Access-Enabled Video Coding," Proc. of International Picture Coding Symposium (PCS), Chicago, Illinois, USA, May 2009 ([paper])

[4] Aditya Mavlankar and Bernd Girod, "Spatial-Random-Access-Enabled Video Coding for Interactive Virtual Pan/Tilt/Zoom Functionality," IEEE Transactions on Circuits and Systems for Video Technology (CSVT) (Submitted)

[5] Aditya Mavlankar, David Varodayan, and Bernd Girod, "Region-of-Interest Prediction for Interactively Streaming Regions of High Resolution Video," Proc. of 16th IEEE International Packet Video Workshop (PV), Lausanne, Switzerland, Nov. 2007 ([paper], [poster]) (Student Travel Grant Awarded; Sponsored by Vidyo (formerly Layered Media) and Microsoft Research Asia)

[6] Aditya Mavlankar and Bernd Girod, "Pre-Fetching Based on Video Analysis for Interactive Region-of-Interest Streaming of Soccer Sequences," Proc. of IEEE International Conference on Image Processing (ICIP), Cairo, Egypt, Nov. 2009 ([paper])

[7] Aditya Mavlankar, Jeonghun Noh, Pierpaolo Baccichet, and Bernd Girod, "Peer-to-Peer Multicast Live Video Streaming with Interactive Virtual Pan/Tilt/Zoom Functionality," Proc. of International Conference on Image Processing (ICIP), San Diego, CA, USA, Oct. 2008 ([paper])

[8] Aditya Mavlankar, Jeonghun Noh, Pierpaolo Baccichet, and Bernd Girod, "Optimal Server Bandwidth Allocation for Streaming Multiple Streams via P2P Multicast," Proc. of IEEE International Workshop on Multimedia Signal Processing (MMSP), Cairns, Australia, Oct. 2008 ([paper])

[9] Aditya Mavlankar, "Peer-to-Peer Video Streaming with Interactive Region-of-Interest," Ph.D. Dissertation, Department of Electrical Engineering, Stanford University, Apr. 2010 ([pdf])

[10] Aditya Mavlankar and Bernd Girod, "Video Streaming with Interactive Pan/Tilt/Zoom," in M. Mrak, M. Grgic, and M. Kunt (eds.), High-Quality Visual Experience: Creation, Processing and Interactivity of High-Resolution and High-Dimensional Video Signals, Springer (ISBN: 978-3-642-12801-1, [pdf])





Last modified: June 27, 2010.