Project Proposal

Title: Bandwidth Extrapolation of Audio Signals
Name: Sung-Won Yoon, David Choi

I) Problem:
The sampling theorem states that the sampling rate limits the frequencies that can be recovered for an arbitrary signal. However, for signals that exhibit special structure, it may be possible to recover frequency components beyond the limit imposed by the sampling theorem. Previous work has concentrated mostly on bandwidth extrapolation for speech signals. We want to explore the possibility of bandwidth extrapolation for audio signals in general.

II) Objective:
Given an audio signal bandlimited to 16 kHz, our objective is to estimate the high frequency components (8-16 kHz) from the low frequency components (0-8 kHz). We may revise our definition of the high/low frequency division according to need.

III) Experimental Setup:
Our experiment will involve the use of wideband (16 kHz) training data, which we will use to optimize a variety of models described below. To better understand our results, we intend to initially work with a simple audio signal produced by a single instrument. After training, we will test each model on new data (still from the same instrument). The test data will naturally be bandlimited to 8 kHz, and the resulting high frequency estimate will be judged according to its MSE and its perceptual quality (an illustrative sketch of this evaluation appears after Section V).

IV) Proposed Solutions:
Each model will consist of two main parts: preprocessing by the LOT (MDCT) and estimation of the high frequency components. Illustrative sketches of both parts appear after Section V.

1) LOT (Lapped Orthogonal Transform) or MDCT (Modified Discrete Cosine Transform)
We decided to use the LOT to prevent the blocking artifacts that would arise from windowing the original signal. We will implement the LOT as described in [6].

2) Regeneration of High Frequency Components
Given the lack of previous work in bandwidth extrapolation for audio signals, we decided to test two relatively simple linear approaches.

a) Linear estimation in the frequency domain
Each high frequency estimate is a weighted combination of the low frequency components from the same window.

b) Principal Components Analysis (PCA)
To apply PCA, we require stationarity of the signal. Since windowing an audio signal produces a quasi-stationary signal and the LOT preserves quasi-stationarity, PCA is applicable to the LOT coefficients. Better results may be attained by dividing the windows of the audio signal into different classes. We would then perform PCA separately on each class and test for improved performance.

V) Possible Extensions
It is interesting to note that speech-only bandwidth extrapolation systems have performed poorly on general audio signals. Furthermore, one paper [9] states that the high frequency components of speech could not be satisfactorily estimated without side information. For these reasons, it is possible that both linear estimation and PCA will produce unsatisfactory results. We therefore think the following extensions are worth mentioning, although they may lie outside the scope of a one-quarter project.

1) Nonlinear Methods
There exists a plethora of nonlinear bandwidth extrapolation methods for speech signals. The applicability of these methods to audio signals is uncertain.

2) Masking
The effects of masking have been widely exploited in audio coding and are of great importance to that field. However, the applicability of masking to bandwidth extrapolation has yet to be explored in previous approaches.
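The following sketches (in Python with NumPy) illustrate, at a proof-of-concept level, how the pieces of Sections III and IV might be realized; they are not deliverables of the proposal. Function names, the frame length N, the sine window, and the 8 kHz split index used below are placeholder assumptions. This first sketch uses a plain MDCT with a sine window as a stand-in for the LOT of [6]; the two transforms differ, but either yields lapped, critically sampled coefficients without blocking artifacts.

    import numpy as np

    def mdct_frame(frame, window):
        # MDCT of one frame of length 2N -> N coefficients
        # (direct O(N^2) form, adequate for small experiments).
        N = len(frame) // 2
        n = np.arange(2 * N)
        k = np.arange(N)
        basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
        return basis @ (window * frame)

    def imdct_frame(coeffs, window):
        # Inverse MDCT: N coefficients -> 2N windowed samples, to be
        # overlap-added with hop N (50% overlap) for perfect reconstruction.
        N = len(coeffs)
        n = np.arange(2 * N)
        k = np.arange(N)
        basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
        return (2.0 / N) * window * (basis @ coeffs)

    def mdct_analysis(x, N=256):
        # Split x into 50%-overlapped frames of length 2N; return a
        # (num_frames, N) array of coefficients.
        window = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))  # sine window
        num_frames = len(x) // N - 1
        return np.stack([mdct_frame(x[i * N:i * N + 2 * N], window)
                         for i in range(num_frames)])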
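For the linear estimator of Section IV.2a, one straightforward formulation (an assumption on our part) fits the weights by ordinary least squares on the wideband training coefficients: each high-band coefficient of a frame is a weighted sum of that frame's low-band coefficients. Here split denotes the index of the coefficient corresponding to 8 kHz.

    import numpy as np

    def train_linear_estimator(C_train, split):
        # C_train: (frames, N) transform coefficients of wideband training data.
        # Returns W such that the high band is approximated by low @ W.
        low, high = C_train[:, :split], C_train[:, split:]
        W, *_ = np.linalg.lstsq(low, high, rcond=None)
        return W

    def estimate_high_linear(C_low, W):
        # Apply the trained weights to the low-band coefficients of test frames.
        return C_low @ W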
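The proposal does not fix how PCA (Section IV.2b) would be used to regenerate the high band; one plausible variant, sketched below under that assumption, learns principal directions of the full wideband coefficient vectors and then fills in a test frame's missing high-band coefficients by least-squares fitting of the component scores to the observed low band. Per-class PCA would repeat the same procedure after grouping the frames into classes.

    import numpy as np

    def fit_pca(C_train, n_components=20):
        # Principal directions of wideband coefficient vectors (frames x N).
        mean = C_train.mean(axis=0)
        _, _, Vt = np.linalg.svd(C_train - mean, full_matrices=False)
        return mean, Vt[:n_components]   # V has shape (n_components, N)

    def estimate_high_pca(C_low, mean, V, split):
        # Fit component scores to the observed low band (needs n_components < split),
        # then return the high-band part of the reconstruction.
        V_low, V_high = V[:, :split], V[:, split:]
        scores, *_ = np.linalg.lstsq(V_low.T, (C_low - mean[:split]).T, rcond=None)
        return (V_high.T @ scores).T + mean[split:]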
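Finally, a sketch of the objective part of the evaluation in Section III. Here the bandlimited test input is simulated by keeping only the low-band coefficients of wideband test material (an assumption), and the MSE is measured on the high-band coefficients; the perceptual judgment is done by listening and is not shown.

    import numpy as np

    def evaluate_mse(C_test, split, estimator):
        # C_test: wideband coefficients of held-out frames from the same instrument.
        # estimator: a function mapping low-band coefficients to a high-band estimate.
        C_low = C_test[:, :split]            # proxy for the 8 kHz bandlimited input
        C_high_est = estimator(C_low)
        return np.mean((C_test[:, split:] - C_high_est) ** 2)

For example, evaluate_mse(C_test, split, lambda C: estimate_high_linear(C, W)) scores the linear estimator against the true high band.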
VI) Workplan (beginning with the week of Feb 5)

Week 1:
1) Evaluate the appropriateness of the LOT, i.e., quantify the correlation of the LOT coefficients in the test data.
2) Implement and test linear estimation in the frequency domain.

Week 2:
1) Fine-tune the linear estimation experiment. This entails adjusting the parameters involved (number of weights, etc.) and also selecting a variety of test data.
2) Implement and test PCA.

Week 3:
1) Fine-tune the PCA experiment.
2) Compare results.

Week 4:
1) Prepare writeup and presentation.

VII) References
1. C. Avendano, H. Hermansky & E. A. Wan, "Beyond Nyquist: towards the recovery of broad-bandwidth speech from narrow-bandwidth speech," in Proc. Eurospeech (Madrid), pp. 165-168, 1995.
2. M. Bosi, "EE 367C course reader." [perceptual audio coding]
3. Y. M. Cheng, D. O'Shaughnessy & P. Mermelstein, "Statistical recovery of wideband speech from narrowband speech," IEEE Trans. Speech Audio Process., vol. 2, pp. 544-548, 1994.
4. J. Epps & W. H. Holmes, "A new technique for wideband enhancement of coded narrowband speech," in Proc. IEEE Speech Coding Workshop (Porvoo, Finland), pp. 174-176, 1999.
5. S. Gustafsson, P. Jax & P. Vary, "A novel psychoacoustically motivated audio enhancement algorithm preserving background noise characteristics," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 397-400, 1998.
6. H. Malvar & D. H. Staelin, "The LOT: transform coding without blocking effects," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, Apr. 1989.
7. P. Noll, "Digital audio coding for visual communications," Proceedings of the IEEE, vol. 83, no. 6, pp. 925-943, June 1995.
8. T. Painter & A. Spanias, "Perceptual coding of digital audio."
9. J. Valin & R. Lefebvre, "Bandwidth extension of narrowband speech for low bit-rate wideband coding."