Project Proposal

Title: Bandwidth Extrapolation of Audio Signals
Name: Sung-Won Yoon, David Choi

I) Problem:
The sampling theorem states that the sampling rate limits the frequencies that can be recovered for an arbitrary signal. However, for signals that exhibit special structure, it may be possible to recover frequency components beyond the limit imposed by the sampling theorem. Previous work has concentrated mostly on bandwidth extrapolation for speech signals. We want to explore the possibility of bandwidth extrapolation for audio signals in general.

II) Objective:
Given an audio signal bandlimited to 16 kHz, our objective is to estimate the high frequency components (8-16 kHz) from the low frequency components (0-8 kHz). We may revise our definition of the high/low frequency division according to need.

III) Experimental Setup:
Our experiment will involve the use of wideband (16 kHz) training data, which we will use to optimize a variety of models described below. To better understand our results, we intend to initially work with a simple audio signal produced by a single instrument. After training, we will test each model on new data (still from the same instrument). The test data will naturally be bandlimited to 8 kHz, and the resulting high frequency estimate will be judged according to its MSE and its perceptual quality (an illustrative sketch of this evaluation appears after Section V).

IV) Proposed Solutions:
Each model will consist of two main parts: preprocessing by the LOT (MDCT) and estimation of the high frequency components. Illustrative sketches of both parts appear after Section V.

1) LOT (Lapped Orthogonal Transform) or MDCT (Modified Discrete Cosine Transform)
We decided to use the LOT to prevent the blocking artifacts that would arise from windowing the original signal. We will implement the LOT as described in [6].

2) Regeneration of High Frequency Components
Given the lack of previous work in bandwidth extrapolation for audio signals, we decided to test two relatively simple linear approaches.

a) Linear estimation in the frequency domain
Each high frequency estimate is a weighted combination of the low frequency components from the same window.

b) Principal Components Analysis (PCA)
To apply PCA, we require stationarity of the signal. Since windowing an audio signal produces a quasi-stationary signal and the LOT preserves quasi-stationarity, PCA is applicable to the LOT coefficients. Better results may be attained by dividing the windows of the audio signal into different classes. We would then perform PCA separately on each class and test for improved performance.

V) Possible Extensions
It is interesting to note that speech-only bandwidth extrapolation systems have performed poorly on general audio signals. Furthermore, one paper [9] states that the high frequency components of speech could not be satisfactorily estimated without side information. For these reasons, it is possible that both linear estimation and PCA will produce unsatisfactory results. We therefore think the following extensions are worth mentioning, although they may lie outside the scope of a one-quarter project.

1) Nonlinear Methods
There exists a plethora of nonlinear bandwidth extrapolation methods for speech signals. The applicability of these methods to audio signals is uncertain.

2) Masking
The effects of masking have been widely exploited in audio coding and are of great importance to that field. However, the applicability of masking to bandwidth extrapolation has yet to be explored in previous approaches.
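The following sketches (in Python with NumPy) illustrate, at a proof-of-concept level, how the pieces of Sections III and IV might be realized; they are not deliverables of the proposal. Function names, the frame length N, the sine window, and the 8 kHz split index used below are placeholder assumptions. This first sketch uses a plain MDCT with a sine window as a stand-in for the LOT of [6]; the two transforms differ, but either yields lapped, critically sampled coefficients without blocking artifacts.

    import numpy as np

    def mdct_frame(frame, window):
        # MDCT of one frame of length 2N -> N coefficients
        # (direct O(N^2) form, adequate for small experiments).
        N = len(frame) // 2
        n = np.arange(2 * N)
        k = np.arange(N)
        basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
        return basis @ (window * frame)

    def imdct_frame(coeffs, window):
        # Inverse MDCT: N coefficients -> 2N windowed samples, to be
        # overlap-added with hop N (50% overlap) for perfect reconstruction.
        N = len(coeffs)
        n = np.arange(2 * N)
        k = np.arange(N)
        basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
        return (2.0 / N) * window * (basis @ coeffs)

    def mdct_analysis(x, N=256):
        # Split x into 50%-overlapped frames of length 2N; return a
        # (num_frames, N) array of coefficients.
        window = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))  # sine window
        num_frames = len(x) // N - 1
        return np.stack([mdct_frame(x[i * N:i * N + 2 * N], window)
                         for i in range(num_frames)])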
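For the linear estimator of Section IV.2a, one straightforward formulation (an assumption on our part) fits the weights by ordinary least squares on the wideband training coefficients: each high-band coefficient of a frame is a weighted sum of that frame's low-band coefficients. Here split denotes the index of the coefficient corresponding to 8 kHz.

    import numpy as np

    def train_linear_estimator(C_train, split):
        # C_train: (frames, N) transform coefficients of wideband training data.
        # Returns W such that the high band is approximated by low @ W.
        low, high = C_train[:, :split], C_train[:, split:]
        W, *_ = np.linalg.lstsq(low, high, rcond=None)
        return W

    def estimate_high_linear(C_low, W):
        # Apply the trained weights to the low-band coefficients of test frames.
        return C_low @ W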
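The proposal does not fix how PCA (Section IV.2b) would be used to regenerate the high band; one plausible variant, sketched below under that assumption, learns principal directions of the full wideband coefficient vectors and then fills in a test frame's missing high-band coefficients by least-squares fitting of the component scores to the observed low band. Per-class PCA would repeat the same procedure after grouping the frames into classes.

    import numpy as np

    def fit_pca(C_train, n_components=20):
        # Principal directions of wideband coefficient vectors (frames x N).
        mean = C_train.mean(axis=0)
        _, _, Vt = np.linalg.svd(C_train - mean, full_matrices=False)
        return mean, Vt[:n_components]   # V has shape (n_components, N)

    def estimate_high_pca(C_low, mean, V, split):
        # Fit component scores to the observed low band (needs n_components < split),
        # then return the high-band part of the reconstruction.
        V_low, V_high = V[:, :split], V[:, split:]
        scores, *_ = np.linalg.lstsq(V_low.T, (C_low - mean[:split]).T, rcond=None)
        return (V_high.T @ scores).T + mean[split:]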
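Finally, a sketch of the objective part of the evaluation in Section III. Here the bandlimited test input is simulated by keeping only the low-band coefficients of wideband test material (an assumption), and the MSE is measured on the high-band coefficients; the perceptual judgment is done by listening and is not shown.

    import numpy as np

    def evaluate_mse(C_test, split, estimator):
        # C_test: wideband coefficients of held-out frames from the same instrument.
        # estimator: a function mapping low-band coefficients to a high-band estimate.
        C_low = C_test[:, :split]            # proxy for the 8 kHz bandlimited input
        C_high_est = estimator(C_low)
        return np.mean((C_test[:, split:] - C_high_est) ** 2)

For example, evaluate_mse(C_test, split, lambda C: estimate_high_linear(C, W)) scores the linear estimator against the true high band.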
VI) Workplan (beginning with the week of Feb 5)

Week 1:
1) Evaluate the appropriateness of the LOT, i.e., quantify the correlation of the LOT coefficients in the test data.
2) Implement and test linear estimation in the frequency domain.

Week 2:
1) Fine-tune the linear estimation experiment. This entails adjusting the parameters involved (number of weights, etc.) and also selecting a variety of test data.
2) Implement and test PCA.

Week 3:
1) Fine-tune the PCA experiment.
2) Compare results.

Week 4:
1) Prepare writeup and presentation.

VII) References
1. C. Avendano, H. Hermansky & E. A. Wan, "Beyond Nyquist: towards the recovery of broad-bandwidth speech from narrow-bandwidth speech," in Proc. Eurospeech (Madrid), pp. 165-168, 1995.
2. M. Bosi, "EE 367C course reader." [perceptual audio coding]
3. Y. M. Cheng, D. O'Shaughnessy & P. Mermelstein, "Statistical recovery of wideband speech from narrowband speech," IEEE Trans. Speech Audio Process., vol. 2, pp. 544-548, 1994.
4. J. Epps & W. H. Holmes, "A new technique for wideband enhancement of coded narrowband speech," in Proc. IEEE Speech Coding Workshop (Porvoo, Finland), pp. 174-176, 1999.
5. S. Gustafsson, P. Jax & P. Vary, "A novel psychoacoustically motivated audio enhancement algorithm preserving background noise characteristics," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 397-400, 1998.
6. H. Malvar & D. H. Staelin, "The LOT: transform coding without blocking effects," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, Apr. 1989.
7. P. Noll, "Digital audio coding for visual communications," Proceedings of the IEEE, vol. 83, no. 6, pp. 925-943, June 1995.
8. T. Painter & A. Spanias, "Perceptual coding of digital audio."
9. J. Valin & R. Lefebvre, "Bandwidth extension of narrowband speech for low bit-rate wideband coding."