Audio Content Analysis

ASIP-NET seminar on Audio Content Analysis


To register for this event you must be logged in as a member of ASIP-NET.

Time and place

June 3, 2008, 11.00-16.15
Aalborg University, Niels Jernes Vej 14, Room 4.111, Aalborg, Denmark

Map

Directions from Aalborg Airport


Free participation in tutorial lectures June 4, 2008

Persons registered for this ASIP-NET seminar are invited to participate in the Tutorial Lectures of the Speech Analysis and Processing for Knowledge Discovery Workshop, June 4, 10:00-12:30:

  • Phonetic perspectives on modelling information in the speech signal by Professor Sarah Hawkins, Department of Linguistics, University of Cambridge, UK
  • New paradigms for speech analysis and processing: the source-filter model revisited and gesture-controlled analysis-by-synthesis by Professor Christophe d’Alessandro, CNRS-LIMSI, Orsay, France

Registered persons are also invited to the lunch on June 4, 12:30-13:30. For full registration for the workshop, taking place June 4-6, 2008, please visit the workshop homepage.


Download lecture slides from the file archive




Welcome by Professor Søren Holdt Jensen, Aalborg University


Musical audio analysis using sparse representations

Reader Dr. Mark Plumbley, Dept. of Electronic Engineering, Queen Mary University of London, United Kingdom

Abstract: The method of "sparse representations", based on the idea that observations should be represented by only a few items chosen from a large number of possible items, has emerged recently as an interesting approach to the analysis of images and audio. New theoretical advances and practical algorithms mean that the sparse representations approach is becoming a powerful signal processing and analysis method. In this talk, I will introduce some of the key concepts behind sparse representations, and describe some of our work using sparse representations in musical audio applications such as automatic music transcription and musical audio source separation.
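As a toy illustration of the sparse-representation idea in the abstract, the sketch below uses greedy matching pursuit to approximate a signal with only a few atoms from an overcomplete dictionary. It is an illustrative example of the general concept only, not the specific algorithms presented in the talk; all names and data here are made up.

```python
import numpy as np

def matching_pursuit(x, D, n_atoms):
    """Greedily represent signal x using at most n_atoms columns (atoms)
    of the dictionary D: repeatedly pick the atom best correlated with
    the residual and subtract its contribution."""
    residual = x.astype(float).copy()
    coeffs = np.zeros(D.shape[1])
    for _ in range(n_atoms):
        correlations = D.T @ residual          # match residual against every atom
        k = np.argmax(np.abs(correlations))    # best-matching atom
        coeffs[k] += correlations[k]
        residual -= correlations[k] * D[:, k]  # remove that atom's contribution
    return coeffs, residual

# Overcomplete dictionary: 8 unit-norm random atoms in R^4
rng = np.random.default_rng(0)
D = rng.standard_normal((4, 8))
D /= np.linalg.norm(D, axis=0)

# A signal built from just two atoms; the pursuit recovers a sparse code
x = 2.0 * D[:, 3] - 0.5 * D[:, 6]
coeffs, residual = matching_pursuit(x, D, n_atoms=5)
```

The key property is that `coeffs` has few nonzero entries (sparsity) while the residual energy shrinks with each selected atom.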

Biography: Dr Mark Plumbley is Reader in Signal Processing in the Centre for Digital Music at Queen Mary University of London. His research concerns the analysis and processing of musical audio signals, using a wide range of signal processing techniques, including independent component analysis (ICA) and sparse representations. Dr Plumbley is Chair of the International ICA Steering Committee, a member of the IEEE Machine Learning in Signal Processing Technical Committee, and an Associate Editor for IEEE Transactions on Neural Networks. He was General Chair and host of the 7th International Conference on Independent Component Analysis and Source Separation (ICA 2007), held at Queen Mary in September 2007.




The ISP toolbox and a tempo-insensitive distance measure for cover song identification based on chroma features

Ph.D. student, Jesper Højvang Jensen, Aalborg University

Abstract: We present a distance measure between audio files designed to identify cover songs. The distance measure is based on the chromagram, from which it extracts a signature that compactly describes a song and is insensitive to changes in instrumentation, tempo and time shifts.

Furthermore, we introduce the ISP toolbox, which among other things contains MATLAB code for the above-mentioned algorithm.
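The general idea of a tempo- and shift-insensitive chroma signature can be sketched as follows: resampling the chromagram to a fixed number of frames removes tempo differences, and taking the magnitude of a Fourier transform along time removes circular time shifts. This is a hedged illustration of the concept only, not the actual ISP toolbox implementation; the function names are invented for the example.

```python
import numpy as np

def chroma_signature(chromagram, n_frames=64):
    """Toy signature from a 12 x T chromagram: resample each chroma bin
    to a fixed length (tempo insensitivity), then keep the FFT magnitude
    along time (insensitivity to circular time shifts)."""
    n_bins, T = chromagram.shape
    idx = np.linspace(0, T - 1, n_frames)
    resampled = np.array([np.interp(idx, np.arange(T), row)
                          for row in chromagram])
    return np.abs(np.fft.rfft(resampled, axis=1))

def signature_distance(a, b):
    """Euclidean distance between the compact signatures of two songs."""
    return np.linalg.norm(chroma_signature(a) - chroma_signature(b))
```

For example, a circularly time-shifted copy of a song yields the same signature, and a uniformly time-stretched copy stays much closer to the original than an unrelated song does.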


Discovering Music Structure via Similarity Fusion

Ph.D., Anders Meng, Oticon A/S and DTU Informatics

Abstract: Automatic methods for music navigation and music recommendation exploit the structure in the music to carry out a meaningful exploration of the “song space”. To get a satisfactory performance from such systems, one should incorporate as much information about song similarity as possible; however, how to do so is not obvious.

In this talk we present an idea for combining different similarity measures based on Probabilistic Latent Semantic Analysis (PLSA).

In this framework, songs are projected to a relatively low-dimensional space of “latent semantics” that explains the different music similarities relatively well. The suitability of the PLSA model for representing music structure is studied in a simplified scenario consisting of 4412 songs and two similarity measures among them (text and music genre).
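A minimal sketch of the PLSA model itself may help: the observed co-occurrence counts are explained as a mixture over latent topics, P(d, w) = Σ_z P(z) P(d|z) P(w|z), fitted by EM. The sketch below shows the generic model on a toy count matrix; how the talk's similarity measures are turned into such counts and fused is not shown here, and the code is an assumption-laden illustration, not the authors' implementation.

```python
import numpy as np

def plsa(N, n_topics, n_iter=50, seed=0):
    """Minimal PLSA via EM on a count matrix N (items x features).
    Returns P(z), P(d|z), P(w|z) for n_topics latent topics."""
    rng = np.random.default_rng(seed)
    n_d, n_w = N.shape
    p_z = np.full(n_topics, 1.0 / n_topics)
    p_d_z = rng.random((n_topics, n_d)); p_d_z /= p_d_z.sum(1, keepdims=True)
    p_w_z = rng.random((n_topics, n_w)); p_w_z /= p_w_z.sum(1, keepdims=True)
    for _ in range(n_iter):
        # E-step: responsibilities P(z | d, w)
        joint = p_z[:, None, None] * p_d_z[:, :, None] * p_w_z[:, None, :]
        post = joint / (joint.sum(0, keepdims=True) + 1e-12)
        # M-step: re-estimate parameters from expected counts
        expected = post * N[None, :, :]
        nz_d = expected.sum(axis=2)          # expected counts per (z, d)
        nz_w = expected.sum(axis=1)          # expected counts per (z, w)
        nz = nz_d.sum(axis=1)                # expected counts per topic z
        p_d_z = nz_d / (nz[:, None] + 1e-12)
        p_w_z = nz_w / (nz[:, None] + 1e-12)
        p_z = nz / nz.sum()
    return p_z, p_d_z, p_w_z
```

After fitting, the rows of P(d|z) give each item's coordinates in the low-dimensional latent-semantic space.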






Coffee break and networking


Content-based search in broadcast news

Ph.D. Student, Lasse L. Mølgaard, DTU Informatics

Abstract: The ever-growing amount of multimedia data on the Internet requires a deeper analysis of the content to enable meaningful search in these data sources. This talk will present the Castsearch demo for search in audio broadcasts, more specifically news shows.

News broadcasts are made searchable by extracting the clear structure imposed by the format. Classifying the parts of a show as either music or speech, and identifying changes between speakers, gives a coarse structure that can help in the search for a specific clip. We further applied a speech recognition engine on the audio files to obtain transcripts. These transcripts are far from perfect, but using Latent Semantic Analysis techniques shows that it is possible to infer general concepts that help in finding relevant clips from the news shows.
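The Latent Semantic Analysis step mentioned above can be sketched generically: a truncated SVD of a term-document matrix maps noisy transcripts and queries into a low-dimensional concept space, so a query can match a clip even when the exact words differ. This is an illustrative sketch of standard LSA, not the Castsearch implementation; the data and names are invented.

```python
import numpy as np

def lsa_index(term_doc, k):
    """Truncated SVD of a (terms x documents) matrix. Returns the term
    basis U_k, singular values s_k, and document coordinates in the
    k-dimensional concept space."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k].T

def search(query_vec, U, s, docs):
    """Fold a bag-of-words query into the concept space and rank
    documents by cosine similarity."""
    q = (query_vec @ U) / s
    sims = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q) + 1e-12)
    return np.argsort(-sims)

# Toy corpus: terms = [cat, dog, car, road]; docs 0-1 are about animals,
# docs 2-3 about traffic. Doc 1 contains only "dog", yet a query for
# "cat" still retrieves it via the shared latent concept.
term_doc = np.array([[1, 0, 0, 0],
                     [1, 1, 0, 0],
                     [0, 0, 1, 0],
                     [0, 0, 1, 1]], float)
U, s, docs = lsa_index(term_doc, k=2)
ranking = search(np.array([1.0, 0, 0, 0]), U, s, docs)  # query: "cat"
```

Imperfect speech-recognition transcripts benefit in the same way: a misrecognized or missing word can still be bridged by the latent concept it shares with the rest of the clip.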

Demo: A demonstration of the system using CNN podcasts is available online.


Demonstration of ISOUND music search engines

Assoc. Prof. Jan Larsen, DTU Informatics

Abstract: I will demo two music search engines developed in the Intelligent Sound project. Muzeeker uses Wikipedia as a content resource to organize music search and provides additional annotations and links. The MIRocket (Music Information Rocket) uses audio features and other metadata to automatically recommend songs in a playlist.


Wrap up

The seminar is supported by the Signal Processing Chapter of the IEEE Denmark Section.

