The Department of Computational Perception at the Johannes Kepler University Linz was founded in October 2004, with the appointment of Prof. Gerhard Widmer. Its mission is to develop computational models and algorithms that permit computers to perceive and “understand” aspects of the external world, where we interpret “perception” in the widest sense of the word, as the extraction of useful high-level information from complex, possibly low-level data, including text, audio, video, image, and sensor data.
While the research carried out at the department is highly interdisciplinary, connecting fields like machine learning, music understanding, signal processing, and web mining, the area of music information retrieval (MIR) has always been a focus. In the following, a selection of our research directions related to music retrieval and recommendation is described and corresponding prototype applications are presented. Supporting videos are available from our YouTube channel.
Audio content-based search, retrieval, and browsing
The development of highly efficient music feature extractors from audio, describing for instance aspects of timbre or rhythm, is one of our core research topics. Given such features, we develop approaches to automatic playlist generation, browsing of music collections, and similarity-based search and retrieval, among others. Our algorithms have several times achieved top results in the audio music similarity and retrieval task of the annual Music Information Retrieval Evaluation eXchange (MIREX) competition. A prototype application for similarity-based music retrieval is the “Wolperdinger” music search engine, developed by Dominik Schnitzer, a former PhD student. This interface retrieves the songs most similar, in terms of audio similarity, to the one currently playing: the user simply clicks on a seed song and is immediately presented with a new playlist of similar songs.
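As a minimal sketch of this kind of similarity-based retrieval, assuming each song has already been reduced to a numeric feature vector (a real system would extract timbre descriptors, such as MFCC statistics, from the audio), a seed-based playlist can be built by nearest-neighbor search:

```python
import numpy as np

def nearest_songs(features, seed_idx, k=3):
    """Return indices of the k songs closest to the seed (Euclidean distance)."""
    d = np.linalg.norm(features - features[seed_idx], axis=1)
    d[seed_idx] = np.inf          # exclude the seed song itself
    return list(np.argsort(d)[:k])

# toy feature vectors (one row per song); purely illustrative values
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [0.0, 0.2]])
playlist = nearest_songs(feats, seed_idx=0, k=2)   # two songs most like song 0
```

Real feature spaces are high-dimensional, but the seed-and-rank principle is the same.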
A prototype for browsing music collections that are potentially unknown to the user is the “nepTune” interface. Given an arbitrary collection of digital music files, “nepTune” performs content-based clustering to create a virtual island landscape in which the user can freely navigate and hear the closest sounds via a surround sound system. A demo video is available here. This research area is led by Peter Knees.
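The grouping step behind such a landscape can be illustrated with a simple k-means pass over song feature vectors. This is a hypothetical stand-in for the actual clustering used in “nepTune”; the point is only that acoustically similar songs end up in the same region:

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Very small k-means: assign points to nearest center, recompute centers."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(np.linalg.norm(x[:, None] - centers[None], axis=2), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean(axis=0)
    return labels

# two clearly separated groups of toy song features
feats = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]])
labels = kmeans(feats, k=2)
```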
Music identification from audio and from performance
Exploiting other kinds of audio features, we further develop methods to automatically segment an audio stream and identify the music pieces therein, even when the input signal is distorted or exhibits changes in tempo. To this end, Reinhard Sonnleitner develops tempo- and pitch-invariant fingerprinting methods and systems that correctly identify music tracks from a short audio recording even in very noisy environments such as live mixes by DJs, where tempo and pitch are usually manipulated in various ways. In this context, music segmentation techniques, which we also research, play an important role in detecting tracks in audio streams.
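The matching idea behind audio fingerprinting can be sketched with pairs of spectral peaks and a voting scheme over time offsets. This is a deliberate simplification: a tempo- and pitch-invariant system would hash larger geometric constellations of peaks so that matching survives scaling of time and frequency. All peak data below is toy input:

```python
from collections import defaultdict

def hashes(peaks, fan_out=3):
    """Pair each (time, freq) peak with a few later peaks: (f1, f2, dt) -> t1."""
    peaks = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1:i + 1 + fan_out]:
            out.append(((f1, f2, t2 - t1), t1))
    return out

def best_match(db, query_peaks):
    """Vote for the (track, time offset) pair supported by most matching hashes."""
    index = defaultdict(list)
    for track, peaks in db.items():
        for h, t in hashes(peaks):
            index[h].append((track, t))
    votes = defaultdict(int)
    for h, tq in hashes(query_peaks):
        for track, t in index.get(h, []):
            votes[(track, t - tq)] += 1
    return max(votes, key=votes.get)[0] if votes else None

db = {"song_a": [(0, 10), (1, 20), (2, 15), (3, 30)],
      "song_b": [(0, 40), (1, 42), (2, 44), (3, 46)]}
match = best_match(db, [(5, 10), (6, 20), (7, 15)])  # shifted excerpt of song_a
```

Requiring a consistent time offset across many matching hashes is what makes this scheme robust to noise in the query.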
Furthermore, similar techniques can be applied in the domain of classical music, where the goal is to identify the piece being performed based on a database of sheet music. Here, we focus mainly on live piano performances and develop methods that achieve correct identification within a few seconds. Coupled with a score-following algorithm, this technology can be used as a versatile music identification and tracking system. Without having to look for the printed score, a musician can simply sit down at the piano and play a few bars. The system identifies the piece, finds the correct position, tracks the progress over time, and even turns the pages automatically. A demo video is available here. Research in this direction, including working prototype applications, is conducted by Andreas Arzt.
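A toy version of the identification step matches a performed pitch sequence against a small score database with dynamic time warping; a deployed system would align audio features online rather than symbolic pitches offline, but the alignment principle is the same. Piece names and pitch sequences below are illustrative:

```python
def dtw_cost(a, b):
    """Classic dynamic-time-warping cost between two pitch sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# hypothetical score database: piece name -> pitch sequence (MIDI note numbers)
scores = {"fuer_elise": [76, 75, 76, 75, 76, 71, 74, 72, 69],
          "ode_to_joy": [64, 64, 65, 67, 67, 65, 64, 62]}
performed = [76, 75, 76, 75, 76, 71, 74]          # the first bars, as played
piece = min(scores, key=lambda p: dtw_cost(performed, scores[p]))
```

Because DTW tolerates insertions and deletions along the time axis, the match survives tempo fluctuations and small performance errors.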
User-aware recommendation and playlist generation
Taking into account contextual aspects of the listener is vital to creating user-aware models of listening behavior. We develop and use such models to address the tasks of automatic playlist adaptation, music recommendation, and music browsing according to the listener’s current preferences. These preferences are influenced by a variety of factors, such as time, weather, activity, location, and social context (alone, with friends or family). A person might want to listen to an energetic rock song when doing sports, for instance, but prefer relaxing reggae music when at the beach on a sunny day. This line of research is directed by Markus Schedl.
Playlist generation on smart phones
Given today’s wide availability of smart mobile devices, we implemented one of our approaches to automatic playlist generation and adaptation in an Android application called “Mobile Music Genius”. This music player monitors more than 100 aspects of the listener’s context while she interacts with the player. It learns and constantly refines models that describe the relationship between the user context and her preferred artists or songs. Feedback, such as play, pause, or skip events, is used to infer whether a user likes or dislikes an item.
Playlists can be created manually or automatically by defining a seed song and some properties of the playlist (e.g., the number of songs or whether songs by the seed artist should be included). In the latter case, the playlist is populated with the songs most similar to the seed, which are retrieved using a model of tag-based similarity.
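A minimal sketch of such tag-based similarity uses Jaccard overlap between tag sets; the song names and tags are made up, and the actual similarity model in “Mobile Music Genius” may differ:

```python
def jaccard(a, b):
    """Tag-overlap similarity between two tag sets (0.0 to 1.0)."""
    return len(a & b) / len(a | b) if a | b else 0.0

tags = {"song1": {"rock", "indie", "guitar"},
        "song2": {"rock", "guitar", "live"},
        "song3": {"reggae", "chill"},
        "song4": {"indie", "rock"}}

seed = "song1"
ranked = sorted((s for s in tags if s != seed),
                key=lambda s: jaccard(tags[seed], tags[s]), reverse=True)
playlist = ranked[:2]   # the two songs whose tags overlap most with the seed
```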
During playback, the player continuously monitors the user context and compares it to the previously recorded context. Once the discrepancy between the two exceeds a sensitivity threshold, a playlist update is triggered after the current song: the new context information is fed into a classifier trained on the user’s previous context and music-preference data. The classifier then outputs a list of songs that were listened to in similar contexts, which are in turn added to the playlist.
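The trigger logic described above can be sketched as follows, with an assumed numeric context vector, an assumed sensitivity threshold, and a stub standing in for the trained classifier:

```python
import math

THRESHOLD = 0.5   # sensitivity: how much the context must drift (assumed value)

def context_distance(c1, c2):
    """Euclidean distance between two numeric context vectors."""
    return math.dist(c1, c2)

def maybe_update_playlist(prev_ctx, new_ctx, predict_songs, playlist):
    """Append classifier suggestions when the context drift exceeds the threshold."""
    if context_distance(prev_ctx, new_ctx) > THRESHOLD:
        playlist.extend(predict_songs(new_ctx))
    return playlist

# stub in place of a classifier trained on (context, song) history
def predict_songs(ctx):
    return ["chill_track_1", "chill_track_2"] if ctx[0] > 0.5 else ["rock_track_1"]

pl = maybe_update_playlist([0.1, 0.2], [0.9, 0.8], predict_songs, ["current_song"])
```

Updating only after the current song, as in the application, keeps the adaptation unobtrusive.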
Geospatial music recommendation from social media data
While our research on user-aware playlist generation takes into account a wide variety of contextual listener aspects in a relatively small amount of data (a few thousand records), we also exploit the abundance of information present in social media data (hundreds of millions of records), even though far fewer aspects of the user context are directly available there. As our focus is on MIR, we concentrate on Last.fm and Twitter, from which we mine music- and listener-related information. This work has already resulted in two data sets available for research, namely “MusicMicro” and the “Million Musical Tweets Dataset”. In addition to the posting text, tweets frequently carry GPS information, which enables interesting research tasks such as geospatial music recommendation, interfaces for exploring music listening behavior around the world, and music popularity estimation.
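As an illustration of geospatial recommendation from such geotagged listening events, the sketch below surfaces the artists most listened to within a given radius of the user's position. All names and coordinates are toy data; the haversine formula gives the great-circle distance between two points on Earth:

```python
import math
from collections import Counter

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def local_top_artists(events, lat, lon, radius_km=100, n=2):
    """Most-listened artists among geotagged events within the radius."""
    nearby = [a for la, lo, a in events if haversine_km(lat, lon, la, lo) <= radius_km]
    return [a for a, _ in Counter(nearby).most_common(n)]

# toy geotagged listening events: (lat, lon, artist)
events = [(48.3, 14.3, "artist_x"), (48.2, 14.2, "artist_x"),
          (48.3, 14.4, "artist_y"), (40.7, -74.0, "artist_z")]
top = local_top_artists(events, 48.3, 14.3)   # query near Linz
```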
Using the aforementioned data sources, in particular information about listening events and listener characteristics, we develop methods to enhance music recommendation algorithms: content-based, collaborative filtering, and hybrids thereof. We are particularly interested in hybrid fusion schemes and in analyzing the influence of user characteristics on the performance of music recommendation algorithms.
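One common late-fusion scheme simply combines the per-item scores of two recommenders with a weight. A minimal sketch, with illustrative weight and scores that are not taken from our systems:

```python
def hybrid_scores(content, collab, alpha=0.6):
    """Late-fusion hybrid: weighted sum of two per-item score dictionaries."""
    items = set(content) | set(collab)
    return {i: alpha * content.get(i, 0.0) + (1 - alpha) * collab.get(i, 0.0)
            for i in items}

content = {"song_a": 0.9, "song_b": 0.4}   # content-based similarity to the user profile
collab = {"song_b": 0.8, "song_c": 0.7}    # collaborative-filtering scores
fused = hybrid_scores(content, collab)
best = max(fused, key=fused.get)
```

Varying alpha shifts the balance between the two recommenders, which is one way to study how user characteristics affect which mixture works best.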
Browsing interfaces to explore listening behavior
Another related task is the development of browsing interfaces to explore music listening events and music preferences on a worldwide scale. Such interfaces should also be capable of revealing differences between regions or countries. One of our prototypes is the “Music Tweet Map” web application, which allows users to retrieve and explore the listening events of microbloggers by time, location, genre, artist, and track. It was developed and is maintained by David Hauger.