Semantic Web for Music
Pioneering research in the Centre for Digital Music (C4DM) has led to advances in extraction of semantic information from musical audio, and linking this information on the Semantic Web.
Our ontologies (sets of concepts and relationships), such as the Music Ontology, have formed the basis of the BBC Programmes Ontology, used in the BBC “/programmes” website (2.5 million viewers per week). We have worked with the MusicBrainz open source music encyclopaedia to publish its information as linked data. Our ontologies have also been used by Libre.fm/GNU FM, the semantic web Upper Mapping and Binding Exchange Layer (UMBEL), and the Press Association.
Our research and the ontologies we have developed have had an economic impact, an impact on public policy and services, and an impact on society, culture and creativity. One of the main commercial assets of large-scale media content providers like the BBC is their massive and growing library of content. Semantic web technologies are key to enabling fine-grained, real-time access to this content. Our work is assisting the BBC to develop this access, in turn providing enhanced information to the public about television and sports. It has also enabled other music-based services. The Event Ontology has been taken outside the music domain and used for news, sports and other applications.
C4DM has been one of the pioneers in developing Linked Open Data technologies. Within the SIMAC and OMRAS2 projects, we designed and published some of the first large-scale Semantic Web resources. Since 2007, our dbtune.org server has been publishing several key sets of music-related Linked Open Data derived from: the last.fm music catalogue; the BBC (playcounts and John Peel sessions); the MusicBrainz open music encyclopaedia; MySpace; and the Magnatune label. It delivers over 30 GB of data traffic per month.
Impact on Public Policy and Services
BBC
Business requirements at the BBC create the need for more sophisticated publishing and navigation strategies within the BBC’s journalism activities. This includes the creation of more sophisticated information architectures with more topical indexes, the ability to merge data (e.g. sports statistics) with stories to create a more coherent output, and linking to external and other BBC websites. To help realize this, BBC Future Media and Technologies decided in 2008 to apply and extend Semantic Web technologies developed at C4DM, recruiting one of our team members (Raimond) to enable them to do so.
The C4DM Music Ontology (including the Event and Timeline ontologies) has supported improvements to web presence, navigation between broadcast brands, content management, information sharing with communities, and journalism. Previously, websites of specific programmes and broadcast brands that do not naturally link would be entirely separate, but these ontologies allow links to be easily created for cross-domain navigation. The ontologies also allow data to be made available to third-party developers, e.g. via “BBC backstage”. They also allow community-curated data to be linked to BBC information, enhancing programme websites with additional relevant information. For example, the BBC Music website uses these ontologies to link data from the MusicBrainz community encyclopaedia (see below), providing additional information about artists currently played on BBC media outlets, and to provide recommendations using cultural information available on the Semantic Web. This BBC Music service was announced in late 2008 [Im7].
The bbc.co.uk/music page where play frequency, programme, news and event data are combined using the ontologies developed with the support of C4DM
The new “BBC Programmes” website (Fig 1) describes and links the content of 1,000 to 1,500 programmes in the BBC’s daily output, and has some 2.5 million users per week. To support this, the BBC created its Programmes Ontology [Im8], based on the Music Ontology, and using Event and Timeline ontologies [Im1, Im9].
The “Event Ontology” underpinned the experimental “Mythology Engine”, which allows people to explore BBC dramas. It has also been used for a range of sporting events: for the BBC World Cup 2010 website, to improve journalism and coverage of the 2010 Winter Olympics, and as the ontological backbone for the Olympics 2012 website, which averaged 7.1M daily online unique browsers [Im1].
MusicBrainz is a community-maintained open source encyclopaedia of music information. Through a JISC-funded project (Linked Music Metadata, PI: Dixon, 2010-2011, £94,894), we worked with MusicBrainz to publish its database as Linked Data [Im2]. MusicBrainz now provides its information as RDF linked data on each of its pages, a very large dataset containing some 23.8GB of NTriples and about 180M assertions. This enables third-party developers to access music metadata in a machine-readable format from the browser. To illustrate the scale of usage, MusicBrainz is set to a global limit of 300 requests per second and declines requests beyond that.
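To give a sense of what an "assertion" in such a dataset looks like, the following is a minimal sketch in Python that writes two statements about an artist in N-Triples form, using the published Music Ontology and FOAF namespaces. The artist IRI and name are hypothetical placeholders, the helper functions are illustrative only, and the escaping handles only backslashes and double quotes. The final lines divide the dataset size quoted above by the number of assertions (taking 1 GB as 10^9 bytes, which is an assumption) to estimate the average size of one assertion.

```python
# Published vocabulary namespaces.
MO = "http://purl.org/ontology/mo/"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value):
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text):
    """Quote a plain string literal (sketch: escapes only \\ and ")."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def ntriple(subject, predicate, obj):
    """Format one assertion as a single N-Triples line."""
    return f"{subject} {predicate} {obj} ."


# Hypothetical artist IRI and name, for illustration only.
artist = iri("http://example.org/artist/1")
triples = [
    ntriple(artist, iri(RDF_TYPE), iri(MO + "MusicArtist")),
    ntriple(artist, iri(FOAF + "name"), literal("Example Artist")),
]
for line in triples:
    print(line)

# Rough average size per assertion from the figures in the text,
# assuming 1 GB = 10**9 bytes.
dataset_bytes = 23.8e9
assertions = 180e6
avg_bytes_per_assertion = dataset_bytes / assertions
print(round(avg_bytes_per_assertion))
```

On these figures each assertion occupies a little over 130 bytes on average, which is consistent with lines made up mostly of full IRIs.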
Impact on Society, Culture and Creativity
Libre.fm and GNU FM
Libre.fm ( http://libre.fm/ ) is a community-focussed, on-line radio station similar to Last.fm, powered by the free software GNU FM. Its users can request information on artists, albums or tracks using an application programming interface that returns “Semantic Graphs” (items and relationships) which use the Music Ontology and Event Ontology; RDF data is embedded in its web pages [Im10, Im3]. Libre.fm has over 100,000 users with some 72M tracks recorded [Im10].
As the semantic web has grown, a need for coordination has emerged. The Upper Mapping and Binding Exchange Layer (UMBEL) ( http://umbel.org/ ) is a top-level ontology, a reference structure of 28,000 concepts and a base vocabulary designed to help semantic content interoperate on the Web. It has been developed by the technology consulting companies Structured Dynamics LLC ( http://structureddynamics.com/ ) and Ontotext AD ( http://www.ontotext.com/ ) with the aim of serving as the primary means for negotiating meaning between the domain-specific and task-specific ontologies currently in use by the Linked Data community. The Music Ontology and Event Ontology are two “Linked External Ontologies” used in this layer [Im4].
Press Association Semantic News Platform (SNaP)
The Press Association, producer of 40% of UK news, working with Ontoba ( http://www.ontoba.com/ ), a consultancy specialising in digital media technical and semantic web publishing architectures, developed a set of ontologies to represent news assets and their relationships to the real world in a semantic news platform. The ontology used for news events inherits from the C4DM Event Ontology [Im5].
Impact on Economy
The economic impact of this work arises through MusicWeb, a service under commercial evaluation provided by Academic Rights Press to music libraries around the world, accessible to students and scholars in music, social science and business. MusicWeb finds links between music artists in intuitive new ways using Semantic Web technologies and music-related ontologies developed in the OMRAS2 project, and was originally trialled as catfishsmooth.net. Though MusicWeb is still in beta release, Academic Rights Press has 50 customer institutions with about 12,500 users [Im6].
This Case Study focuses on research undertaken in the Centre for Digital Music (C4DM) into Semantic Web technologies for Music Informatics. C4DM research into Music Informatics is grounded in the use of digital signal processing (DSP) for extracting features from musical audio. In 1998 we published one of the field’s earliest papers, on automatic music genre analysis [R1]. This led to a JISC/NSF Digital Libraries co-funded project, OMRAS (On-line Music Recognition and Search, www.omras.org, 1999-2003) with Sandler, Oxford University and University of Massachusetts (Amherst). Sandler (Professor of Signal Processing) and other key researchers (Bello, Reiss) and academics (Plumbley) moved to Queen Mary in 2001. The OMRAS project (now at Queen Mary and Goldsmiths) pioneered the use of audio queries to search symbolic music databases, summarized in [R2]. It also established the ISMIR (International Symposium on Music Information Retrieval) conference series (ismir2000.ismir.net), which grew to over 260 delegates in 2012, itself leading to the establishment of an international society.
The EU FP6 SIMAC project (mtg.upf.edu/static/semanticaudio, 2004-2006) was one of the first major European Music Informatics projects. C4DM was responsible for developing and defining feature extraction algorithms, including rigidly defined structured semantics for audio features. This led to a major EPSRC ICT “Large Grant” project, OMRAS2 (http://www.omras2.org, 2006-2010, £2.2M). The SIMAC and OMRAS2 projects developed and pioneered the use of Semantic Web technologies for music and other media content, both through the development and release of “Ontologies” (defined sets of concepts and relationships) such as the Music Ontology and Event Ontology [R3], and through the provision of some of the first Linked Open Data servers in the field [R4].
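As an illustration of "concepts and relationships", the sketch below models a handful of statements as subject, predicate, object tuples in Python, using terms from the published Event Ontology and Music Ontology namespaces. All `http://example.org/` identifiers are hypothetical, and the in-memory set and lookup helper are illustrative stand-ins for a real RDF store. The point of the example is that the same event vocabulary can describe a musical performance and a sporting fixture.

```python
# Published vocabulary namespaces.
EVENT = "http://purl.org/NET/c4dm/event.owl#"
MO = "http://purl.org/ontology/mo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical namespace for the things being described.
EX = "http://example.org/"

# Each tuple is one statement: (subject, predicate, object).
graph = {
    # A musical performance, with the band taking part and the venue.
    (EX + "gig1", RDF_TYPE, MO + "Performance"),
    (EX + "gig1", EVENT + "agent", EX + "band1"),
    (EX + "gig1", EVENT + "place", EX + "venue1"),
    # A sporting fixture described with the same event vocabulary.
    (EX + "match1", RDF_TYPE, EVENT + "Event"),
    (EX + "match1", EVENT + "agent", EX + "team1"),
    (EX + "match1", EVENT + "place", EX + "stadium1"),
}


def objects(statements, subject, predicate):
    """Return the sorted objects of all statements matching subject and predicate."""
    return sorted(o for s, p, o in statements if s == subject and p == predicate)


def events_at(statements, place):
    """Return the sorted subjects of all statements whose event:place is `place`."""
    return sorted(s for s, p, o in statements if p == EVENT + "place" and o == place)


print(objects(graph, EX + "gig1", EVENT + "agent"))
print(events_at(graph, EX + "stadium1"))
```

Because both descriptions share the `agent` and `place` relationships, one lookup function serves the music and the sport data alike.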
Led by Sandler and Plumbley, a wide range of ontologies have been developed at C4DM by group members since 2007, including: Music Ontology (Raimond, Abdallah, Jacobson, Fazekas); fundamental ontologies such as the Event and Timeline Ontologies (Abdallah, Raimond); Audio Features Ontology (Raimond, Pastor Escuerdo, Cannam, Jacobson, Fazekas et al); Similarity Ontology (Jacobson, Raimond); Studio Ontology (Fazekas); Temperament Ontology (Fazekas, Tidhar); Audio Effects Ontology [R5] (Wilmering, Fazekas); Instrument Ontology [R6] (in progress: Kolozali, Fazekas, Barthet).
Tools developed by C4DM group members utilising Semantic Web technologies in Music Informatics research include: Sonic Visualiser (Cannam), which reads and writes Music Ontology RDF (Resource Description Framework); Sonic Annotator (Levy, Cannam), a batch annotation tool, which produces RDF output using the Music Ontology; SAWA (Fazekas), a Web based demonstrator of Semantic Web technologies; Hotttabs (Anglade, Barthet, Fazekas, Kolozali, Macrae), a guitar tab and video tool which uses the Music Ontology.
Isophonics.net is the home for software and data resources provided publicly by C4DM as part of its open source research policy. For the Semantic Web these resources include DBTune ( http://dbtune.org/ ), which provides access to music-related structured data in a Linked Data fashion. It hosts SPARQL end-points exposing interlinked music-related data from Magnatune, Jamendo, the BBC John Peel sessions, Last.fm, MySpace and MusicBrainz. Our list of SPARQL end-points is at http://ismir2009.grasstunes.net/taxonomy/term/5. The “Reference Annotations” of ground truth data from C4DM at http://isophonics.net/datasets are also available in Music Ontology RDF.
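A query against a SPARQL end-point of this kind might look like the following minimal sketch. It builds a query for artists and their names using Music Ontology and FOAF terms, then encodes it as an HTTP GET URL with a `query` parameter, as the SPARQL protocol allows. The end-point address `http://example.org/sparql` is a hypothetical placeholder rather than an actual DBTune address, the function names are illustrative, and no request is sent.

```python
from urllib.parse import urlencode

# Published vocabulary namespaces used in the query.
PREFIXES = {
    "mo": "http://purl.org/ontology/mo/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def build_query(limit=10):
    """Build a SPARQL SELECT query listing artists and their names."""
    prefix_lines = "\n".join(
        f"PREFIX {name}: <{uri}>" for name, uri in PREFIXES.items()
    )
    body = (
        "SELECT ?artist ?name WHERE {\n"
        "  ?artist a mo:MusicArtist ;\n"
        "          foaf:name ?name .\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )
    return prefix_lines + "\n" + body


def request_url(endpoint, query):
    """Encode the query as a GET request URL for the given end-point."""
    return f"{endpoint}?{urlencode({'query': query})}"


# Hypothetical end-point address, for illustration only.
query = build_query(limit=5)
url = request_url("http://example.org/sparql", query)
print(query)
print(url)
```

The resulting URL could be fetched with any HTTP client; the response format depends on the end-point and on the `Accept` header sent with the request.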
- [R1] T. Lambrou, P. Kudumakis, R. Speller, M. Sandler, A. Linney. Classification of audio signals using statistical features on time and wavelet transform domains. Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, 12-15 May 1998, Volume: 6, 3621-3624, Seattle, WA, USA, ISSN: 1520-6149, ISBN: 0-7803-4428-6: GS 130 Citations.
- [R2] J. Pickens, J.P. Bello, G. Monti, M. Sandler, T. Crawford, M. Dovey & D. Byrd, Polyphonic score retrieval using polyphonic audio queries: A harmonic modeling approach, Journal of New Music Research, 2003, Vol. 32, Number 2, 223-236. GS 92 Citations
- [R3] Y. Raimond, S.A. Abdallah, M.B. Sandler & F.Giasson, The Music Ontology, 8th International Symposium for Music Information Retrieval (ISMIR-07), Vienna (Austria) Sept. 2007, p417-422. GS 117 Citations
- [R4] G. Fazekas, Y. Raimond, K. Jacobson & M. Sandler. An overview of semantic web activities in the OMRAS2 project. Journal of New Music Research. Volume 39, Issue 4 (Special Issue on Music Informatics and the OMRAS2 Project), 2010 GS 14 Citations
- [R5] T. Wilmering, G. Fazekas, and M. Sandler. (2011) Towards ontological representations of digital audio effects. 14th Int. Conference on Digital Audio Effects (DAFx-11), Paris, France, September 19-23, 2011.
- [R6] S. Kolozali, M. Barthet, G. Fazekas, M. Sandler, Automatic ontology generation for musical instruments based on audio analysis, IEEE Trans. on Audio, Speech and Language Processing, 21(10), p. 2207-2220, 2013.
- [Im1] Press release at: http://www.bbc.co.uk/mediacentre/latestnews/2012/sport-online-figures.html
- [Im2] For LinkedBrainz project description see: http://wiki.musicbrainz.org/LinkedBrainz
- [Im3] http://foocorp.org/projects/fm/developers/
- [Im4] http://umbel.org/specifications/annexes/annex
- [Im5] http://www.ontoba.com/blog/pressnet-news-ontology
- [Im6] http://www.academiccharts.com/assets/im/contents/MusicWeb Overview.pdf
- [Im7] Raimond, Y.: “Music Ontology linked data on BBC.co.uk/music” DBTune Blog 28 Jul 2008. http://blog.dbtune.org/post/2008/07/28/Music-Ontology-linked-data-on-BBCcouk/music
- [Im8] Programmes Ontology: http://www.bbc.co.uk/ontologies/programmes/2009-09-07.shtml
- [Im9] Raimond, Y.; Scott, T.; Oliver, S.; Sinclair, P. & Smethurst, M. Use of Semantic Web technologies on the BBC Web Sites. In Wood, D. (Ed.): Linking Enterprise Data, Springer US, 2010, 263-283. doi: 10.1007/978-1-4419-7665-9_13
- [Im10] Libre.fm statistics: http://librefm.wordpress.com/2013/03/31/72-million-songs-later/ Overview: http://wiki.musicontology.com/index.php/Social_Music_Network_-_Libre.fm