Published September 20, 2018 | Version 1
Dataset Open

UnmixDB: A Dataset for DJ-Mix Information Retrieval

  • 1. IRCAM

Description

A collection of automatically generated DJ mixes with ground truth, based on freely available, redistributable electronic dance tracks released under Creative Commons licenses.

To evaluate methods for DJ-mix analysis and reverse engineering, we created a dataset of excerpts of openly licensed dance tracks, together with mixes generated automatically from these excerpts.

Each mix is based on a playlist that mixes 3 track excerpts beat-synchronously, such that the middle track is embedded in a realistic context of beat-aligned linear cross fading to the other tracks.
The first track's BPM is used as the seed tempo onto which the other tracks are adapted.
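
As a minimal sketch of the tempo adaptation described above, the following computes a speed factor for each track of a playlist relative to the seed tempo. The direction of the ratio (seed BPM divided by track BPM), the function name speed_factor, and the example BPM values are assumptions made for illustration; they are not taken from the dataset or its creation code.

```python
from typing import List


def speed_factor(seed_bpm: float, track_bpm: float) -> float:
    """Playback-speed multiplier that brings track_bpm to seed_bpm.

    Assumption: the factor is defined as seed / track, so a value
    above 1 means the track has to be played faster.
    """
    if seed_bpm <= 0 or track_bpm <= 0:
        raise ValueError("BPM values must be positive")
    return seed_bpm / track_bpm


# Hypothetical playlist of 3 tracks; the first track sets the seed tempo.
playlist_bpms: List[float] = [128.0, 125.0, 130.0]
seed = playlist_bpms[0]
factors = [speed_factor(seed, bpm) for bpm in playlist_bpms]

for bpm, factor in zip(playlist_bpms, factors):
    print(f"{bpm:6.1f} BPM -> speed factor {factor:.4f}")
```

Under this assumption the first track always has a factor of 1, since it defines the seed tempo.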

Each playlist of 3 tracks is mixed 12 times, once for every combination of 4 effect variants and 3 time-scaling variants, applied with the open-source command-line program SoX [http://sox.sourceforge.net].
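
The count of 12 mixes per playlist follows from crossing the two sets of variants. The sketch below enumerates the combinations; the variant names (fx0 to fx3, ts0 to ts2) are placeholders invented for this example, since the actual variant names are not listed here.

```python
from itertools import product

# Placeholder names (assumptions): 4 effect variants, 3 time-scaling variants.
effect_variants = ["fx0", "fx1", "fx2", "fx3"]
timescale_variants = ["ts0", "ts1", "ts2"]

# Every playlist is rendered once per (effect, time-scaling) pair.
mix_variants = list(product(effect_variants, timescale_variants))

for effect, timescale in mix_variants:
    print(f"effect={effect} timescale={timescale}")

print(len(mix_variants), "mixes per playlist")
```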

Each track excerpt contains about 20 s from the beginning and 20 s from the end of the source track. The exact boundaries, however, take the metric structure of the track into account. The cue-in region, where the fade-in happens, starts on the second beat marker that begins a new measure and lasts for 4 measures. The cue-out region ends with the second-to-last measure marker. We ensure at least 20 s for both the beginning and the end parts. The cut point where the two parts are spliced together is again placed on the start of a measure, so that no artefacts due to beat discontinuity are introduced.
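
The cue placement can be illustrated with a short sketch that works on a list of measure-start times (downbeats). This is one possible reading of the rules above, not the code used to build the dataset: it assumes the cue-in starts at the second measure marker in the list, spans 4 measures, and that the cue-out ends at the second-to-last marker. The function name cue_points and the synthetic downbeat grid are hypothetical.

```python
from typing import Sequence, Tuple


def cue_points(downbeats: Sequence[float],
               cue_in_measures: int = 4) -> Tuple[float, float, float]:
    """Return (cue_in_start, cue_in_end, cue_out_end) in seconds.

    Assumed interpretation: cue-in starts on the second measure marker
    and lasts cue_in_measures measures; cue-out ends on the
    second-to-last measure marker.
    """
    if len(downbeats) < cue_in_measures + 2:
        raise ValueError("not enough measure markers for the cue-in region")
    cue_in_start = downbeats[1]
    cue_in_end = downbeats[1 + cue_in_measures]
    cue_out_end = downbeats[-2]
    return cue_in_start, cue_in_end, cue_out_end


# Synthetic example: a measure starts every 2 s, the first one at 0.5 s.
downbeats = [0.5 + 2.0 * i for i in range(20)]
cue_in_start, cue_in_end, cue_out_end = cue_points(downbeats)
print(cue_in_start, cue_in_end, cue_out_end)
```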

The UnmixDB dataset contains the ground truth for the source tracks and mixes in ASCII label format with tab-separated columns starttime, endtime, label.
For each mix, the start, end, and cue points of the constituent tracks are given, along with their BPM and speed factors.
We use the convention that the label starts with a number indicating which of the 3 source tracks the label refers to.
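
A label file in this format can be read with a few lines of Python. The sketch below relies only on what is stated above (three tab-separated columns, and a label that begins with the source track number); the sample rows and the label words in them are invented for illustration and do not come from the dataset.

```python
import csv
import io
import re
from typing import Iterable, List, NamedTuple, Optional


class Label(NamedTuple):
    start: float          # start time in seconds
    end: float            # end time in seconds
    track: Optional[int]  # leading number of the label, if present
    text: str             # full label text


def parse_labels(lines: Iterable[str]) -> List[Label]:
    """Parse tab-separated rows of starttime, endtime, label."""
    labels = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        try:
            start, end = float(row[0]), float(row[1])
        except ValueError:
            continue  # skip rows without numeric times, e.g. a header
        text = row[2].strip()
        match = re.match(r"\d+", text)
        track = int(match.group()) if match else None
        labels.append(Label(start, end, track, text))
    return labels


# Invented sample content; real files may use different label wording.
sample = "0.0\t20.0\t1 start\n15.5\t35.5\t2 fadein\n\n40.0\t60.0\t3 start\n"
labels = parse_labels(io.StringIO(sample))
for label in labels:
    print(label)
```

To read a file from disk, pass an open file object instead of the io.StringIO wrapper, for example parse_labels(open(path, newline="")).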

The song excerpts are accompanied by their cue region and tempo information in .txt files in table format.

Additionally, we provide the .beat.xml files containing the beat tracking results for the full tracks, as made available by Sonnleitner et al. (2016).

Our DJ mix dataset is based on the curatorial work of Sonnleitner et al. (ISMIR 2016), who collected the Creative Commons-licensed source tracks of 10 free dance music mixes from Mixotic. We used their collected tracks to produce our track excerpts, but generated new artificial mixes with perfectly accurate ground truth.

The code used to create the dataset is published at https://github.com/Ircam-RnD/unmixdb-creation, so that other researchers can create test data from other track collections or in other variants.

Files

mixotic-set044-excerpts.zip

Files (4.2 GB)

Checksum                                Size
md5:eadf9fb3b6ed7dad72db72acc111c66a    889.9 MB
md5:2d42608a742cfbe61ad0524724b2de1f    752.5 MB
md5:423d62a8c2659507d6b964ac4ec2bffc    85.1 MB
md5:8416257ceac136931b681237e627c2a7    838.0 MB
md5:50c184f70bec11f33b519a44ee24a671    550.2 MB
md5:3e5c5590ec2b33529034de5497bd9fa5    1.1 GB
md5:9f16be56fb18d642df9693c87436ac93    3.5 kB
md5:3ed6ca036311d8455ec86d7ff101e313    130.9 kB

Additional details

Funding

ABC DJ – Artist-to-Business-to-Business-to-Consumer Audio Branding System 688122
European Commission