<h2 id="getting-salami-from-youtube">Getting SALAMI from YouTube (2018-11-19)</h2>
<p>In 2011, a team of us at McGill released the <a href="https://github.com/DDMAL/salami-data-public">SALAMI dataset</a> of structural annotations of lots of music; it was the largest dataset of its kind at the time, and still is.
Unfortunately, it has never been easy for other researchers to obtain the audio data: we can provide all the metadata required to identify the tracks, but we don’t own the music so we can’t sell it.</p>
<p>However, this year at ISMIR, after chatting with yet another industry researcher who hoped I could share the audio, a workaround occurred to me: why not let YouTube share the audio?
If I could confirm which SALAMI tracks were available in which YouTube videos, then others could download the audio themselves.</p>
<p>Over the past two weeks I have put together a quick project to do exactly that. It’s still in progress, but with a few simple searches and fingerprinting efforts, I managed to find matches for at least half the audio in SALAMI.</p>
<p><a href="https://github.com/jblsmith/matching-salami">Visit the project repository on GitHub here.</a></p>
<p>Some notes on how it works:</p>
<h3 id="step-1-fingerprinting">Step 1: Fingerprinting</h3>
<p>I used <a href="https://github.com/dpwe/audfprint">Dan Ellis’ audfprint package</a> with its default settings to do the fingerprinting.</p>
<p>Using audfprint, I made a database of all the public SALAMI tracks (which is currently 7/8ths of the total annotated set). <a href="https://github.com/jblsmith/matching-salami/blob/master/salami_public_fpdb.pklz">This database is part of the repo</a>, so anyone can check an audio file in their possession against it using audfprint to confirm that they have the correct audio.</p>
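<p>Concretely, this boils down to a couple of command-line calls to audfprint. Below is a minimal sketch of both directions (building the database, which I did once on my side, and matching a candidate file against the published database); the file paths are placeholders and the flags are the ones I recall from the audfprint README:</p>
<pre><code class="language-python"># Sketch only: wrap audfprint's command-line interface from Python.
# Paths are hypothetical; flags are as described in the audfprint README.
import glob
import subprocess

# Build the fingerprint database from the local SALAMI audio (done once):
subprocess.run(["python", "audfprint/audfprint.py", "new",
                "--dbase", "salami_public_fpdb.pklz"]
               + sorted(glob.glob("salami_audio/*.mp3")))

# Anyone holding a candidate file can then check it against the published database:
subprocess.run(["python", "audfprint/audfprint.py", "match",
                "--dbase", "salami_public_fpdb.pklz",
                "my_candidate_copy.mp3"])
</code></pre>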
<ul>
<li>audfprint uses the standard Shazam algorithm, which I knew would miss versions that had been time-stretched or pitch-shifted to avoid YouTube’s copyright detection algorithms. For example, it did not detect that <a href="https://www.youtube.com/watch?v=bXvMJzgP1OQ">this song</a> sounds like a perfect match for SALAMI song 20.</li>
<li>I briefly tested Joren Six’s Panako package, which promised to detect matches despite changes in tempo or pitch, but out of the box it did not seem to find these matches.</li>
</ul>
<h3 id="step-2-querying-youtube">Step 2: Querying YouTube</h3>
<p>I used the <a href="https://developers.google.com/youtube/v3/quickstart/python">YouTube API library for Python</a> to query YouTube using the artist, composer and title for every track. (For some tracks, “artist” or “composer” were unavailable.)</p>
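<p>For reference, the search step amounts to a few lines with the client library. This is a simplified sketch: the API key, the query string, and the result handling are all stand-ins for what the real script does:</p>
<pre><code class="language-python"># Sketch of the YouTube Data API search step (google-api-python-client).
from googleapiclient.discovery import build

youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")  # hypothetical key

response = youtube.search().list(
    q="Artist Name song title",  # built from the SALAMI metadata fields
    part="snippet",
    type="video",
    maxResults=5,
).execute()

candidate_ids = [item["id"]["videoId"] for item in response["items"]]
</code></pre>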
<p>Then I used <a href="https://rg3.github.io/youtube-dl/">youtube-dl</a> to download the first search result whose length roughly matched the length of the audio file in the database (±20%).</p>
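<p>The download-and-filter step looks roughly like the sketch below: ask youtube-dl for the video’s metadata first, skip it if the duration is off by more than 20%, and otherwise download and convert it to a 192kbps mp3 (the post-processing routine mentioned in the notes below). The file names and video ID are placeholders:</p>
<pre><code class="language-python"># Sketch of the youtube-dl step; paths and the video ID are hypothetical.
import youtube_dl
from mutagen.mp3 import MP3

salami_duration = MP3("salami_audio/956.mp3").info.length  # length of the local SALAMI track

ydl_opts = {
    "format": "bestaudio/best",
    "outtmpl": "downloads/%(id)s.%(ext)s",
    "postprocessors": [{
        "key": "FFmpegExtractAudio",   # convert everything to 192 kbps mp3 via ffmpeg
        "preferredcodec": "mp3",
        "preferredquality": "192",
    }],
}

url = "https://www.youtube.com/watch?v=XXXXXXXXXXX"  # first search result (placeholder ID)
with youtube_dl.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info(url, download=False)      # fetch metadata only

relative_error = abs(info["duration"] - salami_duration) / salami_duration
if relative_error > 0.2:
    print("Skipping: video length does not match the SALAMI track")
else:
    with youtube_dl.YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])
</code></pre>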
<ul>
<li>I first used <a href="https://github.com/nficano/pytube">pytube</a> to download YouTube files; it is definitely lightweight and simple, but it ran into lots of errors on some downloads.</li>
<li>A nice advantage of youtube-dl is that it allowed me to specify a consistent post-processing routine; in my case, converting all the videos to 192kbps mp3 using ffmpeg.</li>
<li>To get the length of a local mp3 file, I used <a href="https://mutagen.readthedocs.io/en/latest/">mutagen</a>, which was very simple to use and which handles variable bit rates without errors — variable bit rates weren’t an issue for the audio on YouTube, but they were for some of the local SALAMI audio files!</li>
<li>I guess I forgot that librosa has a <a href="https://librosa.github.io/librosa/generated/librosa.core.get_duration.html">get_duration method</a>!</li>
</ul>
<h3 id="step-3-matching-audio">Step 3: Matching audio</h3>
<p>All that remains is to use audfprint to query the database with the downloaded audio, and to interpret the output to decide whether the audio matches.</p>
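<p>In code, that last check can be as simple as the sketch below: run audfprint’s <code>match</code> command on each downloaded file and look at what it prints. The output format here is from memory (audfprint reports a “Matched …” or “NOMATCH …” line per query), so treat this as a sketch rather than the repo’s actual parsing code:</p>
<pre><code class="language-python"># Sketch: decide whether a downloaded file matches anything in the database.
import subprocess

def matches_salami(mp3_path, dbase="salami_public_fpdb.pklz"):
    out = subprocess.run(
        ["python", "audfprint/audfprint.py", "match", "--dbase", dbase, mp3_path],
        capture_output=True, text=True).stdout
    # audfprint prints one report line per query; "NOMATCH" means nothing was found.
    # (For a stricter test, the number of common hashes on a "Matched" line can be thresholded.)
    return "NOMATCH" not in out
</code></pre>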
<ul>
<li>This step might have been simpler if I had stuck with using <a href="https://dataset.readthedocs.io/en/latest/">dataset</a> to manage my list of downloaded YouTube files, but I scrapped that, opting instead for a plaintext CSV file. Writing your own in-out routines leaves plenty of room for error — like accidentally overwriting all your work! — but has the advantage of dead-simple human editing. I wanted that, so that I could add YouTube IDs by hand for the system to download and check later.</li>
</ul>
<h3 id="results">Results</h3>
<p>After these 3 steps, I found matching audio on YouTube for:</p>
<ul>
<li>452 / 833 tracks from <a href="http://jmir.sourceforge.net/index_Codaich.html">Codaich</a></li>
<li>29 / 49 tracks from <a href="http://isophonics.net/datasets">Isophonics</a></li>
<li>0 / 100 tracks from <a href="https://staff.aist.go.jp/m.goto/RWC-MDB/">RWC</a></li>
</ul>
<p>I didn’t expect to find any of the RWC audio, and anyway, that’s <a href="https://staff.aist.go.jp/m.goto/RWC-MDB/#how_to_use">available for purchase directly from AIST</a>.</p>
<p>Matches were found for 3/5ths of the Isophonics data, but that is also easily purchased since it all derives from the Beatles catalogue and another 4 albums (comprising 7 discs).</p>
<p>Most importantly, the system found 54% of the Codaich audio, which I was pleased with for a first pass! I only had high hopes for finding the popular music, but actually there were plenty of matches in each genre class:</p>
<ul>
<li>158 / 210 popular tracks</li>
<li>138 / 205 jazz tracks</li>
<li>56 / 217 classical tracks</li>
<li>100 / 201 world music tracks</li>
</ul>
<h3 id="future-steps">Future steps</h3>
<p>The information is <a href="https://github.com/jblsmith/matching-salami">up on GitHub</a> as of today for anyone to use, but I have a few obvious next steps to take before I can call this project finished:</p>
<ol>
<li>Find the rest of the audio — perhaps by re-running the system but using additional metadata fields, like album title.</li>
<li>Add convenience scripts for others, to:
<ol>
<li>Download the audio from YouTube</li>
<li>Zero-pad / crop the audio to fit the timing of the SALAMI annotations.</li>
</ol>
</li>
</ol>
<h2 id="tiling-project">Tiling project (2018-10-16)</h2>
<p>I recently updated a hobby project of mine to Github. The goal was to make an image feed where all the images would have matching edges, but where these edges could evolve over time. I’d still like to tweak it and add new types of designs, but a version of it is finished!</p>
<p>As an example, here’s the beginning of the blog. All the edges between tiles have an alternating black/white pattern, but the blog starts off with all-white patterns along the bottom and bottom-right edges.</p>
<p><img src="/images/blog_excerpt_beginning.png" alt="Excerpt from the beginning of the tiling blog" /></p>
<p>New images are added to the top left of the feed, shifting all the other tiles along in a 3-column format, just like a certain popular photo-sharing social network—but since that service doesn’t have an API that allows robo-posting, I put the project on Tumblr instead! Please visit <a href="https://random-tiles.tumblr.com/">random-tiles.tumblr.com</a>.</p>
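<p>The underlying constraint is simple to state, even if the drawing code is not: every tile is characterized by the pattern along each of its four edges, and a newly generated tile must copy the patterns of the neighbours it touches. Here is a toy sketch of that idea (not the actual repo code; the edge resolution and the random generator are arbitrary):</p>
<pre><code class="language-python"># Toy sketch of the edge-matching constraint behind the tile feed.
import random

EDGE_LENGTH = 8  # arbitrary resolution of an edge pattern (black/white cells)

def random_edge():
    return tuple(random.randint(0, 1) for _ in range(EDGE_LENGTH))

def new_tile(left_neighbour=None, top_neighbour=None):
    """Return edge patterns for a new tile; shared edges are copied from neighbours."""
    return {
        "left": left_neighbour["right"] if left_neighbour else random_edge(),
        "top": top_neighbour["bottom"] if top_neighbour else random_edge(),
        "right": random_edge(),
        "bottom": random_edge(),
    }
</code></pre>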
<p>I was inspired to work on this project when I learned about <a href="http://tokolo.com/">Asao Tokolo</a>’s work on tiling. Tokolo <a href="http://www.spoon-tamago.com/2016/04/26/who-is-asao-tokolo-the-designer-behind-tokyos-2020-olympic-emblem/">designed the winning pair of logos</a> for the Tokyo 2020 Olympic and Paralympic Games, but he is also known for creating a fun set of interlocking patterns which have appeared on <a href="https://tmagazine.blogs.nytimes.com/2009/01/09/the-post-materialist-a-patterns-math-magic/">ceramic tiles, fridge magnets, and more</a>. My hope is that some of the fun and beauty of Tokolo’s arabesque design is captured by my blog.</p>
<p>In the future, I would like to achieve a look closer to Tokolo’s designs by drawing actual arcs across each image instead of generating the images out of basic tiles, but, as <a href="http://tokolo.com/img/RespectForCompass.gif">the sketch atop Tokolo’s homepage</a> suggests, the simple, elegant look of his tiles hides lots of careful and subtle engineering and design work.</p>
<p>Links:</p>
<ul>
<li><a href="https://random-tiles.tumblr.com/">visit the blog</a> to see the images;</li>
<li><a href="https://github.com/jblsmith/tiling">visit Github</a> to view the code and to get a sense of how it was made.</li>
</ul>
<h2 id="modeling-time-signature-changes-at-hamr">Modeling time signature changes at HAMR (2018-10-10)</h2>
<p><a href="https://labrosa.ee.columbia.edu/hamr/">HAMR (Hacking Audio and Music Research)</a> is a hackathon event that has been held many times since 2013, and as a satellite event to every ISMIR conference since 2014. I attended <a href="https://labrosa.ee.columbia.edu/hamr_ismir2018/">this year’s event</a> held at <a href="https://www.deezer.com/en/">Deezer</a> and really enjoyed it! To team up with others and try to solve a new research problem ASAP was exhilarating—it was almost like trying to solve a puzzle hunt.</p>
<p>I proposed a project to discover and model time signature changes and strange hypermeters—like in “Hey Ya!” by Outkast, where instead of getting 4 bars at a time (the most common hypermeter in popular music), we get 5½ bars at a time. That situation is actually a combination of strange hypermeter and a time signature change; we ultimately focused on just the second aspect, trying to visualize and detect time signature changes.</p>
<p>I worked with <a href="https://github.com/olivierlar">Olivier Lartillot</a> (who designed the famous <a href="https://www.jyu.fi/hytk/fi/laitokset/mutku/en/research/materials/mirtoolbox">MIRtoolbox</a>) and <a href="https://github.com/romi1502">Romain Hennequin</a>, a researcher at Deezer. The progress we made on the problem felt very exciting, and others at the event shared our enthusiasm, awarding us the “Best Research Direction” prize!</p>
<p>You can see more about the project at its github page: <a href="https://github.com/jblsmith/hypermeter">github.com/jblsmith/hypermeter</a></p>
<p>And on YouTube, you can <a href="https://youtu.be/u3IJ2CYw66I?t=485">watch me give a brief overview of the project</a> to the ISMIR audience.</p>
<h2 id="invited-talk-on-multi-dimensional-music-structure">Invited talk on multi-dimensional music structure (2018-06-27)</h2>
<p>Earlier this month, I was invited to give a talk at the <a href="https://eventum.upf.edu/19834/detail/erc-music-conference.html">European Music Research Conference</a>. I presented an overview of three recent research projects that all grew, in part, from a shared motivation: to understand musical structure—the way a piece of music is organized—not as a flat, one-dimensional, holistic phenomenon, but as a set of conflicting views of a piece, each view with its own rationale, the various rationales sometimes in conflict and sometimes in harmony with each other.</p>
<p>You can watch <a href="https://eventum.upf.edu/19834/programme/european-research-music-conference.html">all the talks given at the conference online</a>, including my own:</p>
<iframe width="560" height="315" src="https://www.youtube.com/embed/zH7qw3tEydM" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen=""></iframe>
<p>The conference was delightful for the breadth of research on display. Besides the usual suspects—projects in music informatics, music perception and cognition, music theory, composition, performance, interface design, etc.—there were also projects in linguistics, medieval history, and even archaeology! (This last one refers to the talk on “rock art soundscapes”, which had nothing to do with Pink Floyd, and everything to do with the sonic properties of sites of prehistoric rock art.)</p>
<p>I’m especially grateful to the conference organizers (Xavier Serra and the <a href="https://www.upf.edu/web/mtg">Music Technology Group at UPF</a>) for labouring to put the videos online. My video was for a while blocked by YouTube on copyright grounds, due to the minute-long excerpt I used from the Paul Simon song “Can’t Run But”. Lesson learned: in the future I will definitely use sound examples from copyright-free music (or music from artists with less aggressive labels)!</p>
<h2 id="two-articles-accepted-to-icassp">Two articles accepted to ICASSP (2018-02-20)</h2>
<p>In my last year at AIST, I worked on two projects related to nonnegative factorization, and both have been accepted to ICASSP! Even better, each will be delivered at an oral session.</p>
<p>The first one deals with a way to literally “de-compose” a song composed of loops into source-separated tracks corresponding to each loop, as well as a map of which loops are activated, and when. To accomplish this, we propose a novel reconfiguration of the spectrogram into a “spectral cube”, which allows us to use nonnegative tensor factorization to model the song as a combination of note, rhythm and loop templates.</p>
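<p>To make the “spectral cube” idea concrete, here is a rough sketch using librosa and tensorly. The file name, the fixed bar length, and the rank are placeholders, and the model in the paper has more structure than this, but the reshape-then-factorize pattern is the core of it:</p>
<pre><code class="language-python"># Sketch: reshape a spectrogram into a (frequency, time-in-bar, bar) cube and
# factorize it. Assumes a fixed bar length in frames; values are placeholders.
import librosa
import numpy as np
import tensorly as tl
from tensorly.decomposition import non_negative_parafac

y, sr = librosa.load("loop_based_song.wav", sr=22050)
S = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))      # (freq, frames)

frames_per_bar = 86                                          # e.g. 2-second bars at this hop size
n_bars = S.shape[1] // frames_per_bar
cube = S[:, :n_bars * frames_per_bar].reshape(S.shape[0], n_bars, frames_per_bar)
cube = cube.transpose(0, 2, 1)                               # (freq, time-in-bar, bar)

# Nonnegative CP decomposition: one factor matrix per mode.
cp = non_negative_parafac(tl.tensor(cube), rank=4)           # recent tensorly returns a CPTensor
spectral_templates, rhythms, loop_activations = cp.factors
</code></pre>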
<blockquote>
<p><strong>Details:</strong> “Nonnegative tensor factorization for source separation of loops in audio.” By Jordan B. L. Smith and Masataka Goto. In <em>Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).</em> 2018.</p>
<p><strong>Abstract:</strong> The prevalence of exact repetition in loop-based music makes it an opportune target for source separation. Nonnegative factorization approaches have been used to model the repetition of looped content, and kernel additive modeling has leveraged periodicity within a piece to separate looped background elements. We propose a novel method of leveraging periodicity in a factorization model: we treat the two-dimensional spectrogram as a three-dimensional tensor, and use nonnegative tensor factorization to estimate the component spectral templates, rhythms and loop recurrences in a single step. Testing our method on synthesized loop-based examples, we find that our algorithm mostly exceeds the performance of competing methods, with a reduction in execution cost. We discuss limitations of the algorithm as we demonstrate its potential to analyze larger and more complex songs.</p>
</blockquote>
<p>I advised on and helped write the second paper, which concerns a method of structure analysis devised by Tian Cheng. She decomposes self-similarity matrices (SSMs) in a clever way using non-negative matrix factor 2-D deconvolution (NMF2D). Enhancing the <em>stripes</em> of an SSM (which show repeated sequences) is generally easier than enhancing the <em>blocks</em> (which show homogeneous repeated sections)—but it is also easier to <em>interpret</em> the structure of a song from a block-enhanced SSM. Cheng proposes a way to use NMF2D to model a stripe-enhanced SSM as a set of “blocks” of repetition-types; her method thus combines the clarity of stripe structure and the ready interpretability of block structure.</p>
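<p>For readers unfamiliar with the terminology, the sketch below shows the kind of input the method starts from: a stripe- (path-) enhanced SSM computed from beat-synchronous chroma. It uses librosa’s built-in path enhancement (available in recent versions) and a placeholder file name; the NMF2D step itself is not shown:</p>
<pre><code class="language-python"># Sketch: build a path-enhanced SSM from beat-synchronous chroma with librosa.
import librosa

y, sr = librosa.load("some_song.wav")
chroma = librosa.feature.chroma_cqt(y=y, sr=sr)
_, beats = librosa.beat.beat_track(y=y, sr=sr)
chroma_sync = librosa.util.sync(chroma, beats)

ssm = librosa.segment.recurrence_matrix(chroma_sync, mode="affinity", sym=True)
ssm_stripes = librosa.segment.path_enhance(ssm, n=15)   # smooth along diagonals to bring out stripes
</code></pre>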
<blockquote>
<p><strong>Details:</strong> “Music structure boundary detection and labelling by a deconvolution of path-enhanced self-similarity matrix.” By Tian Cheng, Jordan B. L. Smith and Masataka Goto. In <em>Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).</em> 2018.</p>
<p><strong>Abstract:</strong> We propose a music structure analysis method that converts a path-enhanced self-similarity matrix (SSM) into a block-enhanced SSM using non-negative matrix factor 2-D deconvolution (NMF2D). With a non-negative constraint, the deconvolution intuitively corresponds to the repeated stripes in the path-enhanced SSM. Then the block-enhanced SSM is constructed without any clustering technique. We fuse block-enhanced SSMs obtained using different parameters, resulting in better and more robust results. Discussion shows that the proposed method can be a potential tool for analysing music structure at different scales.</p>
</blockquote>
<p>We have just submitted the camera-ready versions, and I’m really looking forward to presenting them at my first ICASSP.</p>
<h2 id="bonjour-ircam">Bonjour, Ircam! (2018-02-10)</h2>
<p>As of February 1st, I have started a new position at <a href="https://www.ircam.fr/">Ircam</a> and <a href="http://www.centralesupelec.fr/">CentraleSupélec</a>. I am a post-doctoral researcher on a project called “<a href="http://dig-that-lick.eecs.qmul.ac.uk/">Dig That Lick</a>”, which will analyze melodic patterns in jazz music on a large scale.</p>
<p>It happens to be the second <a href="https://diggingintodata.org/">Digging Into Data</a>-funded project I’ve worked on: during my Master’s, I worked on the <a href="http://ddmal.music.mcgill.ca/research/salami">SALAMI project</a>, which was <a href="https://diggingintodata.org/awards/2009/project/structural-analysis-large-amounts-music-information">funded by the same scheme</a>.</p>
<p>The aim of Dig That Lick is to extract melodies from millions of jazz songs—especially the solo sections—and then to discover which “licks” or snippets of melody are reused most commonly. Then, using the metadata for all the songs—i.e., who played which solos, and whom they previously learned from or jammed with—we hope to better understand how jazz licks are invented and spread.</p>
<p>It all seems like a perfect analogy to genetics and genealogy (i.e., a genealonalogy): melodies, like DNA, can be thought of as strings of symbols, where each lick is a short substring, like a gene. Our goal is to determine which of these licks are important, and how they reproduced and were spread between musicians and styles.</p>
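<p>As a toy illustration of the substring idea (this is not the project’s actual pipeline, and the “transcriptions” below are made up): represent each melody as its sequence of pitch intervals, so that the same lick counts as a match even when it is transposed, and then count repeated n-grams.</p>
<pre><code class="language-python"># Toy sketch: find reused interval patterns ("licks") across hypothetical solos.
from collections import Counter

def interval_ngrams(midi_pitches, n=4):
    intervals = [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]
    return [tuple(intervals[i:i + n]) for i in range(len(intervals) - n + 1)]

solos = {
    "solo_a": [60, 62, 64, 65, 67, 65, 64, 62],   # made-up transcriptions
    "solo_b": [67, 69, 71, 72, 74, 72, 71, 69],   # same contour, transposed up a fifth
}

counts = Counter()
for pitches in solos.values():
    counts.update(interval_ngrams(pitches))

print(counts.most_common(3))   # the most reused 4-note interval patterns
</code></pre>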
<p>I will be working one day a week at Ircam, which is housed in the Pompidou Centre:</p>
<div class="project_img"><a href="/images/pompidou-large.jpg"><img src="/images/pompidou-small.jpg" alt="The Pompidou Centre" style="width: 400px; border-radius: 15px;" /></a></div>
<p>But most days I will be at CentraleSupélec, a grande école south of Paris that is part of the newly-agglomerated Université de Paris-Saclay. The campus has some new, truly gorgeous buildings, but is still a bit under construction:</p>
<div class="project_img"><a href="/images/supelec-pano-large.jpg"><img src="/images/supelec-pano-small.jpg" alt="Panorama of CentraleSupélec" style="width: 1200px; border-radius: 15px;" /></a></div>
<h2 id="farewell-aist">Farewell, AIST (2017-12-15)</h2>
<p>Today was my final day at AIST. I am moving to Paris to start a new post-doc at IRCAM! It’s been a wonderful three years here in Goto’s lab, and I am really grateful to everyone in the lab who helped make my time here so enjoyable.</p>
<p>I’m also happy with the projects I managed to complete! At a farewell presentation on Tuesday, I gave the lab a brief overview of them:</p>
<ul>
<li><a href="https://staff.aist.go.jp/jun.kato/CrossSong/">The CrossSong Puzzle</a></li>
<li><a href="/projects/music-video-classification/">YouTube Video Classification</a></li>
<li><a href="/projects/multi-part-pattern-analysis/">Multi-Part Pattern Description</a></li>
<li>Another project, hopefully to be presented at ICASSP next April</li>
</ul>
<p>Afterwards, we hurried along to 千年の宴 (“Thousand Year Feast”), which happens to be the same izakaya where my welcome party was held back in November 2014. Goto-san and the team offered me some lovely going-away gifts, a really touching card, and for dessert, a very delicious (and apt, given that I’m moving to Paris) éclair:</p>
<div class="project_img"><a href="/images/blog-aist_farewell_cake.jpg"><img src="/images/blog-aist_farewell_cake.jpg" alt="Farewell Eclair" style="width: 600px; border-radius: 15px;" /></a></div>
<p>Right back at you—thank you, AIST!</p>
<p>I’m going to miss the group terribly, and hope to see many of them next year at ISMIR in Paris.</p>
<h2 id="ismir-in-suzhou">ISMIR in Suzhou (2017-10-30)</h2>
<p>Last week I attended <a href="https://ismir2017.smcnus.org/">ISMIR 2017</a> in Suzhou, China, and once again it was an excellent conference, with a high-quality scientific program (every talk was interesting and well delivered) and top-notch hosting by the National University of Singapore: terrific food and venues, and the chartered shuttles to and from the airport saved lots of people a lot of time.</p>
<p>I presented two papers, <a href="/Towards-richer-descriptions-of-structure/">described in an earlier blog post</a>. Both (very colourful) posters are available to download (click to access PDF):</p>
<div display="inline-block" style="clear: both;">
<div class="project_img" style="float: left;"><a href="/documents/smith2017-ismir-automatic_interpretation_of_music-poster.pdf"><img src="/documents/poster-thumbs/smith2017-ismir-automatic_interpretation_of_music-poster.jpg" alt="Smith and Goto 2017 poster thumbnail" style="width: 300px; border-radius: 15px;" /></a></div>
<div class="project_img" style="float: right;"><a href="/documents/smith2017-ismir-multi_part_pattern_analysis-poster.pdf"><img src="/documents/poster-thumbs/smith2017-ismir-multi_part_pattern_analysis-poster.jpg" alt="Smith and Chew 2017 poster thumbnail" style="width: 300px; border-radius: 15px;" /></a></div>
</div>
<div style="clear: both;">
I had some excellent conversations with colleagues who visited my poster, for which I'm very grateful!
</div>
<p>I also had the pleasure of introducing <a href="http://elainechew-research.blogspot.com/">Prof. Elaine Chew</a> as the <a href="https://ismir2017.smcnus.org/keynotes/">first keynote speaker</a>. The organizers had the inspired idea of letting each speaker be introduced by one of their former students. It was an honour to introduce her, and to be included among the many projects she described in her talk.</p>
<p>Finally, I chaired the <a href="https://ismir2017.smcnus.org/programschedule/#Oral6">6th oral session</a>, on the topic of Structure. We had four great talks: two about chords (predicting and modeling chord sequences), and two about melody (generating and perceiving them).</p>
<p>I’m definitely looking forward to <a href="http://ismir2018.ircam.fr/">ISMIR 2018 in Paris</a>!</p>
<h2 id="crosssong-in-print-and-online">CrossSong, in print and online (2017-09-12)</h2>
<p>Our article about the CrossSong puzzle is now in print! The article was published in the <em>Journal of New Music Research</em> (JNMR) with Open Access, so everyone can view <a href="http://www.tandfonline.com/doi/full/10.1080/09298215.2017.1303519">the full article online</a>. I’m also pleased to announce a major update to the <a href="https://staff.aist.go.jp/jun.kato/CrossSong/demo/">online demo of our system</a>.</p>
<p>New features include:</p>
<ol>
<li>
<p>Different puzzle sizes, so that users can get familiar with the basics before diving into the large puzzles. I strongly recommend trying the <a href="https://staff.aist.go.jp/jun.kato/CrossSong/demo/puzzle.html?num_squares=2&amp;dirname=puzz1-rwc_short&amp;difficulty_level=0">2x2 puzzle</a> and <a href="https://staff.aist.go.jp/jun.kato/CrossSong/demo/puzzle.html?num_squares=3&amp;dirname=puzz4-rwc_rock_vs_dance&amp;difficulty_level=0">3x3 puzzle</a>—you’ll have a lot more fun solving the 4x4 puzzles afterward!</p>
</li>
<li>
<p>New puzzle layouts and new music! I’m fond of the <a href="https://staff.aist.go.jp/jun.kato/CrossSong/demo/puzzle.html?num_squares=4&amp;dirname=puzz3-rwc_rock_vs_dance&amp;difficulty_level=4">first “Rock vs. Dance” puzzle</a>. (Note that, for copyright reasons, all the puzzle content is from the <a href="https://staff.aist.go.jp/m.goto/RWC-MDB/">RWC database</a>.)</p>
</li>
</ol>
<p>We hope you enjoy the game, and would love to hear your feedback—whether it’s a question, complaint, or feature request!</p>
<h2 id="towards-richer-descriptions-of-structure">Towards richer descriptions of structure: two new articles accepted to ISMIR (2017-07-17)</h2>
<p>Earlier this summer, both articles I worked on were accepted to this year’s ISMIR conference. Last weekend I submitted the camera-ready copies for both of them, which I’m sharing now.</p>
<p>They are very different papers, but they both start with the same problem: structural annotations do not capture the richness of music.</p>
<p>In the first paper, we estimate richer descriptions of music by analyzing the repetition structure of individual instrument parts within a song. This requires combining structure analysis with source separation.</p>
<blockquote>
<p><strong>Details:</strong> “Multi-part pattern analysis: Combining structure analysis and source separation to discover intra-part repeated sequences.” By Jordan B. L. Smith and Masataka Goto. To appear in <em>Proceedings of the International Society for Music Information Retrieval Conference</em>. 2017. <a href="/documents/smith2017-ismir-multi_part_pattern_analysis.pdf">PDF</a>, <a href="/documents/smith2017-ismir-multi_part_pattern_analysis.bib">BIB</a></p>
<p><strong>Abstract:</strong> Structure is usually estimated as a single-level phenomenon with full-texture repeats and homogeneous sections. However, structure is actually multi-dimensional: in a typical piece of music, individual instrument parts can repeat themselves in independent ways, and sections can be homogeneous with respect to several parts or only one part. We propose a novel MIR task, multi-part pattern analysis, that requires the discovery of repeated patterns within instrument parts. To discover repeated patterns in individual voices, we propose an algorithm that applies source separation and then tailors the structure analysis to each estimated source, using a novel technique to resolve transitivity errors. Creating ground truth for this task by hand would be infeasible for a large corpus, so we generate a synthetic corpus from MIDI files. We synthesize audio and produce measure-by-measure descriptions of which instruments are active and which repeat themselves exactly. Lastly, we present a set of appropriate evaluation metrics, and use them to compare our approach to a set of baselines.</p>
</blockquote>
<p>In the second paper, the goal is to take a structural analysis by a listener and gain some extra insight into it:
did they indicate a given section break because of a change in harmony, rhythm, or instrumentation?
We validate a technique for estimating this information.</p>
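<p>To give a flavour of what “estimating the rationale” means (this is a loose sketch, not the algorithm in the paper): compute a novelty curve for each feature type, then correlate each curve with an indicator of the annotated boundaries; the features that correlate best are the likeliest rationale for those boundaries.</p>
<pre><code class="language-python"># Loose sketch of a correlation-based rationale estimate (illustrative only).
import numpy as np

def novelty(feature_matrix):
    """Frame-to-frame change for one feature type (rows: dimensions, columns: frames)."""
    return np.linalg.norm(np.diff(feature_matrix, axis=1), axis=0)

def rationale_scores(features_by_name, boundary_frames, n_frames):
    """Correlate each feature's novelty curve with the annotated boundary indicator."""
    indicator = np.zeros(n_frames - 1)
    indicator[boundary_frames] = 1.0      # boundary_frames index into the novelty curve
    return {name: float(np.corrcoef(novelty(feats), indicator)[0, 1])
            for name, feats in features_by_name.items()}
</code></pre>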
<blockquote>
<p><strong>Details:</strong> “Automatic interpretation of music structure analyses: A validated technique for post-hoc estimation of the rationale for an annotation.” By Jordan B. L. Smith and Elaine Chew. To appear in <em>Proceedings of the International Society for Music Information Retrieval Conference</em>. 2017. <a href="/documents/smith2017-ismir-automatic_interpretation_of_music.pdf">PDF</a>, <a href="/documents/smith2017-ismir-automatic_interpretation_of_music.bib">BIB</a></p>
<p><strong>Abstract:</strong> Annotations of musical structure usually provide a low level of detail: they include boundary locations and section labels, but do not indicate what makes the sections similar or distinct, or what changes in the music at each boundary. For those studying annotated corpora, it would be useful to know the rationale for each annotation, but collecting this information from listeners is burdensome and difficult. We propose a new algorithm for estimating which musical features formed the basis for each part of an annotation. To evaluate our approach, we use a synthetic dataset of music clips, all designed to have ambiguous structure, that was previously used and validated in a psychology experiment. We find that, compared to a previous optimization-based algorithm, our correlation-based approach is better able to predict the rationale for an analysis. Using the best version of our algorithm, we process examples from the SALAMI dataset and demonstrate how we can augment the structure annotation data with estimated rationales, inviting new ways to research and use the data.</p>
</blockquote>