Idea: correlate Touhou music production against Japanese youth unemployment. Does the total production of music, as measured in seconds, increase with unemployment?

Opposite view: recessions dent production (perhaps because people who still have jobs are working harder and so have less free time, even if other people are unemployed?). See http://www.gamesetwatch.com/2009/12/sound_current_yokohamas_mediam.php

While the turnout at M3 remains strong, at the same time an economic recession cannot help but touch a community whose activities rely on having free time. Furthermore while previously many hobbyists dreamed of someday breaking into the industry, more recently many also fear that game companies will begin cracking down on unlicensed tributes.

The open source alternative is MusicBrainz; it appears to have 190 albums, but 90%+ of them link to VGMdb, so I’m not sure I want to include them (it would be a waste of effort, and if someone simply copied over all of VGMdb a few years ago, the two sources would not be independent, which would badly mislead any capture-recapture analysis of population size).
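To make the capture-recapture worry concrete, here is a minimal sketch using Chapman's bias-corrected form of the Lincoln-Petersen estimator. The function name and the count of 1000 albums for the first source are made up for illustration; only the 190 comes from the note above, and nothing here is a real analysis of either database.

```python
def chapman_estimate(n1: int, n2: int, overlap: int) -> float:
    """Chapman's bias-corrected Lincoln-Petersen estimate of population size.

    n1, n2  -- number of albums listed in source 1 and source 2
    overlap -- number of albums listed in both sources
    """
    if overlap < 0 or overlap > min(n1, n2):
        raise ValueError("overlap must be between 0 and min(n1, n2)")
    return (n1 + 1) * (n2 + 1) / (overlap + 1) - 1


# Hypothetical: source 1 lists 1000 albums, source 2 lists 190.
# If source 2 is a pure copy of part of source 1, every one of its albums
# overlaps, and the estimate collapses to the size of source 1.
copied = chapman_estimate(1000, 190, 190)

# If the sources were independent and shared only half of source 2's albums,
# the same formula would imply a much larger population.
independent = chapman_estimate(1000, 190, 95)
```

With a fully copied source the estimate equals the size of the larger source, so the method would report that nothing is missing regardless of how incomplete both databases really are.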

Email on 1 March 2013 failed to elicit any reply by 27 May; I then requested a purchase on /r/TouhouMusic.

My request was filled on 8 June 2013 as an ISO and a ZIP file. The ISO file seemed to be broken: file identifies it only as ‘data’, and when I mount it as a loopback iso9660 file, mount throws an error. I redownloaded it and compared the copies, but they were identical. The good news is that the ZIP file seems to work fine. The data is in a .accdb file in a subfolder, which is the latest Microsoft Access database format. Unfortunately, this format is almost entirely unsupported by anything on Linux (except for a Java library); fortunately, an acquaintance had an Office 365 subscription and re-exported the .accdb file in the older Access .mdb format, which was successfully read and converted to UTF-8 CSV by mdb-tools’s mdb-export.

The CSV seems to contain ~1316 entries, corresponding to ~1316 albums, each with the respective circle name, URL, genre, possibly vocalist, and 2 more fields I cannot figure out due to lack of Japanese proficiency (but neither seems to be a release date). The entries look like this:

I am a little surprised that there are only 1316 entries. Either I’ve overestimated their thoroughness or this is limited to a specific convention or something like that… Need to look into this more. This doesn’t include the track-level data I was hoping for, but a list of albums can still be useful for estimating completeness of capture.

The Touhou project page turns out to be incomplete: each entry had to be manually annotated as related to Touhou. I was pointed to a search query which turned up many more results by looking for any page with the string “Touhou” in the “games” field.

The VGMdb administrators kindly gave me read-only access to their MySQL databases. I grabbed the entirety of the tables vgmdb_albums and vgmdb_tracks from the main VGMdb database; I exported them as 2 CSV files with comma separators, renamed 2013-vgmdb-albums.csv and 2013-vgmdb-tracks.csv. Before loading the exports, I had to delete all escaped quotes; the default R CSV parsing doesn’t handle them. Each track row describes 1 track and carries an album ID, so to turn each track row into an equivalent of the torrent rows, I need to fill in the album-level fields from the album table.
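The fill-in step is an ordinary left join of tracks onto albums by album ID. A minimal sketch in Python follows; the column names (albumid, albumtitle, releasedate, tracknumber, seconds) and the sample rows are hypothetical stand-ins, not the actual VGMdb schema or data.

```python
import csv
import io

# Hypothetical miniature exports; the real column names may differ.
ALBUMS_CSV = """albumid,albumtitle,releasedate
1,Example Album,2012-08-11
2,Another Album,2012-12-30
"""
TRACKS_CSV = """albumid,tracknumber,seconds
1,1,215
1,2,187
2,1,301
"""


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def attach_album_fields(album_rows, track_rows, key="albumid"):
    """Left-join each track row with the fields of its parent album."""
    albums_by_id = {row[key]: row for row in album_rows}
    joined = []
    for track in track_rows:
        # Tracks whose album is missing keep only their own fields.
        album = albums_by_id.get(track[key], {})
        # Album fields go in first; track fields win on any name clash.
        joined.append({**album, **track})
    return joined


rows = attach_album_fields(read_rows(ALBUMS_CSV), read_rows(TRACKS_CSV))
```

Each element of rows is then one track carrying its album's title and release date, which is the shape needed to compare against per-file torrent records.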

A loose group of 4chan users on the /jp/ subforum collaborate each Comiket to upload and distribute doujin manga, games, and music released at that Comiket; some are uploaded by Comiket attendees, some are bought from resellers like Comic Toranoana, and many files are harvested from Japanese P2P filesharing networks like Winny/Share/Perfect Dark. I compiled a list of ~400 files from the /r/TouhouMusic C83 thread (principally from the 4chan links) & the blog All Doujin Music and gradually downloaded them from January to March 2013. After dead links, I was left with 400-500 files. Many are not music, or even Touhou-related, so I hand-filtered albums, looking for signs of being Touhou doujin works (credits to ZUN, Touhou characters in the artwork, themes I recognized as Touhou, etc); when I was not sure, I erred on the side of exclusion. The final compilation yielded 3503 files (evenly split: 1776 Touhou vs 1728 “other”) with 953 Touhou music files.

Constant growth model: the first game was released in 1996, no? So that gives 17 years to accumulate 1.26TB (1,260GB), or 74.1GB per year. The screenshot shows a download speed of 0kb/s, which is not useful, but it says 2640 days left, so we can estimate that he’s downloading at ~0.48GB per day (1260/2640), which over a year is ~174GB, or 2.35x faster than the 74GB per year of new music. So at that annual increase, OP is not doomed and can in fact catch up.
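The arithmetic can be checked in a few lines. The inputs are the figures quoted above (1,260GB, 17 years, 2640 days remaining); the catch-up time at the end is an extra step not in the original reasoning, and it assumes both the download rate and the growth rate stay constant.

```python
TOTAL_GB = 1260.0        # current size of the collection
YEARS_ELAPSED = 17       # 1996 to 2013
DAYS_REMAINING = 2640    # estimate shown in the screenshot

# Average rate at which new music has accumulated.
growth_per_year = TOTAL_GB / YEARS_ELAPSED            # about 74.1 GB/year

# Download rate implied by the client's time-remaining estimate.
download_per_day = TOTAL_GB / DAYS_REMAINING          # about 0.48 GB/day
download_per_year = download_per_day * 365            # about 174 GB/year

# How much faster the download is than the growth of the collection.
speed_ratio = download_per_year / growth_per_year     # about 2.35

# Assumed extension: years until the backlog is cleared if both rates hold.
years_to_catch_up = TOTAL_GB / (download_per_year - growth_per_year)
```

Under these assumptions the net gain is roughly 100GB per year, so clearing the 1,260GB backlog would take on the order of 12 to 13 years.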

Exponential growth model: a little trickier, since we can’t derive a formula just from the cumulative total and elapsed time. I need more data. So using my 2012 Touhou Lossy Torrent data, I can try to regress an exponential against the annual count… but wait! The amount of music does not seem to be increasing exponentially!
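For reference, regressing an exponential on annual counts usually means an ordinary least-squares fit of the logarithm of the count against the year. The sketch below shows that technique on synthetic numbers that double every year; the data is invented purely to exercise the function and is not the torrent data.

```python
import math


def fit_exponential(years, counts):
    """Least-squares fit of log(count) = a + b * year; returns (a, b)."""
    n = len(years)
    logs = [math.log(c) for c in counts]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    b = sxy / sxx              # growth rate per year on the log scale
    a = mean_y - b * mean_x    # log of the fitted count at year 0
    return a, b


# Synthetic example: a count that starts at 10 and doubles every year.
years = [0, 1, 2, 3, 4]
counts = [10 * 2 ** t for t in years]
a, b = fit_exponential(years, counts)
doubling_time = math.log(2) / b   # years per doubling under the fitted model
```

If real annual counts rise and then fall, as described here, the residuals of such a fit would show a clear pattern (positive in the middle years, negative at the ends), which is a sign that the exponential model is the wrong shape.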

Looks like Touhou music’s growth peaked in 2009; this might reflect the torrent’s incompleteness, except the torrent is from 2012, and you’d expect coverage of 2010 or 2011 to be pretty good by that point. So the growth of the torrent overall looks more like a sigmoid or log:

The following is a buggy program for scraping Touhou albums from VGMdb; it works on a limited subset of album pages, but has an unknown number of fatal bugs. I abandoned it once I was offered read-only database access, and that was what I actually used to get my VGMdb data. This is in case I ever need to go back.