Otohiko wrote:Though it does irk me that the audio database is a bit of a mess and people get away with gross inaccuracies and fakes, while I get thrown out for a source on my MEP that really was "unknown" as stated

If you have a good proposal for cleaning up right about fifty thousand audio entries, let us know...

Oh, I'm not picking at the database, I totally sympathise. I'm just not sure the "no unknown/various" rule is necessarily productive given the state the database is in, that's all. But eh, if it must be... I didn't have anything this year that'd reasonably get nominated, so whatever

Otohiko wrote:Though it does irk me that the audio database is a bit of a mess and people get away with gross inaccuracies and fakes, while I get thrown out for a source on my MEP that really was "unknown" as stated :(

If you have a good proposal for cleaning up right about fifty thousand audio entries, let us know...

How about this:

Replicate a constantly maintained music database -- the MusicBrainz replicator service may work best, because it is large and has an active community augmenting it and ensuring its correctness.

Flag any pairs of (artist, title) in the .org music database that aren't found in the database replicated from MusicBrainz as suspect. Mark all videos of creators with any videos containing such suspect music entries as ineligible for the VCAs: there still seem to be quite a few people who care about it, so it should serve as incentive for getting things fixed up.

In the case where the .org audio database contains a legitimate entry that is not found in the MusicBrainz database, get it resolved with MusicBrainz *first*. The member with the troublesome entry can do this.

If that doesn't work and the legitimacy is still unchallenged, a table for augmenting and/or overriding MusicBrainz data can be maintained. With the above procedure, it should be significantly smaller and easier to manage than the gargantuan mass of stuff that currently exists.
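A rough sketch of the flagging step described above, in Python. Everything here is an assumption for illustration: the field names (`creator`, `artist`, `title`), the function names, and the sample rows are made up, and the reference set just stands in for whatever a replicated MusicBrainz database would actually provide. It only shows the set-difference idea plus the overrides table, not how the replication itself would work.

```python
def normalize(text):
    """Case-fold and collapse whitespace so trivial differences don't count."""
    return " ".join(text.casefold().split())


def find_suspect_entries(org_entries, reference_pairs, overrides=()):
    """Return .org audio entries whose (artist, title) pair is found neither
    in the replicated reference data nor in the local overrides table."""
    known = {(normalize(a), normalize(t)) for a, t in reference_pairs}
    known |= {(normalize(a), normalize(t)) for a, t in overrides}
    return [
        entry for entry in org_entries
        if (normalize(entry["artist"]), normalize(entry["title"])) not in known
    ]


def ineligible_creators(suspects):
    """Creators with at least one suspect audio entry."""
    return {entry["creator"] for entry in suspects}


if __name__ == "__main__":
    # Hypothetical sample data, not taken from either real database.
    reference = {("Linkin Park", "In the End")}
    overrides = {("Weebl", "Badger Badger Badger")}
    entries = [
        {"creator": "alice", "artist": "Linkin Park", "title": "In the End"},
        {"creator": "bob", "artist": "Linken Park", "title": "In the End"},
        {"creator": "carol", "artist": "weebl", "title": "badger badger badger"},
    ]
    suspects = find_suspect_entries(entries, reference, overrides)
    print(suspects)
    print(ineligible_creators(suspects))
```

The overrides table is checked the same way as the reference data, so a legitimate entry that can't be resolved upstream only has to be approved once.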

trythil wrote:Replicate a constantly maintained music database -- the MusicBrainz replicator service may work best, because it is large and has an active community augmenting it and ensuring its correctness.

(cut)

In the case where the .org audio database contains a legitimate entry that is not found in the MusicBrainz database, get it resolved with MusicBrainz *first*. The member with the troublesome entry can do this.

If that doesn't work and the legitimacy is still unchallenged, a table for augmenting and/or overriding MusicBrainz data can be maintained. With the above procedure, it should be significantly smaller and easier to manage than the gargantuan mass of stuff that currently exists.

We've talked about this before and I still think there's enough odd audio that it wouldn't work well. It'd be a good idea for most stuff, but for unusual audio the answer shouldn't be 'take it up with this other site totally unrelated to the org THEN talk to us if that doesn't work out'. Referring to a database is fine, and it would work for the vast majority of stuff, but if it doesn't list the audio, the first and only step after that should be confirmation with the admins like we currently do with anime. Which, considering how many more possible audio sources there are than anime produced, brings to mind the question of who's volunteering to take care of this day after day?

Personally I think that if someone wants to clean up the database but doesn't want to make a huge job for themselves in the future, the easiest method would be to link common misspellings the way anime titles are linked, revamp the audio entry system so it doesn't take an hour to enter a multi-audio video (this is the only reason I personally have a totally bullshit audio entry in my profile), and call it done. It's not perfect, but it'd catch the Linkin Park misspelling crap without the permanent need for a volunteer approving unusual audio sources.

An example of the type of audio I'm talking about here. From my own profile I have a video with the 'badger badger badger' web thingy audio. It's not on musicbrainz and I doubt it ever will be. Ditto with Strongbad audio or other web memes like that. Remixes can often be the same situation, they may have web only distribution but not be in a database and are unlikely to ever be in it.

Another example: I'm currently doing a video to Morning Musume's song 'Love Peace Hero ga Yattekita'. This song is probably listed on MusicBrainz but damned if I can find it. An artist search for 'morning musume' brings up four pages of results, most in kanji, so even finding the right group is rather questionable. At least with romanized words I could pattern match even though I don't know what the title actually means. But kanji? Shit, I'm an American. I'm lucky enough if I can figure out English, much less that foreign pictanary stuff. The right selection could be staring me in the face in kanji and I still wouldn't be able to recognize it. Considering that Japanese audio is used a fair amount, I doubt I'd be the only one having this issue.

A third example, if someone takes <a href="http://www.animemusicvideos.org/phpBB/viewtopic.php?t=77275">this guy's offer</a> then the audio wouldn't be in any database. It's a custom made audio track that's never been on a CD. Without even looking you know that won't be in musicbrainz.

This also wouldn't address the wide variety of other non-English songs that people use. I've seen videos with Russian, Hungarian, Italian, Polish, Korean, Chinese, Finnish, Swedish, Spanish, Japanese, and many more. Not to mention that getting correct spelling for these songs and artists is probably asking for a miracle considering some people here can barely spell words in their native language.

Plus, there's a giant chunk of videos that use movie trailer audio, TV commercials, spoken audio, and other sources instead of songs. And don't forget the (usually horrible) renditions of songs used in the Iron Chef Idol MEPs 'sung' by the participants. I kid, some performances are pretty good.

Having said this, I still support doing something to clean up the more egregious errors (Likin Park, Lincoln Park, Linken Park, -linkin park-, **Linken Park**, **Linken Park**, and other similar misspellings, plus the variations that include the song name and/or the album name with the artist name). The scary thing is, doing this will probably add about 2000 more entries to Linkin Park's over 5000 AMV count.
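For what it's worth, here is a minimal sketch of how misspellings like those could be linked to one canonical artist name, using Python's standard `difflib`. The function name, the `cutoff` value, and the list of canonical names are all assumptions picked for the example; nothing here reflects how the .org database actually stores or links entries, and a fuzzy threshold like this would still need a human to check the borderline cases.

```python
import difflib


def canonical_artist(raw, canonical_names, cutoff=0.75):
    """Map a user-entered artist string to the closest canonical name,
    or return None if nothing is similar enough.

    The cutoff of 0.75 is an arbitrary choice for this example.
    """
    # Drop decoration such as "**name**" or "-name-" and normalize case/spacing.
    cleaned = " ".join(raw.strip("*-_ ").casefold().split())
    lookup = {name.casefold(): name for name in canonical_names}
    matches = difflib.get_close_matches(cleaned, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


if __name__ == "__main__":
    canon = ["Linkin Park", "Morning Musume"]
    for entry in ["Likin Park", "Lincoln Park", "-linkin park-", "**Linken Park**"]:
        print(entry, "->", canonical_artist(entry, canon))
```

This only handles the artist name on its own. Entries that cram the song or album title into the artist field would need to be split apart first, and that part is not covered here.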

Even with the "random" entries, a significant portion of the database would be cleared up with the method trythil suggested. It may have to be part of a multi-tiered strategy, but it would more than likely fix a good 80% of the problems by itself.

The results would, sadly, still have to be sifted through by volunteers.

Willen wrote:This also wouldn't address the wide variety of other non-English songs that people use. I've seen videos with Russian, Hungarian, Italian, Polish, Korean, Chinese, Finnish, Swedish, Spanish, Japanese, and many more. Not to mention that getting correct spelling for these songs and artists is probably asking for a miracle considering some people here can barely spell words in their native language.

Find me some that don't exist in the MB database, and it's not too hard to get them entered. The replicated database I wrote about can, and should, be regularly synced with MB's master database.

The problem of non-music sources can be addressed initially by the overrides table I wrote about. As time goes on, we can look into linking that up to some other source.

As far as spelling and the like go...well, yes, that will still be a problem.

My proposal is designed to provide an approximate solution to the technical problems of tracking and merging duplicates, and maintaining a list of audio sources. It cannot address, and was not meant to address, the larger social problem of stupid and/or lazy people.