So finally we have v2! Thanks to the site owners and to all contributing users.

I just noticed that oneyoudontknow has already raised this problem and given a list of pages with deformed letters. That list could become huge if we inspect the Russian bands. Anyway, I want to take a slightly more systematic look at this issue. I hope the mods won't regard my article as a duplicate.

=====================================

Edit: I think making a list of pages with messed-up letters is helpful, but currently I don't have time to look up each band from Eastern Europe and Eastern Asia, say, so I will not take on that task. Also, users can report messed-up letters.

Also, I suggest to the mods to open the right of editing album titles to metal demons, i.e. users with more than 10,000 points. (Currently there are only 62 of them.) So I don't have to report messed up album titles to the mod. I don't know if opening this right could be potentially dangerous to the site. If this is inapplicable, I suggest that we recruit some users who have specific knowledge about a language and let them deal with reports on this issue.

======================================

New Edit (2011/11/15): The following is a (highly incomplete) list of albums with mojibakes in tracklists, titles and lyrics. This list will grow or shrink as I compile more such releases and as users repair the mojibakes. If I add the tag "(lyrics)" after a release, it means that the lyrics are among the things deformed. If you try to repair an album, make sure that you repair everything, including the lyrics; if you cannot repair the lyrics, please do not modify them.

An easy way to repair mojibakes: go to v1 of that page (you can find the v1 link at the bottom of each page), change the page encoding in your browser to the one matching the language, and chances are the correct text comes back. This method is useful, but it does not guarantee a satisfactory result in every case! Example: for deformed Polish text, choose the encoding Central European (Windows) and see if the correct text returns. Please confirm your results (e.g. by googling) before pasting them here.

NOTE: you do not need to know the specific language (of course it's better if you do), but you need to know what it looks like. For example, Polish has no letter "ê"; it shows up very often because it is deformed from the letter "ę". Similarly, the strange "³" is deformed from "ł" (l with stroke).
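
The encoding switch described above can also be done in a script. Here is a minimal Python sketch, assuming the text was originally stored as Windows-1250 (Central European) bytes and later misread as Windows-1252 (Western). The function name `repair_central_european` is my own, and the assumption will not hold for every page, so the results still need checking by eye.

```python
def repair_central_european(garbled: str) -> str:
    """Undo Windows-1250 text that was misread as Windows-1252.

    Returns the input unchanged if it cannot be round-tripped, which
    usually means the text was damaged in some other way (or is fine).
    """
    try:
        raw = garbled.encode("cp1252")   # recover the original bytes
        return raw.decode("cp1250")      # read them with the intended codepage
    except (UnicodeEncodeError, UnicodeDecodeError):
        return garbled


print(repair_central_european("ê"))
print(repair_central_european("Kie³tyka"))
```

Note that a legitimate "ê" (in a French title, say) would be converted as well, which is why the function should only be run on text that is already known to be deformed.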

A particularly annoying thing in v1 is the deformation of non-Latin characters into unreadable mojibakes (see below). As a consequence, a lot of valuable information about band names, album titles, tracklists and lyrics has been lost. I have spent a lot of effort over the years repairing the deformed characters on the pages of Chinese/Japanese bands, and it is heartbreaking to see them deformed again after some time. The tracklists of small/obscure bands are really difficult to find, and I often need to refer to my own CD collection. But now I am abroad and my CDs are not with me, so once a tracklist gets deformed, I may never be able to repair it. Since the site serves as an encyclopedia, the key information in the original language is valuable and helpful to native speakers. And since v2 uses UTF-8 encoding, letter deformation should not be a problem anymore, so let us repair the mojibakes left over from v1 and add key information in the original language.

The following is my little guideline. Any suggestions, opinions are welcome.

What are mojibakes?

Mojibakes are deformed letters that occur when the default encoding of a website is incompatible with the input language. For example, if I type Рассвет (in Cyrillic) in v1 Metal Archives, it would likely be deformed into Ðàññâåò, which is totally unreadable. This phenomenon is widespread on band pages that contain non-ASCII characters, e.g. German, Finnish, Czech, Russian, Japanese, Chinese, etc.
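
To show where the garbage comes from, here is a small Python sketch. It assumes the text was saved as Windows-1251 (Cyrillic) bytes and then displayed as Windows-1252 (Western); those two codepages are my guess at what happened in v1, and `garble` is just an illustrative name.

```python
def garble(text: str, stored_as: str = "cp1251", shown_as: str = "cp1252") -> str:
    """Encode text in one legacy codepage and misread it in another."""
    return text.encode(stored_as).decode(shown_as)


garbled = garble("Рассвет")
print(garbled)

# UTF-8 has a byte sequence for every Unicode character, so text that is
# both stored and displayed as UTF-8 round-trips without loss.
assert "Рассвет".encode("utf-8").decode("utf-8") == "Рассвет"
```

The bytes themselves are never damaged in this scenario; they are only interpreted with the wrong table, which is why the deformation can often be reversed.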

UTF-8 solution

v2 uses UTF-8 as its encoding, which can represent every character in the Unicode character set, so new input should no longer produce mojibakes. But the mojibakes that already existed in v1 will not automatically be fixed in v2. So I suggest that users who see any mojibakes report them, or fix them if they have the rights to do so.

HTML entities:

Some words were written as HTML entities and displayed normally in v1. For example, Kurazh has a song called “Дождь”, which was written as &#1044;&#1086;&#1078;&#1076;&#1100;. When you read the page, everything is displayed fine. But if you search for the song “Дождь”, nothing is returned. You can use the edit tool to see whether the letters are written as HTML entities or as the letters themselves.
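
A short Python sketch of why the search misses such titles, using only the standard library. It assumes the search compares the stored text literally, which I have not verified for the site itself.

```python
import html

stored = "&#1044;&#1086;&#1078;&#1076;&#1100;"  # what was typed into v1
query = "Дождь"                                  # what a user searches for

# The stored string contains only ASCII, so a literal comparison fails.
found_raw = query in stored

# Converting the entities back to real characters makes the title searchable.
decoded = html.unescape(stored)
found_decoded = query in decoded

print(found_raw, found_decoded, decoded)
```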

Note: Please never switch the encoding system away from the default UTF-8. I guess if you try to edit a page under another encoding system, it will still cause trouble.

What is key information?

IMO, the band name, label name, band members’ names, album title, tracklist and lyrics are key information.

When should the original language accompany the key information?

Only official information should be given in the original language. For example, Seikima-II is officially known as 聖飢魔 II in Japan, and Tang Dynasty is officially (and only) known as 唐朝 in China, so we should add the original band names. Ritual Day has two official names (look at the logo), one in English and one in Chinese (施教日); the Chinese name should also be added. On the other hand, Dark Mirror ov Tragedy from Korea has no Korean name, so there is no need to add any Korean translation.

Some bands give the tracklist in more than one language. The tracklist in each language should be recorded: either the one in the “Main language” appears in the tracklist in Metallum and the other goes to the additional notes (the user should also mark that the translation is official), or both appear in the tracklist in Metallum. However, if the band gives only one tracklist, then the original one should go to the tracklist in Metallum. The user may give an English translation or Romanization if the original is not English, but it should go to the additional notes. In the past, to avoid mojibakes, some information was translated/transliterated and the original was dropped.

Example: Forest (Rus) – In the Flame of Glory. The information is provided by the band in both Russian and English (look at the cover). Its Russian title is “В Пламени Славы”. Some song titles are also in Russian. So the Russian version should appear as main language and English info should also be added and marked as official.

Example: Tang Dynasty (唐朝) - A Dream Return to Tang Dynasty. As far as I know, this album never had an official English translation; it is only known as "梦回唐朝". So I stubbornly think that the title should at least be "梦回唐朝 (A Dream Return to Tang Dynasty)" in the discography, or the English translation should go to the additional notes. Agree?

Special languages

Serbian. This is the only European language that is in active digraphia; it can be written in both Cyrillic and Latin script. Which script should be provided depends on the band. For example, on Dažd - Naživo! everything is written in Cyrillic, so the title is “Наживо!”, and the tracklist on the back cover is also in Cyrillic script. So we should provide the information in Cyrillic, and may add the corresponding Latin script to the additional notes.

Chinese. There are two ways to write it, Traditional Chinese and Simplified Chinese. Simplified Chinese is standard in Mainland China, while Traditional Chinese is standard in Taiwan. Some bands from the Mainland write the tracklist in Traditional characters. But IMO the two forms are interchangeable, so it suffices to provide the info in one of them.

Translation or Transliteration?

This is a tricky problem. Transliteration does not help the reader understand the language. It only represents the approximate pronunciation, which in some cases can be quite misleading. For Japanese, there is an observable trend toward transliteration rather than translation. For example, Japan’s 伝承歌劇団 is known as Densyou-Kagekidan, rather than “Traditional Opera” (translation). On the other hand, Chinese bands tend to translate, because words written in the standard transliteration system, pinyin, would look a little weird. For example, the original name of The Dark Prison Massacre is 暗狱戮尸, which, if transliterated, would be “An Yu Lu Shi”.

In my opinion, band members’ names should always be transliterated. As for the band name and tracklist, it depends on the user's preference.

Half-width or Full-width?

Since East Asian languages are written in square blocks, their characters are wider than Latin letters; this is called the full-width form. It is possible to write English in full-width form, so one can get “Ｍｅｔａｌ ａｒｃｈｉｖｅｓ”, which is ugly. Also, in Japanese it is possible to write katakana in half-width form, but this is very rare. So the rule is: for letter-based languages, use the half-width form. For East Asian languages (Chinese, Japanese and Korean), use the full-width form. In most cases users need not worry about this, because the correct form for a language is also the default form unless it is changed manually.
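
If somebody wants to check width automatically, Unicode NFKC normalization happens to apply the rule above: it turns full-width Latin letters into ordinary ones and half-width katakana into ordinary katakana. A minimal Python sketch follows; `normalize_width` is a name I made up, and NFKC also rewrites other compatibility characters, so it should not be applied blindly to titles.

```python
import unicodedata


def normalize_width(text: str) -> str:
    """Map full-width Latin and half-width katakana to their usual forms."""
    return unicodedata.normalize("NFKC", text)


print(normalize_width("Ｍｅｔａｌ"))  # full-width Latin input
print(normalize_width("ﾒﾀﾙ"))        # half-width katakana input
print(normalize_width("唐朝"))        # ordinary CJK text is left alone
```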

Last edited by sofeshue on Sat Dec 31, 2011 10:40 pm, edited 5 times in total.

I suspect it's going to be a lot of effort to repair the broken "mojibakes" and htmlentities. Feel free to use the read-only v1 as a reference to copy/paste some htmlentitized titles that might have gotten truncated or garbled in the v2 conversion somehow.

This will surely take a lot of time for bands from Russia / Ukraine / Belarus and so on... By the way, should I always use UTF-8 when entering Cyrillic symbols (even if I do it via copy-pasting them from another page)?

I just noticed that oneyoudontknow has already raised this problem and given a list of pages with deformed letters. That list could become huge if we inspect the Russian bands. Anyway, I want to take a slightly more systematic look at this issue. I hope the mods won't regard my article as a duplicate.

All I did was create a short list that is by no means complete and it is a daunting task to do this alone. The thing with the splits was already quite time consuming back then.

The question is: do we create a task force that goes through all releases of all bands from the countries where such issues can occur, or do we handle it some other way?

Scenes that have this problem are: China, Taiwan, Japan, Korea, Russia, Eastern Europe

This kind of nitpicky issue is exactly why I joined MA in the first place--it always bothered me seeing broken Cyrillic and incorrect encoding on the site. Glad we will have good support for this now. I have a few issues I'd like to discuss as well that may help improve searching of the site for users:

Simplification -- It is customary for serbian (and other related slavic balkan languages) speakers to simplify typing of words that use diacritics. The letters 'c' and 'ć' are entirely distinct but online it is not uncommon to see, for example, the word 'voće' substituted with the simplified version 'voce'. The pronunciation doesn't change but those who understand the language usually can easily tell from context that 'voce' actually means 'voće'. I think bands/albums/songs with diacritics like Dažd should consistently include the simplified version as well for ease of use in the search, much like 'Motorhead' returns 'Motörhead'. For the most part balkan bands with such issues seem to be taken care of but this should apply more broadly to all languages in which such simplification is customary.

A related issue is the problem of 'dj' versus 'đ' in serbian--these are the same letter but for whatever reason in latinate there are two available versions that are both regularly used. I tend to prefer the đ version in my own writing but both versions are grammatically valid and should both be included to aid searching.

To be clear, I do not think that simplifications like 'zh' for 'ž' should be used since they are not part of the standard latinate characters taught in the education system.

Interpretation -- Before the launch of V2, I noticed that a number of bands with cyrillic names had latinate interpretations of the letters that I found absolutely unacceptable for inclusion in this database. For example, 'Аркона' was written as 'Apkoha' which I consider completely junk information. I removed this kind of data from a number of bands and I feel that this should be consistent policy over the entire database. I doubt people cleaning mojibakes would enter this kind of data themselves but this is just something to be aware of.

This is all I can think of for now. I think there was one more nitpicky issue I had thought of but it's slipping my mind now. If I recall it, I'll post it here. I'm quite keen to clean up this annoyance with the database and I'll be happy to contribute as much as I can.

I think it'd be better to make a list first, because not everyone here knows respective languages and/or alphabets -> such releases are better off first reported, then fixed by someone with appropriate knowledge than fixed right away in a half-assed manner.

I might fix Raventale's album when I'm done with splits for today. By the way, the album title is "На Хрустальных Качелях" ("Na Khrustal'nykh Kachelyakh", if somebody wants to add a transliterated title as well), and it happens to be the phrase that's messed up in the first review.

I agree with making a list first, perhaps after V2 has moved to the metal-archives.com domain so we don't have a bunch of broken links. I think I'd be able to figure out russian/ukrainian/polish/etc. reasonably well but I'd rather a native or well-educated speaker correct them instead.

Perhaps people can report errors on the main site in addition to listing them here? Catachthonian, I noticed you'd been reporting things as 'typos' and I'm already taking care of the ones you'd mention. Perhaps those involved in this task force can do this as well for any fields they cannot edit.

I agree with making a list first, perhaps after V2 has moved to the metal-archives.com domain so we don't have a bunch of broken links. I think I'd be able to figure out russian/ukrainian/polish/etc. reasonably well but I'd rather a native or well-educated speaker correct them instead.

Or maybe ask certain other mods (Witcher, PiotrB, Fulgurius...) to proofread if they're not too busy with the queue(s).

Quote:

Perhaps people can report errors on the main site in addition to listing them here? Catachthonian, I noticed you'd been reporting things as 'typos' and I'm already taking care of the ones you'd mention. Perhaps those involved in this task force can do this as well for any fields they cannot edit.

Good idea, but in this case they should probably report it both here and on the main site.

By the way, mind fixing the title of Raventale's album I posted above? I've fixed the track titles (and I'll fix the lyrics too as soon as I find them on the net).

Fixed all the HTML entities I was aware of / was responsible for (those I can recall at least). Have to say that the idea to auto-convert HTML-written lyrics upon import was great and it saved me a lot of work.

I think bands/albums/songs with diacritics like Dažd should consistently include the simplified version as well for ease of use in the search, much like 'Motorhead' returns 'Motörhead'. For the most part balkan bands with such issues seem to be taken care of but this should apply more broadly to all languages in which such simplification is customary.

Actually, the search can now find diacritics even if you don't use them, so there is no need to write "Dazd" or "Motorhead".

Nice, I didn't realize that! I'll go ahead and get rid of diacritic-free band names in the ANS fields as I see them since they're now redundant.

I just had a quick glance over a few regional bands and noticed a lot of stuff that needs cleaning. Hopefully I'll have more constant free time soon rather than just sniping things here and there for a few minutes between blocks of 'working'...

OKAY GUYS, based on my prior experience, this is what I've found to be helpful (links in bold):

Cyrillic Decoder - My own script

Still a work in progress, but it works great with straight Russian (not Ukrainian, I think). Anybody using it should let me know if there are any errors and I'll fix the script. If the results look a bit funny (or if they don't translate well in Google, for example), compare them against the results from the next decoder:

Cyrillic Decoder

Works well for a variety of Cyrillic-based alphabets, but has been known to give odd corrections from time to time.

Bunch of Polish Letters:

These are letters that I've seen stick out of people's surnames (and sometimes place names). The best approach is often to check Google's cached version of the page from back when the letters were still intact (apparently, editing a page that had these characters in its fields could break them after submission. Don't ask...)

Code:

¹ → ą
³ → ł
± → ą
¿ → ż
ã → ă
Ÿ → ź
ñ → ń
œ → ś
Œ → Ś
£ → Ł
¶ → ś
æ → ć
ê → ę
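
The pairs above are not random: each one is what you get when a byte from a Central European codepage is shown as Windows-1252. The Python sketch below reports which source codepage explains each deformed character. The choice of the two candidate codepages is my assumption, and `explain` is only an illustrative helper.

```python
GARBLED = "¹³±¿ãŸñœŒ£¶æê"
CANDIDATES = ("cp1250", "iso8859_2")  # Windows and ISO Central European


def explain(ch: str) -> dict:
    """Return {codec: letter} for each candidate codec that turns the
    Windows-1252 byte of ch into a different alphabetic character."""
    raw = ch.encode("cp1252")
    result = {}
    for codec in CANDIDATES:
        try:
            decoded = raw.decode(codec)
        except UnicodeDecodeError:
            continue
        if decoded != ch and decoded.isalpha():
            result[codec] = decoded
    return result


for ch in GARBLED:
    print(ch, explain(ch))
```

Most entries are explained by Windows-1250, but "±" and "¶" only become letters under ISO-8859-2, so a single fixed conversion will not repair every page.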

Quote:

Also, I suggest to the mods to open the right of editing album titles to metal demons, i.e. users with more than 10,000 points. (Currently there are only 62 of them.) So I don't have to report messed up album titles to the mod. I don't know if opening this right could be potentially dangerous to the site. If this is inapplicable, I suggest that we recruit some users who have specific knowledge about a language and let them deal with reports on this issue.

Much as I'd loooove to help out with that, I'm afraid opening those sort of rights to users is just going to encourage point-whoredom. There's a whole lot of other tasks I'd love to help out with, too: removing duplicate artists, fixing artist names (formerly known as "Display names"), capitalising titles properly... Still, if any of the mods are willing, I'd be all too eager to help out. I'm just afraid of sounding like I'm trying to weasel my way into power.

Either way, I'm happy to keep filing reports.

Last edited by Alhadis on Sun Apr 17, 2011 6:35 am, edited 1 time in total.

I think bands/albums/songs with diacritics like Dažd should consistently include the simplified version as well for ease of use in the search, much like 'Motorhead' returns 'Motörhead'. For the most part balkan bands with such issues seem to be taken care of but this should apply more broadly to all languages in which such simplification is customary.

Actually, the search can now find diacritics even if you don't use them, so there is no need to write "Dazd" or "Motorhead".

Actually... when adding band members, if you search WITH diacritics, you won't find anything. You have to omit them...

(e.g. search for artist "Kiełtyka", nothing comes up. Search for "Kieltyka" and you get it.)
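
One possible explanation, offered only as a guess about how such a search could work and not as a description of the site's actual code: a common way to ignore diacritics is to decompose each letter and drop the combining marks. That handles "ö" and "ž", but it leaves "ł" untouched, because "ł" has no Unicode decomposition and needs an explicit mapping. A minimal Python sketch (the `fold` helper and the `EXTRA` table are hypothetical names):

```python
import unicodedata

# Letters with no Unicode decomposition must be mapped by hand.
EXTRA = str.maketrans({"ł": "l", "Ł": "L"})


def strip_marks(text: str) -> str:
    """Decompose letters and drop combining marks (accents, carons, ...)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def fold(text: str) -> str:
    """Diacritic-insensitive search key: strip marks, then apply EXTRA."""
    return strip_marks(text).translate(EXTRA)


for name in ("Motörhead", "Dažd", "Kiełtyka"):
    print(name, strip_marks(name), fold(name))
```

If both the stored name and the search term go through the same `fold`, then "Kiełtyka" and "Kieltyka" would match each other in either direction.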

Alhadis wrote:

I'm just afraid of sounding like I'm trying to weasel my way into power.

I just started to go through every band from China and try to repair the deformed words. However, the first band, 206 and Thinkers, posed a big problem: their lyrics are completely fucked up. Does anyone know some professional decoding software that I can use to recover the deformed lyrics? (I've tried a few, including CodeView, but with no luck.) I remember clearly that I transcribed these lyrics myself from the inner sleeve. But now I am abroad and the lyrics are nowhere to be found online or anywhere else. What a pain in the ass!

Also, the right of deleting tracks does not work for me. I deleted the "intro" / "outro" (checked various sources, none of them ever mentioned them) from the following album; after I saved it, they appear again. I tried again, same result.

http://v2.metal-archives.com/albums/Aga ... s%29/88509

Alhadis, there's at least one thing wrong with your Cyrillic decoding script. It doesn't get the "ё" (yo) character right.

What I use is Notepad++, a free text editor. If you open it up, make sure encoding (Encoding menu) is set to ANSI, then paste in the borked cyrillic (e.g. "Ëÿçã Êëèíêîâ"). Then change the encoding to (menu) "Character sets > Cyrillic > Windows-1251". Then copy and paste back into metal-archives.
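
For anyone without Notepad++, the same two steps can be written in a few lines of Python. This is a sketch assuming that "ANSI" on the machine in question means Windows-1252; `repair_cyrillic` is a made-up name.

```python
def repair_cyrillic(garbled: str) -> str:
    """Reinterpret Windows-1252 mojibake as Windows-1251 Cyrillic."""
    raw = garbled.encode("cp1252")  # step 1: treat the text as ANSI bytes
    return raw.decode("cp1251")     # step 2: read the bytes as Cyrillic


print(repair_cyrillic("Ëÿçã Êëèíêîâ"))

# "ё" sits outside the main А-я block of Windows-1251, so it shows up
# as a stray "¸" instead of an accented Latin letter.
print(repair_cyrillic("¸"))
```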

Alhadis, there's at least one thing wrong with your Cyrillic decoding script. It doesn't get the "ё" (yo) character right.

What I use is Notepad++, a free text editor. If you open it up, make sure encoding (Encoding menu) is set to ANSI, then paste in the borked cyrillic (e.g. "Ëÿçã Êëèíêîâ"). Then change the encoding to (menu) "Character sets > Cyrillic > Windows-1251". Then copy and paste back into metal-archives.

That's because it's testing it against the Russian character set, not Ukrainian (to my knowledge, the Russians don't use the 'ё' character in their alphabet... or something to that effect.)

Actually it is used in Russian. It distinguishes "ye" from "yo". Apparently in common usage (e.g. newspapers) the diacritic is sometimes left out, but pedantically it should be used. If you don't believe me, browse the Russian wikipedia for a bit, it won't take long to find one...

Do you actually speak any of these languages yourself?

He is right, there is no letter ё in Ukrainian, but there is such a letter in Russian language, although it is often replaced by е. That's not a mistake when you write е instead of ё, but "officially" those are two different letters.

When searching for the string *&*;* in the various fields (band name, album title, etc.), it is easy to get the list of HTML entities.

I'll try to correct those I can (accented characters in song titles, for instance).

Searching *&* in Label name seems also a good way to get rid of non-existent automatically generated labels, like this one : Asphyxiate Rec & GoatowaRex.
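
For anyone working from an exported list of titles rather than the site search, the same wildcard idea can be sketched in Python. The wildcard catches anything with "&" followed later by ";", so a stricter regular expression helps to separate real entities from ordinary ampersands. The sample values and helper names here are illustrative only.

```python
import re
from fnmatch import fnmatchcase

# Numeric (&#1044;), hexadecimal (&#x414;) and named (&ouml;) entities.
ENTITY = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def wildcard_hit(value: str, pattern: str = "*&*;*") -> bool:
    """Same kind of match as typing the wildcard into the search box."""
    return fnmatchcase(value, pattern)


def has_entity(value: str) -> bool:
    """True only if the value contains something shaped like an entity."""
    return ENTITY.search(value) is not None


samples = [
    "&#1044;&#1086;&#1078;&#1076;&#1100;",
    "Asphyxiate Rec & GoatowaRex",
    "A & B; C",
]
for value in samples:
    print(value, wildcard_hit(value), wildcard_hit(value, "*&*"), has_entity(value))
```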

I've been fixing Ukrainian, Belarusian and Russian album / song titles and lyrics.

Maybe a new category can be added in reports, as mojibakes or encoding problems? So it's easier to sort reports and not look through four hundred of them.

Also, I'm entering Cyrillic track titles for albums that were released like that, and placing English translations in the additional notes (I omit transliterations).

User kluseba removed the Cyrillic and left only English translations on a few albums by the band Aria before I was finished with them (example: Aria - Герой Асфальта: XX Лет), if somebody could leave him a note about that.