Hi, I would like to share how to create and deploy a DSL dictionary for GoldenDict 0.9+.

Requirements:
-------------
1. Linux/Windows
2. A text editor
3. GoldenDict installed

About the DSL format:
---------------------
DSL is ABBYY Lingvo's dictionary source format, and with it we can easily turn any kind of list into a dictionary.

Using DSL, we can create a telephone directory for ourselves, for example:

joe 1234567
amy 2345678
yogi 3456789

But let's try a country-capital list first.

Quick start:
------------
Let's say we start making a country-to-capital-city dictionary, where each country name is followed by its main city or several cities. Let's take the data from here: http://geography.about.com/od/countryin ... pitals.htm

Suppose we have only these six countries in our list:
-------------
Afghanistan - Kabul
Albania - Tirane
Algeria - Algiers
Andorra - Andorra la Vella
Angola - Luanda
Antigua and Barbuda - Saint John's
-------------

Now the DSL dictionary file will look like this (observe the header: the first 3 lines):
------------------------------------
<file begins here>
----------------------
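As a hedged sketch (not the original poster's file or script), the Python 3 snippet below turns the "Country - Capital" list into a DSL file. It assumes the usual Lingvo DSL layout: three header lines (#NAME, #INDEX_LANGUAGE, #CONTENTS_LANGUAGE), each headword starting at column 0, and each article line indented with a tab. It writes UTF-16 little-endian with a BOM and Windows line endings to match the myfile.utf16.dsl name used below; the dictionary name "Country Capitals" and the helper names are placeholders.

```python
# Sketch: build a DSL dictionary from "Country - Capital" lines.
# Assumptions: standard DSL header directives, tab-indented article lines,
# UTF-16 LE with BOM and CRLF line endings (the usual Lingvo conventions).

DATA = """\
Afghanistan - Kabul
Albania - Tirane
Algeria - Algiers
Andorra - Andorra la Vella
Angola - Luanda
Antigua and Barbuda - Saint John's
"""


def parse_pairs(text):
    """Split each 'Country - Capital' line on the first ' - ' separator."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        country, capital = line.split(" - ", 1)
        pairs.append((country.strip(), capital.strip()))
    return pairs


def to_dsl(pairs, name="Country Capitals", lang="English"):
    """Return DSL source text: 3 header lines, then one card per pair."""
    lines = [
        f'#NAME "{name}"',
        f'#INDEX_LANGUAGE "{lang}"',
        f'#CONTENTS_LANGUAGE "{lang}"',
        "",
    ]
    for headword, body in pairs:
        lines.append(headword)     # headword starts at column 0
        lines.append("\t" + body)  # article text is indented with a tab
        lines.append("")           # blank line between cards
    return "\n".join(lines)


def write_dsl(text, path="myfile.utf16.dsl"):
    # BOM + UTF-16 LE; newline="\r\n" turns every "\n" into CRLF on write.
    with open(path, "w", encoding="utf-16-le", newline="\r\n") as f:
        f.write("\ufeff" + text)


if __name__ == "__main__":
    write_dsl(to_dsl(parse_pairs(DATA)))
```

Keeping parsing, formatting and writing separate makes it easy to feed in any other "key - value" list, such as the telephone directory.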

Now we can put myfile.utf16.dsl into GoldenDict's search path (the dictionaries folder).
Finally, to save space, Linux users can compress the file:
------------
dictzip myfile.utf16.dsl
------------
This gives us 'myfile.utf16.dsl.dz', which is a very small file.

To extract the dictzipped .dsl.dz file back to .dsl:
-----------
dictunzip myfile.utf16.dsl.dz
-----------

And to switch back to UTF-8 format:
-----------
iconv -f UTF-16 -t UTF-8 -o myfile.utf8.stage1.dsl myfile.utf16.dsl # UTF-16 is now UTF-8, but we still have '\r' (Windows line terminators)
sed 's/\r$//' myfile.utf8.stage1.dsl > myfile.utf8.dsl # now we can edit the file in Linux using gedit/kedit etc.
-----------
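For Windows users without iconv or sed, a rough Python 3 equivalent is sketched below (not from the original post; the function names are hypothetical). It assumes the UTF-16 file starts with a BOM, so Python's "utf-16" codec can detect the byte order. Reading in text mode with the default newline handling turns CRLF into LF, which replaces the sed step. A reverse function is included for converting an edited UTF-8 file back to UTF-16 for deployment.

```python
# Sketch: UTF-16 (BOM, CRLF) <-> UTF-8 (LF) conversion for DSL files,
# mirroring the iconv + sed pipeline. Function names are hypothetical.
import sys


def dsl_utf16_to_utf8(src, dst):
    # "utf-16" uses the BOM to pick the byte order and drops it;
    # the default newline=None maps "\r\n" to "\n" while reading.
    with open(src, "r", encoding="utf-16") as fin:
        text = fin.read()
    # newline="\n" writes "\n" unchanged, even on Windows.
    with open(dst, "w", encoding="utf-8", newline="\n") as fout:
        fout.write(text)


def dsl_utf8_to_utf16(src, dst):
    # Reverse direction: UTF-16 LE with BOM and CRLF line endings.
    with open(src, "r", encoding="utf-8-sig") as fin:
        text = fin.read()
    with open(dst, "w", encoding="utf-16-le", newline="\r\n") as fout:
        fout.write("\ufeff" + text)


if __name__ == "__main__" and len(sys.argv) == 3:
    # usage: python3 this_script.py myfile.utf16.dsl myfile.utf8.dsl
    dsl_utf16_to_utf8(sys.argv[1], sys.argv[2])
```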

Hope this helps. I will add more info later. Happy creating new dictionaries!

1. DSL tags
I found that the DSL format can be very useful for making a user dictionary, because it is uncompiled plain text. I can make changes any time I want and add new articles incrementally. Then, when I want to make the articles richer and more readable, DSL's markup tags become necessary. I found in Lingvo's help file which tag formats are supported in DSL.

I tested and confirmed that the tags above are recognized internally by GD. Some Lingvo tags, however, seem to do nothing in GD: they have no references in article-style.css and don't show any recognizable effect:

[*], [/*] - the text between these tags is only displayed in full translation mode
[trn], [/trn] - translation zone
[com], [/com] - comment zone
[!trs], [/!trs] - the text between these tags will not be indexed
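To see these zone tags in context, here is a small Python sketch that assembles one DSL card using them. The layout (each body line indented with a tab and wrapped in [m1]/[m2]...[/m] margin tags) follows common Lingvo practice and is an assumption here, as is the helper name make_card; only the 一 example comes from this thread.

```python
def make_card(headword, translation, comment=None, extra=None):
    """Build one DSL card (headword + tab-indented body lines) as a string."""
    # [trn] marks the translation zone
    body = [f"\t[m1][trn]{translation}[/trn][/m]"]
    if comment:
        # [com] marks a comment zone
        body.append(f"\t[m2][com]{comment}[/com][/m]")
    if extra:
        # [*]...[/*]: shown only in full translation mode;
        # [!trs]...[/!trs]: excluded from indexing
        body.append(f"\t[m2][*][!trs]{extra}[/!trs][/*][/m]")
    return "\n".join([headword] + body)


print(make_card("一", "one, single; individual; undivided"))
```

Since GD seems to ignore these four tags visually, a card built this way should display like plain text there, while Lingvo still treats the zones differently.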

If I have misunderstood something, please let me know.

2. Representative headword for multiple headwords
In a DSL dictionary, I want only one headword to appear for several synonyms. That is, assuming several headwords (e.g. yi, いち, 일, 一) share one article, I want GD to show the following even when I search for "yi", いち or 일:

一 one, single; individual; undivided

I formerly used the Babylon program, and it only showed the main headword; BGL dictionaries still behave that way, as in Babylon. So I hope the same function is also possible in the DSL format. Am I wrong?

Last edited by panho10 on Tue Apr 06, 2010 6:59 pm, edited 2 times in total.

I think the software is not designed for "multiple headwords -> multiple meanings"; it is designed for "one headword -> many meanings". So you want each of yi, いち, 일 and 一 to have the meaning "one, single; individual; undivided".

We should convert it to a single-headword-to-meaning format. I know this will increase the dictionary size, but even with a huge list of 200,000 entries the zipped DSL is only about 2 MB. So it would be practical to switch to one headword -> one or many meanings, like this:

------------
For your convenience, I'll attach the script files. You need Linux to run them. I'm learning Python to make these bash scripts work on Windows/Mac OS X and to speed up the dictionary creation process.

Thanks for the reply. It's a pity I can't use Linux. I am not very conversant with computers, just a simple user.

My concern is making use of Hanyudacidian (a sort of Chinese counterpart of the unabridged Oxford dictionary). It has tens of thousands of Chinese characters and more than 300,000 headwords.

Therefore, searching for headwords by typing them on the keyboard is difficult. So I want to give each headword its corresponding pronunciations in Roman letters, Japanese and Korean, making searching easy and fast.

Naturally, the search result must show the original Chinese words, because the other headwords are just pronunciations and each pronunciation has several corresponding words.

Babylon Pro supports this function, with limitations of course: each meaning can have only one headword, and variant forms of a headword can't be displayed unless you make a separate article, which is very inefficient in a large database.

If DSL can't support a representative-headword function, I hope ikm will make a dedicated GD dictionary format that combines the merits of other dictionary formats.

OK, I've created a Python 3 program which you can run on any operating system to get the following result:

Observe that the Chinese character is inserted into the meaning! This is what you get after running "python3 korean.py" in the c:\python3 folder, where you need to save your input file as "file.txt"; the program writes file-output.txt in UTF-8 format.

Another solution is to have 2 dictionaries:
1. the first one for Roman/Japanese/Korean pronunciations to Chinese, and
2. the second one for Chinese to English (or another language).
3. When we search for yi in the pronunciation dictionary, we will be shown 一, and we can click on 一 to get the English meanings from the second dictionary. To make the word clickable, we need to wrap it in [ref]...[/ref] tags, e.g. [ref]一[/ref].
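A minimal Python sketch of the first (pronunciation -> Chinese) dictionary: it groups Chinese words by pronunciation and wraps each word in [ref]...[/ref], so clicking it looks the word up in the second dictionary. The input shape (a list of (word, pronunciations) pairs) and the function name are hypothetical, and the DSL header lines are left out for brevity.

```python
from collections import defaultdict


def pronunciation_cards(entries):
    """entries: iterable of (chinese_word, [pronunciation, ...]) pairs.

    Returns DSL cards that map each pronunciation to clickable
    [ref] links for every Chinese word sharing that pronunciation.
    """
    index = defaultdict(list)
    for word, readings in entries:
        for reading in readings:
            if word not in index[reading]:  # keep first-seen order, no dupes
                index[reading].append(word)
    cards = []
    for reading in sorted(index):
        links = "; ".join(f"[ref]{w}[/ref]" for w in index[reading])
        cards.append(f"{reading}\n\t{links}")
    return "\n\n".join(cards)


print(pronunciation_cards([("一", ["yi", "いち", "일"])]))
```

Because one pronunciation usually maps to several characters, each pronunciation card lists all of them as links, which matches the "each pronunciation has several corresponding words" concern.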

Please use this program. There are ways to convert the .py to an .exe for easier deployment, but I have yet to try that. Here is the Python 3 program, which converts lines of the form word1,word2,word3<TAB>meanings into word-meaning pairs.
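The following is only a hypothetical Python 3 sketch of the behaviour described, not the author's korean.py. It reads file.txt, where each line is word1,word2,word3<TAB>meanings, and writes file-output.txt (UTF-8) with one headword<TAB>meaning line per headword. Following the yi, いち, 일, 一 example, it assumes the Chinese word is the last comma-separated field and inserts it into the meaning of the pronunciation headwords.

```python
import os


def expand_line(line):
    """Turn 'w1,w2,w3<TAB>meaning' into one (headword, meaning) pair per word."""
    words, meaning = line.rstrip("\r\n").split("\t", 1)
    headwords = [w.strip() for w in words.split(",") if w.strip()]
    if not headwords:
        return []
    chinese = headwords[-1]  # assumption: the Chinese word is the last field
    pairs = []
    for word in headwords:
        if word == chinese:
            pairs.append((word, meaning))
        else:
            # pronunciation headword: insert the Chinese word into the meaning
            pairs.append((word, f"{chinese} {meaning}"))
    return pairs


def convert(src="file.txt", dst="file-output.txt"):
    """Read src line by line and write headword<TAB>meaning lines as UTF-8."""
    with open(src, encoding="utf-8-sig") as fin, \
            open(dst, "w", encoding="utf-8") as fout:
        for line in fin:
            if "\t" not in line:
                continue  # skip blank or malformed lines
            for word, meaning in expand_line(line):
                fout.write(f"{word}\t{meaning}\n")


if __name__ == "__main__" and os.path.exists("file.txt"):
    convert()
```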

I appreciate your helpful reply. Actually, the original data is very complex and large (more than 200 MB of text alone). Since some Chinese headwords also have variants and the meaning sections span many lines, your script can't be applied directly. Also, some Chinese words have no corresponding pronunciations, because they are found only in ancient texts and nobody yet knows how to pronounce them. Anyway, I can use your script for some other dictionaries in the future. Thank you. I just wanted to add lookup by pronunciation. I am sorry to find that we can't do it in DSL itself.

In the code above, the three headwords (yi, いち, 일) are only pronunciation forms in Roman letters, Japanese and Korean. So I hope it can be made possible that, whichever of yi, いち or 일 I search for, only 一 is displayed as the representative headword in the query result.

This will make two headwords appear in the index: 一 and yi. However, the "yi" headword in the card itself will appear as "一 (yi)". Try it out for yourself. Note that the parentheses (round brackets) are escaped with backslashes; this is necessary because, unescaped, they have another meaning (so-called "optional parts").

DSL has a few little-known features like this.
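As a hedged illustration only, the Python sketch below assumes this trick relies on DSL's curly-brace "unsorted part" of a headword (text inside {...} is shown in the card but not indexed), with the parentheses escaped as \( and \) as described above. The card it emits would be indexed under 一 and yi, with the second headword displayed as "一 (yi)". The function name is hypothetical.

```python
def representative_card(chinese, reading, meaning):
    r"""Return a DSL card with a main headword plus a reading headword.

    Assumption: '{...}' marks an unsorted (display-only) part of a
    headword, so '{一 \(}yi{\)}' is indexed as 'yi' but displayed as
    '一 (yi)'. Consecutive headword lines share the tab-indented body.
    """
    alias = "{" + chinese + r" \(}" + reading + r"{\)}"
    return "\n".join([chinese, alias, "\t" + meaning])


print(representative_card("一", "yi", "one, single; individual; undivided"))
```

For the いち and 일 readings, further alias lines of the same shape could be added before the body.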

P.S. About creating a custom format: there's little value in creating a format when no dictionaries exist in it; it's a chicken-and-egg problem. It takes quite some time to develop one, and it won't gain significant adoption quickly.

Yeah, actually I found that method in Lingvo's help file myself and tried it. But I didn't like the result because it made the article look somewhat cluttered. By the way, if a new custom format is pointless, how about considering support for the GLS format? GLS is the raw source format that is compiled into BGL. Its syntax is as follows, and HTML tags are supported.

As you will know, alternative terms are searchable but not shown in the article display. I would prefer it if GD could extend the DSL syntax internally and allow some sort of Babylon-like hidden alternative headword function.