To create a glossary of terms (words and phrases), you can either create it as
you go along during a pilot project, or you can extract terms from the source
text and compile them into a glossary.

The advantage of creating it during a pilot project is that the term list is
created in context. The disadvantages are that it is slower and yields fewer
terms, although most of those terms are useful. The advantages of automatic
term extraction are that it is fast and potentially yields more relevant
terms. The disadvantages are that terms are not viewed in context, and that
you first have to sift through many unimportant or useless terms.

Since some texts are available in bitext format (which can be a TM or a PO file
or similar), it is possible, through smart guessing, to figure out which
source words correspond to which target words. Tools that do this are generally
called “word aligners” in the computational linguistics industry, and sadly
most “free” tools are free for academic purposes only. Word aligners have great
potential because bitexts are usually translated by humans, and so the terms
are likely to be accurate.
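
To make the "smart guessing" idea concrete, here is a minimal sketch, not a real word aligner: it counts how often a source word and a target word appear in the same segment pair and scores each pair with the Dice coefficient. The function name `guess_alignments`, the `min_score` threshold and the sample segments are all made up for illustration; real aligners use far more sophisticated statistical models.

```python
from collections import Counter
from itertools import product


def guess_alignments(bitext, min_score=0.5):
    """Guess source/target word pairs from co-occurrence in aligned segments.

    bitext is a list of (source, target) strings. Each pair is scored with
    the Dice coefficient: 2 * co-occurrences / (source count + target count).
    Returns {source word: (best target word, score)}.
    """
    src_counts, tgt_counts, pair_counts = Counter(), Counter(), Counter()
    for source, target in bitext:
        src_words = set(source.lower().split())
        tgt_words = set(target.lower().split())
        src_counts.update(src_words)
        tgt_counts.update(tgt_words)
        pair_counts.update(product(src_words, tgt_words))

    best = {}
    for (src, tgt), together in pair_counts.items():
        score = 2 * together / (src_counts[src] + tgt_counts[tgt])
        # Keep only the highest-scoring target seen so far for each source word.
        if score >= min_score and score > best.get(src, ("", 0.0))[1]:
            best[src] = (tgt, score)
    return best


# Hypothetical English/Afrikaans segments, invented for this example.
sample_bitext = [
    ("open document", "maak dokument oop"),
    ("save document", "stoor dokument"),
    ("open folder", "maak gids oop"),
]
alignments = guess_alignments(sample_bitext)
for src, (tgt, score) in sorted(alignments.items()):
    print(f"{src}\t{tgt}\t{score:.2f}")
```

With so little data, a word such as "open" ties between two target words ("maak" and "oop"), which illustrates why one-to-one guessing needs either more text or a human check.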

The Translate Toolkit contains poterminology, which can do bilingual term
extraction. It does frequency analysis on the input files, and uses stop words
to improve the results. It has several parameters that allow you to change its
behaviour. It does some alignment and fills in all translations for a term
that it found in the translation files.

Monolingual term extractors usually look for words or phrases that occur more
than X number of times in the text, or for words that occur in the text but do
not occur in a dictionary or established term list. Such an extraction usually
contains a large percentage of useless terms unless steps are taken to remove
irrelevant repetitions from the result.
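
The frequency approach described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function `extract_terms`, its parameters and the tiny stoplist are invented here, the tokeniser only recognises unaccented Latin letters, and phrases are allowed to run across sentence boundaries.

```python
import re
from collections import Counter

# Tiny illustrative stoplist; a real one would be much longer.
STOPWORDS = {"a", "an", "and", "in", "of", "the", "to"}


def extract_terms(text, min_count=2, max_words=2, stopwords=STOPWORDS):
    """Return {candidate term: count} for phrases seen at least min_count times."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for size in range(1, max_words + 1):
        for start in range(len(words) - size + 1):
            phrase = words[start:start + size]
            # Skip candidates that begin or end with a stop word.
            if phrase[0] in stopwords or phrase[-1] in stopwords:
                continue
            counts[" ".join(phrase)] += 1
    return {term: n for term, n in counts.items() if n >= min_count}


sample = (
    "Open the translation memory. The translation memory stores segments. "
    "Close the translation memory to save the segments."
)
terms = extract_terms(sample)
for term, count in sorted(terms.items()):
    print(f"{term}\t{count}")
```

Note that "translation" and "memory" are reported alongside "translation memory": this is the kind of irrelevant repetition that still has to be removed from the result.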

Tim Craven’s ExtPhr32 for MS Windows is not GPL but it is freeware for all
purposes. It is very fast. Unfortunately it converts all terms to uppercase.
You can also use a stoplist. You can set the minimum number of occurrences of
a term and the minimum number of words in a term. The output can be exported
to two-column plain text (the second column contains a count).

PlusTools is a macro that runs within MS Word on MS Windows. It is not GPL but
it is freeware for all purposes. It is slow but potentially useful for smaller
texts because it can exclude words that occur in MS Word’s spellchecker and/or
thesaurus, or words that have fewer than X synonyms in the thesaurus. You can
also exclude certain words (similar to a stoplist, but you can add any words to
it), words beginning with a certain set of characters, and words that are
shorter than X characters.

A concordancer is a tool that displays a word in context. An excellent,
easy-to-use concordancer is Corpsis (previously Tenka Text). Or, if you’re
willing to pay some money, try Mike Scott’s WordSmith Tools. Corpsis can
display multiword phrases using wildcards.
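
To show what a concordancer produces, here is a minimal keyword-in-context sketch. It is not how Corpsis or any other tool mentioned here works internally; the function `concordance` and its `width` parameter are invented for this example, and it matches a single literal word or phrase rather than wildcards.

```python
import re


def concordance(text, keyword, width=30):
    """Return one keyword-in-context line per whole-word match of keyword."""
    flat = " ".join(text.split())  # collapse line breaks and runs of spaces
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-align the left context so the keywords line up in a column.
        lines.append(f"{left:>{width}}[{match.group()}]{right}")
    return lines


sample_text = "The glossary lists terms. Each glossary entry has a translation."
kwic_lines = concordance(sample_text, "glossary", width=10)
for line in kwic_lines:
    print(line)
```

Lining up the keyword in one column makes it easy to scan the surrounding words and judge whether a candidate term is used consistently.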

Whichever way you look at it, a glossary is a database, and most comprehensive
glossaries can be edited in a database editor tool. For simple, three-column
glossaries, a spreadsheet program may be all that’s necessary, though.

Virtaal can be used to edit many formats, including many formats commonly used
for storing terminology.

There are quite a few glossary viewers, but they often require that you convert
your glossary to their own format. Examples are StarDict, Jalingo,
jDictionary and Pododict. If you’re willing to pay for a non-free product,
AnyLexic is an excellent glossary tool that supports simple formats too.

Virtaal can open many formats, including many formats commonly used for
storing terminology.