By
Randy Nichols (LANAKI), President of the American
Cryptogram Association from 1994-1996 and Executive Vice
President from 1992-1994

CLASSICAL CRYPTOGRAPHY COURSE

BY LANAKI

Jan. 13, 1996

Revision 0

LECTURE 6

XENOCRYPT MORPHOLOGY

Part II

SUMMARY

In Lecture 6, we continue our review of materials related
to ciphers created in languages other than English. In order to
augment PHOENIX's soon to be published ACA Xenocrypt Handbook,
we will focus on six diverse systems: Arabic, Russian, Chinese,
Latin, Norwegian, and Hungarian. Each offers a unique
perspective on deciphering communications and supports the
cultural universal concept presented in Lecture 5.

Lecture 7 will give practical language data for
Xenocrypts commonly published in the Cryptogram: French,
Italian, Spanish and Portuguese. [I will not cover either
Esperanto or Interlingua. I consider both as useful as
advanced Hittite in modern communications.]

SHAREWARE

I have transmitted to the Crypto
Drop Box word translation software for Russian, Spanish,
German, Danish and Portuguese. A single-use license is granted.
Also, I have sent a Russian tutorial program to NORTH DECODER
to put on the Crypto Drop.

ARABIAN CONTRIBUTIONS TO CRYPTOLOGY

A colleague of mine in Sweden sent me an interesting
reminder of the historical foundations of cryptology. He
suggested that I include in one of my lectures a discussion of
Dr. Ibrahim A. Al-Kadi's outstanding 1990 paper to the Swedish
Royal Institute of Technology in Stockholm regarding the Arabic
contributions to cryptology.

Dr. Al-Kadi reported on the Arab scientist Abu Yusuf
Yaqub ibn Is-haq ibn as Sabbah ibn 'omran ibn Ismail Al-Kindi,
who in the ninth century authored a book on cryptology, the
"Risalah fi Istikhraj al-Mu'amma" (A Manuscript on Deciphering
Cryptographic Messages). Al-Kindi introduced cryptanalysis
techniques, the classification of ciphers, and Arabic phonetics
and syntax, and, most importantly, described the use of several
statistical techniques for cryptanalysis. [This book apparently
antedates other cryptology references by 300 years.] [It also
predates writings on probability and statistics by Pascal and
Fermat by nearly 800 years.]

Dr. Al-Kadi also reported on the mathematical writings of
Al-Khwarizmi (780-847), who introduced common technical terms
such as 'zero', 'cipher', 'algorithm', 'algebra' and 'Arabic
numerals.' The decimal number system and the concept of zero
were originally developed in India.

In the early ninth century the Arabs translated
Brahmagupta's "Siddhanta" from Sanskrit into Arabic. The new
numerals were quickly adopted throughout the Islamic empire,
from China to Spain. Translations of Al-Khwarizmi's book on
arithmetic by Robert of Chester, John of Halifax and the Italian
Leonardo of Pisa (aka Fibonacci) strongly advocated the use of
Arabic numerals over the previous Roman numerals
(I, V, X, L, C, D, M).

The Roman system was very cumbersome because it had no
concept of zero (an empty place). The concept of zero, which we
all think of as natural, seemed just the opposite in medieval
Europe. In Sanskrit, zero was called "sunya", or "empty". The
Arabs translated the Indian term into the Arabic equivalent
"sifr". Europeans adopted the concept and the symbol, and
transformed the name into the Latin equivalents "cifra" and
"cephirium" (Fibonacci did this). The Italian equivalents of
these words were "zefiro", "zefro" and "zevero"; the last was
shortened to "zero".

The French formed the word "chiffre" and adopted the
Italian word "zero". The English used "zero" and "cipher", the
latter from "ciphering" as a means of computing. The Germans
used the words "ziffer" and "chiffer".

The concept of zero, or sifr, or cipher was so confusing
and ambiguous to common Europeans that in arguments people
would say "talk clearly and not so far-fetched as a
cipher". Cipher came to mean concealment of clear messages,
or simply encryption. Dr. Al-Kadi concluded that the Arabic
word sifr, for the digit zero, developed into the European
technical term for encryption. [KADI], [ALKA], [MRAY], [YOUS],
[BADE] , [NIC7]

NOTES ON RUSSIAN LANGUAGE

Reference [DAVI] gives one of the better breakdowns of
the modern Russian alphabet (Soviet, post-1918) for solving
Russian cryptograms in "The Cryptogram".

A prime difficulty for English-speaking students of
Russian is the scarcity of linguistic cognates in the two
languages. For them, Russian is more complex than the Romance
languages, which share many common word derivatives with
English. On the other hand, the highly inflected Russian grammar
aids rather than hinders the cryptographer by supplying him with
valuable tools for decrypting.

My keyboard and supporting software do not permit a
comfortable rendering of Cyrillic, so I refer you to the
September-October 1976 Cryptogram for a survey of Russian and
several Xenocrypt examples.

RUSSIAN KRIPTOGRAMMA COLLECTION

ELINT

Radio communications can be heard at frequencies ranging
from below the broadcast band to almost the upper edge of the
radio spectrum (Ku-band satellite communications). Common bands
are:

Whereas the VHF and UHF ranges are occupied by cellular
phones, police, fire and government communications, the bulk of
the HF region is devoted to COMINT signals. On HF you should be
able to hear traffic from all over the globe, rather than only
within the 50-75 mile limit of the VHF and UHF bands. Three types
of HF radio communications may be heard or intercepted:
continuous wave (CW/Morse code), single sideband (SSB), and radio
teletype (RTTY). The Cubans seem to favor the last form of
communication, especially from their revitalized center at
Lourdes.

[ROAC] provides the reader with common
abbreviations used in Russian RTTY and Morse traffic. His book
describes the delicate art (and guesswork required) of traffic
analysis of Russian Kriptogramma ship-to-shore messages.

Russian achievements in the art of cryptography are
first-rate, to say the least. Three of my favorite Russian cipher
systems are: 1) the Nihilist, 2) the VIC-Disruption (aka
straddling bipartite monoalphabetic substitution super-enciphered
by modified double transposition) and 3) the One-Time Pad. Each
of these systems offered tactical advantages for communication
under adverse conditions, with limited disadvantages in service.

NIHILIST SUBSTITUTION

For some reason, Russian prisoners were not allowed
computers in their cells. Inmates were forbidden to talk, and to
outwit their jailers they invented a "knock" system to
indicate the rows and columns of a simple checkerboard (a
Polybius square of 5x5 for English, or 6x6 for the 35 Russian
letters). For example:

Prisoners memorized the proper numbers and "talked" at
about 10-15 words per minute. One advantage was that the system
afforded communication by a great variety of media: anything
that could be dotted, knotted, pierced, flashed or otherwise made
to indicate numerals could be used. Even the most innocuous
letter was suspect. [KAH1]

The coordinates of each ciphertext letter were indicated by
the number of letters written together, and breaks in the count
by spaces in the handwriting; upstrokes, downstrokes and
thumbnail prints were all subtly used to bootleg secrets in and
out of prisons. The system was universal in penal institutions.
American POWs used it in Vietnam. [LEWY], [SOLZ]
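The knock system is easy to mechanize. The following is a minimal Python sketch, assuming a 5x5 English square in which K is sent as C (the convention of the Vietnam-era POW tap code); the Russian square is not reproduced here, and the names SQUARE, to_knocks, from_knocks and as_taps are illustrative only.

```python
# Hypothetical 5x5 square in the style of the POW tap code:
# K is dropped and sent as C, so 25 letters fit five rows of five.
SQUARE = ["ABCDE", "FGHIJ", "LMNOP", "QRSTU", "VWXYZ"]


def to_knocks(text):
    """Return (row, column) knock counts for each letter; other characters are skipped."""
    pairs = []
    for ch in text.upper():
        if ch == "K":
            ch = "C"
        for row_num, row in enumerate(SQUARE, start=1):
            col = row.find(ch)
            if col >= 0:
                pairs.append((row_num, col + 1))
                break
    return pairs


def from_knocks(pairs):
    """Invert to_knocks: each (row, column) pair names one cell of the square."""
    return "".join(SQUARE[r - 1][c - 1] for r, c in pairs)


def as_taps(pairs):
    """Render pairs as dots: a short pause inside a pair, a long pause between letters."""
    return "   ".join("." * r + " " + "." * c for r, c in pairs)
```

For instance, to_knocks("hi") gives [(2, 3), (2, 4)]: two knocks, pause, three knocks for H, then two and four for I.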

A simpler form of the Nihilist was a double transposition. The
plaintext was written into a square by rows (or diagonals); a
keyword switched the rows, the same or a different keyword
switched the columns, and the resulting ciphertext was taken off
by columns or by one of forty or more routes out of the square.
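The double transposition just described can be sketched in a few lines of Python. This assumes an n x n square where n is the keyword length, X as the null used to fill the square, keyword letters ranked alphabetically with ties broken left to right, and only the plain column take-off (none of the other routes); key_order and nihilist_transpose are illustrative names.

```python
def key_order(keyword):
    """Indices in alphabetical order of the keyword's letters; ties go left to right."""
    return sorted(range(len(keyword)), key=lambda i: (keyword[i], i))


def nihilist_transpose(plaintext, row_key, col_key=None, pad="X"):
    """Write the text by rows into an n x n square, switch the rows by row_key,
    switch the columns by col_key (the same keyword if omitted), and take the
    ciphertext off by columns."""
    col_key = col_key or row_key
    n = len(row_key)
    if len(col_key) != n:
        raise ValueError("both keywords must have the same length")
    text = "".join(ch for ch in plaintext.upper() if ch.isalpha())
    if len(text) > n * n:
        raise ValueError("message does not fit in the square")
    text = text.ljust(n * n, pad)  # fill out the square with nulls
    square = [list(text[r * n:(r + 1) * n]) for r in range(n)]
    square = [square[r] for r in key_order(row_key.upper())]  # switch the rows
    col_order = key_order(col_key.upper())
    square = [[row[c] for c in col_order] for row in square]  # switch the columns
    return "".join(square[r][c] for c in range(n) for r in range(n))
```

With the keyword CAB, ABCDEFGHI becomes EHBFICDGA. Note that every row of the original square survives intact as a set of letters, which is why the vowel-evenness clue below works.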

Clues to cryptanalysis of the Nihilist systems included
reconstructing the routes, the evenness of the vowel
distribution, period determination, and digram/trigram
frequencies in the ciphertext. The U.S. Army used a similar
system for many years. Reference [COUR] discusses the U.S. Army
Double Transposition Cipher in detail.
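The evenness-of-vowels clue can be turned into a quick test of a guessed block shape. When a square is taken off by columns, each column holds one letter from every row, so cutting the ciphertext into column-length pieces and reading across recovers the (scrambled) plaintext rows, whose vowel counts should be fairly even. The Python sketch below assumes a completely filled rectangle and the column take-off only; row_vowel_spread is an illustrative name, and a low score is a heuristic hint, not proof.

```python
VOWELS = set("AEIOU")


def row_vowel_spread(ciphertext, n_rows):
    """Rebuild the rows for a guessed row count and return the variance of
    the vowel counts per row (lower means a more even distribution)."""
    text = [ch for ch in ciphertext.upper() if ch.isalpha()]
    if len(text) % n_rows:
        raise ValueError("row count must divide the text length in this simple sketch")
    cols = [text[i:i + n_rows] for i in range(0, len(text), n_rows)]
    counts = [sum(col[r] in VOWELS for col in cols) for r in range(n_rows)]
    mean = sum(counts) / n_rows
    return sum((c - mean) ** 2 for c in counts) / n_rows
```

In practice one would compute the spread for every row count that divides the ciphertext length and examine the shapes with the lowest scores first.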

VIC-DISRUPTION CIPHER

The VIC-Disruption Cipher brought the old Nihilist
Substitution to a peak of perfection. It merged the straddling
checkerboard with the one-time key. It increased the efficiency
of the checkerboard by giving the high-frequency letters
(O, S, N, E, A; P, G) the single digits (along with two
low-frequency letters). The seven letters of 'snegopa' make up
about 40% of normal Russian text. Let me focus on the interesting
elements.

The VIC algorithm is described as follows: the plaintext
is encoded by a Substitution Table (ST). The intermediate
ciphertext (ICT) is then passed through two transposition
tables (TT1 and TT2), each performing a different transposition
on the ICT.

TT1 performs a simple columnar transposition: the ICT is
placed in TT1 by rows and removed by columns in the order of
TT1's columnar key and transcribed into TT2.
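TT1's step is an ordinary columnar transposition. Here is a minimal Python sketch, assuming the key is a string whose symbols are ranked by character value with ties broken left to right (a simplification of the VIC's LRE keying); key_order and columnar are illustrative names.

```python
def key_order(key):
    """Column indices in key order: lowest symbol first, ties broken left to right."""
    return sorted(range(len(key)), key=lambda i: (key[i], i))


def columnar(text, key):
    """Write text into rows of len(key) cells and read the columns out in key order.
    text[c::width] is exactly column c of that row-by-row layout,
    including a short last row."""
    width = len(key)
    return "".join(text[c::width] for c in key_order(key))
```

For example, columnar("ABCDEFGH", "BAC") reads column 2 (BEH), then column 1 (ADG), then column 3 (CF), giving BEHADGCF.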

TT2 is vertically partitioned into Disruption, or D,
areas. These partitions are formed by diagonals extending down
the table to the right boundary in columnar key order. The first
D area begins under the column with key number 1 and extends down
to the right border of TT2. A row is skipped. The second D area
starts under key number 2, and the process continues for the
entire key. The number of rows in TT2 differs from that of TT1;
it is calculated by dividing the number of ciphertext input
digits by the width of the table.

The ICT from TT1 is inscribed into TT2 horizontally from
left to right, skipping the D areas. When all the non-D area is
filled, the D areas are filled in the same way. The ciphertext
is removed by columns in key order without regard to the D
areas.
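The disrupted fill is harder to picture. The following Python sketch implements one literal reading of the D-area description above. Each D area starts under the column with the next key number. Each following row of that area begins one column further right until the right border is passed. One row is skipped before the next area starts. The non-D cells are filled first and the D cells last. The historical VIC layout may differ in detail. Treat d_area_mask and disrupted_transpose as hypothetical helpers.

```python
def key_order(key):
    """Column indices in key order: lowest symbol first, ties broken left to right."""
    return sorted(range(len(key)), key=lambda i: (key[i], i))


def d_area_mask(width, rows, key):
    """Mark D cells: area k starts under the column with key number k; each
    following row of the area starts one column further right, up to the
    right border; one row is then skipped before the next area begins."""
    mask = [[False] * width for _ in range(rows)]
    row = 0
    for col in key_order(key):
        for step in range(width - col):
            if row >= rows:
                return mask
            for c in range(col + step, width):
                mask[row][c] = True
            row += 1
        row += 1  # the skipped row between D areas
    return mask


def disrupted_transpose(digits, key):
    """Fill the non-D cells row by row, then the D cells row by row, then read
    every column top to bottom in key order, ignoring the D boundaries."""
    width = len(key)
    rows = -(-len(digits) // width)  # ceiling division
    mask = d_area_mask(width, rows, key)
    cells = [(r, c) for r in range(rows) for c in range(width)]
    fill_order = ([rc for rc in cells if not mask[rc[0]][rc[1]]]
                  + [rc for rc in cells if mask[rc[0]][rc[1]]])
    grid = [[None] * width for _ in range(rows)]
    for (r, c), d in zip(fill_order, digits):
        grid[r][c] = d
    return "".join(grid[r][c] for c in key_order(key)
                   for r in range(rows) if grid[r][c] is not None)
```

In this sketch, unused cells (when the digits do not fill the table) fall at the end of the D-cell order and are simply skipped when reading off.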

KEYS

The VIC system used four memorized keys. Key 1: the date
of the WWII victory over Japan, 3/9/1945. Key 2: a sequence of 5
numbers, such as those of pi, 3.1415. Key 3: the first 20
letters of the "Lone Accordion", a famous Russian song/poem. Key
4: the agent number, say 7. Key 1 was changed regularly; Key 4
was changed irregularly.

DISRUPTION ALGORITHM

The keys were used to generate the transposition keys
and the checkerboard coordinates for substitution through a
complex LRE (left to right enumeration) logic. The process
injected an arbitrary five-digit group into the ciphertext, which
strongly influenced the end result. This group changed from
message to message, so the enciphering keys (and ciphertexts)
bore no exploitable relationship to each other. Not only did the
TT1 and TT2 keys differ, but so did the widths of the tables.

The coordinates kept changing. The D areas prevented the
analyst from deriving TT1 backward from TT2. They also increased
the difficulty of finding the pattern, and the straddling effect
on the checkerboard increased the difficulty of frequency counts.
Although not impossible to break, in practice it was a tough
monkey indeed. The FBI failed for four years to solve it.

The letters in the top line are among the most frequent English
letters, similar to 'SNEGOPAD' in Russian.

Ambiguity in decipherment is removed because the last
three slots in the first row are empty: their digits serve only
as the first coordinate of the two-coordinate characters, so a
two-digit character can never be mistaken for a single-digit one.
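A small straddling checkerboard shows why this split is unambiguous. The layout below is hypothetical: ESTONIA on digits 0-6 of the top row, the last three slots (7, 8, 9) left empty and used as row prefixes, and '.' and '/' filling spare cells. It is not the VIC's Russian table, and the names are illustrative.

```python
# Hypothetical English checkerboard: high-frequency letters on single digits 0-6;
# digits 7, 8 and 9 are the empty top-row slots and serve as row prefixes.
TOP = "ESTONIA"
ROWS = {"7": "BCDFGHJKLM", "8": "PQRUVWXYZ.", "9": "/"}


def build_tables():
    """Return letter->digits and digits->letter dictionaries for the board."""
    enc = {ch: str(i) for i, ch in enumerate(TOP)}
    for prefix, letters in ROWS.items():
        for i, ch in enumerate(letters):
            enc[ch] = prefix + str(i)
    dec = {code: ch for ch, code in enc.items()}
    return enc, dec


ENC, DEC = build_tables()


def encode(text):
    """Substitute each known character; characters not on the board (e.g. spaces) are dropped."""
    return "".join(ENC[ch] for ch in text.upper() if ch in ENC)


def decode(digits):
    """A digit that names an empty top-row slot always opens a two-digit group."""
    out, i = [], 0
    while i < len(digits):
        if digits[i] in ROWS:
            out.append(DEC[digits[i:i + 2]])
            i += 2
        else:
            out.append(DEC[digits[i]])
            i += 1
    return "".join(out)
```

Here encode("attack") gives 62267177. The decoder never has to guess: a 7 always starts C or K, never a letter of its own.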

[VOGE] gives a detailed look at the key generation
recursion mathematics for this cipher. It describes the LRE
(left to right enumeration) process in nauseating detail.
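The LRE step itself is simple even if [VOGE]'s treatment is long: each key symbol is replaced by its rank, with equal symbols numbered from left to right. Below is a minimal Python sketch. The zero_last option, which ranks the digit 0 after 9, is an assumption that follows the way many descriptions of VIC digit keys treat 0. lre is an illustrative name.

```python
def lre(key, zero_last=False):
    """Left-to-right enumeration: replace each key symbol by its rank (1 = lowest),
    numbering equal symbols from left to right. With zero_last=True a digit 0
    ranks after 9."""
    def weight(ch):
        # ':' is the ASCII character right after '9', so it sorts 0 past 9
        return ":" if zero_last and ch == "0" else ch

    order = sorted(range(len(key)), key=lambda i: (weight(key[i]), i))
    ranks = [0] * len(key)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks
```

For example, lre("LANAKI") numbers the two A's 1 and 2 from left to right, giving [5, 1, 6, 2, 4, 3].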

TT1 and TT2 are built up on the recursion sequence
X(i+5) = X(i) + X(i+1) (mod 10), for i = 1 to 5. Key 1 was used
to set an insertion point at the end of the message (the 5th
unit in this example). Key 1 was also the initial point for a
series of manipulations with Keys 2, 3 and 4.
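The recursion is a lagged-Fibonacci digit generator of the kind usually called chain addition in VIC write-ups. Below is a minimal Python sketch. The seed 31415 reuses the pi-style Key 2 example above purely for illustration, and chain_add is an illustrative name, not the [VOGE] procedure itself.

```python
def chain_add(seed, length):
    """Extend a digit string with X(i+k) = X(i) + X(i+1) (mod 10), where k is
    the seed length (k = 5 in the recursion quoted above)."""
    digits = [int(d) for d in seed]
    k = len(digits)
    if k < 2:
        raise ValueError("seed needs at least two digits")
    while len(digits) < length:
        i = len(digits) - k
        digits.append((digits[i] + digits[i + 1]) % 10)
    return "".join(str(d) for d in digits[:length])
```

For the seed 31415, the next digits are 3+1=4, 1+4=5, 4+1=5, 1+5=6 and 5+4=9, so chain_add("31415", 10) returns 3141545569.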

RUSSIAN IMPROVEMENTS

Hayhanen incorporated some nasty refinements. Before
encipherment, the plain text was bifurcated and the two halves
switched so that the standard beginnings and endings could not
be identified. The ST contained a 'message starts' character.
The ST was extended to ASCII characters. The VIC encipherment
consisted of one round. After 1970, with the advent of
programmable hand calculators, a multiple round version was
produced.

MERITS

Although built from simple enough elements, this cipher is one
tough monkey.

The complication in the substitution was the straddling device
on the checkerboard. The irregular alternation of coordinates of
two different lengths makes it harder for the cryptanalyst to
divide the digit stream into its proper pairs and singletons.

The complication in the transposition was the Disruption areas.
D areas blocked the reconstruction of the first tableau. A
correct sorting of the columns is forestalled by the D areas.

The ciphertext is only about 62% longer than the plaintext
because of the high-frequency letters in the first row of the
ST.
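A quick expected-length estimate shows where a figure of this size comes from. Assume, as a simplification, that a fraction p of plaintext letters take a single digit and the rest take two. Here p is an assumed parameter, anchored only by the roughly 40% quoted above for 'snegopa'.

```latex
\bar{L} = 1\cdot p + 2\,(1-p) = 2 - p,
\qquad p \approx 0.40 \;\Rightarrow\; \bar{L} \approx 1.60,
\qquad p \approx 0.38 \;\Rightarrow\; \bar{L} \approx 1.62 .
```

So with about 40% of letters on single digits, the digit stream is roughly 60% longer than the plaintext, and the quoted 62% corresponds to p of about 0.38.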

ONE-TIME PAD REVISITED

The One-Time Pad was covered in Lecture 3, and we are
reminded that it is truly an unbreakable cipher system. There
are many descriptions of this cipher; Bruce Schneier's
discussions are quite relevant. [SCHN], [SCH2]

FRESH KEY DRAWBACK

The One-Time Pad has one drawback: the quantity of fresh
key required. For military messages in the field (a fluid
situation) a practical limit is reached; it is impossible to
produce and distribute sufficient fresh key to the units. During
WWII, even before the Normandy invasion, the US Army's European
theater HQs transmitted 2 million five-letter code groups a day!
A one-time pad would therefore have consumed 10 million letters
of key every 24 hours, the equivalent of a shelf of 20 average
books. [SCHN]

RANDOMNESS

The real issue for the One-Time Pad is that the keys
must be truly random. Attacks against the One-Time Pad must
target the method used to generate the key itself.
Pseudo-random number generators don't count; they often have
nonrandom properties. Reference [SCHN], Chapter 15, discusses
random sequence generators and stream ciphers in detail. [SCHN],
[KAHN], [RHEE]
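Here is a minimal Python sketch of digit-pad encipherment with non-carrying (mod 10) addition, the arithmetic that pairs naturally with a straddling checkerboard's digit output. The key comes from Python's secrets module. That module is a cryptographically secure pseudorandom source, not the truly random key the paragraph above calls for, so it is only a stand-in. make_pad, encipher and decipher are illustrative names.

```python
import secrets


def make_pad(length):
    """Draw key digits from the operating system's CSPRNG via secrets.
    A stand-in only: a true one-time pad calls for physically random key."""
    return "".join(str(secrets.randbelow(10)) for _ in range(length))


def _digitwise(a, b, sign):
    """Combine two equal-length digit strings position by position, mod 10."""
    if len(a) != len(b):
        raise ValueError("pad and text must be the same length")
    return "".join(str((int(x) + sign * int(y)) % 10) for x, y in zip(a, b))


def encipher(digits, pad):
    """Non-carrying (mod 10) addition of key digits to plaintext digits."""
    return _digitwise(digits, pad, +1)


def decipher(digits, pad):
    """Non-carrying (mod 10) subtraction recovers the plaintext digits."""
    return _digitwise(digits, pad, -1)
```

Each pad must be used once and destroyed. Reusing a pad turns the system into a running-key cipher, which can be broken.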

CHINESE CRYPTOGRAPHY

ENCIPHERING

Dr. August suggests that the Four Corner System
and the Chinese Phonetic Alphabet System lend themselves to
manual cryptographic treatment. His treatment of these two
systems is easier to understand than some military texts on the
subject. [AUG1]

$$ X_j = \bigcup_{i=1}^{3} v_i \qquad (1) $$

This union is called an asymmetric code.

The Four Corner System encodes characters into several
generic shapes. Each character is divided into four quadrants
(corners), and each corner is assigned the digit of the generic
shape that best corresponds to its actual shape.

The Chinese Phonetic Alphabet is Pinyin written with symbols
instead of English letters. Each symbol corresponds to one of 37
ordered phonetic sounds. The 21 initials, 3 medials and 13
finals form a unique ordered set: a true alphabet.

The strength of encryption of Chinese depends on the
specific character-encoding scheme used. Three cases are:

1) Phonetic Alphabet Only: The cipher must include both a
transposition (to hide cohesion and positional
limitations) and a substitution (to hide the frequency
patterns.)
2) Four Corner System: The cipher can be based on ring
operations [performed on codewords rather than
characters, either on an individual basis or over the
whole message; the name comes from the algebraic
operations involving integers mod 10 or mod 37] which
super-encipher the encoded text.
3) Combination of Methods 1) and 2): A text encoded by a
combination of both methods will need a cipher employing
both transposition and substitution. The transposition
needs to mix up the symbols within codewords and the
message itself. This prevents a bifurcated analysis.
[AUG1], [AUG2]
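Case 2 can be illustrated with the simplest ring operation: digit-wise addition mod 10 of a fixed key to each Four Corner codeword. The codewords and key below are made-up four-digit strings, not real Four Corner codes, and ring_encipher is a hypothetical helper. A mod 37 version over phonetic-symbol indices would follow the same pattern.

```python
def ring_encipher(codewords, key, sign=+1):
    """Add (sign=+1) or subtract (sign=-1) the key digit by digit, mod 10,
    to every codeword of the message; this treats each codeword individually."""
    out = []
    for word in codewords:
        if len(word) != len(key):
            raise ValueError("codeword and key lengths differ")
        out.append("".join(str((int(w) + sign * int(k)) % 10)
                           for w, k in zip(word, key)))
    return out
```

For example, ring_encipher(["1234", "9876"], "5555") gives ["6789", "4321"], and calling it again with sign=-1 restores the codewords.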

1. Initials follow a medial or final.
2. Finals follow an initial or medial.
3. [zh, ch, sh] do not combine with i or ü.
4. [j, q, x] do not combine with a or e finals.
5. qa, qan = no but quan, qian, qia = yes
6. no double phonetics in a single codeword.
7. medials double frequently.
8. 13 limits on combinations within a codeword.

Approximately 63% of characters require 2 phonetic symbols. About
1/3 require three, and about 4% require only one.
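These percentages imply an average codeword length. Taking "about 1/3" as 0.33 (an assumption chosen so the three shares sum to 100%):

```latex
\bar{n} \approx 0.04\,(1) + 0.63\,(2) + 0.33\,(3) = 0.04 + 1.26 + 0.99 = 2.29
```

That is a little under 2.3 phonetic symbols per character.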

In Chinese there is more dependence between encoding and
enciphering operations than in English. The choice of the
encoding system influences the type of enciphering operations.
Dr. August provides solved examples of the above systems. [AUG2]

HISTORICAL PERSPECTIVES

China appears to have had a much-delayed entry into the cipher
business. Partially because so many Chinese did not read or
write, and partially because the language was so complex, Chinese
cryptography was limited until the 19th century. But there were
seeds.

The Chinese strategist Sun Tzu (c. 500 B.C.) recommended a
true but small code, which limited the plaintext to 40 elements
and assigned them to the first 40 characters of a poem, forming
a substitution table. Richard Deacon describes a method of code
encryption which the secret society of the Triads used in the
early 1800's. [DEAC] The Tongs in San Francisco used the same
system. This method limited the plaintext space and based
codewords on multiples of three.

The "Inner Ring" techniques taught to Sa Bu Nims
(teachers) by the masters of Korean Tae Kwon Do (which came from
the ancient Tae Kwan and, before that, Kung Fu) were passed on by
means of codeword transposition ciphers. [CHOI] In 1985, Sun
Yat-Sen used codes to transmit information by telegraph. [TUKK]
During WWII, Herbert Yardley taught
Kuomintang soldiers to cryptanalyze Japanese ciphers. However,
the Japanese had already outpaced the Chinese in cryptanalytical
abilities.

Japan's Chuo tokujobu (Central Bureau of Signal
Intelligence) was responsible for crypto-communication and
signal intelligence, including cryptanalysis, translation,
interception, and direction finding against the Soviet Union,
China and Britain. It began operations in 1921. [YUKI],[YAR1]

In May 1928, the Angohan (Codes and Ciphers Office)
obtained excellent results in intercepting and decoding Chinese
codes during the Sino-Japanese clash at Tsinan between Chiang
Kai-shek's Northern Expeditionary Army and the IJA (Imperial
Japanese Army). [FUMI]

The warlord Chang Tso-lin was murdered in June 1928.
Angohan succeeded in decoding "Young Marshal" Chang
Hsueh-liang's secret communications and made a substantial
contribution to the understanding of the warlord politics of
Manchuria. [SANB]

The Angohan not only mastered the basics of Chinese codes
and ciphers but also broke the Nanking Government and the
Chinese Legation codes in Tokyo. [YOKO]

The Chinese codes in 1935 were called "Mingma".
They were basically made up of four digit numbers. The Chinese
did not encode the name of either the sender or receiver, nor
the date or the time of the message. The China Garrison Army's
Tokujohan office was able to disclose the composition, strength,
and activities of Chiang Kai-shek's branch armies, such as those
led by Sung Che-yuan and Chang Hsueh-liang. It was not able to
decode the Chinese Communist or Air Force messages. [HIDE]

By the time of the 1937 Sino-Japanese War, Japanese
cryptanalytical experts had been able to greatly expand their
knowledge of the Chinese system of codes and ciphers, as well as
improve their decoding skills. About 80% of what was intercepted
was decoded. This included military and diplomatic codes but not
the Communist code messages. [EIIC]

Chinese Nationalists upgraded their Mingma codes in 1938.
They adopted a different system, called tokushu daihon (special
code book) in Japanese, which complicated decoding by mixing in
compound words. By October, 1940, Chiang Kai-shek's main forces were
using a repeating key system. This stumped the Japanese
cryptanalysts for a short time, then they returned to a 75%
decoding level during the war. They continued to make great
contributions to major military operations in China. [HIDE]

The Japanese broke the Kuomintang codes during the
Chungyuang Operation in the Southern Shansi or Chungt'iao
Mountain Campaign. [CHUN] In February 1941, significant
penetration of Communist signal traffic was obtained. [YOKO]

The tokujo operations of the North China Area Army
against the Chinese Communist codes were a tragic failure. [HISA]
The IJA's China experts held a highly negative view of the
Chinese.

This may have prejudiced their attitude towards
intelligence estimates of China and the Chinese which in turn
adversely affected their operational (crypto-intelligence)
thinking on China in general. [THEO]

When the Sian mutiny broke out and Chiang Kai-shek was
kidnapped in December 1936, Major General Isogai (IJA's leading
expert in COMINT for China) toasted (more like roasted) the
demise of Chiang. Colonel Kanji Ishiwara (Japan's chief military
strategist) deplored the incident because he felt China was on
the brink of unity because of Chiang Kai-shek's efforts. He
considered the ability to read Chiang's codes just a matter of
doing the business of war. [SHIN]

LATIN

BRASSPOUNDER gives us a good introduction to Latin in Reference
[LATI]. Until modern times Latin was a dominant language in
schools, churches, and the state in Western Europe. Professionals
use Latin to confuse the general populace. Latin is the parent of
all of the Romance languages.

The Latin alphabet is the same as the English-language
alphabet, except that it has no equivalents for K, W, J, or U.
These have crept into current usage for their phonetic value.
The J replaced I, as in hic jacet instead of the classical hic
iacet. The letter W has no equivalent. The letter U, like Y,
derives from the Greek upsilon, and in classical times was
written as a V. C is now used to form the hard sound, as in CEL
instead of KEL. A double UU approximated a W. As currently
written, Latin therefore uses a 25-letter alphabet (all but W).

The order of frequency according to Kluber, reduced to
percentages, taken from reference [TRAI]:

NORWEGIAN

Norwegian is a beautiful language with two written forms,
Bokmål (book language) and Nynorsk (New Norwegian). Bokmål is the
more generally read form. Norwegian is similar to English with
the addition of three vowels: Æ, Ø and Å. The foreign consonant
letters are C, Q, W, X and Z. Based on 5153 letters, a frequency
analysis reduced to 100 letters is:

PHOENIX's soon-to-be-published ACA Xenocrypt Handbook gives
further data on digraphs and trigraphs representing less than 2%
of totals.
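Frequency tables of the kind quoted above (counts "reduced to 100 letters") can be regenerated from any sample text. Below is a minimal Python sketch. It assumes the Norwegian letter set is the 26 Latin letters plus Æ, Ø and Å. per_hundred is an illustrative name. No Norwegian sample or resulting figures are supplied here.

```python
from collections import Counter

# Assumed Norwegian letter set: the 26 Latin letters plus Æ, Ø and Å.
ALPHABET = set("ABCDEFGHIJKLMNOPQRSTUVWXYZÆØÅ")


def per_hundred(text, ndigits=2):
    """Count letters in text and scale the counts to a 100-letter total,
    most frequent letter first."""
    letters = [ch for ch in text.upper() if ch in ALPHABET]
    if not letters:
        return {}
    counts = Counter(letters)
    total = len(letters)
    return {ch: round(100 * n / total, ndigits) for ch, n in counts.most_common()}
```

Running it over a few thousand letters of Bokmål text yields a table directly comparable to the 5153-letter count above.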

HUNGARIAN

Hungarian (aka Magyar) is related to Finnish and Estonian.
Hungarian has 38 sounds based on a Latin alphabet. Reference
[HUNG] shows the full alphabet as a combination of letters. There
is no Q, W, or X in Hungarian. Only 23 Latin letters are used.
Reference [HUNG] also gives Xenocrypt examples.

Hungarian has four special characteristics:

1. It agglutinates: adjectives and possessives are expressed by suffixes.
2. It has vowel harmony: vowels fall into high (E, I, Ö, Ü) and low
(A, O, U) categories, and within a word they are all either high or low.
3. It assimilates consonants, usually at the third or fourth letter
from the end, which produces many doubled letters.
4. It has no gender differentiation.
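Vowel harmony gives the solver a cheap consistency check on trial words. The Python sketch below uses the full front (high) and back (low) vowel sets, including accented forms. It is a simplification: in real Hungarian, E and I often behave as neutral, and compounds and loanwords mix classes. harmony is an illustrative name.

```python
# Hungarian front (high) and back (low) vowels, including the accented forms.
HIGH = set("EÉIÍÖŐÜŰ")
LOW = set("AÁOÓUÚ")


def harmony(word):
    """Classify a word as 'high', 'low', 'mixed', or 'none' by the vowels it contains."""
    letters = set(word.upper())
    has_high = bool(letters & HIGH)
    has_low = bool(letters & LOW)
    if has_high and has_low:
        return "mixed"
    if has_high:
        return "high"
    if has_low:
        return "low"
    return "none"
```

A trial plaintext word that comes out "mixed" is not necessarily wrong, since Budapest, a compound, is mixed. However, a run of mixed results is a hint to revisit the vowel assignments.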

The three-part crib can be located in only one position. A first
guess of ZIMMER gives der, die, and zweit. A guess of FREUND
yields much of the rest of the text. Schicksalsschlag can
be found in the dictionary. Fre-1.