Early character codes associated with the optical or electrical telegraph could only represent a subset of the characters used in written languages, sometimes restricted to upper case letters, numerals and some punctuation only. The low cost of digital representation of data in modern computer systems allows more elaborate character codes (such as Unicode) which represent most of the characters used in many written languages. Character encoding using internationally accepted standards permits worldwide interchange of text in electronic form.

Morse code was introduced in the 1840s and is used to encode each letter of the Latin alphabet, each Arabic numeral, and some other characters via a series of long and short presses of a telegraph key. Representations of characters encoded using Morse code varied in length.

The Baudot code, a five-bit encoding, was created by Émile Baudot in 1870, patented in 1874, modified by Donald Murray in 1901, and standardized by CCITT as International Telegraph Alphabet No. 2 (ITA2) in 1930.

Fieldata, a six- or seven-bit code, was introduced by the U.S. Army Signal Corps in the late 1950s.

IBM's Binary Coded Decimal (BCD) was a six-bit encoding scheme used by IBM as early as 1959 in its 1401 and 1620 computers and in its 7000 Series (for example, the 704, 7040, 709 and 7090 computers), as well as in associated peripherals. BCD extended existing simple four-bit numeric encoding to include alphabetic and special characters, mapping it easily to punch-card encoding, which was already in widespread use. It was the precursor to EBCDIC.

ASCII was introduced in 1963 and is a seven-bit encoding scheme used to encode letters, numerals, symbols, and device control codes as fixed-length codes using integers.

The limitations of such sets soon became apparent, and a number of ad hoc methods were developed to extend them. The need to support more writing systems for different languages, including the CJK family of East Asian scripts, required support for a far larger number of characters and demanded a systematic approach to character encoding rather than the previous ad hoc approaches.

In trying to develop universally interchangeable character encodings, researchers in the 1980s faced the dilemma that on the one hand, it seemed necessary to add more bits to accommodate additional characters, but on the other hand, for the users of the relatively small character set of the Latin alphabet (who still constituted the majority of computer users), those additional bits were a colossal waste of then-scarce and expensive computing resources (as they would always be zeroed out for such users).

The compromise solution that was eventually found and developed into Unicode was to break the assumption (dating back to telegraph codes) that each character should always directly correspond to a particular sequence of bits. Instead, characters would first be mapped to a universal intermediate representation in the form of abstract numbers called code points. Code points would then be represented in a variety of ways and with various default numbers of bits per character (code units) depending on context. To encode code points higher than the length of the code unit, such as above 256 for 8-bit units, the solution was to implement variable-width encodings where an escape sequence would signal that subsequent bits should be parsed as a higher code point.
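As a minimal sketch of this separation (using Python's standard codecs, not any particular standard's terminology), the same abstract code point maps to different code-unit sequences depending on the encoding form, and in a variable-width form like UTF-8 a lead byte signals that continuation units follow:

```python
# One abstract code point, several concrete bit representations.
ch = "\u00e9"  # é, code point U+00E9

print(ch.encode("latin-1").hex())    # e9   - one 8-bit unit (legacy, fixed-width)
print(ch.encode("utf-8").hex())      # c3a9 - two 8-bit units; 0xC3 signals a 2-byte sequence
print(ch.encode("utf-16-be").hex())  # 00e9 - one 16-bit unit
```

The code point U+00E9 stays the same throughout; only the code-unit representation changes.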

A character set is a collection of characters that might be used by multiple languages.

Example: The Latin character set is used by English and most European languages, while the Greek character set is used only by the Greek language.

A coded character set is a character set in which each character corresponds to a unique number.

A code point of a coded character set is any allowed value in the character set.

A code unit is a bit sequence used to encode each character of a repertoire within a given encoding form.

Character repertoire (the abstract set of characters)

The character repertoire is an abstract set of more than one million characters found in a wide variety of scripts including Latin, Cyrillic, Chinese, Korean, Japanese, Hebrew, and Aramaic.

Other symbols such as musical notation are also included in the character repertoire. Both the Unicode and GB18030 standards have a character repertoire. As new characters are added to one standard, the other standard also adds those characters, to maintain parity.

The code unit size equals the bit width of the particular encoding: UTF-8 has 8-bit code units, UTF-16 has 16-bit code units, and UTF-32 has 32-bit code units.

Example of code units: Consider a string of the letters "abc" followed by U+10400 𐐀 DESERET CAPITAL LETTER LONG I (represented with one char32_t, two char16_t, or four char8_t code units). That string contains:

four characters;

four code points; and either:

four code units in UTF-32 (00000061, 00000062, 00000063, 00010400)

five code units in UTF-16 (0061, 0062, 0063, d801, dc00), or

seven code units in UTF-8 (61, 62, 63, f0, 90, 90, 80).
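These counts can be checked directly with Python's standard codecs; a short sketch:

```python
s = "abc\U00010400"  # "abc" followed by DESERET CAPITAL LETTER LONG I

print(len(s))                           # 4 - four code points
print(len(s.encode("utf-32-be")) // 4)  # 4 - four 32-bit code units
print(len(s.encode("utf-16-be")) // 2)  # 5 - five 16-bit code units
print(len(s.encode("utf-8")))           # 7 - seven 8-bit code units
```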

To express a character in Unicode, the hexadecimal value of its code point is prefixed with the string 'U+'. The range of valid code points for the Unicode standard is U+0000 to U+10FFFF, inclusive, divided into 17 planes, identified by the numbers 0 to 16. Characters in the range U+0000 to U+FFFF are in plane 0, called the Basic Multilingual Plane (BMP); this plane contains the most commonly used characters. Characters in the range U+10000 to U+10FFFF in the other planes are called supplementary characters.
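Since each plane spans 0x10000 code points, the plane of a code point can be computed by a simple shift; a sketch:

```python
def plane(code_point: int) -> int:
    # Each plane holds 0x10000 code points, so the plane index
    # is simply the bits above the low 16.
    return code_point >> 16

print(plane(0x0041))    # 0  - Latin A, in the BMP
print(plane(0x10400))   # 1  - Deseret Long I, a supplementary character
print(plane(0x10FFFF))  # 16 - the last valid code point
```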

The following table shows examples of code point values:

Character                     Unicode code point   Glyph
Latin A                       U+0041               A
Latin sharp S                 U+00DF               ß
Han for East                  U+6771               東
Ampersand                     U+0026               &
Inverted exclamation mark     U+00A1               ¡
Section sign                  U+00A7               §

A code point is represented by a sequence of code units. The mapping is defined by the encoding. Thus, the number of code units required to represent a code point depends on the encoding:

UTF-8: code points map to a sequence of one, two, three or four code units.

UTF-16: code units are 16 bits, twice the size of 8-bit code units. Therefore, any code point with a scalar value less than U+10000 is encoded with a single code unit, while code points with a value of U+10000 or higher require two code units each. In UTF-16, such pairs of code units are called surrogate pairs.

UTF-32: the 32-bit code unit is large enough that every code point is represented as a single code unit.

GB18030: multiple code units per code point are common, because of the small code units. Code points are mapped to one, two, or four code units.[3]
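The UTF-16 case can be sketched explicitly. The following function (an illustrative sketch of the surrogate-pair arithmetic, not a library API) computes the 16-bit code units for a code point; the pair for U+10400 matches the d801 dc00 shown in the earlier example:

```python
def utf16_units(code_point: int) -> list[int]:
    """Return the 16-bit code units encoding one code point in UTF-16."""
    if code_point < 0x10000:
        return [code_point]            # BMP: one code unit
    v = code_point - 0x10000           # 20-bit value split across two units
    return [0xD800 | (v >> 10),        # high (lead) surrogate
            0xDC00 | (v & 0x3FF)]      # low (trail) surrogate

print([hex(u) for u in utf16_units(0x0061)])   # ['0x61']
print([hex(u) for u in utf16_units(0x10400)])  # ['0xd801', '0xdc00']
```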

Unicode and its parallel standard, the ISO/IEC 10646 Universal Character Set, together constitute a modern, unified character encoding. Rather than mapping characters directly to octets (bytes), they separately define what characters are available, corresponding natural numbers (code points), how those numbers are encoded as a series of fixed-size natural numbers (code units), and finally how those units are encoded as a stream of octets. The purpose of this decomposition is to establish a universal set of characters that can be encoded in a variety of ways.[4] To describe this model correctly requires more precise terms than "character set" and "character encoding." The terms used in the modern model follow:[4]

A character repertoire is the full set of abstract characters that a system supports. The repertoire may be closed, i.e. no additions are allowed without creating a new standard (as is the case with ASCII and most of the ISO-8859 series), or it may be open, allowing additions (as is the case with Unicode and to a limited extent the Windows code pages). The characters in a given repertoire reflect decisions that have been made about how to divide writing systems into basic information units. The basic variants of the Latin, Greek and Cyrillic alphabets can be broken down into letters, digits, punctuation, and a few special characters such as the space, which can all be arranged in simple linear sequences that are displayed in the same order they are read. But even with these alphabets, diacritics pose a complication: they can be regarded either as part of a single character containing a letter and diacritic (known as a precomposed character), or as separate characters. The former allows a far simpler text handling system but the latter allows any letter/diacritic combination to be used in text. Ligatures pose similar problems. Other writing systems, such as Arabic and Hebrew, are represented with more complex character repertoires due to the need to accommodate things like bidirectional text and glyphs that are joined together in different ways for different situations.

A coded character set (CCS) is a function that maps characters to code points (each code point represents one character). For example, in a given repertoire, the capital letter "A" in the Latin alphabet might be represented by the code point 65, the character "B" by the code point 66, and so on. Multiple coded character sets may share the same repertoire; for example, ISO/IEC 8859-1 and IBM code pages 037 and 500 all cover the same repertoire but map its characters to different code points.
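Python's bundled codecs can illustrate this: ISO/IEC 8859-1 and IBM code page 037 (an EBCDIC code page) both contain the letter "A" but assign it different code values. A sketch:

```python
# Same abstract character, different coded character sets.
a_latin1 = "A".encode("latin-1")[0]  # ISO/IEC 8859-1
a_cp037  = "A".encode("cp037")[0]    # IBM EBCDIC code page 037

print(hex(a_latin1))  # 0x41 (65)
print(hex(a_cp037))   # 0xc1 (193)
```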

A character encoding form (CEF) is the mapping of code points to code units to facilitate storage in a system that represents numbers as bit sequences of fixed length (i.e. practically any computer system). For example, a system that stores numeric information in 16-bit units can only directly represent code points 0 to 65,535 in each unit, but larger code points (say, 65,536 to 1.4 million) could be represented by using multiple 16-bit units. This correspondence is defined by a CEF.

Next, a character encoding scheme (CES) is the mapping of code units to a sequence of octets to facilitate storage on an octet-based file system or transmission over an octet-based network. Simple character encoding schemes include UTF-8, UTF-16BE, UTF-32BE, UTF-16LE or UTF-32LE; compound character encoding schemes, such as UTF-16, UTF-32 and ISO/IEC 2022, switch between several simple schemes by using byte order marks or escape sequences; compressing schemes try to minimise the number of bytes used per code unit (such as SCSU, BOCU, and Punycode).
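The difference between simple and compound schemes shows up directly in the byte stream; a sketch using Python's standard codecs (where the codec named "utf-16" is the compound scheme):

```python
# Simple schemes fix the byte order and emit no byte order mark (BOM);
# the compound "utf-16" scheme prepends a BOM so the reader can tell.
print("A".encode("utf-16-be").hex())  # 0041 - big-endian, no BOM
print("A".encode("utf-16-le").hex())  # 4100 - little-endian, no BOM

bom = "A".encode("utf-16")[:2]
print(bom in (b"\xff\xfe", b"\xfe\xff"))  # True - BOM present either way
```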

Finally, there may be a higher level protocol which supplies additional information to select the particular variant of a Unicode character, particularly where there are regional variants that have been 'unified' in Unicode as the same character. An example is the XML attribute xml:lang.

The Unicode model uses the term character map for historical systems which directly assign a sequence of characters to a sequence of bytes, covering all of CCS, CEF and CES layers.[4]

Historically, the terms "character encoding", "character map", "character set" and "code page" were synonymous in computer science, as the same standard would specify a repertoire of characters and how they were to be encoded into a stream of code units – usually with a single character per code unit. But now the terms have related but distinct meanings,[5] due to efforts by standards bodies to use precise terminology when writing about and unifying many different encoding systems.[4] Regardless, the terms are still used interchangeably, with character set being nearly ubiquitous.

A "code page" usually means a byte-oriented encoding, typically defined with regard to some suite of encodings (covering different scripts) in which many characters share the same codes in most or all of the code pages. Well-known code page suites are "Windows" (based on Windows-1252) and "IBM"/"DOS" (based on code page 437); see Windows code page for details. Most, but not all, encodings referred to as code pages are single-byte encodings (but see octet on byte size).
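A sketch using Python's bundled codecs shows how the suites diverge outside the shared ASCII range: the same byte decodes to different characters under the two code page families.

```python
# One byte, two code pages, two different characters.
b = b"\x82"
print(b.decode("cp437"))         # é  in the IBM/DOS suite
print(b.decode("windows-1252"))  # ‚  (single low quotation mark) in the Windows suite
```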

IBM's Character Data Representation Architecture (CDRA) designates entities with coded character set identifiers (CCSIDs), each of which is variously called a "charset", "character set", "code page", or "CHARMAP".[4]

The term "code page" does not occur in Unix or Linux where "charmap" is preferred, usually in the larger context of locales.

In contrast with the CCS above, a "character encoding" is a map from abstract characters to code words. A "character set" in HTTP (and MIME) parlance is the same as a character encoding (but not the same as a CCS).

"Legacy encoding" is a term sometimes used to characterize old character encodings, but with an ambiguity of sense. Most of its use is in the context of Unicodification, where it refers to encodings that fail to cover all Unicode code points, or, more generally, that use a somewhat different character repertoire: several code points representing one Unicode character,[6] or vice versa (see e.g. code page 437). Some sources refer to an encoding as legacy only because it preceded Unicode.[7] All Windows code pages are usually referred to as legacy, both because they antedate Unicode and because they are unable to represent all 2^21 possible Unicode code points.

As a result of having many character encoding methods in use (and the need for backward compatibility with archived data), many computer programs have been developed to translate data between encoding schemes as a form of data transcoding. Some of these are cited below.
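At its core, such a transcoder decodes bytes through the abstract code points and re-encodes them in the target scheme. A minimal sketch using Python's standard codecs (the codec names are Python's, not part of any encoding standard):

```python
def transcode(data: bytes, src: str, dst: str) -> bytes:
    """Decode bytes to abstract code points, then re-encode them."""
    return data.decode(src).encode(dst)

# 0x82 is é in code page 437; in UTF-8 the same character needs two bytes.
print(transcode(b"\x82", "cp437", "utf-8").hex())  # c3a9
```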

1.
Code
–
An early example is the invention of language, which enabled a person, through speech, to communicate what he or she saw, heard, felt, or thought to others. But speech limits the range of communication to the distance a voice can carry. Decoding is the reverse process, converting code symbols back into a form that the recipient understands. One reason for coding is to enable communication in places where ordinary plain language, spoken or written, is difficult or impossible. For example, in semaphore the configuration of flags held by a signaller or the arms of a semaphore tower encodes parts of the message, and another person standing a great distance away can interpret the flags and reproduce the words sent. An extension of a code for representing sequences of symbols over the source alphabet is obtained by concatenating the encoded strings. Before giving a precise definition, here is a brief example: the mapping C = {a ↦ 0, b ↦ 01, c ↦ 011} is a code whose source alphabet is the set {a, b, c} and whose code alphabet is the set {0, 1}. Using the extension of the code, the encoded string 0011001011 can be grouped into codewords as 0 011 0 01 011, and these in turn can be decoded to the sequence of source symbols acabc. Using terms from formal language theory, a precise mathematical definition of this concept can be given. Codes that encode each character by a codeword from some dictionary, with codewords of different lengths, are called variable-length codes; they are useful when clear-text characters occur with different probabilities. A prefix code is a code with the property that no valid codeword in the set is a prefix of any other valid codeword in the set. Huffman coding is the best-known algorithm for deriving prefix codes, and prefix codes are widely referred to as "Huffman codes" even when the code was not produced by a Huffman algorithm. Other examples of prefix codes are country calling codes and the country and publisher parts of ISBNs. Kraft's inequality characterizes the sets of codeword lengths that are possible in a prefix code; virtually any uniquely decodable code, not necessarily a prefix one, must satisfy it.
Codes may also be used to represent data in a way more resistant to errors in transmission or storage. Such a code is called an error-correcting code, and it works by including carefully crafted redundancy with the stored data. Examples include Hamming codes, Reed–Solomon, Reed–Muller, Walsh–Hadamard, Bose–Chaudhuri–Hocquenghem, Turbo, Golay, Goppa, and low-density parity-check codes. Error-detecting codes can be optimised to detect burst errors or random errors. A cable code replaces words with shorter words, allowing the same information to be sent with fewer characters and more quickly.
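The example code above (recoverable from the decoding shown: a ↦ 0, b ↦ 01, c ↦ 011) is, notably, not a prefix code, since 0 is a prefix of 01; it is nevertheless uniquely decodable because every codeword begins with 0, so a new codeword starts at each 0. A decoding sketch under that observation:

```python
import re

# Code from the text: a -> 0, b -> 01, c -> 011.  Not a prefix code
# ("0" is a prefix of "01"), but uniquely decodable: every codeword
# begins with '0', so we can split the stream before each '0'.
codewords = {"0": "a", "01": "b", "011": "c"}

def decode(bits: str) -> str:
    # Each maximal run "0 followed by 1s" is exactly one codeword.
    return "".join(codewords[w] for w in re.findall(r"01*", bits))

print(decode("0011001011"))  # acabc
```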

2.
Abstraction level
–
In the Unix operating system, most types of input and output operations are considered to be streams of bytes read from a device or written to a device. This stream model is used for file I/O and socket I/O. The device's physical characteristics are mediated by the operating system, which in turn presents an abstract interface that allows the programmer to read and write bytes; the operating system performs the actual transformations needed to do so. Most graphics libraries, such as OpenGL, likewise provide an abstract graphical device model as an interface: the library is responsible for translating the commands provided by the programmer into the specific device commands needed to draw the graphical elements and objects. In computer science, an abstraction level is a generalization of a model or algorithm away from any specific implementation. These generalizations arise from broad similarities that are best encapsulated by models expressing the similarities present in various specific implementations. A layer sits on top of another because it depends on it: every layer can exist without the layers above it, but requires the layers below it to function. Frequently, abstraction layers can be composed into a hierarchy of abstraction levels; the ISO OSI networking model, for example, comprises seven abstraction layers. A famous aphorism of David Wheeler is "All problems in computer science can be solved by another level of indirection." This is often deliberately misquoted with "abstraction" substituted for "indirection", and it is also sometimes misattributed to Butler Lampson. Kevlin Henney's corollary to it is "... except for the problem of too many layers of indirection." Firmware may include only low-level software, but can also include all software, including an operating system and applications.
A distinction can also be made between low-level languages such as machine language, assembly language, and hardware description languages like VHDL, on the one hand, and higher-level compiled and interpreted languages on the other. See also: hardware abstraction, application programming interface, application binary interface, database, information hiding, layer (in object-oriented design), protection ring, software engineering.

3.
Number
–
Numbers that answer the question "How many?" — 0, 1, 2, 3 and so on — are cardinal numbers; when used to indicate position in a sequence they are ordinal numbers. To the Pythagoreans and the Greek mathematician Euclid, the numbers were 2, 3, 4, 5, and so on; Euclid did not consider 1 to be a number. Numbers like 3 + 1/7 = 22/7, expressible as fractions in which the numerator and denominator are whole numbers, are rational numbers; these make it possible to measure such quantities as two and a quarter gallons or six and a half miles. What we today would consider a proof that a number is irrational, Euclid called a proof that two lengths arising in geometry have no common measure, or are incommensurable; he included proofs of the incommensurability of lengths arising in geometry in his Elements. In the Rhind Mathematical Papyrus, a pair of legs walking forward marked addition, and walking backward, subtraction. The ancient Chinese were the first known civilization to use negative numbers, and negative numbers came into widespread use as a result of their utility in accounting; they were used by late medieval Italian bankers. By 1740 BC, the Egyptians had a symbol for zero in accounting texts, and in the Maya civilization zero was a numeral with a shell shape as its symbol. The ancient Egyptians represented all fractions in terms of sums of fractions with numerator 1; for example, 2/5 = 1/3 + 1/15. Such representations are known as Egyptian fractions or unit fractions. The earliest written approximations of π are found in Egypt and Babylon: in Babylon, a clay tablet dated 1900–1600 BC has a geometrical statement that, by implication, treats π as 25/8 = 3.1250; in Egypt, the Rhind Papyrus, dated around 1650 BC, implies a value of approximately 3.16. Astronomical calculations in the Shatapatha Brahmana use a fractional approximation of 339/108 ≈ 3.139, and other Indian sources, by about 150 BC, treat π as √10 ≈ 3.1622. The first references to the constant e were published in 1618 in the table of an appendix of a work on logarithms by John Napier.
However, that work did not contain the constant itself, but simply a list of logarithms calculated from it, and it is assumed that the table was written by William Oughtred. The discovery of the constant itself is credited to Jacob Bernoulli. The first known use of the constant, represented by the letter b, was in correspondence from Gottfried Leibniz to Christiaan Huygens in 1690 and 1691. Leonhard Euler introduced the letter e as the base for natural logarithms; he started to use the letter e for the constant in 1727 or 1728, in an unpublished paper on explosive forces in cannons, and the first appearance of e in a publication was Euler's Mechanica. While in subsequent years some researchers used the letter c, e became more common. The first numeral system known is the Babylonian numeral system, which has base 60; it was introduced around 3100 BC and is the first positional numeral system known.

4.
Computer data storage
–
Computer data storage, often called storage or memory, is a technology consisting of computer components and recording media used to retain digital data. It is a core function and fundamental component of computers. The central processing unit of a computer is what manipulates data by performing computations. In practice, almost all computers use a storage hierarchy, which puts fast but expensive and small storage options close to the CPU and slower but larger and cheaper options farther away. In the von Neumann architecture, the CPU consists of two parts: the control unit and the arithmetic logic unit. The former controls the flow of data between the CPU and memory, while the latter performs arithmetic and logical operations on data. Without a significant amount of memory, a computer would merely be able to perform fixed operations and immediately output the result; it would have to be reconfigured to change its behavior. This is acceptable for devices such as desk calculators, digital signal processors, and other specialized devices. Von Neumann machines differ in having a memory in which they store their operating instructions; most modern computers are von Neumann machines. A modern digital computer represents data using the binary numeral system. Text, numbers, pictures, audio, and nearly any other form of information can be converted into a string of bits, or binary digits. The most common unit of storage is the byte, equal to 8 bits. A piece of information can be handled by any computer or device whose storage space is large enough to accommodate the binary representation of the piece of information, or simply data. For example, the complete works of Shakespeare, about 1250 pages in print, can be stored in about five megabytes, with one byte per character. Data are encoded by assigning a bit pattern to each character or digit; by adding bits to each encoded unit, redundancy allows the computer both to detect errors in coded data and to correct them based on mathematical algorithms.
A random bit flip is typically corrected upon detection; the cyclic redundancy check method is typically used in communications and storage for error detection, and a detected error is then retried. Data compression methods allow, in many cases, a string of bits to be represented by a shorter bit string, with the original string reconstructed when needed. This uses substantially less storage for many types of data at the cost of more computation; analysis of the trade-off between the storage cost saving and the costs of the related computations and possible delays in data availability is done before deciding whether to keep certain data compressed or not. For security reasons, certain types of data may be encrypted in storage to prevent the possibility of unauthorized information reconstruction from chunks of storage snapshots. Generally, the lower a storage is in the hierarchy, the lesser its bandwidth and the greater its access latency. This traditional division of storage into primary, secondary, tertiary and off-line storage is also guided by cost per bit. In contemporary usage, "memory" is usually semiconductor read-write random-access storage, typically DRAM or other forms of fast but temporary storage.
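The error-detection idea can be sketched with a cyclic redundancy check from Python's standard library: flipping a single bit in the data changes the checksum, so the corruption is detected (though a plain CRC only detects the error; correction needs a code with more structure).

```python
import binascii

# Sketch: a cyclic redundancy check detects a corrupted bit.
data = b"character encoding"
checksum = binascii.crc32(data)

corrupted = bytes([data[0] ^ 0x01]) + data[1:]   # flip one bit
print(binascii.crc32(corrupted) == checksum)     # False - error detected
```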

5.
Telegraphy
–
Telegraphy is the long-distance transmission of textual or symbolic messages without the physical exchange of an object bearing the message. Thus semaphore is a method of telegraphy, whereas pigeon post is not. Telegraphy requires that the method used for encoding the message be known to both sender and receiver; such methods are designed according to the limits of the signalling medium used. The use of smoke signals, beacons, and reflected light signals are examples of optical telegraphy. In the 19th century, the harnessing of electricity led to the invention of electrical telegraphy, and the advent of radio in the early 20th century brought about radiotelegraphy and other forms of wireless telegraphy. The word "telegraph" was first coined by the French inventor of the semaphore line, Claude Chappe. A telegraph is a device for transmitting and receiving messages over long distances, i.e. for telegraphy; the word telegraph alone now generally refers to an electrical telegraph. Wireless telegraphy is also known as CW, for continuous wave, as opposed to the earlier radio technique of using a spark gap. Contrary to the definition used by Chappe, Morse argued that the term telegraph can strictly be applied only to systems that transmit and record messages at a distance; this is to be distinguished from semaphore, which merely transmits messages. Smoke signals, for instance, are to be considered semaphore; according to Morse, the telegraph dates only from 1832, when Pavel Schilling invented one of the earliest electrical telegraphs. A telegraph message sent by a telegraph operator or telegrapher using Morse code was known as a telegram. A cablegram was a message sent by a submarine telegraph cable, and later a telex was a message sent via the Telex network. A wire picture or wire photo was a picture sent from a remote location by a facsimile telegraph. A diplomatic telegram, also known as a cable, is the term given to a confidential communication between a diplomatic mission and the foreign ministry of its parent country.
These continue to be called telegrams or cables regardless of the method used for transmission. Commercial electrical telegraphs were introduced from 1837. The first telegraphs came in the form of optical telegraphs, including the use of smoke signals, beacons, or reflected light, which have existed since ancient times. Early proposals for an optical telegraph system were made to the Royal Society by Robert Hooke in 1684 and were first implemented on an experimental level by Sir Richard Lovell Edgeworth in 1767.

6.
Unicode
–
Unicode is a computing industry standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. As of June 2016, the most recent version is Unicode 9.0; the standard is maintained by the Unicode Consortium. Unicode's success at unifying character sets has led to its widespread use: the standard has been implemented in many recent technologies, including modern operating systems, XML, Java, and the .NET Framework. Unicode can be implemented by different character encodings; the most commonly used encodings are UTF-8, UTF-16 and the now-obsolete UCS-2. UTF-8 uses one byte for any ASCII character, all of which have the same code values in both UTF-8 and ASCII encoding, and up to four bytes for other characters. UCS-2 uses a 16-bit code unit for each character but cannot encode every character in the current Unicode standard; UTF-16 extends UCS-2, using one 16-bit unit for the characters that were representable in UCS-2 and two 16-bit units to handle each of the additional characters. Many traditional character encodings share a common problem in that they allow only bilingual computer processing, not multilingual processing. Unicode, in intent, encodes the underlying characters—graphemes and grapheme-like units—rather than the variant glyphs for such characters. In the case of Chinese characters, this sometimes leads to controversies over distinguishing the underlying character from its variant glyphs. In text processing, Unicode takes the role of providing a unique code point—a number, not a glyph—for each character. In other words, Unicode represents a character in an abstract way and leaves the visual rendering to other software, such as a web browser or word processor. This simple aim becomes complicated, however, because of concessions made by Unicode's designers in the hope of encouraging more rapid adoption of Unicode: for example, the first 256 code points were made identical to the content of ISO-8859-1, so as to make it trivial to convert existing western text.
For other examples, see duplicate characters in Unicode. In the 1988 document entitled Unicode 88, Becker explained that the name Unicode is intended to suggest a "unique, unified, universal encoding" and outlined a 16-bit character model: "Unicode could be roughly described as wide-body ASCII that has been stretched to 16 bits to encompass the characters of all the world's living languages. In a properly engineered design, 16 bits per character are more than sufficient for this purpose." Unicode aimed in the first instance at the characters published in modern text, whose number is undoubtedly far below 2^14 = 16,384. By the end of 1990, most of the work on mapping existing character encoding standards had been completed. The Unicode Consortium was incorporated in California on January 3, 1991, and in October 1991 the first volume of the Unicode standard was published; the second volume, covering Han ideographs, was published in June 1992. In 1996, a surrogate character mechanism was implemented in Unicode 2.0, so that Unicode was no longer restricted to 16 bits. The Microsoft TrueType specification version 1.0 from 1992 used the name "Apple Unicode" instead of "Unicode" for the Platform ID in the naming table. Unicode defines a codespace of 1,114,112 code points in the range 0 to 10FFFF (hexadecimal). Normally a Unicode code point is referred to by writing "U+" followed by its hexadecimal number: for code points in the Basic Multilingual Plane, four digits are used; for code points outside the BMP, five or six digits are used, as required. Code points in Planes 1 through 16 are accessed as surrogate pairs in UTF-16, and within each plane, characters are allocated within named blocks of related characters.

7.
Braille
–
Braille /ˈbreɪl/ is a tactile writing system used by people who are blind or visually impaired. It is traditionally written on embossed paper, and braille users can also read computer screens and other electronic supports thanks to refreshable braille displays. They can write braille with a slate and stylus or type it on a braille writer, such as a portable braille note-taker. Braille is named after its creator, the Frenchman Louis Braille, who lost his eyesight due to a childhood accident. In 1824, at the age of 15, Braille developed his code for the French alphabet as an improvement on night writing. He published his system, which included musical notation, in 1829; the second revision, published in 1837, was the first binary form of writing developed in the modern era. Braille characters are small rectangular blocks called cells that contain tiny palpable bumps called raised dots. The number and arrangement of these dots distinguish one character from another. Since the various braille alphabets originated as transcription codes for printed writing systems, the mappings vary from language to language. Braille cells are not the only thing to appear in braille text: there may be embossed illustrations and graphs, with the lines either solid or made of series of dots, arrows, or bullets that are larger than braille dots. A full braille cell includes six raised dots arranged in two columns of three dots each. The dot positions are identified by the numbers one through six; 64 combinations are possible, including the blank cell. A single cell can be used to represent a letter, number, or punctuation mark. In the face of competition from screen-reader software, braille usage has declined. In Barbier's earlier system, sets of 12 embossed dots encoded 36 different sounds, but it proved too difficult for soldiers to recognize by touch. In 1821 Barbier visited the Royal Institute for the Blind in Paris, where he met Louis Braille.
Braille's solution was to use 6-dot cells and to assign a specific pattern to each letter of the alphabet. At first, braille was a transliteration of French orthography, but soon various abbreviations and contractions were developed. The expanded English system, called Grade-2 Braille, was complete by 1905. For blind readers, braille is an independent writing system, rather than a code of printed orthography. Braille is derived from the Latin alphabet, albeit indirectly: in Braille's original system, the dot patterns were assigned to letters according to their position within the alphabetic order of the French alphabet, with accented letters and w sorted at the end. The first ten letters of the alphabet, a–j, use the upper four dot positions; these also stand for the ten digits 1–9 and 0, in a system parallel to Hebrew gematria and Greek isopsephy.

8.
International maritime signal flags
–
International maritime signal flags refers to various flags used to communicate with ships. The principal system of flags and associated codes is the International Code of Signals. Various navies have flag systems with additional flags and codes, and other flags are used in special contexts or have historical significance. There are various methods by which the flags can be used as signals: each flag can spell out a letter of an alphabetic message, or one or more flags can form a code word whose meaning can be looked up in a code book held by both parties. An example is the Popham numeric code used at the Battle of Trafalgar. In yacht racing and dinghy racing, flags have other meanings; for example, the P flag is used as the preparatory flag to indicate an imminent start, and the S flag means shortened course. NATO uses the same flags, with a few unique to warships. The NATO usage generally differs from the international meanings, and therefore warships will fly the Code/Answer flag above the signal to indicate it should be read using the international meaning. During the Allied occupations of the Axis countries after World War II, signal pennants were pressed into use as national ensigns; being swallowtails, they are commonly referred to as the C-pennant, D-pennant, and E-pennant. Substitute or repeater flags allow messages with duplicate characters to be signaled without the need for multiple sets of flags; NATO defines four substitute flags, while the International Code of Signals includes only the first three.

9.
Chinese telegraph code
–
The Chinese Telegraph Code, Chinese Telegraphic Code, or Chinese Commercial Code is a four-digit decimal code for electrically telegraphing messages written with Chinese characters. A codebook is provided for encoding and decoding the Chinese telegraph code; it defines a one-to-one correspondence between Chinese characters and four-digit numbers from 0000 to 9999. Chinese characters are arranged and numbered in order according to their radicals, and each page of the book shows 100 character–number pairs. The PRC's Standard Telegraph Codebook provides codes for approximately 7,000 Chinese characters. Senders convert their messages written with Chinese characters to a sequence of digits according to the codebook; for instance, the phrase 中文信息, meaning "information in Chinese", is rendered into the code as 0022 2429 0207 1873. It is transmitted using Morse code. Receivers decode the Morse code to get a sequence of digits, chop it into an array of quadruplets, and then decode them one by one by referring to the book. The codebook also defines codes for the Zhuyin alphabet, the Latin alphabet, the Cyrillic alphabet, and various symbols, including special symbols for months and days of the month. Senders may translate their messages into numbers by themselves, or pay a small charge to have them translated by a telegrapher; expert Chinese telegraphers used to memorize several thousand of the most frequently used codes. The Standard Telegraph Codebook also gives an alternative three-letter code for Chinese characters, which compresses telegram messages and cut international fees by 25% as compared to the four-digit code. Looking up a character given a number is straightforward: page, row, column. However, looking up a number given a character is more difficult. The four corner method was developed in the 1920s to allow people to look up characters easily by their shape.
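The encoding step described above amounts to a simple table lookup. The sketch below uses only the four character–code pairs given in the example above; the dictionary is an illustrative fragment, not a complete codebook (a real one covers roughly 7,000 characters):

```python
# Illustrative fragment of a Chinese telegraph codebook: each character
# maps to a four-digit decimal code (real codebooks cover ~7,000 characters).
CODEBOOK = {"中": "0022", "文": "2429", "信": "0207", "息": "1873"}
DECODEBOOK = {code: char for char, code in CODEBOOK.items()}

def encode(text):
    """Render a Chinese message as a digit string, four digits per character."""
    return "".join(CODEBOOK[ch] for ch in text)

def decode(digits):
    """Chop the digit stream into quadruplets and look each one up."""
    quads = [digits[i:i + 4] for i in range(0, len(digits), 4)]
    return "".join(DECODEBOOK[q] for q in quads)

print(encode("中文信息"))            # 0022242902071873
print(decode("0022242902071873"))   # 中文信息
```

The digit string is what was actually keyed out in Morse; the receiving operator performed the `decode` step by hand against the printed codebook.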
The first telegraph code for Chinese was brought into use soon after the Great Northern Telegraph Company introduced telegraphy to China in 1871. Septime Auguste Viguier, a Frenchman and customs officer in Shanghai, published a codebook, succeeding the Danish astronomer Hans Carl Frederik Christian Schjellerup's earlier work. A later codebook, prepared in consideration of the former code's insufficiency and its disordered arrangement of characters, remained in effect until the Ministry of Transportation and Communications printed a new book in 1929. In 1933, a supplement was added to the book. The Mainland version, the Standard Telegraph Codebook, adopted simplified Chinese characters in 1981. The Chinese telegraph code can be used as a Chinese input method for computers, though ordinary computer users today hardly master it because it requires a great deal of rote memorization; the four corner method, which allows one to look up characters by shape, is used instead. Hong Kong residents' identification cards show the Chinese telegraph code for the holder's Chinese name, and business forms provided by the government and corporations in Hong Kong often require filling in telegraph codes for Chinese names. The codes help in inputting Chinese characters to a computer. The Chinese telegraph code is also used extensively in law enforcement investigations worldwide that involve ethnic Chinese subjects, where variant phonetic spellings of Chinese names can create confusion.

10.
Hans Schjellerup
–
Hans Carl Frederik Christian Schjellerup was a Danish astronomer. He was born at Odense, the son of a jeweller. Initially he was apprenticed as a watchmaker, but in 1848 he passed the entrance exam for the Polytechnic School of Copenhagen, graduating by passing an examination in applied mathematics and mechanics. In 1851 he became an observer at the University Observatory in Copenhagen. He soon became an instructor at the Polytechnic School, and then in 1854 a Professor of Mathematics at the Danish Naval Academy. In 1866, after the new observatory had been completed, Schjellerup assembled a catalog of red stars. In 1869, he drafted a proposal for a Chinese telegraph dictionary. He became an associate of the British Royal Astronomical Society in 1879. At the University of Copenhagen he became director of the observatory, serving in that capacity up until his death. The crater Schjellerup on the Moon is named after him.

11.
Morse code
–
Morse code is a method of transmitting text information as a series of on-off tones, lights, or clicks that can be directly understood by a skilled listener or observer without special equipment. It is named for Samuel F. B. Morse, an inventor of the telegraph. Because many non-English natural languages use more than the 26 Roman letters, extensions of the code exist for those languages. Each Morse code symbol represents either a text character or a prosign and is represented by a unique sequence of dots and dashes. The duration of a dash is three times the duration of a dot; each dot or dash is followed by a short silence, equal to the dot duration. The letters of a word are separated by a space equal to three dots, and the words are separated by a space equal to seven dots. The dot duration is the unit of time measurement in code transmission. To increase the speed of communication, the code was designed so that the length of each character in Morse varies approximately inversely with its frequency of occurrence in English; thus the most common letter in English, the letter E, has the shortest code. Morse code is used by some amateur radio operators, although knowledge of and proficiency with it is no longer required for licensing in most countries. Pilots and air traffic controllers usually need only a cursory understanding, although aeronautical navigational aids, such as VORs and NDBs, constantly identify themselves in Morse code. Compared to voice, Morse code is less sensitive to poor signal conditions, yet still comprehensible to humans without a decoding device. Morse is, therefore, an alternative to synthesized speech for sending automated data to skilled listeners on voice channels; many amateur radio repeaters, for example, identify themselves with Morse. In an emergency, Morse code can be sent by improvised methods that can be easily keyed on and off, making it one of the simplest and most versatile methods of telecommunication. The most common distress signal is SOS (three dots, three dashes, and three dots), internationally recognized by treaty.
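The timing rules above (a dash lasts 3 dot units, a 1-unit gap separates elements within a character, a 3-unit gap separates letters, and a 7-unit gap separates words) can be made concrete in a short sketch. The table below covers only a handful of letters; a full implementation would include the whole alphabet, digits, and prosigns:

```python
# Partial Morse table; a complete one covers all letters, digits, and prosigns.
MORSE = {"E": ".", "T": "-", "A": ".-", "S": "...", "O": "---"}

def to_morse(text):
    """Encode text as Morse, with spaces between letters and ' / ' between words."""
    words = text.upper().split()
    return " / ".join(" ".join(MORSE[ch] for ch in word) for word in words)

def duration_units(text):
    """Total transmission time in dot units, per the standard timing rules."""
    total = 0
    for wi, word in enumerate(text.upper().split()):
        if wi:
            total += 7                      # gap between words
        for li, letter in enumerate(word):
            if li:
                total += 3                  # gap between letters
            for si, sym in enumerate(MORSE[letter]):
                if si:
                    total += 1              # gap between dots/dashes
                total += 1 if sym == "." else 3
    return total

print(to_morse("SOS"))          # ... --- ...
print(duration_units("SOS"))    # 27
```

Note how E, the most frequent English letter, costs 1 unit while O costs 11, illustrating the inverse relationship between frequency and code length described above.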
Beginning in 1836, the American artist Samuel F. B. Morse, the American physicist Joseph Henry, and Alfred Vail developed an electrical telegraph system. This system sent pulses of current along wires, which controlled an electromagnet located at the receiving end of the telegraph system. A code was needed to transmit natural language using only these pulses, so around 1837 Morse developed an early forerunner to the modern International Morse code. Around the same time, Carl Friedrich Gauss and Wilhelm Eduard Weber, as well as Carl August von Steinheil, had already used codes with varying lengths for their telegraphs. In 1837, William Cooke and Charles Wheatstone in England began using a telegraph that also used electromagnets in its receivers.

12.
Baudot code
–
The Baudot code, invented by Émile Baudot, is a character set predating EBCDIC and ASCII. It was the predecessor to International Telegraph Alphabet No. 2 (ITA2), the teleprinter code in use until the advent of ASCII. Each character in the alphabet is represented by a series of five bits; the symbol-rate unit known as the baud is derived from Baudot's name. Five-bit codes have a long prehistory: in the 16th century, Francis Bacon developed what is now called Bacon's cipher. However, that scheme was a cipher intended for concealment rather than a telecommunications code, and as such was not readily suitable for telegraphy. Baudot invented his original code in 1870 and patented it in 1874. It was a 5-bit code, with equal on and off intervals, which allowed telegraph transmission of the Roman alphabet, punctuation, and control signals. It was based on an earlier code developed by Carl Friedrich Gauss, and it was a Gray code. The code itself was not patented, because French patent law does not allow concepts to be patented. Baudot's original code was designed to be sent from a manual keyboard: the code was entered on a keyboard which had just five piano-type keys, operated with two fingers of the left hand and three fingers of the right hand. Operators had to maintain a steady rhythm, and the usual speed of operation was 30 words per minute. The allocation of the Baudot code employed in the British Post Office for continental service differed from the inland allocation: a number of characters in the continental code are replaced by fractionals in the inland code. Code elements 1, 2 and 3 are transmitted by keys 1, 2 and 3, operated by three fingers of the right hand, while code elements 4 and 5 are transmitted by keys 4 and 5, operated by the first two fingers of the left hand. Baudot's code became known as International Telegraph Alphabet No. 1, and is no longer used. In 1901, Baudot's code was modified by Donald Murray, prompted by his development of a typewriter-like keyboard.
The Murray system employed an intermediate step, a keyboard perforator, which allowed an operator to punch a paper tape; at the receiving end of the line, a mechanism would print onto a paper tape. The code was arranged so that the most frequently used characters required the fewest punched holes: the single-hole letters are E and T; the ten two-hole letters are A O I N S H R D L Z, very similar to the etaoin shrdlu order used in Linotype machines; ten more letters have three holes; and the four-hole letters are V X K Q.
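Five bits give only 32 combinations, so codes in this family double their repertoire with two shift characters, LTRS and FIGS, that switch the meaning of subsequent codes between a letters page and a figures page. The sketch below illustrates that stateful decoding; the letter and figure assignments in the toy tables are illustrative stand-ins rather than the actual ITA2 allocation, and the shift-code values 0x1F and 0x1B follow common descriptions of ITA2:

```python
LTRS, FIGS = 0x1F, 0x1B  # shift codes: select the letters or figures page

# Toy code pages (illustrative assignments, not the real ITA2 allocation)
LETTER_PAGE = {0x01: "E", 0x10: "T", 0x03: "A"}
FIGURE_PAGE = {0x01: "3", 0x10: "5", 0x03: "-"}

def decode_shifted(codes):
    """Decode a stream of 5-bit codes, tracking the current shift state."""
    page = LETTER_PAGE          # assume the receiver starts in letters shift
    out = []
    for code in codes:
        if code == LTRS:
            page = LETTER_PAGE
        elif code == FIGS:
            page = FIGURE_PAGE
        else:
            out.append(page[code])
    return "".join(out)

# The same code 0x01 means "E" or "3" depending on the shift state:
print(decode_shifted([0x01, FIGS, 0x01, LTRS, 0x01]))  # E3E
```

The cost of this trick is statefulness: a corrupted or missed shift character garbles everything up to the next shift, a weakness that later influenced the ASCII committee's decision against a shifted code.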

13.
ASCII
–
ASCII, abbreviated from American Standard Code for Information Interchange, is a character encoding standard. ASCII codes represent text in computers, telecommunications equipment, and other devices; most modern character-encoding schemes are based on ASCII, although they support many additional characters. ASCII was developed from telegraph code, and its first commercial use was as a seven-bit teleprinter code promoted by Bell data services. Work on the ASCII standard began on October 6, 1960; the first edition of the standard was published in 1963, underwent a major revision during 1967, and experienced its most recent update during 1986. Compared to earlier telegraph codes, the proposed Bell code and ASCII were both ordered for more convenient sorting of lists, and added features for devices other than teleprinters. Originally based on the English alphabet, ASCII encodes 128 specified characters into seven-bit integers. The characters encoded are the numbers 0 to 9, lowercase letters a to z, uppercase letters A to Z, basic punctuation symbols, and control codes that originated with Teletype machines; for example, lowercase j is binary 1101010 and decimal 106. Of the 128 characters ASCII defines, 33 are non-printing control characters that affect how text and space are processed, and 95 are printable characters. The IANA encourages use of the name US-ASCII for Internet uses of ASCII. The ASA became the United States of America Standards Institute and ultimately the American National Standards Institute. There was some debate at the time over whether there should be more control characters rather than the lowercase alphabet. The X3.2.4 task group voted its approval for the change to ASCII at its May 1963 meeting. The X3 committee made other changes as well, including adding new characters, renaming some control characters, and moving or removing others. ASCII was subsequently updated as USAS X3.4-1967, then USAS X3.4-1968, ANSI X3.4-1977, and ANSI X3.4-1986. The standards committees also proposed a 9-track standard for magnetic tape, and attempted to deal with some punched card formats. The X3.2 subcommittee designed ASCII based on the earlier teleprinter encoding systems. Like other character encodings, ASCII specifies a correspondence between digital bit patterns and character symbols; this allows digital devices to communicate with each other and to process and store text. Before ASCII was developed, the encodings in use included 26 alphabetic characters, 10 numerical digits, and a number of special symbols; such codes, like ITA2, were in turn based on the 5-bit telegraph code Émile Baudot invented in 1870 and patented in 1874. The committee debated the possibility of a shift function, which would allow more than 64 codes to be represented by a six-bit code. In a shifted code, some character codes determine choices between options for the following character codes; this allows compact encoding, but is less reliable for data transmission, since a corrupted shift code garbles a long run of the message. The standards committee decided against shifting, and so ASCII required at least a seven-bit code. The committee also considered an eight-bit code, since eight bits would allow two four-bit patterns to efficiently encode two digits with binary-coded decimal.
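The seven-bit layout makes some of the properties described above easy to check directly: lowercase j really is 1101010, and the upper- and lowercase letters differ only in the bit with value 0x20, which is why case conversion can be done by toggling a single bit. A small Python check:

```python
# Lowercase j: decimal 106, binary 1101010 (seven bits)
j = ord("j")
print(j, format(j, "07b"))          # 106 1101010

# Upper- and lowercase ASCII letters differ only in the 0x20 bit,
# so toggling that bit converts case.
print(chr(j ^ 0x20))                # J
assert all(ord(c.lower()) - ord(c.upper()) == 0x20
           for c in "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
```

This one-bit relationship between the cases was a deliberate outcome of the committee's ordering of the code chart, alongside the choice to place the digits so that sorting by code value sorts numerically.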

14.
Latin alphabet
–
The Latin alphabet is the most widely used alphabetic writing system in the world. It is the script of the English language and is often referred to simply as "the alphabet" in English. It originated in the 7th century BC in Italy and has changed continually over the last 2,500 years. It has roots in the Semitic alphabet and its offshoot alphabets, the Phoenician and Greek. Over time, the phonetic values of some letters changed, some letters were lost and others gained, and several writing styles developed. Two such styles, the minuscule and majuscule hands, were combined into one script with alternate forms for the lower and upper case letters. Due to classicism, modern uppercase letters differ only slightly from their classical counterparts. The Latin alphabet started out as uppercase serifed letters known as roman square capitals; the lowercase letters evolved through cursive styles that developed to adapt the formerly inscribed alphabet to being written with a pen. Throughout the ages, many stylistic variations of each letter have evolved that are still identified as being the same letter. From the Cumae alphabet, the Etruscan alphabet was derived, and the Latins ultimately adopted 21 of the original 26 Etruscan letters. Gaius Julius Hyginus, who recorded much Roman mythology, mentions in his Fabulae the legend that the Parcae (Clotho, Lachesis, and Atropos) invented seven Greek letters: A B H T I Y. Others say that Mercury invented them from the flight of cranes, which make letters as they fly. Palamedes, son of Nauplius, invented eleven letters; Simonides invented four letters (Ó E Z PH), and Epicharmus of Sicily two (P and PS). The Greek letters Mercury is said to have brought to Egypt; Cadmus, in exile from Arcadia, took them to Italy, and his mother Carmenta changed them to Latin to the number of 15. Apollo on the lyre added the rest.
The oldest Latin inscriptions do not distinguish between /ɡ/ and /k/, representing both by C, K and Q according to position: K was used before A, Q was used before O or V, and C was used elsewhere. This is explained by the fact that the Etruscan language did not make this distinction. C originated as a turned form of Greek gamma, and Q from Greek koppa. In later Latin, K survived only in a few words such as Kalendae, and Q survived only before V. G was later invented to distinguish /ɡ/ from /k/; it was simply a C with an additional diacritic. Before that, C stood for /ɡ/ as well, I stood for both /i/ and /j/, and V stood for both /u/ and /w/. K was marginalized in favour of C, which stood for both /ɡ/ and /k/.
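The positional rule for writing /k/ in the oldest inscriptions (K before A, Q before O or V, C elsewhere) is simple enough to express directly. A toy sketch of that spelling convention:

```python
def archaic_k_letter(next_vowel):
    """Choose the letter for /k/ in early Latin spelling by the following vowel:
    K before A, Q before O or V, C elsewhere."""
    v = next_vowel.upper()
    if v == "A":
        return "K"
    if v in ("O", "V"):
        return "Q"
    return "C"

# KA, QO, QV, CE, CI: the three letters split the work purely by position
for v in "AOVEI":
    print(archaic_k_letter(v) + v)
```

Because the choice depends only on the following vowel and never distinguishes /ɡ/ from /k/, the rule carries no phonemic information, which is exactly why C could later be repurposed and G invented.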

15.
Arabic numerals
–
In this numeral system, a sequence of digits such as 975 is read as a single number, using the position of each digit in the sequence to interpret its value. The symbol for zero is the key to the effectiveness of the system, which was adopted by Arab mathematicians in Baghdad and passed on to the Arabs farther west. There is some evidence to suggest that the numerals in their current form developed from Arabic letters in the Maghreb; the current form of the numerals developed in North Africa, distinct in form from the Indian and eastern Arabic numerals. The use of Arabic numerals spread around the world through European trade and books. The term "Arabic numerals" is ambiguous: it most commonly refers to the numeral forms widely used in Europe, but it is also the name for the entire family of related numerals of Arabic and Indian origin, and it may also be intended to mean the numerals used by Arabs. It would often be more appropriate to refer to the Arabic numeral system, in which the value of a digit in a number depends on its position. The decimal Hindu–Arabic numeral system was developed in India by AD 700. The development was gradual, spanning several centuries, but the decisive step was probably provided by Brahmagupta's formulation of zero as a number in AD 628. The system was revolutionary in including zero in positional notation, thereby limiting the number of digits to ten. It is considered an important milestone in the development of mathematics. One may distinguish between this positional system, which is identical throughout the family, and the precise glyphs used to write the numerals, which vary regionally. The glyphs most commonly used in conjunction with the Latin script since early modern times are 0 1 2 3 4 5 6 7 8 9. The first universally accepted inscription containing the use of the 0 glyph in India is recorded in the 9th century, in an inscription at Gwalior in Central India dated to 870.
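Positional interpretation means each digit contributes its face value times a power of ten determined by its position, so 975 = 9×100 + 7×10 + 5×1. A minimal sketch of that reading:

```python
def positional_value(digits):
    """Interpret a digit sequence positionally in base 10: "975" -> 9*100 + 7*10 + 5."""
    value = 0
    for d in digits:
        value = value * 10 + int(d)   # shift previous digits left, add the new one
    return value

print(positional_value("975"))   # 975
print(positional_value("0975"))  # 975: zero adds nothing, but holds its position
```

The second example shows why the zero symbol is the key to the system: it contributes no value itself, yet without it there would be no way to distinguish, say, 905 from 95.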
Numerous Indian documents on copper plates exist with the symbol for zero in them, dated back as far as the 6th century AD, and inscriptions in Indonesia and Cambodia dating to AD 683 have also been found. The work of Middle Eastern mathematicians was principally responsible for the diffusion of the Indian system of numeration in the Middle East and the West. In the 10th century, Middle Eastern mathematicians extended the decimal system to include fractions, and the decimal point notation was introduced by Sind ibn Ali, who also wrote the earliest treatise on Arabic numerals. Ghubar numerals themselves are probably of Roman origin; some popular myths have argued that the original forms of these symbols indicated their numeric value through the number of angles they contained, but no evidence exists of any such origin. In 825 Al-Khwārizmī wrote a treatise in Arabic, On the Calculation with Hindu Numerals; Algoritmi, the translator's rendition of the author's name, gave rise to the word algorithm.

16.
Telegraph key
–
Telegraph key is a general term for any switching device used primarily to send Morse code. Similar keys are used for all forms of telegraphy, such as in "wire" or electrical telegraph. Since its original inception, the telegraph key's design has developed such that there are now multiple types of keys. A straight key is the common telegraph key as seen in various movies: a bar with a knob on top and a contact underneath. When the bar is depressed against spring tension, it closes a circuit. Traditionally, American telegraph keys had flat-topped knobs and narrow bars, while British telegraph keys had ball-shaped knobs and thick bars; this appears to be purely a matter of culture and training, but the users of each are tremendously partisan. Straight keys have been made in many variations for over 150 years, and they are the subject of a community of key collectors. The straight keys used in wire telegraphy also had a shorting bar that closed the electrical circuit when the operator was not actively sending messages; this completed the path to the next station so that its sounder would operate. Although occasionally included in later keys for reasons of tradition, the shorting bar is unnecessary for radio telegraphy. The straight key is simple and reliable, but the pumping action needed to send a string of dots poses some significant drawbacks. Transmission speeds vary from 5 words per minute, by novice operators, to several times that by skilled ones. In the early days of telegraphy, a number of professional telegraphers developed a repetitive stress injury known as "glass arm" or "telegrapher's paralysis". Glass arm may be reduced or eliminated by increasing the side play of the key through loosening the adjustable trunnion screws, and such problems can also be avoided by using a good technique. The first widely accepted alternative key was the sideswiper or sidewinder, sometimes called a cootie key. This key uses a side-to-side action with contacts in both directions and the arm spring-loaded to return to center.
A series of dits could be sent by rocking the arm back and forth; the alternating action produces a distinctive rhythm or "swing" which noticeably affects the operator's transmission style, and this alternation of elements is also the origin of the term iambic applied to later keys. Although the sideswiper is seldom seen or used today, nearly all advanced keys use some form of side-to-side action. A popular side-to-side mechanical key is the semi-automatic key or "bug", sometimes known as a Vibroplex key, after the company that first manufactured them.

17.
IBM
–
International Business Machines Corporation is an American multinational technology company headquartered in Armonk, New York, United States, with operations in over 170 countries. The company originated in 1911 as the Computing-Tabulating-Recording Company and was renamed International Business Machines in 1924. IBM manufactures and markets computer hardware, middleware and software, and offers hosting and consulting services in areas ranging from mainframe computers to nanotechnology. IBM is also a major research organization, holding the record for most patents generated by a business for 24 consecutive years. IBM has continually shifted its business mix by exiting commoditizing markets and focusing on higher-value markets. In 2014, IBM announced that it would go fabless, continuing to design semiconductors but offloading manufacturing to GlobalFoundries. Nicknamed Big Blue, IBM is one of 30 companies included in the Dow Jones Industrial Average and one of the world's largest employers, with nearly 380,000 employees. Known as IBMers, IBM employees have been awarded five Nobel Prizes, six Turing Awards, and ten National Medals of Technology. In the 1880s, technologies emerged that would ultimately form the core of what would become International Business Machines. On June 16, 1911, four companies were amalgamated in New York State by Charles Ranlett Flint, forming a fifth company, the Computing-Tabulating-Recording Company, based in Endicott, New York. The combined companies had 1,300 employees and offices and plants in Endicott and Binghamton, New York; Dayton, Ohio; Detroit, Michigan; Washington, D.C.; and Toronto. They manufactured machinery for sale and lease, ranging from commercial scales and industrial time recorders to meat and cheese slicers, tabulators, and punched cards. Thomas J. Watson, Sr., fired from the National Cash Register Company by John Henry Patterson, called on Flint; Watson joined CTR as General Manager and then, 11 months later, was made President when court cases relating to his time at NCR were resolved. Having learned Patterson's pioneering business practices, Watson proceeded to put the stamp of NCR onto CTR's companies, and his favorite slogan, THINK, became a mantra for each company's employees. During Watson's first four years, revenues more than doubled to $9 million. Watson had never liked the clumsy hyphenated title of CTR, and in 1924 chose to replace it with the more expansive title International Business Machines. By 1933 most of the subsidiaries had been merged into one company. In 1937, IBM's tabulating equipment enabled organizations to process unprecedented amounts of data, its clients including the U.S. government. During the Second World War the company produced small arms for the American war effort. In 1949, Thomas Watson, Sr. created IBM World Trade Corporation, a subsidiary of IBM focused on foreign operations, and in 1952 he stepped down after almost 40 years at the company helm. In 1957, the FORTRAN scientific programming language was developed. In 1961, IBM developed the SABRE reservation system for American Airlines, and in 1963 IBM employees and computers helped NASA track the orbital flight of the Mercury astronauts. A year later it moved its headquarters from New York City to Armonk. The latter half of the 1960s saw IBM continue its support of space exploration. On April 7, 1964, IBM announced the first computer system family, the IBM System/360.

18.
Binary-coded decimal
–
In computing and electronic systems, binary-coded decimal (BCD) is a class of binary encodings of decimal numbers in which each decimal digit is represented by a fixed number of bits, usually four or eight. Special bit patterns are sometimes used for a sign or for other indications. The precise 4-bit encoding may vary, however; for technical reasons, the ten states representing a BCD decimal digit are sometimes called tetrades, with the unused don't-care states named pseudo-tetrads or pseudo-decimal digits. BCD's principal drawback is an increase in the complexity of the circuits needed to implement basic arithmetic. BCD was used in many early computers, and is implemented in the instruction set of machines such as the IBM System/360 series and its descendants. BCD takes advantage of the fact that any one decimal numeral can be represented by a four-bit pattern. The most obvious way of encoding digits is natural BCD, where each decimal digit is represented by its corresponding four-bit binary value; this is also called 8421 encoding. Other encodings are also used, including the so-called 4221 and 7421 codes (named after the weightings used for the bits) and excess-3. For example, the BCD digit 6, 0110b in 8421 notation, is 1100b in 4221 and 0110b in 7421. In packed BCD, two numerals are encoded into a single byte, with one numeral in the least significant nibble and the other in the most significant nibble; to represent numbers larger than the range of a single byte, any number of contiguous bytes may be used. Note that packed BCD is more efficient in storage usage than unpacked BCD: encoding the same number in unpacked format would consume twice the storage. Shifting and masking operations are used to pack or unpack a packed BCD digit, and other logical operations are used to convert a numeral to its equivalent bit pattern or to reverse the process.
BCD is very common in systems where a numeric value is to be displayed, especially in systems consisting solely of digital logic. By employing BCD, the manipulation of numerical data for display can be greatly simplified by treating each digit as a separate single sub-circuit. This matches much more closely the physical reality of display hardware; a designer might choose to use a series of separate identical seven-segment displays to build a metering circuit, for example. If the numeric quantity were stored and manipulated as pure binary, interfacing to such a display would require complex circuitry; therefore, in cases where the calculations are relatively simple, working throughout with BCD can lead to a simpler overall system than converting to and from binary. Most pocket calculators do all their calculations in BCD. The same argument applies when hardware of this type uses an embedded microcontroller or other small processor: often, smaller code results when representing numbers internally in BCD format. For these applications, some small processors feature BCD arithmetic modes, which assist when writing routines that manipulate BCD quantities. In packed BCD, each of the two nibbles of each byte represents a decimal digit. Packed BCD has been in use since at least the 1960s and is implemented in all IBM mainframe hardware since then. Most implementations are big endian, i.e. with the more significant digit in the upper half of each byte.
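The shift-and-mask manipulation of packed BCD described above can be sketched in a few lines of Python, packing two 8421-encoded digits per byte with the more significant digit in the upper nibble, as in the big-endian convention just mentioned:

```python
def to_packed_bcd(n):
    """Encode a non-negative integer as packed BCD: two decimal digits per byte,
    high digit in the most significant nibble (8421 encoding)."""
    digits = [int(d) for d in str(n)]
    if len(digits) % 2:                 # pad to an even number of digits
        digits.insert(0, 0)
    return bytes((digits[i] << 4) | digits[i + 1]
                 for i in range(0, len(digits), 2))

def from_packed_bcd(data):
    """Decode packed BCD back to an integer via shift and mask operations."""
    n = 0
    for byte in data:
        n = n * 100 + (byte >> 4) * 10 + (byte & 0x0F)
    return n

print(to_packed_bcd(1234).hex())             # 1234
print(from_packed_bcd(bytes([0x12, 0x34])))  # 1234
```

Notice that the hex dump of the packed bytes reads off as the decimal digits themselves, which is precisely the property that makes BCD convenient for displays and for debugging memory dumps.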

19.
IBM 1401
–
The IBM 1401 is a variable wordlength decimal computer that was announced by IBM on October 5, 1959. Over 12,000 units were produced, and many were leased or resold after they were replaced with newer technology; the 1401 was withdrawn on February 8, 1971. Its features include high-speed card punching and reading, magnetic tape input and output, high-speed printing, stored program, and arithmetic and logical ability. The 1401 could be operated as an independent system or in conjunction with IBM punched card equipment. Monthly rental for 1401 configurations started at US$2,500. IBM was pleasantly surprised to receive 5,200 orders in just the first five weeks, more than predicted for the entire life of the machine. By late 1961, the 2,000 units installed in the USA were about one quarter of all electronic stored-program computers by all manufacturers, and the number of installed 1401s peaked above 10,000 in the mid-1960s; at one point nearly half of all computer systems in the world were 1401-type systems. The system was marketed until February 1971. Commonly used by small businesses as their primary data processing machines, the 1401 was also frequently used as an off-line peripheral controller for mainframe computers. In such installations, with an IBM 7090 for example, the mainframe used only magnetic tape for input-output: it was the 1401 that transferred input data from slow peripherals to tape, and transferred output data from tape to the card punch and the printer. This kept the mainframe's throughput from being limited by the speed of a card reader or printer. During the 1970s, IBM installed many 1401s in India and Pakistan, where they were in use well into the 1980s, and some of today's Indian and Pakistani software entrepreneurs started on these 1401s. The first computer in Pakistan, for example, was a 1401 installed at Pakistan International Airlines. Each alphanumeric character in the 1401 was encoded by six bits, called B, A, 8, 4, 2, 1.
The B and A bits were called zone bits, and the 8, 4, 2 and 1 bits were called numeric bits. For the digits 1 through 9, the bits B and A were zero and the digit was BCD-encoded in bits 8, 4, 2, 1. Thus the letter A, punches 12 and 1 in the punched card character code, was encoded B, A, 1. Encodings of punched card characters with two or more digit punches can be found in the character and op-code table. IBM called the 1401's character code BCD, even though that term describes only the digit encoding. The 1401's alphanumeric collating sequence was compatible with the punched card collating sequence. Associated with each memory location were two other bits, called C for odd parity check and M for word mark; each memory location, then, had the following bits: C B A 8 4 2 1 M. The 1401 was available in six memory configurations: 1,400, 2,000, 4,000, 8,000, 12,000, or 16,000 characters. Each character was addressable, with addresses ranging from 0 through 15,999. A very small number of 1401s were expanded to 32,000 characters by special request. Some operations used specific memory locations.
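The BA8421 encoding and the odd-parity C bit described above can be sketched in a few lines. The zone and digit values for the letter A (card punches 12 and 1 encoding as B, A, 1) come from the text; treating B as the high-order bit of the six is an assumption made here for illustration:

```python
# Bit weights for the 1401's six-bit character code, with B taken as the most
# significant bit (an illustrative ordering; the text names the bits B,A,8,4,2,1).
B, A = 32, 16

def encode_digit(d):
    """Digits 1-9: zone bits B and A are zero; the digit is BCD in bits 8,4,2,1."""
    return d  # the 8421 BCD of 1..9 is simply the value itself

def odd_parity_check_bit(code):
    """The C bit is set so that the total number of one-bits is odd."""
    ones = bin(code).count("1")
    return 0 if ones % 2 == 1 else 1

letter_A = B | A | 1                 # card punches 12-and-1 encode as B, A, 1
print(format(letter_A, "06b"))       # 110001
print(odd_parity_check_bit(letter_A))          # 0: three one-bits, already odd
print(odd_parity_check_bit(encode_digit(5)))   # 1: 101 has two one-bits
```

The parity bit let the hardware detect any single-bit error in a stored character, a significant reliability feature for core memory of the era.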

20.
IBM 1620
–
The IBM 1620 was announced by IBM on October 21, 1959, and marketed as an inexpensive scientific computer. After a total production of about two thousand machines, it was withdrawn on November 19, 1970. Modified versions of the 1620 were used as the CPU of the IBM 1710. Core memory cycle times were 20 microseconds for the Model I and 10 microseconds for the Model II. For an explanation of all three known interpretations of the machine's code name, see the section on its development history. It was a variable word-length decimal computer with a memory that could hold anything from 20,000 to 60,000 decimal digits, increasing in 20,000-digit increments. Memory was accessed two decimal digits at a time, and each digit carried a flag bit. The flag was set to mark the most significant digit of a number; in the least significant digit of 5-digit addresses it was set for indirect addressing, and in the middle 3 digits of 5-digit addresses flags were set to select one of 7 index registers. Some instructions, such as the B instruction, only used the P address. Fixed-point data words could be any size from two decimal digits up to all of memory not used for other purposes; floating-point data words could be any size from 4 decimal digits up to 102 decimal digits. The machine had no programmer-accessible registers: all operations were memory to memory. The tables below list the alphameric mode characters and the numeric mode characters. The Model I used the Cyrillic character Ж on the typewriter as a general-purpose invalid character with correct parity; in some 1620 installations it was called a SMERSH, after the organization in the James Bond novels that had become popular in the 1960s. The Model II used a new character, ❚, as a general-purpose invalid character with correct parity. The machine's paper tape reading support could not properly read tapes containing record marks, since record marks are used to terminate the characters read into storage.
Most 1620 installations used the more convenient punched card input/output rather than paper tape. The successor to the 1620, the IBM 1130, was based on a totally different, 16-bit binary architecture. The Monitors provided disk-based versions of 1620 SPS IId and FORTRAN IId, as well as a DUP; both Monitor systems required 20,000 digits or more of memory and one or more 1311 disk drives. A standard preliminary was to clear the computer's magnetic-core memory of any previous user's detritus. This was effected by using the console facilities to load a simple program, typing its machine code at the console typewriter, running it, and stopping it. This was not challenging, as only one instruction was needed, such as 160001000000 loaded at address zero. This was the normal machine-code means of copying a constant of up to five digits: the digit string was addressed at its end and extended through lower addresses until a digit with a flag marked its end. But for this instruction no flag would ever be found, because the source digits had shortly before been overwritten by digits lacking a flag. Each 20,000-digit module of memory took just under one second to clear.
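The memory-clearing trick can be made concrete with a toy simulation. This is a simplified sketch, not a cycle-accurate 1620 model: memory is shrunk for speed, and the field-transmit is reduced to the behavior the text describes (source and destination both walking toward lower addresses, wrapping around, stopping only at a flagged source digit).

```python
# Toy simulation of the memory-clearing trick described above. Memory is
# a list of (digit, flag) pairs. The transmit copies digits from a source
# field to a destination field one address lower, both moving toward lower
# addresses and wrapping around. It stops only when a source digit carries
# a flag; since the instruction was typed without flags and every copied
# digit is itself flagless, the sweep covers all of memory.

N = 2000  # reduced from 20,000 digits to keep the demo fast

memory = [(5, True)] * N                 # previous user's "detritus"
for i, ch in enumerate("160001000000"):  # the instruction, typed flagless
    memory[i] = (int(ch), False)

src, dst = 11, 10    # source: last digit of the instruction; dest: one lower
steps = 0
while steps < N:     # in reality the operator stopped the machine by hand
    digit, flag = memory[src]
    memory[dst] = (digit, False)
    if flag:
        break        # a flagged source digit would end the transmit
    src = (src - 1) % N
    dst = (dst - 1) % N
    steps += 1

# The sweep wrote a flagless zero into every location.
assert all(d == 0 and not f for d, f in memory)
```

Because the destination trails the source by one address, every digit the source reads was itself written (flagless, and ultimately zero) on the previous step, so the terminating flag never appears, exactly as the text explains.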

21.
IBM 700/7000 series
–
The IBM 700/7000 series is a series of large-scale computer systems that were made by IBM through the 1950s and early 1960s. The series includes several different, incompatible processor architectures. The 700s use vacuum tube logic and were made obsolete by the introduction of the transistorized 7000s; the 7000s, in turn, were replaced by System/360. However, the 360/65, the first 360 powerful enough to replace 7000s, did not become available until late 1965; early problems with OS/360 and the high cost of converting software kept many 7000s in service for years afterward. All machines use magnetic core memory, except for early 701 and 702 models. Early computers were sold without software. The System/360 combined the best features of the 7000 and 1400 series architectures into a single design; however, some 360 models have optional features that allow them to emulate the 1400 and 7000 instruction sets in microcode. While the architectures differ, the machines in the class share electronics technologies. Tape drives use the 7-track format, with the IBM 727 for the vacuum tube machines. Both the vacuum tube and most transistor models use the same card readers, card punches and line printers that were introduced with the 701. These units, the IBM 711, 721 and 716, are based on IBM accounting machine technology and are relatively slow, so it was common for 7000 series installations to include an IBM 1401, with its much faster peripherals, to do card-to-tape and tape-to-line-printer operations off-line. Three later machines were the 7010, the 7040 and the 7044. Some of the technology for the 7030 was used in data channels and peripheral devices on other 7000 series computers, e.g. the 7340 Hypertape. The 701 was known as the Defense Calculator while in development in the IBM Poughkeepsie Laboratory. Data formats: numbers are either 36 bits or 18 bits long, fixed point only, and fixed-point numbers are stored in binary sign/magnitude format. Instruction format: instructions are 18 bits long, single address.
The first machines were the vacuum-tube 704 and 709, followed by the transistorized 7090, 7094 and 7094-II; the ultimate model was the Direct Coupled System, consisting of a 7094 linked to a 7044 that handled input and output operations. Data formats: numbers are 36 bits long, both fixed point and floating point, and fixed-point numbers are stored in binary sign/magnitude format. Instruction format: the basic instruction format is a three-bit prefix, a fifteen-bit decrement, a three-bit tag, and a fifteen-bit address. The prefix field specifies the class of instruction; the decrement field often contains an immediate operand to modify the results of the operation, or is used to further define the instruction type. The three bits of the tag specify three index registers, the contents of which are subtracted from the address to produce an effective address; the address field either contains an address or an immediate operand. The index registers operate using two's-complement format and, when used to modify an instruction address, are subtracted from the address in the instruction. On machines with three index registers, if the tag has two or three bits set, then the selected registers' values are ORed together before being subtracted.
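The subtractive, OR-combined indexing just described can be sketched as follows. This is an illustrative model, not IBM documentation: the function and parameter names are mine, and the mapping of tag bits to registers is an assumption for the demo.

```python
# Illustrative sketch of the effective-address calculation described
# above. The 3-bit tag selects index registers; on three-register
# machines, selecting more than one ORs their contents together, and the
# combined value is subtracted from the 15-bit address field, wrapping
# modulo 2**15.

def effective_address(address, tag, xr):
    """xr maps a tag bit (1, 2 or 4) to that index register's contents."""
    combined = 0
    for bit in (1, 2, 4):
        if tag & bit:
            combined |= xr.get(bit, 0)   # multiple selected registers are ORed
    return (address - combined) % (1 << 15)  # subtractive indexing, 15 bits

assert effective_address(100, 0b001, {1: 10}) == 90        # one register
assert effective_address(100, 0b011, {1: 5, 2: 3}) == 93   # 5 | 3 == 7
assert effective_address(5, 0b001, {1: 10}) == 32763       # wraps mod 2**15
```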

22.
Integer
–
An integer is a number that can be written without a fractional component. For example, 21, 4, 0, and −2048 are integers, while 9.75 and 5 1⁄2 are not. The set of integers consists of zero, the positive natural numbers, also called whole numbers or counting numbers, and their additive inverses. This set is often denoted by a boldface Z or blackboard bold ℤ, standing for the German word Zahlen. ℤ is a subset of the sets of rational and real numbers and, like the natural numbers, is countably infinite. The integers form the smallest group and the smallest ring containing the natural numbers. In algebraic number theory, the integers are sometimes called rational integers to distinguish them from the more general algebraic integers; in fact, the rational integers are the algebraic integers that are also rational numbers. Like the natural numbers, Z is closed under the operations of addition and multiplication: the sum and product of any two integers is an integer. Moreover, with the inclusion of the negative natural numbers and, importantly, 0, Z is also closed under subtraction. The integers form a ring which is the most basic one, in the following sense: for any unital ring, there is a unique ring homomorphism from the integers into this ring. This universal property, namely being an initial object in the category of rings, characterizes the ring Z. Z is not closed under division, since the quotient of two integers need not be an integer; and although the natural numbers are closed under exponentiation, the integers are not. The following lists some of the properties of addition and multiplication for any integers a, b and c. In the language of abstract algebra, the first five properties listed above for addition say that Z under addition is an abelian group. As a group under addition, Z is a cyclic group; in fact, Z under addition is the only infinite cyclic group, in the sense that any infinite cyclic group is isomorphic to Z. The first four properties listed above for multiplication say that Z under multiplication is a commutative monoid. However, not every integer has a multiplicative inverse; e.g. there is no integer x such that 2x = 1, because the left-hand side is even.
This means that Z under multiplication is not a group. All the rules from the above property table, except for the last, taken together say that Z together with addition and multiplication is a commutative ring with unity. It is the prototype of all objects of such algebraic structure. Only those equalities of expressions that are true in any unital commutative ring are true in Z for all values of the variables; note that certain non-zero integers map to zero in certain rings. The lack of zero-divisors in the integers means that the commutative ring Z is an integral domain.
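The closure properties above are easy to check directly. A quick illustration in Python (where `int` plays the role of Z and `/` and `**` can leave it):

```python
# The integers are closed under addition, subtraction and multiplication,
# but not under division or (negative) exponentiation.

a, b = 7, -3
assert isinstance(a + b, int)
assert isinstance(a - b, int)
assert isinstance(a * b, int)

# The quotient of two integers need not be an integer:
assert not isinstance(a / b, int)   # 7 / -3 is a float in Python

# Exponentiation can leave the integers too:
assert 2 ** -1 == 0.5

# And there is no integer x with 2 * x == 1, since the left side is even:
assert all(2 * x != 1 for x in range(-1000, 1000))
```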

23.
Writing system
–
A writing system is any conventional method of visually representing verbal communication. While both writing and speech are useful in conveying messages, writing differs in also being a form of information storage. The processes of encoding and decoding writing systems involve shared understanding between writers and readers of the meaning behind the sets of characters that make up a script. The general attributes of writing systems can be placed into broad categories such as alphabets, syllabaries, or logographies, and any particular system can have attributes of more than one category. In the alphabetic category, there is a standard set of letters for consonants and vowels that encode based on the general principle that the letters represent speech sounds. In a syllabary, each symbol correlates to a syllable or mora; in a logography, each character represents a word, morpheme, or other semantic unit. Other categories include abjads, which differ from alphabets in that vowels are not indicated. Alphabets typically use a set of 20 to 35 symbols to fully express a language, whereas syllabaries can have 80 to 100, and logographies can have several hundred symbols. Systems also enable the stringing together of these symbols in order to allow a full expression of the language. The reading step can be accomplished purely in the mind as an internal process. Writing systems were preceded by proto-writing, which used pictograms, ideograms and other mnemonic symbols; proto-writing lacked the ability to capture and express a full range of thoughts. Soon after its invention, writing provided a form of long-distance communication, and with the advent of publishing, it provided the medium for a form of mass communication. Writing systems are distinguished from other possible symbolic communication systems in that a writing system is always associated with at least one spoken language.
In contrast, visual representations such as drawings, paintings, and non-verbal items on maps, such as contour lines, are not language-related. Some other symbols, such as numerals and the ampersand, are not directly linked to any specific language. Every human community possesses language, which many regard as an innate and defining condition of humanity. However, the development of writing systems, and the process by which they have supplanted traditional oral systems of communication, have been sporadic and uneven. Once established, writing systems generally change more slowly than their spoken counterparts; thus they often preserve features and expressions which are no longer current in the spoken language. One of the benefits of writing systems is that they can preserve a permanent record of information expressed in a language. In the examination of individual scripts, the study of writing systems has developed along partially independent lines; thus, the terminology employed differs somewhat from field to field.

24.
UTF-8
–
UTF-8 is a character encoding capable of encoding all possible characters, or code points, defined by Unicode, originally designed by Ken Thompson and Rob Pike. The encoding is variable-length and uses 8-bit code units. It was designed for backward compatibility with ASCII and to avoid the complications of endianness and byte order marks that affect the alternative UTF-16 and UTF-32 encodings. The name is derived from Unicode Transformation Format – 8-bit. UTF-8 is the dominant character encoding for the World Wide Web, accounting for 88.9% of all Web pages in April 2017, and the Internet Mail Consortium recommended that all e-mail programs be able to display and create mail using UTF-8. UTF-8 encodes each of the 1,112,064 valid code points in Unicode using one to four 8-bit bytes; code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. The following table shows the structure of the encoding; the x characters are replaced by the bits of the code point. If the number of significant bits is no more than 7, the first line applies; if no more than 11 bits, the second line applies, and so on. The first 128 characters need one byte. Three bytes are needed for characters in the rest of the Basic Multilingual Plane, and four bytes are needed for characters in the other planes of Unicode, which include less common CJK characters, various historic scripts, mathematical symbols, and emoji. The salient features of this scheme are as follows. Backward compatibility: one-byte codes are used for the ASCII values 0 through 127. Clear indication of byte-sequence length: the first byte indicates the number of bytes in the sequence, since the length of a multi-byte sequence is simply the number of high-order 1s in the leading byte. Self-synchronization: the leading bytes and the continuation bytes do not share values, which means a search will not accidentally find the sequence for one character starting in the middle of another character.
It also means the start of a character can be found from any position by backing up at most three bytes to find the leading byte. Consider the encoding of the Euro sign, €. The Unicode code point for € is U+20AC; according to the table above, this will take three bytes to encode, since it is between U+0800 and U+FFFF. Hexadecimal 20AC is binary 0010 0000 1010 1100; the two leading zeros are added because, as the scheme table shows, a three-byte encoding needs exactly sixteen bits from the code point. The first four of these bits are stored in the low-order bits of the leading byte, whose high-order bits 1110 mark it as the start of a three-byte sequence. All continuation bytes contain exactly six bits from the code point, so the next six bits of the code point are stored in the low-order six bits of the next byte, and 10 is stored in the high-order two bits to mark it as a continuation byte. Finally the last six bits of the code point are stored in the low-order six bits of the final byte. The three bytes 11100010 10000010 10101100 can be written more concisely in hexadecimal as E2 82 AC.
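The worked example above can be carried out in code: pack U+20AC into the three-byte pattern 1110xxxx 10xxxxxx 10xxxxxx by hand, then check the result against Python's built-in encoder.

```python
# Manually encode the Euro sign (U+20AC) in UTF-8, following the
# three-byte scheme described in the text.

cp = 0x20AC                      # Unicode code point of the Euro sign
assert 0x0800 <= cp <= 0xFFFF    # the three-byte range

b1 = 0b11100000 | (cp >> 12)           # leading byte: top four bits
b2 = 0b10000000 | ((cp >> 6) & 0x3F)   # continuation: next six bits
b3 = 0b10000000 | (cp & 0x3F)          # continuation: low six bits

encoded = bytes([b1, b2, b3])
print(encoded.hex())                   # e282ac
assert encoded == "€".encode("utf-8")  # matches Python's UTF-8 encoder
```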

25.
String (computer science)
–
In computer programming, a string is traditionally a sequence of characters, either as a literal constant or as some kind of variable. The latter may allow its elements to be mutated and the length changed. A string is generally understood as a data type and is often implemented as an array of bytes that stores a sequence of elements, typically characters, using some character encoding. A string may also denote more general arrays or other sequence data types and structures. When a string appears literally in source code, it is known as a string literal or an anonymous string. Strings are also fundamental in formal languages, which are used in logic and theoretical computer science. Let Σ be a non-empty finite set of symbols, called the alphabet; no assumption is made about the nature of the symbols. A string over Σ is any finite sequence of symbols from Σ. For example, if Σ = {0, 1}, then 01011 is a string over Σ. The length of a string s is the number of symbols in s and can be any non-negative integer; it is often denoted |s|. The empty string is the unique string over Σ of length 0. The set of all strings over Σ of length n is denoted Σn; for example, if Σ = {0, 1}, then Σ2 = {00, 01, 10, 11}. Note that Σ0 = {ε} for any alphabet Σ. The set of all strings over Σ of any length is the Kleene closure of Σ and is denoted Σ*. In terms of Σn, Σ* = Σ0 ∪ Σ1 ∪ Σ2 ∪ ⋯. Although the set Σ* itself is countably infinite, each element of Σ* is a string of finite length. A set of strings over Σ is called a formal language over Σ. For example, if Σ = {0, 1}, the set of strings with an even number of zeros is a formal language over Σ. Concatenation is an important binary operation on Σ*: for any two strings s and t in Σ*, their concatenation is defined as the sequence of symbols in s followed by the sequence of symbols in t, and is denoted st. For example, if s = bear and t = hug, then st = bearhug. String concatenation is an associative, but non-commutative, operation.
The empty string ε serves as the identity element: for any string s, εs = sε = s. Therefore, the set Σ* and the concatenation operation form a monoid, the free monoid generated by Σ. In addition, the length function defines a monoid homomorphism from Σ* to the non-negative integers, since |st| = |s| + |t|. A string s is said to be a substring or factor of t if there exist strings u and v such that t = usv.
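Python strings under `+` behave exactly like the monoid described above, which makes the algebraic properties easy to demonstrate:

```python
# Concatenation of strings: associative but not commutative, with the
# empty string as identity, and length as a monoid homomorphism to the
# non-negative integers under addition.

s, t, u = "bear", "hug", "s"

assert s + t == "bearhug"
assert (s + t) + u == s + (t + u)      # associative
assert s + t != t + s                  # not commutative
assert "" + s == s + "" == s           # empty string is the identity
assert len(s + t) == len(s) + len(t)   # length is a monoid homomorphism
```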

26.
Plane (Unicode)
–
In the Unicode standard, a plane is a continuous group of 65,536 code points. There are 17 planes, identified by the numbers 0 to 16. Plane 0 is the Basic Multilingual Plane (BMP), which contains the most commonly used characters. The higher planes 1 through 16 are called supplementary planes or, humorously, astral planes. As of Unicode version 9.0, six of the planes have assigned code points, and four are named. The limit of 17 planes is due to the design of UTF-16, which can encode the 16 supplementary planes and the BMP, up to the value 0x10FFFF. The encoding scheme used by UTF-8 was originally designed with a larger limit of 2^31 code points, but since Unicode limits the code points to the 17 planes that can be encoded by UTF-16, code points above 0x10FFFF are invalid in UTF-8. The 17 planes can accommodate 1,114,112 code points; of these, 2,048 are surrogates, 66 are non-characters, and 137,468 are reserved for private use, leaving 974,530 for public assignment. Planes are further subdivided into Unicode blocks, which, unlike planes, do not have a fixed size. The 273 blocks defined in Unicode 9.0 cover 24% of the possible code point space and range in size from a minimum of 16 code points to a maximum of 65,536 code points. For future usage, ranges of code points have been mapped out for most known current and ancient writing systems. The first plane, plane 0, the Basic Multilingual Plane, contains characters for almost all modern languages. A primary objective of the BMP is to support the unification of prior character sets as well as characters for writing; most of the assigned code points in the BMP are used to encode Chinese, Japanese, and Korean characters. The High Surrogate and Low Surrogate code ranges are reserved for encoding supplementary-plane characters in UTF-16 by using a pair of 16-bit codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character. 65,408 of the 65,536 code points in this plane have been allocated to a Unicode block, leaving just 128 code points in unallocated ranges.
As of Unicode 9.0, the BMP comprises 161 blocks. Plane 1, the Supplementary Multilingual Plane, contains historic scripts, including Linear B, Egyptian hieroglyphs, and cuneiform, and also reform orthographies like Shavian and Deseret. Symbols and notations include historic and modern musical notation, mathematical alphanumerics, emoji and other symbol sets, and game symbols for playing cards and Mah Jongg. Plane 3 is tentatively named the Tertiary Ideographic Plane, but as of version 9.0 there are no characters assigned to it; it is reserved for Oracle Bone script, Bronze script, Small Seal script, additional CJK unified ideographs, and other historic ideographic scripts. It is not anticipated that all these planes will be used in the foreseeable future, though the number of possible symbol characters that could arise outside of the context of writing systems is potentially huge. At the moment, 11 of the 17 planes are unused. Plane 14, the Supplementary Special-purpose Plane, currently contains non-graphical characters.
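Two of the calculations above are short enough to show in code: which plane a code point belongs to (each plane spans 0x10000 code points), and how UTF-16 represents a supplementary-plane character as a surrogate pair. This is a sketch of the standard surrogate arithmetic; the function names are mine.

```python
# Plane number and UTF-16 surrogate-pair arithmetic for Unicode code points.

def plane(cp):
    """Plane of a code point: 0 is the BMP, 1-16 are supplementary."""
    return cp >> 16

def surrogate_pair(cp):
    """High/low surrogate pair for a supplementary-plane code point."""
    assert 0x10000 <= cp <= 0x10FFFF   # supplementary planes only
    offset = cp - 0x10000              # a 20-bit offset
    high = 0xD800 + (offset >> 10)     # top ten bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)    # low ten bits -> low surrogate
    return high, low

assert plane(0x20AC) == 0              # Euro sign: BMP
assert plane(0x1D11E) == 1             # musical symbol G clef: SMP
assert surrogate_pair(0x1D11E) == (0xD834, 0xDD1E)
```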

27.
Byte
–
The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer. The size of the byte has historically been hardware-dependent, and no standards existed that mandated the size. The de facto standard of eight bits is a convenient power of two permitting the values 0 through 255 for one byte; the international standard IEC 80000-13 codified this common meaning. Many types of applications use information representable in eight or fewer bits, and the popularity of major commercial computing architectures has aided in the ubiquitous acceptance of the 8-bit size. The unit symbol for the byte was designated as the upper-case letter B by the IEC and IEEE, in contrast to the bit. Internationally, the unit octet, symbol o, explicitly denotes a sequence of eight bits, eliminating the ambiguity of the byte. The word byte is a respelling of bite, to avoid accidental mutation to bit. Early computers used a variety of four-bit binary-coded decimal representations; these representations, which included alphanumeric characters and special graphical symbols, were used by the U.S. Government and universities during the 1960s. The prominence of the System/360 led to the ubiquitous adoption of the eight-bit storage size, even though in detail the EBCDIC and ASCII encoding schemes are different. In the early 1960s, AT&T introduced digital telephony first on long-distance trunk lines, which used the eight-bit µ-law encoding; this large investment promised to reduce transmission costs for eight-bit data. The development of microprocessors in the 1970s popularized this storage size. A four-bit quantity is called a nibble, also nybble. The term octet is used to unambiguously specify a size of eight bits and is used extensively in protocol definitions. Historically, the term octad or octade was used to denote eight bits as well, at least in Western Europe; however, this usage is no longer common.
The exact origin of the term is unclear, but it can be found in British, Dutch, and German sources of the 1960s and 1970s, and throughout the documentation of Philips mainframe computers. The unit symbol for the byte is specified in IEC 80000-13 and IEEE 1541. In the International System of Quantities, B is also the symbol of the bel, a unit of logarithmic power ratios named after Alexander Graham Bell, creating a conflict with the IEC specification. However, little danger of confusion exists, because the bel is a rarely used unit.

28.
Greek alphabet
–
The Greek alphabet is the ancestor of the Latin and Cyrillic scripts. In its classical and modern forms, the alphabet has 24 letters; Modern and Ancient Greek use different diacritics. In standard Modern Greek spelling, orthography has been simplified to the monotonic system. In both Ancient and Modern Greek, the letters of the Greek alphabet have fairly stable and consistent symbol-to-sound mappings, making pronunciation of words largely predictable, and Ancient Greek spelling was generally near-phonemic. Among consonant letters, all letters that denoted voiced plosive consonants and aspirated plosives in Ancient Greek stand for corresponding fricative sounds in Modern Greek. Sound changes have led to groups of vowel letters denoting identical sounds today; Modern Greek orthography remains true to the historical spellings in most of these cases. Several vowel letters and digraphs are involved in the mergers. Modern Greek speakers typically use the same modern symbol-to-sound mappings when reading Greek of all historical stages; in other countries, students of Ancient Greek may use a variety of conventional approximations of the historical sound system in pronouncing Ancient Greek. Several letter combinations have special conventional sound values different from those of their single components. Among them are several digraphs of vowel letters that formerly represented diphthongs but are now monophthongized; in addition to the three mentioned above, there is also ⟨ου⟩, pronounced /u/. The Ancient Greek diphthongs ⟨αυ⟩, ⟨ευ⟩ and ⟨ηυ⟩ are pronounced [av], [ev] and [iv] respectively in voicing environments in Modern Greek. The Modern Greek consonant combinations ⟨μπ⟩ and ⟨ντ⟩ stand for [b] and [d] respectively, and ⟨τζ⟩ stands for [dz]. In addition, both in Ancient and Modern Greek, the letter ⟨γ⟩, before another velar consonant, stands for the velar nasal; thus ⟨γγ⟩ and ⟨γκ⟩ are pronounced like English ⟨ng⟩. There are also the combinations ⟨γχ⟩ and ⟨γξ⟩. The Greek diacritic signs were originally designed to mark different forms of the phonological pitch accent in Ancient Greek.
The letter rho, although not a vowel, also carries a rough breathing in word-initial position. If a rho was geminated within a word, the first ρ always had the smooth breathing and the second the rough breathing, leading to the transliteration rrh. The vowel letters ⟨α, η, ω⟩ carry an additional diacritic in certain words, the iota subscript. This iota represents the former offglide of what were originally long diphthongs, ⟨ᾱι, ηι, ωι⟩. Another diacritic used in Greek is the diaeresis, indicating a hiatus. In 1982, a new, simplified orthography, known as monotonic, was adopted for use in Modern Greek by the Greek state. Although it is not a diacritic, the comma has a function as a silent letter in a handful of Greek words, principally distinguishing ό,τι ("whatever") from ότι ("that"). There are many different methods of rendering Greek text or Greek names in the Latin script. The form in which classical Greek names are conventionally rendered in English goes back to the way Greek loanwords were incorporated into Latin in antiquity. In this system, ⟨κ⟩ is replaced with ⟨c⟩, the diphthongs ⟨αι⟩ and ⟨οι⟩ are rendered as ⟨ae⟩ and ⟨oe⟩ respectively, and ⟨ει⟩ and ⟨ου⟩ are simplified to ⟨i⟩ and ⟨u⟩ respectively.

29.
Cyrillic script
–
The Cyrillic script /sᵻˈrɪlɪk/ is a writing system used for various alphabets across eastern Europe and north and central Asia. It is based on the Early Cyrillic alphabet, which was developed in the First Bulgarian Empire during the 9th century AD at the Preslav Literary School. As of 2011, around 252 million people in Eurasia use it as the alphabet for their national languages. With the accession of Bulgaria to the European Union on 1 January 2007, Cyrillic became the third official script of the European Union, following the Latin and Greek scripts. Cyrillic is derived from the Greek uncial script, augmented by letters from the older Glagolitic alphabet; these additional letters were used for Old Church Slavonic sounds not found in Greek. The script is named in honor of the two Byzantine brothers, Saints Cyril and Methodius, who created the Glagolitic alphabet earlier on; modern scholars believe that Cyrillic was developed and formalized by early disciples of Cyril and Methodius. In the early 18th century the Cyrillic script used in Russia was heavily reformed by Peter the Great: the new form of the letters became closer to the Latin alphabet, several archaic letters were removed, and several letters were personally designed by Peter the Great. West European typography culture was also adopted. Cyrillic script spread throughout the East and South Slavic territories, being adopted for writing local languages, such as Old East Slavic, and its adaptation to local languages produced a number of Cyrillic alphabets. Capital and lowercase letters were not distinguished in old manuscripts. Yeri was originally a ligature of Yer and I; iotation was indicated by ligatures formed with the letter І: Ꙗ, Ѥ, Ю, Ѩ, Ѭ. Sometimes different letters were used interchangeably, for example И = І = Ї, and there were also commonly used ligatures like ѠТ = Ѿ. The letters also had numeric values, based not on Cyrillic alphabetical order but inherited from the letters' Greek ancestors.
The early Cyrillic alphabet is difficult to represent on computers: many of the letterforms differed from modern Cyrillic, varied a great deal between manuscripts, and changed over time, and few fonts include adequate glyphs to reproduce the alphabet. The Unicode 5.1 standard, released on 4 April 2008, greatly improved computer support for the early Cyrillic script and the modern Church Slavonic language. In Microsoft Windows, Segoe UI is notable for having complete support for the archaic Cyrillic letters since Windows 8. The development of Cyrillic typography passed directly from the medieval stage to the late Baroque, without a Renaissance phase as in Western Europe. Late medieval Cyrillic letters show a tendency to be very tall and narrow. Peter the Great, Czar of Russia, mandated the use of westernized letterforms in the early 18th century, and over time these were largely adopted in the other languages that use the script. The development of some Cyrillic computer typefaces from Latin ones has also contributed to the visual Latinization of Cyrillic type. Cyrillic uppercase and lowercase letter forms are not as differentiated as in Latin typography.

Morse code is a method of transmitting text information as a series of on-off tones, lights, or clicks that can be …

Typical "straight key". This U.S. model J-38 was manufactured in huge quantities during World War II. The signal is "on" when the knob is pressed, and "off" when it is released. Length and timing of the dots and dashes are entirely controlled by the telegraphist.

Morse code receiver, recording on paper tape

A U.S. Navy Morse Code training class in 2015. The sailors will use their new skills to collect signals intelligence.

A commercially manufactured iambic paddle used in conjunction with an electronic keyer to generate high-speed Morse code, the timing of which is controlled by the electronic keyer. Manipulation of dual-lever paddles is similar to the Vibroplex, but pressing the right paddle generates a series of dahs, and squeezing the paddles produces a dit-dah-dit-dah sequence. The actions are reversed for left-handed operators.

A German manuscript page teaching use of Arabic numerals (Talhoffer Thott, 1459). At this time, knowledge of the numerals was still widely seen as esoteric, and Talhoffer presents them with the Hebrew alphabet and astrology.

The Baudot code, invented by Émile Baudot, is a character set predating EBCDIC and ASCII. It was the predecessor to the …

Paper tape with holes representing the "Baudot-Murray Code". Note the fully punched columns of "Delete/Letters select" codes at start of the message (on the right); these were used to cut the band easily between distinct messages. The message then starts with a Figure shift control followed by a carriage return.

International Telegraph Alphabet 2 (British variant)

Keyboard of a teleprinter using the Baudot code (US variant), with FIGS and LTRS shift keys