Having a Glossary meant I could reduce the text on most
pages, while expanding background for the definitions, and
relating the ideas to other similar, contradictory, or more
basic ideas.

Why Bother with Definitions?

The value of a definition is insight. But:

Simple descriptions are not always possible.

Terms have meaning within particular contexts.

Tedious examples may be required to expose the full meaning.

Good definitions can expose assumptions and provide a basis for
reasoning to larger conclusions.

Consider the idea that
cryptography is used to keep secrets:
We expect a
cipher to win each and every contest brought
by anyone who wishes to expose secrets.
We call those people
opponents, but who are they really, and
what can they do?
In practice, we cannot know.
Opponents operate in secret:
We do not know their names, nor how many they are, nor where they
work.
We do not know what they know, nor their level of experience or
resources, nor anything else about them.
Because we do not know our opponents, we also do not know what
they can do, including whether they can
break our ciphers.
Unless we know these things that cannot be known, we cannot tell
whether a particular cipher design will prevail in battle.
We cannot expect to know when our cipher has failed.

Even though the entire reason for using cryptography is to
protect secret information, it is
by definition impossible to know whether a cipher can
do that.
Nobody can know whether a cipher is strong enough, no matter
how well educated they are, or how experienced, or how well
connected, because they would have to know the opponents best of all.
The definition of cryptography implies a contest between a cipher
design and unknown opponents, and that means a successful outcome
cannot be guaranteed by anyone.

Sometimes the Significance is Implied

Consider the
cryptographer who says: "My cipher is
strong," and the
cryptanalyst who says: "I think your
cipher is weak."
Here we have two competing
claims with different sets of possibilities:
First, the cryptographer has the great disadvantage of not being
able to
prove cipher strength, nor to even list every
possible
attack so they can be checked.
In contrast, the cryptanalyst might be able to actually
demonstrate weakness, but only by dint of massive effort
which may not succeed, and will not be compensated even if it does.
Consequently, most criticisms will be extrapolations, possibly
based on experience, and also possibly wrong.

The situation is inherently unbalanced, with a bias against
the cryptographer's detailed and thought-out claims, and in favor of
mere hand-waving first thoughts from anyone who deigns to comment.
This is the ultimate conservative bias against anything new,
and for the status quo.
Supposedly the bias exists because, if the cryptographer's claim is
wrong, user secrets might be exposed.
But the old status-quo ciphers are in that same position.
Nothing about an old cipher makes it necessarily strong.

Unfortunately, for users to benefit from cryptography they have
to accept some strength argument.
Even more unfortunately:

Many years of trusted use do not testify about strength, but
do provide both motive and time for opponents to develop secret
attacks.

Many failures to break a cipher do not imply it is strong.

There can be no expertise on the strength of unbroken ciphers.

So on the one hand we need a cipher, and on the other have no way to
know how strong the various ciphers are.
For an industry, this is breathtakingly disturbing.

In modern society we purchase things to help us in some way.
We go to the store, buy things, and they work.
Or we notice the things do not work, and take them back.
We know to take things back because we can see the results.
Manufactured things work specifically because design and production
groups can
test which designs work better or worse or not
at all.
In contrast, if the goal of cryptography is to keep secrets, we
generally cannot expect to know whether our cipher has succeeded or
failed.
Cryptography cannot test the fundamental property of interest:
whether or not a secret has been kept.

The inability to test for the property we need is an extraordinary
situation; perhaps no other manufactured thing is like that.
Because the situation is unique, few understand the consequences.
Cryptography is not like other manufactured things: nobody can trust
it because nobody can test it.
Nobody, anywhere, no matter how well educated or experienced, can
test the ability of an unbroken cipher to keep a secret in practice.
Thus we see how mere definitions allow us to deduce fundamental
limitations on cryptography and cryptanalysis by simple reasoning
from a few basic facts.

Relationships Between Ideas

The desire to expose relationships between ideas meant expanding
the Glossary beyond cryptography per se to cover terms from
related areas like
electronics,
math,
statistics,
logic and
argumentation.
Logic and argumentation are especially important in cryptography,
where measures are few and math
proofs may not apply in practice.

This Crypto Glossary is directed toward anyone who wants a
better understanding of what cryptography can and cannot do.
It is intended to address basic cryptographic principles in ways
that allow them to be related,
argued, and deeply understood.
It is particularly concerned with fundamental limits on
cryptography, and contradictions between rational thought and
the current cryptographic wisdom.
Some of these results may be controversial.

The Glossary is intended to build the fundamental understandings
which lie at the base of all cryptographic reasoning, from novice
to professional and beyond.
It is particularly intended for users who wish to avoid being taken
in by attacker
propaganda.
(Propaganda is an expected part of cryptography, since it can cause
users to take actions which make things vastly easier for
opponents.)
The Glossary is also for academics who wish to see and avoid the
logic errors so casually accepted by previous generations.
One goal of the Glossary is to clarify the usual casual claims that
confuse both novices and professionals.
Another is to provide some of the historical technical background
developed before the modern mathematical approach.

Reason in Cryptography

The way we understand reality is to follow logical
arguments.
All of us can do this, not just professors or math experts.
Even new learners can follow a cryptographic argument, provided
it is presented clearly.
So, in this Glossary, one is occasionally expected to
actually follow an argument and come to a personal conclusion.
That can be scary when the result contradicts the conventional
wisdom; one then starts to question both the argument and
the reasoning, as I know very well.
But that scary feeling is just an expected consequence of a field
which has allowed various unsupported claims and unquestioned
beliefs to wrongly persist (see
old wives' tales).

Unfortunately, real cryptography is not
well-modeled by current
math (for example, see
proof and
cryptanalysis).
It is normally expected that the link between theory and reality
is provided by the
assumptions the math requires.
(Obviously, proof conclusions only apply in practice when
every assumed quality actually occurs in practice.)
In math, each of these assumptions has equal value (since the lack of
any one will void the conclusion), but in practice some assumptions
are more equal than others.
Certain assumptions conceivably can be guaranteed by the
user, but other assumptions may be impossible to guarantee.
When a model requires assumptions that cannot be verified in
practice, that model cannot predict reality.

Current mathematical models almost never allow situations
where the user can control every necessary
assumption, making most
proof results meaningless in practice.
In my view, mathematical cryptography needs practical models.
Of course, one might expect more realistic models to be less able to
support the current plethora of mathematical results.
Due to the use of more realistic models, some results in the Crypto
Glossary do contradict well-known math results.

Opposing Philosophies

By carrying the arguments of conventional cryptographic wisdom
to their extremes, it is possible to see two opposing groups, which
some might call theoretical versus practical.
While this simplistic model is far too coarse to take very seriously,
it does have some basis in reality.

The Crypto Theorists supposedly argue that no cryptosystem
can be trusted unless it has a mathematical
proof, since anything less is mere wishes
and hope.
Unfortunately, there is no such cryptosystem.
No cipher can be guaranteed strong in practice, and that is the
real meaning of the
one time pad.
As long as even one unbreakable system existed, there was at least
a possibility of others, but now there is no reason for such hope.
The OTP is secure only in simplistic theory, and strength cannot be
guaranteed in practice for users.
This group seems most irritating when its members imply that math proofs
are most important, even when in practice those proofs provide no
benefits to the user.

The Crypto Practitioners supposedly argue that systems
should be designed to oppose the most likely reasonable threats,
as in physical
threat model
analysis.
In the physical world it is possible to make statements about
limitations of opponents and attacks; unfortunately, few such
statements can be made in cryptography.
In cryptography, we know neither the opponents nor their attacks nor
what they can do in combination.
Successful attack programs can be reproduced and then applied by
the most naive user, who up to that time had posed only the most
laughable threat.

Both groups are wrong:
There will be no proof in practice, and speculating on the abilities
of the opponents is both delusional and hopeless.
Moreover, no correct compromise seems possible.
Taking a little proof from one side and some threat analysis from
the other simply is not a valid recipe for making secure ciphers.

There is a valid recipe for security, and that is a growing,
competitive industry of cipher development.
Society needs more than just a few people developing a handful of
ciphers; it needs actual design groups who continually innovate, design,
develop, measure, attack and improve new ciphers in a continuing
flow.
That is expensive work, as the NSA budget clearly shows.
Open society will get such results only if open society will pay
for them.
Since payment is the issue, it is clear that "free" ciphers act to
oppose exactly the sort of open cryptographic development society
needs.

Absent an industry of cipher design, perhaps the best
we can do is to design systems in ways such that a cipher actually
can fail, while the overall system retains security.
That is redundancy, which is a major part of engineering in most
life-critical fields (e.g., airliners), but not in cryptography.
The obvious start is
multiple encryption.

What is the Point?

The practical worth of all this should be a serious regard for
cryptographic
risk.
The possibility of cryptographic failure exists despite all claims
and proofs to the contrary.
Users who have something to protect must understand that cryptography
has risks, and there is a real possibility of failure.
If a possibility of information exposure is acceptable, one might
well question the use of cryptography in the first place.

Even if users only want their information probably to
be secure, they still have a problem:
Only our
opponents know our cipher failures,
because they occur in secret.
Our opponents do not expose our failures because they want those
ciphers to continue in use.
Few if any users will know when there is a problem, so we cannot
count how many ciphers fail, and so cannot know that probability.
Since there can be no expertise about what unknown opponents do,
looking for an "expert opinion" on cipher failure probabilities
or strength is just nonsense.

Conventional cryptographic expertise is based on the open
literature.
Unfortunately, unknown attacks can exist, and even the best
informed cannot predict strength against them.
While defending against known attacks may seem better
than nothing, that actually may be nothing to opponents who
have another approach.
In the end, cipher and cryptosystem designers vigorously defend
against attacks from academics who will not be their opponents.

On the other hand, even opponents read the open literature,
and may make academic attacks their own.
But surprisingly few academic attacks actually recover key or
plaintext and so can be said to be real, practical threats.
Much of the academic literature is based on strength assumptions
which cannot be guaranteed or vulnerability assumptions which need
not exist, making the literature less valuable in practice than it
may appear.

Math cannot prove that a cipher is strong in practice, so
we are forced to accept that any cipher may fail.
We do not, and probably cannot, know the likelihood of that.
But we do know that a single cipher is a
single point of failure
which just begs disaster. (Also see
standard cipher.)

It is possible to design in ways which reduce risk.
Systems can be designed with redundancy to eliminate the single
point of failure (see
multiple encryption).
This is often done in safety-critical fields, but rarely in
cryptography. Why?
Presumably, people have been far too credulous in accepting math
proofs which rarely apply in practice.
Thus we see the background for my emphasis on basics, reasoning,
proof, and realistic math models.

Simple Encryption

To protect against fire, flood or other disaster, most software
developers should store their current work off-site.
The obvious solution is to first encrypt the files and then upload
an archive to a web site.
The straightforward use of cryptography to protect archives is an
example of the pristine technical situation often seen as normal.
Then we think of cipher strength and key protection, which seem
to be all there is.
But most cryptography is not that simple.

Climate of Secrecy.
For any sort of cryptography to work, those who use it must not
give away the secrets.
Most times keeping secrets is as easy, or as hard, as just not
talking or writing about them.
Issues like minimizing paper output and controlling and destroying
copies seem fairly obvious, although hardly business as usual.
But secrets are almost always composed in plaintext, and the
computers doing that may have plaintext secrets saved in various
hidden operating system files.
And opponents may introduce programs to compromise computers which
handle secrets.
It is thus necessary to control all forms of access to equipment
which holds secrets, despite that being awkward and difficult.
It is especially difficult to control access on the net.

Network Security.
Computers can only do what they are told to do.
When network designers decide to include features which allow
attacks, that decision is as much a part of the problem as an
attack itself.
It seems a bit much to complain about insecurity when insecurity is
part of the design.
Design decisions have made the web insecure.
Until web systems implement only features which maintain security,
there can be none.

It is possible to design computing systems more secure than the
ones we have now.
If we provide no internal support for external attack, no attacks
can prevail.
The entire system must be designed to limit and control external web
access and prevent surprises that slip by unnoticed.
We can decompose the system into relatively small modules, and then
test those modules in a much stronger way than trying to test a
complex program.
A possible improvement might be some form of restricted intermediate
or quarantine store between the OS and the net.
Better security design may mean that some things now supported
insecurely no longer can be supported at all.

Current practice identifies two environments: The local computer,
which is "fully" trusted, and the Internet, which is not trusted.
This verges on a misuse of the concept of
trust, which requires substantial consequences
for misuse or betrayal.
Absent consequences, trust is mere unsupported
belief and provides no basis for reasoning.
We do not trust a machine per se, since it only does what the
designer made it do.
And when there are no consequences for bad design, there really is
no reason to trust the designer either.

A better approach would be fine OS control over individual programs,
including individual scripts, providing validation and detailed
limits on what each program can do, on a per-program basis.
This would expand the firewall concept from just net access to
every resource, including processor time, memory, all forms of I/O,
plus the ability to invoke, or be invoked by, other programs.
For example, most programs do not need, and so would not be allowed,
net access, even if invoked by a program or running under a process
which has such access.
Programs received from the net would by default start out in
quarantine, not have access to normal store, and could run only
under strong limitations.
A human would have to explicitly elevate them to a selected higher
status, with the change logged.
Program operation exceeding limitations would be prevented, logged,
and accumulated in a control which supported validation, fine tuning,
selective responses and serious quarantine.
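
As a concrete (and entirely hypothetical) illustration of such per-program
control, here is a small Python sketch of a policy table and an enforcement
check; the resource names and limits are invented for the example and do
not describe any real operating system interface.

    # Hypothetical sketch only: a per-program policy table and a check
    # routine of the sort described above.  Resource names like
    # "net_access" are illustrative, not any real OS API.
    DEFAULT_QUARANTINE = {"net_access": False,
                          "file_access": ["quarantine/"],
                          "may_invoke": []}

    policy = {
        # a program received from the net starts out fully quarantined
        "downloaded_tool": dict(DEFAULT_QUARANTINE),
        # a human has explicitly elevated this editor; the change is logged
        "editor": {"net_access": False,
                   "file_access": ["home/"],
                   "may_invoke": ["spellcheck"]},
    }

    violation_log = []

    def allowed(program, resource, detail=None):
        """True if policy permits the action; otherwise log and refuse."""
        p = policy.get(program, DEFAULT_QUARANTINE)
        ok = bool(p.get(resource)) if detail is None else detail in p.get(resource, [])
        if not ok:
            violation_log.append((program, resource, detail))
        return ok

    print(allowed("downloaded_tool", "net_access"))       # False: quarantined
    print(allowed("editor", "may_invoke", "spellcheck"))  # True: explicitly granted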

Security is Off-The-Net.
The best way to avoid web insecurity has nothing to do with
cryptography.
The way to avoid web insecurity is to not connect to the web, ever.
Use a separate computer for secrets, and do not connect it to the
net, or even a LAN, since computers on the LAN probably will be on
the net.
Carefully move information to and from the secrets computer with a
USB flash drive.
Protect access to that equipment.

Glossary Structure and Use

For most users, the Crypto Glossary will have many underlined
(or perhaps colored) words.
Usually, those are hypertext "links" to other text in the Glossary;
just click on the desired link.

Links to my other pages generally offer a choice between a
"local" link or a full web link.
A user working only from a downloaded copy of the Glossary would
normally use the full web links.
A user working from a CD or disk-based copy of all my pages would
normally use the local links.

Links to my other pages also generally open and use another window.
(Hopefully that will avoid the need to reload the Glossary after a
reference to another article.)
Similarly, links from my other pages to terms in the Glossary
also generally open a window specifically for the Glossary.
(In many cases, that will avoid reloading the Glossary for every new
term encountered on those pages.)

In cryptography, as in much of language in general, the exact same
word or phrase often is used to describe two or more distinct ideas.
Naturally, this leads to confused, irreconcilable argumentation
until the distinction is exposed (and often thereafter).
Usually I handle this in the Crypto Glossary by having multiple
numbered definitions, with the most common usage (not necessarily
the best usage) being number 1.

The worth of this Glossary goes beyond mere definitions.
Much of the worth is the relationships between ideas:
Hopefully, looking up one term leads to other ideas which are
similar or opposed or which support the first.
The Glossary is a big file, but breaking it into many small files
would ruin much of the advantage of related ideas, because then most
related terms would be in some other part.
And although the Glossary could be compressed, that would generally
not reduce download time, because most modems automatically
compress data during transmission anyway.
Dial-up users typically should download the Glossary onto local
storage, then use it locally, updating periodically.

Value

I have obviously spent a lot of personal time constructing this
Crypto Glossary, with the hope that it would be more than just
background to my work.
Hopefully, the Glossary and the associated introduction:
"Learning About Cryptography" (see
locally, or @:
http://www.ciphersbyritter.com/LEARNING.HTM)
will be of some wider benefit to the crypto community.
So, if you have used this Glossary lately, why not drop me a short
email and tell me so?
Feel free to tell me how much it helped or even how it failed you;
perhaps I can make it better for the next guy.
If you use web email, just copy and paste my email address:
ritter@ciphersbyritter.com

Resistor excess noise is a
1/f noise generated in non-homogeneous resistances,
such as the typical thick-film surface-mount (SMT) resistor composed
of conductor particles and fused glass.
It is thought that
DC current forms a preferential path through
the conductive grains, a path that varies dynamically at random,
thus modulating the resistance and creating noise.
Homogeneous metal films do not have 1/f noise.

The especially large amount of 1/f noise in MOSFETs
could be understood if the glass layer the gate rests on is
unexpectedly rough.
That could create islands of conduction (which some literature
appears to support), which then act like resistive grains.

In a single-crystal
semiconductor, 1/f noise
may be related to the organization of atomic bonding at the outside
surfaces, which must be different than inside the crystalline bulk
material.
If the semiconductor surface could be shown to be composed of
conductive islands, that would be an enlightening result.

A
block code
which represents 8-bit data as 10-bit values or
codewords.
This gives 1024 codewords to encode 256 values plus perhaps 12
new control codes; this freedom can be used to approach general
bit balance in each codeword.
However, since a 10-bit code has only 252 balanced values (whereas
256 data and perhaps 12 control symbols are required), balancing
must extend across codewords.
The encoding process must maintain a count of the current unbalance
and correct that in the next codeword. Also see
coding theory.
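
As a rough Python sketch of that cross-codeword balancing (the 8-to-10-bit
mapping itself is left as a hypothetical toy table; real codes such as IBM's
8b/10b use carefully chosen codewords), the bookkeeping might look like this:

    def disparity(word, bits=10):
        """Ones minus zeros in one codeword."""
        ones = bin(word).count("1")
        return ones - (bits - ones)

    def encode(values, table):
        """Track running unbalance; send a codeword or its complement."""
        running = 0
        out = []
        for v in values:
            cw = table[v]
            d = disparity(cw)
            # if this codeword would push the unbalance further from zero,
            # send its bit complement instead (which has disparity -d)
            if d != 0 and (d > 0) == (running > 0):
                cw ^= 0x3FF
                d = -d
            running += d
            out.append(cw)
        return out, running

    toy_table = {0: 0b1101100011, 1: 0b0101010101}   # invented entries
    stream, unbalance = encode([0, 1, 0], toy_table)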

In the study of
logic, something observed similarly by
most observers, or something agreed upon, or which has the same
value each time measured.
Something not in dispute, unarguable, and independent of other
state. As opposed to
contextual.

Alternating
Current:
Electrical power which repeatedly reverses direction of flow.
As opposed to
DC.

Generally used for power distribution because the changing
current supports the use of
transformers. Utilities can thus
transport power at high
voltage and low
current, which minimize
"ohmic"
or I2R losses. The high voltages are then reduced
at power substations and again by pole transformers for delivery
to the consumer.

An additive
combiner uses numerical concepts similar
to addition to
mix multiple values into a single result.
This is the basis for conventional
stream ciphering.
Also see
extractor.

One example is
byte addition
modulo 256, which simply adds
two byte values, each in the range 0..255, and produces the
remainder after division by 256, again a value in the byte range
of 0..255.
The modulo is automatic in an addition of two bytes which
produces a single byte result.
Subtraction is also an "additive" combiner.
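
For example, in Python (a minimal sketch of additive combining and its
inverse):

    def combine(data_byte, confusion_byte):
        # the mod 256 is automatic when adding two bytes in 8-bit hardware
        return (data_byte + confusion_byte) % 256

    def extract(combined_byte, confusion_byte):
        # subtraction reverses the combining, so deciphering is possible
        return (combined_byte - confusion_byte) % 256

    c = combine(0xC3, 0x5A)            # 0xC3 + 0x5A = 0x11D -> 0x1D mod 256
    assert extract(c, 0x5A) == 0xC3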

Almost arbitrary initialization (some element must have its
least significant bit set).

A simple design which is easy to get right.

In addition, a vast multiplicity of independent cycles has the
potential of confusing even a
quantum computer,
should such a thing become realistic.

For Degree-n Primitive, and Bit Width w
Total States:      2^(nw)
Non-Init States:   2^(n(w-1))
Number of Cycles:  2^((n-1)(w-1))
Length Each Cycle: (2^n - 1) * 2^(w-1)
Period of lsb:     2^n - 1

The binary addition of two bits with no carry input is just
XOR, so the
lsb of an Additive RNG has
the usual maximal length period.
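
Here is a minimal Python sketch of the mechanism, using the well-known
degree-55 lagged-Fibonacci taps (55, 24) with 32-bit elements purely to keep
the example short; the degree-127 example below works the same way, just
with a larger table.

    class AdditiveRNG:
        """Additive lagged-Fibonacci RNG: x[n] = x[n-55] + x[n-24] mod 2^32."""
        def __init__(self, seed):
            assert len(seed) == 55
            assert any(x & 1 for x in seed)   # some element must have its lsb set
            self.s = list(seed)
            self.i = 0                        # index of the oldest element

        def next(self):
            j = (self.i - 24) % 55            # the second tap
            x = (self.s[self.i] + self.s[j]) & 0xFFFFFFFF
            self.s[self.i] = x
            self.i = (self.i + 1) % 55
            return x

    rng = AdditiveRNG(list(range(1, 56)))      # toy seed; use random values in practice
    print([rng.next() & 1 for _ in range(8)])  # the lsb's follow the LFSR sequence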

A
degree-127
Additive RNG using 127 elements of 32 bits each
has 2^4064 unique states. Of these, 2^3937
are disallowed by initialization (the
lsb's are all "0"), but this is just one
unusable state out of 2^127. There are still
2^3906 cycles which each have almost 2^158
steps. (The
Cloak2 stream cipher uses an Additive RNG
with 9689 elements of 32 bits, and so has 2^310048 unique
states. These are mainly distributed among 2^300328
different cycles with almost 2^9720 steps each.)

Like any other
LFSR, and like any other
RNG, and like any other
FSM, an Additive RNG is very weak when
standing alone.
But when steps are taken to hide the sequence (such as using a
jitterizer nonlinear filter and
Dynamic Substitution
combining), the resulting cipher can have significant strength.

The mechanics of AES are widely available elsewhere.
Here I note how one particular issue common to modern block ciphers
is reflected in the realized AES design.
That issue is the size of the implemented
keyspace compared to the size of the
potential keyspace for
blocks of a given size.

A Block Cipher Model

A common academic model for conventional block ciphers is a "family
of permutations." The
"permutation" part of this means that
every plaintext block value is found as ciphertext, but generally
in a different position.
The "family" part of this can mean every possible permutation.
However, modern block ciphers key-select only an infinitesimal
fraction of those possibilities.

Suppose we have a block which may take on any of n
different values.
How many ways can those n block values be rearranged as
in a block cipher?
Well, the first value can be placed in any of the n possible
positions, but that fills one position so the second value has only
n-1 positions available.
Continuing on, the third has n-2 possibilities
and so on for n different factors.
Thus we find that the number of options is the same as the definition of
factorial.
The number of distinct
permutations of n different values
is n-factorial.

The Corresponding AES Model

A 128-bit key can select
2^128 emulated tables.
However, a 128-bit block has an
alphabet of about
3.4x10^38 different values, and so could have
(3.4x10^38)-factorial emulated tables.
That value is BIG, BIG, BIG, but still within range of my
JavaScript page:

There we find that 3.4x10^38 different values
have on the order of 2^(10^40)
distinct permutations.
That value would take 10^40 bits to represent,
and can be directly compared to the 256 bits needed to represent
the larger keys used in AES.

A 128-bit block can be any one of
2^128 or 3.4x10^38
different values.
To form a particular permutation, the first value can be placed in
any of 3.4x10^38 places, the second in
3.4x10^38 - 1 places, and so on for
3.4x10^38 different
factors.
As a ballpark calculation, we might expect
(3.4x10^38)-factorial to be similar to
(10^38)^(10^38).
That would be the same as 2 to the power
(10^38 log2 10^38), which is
2 to the power (10^38 * 128), and
that is about 2 to the power 10^40,
nicely confirming the JavaScript results.
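
As a further cross-check of that order of magnitude, here is a minimal
Python sketch using Stirling's approximation ln(n!) ~ n ln(n) - n:

    import math

    n = 2.0 ** 128                            # about 3.4x10^38 block values
    log2_factorial = (n * math.log(n) - n) / math.log(2)
    print("%.3g" % log2_factorial)            # about 4.3e40 bits, on the order of 10^40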

AES Reality

The obvious conclusion is that almost none of the
keyspace implicit in the theoretical
model of a conventional
block cipher is actually implemented
in AES, and that is consistent with other modern designs.
Is that important?
Apparently not, but nobody really knows.
It does seem to imply that just a few
known plaintext blocks should be
sufficient to identify the correct key from a set of possibilities,
which might make known plaintext more of an issue than normally
claimed.
Does it lead to a known break?
No, or at least not yet.
But having only a tiny set of keyed permutations should lead to
questions about patterns and relationships within the selected set.

The real issue here is not the exposure of a particular weakness
in AES, since no such weakness is shown.
Instead, the issue is that conventional cryptographic wisdom does
not force models to correspond to reality, and poor models lead to
errors in reasoning.
The distinction between
theory and
practice is pronounced in cryptography.
For other examples of failure in the current cryptographic wisdom,
see
one time pad,
BB&S,
DES, and, of course,
old wives' tale.

Is AES Enough for Government Secrets?

AES is said to be certified for SECRET and TOP SECRET
classified material.
That might have us believe that AES is trusted by
NSA, but it may mean less than it seems.

No cipher, by itself, can guarantee security.
Any cryptographic system will have to be certified by NSA before
protecting classified information.
In practice, cryptosystems will be provided by NSA to contractors,
those systems may or may not use AES, and they may not use AES in
the expected form.
That does not imply that AES is bad, it just means that we cannot
really know what NSA will allow, despite general claims.

Note that all of the variables x_i are to
the first power only, and each coefficient a_i
simply enables or disables its associated variable.
The result is a single
Boolean value, but the constant term
a_0 can produce either possible output
polarity.

Here are all possible 3-variable affine Boolean functions (each
of which may be inverted by complementing the constant term):
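
A small Python sketch can enumerate them as truth tables (each output bit
is the parity of the selected inputs, XORed with the constant term):

    for coeffs in range(8):                   # which of x2, x1, x0 are enabled
        for a0 in (0, 1):                     # the constant term
            truth_table = []
            for x in range(8):                # all input combinations x2 x1 x0
                parity = bin(coeffs & x).count("1") & 1
                truth_table.append(parity ^ a0)
            print(coeffs, a0, truth_table)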

Most of the classic hand-ciphers can be seen as
simple substitution stream ciphers. Each
plaintext letter selects an entry in the
substitution table (for that
cipher), and the contents of that entry becomes the
ciphertext letter.
The affine equation thus represents one way to set
up the table, as a particular simple
permutation of the letters in the table.
(Of course, by using the equation we need no explicit table, but
we also constrain ourselves to the simplicity of the equation.)
To assure that we have a permutation, we require that a and n
be
relatively prime, that is, the
gcd(a,n) = 1, or in number theory
notation, just (a,n) = 1.
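
A minimal Python sketch of building such a table and checking the
permutation condition (the values a=5, b=8, n=26 are just an example):

    from math import gcd

    def affine_table(a, b, n=26):
        assert gcd(a, n) == 1, "a and n must be relatively prime"
        return [(a * x + b) % n for x in range(n)]

    table = affine_table(a=5, b=8)               # e(x) = (5x + 8) mod 26
    assert sorted(table) == list(range(26))      # every letter appears exactly once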

In modern terms, the
strength of the classic substitution
ciphers is essentially nil. In modern
cryptanalysis, we generally assume
that the
opponent has a substantial amount of
known plaintext.
Since the table does not change, every known-plaintext character
has the potential to fill in another entry in the table.
Very soon the table is almost completely exposed, which ends all
strength.
These simple substitution ciphers with small, fixed tables (or even
just equations for such tables) are also extremely vulnerable to
attacks using
ciphertext only.

(ANF).
Typically, the symbolic representation of a
mapping
in the usual sum-of-products form.

For
Boolean functions in symbolic
form, each
term
is an input variable combination for which the output is '1'.

For Boolean functions in explicit form, basically a
truth table: simply a list of the
output value as it will occur when stepping through all possible
input variable combinations, one-by-one.
This is just the
bit sequence of the output value as it
would occur in input-variable order.

An oft-overlooked proposal by
Shannon, describing both the construction of a
multiple encryption cipher
(called the "product") and the
keyed
selection of one from among many ciphers (called the "weighted sum").

"The first combining operation is called the product operation and
corresponds to enciphering the message with the first secrecy system
R and enciphering the resulting cryptogram with the second
system S, the keys for R and S being chosen
independently."

"The second combining operation is 'weighted addition.'

S = pR + qS     p + q = 1.

It corresponds to making a preliminary choice as to whether system
R or S is to be used with probabilities p and
q, respectively. When this is done, R or S
is used as originally defined." [p.658]

More specifically (and with a change of notation):

"If we have two secrecy systems T and R we can often
combine them in various ways to form a new secrecy system S.
If T and R have the same
domain (message space)
we may form a kind of 'weighted sum,'

S = pT + qR

where p + q = 1.
This operation consists of first making a preliminary choice with
probabilities p and q determining which of T
and R is used.
This choice is part of the
key of S.
After this is determined T or R is used as originally
defined.
The total key of S must specify which of T and R
is used, and which key of T (or R) is used."

"More generally we can form the sum of a number of systems.

S = p_1 T + p_2 R + . . . + p_m U      Sum( p_i ) = 1

We note that any system T can be written as a sum of fixed
operations

T = p_1 T_1 + p_2 T_2 + . . . + p_m T_m

T_i being a definite enciphering operation of
T corresponding to key choice i, which has probability p_i."

"A second way of combining two secrecy systems is taking the
'product,' . . . .
Suppose T and R are two systems and the domain
(language space) of R can be identified with the range
(cryptogram space) of T.
Then we can apply first T to our language and then R
to the result of this enciphering process.
This gives a resultant operation S which we write as a product

S = RT

The key for S consists of both keys of T and R
which are assumed chosen according to their original probabilities
and independently. Thus if the m keys of T are chosen
with probabilities

p_1, p_2, . . . , p_m

and the n keys of R have probabilities

p'_1, p'_2, . . . , p'_n ,

then S has at most mn keys with probabilities
p_i p'_j.
In many cases some of the product transformations
RiTj will be the same and can be
grouped together, adding their probabilities.

"Product encipherment is often used; for example, one follows a
substitution by a transposition or a transposition by a Vigenere,
or applies a code to the text and enciphers the result by substitution,
transposition, fractionation, etc."

"It should be emphasized that these combining operations of
addition and multiplication apply to secrecy systems as a whole.
The product of two systems TR should not be confused with
the product of the transformations in secrecy systems
TiRj . . . ."

It is easy to dismiss this as being of historical interest only,
but there are advantages here which are well beyond our current
usage.

For the keyed selection among ciphers, there would be some sort
of simple protocol (i.e., not cryptographic per se), for
communicating cipher selections to the deciphering end.
(Perhaps there would be some sort of simple handshake for email use.)
The result would be to have (potentially) a new selection from
a set of ciphers on a message-by-message basis.
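
Here is a minimal Python sketch of Shannon's two combining operations, with
trivial toy ciphers and invented keys standing in for real ones: the product
composes two ciphers, and the weighted sum uses part of the key to select
which cipher handles a given message.

    def cipher_T(key, text):                  # toy cipher: byte addition
        return bytes((b + key) % 256 for b in text)

    def cipher_R(key, text):                  # toy cipher: byte XOR
        return bytes(b ^ key for b in text)

    def product(key_t, key_r, text):          # Shannon's product: S = RT
        return cipher_R(key_r, cipher_T(key_t, text))

    def weighted_sum(selector, key, text):    # part of the key selects T or R
        chosen = cipher_T if selector == 0 else cipher_R
        return chosen(key, text)

    msg = b"attack at dawn"
    print(product(3, 0x5C, msg))
    print(weighted_sum(1, 0x5C, msg))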

Having frequent cipher changes guarantees that we can
change ciphers, immediately and easily, if any cipher we use is
found weak.

A cipher change terminates any existing
break of a particular cipher which has been
exposing our information.
Since we cannot expect to know when a break exists, changing to a
different cipher can minimize the effect of a cipher fault even
though we know nothing about that fault.

Using different ciphers at different times prevents information
from being concentrated under a single cipher.
This prevents the opposing attack budget from concentrating on
one target.

The ability to easily change ciphers supports the continued
creation and use of new ciphers, which the opponents must then
identify, obtain, analyze and break.
Although the design cost of a single new cipher can be distributed among
users simply by selling product, each opponent must bear the full
cost of analysis, since most attackers cannot cooperate.
And as the set of ciphers continues to grow, the opponents may
never catch up to the complete set of ciphers actually in use.

Cipher selection has minimal execution cost.

With respect to
multiple encryption or
ciphering "stacks" (as in "protocol stacks"), there are various
security advantages:

A cipher stack prevents a single broken cipher from exposing our
information.
Since any particular cipher may be broken and we will not know (the
opponents do not tell us), this protects against a dangerous
single point of failure.

A three-cipher stack hides the
known-plaintext (and
"defined-plaintext") information for each individual cipher.
Such information simply is not exposed to the opponents, which thus
prevents
known-plaintext attacks (and
defined-plaintext attacks) on the individual ciphers.
The construction thus eliminates whole classes of attack on the
component ciphers.

A three-cipher stack gives us exponentially many different
ciphering stack possibilities.
The intent here is not to add
keyspace, since reasonable
ciphers already have enough keyspace.
Instead, the point is the easy construction of many conceptually
different overall ciphering functions which the opponents must
engage.

Users who are "Nervous Nellies" could specify that their
particular favorite cipher would always be part of the (changing)
cipher stack, thus "guaranteeing" at least as much strength as using
that cipher alone. (If the adjacent cipher was the same, in
decipher mode, using the same key, then there would be no strength.
So do not do that. If an arbitrary cipher was likely to reduce
strength, that would be an
attack, and we see no such attack.)

A three-layer cipher stack obviously has an execution cost of
three layers of ciphering.

An algorithm intended to execute reliably as a computer program
necessarily must handle, or in some way at least deal with,
absolutely every error condition which can possibly occur in
operation.
(We do assume functional hardware, and thus avoid programming around
the possibility of actual hardware faults, such as memory or
CPU failure.)
These "error conditions" normally include Operating System errors
(e.g., bad parameters passed to an OS operation, resource not
available, various I/O failures, etc.), and arithmetic issues
(e.g., division by zero, overflow, etc.) which may halt execution
when they occur.

Other possibilities include errors the OS will not know about,
including the misuse of programmer-defined data structures, such as
buffer overrun.

A practical algorithm must recognize various things which
validly may occur, even if such things are exceedingly rare.
One example might be assuming that two floating-point variables
which represent the same value will be equal.
Another example might be to assume that a floating-point variable
will "never" have some particular value (which might lead to a
divide-by-zero fault).
Yet another example would be to assume that an arbitrary selection
of x will lead to a sufficiently long cycle in
BB&S, even if the alternative is very,
very unlikely.

My term for a
cipher system computer file relating
names to keys.
This allows ordinary users to specify which
key is to be used by using the far-end name,
without knowing the actual key itself.
Thus, the actual key can be long and random and can change over time
and the user need not coordinate these changes.

In particular, my
Cloak2 and
Penknife ciphers implemented encrypted
alias files of text lines of arbitrary length, each of which
included name, start date, and key.
New keys were made available only as secure ciphertext, but the
alias files were arranged so they could consist of multiple
ciphertext files simply concatenated as ciphertext.
Thus, new keys could be added to the start of the alias file
just using a simple and secure file copy operation.
When searching for a particular alias, the date was also checked,
and that key used only when the correct date had arrived.
This allowed an entire office of users to change to a new key
automatically, at the same time, without even knowing
they were using a different key.
Appropriate functions allowed access to old keys so that email
traffic could be archived in ciphertext form.
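
A minimal Python sketch of that date-based key selection follows; the file
format shown here is invented for illustration, and the real Cloak2 and
Penknife alias files were encrypted and differ in detail.

    from datetime import date

    # one line per alias: far-end name, start date, key (newest entries first)
    alias_lines = [
        "fred 2025-03-01 key-2025",
        "fred 2024-01-01 key-2024",
        "mary 2024-06-15 key-mary",
    ]

    def key_for(name, today=None):
        today = today or date.today()
        for line in alias_lines:              # newest first, so first match wins
            alias, start, key = line.split()
            if alias == name and date.fromisoformat(start) <= today:
                return key
        raise KeyError(name)

    print(key_for("fred", date(2025, 6, 1)))  # the new key, once its date has arrived
    print(key_for("fred", date(2024, 6, 1)))  # before that date, the old key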

Obviously, an alias file must be encrypted.
The single key or
keyphrase decrypting an alias file thus
provides access to all the keys in the file.
But each alias file contains only a subset of the keys in use within
an organization, and even those are only valid over a subset of time.
An organization security officer could archive old alias files,
strip out the old keys and add new ones, then encipher the new
alias file under a new pass phrase.
In this way, the contents of old encrypted email would not be
hidden from the authorizing organization.
Alias file maintenance could be either as complex or as simple
as one might like.

A descriptive
variance statistic
based on deviation from the preceding sample.
This is computed as the sum of the squares of the differences
between each sample and the previous sample, divided by 2, and
divided by (the number of samples - 1).
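
In Python, a minimal sketch of that computation (the sample values are
invented):

    def allan_variance(samples):
        n = len(samples)
        diffs = (samples[i + 1] - samples[i] for i in range(n - 1))
        return sum(d * d for d in diffs) / (2.0 * (n - 1))

    print(allan_variance([10.000, 10.002, 9.999, 10.001]))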

Allan Variance is useful in analysis of residual noise in
precision frequency measurement.
Five different types of noise are defined: white noise phase
modulation, flicker noise phase modulation, white noise frequency
modulation, flicker noise frequency modulation, and random walk
frequency modulation.
A log-log plot of Allan variance versus sample period produces
approximate straight line values of different slopes in four of
the five possible cases.
A different (more complex) form called "modified Allan deviation"
can distinguish between the remaining two cases.

The
Balanced Block Mixing (BBM)
which I introduced to cryptography in my article:
"Keyed Balanced Size-Preserving Block Mixing Transforms" (
locally, or @:
http://www.ciphersbyritter.com/NEWS/94031301.HTM)
in early 1994 (three years before the Rivest publication), and then
developed in a series of subsequent articles, apparently can be an
especially fine example of an all-or-nothing transform.

In
statistics, the statement formulated to
be logically contrary to the
null hypothesis.
The alternative hypothesis H1 includes every
possible result other than the specific outcome specified in
the null hypothesis.

The alternative hypothesis H1 is also called the
research hypothesis,
and is logically identical to
"NOT-H0" or "H0
is not true."

A
component or device intended to sense a
signal and produce a larger version of that signal.
In general, any amplifying device is limited by available power,
frequency
response, and device maximums for
voltage,
current, and
power dissipation. Also see:
voltage divider.

Transistors are
analog amplifiers which are basically
linear over a reasonable range and so
require
DC power. In contrast,
relays are classically mechanical devices
with direct metal-to-metal moving connections, and so can handle
generally higher power and
AC current.
The classic analog amplifier is an
operational amplifier.

Unexpected oscillation can be indicated by:

An unusually hot active device as felt by a finger.

Unexpectedly high current flow as shown by a multimeter.

Unexpected sounds as heard from a speaker monitoring the
unit under test.

Unexpected output signal as seen on an oscilloscope.

Unexpected variation when touching various pins, as shown on a
multimeter measuring the output signal, the output DC level
or power supply current.

Unexpected signal or variation as shown by an RF voltage probe
connected to a multimeter.

Unexpected signal or variation as seen on a wideband AC voltmeter.

Oscillation occurs when:

an amplified signal finds its way back to the amp input; AND

the gain through the amplifier and feedback exceeds 1.0; AND

the total phase shift around the feedback loop is 360 degrees.

To stop undesired oscillation:

Increase isolation between input and output; OR

Decrease gain; OR

Change phase.

To Increase Isolation

Bypass the amplifier power pins.
All current from the output pin originally comes through a power pin.
Signal at the output is necessarily reflected in signal at the power
pins.
Unless power lines are bypassed with significant storage capacitance,
signal on the output will feed back to the input.
Serious bypassing should be a part of normal use.
The negative supply often needs to be cleaner than the positive
supply.

Decouple the amplifier power pins.
Add series resistors to the power supply to form low-pass filters
(in combination with the bypass capacitors), and thus decrease
high-frequency feedback between stages via the supply.

Try moving input and output leads as far apart as possible.
If that improves the situation, the feedback path has at least
been identified.

Try preventing capacitive coupling between input and output.
Interpose a conductive shield (try a finger) between in and out.
If that helps, consider shielding the input and output lines.
Or put in a permanent metallic shield like a piece of copper sheet
or PC board material.

For units with single-ended input and output signals and high
overall gain, try breaking the
ground loop.
Try isolation
transformers on input and/or output
signal lines.
Any stage with 40dB or more of resulting gain can be a particular
problem when output signal returns through the same ground used
by the input signal.

Increase high frequency negative feedback.
Try using a small capacitor across the feedback resistor to
reduce overall gain starting at a high frequency.
(Make capacitive reactance equal to feedback resistance perhaps
an octave above the highest desired frequency.)

Low-pass filter the input.
A small series resistor and shunt capacitor form an
RC filter that
can take effect above the highest frequency of interest.
Differential inputs each with their own input resistor can use a
single shunt capacitor across the input pins.

Redesign for reduced stage gain.
Use another stage if more gain is necessary.

To change phase:

Identify the frequency-determining path.
Sometimes, touching various connections with fingers or a grounded
capacitor will affect oscillation.
If so, add components or change values to force a frequency which
has insufficient gain to support oscillation.

For capacitive loads, try a small series output resistor.
Capacitive loads as small as 50pF, or even just cable capacitance,
are notorious for causing
operational amplifier problems.
A small resistor on the output (e.g., the 50 or 75 ohm characteristic
impedance of driven coax) before the cable can settle things down
considerably.

Use a "snubber" series RC across resonant LC tanks.
For resonances at least 100x the highest desired frequency,
Ridley Engineering recommends a resistor equal to the
inductive (or
capacitive, since it is resonant)
reactance at the resonant frequency
(R = X_L = 2 Pi f L), with a series capacitor of
C = 1 / (Pi f R).
An analysis from Hagerman Technology suggests a resistor of
R = SQRT( L / C) with a capacitor equal to
1 / (R f), which is about the same.
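
A quick numeric comparison of the two recipes (a Python sketch, with an
invented parasitic tank of L = 1 uH and C = 100 pF):

    import math

    L, C = 1e-6, 100e-12
    f = 1 / (2 * math.pi * math.sqrt(L * C))       # resonant frequency, ~15.9 MHz

    R_ridley = 2 * math.pi * f * L                 # R = X_L at resonance
    C_ridley = 1 / (math.pi * f * R_ridley)

    R_hagerman = math.sqrt(L / C)
    C_hagerman = 1 / (R_hagerman * f)

    print(R_ridley, R_hagerman)                    # identical: both give 100 ohms
    print(C_ridley, C_hagerman)                    # differ only by a factor of pi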

From Greek geometry, "according to a ratio."
To "draw a parallel" between situations which seem to have some
property in common. A mode of
argumentation in which a known
relationship or pattern is applied to a new situation.
A form of
inductive reasoning which
supports the creation of new testable consequences.

When two things are related by appropriate similarity in
structure or function, we can infer that what is known about one
thing also may apply to the other.
Such an inference may or may not be true, but it can be examined
and tested.

1. In mathematics and programming, a variable which is an
input to a
function. A parameter. An
independent variable.
2. In the study of
logic, reasons that support a
conclusion.
Technically, a sequence of statements
(premises) followed by a conclusion
statement.
An argument is valid only if the conclusion necessarily
follows from the premises, and may be invalid:

Refutation can occur in various ways.
Disputing the evidence being used to support a claim can be
considered a new claim and different evidence presented.
However, disputing the reasoning itself requires only logic and
typically no further evidence at all. See
extraordinary claims.

Like
cryptography, argumentation is
war, and tricks abound when winning
is the ultimate goal.
But arguing to win is fundamentally unscientific, since learning
occurs mainly when an error is found and recognized.

The first requirement of successful argumentation is to
have a stated topic or
thesis.
Without a stated topic, an unscrupulous opponent can lead the
argument to some apparently similar but more vulnerable issue,
and few in the audience will notice.
That is especially true when a topic is introduced casually,
and then changed by the opponent in the very first response.
Another approach for the opponent is to indignantly bring up and
discuss in detail some supposed error on an irrelevant but
apparently related topic.
A clever topic change also may cause awkward repetition and
babbling in the attempt to expose the change and reverse it.
The correct response is to be aware enough to recognize the topic
change immediately, and return to the original topic; to argue
that the comments are off-topic is to introduce a new topic.

There is no way to make an opponent stay on-topic, and
if they know they will lose on-topic, that actually may be
impossible.
Moreover, the opponent may pose various questions (on some new
topic), and claim you are not being responsive, the discussion
of that claim itself being a new topic.
But if you want to take your topic to conclusion, you cannot
follow an opponent who wants anything but that. (Also see
spin.)

The second requirement of successful argumentation is to force
the discussion to remain on the material content.
If the original argument might be successful, an unscrupulous
opponent may seek to divert the discussion to the appropriateness
of, or bias in, the symbols or names used for the concept.
Or the opponent may find and protest premises stated without
mathematical precision.
But a conventional argument need be neither mathematically
complete nor mathematically precise to be valid. (This is the
fallacy of
accident.)
The correct response is to point out that the comments are
irrelevant and return to the material issue; to argue that the
comments are wrong is to argue a changed topic.

How to Win

The goal of scientific argumentation is to improve knowledge
and insight, not to anoint a "superior" contestant.
Sadly, those willing to "win" with dishonesty generally do find
an easily misled audience.

Almost all on-line arguments are technically
informal in the sense of depending
upon context and definitions.
The need for particular context generally leaves ample room to
confuse the issue, even for someone who knows almost nothing
about the topic.

If the proposed argument is basically unsound, that case can be
won on its merits.

How to Attack

If the proposed argument is basically sound, but based on
analogy, we need to realize that there are
few really good analogies.
Examine the analogy in detail and try various cases until one is
found that is good in the analogy but bad in the proposed argument.

If the proposed argument is basically sound, one can win anyway
by changing the topic and doing so in a smooth way the audience
will not notice.

Look for an invalid context.
Generally an argument is valid only within a particular context.
Find a different context where the argument does not work and
"disprove" it on that basis. Or,

Look for a source of confusion.
Usually the argument will depend upon some particular words, either
explicit or implied, that have multiple reasonable definitions.
Choose one or more of those, and show how the argument fails with
one of the other definitions.
Typically, the opponent will reply and explain the proper context,
which, of course, you already know.

Look for an error.
In the process of describing the context, the opponent generally
will increase the number of word dependencies, which is just more
material to exploit.
However, if the explanation can itself be interpreted as error,
you can expose that error and then claim the opponent is not only
wrong, but demonstrably incompetent.

How to Defend

Most responses carry at least a thin patina of respectability.
However, many times a response is actually just the first sad shot
in a verbal combat that seeks defeat and winning by deception.
Unfortunately, it may be difficult to distinguish between mere
ignorance and actual attack.
It is thus important to actually examine the logic of any response.

1. In abstract algebra, a
dyadic operation in which two sequential
operations on three arguments can first operate on either the
first two or the last two arguments, producing the same result in
either case: (a + b) + c = a + (b + c).
2. In algebra, the associative law for addition and
multiplication.
The algebraic law for evaluating the result of grouping
terms or
factors in
different ways, as in conventional arithmetic:

In a mathematical
proof, each and every assumption must be
true for the proof result to be true.
If the truth of any assumption is unknown, the proof is formally
incomplete and the result has no meaning.

In practice, proofs have meaning only to the extent that each
and every required assumption can be assured, including
assumptions which may not be immediately apparent.
In practical cryptography, while some assumptions possibly could
be assured by the user, others could only be assured by the cipher
designer, who must then be
trusted, along with his company, the entire
distribution path and so on.
Even worse, still other assumptions may be impossible to
assure in practice by any means at all, which makes any such
proof useless for practical cryptography.

Also RS-232 and similar "serial port" signals, in which
byte or character values are transferred
bit-by-bit in bit-serial format.
Since digital signals require both proper
logic levels and proper timing to sense
those levels, timing is established by the leading edge of a
"start bit" sent at the start of each data byte. See
asynchronous transmission.

The common RS-232 "serial port" signal, where data are
sent on a single wire.
This is complicated because a
bit is not just a
logic level, but also a time to
sample that level.
The necessary timing is established by the edge between a
"high" stop bit or resting level, and a "low" start bit.

Transmit: The line rests "high."
When a character is to be sent, a start bit or "low" level is sent
for one bit-time.
Then each data bit is sent, for one bit-time each, as are one or two
stop or "high" level bit-times.
Then, if no more data are ready for sending, the line just rests
"high."

Receive: The line is normally "high."
The instant the line goes "low" is the beginning of a start bit,
and that establishes an origin for bit timing.
Exactly 1.5 bit-times later, hopefully in the middle of the first
data-bit time, the line level is sampled to record the first
incoming bit.
The second bit is recorded one bit-time later, and so on.
When all bits have been recorded, the receiver sends the resulting
character, all bits simultaneously, to a local register or
FIFO queue for pickup.
Note that all this implies that we know the format of the character
with respect to bit time and number of bits.

Timing Accuracy: Everything depends upon both transmit
and receive ends having approximately the same bit timing.
The leading edge of the start bit temporarily
synchronizes the receiver, even though
the transmit and receive clock rates may be somewhat different.
With 8-bit characters, the last data bit is sampled exactly 8.5
bit-times from the detected leading edge of the start bit.
If the receive timing varies as much as +/- 0.5 bit in 8.5, the
last bit will be sampled outside the correct bit time.
So the total timing accuracy must be within +/- 5.8 percent, for
all sources of transmit and receive clock variation, including
sampling delay in detecting the start bit.
Nowadays this is easily achieved with cheap
crystal oscillator clock modules
and digital count logic.
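
A small Python sketch of that tolerance calculation (the 5 percent error
value is just an example):

    ideal = [1.5 + k for k in range(8)]          # mid-bit sampling instants, in bit-times
    print("max clock error: %.2f percent" % (100 * 0.5 / 8.5))   # about 5.88 percent

    e = 0.05                                     # a receiver clock 5 percent fast
    actual = [t * (1 + e) for t in ideal]
    print(all(abs(a - t) < 0.5 for a, t in zip(actual, ideal)))  # still within each bit cell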

General ways in which a
cryptanalyst may try to
"break" or penetrate the secrecy of a
cipher. These are not algorithms; they are just
approaches as a starting place for constructing
specific algorithms.

In normal
cryptanalysis we start out knowing
plaintext,
ciphertext, and
cipher construction.
The only thing left unknown is the
key.
A practical attack must recover the key.
(Or perhaps we just know the ciphertext and the cipher, in which
case a practical attack would recover plaintext.) Simply finding a
distinguisher (showing that the
cipher differs from the chosen model) is not, in itself, an
attack.
If an attack does not recover the key (or perhaps the particular
key-selected internal
state used
in ciphering), it is not a real attack.

In cryptography, when someone says they have "an attack,"
the implication is that they have a successful attack (a
break) and not just another failed attempt.
It is obviously much easier to simply claim to have an attack
than to actually analyze, innovate, build and test a working attack,
which makes it necessary to back up such claims with
evidence.
Arrogant claims, with "proof left as an exercise for the student"
or "read the literature" responses, deserve jeers instead of the
cowed respect they often get.

A claim to have an attack can be justified by:

describing the process in such detail that others can
understand it and could use it to break the cipher,

actually performing the attack in practice and showing the
claimed results (e.g., finding the unknown
key, given
known plaintext), or

demonstrating the ability to do something fundamental which
should be impossible (like finding several strings which each have
the same
cryptographic hash result).

Simply finding a
distinguisher is not in itself
sufficient to expose keying or plaintext as required of a real attack.

It is not sufficient to say: "My interpretation of the theory is
that there must be a break, so the cipher is broken"; it is
instead necessary to actually devise a process which recovers key or
plaintext.
Furthermore, there are many attacks which work against scaled-down
tiny ciphers, but which do not scale up as valid attacks against
the original large cipher:
Just because we can solve newspaper-amusement ciphers (tiny versions
of conventional
block ciphers) does not imply that any
real-size block ciphers are "broken."
The process used to solve newspaper ciphers is not "an attack"
on block ciphers in general.

Classically, attacks were neither named nor classified; there
was just: "here is a cipher, and here is 'the' attack."
(Many different attacks may be possible, but even one practical
attack is sufficient to cause us to avoid that cipher.)
And while this gradually developed into named attacks, there is no
overall attack taxonomy.
Currently, attacks are often classified by the information available
to the attacker or constraints on the attack, and then by
strategies which use the available information.
Not only
ciphers, but also cryptographic
hash functions can be attacked, generally
with very different strategies.

Information Constraints

Ciphertext Only: We have only ciphertext to work with.
Sometimes the statistics of the ciphertext provide insight and
can lead to a break.

Known Plaintext:
We have some, or even an extremely
large amount, of plaintext and the associated ciphertext.

Defined Plaintext:
We can submit arbitrary messages to
be ciphered and capture the resulting ciphertext.
(Also Chosen Plaintext and Adaptive Chosen Plaintext.)
(A subset of Known Plaintext.)

Defined Ciphertext: We can submit arbitrary messages
to be deciphered and see the resulting plaintext.
(Also Chosen Ciphertext and Adaptive Chosen Ciphertext.)

Chosen Key: We can specify a change in any particular
key bit, or some other relationship between keys.

Timing: We can measure the duration of ciphering
operations and use that to reveal the key or data.

Fault Analysis: We can induce random faults into the
ciphering machinery, and use those to expose the key.

Man-in-the-Middle:
We can subvert the routing capabilities
of a computer network, and pose as the other side to each of the
communicators. (This is a key authentication attack on
public key systems.)

Attack Strategies

The goal of an attack is to reveal some unknown plaintext, or the
key (which will reveal the plaintext). An attack which succeeds with
less effort than a brute-force search we call a
break.
An "academic" ("theoretical," "certificational") break may involve
impractically large amounts of data or resources, yet still be called
a "break" if the attack would be easier than brute force.
(It is thus possible for a "broken" cipher to be much stronger than
a cipher with a short key.) Sometimes the attack strategy is thought
to be obvious, given a particular informational constraint,
and is not further classified.

Brute Force
(also Exhaustive Key Search): Try to decipher ciphertext under
every possible key until readable messages are produced.
(Also "brute force" any searchable-size part of a
cipher.)
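(A toy sketch in Python appears after this list of strategies.)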

Codebook (the classic
"codebreaking" approach): Collect a
codebook of transformations
between plaintext and ciphertext.

Differential
Cryptanalysis: Find a statistical correlation between
key values and cipher transformations (typically the
Exclusive-OR of text pairs), then use sufficient defined
plaintext to develop the key.

Linear Cryptanalysis: Find a linear approximation
to the keyed S-boxes in a cipher, and use that to
reveal the key.

Meet-in-the-Middle: Given a two-level multiple encryption,
search for the keys by collecting every possible result for
enciphering a known plaintext under the first cipher, and
deciphering the known ciphertext under the second cipher; then
find the match.

Key Schedule: Choose keys which produce known effects
in different rounds.

Birthday (usually a hash
attack): Use the
birthday paradox, the idea that
it is much easier to find two values which match than it is to
find a match to some particular value.

Formal Coding (also Algebraic): From the cipher design,
develop equations for the key in terms of known plaintext,
then solve those equations.

Correlation: In a
stream cipher, distinguish between
data and confusion, or between different confusion streams, from
a statistical imbalance in a
combiner.

Dictionary: Form a list
of the most-likely keys, then try those keys one-by-one (a way to
improve brute force).

Replay: Record and save some ciphertext blocks or messages
(especially if the content is known), then re-send those blocks
when useful.

Many attacks try to isolate unknown small components or aspects
so they can be solved separately, a process known as
divide and conquer. Also see:
security.
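
To make the brute force idea concrete, here is the toy sketch
promised above. Everything in it, including the trivial two-byte-key
cipher, is a stand-in for illustration only; a real keyspace makes
the same loop infeasible rather than wrong.

def toy_encrypt(key, plaintext):
    # stand-in cipher: XOR each byte with one of the two key bytes
    return bytes(b ^ ((key >> (8 * (i % 2))) & 0xFF)
                 for i, b in enumerate(plaintext))

def brute_force(known_plaintext, ciphertext, keybits=16):
    for key in range(2 ** keybits):          # try every possible key
        if toy_encrypt(key, known_plaintext) == ciphertext:
            return key                       # matching result found
    return None

ct = toy_encrypt(0xBEEF, b"attack at dawn")
print(hex(brute_force(b"attack at dawn", ct)))   # recovers 0xbeef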

A network in the form of a tree is used, with goals represented
as nodes.
Various possible ways to achieve a particular goal are represented
as branches, which then can be taken as goals with their own branch
nodes.

In cryptographic analysis, the idea is that the root node will
represent the ultimate security we seek.
Each path to the root then represents the accumulated effort needed
to break that security.
The problem is that it is typically impossible to assure that
every alternative attack has been considered.
And if some unconsidered approach is cheaper than any other,
that becomes the true limit on security, despite not being
present in the analysis.

Attack tree analysis does not tend to expose unconsidered
attacks.
Yet those are exactly the issues which carry the greatest
cryptographic risk, because we can at least generally quantify the
risk from known attacks.
Since an attack tree cannot do what most needs to be done, it
would seem to be a strange choice for cryptographic risk analysis.
One could even argue that an attack tree is most useful as a
formal aid in deluding naive executives and users.

Threat models basically concern
what is to be protected, from whom, and for
how long.
But with ciphers, we seek to protect all our data, from
everyone, forever.
The extreme nature of these expectations is only part of what
makes a conventional threat model unhelpful in understanding
ciphering risks.

Cipher failure and exploitation happens in secret, so we cannot
know how often it occurs and cannot develop a probability for it.
Absent a probability of cipher failure, any attempt to understand
ciphering risk is necessarily limited.

A more effective approach to system security is to build with
understandable components. In
component design, we can define
exactly what each component permits.
In component analysis, we can consider the security effects and
expose the precise range of things each component allows.
If none of the allowed things can cause a security problem, we will
have no security problems.
Components essentially become a custom language of system design
which has no way of expressing security faults.

A component-based security design is far more restrictive and so
is far more demanding than the conventional mode of hacking
through a design and implementation.
However, this design process provides a road map for real security,
as opposed to
belief in results from flawed analytical
tools (like attack trees) and ad hoc analysis that simply
cannot deliver the assurances we need.

The ability to analyze security must be designed into a system;
it cannot be just added on to finished systems.

When sampling with replacement, eventually we again find some
object or value which has been found before. We call such an
occurrence a "repetition." A value found exactly twice is a
double, or "2-rep"; a value found three times is a triple or
"3-rep," and so on.

For a known
population, the number of repetitions
expected at each level has long been understood to be a
binomial expression.
But if we are sampling in an attempt to establish the
effective size of an unknown population, we have two problems:

The binomial equations which predict expected repetitions
do not reverse well to predict population, and

Exact repetitions discard information and so are less
accurate than we would like. For example, if we have a
double and then find another of that value, we now have
a triple, and one less double. So if we are using
doubles to predict population, the occurrence of a triple
influences the predicted population in exactly the wrong
direction.

Fortunately, there is an unexpected and apparently previously
unknown combinatoric relationship between the population and the
number of combinations of occurrences of repeated values. This
allows us to convert any number of triples and higher n-reps
to the number of 2-reps which have the same probability. So if we
have a double, and then get another of the same value, we have a
triple, which we can convert into three 2-reps. The total number
of 2-reps from all repetitions (the augmented 2-reps value)
is then used to predict population.

We can relate the number of samples s to the population
N through the expected number of augmented doubles
Ead:

Ead(N,s) = s(s-1) / 2N .

This equation is exact, provided we interpret all
the exact n-reps in terms of 2-reps. For example, a triple is
interpreted as three doubles; the augmentation from 3-reps to 2-reps
is (3 C 2) or 3. The augmented result is the sum of the
contributions from all higher repetition levels:

ad = SUM[i=2..n] (i C 2) r[i] .

where ad is the number of augmented doubles, and r[i]
is the exact repetition count at the i-th level.

And this leads to an equation for predicting population:

Nad(s,ad) = s(s-1) / 2 ad .

This predicts the population Nad as based on a mean value
of augmented doubles ad. (For an example and comparison to
various other methods, see the related conversation.)

Clearly, we expect the number of
samples to be far larger than the number of augmented doubles, but
an error in the augmented doubles ad should produce a
proportionally similar error in the predicted population Nad.
We typically develop ad to high precision by averaging the
results of many large trials.
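
A small simulation may make the procedure concrete. This is only a
sketch (the population, sample size and trial count are arbitrary):
count exact repetitions, convert each i-rep into (i C 2) augmented
doubles, and predict the population from the mean.

import random
from collections import Counter
from math import comb

def augmented_doubles(samples):
    # each value seen i times contributes (i C 2) two-reps
    return sum(comb(i, 2) for i in Counter(samples).values())

N, s, trials = 100000, 2000, 200      # known population, samples, trials
total_ad = 0
for _ in range(trials):
    total_ad += augmented_doubles([random.randrange(N) for _ in range(s)])
mean_ad = total_ad / trials

print(s * (s - 1) / (2.0 * N))        # Ead(N,s), about 19.99
print(mean_ad)                        # measured mean augmented doubles
print(s * (s - 1) / (2.0 * mean_ad))  # Nad(s,ad), about 100000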

It is possible to authenticate individual
blocks, provided they are large enough to
minimize the impact of adding extra authentication data in each
block (see
block code).
One advantage lies in avoiding the alternative of buffering an
entire message before it can be authenticated.
That can be especially important for real-time (e.g., voice)
communications.

1. The right to demand acceptance.
2. A recognized, appointed, or certified expert responsible for
statements or conclusions.

Science does not recognize mere authority
as sufficient basis for a conclusion, but instead requires that facts
and reasoning be exposed for review.
The simple use of a name does not automatically create an
ad verecundiam fallacy
("Appeal to Awe").
A name can identify a body of work giving the needed facts and
the reasoning supporting a scientific conclusion.

Authority tends to hide the basis for drawing conclusions.
Authority tends to avoid addressing complaints of false reasoning.
Authority tends to hide reasoning and insists that a statement is
correct simply because of who made it.
A person repeating a conclusion from an authority often has no
idea of the reasoning behind it, or what it really means with
respect to limits or context.

In contrast, scientific thought exposes the factual basis and
the reasoning, which tells us what the conclusion really means.
Scientific thought is democratic and informs, and ideally gives
everyone the same materials from which to draw factual conclusions,
some of which may be new, strange and disconcerting, but nevertheless
correct.

In
statistics, the
linear correlation of a sequence to itself
("auto").
The extent that any particular sample linearly predicts subsequent
and prior samples (typically a different value for each positive
or negative delay).
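
As a sketch of one common form of the computation (the normalization
shown is an assumption, chosen to give values between -1 and 1):

def autocorrelation(x, delay):
    n = len(x) - delay
    a, b = x[:n], x[delay:delay + n]
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a) ** 0.5
    vb = sum((v - mb) ** 2 for v in b) ** 0.5
    return cov / (va * vb)

seq = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(autocorrelation(seq, 1))   # -1.0: each sample predicts the next, inverted
print(autocorrelation(seq, 2))   #  1.0: period-2 structure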

AUTOmatic DIgital Network.
A message switching network for the U.S. military, fielded in the
late 1960's.
Typically secure on each link between large secure facilities called
ASC's (Automatic Switching Centers), with internal computer message
switching or forwarding between links.
Replaced by the MilNet packet switching network in the early 1990's.

The observed property of a
block cipher constructed in
layers or
"rounds" with respect to a tiny change
in the input. The
change of a single input bit generally produces multiple
bit-changes after one round, many more bit-changes after another
round, until, eventually, about half of the block will change.
An analogy is drawn to an avalanche in snow, where a small
initial effect can lead to a dramatic result.
As originally described by Feistel:

"As the input moves through successive layers the pattern of
1's generated is amplified and results in an unpredictable
avalanche. In the end the final output will have, on average,
half 0's and half 1's . . . ." [p.22]
-- Feistel, H. 1973. Cryptography and Computer Privacy.
Scientific American. 228(5): 15-23.

"For a given transformation to exhibit the avalanche effect,
an average of one half of the output bits should change whenever
a single input bit is complemented." [p.523]
-- Webster, A. and S. Tavares. 1985. On the Design of
S-Boxes.
Advances in Cryptology -- CRYPTO '85. 523-534.
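
A sketch of how the quoted property can be measured: complement each
input bit in turn and count the output bits which change. The
SHA-256-based stand-in transformation here is only a convenient
block-wide function for illustration, not a cipher recommendation.

import hashlib

BLOCK_BITS = 128

def transform(block):
    # stand-in block-wide transformation: truncated SHA-256
    data = block.to_bytes(BLOCK_BITS // 8, "big")
    return int.from_bytes(hashlib.sha256(data).digest()[:BLOCK_BITS // 8],
                          "big")

def mean_avalanche(block):
    base = transform(block)
    flips = [bin(base ^ transform(block ^ (1 << i))).count("1")
             for i in range(BLOCK_BITS)]
    return sum(flips) / len(flips)

print(mean_avalanche(0x0123456789ABCDEF0123456789ABCDEF))  # about 64 of 128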

In
semiconductor electronics, the dominant reverse-voltage
breakdown mode in so-called "zener"
diodes and other PN junctions with breakdown
voltages over 6 or 8 volts. A source of
shot noise and additional noise from
avalanche which can be much greater than shot noise.

In normal junctions, the space-charge region
(depletion region) between P and N
materials is fairly broad, so the extreme fields found in
Zener breakdown do not occur.
However, a combination of applied voltage, temperature, and random
motion may cause a covalent bond to break anyway, in a manner similar
to normal diode leakage.
When a breakdown does occur, the charge carrier is attracted by the
opposing potential and drops through the space-charge region,
periodically interacting with covalent bonds there.
When the field is sufficiently high, a falling charge carrier may
build up enough energy to break another carrier free when it hits.
Then both the original and resulting carriers continue to accelerate
through the space-charge region, each possibly hitting and breaking
many other bonds.
The result is a growing avalanche of carriers produced by each single
breakdown.
The avalanche effect can be seen as a form of
amplification and can be huge,
for example, 10**8.

In a series of almost forgotten semiconductor physics research
papers from the 1950's and 1960's, avalanching breakdown was shown
to consist of a multitude of "microplasma" events of perhaps 20uA
each.
These events are not completely independent, but instead interact,
but also have some apparently random component, probably thermal.
At least some of the microplasma events seem to have
negative dynamic resistance
and function like tiny neon bulbs (and may even emit light).
One implication of this is an ability of some avalanching "zener"
diodes to directly support small, unsuspected
oscillations.
A series of very extensive discussions on sci.electronics.design
in 1997 (search: "zener oscillation") gives experimental details.
Both LC tank oscillation and RC relaxation oscillation were
demonstrated in practice.
Thus, avalanche multiplication, often
assumed to be unquestionably
"quantum random," actually may have a disturbing amount of
predictable structure. True
Zener breakdown does not appear
to have the same problems, nor does
thermal noise, as far as we know.
Unfortunately, these "purer" sources may be much smaller than noise
from avalanche multiplication.

In contrast to Zener breakdown, which has a negative
temperature
coefficient, avalanche multiplication has a positive temperature
coefficient, like most resistances or
conductors.
Presumably this is due to heat causing increased activity in the
crystal lattice, thus preventing electrons from falling as far
before interacting, thus reducing the probability of breaking
another bond, and reducing the amplification.
In junctions that break down at about 6 volts the temperature
effects tend to cancel.
Also see: "Random Electrical Noise: A Literature Survey"
(locally, or @:
http://www.ciphersbyritter.com/RES/NOISE.HTM).

A
cipher design fault, planned or accidental,
which allows the apparent strength of the design to be easily
avoided by those who know the trick. When the design background
of a cipher is kept secret, a back door is often suspected.
Also see: trap door.

There is some desire to generalize this definition to describe
multiple-input functions. (Is a
dyadic function "balanced" if, for one
value on the first input, all output values can be produced, but
for another value on the first input, only some output values
are possible?) Presumably a two-input balanced function would
be balanced for either input fixed at any value, which would
essentially be a
Latin square or a
Latin square combiner. Also see
Balanced Block Mixing. As opposed to
bias. Also see
Ideal Secrecy and
Perfect Secrecy.

Balance is a pervasive requirement in many areas of
cryptography.

Technically, a Balanced Block Mixer is an m-input-port
m-output-port mechanism with various properties:

The overall mapping is one-to-one and invertible: Every
possible input value (over all ports) to the mixer produces
a different output value (including all ports), and every
possible output value is produced by a different input value;

Each output port is a function of every input port;

Any change to any one of the input ports will produce a
change to every output port;

Stepping any one input port through all possible values
(while keeping the other input ports fixed) will step every
output port through all possible values.

The inverse mixing behaves similarly.
Say, for example, we are mixing 64
bytes of message into 64 bytes of result:
If we know 63 of the result bytes, we can step through the values
of the 64th byte, and get 256 different messages, each of which
will produce the 63 bytes we know (a
homophonic sort of situation).
If the actual messages are random-like and evenly distributed,
it will be difficult to know which particular message is implied.
The amount of uncertainty we have in the result is reflected in
the amount of uncertainty we have about the message.

The Basic Balanced Block Mixer

The basic Balanced Block Mixer is a pair of
orthogonal Latin squares.
The two input ports affect the rows and columns of both squares,
with the selected result in each square being the two output ports.
For example, here is a tiny nonlinear "2-bit" or
"order 4" BBM:

Suppose we wish to mix (1,3); 1 selects the second row up in both
squares, and 3 selects the rightmost column, thus selecting (2,0)
as the output. Since there is only one occurrence of (2,0) among
all entry pairs, this discrete mixing function is reversible, as well
as being balanced on both inputs.

In practice, we would probably want to use at least order 16,
which can be efficiently stored as an ordinary 256-byte "8-bit"
substitution table,
one with a particular
oLs structure in the data.
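
For a concrete check of the defining properties, here is a sketch
using one order-4 pair of orthogonal Latin squares of my own choosing.
These particular squares are simple (in fact linear), unlike the keyed
nonlinear squares discussed here, so the mixed values differ from the
example above; only the structural properties are the point.

L1 = [[0, 1, 2, 3],
      [1, 0, 3, 2],
      [2, 3, 0, 1],
      [3, 2, 1, 0]]

L2 = [[0, 1, 2, 3],
      [2, 3, 0, 1],
      [3, 2, 1, 0],
      [1, 0, 3, 2]]

def mix(a, b):
    return L1[a][b], L2[a][b]

# one-to-one overall: every (a,b) gives a distinct output pair
assert len({mix(a, b) for a in range(4) for b in range(4)}) == 16

# stepping either input through all values steps each output
# through all values (balance on both inputs)
for fixed in range(4):
    assert {mix(a, fixed)[0] for a in range(4)} == {0, 1, 2, 3}
    assert {mix(a, fixed)[1] for a in range(4)} == {0, 1, 2, 3}
    assert {mix(fixed, b)[0] for b in range(4)} == {0, 1, 2, 3}
    assert {mix(fixed, b)[1] for b in range(4)} == {0, 1, 2, 3}

print(mix(1, 3))   # one mixed pair from these particular squares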

Scalable Linear Balanced Block Mixers

One way to use the BBM mixing concept is to develop linear
equations for oLs mixing for scaling to various sizes
(see my article:
Fencing and Mixing Ciphers from
1996 Jan 16).
We can do that in the finite
field of
mod-2 polynomials with an
irreducible modulus.
So we can easily have similar mixers of 16, 32, 64, 128 and 256 bit
port widths, and so on.
By using multiple mixers of different size in various connections,
we can easily mix blocks of size compatible to existing ciphers,
and much larger.

Explicit Nonlinear Balanced Block Mixers

A usually better way to use the BBM mixing concept is to
develop small, nonlinear and
keyed oLs's for use in
FFT-like patterns with 2n ports.
It is easy to construct keyed nonlinear orthogonal pairs of Latin
squares of arbitrary 4n order, as I describe in my articles.

In any FFT-style structure, there is exactly one "path" from
any input to any output, and "cancellation" cannot occur.
Thus, we can guarantee that any change to any one
input must "affect" each and every output.
Similarly, each input is equally represented in each output,
which is
ideal mixing.
The resulting wide ideal mixing structure, using small BBM tables
as each
butterfly operation, is itself
a BBM, and is dynamically
scalable to virtually arbitrary size.

Scalable Mixing Advantages

Mixing has long been a problem in block ciphers.
The difficulty of mixing wide block values is one reason most
conventional block ciphers are small.
But having a small block means that there is not much room to add
features like authentication on each block (also see
block coding and
huge block cipher advantages).
Per-block authentication, for example, allows blocks to be used
in real time, or delivered out of order, without first buffering
and authenticating the entire message.

Large blocks also have room to hold sufficient uniqueness to
support
electronic codebook mode,
which is not normally appropriate for block ciphers.
Large blocks in
ECB mode can support secure ciphering without
ciphertext expansion, a goal
which is very hard to reach in other ways.

When a BBM is implemented in
software, the exact same unchanged
routine can handle both wide mixing for real operation and narrow
"toy" mixing for thorough experimental testing. This supports both
scalable operation, and exhaustive
testing of
the exact code used in actual operation.

In
hardware, BBM block throughput or block rate
can be independent of block size.
Wide blocks can be mixed in the same time as narrow blocks by
pipelining each sub-layer of the mixing.
That of course makes large blocks far faster per
byte than small ones.

In the context of
cryptography, a
combiner mixes two input
values into a result value. A balanced combiner must provide a
balanced relationship between each input
and the result.

In a statically-balanced combiner, any particular result
value can be produced by any value on one input, simply by
selecting some appropriate value for the other input. In this way,
knowledge of only the output value provides no information
-- not even statistical information
-- about either input.

The common examples of cryptographic combiner, including byte
exclusive-OR
(mod 2 polynomial addition), byte addition
(integer addition mod 256), or other
"additive" combining, are
perfectly balanced. Unfortunately, these simple combiners are
also very weak, being inherently
linear and without internal
state.

A Latin square combiner
is an example of a statically-balanced
reversible nonlinear combiner with massive internal state.
A Dynamic Substitution
Combiner is an example of a dynamically or
statistically-balanced reversible nonlinear combiner with
substantial internal state.

In
electronics, typically an
interconnecting cable with two conductors having an exactly opposite
or symmetric signal on each.
More specifically, a two-wire signal interconnection where each wire
has the same impedance to ground or any other conductor.
In contrast to unbalanced line (like coaxial cable), where one
of the conductors (the shield) has a low impedance to ground.
In all cases, two conductors are required to transport a loop of
current carrying a signal.

Each conductor of a balanced line system should have similar
driver output impedances (ideally low), similar wire effects, and
similar receiver termination impedances (ideally high).
At audio frequencies cables are not transmission lines, so "cable
impedance" is not an issue, and the differential receiver need not
match either the cable or the
driver.
When each wire has a similar impedance to ground, external magnetic
and electrostatic fields should act on them similarly, producing a
common effect on each wire which can "cancel out."

A
transformer winding makes a good
balanced line driver. In contrast,
operational amplifier circuits
with direct outputs probably will have only roughly-similar output
impedances.
Output resistors (e.g., 100 ohms) typically isolate each op amp
output from the cable, and any difference will represent driver
imbalance to external noise.
After being transported, the
differential mode signal is
taken between the two conductors, thus ignoring
common mode noise.
A transformer winding makes a good differential receiver and also
provides
ground loop isolation.
Operational amplifier receivers need a common-mode-rejection null
adjustment for best performance.

At audio
frequencies, the main advantage of balanced
line is rejection of AC hum and related power noises.
This can be achieved by driving only one line with the desired audio
signal, provided both lines are terminated similarly both in the
driver and receiver.

At radio frequencies, balanced line also minimizes undesired
signal radiation.
When the current changes in each wire are equal but opposite, they
radiate "out of phase," resulting in cancellation.
This is especially useful in
TEMPEST, but does require that both lines
be actively driven.

In a
bipolar transistor, the internal
resistance between the base lead and
the functioning base element.
Often denoted Rbb' or rb' and having a typical
value of perhaps 100 ohms. A major source of
thermal noise in a bipolar
transistor.

A bipolar transistor is made by diffusing impurities into
a thin slice of extremely pure single-crystal
semiconductor, such as silicon.
Typically, the collector contact is made at the top surface, and
the emitter contact is made on the bottom.
The base element is essentially a thin film situated between the
collector and emitter plates.
The base current must flow on the film, which is naturally more
resistive than the other thicker elements.

The BB&S
RNG is basically a simple squaring of the current
seed
(x) modulo a
composite (N) composed of two
primes (P,Q) of
public key size.
Primes P and Q must both be
congruent
to 3 mod 4, but the BB&S articles say that P and Q
also must be
special primes.
The special primes construction apparently has the advantage of
controlling the cycle structure of the system, and is part of the
BB&S design in the original articles.
Unfortunately, the special primes construction generally is not
presented in current texts.
Instead the texts deceptively describe a simplified version which
they nevertheless call BB&S.
Readers who do not study referenced articles will assume they know
what BB&S said, but they are only partly correct.

Unlike more common RNG's, the BB&S construction is not maximal length,
but instead defines systems with multiple
cycles, including
degenerate, short and long cycles.
With large integer factors, state values on short cycles are very
rare, but do exist.
Short cycles are dangerous with any RNG, because when a RNG sequence
begins to repeat, it has just become predictable, despite any
theorems to the contrary.
Consequently, if we
key BB&S by choosing x[0] at random,
we may unknowingly select a weak short cycle (a
weak key), which would make the sequence
predictable as soon as the cycle starts to repeat.

The original BB&S articles lay out the technology to compute
the exact length of a long-enough cycle in the BB&S system.
Since it can be much easier to verify cycle length than to actually
traverse the cycle, this is a practical way to verify that
x[0] selects a long-enough cycle.
Values of x[0] can be chosen and checked until a long cycle
is selected.
Modern cryptography insists, to the point of strident intimidation,
that such verification is unnecessary.
However, the original authors apparently thought it was important
enough to include in their work.

The real issue here is not the exposure of a particular weakness
in BB&S, since choosing x[0] on a short cycle is very
unlikely.
But "unlikely" is not the same as "impossible."
And if the design goal is to eliminate every known weakness, even
extensive math which concludes "that particular weakness is too
unlikely to worry about" is beside the point:
"unlikely" does not satisfy the goal.
Mathematics does not get to impose goals on designers or users.

BB&S is said to be "proven secure" in the sense that if factoring is hard, then the sequence
is unpredictable. And many people do think that factoring large
composites of public key size is hard.
Yet when a short cycle is selected and used, BB&S is obviously
insecure, and that is a direct contradiction for anyone who imagines
that "proven secure" applies to them.
Just knowing the length of a cycle (by finding sequence repetition)
should be enough to expose the factors.
This is also
evidence that the
assumption
that factoring is hard is not universally true.
Of course, we already know that factoring is not hard
-- when we give away the factors.
And giving away the factors is pretty much what we do if we allow
ourselves to select, use and traverse a short cycle.
For other comments on the proof, see the sci.crypt conversations.

The advantage of the special primes construction apparently is
that all "short" (but not degenerate) cycles are "long enough" for
use.
Thus, we can simply choose x[0] at random, and then easily
test that it is not on a
degenerate cycle.
(Just get some x[0], step x[0] to x[1], save
x[1], step x[1] to x[2], then compare
x[2] to x[1] and if they are the same, start over.)
The result is a guarantee that the selected cycle is
"long enough" for use. See the sci.crypt discussion:

Note that I now recommend taking two steps before checking
for a degenerate cycle.
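
A toy sketch of the squaring step and the two-step degenerate-cycle
check just described. The tiny primes are for illustration only; real
use requires public-key-size primes (and, per the original articles,
special primes).

P, Q = 23, 47               # both congruent to 3 mod 4 (toy sizes only)
N = P * Q

def bbs_step(x):
    return (x * x) % N      # the BB&S squaring step

def choose_seed(candidate):
    # step twice, then compare; equal values mean a degenerate cycle
    x1 = bbs_step(candidate)
    x2 = bbs_step(x1)
    if x2 == x1:
        return None         # degenerate: pick another candidate
    return x1               # accepted, already stepped past the candidate

print(choose_seed(2))       # a usable value
print(choose_seed(0))       # None: 0 sits on a degenerate cycle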

It is sometimes said that the
special primes construction adds
nothing to BB&S, but that really depends more on the goals of the
cipher designer than the math.
Since BB&S is very slow in comparison to other
RNG's, someone selecting BB&S clearly has
decided to pay a heavy toll with the expectation of getting an
RNG which is "proven secure" in practice.
(That actually misrepresents the BB&S proof, which apparently
allows weakness to exist provided it is not an easy way to
factor N.)
The obvious goal is to get a practical RNG which has no
known weakness at all.

No mere
proof
can protect us when we ourselves choose and use a
weak key, even if doing that is shown
to be statistically very unlikely.
And if we do use a weak key, the "proven secure" RNG is clearly
insecure, which surely contradicts the motive for using BB&S
in the first place.
In contrast, simply by using the special primes construction and
checking for degenerate cycles, weak keys can be eliminated,
at modest expense.
Eliminating a known possibility of weakness, even if that
possibility is very small, seems entirely consistent with the
goal of achieving a practical RNG with no known
weakness, even if the result is not an RNG proven to have
absolutely no weakness at all.

Some would say that even the special primes construction is
overkill, but without it the so-called "proof of strength" becomes
a mere wish or hope that a short cycle is not being used, and
I see that as a contradiction.
It also might be a cautionary tale as to what mathematical
cryptography currently accepts as
proof, and as to what such "proof" means
in practical use.
For other examples of failure in the current cryptographic wisdom,
see
one time pad, and
AES (as an example of the size of the
permutation family in real conventional
block ciphers), and, of course,
old wives' tale.
Also see
algorithm.

The base-10 logarithm of the ratio of two
power values (which is also the same as
the difference between the log of each power value). The basis
for the more-common term
decibel: One bel equals 10 decibels.

A
subjective
conviction in the truth of a
proposition.
Mere belief is in contrast to a
hypothesis, which states some expectation
of fact and so may be verified or falsified by observation
and comparison in an experiment or just reality itself. Also see
dogma.

Ordinarily we distinguish mere belief from
proven truth, belief
thus implying something less than conclusive
evidence.
In this sense, to believe is to be willing to accept
unproven or even unprovable
assumptions, such as having faith, or
trusting in some machine or property.
One issue is whether such assumptions or trust is reasonable
in the real world.

Limiting what one can or should believe seems intertwined
with freedom of speech and individual rights:
Surely, anyone can believe what they want.
However, to the extent that we have real responsibilities to others
and society at large, unfounded belief can not uphold those
obligations.
In a seminal essay called "The Ethics of Belief" (circa 1877 and
reprinted on the web), William Clifford shows how unfounded belief
is insufficient support for decisions of life and death and
reputation.
Many of us would extend that to business planning (the
recent Waltzing with Bears by DeMarco and Lister (see
risk management) reprints the
first section of "The Ethics of Belief" as an appendix), as
well as scientific discussions and claims.

In that point of view, claiming something is true, when one has
not investigated the topic and does not know, is ethically wrong,
even if the claim turns out (by pure dumb luck) to be correct.
It is not enough to claim something and hope it works out; it is
instead necessary to know that the claim is correct before making
the claim.
The ethical requirement is to have performed an investigation
sufficient to expect to know one way or another, and come to a
rationally supportable conclusion.
While not rising to the level of known fact, belief is something
on which reputation rests.
Being wrong thus has consequences to reputation, provided the error
is in the essence and not mere correctable detail.

This idea of requiring substantial investigation to come to
a belief may seem to conflict with the
scientific method, in that
a scientist seemingly makes a mere
claim, which generally stands until shown
false.
But in reality we expect that claim to be something beyond "mere."
We demand that a scientific investigator have put sufficient
professional effort into a conclusion before using a scientific
podium to spout off.
The investigation is what provides an ethical basis for belief,
which still may be wrong or (more likely) incomplete.

For example,
scientific publication
does not mean that all of science supports the described conclusions,
which are still just claims made by particular scientists.
Showing (not necessarily
proving) a claim to be wrong is
part of the process of science, not unwarranted intrusion.
Showing someone wrong in this context naturally affects
reputation, but rarely results in absolute ruin.

The process of experimentation involves making "claims," often
to be disproven, but those are clearly labeled
hypotheses for experiment, not
conclusions for use by others.

In contrast, when we have conclusive evidence of truth
we have knowledge and fact instead of belief.
Facts do not require belief, nor do they respond to voting
or authority.
Clearly, science depends upon knowledge and fact, not personal
beliefs, and it is crucial to know the difference. Also see:
scientific method,
extraordinary claims and
rhetoric.

". . . it has often to be considered as a defect from a
cryptographic point of view that bent functions are necessarily
non-balanced."
-- Dobbertin, H. 1994.
Construction of Bent Functions and Balanced Boolean Functions
with High Nonlinearity.
K.U. Leuven Workshop on Cryptographic Algorithms (Fast
Software Encryption). 61-74.

Bent sequences are said to have the highest possible uniform
nonlinearity. But, to put this in perspective, recall that we
expect a random sequence of 16 bits to have 8 bits different
from any particular sequence, linear or otherwise. That is also
the maximum possible nonlinearity, and here we actually
get a nonlinearity of 6.
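
The figure of 6 can be checked directly. This sketch measures the
distance from one standard bent function of 4 variables (x1x2 XOR
x3x4, my choice of example) to every affine Boolean function:

from itertools import product

def truth_table(f):
    return [f(*bits) for bits in product((0, 1), repeat=4)]

bent = truth_table(lambda a, b, c, d: (a & b) ^ (c & d))

distances = []
for k in product((0, 1), repeat=5):       # 4 linear coefficients + constant
    aff = truth_table(lambda a, b, c, d, k=k:
                      (k[0] & a) ^ (k[1] & b) ^ (k[2] & c) ^ (k[3] & d) ^ k[4])
    distances.append(sum(x != y for x, y in zip(bent, aff)))

print(min(distances))    # 6: the nonlinearity of this 16-bit bent sequence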

There are various more or less complex constructions for these
sequences. In most cryptographic uses, bent sequences are modified
slightly to achieve balance.

1. In
argumentation, favoring one side
over another, typically by making the burden of
proof more
difficult for one side. Also see
extraordinary claims.
2. In
statistics, results consistently some
amount displaced from the theoretical or practical expectation.
This is a clear indication that the measured process has not been
completely understood or
modeled.
3. In
electronics, typically a static or
DC voltage or
current.
A bias voltage is necessary to condition some forms of transducer
(see, for example,
Geiger-Mueller tube).
However, the most common use is to keep some active device (typically a
transistor) "partly on," so that it can
amplify or respond to both positive and
negative parts of an
AC signal.

Transistor biasing is trickier than it might seem from knowing
the simple purpose of keeping the device "partly on":

There will be a wide range of different input bias solutions
to cover a wide range of no-signal output currents, which also
implies a wide range of output resistor values.

Output bias is normally controlled by the same base or gate
used for input signal, which to some extent compromises the
distinction between the two different functions.

Transistors apparently cannot be manufactured to close
specifications, so different devices, even of exactly the same type,
may need significantly different input bias values to achieve the
same output results.

One common biasing approach is to place a particular DC voltage
on the base or gate, and a resistor in the emitter or source lead.
Transistor action then tends to increase current until the emitter
or source has a voltage related to the base or gate, a form of
negative
feedback.
This sets the output bias current, which with a particular pull-up
resistor sets a desired output voltage.
One difficulty with this approach is that it demands an input signal
with lower impedance than the biasing, so that the AC signal will
dominate.
Another issue is that the emitter or source resistor will use some
of the available voltage simply to establish bias, voltage which
then is not available across the device for AC signals. (Also see
transistor self-bias.)

Apparently, a form of
data compression in which arbitrary
values or arbitrary strings can be decompressed into seemingly
grammatical text.
More generally, the ability to decompress random
blocks or strings into data with structure
and
statistics similar to that expected from
plaintext.
The phrase "Bijective Compression" should be taken as a name or
term of art, and not a mathematical
description.

Making random data decompress into language text (necessarily
also random in some way) would seem to be difficult.
Different classes of plaintext, such as language, database files,
program code, or whatever, probably require different compressors
or at least different compression models.
With respect to language text, such a compressor should decompress
random strings into spaced correct words or "word salad."
That should complicate attempts to automatically distinguish the
original
message or block from among other possibilities.

Should bijective compression actually be possible and practical,
the significance would be massive.
Computerized
attacks can succeed only if a correct
deciphering can be recognized automatically.
When incorrect decipherings have structure which is close to
plaintext, a computer may not be able to distinguish them from
success.
If human skill is needed to read and judge the result of thousands
or millions of brute-force attempts, traversing a keyspace may take
tens of millions of times longer than simple computer scanning.
Making an attack millions of times harder than it was before could
be the difference between complete practical security and almost
no security at all.
Holding an attack loop down to human reading speeds could produce
a massive increase in practical
strength.

From the Latin for "dual" or "pair."
1. Dominantly used to indicate
"base 2": The numerical representation in which each digit has an
alphabet of only two symbols: 0 and 1.
This is just one particular
coding or representation of a value which
might otherwise be represented (with the exact same value) as
octal (base 8),
decimal (base 10), or
hexadecimal (base 16). Also see
bit and
Boolean.
2. The confusing counterpart to
unary when describing the
arity or number of inputs or
arguments to a
function, but
dyadic is almost certainly a better choice.

Binomial literally means "two names." In
statistics, the probability of finding
exactly k successes in n independent Bernoulli trials, each of which
has exactly two possible outcomes, when the "success" probability
is p:

P(k,n,p) = (n C k) p**k (1-p)**(n-k)

This ideal
distribution is produced by evaluating
the probability function for all possible k, from 0 to
n.
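
A direct evaluation of the probability function, as a sketch:

from math import comb

def binomial(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

dist = [binomial(k, 8, 0.5) for k in range(9)]   # ideal distribution, n = 8
print(dist)                                      # peaks at k = 4
print(sum(dist))                                 # 1.0, all possible k covered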

If we have an experiment which we think should produce a
binomial distribution, and then repeatedly and systematically find
very improbable test values, we may choose to reject the
null hypothesis that the experimental
distribution is in fact binomial.

1. Having two polarities, as in + and -
voltages (e.g.,
AC), and P and N
semiconductor.
2. A junction
transistor (NPN or PNP).
As opposed to a field effect transistor.
3. Circuits or devices implemented with junction transistors.

The apparent
paradox that, in a schoolroom of only 23
students, there is a 50 percent probability that at least two will
have the same birthday.
The "paradox" is that we have an even chance of success with just
23 of 365 possible days represented.

The "paradox" is resolved by noting that we have a 1/365 chance
of success for each possible pairing of students, and there
are 253 possible pairs or
combinations of 23 things taken 2 at
a time. (To count the number of pairs, we can choose any of the 23
students as part of the pair, then any of the 22 remaining students
as the other part. But this counts each pair twice, so we have
23 * 22 / 2 = 253 different pairs.)
Note that 253 / 365 = 0.693151.

This problem seems to beg confusion between probability and
expected counts, since the correct expectation is often fractional.
We can relate the probability of finding a "double" of some
birthday (Pd) to the expected number of doubles (Ed) as
approximately (equations (5.4) and (5.5) from my article):

Pd = 1 - e**(-Ed) ,
so
Ed = -Ln( 1 - Pd ) .

For a success probability of 0.5, the expected doubles are

Ed = -Ln( 1 - 0.5 ) = 0.693147 .

One way to model the overall probability of success is from the
probability of failure (1 - 1/365 = 0.99726)
multiplied by itself for each pair that could have a possible match.
For the birthday case this model gives the overall failure probability
of 0.99726**253 (0.99726 to the 253rd power) or 0.4995, for a
success probability of 0.5005.

A different model addresses the probability of success for each
sample, instead of each pair.
For population (N) and samples (s) (equation (1.2) from my article):

Pd(N,s) = 1 - (1-1/N)(1-2/N)..(1-(s-1)/N) ,

which gives a success probability for 23 samples of 0.5073.

Sometimes the problem is to find the number of samples (s) needed
for a given probability of success in finding doubles (Pd) from a
given population (N).
Starting with equation (2.5) from my article and substituting (5.5),
we get:

s(N,Pd) = (1 + SQRT(1 - 8N Ln( 1 - Pd ))) / 2 .

For the birthday case the number of samples needed from a population
of 365 for an even chance of success is:

s(365,0.5) = (1 + SQRT(1 - 8*365*Ln( 0.5 ))) / 2 = 23.0 .
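
The calculations above, collected as a sketch in Python (the function
names are mine):

from math import log, sqrt

N = 365

# pair-by-pair failure model, over 253 pairs
print(1 - (1 - 1 / N) ** 253)                 # about 0.5005

# per-sample product model, equation (1.2)
def Pd(N, s):
    p_fail = 1.0
    for i in range(1, s):
        p_fail *= 1 - i / N
    return 1 - p_fail
print(Pd(N, 23))                              # about 0.5073

# samples needed for a given success probability
def samples_needed(N, Pd_target):
    return (1 + sqrt(1 - 8 * N * log(1 - Pd_target))) / 2
print(samples_needed(N, 0.5))                 # about 23.0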

1. The smallest possible discrete unit of information. A
Boolean value: True or False; Yes or No;
one or zero; Set or Cleared. A contraction of
"binary digit," apparently coined by
J. W. Tukey.
Virtually all information to be communicated or stored
digitally is
coded in some way which
fundamentally relies on individual bits.
Alphabetic characters are often stored in eight bits, which is a
byte.
2. The
real number average of information in
bits per symbol as used in
entropy, assuming the computation uses
base-2 logs.

In
digital electronics,
bits generally are represented by
voltage levels on connected
wires, at a given time.
When the bit-value on a wire changes, some time will elapse
until the wire reaches the new voltage level.
Until that happens, the wire voltage is not a valid digital
level and should not be interpreted as having a particular bit
value. Also see:
logic level.

Use a code in which most codewords are balanced, with a
state machine to count the
unbalance and correct it as soon as possible (e.g.,
8b10b).
This also expands the message.

Use a
scrambler or simple
Vernam cipher to randomize and
statistically balance the data.
This does not expand the message, but also does not guarantee
perfect balance, although blocks of reasonable size will be
"nearly" balanced to high probability.

"Exact bit-balance can be achieved by accumulating data to
a block byte-by-byte, only as long as the block can be
balanced by adding appropriate bits at the end."

"We will always add at least one byte of 'balance data' at
the end of the data, a byte which will contain both 1's and
0's. Subsequent balance bytes will be either all-1's or
all-0's, except for trailing 'padding' bytes, of some
balanced particular value. We can thus transparently remove
the balance data by stepping from the end of the block, past
any padding bytes, past any all-1's or all-0's bytes, and
past the first byte containing both 1's and 0's. Padding
is needed both to allow balance in special cases, and when
the last of the data does not completely fill the last block."

"This method has a minimum expansion of one byte per block,
given perfectly balanced binary data. ASCII text may expand
by as much as 1/3, which could be greatly reduced with a
pre-processing data compression step."

(My article
"A Keyed Shuffling System for Block Cipher Cryptography," illustrates
key hashing, a nonlinearized RNG, and byte shuffling.
We would do a similar thing for bit-permutation, but with a larger
and wider RNG and shuffling bits instead of bytes. See either
locally, or @:
http://www.ciphersbyritter.com/KEYSHUF.HTM).

Ciphering by bit-transposition has unusual resistance to
known plaintext attack
because many, many different bit-permutations of the
plaintext data will each produce exactly
the same
ciphertext result.
Consequently, even knowing both the plaintext and the associated
ciphertext does not reveal the
shuffling sequence.
Bit-permutation thus joins
double-shuffling in hiding the
shuffling sequence, which is important when we cannot guarantee
the strength of that sequence (as we generally cannot).
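
A sketch of keyed bit-permutation in this spirit (my illustration
here, not the construction in the cited article): Python's
random.Random seeded from a key hash stands in for the keyed,
nonlinearized RNG, and a shuffle builds the bit permutation.

import hashlib
import random

def keyed_permutation(key, nbits):
    seed = int.from_bytes(hashlib.sha256(key).digest(), "big")
    rng = random.Random(seed)
    perm = list(range(nbits))
    rng.shuffle(perm)                      # keyed shuffle of bit positions
    return perm

def permute_bits(value, perm):
    out = 0
    for dst, src in enumerate(perm):       # move bit src to position dst
        out |= ((value >> src) & 1) << dst
    return out

def unpermute_bits(value, perm):
    out = 0
    for dst, src in enumerate(perm):       # move bit dst back to position src
        out |= ((value >> dst) & 1) << src
    return out

perm = keyed_permutation(b"my key", 64)
ct = permute_bits(0x0123456789ABCDEF, perm)
assert unpermute_bits(ct, perm) == 0x0123456789ABCDEF
print(hex(ct))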

In
TEMPEST,
electrical wiring or signals carrying
ciphertext
which thus can be exposed without danger.

Black Box

An
engineering term for a
component only
specified with input and output
values, with the internal implementation irrelevant.
In this way, the real complexity of the component need not be
considered in understanding a
system using the component.

Digital logic IC's are wildly successful examples of
hardware black box components.
Externally, they perform useful digital functions, and in most cases,
digital designers need not think about the internal construction.
Internally, however, the "digital" devices use
analog transistors to effect digital operation.

An example of black box
software design is a subroutine or
Structured Programming
module, where all interaction with the caller is in the form of
parameters.
The module uses the given resources, does what it needs, completes,
and returns to the caller.
As long as the module does what we want, there is no need to know
how the module works, so we can avoid dealing with internal
complexity at the lower level.
And when the module does not work, it can be debugged in a
minimal environment which avoids most of the complexity of the larger
system, thus making
debugging far easier.

Block

1. A fixed amount of data treated as a single unit.
2. More than one element treated as a single unit.
As opposed to a sequence of elements, as in a
stream.

In a discussion of
block cipher concepts,
cryptography implicitly uses
definition (2), because it is the accumulation of multiple
characters (and the resulting larger ciphering
alphabet) which is characteristic
of conventional block ciphers.
A one-element "block" simply cannot exhibit the various block
issues (such as
mixing,
diffusion,
padding and
expansion)
that we see in a real block cipher, and so fails to
model
both the innovation and the resulting problems.
Similar effects occur when any
scalable model is simplified beyond
reason. (See:
scientific method.)
It is also possible to cipher blocks of dynamically
selectable size, or even fine-grained
variable size.

All real block ciphers are in fact
streamed
to handle more than one block of data.
The actual ciphering might be seen as a stream meta-ciphering using
a block cipher transformation.
The point of this is not to provide a convenient
academic way to contradict any possible
response to a question of "stream or block," but instead
to identify the origin of various ciphering properties and
problems (see:
a cipher taxonomy).

It is not possible to block-cipher just a single
bit or
byte of a block.
(When that is possible, we may be dealing with a
stream cipher.)
If individual bytes really must be block-ciphered, it will be
necessary to fill out each block with
padding
in some way that allows the padding to be distinguished from the
actual plaintext data after deciphering.

Partitioning an arbitrary stream of data into fixed-size blocks
generally means the ciphertext handling must support data
expansion, if only by one block. But handling even minimal
data expansion
may be difficult in some systems.

The distinction between "block" and "stream" corresponds to the
common distinction between "block" and "character" device drivers in
operating systems.
This is the need to accumulate multiple elements and/or pad to a
full block before a single operation, versus the ability to operate
without delay but requiring multiple operations.
This is a common, practical distinction in data processing and data
communications.

A competing interpretation of block versus stream operation
seems to be based on transformation "re-use":
In that interpretation, block ciphering is about having a complex
transformation, which thus directly supports re-use (providing each
plaintext block "never" re-occurs).
In that same interpretation, stream ciphering is about supporting
transformation re-use by changing the transformation itself.
These effects do of course exist (although in my view they are not
the most fundamental issues for analysis or design).
But that interpretation also allows both qualities to exist
simultaneously at the same level of design, and so does not provide
the full
analytical benefits of a true
logical dichotomy.

A
cipher which requires the accumulation
of multiple data characters (or
bytes) in a
block before ciphering can start.
This implies a need for storage to hold the accumulation,
and time for the accumulation to occur.
It also implies a need to handle partly-filled blocks at the
end of a
message.
In contrast, a
stream cipher can cipher bytes
immediately, as they occur, and as many or as few as are required.
Block ciphers can be called
"codebook-style" ciphers, and are
typically constructed as
product ciphers, thus showing a
broad acceptance of
multiple encryption. Also see
variable size block cipher and
a cipher taxonomy.

There is a disturbing insistence by some academics that
only one model deserves the name "block cipher."
That is a problem because other types of
cipher do operate on data in
blocks, yet do not follow that model.
The problem is that general characteristics of "block ciphering"
are unlikely to be resolved when only one type of cipher is allowed
under that name.
This Glossary does not recognize those limitations.

Conventional Block Ciphers

A conventional
block cipher
is a transformation between all possible
plaintext block values and all possible
ciphertext block values, and is thus an
emulated
simple substitution on huge
block-wide values.
Within a particular block size, both plaintext and ciphertext have
the same set of possible values, and when the ciphertext values
have the same ordering as the plaintext, ciphering is obviously
ineffective.
So effective ciphering depends upon re-arranging the
ciphertext values from the plaintext ordering, and this is a
permutation of the plaintext values.
A conventional block cipher is
keyed by constructing a particular
permutation of ciphertext values for each key.

The mathematical model of a conventional block cipher is
bijection, and the set of all possible
block values is the
alphabet.
In cryptography, the bijection model corresponds to an invertible
table having a storage element associated with each possible
alphabet value.
Since each different table represents a different
permutation of the alphabet,
the number of possible tables is the
factorial of the alphabet size.

In particular, a conventional block cipher with a
64-bit block has an alphabet size of
2**64 elements (about
18446744073709552000), and the potential number of keys is the
factorial of that, a number which needs 1.15397859e+21 bits (about
10**21 bits or more than a million million million
bits) for full representation.
(For these and similar computations, try the "Base Conversion, Logs,
Powers, Factorials, Permutations and Combinations in JavaScript" page
locally, or @:
http://www.ciphersbyritter.com/JAVASCRP/PERMCOMB.HTM)
But, in practice, a popular block cipher like
DES
can only select from among
2**56
different tables using 56 key bits.
To calculate the ratio of two values expressed as exponents we
subtract, so to find the proportion of tables we can actually use, we have
1.15*10**21 - 56, (or
1.15*(10**21)),
which does not even change the larger expressed value.
Thus, DES and other conventional block ciphers generally support
only an almost infinitesimal fraction of the keys possible under
their own mathematical and cryptographic
model.
This is an inherent selection of a tiny subset of keys, which is a
massive deviation from the model of
balanced,
flat or unbiased keying across
all possibilities. Also see
AES.
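
Those numbers can be checked with Stirling's approximation for the
factorial, as in this sketch:

from math import log, pi

n = 2 ** 64                              # alphabet size of a 64-bit block
# ln(n!) ~ n ln n - n + 0.5 ln(2 pi n); divide by ln 2 for bits
ln_fact = n * log(n) - n + 0.5 * log(2 * pi * n)
print(ln_fact / log(2))                  # about 1.154e+21 bits
print(2 ** 56)                           # tables actually selectable by DES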

Block Cipher Data Diffusion

In an ideal conventional block cipher, changing even a single
bit of the input block will change all bits of the ciphertext result,
each with
independent probability 0.5.
This means that about half of the bits in the output will
change for any different input block, even for differences of just
one bit. This is
overall diffusion and is
present in a block cipher, but usually not in a
stream cipher. Data diffusion is
a simple consequence of the keyed invertible simple substitution
nature of the ideal block cipher.

Improper diffusion of data throughout a block cipher can have
serious strength implications. One of the functions of data
diffusion is to hide the different effects of different internal
components. If these effects are not in fact hidden, it may be
possible to attack each component separately, and break the
whole cipher fairly easily.

Partitioning Messages into Fixed Size Blocks

A large message can be ciphered by partitioning the plaintext
into blocks of a size which can be ciphered. This essentially
creates a stream meta-cipher which repeatedly uses the same block
cipher transformation. Of course, it is also possible to re-key
the block cipher for each and every block ciphered, but this is
usually expensive in terms of computation and normally unnecessary.

A message of arbitrary size can always be partitioned into some
number of whole blocks, with possibly some space remaining in the
final block. Since partial blocks cannot be ciphered, some
random
padding can be introduced to fill out the
last block, and this naturally expands the ciphertext. In this
case it may also be necessary to introduce some sort of structure
which will indicate the number of valid bytes in the last block.

Block Partitioning without Expansion

Proposals for using a block cipher supposedly without data expansion
may involve creating a tiny
stream cipher for the last block.
One scheme is to re-encipher the ciphertext of the preceding block,
and use the result as the
confusion sequence. Of course,
the cipher designer still needs to address the situation of files
which are so short that they have no preceding block.
Because the one-block version is in fact a stream cipher,
we must be very careful to never re-use a confusion sequence.
But when we only have one block, there is no prior
block to change as a result of the data. In this case, ciphering
several very short files could expose those files quickly.
Furthermore, it is dangerous to encipher a
CRC value in such a block, because
exclusive-OR enciphering is transparent to the field of mod 2
polynomials in which the CRC operates. Doing this could allow an
opponent to adjust the message CRC in a known way, thus avoiding
authentication exposure.
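
A minimal sketch of the first scheme (Python is an assumption;
"encrypt_block" is again only a placeholder for whatever block cipher is in
use, and the warnings above about confusion-sequence re-use and CRC
transparency still apply):

import hashlib

def encrypt_block(block16):
    # placeholder for the real block cipher under the message key
    return hashlib.sha256(b"message key" + block16).digest()[:16]

def cipher_tail(tail_plain, prev_ciphertext_block):
    pad = encrypt_block(prev_ciphertext_block)   # the confusion sequence
    return bytes(p ^ k for p, k in zip(tail_plain, pad))

# deciphering the tail is the same operation, since exclusive-OR is self-inverse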

Another proposal for eliminating data expansion consists of
ciphering blocks until the last short block, then re-positioning
the ciphering window to end at the last of the data, thus
re-ciphering part of the prior block. This is a form of chaining
and establishes a sequentiality requirement which requires that
the last block be deciphered before the next-to-the-last
block. Or we can make enciphering inconvenient and deciphering
easy, but one way will be a problem. And this approach cannot
handle very short messages: its minimum size is one block. Yet
any general-purpose ciphering routine will encounter short
messages. Even worse, if we have a short message, we still need
to somehow indicate the correct length of the message, and this
must expand the message, as we saw before. Thus, overall, this
seems a somewhat dubious technique.

On the other hand, it does show a way to chain blocks for
authentication in a large-block cipher: We start out by
enciphering the data in the first block. Then we position the
next ciphering to start inside the ciphertext of the previous
block. Of course this would mean that we would have to decipher
the message in reverse order, but it would also propagate any
ciphertext changes through the end of the message. So if we add
an authentication field at the end of the message (a keyed value
known on both ends), and that value is recovered upon deciphering
(this will be the first block deciphered) we can authenticate the
whole message. But we still need to handle the last block
padding problem and possibly also the short message problem.

Block Size and Plaintext Randomization

Ciphering raw plaintext data can be dangerous when the cipher
has a relatively small
block size.
Language plaintext has a strong, biased distribution of symbols and
ciphering raw plaintext would effectively reduce the number of possible
plaintext blocks.
Worse, some plaintexts would be vastly more probable than others,
and if some
known plaintext were available,
the most-frequent blocks might already be known. In this way,
small blocks can be vulnerable to classic
codebook attacks which
build up the ciphertext equivalents for many of the plaintext
phrases. This sort of attack confronts a particular block size,
and for these attacks Triple-DES is no stronger than simple DES,
because they both have the same block size.

The usual way of avoiding these problems is to randomize
the plaintext block with an
operating mode such as
CBC. This can ensure that the plaintext
data which is actually ciphered is evenly distributed across
all possible block values. However, this also requires an
IV which thus expands the ciphertext.

Worse, a block
scrambling or
randomization
function like CBC is public, not private.
It is easily reversed to check overall language statistics and
thus distinguish the tiny fraction of
brute force results which produce
potentially valid plaintext blocks.
This directly supports brute force attack, as well as any
attack in which brute force is a final part.
One alternative is to use a preliminary cipher to randomize
the data instead of an exposed function.
Pre-ciphering prevents easy plaintext discrimination; this is
multiple ciphering,
leading in the direction of
Shannon's Ideal Secrecy.

Another approach (to using the full block data space) is to
apply data compression to the plaintext before enciphering.
If this is to be used instead of
plaintext randomization, the designer must be very careful that
the data compression does not contain regular features which
could be exploited by the opponents.

An alternate approach is to use blocks of sufficient size
for them to be expected to have a substantial amount of uniqueness or
entropy.
If we expect plaintext to have about one bit of
entropy per byte of text, we might want a block size of at
least 64 bytes before we stop worrying about an uneven
distribution of plaintext blocks. This is now a practical
block size.

As far as we know, the original automated ciphers were
stream ciphers developed from the
Vernam work with teleprinter
encryption, as patented in 1919.
Block terminology seems to have come along
much later, to distinguish fundamentally different designs from
the old, well-known
streams.
This distinction occurred long before modern open cryptographic
analysis.
Distinguishing "block" from "stream" in the present day is important
because it is useful: a true
dichotomy is the friend both of the
analyst and student.
(Also see
a cipher taxonomy.)

It may be helpful to recall a range of published distinctions
between "stream cipher" and "block cipher" (and if anyone has any
earlier references, please send them along).
Note that open discussion was notably muted during the Cold War,
especially during the 50's, 60's and 70's.
I see the earlier definitions as attempts at describing an existing
codification of knowledge, which was at the time tightly held but
nevertheless still well-developed.

Some Early Definitions

1976. "Much as error correcting codes are divided into
convolutional and block codes, cryptographic systems can be divided
into two broad classes: stream ciphers and block ciphers.
Stream ciphers process the plaintext in small chunks (bits or
characters), usually producing a pseudo-random sequence of bits
which is added modulo 2 to the bits of the plaintext.
Block ciphers act in a purely combinatorial fashion on large
blocks of text, in such a way that a small change in the input
block produces a major change in the resulting output." (p. 646)
[Unfortunately, these two different definitions establish
four classes, not just the two classes suggested by the
first sentence.
For example, what do we call a cipher which acts on "blocks of text"
but not in a "purely combinatorial fashion"?
Having two different distinctions for only two classes is an error
that we need to recognize and get beyond. /tfr]
-- Diffie, W. and M. Hellman.
"New Directions in Cryptography."
IEEE Transactions on Information Theory.
Vol. IT-22, No. 6, November 1976.

1979. "Block ciphers divide the plaintext into blocks,
usually of a fixed size, and operate on each block independently."
(p. 415)
"Stream ciphers, in contrast, do not treat the incoming
characters independently." (p. 415)
[Note that blocking and independence are
different concepts, thus introducing the confusion of
exactly which part of the definition to follow. /tfr]
-- Diffie, W. and M. Hellman.
"Privacy and Authentication: An Introduction to Cryptography."
Proceedings of the IEEE.
Vol. 67, No. 3, March 1979.

Some Current Definitions

1996. "Symmetric algorithms can be divided into two
categories.
Some operate on the plaintext a single bit (or sometimes byte) at
a time; these are called stream algorithms or
stream ciphers.
Others operate on the plaintext in groups of bits.
The groups of bits are called blocks, and the algorithms
are called block algorithms or block ciphers." (p. 4)
-- Schneier, B.
Applied Cryptography.

1997. "A block cipher is a function which maps n-bit
plaintext blocks to n-bit ciphertext blocks; n is
called the blocklength.
It may be viewed as a simple substitution cipher with large
character size." (p. 224)
[But this definition excludes the case of a cipher with an output
block larger than the input block, so what would such a cipher be
called? /tfr]
"Stream ciphers . . . are, in one sense, very simple block ciphers
having block length equal to one."
"They also can be used when the data must be processed one symbol
at a time (e.g., if the equipment has no memory, or buffering of
data is limited)." (p. 20)
--Menezes, A., van Oorschot, P. and Vanstone, S.
Handbook of Applied Cryptography.

The intent of classification is understanding and use.
Accordingly, it is up to the analyst or student to "see" a
cipher in the appropriate context, and it is often useful to consider
a cipher to be a hierarchy of ciphering techniques.
For example, it is extremely rare for a block cipher to encipher
exactly one block.
But when that same cipher is used over and over, that seems a lot like
repeated substitution, which is the basis for stream ciphering.
(Of course repeatedly using the same small substitution would be
ineffective, but if we attempt to classify ciphers by their
effectiveness, we start out assuming what we are trying to
understand or prove.)
So an alternate way to "see" the re-use of a block cipher is as a
higher-level stream "meta-cipher" which uses a block cipher component.
But that is exactly what we call "block ciphering."

Alternate Distinctions

Some academics insist upon distinguishing stream versus
block ciphering by saying that block ciphers have no retained
state between blocks, while stream ciphers do.
Simply saying that, however, does not make it true, and only one
example is needed to expose the distinction as false and misleading.
A good example for that is my own
Dynamic Transposition cipher,
which is a block cipher in that it requires a full block of data
before processing can begin, yet also retains state between blocks.
So if DT is not a block cipher, what is it?
We would hope to define only two categories, not four or more.
Note that Lempel (1979, above) explicitly says that transposition
is a block cipher. Again, see:
a cipher taxonomy to see one
approach on how ciphers relate.

Blocks by Implementation

Another issue is that stream ciphers can be implemented in ways
that accumulate a block of data before ciphering.
Internally, such systems generally have a streaming system which
traverses the block element-by-element, perhaps multiple times.
It is important to see beyond an apparent block requirement stemming
from data manipulation only, which thus contributes no strength,
to the internal ciphers which (hopefully) do provide strength.

It is also possible to have multiple stream ciphers work on the
same "block," and then we do have a legitimate "block cipher"
(or perhaps a "block meta-cipher") formed by
multiple encryption of stream
ciphers.
(Although multiple ciphering with
additive stream ciphers is
usually unhelpful, most conventional block ciphers are in fact
multiple encryptions internally, so internal multiple ciphering is
hardly a crazy approach.)
But if we want to understand strength, we still need to consider the
fundamental ciphering operations which, here, are streams.
Simply making something work like a block cipher does not give it
the same
model as a conventional
block cipher, and so does not provide for analysis at that level.
In the end, we might see such a construction as a block meta-cipher
composed of internal stream ciphers.

In the study of technology, it is often important to create a
scientific model which describes
observed reality. In
cryptography, things are more fluid
than in the physical sciences, because the reality of ciphering is
itself a construction, and there are many such constructions.
Consequently, it is not always obvious what model delivers the best
insight.
In fact, different models may provide different insights on exactly
the same reality.

Each of these models has widely different implications, advantages
and problems.

The Block as a Bijection

The common academic model of a block cipher is the mathematical
bijection, which
cryptography calls
simple substitution.
In practice, such a cipher requires a table far too large to
instantiate, and so the actual cipher only emulates a huge,
keyed table.

One advantage of the bijection model is that specific, measurable
mathematical things can be said about a bijection. Of course
exactly the same things also can be said about simple substitution,
and the field of ciphering is cryptography, not mathematics.

One problem with the bijection model is that it does not attempt
to establish a
dichotomy.
In the bijection model, "block cipher" is just another label in a
presumably endless sequence of such labels, each representing a
distinct ciphering approach.
Consequently, the bijection model makes a poor contribution toward
an overall
cipher taxonomy useful in the
analysis of arbitrary cipher designs.

Another problem with the bijection model is that it establishes
yet another
term of art: The word
block is well known, understood, and
rarely disputed. The word
cipher is also widely agreed upon.
The phrase "block cipher" obviously includes nothing about
bijections.
So to define "block cipher" in terms of bijections is to take the
phrase far beyond the simple meaning of the terms.
We could scarcely describe this as anything other than misleading.

Yet another problem with the bijection model is that, since it
presumes to define "block cipher" as a particular type of cipher,
what are we to do with ciphers which operate on blocks and yet
do not function as bijections (e.g.,
transposition cipher)?
No longer are ciphers related by their proper description.
This is even more misleading.

Ultimately, the problem with the bijection model is not the
model itself:
The model is what it is because substitution is what it is.
The problem is the insistence by some academics that this is the
only valid model for a "block cipher."
A much better choice for the bijection model is the phrase:
"conventional block cipher."

The Block as Static State

The static state model puts forth the proposition that
stream ciphers dynamically change their internal
state, whereas block ciphers do not.
Typically, there is also an understanding that the bijective
block cipher model applies.

One problem with the static state definition is again in the
name itself: The phrase "block cipher" does not include the
word "state."
To use the phrase "block cipher" for a property of state is to
create yet another
term of art, preempting the obvious
meaning of the phrase "block cipher," and preventing related
block-like ciphers from having similar descriptions, thus
misleading both instructor and student.

Another problem with the static state model is that we can build
stream-like ciphers which do not change their internal state
(in fact, I claim we
stream a
substitution table when we
repeatedly use it across a message, just like we stream DES).
Similarly, we can build block-like ciphers which do change
their internal state (I usually offer my
Dynamic Transposition cipher
as an example, but so is a block cipher built from multiple
internal stream operations).
So if we accept the static state model, what do we call those
ciphers which function on blocks, and yet do change state?
Why preempt the well-known terms "block" or "stream" for the
fundamentally different properties of internal state?

Ultimately, what insight does state classification provide that
warrants usurping the obvious descriptive phrases "block cipher"
and "stream cipher" instead of thinking up something appropriate?

The Block as Multiple Elements

The original mechanized ciphers were
stream ciphers, starting with the
Vernam cipher of 1919.
The term "block cipher" may have been introduced in the secret
world of government security to draw a practical distinction between
the well-known
stream concept, and the newer designs that
operated on a
block.
(That would have been in the 50's, 60's or even 70's; hopefully,
someone will either confirm this or correct it.)
In the multiple-element model, a block cipher requires the
accumulation of more than one data element before ciphering can
begin.

One advantage of the multiple-element definition is that it
forms an easy
dichotomy with the definition of a
stream cipher
as a cipher which does not require such accumulation.
Also note that this is no mere semantic issue, but is instead just
one representation of a broader concept of "one versus many" which
rises repeatedly in computing practice, including:

"block" versus "character" device drivers in an OS, and

"parallel" versus "serial" buses in digital electronics.

The various consequences of the single-element versus
multiple-element dichotomy are well known: When blocks are
accumulated from individual elements, storage is required for that
accumulation, and time is required as well, which can imply
latency.
In contrast, when elements need not be accumulated, there need be
neither storage nor latency, but the total overhead may be greater.
While latency probably is not much of an issue for email ciphering,
latency can be significant for real-time streams like music or
video, or interactive handshake protocols.
Overhead is, of course, a significant issue in
system design.

To see how the multiple-element block cipher definition works,
consider the following:

What is a transposition cipher? It is one form of a
block cipher.

What is a cipher which repeatedly runs several different
internal stream-like ciphers across a data block? That is also
one form of a block cipher.

What is simple substitution? Also one form of a
block cipher. Although not the only form of a block cipher,
it is the "conventional block cipher."

Strangely, a degenerate block is exactly the same as a
degenerate sequence: just one element.
In neither case does that element teach about the larger object: a
one-element block does not have diffusion between elements, and a
one-element stream does not have correlation between elements.
(Similarly, is a single electronic wire with a fixed voltage
one-wire "parallel" or one-value "serial"?)
From this we conclude that the most important aspects of
cryptographic (and electronic) design and analysis simply
do not exist as a single element, so it is inappropriate
to either use or judge a model at that level.

A block size of n bits
typically means that 2**n
different codewords or "block values" can occur.
An (n,k) block code uses those 2**n
codewords to represent the equal or smaller count of
2**k different messages. Thus, a 64-bit
block cipher normally encodes 64
plaintext bits into 64
ciphertext bits as a simple (64,64) code.
But if 16 input bits are reserved for other use, the coding
expands
48 plaintext bits into 64 ciphertext bits,
so we have a (64,48) code.

The normal use for extra codewords is to implement some form of
error detection and/or
error correction.
This overhead is not normally called "inefficient coding," but
is instead a simple cost of providing improved quality.
In cryptography, the extra code words may be used to add security
or improve performance by implementing:

Note that the FWT computation is done for efficiency only.
It is wholly practical to compute the nonlinearity of short
sequences by hand.
It is only necessary to manually compare each bit of the measured
sequence to each bit of an
affine Boolean function.
That gives us the distance from that particular function, and
we repeat that process for every possible
affine Boolean function
of the measurement length.
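
A small sketch of exactly that hand method (Python is an assumption): the
nonlinearity of a short truth table is its minimum Hamming distance over
every affine Boolean function of the same length.

def nonlinearity(table):
    m = len(table).bit_length() - 1           # table length must be 2**m
    best = len(table)
    for a in range(2 ** m):                   # linear part a . x
        for c in (0, 1):                      # affine complement bit
            dist = 0
            for x, bit in enumerate(table):
                affine = (bin(a & x).count("1") + c) % 2
                if bit != affine:
                    dist += 1
            best = min(best, dist)
    return best

print(nonlinearity([0, 1, 1, 1]))             # the 2-input OR table; prints 1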

Especially useful in
S-box analysis, where the nonlinearity for
the
table is often taken to be the
minimum of the nonlinearity values computed for each output bit.

The
stream cipher
formed by intermingling or multiplexing two or more
streams of data into a single stream of
ciphertext. For
encryption, presumably some form of
keyed RNG would select which
plaintext stream would contribute the
next
bit or
byte to the ciphertext stream.
For decryption, a similar keyed RNG would demultiplex or
allocate each ciphertext bit or byte to the appropriate
plaintext stream.
If the different streams are similar, from the ciphertext it should
be difficult to distinguish which bits or bytes belong to which
plaintext stream.
Accordingly, the scheme may work best as a
super encryption or a meta-cipher
handling already encrypted and thus very similar random-like streams.
See: "Simon's Braided Stream Cipher"
(locally, or @:
http://www.ciphersbyritter.com/BRAID/BRAID.HTM).
Also see
ciphertext expansion,
transposition and
null.
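
A toy sketch of the multiplexing idea (Python is an assumption;
random.Random merely stands in for a keyed RNG and is nowhere near
cryptographic, and the two input streams are assumed to be equal-length,
already-enciphered data):

import random

def selections(key, n):
    sel = [0] * n + [1] * n                   # n picks from each stream
    random.Random(key).shuffle(sel)           # keyed, reproducible order
    return sel

def braid(key, a, b):
    ia = ib = 0
    out = []
    for s in selections(key, len(a)):
        if s:
            out.append(a[ia]); ia += 1
        else:
            out.append(b[ib]); ib += 1
    return bytes(out)

def unbraid(key, ct, n):
    a, b = [], []
    for s, x in zip(selections(key, n), ct):
        (a if s else b).append(x)
    return bytes(a), bytes(b)

x, y = bytes(8), bytes(range(8))              # stand-ins for two ciphertext streams
assert unbraid(12345, braid(12345, x, y), 8) == (x, y)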

Measures of block mixing:
The minimum number of nonzero elements (e.g., bytes) in both
the input and output, over all possible inputs.
Or the minimum number of changing elements in both the input
and output, over all possible input changes.
When that mixing is applied between layers of
S-boxes, this can be the minimum number of
changing or
active S-boxes.

Branch number specifically applies only to a
linear mixing.
Actually, even that is not quite right: the real problem is
keying, not nonlinearity (although in practice,
keying may imply nonlinearity).
To the extent that we can experimentally traverse the input block,
a branch number certainly can be developed for a nonlinear mixer.

But while any particular mixer can have a branch number, a
keyed mixer will have a branch number for every possible key.
Moreover, we would expect the minimum over all those nonlinear
mixings to be very low, just like the minimum strength of any
cipher over all possible keys (the
opponent trying just one key) is also very
low. Yet we do not attempt to characterize ciphers by their minimum
strength over all possible keys.

No keyed structure can be properly characterized by the extrema
over all keys. When we have
random variables such as keying,
we should be thinking of the
distribution of values, and the
probability of encountering extreme values.
And that is not branch number.

More insight is available in the description of the SQUARE cipher:

"It is intuitively clear that both linear and differential trails
would benefit from a multiplication polynomial that could limit the
number of nonzero terms in input and output difference (and
selection) polynomials.
This is exactly what we want to avoid by choosing a polynomial
with a high diffusion power, expressed by the so-called
branch number.

Let wh(a) denote the
Hamming weight
of a vector, i.e., the number of nonzero components in that vector."
[Normally, Hamming weight applies to bits, but here it is
being used for bytes. /tfr]
"Applied to a state a, a difference pattern a' or a
selection pattern u, this corresponds to the number of
non-zero bytes. In [2] the branch number B of an invertible
linear mapping was introduced as

B(theta) = MIN( wh(a) + wh(theta(a)) ), taken over all a <> 0

This implies that the sum of the Hamming weights of a pair of
input and output difference patterns (or selection patterns) to
theta is at least B.
It can easily be shown that B is a lower bound for the
number of active S-boxes in two consecutive rounds of a linear or
differential trail."

"In [15] it was shown how a linear mapping over
GF(2**m)**n with optimal B (B = n + 1) can be constructed from a
maximum distance separable code."
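
The definition is easy to exercise on a small example. Here is a sketch
(Python is an assumption) that brute-forces the branch number of a tiny
two-element linear mapping over GF(2**8); for this particular mapping the
minimum works out to n + 1 = 3.

def gmul(a, b):                               # multiply in GF(2**8), AES polynomial
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return r

def theta(x, y):                              # linear mapping with matrix [[1,1],[1,2]]
    return (x ^ y, x ^ gmul(2, y))

def wh(v):                                    # "Hamming weight" over bytes
    return sum(1 for e in v if e)

B = min(wh((x, y)) + wh(theta(x, y))
        for x in range(256) for y in range(256) if x or y)
print("branch number:", B)                    # prints 3, i.e. n + 1 for n = 2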

Measurement of Linear Mixing

In the
wide trail strategy,
branch number applies to a particular unkeyed and linear
diffusion mechanism.
In the SQUARE design, branch number also applies to a particular
unkeyed and linear polynomial multiplication.
So branch number might also describe the simple linear form of
Balanced Block Mixing used in
Mixing Ciphers.
But linear BBM's apparently do not have an optimal branch number
(over all possible input data changes), although in most cases they
do have a good branch, and are dynamically scalable to both tiny
and huge blocks on a block-by-block basis.

Nonlinear Mixing

Instead of linear diffusion, it should be "intuitively obvious"
that nonlinear diffusion would be a better choice for a cipher,
if such could be obtained with good quality at reasonable cost.
Nonlinear
Balanced Block Mixing occurs
when the
butterfly functions are
keyed.
Keying is easily accomplished by constructing appropriate
orthogonal Latin squares
using the fast
checkerboard construction.
But "branch number" does not apply to these keyed nonlinear
constructions.

The Optimal Branch Value

The "optimal" branch value for the MDS codes in the SQUARE
design is given as n + 1.
The branch number is basically the minimum number of input
and output elements which are guaranteed to change, and
there are 2n such elements.
But when we try all possible cases of changed input data across
a block, somewhere in there we ourselves create various worst-case
inputs with only one element changed.
In those cases, even if all n outputs change, we only get
a branch of n + 1, so that is the most
possible.

1. To destroy an item, or offend and thus destroy an agreement.
2. In
cryptography, an
attack which destroys the advantage of a
cipher in hiding information.
We would call such an attack "successful."

From the Handbook of Applied Cryptography:

"1.23 Definition. An encryption scheme is said to be breakable
if a third party, without prior knowledge of the
key pair (e,d), can systematically recover
plaintext from corresponding
ciphertext in some appropriate time
frame." [p.14]

"Breaking an information security service (which often
involves more than simply encryption) implies defeating the
objective of the intended service." [p.15]

The term "break" seems to be a
term of art in academic
cryptanalysis, where it apparently
means a successful attack which takes less effort than
brute force (or the cipher design
strength, if that is less), even if the effort required is
impractical, and even if the attack is easily prevented at the
cipher system level.
This meaning of the term "break" can be seriously misleading
because, in English, "break" means "to render unusable" or
"to destroy," and not just "to make a little more dubious."

The academic meaning of "break" is also controversial, as it can
be used as a slander to demean both cipher and designer without
a clear analysis of whether the attack really succeeds.
And even if the attack does succeed, the question is whether
it actually reveals data or key material, thus making the cipher
dangerous for use in practice.

Everyone understands that a
cipher is "broken" when the information in
a message can be extracted without the
key, or when the key itself can be recovered,
with less effort than the design strength.
And a break is particularly significant when the work involved
need not be repeated on every message.
But when the amount of work involved is impractical, the situation
is best described as a theoretical or
academic break.
The concept of an "academic break" is especially an issue for
ciphers with a very large keyspace, in which case it is perfectly
possible for a cipher with an academic break to be more
secure than ciphers with lesser goals which have no "break."
It is also at least conceivable that an attack can be surprising and
insightful and, thus, "successful" even if it takes more
effort than the design strength, which would be no form of
"break" at all.

In my view, a documented flaw in a cipher, such as some statistic
which
distinguishes a practical cipher from
some
model, but without an
attack which recovers data or key, at most
should be described as a "theoretical" or "certificational"
weakness.
Unfortunately, even a problem which has no impact on security
is often promoted (improperly, in my view) to the term
academic break or even "break"
itself.

A form of
attack in which each possibility is tried
until success is obtained. Typically, a
ciphertext is
deciphered
under different
keys until
plaintext is recognized. On average,
this should take about half as many decipherings as there are keys.
Of course, the possibility exists that the correct key might be
chosen first.
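
A toy sketch of the idea (Python is an assumption), using a deliberately
weak 16-bit-key exclusive-OR "cipher" so the whole keyspace can be
searched, and recognizing plaintext by its printable-character structure:

def toy_cipher(key, data):                    # 2-byte repeating XOR; toy only
    ks = bytes([key >> 8, key & 0xFF]) * (len(data) // 2 + 1)
    return bytes(d ^ k for d, k in zip(data, ks))

def looks_like_text(b):
    return all(32 <= c < 127 for c in b)

ct = toy_cipher(0xBEEF, b"meet at noon")
hits = [k for k in range(2 ** 16) if looks_like_text(toy_cipher(k, ct))]
print(len(hits), "candidate keys produce printable text")
print(0xBEEF in hits)                         # the correct key is among them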

Even when the key length of a cipher is sufficient to prevent
brute force attack, that key will be far too small to produce every
possible plaintext from a given ciphertext (see
Shannon's Perfect Secrecy).
Combined with the fact that language is redundant, this means that
very few of the decipherings will be words in proper form.
So most wrong keys could be identified immediately.

On the other hand, recognizing plaintext may not be easy.
If the plaintext itself
-- and all known structure in the plaintext
-- could be hidden, even a brute force attack on the
keys could not succeed, for the correct deciphering could not
be recognized when it occurred.
If the plaintext was not language, but computer code, compressed
text, or even ciphertext from another cipher, recognizing a
correct deciphering could be difficult.
It seems odd that more systems do not seek to leverage this
advantage.
See:
known plaintext,
Shannon's Ideal Secrecy and
multiple encryption.

Brute force is the obvious way to attack a cipher, and the way
most ciphers can be attacked, so ciphers are designed to have a
large enough
keyspace to make this much too expensive
to succeed in practice.
Normally, the design
strength of a cipher is based on the
cost of a brute-force attack.

A pair of
dyadic functions
used as the fundamental operation for "in place" forms of
FFT.
The name comes from a common graphic depiction of FFT operation
which has the shape of a standing "hourglass," or butterfly wings
on edge. Also see
fast Walsh transform.

In most FFT diagrams, the input elements are shown in a vertical
column at the left, and the result elements in a vertical column on
the right.
Lines represent signal flow from left to right.
There are two computations, and each requires input from each of
the two selected elements.
In an "in place" FFT, the results conveniently go back into the
same positions as the input elements.
So we have two horizontal lines between the same elements, and
two diagonal lines going to each "other" element, which cross.
This is the "hourglass" shape or "butterfly wings" on edge.

A source of stable power is the most important requirement for any
electronic device.
In particular,
digital logic functions can only be trusted
to produce correct results if their power is kept within specified
limits.
It is up to the designer to provide sufficient correct power and
guarantee that it remain within limits despite whatever else is
going on.

Most digital logic families use "totem pole" outputs, which
means they have a transistor from Vcc or Vdd (power) to the output
pin, and another transistor from the output pin to Vss (ground).
Normally, only one transistor is ON, but as the output signal passes
from one state to another, transiently, both transistors can
be ON, leading to short, high-current pulses on both the Vcc and
ground rails.
These current pulses are essentially RF energy, and can and do
produce ringing on power lines and a general increase in system noise.
The pulses are also strong enough to potentially change both the
Vcc and ground voltage levels in the power distribution system near
the device, which can affect nearby logic and operation.
Typically this occurs at some random moment when the worst conditions
coincide to cause a logic fault.
To avoid that, we want to bypass the current pulse away from the
power system in general, so other devices are not affected.

For many years, a typical
rule of thumb was to use a 0.1uF ceramic
disc for each supply at each bipolar chip, plus a 1uF tantalum for
every 8 chips.
That may still be a good formula for slower analog chips and older
digital logic like LSTTL.
But as chip speed has increased, bypassing has become more complex.

Ideally, a bypass capacitor will be connected from every supply
pin to the ground pin right at each chip.
Ideally, there will be no lead left on either end of the capacitor:
not 1/4 inch, not 1/8 inch, which is one reason why surface-mount
capacitors are desirable.
Ideally, any necessary lead will be wide, flat copper.
But the ideal system is a goal, not reality.

One of the effects of higher system speeds is that normal system
operation now covers the
resonant frequency
of the bypass capacitors.
Unfortunately, this resonance is not a fixed constant, even for a
particular type of part.
Bypass resonance is instead a circuit condition, involving the
reactance of the closest bypass capacitor, plus the inductance in
power connections, and reactance in other bypass capacitors.
Although it is virtually impossible to remove inductance from
PC-board traces, it is possible to use whole copper layers as
"power planes" for power distribution.

Resonance means that an impulse causes "ringing," in which
energy is propagated back and forth between inductance and
capacitance until it finally dissipates in circuit resistance or
is radiated away, but the resulting signal from many devices may
appear as increased system noise.

Resonance would actually seem to be the ideal bypass situation,
in that a resonant bypass presents the minimum impedance to ground.
But it does that only at one frequency; lower and higher frequencies
are less rejected.
It seems quite impractical to tune for resonance with the "random"
pulses occurring in complex logic.
And, above resonance, inductance dominates and then higher-frequency
noise and pulses are more able to affect the rest of the system.

Another approach has been to use various bypass capacitors,
typically 0.01uF and 0.1uF in parallel, "sprinkled around" the PC
layout.
The idea was that self-resonance in any one bypass capacitor would
be hidden by the other capacitor of different value and, thus,
different resonant frequency.
Alone, either a 0.01uF or a 0.1uF cap may do an effective job.
However, recent modeling indicated, and experimentation has
confirmed, that using both together can be substantially worse
than using either value alone.

The inherent limitation in bypassing is that the normal bypass
process is not "lossy" or dissipative.
Pulse energy can be stored in the inductance of short leads or
PC-board traces, and then "ring" in resonance with the usual
ceramic bypass capacitors.
Having many bypass caps often leads to complex RF filter-like
structures which just pass the ringing energy around.
An alternative is the wide use of tantalum bypass capacitors,
since tantalum becomes increasingly lossy at higher frequencies
and will dissipate pulse energy.

Several approaches seem reasonable:

Use one value and type of small bypass capacitor throughout.

Use more tantalum capacitors, which have a surprisingly good
high frequency bypass capability as well as considerable power
storage.

Use ferrite beads to isolate individual chips or small groups
of chips from the shared power distribution.
This would introduce a few ohms of resistance in distribution leads
at high frequencies only.
Resonance effects might still exist, but would be simplified,
localized and isolated from the rest of the system.
Related isolationist approaches have a long history in radio
technology, where it is commonly called
decoupling.

A collection of eight
bits. Also called an "octet."
A byte can represent 256 different values or symbols.
The common 7-bit ASCII codes used to represent characters in
computer use generally are stored
in a byte; that is, one byte per character.

A basic
electronic component
which acts as a reservoir for electrical power in the form of
voltage.
A capacitor acts to "even out" the voltage across its terminals, and
to "conduct" voltage changes from one terminal to the other.
A capacitor "blocks"
DC and conducts
AC in proportion to
frequency.
Capacitance is measured in Farads: A
current of 1 Amp into a capacitance
of 1 Farad produces a voltage change of 1 Volt per second across
the capacitor.

If we know the capacitance C in Farads and the frequency
f in Hertz, the capacitive
reactance XC in Ohms is:

XC = 1 / (2 Pi f C)
Pi = 3.14159...

Capacitors in
parallel are additive.
Two capacitors in
series have a total capacitance
which is the product of the capacitances divided by their sum.
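
A small sketch of the formulas above (Python is an assumption), for quick
circuit arithmetic:

import math

def xc(farads, hertz):                        # capacitive reactance in Ohms
    return 1.0 / (2.0 * math.pi * hertz * farads)

def parallel(c1, c2):                         # capacitances in parallel add
    return c1 + c2

def series(c1, c2):                           # product over sum for series
    return (c1 * c2) / (c1 + c2)

print("0.1uF at 100MHz: %.4f Ohms" % xc(0.1e-6, 100e6))
print("0.1uF and 0.01uF in series: %.4g F" % series(0.1e-6, 0.01e-6))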

A capacitor is typically two
conductive "plates" or metal foils
separated by a thin
insulator or dielectric, such as
air, paper, or ceramic.
An electron charge on one plate attracts the opposite charge on the
other plate, thus "storing" charge.
A capacitor can be used to collect a small current over long time,
and then release a high current in a short pulse, as used in a
camera strobe or "flash."

The simple physical
model of a
component which is a simple capacitance
and nothing else works well at low
frequencies and moderate
impedances.
But at RF frequencies and modern digital rates, there is no
"pure" capacitance.
Instead, each capacitance has a series inductance that often
does affect the larger circuit. See
bypass.

The general concept of a sequence of stages: first one, then
another, then another. In
electronics, a sequence of operations or
components.
This usage long precedes the current use in cryptography of
cascade ciphering.

The earliest definition of "cascade cipher" I know (1983)
does not mention key independence:

"A Cascade Cipher (CC) is defined as a concatenation of
block cipher systems, thereafter referred to as its stages; the
plaintext of the CC plays the role of the plaintext of the first
stage, the ciphertext of the i-th stage is the plaintext of the
(i+1)-st stage and the ciphertext of the last stage is the
ciphertext of the CC.

"We assume that the plaintext and ciphertext of each stage consists
of m bits, the key of each stage consists of k bits and
there are t stages in the cascade."

[Note the lack of the term "independent." /tfr]

--
Even, S. and O. Goldreich. 1983.
"On the power of cascade ciphers."
Advances in Cryptology: Proceedings of Crypto '83. 43-50.

A modern academic definition is:

"[The] Product of several ciphers is also a product cipher, such a
design is sometimes called a cascade cipher."

Similarly, the term
"product encipherment" is defined in
Shannon 1949 (and is quoted here under
Algebra of Secrecy Systems)
as the use of one cipher, then another with independent keys.
Thus, the independent key terminology was defined in cryptography
over half a century ago, and probably 34 years before
"cascade ciphering" was defined for the same idea without the
key independence requirement.
Both terms are commonly and legitimately confused in use.
Anyone using the terms "cascade ciphering" or "product ciphering"
would be well advised to explicitly state what the term is supposed
to mean, or to not complain when someone takes it to mean something
else.

The unexpected ability to find numerical relationships in
physical processes formerly considered
random. Typically these take the form
of iterative applications of fairly simple computations.
In a chaotic system, even tiny changes in
state eventually lead to major changes
in state; this is called "sensitive dependence on initial
conditions." It has been argued that every good computational
random number generator
is "chaotic" in this sense.

In physics, the state of an
analog physical system cannot be
fully measured, which always leaves some remaining uncertainty to
be magnified on subsequent steps. And, in many cases, a physical
system may be slightly affected by thermal noise and thus continue
to accumulate new information into its state.

In a
computer, the state of the
digital system is explicit and
complete, and there is no uncertainty. No noise is accumulated.
All operations are completely
deterministic. This means that, in a
computer, even a "chaotic" computation is completely predictable
and repeatable.

The characteristic of a
finite field is the number of times
the multiplicative identity must be added to itself to produce
zero.
The characteristic will be some
prime number p or, if the summation
will never produce zero, the characteristic is said to be zero.
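
A tiny worked instance (Python is an assumption): in a prime field GF(p),
adding the multiplicative identity to itself returns to zero after exactly
p additions, so the characteristic is p.

def characteristic(p):                        # repeated addition of 1 in GF(p)
    total, count = 0, 0
    while True:
        total = (total + 1) % p
        count += 1
        if total == 0:
            return count

print(characteristic(7))                      # prints 7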

Finding or building a large number of different
keyed Latin squares, and especially
orthogonal Latin squares,
can be extremely difficult.
Although some simple constructions are available for
statistical use, few of those support
making huge numbers of essentially random squares.
One solution which appears to be new to cryptography is what
I call the Checkerboard Construction:

One way to construct a larger square is to take some Latin square
and replace each of the symbols or elements with a full Latin square.
By giving the replacement squares different symbol sets, we can
arrange for symbols to be unique in each row and column, and so
produce a Latin square of larger size.

If we consider squares with numeric symbols, we can give each
replacement square an offset value, which is itself determined by
a Latin square. We can obtain offset values by multiplying the
elements of a square by its order:

Clearly, this Latin square exhibits massive structure at all
levels, but this is just a simple example.
In practice we would create and use a different order-4 table for each position, and yet another for
the offsets.
We would also
shuffle all rows, columns, and symbols in
the larger square.
And we could use order-16 squares to construct a keyed
square of order-256.
The result is a
balanced table in a difficult to predict
arrangement, a distinct selection from among a plethora of similar
tables, and, thus, apparently ideal for cryptographic use as a
Latin square combiner.
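
A minimal sketch of the construction (Python is an assumption), using one
base order-4 square everywhere, both for the replacement squares and,
scaled by the order, for the offsets; a real construction would use
different keyed squares and then shuffle rows, columns and symbols:

def base4(i, j):
    return (i + j) % 4                        # a trivial order-4 Latin square

def checkerboard16(r, c):
    # offset square entry times the order, plus the replacement square entry
    return 4 * base4(r // 4, c // 4) + base4(r % 4, c % 4)

square = [[checkerboard16(r, c) for c in range(16)] for r in range(16)]
rows_ok = all(sorted(row) == list(range(16)) for row in square)
cols_ok = all(sorted(col) == list(range(16)) for col in zip(*square))
print("order-16 Latin square:", rows_ok and cols_ok)   # prints True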

Keyspace

There are 576 Latin squares of order 4, any one of which can be
used as any of the 16 replacement squares.
The offset square is another order-4 square.
So we can construct 576**17 (about 8 x 10**46
or 2**155) different squares of order 16 in this way.
Then we can shuffle the resulting square in 16! * 15!
(about 2 x 10**25 or 2**84)
different ways, thus producing about 2 x 10**72
squares, for about 240 bits of keying per square.
(Even if we restrict ourselves to
using only the 144 order 4 squares formed by shuffling
a single standard square, we still have a 206-bit keyspace.)
We could store two of the resulting order 16 "4 bit"
squares in a 256-byte table for use as a "pseudo-8 bit" combiner,
and might even select a combiner dynamically from an array of such
tables.
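
The keyspace arithmetic above can be checked directly; a small sketch
(Python is an assumption):

import math

construct = 17 * math.log2(576)                           # 576**17 possibilities
shuffle = math.log2(math.factorial(16)) + math.log2(math.factorial(15))
print("construction: %.1f bits" % construct)
print("shuffling:    %.1f bits" % shuffle)
print("total:        about %.0f bits of keying per square" % (construct + shuffle))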

An early and simple form of
error detecting code.
Commonly, an actual summation of data values in a register of
some reasonable size like 16 bits.
Unfortunately, addition is not particularly effective in
detecting errors:
In one massive set of experiments with real data, a
16-bit checksum was shown to detect errors about as
well as a 10-bit CRC.

Improved checksums (e.g., Fletcher's checksums) include both data
values and data positions and may perform within a factor of 2 of CRC.
One advantage of a true summation checksum is a minimal computation
overhead in software (in hardware, a CRC is almost always smaller
and faster).
Another advantage is that when header values are changed in transit,
a summation checksum is easily updated, whereas a CRC update is
more complex and many implementations will simply re-scan the
full data to get the new CRC.
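
A sketch contrasting the two (Python is an assumption): a plain 16-bit
summation checksum next to a Fletcher-style checksum, which also folds in
data position and therefore catches reordered bytes.

def sum16(data):                              # plain summation checksum
    return sum(data) & 0xFFFF

def fletcher16(data):                         # position-sensitive checksum
    a = b = 0
    for byte in data:
        a = (a + byte) % 255
        b = (b + a) % 255                     # running sum of sums
    return (b << 8) | a

msg, swapped = b"hello world", b"hello wordl" # two bytes transposed
print(sum16(msg) == sum16(swapped))           # True: the plain sum misses it
print(fletcher16(msg) == fletcher16(swapped)) # False: Fletcher catches it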

The term "checksum" is sometimes applied to any form of error
detection, including more sophisticated codes like CRC.

In the usual case, "many"
random samples are counted by
category or separated into value-range "bins." The reference
distribution gives us the number of values to expect in
each bin. Then we compute an X2 test
statistic related to the difference
between the distributions:

X2 = SUM( SQR(Observed[i] - Expected[i]) / Expected[i] )

("SQR" is the squaring function, and we require that each
expectation not be zero.) Then we use a
tabulation of chi-square statistic values to look up the probability
that a particular X2 value or lower (in the
c.d.f.) would occur by random sampling if both
distributions were the same. The statistic also depends upon the
"degrees of freedom," which is
almost always one less than the final number of bins.
See the chi-square section of the "Normal, Chi-Square and
Kolmogorov-Smirnov Statistics Functions in JavaScript" page
(locally, or @:
http://www.ciphersbyritter.com/JAVASCRP/NORMCHIK.HTM#ChiSquare).
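
A sketch of the statistic itself (Python is an assumption); converting the
result to a probability still needs a chi-square c.d.f., such as the page
referenced above or a statistics library:

def chi_square(observed, expected):
    # exactly the formula above; every expectation must be nonzero
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [48, 55, 52, 45]                   # e.g. 200 samples in 4 bins
expected = [50.0] * 4
x2 = chi_square(observed, expected)
print("X2 = %.2f with %d degrees of freedom" % (x2, len(observed) - 1))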

The c.d.f. percentage for a particular
chi-square value is the area of the statistic distribution to the
left of the statistic value; this is the probability of obtaining
that statistic value or less by random selection when testing
two distributions which are exactly the same. Repeated trials which
randomly sample two identical distributions should produce about the
same number of X2 values in each quarter of the distribution
(0% to 25%, 25% to 50%, 50% to 75%, and 75% to 100%). So if we
repeatedly find only very high percentage values, we can assume that
we are probing different distributions. And even a single very high
percentage value would be a matter of some interest.

Any statistic probability can be expressed either as the
proportion of the area to the left of the statistic value
(this is the "cumulative distribution function" or c.d.f.), or as
the area to the right of the value (this is the "upper tail").
Using the upper tail representation for the X2 distribution
can make sense because the usual chi-squared test is a "one tail" test
where the decision is always made on the upper tail. But the
"upper tail" has an opposite "sense" to the c.d.f., where higher
statistic values always produce higher percentage values.
Personally, I find it helpful to describe all statistics by their
c.d.f., thus avoiding the use of a wrong "polarity" when interpreting
any particular statistic. While it is easy enough to convert from
the c.d.f. to the complement or vice versa (just subtract from 1.0),
we can base our arguments on either form, since the statistical
implications are the same.

It is often unnecessary to use a statistical test if we just want
to know whether a function is producing something like the expected
distribution: We can look at the binned values and
generally get a good idea about whether the distributions change in
similar ways at similar places. A good rule-of-thumb is to expect
chi-square totals similar to the number of bins, but distinctly
different distributions often produce huge totals far beyond the
values in any table, and computing an exact probability for such
cases is simply irrelevant. On the other hand, it can be very
useful to perform 20 to 40 independent experiments to look for a
reasonable statistic distribution, rather than simply making a
"yes / no" decision on the basis of what might turn out to be a
rather unusual result.

Since we are accumulating discrete bin-counts, any
fractional expectation will always differ from any actual count.
For example, suppose we expect an
even distribution, but have many
bins and so only accumulate enough samples to observe about 1 count
for every 2 bins. In this situation, the absolute best sample
we could hope to see would be something like (0,1,0,1,0,1,...),
which would represent an even, balanced distribution over the range.
But even in this best possible case we would still be off by half
a count in each and every bin, so the chi-square result would not
properly characterize this best possible sequence. Accordingly, we
need to accumulate enough samples so that the quantization which
occurs in binning does not appreciably affect the accuracy of the
result. Normally I try to expect at least 10 counts in each bin.

But when we have a reference distribution that trails off toward
zero, inevitably there will be some bins with few counts.
Taking more samples will just expand the range of bins, some of which
will be lightly filled in any case. We can avoid quantization error
by summing both the observations and expectations from multiple bins,
until we get a reasonable expectation value (again, I like to see 10
counts or more).
This allows the "tails" of the distribution to be more properly
(and legitimately) characterized.
(The technique of merging adjacent bins is sometimes called
"collapsing.")

1. Any system which uses
encryption; a
cipher system.
2. A
key-selected secret transformation between
plaintext and
ciphertext.
A key-selected
function which takes plaintext to
ciphertext, and an inverse function which takes that ciphertext
back to the original plaintext.
3. Specifically, a secrecy
mechanism or process which operates on
individual characters or
bits independent of semantic content.
As opposed to a secret
code, which generally operates on words,
phrases or sentences, each of which may carry some amount of
complete meaning.

A good cipher can transform secret information into a multitude
of different intermediate forms, each of which represents the original
information. Any of these intermediate forms or ciphertexts
can be produced by ciphering the information under some key value.
The intent is that the original information only be exposed
by one of the many possible keyed interpretations of that
ciphertext. Yet the correct interpretation is available merely by
deciphering under the appropriate key.

A cipher appears to reduce the protection of secret information
to enciphering under some key, and then keeping that key secret.
This is a great reduction of effort and potential exposure, and is
much like keeping your valuables in your house, and then locking
the door when you leave. But there are also similar limitations
and potential problems.

With a good cipher, the resulting ciphertext can be stored or
transmitted otherwise exposed without also exposing the secret
information hidden inside. This means that ciphertext can be stored
in, or transmitted through, systems which have no secrecy protection.
For transmitted information, this also means that the cipher itself
must be distributed in multiple places, so in general the cipher
cannot be assumed to be secret. With a good cipher, only the
deciphering key need be kept secret. (See:
Kerckhoffs' requirements,
but also
security through obscurity.)

Note that a cipher does not, in general, hide the length
of a plaintext message, nor the fact that the message exists,
nor when it was sent, nor, usually, the addressing to whom and
from whom.
Thus, even the theoretical one time pad
(often said to be
"proven unbreakable")
does expose some information about the plaintext
message.
If message length is a significant risk, random amounts of
padding
can be added to confuse that, although padding can of course only
increase message size, and is an overhead to the desired
communications or storage.
This typically would be handled at a level outside the cipher design
proper, see
cipher system.

It is important to understand that ciphers are unlike any other
modern product design, in that we cannot know when a cipher
"works." For example:

A bridge is designed with beams of known strength.
The strength of the resulting structure can be simulated and
computed.
When built, we can roll something heavy across and see that the
bridge "works."
Over time, we can develop trust that the bridge will work
(as a bridge) when we need it.

A car "works" when it moves, and over time we can develop
trust that it will move when and how we want.
We know when it does not work.

A typical computer program does something and we can see
the results, so we know if what we want actually occurs.
Over time, we can build trust that the program will do what
we want.

With a medicine, if we cannot see (or show) that it is
working for us, we switch to something else.

Ciphers are like none of those things because an
opponent may
break
the cipher and not bother to tell us: we simply cannot know when
a cipher "works."
Since we do not know if a cipher is keeping our secrets safe,
repeated use cannot build
trust.
And since we cannot see the outcome, we cannot know how to design
a cipher to "guarantee" (in any sense at all) that the result will
do what we need. In industrial
quality-control terms, cipher
design is literally "out of control." Also see:
scientific method.

In CBC mode the
ciphertext value of the preceding
block is
exclusive-OR combined with the
plaintext value for the current block.
This randomization has the effect of distributing the resulting
block values evenly among all possible block values, and so
tends to prevent
codebook attacks.
But ciphering the first block generally requires an
IV or initial value to start the process.
And the IV necessarily
expands the ciphertext by
the size of the IV.
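
A minimal sketch of CBC enciphering as just described (Python is an
assumption; "encrypt_block" is only a placeholder for a real 16-byte block
cipher under a fixed key, and deciphering, which needs the inverse block
transformation, is omitted):

import os, hashlib

def encrypt_block(block16):
    # placeholder for a real block cipher; a real one would be invertible
    return hashlib.sha256(b"fixed key" + block16).digest()[:16]

def cbc_encrypt(plaintext_blocks):
    iv = os.urandom(16)
    prev, out = iv, [iv]                      # the IV travels with the ciphertext
    for p in plaintext_blocks:
        c = encrypt_block(bytes(x ^ y for x, y in zip(p, prev)))
        out.append(c)
        prev = c                              # chain into the next block
    return out

blocks = [b"0123456789ABCDEF", b"sixteen-byte blk"]
print(len(cbc_encrypt(blocks)), "blocks out for 2 blocks in")   # 3: the IV expands it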

[There are various possibilities other than CBC for avoiding
plaintext block statistics in ciphers.
One alternative is to pre-cipher, presumably with a different
cipher and key, thus producing randomized plaintext blocks (see
multiple encryption).
Another alternative is to use a block at least 64 bytes wide,
which, if it contains language text, can be expected to contain
sufficient unknowable randomness to avoid codebook attacks (see
huge block cipher advantages).]

Note that the exposed nature of the CBC randomizer (the previous
block ciphertext) does not hide plaintext or plaintext statistics.
When simple deciphering exposes plaintext, the vast majority of
possible plaintexts can be rejected automatically, based on their
lack of bit-level and character and word structure.
Normal CBC does not improve this situation much at all.

CBC First Block Problems

In CBC mode, each randomizing value is the ciphertext from each
previous block.
Clearly, all the ciphertext is exposed to the opponent, so there
would seem to be little benefit associated with hiding the IV,
which, after all, is just the first of these randomizing values.
Clearly, in the usual case, if the opponent makes changes to a
ciphertext block in transit, that will hopelessly
garble two blocks (or perhaps just one)
of the recovered plaintext.
As a result, it is very unlikely that an opponent could make
systematic changes in the plaintext simply by changing the
ciphertext.

But the IV is a special case: if the IV is not enciphered,
and if the opponents can intercept and change the IV in transit,
they can change the first-block plaintext bit-for-bit, without a block-wide garble.
That means an opponent could make systematic changes to the
first block plaintext simply by changing the IV.
So, if the opponents know the first block plaintext (which could
be a logo, name, date, or fixed dollar value), the stage is set
for a potentially serious
man-in-the-middle (MITM)
problem.
(The far more serious public-key MITM problem is an authentication
failure with respect to the key, not an IV or data, and
is a completely different issue.)
Note that the CBC first block problem is completely independent
of the cipher key and whether or not it changes, and is an even
larger problem with the modern wider blocks used in
AES.

CBC First Block Solutions

Despite howls of protest to the contrary, it is easy to see that
the CBC first-block problem is a
confidentiality problem, not an
authentication problem.
To see this, we simply note that all that is necessary to avoid the
problem is to keep the IV secret.
When the IV is protected, the opponent cannot know which changes to
make to reach a desired plaintext.
And, since the problem can be fixed without any authentication at
all, it is clear that the problem was not a lack of authentication
in the first place.
Instead, the problem was caused by exposing the IV, and solving
that is the appropriate province of the CBC and
block level, instead of a
MAC at the
cipher system and message level.

To fix the CBC first-block problem it is not necessary to
check the plaintext for changes by using a MAC.
Nor is a MAC necessarily the only way to authenticate a message.
But if we are going to use a MAC anyway, that is one way to solve
the problem.
That works because a MAC can detect the systematic changes which
a lack of confidentiality may have allowed to occur.
But if a MAC is not otherwise desired, introducing a MAC to solve
the CBC first-block problem is probably overkill, because only the
block-wide IV needs to be protected, and not the entire message.

The reason we might not want to use a MAC is that a MAC carries
some inherent negative consequences. One of those is a processing
latency, in that we cannot validate the
recovered plaintext until we get to the end and check the digest.
Latency can be a serious problem with streaming data like audio
and video, and with interactive protocols.
But even with an email message we have to buffer the whole message
as decrypted and wait for the incoming data to finish before we can
do anything with it (or we can make encryption hard and decryption
easy, but one side will be a problem).
Or we can set up some sort of packet structure with localized
integrity checks and ciphertext expansion in each packet.
But that seems like a lot of trouble when an alternative is just to
encipher the IV.

Even when a MAC is used at a higher level anyway, it may be
important for
Software Engineering and
modular code construction to handle at the CBC level as many of the
problems which CBC creates as possible.
This avoids forcing the problem on, and depending upon a correct
response from, some unknown programmer at the higher level, who may
have other things on their mind.
Handling security problems where they occur and not passing them
on to a higher layer is an appropriate strategy for security
programming.

As the problems compound themselves, it seems legitimate to point
out that the CBC first-block problem is a CBC-level security issue
caused by CBC and by transporting the IV in the open.
The CBC first-block problem is easily prevented simply by
transporting the IV securely, by encrypting the IV before including
it with the ciphertext.
Also see "The IV in Block Cipher CBC Mode" conversation
(locally, or @:
http://www.ciphersbyritter.com/NEWS6/CBCIV.HTM).

A larger
system which includes one or more
ciphers and generally involves much more,
including:

Key Management.
Facilities to support secure key loading, user storage, use,
re-use, archival storage, loss, and destruction.
Also possibly facilities for secure key creation and
transport encryption.
Key storage may involve creating a password-encrypted
mini-database of actual key values selected by open alias (see:
alias file).
Key transport may involve a full
public key component, with its
own key construction, storage, use, loss and destruction
requirements, especially including some form of cryptographic
key certification, and possibly including a large, complex,
and expensive certification
infrastructure. (Also see
hybrid.) Also see the
Cloak2 documents for implemented
key management features
(locally, or @:
http://www.ciphersbyritter.com/CLO2FEA.HTM)
and alias file usage
(locally, or @:
http://www.ciphersbyritter.com/PROD/CLO2DOC3.HTM#DetAF).

Keyphrase Hashing.
In general, language phrases must be converted to randomized
and dense binary values for cipher use.
In general, a common hash such as
CRC is sufficient for the task, and a
cryptographic hash is not required.

Message Key Creation and Use.
In general, a random value is used as the key for the actual
data.
Typically, the random message key value will be the only thing
encrypted by the key from the alias file, or the
public key component.

Message Integrity Coding. Detect when a message has
changed in transit. This could be a
MAC, or even a simple
hash
protected by a conventional
block cipher.

Cipher Selection.
A cipher system need not always use the same cipher, and a
multiple encryption system
may use different ciphers in different sequences.
This may require a dynamic selection protocol.

Data Compression.
Messages often can be compressed, leading to shorter messages
and apparently more-complex plaintext.
(But if the compression method is known and unkeyed, compression
may not add much cryptographic advantage.)
Data compression may help obscure the original message length,
however.

Message Length and Position Concealment.
In general, ciphers conceal message data, not message length.
That can be a serious security hazard which may need to be
addressed. Random amounts of random data
(e.g., nulls) can be added to the top and
bottom of a message, or even inside the data itself.
Null positions can be keyed and then might essentially
constitute another cipher layer in a
multiple encryption system.

Data Length Limitation and Re-Keying.
Most ciphers can securely handle only some maximum amount
of data before the
keying must change to preserve security.
For example, if an opponent collects a
codebook of known-plaintext ciphertexts,
we can expect that to become useful at something like
the square-root of the number of different block values (see
birthday attack).
So a 64-bit block cipher probably should be limited to ciphering
2**32 blocks or less under a single key.
Since various academic attacks on DES need something like
2**47 known-plaintexts, simply limiting the amount of data
processed under one key prevents those attacks.
Indeed, many academic
attacks require a huge amount of
known plaintext or even
defined plaintext
ciphered under a single key.
By supporting automatic re-keying, the cipher system can
assure that the required message volume simply cannot exist
(a small sketch of this limit appears after this list of components).

IV Creation and Use. Most
operating modes for conventional
block ciphers need an IV, which then (usually) needs to be
protected and made part of the ciphertext.

Block Partitioning and End Handling.
Messages of arbitrary length need to be partitioned into the
fixed-size blocks used by conventional block ciphers, and
any remaining partial-block of data padded into a full block
or otherwise handled. It is important that any such solution
handle 1-byte messages.

Secure File Overwrite.
Operating systems generally do not erase files, but instead
simply make the deleted file space available for use.
It is possible to read "deleted" data from a disk until the
released sectors are actively overwritten.

Secure Memory Overwrite.
Memory buffers holding keys and data should be cleared by
active overwrite as soon as their use is complete.

Secure Plaintext Creation and Display.
Unless plaintext is "born secure," the operating system is
going to be a serious security problem.
A multitasking "swap file" is often a major security risk.
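
Regarding the data length limitation above, here is a minimal sketch
of the square-root rule of thumb; the function and the printed example
are illustrative assumptions, not a standard:

#include <stdint.h>
#include <stdio.h>

/* Rough birthday-bound data limit: for a b-bit block, codebook
   collisions become likely near SQRT(2**b) = 2**(b/2) blocks
   ciphered under a single key. */
static uint64_t rekey_limit_blocks(unsigned block_bits)
{
    unsigned shift = block_bits / 2;
    return (shift >= 64) ? UINT64_MAX : ((uint64_t)1 << shift);
}

int main(void)
{
    /* A 64-bit block (e.g., DES) suggests re-keying by about 2**32 blocks. */
    printf("64-bit block limit: %llu blocks\n",
           (unsigned long long)rekey_limit_blocks(64));
    return 0;
}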

Taxonomy is classification or grouping
by common characteristics, instead of by name, development path,
or surface similarity.
Developing a useful taxonomy is one of the goals and roles of
science.
The advantage of a taxonomy is that we can study general concepts
which then apply to the many different things which fit in a
single group.
We can also compare and contrast different groups and their effects
on the things we study.
Ideally, a taxonomy will support future developments without major
changes in structure.
One way to do that is to start with a
dichotomy, which separates the universe
of all possibilities into exactly two groups.
Not only will all known things fit into one or the other
of those groups, but all future developments also will fit
in those same groups.
Fairly quickly, though, we must turn to enumeration to describe
distinct sub-classes.

For the analysis of cipher operation it is useful to collect
ciphers into groups based on their functioning (or intended
functioning). The goal is to group ciphers which are essentially
similar, so that as we gain an understanding of one cipher, we
can apply that understanding to others in the same group. We thus
classify not by the
components which make up the cipher, but
instead on the "black-box" operation of the cipher itself.

We seek to hide distinctions of size, because operation
is independent of size, and because size effects are usually
straightforward. We thus classify conventional
block ciphers as
keyed simple substitution, just like
newspaper amusement ciphers, despite their obvious differences in
strength and construction. This allows us to compare the results
from an ideal tiny cipher to those from a large cipher construction;
the grouping thus can provide benchmark characteristics for
measuring large cipher constructions.

We could of course treat each cipher as an entity unto
itself, or relate ciphers by their dates of discovery, the tree of
developments which produced them, or by known strength. But each of
these criteria is more or less limited to telling us "this cipher is
what it is." We already know that. What we want to know is
what other ciphers function in a similar way, and then whatever is
known about those ciphers. In this way, every cipher need
not be an island unto itself, but instead can be judged and compared
in a related community of similar techniques.

Our primary distinction is between ciphers which handle all the
data at once
(block ciphers), and those which handle
some, then some more, then some more
(stream ciphers). We thus see the
usual repeated use of a block cipher as a stream meta-cipher
which has the block cipher as a component.
It is also possible for a stream cipher to be re-keyed or re-originate
frequently, and so appear to operate on "blocks." Such a cipher,
however, would not have the
overall diffusion we normally
associate with a block cipher, and so might usefully be regarded as
a stream meta-cipher with a stream cipher component.

The goal is not to give each cipher a label, but instead to seek
insight. Each cipher in a particular general class carries with it
the consequences of that class. And because these groupings ignore
size, we are free to generalize from the small to the large and so
predict effects which may be unnoticed in full-size ciphers.

Since the multiple data elements of a
block combine to
select a table entry, the smallest possible change to
any one of those data elements should select a new and
apparently random value, and thus "affect" the full
ciphertext block. This is
avalanche.

Binary-oriented simple substitution distributes bit-changes
between all code values
binomially, and
this effect can be sampled and examined statistically,
for a simple substitution signature.

Avalanche is two-way diffusion in the sense that "later"
plaintext can change "earlier" ciphertext, within a
single block, which is also a signature.

A conventional block cipher is intended to
simulate a
keyed substitution table
of a size which must be vastly larger than anything
which could be practically realized.
At issue is the quality of that simulation.

Few conventional block cipher designs are
scalable
to toy size which would support exhaustive testing.

Any substitution table becomes weak when its code
values are re-used.

Code value re-use can be minimized by randomizing the
plaintext block (e.g.,
CBC). This distributes the
plaintext evenly across the possible block values, but
at some point the transformation itself must change or
be exposed. And open randomization does not hide
plaintext structure from
brute force attack.

Another alternative is to have a huge block size so that
code value re-use is made exceedingly unlikely. A large
block also has room for a
dynamic keying field which
would make code value re-use even more unlikely. (Also see
huge block cipher advantages.)

Arguably a special case of Simple Substitution with
dynamic keying, but
with sufficiently different signatures and
characteristics to be treated independently.

All elements to be ciphered first must be collected;
this is the block cipher signature.

Avalanche is neither present, nor needed, nor helpful,
so the Simple Substitution signature does not exist.

By itself, transposition has various known weaknesses,
which can be avoided by design. But unless that is
done, we do not have a serious transposition cipher.
As a consequence, we have just one example of the group
properties.

One construction is to have a
"homophonic" field as
part of the plaintext block. A random value in that
field thus selects a particular ciphertext from the
many which each reproduce exactly the same "data" field.

Arguably a special case of
Simple Substitution,
but it is not clear that homophonic ciphers could not
be built in other ways.

A
confusion sequence
generator
(RNG) which is not closed
and "free running," but which is affected by the
ciphertext.

Ciphertext, or possibly plaintext, modifies the
state in the RNG and thus the subsequent keystream.

Different messages sent under the same key end up
having different confusion sequences without any
message key.

If ciphertext essentially becomes the entire RNG
state, we can create a random-like confusion stream
which will re-synchronize after ciphertext data loss.
However, data loss is rarely an applications-level
issue in modern communications systems.

Under
known plaintext
attack, the common "ciphertext feedback" form
exposes both the output from, and the input to, the
confusion sequence RNG, which puts a lot of
pressure on RNG strength.

Code value re-use can be minimized by randomizing the
plaintext block (e.g.,
CBC). This distributes the
plaintext evenly across the possible block values, but
at some point the transformation itself must change or
be exposed. And open randomization does not hide
plaintext structure from
brute force attack.

Another alternative is to use a very large block so that
code value re-use is made exceedingly unlikely. A large
block also has room for a
dynamic keying field which
would make code value re-use even more unlikely. (See
huge block
cipher advantages.)

By itself, the use of a known number of multiple
alphabets in a regular sequence is not much stronger
than a single alphabet.

It is of course possible to select an alphabet at
pseudo-random, for example by re-keying DES after
every block ciphered. This requires an RNG and an
IV to select the starting
state. Re-keying DES will
take some time, however.

An improvement over random alphabets is the use of a
Latin square combiner
which effectively selects among a balanced set of
different fixed substitution alphabets.

Cylinder

A cipher which has or simulates multiple alphabet
disks on a single rod. The plaintext message is
entered one letter per disk by turning each disk so
the correct letter shows in a particular row. The
ciphertext is read off some other row.

Although operation typically occurs in "chunks"
which fill up the cylinder, a full block is not
required and individual characters can be ciphered.
This is the stream cipher signature.

Primary keying is the arrangement of the alphabet around
each disk, and the selection and arrangement of disks
on the rod.

By entering the plaintext on one row, any of n-1 other
rows can be sent as ciphertext; this selection is an
IV.

If the plaintext data are redundant, it is possible to
avoid sending the IV by selecting the one of n-1
possible decipherings which shows redundancy. But this
is not generally possible when ciphering arbitrary
binary data.

If an IV is selected first, each character ciphering in
that "chunk" is
independent of each other
ciphering.
There is no data diffusion.

In general, each disk is used at fixed periodic
intervals through the text, which is weak.

The ciphertext selection is
homophonic,
in the sense that different ciphertext rows each
represent exactly the same plaintext.

Cylinder operation is not polyphonic in the usual
sense: While a single ciphertext can imply any
other row is plaintext, generally only one row has a
reasonable plaintext meaning.

Dynamic Substitution is important because it directly
confronts the classic attack on conventional stream
ciphers: The usual
additive combiner
immediately exposes the confusion sequence under
known plaintext
attack.
Since the structure of the RNG is assumed to be known,
an exposed confusion sequence supports attempts to
reconstruct the RNG state.
In contrast, a Dynamic Substitution combiner does not
expose the confusion sequence, which should make the
cipher stronger.

ITERATIVE

Multiple encryption
using one stream cipher repeatedly with a new random
IV on each iteration so as to
eventually achieve the effect of a much larger
message key.

Each iteration seemingly must expand the ciphertext by
the size of the IV, although this is probably about the
same expansion we would have with a message key.

Unfortunately, each iteration will take some time.

Particularly appropriate for a cylinder cipher, as
shown by W.T. Shaw.

If a
cipher
was like any other technological construction, we could just
test it to see how good it was.
Unfortunately, cipher
strength is not a measurable engineering
quantity, and as a result, strength is simply "out of control"
with respect to design and manufacturing
quality.
However, some basic tests can and should be done to weed out
the most obvious problems.

In general, absent special
coding for transmission
(such as converting full binary into
base-64 for email)
ciphertext should be
"random-like."
Accordingly, we can run all sorts of tests to try to find any
sort of structure or
correlation in the ciphertext, or between
plaintext,
key, and ciphertext. The many available
statistical randomness tests should provide
ample opportunity for virtually unlimited testing.

One obvious issue in block cipher construction is
diffusion.
If the resulting emulated table really is a permutation,
if we change the input value in any way, we expect the number of
bits which change in the output to occur in a
binomial distribution.
In addition, we expect each output bit to have a 50 percent
probability of changing.
We can measure these things.

Typically, we pick some random input value and cipher to get the
result; then we change some bit of the input and get the new result
and note which and how many bits changed.
One advantage of the binomial distribution is that, as block
size increases, the distribution becomes increasingly narrow
(for any reasonable probability).
Thus, we can hope to peer into tremendously small probabilities,
which may be about as much error as we can expect to find.
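
A minimal sketch of such a test, assuming a hypothetical 64-bit block
routine cipher_encrypt() and using the crude rand() source only for
illustration; a real harness would accumulate the counts into a
distribution and compare it against the binomial expectation:

#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 64-bit block cipher under test. */
uint64_t cipher_encrypt(uint64_t key, uint64_t pt);

/* Count how many output bits change when one input bit is flipped. */
static int bits_changed(uint64_t key, uint64_t pt, int bit)
{
    uint64_t d = cipher_encrypt(key, pt)
               ^ cipher_encrypt(key, pt ^ ((uint64_t)1 << bit));
    int n = 0;
    while (d) { n += (int)(d & 1); d >>= 1; }
    return n;
}

/* Tally the number of changed output bits over many random trials.
   hist[0..64] must be zeroed by the caller; for a good 64-bit
   permutation the counts should look binomial with mean 32. */
void avalanche_histogram(uint64_t key, long trials, long hist[65])
{
    long t;
    for (t = 0; t < trials; t++) {
        uint64_t pt = ((uint64_t)rand() << 32) ^ (uint64_t)rand();
        hist[bits_changed(key, pt, rand() % 64)]++;
    }
}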

We also can develop a mean value for each output bit, or analyze
a particular bit more closely, looking for
correlations between input and output,
or between key and output, or between the key and some aspect of
the transformation between input and output.
We might look at correlations between each key bit and each output
bit, or between any combination of key bits versus any combination
of output bits and so on.
With increasingly large experiments, we can perform increasingly
fine statistical analyses.

An issue of at least potential concern is that conventional
block cipher designs do not implement a completely keyed
transformation, but instead implement only a tiny, tiny fraction
of all possible tables of the block size.
This opens the possibility of weakness in some form of correlation
resulting from a tiny subset of implemented permutations.
The issue then becomes one of trying to measure possible
structural correlations between the set of implemented permutations
and the key, including individual bits, or even arbitrary functions
of arbitrary multiple bits.
At real cipher size, such measurements will be difficult.
Or perhaps knowledge of some subset of the transformation could
lead to filling out the rest of the transformation;
at real cipher size, this may be very difficult to see.

Block Cipher Scalability

Cipher designs which are scalable can be tested at real size
when that is useful, or as tiny "toy" versions, when that
is useful.
Naturally, the tiny versions are not intended to be as strong
as the real-size versions, nor even to be a useful cipher
at that size.
One purpose is to support exhaustive correlation testing to
reveal structural problems which should be easier to discern in
the smaller construction.
The goal would be to find fault at the tiny size, and then use that
to develop insight leading to a scalable attack.
That same insight also should help improve the cipher design.

One advantage of scalability is to support attacks on the same
cipher at different sizes.
Once we find an attack on a toy-size version, we can measure how
hard that approach really is by actually doing it.
Then we can scale up the cipher slightly and measure how much the
difficulty has increased.
That can provide true evidence which can be used to
extrapolate the strength of the real-size cipher, under the given
attack.
I see this as vastly more believable information than we have for
current ciphers.

Another thing we might do is to measure
Boolean function nonlinearity
values.
This measure at least has the advantage of directly addressing
one form of strength: the linear
predictability of each
key-selected permutation.
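
A minimal sketch of that measurement, assuming the cipher or table is
examined one output bit at a time, each bit supplied as a 0/1 truth
table of 2**n entries; the fast Walsh-Hadamard form is a standard way
to compute the distance to the nearest affine function:

#include <stdlib.h>

/* Nonlinearity of one Boolean function of n inputs:
   2**(n-1) - max|W(a)|/2, using an in-place fast Walsh-Hadamard
   transform on the (+1,-1) form of the truth table. */
int boolean_nonlinearity(const unsigned char *truth, int n)
{
    long len = 1L << n;
    long i, j, step, maxw = 0;
    long *w = malloc((size_t)len * sizeof *w);

    if (w == NULL) return -1;
    for (i = 0; i < len; i++)                 /* 0/1 -> +1/-1 */
        w[i] = truth[i] ? -1 : 1;

    for (step = 1; step < len; step <<= 1)    /* FWT butterflies */
        for (i = 0; i < len; i += step << 1)
            for (j = i; j < i + step; j++) {
                long a = w[j], b = w[j + step];
                w[j] = a + b;
                w[j + step] = a - b;
            }

    for (i = 0; i < len; i++)
        if (labs(w[i]) > maxw) maxw = labs(w[i]);
    free(w);
    return (int)((len - maxw) / 2);
}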

Yet another thing we might investigate is the number of keys
that are actually different.
That is, do any keys produce the same emulated table, and if not,
how close are those tables?
Can we find any two keys that produce the same ciphertext from the
same plaintext? (See
population estimation and
multiple encryption.)

Testing Stream Ciphers

The conventional stream cipher consists of a keyed
RNG or confusion generator
and some sort of data and confusion
combiner, usually
exclusive-OR.
Since exclusive-OR has absolutely no strength of its own, the
strength of the classic stream cipher depends solely on the RNG.
Such testing is a common activity in cryptography, using various
available statistical randomness tests.
(But recall that many strengthless statistical RNG's do well on
such tests.)
I particularly recommend runs up/down, because we can develop a
useful non-flat distribution of results and then compare that to
the theoretical expectation.
We can do similar things with birthday tests, which are also
useful in confirming the coding efficiency or
entropy of
really random generators.
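
A minimal sketch of a runs-up/down tally (ties are counted as "down"
here, a simplification); the resulting counts would then be compared
to the published expectations for independent uniform values:

#include <stddef.h>

/* Tally maximal runs of increasing ("up") and decreasing ("down")
   comparisons in a byte sequence.  hist_up[k] and hist_down[k] count
   runs of k successive comparisons, with longer runs lumped at maxlen;
   both arrays must be zeroed by the caller. */
void runs_up_down(const unsigned char *x, size_t n,
                  long *hist_up, long *hist_down, int maxlen)
{
    size_t i;
    int len = 1, up, k;

    if (n < 2) return;
    up = (x[1] > x[0]);
    for (i = 1; i + 1 < n; i++) {
        int next_up = (x[i + 1] > x[i]);
        if (next_up == up) {
            len++;
        } else {
            k = (len < maxlen) ? len : maxlen;
            (up ? hist_up : hist_down)[k]++;
            up = next_up;
            len = 1;
        }
    }
    k = (len < maxlen) ? len : maxlen;          /* close out the final run */
    (up ? hist_up : hist_down)[k]++;
}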

Modern stream ciphers with
nonlinear combiners (see, for example:
Dynamic Substitution)
seem harder to test.
Presumably we can test the ciphertext for
randomness, as usual, yet that would not
distinguish between the combiner and the RNG.
Possibly we could test the combiner with RNG, and then the RNG
separately, and compare distributions.
However, it is not clear what sort of tests would provide useful
insight to this construction.
Alternate suggestions are welcomed.

Ciphertext contains the same information as the original
plaintext, hopefully in a form which cannot be easily understood.
Cryptography hides information by
transforming a plaintext message into any one of a vast multitude
of different ciphertexts, as selected by a
key.
Ciphertext thus can be seen as a
code, in which the exact same ciphertext has
a vast number of different plaintext interpretations.
As a goal, it should be impractical to know which interpretation
represents the original plaintext without knowing the key.

Normally, ciphertext will appear
random; the values in the ciphertext
should occur in a generally
balanced way.
Normally, we do not expect ciphertext to
compress to a smaller size; that
implies efficient
coding (also see
entropy), but only for the
random-like ciphertext.
Since the amount of plaintext information in the message may be
far smaller, from that point of view the ciphertext coding may be
very inefficient.

It also may happen that the ciphertext can be
encoded inefficiently (perhaps as
base-64 for email transmission).
Note that such encoding does not require distinct steps
for ciphering and then encoding: Some ciphers directly produce
encoded (and, thus, expanded) ciphertext (see, for example:
Penknife). Such ciphertext will be
compressible, simply because
representing information with a subset of
ASCII characters is inherently less efficient
than a binary representation.
Thus, inefficiently coded ciphertext may well compress, and that
does not imply weakness in the cipher itself.

Ciphertext expansion is the general situation:
Stream ciphers need a
message key, and
block ciphers with a small block
need some form of plaintext randomization, which generally
needs an
IV to protect the first block. Only block
ciphers with a large size block generally can avoid ciphertext
expansion, and then only if each block can be expected to hold
sufficient uniqueness or
entropy to prevent a
codebook attack.

It is certainly true that in most situations of new construction
a few extra bytes are not going to be a problem. However, in some
situations, and especially when a cipher is to be installed into
an existing system, the ability to encipher data without
requiring additional storage can be a big advantage. Ciphering
data without expansion supports the ciphering of data structures
which have been defined and fixed by the rest of the system,
provided only that one can place the cipher at the interface
"between" two parts of the system. This is also especially
efficient, as it avoids the process of acquiring a different,
larger, amount of store for each ciphering. Such an installation
also can apply to the entire system, and not require the
re-engineering of all applications to support cryptography in
each one.

CFB assumes a
shift register of the block cipher
block size. An
IV or initial value first fills the register,
and then is ciphered. Part of the result, often just a single
byte, is used to cipher data, and the
resulting
ciphertext is also
shifted into the register. The new register value is ciphered,
producing another confusion value for use in stream ciphering.

One disadvantage of this, of course, is the need for a full
block-wide ciphering operation, typically for each data byte
ciphered. The advantage is the ability to cipher individual
characters, instead of requiring accumulation into a block
before processing.
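
A minimal sketch of the byte-wide form, assuming a hypothetical
16-byte block routine block_encrypt(); deciphering is identical except
that the XOR recovers plaintext while the received ciphertext is what
gets shifted into the register:

#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK 16  /* assumed block size in bytes */

/* Hypothetical block cipher primitive. */
void block_encrypt(const uint8_t key[BLOCK], const uint8_t in[BLOCK],
                   uint8_t out[BLOCK]);

/* 8-bit CFB encryption: the shift register starts as the IV, each
   register ciphering yields one confusion byte, and each resulting
   ciphertext byte is shifted back into the register. */
void cfb8_encrypt(const uint8_t key[BLOCK], const uint8_t iv[BLOCK],
                  const uint8_t *pt, uint8_t *ct, size_t len)
{
    uint8_t reg[BLOCK], out[BLOCK];
    size_t i;

    memcpy(reg, iv, BLOCK);
    for (i = 0; i < len; i++) {
        block_encrypt(key, reg, out);        /* one full ciphering per byte */
        ct[i] = pt[i] ^ out[0];              /* leading byte ciphers data   */
        memmove(reg, reg + 1, BLOCK - 1);    /* shift the register          */
        reg[BLOCK - 1] = ct[i];              /* feed ciphertext back in     */
    }
}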

In a sense, the idea of a
ciphertext-only attack is
inherently incomplete.
By themselves, symbols and
code values have no meaning.
So we can have all the ciphertext we want, but unless we can find
some sort of structure or relationship to plaintext, we have nothing
at all.
The extra information necessary to identify a break could be the bit
structure in the
ASCII code, the character structure of
language, or any other known relation.
But the ciphertext is never enough if we know
absolutely nothing about the plaintext.
It is our knowledge or insight about the plaintext, the statistical
structure, or even just the known use of one plaintext concept,
that allows us to know when deciphering is correct.

In practice, ciphertext-only attacks typically depend on some
error or weakness in the
encryption design which somehow relates
some aspect of
plaintext in the ciphertext. For example,
codes that always encrypt the same words in
the same way naturally leak information about how often those words
are used, which should be enough to identify the plaintext.
And the more words identified, the easier it is to fill in the gaps
in sentences, and, thus, identify still more words. Modern
ciphers are less likely to fall into that
particular trap, making ciphertext-only attacks generally more
academic than realistic (also see
break).

In
electronics, the "circular" flow of
electrons from a
power source, through
conductors and
components and back to the power source.
Or the arrangement of components which allows such flow and
performs some function.

1. In commerce, an assertion of value due.
2. In law, a statement of ownership, as in recovering a lost item,
or filing for mining or
patent rights. Also see
patent claims.
3. In
argumentation, a
conclusion to be shown correct.

Generally speaking,
plaintext.
Messages transmitted without encryption or "in the clear."
For time-sensitive and transient tactical information, getting
the message through as soon as possible may be far more important
than secrecy.

A repetitive or cyclic timing signal to coordinate
state changes in a
digital system.
A clock coordinates the movement of data and results through
various stages of processing.
Although a clock signal is digital, the source of the repetitive
signal is almost always an
analog circuit,
typically a
crystal oscillator.

In a digital system we create a delay or measure time by simply
counting pulses from a stable
oscillator.
Since counting operations are digital, noise effects are virtually
eliminated, and we can easily create accurate delays which are as
long as the count in any counter we can build.

From the Latin codex, for tree trunk or wax-covered wooden
tablet.
1. A list of rules.
2. In
cryptography, symbols, colors, shapes,
flowers, musical notes, finger positions or values which stand for
symbols, values, words, sentences, ideas, sequences, or even
operations (as in
computer
"opcodes"). Often just a simple
substitution between numeric values.

Code values can easily represent not only symbols or characters,
but also words, names, phrases, and entire sentences (also see
nomenclator).
In contrast, a
cipher operates only on individual characters or
bits.
Classically, the meaning of each code value was collected in a
codebook.
Codes may be open (public) or secret.

Coding is a very basic part of
modern computation and generally implies no
secrecy or information hiding.
In modern usage, a code is often simply a correspondence between
information (such as character symbols) and values (such as the
ASCII code or
Base-64).
Because a code can represent entire phrases with a single number,
one early application for a public code was to decrease the cost
of telegraph messages.

In general, secret
codes are weaker than
ciphers, because a typical code will simply
substitute or transform each different
word or letter into a corresponding value.
Thus, the most-used
plaintext words or letters also become
the most-used code or
ciphertext values and the statistical
structure of the plaintext remains exposed. Then the
opponent easily can find the most-used
ciphertext values and realize that they represent the most-used
plaintext words. Accordingly, it is common to
superencipher a coded message
in an attempt to hide the codebook values.

A meaningful code is more than just data, being also the
interpretation of that data. The main concept of modern
cryptography is the use of a
key to select one interpretation from among
vast numbers of different interpretations, so that meaning is hidden
from those who do not have both the appropriate decryption program
and key.
Each particular ciphertext is interpreted by the decryption
system to produce the desired plaintext.
The pairing of value plus interpretation to produce
or do something occurs in various places:

In cryptography, we have
ciphertext plus
the key-selected transformation
which interprets that ciphertext.

In computer hardware, we have
opcode values plus a
computer which interprets those
different values as different operation commands.

In computer programming, we typically have raw
binary data plus procedures
which interpret that data.

In computer-based documents, we have various formats
(e.g., .DOC or .PIF files) plus a program to properly
display those documents.

In biology we have DNA plus the cell which
interprets DNA, both being required to express the original meaning.

In real life, many useful things do require a particular thing
to use them.
For example, gasoline provides energy for cars, but only because
cars have the appropriate engine to perform the desired conversion.
Similarly, bullets require guns, radio broadcasting stations
require radios and so on.
But that probably reaches beyond the idea of a code, which is
basically limited to information- or symbol-oriented transformations.

Literally, the listing or "book" of
code
transformations. More generally, any collection of such
transformations. Classically, letters, common words and useful
phrases were numbered in a codebook; messages transformed into
those numbers were "coded messages." Also see
nomenclator.
A "codebook style cipher" refers to a
block cipher.

A form of
attack in which the
opponent simply tries to build or collect a
codebook of all the possible transformations
between
plaintext and
ciphertext (under a single
key). This is the classic approach we
normally think of as "codebreaking."
Also called "bookbreaking."

The usual
ciphertext only
approach depends upon the plaintext having strong statistical
biases which make some values far more probable than others, and
also more probable in the context of particular preceding known
values.
While this is not
known plaintext, it is
a form of known structure in the plaintext.
Such attacks can be defeated if
the plaintext data are randomized and thus evenly and independently
distributed among the possible values (see
balance).

When a codebook attack is possible on a
block cipher, the complexity of the
attack is controlled by the size of the block (that is, the number
of elements in the codebook) and not the
strength of the cipher.
This means that a codebook attack would be equally effective
against either
DES or
Triple-DES.

One way a block cipher can avoid a codebook attack is by having
a large
block size which will contain an unsearchable
amount of plaintext "uniqueness" or
entropy. Another approach is to randomize the
plaintext block, by using an
operating mode such as
CBC, or
multiple encryption.
Yet another approach is to change the
key frequently, which is one role of the
message key introduced at the
cipher system level.

Specifically, the work of attempting to
attack and
break a secret
code. More generally, attempting to defeat
any kind of secrecy system.

Codebreaking is what we normally think of when hearing the WWII
crypto stories, especially the Battle of Midway, because many
secrecy systems of the time were codes.
According to the story, the Japanese are preparing an attack on
Midway island, and have given Midway the coded designation "AF."
American cryptanalysts have exposed the designator "AF," but not
what it represents.
Assuming the "AF" to be Midway, American codebreakers have Midway
falsely report the failure of their fresh-water plant in open
traffic.
Then, two days later, intercepted Japanese traffic states that
"AF" is short of fresh water.
Thus, "AF" is confirmed as Midway.

Note that there had to be a way to identify the actual target
(plaintext) with the code value
(ciphertext) before the meaning was
exposed.
Simply having the ciphertext itself, without finding structure in
the ciphertext or some relationship to plaintext, is almost never
enough, see
ciphertext-only attack.

In a mathematical
expression, a
factor of a
term.
Typically a constant value or simple
variable which multiplies the parameter
of interest (often X).
However, any proper subset grouping of factors technically would be
a coefficient of the overall product.

The uncomfortable psychological reaction to the experience of
finding that some of our core
beliefs are factually wrong.

The classic example is of a cult who believed the Earth was
going to end at a particular time.
Supposedly, many members gave up their houses and jobs and so on, but
the Earth did not end.
As a consequence, less-involved members generally accepted that their
belief was false.
But more-involved members instead insisted that the actions of the
cult showed their faith, which was then rewarded by the Earth not
ending.

Obviously it is difficult to use
logic to address issues of faith, but
science is not a faith and does not require
belief.
Therefore, when we find that current scientific positions are wrong,
they can be changed with only minor discomfort and anguish.
Supposedly. (Also see
mere semantics and
old wives' tale.)

In mathematics, the term for any particular subset of symbols,
independent of order. (Also called the binomial coefficient.)
The number of combinations of n things, taken k
at a time, read "n choose k," is:

C(n,k) = n! / (k! * (n-k)!)
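
A small sketch of computing that value directly, using the usual
multiplicative form to avoid computing the full factorials:

/* n choose k; each intermediate product divides exactly. */
unsigned long long choose(unsigned n, unsigned k)
{
    unsigned long long c = 1;
    unsigned i;
    if (k > n) return 0;
    if (k > n - k) k = n - k;            /* symmetry: C(n,k) = C(n,n-k) */
    for (i = 1; i <= k; i++)
        c = c * (n - k + i) / i;
    return c;
}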

In mathematics, combinatorics is related to counting
selections, arrangements or other subsets of finite
sets. One result is to help us
understand the probability of a particular subset in the
universe of possible values.

Consider a conventional
block cipher:
For any given size block, there
is some fixed number of possible messages. Since every
enciphering must be reversible (deciphering must work), we
have a 1:1 mapping between
plaintext and
ciphertext blocks.
The set of all plaintext values and the set of all ciphertext
values is the same set; particular values just have different
meanings in each set.

Keying gives us no more ciphertext values;
it only re-uses
the values which are available. Thus, keying a block cipher
consists of selecting a particular arrangement or
permutation
of the possible block values. Permutations are a combinatoric
topic. Using combinatorics we can talk about the number of
possible permutations or keys in a block cipher, or in cipher
components like substitution tables.

Permutations can be thought of as the number of unique
arrangements of a given length on a particular set. Other
combinatoric concepts include
binomials
and
combinations
(the number of unique given-length subsets of a given set).

Irreversible or non-invertible combiners are often
proposed to mix multiple
RNG's into a single
confusion sequence, also for
use in stream cipher designs. But that is harder than it looks.
For example, see:

A term used in
S-box analysis to describe a property of
the value arrangement in an invertible
substitution or, equivalently, a
conventional
block cipher.
If we have some input value, and then change one bit in that
value, we expect about half the output bits to change; this is
the result of
diffusion; when partial diffusion is
repeated we develop
avalanche; and the ultimate result is
strict avalanche.
Completeness tightens this concept and requires that changing
a particular input bit produce a change in a particular output bit,
at some point in the transformation (that is, for at least one input
value). Completeness requires that this relationship occur at least
once for every combination of input bit and output bit.
It is tempting to generalize the definition to apply to multi-bit
element values, where this makes more sense.

Completeness does not require that an input bit change
an output bit for every input value (which would not make
sense anyway, since every output bit must be changed at
some point, and if they all had to change at every
point, we would have all the output bits changing, instead
of the desired half). The inverse of a complete function is not
necessarily also complete.

As originally defined in Kam and Davida:

"For every possible key value, every output bit
ci of the SP network depends upon all input
bits p1,...,pn and not just a
proper subset of the input bits." [p.748]
-- Kam, J. and G. Davida. 1979.
Structured Design of Substitution-Permutation Encryption Networks.
IEEE Transactions on Computers.C-28(10):747-753.
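
A minimal sketch of testing the property exhaustively, assuming the
substitution is supplied as a table of 2**n entries (practical only
for small or "toy" sizes):

/* Completeness check: for every input-bit / output-bit pair (i,j),
   some input must exist whose output bit j changes when input bit i
   is flipped.  Returns 1 if complete, 0 otherwise. */
int is_complete(const unsigned *table, int n)
{
    unsigned long size = 1UL << n;
    unsigned long x;
    int i, j, seen;

    for (i = 0; i < n; i++)                  /* each input bit  */
        for (j = 0; j < n; j++) {            /* each output bit */
            seen = 0;
            for (x = 0; x < size && !seen; x++)
                if ((table[x] ^ table[x ^ (1UL << i)]) & (1UL << j))
                    seen = 1;
            if (!seen) return 0;             /* pair (i,j) never changes */
        }
    return 1;
}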

An ordered pair of
real numbers (x,y) treated as the
vector
from the origin at (0,0) to (x,y), and represented either as real and
imaginary
rectangular
coordinates (x,y) or as the magnitude and angle (mag,ang) of the
vector.
Denoted C.
The rectangular representation is also called
"Cartesian";
the magnitude and angle form is called
"polar."

To build an appropriate algebra and make complex numbers a
field,
the rectangular representation is written as (x+iy) [or (x+jy)],
where i [or j] has the value SQRT(-1).
The symbol i is called "imaginary," but we might just consider it a
way for the algebra to relate the values in the ordered pair.
Clearly, i * i = -1.

With these definitions we get complex algebra, and can perform most
operations and even evaluate trigonometric and other complex functions
as we do with reals.

In cryptography, perhaps the most common use of complex
numbers occurs in the
FFT,
which typically transforms values in rectangular form.
Sometimes we want to know the magnitude or length of the
implied vector, which we can get by converting the rectangular
(x,y) representation into the (mag,ang) representation:

magnitude: mag(z) = SQRT( x*x + y*y )
angle: ang(z) = arctan( y / x)
Note: Computer arctan(x) functions are generally unable to
place the angle in the proper quadrant, but arctan2(x,y)
routines -- with two input parameters -- may be available
to do so.
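
A small sketch of that conversion; the C library atan2() takes the two
parameters and places the angle in the correct quadrant:

#include <math.h>

/* Rectangular (x,y) to polar (magnitude, angle in radians). */
void to_polar(double x, double y, double *mag, double *ang)
{
    *mag = sqrt(x * x + y * y);
    *ang = atan2(y, x);        /* quadrant-correct, unlike plain atan() */
}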

A part of a larger construction; a building-block in an overall
design or
system. Modern
digital design is based on the use of a few
general classes of pre-defined, fully-specified parts. Since even
digital logic can use or even require
analog values internally, by enclosing these
values the logic component can hide complexity and present the
appearance of a fully digital device.

The most successful components are extremely general and can be
used in many different ways. Even as a brick is independent of the
infinite variety of brick buildings, a
flip-flop is independent of the infinite
variety of logic machines which use flip-flops.

The source of the ability to design and build a wide variety of
different electronic logic machines is the ability to interconnect
and use a few very basic but very general parts.

The use of individual components to produce a working
complex system in production requires:
first, a comprehensive
specification for each part; and
next, full testing to guarantee that each part actually
meets the specification (see:
quality management).

Digital logic is normally specified to operate correctly
over a range of supply voltage, temperature, loading,
clock rates, and other appropriate parameters.
Specified limits (minimum's or maximum's) guarantee that a working
part will operate correctly even with the worst case of all
parameters simultaneously.
This process allows large, complex systems to operate properly in
practice, provided the designer makes sure that none of the
parameters can exceed their correct range.

Originally the job title for a person who performed a laborious
sequence of arithmetic computations. Now a machine for performing
such calculations.

A logic machine with:

Some limited set of fundamental computations. Typical operations
include simple arithmetic and
Boolean logic. Each operation is
selected by a particular operation code value or
"opcode." This is a
hardware interpretation of the opcode.

The ability to follow a list of instructions or commands,
performing each in sequence. Thus capable of simulating a wide
variety of far more complex "instructions."

The ability to execute or perform at least some instructions
conditionally, based on parameter values or intermediate results.

The ability to store values into a numbered "address space"
which is far larger than the instruction set, and later to recover
those values when desired.

A material in which electron flow occurs easily. Typically a
metal; usually copper, sometimes silver, brass or aluminum.
A
wire. As opposed to an
insulator and a
semiconductor.

As a
rule of thumb, a cubic centimeter (cc)
of a solid has about 10**24 or 1E24 atoms.
In a metal, usually each atom contributes one or two electrons,
so a metal has about 10**24 (1E24) free electrons per cc.
This massive number of free electrons has a tiny
resistance to
current flow of something like
10**-6 ohms across a cubic centimeter of copper,
or about one microhm per cubic centimeter.
Apparently the International Annealed Copper Standard (IACS) says
that annealed copper with a cross sectional area of a square
centimeter should have a resistance of about 1.7241 microhms/cm
(at 20 degrees Celsius), which is satisfactorily close.

A cube with one millimeter sides has 1/100 the cross sectional
area of a centimeter cube (and is about like AWG 17 wire), and so
would have 100x the resistance per cm., but also is only 1/10 the
length, for about 17 microhms per millimeter copper cube.
A meter of AWG 17 wire would have 1000 millimeter-size cubes
at 17 microhms each, so we would expect it to have about
17 milliohms total resistance.
As a check, separate wire tables give the resistance of AWG 17 at
5.064 ohms per 1000ft (304.8m), which is 0.017 ohms
(17 milliohms) per meter.

1. A secret agreement among like-minded individuals to work
toward a particular hidden goal despite outward appearances.
2. Legally, an agreement to break the law.

In a conspiracy, multiple individuals can each contribute a
minor action to accumulate a large effect.
One obvious approach is to use gossip to give the impression that
all right-thinking people are against some one or some thing.
A conspiracy can be difficult to oppose, because a major effect
can be achieved with minor actions that individually do not call
for a major response.

In the study of
logic, an observed fact dependent upon other
facts not being observed. Or a statement which is
conditionally true, provided other unmentioned conditions have the
appropriate
state. As opposed to
absolute.

In
electronics, the idea that
current flow occurs in the direction
opposite to the direction of electron flow.
This occurs because we assign a negative charge to electrons.
Conventional current flow occurs from
anode to
cathode within a device, and from the
cathode of one device to the anode of another.
For example, conventional current flow starts at the cathode or (+)
end of a battery, connects to the anode of a component and flows out
the cathode, which connects to the anode or (-) end of the battery.

Polynomial multiplication.
A multiplication of each term against each other term, with no
"carries" from term to term. Also see
correlation.

Used in the analysis of signal processing to develop the response
of a processing system to a complicated real-valued input signal.
The input signal is first separated into some number of discrete
impulses. Then the system response to an impulse
-- the output level at each unit time delay after
the impulse -- is determined.
Finally, the expected response is computed as the sum of the
contributions from each input impulse, multiplied by the magnitude
of each impulse.
This is an approximation to the convolution integral with an infinite
number of infinitesimal delays. Although originally accomplished
graphically, the process is just polynomial multiplication.
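
A minimal sketch of that computation as plain polynomial
multiplication (no FFT), for two real-valued sequences:

#include <stddef.h>

/* Discrete convolution: out[k] is the sum of a[i]*b[j] over all
   i+j == k.  out must hold na+nb-1 entries, zeroed by the caller. */
void convolve(const double *a, size_t na,
              const double *b, size_t nb, double *out)
{
    size_t i, j;
    for (i = 0; i < na; i++)
        for (j = 0; j < nb; j++)
            out[i + j] += a[i] * b[j];
}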

It is apparently possible to compute the convolution of two
sequences by taking the
FFT of each, multiplying these results
term-by-term, then taking the inverse FFT. While there is an
analogous relationship in the
FWT, in this case the "delays" between the
sequences represent
mod 2 distance differences, which may or may
not be useful.

A U.S. federal (and worldwide) right by which the owner of
a creative work can prevent others from copying and thus stealing
that work.
Copyright covers "original works of authorship" that are "fixed in
a tangible form of expression," including:
books, pamphlets, musical scores and plays; pictures, graphics and
sculpture; movies, audio recordings and architecture.
Included as literary works are computer program
source code, and
compilations of existing material, facts or data, which may include
even simple lists of facts, when produced by creative selection.
A copyright owner has the exclusive right to: reproduce the work,
to derive subsequent works, to distribute copies, and to perform or
display the work.
Copyright is an
intellectual property right;
also see
plagiarism.

Copyright protects a particular expression, but not the
underlying idea, process or function it may perform, which is
the province of
patent protection.
Copyright protects form, not content: Copyright can
protect particular text and diagrams, but not the described concept.
In general, copyright comes into existence simply by creating
a picture or manuscript or making a selection; theoretically, no
notice or registration is required. (See the Library of Congress
circular "Copyright Basics":
http://www.loc.gov/copyright/circs/circ1.html#cr).
However, formal registration is required before a lawsuit can be
filed, and registration within 3 months of publication supports
recovery of statutory damages and attorney fees; otherwise,
apparently only actual damages can be recovered.
Similarly, no copyright notice is required, but having one like this:

Copyright 1991 Terry Ritter. All Rights Reserved.

may avoid an "innocent infringement" defense.
Protection currently lasts 70 years beyond the death of the author,
or 95 years from date of publication for works for hire.
Copyright is not handled by the
PTO but instead by the United States Copyright Office
(http://lcweb.loc.gov/copyright/)
in the Library of Congress.

1. A
statistical relationship, not
necessarily linear, typically between two
variables.
A co-relation (Galton, 1869).
2. The probability that two sequences of symbols will, in any
position, have the same symbol.
(We expect two
random
binary sequences to have the same
bit values about half the time.)
3. The general idea that symbols or sequences of symbols will have
some non-random relationship in value.
For example, each new bit in an
LFSR sequence is
perfectly correlated to some computation involving earlier bits
in the sequence. (Also see
independent and
rule of thumb.)

One way to evaluate a common correlation of two real-valued
sequences is to multiply them together term-by-term and sum all
results.
If we do this for all possible "delays" between the two sequences,
we get a "vector" or 1-dimensional array of correlations
which is a
convolution. Then the maximum value
represents the delay with the best simple correlation.
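
A minimal sketch of that delay-by-delay product sum, for two
real-valued sequences (only non-negative delays of b are shown):

#include <stddef.h>

/* corr[d] is the term-by-term product sum over the overlap when b is
   delayed by d positions; the largest entry marks the delay with the
   best simple correlation.  corr must hold nb entries. */
void cross_correlate(const double *a, size_t na,
                     const double *b, size_t nb, double *corr)
{
    size_t d, i;
    for (d = 0; d < nb; d++) {
        double sum = 0.0;
        for (i = 0; d + i < nb && i < na; i++)
            sum += a[i] * b[d + i];
        corr[d] = sum;
    }
}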

A
statistic or measure of simple linear
correlation between two binary sequences.
Correlation coefficient values range from -1 to +1 and are related
to the probability that, given a symbol from one sequence, the other
sequence will have that same symbol. A value of:

-1 implies a 0.0 probability (the second sequence is the
complement of the first),

0 implies a 0.5 probability (the sequences are unrelated), and

+1 implies a 1.0 probability (the sequences are identical).

Note that integer counting produces perhaps the best possible
signal for investigating block cipher deficiencies in the rightmost
bits. Accordingly, incrementing by some large random constant, or
using some sort of
LFSR or other polynomial counter which changes
about half its bits on each step may be more appropriate.

In
statistics,
a measure of the extent to which
random variable
X will predict the value of random variable Y.
When X and Y are
independent, the covariance is 0.
When X and Y are linearly related, the covariance squared is the
variance
of X times the variance of Y.

CRC error-checking is widely used in practice to check the data
recovered from magnetic storage.
When data are written to disk, a CRC of the original data is computed
and stored along with the data itself.
When data are recovered from disk, a new CRC is computed from the
recovered data and that result compared to the recovered CRC.
If the CRC's do not match, we have a "CRC error."

Computer disk-read operations always have some chance of a
"soft error" which does not re-occur when the same sector is re-read,
so the usual hardware response is to try again, some number of times.
If that does not solve the problem, the error may be reported to the
user and could indicate the start of serious disk problems.

A CRC operation is essentially a remainder over the huge numeric
value which is the data; the mod 2 polynomials make this "division"
both faster and simpler than one might expect.
Related techniques like integer or floating point division can have
similar power, but are unlikely to be as simple.
In general, "division" techniques only miss errors which are some
product of the divisor, and so n-bit codes miss only
1 out of every 2**n possible errors, on
average.
Earlier techniques, such as
checksum are significantly less able to
detect errors in real data.

The CRC result is an excellent (but
linear) hash value corresponding to the data.
Compared with other hash alternatives, CRC's are simple and
straightforward. They are well-understood.
They have a strong and complete basis in mathematics, so there can be
no surprises.
CRC error-detection is mathematically tractable and
provable without recourse to unproven
assumptions. And CRC hashes do not need
padding. None of this is true for most
cryptographic hash
constructions.

For error-detection, the CRC register is first initialized to
some fixed value known at both ends, nowadays typically "all 1's."
Then each data element is processed, each of which changes the CRC
value.
When all of the data have been processed, the CRC result is sent
or stored at the end of the data.
Frequently the CRC result first will be complemented, so that a CRC
of the data and the complemented result will produce a fixed
"magic number."
This allows efficient hardware error-checking, even when the hardware
does not know how large the data block will be in advance.
(Typically, the end of transmission, after the CRC, is indicated by
a hardware "done" signal.)

Nowadays, CRC's are often computed in software which is generally
more efficient with larger data quantities.
Thus we see 8-bit, 16-bit or 32-bit data elements being processed.
However, CRC's can be computed on individual data bits, and on
records of arbitrary bit length, including zero bits, one bit, or
any uneven or dynamic number of bits. As a consequence, no
padding is ever needed for CRC hashing.

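A representative single-bit update (a reconstruction for illustration,
assuming a 32-bit register, a left shift, and msb-first data; poly is
the CRC polynomial with its leading term dropped):

#include <stdint.h>

/* One CRC bit-step: fold in one data bit, then shift, XORing in the
   polynomial whenever a 1 falls off the top of the register. */
static uint32_t crc_bit(uint32_t crc, unsigned bit, uint32_t poly)
{
    crc ^= (uint32_t)(bit & 1) << 31;
    return (crc & 0x80000000UL) ? (crc << 1) ^ poly
                                : (crc << 1);
}
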
This fragment needs to execute 8 times to compute the CRC for a
full data byte.
However, a better way to process a byte in software is to
pre-compute a 256-element table representing every possible CRC
change corresponding to a single byte.
The table value is selected by a data byte XORed with the current
top byte of the CRC register (in a left-shift implementation).
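
A minimal sketch of that table-driven form; the polynomial shown
(0x04C11DB7, a common CRC-32 generator) and the all-1's initialization
are illustrative options, and standards differ on reflection and final
complementing:

#include <stddef.h>
#include <stdint.h>

static uint32_t crc_table[256];

/* Build the 256-entry table: the CRC change due to one byte. */
void crc_build_table(uint32_t poly)
{
    unsigned i;
    int b;
    for (i = 0; i < 256; i++) {
        uint32_t r = (uint32_t)i << 24;     /* data byte in the top byte */
        for (b = 0; b < 8; b++)             /* eight single-bit steps    */
            r = (r & 0x80000000UL) ? (r << 1) ^ poly : (r << 1);
        crc_table[i] = r;
    }
}

/* Process len bytes with one table lookup per byte. */
uint32_t crc_update(uint32_t crc, const uint8_t *data, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        crc = (crc << 8) ^ crc_table[((crc >> 24) ^ data[i]) & 0xFF];
    return crc;
}

/* Typical use:
     crc_build_table(0x04C11DB7UL);
     uint32_t crc = 0xFFFFFFFFUL;            all-1's initialization
     crc = crc_update(crc, msg, msglen);     */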

In the late 60's and early 70's, the first CRC's were
initialized as "all-0's."
Then it was noticed that extra or missing 0-bits at the start of
the data would not be detected, so it became virtually universal
to init the CRC as "all-1's."
In this case, extra or missing zeros at the start are
detected, and extra or missing ones at the start are
detected as well.

It is possible for multiple errors to occur and the CRC result
to end up the same as if there were no error.
But unless the errors are introduced intentionally, this is
very unlikely.
Various common errors are detected absolutely, such as:

Any single bit added, anywhere.

Any single bit deleted.

Any single bit changed.

All "burst errors" (contiguous bits in error) of length
smaller than the polynomial.

If we have enough information, it is relatively easy to compute
error patterns which will take a CRC value to any desired CRC value.
Because of this, data can be changed in ways which will produce
the original CRC result.
Consequently, no CRC has any appreciable cryptographic
strength,
but some applications in cryptography need no strength:

One example is
key processing, where the uncertainty
in a User Key phrase of arbitrary size is collected into a
hash result of fixed size. In general, the hash result would
be just as good for the opponent as the original key phrase,
so no strength shield could possibly improve the situation.

Another example is the
hash accumulation of the uncertainty in
slightly uncertain physically random or
really random events.
When true randomness is accumulated, it is already as
unknowable as any strength shield could make it.

On the other hand, a CRC, like most computer
hashing operations, is normally used so that
we do not have "enough information."
When substantially more information is hashed than the CRC can
represent, any particular CRC result will be produced by a vast
number of different input strings. In this way, even a linear CRC
can be considered an irreversible "one way" or "information
reducing" transformation. Of course, when a string shorter than
the CRC polynomial is hashed, it should not be too difficult to
find the one string that could produce any particular CRC result.

The CRC polynomial need not be particularly special. Unlike the
generator polynomials used in
LFSR's, a CRC poly need not be
primitive nor even
irreducible.
Indeed, the early 16-bit CRC polys were
composite with a factor of "11" which is
equivalent to the information produced by a
parity bit.
(Since parity was the main method of error-detection at
the time, the "11" factor supported the
argument that CRC
was better.)
However, modern CRC polys generally are primitive, which
allows the error detection guarantees to apply over larger amounts
of data.
It also allows the CRC operation to function as an
RNG.
But the option exists to use secret random polynomials to detect
errors without being as predictable as a standard CRC.
Polynomial division does not require mathematical structure
(such as an irreducible or primitive), beyond the basic
mod 2 operations.

Different CRC implementations can shift left or right, take data
lsb or
msb first, and be initialized as zeros or ones,
each option naturally producing different results.
Various CRC standards specify different options.
Obviously, both ends must do things the same way, but it is not
necessary to conform to a standard to have quality error-detection
for a private or new design.
Variations in internal handling can make a CRC with one set of options
produce the same result as a CRC with other options.

When the logical complement of a CRC result is appended to the
data and processed msb first, the CRC across that data and the result
produces a "magic" value which is a constant for a particular poly and
set of options.
In general, the sequence reverse of a good poly is also a good poly, and
there is some advantage to having CRC polys which are about half 1's.
In some notations we omit the msb which is always 1 (as is the lsb).
For notational convenience, we can write
x**3 + x**2 + x + 1 as:
3,2,1,0. These are the positions of 1-bits
in the poly. Common CRC polys include:

In normal
cryptanalysis we start out knowing
plaintext,
ciphertext, and
cipher construction.
The only thing left unknown is the
key. A practical
attack must recover the key.
(Or perhaps we just know the ciphertext and the cipher, in which
case a real attack would recover plaintext.) Simply finding a
distinguisher (showing that the
cipher differs from the chosen model) is not, in itself, an
attack or break.

What Makes a Cipher "Strong"?

Because no theory guarantees strength for any conventional
cipher (see, for example, the
one time pad and
proof), ciphers traditionally
have been considered "strong" when they have been used for a long
time with "nobody" knowing how to break them easily.

Expecting cipher strength because a cipher is not known to
have been broken is the logic
fallacy of
ad ignorantium: a
belief which is claimed to be true because
it has not been
proven false.

Cryptanalysis seeks to extend this admittedly-flawed process by
applying known
attack strategies to new
ciphers (see
heuristic), and by actively seeking
new attacks.
Unfortunately, real attacks are directed at particular ciphers,
and there is no end to different ciphers.
Even a successful break is just one more trick from a virtually
infinite collection of unknown knowledge.

In cryptanalysis it is normal to assume that at least
known-plaintext is available;
often,
defined-plaintext is assumed.
The result is typically some value for the amount of
work which will achieve a
break (even if that value is impractical);
this is the
strength of the cipher under a given attack.
Different attacks on the same cipher may thus imply different amounts
of strength.
While cryptanalysis can demonstrate "weakness" for a
given level of effort, cryptanalysis cannot prove that there is no simpler attack
(see, for example,
attack tree and
threat model):

Lack of proof of weakness is not proof of
strength.

Indeed, when ciphers are used for real, the
opponents can hardly be expected to
advertise a successful break, but will instead work hard to
reassure users that their ciphers are still secure. The fact that
apparently "nobody" knows how to break a cipher is somewhat
less reassuring from this viewpoint.
(Also see the discussion: "The Value of Cryptanalysis,"
locally, or @:
http://www.ciphersbyritter.com/NEWS3/MEMO.HTM).
For this reason, using a wide
variety of different ciphers can make good sense: That reduces the
value of the information protected by any particular cipher, which
thus reduces the rewards from even a successful attack. Having
numerous ciphers also requires the opponents to field far greater
resources to identify, analyze, and automate breaking (when possible)
of each different cipher. Also see:
Shannon's Algebra of Secrecy Systems.

E[K][PT] = CT, or E[K,PT] = CT

where PT = plaintext block value, K = key, and
CT = ciphertext block value.
The brackets "[ ]" mean the operation of indexing: the selection of
a particular position in an array and returning that element value.
Here, E[K] represents a particular, huge, emulated
encryption "table," while E[K][PT] selects a single
entry (a block value) from that table.
So we can recover the plaintext with:

D[K][CT] = PT, or D[K,CT] = PT

where D[K] represents an
inverse or decryption table.
But an attacker does not know and thus must somehow develop the
decryption table.
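
As a toy illustration of the table model (a sketch only, with a
hypothetical 4-bit block, and Python's statistical shuffle standing
in for a real key schedule):

    import random

    def make_table(key, block_bits=4):
        # Each key value selects one permutation of the block
        # values: a keyed, emulated encryption "table" E[K].
        table = list(range(1 << block_bits))
        random.Random(key).shuffle(table)
        return table

    E = make_table(12345)             # E[K] for one key
    D = [0] * len(E)
    for pt, ct in enumerate(E):       # invert to get the decryption table
        D[ct] = pt

    assert D[E[9]] == 9               # D[K][E[K][PT]] = PT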

We assume that an
opponent has collected quite a lot of
information, including lots of plaintext and the associated
ciphertext (a condition we call
known plaintext).
The opponent also has a copy of the cipher and can easily compute
every enciphering or deciphering transformation.
What the opponent does not have, and what he is presumably
looking for, is the key.
The key would expose the myriad of other ciphertext block values
for which the opponent has no associated plaintext.

We might imagine the opponent attacking a cipher with a
deciphering machine having a huge "channel-selector" dial to select
a key value.
As one turns the key-selector, each different key produces a
different deciphering result on the display.
So all the opponent really has to do is to turn the key dial until
the plaintext message appears.
Given this extraordinarily simple attack (known as
brute force), how can any
cipher be considered secure?

In a real cipher, we make the key dial very, very, very
big! The
keyspace of a real cipher is much too big,
in fact, for anyone to try each key in a reasonable amount of time,
even with massively-parallel custom hardware.
That leaves the opponent with a problem: brute force does not work.
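
On a toy keyspace, of course, the dial-turning attack works fine,
which is exactly why real keyspaces must be made so large. A sketch,
with a hypothetical 16-bit keyed transformation standing in for a
real cipher:

    def toy_encrypt(key, pt):
        # An arbitrary invertible keyed mixing, for illustration only.
        return ((pt ^ key) * 40503) % 65536

    known_pt = 0x1234
    known_ct = toy_encrypt(0xBEEF, known_pt)

    for key in range(1 << 16):        # turn the key dial...
        if toy_encrypt(key, known_pt) == known_ct:
            print("key found:", hex(key))
            break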

Nevertheless, the cipher equation seems exceedingly simple.
There is one particular huge emulated table as selected by the key,
and the opponent has a sizable set of positions and values from
that table.
Moreover, all the known and unknown entries are created by
exactly the same
mechanism and key.
So, if the opponent can in some way relate the known entries to the
rest of the table, thus
predicting unknown entries, the cipher
may be
broken.
Or if the opponent can somehow relate known plaintexts to the key
value, thus predicting the key, the key may be exposed.
And with the key, ciphertext for which there is no corresponding
plaintext can be exposed, thus breaking the cipher.
Finding these relationships is where the cleverness of the
individual comes in.
In a real sense, a cipher is a puzzle, and we currently
cannot guarantee that there is no particular "easy" way for a
smart team to solve it.

One peculiarity of conventional block ciphers is that they
cannot emulate all possible tables, but instead only a tiny, tiny
fraction thereof (see
block cipher).
Even what we consider a huge key simply cannot select from among
all possible tables because there are far too many.
Now, the "tiny fraction" of tables actually emulated is still
too many to traverse (this is a "large enough" keyspace), but,
clearly, some special selection is happening which might be
exploited.
Having even one particular value at one known table position is
sufficiently special that we expect that only one key would produce
that particular relationship in a conventional cipher.
So, in practice, just one known-plaintext pair generally
should be sufficient to identify the correct key, if only we could
find some way to do it.

More Realistic Attacks

In academic cryptanalysis we normally assume that we do not
know the
key, but do know the
cipher and everything about it.
We also assume essentially unlimited amounts of
known plaintext to use in an
attack to find the key.
In practice things are considerably different.

In practical cryptanalysis we may not know which cipher has been
used.
The cipher may not ever have been published, or may have been
modified from the base version in various ways.
Even a cipher we basically know may have been used in a way which
will disguise it from us, for example:

The external key may have been custom processed or hashed
and may not be the key the internal cipher uses.
The key value could have been casually bit-reversed or
byte-reversed or oddly padded or so on.
Or the key itself could have been intentionally enciphered.

The message text or data could be changed in just as many
different ways.

So even if we have the right key, we can only show that by checking
every cipher we know in every way it could have been used.

Unknown Ciphers

Selecting among different ciphers is part of
Shannon's 1949
Algebra of Secrecy Systems.
In a modern computer implementation, we could select ciphers
dynamically.
The number of selectable transformations increases exponentially
when several ciphers are used in sequence
(multiple encryption).
Considering Shannon's academic work in this area, the use of
well-known standardized designs is, ironically and rather sadly,
current cryptographic orthodoxy (see
risk analysis).

The general mathematical
model of ciphering is that of a
keyed transformation (a
mapping or
function).
Numerically, we can make the general model work for a system of
multiple ciphers by allowing some "key" bits to select the cipher,
with the rest of the key bits going to key that cipher.
But in the adapted model, different parts of the key will have vastly
different difficulties for the opponent.
Finding the correct key within a cipher may be hard, yet could be
much, much easier than finding the exact cipher actually being used.
Differences in the difficulty of finding different key bits are
simply glossed over in the adapted general model.
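
Numerically, the adapted model might look like the following sketch,
where the byte-wide "ciphers" are hypothetical stand-ins:

    def cipher_a(k, pt): return pt ^ (k & 0xFF)
    def cipher_b(k, pt): return (pt + k) % 256
    def cipher_c(k, pt): return (pt ^ k ^ 0xA5) & 0xFF
    def cipher_d(k, pt): return (255 - pt) ^ (k & 0xFF)

    CIPHERS = [cipher_a, cipher_b, cipher_c, cipher_d]

    def encrypt(full_key, pt):
        select = full_key & 0b11      # low key bits select the cipher
        subkey = full_key >> 2        # the rest key that cipher
        return CIPHERS[select](subkey, pt)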

Somehow obtaining and
breaking every cipher which possibly could
have been used is a vastly larger problem than the relatively small
increase in keyspace indicated by the number of possible ciphers.
For example, if we think we have found the key and want to check
it, on a known cipher that has essentially no cost and may take a
microsecond.
But if we want to check the key on an unknown cipher, we first have
to obtain that cipher.
That may require the massive ongoing cost of maintaining an
intelligence field service to obtain copies of secret ciphers.
Once the needed cipher is obtained, finding a practical break may
take experts weeks or months, if a break is even found.
Taken together, this is a vast increase in difficulty for the
opponent per cipher choice compared to the difficulty per key choice
within a single cipher.

Just as it may be impossible to try every 128-bit key at even a
nanosecond apiece, it also may be impossible to keep up with a far
smaller but continuing flow of new secret ciphers which take
hundreds of billions of times longer to handle.
This advantage seems to be exploited by
NSA in keeping cipher designs secret (also see
security through obscurity).
Even given yet another real example which contradicts the current
cryptographic wisdom, crypto academics continue to insist that
standardizing and exposing the cipher design makes sense.
Surely, exposing a cipher does support gratuitous analysis and help
to expose some cipher weakness, but does not, in the end, give us
a proven strong cipher.
In the end, exposing the cipher may turn out to benefit opponents
far more than users.

In practice, an individual attacker mainly must hope that the
cipher to be broken is flawed. An attacker can collect
ciphertext statistics and hope for some
irregularity, some
imbalance or statistical
bias that will identify the cipher class, or
maybe even a well-known design. An attacker can make
plaintext assumptions and see if some key
will produce those words.
But enciphering guessed plaintext seems an unlikely path to success
when every possible cipher, and every possible modification of that
cipher, is the potential encryption source.
All this is a very difficult problem, and far different than the
normal academic analysis.

Other Weaknesses

Many academic attacks are essentially theoretical, involving huge
amounts of data and computation. But even when a direct technical
attack is practical, that may be the most difficult, expensive
and time-consuming way to obtain the desired information. Other
methods include making a paper copy, stealing a copy, bribery,
coercion, and electromagnetic monitoring. No cipher can keep secret
something which has been otherwise revealed. Information
security thus involves far more than just
cryptography, and a cryptographic
system is more than just a cipher (see:
cipher system).
Even finding that information has been revealed does not mean that
a cipher has been broken, although good security virtually requires
that assumption.
(Of course, when we can use only one cipher, we cannot change
ciphers anyway.)

Unfortunately, we have no way to know how strong a cipher appears
to our opponents.
Even though the entire reason for using cryptography is a
belief
that our cipher has sufficient strength,
science
provides no basis for such belief.
At most, cryptanalysis can give us only an upper limit
to the strength of a cipher, which is not particularly helpful,
and can only do that when a cipher actually can be broken.
But when a cipher is not broken, cryptanalysis has told us
nothing about the strength of a cipher, and unbroken ciphers are
the only ones we use.

Cryptanalytic Contributions and Limits

The ultimate goal of cryptanalysis is not to break every
possible cipher (that would be the end of an industry and also the
end of new PhD's in the field).
Instead, the obvious goal is understanding why some ciphers
are weak, and why other ciphers seem strong.
It is not much of a leap from that to expect
cryptanalysts to work with, or at
least interact with, cipher designers, with a common goal
of producing better ciphers.

Unfortunately, cryptanalysis is ultimately limited by what can
be done: there are no ciphering techniques which guarantee strength,
and there is no test which tells us how weak an arbitrary cipher
really is.
Accordingly, exposing a particular weakness in a particular cipher
may be about as much as cryptanalysis can offer, even if that means
a deafening silence about similar designs, ciphers which
have been repaired, or significant cipher designs which remain
both unbroken and undiscussed.

In
cryptography, some supposedly technical
disputes just seem to go on and on.
Some topics have led to many hundreds of postings on Usenet sci.crypt
in various conversations across multiple years.
Clearly, those are controversial topics, whether academics think so
or not.
But it is also controversial to point out alternatives which
conflict with current cryptographic wisdom or techniques.
Dissent and disagreement are how controversies start.

Apparent agreement among academics does not imply a lack of
academic controversy, since many will side with the conventional
wisdom, while others step back to consider the
arguments.
Since reality is not subject to majority rule, even universal
academic agreement would not constitute a scientific argument, which
instead requires facts and exposed logical reasoning.
Controversy may even imply that academic cryptographers are
unaware of the issue, or have not really considered it in a deep way.
For if clear, understandable and believable explanations already
existed, there would be little room for debate.
Controversy arises when the given explanations are false, or
obscure, or unsatisfactory.

Scientific controversy is less about conflict than exposing Truth.
That happens by doing research, then taking a stand and supporting
it with facts and scientific argument.
Many of these issues should have indisputable answers or expose
previously ignored consequences.
Wishy-washy statements like "some people think this, some think
that," not only fail to inform, but also fail to frame a discussion
to expose the real answer.

Science Means Models

One aspect of
science is the creation of quantitative
models which describe or
predict reality.
Since poor models lead to errors in reasoning, a science reacts
to poor predictions by improving the models.
In contrast, cryptography reacts by making excuses about why the
model really is right after all, or does not apply, or does not
matter.
Examples include:

A common model for a conventional block cipher is "a family of
permutations."
But that family is huge, and in practice almost none of that family
can be selected.
A better model would be "a tiny subset of a family of permutations,"
which would lead immediately to questions about subset size and
structure.
Absent a realistic model, such questions rarely arise, and are even
more rarely addressed.

The
one time pad (OTP) is often said to be
"provensecure." But
NSA has in fact broken OTP's in practice, as
we know from their description of
VENONA.
In exactly what way can cryptography claim that "proven secure"
has any practical meaning, when using a "proven secure" cipher can
result in death by execution from cipher failure?
If "proven secure" is intended to be only theoretical, where is the
practical analysis we need?

Cryptanalysis is sometimes said to be
how we know our ciphers are strong, but that is false.
The best cryptanalysis can do is find a problem, but not finding a
problem does not make a cipher "strong," only "not known to be weak."
That is not
mere semantics, but instead sets
absolute limits on cryptanalysis as a source of knowledge about
strength.
It also implies that cryptography must do something different just
to get the knowledge of strength that most people think we already
have.
By not exposing this limitation, cryptography encourages students,
buyers and users to form the conceptual model that mathematics can
deliver guaranteed strength in practice when in reality that is
almost never possible.
And it is similarly impossible to know the probability of failure.

In my view, cryptography has presided over a fundamental breakdown in
logic, perhaps created by
awe of supposedly superior
mathematical theory.
Upon detailed examination, however, theoretical math often turns out
to be inapplicable to the case at hand.
Practical results which conflict with theory are ignored or
dismissed, even though confronting reality is how science improves
models.
Demanding belief in conventional cryptographic wisdom requires
people to think in ways which accept logical falsehood as truth,
and then they apply that lesson.
Reasoning errors have become widespread, accepted, and prototypes
for future thought.
Disputes in cryptography are commonly argued with logic
fallacies, and may be "won" with
arguments that have no force at all.
Since experimental results are rare in cryptography, we cannot
afford to lose reason, because that is almost all we have.

One Cipher is Not Enough

A major logical flaw in conventional cryptography is the
belief that one good cipher is enough.

But since cipher
strength occurs only in the context of our
opponents, how could we ever know
that we have a "good" cipher, or how "good" it is?

So if we cannot measure "good," and cannot prove
"good," then exactly how do we know our ciphers are "good?"
The answer, of course, is that we do not and can not know any such
thing.
In fact, nobody on our side can know that our ciphers are
"good," no matter how well educated, experienced or smart they
may be, because that is determined by our opponents in secret.
Anyone who feels otherwise should try to put together what they
see as a
scientific argument to prove their point.

When conventional cryptography accepts a U.S. Government
standard cipher as "good" enough,
there seems no real need for anything else:

Conventional cryptography encourages a belief in known cipher
strength, thus ignoring both logic and the lessons of the past.
That places us all at
risk of cipher failure, which probably
would give no indication to the user and so could be happening
right now.
That we have no indication of any such thing is not particularly
comforting, since that is exactly what our opponents would want to
portray, even as they expose our information.

When an ordinary person makes a claim, they can be honestly
wrong.
But when a trained expert in the field makes a claim that we know
cannot be supported, and continues to make such claims, we are
pretty well forced into seeing that as either professional
incompetence or deliberate deceit.
Encouraging people to use only one cipher by claiming they need
nothing else is exactly what one would expect from
an opponent who knows how to break the cipher.
Maybe that is just coincidence.

A List of Crypto Controversies

Cryptographic controversies include:

AES:
AES is the new conventional block cipher standard.
However, like most modern block ciphers, it uses only a
breathtakingly small portion of the possible keyspace.
And while that may not be a known weakness, it is at least
disturbing.

BB&S:
Blum, Blum and Shub is the way we refer to a well known article
which describes a very slow random number generator which is
supposedly "provably secure."
Current crypto texts generally present a simplified version,
but deceptively use the same BB&S name.
That simplified construction has a very rare weakness related
to choosing the initial value, a weakness that can be fixed at
modest cost.
As a consequence, a designer who wants assurance that every
known fixable weakness has been eliminated will use the original
construction.
Confrontation arises when someone insists that no competent
designer could have such a goal.

Bijective Compression:
When random blocks or strings decompress into apparently
grammatical text, it may be difficult to automatically distinguish
the correct decryption from incorrect decryptions.
The result is a potential increase in practical strength.

Block Cipher Definitions:
Ordinarily, the person wishing to display an insight gets to
set up the definitions, but of course communication requires
both the writer and reader to have the same definition.
Various block cipher definitions exist.

CBC First Block Problem:
CBC requires an IV, typically sent with ciphertext.
That IV is exclusive-ORed with the plaintext of the first block.
So if opponents know the first block plaintext, and can intercept
and change the IV, and the IV is sent as plaintext, the opponents
can change the deciphered first block to any desired value.
This is a form of man-in-the-middle attack.
The usual advice is to use a MAC to authenticate the full
deciphered message, and that works.
But the fundamental problem is confidentiality, not
authentication.
A MAC is unnecessary if the IV is enciphered.
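A sketch of the manipulation (the stand-in block decryption is
arbitrary, since only the CBC XOR structure matters):

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def block_decrypt(block):         # stand-in for any block cipher
        return xor(block, b"\x5A" * len(block))

    def cbc_first_block(iv, c1):
        return xor(block_decrypt(c1), iv)   # P1 = D(C1) XOR IV

    iv = bytes(8)
    c1 = xor(b"Pay $100", b"\x5A" * 8)      # first block plaintext known
    assert cbc_first_block(iv, c1) == b"Pay $100"

    # Opponents who know P1 and can change a plaintext IV can
    # force the deciphered first block to any desired value:
    forged_iv = xor(xor(iv, b"Pay $100"), b"Pay $999")
    assert cbc_first_block(forged_iv, c1) == b"Pay $999"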

Cryptanalysis:
Cryptanalysis is sometimes thought of as the way we know the
strength of a cipher, but that is wrong.
At its best, cryptanalysis can find out that a cipher is weak,
and then we do not use that cipher.
But cryptanalysis which finds no break does not mean that the
cipher is strong, and we will use it anyway.

Data Compression:
Data compression can increase the unicity distance while
simultaneously decreasing the size of plaintext which must
exceed that distance to be attacked.

Distinguisher:
A distinguisher typically is some sort of statistical test that
will show an imperfect distribution in, for example, a
conventional block cipher.
But by itself, that does not show weakness, and is no form
of a break at all.

Entropy:
Shannon's entropy measures coding efficiency.
It produces the same result whether a sequence is known in advance
or not.
It produces the same result whether a sequence is predictable
or not.
Entropy does not measure unpredictability.
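The point is easy to check; a sketch:

    import math, os
    from collections import Counter

    def entropy_bits_per_byte(data):
        # Shannon entropy of the byte distribution: a coding measure.
        n = len(data)
        return -sum(c / n * math.log2(c / n)
                    for c in Counter(data).values())

    predictable = bytes(i % 256 for i in range(65536))   # 0,1,2,3,...
    unpredictable = os.urandom(65536)

    print(entropy_bits_per_byte(predictable))    # exactly 8.0
    print(entropy_bits_per_byte(unpredictable))  # also about 8.0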

Huge Block Cipher Advantages:
If we have a technology which allows the efficient construction
of huge blocks, such blocks can have significant advantages.

Known Plaintext:
Modern ciphers are sometimes thought to be invulnerable to known
plaintext.
But if we model a cipher as a mathematical function taking
plaintext to ciphertext, known plaintext is how an attacker
examines the function that the cryptographer wishes to hide.

Old Wives' Tales:
Various delusions are widely accepted in the strange field of
cryptography.

One Time Pad:
The one time pad is often held up as the only example of a
proven unbreakable cipher.
Unfortunately, only the theoretical version is proven
unbreakable, and that version only protects theoretical data.
In practice there is no proven secure one time pad,
because the required assumptions cannot be provably achieved
in practice.
Unfortunately, the crypto texts cause every crypto newbie to
come to a different and wrong conclusion.
That is first a failure of the field to properly model reality,
and next a willingness to deceive those with the least crypto
training and experience.

Proof:
It is currently fashionable for cryptographic constructions to
be claimed (and acclaimed) "provably secure."
But for a proof to apply, every assumption it uses must be both
known, and known to be true.
For a proof to be effective for a user, that user must be able
to first know every required assumption, and next
show that each has been achieved in practice.
Since few if any cryptosystems actually allow that, for users,
"provably secure" generally is just another crypto deception,
especially for those with the least crypto training and
experience.

Randomness Testing:
Randomness cannot be certified by test, just indicated.
If absolute randomness is required, no test can tell us when
we have it.

Really Random:
Predictable statistical randomness is easily created, but
unpredictable real randomness cannot be generated by a
deterministic machine.
Although various forms of "entropy" are supposedly available
in a computer system, upon closer examination we may find that
little unpredictability remains.

Risk:
Ciphers may be the only product in modern manufacture which
cannot be tested to their design goal.
If the goal of a cipher is to keep secrecy, that can only be
judged by opponents who work against us in secret and do not
announce their results.
Since we cannot know when our ciphers fail, we cannot begin
to know the risk of cipher failure.
Designers cannot know the outcome of their attempts to meet
ciphering goals.
That means there can be no expertise in cipher strength, by
anyone, no matter how well educated or experienced.
Even opponents only know their part of the truth.

Scalability:
There is, and probably can be, no test to measure the strength
of an arbitrary cipher.
All ciphers are completely outside manufacturing control with
respect to strength.
An alternative is to develop designs which can be implemented
at tiny size, and then exhaustively tested.
Tiny ciphers will not be strong, but they can be rigorously
tested in ways that large ciphers cannot.
Scalability is an enabling technology which provides insight
to strength, something sorely needed but not supported by
conventional cryptography.

Snake Oil:
Since there are no tests of cipher strength, certain rules
of thumb have been used to indicate weakness.
Those rules have problems.

Software Patent:
There are no software patents; there are just patents which
apply to software.
There are no pure algorithm patents, but patents always have
described machines which implement algorithms.
In particular, process claims apparently have been granted
almost from the beginning of U.S. patents.
(But see a patent lawyer in any real situation.)

Strength:
Crypto people talk about strength precisely the way that
virgins talk about sex: with great enthusiasm and little
understanding.
There is and can be no expertise on cipher strength.

Term of Art:
Terms used by various professions often mean something
different than the exact same phrase used on its own.
That can lead to endless difficulty in understanding and
discussion, especially for those with the least training
and experience.

Trust:
The value of trust inherently depends upon having a
relationship with consequences should trust fail.

Unpredictability:
One aspect of science is the construction of numerical models
which "correctly predict" measurable outcomes.
Similarly, cryptanalysis seeks to build models which
"correctly predict" the unknown cipher key.
The inability to construct a key-predicting scientific model
is one way to see cipher strength.

In my view, cryptography often does not understand or attempt
to address controversial issues in a scientific way.
In areas where cryptography cannot distinguish between truth and
falsehood, it cannot advance.

If anyone has any other suggestions for this list, please let
me know.

Greek for "hidden writing." The art and science of transforming
(encrypting) information
(plaintext) into an intermediate form
(ciphertext) which
secures information in storage or transit.
Normally, security occurs as a result of having a vast number of
different transformations, as selected by some sort of
key. Then, if an
opponent acquires some ciphertext, a
vast number of different plaintext messages presumably could have
produced that exact same ciphertext, one for each of the possible
keys.

We normally assume that the opponents have a substantial amount of
known plaintext to use in their work (see
cryptanalysis).
So the situation for the opponents involves taking what is known and
trying to extrapolate or
predict what is not known.
That is similar to building a
scientific model intended to
predict larger reality on the basis of many fewer experiments.
Since the whole idea is to make prediction difficult for the opponent,
unpredictability can be called
the essence of cryptography.

Cryptography is a part of
cryptology, and is further divided into
secret
codes versus
ciphers.
As opposed to
steganography, which seeks to
hide the existence of a message, cryptography seeks to
render a message unintelligible even when the message is
completely exposed.

In practice, cryptography should be seen as a system or
game which includes both users and
opponents: True
scientific measures of
strength do not exist when a
cipher has not been
broken, so users can only hope for
their cipher systems to protect their messages.
But opponents may benefit greatly if users can be convinced to
adopt ciphers which opponents can break.
Opponents are thus strongly motivated to get users to
believe in the strength of a few weak
ciphers. Because of this,
deception,
misdirection,
propaganda and
conspiracy are inherent in the
practice of cryptography. (Also see
trust and
risk analysis.)

And then, of course, we have the natural response to these
negative possibilities, including individual
paranoia and
cynicism.
We see the consequences of not being able to test cipher security
in the arrogance and aggression of some newbies.
Even healthy users can become frustrated and
fatalistic when they understand cryptographic reality.
Cryptography contains a full sea of unhealthy psychological states.

Great Expectations in Cryptography

For some reason (such as the lack of direct academic statements
on the issue), some networking people who use and depend upon
cryptography every day seem to have a slightly skewed idea about
what cryptography can do.
While they seem willing to believe that ciphers might be broken,
they assume such a thing could only happen at some great effort.
Apparently they believe the situation has been somehow assured by
academic testing.
But that belief is false.

Ciphers are like puzzles, and while some ways to solve the
puzzle may be hard, other ways may be easy.
Moreover, once an easy way is found, that can be put into a program
and copied to every "script kiddie" around.
The hope that every attacker would have to invest major effort to
find their own fast break is just wishful thinking.
And even as their messages are being exposed, the users probably
will think everything is fine, just like we think right now.
Cipher failure could be happening to us right now, because there
will be no indication when failure occurs.

What are the chances of cipher failure?
We cannot know!
Ciphers are in that way different from nearly every other
constructed object.
Normally, when we design and build something, we measure it to see
that it works, and how well.
But with ciphers, we cannot measure how well our ciphers
resist the efforts of our opponents.
Since we have no way to judge effectiveness, we also cannot judge
risk.
Thus, we simply have no way to compare whether the cipher design
is more likely to be weak than the user, or the environment, or
something else.
As sad as this situation may seem, it is what we have.

When compared to the alternative of blissful ignorance,
it should be a great advantage to know that ciphers cannot be
depended upon.
First, design steps could be taken to improve things (although that
would seem to require a widespread new understanding of the
situation that has always existed).
Next, we note that ciphers can at most reveal only what they
try to protect:
When protected information is not disturbing, or dangerous,
or complete, or perhaps not even true, exposure becomes much less
of an issue.

Keying in Cryptography

Modern cryptography generally depends upon translating a message
into one of an astronomical number of different intermediate
representations, or
ciphertexts, as selected by a
key.
If all possible intermediate representations have similar appearance,
it may be necessary to try all possible keys (a
brute force attack)
to find the key which deciphers the message.
By creating
mechanisms
with an astronomical number of keys, we can make this approach
impractical.

Keying is the essence of modern cryptography.
It is not possible to have a
strong cipher without keys, because it is the
uncertainty about the key which creates the "needle in a haystack"
situation which is conventional strength.
(A different approach to strength is to make every message equally
possible, see:
Ideal Secrecy.)

Nor is it possible to choose a key and then reasonably expect
to use that same key forever. In
cryptanalysis,
it is normal to talk about hundreds of years of computation and
vast effort spent
attacking
a cipher, but similar effort may be applied to obtaining the key.
Even one forgetful moment is sufficient to expose a key to such
effort.
And when there is only one key, exposing that key also exposes all
the messages that key has protected in the past, and all messages
it will protect in the future.
Only the selection and use of a new key terminates
insecurity
due to key exposure.
Only the frequent use of new keys makes it possible to expose a
key and not also lose all the information ever protected.

Engineering in Cryptography

Cryptography is not an
engineering science:
It is not possible to know when cryptography is "working," nor how
close to not-working it may be:

There is no test that will give us the
strength of an arbitrary
cipher.

There is no "materials science" that will tell us how much
strength to expect, given the component parts.

Cryptography may be seen as a
dynamic battle between
cryptographer and
cryptanalyst or
opponent.
The cryptographer tries to produce a
cipher which can retain
secrecy. Then,
when it becomes worthwhile, one or more cryptanalysts may try to
penetrate that secrecy by
attacking the
cipher. Fortunately for the war, even after fifty years of
mathematical cryptology, not one practical cipher has
been accepted as
proven secure in practice. (See, for example, the
one-time pad.)

Note that the successful cryptanalyst must keep good attacks
secret, or the opposing cryptographer will just produce a
stronger
cipher. This means that the cryptographer is in the odd position
of never knowing whether his or her best cipher designs are
successful, or which side is winning.

Cryptographers are often scientists who are trained to ignore
unsubstantiated claims.
But the field of cryptography often turns the
scientific method
on its head, because almost never is there a complete
proof of cryptographic
strength in practice.
In cryptography, scientists accept the failure to
break a
cipher
as an indication of strength (that is the
ad ignorantium fallacy),
and then demand substantiation for claims of weakness.
But there will be no substantiation when a
cipher system is
attacked and
broken for real, while
continued use will endanger all messages so "protected."
Evidently, the conventional scientific approach of requiring
substantiation for claims is not particularly helpful for users
of cryptography.

Since the scientific approach does not provide the assurance
of cryptographic strength that users want and need, alternative
measures become appropriate:

It can be a reasonable policy to not adopt a widely-used
cipher, since such ciphers provide the best target for attackers
and also the best reward for
cryptanalytic success.

Another reasonable policy is to change ciphers periodically,
perhaps even on a message-by-message basis. This limits the
extent of use of any weak cipher.

Yet another reasonable policy is to use
multiple encryption as a
matter of course (see
Algebra of Secrecy Systems).
Using three different ciphers on every message protects the message
even if one of the ciphers has been broken in secret.

It is especially important to consider the effect the underlying
equipment has on the design.
Even apparently innocuous operating system functions, such
as the multitasking "swap file," can capture supposedly secure
information, and make that available for the asking.
Since ordinary disk operations generally do not even attempt to
overwrite data on disk, but instead simply make that storage
free for use, supposedly deleted data is, again, free for the
asking.
A modern cryptosystem will at least try to address such issues.

1. In Physics, atoms arranged in a repeating structure.
2. In
electronics,
typically a small part of a thin slice of a much larger
quartz
crystal, intended to regulate or
filter a particular
frequency.
Typically a thin slab of translucent mineral, perhaps a quarter of
an inch square, usually with metal terminals plated on each large
side.
Frequency preference occurs because of mechanical
resonance -- an actual flexing of the crystalline rock itself
-- which "rings" at a particular frequency.

Quartz is a
piezoelectric
material, so a
voltage
across the terminals forces the quartz wafer to bend slightly,
thus storing mechanical energy in physical tension and compression
of the solid quartz.
The physical mass and elasticity of quartz cause the wafer to
mechanically
resonate
at a natural frequency depending on the size and shape of the
quartz blank.
The crystal will thus "ring" when the electrical force is released.
The ringing will create a small
sine wave
voltage across electrical contacts touching the crystal, a voltage
which can be
amplified and fed back into the
crystal, to keep the ringing going as oscillation.

Crystals are typically used to make exceptionally stable electronic
oscillators
(such as the
clock oscillators widely used in
digital
electronics) and the relatively narrow frequency
filters often used in radio.

It is normally necessary to physically grind a crystal blank to
the desired frequency. While this can be automated, the accuracy
of the resulting frequency depends upon the effort spent in exact
grinding, so "more accurate" is generally "more expensive."

Frequency stability over temperature depends upon slicing the
original crystal at precisely the right angle.
Temperature-compensated crystal oscillators (TCXO's) improve
temperature stability by using other components which vary with
temperature to correct for crystal changes.
More stability is available in oven-controlled crystal oscillators
(OCXO's), which heat the crystal and so keep it at a precise
temperature despite ambient temperature changes.

Sometimes suggested in
cryptography
as the basis for a
TRNG,
typically based on
phase noise
or
frequency
variations.
But a crystal oscillator is deliberately designed for high frequency
stability; it is thus the worst possible type of oscillator from
which to obtain and exploit frequency variations.
And crystal oscillator phase noise (which we see as edge
jitter)
is typically tiny and must be detected on a cycle-by-cycle basis,
because it does not accumulate.
Detecting a variation of, say, a few picoseconds in each 100nSec
period of a typical 10 MHz oscillator is not something we do on
an ordinary computer.

Another common approach to a crystal oscillator TRNG is to
XOR
many such oscillators, thus getting a complex high-speed waveform.
(The resulting
digital
signal rate increases as the sum of all the oscillators.)
Unfortunately, the high-speed and asynchronous nature of the wave
means that
setup and
hold
times cannot be guaranteed to latch that data for subsequent use.
(Latching is inherent in, say, reading a value from a computer
input port.)
That leads to statistical bias and possible
metastable
operation.
Further, the construction is essentially
linear
and may power up similarly each time it is turned on.

The measure of electron flow, in amperes.
A current of one amp is one coulomb per second, or about
6.24 x 10^18 electrons per second.
Conventional current flow
is in the direction opposite to the flow of electrons, because
electrons have been assigned a negative charge.
The movement of electrons is a negative flow, which is generally
equivalent to a positive flow in the opposite direction.

1. Something which repeats over and over, like a wheel turning.
2. One full
period of a repetitive signal.
3. In a
FSM, a
path which repeats endlessly.
As opposed to an
arc.
In any FSM of finite size, every state sequence must eventually
repeat or lead into a sub-sequence which repeats; every state
eventually leads to a cycle.
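
Floyd's two-pointer technique will find the cycle length of any
deterministic next-state function without storing the whole sequence;
a sketch, with an arbitrary example function:

    def cycle_length(f, x0):
        # Advance one slow and one fast pointer until they meet,
        # which can only happen inside the cycle.
        slow, fast = f(x0), f(f(x0))
        while slow != fast:
            slow, fast = f(slow), f(f(fast))
        # Then walk once around the cycle to measure its length.
        n, fast = 1, f(slow)
        while slow != fast:
            n, fast = n + 1, f(fast)
        return n

    print(cycle_length(lambda x: (x * x + 1) % 255, 2))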

In some
RNG constructions, (e.g.,
BB&S and the
Additive RNG)
the system consists of multiple independent cycles, possibly of
differing lengths.
Since having a cycle of a guaranteed length is one of the main
requirements for an RNG, the possibility that a short cycle may
exist and be selected for use can be disturbing.

The ability to represent data in forms which take less
storage than the original.
The limit to this is the amount of uniqueness in the data.

Sometimes people claim that they have a method to compress a
file, and that they can compress it again and again, until it is
only a byte long.
Unfortunately, it is impossible to compress all possible
files down to a single
byte each,
because a byte can only select 256 different results.
And while each byte value might represent a whole file of data, only
256 such files could be selected or indicated.

Normally, compression is measured as the percentage size
reduction; 60 percent is a good compression for ordinary text.

In general, compression occurs by representing the most-common
data values or sequences as short code values, leaving longer code
values for less-common sequences.
Understanding which values or sequences are more common is the
"model" of the source data.
When the model is wrong, and supposedly less-common values actually
occur more often, that same compression may actually
expand the data.

Data compression is either "lossy," in which some of the
information is lost, or "lossless" in which all of the original
information can be completely recovered.
Lossy data compression can achieve far greater compression, and
is often satisfactory for audio or video information (which are both
large and may not need exact reproduction).
Lossless data compression must be used for binary data such as
computer programs, and probably is required for most
cryptographic uses.
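
A quick check of both points with Python's zlib: a lossless round
trip on redundant text, and the usual slight expansion when the model
is wrong, as with random-like data:

    import os, zlib

    text = b"the quick brown fox jumps over the lazy dog " * 100
    packed = zlib.compress(text)
    assert zlib.decompress(packed) == text        # lossless recovery
    print(round(100 * (1 - len(packed) / len(text))), "% reduction")

    noise = os.urandom(len(text))                 # random-like data
    print(len(zlib.compress(noise)) >= len(noise))  # typically True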

Compression and Encryption

Compressing
plaintext data has the advantage of
reducing the size of the plaintext, and, thus, the
ciphertext as well.
Further, data compression tends to remove known characteristics
from the plaintext, leaving a compressed result which is more
random.
Data compression can simultaneously expand the
unicity distance
and reduce the amount of ciphertext available which must
exceed that distance to support
attack.
Unfortunately, that advantage may be most useful with fairly short
messages. Also see:
Ideal Secrecy.

One goal of
cryptographic data compression would
seem to be to minimize the statistical structure of the plaintext.
Since such structure is a major part of
cryptanalysis, that would seem to be
a major advantage.
However, we also assume that our
opponents are familiar with our
cryptosystem, and they can use the same decompression we use.
So the opponents get to see the structure of the original plaintext
simply by decompressing any trial
decryption they have.
And if the decompressor cannot handle every possible input value,
it could actually assist the opponent by identifying wrong
decryptions.

When using data compression with encryption, one pitfall is that
many compression schemes add recognizable data to the compressed
result.
Then, when that compressed result is encrypted, the "recognizable
data" represents
known plaintext, even when only the
ciphertext is available.
Having some guaranteed known plaintext for every message could be
a very significant advantage for opponents, and makes for unwise
cryptosystem design.

It is normally impossible to compress random-like ciphertext.
However, some cipher designs do produce ciphertext with a restricted
alphabet which can of course be compressed.
Also see
entropy.

Bijective Compression

Another possibility is to have a data decompressor that can take
any random value to some sort of grammatical source text.
That may be what is sometimes referred to as
bijective compression.
Typically, a random value would decompress into a sort of nonsensical
"word salad" source text.
However, the statistics of the resulting "word salad" could be very
similar to the statistics of a correct message.
That could make it difficult to computationally distinguish between
the "word salad" and the correct message.
If "bijective compression" imposes an attack requirement for human
intervention to select the correct choice, that might complicate
attacks by many orders of magnitude.
The problem, of course, is the need to devise a compression scheme
that decompresses random values into something grammatically similar
to the expected plaintext.
That typically requires a very extensive statistical model, and of
course at best only applies to a particular class of plaintext
message.

An extension of the "bijective" approach would be to add random
data to compressed text.
Obviously, there would have to be some way to delimit or otherwise
distinguish the plaintext from the added data, but that may be
part of the compression scheme anyway.
More importantly, the random data probably would have to be added
between the compressed text in some sort of keyed way, so that it
could not easily be identified and extracted.
The keying requirement would make this a form of encryption.
The result would be a
homophonic encryption, in that the
original plaintext would have many different compressed
representations, as selected by the added random data.
Having many different but equivalent representations allows the same
message to be sent multiple times, each time producing a different
encrypted result.
But it is also potentially dangerous, in that the compressed message
expands by the amount of the random data, which then may represent
a hidden channel.
Since, for encryption purposes, any random data value is as good
as another, that data could convey information about the key and
the user would never know.
Of course, the same risk occurs in
message keys or, indeed, almost any
nonce.

Most
electronic devices require DC
-- at least internally
-- for proper operation, so a substantial part
of modern design is the "power supply" which converts 120 VAC wall
power into 12 VDC, 5 VDC and/or 3 VDC as needed by the
circuit
and active devices.

The interactive
analytical process of correcting
bugs in the design of a complex
system.
A normal part of the development process.

Contrary to naive expectations, a complex system almost never
performs as desired when first realized. Both
hardware and
software system design environments generally
deal with systems which are not working.
When a system really works, the design and development
process is generally over.

Debugging involves identifying problems, analyzing the source of
those problems, then changing the construction to fix the problem.
(Hopefully, the fix will not itself create new problems.)
This form of interactive analysis can be especially difficult because
the realized design may not actually be what is described in the
schematics, flow-charts, or other working documents: To some extent
the real system is unknown.

The most important part of debugging is to understand in great
detail exactly what the system is supposed to do.
In hardware debugging, it is common to repeatedly reset the system,
and start a known sequence of events which causes a failure.
Then, if one really does know the system, one can probe at various
points and times and eventually track down the earliest point where
the implemented system diverges from the design intent.
The thing that causes the divergence is the bug.
Actually doing this generally is harder than it sounds.

Software debugging is greatly aided by a design and implementation
process that decomposes complex tasks into small, testable procedures
or modules, and then actually testing those procedures.
Of course, sometimes the larger system fails anyway, in which case
the procedure tests were insufficient, but they can be changed and
the fixed procedure re-tested.
Sometimes the hardest part of the debugging is to find some set of
conditions that cause the problem.
Once we have that, we can repeatedly run through the code until
we find the place where the expected things do not occur.

Debugging real-time software is compounded by the difficulty of
knowing the actual sequence of operations.
This frequently differs from our ideal model of multiple completely
independent software processes.

One possibility is to build in some sort of "hidden screen" that
displays important system data, like the amount of free memory,
disk space, current buffer situation, etc.

Another possibility is to create a real-time log of which
procedures were entered when, on which thread, with their parameters.
This code can be added to the start of the normal code in each
routine, and enabled with a global flag.

One might think to comment-out the logging code eventually,
or to be able to conditionally compile that code away, but bugs
generally do not give us a chance to re-compile with the debug
code before they appear.
We need to be able to set a system flag at runtime to turn
data logging on and off.

There are various ways to implement real-time logging:

One way might be to have some sort of high-speed link to
another system which records that data for later analysis.

Alternately, it may be possible to have another thread
which does the logging in the system itself, which should of
course be as simple and reliable as possible.

Yet another approach might be to allocate a fairly-substantial
memory buffer, and just write the log data sequentially into
memory.
In this case we want to stop logging as soon as possible after
the bug occurs so critical data are not overwritten.
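
In Python, for example, the global-flag approach might be sketched
as a decorator; the names here are hypothetical:

    import threading, time

    LOGGING = False                   # set True at runtime to log
    LOG = []                          # in-memory sequential log

    def logged(fn):
        # Record entry time, thread and arguments for each call.
        def wrapper(*args, **kwargs):
            if LOGGING:
                LOG.append((time.time(),
                            threading.current_thread().name,
                            fn.__name__, args, kwargs))
            return fn(*args, **kwargs)
        return wrapper

    @logged
    def update_buffer(n):
        return n + 1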

When analyzing the data, check every possibility, no matter how
unlikely: At gigahertz instruction rates, even very, very unlikely
things can happen fairly often.
Whether something is improbable or not is irrelevant if it is
the bug.

Two of the better books on the debug process include:

Agans, D. 2002. Debugging. The 9 Indispensable Rules for
Finding Even the Most Elusive Software and Hardware Problems.
AMACOM.

Telles, M. and Y. Hsieh. 2001. The Science of Debugging.
Coriolis.

Poster graphics with the 9 rules can be found at:
http://www.debuggingrules.com/,
but the shorthand form may seem somewhat arcane unless one already
knows the intent.
The best thing is to read the book.
Failing that, however, I offer my interpretation, plus a few
insights:

Know, in great detail, exactly how the system is supposed to
work.
If we do not know when things are right or wrong, how can we trace
back to the first wrong thing?

Find a situation which causes the system to misbehave.
Then we can reproduce it often and track it down.
But if the problem is intermittent, it may be necessary to collect
data on the whole system repeatedly until the problem is detected
to get data worth analyzing.

Don't waste much time theorizing; instead collect real data
from the actual system until the data show the problem.
Bugs tend to be odd or improbable, because the more obvious things
will have been addressed already.
It is often a waste of time to look at the "likely" sources of the
problem, and difficult to imagine "unlikely" sources.

Try to identify sections of the system that are working
correctly.
If we can identify the problem with a particular section, we often
can eliminate much of the system as the cause.
Sometimes we can "comment out" calls to routines without changing
the problem, and then know those routines are not a cause.

Change at most one thing at a time.
This classic statement is well-known throughout experimental
Science and Engineering, for if we change even two things, and the
problem is affected, we do not know which change caused the effect.
Although tedious, we must run the system and check for the problem
after every single change.
And if a change does not help, change it back, so that the
design documents and printouts match what we really have.
Collect proposed design or implementation changes for use
after the bug is found.

Log every debug action, in sequence, for later review.
Especially make copies of the system before changes.
The copies and log entries should make it possible to undo any
significant action, and to check back on symptoms or results from
previous experiments.

Actually check the real system and make sure all your assumed
values are in fact what you think.
Just the fact that you are working on a bug means the system is not
functioning as assumed.
The documentation and listings may themselves be wrong.

Describe the problem to someone else.
The simple act of organizing what we know so we can describe it
may help to see the problem from another angle.
Keep to the facts and avoid jumping to conclusions.

Continue until you can fix the problem, and then bring the
problem back again at will.
Only when we can control the bug effect can we be sure that the
real problem has been identified.
It is not sufficient to assume the problem has somehow "gone away"
during debugging and not know how that happened.

In
electronics, the concept of isolating
circuit stages from a common power supply, and from each other.
The basic concept may be useful in modern digital electronics. See
bypass and
amplifier.

Each stage in a typical radio
circuit might use 100 ohm
resistors from
power, and then a bypass
capacitor of perhaps 0.01uF to circuit
ground.
The RF signals in one stage are attenuated at the common power bus
by the frequency-sensitive
voltage divider or
filter consisting of the
resistor and the power supply capacitors.
Any RF signals on the power supply bus are attenuated again by the
resistor to the next stage and that bypass capacitor.

Since a stage current of 20mA could cause a 2V drop across the
100 ohm resistor, smaller resistors, or even chokes using ferrite
beads, might be used instead.
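
The component values above are easy to check; a sketch of the
arithmetic:

    import math

    R, C, I = 100.0, 0.01e-6, 0.020   # 100 ohms, 0.01 uF, 20 mA

    f_corner = 1 / (2 * math.pi * R * C)
    print(round(f_corner), "Hz")      # about 159 kHz: RF is attenuated

    print(I * R, "V")                 # 2.0 V dropped across the resistor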

A defined plaintext attack typically needs to send a particular
plaintext value to the internal cipher (thus "knowing" that value),
and get the resulting ciphertext. Typically, a large amount of
plaintext is needed under a single key. A
cipher system which prevents any
one of the necessary conditions also stops the corresponding
attacks.

Many defined plaintext attacks are interactive, and so require
the ability to choose subsequent plaintext based on previous
results, all under one key.
It is relatively easy to prevent interactive attacks by having a
message key facility, changing message keys on each message, and
by handling only complete messages instead of a continuing flow or
stream that an opponent can modify interactively.

In
statistics, the number of completely
independent values in a sample.
The number of sampled values or
observations or bins, less the number of defined or freedom-limiting
relationships or "constraints" between those values.

If we choose two values completely independently, we have a
DF of 2. But if we must choose two values such that the second is
twice the first, we can choose only the first value independently.
Imposing a relationship on one of the sampled values means that we
will have a DF of one less than the number of samples, even though
we may end up with apparently similar sample values.

In a typical
goodness of fit test such as
chi-square, the reference
distribution (the expected counts) is
normalized to give the same number of counts as the experiment. This
is a constraint, so if we have N bins, we will have a DF of N - 1.
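
For example, with hypothetical counts, a sketch:

    observed = [48, 61, 55, 52, 44, 60, 51, 45]        # N = 8 bins
    total = sum(observed)
    # Normalizing the expectations to the same total is the constraint:
    expected = [total / len(observed)] * len(observed)

    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1                             # DF = N - 1 = 7
    print(chi2, df)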

In a crystalline
semiconductor like a
diode or
bipolar transistor, the distance of ionization
on both sides of a junction between P and N materials.
Also called "depletion layer," "depletion zone," "junction region"
and "space charge region."
The "depletion" part of this is the lack of majority charge carriers
in the area.

Normally, an N-type semiconductor has no net electrical charge,
but does have an excess of electrons which are not in a stable bond.
Similarly, on its own, P-type semiconductor also has no net charge,
but has a surplus of holes or bond-positions which have no
electrons.
When the two materials occur in the same crystal lattice, there are
opposing forces:

Thermal energy may cause free electrons from the N material to
find themselves in the P material where they can form a bond.
Extra electrons add excess negative charge to formerly balanced
atoms making them negative ions.
Atoms to which the electrons originally belonged will have excess
positive charge as positive ions.

As a consequence, a potential will build between the P and N
materials, thus making it more difficult for N material electrons
to travel into the P material, and less difficult for P material
electrons to travel into the N material.
At some potential, the thermal movement and accumulation of
electrons between N and P balances out or is in equilibrium.

The result is a static potential or bias due solely to the material
physics.

For diode conduction to occur, sufficient potential must be
applied so that the depletion field effect is overwhelmed.
The depletion field (or "junction voltage" or "barrier voltage")
is typically about 0.6V in silicon.

DES is a 64-bit block cipher with a 56-bit key. However, a
keyspace of 2^56 keys
is now too small for serious work.
A keyspace of at least 80 key bits is now recommended, and AES jumped
to 128 and 256 key bits.
One possibility for DES is to form the
multiple encryption known as
Triple DES: three sequential cipherings
by DES, each with an independent key.
That should produce an expected strength of something like 112 bits,
which is more than enough to defeat
brute force attacks.

The mechanics of DES are widely available elsewhere.
Here I note how one particular issue common to modern block ciphers
is reflected in DES. A common academic
model for conventional block ciphers
is a "family of
permutations."
The issue is the size of the implemented
keyspace compared to the size of the
potential keyspace for
blocks of a given size.

For 64-bit blocks and 56-bit keys, DES provides:

2^56 keyed or emulated tables, out of about
2^1,000,000,000,000,000,000,000
possibilities.
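
As a rough check of those figures (a small Python sketch using the
log-gamma function):

    import math

    # The number of permutations of 64-bit block values is (2^64)!;
    # its base-2 logarithm comes from the log-gamma function.
    log2_permutations = math.lgamma(2.0 ** 64 + 1) / math.log(2)
    print(log2_permutations)    # about 1.15e21 -- roughly 2^1,000,000,000,000,000,000,000
    print(2.0 ** 56)            # keyed tables actually selectable in DES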

The obvious conclusion is that almost none of the keyspace
implicit in the model is actually implemented in DES, and that is
consistent with other modern block cipher designs.
While that does not make modern ciphers weak, it is a little disturbing.
See more detailed comments under
AES.

A process whose sequence of operations is fully determined
by its initial
state. A mechanical or clockwork-like
process whose outcome is inevitable, given its initial setting.
Pseudorandom. See:
finite state machine. As opposed to
stochastic.

Deterministic Finite Automata

The theoretical concepts of deterministic finite automata (DFA)
are usually discussed as an introduction to lexical analysis and
language grammars, as used in compiler construction.
In that model, computation occurs in a
finite state machine, and the
computation sequence is modeled as a network, where each node
represents a particular
state value and there is exactly one
arc
from each node to one other node.
Normal program execution steps from the initial node to another
node, then another, step by step, until the terminal node is
reached.
In general, the model corresponds well both to
software sequential instruction execution,
and to
synchronous digital hardware operations controlled by hardware
clock cycles.

Within this theory, however, at least two different definitions for
"nondeterministic" are used:

a FSM which can pass through multiple state values at a
single step, and

a FSM which has more than one exit path from at least one node.

Such executions can be called "nondeterministic" in sense (1) that
the number of execution cycles may vary, and sense (2) that the
execution path may vary, from run to run.
Unfortunately, this established
model of FSM determinism can lead
to confusion in cryptographic analysis.

Nondeterministic Cryptographic Computation

In
cryptography, we are usually
interested in "nondeterministic" behavior to the extent that it is
unpredictable.
One example of cryptographic nondeterministic behavior would be a
really random sequence generator.
Programs, on the other hand, including most
random number generators,
are almost always deterministic.
Even programs which use random values to select execution paths
generally get those values from deterministic statistical RNG's,
making the overall computation cryptographically predictable.

One issue of cryptographic determinism is the question of whether
user input or mechanical latencies can make a program "nondeterministic."
To the extent that user input occurs at a completely arbitrary time,
that should represent some amount of
uncertainty or
entropy.
In reality, though, it may be that certain delays are more likely
than others, thus making the uncertainty less than it might seem.
Subsequent program steps based on the user input cannot increase
the uncertainty, being simply the expected result of a particular
input.

If hardware device values or timing occur in a completely
unpredictable manner, that should produce some amount of
uncertainty.
But computing hardware generally is either completely exposed or
strongly predictable.
For example, disk drives can appear to have some access-time
uncertainty based on the prior position of the read arm and
rotational angle of the disk itself.
But if the arm and disk position information is known, there is
relatively little uncertainty about when the desired track will be
reached and the desired sector read.

If disk position state was actually unknowable, we could
have a source of cryptographic uncertainty.
But disk position is not unknowable, and indeed is fairly
well defined after just a single read request.
Subsequent operations might be largely predictable.
Normally we do not consider disk position state, or other computer
hardware state, to be sufficiently protected to be the source of
our security.
In the best possible case, disk position state might be
hidden to most opponents, but that hope is probably not
enough for us to assign cryptographic uncertainty to those values.

Similar arguments pertain to most automatic sources of supposed
computer uncertainty. Also see
really random.

Latin for: "god from the Machine," translated from the
even older Greek.
Originally, a device for playwrites to resolve even the most
complex plot line, as in: "A machine with a god hidden inside
appears and fixes everything."

More generally, the idea that results unimaginable to any
normal person or impossible for any normal design are actually
from a god inside and not the machine itself.

Greek for "cut in two." In the study of
logic and
argumentation, the division of a
topic into exactly two normally distinct and mutually-exclusive
categories.
The goal is to have a single clear distinction by which every item
in the universe of discourse can be assigned to one class or the
other.
The power of the technique obviously depends upon the extent to
which the above goal is achieved.

A true dichotomy adds tremendous power to
analysis by identifying particular
effects with particular categories.
Finding one such effect thus identifies a category, which then
predicts the rest of the effects of that category and the lack of
the effects of the opposing category.
And when only two categories exist, even seeing the lack
of the effects from one category necessarily implies the other.

The basic idea of Differential Cryptanalysis is to first cipher
some plaintext, then make particular changes in that plaintext and
cipher it again.
Particular ciphertext differences occur more frequently with some
key values than others, so when those differences occur, particular
keys are (weakly) indicated.
With huge numbers of tests, false indications will be distributed
randomly, but true indications always point at the same key values
and so will eventually rise above the noise to indicate some part
of the key.
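
A toy illustration only (a hypothetical 4-bit cipher of my own, far
simpler than any real target): two substitution rounds with a key
XORed in before and after. Pairs of plaintexts with a fixed difference
are enciphered; each guess of the final key is scored by whether,
after peeling that key off, the recovered difference is one the S-box
could actually have produced. The true key is always consistent, while
wrong guesses are consistent only part of the time.

    import random

    S    = [0xE,0x4,0xD,0x1,0x2,0xF,0xB,0x8,0x3,0xA,0x6,0xC,0x5,0x9,0x0,0x7]
    Sinv = [S.index(x) for x in range(16)]

    def encrypt(p, k0, k1):
        return S[S[p ^ k0]] ^ k1      # toy cipher: XOR key, S, S, XOR key

    d = 0x4                           # chosen plaintext difference
    possible = {S[x] ^ S[x ^ d] for x in range(16)}   # differences S can give

    k0, k1 = random.randrange(16), random.randrange(16)
    score = [0] * 16
    for _ in range(200):
        p = random.randrange(16)
        c1, c2 = encrypt(p, k0, k1), encrypt(p ^ d, k0, k1)
        for guess in range(16):
            # peel the final key and the last S-box off both ciphertexts
            v1, v2 = Sinv[c1 ^ guess], Sinv[c2 ^ guess]
            if (v1 ^ v2) in possible: # a difference the first S-box could give?
                score[guess] += 1

    print(k1, score.index(max(score)))    # true key vs. best-scoring guess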

Diffusion is the property of an operation such that changing
one
bit
(or byte) of the input will change adjacent
or near-by bits (or bytes) after the operation. In a
block cipher, diffusion propagates
bit-changes from one part of a block to other parts of the block.
Diffusion requires
mixing, and the step-by-step process of
increasing diffusion is described as
avalanche.
Diffusion is in contrast to confusion.

Normally we speak of data diffusion, in which changing
a tiny part of the plaintext data may affect the whole ciphertext.
But we can also speak of key diffusion, in which changing
even a tiny part of the
key should change each bit in the
ciphertext with probability 0.5.

Perhaps the best diffusing
component is
substitution, but
this diffuses only within a single substituted value.
Substitution-permutation
ciphers extend diffusion beyond a single value by moving the bits
of each substituted element to other elements, substituting again,
and repeating.
But this only provides guaranteed diffusion if particular
substitution tables are constructed.

Pertaining to discrete or distinct finite values. As
opposed to
analog
or continuous quantities.
Values reasonably represented as
Boolean or
integer quantities, as opposed to values
that require
real numbers for accurate representation.

The idealized
model of an
electronic device with two terminals
which allows
current to flow in only one direction.

In practice, real diodes do allow some "leakage" current in the
reverse direction, and leakage approximately doubles for every
10 deg C increase in
temperature.

Semiconductor junctions also have a
"forward" voltage (typically 0.6V in silicon, or 0.3V in germanium
and for
Schottky
devices in silicon) which must be exceeded for conduction to occur.
This bias voltage is basically due to the semiconductor
depletion region.
The forward voltage has a negative temperature coefficient of about
-2.5mV/degC, in either silicon or germanium.

Semiconductor junctions also have a dynamic resistance (for small
signals) that varies inversely with current:
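
    r = kT / (q*I), about 26mV / I at room temperature

(the standard small-signal approximation: roughly 26 ohms at 1mA, or
2.6 ohms at 10mA).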

As a result of the forward voltage and internal
resistance,
power is dissipated as heat when current
flows.
Diodes are made in a range from tiny and fast signal devices
through large and slow high power devices packaged to remove
internal heat.
There is a wide range of constructions for particular properties,
including photosensitive and light-emitting diodes (LED's).

All real diodes have a reverse "breakdown"
voltage,
above which massive reverse conduction can occur. (See
avalanche multiplication and
Zener breakdown.)
Real devices also have current, power, and temperature limitations
which can easily be exceeded, a common result being a wisp of smoke
and a short-circuit connection where we once had a diode.
However, if the current is otherwise limited, diode breakdown can
be exploited as a way to "regulate" voltage (although IC regulator
designs generally use internal "bandgap" references for better
performance).

All diodes break down, but those specifically designed to do so
at particular low voltages are called Zener diodes (even if
they mainly use avalanche multiplication).
A semiconductor junction in reverse voltage breakdown typically
generates good-quality
noise which can be
amplified and exploited for use
(or which must be filtered out).
A bipolar
transistor base-emitter junction can be
used instead of a Zener or other diode.
Another noise generation alternative is to use an IC which has a
documented and useful noise spectrum, such as the "IC Zener" LM336
(see some noise circuits
locally, or @:
http://www.ciphersbyritter.com/NOISE/NOISRC.HTM).
Also see
avalanche multiplication and
Zener breakdown.

A distinguisher makes a contribution to cryptanalysis by showing
that a model does not work for the particular cipher.
The problem comes in properly understanding what it means to not
model the reality under test.
In many cases, a successful distinguisher is presented as a
successful attack, with the stated or implied result being that the
cipher is
broken.
But that goes beyond what is known.
When a scientific model is shown to not apply to reality, that does
not make reality "wrong."
It just means that the tested model is not useful.
Maybe the best implication is that a new model is needed.

Distinguishers generally come under the heading of "computational
indistinguishability," and much of this activity occurs in the area
of conventional block ciphers.
One of the problems of that area is that many cryptographers interpret
"block cipher" as an emulated, huge
simple substitution (that is, a
key-selected
pseudorandom permutation).
But it is entirely possible for ciphers to work on blocks yet not
fit that model.
Clearly, if a cipher does not really function in that way, it may
be possible to find a distinguisher to prove it.

The real issue in cipher design is
strength.
The problem is that we have no general measure to give us the
strength of an abstract cipher.
But a distinguisher provides testimony about conformance to model,
not strength.
A distinguisher simply does not testify about weakness.

If we have a discrete distribution, with a finite number
of possible result values, we can speak of "frequency" and
"probability" distributions:
The "frequency distribution" is the expected number of
occurrences for each possible value, in a particular
sample size.
The "probability distribution" is the probability of getting
each value, normalized to a probability of 1.0 over the sum of all
possible values.

Here is a graph of a typical "discrete probability distribution"
or "discrete probability density function," which displays the
probability of getting a particular statistic value for the case
"nothing unusual found":

Unfortunately, it is not really possible to think in the same way
about continuous distributions: Since continuous distributions have
an infinite number of possible values, the probability of getting
any particular value is zero. For continuous distributions,
we instead talk about the probability of getting a value in some
subrange of the overall distribution. We are often concerned with
the probability of getting a particular value or below, or the
probability of a particular value or above.

Here is a graph of the related "cumulative probability distribution"
or "cumulative distribution function" (c.d.f.)
for the case "nothing unusual found":

The c.d.f. is just the sum of all probabilities for a given value
or less. This is the usual sort of function used to interpret a
statistic: Given some result, we can
look up the probability of a lesser value (normally called
p) or a greater value (called
q = 1.0 - p).
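
For example (a small Python sketch, assuming the SciPy library and a
hypothetical statistic value):

    from scipy.stats import chi2

    x2, df = 23.7, 15           # hypothetical chi-square statistic and DF
    p = chi2.cdf(x2, df)        # probability of this value or less
    q = 1.0 - p                 # probability of this value or greater
    print(p, q)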

Usually, a test statistic is designed so that extreme values are
not likely to occur by chance in the case "nothing unusual found"
which is the
null hypothesis. So if we do
find extreme values, we have a strong argument that the results were
not due simply to random sampling or other random effects, and may
choose to reject the null hypothesis and thus accept the
alternative hypothesis.

Usually the ideal distribution in
cryptography is "flat" or
uniform.
Common discrete distributions include:

In abstract algebra, the case of a
dyadic operation, which may be called
"multiplication," which can be applied to equations involving
another dyadic operation, which may be called "addition," such
that:
a(b + c) = ab + ac and
(b + c)a = ba + ca.

The general concept of being able to split a complexity into
several parts, each part naturally being less complex than the
total. If this is possible, the
opponent may be able to solve all
of the parts far easier than the supposedly complex whole.
Often part of an attack.

This is a particular danger in cryptosystems, since most ciphers
are built from less-complex parts. Indeed, a major role of
cryptographic design is to combine small
component parts into a larger complex
system which cannot be split apart.

If shuffling is implemented so the shuffling sequence is used as
efficiently as possible, simply knowing the resulting permutation
should suffice to reconstruct the shuffling sequence, which is the
first step toward
attacking the
RNG.
While common shuffle implementations do discard some of the
sequence, we can guarantee to use at least twice as much
information as the table or block can represent simply by shuffling
twice.
Double-shuffling will not produce any more permutations, but it
should prevent the mere contents of a permuted table or block from
being sufficient to reconstruct the original shuffling sequence.

In a sense, double-shuffling is a sort of one-way information
valve which produces a key-selected permutation, and also hides the
shuffling sequence which made the selection.
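
A minimal sketch in Python, with the language's statistical generator
standing in for a cryptographic keyed RNG:

    import random

    def keyed_shuffle(table, rng):
        # conventional Fisher-Yates shuffle driven by the keyed RNG
        for i in range(len(table) - 1, 0, -1):
            j = rng.randrange(i + 1)
            table[i], table[j] = table[j], table[i]

    rng = random.Random("key material")    # stand-in for a cryptographic RNG
    table = list(range(256))
    keyed_shuffle(table, rng)              # first pass
    keyed_shuffle(table, rng)              # second pass: consumes about twice
                                           # the information the final table
                                           # itself can represent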

1. Digital Signal Processor. A type of microprocessor optimized
for fast multiplication and value accumulation as often needed in
signal processing operations like
FFT.
2. Digital Signal Processing. The translation of analog signals
into a stream of digital values, which can be analyzed and modified
by mathematical computation.

A
term of art in finance and law:
An expectation of thorough investigation.
It is expected that a buyer actually check the substance of claims
and the assumptions involved for any transaction or investment
opportunity.

That aspect of a cipher which allows a
key to be changed with
minimal overhead. A dynamically-keyed
block cipher might impose
little or no additional computation to change a key on a
block-by-block basis. The dynamic aspect of keying could be
just one of multiple keying mechanisms in the same cipher.

Another way to have a dynamic key in a block cipher is to add a
confusion layer which mixes the key value with the
block. For example, exclusive-OR could be used to mix a 64-bit
key with a 64-bit data block.

Dynamic Substitution is a
substitution table
in which the arrangement of the entries changes during operation.
This is particularly useful as a strong replacement for the
strengthless
exclusive-OR combiner in
stream ciphers.

In the usual case, an invertible substitution table is keyed by
shuffling under the control of a
random number generator.
One combiner input is used to select a value from within the table
to be the result or output; that is normal substitution.
But the other combiner input is used to select an entry at random,
then the values of the two selected entries are exchanged.
So as soon as a
plaintext mapping is used for output,
it is immediately reset to any possibility, and the more often any
plaintext value occurs, the more often that particular
transformation changes.
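
A sketch of that enciphering combiner (assuming the table has already
been keyed by shuffling):

    def dynsub_combine(data, rand, table):
        # substitute the data value through the keyed table ...
        out = table[data]
        # ... then exchange the just-used entry with a randomly selected
        # one, so the used transformation is immediately re-randomized
        table[data], table[rand] = table[rand], table[data]
        return out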

The arrangement of a
keyed substitution table starts out
unknown to an
opponent.
From the opponent's point of view, each table entry could be any
possible value with uniform probability.
But after the first value is mapped through that table, the
just-used table entry or transformation is at least potentially
exposed, and no longer can be considered unknown.
Dynamic Substitution acts to make the used transformation again
completely unknown and unbiased, by allowing it to again take on
any possible value.
Thus, the amount of information leaked about table contents is
replaced by information used to re-define each just-used entry.

One goal of Dynamic Transposition is to leverage the concept of
Shannon Perfect Secrecy
on a block-by-block basis.
In contrast to the usual ad hoc strength claims, Perfect Secrecy has
a fundamental basis for understanding and believing cipher
strength.
That basis occurs when the ciphering operation can produce any
possible transformation between
plaintext and
ciphertext. As a result, even
brute force no longer works,
because running through all possible keys just produces all
possible block values.
And, in contrast to conventional block ciphers, which actually
implement only an infinitesimal part of their theoretical model,
each and every Dynamic Transposition permutation can be made
practically available.
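
A much-simplified sketch of one block ciphering (assuming the block
has already been bit-balanced, and with a statistical generator
standing in for the keyed RNG):

    def transpose_block(bits, rng):
        # bits: a list of 0's and 1's, already balanced; rng: keyed RNG
        out = list(bits)
        for i in range(len(out) - 1, 0, -1):    # keyed Fisher-Yates permutation
            j = rng.randrange(i + 1)
            out[i], out[j] = out[j], out[i]
        return out

Deciphering regenerates the same keyed permutation and applies it in
reverse.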

One interesting aspect of Dynamic Transposition is a fundamental
hiding of each particular ciphering operation.
Clearly, each block is ciphered by a particular permutation.
If the opponent knew which permutation occurred, that would be
useful information.
But the opponent only has the plaintext and ciphertext of each block
to expose the ciphering permutation, and a vast plethora of
different permutations each take the exact same plaintext to the
exact same ciphertext.
(This is because wherever a '1' occurs in the ciphertext, any
plaintext '1' would fit.)
As a consequence, even
known-plaintext attack
does not expose the ciphering permutation, which is information
an opponent would apparently need to know.
The result is an unusual block cipher with an unusual fundamental
basis in strength.

In Ebers-Moll the collector current is a function of base-emitter
voltage, not current.
Unfortunately, base-emitter voltage Vbe itself varies both as a
function of collector current
(delta Vbe = 60mV per power of ten collector current),
and temperature (delta Vbe = -2.1mV per deg C).
Vbe also varies slightly with collector voltage Vce
(delta Vbe ~ -0.0001 * delta Vce),
which is known as the Early effect.
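
For reference, the underlying relation is the usual exponential law

    Ic = Is * ( e^(Vbe/Vt) - 1 ),  with Vt = kT/q about 26mV at room temperature

and the 60mV-per-decade figure above is just Vt * ln(10).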

In a process using a resource to produce a result, the
ratio of the result to the resource.

In
cryptography two aspects of efficiency
concern
keys.
Keys are often generated from
unknowable bits obtained from
really-random generators.
When unknowable bits are a limited resource there is motive both
to decrease the amount used in each key, and to increase the amount
being generated.
However, both of these approaches have the potential to weaken the
system.
In particular, insisting on high efficiency in really-random
post-processing can lead to reversible processing which is the exact
opposite of the goal of unknowable bits.

Extra unknowable bits may cover unknown, undetected problems both
in key use (in the cipher), and in key generation (in the randomness
generator).
Because there is so much we do not know in cryptography, it is
difficult to judge how close we are to the edge of insecurity.
We need to question the worth of efficiency if it ends up helping
opponents to break a cipher.
A better approach might be to generate high quality unknowability,
and use as much of it as we can.

The remarkable self-propagating physical field consisting of
energy in synchronized and changing
electric and
magnetic fields.
Energy in the electric or potential field collapses and creates or
"charges up" a magnetic field.
Energy in the magnetic field collapses and "charges up" an electric
field.
This process allows physical electrical and magnetic fields which
normally are short-range phenomena to "propagate" and thus carry
energy over relatively large distances at the speed of light.
Light itself is an electromagnetic field or wave.
Other examples include "radio" waves (including TV, cell phones,
etc.), and microwave cooking.

It is important to distinguish between a long-distance propagating
electromagnetic field and simpler and more range-limited independent
electric and magnetic fields.

It is unnecessary to consider how a field has been generated.
Exactly the same sort of magnetic field is produced either by
solid magnets or by passing DC current through a coil of wire
making an electromagnet.
A field from an electromagnet is not necessarily an electromagnetic
field in the sense of a propagating wave; it is just another
magnetic field.

Changing magnetic fields can be produced by forcing magnets to
rotate (as in an alternator) or changing the current through an
electromagnet.
Typical dynamic magnetic field sources might include AC motor clocks,
mixing motors, fans, or even AC power lines.
It would be extremely difficult for low-frequency changes or physical
movement to generate a propagating electromagnetic field.

Radio frequency voltage is the basis of most radio transmission.
Radio antenna designs convert RF power into synchronized electric
and magnetic fields producing a true electromagnetic field which
can be radiated into space.

It is important to distinguish between the expanding or
"radiating" property of an electromagnetic field, as opposed to the
damaging ionizing radiation produced by a radioactive source.

As far as we know
--
and a great many experiments have been conducted on this
-- electromagnetic waves are not life-threatening
(unless they transfer enough power to dangerously heat the water in
our cells).
The belief that electromagnetic fields are not dangerous is also
reasonable, since light itself is an electromagnetic wave,
and life on Earth developed in the context of the electromagnetic
field from the Sun. Indeed, plants actually use that field to their
and our great benefit.

(EMI). Various forms of
electromagnetic
or radio waves which interfere with communication.
This can include incidental interference, as from a running
AC/DC motor, and interference from otherwise legitimate sources
(as when a mobile transmitter is operated close to and thus
overloads a receiver).

Reducing emitted EMI and dealing with encountered EMI
is an issue in most modern electronic design. Also see
TEMPEST and
shielding.

ESD. In
electronics,
the instantaneous discharge of high
voltage static electricity
through invisible sparks or arcs due to breakdown of air insulation.
Static charge can accumulate to very high voltages (but relatively
low power) in normal activities, such as walking or sliding
across a chair seat.
ESD is significant in that the high voltage involved can break
down the tiny PN junctions in
semiconductor components even though
the power involved is tiny.
This breakdown can cause outright failure or
-- worse -- degraded
or unreliable operation.

To prevent ESD, we "simply" prevent the accumulation of static
electricity, or prevent discharge through a sensitive device.
Some approaches include the use of high-resistance ESD surfaces to
keep equipment at a known potential (typically "ground"), conductive
straps to connect people to the equipment (or "ground") before they
touch it, and ample humidity to improve static discharge through air.
Other measures include the use of "ESD shoes" to ground individuals
automatically, the use of metalized insulated bags, and improved ESD
protection in the devices themselves.

Unless grounded, two people are rarely at the same electrical
potential or voltage, so handing a sensitive board or device from
one person to another can complete a
circuit
for the discharge of static potential.
That could be prevented by shaking hands before giving a board to
another person, or by placing the board on an ESD surface to be
picked up.

ECB is the naive method of applying a block cipher, in that
the plaintext is simply partitioned into
appropriate size blocks, and each block is enciphered separately
and independently. When we have a small block size, ECB is
generally unwise, because language text has biased statistics which
will result in some block values being re-used frequently, and this
repetition will show up in the raw ciphertext.
This is the basis for a successful
codebook attack.
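
A toy sketch of the problem, with a keyed random permutation of 8-bit
values standing in for a real block cipher:

    import random

    rng = random.Random("key")            # stand-in for keying a real cipher
    perm = list(range(256))
    rng.shuffle(perm)                     # the emulated "codebook"

    plaintext  = [65, 66, 65, 65, 67, 66]        # note the repeated block values
    ciphertext = [perm[b] for b in plaintext]    # ECB: each block on its own
    print(ciphertext)                            # the repetitions show through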

On the other hand, if we have a large block (at least, say, 64
bytes), we may expect it to contain enough
unknowable uniqueness or
"entropy" (at least, say, 64
bits) to prevent a codebook attack.
In that case, ECB mode has the advantage of supporting independent
ciphering of each block.
That, in turn, supports various things, like ciphering blocks
in arbitrary order, or the use of multiple ciphering hardware
operating in parallel for higher speeds.

Modern packet-switching network technologies often deliver raw
packets out of order. The packets will be re-ordered eventually,
but having out-of-sequence packets can be a problem for low-level
ciphering if the blocks are not ciphered independently.

1. The
modeling activity associated
with designing new constructions or devices to behave as predicted.
Often directed toward achieving a known goal at minimal cost or
risk.
Also the use of a specialized body of knowledge to
analyze and predict the behavior of a
complex
system.
2. The product definition and
specification activities
associated with new design and construction.
3. The
testing activity of assuring that materials
and constructions meet specifications.
4. The inventive activity of creating
novel and unexpected designs, sometimes to the point of being beyond
the understanding of conventional wisdom.
Also the resulting accumulation of a body of knowledge; see
patents.
5. The experimental activity of measuring designs, improving on
those results and creating better, more optimal designs than those
currently known, even with comprehensive theory.
Sometimes the use of new devices in old systems.
6. The
heuristic activity of building the best
designs and constructions possible in absence of comprehensive
theory, often by trial-and-error; see
Software Engineering.
7. The planning and control activity associated with the coordination
of multiple designers and constructors and their needed equipment and
resources, potentially expensive and long-lead-time construction
materials, across time, in the implementation of large complex
systems; see
Software Engineering,
system design and
risk management.
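
For a source producing symbols with probabilities p1, p2, ..., pn,
the computation is the familiar Shannon form:

    H = - SUM( pi log pi )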

H is in
bits per symbol when the log is taken
to base 2. Also called "communications entropy."

The idea of disorder from physics:
Eventually, everything breaks into disorder.
Eventually, the universe will die the "heat death" of
evenly-distributed energy.
While not completely incompatible with Shannon entropy, "disorder"
can mean whatever is convenient since it has no specific measure.
In contrast, Shannon entropy deals with information and bits, not
matter.

Mystical
unpredictability.
The Shannon entropy computation
can measure the information rate, but it
cannot distinguish between predictable and
unpredictable information.
Using "entropy" to imply "unpredictable" is simply inconsistent
with the Shannon information theory computation.

Classical Introductions

In the original literature, and even thereafter, we do not find
what I would accept as a precise word-definition of
information-theoretic entropy.
Instead, we find the development of a specific numerical computation
which Shannon names "entropy."

Apparently the term "entropy" was taken from the physics because
the form of the computation in information theory was seen
to be similar to the form of "entropy" computations used in
physics.
The "entropy" part of this is thus the formal similarity of
the computation, instead of a common underlying idea, as is often
supposed.

The meaning of Shannon entropy is both implicit in and limited by
the specific computation.
Fortunately for us, the computation is relatively simple (as these
things go), and it does not take a lot of secondary, "expert"
interpretation to describe what it does or means.
We can take a few simple, extreme distributions and easily
calculate entropy values to give us a feel for how the
measure works.

Basically, Shannon entropy is a measure of
coding efficiency in terms of
information bits per communicated bit.
It gives us a measure of optimal coding, and the advantage is that
we can quantify how much we would gain or lose with a different
coding.
But no part of the computation addresses the context required for
"uncertainty" about what we could or could not predict.
Exactly the same values occur, giving the same entropy result,
whether we can predict a sequence or not.

"Suppose we have a set of possible events whose
probabilities of occurrence are p1,
p2, ..., pn.
These probabilities are known but that is all we know concerning
which event will occur.
Can we find a measure of how much 'choice' is involved in the
selection of the event or of how uncertain we are of the outcome?"
--Shannon, C. E. 1948.
A Mathematical Theory of Communication.
Bell System Technical Journal. 27: 379-423.

"In a previous paper the entropy and redundancy of a language have
been defined.
The entropy is a statistical parameter which measures, in a certain
sense, how much information is produced on the average for each
letter of a text in the language.
If the language is translated into binary digits (0 or 1) in the
most efficient way, the entropy H is the average number of
binary digits required per letter of the original language."
--Shannon, C. E. 1951.
Prediction and Entropy of Printed English.
Bell System Technical Journal. 30: 50-64.

"[Even if we tried] all forms of encoding we could think of, we
would still not be sure we had found the best form of encoding, for
the best form might be one which had not occurred to us."
"Is there not, in principle at least, some statistical measurement
we can make on the messages produced by the source, a measure which
will tell us the minimum average number of binary digits per symbol
which will serve to encode the messages produced by the source?"
-- Pierce, J. R. 1961.
Symbols, Signals and Noise.
Harper & Row.

"If we want to understand this entropy of communication theory, it
is best first to clear our minds of any ideas associated with the
entropy of physics."
". . . the literature indicates that some workers have never
recovered from the confusion engendered by an early admixture of
ideas concerning the entropies of physics and communication theory."
-- Pierce, J. R. 1961.
Symbols, Signals and Noise.
Harper & Row.

Entropy and Predictability

When we have a sequence of values from a random variable, that
sequence may or may not be
predictable.
Unfortunately, there are virtually endless ways to predict a
sequence from past values, and since we cannot test them all, we
generally cannot know if a sequence is predictable or not (see
randomness testing).
So until we can predict a sequence, we are "uncertain" about each new
value, and from that point of view, we might think the "uncertainty"
of the sequence is high. That is how all
RNG sequences look at first.
It is not until we actually can predict those values
that we think the "uncertainty" is low, again from our individual
point of view.
That is what happens when the inner state of a statistical RNG is
revealed.
So if we expect to interpret entropy by what we can predict, the
result necessarily must be both contextual and dynamic.
Can we seriously expect a simple, fixed computation to automatically
reflect our own changing knowledge?

In practice, the entropy computations use actual sequence values,
and will produce the same result whether we can predict those values
or not.
The entropy computation uses simple frequency-counts reflected as
probabilities, and that is all.
No part of the computation is left over for values that have anything
to do with prediction or human uncertainty.
Entropy simply does not discriminate between information on the basis
of whether we can predict it or not.
Entropy does not measure how unpredictable the information is.
In reality, entropy is a mere
statistical measure of
information rate or
coding efficiency.
Like other statistical measures, entropy simply ignores the puzzles
of the
war between
cryptographers and their
opponents the
cryptanalysts.

Entropy(1) is useful in
coding theory and
data compression, but requires a
knowledge of the probabilities of each value which we usually know by
sampling.
Consequently, in practice, we may not really know the "true"
probabilities, and the probabilities may change through time.
Furthermore, calculated entropy(1) does not detect any
underlying order that might exist between value probabilities, such as a
correlation, or a
linear relationship, or any other aspect of
cryptographically-weak randomness.

The limit to the unpredictability in any
deterministicrandom number generator
is just the number of bits in the
state of that generator, plus knowledge
of the particular generator design.
Nevertheless, most RNG's will have a very high calculated
entropy(1) in bits-per-symbol, because each symbol or
value is produced with about the same probability, despite the
sequence being ultimately predictable.
Accordingly, a high entropy(1) value does not
imply that a source really is random, or that it produces
unknowable randomness at any rate, or indeed have any relationship
at all to the amount of
unknowable cryptographic randomness
present, if any.
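
A small sketch of that point in Python: a plain counter is completely
predictable, yet its calculated entropy(1) is essentially the maximum
8 bits per symbol:

    from collections import Counter
    from math import log2

    data = bytes(i % 256 for i in range(65536))   # an utterly predictable sequence
    counts = Counter(data)
    n = len(data)
    H = -sum((c / n) * log2(c / n) for c in counts.values())
    print(H)                                      # 8.0 bits per symbol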

On the other hand, it is very possible to consider pairs of
symbols, or triples, etc.
The main problem is practicality, because then we need to collect
exponentially more data, which can be effectively impossible.
Nevertheless, if we have a trivial toy
RNG such as a
LFSR with a long single cycle, and which
outputs its entire state on each step, it may be possible to
detect problems using entropy(1).
Although a random generator should produce any possible next
value from any state, we will find that each state leads into
the next without choice or variation.
But we do not need entropy(1) to tell us this, because we know
it from the design.
On the other hand, cryptographic RNG's with substantial internal
state which output only a small subset of their state are far too
large to measure with entropy(1).
And why would we even want to, when we already know the
design, and thus the amount of internal state, and thus the
maximum entropy(3) they can have?

Some fairly new and probably useful formulations have been given
which include the term "entropy," and so at first seem to be other
kinds of entropy.
However, the name "entropy" came from a formal similarity to
the computation in physics.
To the extent that new computations are less similar to the
physics, they confuse by including the term "entropy."

"[The] set of a posteriori probabilities describes how the
cryptanalyst's knowledge of the message and key gradually becomes
more precise as enciphered material is obtained.
This description, however, is much too involved and difficult for
our purposes.
What is desired is a simplified description of that approach to
uniqueness of the possible solutions."

". . . a natural mathematical measure of this uncertainty is the
conditional entropy of the transmitted signal when the received
signal is known. This conditional entropy was called, for
convenience, the equivocation."

". . . it is natural to use the equivocation as a theoretical
security index. It may be noted that there are two significant
equivocations, that of the key and that of the message.
These will be denoted by
HE(K) and
HE(M)
respectively.
They are given by:

HE(K) = SumE,K[ P(E,K) log PE(K) ]
HE(M) = SumE,M[ P(E,M) log PE(M) ]

in which E, M and K are the cryptogram, message and key and

P(E,K) is the probability of key K and
cryptogram E

PE(K) is the a posteriori
probability of key K if cryptogram E is
intercepted

P(E,M) and PE(M) are the
similar probabilities for message instead of key.

"The summation in
HE(K) is over all
possible cryptograms of a certain length (say N letters)
and over all keys.
For
HE(M) the summation is over all
messages and cryptograms of length N. Thus
HE(K) and
HE(M)
are both functions of N, the number of intercepted letters."

Here we have all three possible sequences from a non-ergodic
process:
across we have the average of symbols through time (the
"temporal average"), and
down we have the average of symbols in a particular position
over all possible sequences (the "ensemble average"):

When a process is ergodic, every possible ensemble average is equal
to the time average.
As increasingly long sequences are examined, we get increasingly
accurate probability estimates.
But when a process is non-ergodic, the measurements we take over time
from one or a few sequences may not represent all possible sequences.
And measuring longer sequences may not help. Also see
entropy.

A data
coding
which adds redundant information to the data representation, so
that if data are changed in storage or transit, some amount of
damage can be corrected. Also see:
error detecting code and
block code.

A data
coding
which adds redundant information to the data representation, so
that if data are changed in storage or transit, the damage can
be detected with high probability, but not certainty. Often a
hash like
CRC.
Other examples include:
checksum and
parity. Also see:
error correcting code and
block code.

1. Information used in
reasoning toward
proof.
2. Information that tells us what happened. As opposed to
risk, which addresses what might happen
in the future.

Although most hard sciences can depend upon experimental
measurement to answer basic questions,
cryptography is different.
Often, issues must be
argued on lesser evidence, but science
rarely addresses kinds of evidence, or how conclusions might be
drawn.

A quote attributed to Carl Sagan is that: "extraordinary
claims require extraordinary evidence."
Apparently this comment repeatedly occurred in a long-running
battle between the famous astronomer and UFO
believers. Also see
claim and
burden of proof.

It is easy to sympathize with the quote, but it is also easily
misused:
For example, what, exactly, is being "claimed"?
Is that stated, or do we have
argument by innuendo?
And just who determines what is "extraordinary"?
And what sort of evidence could possibly be sufficiently
"extraordinary" to convince someone who has a contrary
bias?

In a
scientific discussion or
argument, a
distinction must be made between what is actually known
as fact and what has been merely assumed and accepted
for lo, these many years.
Often, what is needed is not so much
evidence, as the
reasoning to expose
conclusions which have drawn far beyond
the evidence we have.
Even if well-known "experts" have chosen to
believe overdrawn conclusions, that
does not make those conclusions correct, and also does not require
new evidence, let alone anything "extraordinary."

In a cryptographic context, an extractor is a
mechanism which
produces the inverse effect of a
combiner. This allows data to be
enciphered in a combiner, and then
deciphered in an extractor.
Sometimes an extractor is exactly the same as the combiner, as is
the case for
exclusive-OR.

(FMEA). An analytical process used in fault analysis or
risk analysis to organize the
examination of
component or subprocess
failure and consequential results.

The system under analysis is considered to be a set of
components or
black box elements, each of which may fail.
By considering each component in turn, the consequences of a
failure in each particular component can be extrapolated, and
the resulting costs or dangers listed.
For each failure it is generally possible to consider alternatives
to minimize either the probability or the effect of such failure.
Things that might be done include:

improve component
quality so that the usual failure modes no
longer apply, or at least occur less frequently;

use multiple redundant components organized such that any
single component failure does not produce a system failure; and

In the philosophical study of
logic, sometimes apparently-reasonable
arguments that do not maintain truth,
and often lead to false
conclusions.
A fallacious argument does not mean that a conclusion is wrong,
only that the conclusion is not supported by that argument.
Although most authors agree on the classical fallacies, there is
no single name for various modern fallacies, and different
authors give different organizations.
There can be no exhaustive list.
Also see:
inductive reasoning,
deductive reasoning,
proof,
rhetoric, and
propaganda.
Common fallacies include:

Argument By
Innuendo -- directing the reader
to an unwarranted and usually false conclusion without
stating the conclusion directly so it can be challenged,
or hiding the argument itself behind a mere claim, so the
argument cannot be challenged.

If two Boolean functions are not correlated, we expect
them to agree half the time, which we might call the "expected
distance." When two Boolean
functions are correlated, they
will have a distance greater or less than the expected distance,
and we might call this difference the
unexpected distance or UD.
The UD can be positive or negative, representing distance to a
particular affine function or its complement.

It is easy to do a fast Walsh transform by hand. (Well, I say
"easy," then always struggle when I actually do it.)
Let's do the FWT of function f : (1 0 0 1 1 1 0 0).
First note that f has a binary power length, as required.
Next, each pair of elements is modified
by an "in-place
butterfly"; that is, the values in each
pair produce
two results which replace the original pair, wherever they were
originally located. The left result will be the two values added;
the right will be the first less the second. That is,

(a',b') = (a+b, a-b)

So for the values (1,0), we get (1+0, 1-0) which is just (1,1).
We start out pairing adjacent elements, then every other element,
then every 4th element, and so on until the correct pairing is
impossible, as shown:
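
A small Python sketch that performs those butterfly passes on
f = (1 0 0 1 1 1 0 0):

    def fwt(x):
        x = list(x)                     # length must be a power of 2
        span = 1                        # adjacent pairs, then every other, then every 4th ...
        while span < len(x):
            for start in range(0, len(x), span * 2):
                for i in range(start, start + span):
                    a, b = x[i], x[i + span]
                    x[i], x[i + span] = a + b, a - b    # the in-place butterfly
            span *= 2
        return x

    print(fwt([1, 0, 0, 1, 1, 1, 0, 0]))    # -> [4, 0, 2, 2, 0, 0, -2, 2]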

The result is the
unexpected distance to each
affine Boolean function.
The higher the absolute value, the greater the "linearity";
if we want the
nonlinearity, we must subtract
the absolute value of each unexpected distance from the expected
value, which is half the number of bits in the function. Note that
the range of possible values increases by a factor of 2 (in both
positive and negative directions) in each sublayer mixing; this is
information expansion, which we often try to avoid in cryptography.

The FWT provides a strong mathematical basis for
block cipher mixing such that all input
values will have an equal chance to affect all output values.
Cryptographic mixing then occurs in butterfly operations based on
balanced block mixing structures
which replace the simple add / subtract butterfly in the FWT and
confine the value ranges so information expansion does not occur.
A related concept is the well-known
FFT, which can use exactly the same mixing
patterns as the FWT.

(FTA). An analytical process used in fault analysis or
risk analysis to
organize the examination of particular undesirable outcomes.
(See
tree analysis.)

First, the various undesired outcomes are identified.
Then each sequence of events which could cause such an
outcome is also identified.

If it is possible to associate an independent probability with
each event, it may be possible to compute an overall probability
of occurrence of the undesirable outcome.
Then one can identify the events which make the most significant
contributions to the overall result.
By minimizing the probability of those events or reducing their
effect, the overall probability of the negative outcome may
be reduced.

Various system aspects can be investigated, such as unreliability
(including the effect of added redundancy), system failure (e.g.,
the causes of a plane crash), and customer dissatisfaction.
A potential advantage is efficiency when multiple faults seem to
converge in a particular node, since it may be possible to modify
that one node to eliminate the effect of many faults at once.
However, that also would be contrary to a "defense in depth" policy
of protecting all levels, wherever a fault might occur or propagate.

Returning the output back to the input. In
electronics, a major tool in
circuit design. Used with
analog amplifiers, negative feedback can
improve
linearity (reduce distortion).
In contrast, positive feedback can increase amplification
and is the fundamental basis for
oscillation.
Feedback is also used in
digital systems such as
LFSR's and most other
RNG's, to create long but predictable sequences.
In cryptography, feedback is used in
autokey stream ciphers to continually
add ciphertext complexity to the state of the confusion or
running key RNG.

Horst Feistel, a senior employee of IBM in the 60's and
70's, responsible for the
Feistel construction
used in the early Lucifer cipher and then DES itself.
Awarded a number of important crypto patents, including:
3,768,359 and 3,768,360 and 4,316,055.

Normally, in a Feistel construction, the input block is split
into two parts, one of which drives a transformation whose result
is exclusive-OR combined into the other block. Then the "other
block" value feeds the same transformation, whose result is
exclusive-OR combined into the first block. This constitutes 2 of
perhaps 16 "rounds."

L           R
|           |
|---> F --> +     round 1
|           |
+ <-- F <---|     round 2
|           |
v           v

One advantage of the Feistel construction is that the
transformation does not need to be invertible. To reverse any
particular layer, it is only necessary to apply the same
transformation again, which will undo the changes of the original
exclusive-OR.
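
A minimal sketch, with an arbitrary, deliberately non-invertible round
function F (any keyed transformation would do):

    def F(x, k):
        # any convenient keyed transformation; it need not be invertible
        return (x * 2654435761 + k) & 0xFFFFFFFF

    def encipher(L, R, round_keys):
        for k in round_keys:
            L, R = R, L ^ F(R, k)       # XOR F of one half into the other, swap
        return L, R

    def decipher(L, R, round_keys):
        for k in reversed(round_keys):
            L, R = R ^ F(L, k), L       # apply the same F again to undo the XOR
        return L, R

    c = encipher(1234, 5678, [11, 22, 33, 44])
    print(decipher(*c, [11, 22, 33, 44]))   # -> (1234, 5678)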

A disadvantage of the Feistel construction is that
diffusion
depends upon the internal transformation. There is no guarantee of
overall diffusion, and the number
of rounds required is often found by experiment.

Fencing is a
term-of-art which describes a layer of
substitution tables.
In
schematic or data-flow diagrams,
the row of tiny substitution boxes stands like a picket fence
between the data on each side.

A fencing
layer is a
variable size block cipher
layer composed of small (and therefore realizable)
substitutions. Typically
the layer contains many separate
keyed substitution tables. To
make the layer extensible, either the substitutions can be
re-used in some order, or in some pre-determined sequence, or
the table to be used at each position can be selected by some computed
value.
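
One possible sketch of such a layer, simply re-using a few keyed
tables in order across a block of any width:

    def fencing_layer(block, tables):
        # block: a sequence of byte values; tables: keyed 256-entry substitutions
        return [tables[i % len(tables)][b] for i, b in enumerate(block)]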

Fast Fourier Transform. A numerically advantageous
way of computing a
Fourier transform.
Basically a way of transforming information between
amplitude values sampled periodically
through time, and amplitude values sampled across
complexfrequency. The FFT performs this
transformation in time proportional to n log n, for some n a
power of 2.

While exceedingly valuable, the FFT tends to run into practical
problems in use which can require a deep understanding of the
process:

An FFT assumes that the waveform is
stationary and thus repetitive
and continuous, which is rarely the case.

Sampling a continuous wave can create spurious "frequency"
values related to the sampling and not the wave itself.

The range of possible values increases by a factor of 2 (in
both positive and negative directions) in every sublayer mixing;
this is information expansion, which we often try to avoid in
cryptography.

An FFT transforms the sampled wave into specific precise
frequency components, not frequency bands: in most signals other
frequencies will dominate and may not be represented as expected
in FFT results.

The FFT provides a strong mathematical basis for
block cipher mixing in that all input
values will have an equal chance to affect all output values.
But an ordinary FFT expands the range of each sample by a factor of
two for each mixing sub-layer, which does not produce a conventional
block cipher.
A good alternative is
Balanced Block Mixing, which has
the same general structure as an FFT, but uses
balanced butterfly operations based on
orthogonal Latin squares.
These replace the simple add / subtract butterfly in the ordinary
FFT, yet confine the value ranges so information expansion does not
occur. Another concept related to the FFT is the
fast Walsh-Hadamard transform
(FWT), which can use exactly the same mixing patterns as the FFT.

In abstract algebra, a
commutativering
with more than one element, a unity (multiplicative
identity) element, and a corresponding
multiplicative
inverse for every nonzero element.
(This means we can divide without loss.) Also see:
group.

In general, a field supports the four basic operations
(addition, subtraction, multiplication and division), and satisfies
the normal rules of arithmetic. An operation on any two elements
in a field is a result which is also an element in the field.
The
real numbers and
complex numbers are examples of
infinite fields.

In a field, each element must have an inverse, and the product of
an element and its inverse is 1. This means that every non-zero
row and column of the multiplication table for a field must
contain a 1. Since row 2 of the mod 4 table does not contain a 1,
the set of integers mod 4 is not a field. This is because 4
is not a prime.
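
For reference, the multiplication table for the integers mod 4 is:

    *   0  1  2  3
    0   0  0  0  0
    1   0  1  2  3
    2   0  2  0  2
    3   0  3  2  1

and row 2 (0 2 0 2) indeed contains no 1, so 2 has no multiplicative
inverse mod 4.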

In
electronics,
typically a device or
circuit
intended to reduce some
frequencies
in preference to other frequencies.
Filters can be low-pass, high-pass, or bandpass.
Filtering also occurs in power supplies, where rectified
AC is smoothed into
DC output. Also see:
voltage divider.

For example, suppose we have an active filter with a
voltage gain of 1 at 425Hz and
a gain of 6 at 2550Hz:

In a finite field, every nonzero element x can be squared,
cubed, and so on, and at some power will eventually become 1. The
smallest (positive) power n at which x^n = 1
is the
order of element x.
This of course makes x an "nth
root of unity," in that it satisfies the
equation x^n = 1.

A finite field of order q will have one or more
primitive elements a whose
order is q-1 and whose powers cover all nonzero field elements.

For every element x in a finite field of order q,
x^q = x.

Also denoted Fq for a finite field of
order q. Also see:
characteristic.

FSM.
The general mathematical model of computation consisting of
a finite amount of storage or
state, transitions between state
values, and output as some function of state.
Given full knowledge of the "state machine" (that is, the next-state
and output functions), plus the initial state values, the resulting
sequence of states and outputs is defined absolutely.
This is completely predictable
deterministic computation.

In particular, a finite collection of states S, an input
sequence with
alphabet A, an output sequence
with alphabet B, an output
function u(s,a), and
a next state function d(s,a).
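
A small sketch of that definition: once d, u and the initial state are
fixed, the outputs follow mechanically from the inputs:

    def run_fsm(d, u, s0, inputs):
        # d(s,a): next state; u(s,a): output; s0: initial state
        s, outputs = s0, []
        for a in inputs:
            outputs.append(u(s, a))
            s = d(s, a)
        return outputs

    # a tiny hypothetical machine that outputs the running parity of its inputs
    print(run_fsm(lambda s, a: s ^ a, lambda s, a: s ^ a, 0, [1, 0, 1, 1, 0]))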

Other than
really random generators for
nonce or
message key values, all the computations of
cryptography are finite state machines
and so are completely
deterministic.
Much of cryptography thus rests on the widespread but unproven
belief that the internal state of a
cryptographic machine (itself a FSM) cannot be deduced from a
substantial amount of known output, even when the machine design
is completely defined.

A class of
digital logic component
which has a single
bit of
state with various control signals to
effect a state change. There are several common versions:

Latch: the output follows the input, but only while the
clock input is "1"; lowering the clock
prevents the output from changing.

SR FF: Set / Reset; typically created by cross-connecting
two 2-input NAND
gates, in which case the inputs are
complemented: a "0" on the S input forces a stable "1" state,
which is held until a "0" on the R input forces a "0".

D or "delay" FF: senses the input value at the time of a
particular clock transition.

JK FF: the J input is an
AND
enable for a clocked or
synchronous transition to "1"; the K
input is an AND enable for a clocked transition to "0"; and often
there are S and R inputs to force "1" or "0" (respectively)
asynchronously.

The control aspect of a
digital communications system which
temporarily stops a data source from sending data.
This allows the data sink to "catch up" or perform other operations
without losing data.

Ultimately, flow control is one of the most important aspects of a
communications network.
Inside the network, however, protocols may simply send and re-send
data until those particular data are acknowledged.

In mathematics, a
proof which does not depend in any way at all
upon the meanings of the terms.
Each unique term can be replaced by an arbitrary unique symbol
without affecting the truth of the statement.
The logic structure of a formal proof is thus susceptible to
mechanical verification. As opposed to an
informal proof.

Under suitable conditions any periodic function can be
represented by a
Fourier series. (Various other
"orthogonal functions" are now known.)

The use of sine and cosine functions is particularly interesting,
since each
term (or pair of terms) represents a single
frequency oscillation with a particular
amplitude and
phase.
So to the extent that we can represent an amplitude waveform as a
series of sine and cosine functions, we thus describe the frequency
spectrum associated with that waveform.
This frequency spectrum describes the frequencies which must be
handled by a
circuit to reproduce the original waveform.
This illuminating computation is called a
Fourier transform.

The Fourier transform relates
amplitude samples at periodic discrete
times to amplitude samples at periodic discrete
frequencies. There are thus two
representations: the amplitude vs. time waveform, and the
amplitude vs.
complex frequency
(magnitude and phase) spectrum.
Exactly the same information is present in either representation,
and the transform supports converting either one into the other.
This computation is efficiently performed by the
FFT.

In a cryptographic context, one of the interesting ideas of the
Fourier transform is that it represents a thorough
mixing of each input value to every output
value in an efficient way.
On the other hand, using the actual FFT itself is probably
impractical for several reasons:

Range Expansion. An FFT significantly expands the range
of each input variable, and thus requires more storage (for exact
results) than the original values.

Floating Point. The use of sine and cosine functions
virtually implies floating point results, which is not ideal for
cryptography.

The basic idea of efficiently combining each value with every
other value is generalized in cryptography as
Balanced Block Mixing.
BBM structures can be applied in FFT-like patterns, and can
support a wide range of
keyed,
non-expanding,
nonlinear, and yet reversible
transformations.

In general, the number of repetitions or cycles per
second.
Specifically, the number of repetitions of a sine wave signal per
second: A signal of a single frequency is a sine wave of that
frequency.
Any deviation from the sine waveform can be seen as components of
other frequencies, as described by an
FFT.
Now measured in Hertz (Hz); previously called cycles-per-second
(cps).

Typically, audio frequencies range from 20Hz to 20kHz, although
many designs try to be flat out to 100kHz.
Video baseband frequencies range from DC up to something like 3MHz
or 5 MHz.
Common names for radio frequency (RF) ranges with generally
similar properties are:
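
    VLF     3 kHz  -   30 kHz
    LF     30 kHz  -  300 kHz
    MF    300 kHz  -    3 MHz
    HF      3 MHz  -   30 MHz
    VHF    30 MHz  -  300 MHz
    UHF   300 MHz  -    3 GHz
    SHF     3 GHz  -   30 GHz
    EHF    30 GHz  -  300 GHz

(the standard decade designations).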

To distort or make unintelligible, or the distorted or
unintelligible result. In
cryptography,
deciphered plaintext which is unintelligible, in part
or whole, typically due to use of a wrong
key, transmission errors, or possible
attack.

Geffe, P. 1973.
How to protect data with ciphers that are really
hard to break.
Electronics. January 4.
99-101.

For conventional stream ciphers, which are basically
just an RNG and
exclusive-OR, the name of
the game is to find a strong RNG.
Most RNG designs are essentially
linear
and easily broken with just a small amount of the produced
sequence.
The Geffe combiner was an attempt to combine two RNG's and
produce a stronger sequence than each.
But that was not to be.

An
electronic radioactivity detector
which produces a strong
voltage pulse when an ionizing-radiation
event is detected.
While a Geiger-Mueller tube cannot measure the strength of an event,
it does make some events obvious with little or no additional
amplification or processing.
The number of events per unit of time then represents a "count" of
the ionizing radiation being encountered.
Often made with a thin mica window which allows alpha particles
to be sensed.

Typically a cylindrical tube with an outer conductive shell
(the cathode), a wire (the anode) in the center, and filled
with a gas like argon at low pressure.
Depending on the tube involved, a positive
bias of perhaps 500 or 600
volts above the
cathode is applied to the anode through a
resistance of perhaps 1 to 10 Megohms.
We would expect both tube temperature and applied voltage to
affect detection sensitivity to some extent.

When an ionizing event like a gamma ray interacts with an
atom of the internal gas, a fast electron may be ejected from
the shell of the atom.
As the ejected electron encounters other atoms, it may cause other
electrons to be ejected, producing a cascade or "avalanche" of gas
ions along their paths.
If the full distance between cathode and anode becomes ionized,
a strong
current pulse or arc will occur.
Presumably, many weaker or wrongly positioned events occur which do
not form an arc and are not sensed.
However, even unsensed events may have short-term and localized
effects on sensitivity that may be hard to quantify.

After the initial pulse (which discharges the interelectrode
capacitance), current for the arc flows through the anode
resistance, which should cause the applied voltage to drop below
the level which would sustain an arc.
After the arc ends, an internal trace gas, such as an alcohol,
may help to "quench" the ionization which could cause the same arc
to reoccur.
During the avalanche and quench period, the tube cannot detect
new events.
Meanwhile, the anode voltage climbs back toward the operational
level (charging the interelectrode capacitance) until another
sufficiently-strong and properly-placed ionizing event occurs.

Goodness-of-fit tests can at best tell us whether one
distribution is or is not the same as the other,
and they say even that only with some probability. It is
important to be very careful about experiment design, so that,
almost always, "nothing unusual found" is the goal we seek. When
we can match distributions, we are obviously able to state exactly
what the experimental distribution should be and is. But there
are many ways in which distributions can differ, and simply
finding a difference is not evidence of a specific effect.
(See null hypothesis.)
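
As a sketch of such a test, the following Python computes a
chi-square goodness-of-fit statistic for sampled values against a
uniform expectation; with 16 bins there are 15 degrees of freedom,
so a statistic near 15 is the hoped-for "nothing unusual found":

    import random
    from collections import Counter

    def chi_square(observed, expected):
        return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

    samples = [random.randrange(16) for _ in range(16000)]
    counts = Counter(samples)
    observed = [counts[v] for v in range(16)]
    expected = [len(samples) / 16] * 16
    print(chi_square(observed, expected))  # typically near 15 for uniform data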

Web page HTML does not have a specific representation for Greek
symbols.
However, if the user has a font which does have Greek symbols,
it is easy to tell the browser to use that font.
The obvious choice is the "Symbol" font, since it is normally
present and widely available.
As usual, an
ASCII character will select a shape to
display, but now the shape set is taken from the "Symbol" font.
For example:

<FONT FACE="Symbol">a</FONT>

renders the ASCII "a" as a lowercase Greek alpha, since "a"
selects the alpha shape in the Symbol font.

In
electronics, the concept of a common
reference, against which
voltage
can be detected or measured. Also known as "earth."
1. A chassis or circuit common point.
2. A metallic connection buried in dirt.

Perhaps the earliest common use of a ground was in the
electric telegraph, which came into use around the time of the
U.S. Civil War. A battery
voltage was switched onto a common wire
using a telegraph key, and electromagnets up and down the wire
responded by making a click.
When the key was released, the electromagnets would release and
make a clack, the time between click and clack being the
dot or dash of Morse code. But for
current to flow in the
circuit, there had to be a return path.
One way to do that was to string two wires for each circuit.
However, it was found that a metal surface in the earth, such as
a rod driven into the ground, can contact, within a few ohms of
resistance, the same reference as used
by everybody else.
So, especially for small signals, the return path can be through
the actual dirt itself, thus saving a lot of copper and making
a system economically more viable.

The original concept of radio was to launch and collect signals
from the air, as referenced to the common ground.
What actually happens is the propagation of an
electromagnetic wave, which
can be detected without a common reference.
But a ground can play an important role in an antenna system,
especially at lower RF
frequencies.

In the past, the usual ground reference was the copper cold water
pipe which extended in the earth from the home to the city water
main.
In many homes, this reference was carried throughout the home
on substantial copper pipe with soldered connections.
Unfortunately, the introduction of nonconductive plastic water
pipe, while convenient and cheap, has also eliminated an easy
ground reference.

Power distribution, with a massive appetite for copper, is a
natural application for one-wire connections, but in this case
there are surprising and dangerous complexities.
Nowadays, at the AC socket, we have both a protective ground wire
which connects directly to some ground, and also a return power
path, which is connected to ground at some point.

Ideally, the metal chassis or case of anything connected to
the power lines should connect to the protective ground.
Ideally, if protected equipment shorts out and connects live power
to the case, that will blow the equipment fuse or even a power
box circuit breaker, instead of electrocuting the operator.
Even more ideally, a ground fault interrupter (GFI) can detect
even a small amount of protective current flow and open an
internal breaker.
However, the protective ground system itself is generally tested
at most once (upon installation), and if it goes bad under load,
we will not know until bad things happen.
While GFI's do have a "test" button, most ordinary equipment does
not.

As different amounts of AC current flow about the home or
building, wire resistance causes the voltage between the two AC
socket grounds to vary, which is the origin of a
ground loop.
But ground loops are not limited to power circuits, and can
present serious problems in instrumentation and audio systems.

The condition where a
voltage exists between different
ground points in the same signal system,
all of which points are supposed to be exactly the same.
There are various causes:

Ground loop hum is sometimes caused or induced by
transformer-like coupling between
AC power wires and the associated safety ground
wire.
This source of hum will vary dynamically according to the
current flowing in those particular AC
wires.

Ground loop hum also can be caused by AC current flow in the
safety ground wire, current which ideally should not be present.
Any current flowing in the safety ground always develops a small
voltage difference across the
resistance of that wire.
The source could be some amount of leakage from equipment, perhaps
capacitive coupling inside a transformer, or any of a wide range
of component failure possibilities.

Ground loop signal coupling also can occur in the case of
unbalanced signal interconnections.
The usual culprit is single-conductor shielded wire using "RCA
connectors."
If a substantial signal is transported some distance by unbalanced
cable into a relatively low resistance load, there will be some
signal current flow on the return shield.
However, if the RCA connectors connect to chassis ground, and both
pieces of equipment are safety grounded (as required), and plugged
into different outlets, there will also be some return signal
current flow in the safety ground.
The result will be a small signal voltage between the outlet
grounds, which may then impose itself on other unbalanced
interconnections.

The simple ground model would have us believe that there is no
resistance in the ground, which is of course false.
Even sending a small signal from one amplifier to another on an
unbalanced line implies that some current will flow on that line,
and that same current also flows through the ground connections
(often, a "shield" conductor).
Thus, the voltage across different parts of ground will vary
dynamically depending on the ground resistance, which can cause a
cross-coupling between independent unbalanced channels.
Even if that coupling is tiny, when working with tiny signals,
it may matter anyway, especially in a
TEMPEST context, or when working with
signals in a receiver.

The ground loop problem is inherent in unbalanced signal lines.
The same effects occur inside circuitry, but then the problem
is under the control of a single designer or manufacturer; most
problems occur when interconnecting different units.
In general, ground loop cross-coupling effects are minimized by
reducing ground resistance and increasing input or load resistance.
Alternately, broadcast audio systems use
balanced line interconnections that
do not need ground as a part of the signal path.
Balanced lines also tend to "cancel out" common-mode noise
picked up by cables on long runs.

Consumer equipment generally uses unbalanced lines where the
signal is referenced to some ground, typically on "RCA connectors."
But because of ground loops, different equipment can have
different references, thus introducing power line hum into the
signal path.

Various responses are possible, but the one which is not
possible is to open or disconnect the safety ground.
The safety ground is there to protect life and should never be
subverted.
3-prong to 2-prong AC adapters should never be used when equipment
has 3-wire plugs.
Because sound systems are interconnected, a system isolated from
safety ground allows a failure on even one remote piece of equipment
to electrify the entire system, and that is breathtakingly dangerous.
Better alternatives always exist.

Probably the easiest approach is to make sure that all
equipment has grounded power plugs, and that those are inserted
into a single power strip, instead of different wall outlets.
3-wire extensions should be used if necessary.
This will tend to keep all pieces at the same ground level, a
relative zero, even if that varies in an absolute sense.
This is a "star" grounding system.

Maybe the next thing is to directly connect each equipment
signal ground to each other signal ground, using large copper wire.
Large wire is not used to handle large currents, but instead to
reduce interconnection resistance, and thus minimize the voltage
effect of any leakage current.
The "single power strip" approach should have done that already, but
some equipment may not use chassis ground as a signal reference.

The most direct solution to audio ground loops with existing
equipment is to add external audio isolation
transformers in
each signal line between equipment.
However, audio isolation transformers can be expensive, two are
required for stereo, and they generally have worse specifications
than the equipment which they interconnect.
They also may require known load impedances, which typically are not
otherwise required by consumer equipment, and so may need a resistor
across the secondary (the side connected to the input of the next
equipment).
Transformers need no power and last essentially forever.

Instead of audio isolation transformers, it is possible to
buy or build active
op amp unbalanced-to-balanced
and balanced-to-unbalanced converters.
These minimize common-mode noise from cable pickup (although less
well, and perhaps much less well than a transformer), but do
not really isolate ground.
The advantage is in not requiring a shield ground, or in allowing
the shield path to be open, while ignoring ground noise.
Op amp balanced line converters can have an excellent frequency
response, but, unlike transformers, must be powered.

Another possibility is to use power isolation
transformers, which are large and also
expensive, but which do not affect signal quality.
Sometimes a center-tap on the secondary can be grounded to the
following equipment, thus producing balanced AC power that can be
helpful in reducing hum.
If the problem is AC leakage into safety ground from some piece of
equipment, isolating the power can prevent current flow into the
safety ground.

A possible fix for some problems is to insert a small (say, 10 ohm
to 100 ohm) resistor in series with the audio shield at one end of
each cable.
If the problem was excessive current flow in the shield, the added
resistor will prevent that, and the small resistance should have
little effect on the signal going to a high-impedance input.
But the resistor can worsen electromagnetic shielding and so cause
pickup and interference on those lines (see
shielding).

It is also possible to open the shield connections entirely, thus
depending on the AC safety ground for signal coupling.
Using the AC ground as a signal reference is normally a bad idea, but
this response has no cost, is relatively easy to try, and may improve
things occasionally, although perhaps temporarily.
Opening the shield is common when working with balanced lines that
inherently ignore signals on the AC ground.

When the original source of the problem is the CATV cable, two
back-to-back 75 ohm to 300 ohm baluns may isolate the shield ground.
But some cheap "transformers" have no transformer inside and so do
not isolate.
The user should use an ohmmeter to verify that the input shield
does not connect to the output shield.

The pervasive nature of ground loops is a good reason to use
isolated balanced lines.
It is also a reason to use optical digital interconnections, which
inherently isolate the ground references in different pieces of
equipment.

In abstract algebra, a nonempty
set G, and a
closed dyadic (two-input, one-output) operation with
associativity, an
identity element and
inverse elements.
Whatever the operation is, we may choose to call it
"multiplication" and denote it with * as usual, the group
denoted (G,*).
Closure means that if elements (not necessarily numbers)
a, b are in G, then ab (that is, a*b)
is also in G. Also see:
cyclic group.

There is a single
identity element e which works with
all elements: for e and any a in G, ea = ae = a

There is a corresponding
inverse for every element: for any a in G,
there is an a^-1 in G such that
a^-1 a = e = a a^-1

The
integers under addition
(Z,+) form a group, as do the
reals
(R,+).
A set with a closed operation which is just associative is a
semigroup.
A set with a closed operation which is both associative and
has an identity is a
monoid.
A
ring has a second dyadic operation which
is distributive over the first operation.
A
field is a ring in which the nonzero elements form an
abelian group under the second operation.
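
For small finite structures, these axioms can be checked by brute
force; a minimal Python sketch (illustrative only, and exponentially
slow):

    def is_group(elements, op):
        # closure: a*b stays in the set
        if any(op(a, b) not in elements for a in elements for b in elements):
            return False
        # associativity: (a*b)*c == a*(b*c)
        if any(op(op(a, b), c) != op(a, op(b, c))
               for a in elements for b in elements for c in elements):
            return False
        # exactly one two-sided identity
        ids = [e for e in elements if all(op(e, a) == a == op(a, e) for a in elements)]
        if len(ids) != 1:
            return False
        e = ids[0]
        # every element has an inverse
        return all(any(op(a, b) == e == op(b, a) for b in elements) for a in elements)

    n = 6
    print(is_group(range(n), lambda a, b: (a + b) % n))     # True: (Z6,+) is a group
    print(is_group(range(1, n), lambda a, b: (a * b) % n))  # False: closure fails (2*3=0)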

A measure of the difference or "distance" between two binary
sequences of equal length; in particular, the number of bits which
differ between the sequences. This is the
weight or the number of 1-bits in the
exclusive-OR of the two sequences.
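
A minimal Python sketch of the computation, as a bit count of an
exclusive-OR:

    def hamming_distance(a: bytes, b: bytes) -> int:
        # number of differing bits between two equal-length sequences
        assert len(a) == len(b)
        return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

    print(hamming_distance(b"\x0f", b"\xff"))  # 4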

By itself, software does not function.
Only hardware can function.
The best software can do is to present a list of operations for
hardware to perform, when and if hardware gets around to performing
operations.
Hardware always does all computation, which ultimately limits the
efficiency of computation.

Hardware computation is not limited to the software concept
of sequentiality: In hardware, each individual
cipher layer can have separate computation hardware
in a classic example of
pipelining.
During each computation period, each of the multiple layers can be
computed simultaneously, and adding layers does not reduce the
data rate.
That allows cipher computations like
balanced block mixing
to occur at a constant block rate, independent of
block size, by adding hardware
computation layers.
Such a computation will have a
latency for each added layer, but the
computation time per byte goes down in proportion to block size.

A classic
computer operation which forms a
fixed-size result from an arbitrary amount of data.
Ideally, even the smallest change to the input data will change
about half of the bits in the result.
Often used for table look-up, so that very similar language terms
or phrases will be well-distributed throughout the table.
Also often used for
error-detection, and, known as a
message digest,
authentication. Also see
salt.

Error Detection

For
error detection, a hash of
message data will produce a
particular hash value, which then can be included in the message
before it is sent (or stored or enciphered).
When the data are received (or read or deciphered), the message
is hashed again, and the result should match the included value.
If the hash is different, something has changed, and the usual
solution is to request the data be sent again.
But the hash value is typically much smaller than the data, so
there must be "many" different data sets which will produce
that same value, which is called hash "collision."
Because of this, "error detection" inherently cannot detect all
possible errors, and this is independent of any
linearity
in the hash computation.
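
A minimal Python sketch of that error-detection round trip, using
the standard zlib CRC-32 as the hash:

    import zlib

    message = b"attack at dawn"
    sent = message + zlib.crc32(message).to_bytes(4, "big")  # append the hash value

    # on receipt (or decipherment), hash again and compare:
    data, check = sent[:-4], sent[-4:]
    ok = zlib.crc32(data).to_bytes(4, "big") == check
    print(ok)  # True unless something changed in transit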

An excellent example of a hash function is a
CRC operation. CRC is a
linear function without cryptographic
strength, but does have a strong
mathematical basis which is lacking in ad hoc methods.
Strength is not needed when
keys are processed into the
state or
seed used in a
random number generator,
because if either the key or the state becomes known, the keyed
cipher has been broken already.
Strength is also not needed when a hash is used to accumulate
uncertainty in data from a
really random generator, since
the hash construction cannot expose unknowable randomness anyway.

Cryptographic Hashing

In contrast, a
cryptographic hash function
such as that used for authentication must be "strong."
That is, it must be "computationally infeasible" to find two input
values which produce the same hash result.
Otherwise, an opponent could produce a different message which
hashes to the correct authentication value.
In general, this means that a cryptographic hash function should be
nonlinear overall and the hash state or
result should be 256 bits or more in size (to prevent
birthday attacks).

Sometimes a cryptographic hash function is described in the
literature as being "collision free," which is a misnomer.
A collision occurs when two different texts produce exactly
the same hash result.
Given enough texts, collisions will of course occur, precisely
because any fixed-size result has only so many possible code
values.
The intent for a cryptographic hash is that collisions be hard to
find (which implies a large internal state), and that particular
hash values be impossible to create at will (which implies some
sort of nonlinear construction).

Reversibility

A special cryptographic hash is not needed to assure
that hash results do not expose the original data:
When the amount of information hashed is substantially larger
than the internal state or the amount of state ultimately exposed,
many different data sequences will all produce the exact same hash
result (again, "collision").
The inability to distinguish between the data sequences and so
select "the" original is what makes a hash
one way.
This applies to all "reasonable" hash constructions independent
of whether they are "cryptographic" or not.
In fact, we can better guarantee the collision distributions when
we have a relatively simple linear hash than if we must somehow
analyze a complex ad hoc cryptographic hash.

On the other hand, when less information is hashed than
the amount of revealed state, the hashing may be reversible, even
if the hash is "cryptographic."
And, again, that is independent of the strength of the hash
transformation.

Distribution Flattening

Currently, almost all of
cryptography is based on complex but
deterministic (and, thus, at least
potentially solvable) operations like
ciphers and hashes.
Because of the occasionally disastrous effectiveness of
cryptanalysis, every
cipher system has need of at least
a few absolutely
unpredictable values, which can be
described as
really random.
Really random values have various uses, including
message keys and protocol
nonces.
Generally, such values are obtained by attempting to detect or
sample some molecular or atomic process, such as electrical
noise.

For most cryptographic use, values should occur in a
uniform distribution, so that
no value will be predictable (by an
opponent) any more than any other value.
Unfortunately, few measurable molecular or atomic processes have a
uniform distribution.
As a consequence, some deterministic processing must be applied to
somehow "flatten" the non-uniform distribution.

In
statistics, and with
real number values, it is common to
simply map each value through the known cumulative distribution.
Unfortunately, that depends upon knowing the original distribution
very well, but in practice the sampled distributions from quantum
levels are not ideal and do vary.

No simple, fixed,
integer value transformation can compensate
for a distribution
bias where some values appear more often than
they should.
Bias is a property of a set of values, not individual items, so
treating individual values similarly seems unlikely to correct the
problem.
On the other hand, a transformation of multiple values, like
block ciphering, can go a long way.
Because a cipher
block generally holds 64 bits or more worth
of sample values, we might never see two identical
plaintext blocks, and thus never
produce a bias in the
ciphertext.
With a block cipher, any bias in the sample values tends to be hidden
by the multiple values in a block, although at substantial expense.

Perhaps the most common way to flatten a distribution is to hash
multiple sample values into a result for use. Using a
CRC hash as an example, we can model a CRC
operation as something like a large, fast
modulo.
Now, when the CRC is initialized to a fixed value, a particular
input sequence always produces the same result, just like any other
deterministic operation, including a cryptographic hash.
So when inputs repeat, results repeat, and that carries the bias from
the input to the output, even if only a subset of result bits are
used.
The worst possible situation would be to "hash" each sample value
independently into a smaller result, since then the most frequent
sample values would transfer the bias directly into the results.
Normally, though, if we hash enough sample values at the same
time, we expect the input sequence to "never" repeat, so the
results should be almost completely corrected.
Thus, the issue is not just having more input than output, but also
having enough input so that any particular input string will "never"
recur.

An improvement is to initialize the CRC to a random starting
state before each hash operation.
Because of the random initialization, any remaining
bias (as in particularly frequent or infrequent
values) will be distributed among all possible output values.
When using a
CRC for hashing, a separate random value is
not required, since a random value is already in the CRC state
as a result of the previous hash.
Thus, what is required is simply to not initialize the CRC
to a fixed value before each hash operation.
For other hashes, the previous result could be hashed before new
sample values.

To assure that the hash is not reversible, the hash operation
must be overloaded; that is, at least twice as much information
must be hashed as the size of the hash result or the amount exposed.
And when bias must be corrected, a factor of 2.5 or more may be
a better minimum.
Reasonable choices might include a 16-bit CRC with 40 bits of
normally-distributed input data, or a 32-bit CRC with 80 bits
of input.
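
A minimal Python sketch of such an overloaded, chained CRC hash,
with a deliberately biased stand-in source (a real design would
sample physical noise); here 80 input bits produce each 16-bit
output, a factor of 5:

    import random
    import zlib

    def biased_sample():
        # stand-in for a biased physical source: 0..255, skewed low
        return min(random.randrange(256), random.randrange(256))

    state = 0  # CRC running state; deliberately never reset to a fixed value

    def next_output(n_samples=10):
        global state
        for _ in range(n_samples):
            state = zlib.crc32(bytes([biased_sample()]), state)
        # 10 samples = 80 input bits; expose only 16 of the 32 state bits
        return state & 0xFFFF

    print([hex(next_output()) for _ in range(4)])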

In most sciences, the main point of a mathematical
model is to predict reality, and
experimentation is how we know reality.
Experimentation thus sets the values that the mathematical model
must reproduce.
When there is a real difference between experiment and model (both
being competently evaluated), the model is wrong.

In general, experimentation cannot know every possible parameter,
or try every possible value, and so cannot assure us that something
never happens, or that every possibility has been checked.
That sort of thing typically requires a
proof, but such proof is always based on the
assumption that the mathematical model is sufficient and
correct.
Because experimentation often collects all measurable data, it is
generally better than proof at finding unexpected happenings
or relationships.

In
cryptography,
ciphers are basically approved for use by
experiments which find that various
attacks do not succeed.
Absent various assumptions (such as: no other attacks are possible,
and every approach has been fully investigated) that does not even
begin to approach what we would consider actual
proof of
strength.
Nevertheless, those results apparently are sufficient for the field
of cryptography to place real users and real data at
risk.

Since experimentation is the basis for all real use of
cryptography, it does seem odd that experimentation is often scorned in
mathematical cryptography.

In
electronic digital logic, the amount of time a signal
voltage
must be present and stable after the occurrence of a
clock
for the signal to be guaranteed to be recognized by all devices
over all allowed variations in device processing, power supply,
signal level, temperature, etc. Also see:
setup time.

In abstract algebra, a
mapping @ from
group G into group H which "preserves the group operation."

For group G with operation #,
and group H with operation %;
for mapping @ from group G to group H;
given a, b in G:
The result of the group G operation on a and b,
when mapped into group H, must be the same as first mapping
a and b into H, and then performing
the group H operation:

@(a # b) = @a % @b

A homomorphic mapping (the map @ from group G into
group H) need not be
one-to-one.

Given a homomorphism from group G into group H and
mapping @ from G to H:

If e is the identity of G,
then e mapped into H is the identity of H.

For a in G, the inverse of a in G,
when mapped into H, is the same as first mapping
a into H and finding the inverse in H.

If G is
abelian
and the mapping from G to H is
onto,
then H is abelian.
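
For instance, reduction mod n is a homomorphism from (Z,+) onto
(Z_n,+); a quick Python spot-check of the @(a # b) = @a % @b
property (a sketch, not a proof):

    import random

    n = 7
    f = lambda a: a % n   # the mapping @ from (Z,+) into (Zn,+)

    for _ in range(1000):
        a, b = random.randrange(-999, 999), random.randrange(-999, 999)
        assert f(a + b) == (f(a) + f(b)) % n   # @(a # b) = @a % @b
    print(f(0))  # 0: the identity of Z maps to the identity of Zn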

A "huge block cipher" is my term for a conventional
block cipher with a maximum
block size substantially larger than the
common 64-bit, 128-bit or 256-bit block.
Most huge block ciphers will be
scalable, and so may select a wide
range of blocks, including tiny blocks, but also including 256-byte,
512-byte or even 4K byte blocks.
These sizes can be practical, given
FFT-like networks of
Balanced Block Mixing operations.
Mixing Ciphers can be made to
select block width in power-of-2 steps at ciphering time. (See
Mixing Cipher Design
Strategy.)

Various advantages can accrue from huge blocks (although not
all simultaneously):

The opportunity to add an "extra data" field of reasonable size
to the plaintext block while retaining reasonable overall
efficiency. That means ciphertext expansion, but the extra field
can carry per-block values, such as the per-block authentication
values discussed below.

When we have a huge block of plaintext, we may expect it
to contain enough (at least, say, 64 bits) uniqueness or
entropy to prevent a
codebook attack, which is the
ECB weakness.
In that case, ECB mode has the advantage of supporting the
independent ciphering of each block.
This, in turn, supports various things, such as the use of multiple
ciphering hardware operating in parallel for higher speeds.

As another example, modern packet-switching network technologies
often deliver raw packets out of order. The packets will be
re-ordered eventually, but we cannot start deciphering until
we have a full block in the correct order. But we might avoid
delay if blocks are ciphered independently.

A similar issue can make per-block authentication very useful.
Typically, authentication requires a scan of plaintext, and then
some structure to transport the authentication value with the
ciphertext, much like a common
error detecting code.
The problem is that all the data which are to be authenticated
in one shot, which often means the whole deciphered file, must be
buffered until an authentication result is reached.
We certainly cannot use the data until it is authenticated.
So we have to buffer all that data as we get it, but cannot use
any of it until we get it all and find that it
checks out.
We can avoid that overhead and latency by using per-block
authentication.

To implement per-block authentication, we use a keyed
cryptographic RNG
which produces a keyed sequence of values.
Both ends produce the same keyed sequence by using the same key.
We place a different random value in each block sent, and then
compare that to the result as each block is received.
This is very much like a per-block version of a
Message Key.
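
A minimal Python sketch of the idea, using HMAC-SHA256 over a block
counter as a stand-in for the keyed cryptographic RNG (the function
names are illustrative):

    import hashlib
    import hmac

    def keyed_value(key: bytes, block_number: int) -> bytes:
        # both ends reproduce this keyed sequence from the shared key
        msg = block_number.to_bytes(8, "big")
        return hmac.new(key, msg, hashlib.sha256).digest()[:8]

    # the sender places keyed_value(key, i) inside block i before
    # enciphering; after deciphering block i, the receiver checks it:
    def block_is_authentic(key: bytes, i: int, embedded: bytes) -> bool:
        return hmac.compare_digest(embedded, keyed_value(key, i))

    k = b"shared secret key"
    tag = keyed_value(k, 0)
    print(block_is_authentic(k, 0, tag))  # True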

Normally, in
software, the more computation there is,
the longer it takes, but that is not necessarily true in
hardware, if we are willing to build or
buy more hardware. In particular, we can
pipeline hardware computations so that,
once we fill the pipeline (a modest
latency), we get a full block result on
every computation period (e.g., on every
clock pulse).
Thus, huge blocks can be much faster in hardware than software:
the larger the block, the larger the data rate, for any given
hardware implementation technology.

Something of mixed origin. In
cryptography, typically a
cipher system containing both
public key and
secret key component ciphers.
Typically, the public key system is used only to
transport the key for the secret key cipher.
It is the secret key cipher which actually enciphers and protects
the data.

In both logic and science, some hypotheses are better than others.
In logic, hypotheses are the same if they have the same formal
structure.
But in science, hypotheses with the exact same structure may or
may not be appropriate in particular
contexts.
A scientific hypothesis must be:

Testable: it must make
"predictions" that can be verified
experimentally or by comparison to factual data, and

Falsifiable: it must allow experiments or comparisons
to confirm or especially to disprove the hypothesis.

Hypotheses structured so that we can only develop evidence for
the question by investigating an essentially unlimited number of
possibilities are untestable.
Hypotheses in which no experiment of any kind can disprove
the question are unfalsifiable, and are best seen as mere
beliefs.
Hypotheses which apply, say, to all matter, are
unprovable absent testing on all matter, but
any one (repeatable) experiment could disprove the question.
Most scientific hypotheses are structured so that they can be
proven false or confirmed by
experiment, but cannot ever be proven true.

Many scientific questions are formally unproven in the sense
that they address the operation of all matter across time, only
a tiny subset of which can be sampled experimentally.
But most scientific issues also admit quantifiable
experimentation which makes it possible to bound the
interpretation of reality.
Quantifiable experimentation makes it possible to compare
different trials of the same experiment and see if things work
about the same each time, with each material, and in each place.
Of course, getting precisely similar values in each case may
just be a coincidence, which is why it is not proof.
But each trial does add to a growing mass of evidence for an overall
similarity which, while not proof, does provide both
statistical and factual support.

Cipher Strength

The usual logic of scientific experimentation is not available in
cryptography, in that cryptography has
no general, quantifiable test of
strength.
We only know the strength of a
cipher when an actual
attack is found (and then only know the
strength under that particular attack); until then we know nothing
at all about strength.
Until an attack is found, there is no experimental strength value, so
cryptanalytic experiments cannot
develop bounds on strength.
Nor are factual and comparable values developed.
As a result, cryptographic proof appears to require what Science
knows cannot be done: a testing of every possibility simply to show
that none of them work.

For example, in cryptography, we may wish to assert that a
certain
cipher is
strong, or
unbreakable by any means.
(We can insert "practical" with little effect.)
A cipher is unbreakable when no possible
attack can break it.
So to prove that, we apparently first must identify every possible
attack, and then check each to see if any succeed on the cipher
under test.
But not only do we not know every possible attack, it seems
unlikely that we could know every possible attack, or even
how many there are.
Thus, the assertion of strength seems unprovable.

Even if we could know every possible attack, we still
have problems:
Since attacks are classified as approaches (rather than
algorithms), it seems necessary to phrase
each in ways guaranteed to cover every possible use.
Yet it seems unlikely that we could be guaranteed to know every
possible ramification of even one approach.
Without comprehensive algorithms, it is hard to see how we could
provably know that any particular approach could not work.
So again the hypothesis of cipher strength seems unprovable
and unhelpful.

On the other hand, for each well-defined attack algorithm, we
probably can decide whether that would break any particular
cipher.
So it is at least conceivable that we can have proof of
strength against particular explicit attack algorithms.
Unfortunately, in most cases, attack approaches must be modified
for each individual cipher before an appropriate algorithm is
available, and failure might just indicate a poorly-adapted
approach.
So algorithmic tests are not particularly helpful.

Random Sequences

Similar issues occur with respect to
random bit sequences:
A sequence is random if no possible technique can extrapolate
future bits from all past ones, succeeding other than half the
time.
Again, we do not and probably can not know every possible
extrapolation technique, and as the set of "past" bits grows,
so do the number of possible techniques.
Not knowing each possible technique, we certainly cannot
check each one, making the hypothesis of sequence randomness
seemingly unprovable.

Fortunately, we do have some defined algorithms for
statistical randomness tests.
Thus, what we can say is that a particular test has
found no pattern, and that test can be repeated by various
workers on various parts of the sequence for confirmation.
What that does not do is build evidence for results from
other tests, and the number of such tests is probably
unbounded.
So, again, the hypothesis of randomness seems unprovable.

Scientific Strength

Even if we could develop cryptographic proof to the
same level as natural law, that probably would not be useful.
People already
believe ciphers are strong, even
without supporting evidence.
Belief with supporting evidence would, of course, be far
preferable.
But what is really needed is not more belief, but actual
affirmative
proof of strength.
And that is beyond what science can provide even for ordinary
natural law.

Why is lack of absolute proof acceptable in science and
not in cryptography?
In science, the issue is not normally whether an effect exists,
but instead the exact nature of that effect.
When an apple falls, we see gravity in action, we know it
exists, and then the argument is how it works in detail.
Even when science pursues an unknown effect, it does so
numerically in the context of experiments which measure
the property in question.

When a cipher operates, all we see is the operation of a
computing mechanism; we cannot see
secrecy or
security to know if they even exist,
let alone know how much we have.
Security cannot be measured because the appropriate context is our
opponents, and they are not talking.
While we may know that we could not penetrate security,
that is completely irrelevant unless we know that the same applies
to our opponents.

Does all this mean cryptography is hopeless?
Well, absolute
proof of absolute security seems unlikely.
But we can seek to
manage the risk of failure,
particularly because we cannot know how large that risk actually is.
For example, airplanes are designed with layers of redundancy
specifically to avoid catastrophic results from single subsystem
failures (see
single point of failure).
In cryptography, we could seek to not allow the failure of any
single cipher to breach security.
It would seem that we could approach that by
multiple encryption and by
dynamically selecting ciphers, which are parts of the
Shannon Algebra of Secrecy Systems.

The disturbing aspect of the IDEA design is the extensive use
of almost linear operations, and no nonlinear tables
at all. While technically nonlinear, the internal operations
seem like they might well be linear enough to be attacked.

The
strength delivered by even a simple
cipher when each and every
plaintext is equally probable and
independent of every other plaintext.
This is plaintext
balance, which is sometimes approached by
whitening, although producing true
independence remains a problem.

"With a finite key size, the
equivocation of
key and message generally approaches zero,
but not necessarily so.
In fact, it is possible for . . . H_E(K) and
H_E(M) to not approach zero as N
approaches infinity."

"An example is a simple substitution on an artificial language
in which all letters are equiprobable and successive letters
independently chosen."

"To approximate the ideal equivocation, one may first operate
on the message with a transducer which removes all redundancies.
After this, almost any simple ciphering system --
substitution, transposition, Vigenere, etc., is satisfactory."

The use of
CBC mode in
DES: By making every plaintext block
equally probable, DES is greatly strengthened against
codebook attack.
(Unfortunately, the public block randomization provided
by CBC does not hide plaintext statistics, and so does not hinder
brute force attack.)

The transmission of
randommessage key values: To the
extent that every value is equally probable, even a very
simple cipher is sufficient to protect those values.

The use of
data compression to reduce
the redundancy in a message before ciphering: This of course
can only reduce language redundancy. (Also, many
compression techniques send pre-defined tables before the
data and so are not suitable in this application.)

In
electronics,
the basic idea that there is an advantage to matching a source
or generator impedance to the load, typically by
transformer.

Consider what it would mean to make the load match the impedance
of a generator:
As we decrease the load impedance toward the generator impedance,
more current will flow into the load and more power will be
transferred.
But as we deliver more power, more power is also dissipated in the
generator, and thus simply lost to heat.
We deliver the maximum possible power to a load when the load has
the same impedance as the generator, but then we lose as much power
in the generator as we manage to transfer.
An efficiency of 50 percent is generally a bad idea for any signal,
low-level, high-level or power.

As the load impedance is decreased below the generator impedance,
now less power is delivered to the load, and even more
power is dissipated as heat in the generator.
The limit is what happens when a power amplifier output is shorted.
In practice, adding speakers in parallel to an amplifier output can
so reduce the load impedance as to cause the amplifier to overheat,
self-protect, or fail.

Audio Matching

In most audio work, there is little desire to "match"
impedances.
Normally, signals are produced by low-impedance sources (e.g.,
preamplifier outputs) for connection to high-impedance inputs
(e.g., amplifier inputs).
When transformers are used, they often create
balanced lines, receive balanced
signals, and provide
ground loop isolation.

Impedance matching tends to be of the most concern for
electro-mechanical devices.
The fidelity of sensitive mechanical sensors like phonograph
cartridges and professional microphones can be affected by their
loads.
For best performance, it is important to present the correct load
impedance for each device, which is a serious impedance matching
requirement.
However, nowadays this is often accomplished trivially with an
appropriate load resistor across a high-impedance input to an
amplifier or preamp.
In contrast, loudspeakers, which are also electro-mechanical, are
almost universally "voltage driven" devices.
Speakers are specifically designed to be driven from a source
having an impedance much lower than their own.

One old application somewhat like matching is the classic
"input transformer":
These devices take a low-level low-impedance signal to a larger
signal (typically ten times the original, or more), but then
necessarily at a much, much higher impedance.
When used with an amplifier that has a high input impedance anyway,
an input transformer can deliver a greater signal, without the
noise of a low-level amplification stage.
However, bipolar transistor input stages generally want to see a low
impedance source for best noise performance, so the advantage seems
limited to somewhat noisier FET and tube input circuits.

Good input transformers, with a wide and flat frequency response,
are very expensive and can be surprisingly sensitive to
nearby AC magnetic fields.
Although a thin metal shield is sufficient to protect against RF
fields, sheet metal steel will not diminish low-frequency magnetic
fields much at all.
Mu metal shields may help, although distance is the usual solution.

Power Matching

In power transformers, losing as much power to heat as we deliver
would be absolutely ridiculous.
We may transform AC power to get the voltage we need, but we do not
"match" the equipment load to the impedance of the AC line.

RF Matching

In radio frequency (RF) work, impedance matching is needed
to properly use coaxial cables.
Using source and load impedances appropriate for the coax minimizes
"standing waves" on that coax.
Standing waves increase current at voltage minima and thus increase
signal loss, even for low-level signals.
Standing waves also can cause voltage breakdown at voltage maxima,
which may include transmitter tuning capacitors or output transistors.
And almost any passive filter will require a known load impedance.

Something which absolutely cannot happen, under any condition,
over any amount of time.
Making information exposure "impossible" is often a goal in
cryptography.
But it is not uncommon for something
believed "impossible" to
turn out to be not even all that
improbable. Also see:
scientific method and
proof.

Unlikely. Something
believed to occur very infrequently.
For example, modern
cryptography is based on the use of
keys.
With any finite number of keys, it is always possible for an
opponent
to accidentally choose the correct one and so expose an enciphered
message.
(But that is not a
break unless one can do it at will and
with less effort than a
brute-force search.)
Choosing the correct key by accident is made "improbable" by having
many keys, but no matter how many keys there are, the possibility
of exposure still exists.
Thus,
proving that a
cipher has astronomical numbers of keys
still does not prove the cipher will protect information in
every possible case.
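
The chance of an accidental correct guess is easy to quantify:
with k-bit keys it is 1 in 2^k, tiny but never zero, as this
Python fragment shows:

    for k in (40, 56, 128, 256):
        print(k, "bits:", 1 / 2 ** k)   # small, but always greater than zero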

In
statistics, events are independent
when the occurrence of one event does not change the probability
of another.
Simple independence can be argued from first principles, or
measured with simple probability experiments.
In complex reality, however, it may be difficult to know which of
many different events may combine to influence another. Also see
correlation and
rule of thumb.

In the study of
logic,
reasoning by generalizing multiple
examples into a single overall idea or statement, sometimes by
analogy.
While often incorrect, inductive reasoning does provide a way to go
beyond known truth to new statements which may then be tested by
observation and experiment.
In contrast to
deductive reasoning.
Certain types of inductive reasoning can be assigned a correctness
probability using
statistical techniques.
Also see:
argument,
fallacy,
proof and
scientific method.

A basic
electronic component
which acts as a reservoir for electrical power in the form of
current.
An inductor acts to "even out" the current flowing through it, and
to "emphasize" current changes across the terminals.
An inductor conducts
DC and opposes
AC in proportion to
frequency.
Inductance is measured in Henrys: A
voltage of 1 Volt across an inductance
of 1 Henry produces a current change of 1 Ampere per second
through the inductor.
A current change of 1 Ampere per second through an ideal 1 Henry
inductor generates a constant 1 Volt across the inductor.

If we know the inductance L in Henrys and the frequency
f in Hertz, the inductive
reactance X_L in Ohms is:

X_L = 2 Pi f L
Pi = 3.14159...

Separate inductors in
series are additive.
However, turns on the same core increase inductance as the
square of the total turns.
Two separate inductors in
parallel have a total inductance
which is the product of the inductances divided by their sum.
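
Minimal Python helpers for these formulas (the example values are
illustrative):

    import math

    def inductive_reactance(f_hz, henries):
        return 2 * math.pi * f_hz * henries   # X_L = 2 Pi f L

    def parallel_inductance(l1, l2):
        return l1 * l2 / (l1 + l2)            # separate, uncoupled coils only

    print(inductive_reactance(10e6, 10e-6))   # 10 uH at 10 MHz: about 628 Ohms
    print(parallel_inductance(10e-6, 10e-6))  # two 10 uH in parallel: 5 uH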

An inductor is typically a coil or multiple turns of
conductor
wound on a magnetic or ferrous core, or even just
a few turns of wire in air.
Even a short, straight wire has inductance.
Current in the conductor creates a
magnetic field, thus "storing" energy.
When power is removed, the magnetic field collapses to maintain the
current flow; this can produce high voltages, as in automobile spark
coils.

The simple physical
model of a
component which is a simple inductance
and nothing else works well at low
frequencies and moderate
impedances.
But at RF frequencies and modern digital rates, there is no
"pure" inductance.
Instead, each inductor has a series resistance and parallel
capacitance that may well affect the larger circuit.

In mathematics, a
proof which does depend upon the
meanings of the terms, and thus is difficult or impossible to
verify by logic machine.
Informal proofs depend upon specific interpretations for their
terms, and so depend upon being interpreted in a particular context;
outside of that context the proof will not apply.
Specifically defining the appropriate context is an important part
of developing informal proofs.
As opposed to a
formal proof.

Many proofs are informal. Because they cannot be mechanically
verified, these proofs may have unseen problems, and often do
develop and change over time. See:
Method of Proof and Refutations.

One-to-one. A
mapping f: X -> Y where no two
values x in X produce the same result
f(x) in Y.
A one-to-one mapping is invertible on the results f(x)
actually produced, but unless the mapping is also onto,
there may not be a full inverse mapping g: Y -> X.
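
A small Python check of one-to-one behavior over a finite domain
(a sketch):

    def is_injective(f, domain):
        seen = set()
        for x in domain:
            y = f(x)
            if y in seen:          # two inputs produced the same result
                return False
            seen.add(y)
        return True

    print(is_injective(lambda x: (3 * x) % 8, range(8)))  # True: 3 is invertible mod 8
    print(is_injective(lambda x: (2 * x) % 8, range(8)))  # False: 2x collides mod 8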

A material in which electron flow is difficult or impossible.
Classically air or vacuum, or wood, paper, glass, ceramic, plastic,
etc. As opposed to a
conductor and
semiconductor.

As a
rule of thumb, a cubic centimeter (cc)
of a solid has about 10^24 or 1E24 atoms.
A good insulator like
quartz has only about
10 free electrons per cc, which implies that only
about one atom in 10^23 (1E23) has a broken bond
(at room temperature and modest
voltage).
This gives a massive
resistance to
current flow of about
10^18 (1E18) ohms across a centimeter cube.

1. An objective of
cryptography.
The idea that information is what it appears to be: uncorrupted
(unchanged) information from the source.
2. Adherence to a code of (typically) moral or intellectual
values. Also see:
data fabrication and
data falsification.

Intellectual Property

The idea that the creation of intellectual objects (such as
paintings, writings, and inventions) is sufficiently worthwhile
to be given legal protection against unrestricted replication.
Writers and inventors can thus profit from their own work, although
only if someone wishes to buy that work:
Legal protection is not a direct financial grant.
See the United States Patent and Trademark Office (PTO)
(http://www.uspto.gov/),
and the United States Copyright Office (in the Library of Congress)
(http://lcweb.loc.gov/copyright/)
sites.

In the United States, the basis of intellectual property law is
the Constitution, in Article 1, Section 8 (Powers of Congress):

"Congress shall have power . . . To promote the Progress of Science
and useful Arts, by securing for limited Times to Authors and Inventors
the exclusive Right to their respective Writings and Discoveries."

Intellectual property generally includes:

Trade Secrecy is the
right to keep formulas or inventions secret, and nevertheless profit
from the resulting products or use.
In the U.S., this is generally state law.

Trademarks typically are
symbols used on marketed goods to identify a particular maker.
In the U.S., an issue of federal law.

Copyright is a long-term grant
by which the owner of a creative work can recover damages and
penalties when others reproduce that work.
Note that copyright protects written works as a particular
fixed expression of an idea, but not the idea itself.
In the U.S., this is federal law.

Patent is a much shorter-term
grant by which the owner of an invention can recover damages and
penalties when others make, sell, or use
unlicensed copies of the invention.
A patent protects the essential functioning of an invention, and
thus can protect the idea itself, in many different implementations.
Because of international "harmonization" efforts, U.S. patent law
generally does correspond to the patent laws of many other
countries, even with respect to
software.
Of course, U.S. Patents apply only in the U.S.

Types of patent include:

Design Patents, which protect the ornamental physical
shape of an object, but do not protect functioning or
operation; and

Utility Patents, the conventional patent we
think of in a technical context, which do protect the
functioning of something which may be realized in
many different forms, all of which would be protected.
Utility patents require a complex and detailed application
which must be argued, corrected and approved, plus application
fees, issue fees and periodic maintenance fees.

Intermediate Block

In the context of a
layered block cipher, the data values
produced by one layer then used by the next.

In some realizations, an intermediate
block might be wired connections between layer
hardware. In the context of a general
purpose
computer, an intermediate block might
represent the movement of data between operations, or perhaps
transient storage in the original block.

1. In
electronics, a device which converts
low-voltage DC battery power into high-voltage
AC power.
2. The
logic function which performs the
NOT operation; a logic "gate" which outputs
the complement of the
binary value on the input.

A polynomial only evenly divisible
by itself and 1. The polynomial analogy to
integer primes. Often used to generate a
residue class
field for polynomial operations.

A polynomial form of the ever-popular
"Sieve of Eratosthenes"
can be used to build a table of irreducibles through
degree 16.
That table can then be used to check any potential irreducible
through degree 32. While slow, this can be a simple, clear
validation of other techniques.
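
A minimal Python sketch of that sieve, representing each GF(2)
polynomial as an integer bit mask; as the entry suggests, it is
slow but simple and clear (shown here only through degree 3):

    def gf2_mul(a, b):
        # carryless (polynomial) multiplication over GF(2)
        r = 0
        while b:
            if b & 1:
                r ^= a
            a <<= 1
            b >>= 1
        return r

    def irreducibles_through_degree(n):
        # sieve: mark every product of two nonconstant polynomials
        limit = 1 << (n + 1)          # all polynomials of degree <= n
        composite = set()
        for a in range(2, limit):
            for b in range(a, limit):
                p = gf2_mul(a, b)
                if p < limit:
                    composite.add(p)
        return [p for p in range(2, limit) if p not in composite]

    print(irreducibles_through_degree(3))  # [2, 3, 7, 11, 13]
    # i.e., x, x+1, x^2+x+1, x^3+x+1, x^3+x^2+1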

While it is often said that IV values need only be random-like or
unpredictable, and need not be
confidential, in the case of
CBC mode, that advice can lead to
man-in-the-middle attacks on
the first plaintext block.
If a MITM opponent knows the usual content of the first block, they
can change the IV to manipulate that block (and only that block) to
deliver a different address, or different dollar amounts, or different
commands, or whatever.
And while the conventional advice is to use a
MAC at a higher level
to detect changed plaintext, that is not always desirable or
properly executed.
But the CBC first-block problem is easily solved at the CBC level
simply by enciphering the IV and otherwise keeping it confidential,
and that can be reasonable even when a MAC will be used later.
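
To see why, recall that in CBC decryption the first plaintext block
is P1 = D(C1) XOR IV, so flipping a bit of a public IV flips exactly
that bit of P1. A toy Python sketch (the "cipher" here is just a
fixed XOR pad, purely for illustration; any block cipher behaves
the same way in CBC):

    import hashlib

    BLOCK = 8

    def toy_block_cipher(key, block):      # stand-in permutation, NOT a real cipher
        pad = hashlib.sha256(key).digest()[:BLOCK]
        return bytes(x ^ y for x, y in zip(block, pad))
    toy_block_decipher = toy_block_cipher  # an XOR pad inverts itself

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    key, iv = b"shared key", bytes(BLOCK)
    p1 = b"PAY $100"
    c1 = toy_block_cipher(key, xor(p1, iv))           # CBC: C1 = E(P1 XOR IV)

    # a man-in-the-middle who knows the usual plaintext edits only the IV:
    evil_iv = xor(iv, xor(b"PAY $100", b"PAY $900"))
    print(xor(toy_block_decipher(key, c1), evil_iv))  # b'PAY $900'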

Sometimes, iterative or repeated ciphering under different IV
values can provide sufficient added keying to perform the
message key function (e.g., the
"iterative stream cipher" in
a cipher taxonomy).

Variations in the
period
of a repetitive signal.
These period variations also can be seen as
frequency
or
phase
variations.

Often discussed with respect to
oscillator signals.
Oscillator jitter is commonly due to the small amounts of analog
noise
inherent in the physics of
electronic circuitry, which thus affects the
analog-to-digital
conversion which indicates the start of each new period.
This is unpredictable variation, but generally
very tiny, bipolar around some mean frequency, and varies on a
cycle-by-cycle basis. It cannot be accumulated over many cycles
for easier sensing.

A different form of jitter occurs when a digital system uses
two or more independent oscillators or
clocks
which are not
synchronized.
In this case, one signal may slide
early or late with respect to the other, until an entire
cycle is lost or skipped.
But all this will be largely
deterministic,
based on the frequencies and phases of the different clocks.
To a large extent, this is something like two brass gears
rolling together, with a particular tooth on the smaller gear
appearing a predictable number of teeth later on the larger
gear.

The name "jitterizer" was established in section 5.5 of my 1991
Cryptologia article: "The Efficient Generation of
Cryptographic Confusion Sequences"
(locally, or @:
http://www.ciphersbyritter.com/ARTS/CRNG2ART.HTM#Sect5.5)
and is taken from the use of
an oscilloscope on digital circuits, where a signal which is not
"in sync" is said to
jitter.
Mechanisms designed to restore
synchronization are called
"synchronizers," so mechanisms designed to cause jitter
can legitimately be called "jitterizers."

"The system should be practically, if not theoretically, unbreakable."
(Unfortunately, to a large extent forgotten or coarsely ignored
by conventional cryptography is the truth that there are no
actual, implemented, realized systems which are "theoretically
unbreakable" (see, for example,
one-time pad,
proof,
strength and
cryptanalysis).
And almost universally ignored is the fact that cryptography has
no test or measure to show or even testify that a cipher is
"practically unbreakable" (see, for example,
cryptanalysis):
Just because our guys cannot break it does not imply that the
opponents are similarly limited.
But there does seem little point in using a known
breakable cipher, so we make do as best we can.)

"Compromise of the system details should not inconvenience
the correspondents."
(This requirement is often cited as a basis for stating that the
security of a cryptosystem must depend only on the
key, and not on
the secrecy of any other part of the system.
That is fine as far as it goes, but in systems which select among
different
ciphers using a key, while the
general concept of selection would of course be known to an
opponent, the
actual cipher selected by the key would not.

Systems which select among an ever-increasing number of ciphers
can even make it difficult for an opponent to know the full set of
possible ciphers.
For the opponents, being forced to find, obtain and analyze a
continuing flow of new secret ciphers is vastly more expensive
than simply trying another key value in a known cipher.
Forcing the opponent to pay (in effort) to acquire each
of many cipher designs is not a bad idea.
While having many possible ciphers does not guarantee strength,
it should increase the cost of attacks and thus potentially
change the balance of power between user and attacker.

Kerckhoffs second requirement is also understood to discount
secret ciphers, as in
security through obscurity.
We of course want to use only ciphers we can continue to use
securely even when the cipher has been fully exposed.
But we certainly can use ciphers that start out secret,
even if we understand that eventually they will become exposed.

Note that the issue of secret ciphers is not stated directly by
Kerckhoffs, but is instead extrapolated from what he wrote.
What Kerckhoffs really says is that cipher exposure should
not cause "inconvenience."
But to the extent that "inconvenience" is an issue, various other
ramifications appear that the crypto texts studiously ignore:

When we have only one cipher, any newly-found weakness will
require system-wide upgrades and be a major "inconvenience" for
every user.
We can thus take Kerckhoffs second requirement as demanding the
ability to easily replace current ciphers with new ones.
Then if weakness is found in some particular cipher, we simply
use something else.
The same cryptography which loudly proclaims Kerckhoffs requirements
as a basis for modern design deliberately ignores the "inconvenience"
of cipher failure.

Each new cipher should have as much
cryptanalysis as we can
afford. But if we could really
trust even a heavily-cryptanalyzed
standard cipher, we would never
need to change ciphers in the first place.
It is sadly "inconvenient" that no such trust is possible, either
in new ciphers, or in old ones.
Any cipher can have some sort of hidden weakness we have not
yet found.
To combat unknown weakness we can use
multiple encryption with
three ciphers in sequence.
A good understanding of the Kerckhoffs requirements would seem to
demand something beyond current cryptographic practice.

Cryptographic orthodoxy has all of us use a single cipher
for years on end. Naturally, our
opponents will know which cipher we use,
and try to break it.
Opponents can well afford expensive and lengthy analysis, because
success means they can expose data from anyone and everyone, for
perhaps the next decade.
Since we cannot know whether they have succeeded, our continuing
use of the same cipher is a terrible
risk.
The risk is that of a
single point of failure,
a single cipher which, if broken, exposes everything.
A less "inconvenient" alternative is to use a system of multiple
ciphers, as proposed in
Shannon's 1949
Algebra of Secrecy Systems.
In a modern computer implementation we might dynamically select
a cipher for use from among a continuously increasing set of
ciphers.
That would distribute protected information from different users
and different times under a wide variety of different cipherings,
which would reduce the benefit opponents would get from breaking
any particular one.
Distributing new ciphers is not an issue:
To the extent that we can securely transfer a new key, we can also
securely transfer the name of the new cipher, or even the actual
code.)
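
As a rough illustration only, here is a minimal Python sketch of
selecting a cipher from a shared registry using the transferred key
material; the registry names and placeholder transforms are
hypothetical, not part of any real system.

  # Minimal sketch (Python): pick one cipher from a registry using part of
  # the shared key material.  The names and placeholder transforms here are
  # hypothetical, not a real cipher suite.
  import hashlib

  CIPHERS = {                              # an ever-growing registry
      "cipher_a": lambda key, data: data,  # placeholder transforms only
      "cipher_b": lambda key, data: data,
      "cipher_c": lambda key, data: data,
  }

  def select_cipher(key_material: bytes):
      names = sorted(CIPHERS)              # both ends must hold the same set
      digest = hashlib.sha256(b"select" + key_material).digest()
      return names[int.from_bytes(digest, "big") % len(names)]

  # Both ends compute the same selection from the same key material,
  # so no cipher identifier need travel with the message.
  print(select_cipher(b"example shared key"))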

"The
key should be rememberable without notes and
easily changed."
(This is still an issue.
Hashing allows us to use long language
phrases, but the best approach may someday be to have both a
hardware key card and a key
phrase.)

"The cryptogram should be transmissible by telegraph."
(This is not very important nowadays, since even
binary ciphertext can be converted into
ASCII or
base-64 for transmission
if necessary.)

"The
encryption apparatus should be
portable and operable by a single person."
(Software encryption approaches this
ideal.)

"The system should be easy, requiring neither the knowledge
of a long list of rules nor mental strain."
(Software
encryption has the potential to approach this, but
often fails to do so.
We might think of the absolute requirement for
certifying public keys, which is still
often left up to the user, and thus often does not occur.)

The general concept of protecting things with a "lock," thus
making those things available only if one has the correct "key."
In a
cipher, the ability to select a particular
one of many possible transformations between a
plaintext message and the corresponding
ciphertext.
By supporting a vast number of different key possibilities (a large
keyspace), we hope
to make it impossible for someone to decipher the message by trying
every key in a
brute force attack (but see
key problems).

In
cryptography we have various kinds
of keys, including a User
Key (the key which a user actually remembers), which may be the
same as an Alias Key (the key for an
alias file which relates
correspondent names with their individual keys). We may also
have an Individual Key (the key actually used for a particular
correspondent); a
Message Key (normally a random value
which differs for each and every message); a
Running Key (the
confusion sequence in a
stream cipher, normally produced by a
random number generator);
and perhaps other forms of key as well (also see
key management).

Ideally, a key will be an arbitrary equiprobable selection
among a huge number of possibilities (also see
balance).
This is the fundamental strength of cryptography, the
"needle in a haystack" of false possibilities.
But if a key is in some way not a
random selection from a
uniform distribution, but is
instead
biased, the most-likely keys can be examined
first, thus reducing the complexity of the search and the effective
keyspace.

In most cases, a key will exhibit
diffusion across the message; that is,
changing even one bit of a key should change every bit in the message
with probability 0.5. A key with lesser diffusion may succumb to
some sort of
divide and conquer attack.

For practical security, it is not sufficient to simply have
a large keyspace, it is also necessary to use that keyspace.
Because changing keys can be difficult, there is often great temptation
to assign a single key and then use that key forever.
But if that key is exposed, not only are the current messages revealed,
but also all other messages both past and future, and this is true
for both public-key and secret-key ciphers.
Using only one key makes that key as valuable as all the information
it protects, and it is probably impossible to secure any key that
well, especially if it is frequently used.
Humans make mistakes, people change jobs and loyalties, and employees
can be intimidated, tempted or blackmailed.

It is important to change keys periodically, thus decisively ending
any previous exposure.
Secret-key systems can make this fairly invisible by keeping an
encrypted
alias file and automatically translating
a name or channel identifier into the current key.
This supports the easy use of many secret keys, and the invisible
update of those keys.
Then only the
keyphrase for the alias file itself
need be remembered, and that keyphrase could be changed at will.
Old keys could be removed from the alias file periodically to
reduce the amount of information exposed by that file, and
so minimize the consequences of exposure. See:
key reuse.

Even with support from the
cipher system, key file maintenance
is always serious and potentially complex.
The deciphered alias file itself should never appear either on-screen
or as a printout; it should be edited automatically or indirectly.
Thus, some security officer probably needs to be in charge of
updating and archiving the alias files.

Secret versus Public Keys

For some unknown reason, some authors claim that
secret key ciphers are essentially
impractical.
That of course flies in the face of many decades of extensive actual
use of secret-key ciphers in the military.
The claim seems to be that secret-key ciphers require vastly more
keys to be set up and managed than
public key ciphers.
There is an argument there, as we shall see, but it is not a
good one.

Public-key ciphering generally requires an entity beyond the
ciphering parties simply to function. This is a
public key infrastructure
(PKI), of which the main element is a
certification authority (CA).
The CA distributes authenticated keys for use, and so must be set up
and supported and protected as long as new keys or even mere key
authentication is needed.
But even with a CA, public-key misuse can lead to undetectable
man-in-the-middle attacks
(MITM), where the
opponent reads the messages without
having to
break any cipher at all.
With a secret key cipher at least the opponent has to actually break
a cipher, which is thought to be hard.

Users can always give secret information to others.
It is not the cryptography which allows exposure, since either end
can give a copy to someone else no matter how the original was sent.
The role of cryptography is limited to providing protected
communication (or storage), and cannot prevent exposure by either
user.
Sharing secret information with someone else inherently implies a
certain degree of
trust.

In a secret-key cipher, a user at each end has exactly the same
key.
If only two users have a key, and one user receives a message which
that key deciphers, the message can only have been sent by the other
user who has that key.
But there are various issues:

Can we afford for another user to have "our" key?
A secret key just represents a particular communications path.
A user can easily manage thousands of keys, and create new ones
at will.
Keys are not a scarce resource.

Can we trust another user to have "our" key?
The whole point of having a key is to get secret information to that
other user who we trust to have that information.
Nobody cares about the key per se, the issue is what the
key can expose.
And if we trust the other user not to expose the information itself,
what sense does it make to not trust them with the key whose main
effect is simply to expose that information?

Can we trust another user to not expose "our" key?
The main thing exposing a key can do is to expose the secret
messages.
But the other user already has the messages, and could
be sharing them, with or without sharing the key.
Sharing the key has not changed the problem.
There is nothing magic about a cipher system, whether secret key
or public key, that changes the fact that we trust the other end
with secret information, and they might not be worthy of that trust.

Suppose the other end is untrustworthy and shares the key
with someone who pretends to be that user.
Even without sharing the key, the other user can accept
outside messages and send them along, misleading us as to the source.
Even without keys and cryptography, another party can present
outside work as their own.
The key is not the problem.
The problem is whether we trust the other end, and if so, how much.
Cryptography cannot solve that problem.

The remaining issue seems to be that, if everybody has to talk
privately and independently of everyone else, then everybody needs
a different key for everybody else.
Public-key systems seem to make that easier by allowing the
senders to share the public key to a single user.
Ideally, fewer keys need be created.
But actually getting public keys is only "easy" after
a CA is established, funded, operating, and even then only if we
can live with trusting a CA.
Similar structures could be built to easily distribute secret keys,
and may be particularly appropriate in a distributed, hierarchical
business situation.

In practice, we do not need to talk to everybody
else, just a small subset.
And many interactions are with representatives from a common group,
each of whom has access to the same secret information anyway.
The group might need a different key for each "client" user, but
everyone in the group could use the key for that client.
Business groups might have to handle millions of keys, but
public-key technology does not solve the problem, because
somebody has to authenticate all those keys.
If we hide that function in the CA, then we have to fund and trust
the CA.

Suppose we need to communicate, privately and
independently, with n people:

With a secret-key cipher we need to transfer n
different keys.
This would happen in some inconvenient "out of band" way, such as
a personal meeting, phone call, fax, letter, delivery service, or
as a part of a business hierarchy.
Usually this depends upon having some sort of prior or business
relationship, with the usual introductions, meetings, and
communication paths.

With a public-key cipher we also need n keys
(actually, n pairs).
Since the keys are public, presumably they should be easy to get
from a list on some server somewhere (provided somebody sets up
and maintains such a server).
But we cannot simply trust a web page since it may be hacked,
nor can we trust email that may really be from an opponent.
Each key must be authenticated, or we risk vastly
easier MITM attacks exposing our messages.
Key authentication can be conveniently done "in band," but only
with a CA with whom we have already arranged trust.
Otherwise we have to call the other guy on the phone, see if it
is who we expect, and confirm a hash of the key, again,
"out of band."
And we cannot simply use a phone number or address from the web,
since that may call the opponent instead.
Absent a CA, the main advantage here seems to be that contents
of the phone call (or letter or fax) need not be secret.

One other possibility is a "web of trust."
In this structure, people attest to trusting someone who has a
key for someone else.
But even if we assume that could ever work to cryptographic levels of
certainty, validating a key is only half the issue.
The other part is whether we can reasonably hope to trust our secret
information to someone we do not know.
Public-key ciphers do not solve that problem.

An odd characteristic of public-key cryptography is that,
normally, if we encipher a message to a particular user, we cannot
then decipher that same message.
In a business context that may be an auditing problem, since the
business can only read and archive incoming messages.
Absent a special design, public-key cryptography may make it
impossible to document what offer was actually sent.

For those who would never consider using the short keys needed by
secret-key ciphers, note that public keys must be much longer than
secret keys of the same strength, because a valid public key must
have a very restricted form that most key values will not have.
In practice, a public-key cipher almost certainly will just set up
the keys for an internal secret-key cipher which actually protects
the data, so the final key size will be small anyway.

Public-key technology is a tool that offers certain advantages,
but those are not nearly as one-sided as people used to believe.
Secret key cipher systems were functioning well in practice long
before the invention of public-key technology.

Keys should be changed periodically.
But, in a corporate setting it is likely that the corporation
will want to be able to review old messages which were encrypted
under old keys.
That either requires archiving a plaintext version of each message,
or archiving the encrypted version, plus the key to decrypt it (see
key storage).

Obviously, key archives could be a sensitive, high-value target,
so that keeping keys and messages on different machines may be a
reasonable precaution.

A corporation may seek to limit the ability for users to create
their own new keys, so that corporate authorities can monitor all
business communications.
That of course implies that the corporation takes on the role of
creating and
distributing new keys, and
probably also maintains a
key archive as well as a message archive (see
key storage).

The problem of distributing
keys to both ends of a communication path,
especially in the case of
secret key ciphers, since
secret keys must be transported and held in absolute secrecy.
Also the problem of distributing vast numbers of keys, if each
user is given a separate key. Also see:
key reuse.

Although this problem is supposedly "solved" by the advent
of the
public key cipher, in fact, the
necessary public key validation is almost as difficult as the
original problem. Although public keys can be exposed,
they must represent who they claim to represent, or a "spoofer" or
man-in-the-middle can
operate undetected.

Nor does it make sense to give each individual a separate
secret key, when a related group of people would have access
to the same files anyway. Typically, a particular group has the
same secret key, which will of course be changed when any member
leaves. Typically, each individual would have a secret key for
each group with whom he or she associates.

For public key ciphers, the key to be loaded will be in
plaintext form and need not be deciphered.
Similarly, a public key database may be unencrypted, since all
the public keys are exposed anyway.
So adding a public key can be just as simple as adding any other
data.

For secret key ciphers, the key to be loaded will have been
transported encrypted under some other key.
And the user key database will be encrypted under the keyphrase for
that particular user.
Accordingly, a new key must first be decrypted and then encrypted
under the user keyphrase.
One problem here is that we want to minimize the amount of time
any secret key exists in plaintext form.
Of course keys will be in plaintext form during use, in which case
we decipher the key only in program memory, and then zero that
storage as soon as possible.

It seems desirable to avoid deciphering the entire key database
simply to insert a single new key.
One workable possibility is to add simple structure to the cipher
itself so that cipherings can be concatenated as ciphertext
(as long as they are ciphered under the same key).
Then the new key can be enciphered on its own, and simply
concatenated onto the key storage file.

The loss of the
key necessary to decrypt encrypted messages,
or needed to create new encrypted messages.

The more valuable the messages, the more serious the risk from
loss of the associated keys. Such loss might occur by equipment
failure, accident, or even deliberate user action.

In the business case, key loss can be mitigated by maintaining
corporate
key archives, and distributing key
files to users.
Without such archives (or some alternate way to recover), a single
user equipment failure could result in the loss of critical keys
and business documents, which would virtually guarantee the end of
encryption in that environment.

Limits on the ability of
cryptographic keys to solve security problems.
Such limits are suggested by the well-known problems we have with
the house keys we use everyday:

We can lose our keys.

We can forget which key is which.

We can give a key to the wrong person.

Somebody can steal a key.

Somebody can pick the lock.

Somebody can go through a window.

Somebody can break down the door.

Somebody can ask for entry, and unwisely be let in.

Somebody can get a warrant, then legally do whatever is
required.

Somebody can burn down the house, thus making everything
irrelevant.

Even absolutely perfect keys cannot solve all problems, nor can
they guarantee privacy. Indeed, when cryptography is used for
communications, generally at least two people know what is being
communicated. So either party could reveal a secret:

By accident.

To someone else.

Through third-party eavesdropping.

As revenge, for actions real or imagined.

For payment.

Under duress.

In testimony.

When it is substantially less costly to acquire the secret by
means other than a technical
attack on the
cipher, cryptography has
pretty much succeeded in doing what it can do.
Unfortunately, once an attack has been found and implemented as
a computer program, the incremental cost of applying that attack
may be quite small.

The ability of a
cipher system to re-use a
key without reduced
security.
One of the expected requirements for a cipher system.

Obviously, we may send many messages to a particular recipient.
For security reasons we do not want to use the same key for any
two messages, and we also do not want to manually establish that
many keys.
One approach is the two-stage
message key, where a random value is
used to encrypt the message, and only that random value is encrypted
by the main key. In
hybrid ciphers, a
public key component transports
the random message key.
Thus, the stored keys are used only to encrypt a relatively small
random value or
nonce, and so are very well hidden.
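
A minimal Python sketch of that two-stage structure, with SHA-256 used
as a toy keystream purely to stand in for the real ciphers a working
system would use:

  # Sketch (Python) of the two-stage message key structure.  The "cipher"
  # here is SHA-256 used as a toy keystream; it only stands in for the real
  # ciphers a working system would use.
  import os, hashlib

  def toy_encrypt(key: bytes, data: bytes) -> bytes:
      stream, counter = b"", 0
      while len(stream) < len(data):
          stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
          counter += 1
      return bytes(a ^ b for a, b in zip(data, stream))

  toy_decrypt = toy_encrypt                 # XOR keystream is its own inverse

  def send(main_key: bytes, plaintext: bytes) -> bytes:
      message_key = os.urandom(32)                 # random, per message
      body = toy_encrypt(message_key, plaintext)   # data under the message key
      header = toy_encrypt(main_key, message_key)  # only the random value is
      return header + body                         # under the stored main key

  def receive(main_key: bytes, wire: bytes) -> bytes:
      message_key = toy_decrypt(main_key, wire[:32])
      return toy_decrypt(message_key, wire[32:])

  assert receive(b"main", send(b"main", b"attack at dawn")) == b"attack at dawn"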

One way to create the random nonce would be to
bit-permute a vector of
half-1's and half-0's, by
shuffling twice. That way, even
if the cipher failed and the nonce was exposed, the keyed generator
creating the message keys would be protected (see
Dynamic Transposition) so
that the opponent will have to repeat the previous break, which
may demand both time and luck.
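
A rough sketch of the balanced-nonce idea, with Python's random module
standing in for the keyed cryptographic generator a real design would
require:

  # Rough sketch (Python): a balanced nonce built by shuffling a vector of
  # half ones and half zeros.  random.shuffle stands in for shuffling under
  # a keyed cryptographic generator, which a real design would require.
  import random

  def balanced_nonce(bits=128):
      vector = [0] * (bits // 2) + [1] * (bits // 2)
      random.shuffle(vector)        # first shuffle
      random.shuffle(vector)        # second shuffle, as suggested above
      return vector

  nonce = balanced_nonce()
  assert sum(nonce) == len(nonce) // 2      # always exactly half ones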

Typically, the modified
key used in each
round of an iterated
block cipher.
There is ample motive to make each key modification simple, so that
each ciphering round can occur quickly, but that has resulted in
strongly-related round keys which were targeted for
attack.

A stored
key is selected for use.
Normally, we could not do this by inspection, because each key is a
long random number, generally indistinguishable by humans from among
all other long random numbers.
In any case, we do not want to reveal it even to the user.
Accordingly, the
key storage system would typically
include a name or nickname or email address field to identify the
correct key.

Under my
alias file implementation, selecting the
right key for use is done by entering the alias nickname for the
desired person, contract, project or group.
That could be the email address for the appropriate channel; those
with multiple email addresses could have the same key listed under
multiple aliases.
When the email address is used as the alias, the desired email
address can be automatically found in the message header, and the
correct key automatically used.

Similarly, stored keys should have a start date, and multiple
keys for the same channel will be distinguished by that date.
By checking the date of the message to be decrypted, the key which
was correct as of that date could be automatically selected and used,
again making most key selection automatic, even for archived messages.

In practice, it is common to select the wrong key, and then the
message cannot be read. But if the message is just being accumulated
somewhere for later use, we may mistakenly discard the original
ciphertext before making sure we have the deciphered plaintext.
Accordingly, a required feature of a
cipher system is that using the wrong
key be detected and announced.
Presumably this will be done with some sort of
error detecting code such as a
CRC of the plaintext, although some cases may
demand a keyed
MAC.
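
A minimal sketch of the plaintext-CRC check, assuming the wrapped value
travels under whatever cipher the system uses; the function names are
illustrative only:

  # Sketch (Python): detect a wrong key by carrying a CRC of the plaintext
  # inside the encrypted message.  wrap() runs before encryption and
  # unwrap() after decryption; the cipher itself is not shown.
  import zlib

  def wrap(plaintext: bytes) -> bytes:
      return zlib.crc32(plaintext).to_bytes(4, "big") + plaintext

  def unwrap(deciphered: bytes) -> bytes:
      crc, plaintext = deciphered[:4], deciphered[4:]
      if zlib.crc32(plaintext).to_bytes(4, "big") != crc:
          raise ValueError("wrong key or damaged ciphertext")
      return plaintext

  assert unwrap(wrap(b"hello")) == b"hello"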

For public keys, the key database can be unencrypted, and possibly
even part of a larger database system.

For secret keys, the database must be encrypted, and should be
as simple as possible for security reasons.
One possible form is what I call an
alias file.
Each key is given a textual alias, which then becomes an
efficient way to identify and use a particular long random key.
Typically, an alias would be a nickname for a person, project,
or work group, or an email address.
By allowing the user to specify a short name instead of a key,
long and random keys can be selected and used efficiently.

The "database" part of this could be as simple as an encrypted
list of entries, with each entry having a few simple textual fields
such as alias id, key value, and start date.
By ordering the entries by start date, the system could search from
the front of the list for the first entry matching the
alias field and having a start date before the current date.
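
A minimal sketch of that lookup, with illustrative field values; a real
alias file would of course be stored encrypted:

  # Sketch (Python) of the lookup just described.  Entries are ordered with
  # the newest start date first; we take the first entry whose alias matches
  # and whose start date is not in the future.  All values are illustrative.
  from datetime import date

  entries = [    # (alias, key, start date), newest first
      ("fred@example.com", b"key-2024", date(2024, 1, 1)),
      ("fred@example.com", b"key-2022", date(2022, 1, 1)),
      ("project-x",        b"key-px",   date(2023, 6, 1)),
  ]

  def current_key(alias, today=None):
      today = today or date.today()
      for name, key, start in entries:          # scan from the front
          if name == alias and start <= today:
              return key
      raise KeyError("no usable key for alias " + alias)

  assert current_key("fred@example.com", date(2024, 6, 1)) == b"key-2024"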

Although often honored in the breach, periodic key changes are
a security requirement.
We do of course assume that the cipher system will encrypt the data
for each message with a random key in any case.
But if everything always starts with the same key, then anyone
getting that key will have access to everything, which makes that
key an increasingly valuable target.
To compartmentalize, and limit that
risk, we must change keys
periodically, even public and private keys.

A start date field supports periodic key change in a way largely
invisible to the user.
A routine corporate key file update would add new keys at the start
of the file, with future start dates, thus not affecting the use of
the current keys at all.
Then, when the new date arrives, the new keys would be selected and
used automatically, making the key update process largely invisible
to the user and far more practical than usually thought possible.

In contrast, an end date seems much less useful, not the least
because it involves a prediction of the future as to when the key
may become a problem.
Moreover, presumably the intended response to such a date is to
stop key use, but if a new key has not been distributed, that also
stops business operation, which is just not smart.
So a start date is needed, but an end date is not.

Corporate key policies would produce new key files for users from
time to time, with future keys added and unused keys stripped out.
That also would be an appropriate time for the user to implement a
new passphrase.

Public key transport may at first seem fairly easy.
Public keys do not need to be encrypted for transport, and anyone
may see them.
However, they absolutely must be certified to be the exact
same key the sender sent.
For if someone replaces the sent key with another, subsequent
messages can be exposed without breaking any cipher (see
man in the middle attack).
The usual solution suggested is a large, complex, and
expensive certification infrastructure that is often ignored.
(See
PKI.)

Secret key transport first involves encrypting the secret key
under a one-time keyphrase or random
nonce.
The resulting message is then hand-carried (on a floppy or CDR) or
otherwise sent (perhaps by overnight mail or package courier) to
the other end.
Then the keyphrase is sent by a different channel (perhaps by phone
or fax) to decrypt the key.
Of course, if the encrypted message is intercepted and copied, and
then the second channel intercepted as well, the secret key would
be exposed, which is why hand-delivery is best.
Fortunately, most people who are working together do meet occasionally
and then keys can be exchanged.
As soon as the transported secret key is decrypted it should
immediately be re-encrypted for secure storage.
That can and should be done without ever exposing the key itself.

Two
substitution tables of the same
size with the same values can differ only in the ordering or
permutation of the values in the tables.
A huge
keying potential exists: The typical "n-bit-wide"
substitution table has 2^n elements, and (2^n)!
("two to the nth factorial") different permutations or key
possibilities. A single 8-bit substitution table has a
keyspace of about 1684 bits.

A substitution table is keyed by creating a particular
ordering from each different key. This can be accomplished by
shuffling the table under the control of a
random number generator
which is initialized from the key.
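
A rough sketch of keying a table by shuffling, and of the keyspace
computation, with Python's random.Random standing in for a cryptographic
generator:

  # Rough sketch (Python): key an 8-bit substitution table by shuffling it
  # under a generator seeded from the key, and compute the keyspace in bits.
  # random.Random stands in for a cryptographic generator.
  import math, random

  def keyed_table(key: bytes):
      table = list(range(256))
      random.Random(key).shuffle(table)   # the key selects one of 256! orders
      return table

  table = keyed_table(b"example key")
  print(math.log2(math.factorial(256)))   # keyspace: about 1684 bits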

A
key, in the form of a human-friendly
language phrase. In my designs, I usually
hash a keyphrase into a key for an
alias file, which holds the starting
keys for each particular channel or target.
Note that the hash can be a simple
CRC, since both the keyphrase
and the hash result have essentially the same security exposure.
Also see:
cipher system,
password and
user authentication.
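
A minimal sketch of distilling a keyphrase into fixed-size key material;
SHA-256 is used here only because it is readily available, not because a
cryptographic hash is required in this role:

  # Minimal sketch (Python): distill a keyphrase into fixed-size key
  # material.  SHA-256 is used only because it is readily available; as
  # noted above, a cryptographic hash is not strictly required in this role.
  import hashlib

  def keyphrase_to_key(phrase: str) -> bytes:
      return hashlib.sha256(phrase.encode("utf-8")).digest()

  alias_file_key = keyphrase_to_key("correct horse battery staple")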

The number of distinct
key-selected transformations supported by a
particular
cipher.
Normally described in terms of
bits, as in the number of bits needed to count
every distinct key. This is also the amount of
state required to support a state value for
each key. The keyspace in
bits is the
log2 (the base-2 logarithm) of the
number of different keys, provided that all keys are equally
probable.

Although brute force is not the only possible
attack, it is the
one attack which will always exist (except for ciphers with
Perfect Secrecy).
Therefore, the ability to resist a brute force attack is normally the
design strength of a
cipher. All other attacks should be made even more expensive.
To make a brute force attack expensive, a cipher simply needs a
keyspace large enough to resist such an attack. Of course, a
brute force attack may use new computational technologies such as
DNA or "molecular computation." Currently, 120 bits is large
enough to prevent even unimaginably large uses of such new
technology.

It is probably just as easy to build efficient ciphers which use
huge keys as it is to build ciphers which use small keys, and the
cost of storing huge keys is probably trivial. Thus, large keys
may be useful when this leads to a better cipher design, perhaps
with less key processing. Such keys, however, cannot be considered
better at resisting a brute force attack than a 120-bit key, since
120 bits is already sufficient.

On a PC-style computer, a processor internal to the keyboard
maintains the state of all keys (up or down).
The keyboard processor also continuously scans through each
possible key (perhaps every 2 msec) and reports to the PC any key
which has just gone up or down.
Keys are reported by position or "scan code;" the keyboard
processor does not use
ASCII.

Measuring keystroke timings is a common way of collecting
supposedly unknowable information for a
really random generator.
However, even though a PC computer can measure events very closely,
the keyboard scan process inherently quantizes keystrokes at a
far more coarse resolution.
And, when measuring software-detected events, various PC system
things like hardware interrupts and OS task changes can provide
substantial variable latency which is nevertheless
deterministic.

One model of a
cipher
is a key-selected mathematical
function or transformation between
plaintext and ciphertext.
To an
opponent
this function is unknown, and one of the best ways to address an
unknown function is to look at both the input and output.
More than that, even though an opponent has ciphertext,
something must be known about the plaintext or an
opponent has no way to measure attack success.

Public key ciphers allow opponents
to create known-plaintext.
Thus, public key ciphers force us to assume they will resist
known-plaintext attacks, even though that may or may not be correct.
However, most so-called "public key" ciphers do not protect actual
data with a public key system, but are in fact
hybrid ciphers, where the public key system
is used only to transfer a key for a conventional
secret key cipher.

The
cryptanalytic literature on
secret-key ciphers
is rife with attacks which depend upon known-plaintext, and
secret-key ciphers are still used for almost all data ciphering.
Virtually all secret-key ciphers are best attacked with
known-plaintext to the point that describing cipher weakness almost
universally means some number of cipherings and some amount of
known-plaintext. For example,
Linear Cryptanalysis normally
requires known-plaintext, while
Differential Cryptanalysis
generally requires the even more restrictive
defined plaintext condition.

If modern cipher designers do not talk much about known-plaintext,
that may be because designers think that:

not much can be done about it;

since typically only 1 or 2 blocks are sufficient to define a
particular key for a conventional block cipher, that tiny amount
of exposure cannot be prevented;

the current best attacks require huge amounts of known-plaintext,
amounts which in practice are unlikely to be available anyway;

since modern ciphers are designed with the goal of having
no weakness under known-plaintext conditions, designers may assume
without
proof that the goal has been achieved.

On the other hand, some aspect of the plaintext must
be known, or it will be impossible to know when success has been
achieved.
Consequently, it is hard to imagine a situation in which actual
known-plaintext would not benefit cryptanalysis.
Since huge amounts of known-plaintext are needed for current
attacks, that much exposure may be preventable at the
cipher system level.
And, since attacks only get better over time it would seem only
prudent to hide as much known-plaintext as possible.

It is surprisingly reasonable that an opponent might have a
modest amount of known plaintext and the related ciphertext:
That might be the return address on a letter, a known report or
newspaper account, or even just some suspected words.
Sometimes a cryptosystem will carry unauthorized messages such as
birthday greetings which are then discarded in ordinary trash,
due to their apparently innocuous content, thus potentially
providing a small known-plaintext example.
(It is harder to see how really huge amounts of known-plaintext
might escape, but one possibility is described in
security through obscurity.)

Unless the opponents know something about the plaintext,
they will be unable to distinguish the correct deciphering even
when it occurs.
Hiding all structure in the plaintext thus has the potential to
protect messages against even
brute force attack;
this is essentially a form of
Shannon-style
Ideal Secrecy.

One approach to making plaintext "unknown" would be to
pre-cipher the plaintext, thus hopefully producing an unstructured
ciphertext which would prevent success when attacking the second
cipher.
In fact, each cipher would protect the other.
Successful attacks would then have to step through both
ciphering keys instead of just one, which should be exponentially
more difficult.
This is one reason for using
multiple encryption.
Also see the known plaintext discussion
"Known-Plaintext and Compression"
(locally, or @:
http://www.ciphersbyritter.com/NEWS6/KNOWNPLN.HTM).

A known plaintext attack typically needs both the plaintext
value sent to the internal cipher and the resulting ciphertext.
Typically, a large amount of plaintext is needed under a single
key. A
cipher system which prevents any
one of the necessary conditions also stops the corresponding
attacks.

Known plaintext attacks can be opposed by:

minimizing the exposure of exact message plaintext (although,
realistically, only so much can be done),

Unfortunately, different abilities to sense and use deep
structure or
correlations in a sequence can make major
differences in the complexity value.
In general, we cannot know if we have found the smallest program.

n random samples are collected and
arranged in numerical order in array X as
x[0]..x[n-1].

S(x[j]) is the fraction of the n
observations which are less than or equal to x[j];
in the ordered array this is just ((j+1)/n).

F(x) is the reference cumulative distribution,
the probability that a random value will be less than or equal to
x. Here we want F(x[j]), the fraction
of the distribution to the left of x[j] which is a
value from the array.

There are actually at least three different K-S statistics, and
two different distributions:

What's the Difference?

This "side" terminology is standard but unfortunate, because
every distribution has two sides which we call "tails."
And every distribution can be interpreted in
one-tailed or
two-tailed ways.

Here, the "side" terminology refers to different statistic
computations: the "highest" difference, the "lowest" difference,
and the "greatest" difference between two distributions.
Thus, "side" refers not to the tails of a distribution,
but to values being either above or below the
reference.
If we knew that only one direction was news, we could use the
appropriate "one-sided" statistic, and still interpret the results
in a "two-tailed" way.

For example, if we used the "highest" test, we could detect large
positive differences from the reference distribution (if the p-value
was near 1.0) and also detect a distribution which was unusually
close to the reference (if the p-value was near 0.0).
Obviously, the "highest" test would hide negative differences.

On the other hand, if we compute both the "highest"
and "lowest" statistics, we cover all the information
(and slightly more) that we could get from the "two-sided" or
"greatest" statistic.

We can base a hypothesis test of any statistic results on
critical values on either or both tails of the
null distribution,
depending on our concerns.
One problem with a "two-tailed"
interpretation is that we accumulate critical region on both ends
of the distribution, even though the ends are not equally important.

What Does the P-Value Mean?

Normally, the
p-value we get from a statistic comparing
distributions is the probability of that value occurring when both
distributions are the same.
Finding p-values near zero or one is odd.
Repeatedly finding p-values too close to zero shows that the
distributions are unreasonably similar.
Repeatedly finding p-values too close to one shows that the
distributions are different.
And we do not need critical-value trip-points to highlight this.

The Knuth Versions

Knuth II multiplies Dn+, Dn- and Dn* by
SQRT(n) and calls them Kn+, Kn- and Kn*,
so we might say that there are at least six different K-S
statistics.
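
A short sketch of these statistics for samples assumed to come from a
uniform distribution on [0,1), so that F(x) = x; any other reference
distribution simply changes F:

  # Sketch (Python) of the statistics named above, assuming a uniform
  # reference on [0,1) so that F(x) = x; another reference just changes F.
  import math, random

  def ks_statistics(samples):
      xs, n = sorted(samples), len(samples)
      F = lambda x: x                                           # reference CDF
      d_plus  = max((j + 1) / n - F(xs[j]) for j in range(n))   # "highest"
      d_minus = max(F(xs[j]) - j / n for j in range(n))         # "lowest"
      d_star  = max(d_plus, d_minus)                            # "greatest"
      k_plus  = math.sqrt(n) * d_plus                           # Knuth's Kn+
      return d_plus, d_minus, d_star, k_plus

  print(ks_statistics([random.random() for _ in range(1000)]))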

The
one-sided K-S distribution is easier
to compute precisely, especially for small n and across
a wide range (such as "quartile" 25, 50, and 75 percent values),
and may be preferred on that basis.
There is a modern evaluation for the
two-sided K-S distribution
which should be better than the old versions, but usually we do
not need to accept its limitations.
Often the experimenter can choose to use tests which are more
easily evaluated, and for K-S, that would be the "one-sided"
tests.

A delay; the time needed to perform an operation. Often a
hardware issue, but sometimes also
important in organizing system-level
software.
There is often a systems-level tradeoff between latency and
throughput (data rate).

Consider three cold slices of pizza: We might put all three on
a plate and heat them in a microwave in 3 minutes. That would be
3 minutes from thought to mouth, a 3-minute eating latency.
But if we put just one slice on a plate, we can heat that slice in
1 minute, for an eating latency of just 1 minute.
So we can be eating fully three times as soon, just by making the
appropriate choice, but then we can only eat one a minute.
System design is often about making
such choices.

Or consider a
mixing cipher, which typically needs
log n mixing sub-layers to mix n elements (i.e., n log n operations).
The latency is the delay from the time we start computing until we
get the result.
So if we double the number of elements, we also double the
necessary computation, plus another sub-layer.
Thus, in software, when we double the block size, the latency
increases somewhat, and the data rate decreases somewhat (but see
huge block cipher advantages).

In hardware, things are much different: Even though the larger
computation is still needed, that can be provided in separate on-chip
hardware for each sub-layer.
Typically, each sub-layer may take a single clock cycle to perform
the computation.
So if we double the block size, we need another sub-layer, and do
gain one more clock cycle of latency before a particular block
pops out.
But the data rate is still a full block per cycle, and stays that,
no matter how wide the block may be.
In hardware, when we double the block size, we double the data rate,
giving large blocks a serious advantage.

In the past, hardware operation delay has largely been dominated
by the time taken for
gate switching transistors to turn on and off.
Currently, operation delay is more often dominated by the time it
takes to transport the
electrical
signals to and from gates on long, thin
conductors.

The effect of latency on throughput can often be reduced by
pipelining or partitioning the main
operation into many small sub-operations, and running each of
those in parallel, or at the same time.
As each operation finishes, that result is
latched and saved temporarily, pending the availability of the
next sub-operation hardware.
Thus, throughput is limited only by the longest sub-operation
instead of the overall computation.

A Latin square of
order n is an n by n array
containing symbols from some
alphabet of size n, arranged such
that each symbol appears exactly once in each row and exactly once
in each column.

2 0 1 3
1 3 0 2
0 2 3 1
3 1 2 0

Since each row contains the same symbols, every possible row
can be created by re-arranging or
permuting the n symbols into
n! possible rows.
At order 4 there are 24 possible rows.
A naive way to build a Latin square would be to choose each row of
the square from among the possible rows, and that way we can build
(n!)^n different squares.
At order 4, we can build 331,776 squares, but only 576 of those
(about 0.17 percent) have the Latin square form, and things get
exponentially worse at higher orders.
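
A minimal sketch that checks the defining property directly, assuming
the symbols are 0..n-1 as in the example square above:

  # Minimal sketch (Python): check the defining property directly, assuming
  # the symbols are 0..n-1 as in the example square above.
  def is_latin_square(square):
      n = len(square)
      symbols = set(range(n))
      return (all(set(row) == symbols for row in square) and
              all(set(col) == symbols for col in zip(*square)))

  example = [[2, 0, 1, 3],
             [1, 3, 0, 2],
             [0, 2, 3, 1],
             [3, 1, 2, 0]]
  print(is_latin_square(example))     # True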

Cyclic Squares

The following square is cyclic, in the sense that
each row below the top is a rotated version of the row above it:

0 1 2 3
1 2 3 0
2 3 0 1
3 0 1 2

This is a common way to produce Latin squares, but is generally
undesirable for cryptography, since the resulting squares are
few and predictable.

Other Constructions

It is at least as easy to make a more general Latin square as
it is to construct an algebraic
group: The operation table of any finite
group is a Latin square, as is the addition table of a
finite field.
Conversely, while some Latin squares do represent
associative operations and can form a
group, most Latin squares do not.
At order 4 there are 576 Latin squares, but only 16 are associative
(about 2.8 percent).
So non-associative (and thus non-group) squares dominate heavily,
and may be somewhat more desirable for cryptography anyway.

Standard Form

A Latin square is said to be reduced or normalized
or in standard form when the symbols are in lexicographic
order across the top row and down the leftmost column.
Any Latin square of any order reduces, through a single re-arrangement
of its rows and columns, into exactly one standard square.

A Latin square is reduced to standard form in two steps:
First, the columns are re-arranged so the top row is in order.
Since that places the first element of the leftmost column in
standard position, only the rows below the top row need
be re-arranged to put the leftmost column in order.
(Alternately, the rows can be re-arranged first and then the
columns to the right of the leftmost column; the result
is the same standard square.)

The 576 unique Latin squares of order 4 include exactly 4
squares in standard form:

Expanding Standard Latin Squares

A Latin square of order n can be
shuffled or expanded into, and so
can represent, n!(n - 1)! different
squares.
This is accomplished by permuting the n - 1 rows
below the top row in (n - 1)! ways, and then
permuting n columns in n! ways.
Clearly, this reverses the process that will reduce any of those
squares into the original standard square.
However, in practice, apparently all the columns can be permuted first
and the rows (less one) permuted next, or both at the same time, or
even all the rows with the columns less one, each case apparently
producing exactly the same set of permuted or expanded squares.
(We can also permute the n symbols in n! ways, but
this will produce no new squares.)

At order 4, by permuting 4 rows and 3 columns, each standard
square expands to 4! * 3! = 24 * 6 = 144 permuted
squares.
Instead permuting 4 columns and 3 rows of the same standard square
produces exactly the same set of permuted squares.
Exactly one of each permuted set of 144 is in standard form, and
that is the original standard square.
Each of the squares expanded from the 4 standard squares is unique,
and the accumulation of 4 * 144 = 576 permuted squares
includes every possible order 4 Latin square exactly once.
Thus we see the value of standard form, where 4 reduced squares
represent 576.
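
A short sketch that expands one of the four order-4 standard squares
(here the XOR table) by permuting the rows below the top row and then
all the columns, confirming the count of 144 distinct squares:

  # Sketch (Python): expand one order-4 standard square (here the XOR table)
  # by permuting the rows below the top row and then all columns, and count
  # the distinct results.
  from itertools import permutations

  def expansions(square):
      n, seen = len(square), set()
      for row_order in permutations(range(1, n)):       # (n-1)! row orders
          rows = [square[0]] + [square[i] for i in row_order]
          for col_order in permutations(range(n)):      # n! column orders
              seen.add(tuple(tuple(row[c] for c in col_order) for row in rows))
      return seen

  standard = [[0, 1, 2, 3],
              [1, 0, 3, 2],
              [2, 3, 0, 1],
              [3, 2, 1, 0]]
  print(len(expansions(standard)))    # 144 distinct squares at order 4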

A Latin square combiner can be seen as the generalization of the
exclusive-OR mixing concept from exactly two values (a
bit of either 0 or 1) to any number of different
values (e.g.,
bytes).
A Latin square combiner is inherently
balanced, because
for any particular value of one input, the other input can produce
any possible output value. A Latin square can be treated as an
array of
substitution tables, each of
which is invertible, and so can be reversed for use in a suitable
extractor. As usual
with cryptographic combiners (including
XOR), if we know the output and a specific
one of the inputs, we can extract the value of the other input.

For example, a tiny Latin square combiner might combine two
2-bit values each having the range zero to three (0..3). That
Latin square would contain four different symbols (here 0, 1, 2,
and 3), and thus be a square of
order 4:

2 0 1 3
1 3 0 2
0 2 3 1
3 1 2 0

With this square we can combine the values 0 and 2 by selecting
the top row (row 0) and the third column (column 2) and
returning the value 1.

When extracting, we will know a specific one (but only one)
of the two input values, and the result value. Suppose we know
that row 0 was selected during combining, and that the output
was 1: We can check for the value 1 in each column at row 0 and
find column 2, but this involves searching through all columns.
We can avoid this overhead by creating the row-inverse of the
original Latin square (the inverse of each row), in the
well-known way we would create the inverse of any invertible
substitution. For example, in row 0 of the original square,
selection 0 is the value 2, so, in the row-inverse square,
selection 2 should be the value 0, and so on:

1 2 0 3
2 0 3 1
0 3 1 2
3 1 2 0

Then, knowing we are in row 0, the value 1 is used to select
the second column, returning the unknown original value of 2.

A practical Latin square combiner might combine two bytes,
and thus be a square of order 256, with 65,536 byte entries. In
such a square, each 256-element column and each 256-element row
would contain each of the values from 0 through 255 exactly
once.
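
A minimal sketch of the order-4 combiner and extractor described above,
including construction of the row-inverse square:

  # Minimal sketch (Python) of the order-4 combiner and extractor above,
  # including construction of the row-inverse square.
  square = [[2, 0, 1, 3],
            [1, 3, 0, 2],
            [0, 2, 3, 1],
            [3, 1, 2, 0]]

  # In each row, the value v found at column c becomes value c at column v.
  row_inverse = []
  for row in square:
      inv = [0] * len(row)
      for c, v in enumerate(row):
          inv[v] = c
      row_inverse.append(inv)

  def combine(a, b):            # a selects the row, b the column
      return square[a][b]

  def extract(a, result):       # recover b, knowing a and the combined result
      return row_inverse[a][result]

  assert combine(0, 2) == 1     # the example worked in the text
  assert extract(0, 1) == 2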

1. The body of rules intended to limit the conduct of man.
2. In Science, the
rules of thumb consistent
with the current
scientific models.
It is extremely rare that these are actual,
proven limits on the conduct of reality.

In the context of
block cipher design, a layer is
a particular transformation or set of operations applied across the
block. In general, a layer is applied
once, and different layers have different transformations.
As opposed to
rounds, where a single transformation is
repeated in each round.

Layers can be
confusion layers (which simply change
the block value),
diffusion layers (which propagate
changes across the block in at least one direction) or both.
In some cases it is useful to do multiple operations as a
single layer to avoid the need for internal temporary storage
blocks.

The number of times a letter occurs in a body of text or
set of messages. Often of interest in classical
cryptanalysis.
Letter frequencies vary widely depending on the kind of writing used,
so there is no one right answer. One good source is:

At the top of the table are letters in a general rank ordering,
most-common at the left and least-common at the right.
Some of the variation possible in different sets of messages or
text is shown by the different ranks a given letter may have.
From this we might conclude that N, R, O, A, I should be treated
as a group, while S may (or may not) be unique enough to identify
specifically from the usage rank in a message.
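
A minimal sketch of collecting such frequencies from a body of text:

  # Minimal sketch (Python): collect letter frequencies from a body of text.
  from collections import Counter

  def letter_frequencies(text):
      letters = [c for c in text.upper() if c.isalpha()]
      counts = Counter(letters)
      return [(c, counts[c] / len(letters)) for c, _ in counts.most_common()]

  print(letter_frequencies("Attack at dawn"))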

In
argumentation:
1. A falsehood, knowingly told. Possibly
spin.
2. A falsehood, unknowingly told, when the teller had the
responsibility to investigate the topic and did not. As
distinct from an error based on sufficient research. (See
belief.)
3. A falsehood implied, or a false impression allowed to
exist by silence.

In scientific argumentation honesty is demanded. Lies cannot
further the cause of scientific insight and conclusion.
However, lies can waste the time of everyone involved, not just
for the time of the discussion, but potentially years of effort
by many people, based on false assumptions.

Personally, I take an accusation of lying very seriously.
Absent a public apology, my response is to end my interaction
with that person. That is what I do, and that is what I think
everyone should do.

Some people think that failing to defend against even the most
heinous assertion is a sign of weakness or even an admission of
guilt. But when a person is accused of lying, the accused
immediately knows whether the accusation is correct or not.
And if not, the accuser has just shown themselves as a
liar or a fool. There is no need to separate these possibilities.
In either case there is no reason to further dignify whatever points
they wish to present.

A participant in a discussion has no responsibility to respond,
no matter what
claim an opponent may make.
Instead, it is the responsibility of the claimant to present logical
arguments or
proof, as opposed to a mere possibility or
belief.
Simply making a claim then demanding that it be accepted if it cannot
be proven false is the
ad ignorantium fallacy.

Literally, "like a line" or "resembling a line."
1. The description of an
electronic circuit or
amplifier in which a plot of input
voltage x versus output voltage
y = f(x), over a limited
range,
is approximately a straight line,
y = ax.
In such a circuit, a known percentage change in the input produces
a similar percentage change in the output.
2. The description of an
electronic circuit or data transmission
system in which independent
signals do not interact, or interact only weakly.
In a linear system, the output result of one signal does not change
when other signals are added, nor are spurious signals created.
3. A very simple mathematical
function.
For example, the geometric
equation y = f(x) = ax + b.
4. A
linear equation.

Suppose we have a
cipher to
analyze which has some unknown internal
function: If that function is
random, with no known pattern between
input and output or between values, we may have to somehow traverse
every possible input value before we can understand the full function.

But if there is a simple pattern in the function values,
then we may only need a few values to predict the entire function.
And a linear function is just about the simplest possible pattern,
which makes it almost the
weakest possible function.

There are at least two different exploitable characteristics
of cryptographic linearity:

Mathematical manipulation:
Unknown values on one side of a linear function can be seen as
simply modified values on the other side, thus avoiding the
function entirely.
Or unknowns might be solved by simultaneous equations and other
techniques of linear algebra.
These techniques more or less require exact mathematical linearity.

Predictability:
If we are willing to accept some error, a supposedly complex
function might be modeled by a much simpler function which is
"often" correct.
Alternately, an unknown function might be exposed by a relatively
small amount of data, even if the function is only "approximately"
linear. (Also see
rule of thumb.)
Neither approach requires full mathematical linearity.

Technical Definitions of Linearity

A common technical definition is that a function f is linear when

f(ax + by) = af(x) + bf(y)

for any a, b, x, y in G.
To be linear, function f is thus usually limited to the
restrictive form:

f(x) = ax

that is, "multiplication" only, with no "additive" term.
Functions which include an additive term; that is, in the form:

f(x) = ax + b

are thus technically distinguished and called
affine.
Affine functions are virtually as weak as linear functions, and
it is very common to casually call them "linear."

Another definition of linearity is:

1) f(0) = 0
2) f(ax) = a * f(x)
3) f(a + b) = f(a) + f(b)

where (1) apparently distinguishes linear from affine.

It is also possible to talk about linearity with respect to
an "additive group." A
functionf: G -> H is linear in
groups G and H (which have addition as the
group operation) if:

f(x + y) = f(x) + f(y)

for any x,y in G.
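
A short sketch that tests the additive condition exhaustively for small
functions on the integers mod 16; the particular functions are
illustrative only:

  # Sketch (Python): test the additive condition exhaustively for small
  # functions on the integers mod 16; the functions are illustrative only.
  def is_additive(f, m=16):
      return all(f((x + y) % m) == (f(x) + f(y)) % m
                 for x in range(m) for y in range(m))

  print(is_additive(lambda x: (3 * x) % 16))       # True:  f(x) = ax
  print(is_additive(lambda x: (3 * x + 5) % 16))   # False: affine f(x) = ax + b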

Uniqueness and Implications of Linearity

There are multiple ways a relationship can be linear:
One way is to consider a, x, and b as
integers.
But the exact same bit-for-bit identical values also can be
considered
polynomial elements of
GF(2n).
Integer addition and multiplication are linear in the
integers, but when seen as
mod 2 operations, the exact same computation
producing the exact same results is not linear.
In this sense, linearity is
contextual.

Moreover, in cryptography, the issue may not be as much one of
strict mathematical linearity as it is the "distance" between a
function and some linear approximation (see
rule of thumb and
Boolean function nonlinearity).
Even if a function is not technically linear, it may well be
"close enough" to linear to be very weak in practice.
So even a mathematical proof that a function could not be
considered linear under any possible field would not really address
the problem of linear weakness.
A function can be very weak even if technically nonlinear.

True linear functions are used in ciphers because they are easy
and fast to compute, but they are also exceedingly weak.
Of course
XOR is linear and trivial, yet is used all the
time in arguably
strong ciphers;
linearity only implies weakness when an
attack can exploit that linearity.
Clearly, a conventional
block cipher design using linear
components must have nonlinear
components to provide strength, but linearity, when part of a
larger system, does not necessarily imply weakness.
In particular, see
Dynamic Transposition,
which ciphers by
permutation.
In general, there is a linear algebra of permutations, but that
seems to be not particularly useful when a different permutation
is used on every block, and when the particular permutation used
cannot be identified externally.

When the LC approximation equations include both plaintext and
ciphertext bits, they obviously require at least
known plaintext for evaluation,
with sufficient data to exploit the usually tiny bias.
LC typically also requires knowing the contents of the internal
S-boxes to establish LC
approximations.

Accordingly, most LC attacks are prevented by the simple use of
keyed S-boxes.

While these may seem like overkill for the simple purpose of
addressing Linear Cryptanalysis, if they are to be part
of the cipher or cipher system anyway, LC is pretty much out
of the picture without added cost or analysis.
Moreover, these are clear, understandable and believable ways
to address apparently unlimited anxieties about LC attacks.

In an n-element shift register (SR), if the last
element is connected to the first element, a set of n values
can circulate around the SR in n steps. But if the values in
two of the elements are combined by
exclusive-OR and that result connected
to the first element, it is possible to get an almost-perfect
maximal length sequence of
2^n - 1 steps. (The all-zeros state will produce another
all-zeros state, and so the system will "lock up" in a
degenerate cycle.)
Because there are only 2^n different states of
n binary values, every state value but one must occur exactly
once, which is a statistically-satisfying result. Moreover, the
values so produced are a perfect
permutation of the
counting numbers
(1..2^n - 1).

In the figure we have an LFSR of
degree 5, consisting of 5 storage
elements a[5]..a[1] and the
feedback computation a[0]=a[5]+a[3].
The stored values may be
bits and the operation (+) addition
mod 2. A
clock edge will simultaneously shift
all elements left, and load element a[1] with the feedback result as
it was before the clock changed the register. Each SR element is
just a time-delayed replica of the element before it, and here the
element subscript conveniently corresponds to the delay. We can
describe this logically:

Normally the time distinction is ignored, and we can write more
generally, for some feedback
polynomial C and
state polynomial A of
degree n:

        n
a[0] = SUM c[i]*a[i]
       i=1

The feedback polynomial shown here is 101001, a degree-5 poly
running from c[5]..c[0] which is also
irreducible. Since we have degree 5
which is a
Mersenne prime, C is also
primitive. So C produces a
maximal length sequence of exactly
31 steps, provided only that A is not initialized as zero. Whenever
C is irreducible, the reversed polynomial (here 100101) is also
irreducible, and will also produce a maximal length sequence.
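
A minimal sketch of this degree-5 LFSR, confirming that the
a[0] = a[5] + a[3] feedback steps through all 31 nonzero states:

  # Minimal sketch (Python) of the degree-5 LFSR above: elements a[5]..a[1],
  # feedback a[0] = a[5] + a[3] (mod 2), visiting all 31 nonzero states.
  def lfsr_states(start=(0, 0, 0, 0, 1)):     # (a[5], a[4], a[3], a[2], a[1])
      seen, a = [], list(start)
      while True:
          seen.append(tuple(a))
          feedback = (a[0] + a[2]) % 2        # a[5] + a[3]
          a = a[1:] + [feedback]              # shift left, load a[1]
          if tuple(a) == seen[0]:
              return seen

  print(len(lfsr_states()))                   # 31 steps before repeating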

LFSR's are often used to generate the
confusion sequence for
stream ciphers, but this is very
dangerous: LFSR's are inherently
linear and thus weak. Knowledge of the
feedback polynomial and only
n element values (from
known plaintext) is sufficient
to run the sequence backward or forward.
And knowledge of only 2n elements is sufficient to develop an
unknown feedback polynomial (see:
Berlekamp-Massey).
This means that LFSR's should not be used as stream ciphers without
in some way isolating the sequence from analysis. Also see
jitterizer and
additive RNG.

1.
electronic devices which realize
symbolic logic, such as
Boolean logic, the TRUE and FALSE values
used in
digital computation. Also see
logic function.
2. A branch of philosophy related to distinguishing between
correct and incorrect
reasoning.
The science of correct reasoning.
The basis of mathematics and arithmetic.

Since math is based on logic, it is not supposed to be possible
for math to support fuzzy or incorrect reasoning.
However, math does exactly that in practical cryptography.
(See some examples at
proof and
old wives' tale.)
The problem seems to be a false assumption that math is
cryptography, so whatever math proves must apply in practice.
But math only is cryptography in theoretical systems for
theoretical data; in real systems, the necessary assumptions
can almost never be guaranteed, which means the conclusions are no
longer proven.
It is logically invalid (and extremely dangerous) to imagine that
unproven conclusions can provide confidence in a real system.

In TTL-compatible devices,
a logic zero input must be 0.8 volts or lower, and
a logic one input must be 2.0 volts or higher.
That leaves the range between 0.8 and 2.0 volts as invalid.
When a logic device has an invalid voltage on some input, it is not
guaranteed to perform the expected digital function.
In particular, attempts to latch invalid voltage levels can lead to
metastability problems.

Note that different logic families have different valid signal
ranges:
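
For example, typical data-sheet limits for standard 5 V TTL (an
illustration from common practice; other families, such as CMOS,
use different limits):

   Family   VOL(max)   VIL(max)   VIH(min)   VOH(min)
   TTL      0.4 V      0.8 V      2.0 V      2.4 V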

These are
component specifications
that characterize the situation of a logic output pin connected
to logic input pins.
Assuming that the supply voltage, ambient temperature, loading
and time delays are all within specified limits, the output
voltage is guaranteed to be either
higher than VOH (for a one),
or lower than VOL (for a zero).
These output values are beyond the levels needed for inputs
to sense a particular logic level (either VIH or
VIL).
As a result, a system can have 300 or 400 mV of noise (from the
power supply, ground loops, inductive and capacitive pickup) yet
still sense the correct logic value.
In this way, large digital systems can be built to perform
reliably. (Also see:
system design.)

Also "machine code." A
computer program in the form of the
numeric values or "operation codes"
(opcodes) which the computer
can directly execute as instructions, commands, or "orders."
Thus, the very public
code associated with the instructions
available in a particular computer. Also the programming of a
computer at the bit or hexadecimal level, below even assembly
language. Also see
source code and
object code.

(MITM attack.)
A form of
attack in which an
opponent can intercept
ciphertext, manipulate it, then send it
to the far end.
The point is either to expose the traffic, or perhaps to change it
in a way that may not be detected.

Public Key MITM

By far the most serious man-in-the-middle problems are
public key issues.
In this sort of attack, the opponent arranges for the user to
encipher with the key to the opponent, instead of the key to the
far end.
All long, random keys look remarkably alike, but using a key
from the opponent allows the opponent to decipher the message,
which completely exposes the
plaintext.
The opponent then re-enciphers that plaintext under the key to the
far end and sends along the resulting ciphertext so neither end
will know anything is wrong.

The public-key MITM attack targets the idea that many people
will send their
public keys on the network.
The bad part of this is a lack of public-key
certification.
Unless public keys are properly
authenticated by the user, the MITM
can send a key just as easily, and pretend to be the other end.
Then, if one uses that key, one has secure communication
with the opponent, instead of the far end.
So a message to the desired party goes through the opponent, where
the message is deciphered, read, and re-enciphered with the correct
key for the far end.
In this way, the opponent quickly reads the exact conversation
with minimal effort and without
breaking the cipher per se, or
cryptanalysis of any sort, yet neither
end sees anything suspicious.
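
As a sketch of the exchange (Python, using a toy unauthenticated
Diffie-Hellman with illustrative numbers; Alice, Bob and Mallory
are the usual stand-in names):

    import secrets

    p, g = 2147483647, 5                 # toy public parameters (p = 2^31 - 1)
    a = secrets.randbelow(p - 2) + 1     # Alice's secret exponent
    b = secrets.randbelow(p - 2) + 1     # Bob's secret exponent
    m = secrets.randbelow(p - 2) + 1     # Mallory's secret exponent

    A = pow(g, a, p)        # Alice sends her public value; Mallory
    M = pow(g, m, p)        # intercepts it and forwards M instead
    B = pow(g, b, p)        # Bob replies; Mallory again substitutes M

    k_alice = pow(M, a, p)  # Alice's "shared" key -- shared with Mallory
    k_bob   = pow(M, b, p)  # Bob's "shared" key -- also with Mallory
    assert k_alice == pow(A, m, p) and k_bob == pow(B, m, p)
    # Mallory deciphers with one key, reads, re-enciphers with the
    # other; neither end sees anything suspicious.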

All of this depends on the opponent being able to intercept and
change the message in transit.
The original cryptographic
model related to radio transmission
and assumed that an opponent could listen to the
ciphertext traffic, and perhaps even interfere with it,
but not that messages could be intercepted and completely
hidden.
Unfortunately, message interception and substitution is a far more
realistic possibility in a store-and-forward
computer network than the radio-wave model
would imply.
Routing is not secure on the Internet, and it is at least
conceivable that messages between two people are being routed through
connections on the other side of the world.
This property might well be exploited to make such messages flow
through a particular computer for special processing.
Again, neither end would see anything suspicious.

Perhaps the worst part of this is that a successful MITM attack
does not involve any attack on the actual ciphering.
Even a mathematical
proof of the security of a particular
cipher would be irrelevant in a system which allows MITM attacks.

CBC First Block MITM

A related (but very limited) version of an MITM attack can occur
in the first block of the
CBC block cipher operating mode.
In CBC, the
IV is
exclusive-ORed with the plaintext of
the first block during deciphering.
So if the opponents can somehow change the IV in transit,
they can also change the resulting first block plaintext.
And if the opponents know what plaintext was sent (perhaps a logo,
or a date, or a name, or a command, or even a fixed dollar figure),
they can change it to anything they want (in the first block).
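
A minimal sketch of the first-block manipulation (Python; a simple
XOR with a secret value stands in for the block cipher, since only
the CBC combining matters here):

    import os

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    K  = os.urandom(8)             # stand-in block cipher: E(X) = X xor K
    IV = os.urandom(8)             # transmitted in the open
    P1 = b"PAY $100"               # first-block plaintext known to the opponent
    C1 = xor(xor(P1, IV), K)       # CBC enciphering of block 1

    # The opponent changes only the exposed IV, never touching C1 or K:
    target  = b"PAY $999"
    IV_evil = xor(IV, xor(P1, target))

    recovered = xor(xor(C1, K), IV_evil)   # the receiver's CBC deciphering
    assert recovered == target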

Since the problem exploited in public-key MITM attacks is a lack
of authentication, one might jump to the conclusion that all
MITM attacks are authentication problems.
But the authentication needed for public-key MITM attacks is not
the authentication of an IV, nor even the authentication of
plaintext, but instead the authentication of the public key itself.
Key authentication is a fundamentally different issue than the
CBC IV and first block problem.

Unlike public-key MITM attacks, the problem with the CBC first
block is not a lack of
authentication, but rather a lack of
confidentiality for the IV.
It is the lack of confidentiality which allows the IV value to be
usefully manipulated (see
CBC).
That makes CBC first-block MITM a cipher-level problem, something
appropriately solved below the
cipher system level.
It is often said that an IV can simply be transmitted in the open,
but it is exactly that exposure which enables the first-block CBC
MITM problem.

Preventing MITM Attacks

The way to avoid CBC first-block MITM problems is to encipher
the IV instead of exposing it. Alternately, a higher-level MAC
could be used to detect any changes in the plaintext.

The way to avoid public-key MITM attacks is to
certify the keys, but this is inconvenient
and time-consuming. So, unless the
cipher system actually requires
keys to be certified, this is rarely done.
The worst part is that a successful MITM attack consumes few
resources, need not "break" the cipher proper, and may
provide just the kind of white-collar desktop intelligence a
bureaucracy would love.

It is interesting to note that, regardless of how inconvenient
it may be to share keys for a
secret-key cipher, this is an
inherent authentication which prevents the horribly complete exposure
of public-key MITM attacks.

the mapping or
function or transformation or
rule f takes any value in the
domain X into some value in the
range, which is contained in Y.
For each element x in X, a mapping associates a single
element y in Y.
Element f(x) in Y is the image of element
x in X.

If f(X) covers all elements in Y, f
is a mapping of X onto Y, and is
surjective.

Mathematics is based on the correct
logic of
argumentation which is studied in
philosophy.
As such, it should be impossible (or at least exceedingly
embarrassing) for math to support invalid conclusions.
But I claim that happens all the time in cryptography.
The problem seems to be a failure to recognize the distinction
between
theory and
practice.
In particular, many common
proof assumptions are simply impossible
to guarantee in practice, which means the proof results cannot be
relied upon.

Despite claims to the contrary, cryptography is not
mathematics!
Instead, math is a general
modeling tool.
Cryptography is an applications area which applied math can
model.
But, in any field, the utility of results always
depends upon the extent to which some model corresponds to reality.
There is nothing new about this: The distinction between
theory and
practice is pervasive in
science.
Learning the meaning of modeling, how to apply models, that models
have limits, and what to do outside a model, are fundamental parts
of a scientific education.
Clearly, cryptographic models must first correctly model
reality before they can be said to apply to reality.
The need for model validation is often unmentioned or
forgotten or just considered irrelevant in the rush
to glorify a new crypto math
proof.

Cryptography is simply different from most fields that use
math models.
In practice, the worth of a real cryptosystem often depends upon
point of view:
The various ciphering actors--sender, receiver, opponent
and designer--each have a different view of a
cryptosystem, limited by what they can know.
But we almost never see mathematicians craft strength proofs for the
information-limited user contexts.
Nor do we see arguments that such proofs cannot be achieved, which,
of course, would directly address systemic limitations in the current
concept of cryptography.

In mathematics it is common to say things like:
"Let us assume that we have situation x; now let's prove what that
would mean for other things."
In this way, absolute logical requirements are easily handwaved into
mathematical existence.
But actually achieving those requirements in practice is
another story.
Normally, math requirements are not prescriptive; that is,
they do not describe how a property is to be provably
obtained, only that it somehow be obtained.
Math normally provides no assurance that a handwaved property even
can be achieved in practice.
But if the needed property cannot be achieved, then, obviously,
all attempts to achieve it are, and have been, a useless waste of time.
That can be somewhat irritating to those who seek the practical use of
math results in real systems.

If, for example, math assumes the existence of an
unpredictable random number generator,
cryptographic
Perfect Secrecy can be proven.
But in practice, we can guarantee no such thing.
Oh, we can build generators that seem pretty unpredictable (see
really random),
but finding an absolute guarantee that the actual machines
are unpredictable at the time they are used seems
beyond our reach.

Without an absolute guarantee that each and every
assumption has been identified and is simultaneously
achieved in practice, a supposed
"proof" is not even a complete
logical argument.
Such a "proof" is formally incomplete in practice, and so
technically concludes nothing at all.
In cryptography, theoretical proofs thus tend to create unfounded
belief, something both science and
mathematics should be working hard to avoid and debunk.

Almost no theoretical math security proofs apply in practice,
yet most math-oriented crypto texts seem to say they do
(see, for example, the
one time pad proofs).
In my view, that means mathematical cryptography has not yet been
forced to address the distinction between
theory and
practice.
If mathematical cryptography is to apply in reality, it must be as
an applied discipline, especially those parts which are
now largely mathematical illusion.

Alas, mathematical cryptography seems not very concerned about
the situation, since things could be done differently but are not.
One approach might be to use only properties which provably
can be achieved in practice.
That would, of course, greatly restrict mathematical "progress,"
but only if we take "progress" to include useless results.
Until the math guys really do get concerned about reality, they
necessarily leave it up to the individual practitioner to identify
those theoretical results that do not apply in practice.
In this way, ordinary workers in the field are being required to
reason better than the math guys themselves. Yet sloppy
reasoning is how some of the most
commonly-held views in cryptography can be simply false (see
old wives' tales).

Difficulties also exist in taking mathematical experience and
applying that to cryptography:

Mathematical symbology has evolved for concise expression.
It is thus not "isomorphic" to the complexity of the implementation,
and so is not a good vehicle for the design-time trade-off of
computation versus
strength.

Most mathematical operations are useful or "beautiful"
relationships specifically intended to support understanding in
either direction, as opposed to relationships which might be
particularly difficult to reverse or infer.
So when using the traditional operations for cryptography, we
must first defeat the very properties which made these operations
so valuable in their normal use.

Mathematics has evolved to produce, describe and expose
structure, as in useful or "beautiful" large-scale
relationships and groupings. But, in a sense, relationships and
groupings are the exact opposite of the fine-grained completely
random mappings that cryptography would
like to see. Such mappings are awkward to express mathematically,
and contain little of the structure which mathematics is intended
to describe.

There may be an ingrained tendency in math practitioners, based
on long practice, to construct math-like relationships, and
such relationships are not desirable in this application. So when
using math to construct cryptography, we may first have to defeat
our own training and tendencies to group, understand and simplify.

On the other hand, mathematics is irreplaceable in
providing the tools to pick out and describe structure in
apparently strong cipher designs. (See, for example,
Boolean function nonlinearity
and my comments on experimental
S-box nonlinearity measurement.)
Mathematics can identify specific strength problems, and evaluate
potential fixes.
But there appears to be no real hope of evaluating strength with
respect to every possible attack, even using mathematics.

Although mathematical cryptography has held out the promise
of providing provable security, in over 50 years of work,
no practical cipher has been generally accepted as having
proven strength. See, for example:
one time pad and
proof.

An
RNG or
FSM with a
cycle structure consisting of just a single
cycle, or perhaps a
degenerate cycle plus one long
cycle. In
cryptographic use (especially as
stream cipher running key generators)
having a guaranteed minimal length is an important RNG property:
Keyed RNG's may seem unpredictable,
but when a short sequence repeats, that sequence has just become
predictable and insecure.

Originally a
linear feedback shift register
(LFSR) sequence of 2^n - 1 steps
produced by an n-bit-wide
shift register.
This means that every
binary value the register can hold, except
zero, will occur on some step, and then not occur again until all
other values have been produced. A maximal-length LFSR can be
considered a binary counter in which the count values have been
shuffled or
enciphered.
The sequence from a normal binary counter is perfectly
balanced and
the sequence from a maximal-length LFSR is almost perfectly
balanced. Also see
M-sequence.
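
The "shuffled counter" view is easy to check (a Python sketch,
using the degree-5 register from the LFSR entry):

    def step(s):                            # feedback a[0] = a[5] + a[3]
        fb = ((s >> 4) ^ (s >> 2)) & 1
        return ((s << 1) | fb) & 0x1F       # 5-bit register

    seen, s = [], 1
    for _ in range(31):
        seen.append(s)
        s = step(s)
    assert sorted(seen) == list(range(1, 32))   # a permutation of 1..31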

Maximum Distance Separable codes. In
coding theory,
codes with the greatest possible
error-correction capability.
These codes expand the original data with a wider representation
so that even multiple bit changes are still "closer" to the
correct value than some other.

However, MDS codes are not applicable to all designs, nor are
such codes needed for optimal ciphering.
The obvious counterexample is my
mixing cipher designs, which use small
Balanced Block Mixing
operations (actually,
orthogonal Latin squares).
The
oLs's are arranged into
scalable FFT-like structures to
mix every input byte into every output byte, and to diffuse even a
small input change across each and every output.

In descriptive
statistics,
a measure of the "central tendency" (the extent to which data tend
to cluster around a specific value).
Commonly the arithmetic average, the sum of n values divided by n.

Similar computations include the
geometric mean
(the nth root of the product of n values, used for average rate
of return) and the
harmonic mean
(n divided by the sum of the inverses of each value,
used to compute mean sample size), among others.
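
In Python (illustrative values):

    import math

    data = [2.0, 4.0, 8.0]
    n = len(data)
    arithmetic = sum(data) / n                  # 4.666...
    geometric  = math.prod(data) ** (1 / n)     # cube root of 64, about 4.0
    harmonic   = n / sum(1 / x for x in data)   # 3 / 0.875, about 3.43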

Although perhaps looked down upon by those of the mathematical
cryptography persuasion, mechanistic cryptography certainly does
use mathematics to design and predict performance. But rather
than being restricted to arithmetic operations, mechanistic
cryptography tends to use a wide variety of mechanically-simple
components which may not have concise
mathematical descriptions. Rather than simply implementing a
system of math expressions, complexity is
constructed from the various efficient components available to
digital computation.

A phrase generally implying that using a different word to
describe the same facts does not change the essence of the facts.
(Also called
just semantics.)

However, words are how we discuss facts, and semantics is the
meaning of those words, making semantics somewhat more important
than "mere."
Finding that a discussion is not using the expected meaning of a
term, or that multiple different meanings are being used
simultaneously, shows that the discussion itself is in trouble.
(See
argumentation,
logic and
fallacy.)

For example:

If someone claims that a
cipher is
proven secure,
and then we look at the many meanings for "proof" and find that
none of them apply, we can see the claim is false.
That is not just using a different word to describe reality, it
is showing that the claims are factually false.

If a
theoretical proof leads someone to
believe that the
one time pad is "unbreakable," the
uncomfortable fact that some such ciphers actually have been
broken in
practice can create considerable
cognitive dissonance.
Fortunately, Science does not require belief.
Normally, when a
theoretical scientific model is shown to not
apply in practice, we publicize that or correct the model.
That is not using a different word to describe the same situation,
it is finding the correct words to describe reality.

Information represented as a sequence of symbols.
Often as readable human language, and now often also as values
represented as
binary for
digital transmission. Also see
plaintext.

A message can be seen as a sequence of values.
Consequently, it may seem that the only
cryptographic manipulations possible
would be either to change values
(substitution) or change position
(transposition).
However, it is also possible to include meaningless values in or
between messages (as in
nulls).
A variation is to intermix several different messages as in a
braid or grille.
Another possibility is to collect groups of symbols and change
those values as a unit, which is technically a
code.

Note that message length is rarely hidden by
ciphers.
One way of hiding length is by continuous transmission with
nulls between messages.
Naturally, it is then necessary to identify the start and end
of each message, which may involve various
synchronization techniques.
Another way of hiding length is to use nulls to expand the message
by a random amount.

Storage of messages for later use.
This becomes a particular issue when the messages contain
sensitive information.
Clearly, if we save messages in their encrypted form, we also need
keys to expose their contents, even though
updated keys may be in use for that same channel (see
key storage).

Fortunately, if we have all the keys ever used for some
channel, a start date for each is sufficient to define the
period of activity for a particular key.
So if we know the message date, we can select the key which was
active on that date.
With email, we might parse the header for the message date, and
so automatically select the correct old key.
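
A minimal sketch of date-based key selection (Python; the key
history and dates are hypothetical):

    from bisect import bisect_right
    from datetime import date

    # (start date, key) pairs for one channel, sorted by date
    history = [(date(2023, 1, 1), b"key-A"),
               (date(2023, 6, 1), b"key-B"),
               (date(2024, 2, 1), b"key-C")]

    def key_for(msg_date):
        """Return the key active on msg_date."""
        starts = [d for d, _ in history]
        i = bisect_right(starts, msg_date) - 1
        if i < 0:
            raise ValueError("message predates all known keys")
        return history[i][1]

    assert key_for(date(2023, 7, 4)) == b"key-B"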

One possibility for local user archives is to decrypt all
messages when received, and accumulate the plaintext files.
A modified version of that is to re-encrypt each message under the
user's own keyphrase, although that could be a problem when it
comes time to change that keyphrase, or if the user leaves.
Yet another alternative might be to simply archive each message
as received, even encrypted, and then select the old keys
needed to access old encrypted messages.

All these user-centric approaches have in common the problem that
the user's archives are assumed to be intact, that the computer has
not crashed or been deliberately erased.
That is an assumption a corporation may not be prepared to accept.

In contrast, corporate security policies may want to archive
all messages, from and to all users.
When the corporation is the source of all keys (see
key creation), it would have all the
alias files and all the old keys for
each user.
To select the correct key, we need to identify the user,
the email address or name (the "alias") for the far end, and the
message date.
Again, email message headers can be parsed for this information,
especially if users follow reasonable alias protocols.
(Note that alias protocols may be the least part of dealing with
sensitive information and encryption.)
If each message is kept in a different file, that file can
be given the appropriate date for that message which then
provides another source of the date for
key selection.

Since any decent
cipher system should report the use
of the wrong key, all new encrypted messages could be automatically
checked for correct decryption.
Action could then be taken quickly if the appropriate keys were
not being used.

Assurance that a message is from the purported sender.
Sometimes part of
message integrity.

To a large extent, message authentication depends upon the
use of particular
keys to which
opponents should not have access.
Simply receiving an intelligible message thus indicates
authentication.
But automatic authentication may require added message
redundancy in the form of a
hash value that can be checked upon receipt,
similar to message integrity.

Another approach to message authentication is to use an
authenticating block cipher;
this is often a
block cipher which has a large
block, with some "extra data" inserted in
an "authentication field" as part of the plaintext before
enciphering each block.
The "extra data" can be some transformation of the key, the
plaintext, and/or a sequence number. This essentially creates a
homophonic block cipher: If we know
the key, many different ciphertexts will produce the same plaintext
field, but only one of those will have the correct authentication
field.
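
A sketch of the per-block "authentication field" idea (Python;
HMAC-SHA-256 stands in for "some transformation of the key ...
and a sequence number," and the 8-byte field width is an arbitrary
choice):

    import hmac, hashlib

    def add_auth_field(data: bytes, key: bytes, seq: int) -> bytes:
        """Build the plaintext block: data plus an authentication field."""
        tag = hmac.new(key, seq.to_bytes(8, "big") + data,
                       hashlib.sha256).digest()[:8]
        return data + tag              # this block is then enciphered

    def check_auth_field(block: bytes, key: bytes, seq: int) -> bytes:
        """Verify the field after deciphering; return the data."""
        data, tag = block[:-8], block[-8:]
        good = hmac.new(key, seq.to_bytes(8, "big") + data,
                        hashlib.sha256).digest()[:8]
        if not hmac.compare_digest(tag, good):
            raise ValueError("authentication failed")
        return data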

The usual approach to authentication in a
public key cipher is to encipher
with the private key. The resulting ciphertext can then be
deciphered by the public key, which anyone can know. Since even
the wrong key will produce a "deciphered" result, it is also
necessary to identify the resulting plaintext as a valid message;
in general this will also require redundancy in the form of a hash
value in the plaintext. The process provides no
secrecy, but only a person with access to
the private key could have enciphered the message. Also see:
key authentication and
user authentication.

Although widely touted and used, a MAC is hardly the only possible
form of authentication.
A MAC normally functions across a whole message, and thus requires
that the entire message exist before authentication can operate.
One alternate form of authentication is a per-block authentication
field (see
homophonic substitution and
block code).
This allows each block to be authenticated, and possibly could
even replace standard Internet Protocol error-detection, thus
reducing system overhead.
Presumably, other forms of authentication are also possible.

Assurance that a message has not been modified during
transmission, sometimes including
message authentication.
One approach computes a
CRC hash across the
plaintext data, and appends the CRC
remainder (or result) to the plaintext data: this adds
a computed redundancy to an arbitrary message.
The CRC result is then
enciphered along with the data.
When the message is
deciphered, if a second CRC operation
produces the same result, the message can be assumed unchanged.
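
A minimal sketch (Python, using the standard CRC-32; the
enciphering step is omitted, since any cipher may wrap the result):

    import zlib

    def protect(plaintext: bytes) -> bytes:
        """Append the CRC; the whole value is then enciphered."""
        return plaintext + zlib.crc32(plaintext).to_bytes(4, "big")

    def verify(deciphered: bytes) -> bytes:
        """After deciphering, recompute and compare the CRC."""
        data, crc = deciphered[:-4], deciphered[-4:]
        if zlib.crc32(data).to_bytes(4, "big") != crc:
            raise ValueError("message modified")
        return data

    assert verify(protect(b"attack at dawn")) == b"attack at dawn"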

Note that a CRC is a fast,
linear hash.
Messages with particular CRC result values can be constructed
rather easily.
However, if the CRC is hidden behind strong ciphering, an
opponent is unlikely to be able
to change the CRC value systematically or effectively.
In particular, this means that the CRC value will need more
protection than a simple
exclusive-OR in an additive
stream cipher or the exclusive-OR
approach to handling short last
blocks in a
block cipher.

A similar approach to message integrity uses a nonlinear
cryptographic hash function or
MAC. These also add a computed redundancy
to the message, but generally require more computation than a CRC.
While cryptographic hashes generally purport to have significant
security properties, those are rarely if ever
proven to the same extent as the lesser
properties of a simple CRC.
It is thought to be exceedingly difficult to construct messages
with a particular cryptographic hash result, so the hash result
perhaps need not be hidden by encryption. Of course doing that is
just tempting fate.

Another approach to message integrity is to use an
authenticating block cipher;
this is often a
block cipher which has a large
block, with some "extra data" inserted in
an "authentication field" as part of the plaintext before
enciphering each block.
The "extra data" can be some transformation of the key, the
plaintext, and/or a sequence number. This essentially creates a
homophonic block cipher: If we know
the key, many different ciphertexts will produce the same plaintext
field, but only one of those will have the correct authentication
field.

A
key transported with the message and used for
deciphering the message. (The idea of a
"session key" is very similar, but
presumably lasts across multiple messages.)

Normally, the message key is a large
really random value or
nonce, which becomes the key for ciphering
the data in a single message (see
cipher system).
Normally, the message key itself is enciphered under the User Key
or other key for that link (see
alias file and
key management).
The receiving end first deciphers the message key, then uses that
value as the key for deciphering the message data.
Alternately, the random value itself may be sent unenciphered,
but is then enciphered or hashed (under a keyed cryptographic hash)
to produce a value used as the data ciphering key.
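
A minimal message-key sketch (Python; a SHA-256 counter keystream
stands in for both the data cipher and the key-wrapping cipher, so
this shows only the key handling, not a secure construction):

    import os, hashlib

    def stream_xor(key: bytes, data: bytes) -> bytes:
        """Toy keystream cipher: XOR with SHA-256(key || counter)."""
        ks, ctr = b"", 0
        while len(ks) < len(data):
            ks += hashlib.sha256(key + ctr.to_bytes(8, "big")).digest()
            ctr += 1
        return bytes(d ^ k for d, k in zip(data, ks))

    def encrypt_message(user_key: bytes, plaintext: bytes) -> bytes:
        mk = os.urandom(32)                    # random message key, used once
        wrapped = stream_xor(user_key, mk)     # message key under the user key
        return wrapped + stream_xor(mk, plaintext)

    def decrypt_message(user_key: bytes, blob: bytes) -> bytes:
        mk = stream_xor(user_key, blob[:32])   # recover the message key first
        return stream_xor(mk, blob[32:])

    assert decrypt_message(b"UK", encrypt_message(b"UK", b"hi")) == b"hi"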

Message keys have very substantial advantages:

A message key assures that the actual data is ciphered under
an arbitrary selection from a huge number of possible keys;
it therefore prevents weakness due to user key selection.

A message key is used exactly once, so even a successful
brute force attack on a
message key exposes just one message.

A message key is constructed internally, and this construction
cannot be controlled by a user; this prevents all
attacks based
on repeated ciphering under a single key or a known sequence of keys.

To the extent that the message key value is unpredictable, it is
more easily protected than ordinary text (see
Ideal Secrecy).

Since message key values are never exposed to users on either
end, the
opponent never has the
known plaintext
for the message key values.

It is important that message key construction be made clear
and straightforward in design and implementation.
Like most nonces, a message key is "extra" data, the value of
which is not important.
That value thus could be subverted to become a hidden side-channel
for disclosing secure information.

In a sense, a message key is the higher-level concept of an
IV, which is necessarily distinct for each
particular design.
Some form of message key is the usual way to implement a
hybrid or
public key cipher.

The condition where the
voltage output of a
digital
device is an invalid logic level
between logic 0 and logic 1 for an unexpectedly long time.
This happens to be the voltage which, when placed at an input, and
amplified through two particular inverting stages, produces exactly
the same voltage (or waveform) on the second output.
We know there must be such a point, because an input which is
"slightly 1" gives a "hard 1" out, and "slightly 0" gives
a "hard 0" out; somewhere between these levels, some voltage must
reproduce itself.
This is a consequence of digital logic being built from internal
transistor amplifiers which are inherently
analog, and not
digital.

Metastability is typically caused by violation of the
set-up
and
hold
time requirements of a
flip-flop,
which can cause an intermediate voltage level to be latched.
Note that intermediate voltage levels always occur when a
digital signal changes state; in a well-designed system they are
ignored by
clock
signals which provide time for the transient condition to pass.
Metastability occurs when the "amplified" level in a flip-flop
happens to be exactly the same as the input level at the time
the clock connects these points.
The condition can endure indefinitely, until internal
noise
causes a collapse one way or the other.

Metastability cannot be prevented in designs which use
a digital flip-flop or latch to capture raw
analog data or
unsynchronized
digital data such as digitized noise.
Metastability can be reduced by minimizing the time the
signal spends in the invalid region, for example by using faster
logic and/or Schmitt trigger devices.
Metastability can be greatly reduced by using more than
one stage of clocked latch.
Metastability is eliminated in logic design by assuring valid
logic levels and timing, such that setup and hold times are never
violated.

In the simple case, where a single input line is sampled,
metastability may only cause an occasional unexpected delay and
an uncontrolled
non-random bias.
But if multiple lines are sampled and metastability occurs on even
one of those, a completely different value can be produced.
In some systems, latching a wrong value could lead to entering
unexpected or prohibited states with undefined results.

"Rule 1: If you have a
conjecture, set out to prove it and
refute it. Inspect the proof carefully to prepare a list of
non-trivial
lemmas (proof-analysis); find
counterexamples both to the
conjecture (global counterexamples) and to the suspect lemmas
(local counterexamples).

"Rule 2: If you have a global counterexample and
discard your conjecture, add to your proof-analysis a suitable
lemma that will be refuted by the counterexample, and replace
the discarded conjecture by an improved one that incorporates that
lemma as a condition.
Do not allow a refutation to be dismissed as a
monster.
Try to make all 'hidden lemmas' explicit.

"Rule 3: If you have a local counterexample, check
to see whether it is not also a global counterexample. If it is,
you can easily apply Rule 2.

"Rule 4: If you have a counterexample which is local
but not global, try to improve your proof-analysis by replacing
the refuted lemma by an unfalsified one.

"Rule 5: If you have counterexamples of any type,
try to find by deductive guessing, a deeper
theorem to which they
are counterexamples no longer."

". . . a valid
[formal] proof is one in which
no matter how one interprets the descriptive terms,
one never produces a counterexample -- i.e. its
validity does not depend on the meaning of the descriptive
terms . . . ." [pp.100,124]

"For any
[informal] proposition
there is always some
sufficiently narrow interpretation of its terms, such that it
turns out true, and some sufficiently wide interpretation,
that it turns out false." [p.99]

". . . informal, quasi-empirical, mathematics does not grow
through a monotonous increase of the number of indubitably
established theorems, but through the incessant improvement of
guesses by speculation and criticism, by the logic of proofs
and refutations." [p.5]

"Refutations, inconsistencies, criticism in general are very
important, but only if they lead to improvement.
A mere refutation is no victory.
If mere criticism, even though correct, had authority, Berkeley
would have stopped the development of mathematics and Dirac
could not have found an editor for his papers." [p.112]

(Also "Military Strength.") A casual description of
cipher strength; a term intended to
convey the idea that data security is being taken seriously.
Unfortunately, this is at best a
belief, and not proven fact.

Any such claim is flawed in multiple ways:

It seems unlikely that anyone who knows anything about real
military ciphers is going to talk about them in a marketing bulletin.
So probably what we get is someone's delusion about what military
ciphers must be like.

Even if some cipher is similar to a military
cipher, mere similarity is not enough to predict strength, thus
quality and worth.

Nobody at all, anywhere, knows the strength of an unbroken
cipher, whether military or civilian, because it is the
breaking which exposes and defines the
weakness.
And even that result is just the strength known to a particular
community with particular techniques; the strength can be much
less to someone else.

There can be no informed opinion on the strength of an
unbroken cipher.

Below, we have a toy 32-bit-block Mixing Cipher.
Plaintext at the top is transformed into
ciphertext at the bottom.
Each "S" is an 8-bit
substitution table, and
each table (and now each mixing operation also) is individually
keyed.

Horizontal lines connect elements which are to be mixed
together: Each *---* represents a single
Balanced Block Mixing or BBM.
Each BBM takes two elements, mixes them, and returns two mixed
values. The mixed results then replace the original values in the
selected positions just like the
butterfly operations used in some
FFT's.

[Figure: A 32-Bit Mixing Cipher -- four columns of 8-bit
substitution tables ("Fencing"), with *---* BBM butterfly
operations between sublevels.]

By mixing each element with another, and then each pair with
another pair and so on, every element is eventually mixed with every
other element. Each BBM mixing is
dyadic, so each "sub-level" is a mixing of
twice as many elements as the sublevel before it. A block of n
elements is thus fully mixed in log2(n)
sublevels, and each result element is influenced equally by
each and every input element.
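
A sketch of the FFT-like network (Python; the linear BBM
(a,b) -> (3a+2b, 2a+3b) in GF(2^8) is one simple orthogonal Latin
square pair, and the reducing polynomial 0x11B is an arbitrary
illustrative choice of irreducible modulus):

    def gmul(x, y, mod=0x11B):
        """Multiply in GF(2^8) modulo the chosen irreducible."""
        r = 0
        while y:
            if y & 1:
                r ^= x
            x <<= 1
            if x & 0x100:
                x ^= mod
            y >>= 1
        return r

    def bbm(a, b):
        """One balanced butterfly: an invertible 2x2 mix of a pair."""
        return gmul(3, a) ^ gmul(2, b), gmul(2, a) ^ gmul(3, b)

    def mix(block):
        """FFT-style mixing: log2(n) sublevels of BBM butterflies."""
        n, dist = len(block), 1
        while dist < n:                  # n must be a power of 2
            for i in range(n):
                if (i & dist) == 0:
                    block[i], block[i + dist] = bbm(block[i], block[i + dist])
            dist <<= 1
        return block

    state = mix(list(b"8 bytes!"))   # every output depends on every input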

Goals

Scalability is the only way we can exhaustively
test a real cipher.
Without scalability, all we have are mathematical
lemmas, each with
assumptions that somehow never can
be completely satisfied in practice.
Only scalability can allow us to exhaustively test and certify
the exact code (software) actually used in practice.

Huge Blocks offer usage advantages which are simply
impractical in relatively small blocks. These include (see
Huge Block
Cipher Advantages):

Large Keyed Internal State is information an
opponent must somehow reconstitute from external observation.
This is a slightly different and more believable form of strength
than the usual claim that a complex computation based on fixed
constants and tables with the ordinary (small) keyspace somehow
cannot be undone.

Clarity leads to a better understanding of cipher operation,
and, thus, easier analysis and a better awareness of strength,
which, after all, is the primary design goal.

Diffusion

Perhaps the main issue in the design of block ciphers with
huge blocks has always been the ability to efficiently mix
information across the whole block.
First recall
substitution-permutation
ciphering, where we build a conventional
block cipher using only fencing layers
(substitution tables) and wiring.
By wiring substitution table outputs to two different following
tables, changing the input to the first table may change both
of the secondary tables, and that is
diffusion.
The problem is that both tables may not change.
If some input change causes no change on the wires to some table,
that table and possibly its subsequent tables will not be
involved, which will give the opponent a simpler ciphering
transformation to attack.
We want to eliminate that possibility.
Accordingly, what I call
ideal mixing will cause any
input change to be conducted to every following table,
thus forcing all tables to actively participate in the result.

We now know that
ideal mixing can be accomplished with
FFT-like networks using relatively small
BBM operations.
However, in the U.S., ciphers which do not use my
Balanced Block Mixing
technology must use mixing operations which are inherently
unbalanced. Since I view
balance as the single most important concept
in cryptography, avoiding that when we can get it would seem
to be a very serious decision.

Building a scalable balanced block mixing process
is fairly easy using
ideal mixing BBM technology.
Basically, each mixing operation must have a pair of
orthogonal Latin squares,
and those can be linear, nonlinear, or key-created, or combinations
thereof.
My bias is to use key-created
oLs's in tables.
(It is easy to construct keyed nonlinear orthogonal pairs of Latin
squares of arbitrary order 4n, as I describe in my articles.)

Confusion

I prefer to use "many" keyed substitution tables, because these
hold a large amount of unknown internal state which an
opponent must somehow reconstruct.
These S-boxes are simply
shuffled
(twice) by a keying sequence
produced by some
cryptographic RNG.
Keying by shuffling is very old technology,
very common in
softwarestream ciphers (although simply
using that technology does not somehow convert a
block into a
stream).
Tables need to be at least "8-bit" or "byte-wide" with at least
256 byte entries.
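
A sketch of keying by shuffling (Python; the standard library RNG
seeded from a key hash stands in for a cryptographic RNG, for
illustration only):

    import hashlib, random

    def keyed_sbox(key: bytes):
        """Return a 256-entry substitution table, shuffled twice."""
        seed = int.from_bytes(hashlib.sha256(key).digest(), "big")
        rng = random.Random(seed)
        table = list(range(256))
        for _ in range(2):                  # shuffled twice by the sequence
            for i in range(255, 0, -1):     # Fisher-Yates shuffle
                j = rng.randrange(i + 1)
                table[i], table[j] = table[j], table[i]
        return table

    sbox = keyed_sbox(b"my key")
    assert sorted(sbox) == list(range(256))  # still a permutation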

One of the more subtle problems with
scalable Mixing Ciphers
is that some limited quantity (maybe 16, maybe 64, maybe more)
of keyed tables (and keyed oLs mixing operations as well) may have
to support a block of essentially unlimited size.
Thus, tables must be re-used; tables can be selected from
an array of such tables based on some keyed function of position.
It is important that this be keyed and sufficiently complex so that
knowing something about a table in one position does not immediately
allow assigning that knowledge to other table positions.

One way to handle table selection by cipher position is to
have a maximum block size, and then have a table of that size
for each layer, indicating which S-box or BBM to use
at each position.
Those tables can be keyed (shuffled) just like other tables.
Again, we avoid exposing any significant constants in the
design by constructing what we need with RNG sequences
based on a key.

Layers

We do something right, then move on. We do not have to do
things over and over until they finally work.
In the past I have used two linear mixing layers, which implies
three fencing layers, and I still might use that structure with
keyed nonlinear mixing.
However, having become somewhat more conservative, I might now
use three linear mixing layers (instead of two) along with four
fencing layers.
With a layered design, where it is easy to add or remove layers,
there is always a desire to reduce computation and increase
speed, and it is easy to go too far.

The
ring of
polynomials,
denoted GF(2)[x], in which the
coefficients are taken
mod 2.
The four arithmetic operations addition, subtraction,
multiplication and division are supported, but, like
integers, division is not
closed.
As usual in mod 2, subtraction is the same as
addition.
Each column of coefficients is added separately, without a "carry"
to an adjacent column (see the sketch below).

The decision about whether the divisor "goes into" the dividend
is based exclusively on the most-significant (leftmost) digit.
This makes polynomial division far easier than integer division.

Mod 2 polynomials behave much like
integers in that one
polynomial may or may not divide another without remainder. This
means that we can expect to find analogies to integer
"primes,"
which we call
irreducible polynomials.
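
A sketch of both operations (Python, with bit i of an integer
holding the coefficient of x^i):

    def poly_add(a, b):
        """Mod 2 addition (and subtraction): column-wise, no carry."""
        return a ^ b

    def poly_divmod(a, b):
        """Quotient and remainder; b must be nonzero. Each step is
        decided by the leading digit alone."""
        q = 0
        while a.bit_length() >= b.bit_length():
            shift = a.bit_length() - b.bit_length()
            q |= 1 << shift
            a ^= b << shift
        return q, a

    # (x^5 + x^3 + 1) + (x^3 + x) = x^5 + x + 1
    assert poly_add(0b101001, 0b001010) == 0b100011
    # x^3 + x = (x + 1)(x^2 + x), remainder 0
    assert poly_divmod(0b1010, 0b11) == (0b110, 0)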

Since division is not
closed, mod 2 polynomials do not constitute a
field. However, a
finite field of polynomials can be
created by choosing an
irreducible modulus polynomial, thus
producing a Galois field
GF(2^n).

In abstract algebra, a nonempty
set M, and a
closed dyadic
operation with
associativity, and an
identity element.
Whatever the operation is, we may choose to call it
"multiplication" and denote it with * as usual.
Closure means that if elements (not necessarily numbers)
a, b are in M, then ab (that is, a*b)
is also in M.

In a monoid consisting of set M and closed operation * :

The operation * is associative: (ab)c = a(bc)

There is a single identity element which works with
all elements: for e and any a in M, ea = ae = a

A set with a closed operation which is just associative is a
semigroup.
A set with a closed operation which is associative, with an
identity element and inverses is a
group.

"A new cryptographic primitive for perfect generation of
diffusion and
confusion.
For an arbitrary set E we call a
permutation
B: E^2 -> E^2,
B(a,b) = (B1(a,b), B2(a,b)), a
multipermutation
if, for every a,b in E, the mappings
Bi(a,*),Bi(*,b), for i=1,2 are permutations
on E."

This definition would seem to include
orthogonal Latin squares,
a more-desirable
balanced form well-known in mathematics
for hundreds of years.
Their paper, on non-reversible hashing, may have been presented at
the "Cambridge Security Workshop," Cambridge, December 9-11, 1993.
My work on
Balanced Block Mixing was
published to the net on March 12, 1994, which was before the earlier
paper was available in published proceedings.
Even accepting the earlier paper, however, my work was the first
to demonstrate FFT-like reversible ciphering using
butterfly
functions that turn out to be
orthogonal Latin squares.

An
attack
on classical
transposition ciphers.
Given two messages of the same length, a classical transposition
cipher would re-arrange the messages in a similar way.
The attack is to
anagram
or re-arrange the characters in two same-length ciphertexts
so that both make sense.

The attack depends upon the idea that messages of the same
length will be permuted in the same way, and probably will not
apply to modern transposition ciphers.

The Usual Complaints

The point of multiple encryption is to reduce the damage if our
main cipher is being broken without our knowledge.
We thus compare the single-cipher case to the multiple-cipher case.
But some people just do not like the idea of multiple encryption.
Complaints against multiple encryption include:

It is not proven to make ciphering stronger.
We need to compare apples to apples: There is no proof that
the alternative single cipher will have any strength at
all! If we need a proof of
strength in practice, we cannot use
encryption, because no such cipher and proof exists (see, for
example,
proof and
one-time pad).
And if we can speculate that one of the ciphers is weak, the
single-cipher case completely fails, whereas the
two-cipher case is as strong as the second cipher would be
alone.

There is no reason to think it would make ciphering
stronger.
There are very good reasons to think that multiple
encryption makes a stronger cipher!
First is the real-world example of
Triple DES:
Despite extensive analysis, Triple DES is still unbroken, and
Triple DES is the multiple encryption composed only of
broken DES.
There could hardly be a more dramatic example of strength
improvement than Triple DES!
Next, many if not most ciphers are best attacked with
some form of
known plaintext attack.
But using multiple ciphers in sequence hides the
known plaintext
from external analysis, which prevents those attacks.
Similar hiding is not possible with only a single ciphering.

It is not proven necessary.
If we are using a cipher, then presumably there is no successful
attack on it that we know of.
But there is also no proof that a successful attack does
not exist, and that essentially is proof of
potential cipher weakness, which is the
risk of complete cipher failure.
Moreover, that failure would occur in secret, and since we
would not know about the failure, we would continue to use the
broken cipher forever.
We can hope to minimize the possibility of failure with the
redundancy of multiple cipherings.

It has not been analyzed.
First, multiple encryption was analyzed in 1949 by
Shannon in his famous work,
Communication Theory of Secrecy Systems (see
Algebra of Secrecy Systems).
Next, the well-respected cipher Triple DES is an example of
multiple encryption, and has had extensive analysis.
Then, the
rounds in conventional
block ciphers are essentially a
form of multiple encryption, and also have been heavily analyzed.
And there is a substantial and growing body of work on
multiple encryption; some modern articles are noted below.

There is no proof that using another cipher would not
make things weaker.
Weak sequences of ciphers can be deliberately constructed, but
their constructions must be coordinated to demonstrate
weakness. Getting two different ciphers, with independent
keys, to coordinate in weakness, seems very, very unlikely.
And if any particular cipher would reduce the strength of
another cipher, it would be a fine academic attack. We see
no serious proposals for this sort of attack on serious
ciphers.

Each separate cipher stack can be seen as a resulting cipher
that should be analyzed.
While multiple encryption has various advantages, it is first
a response to a real, existing problem. That problem is the
risk of the
single point of failure
represented by any single cipher. If
cryptanalysis could
prove that no failure could occur in our
"main" cipher, the problem would be solved.
The problem continues to exist specifically because
cryptanalysis can not provide such proof.
Demanding that analysis produce results for multiple ciphers
beyond what it can produce for a single cipher is just
deceptive nonsense.

Since a new attack could apply to all ciphers, it
provides no real redundancy.
There is a real example: DES was "broken" by
Differential and Linear Cryptanalysis and other techniques.
Nevertheless, the multiple encryption with three DES cipherings
called Triple DES was not broken.
Triple DES remains secure because Differential and Linear
Cryptanalysis cannot usefully be applied to it.

Is Weakness Possible?

When we have one cipher, could adding a second cipher weaken the result?
Well, it is possible, but it also seems extremely unlikely.
For example, weakening could happen if the second cipher was the same
as the first, and in decipher mode (or an
involution), and using the same key.
But, except for the keying, exactly that situation is
deliberately constructed in
"EDE (encipher-decipher-encipher)
Triple DES," about which there are no
anxieties at all.

Remember that it is always possible for any cipher
to be weak in practice, no matter how strong it is in general, or whether
it has another cipher after it or not:
All the opponent has to do is pick the right key.
So, when we think about potential problems in any form of encryption,
we also need to think about the likelihood of those causes actually
happening in practice.
Constructing a case of weakness is not particularly helpful if that
does not apply generally.
In any practical analysis, it is not very useful to find a
counterexample which does not represent
the whole; that is the classic logic
fallacy known as
accident.

Despite the "EDE Triple DES" example, the most obvious possibility
of weakness would be for the same cipher to appear twice, in adjacent
slots. Obviously, we can prevent that!
Could a different cipher expose what a first cipher has just
enciphered?
Perhaps, but if so, they are not really different ciphers after
all, and that would be something we could check experimentally.
Can a single cipher have several fundamentally different constructions?
That would seem to be difficult:
Normally, even a small change in the ciphering process has far-reaching
effects.
But, again, we could check for that.

The idea that a completely unrelated cipher could decipher
(and thus expose) what the first cipher had protected may seem
reasonable to those with little or no experience in the practice of
ciphering or who have never actually tried to break ciphertext.
But if that approach was reasonable, it would be a major issue in
the analysis of any new cipher, and we do not see that.

Ciphers transform plaintext into key-selected ciphertext, and we
can describe the number of possible cipherings mathematically.
Even restricting ourselves to small but serious
block ciphers like
DES or
AES, the number of possible transformations is
BIG, BIG, BIG (see
AES)!
Out of the plethora of possible ciphers, and every key for each of
those ciphers, we expect only one cipher and one key to expose our
information.

If many different ciphers and keys could expose an enciphered
message, finding the limits of such vulnerability would be a major
part of every cipher analysis, and that is not done because it is
just not an issue.
If just adding a second cipher would, in general, make the first
cipher weaker, that would be a useful academic
attack, and we see no serious proposals for
such attacks on serious ciphers.

A Deceptive Article

Much of the confusion about the potential risks of multiple
encryption seems due to one confusing article with a particularly
unfortunate title:

Apparently the main goal of the article is to present a contradiction for:

"Folk Theorem.A cascade of ciphers is at least as difficult to
break as any of its component ciphers." [p.3]

In the end, the main result from the article seems to be:

"It is proved, for very general notions of breaking a cipher and of
problem difficulty, that a cascade is at least as difficult to break
as the first component cipher." [Abstract]

But while that may sound insightful, it just means that if
the first cipher is weak, the cascade may be strong anyway,
which is no different than the "Folk Theorem."
And if the first cipher is strong, the result tells us nothing
at all.
(What really would be significant would be: "at least as weak
as," but that is not what the article gives us.)

The implication seems to be that we should simply discard a useful
rule of thumb that is almost
always right, for a result claimed to be right which gives us
nothing at all.
But we would not do that even in theoretical mathematics! (See
Method of Proof and Refutations.)
Instead, we would seek the special-case conditions that make our
statement false, and then integrate those into assumptions that
support valid
proof.

The example "ciphers" used in the "proof" are constructed so that
each of two possible keys do produce different ciphertexts for two
plaintexts, but not for two others.
Then we assume that only two original plaintexts actually occur.
In this case, when one particular cipher is "first," its key has no effect,
and the resulting ciphertext is also not affected by the key in
the second cipher.
Yet if the ciphers are used in reverse order, both keys are
effective.

However, knowing which cipher is "first" is only "important"
when the first cipher results support attacks on the second cipher
and not vice versa.
But if we are allowed to create arbitrary weakness in examples, we
probably can construct some that are mutually weak, in which case
the question of which is "first" is clearly a non-issue, despite
both the "result" and article title.

Both of the example ciphers are seriously flawed in that, for fully
half of their plaintexts, changing the key does not change
the ciphertext.
Thus, for half their plaintexts, the example ciphers are essentially
unkeyed.
Since the example ciphers start out weak, I do not accept
that either ordering has reduced the strength of the other
cipher, and that is the main fear in using multiple encryption.
Moreover, all we need to get a strong cascade is one more
cipher, if it is strong.
And that, of course, is the point of the "Folk Theorem."

The article has been used by some to sow "FUD"
(fear, uncertainty and doubt) about multiple encryption.
But multiple encryption in the form of product ciphering (in
rounds and with related keys!) is a
central part of most current
block cipher designs.
So before believing rumors of multiple encryption weakness, we
might first ask why the block ciphers which use this technology
seem to be trusted so well.

Other References

The first advantage of multiple encryption is to address the
risk of the
single point failure created
by the use of a single cipher.
Unfortunately, "risk of overall failure" seems to be a significantly
different issue than the "keyspace size" or "needed known-plaintext"
measures used in most related analysis.
Unfortunately, it is in the nature of cryptography that the
risk of cipher failure cannot be known by the cryptanalyst,
designer, or user.
The inability to know the risk is also the inability to quantify
that risk, which leaves the analyst without an appropriate measure.

What we most seek from multiple encryption is redundancy
to reduce risk of overall failure, a concept almost completely
missing from academic analysis.
The main issue is not whether multiple ciphers produce a stronger
result when they all work.
Instead the issue is overall security when one cipher is weak.
For the single-cipher case a broken cipher means complete failure
and loss of secrecy even if we are not informed.
For the multi-cipher case to be a better choice, all the remaining
ciphers need do is have any strength at all.
One would expect, of course, that the remaining ciphers would
be as strong as they ever were.
And even if we assume that all the ciphers are weak, it is
possible that their composition could still be stronger
than the abject failure of a completely exposed single cipher.

Some examples of the literature:

"Here, in addition to formalizing the problem of chosen­ciphertext
security for multiple encryption, we give simple, efficient, and
generic constructions of multiple encryption schemes secure against
chosen ciphertext attacks (based on any component schemes secure
against such attacks) in the standard model."
--Dodis, Y. and J. Katz. 2005.
Chosen-Ciphertext Security of Multiple Encryption.
Second Theory of Cryptography Conference Proceedings.
188-209.

"We prove cascade of encryption schemes provide tolerance for
indistinguishability under chosen ciphertext attacks, including
a 'weak adaptive' variant."
"Most cryptographic functions do not have an unconditional proof
of security.
The classical method to establish security is by cryptanalysis
i.e. accumulated evidence of failure of experts to find
weaknesses in the function.
However, cryptanalysis is an expensive, time-consuming and
fallible process.
In particular, since a seemingly-minor change in a cryptographic
function may allow an attack which was previously impossible,
cryptanalysis allows only validation of specific functions and
development of engineering principles and attack methodologies
and tools, but does not provide a solid theory for designing
cryptographic functions.
Indeed, it is impossible to predict the rate or impact of future
cryptanalysis efforts; a mechanism which was attacked
unsuccessfully for years may abruptly be broken by a new
attack.
Hence, it is desirable to design systems to be tolerant of
cryptanalysis and vulnerabilities (including known trapdoors)."
"Maurer and Massey claimed that the proof in [EG85] 'holds only
under the uninterestingly restrictive assumption that the enemy
cannot exploit information about the plaintext statistics', but
we disagree.
We extend the proof of [EG85] and show that, as expected
intuitively and in [EG85], keyed cascading provides tolerance
to many confidentiality specifications, not only of block
ciphers but also of other schemes such as public key and
shared key cryptosystems.
Our proof uses a strong notion of security under
indistinguishability test--under plaintext only and non-adaptive
chosen ciphertext attack (CCA1), as well as weak version of
adaptive chosen ciphertext attack (wCCA2).
On the other hand, we note that cascading does not provide
tolerance for adaptive chosen ciphertext attack (CCA2), or if the
length of the output is not a fixed function of the length of the
input."
--Herzberg, A. 2004. On Tolerant Cryptographic Constructions.
Presented in Cryptographer's Track,
RSA Conference 2005.

"In a practical system, a message is often encrypted more than
once by different encryptions, here called multiple encryption,
to enhance its security."
"Intuitively, a multiple encryption should remain 'secure',
whenever there is one component cipher unbreakable in it.
In NESSIE’s latest Portfolio of recommended cryptographic primitives
(Feb. 2003), it is suggested to use multiple encryption with
component ciphers based on different assumptions to acquire long
term security.
However, in this paper we show this needs careful discussion."
"We give the first formal model regarding public key multiple
encryption."
--Zhang, R., G. Hanaoka, J. Shikata and H. Imai. 2004.
On the Security of Multiple Encryption or
CCA-security+CCA-security=CCA-security?
2004 International Workshop on Practice and Theory in
Public Key Cryptography.

"We conjecture that operation modes should be designed around
an underlying cryptosystem without any attempt to use intermediate
data as feedback, or to mix the feedback into an intermediate
round."
--Biham, E. 1994.
Cryptanalysis of Multiple Modes of Operation.
Journal of Cryptology. 11(1):45-58.

"Double encryption has been suggested to strengthen the
Federal Data Encryption Standard (DES). A recent proposal suggests
that using two 56-bit keys but enciphering 3 times (encrypt with a
first key, decrypt with a second key, then encrypt with the first
key again) increases security over simple double encryption.
This paper shows that although either technique significantly
improves security over single encryption, the new technique does
not significantly increase security over simple double encryption."
--Merkle, R. and M. Hellman. 1981.
On the Security of Multiple Encryption.
Communications of the ACM. 24(7):465-467.

Advantages of Multiple Encryption

Multiple encryption can increase
keyspace (as seen in
Triple DES).
But modern ciphers generally have enough keyspace, so added
keyspace is not usually the advantage sought from multiple
encryption.

Multiple encryption reduces the consequences in the case that
our favorite cipher is already
broken and is continuously exposing
our data without our knowledge.
(See the comments on the John Walker spy ring in:
security through obscurity.)
When a cipher is broken (something we will not know), the use of
other ciphers may represent the only security in the system.
Since we cannot
scientifically prove that any particular cipher is strong,
the question is not whether subsequent ciphers are
strong, but instead, what would make us
believe that any particular cipher is so
strong as to need no added protection.

Multiple encryption also protects each of the component ciphers
from
known plaintext attack.
Since
known plaintext completely exposes
the ciphering transformation, it enables a wide range of attacks,
and is likely to make almost any attack easier.
Preventing known plaintext attacks has at least the potential to
even make weak ciphers strong in practice.

With multiple encryption, the later ciphers work on the
randomized "plaintext" produced as ciphertext by the earlier cipher.
It can be extremely difficult to attack a cipher which only protects
apparently random "plaintext," because it is necessary to at least
find some structure in the plaintext to know that one has solved
the cipher.
See:
Ideal Secrecy and
unicity distance.

Cipher Standardization

Most of the protocols used in modern communications are
standardized.
As a consequence, most people do not question the need for
standardization in ciphers.
But in this way ciphers are once again very different from the
things we know so well:
The inherent purpose of ciphers is to prevent
interconnection to almost everyone (unless they have the right
key).

Obviously, it is necessary to describe a cipher clearly and
completely if it is to be properly implemented by different people.
But standardizing on a single cipher seems more likely to help the
opponent than the user (see
NSA).
With a single cipher, an opponent can concentrate resources
on one target, and that target also has the most value since it
protects most data. The result is vastly increased user
risk.

The alternative to having a standard cipher is to have
a standard cipher interface, and then select a desired
cipher by textual name from a continually-increasing list
of ciphers.
In whatever way we now transfer keys, we could also transfer the
name of the desired cipher, or even the actual cipher itself.
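
As a minimal sketch of that idea in Python (the names here are
hypothetical, and the registered "cipher" is a toy stand-in, not a
real design), a standard interface might map textual names to
implementations:

    # Hypothetical registry: textual cipher names select
    # implementations, and new ciphers simply join the list.
    CIPHERS = {}

    def register(name):
        def wrap(cls):
            CIPHERS[name] = cls
            return cls
        return wrap

    @register("xor-demo")            # a toy stand-in, not a real cipher
    class XorDemo:
        def __init__(self, key):
            self.key = key
        def encrypt(self, data):
            return bytes(d ^ self.key[i % len(self.key)]
                         for i, d in enumerate(data))

    # The cipher name transferred with the key selects the implementation.
    ct = CIPHERS["xor-demo"](b"key").encrypt(b"hello")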

Risks of Multiple Encryption

Multiple encryption can be dangerous if a single cipher is
used with the same key each time. Some ciphers are
involutions which both encipher and
decipher with the same process; these ciphers will
decipher a message if it is
enciphered a second time under the same key. This is
typical of classic additive synchronous stream ciphers, as it
avoids the need to have separate encipher and decipher operations.
But it also can occur with block ciphers operated in
stream-cipher-like modes such as
OFB, for exactly the same reason.
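
A minimal sketch of the risk, using a toy XOR keystream as the
involution (any additive stream cipher combiner behaves the same way):

    # One operation both enciphers and deciphers, so a second pass
    # under the same key undoes the first.
    def xor_stream(data, key):
        return bytes(d ^ key[i % len(key)] for i, d in enumerate(data))

    pt = b"secret data"
    key = b"same key"

    once  = xor_stream(pt, key)      # enciphered
    twice = xor_stream(once, key)    # "enciphered again"
    assert twice == pt               # the second pass deciphered it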

It is true that multiple encryption cannot be
proven to improve security over having just
a single cipher.
That seems hardly surprising, however, since no single cipher can
be proven to improve security over having no cipher at all.
Indeed, using a broken cipher is far worse than no cipher,
because then users will be misled into not taking even ordinary
precautions.
And in real cryptography, users will not know when their cipher
has been broken.

For more on this topic, see:
superencryption; and the large
sci.crypt discussions:

Located about halfway between Washington, D.C. and Baltimore, MD,
just off Rt. 295 (the Baltimore-Washington Parkway), the Museum is
about a half-hour out of Washington:
For example, take I-95 N., to Rt. 32 E.
Then, just past (by under 1/10th of a mile) the cloverleaf
intersection with Rt. 295, take the next exit (Canine Rd.), and
follow the signs.
Or take the Baltimore-Washington Parkway N., and exit just past
the I-95 cloverleaf.
The Museum is clearly marked on the Maryland page of the Mapsco
or Mapquest Road Atlas 2005.

CAUTION: Due to new construction, the wooded terrain, and the
near proximity of Rt. 295, the sign indicating the Museum exit on
Rt. 32 is much too close to the turn-off.
So, after the Rt. 295 cloverleaf, get into the right lane, prepared
for a turn-off under a tenth of a mile later.

In
electronics, the confusing idea that the
resistance quality might take on a
negative value, and thus generate power instead of dissipating it.
As far as is known, that does not occur in reality.
However, some feedback circuits can be analyzed by considering
energy from another part of the circuit as being created in a
local negative resistance.

What does occur in reality is negative incremental
or differential or dynamic resistance,
where an increase in the
voltage across an active device produces
a decrease in
current the device allows.
That is the reverse of the normal
resistance effect, and so is a
"negative-like" region in the nevertheless overall positive
effective resistance of the device.
Some things which may have some amount of negative dynamic
resistance include:

Any gas-discharge tube (e.g., neon bulbs, fluorescent lamps,
etc.), which is why a "ballast" resistor or inductor is needed
for stable use;

A unijunction transistor;

Some NPN transistors under reverse bias with base open
(collector negative, emitter positive) in breakdown;

A tunnel diode;

A "Lambda Diode" (a P-JFET and N-JFET with their source
terminals joined, and each gate connected to the drain of
the opposite FET);

The input impedance of a bipolar emitter-follower with a
capacitive load (the input admittance is a capacitance shunted
by a negative resistance which increases with frequency and so
tends toward very high frequency oscillation; in the transistor
era it was common to include a 10k base resistor in
emitter-followers to avoid instability);

A switching power supply, as seen from the power line side;

Some amplifier circuits with feedback;

Any "two-terminal" oscillator circuit;

Circuits specifically designed to vary resistance
inversely to some signal (e.g., see National Semiconductor
AN-263, fig 5).

In
cryptography, noise is often
deliberately produced as a source of
really random values.
Such noise is normally the result of a collected multitude of
independent tiny pulses, which is a
white noise.
We say that white noise contains all
frequencies equally, which actually
stretches both the meaning of frequency as a
correlation over time,
and the noise signal as a
stationary source.
In practice, the presence of low frequency components implies a
time correlation which we would prefer to avoid, but which
may be inherent in noise.

I have attacked noise correlation in two ways:

By rolling off the amplifier response below 1kHz with an
RC filter. To the extent that noise
energy is equally distributed across frequency, this removes
only 1/20th of the noise signal.

By subtracting each previous sample from the current one.
This provides a major improvement as measured by autocorrelation
testing.
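
Here is a minimal simulation of the second technique, in Python,
with a contrived drifting-noise source standing in for real hardware
samples:

    import random

    def lag1_autocorr(xs):
        # Sample autocorrelation at lag 1.
        n = len(xs)
        m = sum(xs) / n
        num = sum((xs[i] - m) * (xs[i + 1] - m) for i in range(n - 1))
        den = sum((x - m) ** 2 for x in xs)
        return num / den

    # Simulated noise: white noise riding on a slow drift, standing in
    # for the low-frequency components discussed above.
    random.seed(1)
    drift, samples = 0.0, []
    for _ in range(10000):
        drift = 0.95 * drift + random.gauss(0, 1)
        samples.append(drift + random.gauss(0, 1))

    diffs = [samples[i] - samples[i - 1] for i in range(1, len(samples))]

    print("raw:         %.2f" % lag1_autocorr(samples))   # near +0.87
    print("differenced: %.2f" % lag1_autocorr(diffs))     # near -0.34

Note that differencing trades the large positive correlation of the
drift for a smaller negative correlation of its own, which is the
usual cost of a first-difference filter.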

Originally, a list of transformations from names to
symbols or numbers for diplomatic communications. Later, typically
a list of transformations from names,
polygraphic syllables, and
monographic letters, to numbers.
Usually the monographic transformations had multiple or
homophonic alternatives for
frequently-used letters. Generally smaller than a
codebook, due to the use of the
syllables instead of a comprehensive list of phrases.
A sort of early manual
cipher with some characteristics of a
code, that operated like a
codebook.

In general, a nonce is at least potentially dangerous, in that
it may represent a hidden channel.
In most nonce use, any random data value is as good as another,
and, indeed, that is usually the point.
However, by selecting particular values, nonce data could be
subverted and used to convey information about the key or plaintext.
Since any value should be as good as any other, the user and
equipment would never know about the subversion.
Of course, the same risk occurs in
message keys, and that does not
mean we do not use message keys or other nonces.

The gaussian distribution.
The usual "bell shaped" distribution discovered in 1733 by
Abraham de Moivre, and further developed by
Carl Friedrich Gauss 1777-1855.
Called "normal" because
it is similar to many real-world distributions. Note that
real-world distributions can be similar to normal, and yet
still differ from it in serious systematic ways. Also see the
normal computation page.

"The" normal distribution is in fact a family of distributions,
as parameterized by
mean and
standard deviation values. By computing
the sample mean and standard deviation, we can "normalize" the
whole family into a single curve. A value from any normal-like
distribution can be normalized by subtracting the mean then
dividing by the standard deviation; the result can be used to
look up probabilities in standard normal tables. All of which
of course assumes that the underlying distribution is in fact
normal, which may or may not be the case.
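
A minimal sketch of that normalization, using the standard normal
CDF in place of printed tables (the sample mean and standard
deviation here are made-up stand-ins):

    import math

    def normal_cdf(z):
        # P(Z <= z) for the standard normal distribution.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    mean, sd = 100.0, 15.0    # hypothetical sample statistics
    x = 130.0                 # value to normalize

    z = (x - mean) / sd       # subtract the mean, divide by the sd
    print("z =", z)                            # 2.0
    print("P(X <= x) = %.4f" % normal_cdf(z))  # about 0.9772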

The National Security Agency, the U.S. government bureaucracy
charged with "making and breaking codes."
Organized under the U.S. Defense Department, NSA designs
cipher systems for the State
Department and Army/Navy/Air Force, and breaks
ciphers as needed for the CIA, FBI
and other U.S. agencies. (Also see
National Cryptologic Museum.)

The NSA is the frequent topic of
cryptographic speculation in the sense
that they represent
the opponent, bureaucratized.
They have huge resources, massive experience, internal research and
motivated teams of attackers.
But since NSA is a secret organization, most of us cannot know
what NSA can do, and there is little fact beyond mere speculation.
But it is curious that various convenient conditions do exist,
seemingly by coincidence, which would aid real
cryptanalysis.

Standardization

One situation convenient for NSA is that some particular cipher
designs have been standardized. (This has occurred through
NIST, supposedly with the help of NSA.)
Although
cipher standardization can be a
legal requirement only for government use, in practice the standards
are adopted by society at large.
Cipher standardization is interesting because an organization which
attacks ciphers presumably is aided by having few ciphers to attack,
since that allows attack efforts to be concentrated on few targets.

When information is at
risk, there is nothing odd about having
an approved cipher.
Normally, managers look at the options and make a decision.
But NSA has secret ciphers for use by government departments
and the military.
They also change those ciphers far more frequently than the
standardized designs.

Would a government agency risk tarnishing its reputation by
knowingly approving a flawed cipher?
Well, it is NIST, not NSA, that approves standard public ciphers.
And if NSA neither designed nor approved those ciphers, exactly how
could a flaw be considered a risk to them?
Indeed, finding a flaw in a public design could expose the
backwardness of academic development compared to the abilities
of an organization which normally cannot discuss what it can do.
That is not only not a risk, it could be the desired outcome.

Belief in Strength

Another situation which is convenient for NSA is that users
are frequently encouraged to
believe that their cipher has been
proven strong by government acceptance.
That is a reason to do nothing more, since what has been done is
already good enough.
Can we seriously imagine that NSA has a duty to tell us if they
know that our standard cipher is weak? (That would expose their
capabilities.)

Clearly, when only one cipher is used, and that cipher fails,
all secrecy is lost. Thus, any single cipher is at risk of being a
single point of failure.
But, since
risk analysis is a well known tool
in other fields, it does seem odd that cryptography users are
continually using a single cipher with no redundancy at all. The
multiple encryption alternative
simply is not used.
The current situation is incredibly risky for users, yet oddly
convenient for NSA.

The OTP

The OTP or
one time pad is commonly held up as
the one example of an unbreakable cipher.
Yet NSA has clearly described breaking the
VENONA
cipher, which used an OTP, during the Cold War.
It is argued that VENONA was "poorly used," but if a user has
no way to guarantee a cipher being "well used," there is no
reason for a user to consider an OTP strong at all.

It does seem convenient for NSA that a potentially breakable
cipher continues to be described by crypto authorities as
absolutely "unbreakable."

Something without meaning or worth.
A symbol or character with no information content, due to
either value or position. Also see
code.

Sometimes null characters are used to assure serial-line
synchronization between data
blocks or packets (see the
ASCII character "NUL").
Sometimes null characters are used to provide a synchronized
real-time delay when a transmitter has no data to send; this is
sometimes called an "idle sequence."
Similarly,
block padding characters are sometimes
considered "nulls."

In
statistics, in general, when we
randomly sample
innocuous data and apply a statistic computation, we get some
statistic value.
When we do that repeatedly, the statistic values accumulate in
some particular
distribution which we can learn
to recognize.
So, for any particular statistic, a null distribution is what
we find when sampling innocuous data.
This is the "nothing unusual found" situation.

The
p-value computation for a particular
statistic typically tells us the probability of getting any
particular statistic value or less (or more) in the null
distribution.
Then, if we repeatedly find very unusual statistic values,
we can conclude either that our sampling has been very lucky, or
that the statistic is not reproducing the null distribution.
That would mean that we were not sampling innocuous data, and
so could reject the
null hypothesis.
This is the "something unusual found" situation.
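
A minimal simulation of those ideas: build an empirical null
distribution for a simple statistic by repeatedly sampling innocuous
data, then estimate the p-value of one observed result (both the
statistic and the observed value are illustrative choices):

    import random

    random.seed(2)

    def statistic():
        # Count of 1s in 100 fair coin flips: innocuous data.
        return sum(random.getrandbits(1) for _ in range(100))

    null = [statistic() for _ in range(20000)]

    observed = 63    # a hypothetical result from "real" data
    p = sum(1 for v in null if v >= observed) / len(null)
    print("P(statistic >= %d | null) = %.4f" % (observed, p))
    # Near 0.006: repeatedly seeing results this extreme would be
    # the "something unusual found" situation.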

In
statistics, the status quo, or
what we accept as true absent evidence to the contrary.
Apparently introduced by Fisher in 1935.
What we
conclude from a statistical experiment
when results seem due to chance alone.
The particular statement or model or
hypothesis H0
which is accepted unless a test
statistic finds "something unusual."
In contrast to the
alternative hypothesis or
research hypothesis H1.

Normally, the
null hypothesis is just the statistical
conclusion drawn when the pattern being tested for is not found.
Normally, the null hypothesis cannot be proven or established by
experiment, but can only be disproven, and statistics can only do
that with some probability of error which is called the
significance.

Statistical Experiments

A statistical experiment typically uses
random sampling or
random values to probe a
universe under test.
Those samples are then processed by a test
statistic and accumulated into a
distribution.
Good statistical tests are intended to produce extreme statistic
values upon finding the tested-for patterns.

Sometimes it is thought that extreme statistic values are an
indication that the tested-for pattern is present.
Alas, reality is not that simple.
Random sampling generally can produce any possible statistic
value even when no pattern is present.
There are no statistic values which only occur when the tested-for
pattern is detected.
However, some statistic values are extreme and only occur
rarely when there is no underlying pattern.
To distinguish patterns from non-patterns, it is necessary to know
how often a particular statistic result value would occur with
unpatterned data.

The collection of statistic values we find or expect from data
having no pattern is the
null distribution.
Typically this distribution will have the shape of a hill or bell,
showing that intermediate statistic values are frequent while
extreme statistic values are rare.
To know just how rare the extreme values are, other statistic
computations "flatten" the distribution by converting statistic
values into probabilities or
p-values.
In this way each statistic result value can be associated with the
probability of that value occurring when no pattern is present.

The probability that an extreme statistic value will occur when
no pattern is present is also the probability of a
Type I error which is usually a
"false positive."
Type I errors are a consequence of the
randomness required by sampling, or a
consequence of random values, and cannot be eliminated.
Normally, random values are expected to produce sequences without
pattern.
Again, reality is not like that.
Instead, over huge numbers of sequences, random values must produce
every possible sequence, including every possible "pattern."
Usually the statistical test is looking for a particular class of
pattern which may not correspond to what we expect.

[When every sequence is possible, the probability of finding a
"pattern" depends strongly on what we interpret as a pattern.
It might be possible to get some quantification of pattern-ness
with a measure like
Kolmogorov-Chaitin complexity
(the length of the shortest program to produce the sequence).
But K-C complexity testing may have its own bias, and in any case
there is no algorithm to find the shortest program.]

Any particular statistic value can be used to separate "probably
found" from "probably not found."
Typically, scientists will use a "significance" of 95 percent or
99 percent, but that is not the probability that the
hoped-for "something unusual" signal has been found.
Instead, the complement of the significance (usually 5 percent or
1 percent) generally is the probability of a statistic value that
high or higher occurring from null data having no patterns.
With a 95 percent significance, null data will produce results that
falsely reject the null hypothesis in 5 trials out of 100.
By increasing the significance, the probability that null data will
produce an extreme result value is decreased, but never reaches zero.
When multiple trials show that the statistic measurements do not
follow the null distribution, "something unusual" has been found,
and the null hypothesis is rejected.

Randomness testing is a
special case because there we hope to find the null distribution.
Success in randomness testing generally means being forced to
accept the null hypothesis, which is opposite to most statistical
experiment discussions.
Statistical test programs in the classic mold seem to say that
some statistic extremes mean "a pattern is probably found,"
which would be "bad" for a random generator.
Again, that is not how reality works.
In most cases, a
random number generator
(RNG) is expected to produce the null
distribution.
Extreme statistic values are not only expected, they are absolutely
required.
An RNG which produces only "good" statistical results is bad.

The Hypothesis

If we check random data which has no detectable pattern, we
might expect the null hypothesis to be always rejected, but that
is not what happens.
In tests at a 95 percent significance level, the null
hypothesis should be accepted in 95 trials out of 100.
However, in 5 trials out of 100, "something unusual" will be
"found" and the null hypothesis rejected or discarded.
5 times out of 100, the logically contrary
alternative hypothesis or
research hypothesis H1 will be accepted or supported,
even though nothing unusual is present.
This is the normal consequence of random values or
random sampling and can be
especially disturbing when the false positives occur early.
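
A minimal simulation of that false-positive behavior, testing truly
unbiased coin flips at a two-sided 5 percent level:

    import random

    random.seed(3)
    rejections = 0
    for _ in range(1000):
        heads = sum(random.getrandbits(1) for _ in range(1000))
        z = (heads - 500) / (0.5 * 1000 ** 0.5)   # normal approximation
        if abs(z) > 1.96:         # 5 percent two-sided critical value
            rejections += 1

    print("rejected %d of 1000 trials" % rejections)   # about 50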

Normally, the
alternative hypothesis or
research hypothesis H1 includes the particular signal we test for
in the randomly-sampled data, but also includes any result
other than that specified by the null hypothesis.
It is, therefore, more like "something unusual found" than
evidence of the particular result we seek.
When the tested-for pattern seems to have been found, the null
hypothesis is rejected, although the result could be due to a
flawed experiment, or even mere chance (see
significance).
This range of things that may cause rejection is a motive for
also running control trials which do not have the looked-for
signal. Also see:
randomness testing and
scientific method.

A common approach is to formulate the null hypothesis to
expect no effect, as in: "this drug has no effect."
Then, finding something unexpected causes the null hypothesis
to be rejected, with the intended meaning being that the drug
"has some effect."
However, many statistical tests (such as
goodness-of-fit tests) can only
indicate whether a distribution matches what we expect, or not.
When the expectation is the known
null distribution, then what
we expect is nothing, which makes the "unusual" stand out.
But in that case, even a poorly-conducted or fundamentally flawed
experiment could produce a "something unusual found" result.
Simply finding something unusual in a statistical distribution
does not imply the presence of a particular quality.
Instead of being able to confirm a model in quantitative detail,
this formulation may react to testing error as a detectable signal.

Even in the best possible situation,
random sampling will produce a range or
distribution of test statistic values.
Often, even the worst possible statistic value can be produced by
an unlucky sampling of the best possible data.
It is thus important to compare the distribution of the
statistic values, instead of relying on a particular result.
It is also important to know the
null distribution so we can make
the comparison.
If we find a different distribution of statistic values, that
will be evidence supporting the alternative or research hypothesis
H1.

When testing data which has no underlying pattern, if we
collect enough statistic values, we should see them occur in the
null distribution for that particular statistic.
So if we call the upper 5 percent of the distribution "failure"
(this is a common scientific
significance level) we not only
expect but in fact require such "failure" to occur
about 1 time in 20.
If it does not, we will in fact have detected something unusual
in a larger sense, something which might even indicate problems
in the experimental design.

If we have only a small number of samples, and do not run repeated
trials, a relatively few chance events can produce an improbable
statistic value even in the absence of a real pattern.
That might cause us to reject a valid null hypothesis, and so commit a
Type I error.

When we see "success" in a very common distribution, we can
expect that success will be very common.
A system does not have to be all that complex to produce results
which just seem to have no pattern, and when no pattern is
detected, we seem to have the null distribution.
Finding the null distribution is not evidence of a lack of
pattern, but merely the failure to find a pattern.
And since that pattern may exist only in part, even the best of
tests may give only weak indications which may be masked by
sampling, thus leading to a
Type II error.
To avoid that we can run many trials, of which only a few should
mask any particular indication.
Of course, a weak indication may be difficult to distinguish from
sampling variations anyway, unless larger trials are used.
But there would seem to be no limit to the size of trials one
might use.

(Also "Ockham".)
Given some known facts, and a desire to find some
relationship between those facts,
inductive reasoning and other
guesses can develop any number of different
models which predict
the exact same facts.
However, absent further testing and new facts, the simplest model
is preferred because it should be easier to use.

Note that the simplest model is not necessarily right:
Many simple models are eventually replaced by more complex models.
Nor does Science expect practitioners to defer to a particular
model just because it has been
published:
The issue is the quality of the
argument in the publication,
and not the simple fact of publication itself.

A recommendation for scientists might be:
"When you have multiple theories which predict exactly the same
known facts, assume the simpler theory until it clearly does not
apply."
That tells us to test the simple model first, and to choose a
more complex model if the simple one is insufficient. Also see:
scientific method.

Base 8: The numerical representation in which each digit has an
alphabet of eight symbols, generally
0 through 7.

Somewhat easier to learn than
hexadecimal, since no new numeric
symbols are needed, but octal can only represent three
bits at a
time. This generally means that the leading digit will not take all
values, and that means that the representation of the top part of
two concatenated values will differ from its representation alone,
which can be confusing. Also see:
binary and
decimal.
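
A minimal demonstration of that confusion (the byte values are
arbitrary):

    a, b = 0xAB, 0xCD             # two arbitrary byte values

    print(oct(a))                 # 0o253    : the top byte alone
    print(oct((a << 8) | b))      # 0o125715 : "253" no longer appears
    print(hex((a << 8) | b))      # 0xabcd   : hex keeps both bytes visible

    # Octal digits hold 3 bits, so appending an 8-bit byte shifts the
    # digit boundaries; hexadecimal digits hold 4 bits and stay aligned.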

Stories are almost universal in human society.
Most stories are about someone doing something, and how that is
going, or how it turned out.
To a large extent, stories and gossip are how we learn to interact
with the world around us.
Because story-listening is so common, humans may be genetically
oriented toward stories.
Perhaps, before the invention of writing, valuable past experiences
lived on in stories, and those who listened tended to live longer
and/or better, and reproduce more.

But even if evolution has put gossip in our genes, it does not
seem to have worried very much about the distinction between
fantasy and reality.
I suppose that would be a lot to ask of mere evolution.
But, even in modern technology, many plausible-sounding stories are
just not right, yet people accept them anyway.

Normally, Science handles this by testing the gossip-model
against reality.
But cryptography has precious little reality to test against.
Accordingly, we see the accumulation of "old wives' tales" which
are at best
rules of thumb and at worst
flat-out wrong.
Yet these stories apparently are so ingrained in the myth of the
field that mere rationality is insufficient to stop their
progression (see
cognitive dissonance).

Examples include:

Modern
ciphers are proven secure.
(Alas, no. Virtually no
proof of cipher
strength survives the transition to
practice in any useful way.
Cryptography cannot measure
cipher strength unless the cipher is broken, in which case we
will not use it.
Because we cannot measure cipher strength, we cannot
trust the strength of any cipher.)

Most cryptographers
believe that modern ciphers are
"strong enough."
(If so, they overstep.
Cryptography stands virtually alone in the modern world as a
product whose goals cannot be
quality assured. Cipher
strength occurs only in the context of
opponents who work in secret.
And even the opponents only know what they can do; they do
not know if others can do better.
There can be no expertise on cipher strength.)

Surely
AES is secure.
(There is no such knowledge. Just because academics cannot
break AES does not mean that
nobody else can.
Ciphers are broken in secret, and real
opponents are not going to announce
when they succeed.
AES could be broken now, and we would not know.
In that case, we would continue to use AES while our secret
information was exposed and exploited.
Presumably that would continue until somebody finally
published a successful attack. Better approaches include
multiple encryption and
frequent cipher changes or keyed selection as in Shannon's
Algebra of Secrecy Systems.)

But it would take a lot of effort to break
AES.
(We cannot even count on that. Academic vetting and
cryptanalysis do not develop a
guaranteed minimum strength value.
Nothing we know would prevent a new insight from leading
to an efficient break. And then that attack might be
implemented in a computer program which almost anybody could
run.)

We need only one good cipher.
(Unfortunately, there is no way to know whether we have a "good"
one or not. There is, and probably can be, no
proof of cipher strength in practice.
That means any particular cipher may fail, and if we only have
that one, we
risk everything in a
single point of failure.
To use only one cipher is to risk everything on something we
know from the outset cannot be
trusted.
An alternative is to have and use multiple ciphers and
multiple encryption.
Also see Shannon's Algebra of Secrecy Systems and
NSA.)

If a cipher has been around for a while without any
known
attack, we can
trust it.
(This is the basis for much of current cryptographic
dogma but is simply invalid
logic: Our
opponents may have an effective
attack about which they have not told us.
Not telling us is understandable, since if they did we would
switch to some other cipher, which could ruin their success.
So just because we do not know of an attack does not mean
that no effective attack exists.
And if an effective attack can exist without us knowing,
surely it is wrong to have confidence in any claim that an
effective attack does not exist. Indeed, such
belief is exactly what our opponents
would plant and encourage if they did have an effective
attack.)

We can trust accomplished academics to
vet our ciphers.
(First of all, most academic vetting is volunteer effort,
and we have no real idea how much time has been spent nor
how comprehensive the investigation was.
But the real problem is that there simply is, and can be,
no academic expertise with respect to what our opponents
can do, and that is what we need.)

We can trust cryptographic
proofs because, ultimately, cryptography
is mathematics.
(Cryptography and mathematics are different fields. While
math can be content to "protect"
theoretical data, cryptography
has to deal with real data, real machines and real opponents.
At its best,
mathematics can give
us a convenient
model of reality. At its worst,
math can deceive with results from models which just do not apply
to real use.
Currently there seems to be a widespread lack of desire to improve
models which do not predict correct results in
practice. See:
proof and
one time pad.)

A one time pad is proven unbreakable.
(Only the theoretical OTP carries a proof, and that proof rests
on assumptions which cannot be verified in practice:

There is no test that will certify a given sequence as
unpredictable. Even though a sequence might be
unpredictable, we cannot prove that, and absolute proof
about the sequence is required to apply the OTP proof.

One or both users simply cannot know that the system has
been used properly, and that the pad has been produced properly.
A proof does not help someone who cannot control all of the
assumptions made in that proof.

In practice, an OTP may be secure or not, just like other ciphers.)

Known plaintext exposure
does not matter anymore.
(Exposing plaintext which can be matched to ciphertext can be
fatal to some ciphers.
And although modern ciphers are claimed to not have that
weakness, that is still only a claim, not
proven fact.
Indeed, some of the best known
attacks, like
Linear Cryptanalysis,
generally require known plaintext, while others, like
Differential Cryptanalysis, generally
require the even more restrictive
defined plaintext conditions.
Preventing those conditions also prevents those attacks.)

Cryptanalysis is how we know the strength of a cipher.
(Conventional
cryptanalysis can only tell us of
the strength of a cipher when that cipher has been
broken, and then we will not use it.
Cryptanalysis tells us nothing of the strength of ciphers which
are not broken, and those are the only ones we use. We simply
cannot extrapolate academic lack-of-success to our opponents.)

Cryptographers agree that current ciphers are less of a
security
risk than the systems or people around
them.
(Cryptographers probably "agree" about many wrong things.
Obviously there is no way to know the probability of an opponent
breaking our cipher, because our opponents do not tell us when
they succeed, so we cannot develop probabilities.
And since that probability cannot be known, it also cannot be
compared to other unknown probabilities such as the risks
of hardware or people failure.)

The best way to protect information is to "put it all in
one basket," as in one much-reviewed cipher.
(The problem is that we cannot "watch that basket."
That is impossible due to the nature of the situation.
We cannot expect to know when the cipher fails, because
that happens in private, and the opponents will not tell us,
for if they did, we would change the cipher.
So if we only use one cipher, and that cipher fails, all
of our information is exposed, and our new information will
continue to be exposed until we change the cipher.
A better approach is to use
multiple encryption, and to
change ciphers frequently so that only a small amount of
information is protected by any particular cipher.)

By using an appropriate
threat model, we do not need
to know exactly how strong our ciphers are.
(Unfortunately, a threat model is not very useful for ciphers,
because we typically want to protect all of the information we
have, from anyone, under any possible attack, forever. Loosening
the constraints is not very helpful either, because cryptography
cannot correctly model cipher strength.)

When a cipher has a known weakness already, nobody should
care if we add a little more.
(This is one of the unstated arguments in common simplification
of the
BB&S design. But I do care!
Most modern ciphers do have an unfixable "hole" in the sense that
an opponent may choose the correct key by accident.
But that does not mean that it is OK to include other
holes which are easily fixed, even if they are less likely.)

Public key ciphers are almost proven secure.
(Public-key ciphers are vastly more
risky than most people realize, because they support
man-in-the-middle (MITM) attacks.
MITM attacks are worrisome because they do not require breaking
any cipher. MITM attacks can be effective even if the ciphers
have been
proven secure.
That places much of the burden for security on a complex,
originally expensive, and continually costly certification
infrastructure or
PKI, which is often simply ignored.
In contrast,
secret key ciphers do not support
MITM attacks.)

Secret key ciphers need too many
keys for general use.
(Secret keys correspond well to the metal keys we all use
and love, so if we can imagine handling the situation with metal
keys, we can do the same with a secret key cipher.
For example, if we want a particular group to use the same door,
we give them all copies of the same metal key.
If we want to communicate with a company, we hardly need a
separate cipher key for every person there.
And local secure storage does not even need key transport.)

We know how much
unpredictable information we get from a
really-random generator by just
calculating the
entropy.
(The Shannon entropy computation measures
the "information rate" or efficiency of a coding structure, not
"surprise" or unpredictability.
The computation produces the exact same result whether the measured
values are predictable or unpredictable (see the sketch after this
list).
But if we first certify that the information really is coming from
a quantum source, then Shannon entropy can tell us how much
information we have in the typical uneven distribution.)

If we ever need to, we could use the effects of chaotic
airflow in disk drives as an unpredictable noise source.
(A PC system is surprisingly more complex than one might expect.
Simply sampling a complex system can produce values which pass
statistical tests. But just passing such tests cannot certify a
really random source. Some problems
include:

It has not been demonstrated that the expected small variations
can be detected in the noisy but
deterministic PC environment
of memory refresh, hardware interrupts and multitasking context
changes.

It has not been demonstrated that variations in disk rotation
occur which can be detected on an ordinary PC.

It has not been demonstrated that any such rotational variations
are due to airflow, let alone that the airflow is chaotic.

And since
chaos theory started as the unexpected
ability to find pattern in dripping water and other things thought
essentially random, even if disk variations were chaotic
they could be largely
predictable anyway.)

It was in a
paper in the proceedings
from a crypto meeting, so our interpretation must be true.
(Oh, if only things were that easy.)

Surely we can use the "laws of physics" to get
provably random values.
(Science does not "prove" physical "laws," but instead
develops
models which "predict"
experimental results.
Current models are occasionally replaced by newer models which
perform better.
Clearly, if a model can be replaced it could not have been
proven correct originally.
In the end, Science does not and can not provide absolute
certainty in harvesting unpredictable randomness from any
physical source.
There are physical sources which we assume to be
random, much like we assume some ciphers are strong,
and we may be wrong in either case.)

The
CBCIV can be sent in the clear.
(Well, if we simply assume that any changes that occur
during message transport will be detected by a message-level
MAC, sending the IV in the open seems fine.
But if the
authentication is forgotten or
broken, an exposed IV can lead to a deceptive
MITM attack which would not
otherwise be available.
For best confidence, the CBC IV should be sent encrypted.)

Ciphertext cannot be
compressed.
(If the ciphertext has been encoded for transmission in
text form, as is common for email delivery, it almost certainly
can be compressed into binary form.)
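
Two minimal sketches for items in the list above. First, the entropy
computation over byte frequencies gives essentially the same answer
for a perfectly predictable counter sequence as for random bytes
(the data sizes are arbitrary):

    import math, os

    def byte_entropy(data):
        # Shannon entropy in bits per byte, from byte frequencies.
        counts = [0] * 256
        for v in data:
            counts[v] += 1
        n = len(data)
        return -sum(c / n * math.log2(c / n) for c in counts if c)

    predictable = bytes(i % 256 for i in range(65536))  # 0,1,...,255,0,...
    unpredictable = os.urandom(65536)

    print("counter: %.3f bits/byte" % byte_entropy(predictable))    # 8.000
    print("random:  %.3f bits/byte" % byte_entropy(unpredictable))  # ~7.997

Second, the compression claim: random bytes are incompressible, but
their text (base64) encoding compresses substantially:

    import base64, os, zlib

    raw = os.urandom(30000)
    txt = base64.b64encode(raw)          # 4 text bytes per 3 raw bytes

    print(len(txt))                      # 40000
    print(len(zlib.compress(txt, 9)))    # roughly 31000: about 25% smaller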

Ideally, "one-sided tests" are statistic computations sensitive
to variations on only one side of the reference.
The two "sides" are not the two ends of a statistic distribution,
but instead are the two directions that sampled values may differ
from the reference (i.e., above and below).

When comparing distributions, low p-values almost always mean that
the sampled distribution is unusually close to the reference.

On the other hand, the meaning of high p-values depends
on the test.
Some "one-sided" tests may concentrate on sampled values
above the reference distribution, whereas different
"one sided" tests may be concerned with sampled values
below the reference.
If we want to expose deviations both above and below
the reference, we can use two appropriate "one-sided"
tests, or a
two-sided test intended to expose
both differences.

In
statistics, a hypothesis evaluation
with a rejection region on only one end of the null
distribution.
The "test" part of this is the critical value which marks the
start of the rejection region, thus becoming the measure of the
hypothesis.
Sometimes called
one-sided. In contrast to
two-tailed test. Also see
null hypothesis.

Fundamentally a way to interpret a statistical result.
Any statistic can be evaluated at both tails of its
distribution, because no distribution has just one tail.
The question is not whether a test distribution has two
"tails," but instead what the two tails mean.

When comparing distributions, finding repeated
p-values near 0.0 generally means that
the distributions seem too similar, which could indicate some sort
of problem with the experiment.

On the other hand, the meaning of repeated p-values near 1.0
depends on the test. Some
one-sided tests may concentrate on
sampled values above the reference distribution, whereas
different "one sided" tests may be concerned with sampled values
below the reference.
If we want to expose deviations both above and below
the reference, we can use two appropriate "one-sided"
tests, or a
two-sided test intended to expose
differences in both directions.

Are "One-Tailed" Tests Inappropriate?

Some texts argue that one-tailed tests are almost always
inappropriate, because they start out assuming something
that statistics can check, namely that the statistic exposes the
only important quality.
If that assumption is wrong, the results cannot be trusted.

There is also an issue that the
significance level is confusingly
different (about twice the size) in one-tailed tests than it is in
two-tailed tests, since two-tailed tests accumulate rejection from
both ends of the
null distribution.

However, sometimes one-tailed tests seem clearly more
appropriate than the alternative, for example:

In statistical quality control, a certain maximum
defect-level may be guaranteed by contract.
When incoming lots are sampled, abnormally low defect rates
(at the other end of the distribution) are just not an issue.

Ultra-low bacteria levels in drinking water are also not an
issue.

If we are looking for an improved medical treatment, the
null hypothesis might be
that the experimental treatment does no better than some
known cure rate.
If the null hypothesis cannot be rejected, we may not care
if the experimental treatment is extraordinarily worse.
(Of course, if it turns out that the experimental treatment
is so bad it is killing people, we may care more than we
thought we would.)

1. The spy cipher based on small booklets of
random decimal digits (details below).
2. The
term of art rather casually used for two
fundamentally different types of
cipher:

The Theoretical One Time Pad: a theoretical
random
source produces values which are
combined with data to produce
ciphertext. In a theoretical
discussion of this concept, we can simply assume perfect,
unpredictable randomness
in the source, and this
assumption supports a mathematical
proof that the cipher is unbreakable.
But the theoretical result applies
to reality only if we can prove the
assumption is valid in reality. Unfortunately, we
cannot do this, because provably perfect randomness
apparently cannot be attained in practice (see
really random). So the theoretical
OTP does not really exist, except as a goal. (This is not an
issue of being unable to prove perfection when "good enough"
is available. It is instead an issue of not being able to
know when a randomness flaw exists that might be exploited.
Again, see
proof, but also
randomness testing and
science.)

The Realized One Time Pad: a
really random source
produces values which are combined with data to produce
ciphertext. But because we can neither assume nor
prove perfect, theoretical-class randomness in any
real generator, this cipher does not have the mathematical
proof of the theoretical system.
Perhaps there is some unnoticed
correlation in the sequence,
or even some complex generating function which we do not
expect, but which nevertheless exists and may be exploited
by our
opponents. Thus, a realized one time
pad is NOT proven unbreakable, although it may
in fact be unbreakable in practice. In this sense,
it is much like other realized ciphers.

Sequence Predictability

Despite the "one-time" name, the most important OTP requirement
is not that the keying sequence be used only once, but that
the keying sequence be
unpredictable.
Clearly, if the keying sequence can be
predicted, the OTP is
broken, independent of whether the
sequence was re-used or not.
Sequence re-use is thus just one of the many forms of predictability.
Indeed, we would imagine that the extent of the inability to predict
the keying sequence is the amount of strength in the OTP.
And the OTP name is just another misleading cryptographic
term of art.

The one time pad sometimes seems to have yet another level of
strength above the usual
stream cipher, the ever-increasing amount of
unpredictability or
entropy in the
confusion sequence, leading to
an unbounded
unicity distance and perhaps,
ultimately, Shannon Perfect Secrecy.
Clearly, if the confusion sequence is in fact an arbitrary selection
among all possible and equally-probable strings of that length, the
system would be Perfectly Secret to the extent of hiding which message
of the given length was intended (though not the length itself).
But that assumes a quality of sequence generation which we
cannot
prove but can only assert.
So that is a just another
scientific model which does not
sufficiently correspond to reality to predict the real outcome.

In a realized one time pad, the confusion sequence itself must be
random for, if not, it will be somewhat
predictable. And, although we have a great many
statistical randomness tests, there is no
test which can certify a sequence as either random or
unpredictable.
Indeed, a random selection among all possible strings of a given
length must include even the worst possible patterns that we
could hope to find (e.g., "all zeros").
So a sequence which passes our tests and which we thus assume
to be random may not in fact be the unpredictable
sequence we need, and we can never know for sure.
(That could be considered an argument for using a
combiner with strength, such as a
Latin square,
Dynamic Substitution or
Dynamic Transposition.)
In practice, the much touted "mathematically proven unbreakability"
of the one time pad depends upon an assumption of randomness and
unpredictability which we can neither test nor
prove.

Huge Keys

In a realized one time pad, the confusion sequence must be
transported to the far end and held at both locations in absolute
secrecy like any other secret key.
But where a normal secret key might range perhaps from 16 bytes to
160 bytes, there must be as much OTP sequence as there will be data
(which might well be megabytes or even gigabytes).
And whereas a normal secret key could itself be sent under a key (as in a
message key or under a
public key), an OTP sequence
cannot be sent under a key, since that would make the OTP as
weak as the key, in which case we might as well use a normal cipher.
All this implies very significant inconveniences, costs, and risks,
well beyond what one would at first expect, so even the realized
one time pad is generally considered impractical, except in
very special situations.

There are some cases in which an OTP can make sense, at least
when compared to using nothing at all.
One advantage of any cipher is the ability to distribute key
material instead of plaintext.
Whereas plaintext lost in transport could mean exposure, key
material lost in transport would not affect security.
That allows key material to be securely transported at an advantageous
time and accumulated for later use.
Of course it also requires that key material transport be
successfully completed before use.
And the existence of a key material repository allows the
repository to be targeted for attack immediately, before secure
message transport is even needed.

A realized one time pad requires a
confusion sequence which is as
long as the data.
However, since this amount of keying material can be awkward to
transfer and keep, we often see "pseudo" one-time pad designs which
attempt to correct this deficiency.
Normally, the intent is to achieve the theoretical advantages of a
one-time pad without the costs, but unfortunately, the OTP theory of
strength no longer applies. Actual
random number generators
typically produce their sequence from values held in a fixed amount
of internal
state.
But when the generated sequence exceeds that internal state, only a
subset of all possible sequences can be produced.
RNG sequences are thus not random in the sense of being an
arbitrary selection among all possible and equally-probable strings,
no matter how statistically random the individual values may appear.
Of course it is also possible for unsuspected and exploitable
correlations to occur in the sequence
from a
really random generator whose
values also seem statistically quite random.
Accordingly, generator ciphers are best seen as classic
stream cipher designs.
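
A minimal counting sketch of why fixed internal state rules out most
sequences (the state and output sizes are arbitrary choices):

    s = 128     # assumed RNG state size, in bits
    L = 1024    # output length, in bytes

    log2_all       = 8 * L   # log2 of all possible output sequences
    log2_reachable = s       # log2 of sequences the state can select

    print("possible sequences: 2**%d" % log2_all)         # 2**8192
    print("reachable, at most: 2**%d" % log2_reachable)   # 2**128

    # All but a 2**-8064 fraction of the possible 1kB outputs can
    # never be produced, however random the values may look.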

Weakness in Theory

Nor does even a theoretical one time pad imply unconditional
security: Consider A sending the same message to B
and C, using, of course, two different pads. Now,
suppose the opponents can acquire plaintext from B and
intercept the ciphertext to C. If the system is using the
usual
additive combiner, the opponents
can reconstruct the pad between A and C.
Now they can send C any message they want, and encipher it
under the correct pad. And C will never question such a
message, since everyone knows that a one time pad provides
"absolute" security as long as the pad is kept secure. Note that
both A and C have done this, and they are the only
ones who had that pad.
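
A minimal sketch of that reconstruction, using the common XOR form
of additive combiner (the pad bytes and messages are made-up
stand-ins):

    def xor(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    message = b"ATTACK AT DAWN"                      # sent to B and C
    pad_ac = bytes([0x17, 0x42, 0x99, 0x03, 0x5c, 0x21, 0x7e,
                    0x88, 0x31, 0xc4, 0x0a, 0x65, 0xd2, 0x4f])

    ct_to_c = xor(message, pad_ac)     # intercepted ciphertext to C
    pad = xor(ct_to_c, message)        # plaintext acquired from B
    assert pad == pad_ac               # the A-to-C pad is fully exposed

    forged = xor(b"ATTACK AT DUSK", pad)          # enciphered "for" C
    assert xor(forged, pad_ac) == b"ATTACK AT DUSK"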

Even the theoretical one time pad fails to hide message length,
and so does leak some information about the message.

Weakness in Practice, Including VENONA

In real life,
theory and
practice often differ.
The main problem in applying theoretical
proof to practice is the requirement to
guarantee that each and every
assumption in the proof absolutely
does exist in the target reality.
The main requirement of the OTP is that the pad sequence be
unpredictable.
Unfortunately, unpredictability is not a measurable quantity.
Nobody can know that an OTP sequence is unpredictable.
Users cannot test a claim of unpredictability on the sequences
they have.
The OTP thus requires the user to
trust the pad manufacturers to deliver
unpredictability when even manufacturers cannot measure or
guarantee that.
Any mathematical proof which requires things that cannot be
guaranteed in practice is not going to be very helpful to a real
user.
(Also see the longer discussion at
proof.)

The inability to guarantee unpredictability in practice should
be a lesson in the practical worth of mathematical cryptography.
Theoretical math feels free to assume a property for use in proof,
even if that property clearly cannot be guaranteed in practice.
In this respect, theoretical math proofs often deceive more
than they inform, and that is not a proud role for math.

At least two professional, fielded systems which include OTP
ciphering have been
broken in practice by the NSA.
The most famous is
VENONA, which has its own pages at
http://www.nsa.gov/docs/venona/.
VENONA traffic occurred between the Russian KGB or GRU and their
agents in the United States from 1939 to 1946.
A different OTP system break apparently was described in:
"The American Solution of a German One-Time-Pad Cryptographic
System," Cryptologia XXIV(4): 324-332.
These were real, life-and-death OTP systems, and one consequence
of the security failure caused by VENONA was the death by execution
of Julius and Ethel Rosenberg.
Stronger testimony can scarcely exist about the potential
weakness of OTP systems.
And these two systems are just the ones NSA has told us about.

Apparently VENONA was exposed by predictable patterns in the key
and by key re-use.
At this point, OTP defenders typically respond by saying:
"Then it wasn't an OTP!"
But that is the logic
fallacy of
circular reasoning
and tells us nothing new:
What we want is to know whether or not a cipher is secure
before we find out that it was broken by our opponents
(especially since we may never find out)!
Simply assuming security is what cryptography always
does, and then we may be surprised when we find there was no security
after all, but we expect much more from a security proof!
We expect a
proof to provide a guarantee which has no
possibility of a different outcome; we demand that there be
zero possibility of surprise weakness from a system which is
mathematically proven secure in practice.
Surely the VENONA OTP looked like an OTP to the agents
involved, and what can "proven secure" possibly mean if the user
can reasonably wonder whether or not the "proven" system really
is secure?

Various companies offer one time pad programs, and sometimes
also the keying or "pad" material.
But random values sent on the Internet (as plaintext or even as
ciphertext) are of course unsuitable for OTP use, since we would
hope it would be easier for an opponent to expose those values
than to attack the OTP.

The Spy Cipher

Typically, the "pad" is one of a matched pair of small booklets
of thin paper sheets holding
random decimal digits, where each digit is
to be used for
encryption at most once.
When done, that sheet is destroyed.
The intent is that only the copy in the one remaining booklet
(presumably in a safe place) could possibly
decrypt the message.

In hand usage, a
codebook is used to convert message
plaintext to decimal numbers.
Then each code digit is added without carry to the next random
digit from the booklet and the result is numerical
ciphertext.
In the past, a public
code would have been used to convert the
resulting values into letters for cheaper telegraph transmission.
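
A minimal sketch of the hand arithmetic, with made-up codebook and
pad digits:

    code_digits = [3, 1, 4, 1, 5, 9, 2, 6]   # plaintext via the codebook
    pad_digits  = [2, 7, 1, 8, 2, 8, 1, 8]   # next digits from the pad

    # Add without carry (mod 10) to encipher; subtract to decipher.
    ct = [(c + p) % 10 for c, p in zip(code_digits, pad_digits)]
    pt = [(x - p) % 10 for x, p in zip(ct, pad_digits)]

    print(ct)                   # [5, 8, 5, 9, 7, 7, 3, 4]
    assert pt == code_digits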

Based on
theoretical results, the
practical one time pad is widely thought
to be "unbreakable," a claim which is false, or at least only
conditionally true (see the NSA VENONA practical
successes above).
For other examples of failure in the current cryptographic wisdom,
see
AES,
BB&S,
DES,
proof and, of course,
old wives' tale.

Injective. A
mapping f: X -> Y where no two
values x in X produce the same result
f(x) in Y.
A one-to-one mapping can be inverted on its range (the values
f(x) actually produced), but unless every y in Y is reached,
there is no full inverse mapping g: Y -> X.

In the context of a
block cipher, a one way
diffusionlayer will
carry any changes in the data
block in a direction from one side
of the block to the other, but not in the opposite direction.
This is the usual situation for fast, effective diffusion layer
realizations.

Presumably, a
hash function which produces a result value
from a message, but does not disclose the message from a result
value.
But hash function irreversibility is not a special cryptographic
property, being instead a natural property of (almost) any hash
when the information being hashed is substantially larger than
the resulting hash value. (Also see
polyphonic as a related concept.)

Many academic sources say that a "one way" hash must make it
difficult or impossible to create a particular result value.
That would be an important property for
authentication, since, when an
opponent can easily create a particular
hash value, an invalid message can be made to masquerade as real.
But it is not clear that we can guarantee that property any more
than we can guarantee
cipher strength. (Also see
cryptographic hash and
MAC.)

In contrast, many other uses of hash functions in cryptography
do not need the academic "one way" property, including:

Operation
code: a value which selects one operation from
among a set of possible operations. This is an encoding of functions
as values. These values may be interpreted by a
computer to perform the selected operations
in their given sequence and produce a desired result.
Also see:
software and
hardware.

The schematic symbol for an op amp is a triangle pointing right,
with the two inputs at the left and the output on the right.
Power connections come out the top and bottom, but are often simply
assumed.
Power is always required, but is not particularly innovative, and
can obscure the crucial feedback path from OUT to -IN.

             +PWR
              |
          | \ |
          |  \
    -IN --| - \
          |    >-- OUT
    +IN --| + /
          |  /
          | / |
              |
             -PWR

Op amps were originally used to compute mathematical functions in
analog computers, where each amplifier was
an "operation."

In the usual voltage-mode idealization, each input is imagined
to have an infinite impedance and the output has zero impedance
(here the ideal output is a voltage source, unaffected by loading).
In the rarer current-mode form, each input is imagined to have
zero impedance and the output has infinite impedance (that is, the
output is a current source).
Some current-feedback op amps for RF use have a low-impedance
voltage output and low-impedance current inputs.
Of course, no real device has anything like infinite gain, although
op amp gain can be extremely high at DC.

Stability

One important feature of an op amp is stability (as in
lack of spurious
oscillation; see discussion in
amplifier).
Op amp transistors have substantial gain at RF frequencies, and
unexpected coupling between input and output can produce RF
oscillations.
Unfortunately, these may be beyond the frequency range that a modest
oscilloscope can detect, with the main indication being that the
device gets unreasonably hot.

In the early days of IC op amps, the designer was expected to
produce a feedback network for each circuit that included
stability compensation to prevent oscillation.
Nowadays, the IC manufacturer generally buys stability by rolling
off the frequency response at the usual
RC rate of
6dB/octave (20dB/decade) so that no gain
remains at RF frequencies to allow oscillation.
As a result, many modern general purpose op amps do not have a lot
of gain at high audio frequencies.
A few (generally older) op amps (like the TL081) do allow the
designer to change the compensation and place the start of roll-off
at a higher frequency.
And while that can provide considerably more gain, it also carries
considerably more risk of instability.

The usual cures for instability include first isolating the power
supply, since that will go everywhere:

Bypass RF and audio from op amp (+) power pin
to (-) power pin, with a tantalum capacitor right at the pins.
(Tantalum capacitors are somewhat lossy at RF frequencies, and in
this case loss is good, to prevent power pulses from bouncing around
in an LC system composed of discrete capacitors and interconnect
inductance.)

More power isolation is available with 10 to 100 ohm resistors and/or
RF-lossy ferrite beads to
decouple the op amp from the power supply.

It is important that the signal input connections be very short so
they do not pick up signal from the output, either by electrostatic
or magnetic coupling.

It is also possible to add small resistors on both inputs, and
a tiny capacitor across the input pins, to form an RC lowpass
filter and thus cut the RF level in the normal input path.

The negative feedback resistor may be mounted right on the device
pins, for the shortest possible connection.

A tiny capacitor can be placed across the feedback resistor to
cut high frequency response above the needed bandwidth.

In-Circuit Gain

One of the main advantages of op amps is an ability to precisely
set gain with resistors and negative
feedback.
In an environment where the available devices have wildly different
gain values, the ability to set gain precisely over all production
devices is a luxury.
If the feedback is purely resistive, and thus relatively insensitive
to different frequencies, an op amp can be given a wide, flat
frequency response even though the open-loop response typically droops
by 6dB/octave (20dB/decade).
By using reactive components (typically capacitors) in the feedback
loop, the frequency response can be tailored as desired.
Moreover, in general, whatever gain is available beyond that
specifically programmed acts to minimize distortion.
For example, if we want a gain of 20
decibels (20dB or 10x) at 20kHz, we probably
want an op amp to have 40dB (100x) or more gain at 20kHz, so that
20dB remains to reduce amplifier distortion.
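The standard ideal-op-amp relations (textbook results, not
specific to any device) show how resistor ratios alone set the
gain: 1 + Rf/Rg for the non-inverting configuration, and -Rf/Rin
for the inverting one. A sketch:

    # Closed-loop gain set purely by resistor ratios, assuming an
    # ideal op amp.  Component values are arbitrary examples.
    def noninverting_gain(rf, rg):
        return 1.0 + rf / rg           # classic 1 + Rf/Rg

    def inverting_gain(rf, rin):
        return -rf / rin               # classic -Rf/Rin

    print(noninverting_gain(9e3, 1e3))   # 10.0, i.e. 20dB
    print(inverting_gain(10e3, 1e3))     # -10.0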

Operational amplifiers typically roll off high frequency gain at
around 20dB/decade for stability.
With that roll-off slope, the numerical product of gain and frequency
is approximately constant in the roll-off region.
A good way to describe this might have been "gain-frequency product,"
but the phrase actually used is "gain-bandwidth product" or GBW.
The GBW is the frequency at which gain = 1, which is way beyond the
useful region, since op amps are supposed to have "infinite" gain.
(In practice, GBW varies with supply voltage, load, and measurement
frequency, not to mention faster-than-expected rolloff in different
designs, so the relation is approximate.)
To get 40dB (100x) of open-loop gain at 20kHz, we will need a
minimum GBW of about 100x 20kHz or 2MHz.
Exactly the same computation is used in bipolar
transistors, where GBW is known as the
"transition frequency," or fT.

In-Circuit Input Impedance

In most cases, the positive op amp input is not part of the
feedback system, and so has the normal high impedance expected of
an op amp input.
However, the negative op amp input almost always is part of
the feedback system, which changes the apparent input impedance.
High amounts of negative feedback act to keep the negative input
at almost the same voltage as the positive input.
If the positive input is essentially ground, the negative input
is forced by feedback also to be essentially ground, often
described as a virtual ground.

In most cases, external signals will see the negative input as
a low-impedance ground, and this happens because of feedback, not
op amp input impedance.
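A numeric sketch (standard feedback algebra, with arbitrary
example values): with open-loop gain A and feedback resistor Rf,
the impedance looking into the summing node is about Rf/(1+A),
which is why an external source driving through Rin effectively
sees Rin to ground:

    # Impedance looking into the inverting summing node of a
    # resistive-feedback stage, for finite open-loop gain A
    # (standard feedback result; example values are arbitrary).
    def summing_node_ohms(rf, a_open_loop):
        return rf / (1.0 + a_open_loop)

    print(summing_node_ohms(10e3, 1e5))   # ~0.1 ohm: virtual ground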
Circuits which require inversion and so use the (-IN) input may:

add an op amp simply to buffer the input signal,

increase resistor values to reduce the problem, or

use non-traditional feedback architectures, e.g., the LT1193.

The Linear Technology LT1193 has an extra pair of inputs for a
total of 4: 2 positive and 2 negative.
One (-IN) pin can handle feedback and in that way set the gain.
One (+IN) pin can adjust output bias.
That leaves another (-IN) and another (+IN), both high-impedance
differential inputs unaffected by feedback, which seems ideal for
balanced line receiving.
Unfortunately, as a wideband video op amp, the LT1193 has a massive
current budget (43mA) and high noise (50nV/sqrt-Hz).

Single-Supply Operation

Most op amp circuits show bipolar (that is, both positive and
negative) power supplies referenced to a center ground.
But few if any op amps have a ground pin, so they see only a
single power circuit across the device whether bipolar supplies
are used or not.
The problem is that op amps have to be
biased just like transistors: their output
needs to rest between supply and ground or it will not be possible
to represent both positive and negative signals.
Even op amps with rail-to-rail input and output ranges cannot
reproduce a negative voltage when operating on a single positive
supply.
Conventional op amps with a limited input voltage range may demand
that the bias level be near half the supply.
Often we need a low-noise, low-hum and sometimes even high-power
voltage reference, typically at about half the supply voltage.

The usual way to get an intermediate voltage in a
single-supply system is to use two similar resistors in series
from power to ground.
Unfortunately, this means that noise on the power lines will just be
divided by two and then used on the input side of what could be a
high-gain circuit. And the resistors will add
thermal noise, and possibly
resistor excess noise.
We can reduce supply hum and noise by splitting the upper
resistance into two and adding a serious capacitor to ground
there.
Another capacitor from the lower resistor to ground will act to
filter out high frequency signals from
Johnson (white) noise.
Since most noise power is in high frequencies, removing those
frequencies can reduce the effective noise level.
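A sketch of the arithmetic with arbitrary example values: the
equal-resistor divider passes supply noise at half amplitude, and
bypassing the split point forms an RC lowpass with corner
frequency 1/(2 pi R C), where R is roughly the upper-half
resistance seen by the capacitor:

    # Half-supply bias divider with supply-noise filtering.
    # Component values are arbitrary examples.
    import math

    def rc_corner_hz(r_ohms, c_farads):
        return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

    # Equal 10k/10k divider: supply noise reaches the bias point
    # at half amplitude (-6dB) without filtering.  Split the upper
    # 10k into two 5k halves and bypass the mid-point with 100uF:
    # the corner lands far below hum frequencies.
    print(rc_corner_hz(5e3, 100e-6))   # about 0.32 Hz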

In contrast to Johnson noise,
resistor excess noise is a
1/f noise and is highest at low
frequencies, and power filtering may have little effect.
Non-homogeneous resistors will generate 1/f noise
proportional to resistance and current, and can be a serious problem
in very low-level circuits, especially below 100Hz.
Fortunately, most resistors connected to op amp inputs will carry
negligible current, making excess noise a non-issue there.
Metal film resistors generate minimal 1/f noise and
should be used in low-level circuits where excess noise matters.
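For comparison, the Johnson-noise floor itself is easy to estimate
from the standard physical result v = sqrt(4kTRB); a sketch with
example values:

    # Thermal (Johnson) noise voltage of a resistor:
    # v = sqrt(4 k T R B), with k = Boltzmann's constant.
    import math

    K_BOLTZMANN = 1.380649e-23   # J/K

    def johnson_noise_vrms(r_ohms, bw_hz, temp_k=300.0):
        return math.sqrt(4.0 * K_BOLTZMANN * temp_k * r_ohms * bw_hz)

    # 10k resistor over a 20kHz audio band:
    print(johnson_noise_vrms(10e3, 20e3))   # about 1.8 microvolts RMS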