Abstract:

The invention comprises an encoder for encoding a stegotext and a decoder
for decoding the encoded stegotext, the stegotext being generated by
modulating the log power spectrogram of a covertext signal with at least
one key, the or each key having been added or subtracted in the log
domain to the covertext power spectrogram in accordance with the data of
the watermark code with which the stegotext was generated, and the
modulated power spectrogram having been returned into the original domain
of the covertext. The decoder carries out Fast Fourier Transformation and
rectangular polar conversion of the stegotext signal so as to transform
the stegotext signal into the log power spectrogram domain; subtracts in
the log power domain positive and negative multiples of the key or keys
from blocks of the log power spectrogram and evaluates the probability of
the results of such subtractions representing an unmodified block of
covertext in accordance with a predetermined statistical model.

Claims:

1-52. (canceled)

53. A watermark key generator for generating a key for watermarking
covertexts, the generator comprising a generator for generating a
two-dimensional noise pattern of predetermined height and width, and a
filter for filtering the noise signal in one dimension with a cut-off
frequency which varies with position in the pattern.

54. A key generator according to claim 53 where the variation with
position is substantially inverse with the modulus of the coordinate in
the said dimension of the position relative to a reference point in the
pattern, the reference point being selected from the centre of said
dimension or is at one end of the other dimension.

55. A key generator according to claim 53, wherein the filter in operation
acts as a low pass filter to filter the noise pattern so that it has no
frequency components with time scales shorter than a threshold time scale
τ, where τ=C|t|, where C is a positive constant in the range of
0.05 to 0.4 pixels per cycle per pixel, and t is the coordinate in the
said dimension relative to the reference point, and including a second
filter for filtering the white noise signal transversely to said one
dimension.

56. A key generator according to claim 55, wherein the second filter has a
cut-off frequency which varies inversely with modulus of the coordinate
in the second dimension so that the noise pattern has no frequency
components with time scales shorter than τ=C|t| where t is the
coordinate in the second dimension relative to the reference point, and
further including a random number generator responsive to a numeric seed
input to generate uniformly-distributed random numbers; a converter for
converting the random numbers so generated into 1-d Gaussianly
distributed random numbers; and means for rearranging the Gaussianly
distributed random numbers into the two-dimensional noise pattern, and
wherein the generator for generating the random numbers is a Tausworthe
generator and the converter utilises the Box-Cox method.

57. A method of generating a key for watermarking covertexts, the method
comprising generating a two-dimensional noise pattern of predetermined
height and width, and filtering the noise signal in one dimension with a
cut-off frequency which varies with position in the pattern, the
variation with position being substantially inversely with the modulus of
the coordinate in the said dimension of the position relative to a
reference point in the pattern and the filtering acting as a low pass
filter to filter the noise pattern so that it has no frequency components
with time scales shorter than a threshold time scale τ, where
τ=C|t|, where C is a positive constant in the range of 0.05 to 0.4
pixels per cycle per pixel, and t is the coordinate in the said dimension
relative to the reference point.

58. A storage medium storing a key generated by the method of claim 57.

59. An electrical signal carrying a key as generated by the method of
claim 57.

60. An encoder for encoding a covertext signal to generate a stegotext,
the encoder comprising: transformation means for carrying out a fast
Fourier transformation and rectangular polar conversion of the covertext
signal so as to transform the covertext signal into a log power
spectrogram; means for providing a key in the form of a two-dimensional
noise pattern as generated by the method of claim 57; and means for
modulating the log power spectrogram with the key in accordance with a
watermark code.

Description:

[0001]The present invention concerns watermarking analog or digital
signals. It will be appreciated that whilst the signals may be video or
data signals the present invention is particularly, though not
exclusively, concerned with watermarking audio signals.

[0002]The term "watermarking" is intended to cover the procedure of adding
data to a main signal so that the added data does not affect the main
purpose of the main signal. The main signal is often referred to as a
"covertext" and the signal with the added watermarking data is often
referred to as a "stegotext". Thus in the case of an audio signal the
presence of the added data in the stegotext is intended to be virtually
imperceptible to a listener when the stegotext is reproduced. However the
presence of the added data in the stegotext enables a user who has the
appropriate decoding equipment to identify the origin of the covertext.
If the user's equipment is provided with suitable circuitry he or she may
be prevented from reproducing the main data carried in the original
covertext signal if the watermark data recovered does not match the
equipment. Additionally a user has to be able to reproduce a covertext.

[0003]Such techniques obviously have great potential with regard to
musical recordings. As a result a substantial amount of effort has been
put into the problem of watermarking audio signals in such a manner that
a person who is entitled to listen to the stegotext does not have his or
her enjoyment impaired by spurious sounds caused by the added coded data.

[0004]Equally it is important that the watermarking should be
sufficiently robust both to remain effective after the various types of
conventional signal processing to which recorded and transmitted audio
material can be subjected and also to be able to resist direct attempts
to eliminate or render ineffective the added coded data.

[0005]An apparatus and method for watermarking analog signals is disclosed
in International Patent Specification No. WO98/53565, and this
specification discloses a number of the techniques which have been
employed to watermark signals.

[0006]One method of watermarking proposed in this prior published
specification involves measuring the short-term autocorrelation function
of the audio signal and then adding an additional signal which is hard to
hear and which changes the value of the short-term autocorrelation
function at some specific delay or delays to produce a specific waveform
which carries data at a low rate. The actual modulation of the data on to
this waveform can be done by using any of a number of suitable modulation
techniques. At the reception end of the apparatus a watermark reader (or
decoder) measures the short-term auto correlation function of the
stegotext and applies a demodulation appropriate to the modulation
technique used. Provided that the reader can utilise the data which was
initially used to modulate the autocorrelation function the added coded
data can be removed from the stegotext.

[0007]However the short-term autocorrelation function of many audio
signals can be easily altered to be arbitrarily close to zero at
arbitrarily long delays without altering the sound of the basic audio. It
is accordingly possible to attack the watermarked signal in a relatively
simple manner so as to nullify the effect of the watermarking.

[0008]The present invention is concerned with providing a watermarking
system which is not subject to the above defect and also to provide a
decoder for decoding watermarked signals.

[0009]In accordance with a first aspect of the invention there is provided
an encoder for encoding a covertext signal to generate a stegotext, the
encoder comprising:

[0010]first transformation means for carrying out a Fast Fourier Transform
and rectangular polar conversion of the covertext signal so as to
transform the covertext signal into a log power spectrogram;

[0011]means for providing at least one key, the or each key being in the
form of a two-dimensional pattern of predetermined size;

[0012]a multiplier for adding or subtracting in the log power spectrogram
domain multiples of the key or multiples of one or more of the keys if
there is a plurality of keys, to blocks of the transformed covertext
signal;

[0013]means for controlling the addition or subtraction of the key or keys
by the multiplier in accordance with data representing a desired code;
and

[0014]second transformation means for carrying out polar rectangular
conversion and inverse Fast Fourier Transformation of the modulated
covertext signal to generate a stegotext.

[0015]In accordance with a second aspect of the invention there is
provided a method of encoding a covertext signal to generate a
stegotext, the method comprising:

[0016]carrying out a Fast Fourier Transform and rectangular polar
conversion of the covertext signal so as to transform the covertext
signal into the power spectrogram domain;

[0017]providing at least one key, the or each key being in the form of a
2-dimensional pattern of predetermined size;

[0018]adding or subtracting in the log power spectrogram domain multiples
of the key or multiples of one or more of the keys if there is a
plurality of keys, to segments of the transformed covertext signal;

[0019]controlling the addition or subtraction of multiples of the key or
keys at the addition/multiplication step in accordance with data
representing a desired code; and

[0020]carrying out polar rectangular conversion and inverse Fast Fourier
Transform of the modulated covertext signal to generate a stegotext.

[0021]In accordance with a third aspect of the invention there is provided
a decoder for decoding a stegotext generated by modulating the log power
spectrogram of a covertext signal with at least one key (K), multiples of
the or each key having been added or subtracted in the log domain to the
covertext power spectrogram in accordance with the data of the watermark
code with which the stegotext was generated, and returning the modulated
power spectrogram into the original domain of the covertext, the decoder
comprising: transformation means for carrying out Fast Fourier
Transformation and rectangular polar conversion of the stegotext signal
so as to transform the stegotext signal into the log power spectrogram
domain; means for providing the key or keys with which the original log
power spectrogram of the covertext signal was encoded; calculation means
for subtracting in the log power domain positive and negative multiples
of the key or keys from blocks of the log power spectrogram and
evaluating the probability of the results of such subtractions
representing an unmodified block of covertext in accordance with a
predetermined statistical model; and extraction means for recovering the
encoded data from the output of the calculation means.

[0022]In accordance with a fourth aspect of the invention there is
provided a method of decoding a stegotext generated by modulating the
power spectrogram of a covertext signal with at least one key (K),
multiples of the key or keys having been added or subtracted in the log
domain to the covertext power spectrogram in accordance with the data of
the watermark code with which the stegotext was generated and the
modulated power spectrogram having been returned to the original domain
of the covertext, the method comprising: carrying out Fast Fourier
Transformation and rectangular polar conversion of the stegotext signal
so as to transform the stegotext signal into the log power spectrogram
domain; providing the key or keys with which the log power spectrogram of
the original covertext signal was encoded; subtracting in the log power
domain positive and negative multiples of the key or keys from blocks of
the log power spectrogram and evaluating the probability of the results
of such subtractions representing an unmodified block of covertext in
accordance with a predetermined statistical model; and recovering the
encoded data from the output of the calculation means.

[0023]In accordance with a fifth aspect of the invention there is
provided a watermark key generator as set out in claim 45.

[0024]In order that the present invention may be more readily understood
embodiments thereof will now be described by way of example and with
reference to the accompanying drawings, in which:

[0025]FIG. 1 is a block diagram of a system for encoding and decoding a
covertext signal with additional data so as to generate a stegotext;

[0026]FIG. 2 is a block diagram of an encoder and decoder which can be
used in the embodiment of FIG. 1 to generate and decode a stegotext;

[0027]FIG. 3 is an illustration of the power spectrum of a segment of
music;

[0028]FIG. 4 is a diagram illustrating the overlapping of modulation
patterns when the power spectrogram is modified;

[0045]FIG. 22 illustrates the generation of parameters relating to music
for use in the decoders of FIGS. 16 and 20; and

[0046]FIG. 23 is a diagram illustrating the generation of a key.

[0047]Referring now to FIG. 1 the basic system consists of a key generator
(1), an encoder (2) and a decoder (3). The key generator (1) produces a
pseudo-random key based on an integer seed value input at (1'). The
encoder (2) marks a music file input at (4) as a covertext with data
using the key to generate a stegotext. The data is input into the encoder
(2) at (2'). The decoder (3), receiving the stegotext over a
transmission line (5), reads back the data from the marked file, again using
the key and outputs the recovered data at (6). The same key must be used
in the encode and decode operations to ensure that the data are read back
correctly. The key can, of course, be regenerated when needed from the
seed, so the seed value is all that is required to decode a marked file.
The transmission line (5) can, of course, take a wide variety of forms.
Thus the stegotext could be recorded on any suitable medium or
transmitted by radio, fibre cables or the like. Hereinafter any unmarked
file will be referred to as the covertext and a watermarked file as the
stegotext. Whilst the present embodiment is described in relation to its
use with music it will be appreciated that the techniques and apparatus
described can be used in non-musical situations such as speech or video
data.

[0048]FIG. 2 of the accompanying drawings shows a block diagram of a more
detailed embodiment in accordance with the present invention. In this
Figure the covertext is an unmarked audio file which is shown at (10). The
source of the audio file is indicated at 10'. This can be a microphone
picking up a live event, a recording such as a tape or disc or a signal
which has been transmitted by radio or the Internet. This audio file is
input to the encoder (2) and in circuit (11) is converted into a power
spectrogram. The reason for this conversion is as follows. It is not
feasible to convey information in the phase components of the stegotext.
The human ear is essentially insensitive to phase, a fact exploited by
some compression algorithms. Accordingly a watermarking technique that
depends on phase is unlikely to be robust to compression. Moreover it is
possible to process an audio file, scrambling the phases of its frequency
components, by applying a random "group delay" to the file. Such
processing, which is not computationally intensive, will in general
destroy any particular wave shape present in the audio file. Thus
watermarks which depend on wave shape, that is on the time domain form
of the signal, can be rendered unreadable by this processing.

[0049]Accordingly in the present invention it is proposed to carry out the
watermarking of a covertext by using the power spectrum of the covertext.
Thus only the magnitude of each frequency component in the covertext is
modified and the phase of each frequency component is preserved
throughout the marking process. Phase information is discarded in the
decoder. This procedure will now be described in greater detail.

[0050]In order to calculate the power spectrogram of the covertext the
covertext is divided into blocks 2Y samples long that overlap by half
their length. Thus a new block starts every Y samples. In the present
embodiment, which as described is designed for audio files with a sample
rate fs=44100 Hz, Y is set to 1024.

[0051]Each block is multiplied by a window function, known as the analysis
window, and the Fourier transform of the windowed block is calculated.
The purpose of the window function is to ensure that the sample values
taper off towards zero at either end of the block, avoiding a
discontinuity. The Fourier transform treats the block as the repeating
unit of a periodic function. Since the windowed block consists of real
samples, its Fourier transform is conjugate symmetric with respect to
positive and negative frequencies. The negative frequency components
carry no additional information and can therefore be discarded.

[0052]Each Fourier coefficient is a complex number whose magnitude
represents the amplitude of the corresponding frequency component and
whose argument represents its phase. When the phase information is
discarded, what remains is the power spectrum of the signal. In a strict
sense the power spectrum is obtained by squaring the magnitude of each
Fourier coefficient.
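
The conversion of one windowed block into power and phase, and back again, can be sketched as follows. This is only an illustration under stated assumptions: a naive discrete Fourier transform stands in for the Fast Fourier Transform, the natural logarithm is assumed for the log power, and the function names and the small floor value that guards against the logarithm of zero are hypothetical.

```python
import cmath
import math

def half_spectrum(block):
    """Naive DFT of a real block, keeping only the non-negative frequencies."""
    n = len(block)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * m / n) for m, x in enumerate(block))
        for k in range(n // 2 + 1)
    ]

def to_log_power_and_phase(coeffs, floor=1e-12):
    """Rectangular-to-polar conversion: (log power, phase) per coefficient."""
    # Power is the squared magnitude; the floor is an assumed guard, not from the text.
    log_power = [math.log(max(abs(c) ** 2, floor)) for c in coeffs]
    phase = [cmath.phase(c) for c in coeffs]
    return log_power, phase

def from_log_power_and_phase(log_power, phase):
    """Polar-to-rectangular conversion, the inverse of the function above."""
    return [cmath.rect(math.exp(lp / 2), ph) for lp, ph in zip(log_power, phase)]

block = [0.0, 1.0, 0.5, -0.25, -1.0, 0.3, 0.7, -0.6]
coeffs = half_spectrum(block)
log_power, phase = to_log_power_and_phase(coeffs)
rebuilt = from_log_power_and_phase(log_power, phase)
```

Because the phase is carried alongside the log power, the original coefficients can be rebuilt exactly (to rounding error) when the log power is left unmodified.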

[0053]When a number of consecutive power spectra are placed alongside one
another, a grid of values is formed: one axis, conventionally vertical,
represents frequency whilst the other, conventionally horizontal,
represents time. This grid is the power spectrogram of the audio sample.
FIG. 3 of the drawings is an example of a power spectrogram taken from a
segment of music. In this figure the values in the grid are shown as
various shades of grey. The right hand column running from -8 to 3 is a
scale against which the brightness levels of the spectrogram can be
matched so that the spectrogram can be evaluated.

[0054]The choice of Y determines the resolution of the spectrogram. In the
frequency direction, the resolution is fs/2Y; in the time direction,
the resolution is Y/fs. In the present embodiment these values are
21.5 Hertz and 23.2 ms respectively. The axes of FIG. 3 are measured in
these units.
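
The block layout and the two resolution figures can be checked with a few lines of arithmetic. The sketch below assumes that a trailing partial block is simply dropped, which the text does not specify; the function name is hypothetical.

```python
def block_starts(num_samples, Y):
    """Start indices of blocks 2Y samples long; a new block begins every Y samples."""
    # Assumption: only blocks that fit wholly inside the signal are kept.
    return list(range(0, num_samples - 2 * Y + 1, Y))

fs = 44100   # sample rate in Hz, as in the embodiment
Y = 1024     # hop size; each block is 2Y samples long

freq_resolution_hz = fs / (2 * Y)     # about 21.5 Hz per spectrogram row
time_resolution_ms = 1000 * Y / fs    # about 23.2 ms per spectrogram column

starts = block_starts(fs, Y)          # blocks that fit wholly in one second of audio
```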

[0055]Whilst it may appear to be difficult satisfactorily to reconstruct
an audio waveform from its power spectrogram it is possible if phase
information is retained. The spectrogram data can be returned into the
time domain with an inverse Fourier transform, overlapped in the same
manner as before, and added together.

[0056]It has been discovered that, as long as modifications to the
spectrogram are small and the original phase information is retained, the
above described method recreates a satisfactory audio waveform from the
modified spectrogram, which makes it possible to watermark the covertext
from which the spectrogram was obtained. It is to be observed that the
reconstructed time-domain segments are no longer guaranteed to taper to
zero at either end; the subjective quality of the final waveform is
therefore improved if the segments are multiplied by a synthesis window
before being added together. The analysis and synthesis windows must be chosen
to ensure that there is no overall amplitude modulation through the
system. In the present embodiment each of these windows is the square
root of a raised-cosine function.
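
The requirement that the window pair introduces no overall amplitude modulation can be verified numerically. The sketch below assumes the raised-cosine function is 0.5 - 0.5 cos(2πn/2Y) over a block of 2Y samples; the function name is hypothetical.

```python
import math

def sqrt_raised_cosine(length):
    """Square root of a raised-cosine window of the given length."""
    return [math.sqrt(0.5 - 0.5 * math.cos(2 * math.pi * n / length))
            for n in range(length)]

Y = 1024
window = sqrt_raised_cosine(2 * Y)

# Each output sample is covered by two blocks offset by Y samples. In each
# block it is scaled by analysis * synthesis, i.e. by window[n] ** 2, so the
# two contributions must sum to one for unity gain through the system.
overlap_gain = [window[n] ** 2 + window[n + Y] ** 2 for n in range(Y)]
```

With this choice the two squared windows reduce to a sine squared and a cosine squared of the same angle, which is why the overlap gain is one at every sample.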

[0057]In FIG. 2 the modulation of the spectrogram is carried out in a
circuit generally indicated at (12) in response to the bit stream to be
encoded.

[0058]Finally in block (2) circuit (13) returns the modulated power
spectrograms to the time domain and synthesises these so as to convert
them into the stegotext. In FIG. 2 the stegotext is indicated at 15.

[0059]The decoder (3) comprises a circuit (16) for converting the
stegotext to a log spectrogram, a circuit (17) utilising the key to
correlate the log spectrogram, and a circuit (18) for extracting the bit
stream representing the watermark code, which is output at (19).

[0060]It has been discovered that the extent to which an element of a
power spectrogram of an audio signal can be modulated without audible
effect is roughly proportional to its original level. Thus in decibel
terms additions or subtractions may be made to the power spectrogram up
to a fixed amount. The amount of modulation that is perceptible depends
on the listening environment, but is typically around 1 dB. Accordingly
in the present embodiment the watermarking process is carried out in the
"log power spectrogram domain" and consists in making additions or
subtractions to the power spectrogram in accordance with the key
generated by key generator 1 and the data to be encoded as the watermark.
The data is input at 12'.

[0061]Since a greater degree of modulation can be applied to spectrogram
elements with larger magnitude the information carried in those elements
will be less susceptible to noise than in elements with smaller
amplitude. However it is impossible to know beforehand which these
elements will be. Thus the watermarking scheme being described is
prepared to exploit whichever elements the covertext makes available for
carrying information. Thus in the present embodiment each spectrogram
element in circuit 12 is modulated so as to maximise the
information-carrying capacity of the watermark. Thus each data bit in the
watermark induces a pattern of modulation in a region of the spectrogram.
The pattern of modulations is applied in one sense to encode a "one" bit
and in the opposite sense to encode a "zero" bit. Bits are encoded at
regular time intervals, namely at a regular horizontal spacing T in the
spectrogram.

[0062]It is possible that there will be short segments of the covertext in
which it is impossible to hide a watermark such as the silent sections of
an audio file. It is therefore essential that each data bit affects as
long a section of stegotext as possible. In the present embodiment two
approaches to this problem are used.

[0063]FIG. 4 of the accompanying drawings shows in diagrammatic form one
of these approaches. In this approach the spectrogram modulation patterns
for adjacent bits overlap. In FIG. 4 each rectangle K represents a copy
of the modulation pattern. Each spectrogram modulation pattern K is x
time units wide and y frequency units high, y being the full height of
the spectrogram. In the present embodiment x is 32 and T=5. Thus when the
first 32 column wide block of the power spectrogram of the covertext is
modulated by a key of the same size and the key is then stepped by T (5
columns) then the initial five columns of the covertext will remain only
modulated by the corresponding five columns of the key. In the next
iteration of the modulation the columns 6 to 37 of the covertext will be
modulated by the key so that columns 6 to 32 will have been modulated
twice. At the third iteration columns 6 to 10 of the first block are left
with only the double modulation, but columns 11 to 32 are modulated for a
third time, while columns 33 to 37 will receive their second modulations
and columns 38 to 42 their first modulation. This sequence is repeated
for the entire length of the covertext. The values of x and T can of
course vary over a wide range. For example x can be 256, and T can be 10.
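
The overlap just described can be tabulated by counting how many copies of the key cover each spectrogram column. The following is a minimal sketch with x=32 and T=5 as in the embodiment; the function name is hypothetical and list indices are zero-based, so index 0 corresponds to column 1.

```python
def coverage(num_bits, x=32, T=5):
    """Number of key copies covering each spectrogram column."""
    total_columns = (num_bits - 1) * T + x
    counts = [0] * total_columns
    for i in range(num_bits):
        # Copy i of the key starts T columns after copy i - 1.
        for column in range(i * T, i * T + x):
            counts[column] += 1
    return counts

counts = coverage(3)   # state after the third iteration
# counts[0:5]   -> columns 1 to 5,   one modulation each
# counts[5:10]  -> columns 6 to 10,  two
# counts[10:32] -> columns 11 to 32, three
# counts[32:37] -> columns 33 to 37, two
# counts[37:42] -> columns 38 to 42, one
```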

[0064]The second approach is to apply an error-correction code to the
message bits to spread the effect of each bit still further in time.

[0065]A convolution encoder shown in FIG. 5 of the drawings is used to
spread the effect of each input bit over a longer section of the music in
a way which reduces the memory requirements in the decoder as compared to
the use of a longer key. The data stream to be encoded is input on a line
(30) to a shift register which in this embodiment consists of three
D-type flip flops (31, 32 and 33). The clock (clk/2) is provided on a
line (34). An output switch (35) which is flipped at a clock rate (clk)
is connected to the outputs of a pair of exclusive-OR gates (36, 37) so
as to select between one of two exclusive-OR combinations of the bits in
the shift register formed by the three flip flops. In the present
embodiment the upper exclusive-OR gate (36) is connected to all three
bits of the shift register and the lower gate (37) to bits 0 and 2. This
encoder is specified by the two-dimensional matrix [111; 101], where the
first row of the matrix corresponds to the shift register connections
made to the upper exclusive-OR gate, and the second row of the matrix
corresponds to the connections made to the lower exclusive-OR gate (37).
The patterns of connections can be expressed in polynomial form, with
coefficients from the set {0, 1}. In this case the polynomials are
X^2+X+1 (gate 36) and X^2+1 (gate 37).
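
The behaviour of the shift register and the two exclusive-OR gates can be modelled in software as follows. This is a sketch under assumptions that the text does not state: the output switch is taken to select the upper gate first, and the register is flushed with two trailing zeros after the last data bit. The function name is hypothetical. Both tap patterns are symmetric, so the direction in which the register bits are numbered does not affect the result.

```python
def convolutional_encode(bits, taps=((1, 1, 1), (1, 0, 1)), flush=True):
    """Rate 1/2 encoder: one row of `taps` per exclusive-OR gate."""
    memory = len(taps[0])
    register = [0] * memory
    # Assumed: trailing zeros clear the register after the last data bit.
    stream = list(bits) + ([0] * (memory - 1) if flush else [])
    out = []
    for bit in stream:
        register = [bit] + register[:-1]      # shift the new bit in
        for row in taps:                      # output switch visits each gate in turn
            out.append(sum(t & r for t, r in zip(row, register)) % 2)
    return out

encoded = convolutional_encode([1, 0, 1, 1])
```

For the input (1011) this produces the twelve bits (111000010111).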

[0066]In this encoder each input bit affects six consecutive output bits
(the total number of entries in the matrix) and the output bit rate is
twice the input bit rate (the number of rows in the matrix). Such a code
is called a "rate 1/2 code". The entries in each row of the matrix, the
"generator polynomials", have to be chosen carefully. Only codes whose
rate is the reciprocal of an integer can be used in the present
embodiment. It will be appreciated that this restriction is only caused
by the type of encoder used in the present embodiment and has no other
relevance. Thus if another form of error correction coding was to be used
the restriction need not apply. The identity code, which passes its
input to its output unchanged, is specified by the matrix [1].

[0067]The code is called "convolutional" because it can be implemented
using a convolution function as follows. The input data bits are first
interspersed with zeros according to the code rate. For example, suppose
the original data are (1011). The data are interspersed with zeros to
obtain (1000101) so that the data is now at half the original rate.
Convolving these data with (111011) which is the above encoder matrix
written as a single row, yields (111022212111). Taken modulo-2 this is
(111000010111). The modulo-2 operation performs the function of the
exclusive-OR gates (36 and 37) in the encoder. Thus a four-bit sequence
has been encoded into a twelve-bit code word. In general an n-bit
sequence is encoded into a (2n+4)-bit code word.
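
The worked example above can be reproduced directly. The sketch below implements the zero interspersing, the integer convolution and the modulo-2 reduction; the function names are hypothetical.

```python
def intersperse_zeros(bits, rate_denominator=2):
    """Insert zeros between data bits; no padding after the last bit."""
    out = []
    for bit in bits:
        out.append(bit)
        out.extend([0] * (rate_denominator - 1))
    return out[: len(out) - (rate_denominator - 1)]

def convolve(a, b):
    """Plain integer convolution of two sequences."""
    result = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            result[i + j] += x * y
    return result

data = [1, 0, 1, 1]
spread = intersperse_zeros(data)                 # [1, 0, 0, 0, 1, 0, 1]
raw = convolve(spread, [1, 1, 1, 0, 1, 1])       # encoder matrix as a single row
code_word = [v % 2 for v in raw]                 # exclusive-OR is addition modulo 2
```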

[0068]Whilst a convolutional code has been described it will be
appreciated that many other suitable types of error-correcting code can
be used. Such codes include Reed-Solomon codes, BCH codes, Golay codes,
Fire codes, Turbo codes, Gallager codes and MacKay-Neal codes.

[0069]Synchronisation encoding is carried out before convolution encoding.
Synchronisation of the bit stream is achieved by inserting a start flag
into the stream of data bits. The flag pattern is a "0" followed by
five "1"s. To ensure that this pattern does not otherwise occur in the
data stream, any sequence of four "1"s has an extra zero bit inserted
after it. This is known as "zero stuffing". The stuffed zeros are removed
by the decoder. This procedure gives a penalty of six bits per start flag
plus an overall reduction in data rate by just over three percent. Those
skilled in the art will realise that many alternative methods for this
are possible.
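
One possible realisation of the start flag and zero stuffing is sketched below. It assumes that the run counter restarts after each stuffed zero and that the flag is placed immediately before the stuffed data; the function names are hypothetical.

```python
FLAG = [0, 1, 1, 1, 1, 1]   # start flag: a "0" followed by five "1"s

def stuff(bits):
    """Insert a zero after every run of four consecutive ones."""
    out, run = [], 0
    for bit in bits:
        out.append(bit)
        run = run + 1 if bit == 1 else 0
        if run == 4:
            out.append(0)   # stuffed zero
            run = 0
    return out

def unstuff(bits):
    """Remove the zeros that stuff() inserted."""
    out, run, skip = [], 0, False
    for bit in bits:
        if skip:            # this is a stuffed zero; drop it
            skip = False
            continue
        out.append(bit)
        run = run + 1 if bit == 1 else 0
        if run == 4:
            skip, run = True, 0
    return out

def frame(bits):
    """Start flag followed by the zero-stuffed payload."""
    return FLAG + stuff(bits)

payload = [1] * 9 + [0, 1, 1, 1, 1, 0, 1]
stuffed = stuff(payload)
```

Since the stuffed data never contains five consecutive ones, the flag pattern cannot arise within it.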

[0070]Returning now to FIG. 6 of the drawings this shows how the two
processes just described are incorporated in the encoder. Thus again (10)
indicates the covertext to be encoded and K indicates a current window
being processed and K-1 indicates the previous window. At the multiplier
(41) the extracted window K is multiplied with an analysis window
function (42) so that the extracted window tapers at each of its ends. In
circuit (43) the fast Fourier transform of the extracted window modified
by the analysis window function is obtained. Rectangular polar conversion
is carried out in circuit (44) so as to generate the required power
spectrogram. This power spectrogram is modified as already described in
circuit (45), which corresponds to circuit 12 of FIG. 2, with the phase
component of the spectrogram remaining unchanged.

[0071]To complete the generation of the stegotext, polar rectangular
conversion is carried out in circuit (46); inverse fast Fourier transform
is taken in circuit (47) and the synthesis window function (48) is used
to multiply the output of the inverse fast Fourier transform circuit (47)
at (49). Finally the overlapped windows are added at (50) to generate the
stegotext indicated at (15).

[0072]It will be appreciated that it is desirable to have a number of
different watermarks available. Basically the possibility of using a
number of different watermarks makes it considerably harder for an
attacker to decode, remove or falsify a hidden message without knowledge
of which mark is to be used.

[0073]In the present embodiment the word "key" is used to refer to a
particular member of a family of watermarks. Again in the present
embodiment the keys are generated pseudo-randomly, and any one key is
determined by a single integer which is used as a seed. This is the seed
input shown in FIG. 1 of the drawings.

[0074]In the present embodiment the key is an array K(t,f) of spectrogram
modulation values where t and f are integer indices and
-1≦K(t,f)≦+1. K(t,f) is defined to be zero outside the
range -X/2≦t<X/2 and 0≦f<Y. Let the spectrogram of
the covertext be G(t,f) and the spectrogram of the stegotext be H(t,f).
Let d_i represent the data bits to be encoded, where d_i=±1
(rather than 0 or 1). In the interests of simplicity error-correction
coding is ignored. Then the encoding algorithm is given by

where s is a real constant which determines the encoding strength. In
equations (1) and (2) G and H are complex but K is real. Thus, by
equation (1), argH=argG. The watermark is therefore encoded in the power
spectrum, and the phases of the original spectral components are
preserved.
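
A modulation consistent with the definitions above can be sketched as follows. The form used here is an assumption made for illustration and is not taken from equations (1) and (2): each complex spectrogram element is scaled by exp(s Σ_i d_i K(t - iT, f)), with bit i centred on column iT, and any factor relating power to magnitude is taken to be absorbed into s. The function name and the small test arrays are hypothetical.

```python
import cmath
import math

def encode(G, K, bits, s, T):
    """Modulate a complex spectrogram G[t][f] with key K[u][f] and data bits.

    Assumed form: H(t,f) = G(t,f) * exp(s * sum_i d_i * K(t - i*T, f)),
    with d_i = +1 for a "one" bit and -1 for a "zero" bit.
    """
    X = len(K)                         # key width in columns
    H = []
    for t, column in enumerate(G):
        new_column = []
        for f, g in enumerate(column):
            log_gain = 0.0
            for i, bit in enumerate(bits):
                u = t - i * T + X // 2     # list index for K(t - i*T, f)
                if 0 <= u < X:             # K is zero outside its range
                    log_gain += (1 if bit else -1) * K[u][f]
            # The gain is real and positive, so the phase of g is unchanged.
            new_column.append(g * math.exp(s * log_gain))
        H.append(new_column)
    return H

G = [[complex(1 + t, f - 0.5) for f in range(2)] for t in range(6)]
K = [[0.5, -1.0], [1.0, 0.25], [-0.75, 0.5], [0.25, -0.5]]
H = encode(G, K, bits=[1, 0], s=0.1, T=2)

phase_preserved = all(
    abs(cmath.phase(h) - cmath.phase(g)) < 1e-12
    for hc, gc in zip(H, G) for h, g in zip(hc, gc)
)
```

Because K and s are real, the multiplier is a positive real number, so the modulation is additive in the log domain and leaves the argument of every element untouched.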

[0075]It will be appreciated that the design of the key is of paramount
importance in the generation of a stegotext which is robust against
attack. Thus design consideration for the key will now be described in
detail.

[0076]A key that consists simply of a white noise pattern, where each cell
in the key is independently and identically distributed, is attractive
for many reasons. It is computationally easy to generate and has the
maximum possible information-carrying capacity. In general it has low
correlation with the covertext and has a single narrow autocorrelation
peak. Experiments have shown that it can be robust to a wide variety of
manipulations of the audio file while still being encoded with
sufficiently low strength to be inaudible. However the stegotext can be
manipulated by a group delay attack in which individual rows of the
spectrogram are shifted left or right at random. With the spectrogram
resolution already given, it is possible to arrange the group delay
parameters so as to shift the rows by more than one column. This destroys
any correlation between stegotext and key. It appears to be impossible to
select a spectrogram resolution that simultaneously gives perceptually
satisfactory construction of the stegotext and robustness against all
forms of this group delay attack.

[0077]Moreover the stegotext can be resampled so that all frequencies
increase by say 5% (less than one semitone), and the text shortens in
time by the same factor. The effect of this on the spectrogram is to
stretch it vertically and shrink it horizontally. This procedure is shown
diagrammatically in FIG. 7 where 15A represents the original stegotext
and 15B is the altered stegotext. It can be seen that very few of the
cells will still coincide; along the frequency axis cells with
f≧20 will not overlap at all with their previous positions. The
correlation function is again destroyed.

[0078]The first of these two problems, namely stretch in one dimension,
can be overcome by modifying the key so that it contains repeating
columns. Experiment shows that repeating each spectrogram column twelve
times is sufficient to ensure that the group delay required to destroy
the correlation function has a perceptually unacceptable effect on the
stegotext. The cost is in reduced information-carrying capacity: the
autocorrelation peak of the key is wider and lower, and so a greater
encoding strength is required for a given robustness.

[0079]The second problem can be overcome by exhaustive search. The
correlation function can be evaluated at a range of different resampling
rates; the rate that gives the strongest correlation indicates the factor
by which the file has been resampled. Unfortunately, it
is possible to resample a stegotext in such a way that the pitches change
but the overall time remains constant, or so that the pitches stay
constant but the overall time changes. This latter process is common in
broadcast applications where, for example, it is desired to make a piece
of music fit exactly a given slot. There is therefore a two-dimensional
space of possibilities to search: the stegotext may have been arbitrarily
stretched in frequency and/or time. If the key has been modified to
include repeating columns as above, the autocorrelation function is wide
and hence the range of possible time stretches need only be sparsely
sampled; nevertheless, the computational burden is great.

[0080]However the present invention provides a solution to this problem.
Considering carefully the effect on the key of a stretch relative to some
fixed origin, it can be seen that the relative effect of the stretch is
constant over the key; it is the absolute effect that varies and which
gives rise to the problem above. In the present embodiment the key
pattern is modified so that higher spatial frequencies are filtered out
further from the origin.

[0081]For the purposes of the following discussion the covertext,
stegotext and key will be considered as images in the log spectrogram
domain. When referring to "frequencies" this will mean spatial
frequencies in these images, not frequencies in the underlying audio.

[0082]First consider the problem in one dimension. Let f(t) be a sine
wave, f(t)=sin ωt. Squashing by a factor α gives g(t)=sin αωt. The phase
angle φ between these sine waves is given by φ=αωt-ωt=ωt(α-1). Insisting
that the correlation between f(t) and g(t), calculated in a
suitably-chosen interval around t, exceed some threshold is equivalent to
bounding the phase angle φ so that |φ|<φ0. Thus there is a constraint on
ω in terms of t: |ω|<φ0/|(α-1)t|, or, where α is chosen to be the
greatest stretch to which resistance is required, 1/|ω|>C|t| for some
positive constant C. In view of this relationship, it is simpler to talk
in terms of the timescale τ of a sinewave, where τ=1/ω.
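
The bound can be checked numerically with a short sketch. The phase tolerance, the stretch factor and the function name max_omega are hypothetical choices made for illustration; only the relationships φ=ωt(α-1) and τ=1/ω come from the text.

```python
import math

def max_omega(phi0: float, alpha: float, t: float) -> float:
    """Largest angular frequency whose phase slip at position t stays below phi0."""
    return phi0 / (abs(alpha - 1.0) * abs(t))

phi0 = math.pi / 4   # hypothetical phase tolerance
alpha = 1.06         # hypothetical worst-case stretch factor

for t in (10.0, 100.0):
    w = max_omega(phi0, alpha, t)
    phi = w * t * (alpha - 1.0)   # phase angle between f and g at position t
    tau = 1.0 / w                 # timescale of the limiting sine wave
    c = tau / abs(t)              # the constant C in tau > C|t|
    print(f"t={t:6.1f}  omega_max={w:.4f}  phi={phi:.4f}  C={c:.4f}")
```

The ratio τ/|t| at the limiting frequency is the same at every position, which is why a single constant C suffices to describe the filter.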

[0083]It is now possible to specify the frequency content of a function
that correlates well with itself when stretched. It must contain no
frequency components with timescales shorter than a threshold timescale
τ=C|t|, where the constant C sets the degree of stretch resistance
desired. Such a function can be obtained by suitable filtering of a white
noise signal. A low-pass filter is required whose cutoff frequency varies
inversely with t. Such a filter will hereinafter be referred to as a
"swept" filter.

[0084]As already described in the present embodiment keys corresponding to
consecutive data bits are overlapped. In order to minimise overlap
between the frequency components present at a particular point in time
due to one copy of a key and those due to previous or subsequent copies,
a high-pass filter is also applied to the key. The overall effect is
therefore that of a band-pass filter. The cutoff frequency of the
high-pass filter is swept so as to match the low-pass characteristics of
the adjacent keys. This is demonstrated in the graph of FIG. 8. The
"bandwidth" Δ, which is constant in terms of timescale, is given by
Δ=CT, T being the interval between consecutive applications of the
key to the covertext. FIG. 9 shows how the copies of the key for four
consecutive bits d0, d1, d2 and d3 overlap.

[0085]An example of the result of applying such a swept band-pass filter
of the type just described to a white noise signal is shown in FIG. 10.
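
The low-pass half of such a swept filter can be sketched in one dimension as follows. This is an illustration under stated assumptions and not the filter of the embodiment: a Gaussian smoothing kernel stands in for a sharp cutoff, the kernel width of τ/2 is an arbitrary choice, the swept high-pass stage is omitted, and the names swept_lowpass and roughness are hypothetical.

```python
import math
import random

def swept_lowpass(x, origin, c, min_tau=1.0):
    """Smooth x with a Gaussian kernel whose width grows as C*|t|.

    t is the distance in samples from `origin`. A Gaussian kernel only
    approximates a sharp low-pass cutoff; it is used here for brevity.
    """
    n = len(x)
    y = []
    for i in range(n):
        tau = max(min_tau, c * abs(i - origin))  # threshold timescale at t
        sigma = tau / 2.0                        # hypothetical width choice
        radius = int(math.ceil(4 * sigma))
        num = den = 0.0
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            w = math.exp(-0.5 * ((j - i) / sigma) ** 2)
            num += w * x[j]
            den += w
        y.append(num / den)
    return y

def roughness(y, lo, hi):
    """Mean squared first difference of y over indices lo..hi-1."""
    return sum((y[i + 1] - y[i]) ** 2 for i in range(lo, hi)) / (hi - lo)

rng = random.Random(1)
noise = [rng.gauss(0.0, 1.0) for _ in range(401)]    # white noise input
key_row = swept_lowpass(noise, origin=200, c=0.15)

near = roughness(key_row, 194, 206)   # close to the origin: fine detail kept
far = roughness(key_row, 350, 399)    # far from the origin: fine detail removed
```

Comparing near and far shows the intended behaviour: the filtered signal keeps rapid variation close to the origin and becomes progressively smoother away from it.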

[0086]Similarly, a two-dimensional key can be generated from a
two-dimensional white noise pattern. A filter with characteristics
varying as described above is applied separately in each dimension. After
filtering, the data values are passed through a non-linear function to
enforce the condition -1≦K(t,f)≦+1; in this embodiment,
sine is used.

[0087]An example of the resulting pattern appears in the power spectrogram
of FIG. 11. Here the axes are `time` and `frequency` (audio frequency is
now meant): these match the axes of the spectrogram of FIG. 3 to which
the key will be applied. The origin in the time direction is in the
middle of the key, whereas that in the frequency direction is at the top.
The right hand column in FIG. 11 has a similar function to that of the
scale column in FIG. 3.

[0088]The present embodiment applies band-pass filtering not only along
the x axis but also along the y axis, although since copies of the key
are not overlapped in that direction just low-pass filtering would
suffice. It is possible to increase the information-carrying capacity of
the key by using a low-pass rather than a band-pass filter.

[0089]If the value of the constant C in the above equation is increased,
the key generated becomes resistant to greater stretches. The high-pass
and low-pass filter characteristics become closer together and so the
pass band of the filter becomes narrower. This reduces the
information-carrying capacity of the key. There is thus a tradeoff to be
made between stretch resistance and information-carrying capacity. In the
present implementation C=0.15 (pixels per cycle) per pixel and the key so
generated works satisfactorily under stretches of up to about ±6% in
either the time or the frequency directions. It will be seen that the
above definition of C refers to pixels. In the present context the term
pixel has different meanings when considering filtering in the horizontal
direction of the spectrogram and filtering in the vertical direction.

[0090]In the horizontal direction the term pixel is being used to mean the
time interval between columns of the spectrogram. When considering
filtering in the vertical direction the term pixel is used to mean the
difference in frequency of two adjacent rows of the spectrogram.

[0091]Thus in FIG. 11 a horizontal pixel is approximately 23 milliseconds,
whereas in the vertical direction a pixel is approximately 22 Hz.

[0092]Thus in using the formula τ>C|t| for the passband of the low
pass filter, τ is measured in pixels per cycle and t represents the x
or y coordinate of the point of the spectrogram under consideration
measured in pixels as hereinbefore defined from a reference point or
origin.
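
As a worked illustration of these units, the following sketch converts the threshold timescale at one position into seconds and hertz. It uses C=0.15 and the approximate pixel sizes quoted above; the distance of 100 pixels and the function name threshold_timescale are hypothetical.

```python
C = 0.15                # (pixels per cycle) per pixel, the value quoted in the text
PIXEL_SECONDS = 0.023   # approximate horizontal pixel size in FIG. 11
PIXEL_HERTZ = 22.0      # approximate vertical pixel size in FIG. 11

def threshold_timescale(t_pixels: float) -> float:
    """Shortest permitted timescale, in pixels per cycle, at distance t."""
    return C * abs(t_pixels)

t = 100.0                       # hypothetical distance from the origin, in pixels
tau = threshold_timescale(t)    # about 15 pixels per cycle

print(f"horizontal: {t * PIXEL_SECONDS:.2f} s from the origin, "
      f"shortest timescale {tau * PIXEL_SECONDS:.3f} s per cycle")
print(f"vertical: {t * PIXEL_HERTZ:.0f} Hz from the origin, "
      f"shortest timescale {tau * PIXEL_HERTZ:.0f} Hz per cycle")
```

The same dimensionless rule therefore yields a limit expressed in seconds when filtering horizontally and in hertz when filtering vertically.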

[0093]In FIG. 11 the reference point is at the centre of the top edge of
the picture. This reference point is chosen so as to correspond to zero
frequency. It is possible to select other reference points but the zero
frequency condition is preferable.

[0094]The peak correlation of a typical key with itself after a range of
stretches in both frequency and time is shown in FIG. 12. The values have
been normalised so that the peak autocorrelation of the key is 1.

[0095]When a two-dimensional correlation between the key and the stegotext
is calculated, it is found that the correlation peak can move slightly
away from the line y=0 when the stegotext has been stretched in
frequency. For this reason the present embodiment uses a two-dimensional
correlation; the values of the function at small offsets in the y
direction are added together to form a one-dimensional function to pass
to the bit synchroniser.
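
The collapse of the two-dimensional correlation into a one-dimensional function can be sketched as follows. The brute-force loops, the function names correlate_2d and collapse, and the toy key are illustrative assumptions only; a practical implementation would more likely use FFTs.

```python
def correlate_2d(stego, key, max_dy):
    """Correlate key against stego for every x offset and for |dy| <= max_dy."""
    rows, kcols = len(key), len(key[0])
    scols = len(stego[0])
    out = {}
    for dy in range(-max_dy, max_dy + 1):
        line = []
        for dx in range(scols - kcols + 1):
            total = 0.0
            for r in range(rows):
                rr = r + dy
                if 0 <= rr < len(stego):          # ignore rows shifted off the edge
                    for c in range(kcols):
                        total += key[r][c] * stego[rr][dx + c]
            line.append(total)
        out[dy] = line
    return out

def collapse(corr):
    """Add the values at small y offsets to give a 1-D function of x offset."""
    n = len(next(iter(corr.values())))
    return [sum(line[i] for line in corr.values()) for i in range(n)]

# Toy data: a 4x3 key embedded in an empty 5x10 image, displaced by one row.
key = [[1.0, -1.0, 1.0], [-1.0, 1.0, 1.0], [1.0, 1.0, -1.0], [-1.0, -1.0, -1.0]]
stego = [[0.0] * 10 for _ in range(5)]
for r in range(4):
    for c in range(3):
        stego[r + 1][3 + c] = key[r][c]

corr = correlate_2d(stego, key, max_dy=1)
profile = collapse(corr)        # 1-D function passed on for synchronisation
```

Summing over the small y offsets means that a peak displaced from the line y=0 still contributes to the one-dimensional profile.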

[0096]Referring now to FIG. 23, this shows a flow diagram of the
generation of a key K.

[0097]At step K1 a seed integer is input and at step K2 this
number is supplied to a Tausworthe generator which generates uniformly
distributed random numbers. The output of the Tausworthe generator is
supplied at step K3 to be converted by the Box-Cox method into 1-d
Gaussianly distributed random numbers. In a key with x=32 and y=1024
there will be 32768 such random numbers. The processes carried out by the
Tausworthe generator and the Box-Cox method are fully described in
"Principles of Random Variate Generation", a book by John Dagpunar
published in the series Oxford Science Publications by the Clarendon Press
in 1988.
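
Steps K1 to K3 can be sketched as follows. This is a stand-in and not the generator of the embodiment: Python's built-in generator replaces the Tausworthe generator, and the Box-Muller transform, a standard way of turning uniform numbers into Gaussian ones, is assumed for the conversion step. The function name gaussian_field and the seed value are hypothetical.

```python
import math
import random

def gaussian_field(seed: int, width: int, height: int):
    """Return `height` rows of `width` Gaussian samples from a seeded uniform source."""
    rng = random.Random(seed)          # stand-in for the Tausworthe generator
    n = width * height
    values = []
    while len(values) < n:
        u1 = 1.0 - rng.random()        # lies in (0, 1], so log(u1) is finite
        u2 = rng.random()
        radius = math.sqrt(-2.0 * math.log(u1))
        # Box-Muller: each pair of uniforms gives two independent Gaussians.
        values.append(radius * math.cos(2.0 * math.pi * u2))
        values.append(radius * math.sin(2.0 * math.pi * u2))
    values = values[:n]
    return [values[r * width:(r + 1) * width] for r in range(height)]

field = gaussian_field(seed=12345, width=32, height=1024)
count = sum(len(row) for row in field)   # 32 * 1024 = 32768 samples
```

Because the field depends only on the seed integer, an encoder and a decoder that share the seed can regenerate the same noise pattern independently.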

[0100]At step K6 the data values are passed through a non-linear sine
function to enforce the condition -1≦K(t,f)≦+1.

[0101]Finally at step K9 the key can either be directly used in an
encoder or a decoder or can be stored in a suitable readable memory.

[0102]All the above processes can be carried out by a suitably programmed
computer as shown at 300 and stored on a recording medium 301 which can
be a CD, ROM, DVD disc, tape or any other suitable storage medium.

[0103]Having described the basic steps and principles of encoding a
covertext in accordance with the present invention, FIG. 13 shows a block
diagram of an encoder.

[0104]As in previous figures (10) represents a covertext, in the present
embodiment music, which is to be encoded and (15) represents the final
stegotext.

[0105]In circuit (51) the covertext is transformed into a log abs
spectrogram.

[0106]The spectrogram so generated is supplied to a FFT circuit (51) where
the received spectrogram is clocked by a clock (52) into a spectrogram
buffer. The FFT circuit (51) carries out the overlapping segmentation of
the input spectrogram and the windowing function described in FIG. 6. The
clock (52) ensures that the content of spectrogram buffer (53) represents
in spectrogram form a quantity of music equal to the length of the key,
in the present embodiment 256 or 32 columns.

[0107]The data to be encoded is supplied at (55) to a circuit (54) for
adding, as already described, synchronisation flags and for carrying out
zero stuffing.

[0108]The output of circuit (54) is supplied to the convolutional encoder
(56) which corresponds to the encoder described with respect to FIG. 5
and which has the requisite polynomials supplied to it at (57).

[0109]The key matrix is supplied to the encoder at (58) to a circuit (59)
where the key matrix is converted into a set of values which can be
directly multiplied into the spectrogram held in the spectrogram buffer
(53). These values are in the form of two matrices, one for encoding a
zero bit and the other for encoding a one bit. These matrices are the
antilog of the key and the reciprocal of the antilog of the key. The
operation of multiplying these matrices into the held spectrogram is
equivalent to adding or subtracting in the log spectrogram domain.
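
The equivalence between multiplying by these matrices and adding or subtracting in the log domain can be sketched as follows. The sketch assumes that the strength s is folded into the exponent and that the antilog matrix encodes a one bit; neither detail is stated above, and the function names and values are hypothetical.

```python
import math

def multiplier_matrices(key, s):
    """Return (one_matrix, zero_matrix): the antilog of s*K and its reciprocal."""
    one = [[math.exp(s * k) for k in row] for row in key]
    zero = [[1.0 / v for v in row] for row in one]
    return one, zero

def modulate(block, matrix):
    """Multiply a block of spectrogram magnitudes element-wise by a matrix."""
    return [[b * m for b, m in zip(brow, mrow)]
            for brow, mrow in zip(block, matrix)]

key = [[0.5, -1.0], [1.0, 0.25]]     # hypothetical 2x2 key with |K| <= 1
block = [[2.0, 0.1], [0.7, 3.0]]     # hypothetical spectrogram magnitudes
one, zero = multiplier_matrices(key, s=0.05)

marked_one = modulate(block, one)    # log magnitude moves by +s*K
marked_zero = modulate(block, zero)  # log magnitude moves by -s*K
```

Precomputing both matrices once means that the per-bit work in the encoding loop reduces to selecting a matrix and performing an element-wise multiplication.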

[0110]The degree of strength with which the key modulates the contents of
the buffer (53) is determined by an input (60). This input corresponds to
the real constant s in equation 2.

[0111]The two matrices are selectively multiplexed with the contents of
the spectrogram buffer (53) as indicated at (61), the selection being
made in accordance with the output of the convolutional encoder (56) so
that the music stored in buffer (53) is encoded with a single bit of
data. The contents of the buffer (53) are shifted along by one clock
period for each bit written into the IFFT circuit (62) so that the main
encoding loop is executed once for each bit written.

[0112]The output of the IFFT circuit (62) is applied to an anti-clip
buffer (63). This is to ensure that the data read from circuit (62) is
not clipped when the data is written out as a music file. If clipping is
imminent an amplitude modulation curve is generated to reduce the volume
of the output gradually so that clipping is just averted. The volume is
increased to normal, again gradually, when it is safe to do so.
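
One possible way of generating such an amplitude modulation curve is sketched below. It is an assumption and not the method of the embodiment: the gain each sample needs is spread backwards and forwards with a sliding minimum, and the result is then averaged so that the gain changes gradually. The function name anticlip_gain, the window length and the sample values are hypothetical.

```python
def anticlip_gain(samples, limit=1.0, window=4):
    """Gain curve that changes gradually and keeps |sample * gain| <= limit."""
    n = len(samples)
    # Gain each sample would need on its own to stay within the limit.
    need = [min(1.0, limit / abs(x)) if x != 0 else 1.0 for x in samples]
    # Sliding minimum: begin reducing the gain before a loud sample arrives.
    low = [min(need[max(0, i - window):i + window + 1]) for i in range(n)]
    # Moving average of the minimum: turns abrupt steps into ramps.
    gain = []
    for i in range(n):
        chunk = low[max(0, i - window):i + window + 1]
        gain.append(sum(chunk) / len(chunk))
    return gain

samples = [0.2, 0.3, -0.4, 0.5, 0.9, 1.6, -1.8, 1.2, 0.6, -0.3, 0.2, 0.1, 0.1, -0.2]
gain = anticlip_gain(samples)
output = [x * g for x, g in zip(samples, gain)]
```

Every value averaged for a given sample is a minimum taken over a window containing that sample, so the averaged gain never exceeds the gain that sample needs, and clipping is averted while the volume still falls and recovers gradually.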

[0113]Finally the output from anticlip circuit (63) is output as the
stegotext (15).

[0114]FIG. 13 also includes a scrambler 65. Many possible scramblers can
be used but a typical one is described in the V.32 standard of CCITT. The
inclusion of a scrambler is optional, as is the convolutional encoder.

[0115]The previous description has for reasons of simplicity described the
use of a single key. It will of course be appreciated that more than one
key can be used, each key having been generated by a different seed
integer. Additionally multiples of the key or keys can be used to
watermark a stegotext. In the above described embodiment the multiple is
"1" so that wherever a multiple of a key is mentioned it is implicit that
the multiple can be "1" i.e. the key remains unchanged apart from its
sign.

[0116]The actual way in which two or more different keys are used to
watermark a stegotext or to retrieve the watermark code is entirely
analogous to the embodiments described in this specification. Thus if
there is more than one key, which key is multiplied into the spectrogram
at (61) at any one time will be determined in accordance with the data to
be encoded. If multiples other than ±1 are used to modulate the
spectrogram more than 1 bit can be encoded each time. Decoding will, of
course, utilise the same set of multiples.

[0117]Having described an embodiment of an encoder according to the
present invention attention will now be turned to the problem of decoding
a stegotext which may have undergone compression or stretching to recover
the coded data.

[0118]Having discussed the characteristics of the key which is used in the
encoder of FIG. 13 to modulate the power spectrum of the covertext it
will be appreciated that when decoding a stegotext to extract the
watermark data the data bits can be identified by correlating the key
with the power spectrogram of the stegotext. If the stegotext has not
undergone an attack or has not otherwise been stretched or compressed
there will be a clear correlation between the stegotext and the key at
those log elements which have been modified in accordance with the data.

[0119]The previously described key is capable of dealing with distortions
of the stegotext which involve stretches of the stegotext in either the
vertical or horizontal directions of ±6%.

[0120]In order to deal with cases where the stegotext has undergone
greater stretch than the ±6% allowed for by the key a number of
approaches are possible.

[0121]However the present embodiment utilises a different approach which
does not involve direct correlation and this will be now described in
detail. Before describing the actual circuits of a decoder there will
firstly be a general discussion of the principles involved.

[0122]Most demodulators and decoders in the world have been designed to be
optimal under the assumption that the wanted signal is corrupted by noise
that is additive, white, stationary, and Gaussian. Most are also
1-dimensional in the sense that at any individual time only one real
value is received; some are 2-dimensional. In the present embodiment
signals are encountered where at any one time point many real values in
the form of a whole spectrogram column are received.

[0123]Noise is said to be Gaussian if for any individual sample the
marginal probability distribution of the value of the noise in that
sample (that is the distribution observed if no notice is taken of what
the values of other samples are) is a Gaussian, or Normal, distribution
with mean zero.

[0124]The noise is said to be white if, when the Fourier spectrum of the
noise is examined, the marginal probability distribution of any individual
element of that spectrum is the same as that of any other element.

[0125]It is said to be stationary if the marginal probability distribution
of any one time-domain sample is the same as that of any other sample,
and the joint probability distribution of any one excerpt of the noise of
a given length is the same as that of any other such excerpt.

[0126]In almost all cases all of these assumptions are violated to some
extent. Generally this is not important. However in the present invention
exceptions from the assumptions are of great importance and accordingly
will be discussed in detail.

[0127]For example, Student noise can be considered as an example of
non-Gaussian 1-dimensional noise. The Student distribution is like a
Gaussian distribution, except that the standard deviation is different
for each sample; in particular, the inverse variance is drawn afresh for
each sample from a Gamma distribution.

[0128]FIGS. 14 and 15 show respectively an excerpt of 1-dimensional white
Gaussian noise of variance 1, and of 1-dimensional white Student noise
(with a shape parameter m of the associated Gamma distribution equal to
1, and scaled to also have variance 1). Both have the same variance--but
they look quite different. The Student distributed noise has some large
spikes in it, which would occur only exceedingly rarely in Gaussianly
distributed noise.
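
The difference between the two kinds of noise can be reproduced with a short sketch. It draws the inverse variance of each sample from a Gamma distribution with shape parameter m=1 as described above, but it does not rescale the result to unit variance; the rate parameter, the seed and the function name student_noise are hypothetical.

```python
import math
import random

def student_noise(n, m, rng, rate=1.0):
    """Draw n samples, each with its own inverse variance from a Gamma distribution."""
    out = []
    for _ in range(n):
        inv_var = rng.gammavariate(m, 1.0 / rate)   # shape m, scale 1/rate
        out.append(rng.gauss(0.0, 1.0 / math.sqrt(inv_var)))
    return out

rng = random.Random(7)
gaussian = [rng.gauss(0.0, 1.0) for _ in range(10000)]
student = student_noise(10000, m=1.0, rng=rng)

# The largest excursions show the difference most clearly.
peak_gaussian = max(abs(v) for v in gaussian)
peak_student = max(abs(v) for v in student)
```

Samples whose inverse variance happens to be drawn very small produce the isolated large spikes that distinguish Student noise from Gaussian noise.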

[0129]This matters in situations where the noise is impulsive and where
outliers can force (under Gaussianity assumptions) conclusions that are
far from correct.

[0130]Another example of non-Gaussian noise that is less frequently
encountered is noise that is distributed as the log of the ratio of two
quantities that are each Gamma distributed. Such noise may have many
downward spikes, but none upwards (or the other way round). Such noise
turns out to be relevant to the audio watermarking problem.

[0131]Whiteness may be violated in more ways than the immediately
obvious. An example of this is coloured noise. This is noise in which
some frequencies are present more than others, but in which the noise is
stationary. Typically one may see "pink" noise, where low frequencies
predominate; band-limited noise where other frequencies have been
filtered out; and "1/f" noise, where the power spectral density is
inversely proportional to frequency down to some limit greater than zero.

[0132]In general, however, there are other types of non-white noise.

[0133]One example is 1-d non-stationary noise. If one dimension only is
considered, the noise may be non-stationary; for example the noise from a
resistor that gets too close to an intermittent heat source; or the noise
recorded after a filter containing a varying capacitance (which will
cause change of "colour" as well as of amplitude with time).

[0134]Alternatively if the signal is multidimensional, the noise in
different dimensions may be correlated, even though the signal(s) is/are
stationary against time.

[0135]Alternatively the noise in some channels/dimensions may have greater
amplitude than in others. And of course there is nothing to stop
multidimensional noise also being non-stationary against time--as indeed
it would not be unreasonable to expect of a log abs spectrogram of music.

[0136]In general violation of any of these assumptions may be important in
attempting to decode a watermark from a stegotext. However, experiments
have shown clearly that if the point of interest is robustness in
decoding watermarks of the type with which the present invention is
concerned, and particularly in music, then non-Gaussianity is of little
importance, while general non-whiteness, in the forms both of
correlations between components of different frequencies
and of variation in amplitude at different frequencies, is crucially
important. A decoder that takes account of non-Gaussianity benefits in
producing more sensible values for various intermediate variables in the
calculations; whereas one that fails to take account of non-whiteness
loses performance and wastes memory and flops.

[0137]Basic to what follows is the concept of the multidimensional
Gaussian distribution. Let it be supposed that x is an N-vector of real
numbers (i.e. it is a column vector with N elements). Let μ denote the
mean of the distribution of x (also an N-vector).

[0138]Now, just as in the 1-dimensional case there is a variable σ
denoting standard deviation (or alternatively s denoting 1/σ^2),
in the N dimensional case there is an N×N matrix S of real numbers
which plays a similar role to that of s in the 1-dimensional case but
with greater complexity.

[0139]First, consider a 2-dimensional case where the distribution of x is
spherically (i.e. circularly) symmetric about μ. In this case any
slice through the distribution that passes through μ will look like a
1-dimensional Gaussian distribution with for example a standard deviation
σ. In that case S will be I/σ^2, where I denotes the
identity matrix. If, instead of being circular, the distribution has
elliptical symmetry with the long axis of the ellipse aligned with one of
the coordinate axes, then S will be a diagonal matrix with the two
diagonal elements different and positive. If the ellipse is not aligned
with the coordinate axes, then S will be a 2×2 symmetric matrix
with all entries non-zero and the diagonal elements positive. The inverse
V of S is known as the covariance matrix of the distribution, and S will
be referred to as the icov matrix (short for inverse covariance).

[0140]S also has one other important property: it is positive definite,
which means that for any non-zero vector y, y'Sy will be a positive
scalar (Here ' denotes transpose; y'Sy is always a scalar, but it can
only be guaranteed that it will always be positive if S is positive
definite and y is not zero).

[0141]In the N dimensional case, the same picture holds, except "circle"
is replaced by "N-dimensional sphere" and "ellipse" by "N-dimensional
ellipsoid".

[0142]In this case the formula for the probability density of such a
distribution is:
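
For reference, the standard density of an N-dimensional Gaussian distribution with mean μ and icov matrix S is given below. It is supplied here on the assumption that this is the form intended; the notation P(x | μ, S) is chosen for illustration.

```latex
P(x \mid \mu, S) = \sqrt{\frac{\det S}{(2\pi)^{N}}}\;
\exp\!\left(-\tfrac{1}{2}\,(x-\mu)'\,S\,(x-\mu)\right)
```

With N=1 and S=1/σ^2 this reduces to the familiar 1-dimensional Gaussian density.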

[0143]Now suppose it is required to consider a "random" N dimensional
Gaussian distribution with zero mean.

[0144]"Random", of course has no real meaning unless it is stated what
distribution it is being drawn from; what is needed is a distribution on
N dimensional Gaussian distributions with zero mean. What that amounts to
is a need for a distribution on icov matrices; or one on positive
definite symmetric N×N matrices S.

[0145]One distribution which satisfies these requirements is the Wishart
distribution.

[0146]The Wishart distribution has two parameters (other than N), namely
k, the "tightness" or "shape" parameter, and V, the "scale matrix". k
must be greater than N-1; the larger it is, the tighter is the
distribution about its mean. V must be positive definite and symmetric.
The mean is kV^-1. The density is

If N is 1, this degenerates to a gamma distribution. However if m and r
are the shape and scale parameters of the gamma distribution, then V is
2r and k is 2m.

[0147]When 1-dimensional Student noise was previously discussed, it was
noted that for each sample there was drawn a new value of
s=1/σ^2 from a gamma distribution, the actual sample then being
drawn from a 1-d Gaussian with zero mean and inverse-variance s.

[0148]One way of making multidimensional Student noise, then, is to draw,
afresh for each sample, a new icov matrix S, then draw the sample itself
from a multidimensional Gaussian distribution with zero mean and icov
matrix S. Each new icov matrix should be drawn from a Wishart
distribution with some appropriate parameters k and V.

[0149]In the special case that the scale matrix V of the said Wishart
distribution is of the form rI for some positive real number r, this
Student distribution is spherically symmetric, and given by:

It is now possible to appreciate what the log abs spectrogram of music
actually looks like. It can be modelled as multidimensional Student
noise, but a much better fit is obtained if it is modelled slightly
differently. Instead of drawing a new icov matrix for every sample,
either a new icov matrix will be drawn once every bit period, or less
frequently. Then each sample from the resulting quasi-Student
distribution is considered as one column of the spectrogram. This will be
called the "detV" distribution for reasons that will become apparent
hereinafter.

[0150]The question of what the parameters of the associated Wishart
distribution should be will be addressed later in this specification.
However, ideally these parameters will be determined for some broad
corpus of music, but in practice it will probably not make a great
difference if only a small number of pieces is used, providing that they
contain representatives of the sort of music likely to be encountered.
The "remaining uncertainty" can be taken up by the Wishart distribution
having k not greatly larger than N.

[0151]The special case that the Wishart scale matrix V is a very small
multiple of the identity (in practice taken to be e^-12 I) will be
referred to as the "detZ" distribution; "Z" here stands for zero.

[0152]Having discussed the above background it is now possible to define
the characteristics of decoders which can be used to decode stegotexts
generated in accordance with the present invention. Each decoder is
designed to be an optimal bit-by-bit memoryless decoder for a particular
model of the noise (i.e. the covertext or music). As such each decoder
calculates the Bayesian posterior distribution on a data bit of interest
in the light of what has been received, then picks the value (0 or 1)
with the highest posterior probability. Note that the word "optimal" here
has a very specific meaning--the decoder is that which produces the best
output, irrespective of whether it is the cheapest or the most expensive
decoder to implement.

[0153]Clearly such a decoder will only really be optimal if the model is
accurate. If not, the effective capacity of the channel will be reduced.

[0154]Thus it is possible to implement the decoder using a correlation
function, in turn implemented using FFTs to produce a bit-by-bit
memoryless decoder on the assumption that the "noise" is stationary,
white, and Gaussian. The fact that this assumption is false leads to very
great loss of capacity when the signal strength is reduced to 0.005. This
loss can be mitigated to some extent by throwing away all the low
frequency components of the key, but it is still a heavy loss.

[0155]Accordingly the embodiments of a decoder to be described involve
changing to a noise model which defines the noise as non-white
multidimensional Gaussian noise that is restricted to being stationary in
time, and to have a known icov matrix.

[0156]A decoder can be implemented in an extremely simple manner for this
noise model: The known icov matrix S is taken, its Cholesky decomposition
is formed (i.e. its triangular positive-definite square-transpose-root C
such that C'C=S), and both the log abs spectrogram of the received signal and
the key are pre-multiplied by C. Because the process of pre-multiplying
by C converts the presumed non-white noise into white noise, this decoder
can be called the "Whitened Gaussian" decoder.
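
The whitening step can be sketched for a small example as follows. The 2×2 covariance matrix and the helper names cholesky_upper, matmul and transpose are hypothetical; the sketch only verifies that pre-multiplying by C turns noise with icov matrix S into noise with identity covariance.

```python
import math

def cholesky_upper(s):
    """Return upper-triangular C with C'C = S for a symmetric positive definite S."""
    n = len(s)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = s[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(acc) if i == j else acc / low[j][j]
    return [[low[j][i] for j in range(n)] for i in range(n)]   # transpose of low

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

# Hypothetical 2x2 noise covariance V and its inverse S (the icov matrix).
v = [[4.0, 1.5], [1.5, 1.0]]
det_v = v[0][0] * v[1][1] - v[0][1] * v[1][0]
s = [[v[1][1] / det_v, -v[0][1] / det_v],
     [-v[1][0] / det_v, v[0][0] / det_v]]

c = cholesky_upper(s)
# Covariance of C*x when x has covariance V: C V C', expected to be the identity.
whitened_cov = matmul(matmul(c, v), transpose(c))
```

In the decoder described above the same matrix C would be applied to both the received log abs spectrogram and the key before they are correlated.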

[0157]This decoder works well when the icov matrix S is indeed that
modelling the noise (i.e. the log abs spectrogram of the music). To see
why, the noise may be considered as a very elongated ellipsoid in
N-space. This decoder works by stretching the ellipsoid into a sphere,
while at the same time stretching the carried signal by the same amount
in each direction. The result is that the part of the signal parallel to
the shortest axis of the ellipsoid is enormously stretched, way beyond
the noise, and hence is easily recoverable.

[0158]However, this decoder has problems if the S it is using is not that
of the music whose watermark is being decoded. Intuitively this happens
because instead of stretching the noise into a sphere, it is stretched
into a different elongated ellipsoid--and hence the absolute magnitude of
the noise may remain large compared with that of the stretched signal. So
at the very least this method is going to demand availability of the
correct icov matrix for each piece of music at the time of reading its
watermark, already a significant disadvantage.

[0159]The situation is, however, very much worse than that, because of the
fact that operations are in the log abs spectrogram domain. Consider the
distortion that consists of adding white Gaussian noise to the
time-domain music signal. If the operations were in the time-domain, this
would simply slightly widen all the narrow dimensions of the ellipsoid.
However, for this to be true in the log abs spectrogram domain, the
distorting-noise added would have to be constantly varying in spectral
content in a manner that remained precisely proportional to the spectral
content of the music. Since this is not the case, more complicated
behaviour can be expected. It turns out that this behaviour involves
rotating the ellipsoid to point in one or more different directions--and
the decoder fails in exactly the same way that it would have if the icov
matrix of a different piece of music had been used.

[0160]In order to overcome these problems the decoder to be described uses
the "detV" distribution--i.e. a multidimensional Gaussian noise
distribution with unknown icov matrix drawn from a Wishart distribution
and locally constant over the period of 1 bit. This takes account of the
fact that neither the original icov matrix of the music being decoded,
nor the exact effect on it of the distortions that have occurred in
distribution, is known; it is known only that such a matrix exists.

[0161]Thus the decoder to be described is designed to be an optimal
decoder for this noise model. Accordingly the embodiments of decoders to
be described hereinafter use a probabilistic approach to calculate, given
the received signal, the posterior distribution of each data bit. In
order to do this the decoders to be described utilise Bayes' theorem.

[0162]Thus consider the signal received during 1 bit period, which is M
columns of log abs spectrogram each being a vector of N elements.

[0163]It is assumed that the mean of the noise distribution is zero.
Providing the actually observed mean of the whole spectrogram is
subtracted, this is very nearly equivalent to saying that the mean is
unknown, and that a priori it was believed it could equally well have
taken any value.

[0164]Now let K be the value of the key in the log abs spectrogram domain
as it was actually used--i.e. it has already been multiplied by the
strength at which it was used (and any use of 10 instead of e as base of
logarithms has been allowed for). If the key consists of M columns each
of height N then K is an N×M matrix. Let b be the value of the bit
in question; it is assumed that there is no overlapping bit as it turns
out to make so little difference to the decoder whether or not there are
overlapping bits that it can be assumed there are not for simplicity.
Consider, for simplicity, b to be +1 or -1 rather than 1 or 0. Let X be
the matrix consisting of the M columns of the log abs spectrogram of the
music onto which b was watermarked. Let Y be the received log abs
spectrogram columns corresponding to X in the stegotext. It is assumed
for now that timing is known.

[0165]What now has to be known is P(b|Y). It would of course be equally
good to know

$$\frac{P(b=+1 \mid Y)}{P(b=-1 \mid Y)}$$

(or its logarithm) and if this is greater than 1 (respectively positive)
to decode b=1, and otherwise b=-1.

[0166]What is the starting information? This can be summarised in the
following equations:

[0168]Equation 15 is derived from the above equations using basic
probability theory, which states that P(A)=∫P(A,B)dB, namely that the
probability of a union of disjoint events is the sum or integral of the
individual probabilities, and by the definition of conditional
probability, which states

$$P(A \mid B) = \frac{P(A,B)}{P(B)}.$$

[0169]Equation 16 follows from equation 15 by use of equation 9. Equation
17 follows from equation 16 by use of equation 10. Finally equation 18 is
obtained by simplifying by collecting factors.

[0170]The integration of equation 18 is exceptionally complex but results
in equation 19, from which follows equation 20. Thus equation 20 is
obtained from equation 19 by dividing the version of equation 19 in which
b=1 by the version in which b=-1.

[0173]Now apply the singular value decomposition (SVD) to the contents of
the innermost brackets. The SVD allows any matrix to be written as the
product of an orthogonal matrix, a diagonal matrix, and another
orthogonal matrix. The SVD has to be used twice, once for the brackets in
the numerator, and once for those in the denominator. This procedure is
illustrated by considering the numerator only.

[0174]Thus U'Y-U'K is a matrix and the SVD allows the calculation of L, D
and R such that LDR=U'Y-U'K (equation 24) where L and R are orthogonal
matrices and D is a diagonal matrix. The next seven equations follow in
sequence from equation (24).

[0175]In these equations the orthogonality of R forces R'R=I (and
similarly for L) and where the last line follows from the fact that
det(L)=det(L')=±1 since L is orthogonal and real.

[0176]I+DD', however, is a diagonal matrix computed from diagonal
matrices, and its determinant is therefore the product of the elements on
its diagonal, each of which is cheap to evaluate. Thus L and R never
need to be known, only D.
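
By way of illustration only, the chain of identities relied upon above may be sketched as follows, writing T=U'Y-U'K=LDR and d_i for the diagonal elements of D. This is a reconstruction from the stated properties of L, D and R (orthogonality and diagonality), offered as an aid to the reader and not as a reproduction of the numbered equations.

```latex
\begin{aligned}
\det(I + TT') &= \det(I + LDR\,R'D'L') \\
              &= \det(I + LDD'L')                  && (RR' = I) \\
              &= \det\bigl(L\,(I + DD')\,L'\bigr)  && (LL' = I) \\
              &= \det(L)\,\det(I + DD')\,\det(L') \\
              &= \det(I + DD')                     && (\det(L)\det(L') = 1) \\
              &= \prod_i \bigl(1 + d_i^{2}\bigr)
\end{aligned}
```

On this reading the log determinant is simply the sum of log(1+d_i^2) over the singular values d_i.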

[0177]Having now discussed the fundamental principle of the decoding
procedure, an embodiment of a decoder will now be described with
reference to FIG. 16.

[0178]Before discussing the actual decoder of FIG. 16 it is essential to
be able to relate the various stages in the decoder to the previous
theoretical discussion.

[0179]Thus it will be appreciated that Y represents a spectrogram block as
received by the decoder and is equal to X+bK where b is the code and K
the key. Thus if the values of b and K are known then the probability of a
particular value of Y being received is exactly the same as the probability
that Y-bK represents the original covertext. This is what is expressed in
equation (8).

[0180]Turning now to equation (9) this expresses the probability that X is
a spectrogram of unmarked music. In this equation S is an unknown icov
matrix. If the music were to be represented as white Gaussian noise S in
equation (9) would always be a multiple of the identity matrix. However
as discussed this is not an accurate representation of real music. This
is the reason for the use in the present invention of the icov matrix.

[0181]Thus it is assumed that each column of X, though not of the whole
piece of music, comes independently from the same multi-dimensional
Gaussian distribution defined by this, as yet unknown, icov matrix S.
Thus the decoder operates on the assumption that S may be different in
different parts of the music. Equation 9 expresses the above paragraph
mathematically.

[0182]However in order to decode the marked spectrogram it is essential to
know what values of X are likely. Equation 9 can only provide this
information if the value of S is known.

[0183]Thus the function of equation 10 is to determine what values of S
are likely. In this equation it is assumed that S is distributed
according to a Wishart distribution with parameters V and k. The method
by which these parameters are chosen will be described later in the
specification.

[0184]Having discussed the basic principles involved a first embodiment of
a decoder will now be described with reference to FIG. 16 of the
accompanying drawings.

[0185]In FIG. 16 a music-based stegotext is again represented by (15). The
following description will only refer to music in the interests of
simplicity but of course other forms of covertext can be handled in a
similar manner.

[0186]At (100) the stegotext is point-wise multiplied by a windowing
function comparable to the windowing function disclosed at (51, 52) of
FIG. 6, and at (101) the Fast Fourier Transform of the windowed stegotext
is generated and this is converted at (102) into the log domain.
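
By way of illustration only, stages (100) to (102) might be sketched in software as follows. The Hann window, the naive discrete Fourier transform (used here for brevity in place of a Fast Fourier Transform routine), the function name log_abs_spectrogram and the parameters frame_len, hop and floor are assumptions made for this sketch and are not taken from the specification.

```python
import cmath
import math

def log_abs_spectrogram(signal, frame_len, hop, floor=1e-12):
    """Return a list of columns; each column holds log|DFT| of one windowed frame."""
    # Hann window: an assumption, standing in for the window of FIG. 6.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len)
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep the non-negative frequencies only
    columns = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        column = []
        for k in range(n_bins):
            # Naive DFT for clarity; a practical decoder would call an FFT.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            # Rectangular-to-polar conversion (magnitude) followed by the log.
            column.append(math.log(max(abs(acc), floor)))
        columns.append(column)
    return columns
```

The small floor value merely avoids taking the logarithm of zero for silent frames.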

[0187]At (103) the output of (102) is pre-multiplied with a matrix U'
which has already been generated so as to represent the matrix U' shown
in equations 22 to 25. The generation of U' will be described later. If
the output of (102) is defined as F then the output of (103) is U'*F and
this is applied to a high pass filter (104) to remove any DC offset. The
output of (104), namely the filtered U'*F, is a log abs spectrogram of a
length dependent on the initial stegotext.

[0188]In order to process this spectrogram it is sectioned into blocks
each having a width corresponding to K. In the present and second
embodiments each block is 32 columns wide in the time dimension and 1024
rows in the frequency dimension, though these values can of course be
varied. Each sectioned block represents U'Y as in the set of equations
23, 24 and 25.

[0189]This sectioning is carried out at (105) in such a manner that each
section block overlaps the previous block by one column less than the
width of the block in correspondence to the situation illustrated in FIG.
4 of the accompanying drawings.
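
By way of illustration only, the sectioning carried out at (105) might be sketched as follows, with the spectrogram held as a list of columns. The function name sliding_blocks is an assumption made for this sketch.

```python
def sliding_blocks(columns, width):
    """Yield every run of `width` consecutive columns, advancing one column at a time.

    Successive blocks therefore overlap by width - 1 columns.
    """
    for start in range(len(columns) - width + 1):
        yield columns[start:start + width]
```

With width set to 32, each yielded block would correspond to one candidate U'Y.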

[0190]Each block obtained in the above manner is added at (106) to U'K and
at (107) has U'K subtracted from it so as to give two different results
at (108) and (109), namely: (1) Left Hand: X_{-1}=U'Y+U'K and (2) Right
Hand: X_{+1}=U'Y-U'K.

[0191]It will now be appreciated that these values correspond to values
found in equations 23 and 24.

[0192]The next stage of the decoder involves calculating the logarithm of
the denominator and numerator respectively of equation 23. This is
achieved using the equation set out at 24. At stages (110, 111) the log
determinants of the denominator and numerator of equation 23 are
determined and at stages (112, 113) the output log determinants are
respectively scaled by

$$-\left(\frac{k+M}{2}\right),$$

this factor of course also appearing in the above equations. The
derivation of this factor will be described hereinafter.

[0193]At stage (114) the logarithm of the quantity of equation 23 is
calculated by subtracting the logarithm of the denominator from the
logarithm of the numerator as represented by the scaled values obtained
at (112) and (113).
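
By way of illustration only, stages (106) to (114) might be sketched as follows for a single block. The sketch assumes that the numerator and denominator of equation 23 have the form det(I+TT') raised to the power -(k+M)/2, with T=U'Y-U'K and T=U'Y+U'K respectively. It evaluates each log determinant by a Cholesky factorisation rather than by the SVD described above, a substitution made here purely for compactness. The function names and the list-of-rows matrix layout are likewise assumptions.

```python
import math

def log_det_i_plus_tt(t):
    """Return log det(I + T T') for T given as a list of N rows."""
    n = len(t)
    # Build G = I + T T', which is symmetric and positive definite.
    g = [[sum(a * b for a, b in zip(t[i], t[j])) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Cholesky factorisation G = C C' with C lower triangular.
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = g[i][j] - sum(c[i][p] * c[j][p] for p in range(j))
            c[i][j] = math.sqrt(s) if i == j else s / c[j][j]
    # det(G) is the squared product of the diagonal of C.
    return 2.0 * sum(math.log(c[i][i]) for i in range(n))

def bit_log_ratio(uy, uk, k):
    """Log of P(b=+1|Y)/P(b=-1|Y) for one block, under the assumptions above."""
    m = len(uy[0])  # number of columns M in the block
    minus = [[y - q for y, q in zip(ry, rk)] for ry, rk in zip(uy, uk)]  # U'Y-U'K
    plus = [[y + q for y, q in zip(ry, rk)] for ry, rk in zip(uy, uk)]   # U'Y+U'K
    scale = -(k + m) / 2.0
    # Scaled log numerator minus scaled log denominator.
    return scale * log_det_i_plus_tt(minus) - scale * log_det_i_plus_tt(plus)
```

A positive result favours b=+1 and a negative result favours b=-1.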

[0194]The values so obtained are stored in a buffer (115), each being the
log ratio of the probabilities that the data bit of interest, if one is
present, is a 1 or a -1. Thus buffer (115) can be considered as holding the posterior
probabilities of a sequence of code bits, the length of the sequence
depending on the size of the buffer.

[0195]If one considers a sequence of individual entries into the buffer
(115) these will consist of values distributed about a zero axis with
each value representing the outcome of the processing, as just described,
of a matrix. The buffer (115) can hold 256 discrete values, though of
course this number is also variable. FIG. 17 is a graph indicating a
sequence of values as stored in buffer (115).

[0196]The values in the buffer (115) are represented by the dark curve
(150). The solid vertical lines 151 represent the times at which the bits
have been encoded. These times are determined by the clock extraction
circuit (116) in the manner described hereinafter.

[0197]It is now necessary to extract from the buffer (115) those values
which represent the original code. It must again be appreciated that the
original stegotext may have undergone either stretching or compression so
that the extraction of a bit rate for the code must take this into
account.

[0198]This is the procedure which is carried out at clock extraction
circuit (116). Here all possible sequences of slicing points are
considered and that sequence of points which gives maximum total
deviation from zero is selected as the clock for the embedded code.

[0199]Effectively a pair of nested loops iterates over the possible bit
clock frequencies and phase off-sets. In each iteration, the sum of the
squares of the sliced values is calculated. The values of frequency and
phase which maximise this sum are returned to stage (117) as a set of
indices representing a clock.
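
By way of illustration only, the pair of nested loops might be sketched as follows. The sketch expresses the bit clock as a period measured in buffer entries rather than as a frequency, and the function name extract_clock and its parameters are assumptions made for this sketch.

```python
def extract_clock(buffer, periods, n_phases):
    """Return (score, period, phase, indices) maximising the sum of squared slices."""
    best = None
    for period in periods:              # outer loop: candidate bit-clock periods
        for phase in range(n_phases):   # inner loop: candidate phase offsets
            indices = []
            pos = float(phase)
            while int(pos + 0.5) < len(buffer):
                indices.append(int(pos + 0.5))  # nearest buffer entry
                pos += period
            score = sum(buffer[i] ** 2 for i in indices)
            if best is None or score > best[0]:
                best = (score, period, phase, indices)
    return best
```

Because a plain sum grows with the number of slicing points, the candidate periods in such a sketch would sensibly be confined to a narrow range about the nominal bit spacing.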

[0200]In stage (117) two pointers are used to point to where the first and
last values are expected to be found in the buffer (115). These are
manipulated to allow the data to be sliced from block to block stitching
together the extracted bits without gaps and repeats. The posterior
probability vector is shifted along at the same rate as the data in
spectrogram buffer (105).

[0201]The clock generated by clock extraction circuit (116) is used at
(117) to slice the data that is read from buffer (115)
into a buffer stage (118). It will be appreciated that the original key
was added at 5 column intervals to the 32 columns of the log abs
spectrogram of the covertext. Thus even if the stegotext has been
compressed or stretched prior to decoding a code bit will be expected
approximately every fifth column of the sequence of values stored in the
buffer. However the result of slicing the buffered data in response to
the extracted clock is still not a sufficiently accurate representation
of the original code as so far the decoder has assumed that the music has
a detV distribution. As already described this is not completely true.
Thus certain of the points at the 5 column intervals may be wrong.
Accordingly allowances have to be made for the original music not being a
detV distribution.

[0202]Consider first the output to stage (118). This consists of a
sequence of values which purport to be

$$\log\left(\frac{P(b_n=1\mid Y_n)}{P(b_n=0\mid Y_n)}\right)$$

[0203]This would be the required sequence if the music actually had a detV
distribution. However as already explained it does not.

[0204]If C1, C2 . . . Cn is defined as the sequence of
values output from stage (117) then if the music does have a detV
distribution then

$$C_n=\log\left(\frac{P(b_n=1\mid Y_n)}{P(b_n=0\mid Y_n)}\right)$$

[0205]As the music does not have a detV distribution it is necessary to
find a function f such that f(Cn) is a better approximation to that
logarithm than Cn so that a more correct sequence can be input to
the error correcting decoding stage (120) of the decoder, it being
appreciated that in the encoder the watermark code was in addition
encoded with an error correcting code.

[0206]The output of the data slicing stage (117) is a series of values on
either side of zero with the positive values potential "+1"s and the
negative values potential "-1"s.

[0207]Each of the "+1" values will deviate about a value which will be
referred to as α. Similarly each of the "-1" values will deviate
about -α, and α can be estimated to equal the mean of the absolute
values of Cn, i.e. α=mean|Cn|.

[0208]The amount by which the positive Cn varies about α and
the negative Cn varies about -α also has to be estimated. This
value is defined as σ so that

σ=std(|Cn|-α)

[0209]Having now obtained σ the original values C1, C2 . .
. Cn are scaled in stage (118) so that they have a standard
deviation of 1 about + or - α/σ. This is summarised by the equations

$$a_n=\frac{C_n}{\sigma}$$

and

$$\beta=\frac{\alpha}{\sigma}.$$

[0210]Thus if hn=|an|-β then hn has mean zero and a
standard deviation of 1 but is not necessarily Gaussianly distributed.
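
By way of illustration only, the estimation of α and σ and the scaling carried out in stage (118) might be sketched as follows. The use of the population standard deviation and the function name normalise_sliced_values are assumptions made for this sketch.

```python
import statistics

def normalise_sliced_values(c):
    """Return (a, beta, h) for a list c of sliced log-ratio values Cn."""
    magnitudes = [abs(v) for v in c]
    alpha = statistics.fmean(magnitudes)    # alpha = mean|Cn|
    sigma = statistics.pstdev(magnitudes)   # sigma = std(|Cn| - alpha)
    a = [v / sigma for v in c]              # a_n = C_n / sigma
    beta = alpha / sigma                    # beta = alpha / sigma
    h = [abs(v) - beta for v in a]          # h_n = |a_n| - beta
    return a, beta, h
```

The returned values hn then have zero mean and unit standard deviation by construction, while the signs of the an are those of the original Cn.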

[0211]In the present embodiment it is assumed that hn has a
one-dimensional Student distribution of the type which has already been
discussed. Thus

[0213]The derivation of the values r and m will now be described. This is
done by collecting a large sample of the values of hn typically seen
in practice and doing MAP inference using equation (25) as the likelihood
of individual values of hn and improper uniform prior distributions
on m and log(r).

[0214]Accordingly in stage (118) equation 28 is used to calculate the
corrected sequence which finally has to have the added error correction
encoding decoded so as to obtain what is referred to as a "likelihood"
map containing vectors of calculated log likelihood ratios.

[0215]FIG. 18 shows the log likelihood map derived from the contents of
buffer (115) by the procedures which have just been described. Reference
numerals 150, 151 are used in FIG. 18 in the same sense as in FIG. 17.

[0218]Therefore for every two bits present in the output of circuit (118)
a decision has to be made as to which bit is part of the desired code.

[0219]In order to carry out this function the convolutional decoder (120),
in its simplest form, looks at each potential output bit, and for each
such bit considers all the possible values of the surrounding bits within
a fixed window. This procedure is carried out in phase search circuit
(119). The size of the window is a compromise between performance and
amount of calculation. For example for a window encompassing ten values
in the buffer a total of 1024 sequences have to be evaluated.

[0220]For each of the 1024 sequences the encoded value is calculated, and
the probability of the values in the buffer over that window is
calculated by adding or subtracting the relevant values in the buffer in
accordance with the relevant bit being +1 or -1.

[0221]The probabilities of all the 512 sequences which have a +1 in the
position under consideration are added, and the other 512 sequences which
have a zero in the relevant position are added. This gives a probability
that the bit under consideration is a 1 or a 0.

[0222]This procedure is illustrated in FIG. 19. In this figure (250) is a
schematic representation of the values sliced from circuit (118).
Win_i represents a ten-value window and 7 represents the pixel of
interest. Win_{i+2} represents the next window in this sequence and 8
the next pixel of interest value. Finally V_i represents the result
of the evaluation just carried out for the pixel of interest 7 and
V_{i+2} the outcome of the next evaluation.

[0223]As shown in FIG. 19, because the output of the convolutional encoder
of FIG. 5 gave 2 output bits for each code bit, the window is stepped at
two bit intervals along the contents of the buffer. This procedure has to
be carried out twice over the respective odd and even numbered values in
the buffer. Two sequences are thus generated each having a probability
associated with it and finally a selection is made on the basis of the
sequence which has the higher probability.

[0224]What has just been described is the simplest form of
encoder/decoder.

[0225]However it may be advantageous to have a ratio other than 2
to 1.

[0226]If for example the ratio was 4 to 1 it would be necessary to use
four sequences in which a window was successively stepped evenly for
values in the buffer, and to select the most probable output bit from
these four sequences.

[0227]Equally there are other ways in which the code can be decoded which
will be apparent to people skilled in the art such as a Viterbi decoder.

[0228]Decoder polynomials corresponding to those used at the encoding
stage are supplied to the maximum-likelihood decoder 120 at 120', and
finally added synchronisation bits and zeros added during zero stuffing
are removed at (121) to leave the decoded data. The descrambler (122) is
only necessary if the optional scrambler (65) in FIG. 13 has been used in
the encoding process.

[0229]FIG. 20 of the accompanying drawings shows another embodiment of a
decoder. It will be seen that the decoder of FIG. 20 has the bulk of its
integers common to the decoder of FIG. 16. Thus where these common
integers occur the same reference numerals have been used.

[0230]Fundamental to the operation of the decoder of FIG. 20 is the
concept of projection maps in vector spaces.

[0231]FIG. 21 of the drawings illustrates an orthogonal projection map f
from a two-dimensional vector space onto a one-dimensional subspace.

[0232]In this figure a random set of points v1, v2, v3,
v4 and v5 have been mapped by a function f onto a single line,
namely the diagonal line marked L.

[0233]In general terms let V be a real N-dimensional vector space with a
dot product written v1·v2. A projection map on V is a function
f: V→V which satisfies the following equations.

rf(v)=f(rv)

f(v1+v2)=f(v1)+f(v2)

f(f(v))=f(v)

for any v, v1, v2 ∈ V and for any real number r. It
is said to be an orthogonal projection map if in addition

f(v1)=0 ⇒ v1·f(v2)=0, for any v1, v2 ∈ V.

[0234]A subspace W of V is a subset of V such that
r w1+w2 ∈ W for any w1, w2 ∈ W and any
real number r. For each subspace W of V there is exactly one orthogonal
projection map fw onto W.

[0235]For example N might be 2, V the set of all real 2-vectors, and W the
set of all 2-vectors whose first element is twice the second element. In
FIG. 21 W is represented by line L and the random points are joined by
dotted lines to their images in W under fw.
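
By way of illustration only, the two-dimensional example may be sketched as follows, taking W to be the line spanned by the vector (2, 1), whose first element is twice its second. The function name project_onto_line is an assumption made for this sketch.

```python
def project_onto_line(v, a=(2.0, 1.0)):
    """Orthogonally project the 2-vector v onto the line spanned by a."""
    scale = (v[0] * a[0] + v[1] * a[1]) / (a[0] ** 2 + a[1] ** 2)
    return (scale * a[0], scale * a[1])
```

A point already on the line is left unchanged, a point perpendicular to the line is mapped to zero, and applying the map twice gives the same result as applying it once.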

[0236]It will be appreciated that the embodiments of the present invention
which have been described such as the encoder of FIG. 13 and the decoder
of FIG. 16 all operate on the basis that matrices of values in the log
abs spectrogram are being manipulated so that any manipulation of the
matrices which reduces computational requirements but retains the
information necessary in particular for the extraction of the code from
the stegotext would be of considerable value. The above described concept
of projection maps in vector spaces provides such a tool. Thus if A is a
matrix whose columns span W, then there is a matrix B such that
fw(v)=Bv.

[0237]The following is one example of how the necessary projection matrix
can be calculated using the Matlab® programming language.

[0238]If A is a matrix whose columns span W, then there is a matrix B
which may be calculated by the following Matlab statements such that
fw(v)=Bv.

[L,D,R]=svd(A);

% get an orthonormal basis L for V
% whose first few columns are an
% orthonormal basis for W

% now replace the non-zero elements on the diagonal of D by 1
% and pad D up with zeros to be the same size as L

d=diagfrom(D);

dnnz=sum(d>1e-10*d(1));

D1=diagsz(ones(dnnz,1),size(L));

% B operates by calculating the components of any v in V
% in the basis L, zeroing the components orthogonal to W,
% then going back to the original coordinate system.

B=L*D1*L';

If the last line of these statements is replaced by

B0=L(:,1:dnnz)';

then the resulting matrix B0 has only dnnz rows.

[0244]Thus B0 not only carries out the projection map fw but
also switches to an orthonormal coordinate system for W which has fewer
dimensions than the original coordinate system for V, with of course the
unimportant exception of the case W=V.
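
By way of illustration only, a reduced matrix of the kind denoted B0 might also be obtained without an SVD routine as sketched below. The sketch substitutes modified Gram-Schmidt orthogonalisation for the SVD of the Matlab statements. The function names and the tolerance tol are assumptions made for this sketch.

```python
def reduced_projection(columns, tol=1e-10):
    """Return orthonormal rows spanning the same subspace W as `columns`."""
    basis = []
    for col in columns:
        w = list(col)
        for q in basis:  # remove the components along earlier basis vectors
            dot = sum(x * y for x, y in zip(w, q))
            w = [x - dot * y for x, y in zip(w, q)]
        norm = sum(x * x for x in w) ** 0.5
        if norm > tol:   # skip columns already lying inside the span
            basis.append([x / norm for x in w])
    return basis

def coordinates_in_w(b0, v):
    """Coordinates of the orthogonal projection of v in the basis held in b0."""
    return [sum(x * y for x, y in zip(row, v)) for row in b0]
```

The number of rows returned plays the part of dnnz, and a vector orthogonal to W is mapped to zero.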

[0245]When discussing the decoder of FIG. 16 and the equations associated
with the operation of this decoder it will be remembered that U'Y
represented a log abs spectrogram of a segment of the stegotext as
modified by previously calculated statistics of samples similar
to the covertext. If W denotes the subspace spanned by the columns of U'Y
and this received data is projected orthogonally onto W as discussed in
the simple example of FIG. 21 then the necessary calculations to extract
the watermark code, can be substantially simplified if the projection of
Y onto the new subspace does not result in the loss of too much
information.

[0246]It has been discovered that carrying out such a projection does
little to damage the robustness of the stegotext to attack.

[0247]Thus again considering equation 25 it will be seen that by carrying
out the projection B0 it is now necessary to evaluate
det(I+(B0U'Y-B0U'K)(B0U'Y-B0U'K)') where B0 is
the matrix related to fw as already discussed with reference to FIG.
21.

[0248]In order to carry out this evaluation B0U' can be precomputed
along with B0U'K and stored in ROM.

[0249]The next steps to be undertaken are the evaluation of
T=(B0U')Y-(B0U'K), evaluating I+TT', and taking a Cholesky
decomposition such that C'C=I+TT', noting that
det(I+(B0U'Y-B0U'K)(B0U'Y-B0U'K)')=det(C)^2.

det(C)^2 is easy to evaluate since C is triangular.

[0250]Once the above has been appreciated it is now possible to discuss
the differences between the decoder of FIG. 20 and the decoder of FIG.
16.

[0251]In FIG. 20 the dotted box (124) represents a controllable resampler
circuit which is only present in a variant of the embodiment which will
be described hereinafter. Additionally the dotted connecting line between
box (124) and circuit (116) is only present in the variant to be
described. K as before represents the key which was originally used to
encode the covertext. As already described this key has been generated
using random white noise in turn generated by a random integer seed
number and then being filtered by a 2D swept band pass filter.

[0252]A matrix multiplier (201) matrix multiplies the key K, held in a ROM
(202) with the already mentioned predefined statistical data U' to
generate U'K. This data U' is stored in a ROM (203). The output of
multiplier (201) is supplied to box (204) which generates the projection
matrix B0 and this projection matrix is multiplied by multiplier
(205) with the output of multiplier (201) to generate B0U'K.
Additionally B0 as output by (204) is also supplied to one input of
a matrix multiplier (206) the other input of which is supplied with U' so
that the output of multiplier (206) is B0U'.

[0253]The decoder of FIG. 20 now operates in a manner which is analogous
to the decoder of FIG. 16 and those elements of the decoder of FIG. 20
which operate in the same manner as the elements of the decoder of FIG. 16
have been given the same reference numerals.

[0254]Thus at multiplier (103) each block of Y as already defined is
multiplied by B0U'. Similarly at (106, 107) the value B0U'K is
respectively added and subtracted rather than U'K.

[0255]The remaining elements of the second embodiment of the decoder
operate in exactly the same manner as those of the first embodiment.

[0256]It is now necessary to describe the procedure by means of which the
prior statistical values required by the Bayesian processes of the two
embodiments of the decoders which have just been described are obtained.
This procedure will be described with regard to the flow diagram of FIG. 22.

[0257]In step S1 of this figure a plurality of musical examples are
concatenated. The musical samples can be chosen from a wide range of
music and for example can be generated by playing excerpts from a
suitable number of CDs. The CD excerpts can be mixed with or replaced by
taped, live or broadcast music.

[0258]The end result of this concatenation is a length of music excerpts
which can cover a wide music range. It is of course possible to skew the
music excerpts chosen so that the statistical data could be based on
different types of music so that a user can select an appropriate set of
data for decoding purposes. Thus several different sets of statistical
data could be stored in ROM (203) and an appropriate selection made by a
user.

[0259]At step S2 the log power spectrogram of the concatenated musical
examples is generated. At step S3 the mean of the columns of the log power
spectrogram so obtained is calculated and in step S4 this mean is
subtracted from the individual columns yielding A.
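
By way of illustration only, steps S3 and S4 might be sketched as follows, with the spectrogram held as a list of columns. The function name subtract_mean_column is an assumption made for this sketch.

```python
def subtract_mean_column(spectrogram):
    """Subtract the mean column from every column of `spectrogram`."""
    n_cols = len(spectrogram)
    n_rows = len(spectrogram[0])
    # Step S3: the mean column, i.e. the mean over time of each frequency row.
    mean = [sum(col[r] for col in spectrogram) / n_cols for r in range(n_rows)]
    # Step S4: remove that mean from each individual column.
    return [[col[r] - mean[r] for r in range(n_rows)] for col in spectrogram]
```

After this step every frequency row of the result has zero mean over time.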

[0260]The latter two steps can be approximated by passing the rows of the
spectrogram through a high-pass filter with appropriate characteristics.

[0262]At step S6 it is assumed that each column has been drawn
independently from a detV distribution of the type already discussed, with
Wishart parameters r*E for the scale matrix and k for the shape
parameter. It is also assumed that log r has an improper uniform prior
distribution and k has an improper uniform prior distribution. In step S6
the MAP (Maximum A Posteriori) values of r and k are calculated
using Bayes' Theorem.

[0263]In step S7 V is set to equal rE and k=k and these values are used to
calculate U'. U' is accordingly a matrix which transforms the mean local
covariance matrix of the log power spectrogram of the samples into the
identity matrix. Finally in step S8 U' is stored.

[0264]A number of variants of the decoder shown in FIG. 20 will now be
described.

[0265]It is possible that extracting the MAP values of the timing
frequency and phase from buffer (115) may not result in the most accurate
performance. Accordingly in a variant of the two decoders described all
possible ranges of timing frequency ω and phase φ are
considered. For each ω and φ there is estimated the posterior
probability P(ω,φ|D) of ω and φ given the data D by
adding the absolute values of the values sliced from buffer (115) where
the slicing is done according to ω and φ, and exponentiating
the result.

[0266]In the previously described decoders the values of ω and φ
which maximised P(ω,φ|D) were then used to slice the values in
buffer (115).

[0267]Instead in the variant being described the audio input is initially
resampled by resampler circuit (123) so as to either stretch or compress
the stegotext. Then a random sample is drawn from P(ω,φ|D) by a
random sampler which replaces the clock extraction circuit (116) and this
random value is used to slice the data in buffer (115). This procedure
takes into account the fact that the MAP may occur as a narrow high spike
in the distribution when in fact the true value belongs to a broader peak
which although lower contains more probability.
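
By way of illustration only, the estimation of P(ω,φ|D) and the drawing of a random sample from it might be sketched as follows. The sketch expresses ω as a period in buffer entries, normalises over a finite list of candidates, and subtracts the largest score before exponentiating to avoid overflow. These choices and the function names are assumptions made for this sketch.

```python
import math
import random

def timing_posterior(buffer, candidates):
    """Map each (period, phase) candidate to a normalised probability."""
    log_scores = {}
    for period, phase in candidates:
        total, pos = 0.0, float(phase)
        while int(pos + 0.5) < len(buffer):
            total += abs(buffer[int(pos + 0.5)])  # add absolute sliced values
            pos += period
        log_scores[(period, phase)] = total       # log of unnormalised posterior
    peak = max(log_scores.values())
    weights = {c: math.exp(s - peak) for c, s in log_scores.items()}
    norm = sum(weights.values())
    return {c: w / norm for c, w in weights.items()}

def sample_timing(posterior, rng=random):
    """Draw one (period, phase) pair at random according to the posterior."""
    keys = list(posterior)
    return rng.choices(keys, weights=[posterior[c] for c in keys], k=1)[0]
```

Drawing from the whole distribution, rather than taking its maximum, allows a broad peak carrying more total probability to be selected more often than a narrow spike.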

[0268]Thus consider the manner in which the random sampling of buffer
(115) in circuit (116) as just described operates in conjunction with the
resampler (123).

[0269]Firstly if the stegotext has been stretched (compressed) by less
than 6% then it is likely that circuit (116) will immediately select an
appropriate timing for data slicing circuit (117) due to the nature of
the key (K), and similarly output a correct correction value to circuit
(123) so as to undo the stretch (compression) which has been detected.

[0270]If on the other hand the stretch (compression) was greater than 6%
then values in buffer (115) and hence the calculated values of
P(ω,φ|D) will be incorrect and all of similar magnitude. This
results in a random value being fed to the resampler circuit. Two cases
now need to be considered. Firstly the value fed back is by chance within
6% of the correct value. If the total range of stretch possible is
±10% then the chance of this happening is at least 1/2. Once the newly
resampled stegotext reaches buffer (115) circuit (116) will be able to
determine the correct correction now needed. From here on the correct
value will be fed back to circuit (123) and on to circuit (117).

[0271]Secondly the value fed back to circuit (123) may be outside ±6%
of the correct value. If this once again is incorrect the next value fed
back will be randomly chosen and the procedures described continue. As
at each iteration there is a 1/2 probability of picking up the correct
value, only a few iterations are required until the correct timing and
phase have been estimated. Naturally the resampler circuit may start by
leaving the input stegotext unaltered.

[0272]Finally a common problem with all decoders dealing with precisely
reproducible inputs is that under certain circumstances a particular
input may not be decoded correctly, and in the absence of any randomness
in the input the same problem will recur whenever the input is repeated.
It is accordingly proposed to provide a decoder such as, for example, the
decoders and their variants which have just been described with respect
to FIGS. 16 and 20 with means for avoiding this problem.

[0273]A number of solutions are possible. One is merely to add truly
random noise to the input to the decoder. This has the disadvantage of
degrading the performance of the decoder.

[0274]Another alternative is to precede the actual input to be decoded
with a period of zero signal or of random noise, the period having within
a predetermined range a truly random length.
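
By way of illustration only, the second alternative might be sketched in software as follows for the case of a period of zero signal. The sketch draws the length from the operating system's entropy source by way of the Python secrets module, which is offered only as an approximation to a truly random source. The function name and parameters are assumptions made for this sketch.

```python
import secrets

def prepend_random_silence(samples, min_len, max_len):
    """Prefix `samples` with zeros; the run length is drawn from [min_len, max_len]."""
    pad = min_len + secrets.randbelow(max_len - min_len + 1)
    return [0.0] * pad + list(samples)
```

Repeating the call on the same input will in general give a different alignment of the input relative to the decoder's blocks.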

[0275]It will be appreciated that in the foregoing specification the
various embodiments of encoders and decoders have been defined in terms
of circuit elements such as "filter", "multiplier", "buffer", and
"circuit" and so on. However apart from the actual recording or
reproduction of a signal all these circuit elements can be replaced by
appropriate software manipulation. Thus in particular the encoder
described with respect to FIG. 13 can be replaced in all its functional
aspects by a general purpose computer receiving appropriate code. An
example of such a code is given with regard to the generator of the
matrix B0 used in the decoder of FIG. 20. Thus all the steps and
blocks shown in FIGS. 13, 16 and 20 can have their functions carried out
as software steps.

[0276]In the case of the decoder embodiments if they are to be used in
individual systems which as well as decoding the stegotext produce the
stegotext as an output, for example as music, then the decoders may well
be in the form of integrated microprocessor(s) possibly employing very
large scale integrated circuits.