Innovative Systems Design and Engineering www.iiste.org
ISSN 2222-1727 (Paper) ISSN 2222-2871 (Online)
Vol 3, No 7, 2012

A Review of Analog Audio Scrambling Methods for Residual Intelligibility

A. Srinivasan1, P. Arul Selvan2*

1. Dept of Information Technology, MNM Jain Engineering College, Chennai – 600097, Tamil Nadu, India. Email: asrini30@gmail.com
2. Dept of Electronics and Communication Engineering, Sathyabama University, Chennai, Tamil Nadu, India
* E-mail of the corresponding author: arulp6874@gmail.com

Abstract
This paper reviews the techniques in the different categories of audio scrambling schemes with respect to residual intelligibility. According to Shannon's theory of secure communication, the residual intelligibility is zero only when the scrambled signal is white. A scrambling scheme with zero residual intelligibility is therefore considered highly secure. Many analog audio scrambling algorithms that aim at lower levels of residual intelligibility have been proposed. This paper reviews the analog audio scrambling algorithms proposed so far, together with their properties and limitations. The aim is to provide insight for evaluating the analog audio scrambling schemes available to date. The review shows that every algorithm has strengths and weaknesses, and no algorithm satisfies all the factors to the fullest extent.

Keywords: residual intelligibility, audio scrambling, speech scrambling

1. Introduction
In communication systems, audio (which includes speech and music) is either analog or digital. Transmitting good-quality digital audio requires a channel bandwidth (up to 32 kbps) that is greater than the bandwidth needed for analog audio (up to 4 kHz). Scrambling digital audio produces a signal whose characteristics are similar to white noise.
Hence it has zero residual intelligibility and high cryptanalytic strength, but the scrambled digital audio signal needs a larger channel bandwidth for transmission. Another class of analog scrambling operates on the digital codes of pulse code modulation (PCM), adaptive differential pulse code modulation (ADPCM) and delta modulation (DM). The scrambled bits are then converted into analog form for transmission over analog channels. This nonlinear transformation results in poor recovered speech quality, so it has little practical use [S.C.Kak et al 1983]. Scrambling analog audio reduces the residual intelligibility, but the scrambled signal has lower cryptanalytic strength. On the other hand, the signal bandwidth stays comparatively low, so transmission through analog channels is feasible.

The key factors that characterize a scrambling algorithm are residual intelligibility, encoding delay and key-space. This paper reviews the available analog audio scrambling algorithms with respect to these factors. The auxiliary factors of bandwidth expansion and cryptanalytic strength are also considered.

The paper is organized as follows. Section 2 summarizes the main factors pertaining to analog audio scrambling algorithms. Section 3 categorizes the algorithms by the methodology used. Section 4 discusses the algorithms with respect to the three key factors, with tabulated results, merits and demerits, and future work. The paper concludes with final remarks.

2. Factors of Analog Audio Scrambling Algorithms
In an analog scrambler, the analog signal is first converted into a discrete signal and scrambled using digital processing techniques. The scrambled signal is then converted back into an analog signal. Since the scrambler output is an analog signal, the scheme is termed analog scrambling. Analog scrambling is the preferred method for secure speech communication over telephone channels. However, it provides good privacy only against casual eavesdropping; high-security applications require digital encryption.

2.1 Residual Intelligibility
Residual intelligibility is the amount of redundant information left in the scrambled signal, which makes the original information easier to recover. Scrambling effectiveness is determined by the residual intelligibility, the key-space, the rate of change of the key and the distortion produced by the key. These factors are linked to the complexity of the system and the resulting encoding delay. Low intelligibility and a large key-space give higher scrambling effectiveness, but they increase the system complexity and the encoding delay [S.C.Kak et al 1977].

Intelligibility is a subjective quantity, evaluated by having trained and untrained human listeners listen to the scrambled audio. It is commonly expressed as word, sentence or digit intelligibility. Word intelligibility tests use lists of monosyllabic words, sentence intelligibility tests use sentences composed of monosyllabic words, and digit intelligibility tests use recordings of N-digit numbers. On the redundancy scale, sentences have the highest redundancy, followed by words, and digits have the lowest. In most cases, analog scrambler performance is benchmarked using digit intelligibility, because digits have a limited vocabulary and low redundancy.
Intelligibility scores range from 100% down to 0%. The ideal value is 0%, meaning zero residual intelligibility, at which the scrambled signal resembles white noise. A level of 10% is termed the lower threshold, 30% the medium level and 50% the higher level [N.S.Jayant, R.V.Cox, B.J.McDermott, A.M.Quinn 1983].

2.2 Encoding Delay
The encoding delay is the time the scrambling algorithm takes to complete the scrambling operation per unit, where the unit is generally a block or segment. The encoding delay is directly proportional to the number of units (N), the length of each unit (L) and the number of samples in one unit (S). As N, L and S increase, the recovered speech quality improves because more samples are available for permutation, but the encoding delay also increases. Balancing these two competing factors, an appropriate segment length lies between 16 and 32 ms, or 256 samples per frame [N.S.Jayant 1982].

2.3 Key-space
The procedure used to transform the signal is commonly called the key. The level of security offered by an analog scrambling algorithm is a complex function of several properties: the number of usable keys (the key-space), the key length, the rate of change of the key, a properly selected limited key dictionary, and the proper time variation and distribution of the keys [N.S.Jayant 1982]. For casual privacy the key is independent of time, whereas for high security it is time-dependent. High security also needs a larger key-space. However, a larger key-space or a time-dependent key increases the system complexity and hence the encoding delay. Within a given key-space, the selected keys have to be statistically independent for increased security.
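The delay and key-space relations in Sections 2.2 and 2.3 come down to simple arithmetic. The sketch below makes three assumptions. It uses an 8 kHz sampling rate, which is usual for telephone-band speech but is not stated in the text. It counts only the buffering delay of a block scrambler, because a full block of N segments must be stored before it can be permuted. It measures the key-space of a scrambler that permutes N units as N!. All function names are hypothetical.

```python
import math


def segment_ms(samples_per_segment: int, fs_hz: int = 8000) -> float:
    """Duration of one segment in ms (fs_hz = 8 kHz is an assumption)."""
    return 1000.0 * samples_per_segment / fs_hz


def block_delay_ms(num_segments: int, samples_per_segment: int, fs_hz: int = 8000) -> float:
    """Buffering delay of a block scrambler: the whole block of N segments must be
    collected before it can be permuted, so the delay grows as N * L."""
    return num_segments * segment_ms(samples_per_segment, fs_hz)


def permutation_key_bits(num_units: int) -> float:
    """log2 of the key-space N! of a scrambler that permutes N units."""
    return math.log2(math.factorial(num_units))


print(segment_ms(256))                                      # 32.0
print(block_delay_ms(8, 128))                               # 128.0
print(math.factorial(8), round(permutation_key_bits(8), 1))  # 40320 15.3
```

At 8 kHz, the 256-sample frame quoted above lasts exactly 32 ms, the upper end of the 16-32 ms range. Each extra segment in a block adds one segment length of delay and multiplies the number of permutations by N + 1.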
The selected keys must also distort the scrambled signal to a large extent.

2.4 Bandwidth Expansion
Scrambling a speech signal introduces discontinuities, which increase the bandwidth of the scrambled signal. Higher scrambling effectiveness generally means more discontinuities and therefore more bandwidth. Bandwidth expansion limits the ability to transmit the scrambled signal through narrow-band channels. Time-frequency permutation generally introduces bandwidth expansion; to keep it minimal, linear orthogonal invertible transformations can be used. In this review, bandwidth denotes analog bandwidth.

3. Analog Audio Scrambling Classification
3.1 Analog Audio Scrambling Classification: First Level
Audio signals can be scrambled in the analog or the digital domain. This section presents a classification of the analog audio scrambling techniques.

The taxonomy of analog audio scrambling algorithms is depicted in Figure 1. In Sample Amplitude based techniques, the analog audio samples are reordered in the time domain. This alters the sample amplitudes and changes the magnitude spectrum of the scrambled signal. In Time Domain based techniques, the samples are grouped into segments, which are then reordered. In Frequency Domain based techniques, the frequency content of each segment is extracted as sub-bands. These sub-bands are permuted, which alters the frequency spectrum. When scrambling is done in both the time and frequency domains, the technique is called two-dimensional. In Transformation based audio scrambling, the audio signal is transformed using an appropriate transformation and the transform coefficients are permuted.

3.2 Analog Audio Scrambling Classification: Second Level
This section sub-classifies the above techniques according to the analog scrambling methods reported in the literature to date.

3.2.1 Sample Amplitude Based Techniques
In sample amplitude based techniques, the amplitude samples of the original signal are scrambled. Typical operations include interchange or permutation of speech samples [J.Phillips, M.H.Lee, J.E.Thomas 1971], linear addition of pseudorandom noise amplitudes, and non-linear modulo-arithmetic additions [S.C.Kak et al 1977]. The two basic types of permutation are Uniform (U) permutations and shift-register generated Pseudo-Random (PR) permutations.
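Section 3.2.1 names shift-register generated PR permutations but does not specify a construction. The following is one possible sketch, and the register and ordering rule are our choices, not the paper's. It assumes an 8-bit Galois LFSR with toggle mask 0xB8 (taps 8, 6, 5, 4), clocks the register once per sample position, and takes the order that sorts the register states as the permutation. The seed acts as the key. The function names are hypothetical, and a practical scrambler would also need to keep the transmitter and receiver registers in step.

```python
import numpy as np


def lfsr8_states(seed: int, count: int) -> list[int]:
    """Successive states of an 8-bit Galois LFSR with toggle mask 0xB8 (taps 8, 6, 5, 4)."""
    state = seed & 0xFF
    if state == 0:
        raise ValueError("seed must be non-zero")  # the all-zero state never changes
    states = []
    for _ in range(count):
        lsb = state & 1
        state >>= 1
        if lsb:
            state ^= 0xB8
        states.append(state)
    return states


def pr_permutation(seed: int, n: int) -> np.ndarray:
    """Permutation of range(n): the order that sorts n successive LFSR states.
    The stable sort breaks ties (repeated states), so the result is always a permutation."""
    return np.argsort(np.array(lfsr8_states(seed, n)), kind="stable")


def scramble(block: np.ndarray, perm: np.ndarray) -> np.ndarray:
    return block[perm]            # output position j carries input sample perm[j]


def descramble(block: np.ndarray, perm: np.ndarray) -> np.ndarray:
    out = np.empty_like(block)
    out[perm] = block             # return every sample to its original position
    return out


x = np.arange(16, dtype=float)
perm = pr_permutation(seed=0x5A, n=16)
print(perm)
print(np.array_equal(descramble(scramble(x, perm), perm), x))   # True
```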
Some scramblers add masking signals to the amplitude samples. These masking signals can be a PR binary sequence or a modulo-arithmetic sequence.

3.2.2 Time Domain Based Techniques
In time-domain techniques, the audio signal is divided into segments and the segments are then permuted. The main time-domain techniques are:
- Time-Inversion
- Time Segment Permutation (TSP)
- Hopping-Window and Sliding-Window TSP
- Time Shifting of Speech Sub-bands
- Reverberation [N.S.Jayant 1982]
- a time-domain scrambler that does not need synchronization [F. Huang, E. V. Stansfield 1983]

3.2.3 Frequency Domain Based Techniques
In this class of scramblers, the speech spectrum is divided into sub-bands whose positions are then permuted. The main frequency-domain techniques are Frequency Inversion, Band-splitting, Band-splitting with Frequency Inversion, and Frequency Inversion followed by Cyclic Band-shift [N.S.Jayant 1982].

3.2.4 Two-Dimension Based Techniques
Two-dimensional scramblers manipulate the signal in both the time and frequency domains simultaneously. The important types are Frequency Inversion combined with Block TSP, Frequency Inversion and Cyclic Band-shift combined with time manipulations, and Time-Frequency Segment Permutation (TFSP).

3.2.5 Transform Based Techniques
This class of analog scramblers operates on the linear transform coefficients of the audio samples. The transforms used include:
- Discrete Prolate Spheroidal Transform (DPST)
- Fast Fourier Transform (FFT)
- Discrete Cosine Transform (DCT)
- Modified Discrete Cosine Transform (MDCT)
- Hadamard Transform (HT)
- circulant transformation
- Wavelet Transform
- a parallel structure of two different types of wavelets with the same decomposition levels
- a combination of QAM mapping with orthogonal frequency division multiplexing (OFDM)

4. Review of the Techniques

4.1 Review of Sample-Amplitude Based Techniques
The sample interchange method is the simplest technique, in which individual samples are reordered. The reordering can be achieved with delay networks, as shown in Figure 2, which gives the scrambling order for a block-4 sequence. This reordering produces sideband components that mask or alter the amplitude of the adjacent audio samples. However, the method still retains a substantial amount of residual intelligibility. For a block size of 128 samples, the word intelligibility is about 22% and the digit intelligibility about 24%. The larger the sample displacement, the stronger the sideband components and the greater the masking effect. One variation uses a sample-sequence reordering that is close to a complete reversal, which gives a word intelligibility of 2%. A second variation, termed running exchange, uses a more complex interchange between samples of different segments and gives a much lower residual intelligibility. Both variations, however, produce a scrambled signal whose bandwidth exceeds the 4 kHz analog channel bandwidth [J.Phillips, M.H.Lee, J.E.Thomas 1971].

Most scrambling schemes permute samples or transform coefficients. The permutation must make the scrambled block markedly different from the original, both perceptually and spectrally. The effectiveness of a permutation is measured by a rank correlation coefficient whose magnitude ranges from 0 to 1, with zero denoting a highly effective permutation. Spearman's and Kendall's coefficients are the two most frequently used rank correlation measures [S.C.Kak et al 1983].

The efficiency of scramblers based on temporal permutation of the speech samples depends on two things: the order of the permutation matrix and the randomness of its coefficients. Two types of permutation are possible: U-permutations and PR-permutations. U-permutations give a scrambled spectrum that is flat in an average sense. PR-permutations also give a spectrum that is flat on average, and in addition the transitions between adjacent samples produce a smoother spectrum.
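The rank-correlation measures named above can be computed directly from a permutation. In this sketch, `perm[j]` is the original position of the sample placed at output position j. Spearman's rho uses the no-ties formula, and Kendall's tau counts concordant minus discordant pairs. The Hamming distance, which Section 4.1.1 also proposes as a scrambling measure, is included for comparison. Magnitudes are printed on the assumption that the 0 to 1 scale in the text refers to |rho| and |tau|. The function names are hypothetical.

```python
from itertools import combinations


def spearman_rho(perm: list[int]) -> float:
    """Spearman rank correlation between output position j and source position perm[j].
    A permutation has no ties, so rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(perm)
    d2 = sum((p - j) ** 2 for j, p in enumerate(perm))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def kendall_tau(perm: list[int]) -> float:
    """Kendall tau: (concordant - discordant pairs) / total number of pairs."""
    n = len(perm)
    score = sum(1 if perm[i] < perm[j] else -1 for i, j in combinations(range(n), 2))
    return score / (n * (n - 1) / 2)


def hamming_distance(perm: list[int]) -> int:
    """Number of samples moved away from their original position."""
    return sum(1 for j, p in enumerate(perm) if p != j)


for name, perm in [("identity", list(range(8))),
                   ("pair swaps", [1, 0, 3, 2, 5, 4, 7, 6]),
                   ("shuffled", [5, 2, 7, 0, 3, 6, 1, 4])]:
    print(f"{name:10s} |rho|={abs(spearman_rho(perm)):.3f} "
          f"|tau|={abs(kendall_tau(perm)):.3f} hamming={hamming_distance(perm)}")
```

Adjacent pair swaps move every sample, giving the maximum Hamming distance of 8, yet |rho| stays close to 1. The two kinds of measure can therefore disagree about how thoroughly a block has been scrambled.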
A flatter scrambled spectrum lowers the residual intelligibility. However, the speech-silence pattern remains recognizable, so the residual intelligibility of both techniques is still considered essentially high [S.C.Kak et al 1977].

Permuting contiguous time samples has an important limitation: it expands the bandwidth of the scrambled signal. To overcome this, individual samples are grouped into time segments, and the U or PR permutations are applied to these segments. Typical segment durations are 10-30 ms [S.C.Kak et al 1977]. A common issue for any sample- or segment-based scrambler is frame synchronization between the scrambler and the descrambler.

Masking-based scramblers are an alternative to permutation-based scramblers. The main masking techniques are linear addition of PR noise and modulo-m addition to the samples or segments; both provide a low level of residual intelligibility. The masking signal has to change slowly relative to the audio waveform. This ensures that every audio sample is affected by the masking signal and that the spectrum approaches that of a white signal. The frequency spectrum after scrambling is shown in Figure 3 [S.C.Kak et al 1977]. With linear masking, however, the speech-to-channel-noise ratio is lower, which results in higher receiver complexity. Non-linear masking techniques are more robust to real-channel imperfections, but they lead to bandwidth expansion [N.S.Jayant 1982]. A significant advantage of masking techniques is that they remove the speech-silence pattern.

A technique based on chaotic encryption in conjunction with lookup tables is discussed in [K.Ganesan, R.Muthukumar, K.Murali 2006]. The lookup tables are constructed using an appropriate chaotic system (such as the Arnold map), and each table entry contains an index number and an iterated decimal value. The amplitude values of the quantized audio samples are converted according to the lookup table entries.
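The lookup-table idea can be sketched with the Arnold cat map itself. This construction is our own illustration, not the one in [K.Ganesan, R.Muthukumar, K.Murali 2006]. It assumes 16-bit PCM and arranges the 65536 quantization levels on a 256 × 256 grid. It then iterates the cat map (x, y) → (x + y, x + 2y) mod 256 to obtain a one-to-one substitution table, with the grid size and iteration count acting as the key. Unlike the cited scheme, it keeps the output in the same amplitude range rather than expanding it. The helper names are hypothetical.

```python
import numpy as np


def arnold_lookup_table(n: int, iterations: int) -> np.ndarray:
    """Hypothetical helper: place the n*n quantization levels on an n x n grid and move
    each one with the Arnold cat map (x, y) -> (x + y, x + 2y) mod n. The map has
    determinant 1, so it is invertible and the table is a permutation of the levels."""
    level = np.arange(n * n)
    x, y = level // n, level % n
    for _ in range(iterations):
        x, y = (x + y) % n, (x + 2 * y) % n
    return x * n + y                    # table[level] -> substituted level


def invert_table(table: np.ndarray) -> np.ndarray:
    inverse = np.empty_like(table)
    inverse[table] = np.arange(table.size)
    return inverse


# 16-bit PCM has 65536 levels, i.e. a 256 x 256 grid; shift int16 values to 0..65535.
table = arnold_lookup_table(256, iterations=5)   # grid size and iteration count act as the key
inverse = invert_table(table)

rng = np.random.default_rng(1)
pcm = rng.integers(-32768, 32768, size=1000)
scrambled = table[pcm + 32768] - 32768
recovered = inverse[scrambled + 32768] - 32768
print(np.array_equal(recovered, pcm))           # True
```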
In the chaotic lookup-table scheme of [K.Ganesan, R.Muthukumar, K.Murali 2006], the input quantized audio data ranges from 0 to 19512 and is converted into amplitude values ranging from 0 to 65284. The randomized amplitude values therefore have a higher dynamic range, which ensures a lower level of residual intelligibility.

4.1.1 Experimental Overview
Table 1 lists the comparative values of the factors for the various algorithms. The sample interchange method leaves a considerable amount of residual intelligibility in the scrambled speech, because samples are interchanged only within a finite distance. Since only a finite number of samples take part in the interchange, both the key-space and the encoding delay are small. Applying the method to a larger number of samples lowers the residual intelligibility and enlarges the key-space, but increases the encoding delay. For a given block length N, both the PR and U permutation scramblers have relatively high residual intelligibility, which is increased further by the presence of the speech-silence pattern. The key-space grows with N: for N = 256, PR permutation has a key-space of 4080 and U permutation a key-space of 63232. In permutation-based algorithms, the segment duration has to be kept between 10 and 30 ms to limit the bandwidth expansion [S.C.Kak et al 1977].

Masking and permutation can be applied concurrently to the speech samples to lower the residual intelligibility. The coefficients of the permutation matrices can also be time-varying to increase the cryptanalytic strength. The Hamming distance can be used as a measure of scrambling for permutation-based scramblers. The more elements a permutation moves from their original positions, the larger the Hamming distance and the lower the residual intelligibility.

4.1.2 Merits and Demerits
The encoding delay is low because scrambling is performed on a finite set of samples at a time. The bandwidth expansion is minimal when the segment duration is kept between 10 and 30 ms.

On the other hand, the algorithms in this category retain a significant amount of residual intelligibility. In all techniques other than masking, the speech-silence pattern is preserved, which decreases the cryptanalytic strength. The available key-space is small because these algorithms work on a subset of the speech samples.

4.1.3 Future Work
Random sample interchange is theoretically possible, but its practical difficulties remain to be explored [J.Phillips, M.H.Lee, J.E.Thomas 1971]. The permutation matrix coefficients could be generated using a look-up-table approach. The use of higher-dimensional chaotic systems for better scrambling is yet to be explored [K.Ganesan, R.Muthukumar, K.Murali 2006].

4.2 Review of Time-Domain Based Techniques
In this class of scramblers, speech segments of 10-30 ms are permuted, because segments of this length make the permutation a bandwidth-preserving operation [S.C.Kak et al 1977].
The basic unit of scrambling is a block of samples or segments, and the scrambling techniques differ in the operation performed on the blocks. Block-wise operation introduces a time delay that is directly proportional to the block size. Permutation-based scramblers do not change characteristics such as the frequency, phase and amplitude of the speech components; only the time or frequency order of the components is changed. The positions of the 1s in the permutation matrix define the scrambling key, so for an N × N matrix the key-space contains N! keys. Only 10-20% of these keys provide low residual intelligibility, so key selection is an important factor in this type of scrambler. The main advantage of permutation-based scramblers is that they do not increase the signal bandwidth [D. B.Sadkhan, D.Abdulmuhsen, N. F.Al-Tahan 2007].

The scrambling efficiency S is a function

$S = F(B, N, D)$

where B is the block size, N is the number of samples or segments in a block, and D is the temporal distance of segment separation. Temporal distance is the time separation, in the original speech, between a pair of segments that appear adjacent in the scrambled speech. The average segment separation and the residual intelligibility are monotonically related. For permutation scramblers with constant B and N, the scrambling efficiency is directly proportional to D. B and D are related by

$D_{\max} = 2B - 1$ (in segments)

so for a block size of 8 the maximum achievable temporal distance is 15 segments [N.S.Jayant, R.V.Cox, B.J.McDermott, A.M.Quinn 1983].

In the Time-Inversion technique, a block length of the order of 128 ms or 256 ms is chosen, and the order of the speech samples is inverted within each block. This reduces the residual intelligibility, but the level remains comparatively high. The total encoding delay is of the order of 256 ms or 512 ms.
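The temporal-distance bound and the time-inversion operation can both be checked in a few lines. This sketch assumes that the block size B counts segments and that the same permutation key is applied to every block. Under that assumption, output slot j of block k carries original segment kB + perm[j]. The code brute-forces all 8! = 40320 keys for B = 8. The helper names are hypothetical.

```python
from itertools import permutations

import numpy as np


def temporal_distances(perm: tuple[int, ...]) -> list[int]:
    """Distances (in segments) in the original speech between segments that become
    adjacent when the same block permutation is applied to every block. Output slot j
    of block k carries original segment k*B + perm[j]."""
    B = len(perm)
    within = [abs(perm[j + 1] - perm[j]) for j in range(B - 1)]
    across = B + perm[0] - perm[B - 1]   # last slot of block k -> first slot of block k+1
    return within + [across]


def time_invert(x: np.ndarray, block_len: int) -> np.ndarray:
    """Reverse the sample order inside each block (len(x) must be a multiple of block_len)."""
    return x.reshape(-1, block_len)[:, ::-1].ravel()


B = 8
best = max(max(temporal_distances(p)) for p in permutations(range(B)))
print(best, 2 * B - 1)                          # 15 15
print(time_invert(np.arange(8), 4))             # [3 2 1 0 7 6 5 4]
```

The maximum is reached only at a block boundary, when a block starts with its last original segment and ends with its first. Full reversal of each block is one such key, and that is what time inversion does when a segment is a single sample.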
Time-inversion is a deterministic operation, so its cryptanalytic strength is low.

In the TSP technique, speech segments are permuted and transmitted in pseudo-random order according to a segment-mapping algorithm. There are two types of TSP: block and sequential. In block TSP, all the segments of a block are scrambled and transmitted before the segments of the next block are brought into the scrambler memory. In sequential TSP, individual segments are transmitted without waiting for the whole block. The optimal segment duration in both cases is about 16-32 ms [N.S.Jayant, R.V.Cox, B.J.McDermott, A.M.Quinn 1983].

A block TSP scrambler known as the Hopping-Window TSP scrambler uses segments of 16 ms duration in blocks of b segments. The number of available permutations is b!, but only 0.1% of them are good from a scrambling point of view. The sequential TSP scrambler known as the Sliding-Window TSP scrambler has a memory that stores b segments, and the segment to be output is chosen by a pseudo-random selector. The maximum permissible staying time of a segment in the memory is $t = 2b$, which ensures an optimal residual intelligibility. This staying time is termed the communication delay.

TSP scramblers leave a high residual intelligibility, of the order of 80-100%, for a communication delay of 256 ms. With a longer communication delay (512 ms), the intelligibility of block TSP falls to 60%. Using mu-law compression of the speech, the residual intelligibility can be reduced to 45%. The main disadvantage of the TSP technique is the need to synchronize the scrambler and the descrambler [N.S.Jayant 1982].

In the Time Shifting of Speech Sub-bands technique, different frequency sub-bands of the speech are delayed by different amounts. Typically, the lower-frequency sub-band signal is delayed by a time interval τ and added to the higher-frequency sub-band signal for transmission. The descrambler performs the reverse operation, and the total encoding delay introduced is τ.
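The sub-band time-shifting idea can be sketched with an idealized two-band split. The sketch assumes brick-wall filters implemented with an FFT over the whole buffer and a circular delay (`np.roll`), which makes the round trip exact. The descrambled output then equals the input delayed by τ samples, matching the total delay stated above. A real scrambler would use causal filters and a linear delay line. `cutoff_bin` and `tau` are hypothetical parameters.

```python
import numpy as np


def split_bands(x: np.ndarray, cutoff_bin: int) -> tuple[np.ndarray, np.ndarray]:
    """Ideal two-band split: zero the FFT bins at or above cutoff_bin for the low band."""
    spectrum = np.fft.rfft(x)
    spectrum[cutoff_bin:] = 0.0
    low = np.fft.irfft(spectrum, n=len(x))
    return low, x - low


def scramble(x: np.ndarray, cutoff_bin: int, tau: int) -> np.ndarray:
    low, high = split_bands(x, cutoff_bin)
    return np.roll(low, tau) + high          # delay the low band by tau samples


def descramble(y: np.ndarray, cutoff_bin: int, tau: int) -> np.ndarray:
    low, high = split_bands(y, cutoff_bin)
    return low + np.roll(high, tau)          # delay the high band so both line up again


rng = np.random.default_rng(0)
x = rng.standard_normal(1024)
y = scramble(x, cutoff_bin=128, tau=40)
z = descramble(y, cutoff_bin=128, tau=40)
print(np.allclose(z, np.roll(x, 40)))       # True: output is the input delayed by tau
```

With an assumed 8 kHz sampling rate and a 1024-sample buffer, `cutoff_bin=128` places the crossover at 1 kHz.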
Sub-band time-shifting scramblers provide lower residual intelligibility, but at the cost of a higher encoding delay [N.S.Jayant 1982].

In the Reverberation technique, multiple time-discrete echoes at a fixed interval are mixed with the current speech amplitude to generate the scrambled output. In the forward type the echoes decrease exponentially, and in the reverse type they increase exponentially. These schemes have a higher encoding delay and a lower residual intelligibility [N.S.Jayant 1982].

The time-domain scrambler that does not need synchronization uses a time-varying transversal filter. In this filter, the incoming time samples are selected randomly and multiplied by constant values. In the frequency domain, this is equivalent to a bank of narrow-band filters with different center frequencies. Each filter passes one input frequency sub-band, whose center frequency is then shifted. The amount of frequency shift is controlled by either a constant or a variable key, which scrambles the input frequency sub-bands. For most keys the scrambled speech is unintelligible. For a subset of keys, however, the scrambled spectrum shows perfect symmetry for certain sub-bands, which leaves sufficient intelligibility [F. Huang, E. V. Stansfield 1983].

A speech scrambling algorithm based on blind source separation uses unknown, mutually independent source signals that are observed only as mixtures. The proposed algorithm combines time-element scrambling and masking: segments of the speech signal are mixed with an equal number of pseudorandom key signals. The mixing reduces the number of segments, so decryption without knowledge of the key signals is not possible. A significant aspect of this algorithm is that the speech segments are mixed with the key signals jointly. This gives more complete scrambling and keeps the residual intelligibility low [Q.H. Lin, F.L. Yin, T.M. Mei, H.L. Liang 2004].

4.2.1 Experimental Overview
Table 2 lists the comparative values of the factors for the various algorithms. The Time-Inversion method is applied to a segment of speech samples, so the scrambled speech retains a significant amount of residual intelligibility; a larger segment size is needed to lower it. In the TSP method, scrambling takes place at the segment level, which reduces the residual intelligibility, but not significantly. Scrambling the speech samples within each segment, in addition to the segment-level scrambling, lowers the residual intelligibility further. In the reverberation method, the residual intelligibility is controlled by the number of past speech samples that affect the present sample. The higher this number, the lower the residual intelligibility.

4.2.2 Merits and Demerits
Bandwidth expansion is low, and time-domain operations remove the speech-silence rhythm from the scrambled signal. Time-domain scramblers are robust to real-channel imperfections, and the overall nature of the speech signal remains intact for segment lengths within 16 ms. Synchronization is therefore essential for segment lengths of more than 16 ms.

On the other hand, the algorithms in this category have comparatively high residual intelligibility, with digit intelligibility of the order of 60%. Lower residual intelligibility requires an encoding delay above 512 ms. The key-space is small because these algorithms work on a subset of the speech samples.

4.2.3 Future Work
The effect of loss of synchronization between the transmitter and the receiver on the intelligibility of the descrambled speech needs to be examined.

4.3 Review of Frequency-Domain Based Techniques
In frequency-domain scramblers, the audio spectrum is divided into frequency sub-bands (segments), which are then scrambled. Frequency Inversion, shown in Figure 4, is based on a single reference frequency, which acts as the key. This technique provides a residual intelligibility level of 30%, but the characteristics of the scrambled speech remain identifiable, so it has the lowest cryptanalytic strength. A marginal variation is frequency-hopping inversion, which uses a varying reference frequency; its residual intelligibility is only slightly better. This technique offers a digit intelligibility score of 30% with untrained listeners [S.C.Kak et al 1977].

The Band-splitting technique permutes the frequency sub-bands and offers better (lower) residual intelligibility. With f sub-bands, the total number of available permutations is f!. However, once the correct positions of one or two main sub-bands are found, information about the phonemes can be recognized, so the cryptanalytic strength is low [N.S.Jayant 1982].

In the Band-splitting with Frequency Inversion technique, selected frequency sub-bands are additionally subjected to frequency inversion.
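Frequency inversion and band-splitting with selective inversion can be sketched on one FFT block. The sketch makes several assumptions that the text does not:
- Whole-band inversion uses multiplication by (-1)^n, which maps frequency f to fs/2 - f. This is a fixed reference at fs/4 rather than a keyed one.
- Sub-bands are equal-width groups of rfft bins in a 256-sample block.
- Inverting a sub-band means reversing the order of its bins.

The DC bin and the leftover top bins are left in place so that the inverse FFT stays exact. The key is the pair (`perm`, `invert`). The names are hypothetical.

```python
import numpy as np


def frequency_invert(x: np.ndarray) -> np.ndarray:
    """Whole-band inversion: multiplying by (-1)^n maps frequency f to fs/2 - f."""
    return x * (-1.0) ** np.arange(len(x))


def band_slices(block_len: int, f: int) -> list[slice]:
    # f equal-width bands over rfft bins 1 .. f*w. DC and any leftover top bins
    # (including Nyquist) stay in place, so irfft(rfft(.)) remains an exact inverse.
    w = (block_len // 2 - 1) // f
    return [slice(1 + i * w, 1 + (i + 1) * w) for i in range(f)]


def band_scramble(block: np.ndarray, perm: list[int], invert: list[bool]) -> np.ndarray:
    """Output band j carries input band perm[j], bin order reversed when invert[j]."""
    spec = np.fft.rfft(block)
    out = spec.copy()
    bands = band_slices(len(block), len(perm))
    for j, (src, inv) in enumerate(zip(perm, invert)):
        content = spec[bands[src]]
        out[bands[j]] = content[::-1] if inv else content
    return np.fft.irfft(out, n=len(block))


def band_descramble(block: np.ndarray, perm: list[int], invert: list[bool]) -> np.ndarray:
    spec = np.fft.rfft(block)
    out = spec.copy()
    bands = band_slices(len(block), len(perm))
    for j, (src, inv) in enumerate(zip(perm, invert)):
        content = spec[bands[j]]
        out[bands[src]] = content[::-1] if inv else content   # undo reversal, move back
    return np.fft.irfft(out, n=len(block))


n = np.arange(256)
tone = np.cos(2 * np.pi * 20 * n / 256)                        # energy in bin 20
print(np.argmax(np.abs(np.fft.rfft(frequency_invert(tone)))))  # 108 = 128 - 20

perm, invert = [3, 0, 7, 1, 6, 2, 5, 4], [True, False, False, True, False, True, True, False]
block = np.random.default_rng(3).standard_normal(256)
restored = band_descramble(band_scramble(block, perm, invert), perm, invert)
print(np.allclose(restored, block))                            # True
```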
For band-splitting with frequency inversion over f sub-bands, the total number of possible scrambler mappings is of the order of $f!\,2^f$, of which only 5% are effective. The word intelligibility of this technique is of the order of 45-70% with trained listeners [N.S.Jayant 1982].

In the Frequency Inversion followed by Cyclic Band-shift technique, each sub-band is shifted by n (modulo k). With 16 sub-bands and a shift variation rate of 50 per second, the residual intelligibility is of the order of 55% for digits and 30% for words [N.S.Jayant 1982].

4.3.1 Experimental Overview
Table 3 lists the comparative values of the factors for the various algorithms.

4.3.2 Merits and Demerits
The encoding delay and bandwidth expansion are low. These scramblers do not need synchronization between the transmitter and the receiver for segment lengths up to 200 Hz. The spectral characteristics of the individual phonemes are altered, which increases the security.

On the other hand, the algorithms in this category have higher levels of residual intelligibility and retain the speech-silence rhythm. The key-space is small because these algorithms work on a subset of the speech samples. The cryptanalytic strength is low, because identifying certain frequency components gives information that helps decipher the remaining content. A common disadvantage of all frequency-domain scramblers is their sensitivity to group delay distortion in the transmission channel.

4.3.3 Future Work
The spectral distortions that provide optimal speech scrambling need to be specified. Multiple stages of these techniques could be cascaded to achieve lower residual intelligibility.

4.4 Review of Two-Dimensional Scrambling Techniques
Two-dimensional scramblers operate on time segments of 16 ms duration, which are subsequently partitioned into frequency sub-bands. The time- and frequency-domain components are then manipulated simultaneously. The time-domain manipulations destroy the speech-silence rhythm, and the frequency-domain manipulations alter the spectral characteristics of some audio components. Together they reduce the residual intelligibility to about 15-25%. These scramblers come with increased complexity, encoding delay and sensitivity to channel imperfections [N.S.Jayant, R.V.Cox, B.J.McDermott, A.M.Quinn 1983].
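The simultaneous time and frequency manipulation described above can be sketched as a block TFSP-style scrambler that permutes time-frequency tiles. The sketch relies on assumptions not made in the text:
- an 8 kHz sampling rate, so a 16 ms segment is 128 samples
- rectangular, non-overlapping segments
- FFT-based sub-bands of equal width
- a single random permutation of the b·f tiles as the key

The names are hypothetical, and channel effects and synchronization are ignored.

```python
import numpy as np


def tile_bands(seg_len: int, f: int) -> list[slice]:
    # f equal-width bands over rfft bins 1 .. f*w; DC and leftover top bins stay put.
    w = (seg_len // 2 - 1) // f
    return [slice(1 + i * w, 1 + (i + 1) * w) for i in range(f)]


def tfsp_scramble(block: np.ndarray, b: int, f: int, key: np.ndarray) -> np.ndarray:
    """block holds b time segments. Tile t = (segment t // f, band t % f);
    output tile t carries input tile key[t], where key permutes range(b * f)."""
    seg_len = block.size // b
    segs = np.fft.rfft(block.reshape(b, seg_len), axis=1)
    bands = tile_bands(seg_len, f)
    out = segs.copy()
    for t, src in enumerate(key):
        out[t // f, bands[t % f]] = segs[src // f, bands[src % f]]
    return np.fft.irfft(out, n=seg_len, axis=1).ravel()


def tfsp_descramble(block: np.ndarray, b: int, f: int, key: np.ndarray) -> np.ndarray:
    seg_len = block.size // b
    segs = np.fft.rfft(block.reshape(b, seg_len), axis=1)
    bands = tile_bands(seg_len, f)
    out = segs.copy()
    for t, src in enumerate(key):
        out[src // f, bands[src % f]] = segs[t // f, bands[t % f]]  # move tile back
    return np.fft.irfft(out, n=seg_len, axis=1).ravel()


b, f, seg_len = 4, 4, 128            # 16 ms segments at an assumed 8 kHz
rng = np.random.default_rng(7)
key = rng.permutation(b * f)         # hypothetical key: one permutation of the 16 tiles
x = rng.standard_normal(b * seg_len)
y = tfsp_scramble(x, b, f, key)
print(np.allclose(tfsp_descramble(y, b, f, key), x))   # True
```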

Frequency Inversion combined with Block TSP produces a digit intelligibility of the order of 20% for a block size of 256 samples, with the scrambler operating at a delay of 1024 ms. Dynamic cyclic band-shift schemes used in analog scramblers have two variations:
- The first uses a time manipulation called dynamic time reverberation. It produces a digit intelligibility of the order of 18-28% and a word intelligibility close to zero, and its preferred order is frequency scrambling followed by time scrambling.
- The second uses time-shifting between two frequency sub-bands. It produces a digit intelligibility of the order of 25-38% and a word intelligibility of 2-3%, and its preferred order is time scrambling followed by frequency scrambling.

Both systems operate well over channels that introduce heavy signal distortion and fading [N.S.Jayant 1982].

In the TFSP scrambling system shown in Figure 5, the f frequency sub-bands in each of b time segments are collected to form fb time-frequency segments. These time-frequency segments are output from the scrambler memory in random order, either sequentially or in blocks. The maximum memory retention time of a segment is $t = 2fb$ segment durations, and this retention time is the encoding delay. The average digit intelligibility of a TFSP scrambler with a 256 ms encoding delay is about 25%, and the word intelligibility is close to zero [N.S.Jayant 1982]. Two problems must be addressed in the TFSP scrambler: synchronization and recovered speech quality. Synchronization is established by sending signaling chirps from the transmitter, and channel equalization is used to improve the recovered speech quality.
[R.V.Cox, T.M.Tribolet 1983].

4.4.1 Experimental Overview

Table 4 lists the comparative values of the factors for the various algorithms. In most algorithms the residual intelligibility is relatively low, with a typical level of 30%. In all the algorithms the usable key-space is very small, the encoding delay is moderate (typically 256 ms) and the bandwidth expansion is low.

4.4.2 Merits and Demerits

The algorithms in this category have low residual intelligibility, with digit intelligibility of the order of 20%. The speech-silence rhythm is removed and the bandwidth expansion is low. These scramblers do not need synchronization between the transmitter and receiver. They are robust to the transmission channel characteristics, with problems only at the spectral and temporal segment boundaries, so the loss of speech quality is small [N.S.Jayant 1982].

The key-space is small because these algorithms operate on a limited set of time and frequency segments. The cryptanalytic strength is also lower, because identifying certain frequency components gives information that helps in deciphering the remaining content. The encoding delay is moderate because these scramblers involve both time-domain and frequency-domain manipulations.

4.4.3 Future Work

There is scope for devising techniques that bring the digit intelligibility closer to the lower bound of 10% and the word intelligibility to zero percent. Problems caused by the channel characteristics at the spectral and temporal boundaries also need to be addressed.

4.5. Review on Transform domain based techniques

The class of analog audio scramblers based on operations performed on the linear transform coefficients of the speech samples is known as transform based scramblers.
The transform based scramblers have a larger number of usable permutations, which increases the cryptanalytic strength and offers very low levels of residual intelligibility. A block of speech samples is first converted into a block of transform coefficients (F). The coefficient block is then scrambled by an operation such as permutation or non-linear modulo-arithmetic masking (P), and the scrambled speech block is generated by the inverse transformation (I). The receiver performs the reverse operations. The process for transform domain scrambling is shown in Figure 6.

The transformations used in this class of algorithms have to be linear and orthogonal, because an orthogonal transformation does not increase the level of the noise component in the descrambled sequence. For example, let F be the transformation and x the input sequence, so that the transformed sequence is Fx. When noise n is added, the received sequence becomes $Y = Fx + n$. Applying the inverse transformation gives $F^{-1}Y = F^{-1}(Fx + n) = x + F^{-1}n$. When F is orthogonal, $F^{-1} = F^{T}$ and $\|F^{-1}n\| = \|n\|$, so the noise is not amplified by the inverse transformation, thereby preserving the
sequence energy.

A simple scrambling scheme is the permutation of the coefficients of the transform sequence, in which a band-limited input sequence results in a band-limited scrambled sequence at the output. The Discrete Prolate Spheroidal Transform (DPST) is a linear orthogonal transform used for this purpose. The cryptanalytic strength of this scheme is much higher than that of traditional analog scramblers, and the residual intelligibility is lower; its limitations are high complexity and usability on narrowband channels only [A. D.Wyner 1979].

In FFT based scrambling, the selected FFT coefficients are scrambled using a permutation matrix that is either stored in ROM or generated on the fly from a key value. FFT based scrambling expands the bandwidth of the scrambled signal, so when transmission takes place over a band-limited channel, the recovered speech quality is reduced. The recovered speech quality can be improved by using a large number of samples per FFT frame; typical frame lengths for this purpose are 128, 256 and 512 samples. As the number of permutable FFT coefficients grows, the cryptanalytic strength increases, but the encoding delay also increases considerably. A frame size of 256 samples is a reasonable trade-off between recovered speech quality and encoding delay. An alternative way to limit the bandwidth is to scramble only a subset of the FFT coefficients; commonly, the 85 FFT coefficients corresponding to frequencies from 288 to 2976 Hz are scrambled. To further increase the cryptanalytic strength, a multi-frame structure that uses a different permutation for each frame can be employed. The FFT based scheme is affected by group delay distortion, which is equalized using a digital transversal filter.
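The in-band FFT coefficient permutation described above can be sketched as follows. The assumptions are stated loudly because the source does not give them: frames of N = 256 samples; a sampling rate of 8192 Hz, chosen here only because its 32 Hz bin spacing maps 288 Hz and 2976 Hz onto bins 9 and 93 and so reproduces the 85 coefficients quoted in the text; and a seeded NumPy permutation in place of the ROM table or key generator. The helper `fft_scramble` is hypothetical, and the sketch omits multi-frame keys and group delay equalization.

```python
import math
import numpy as np

FS = 8192                 # assumed sampling rate (Hz): 32 Hz bins when N = 256
N = 256                   # samples per FFT frame, as suggested in the text
LO_HZ, HI_HZ = 288, 2976  # band whose coefficients are permuted (from the text)
BINS = np.arange(round(LO_HZ * N / FS), round(HI_HZ * N / FS) + 1)  # bins 9..93

def fft_scramble(frame: np.ndarray, perm: np.ndarray,
                 inverse: bool = False) -> np.ndarray:
    """Permute the in-band rfft coefficients of one frame (hypothetical helper)."""
    spec = np.fft.rfft(frame)
    band = spec[BINS]                      # fancy indexing returns a copy
    if inverse:
        restored = np.empty_like(band)
        restored[perm] = band              # undo the permutation
        spec[BINS] = restored
    else:
        spec[BINS] = band[perm]            # slot p receives coefficient perm[p]
    # DC and Nyquist bins are never touched, so irfft inverts rfft exactly.
    return np.fft.irfft(spec, n=N)

perm = np.random.default_rng(42).permutation(BINS.size)  # hypothetical key
frame = np.random.default_rng(0).standard_normal(N)      # stand-in speech frame
scrambled = fft_scramble(frame, perm)
recovered = fft_scramble(scrambled, perm, inverse=True)

# Size of the permutation key-space in bits: log2(85!).
key_bits = math.lgamma(BINS.size + 1) / math.log(2)
```

Here `key_bits` comes out at roughly 427 bits. Because the permutation only reorders in-band coefficients, the frame energy and the presence of talk spurts are left unchanged, so the practical security is lower than the raw key-space count suggests.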
Preservation of signal energy, talk spurts and the original intonation decreases the security of the FFT based scheme [K. Sakurai, K. Koga, T. Muratan 1984].

A scrambling scheme based on FFT coefficient permutation and adaptive dummy spectrum insertion is used to prevent the detection of talk spurts. Dummy spectrum insertion introduces noise at the receiver; a syllabic companding operation is used to reduce this noise. To enhance security, FFT coefficients of lower energy are adaptively selected and replaced with dummy coefficients prior to permutation. The values of these dummy coefficients are chosen so that the scrambled speech signal has constant energy. This scheme is sensitive to channel impairments: the scrambled speech undergoes the parabolic group delay distortion shown in Figure 7, which introduces large delays at the spectral boundaries. An equalizer is used to suppress this distortion [K. Sakurai, K. Koga, T. Muratan 1984].

The DCT has a good energy compaction property, so DCT based scrambling systems are superior to DFT and DPST based ones. When the bandwidth limitation is taken into account, the DCT based scrambler has 197 coefficients available for permutation in the band 300 to 3300 Hz. This gives a total of 197! possible permutations, which increases the cryptanalytic strength and provides lower levels of residual intelligibility. These systems also have lower encoding delay and better recovered speech quality. To prevent detection of talk spurts, dummy transform coefficients are substituted for a predefined block of components in the original speech spectrum [S.Sridharan, E.Dawson, B.Goldburg 1993].

In the technique based on the Modified Discrete Cosine Transform (MDCT), the audio samples transformed by the MDCT are sorted and packetized according to their importance, which is indicated by an index. A subset of the important packets is selectively scrambled, and the rest of the packets are either discarded or left in their original form.
The primary focus is high energy efficiency, and the dissimilarity between the original and scrambled audio increases as the number of scrambled packets increases [H. Wang, M. Hempel, D. Peng, W. Wang, H. Sharif, H.H. Chen 2010].

Speech scramblers based on Hadamard (H) matrices, which apply a linear transformation to the speech components, are an effective alternative to permutation based scramblers. The main advantage of this method is that the signal energy is distributed more uniformly over the scrambling frame, which makes pattern matching impossible. Other advantages include no bandwidth expansion, lower residual intelligibility, a larger key-space, lower encoding delay and simpler system implementation. In listening tests of sentence intelligibility with a frame length of 64 ms, listeners guessed 20% of the sentences correctly for the permutation scrambler and approximately zero percent for the H-based scrambler [D. B.Sadkhan, D. Abdulmuhsen, N. F.Al-Tahan 2007]. A further advantage is that the speech segments are both scrambled and altered in amplitude, frequency and phase (because the entries of the H-matrix are 1 and -1), which gives lower values of residual intelligibility [V. Milosevic, V. Delic, V. Senk 1997].
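One way to build a keyed Hadamard scrambler, assumed here for illustration rather than taken from the cited papers, is to reorder the rows of a normalized Sylvester Hadamard matrix and flip the signs of its columns according to the key, and then multiply each frame by the result. The product stays orthonormal, so the receiver descrambles with the transpose, the frame energy is preserved, and channel noise passes through the descrambler with its energy unchanged, in line with the orthogonality argument made earlier in this section. The 512-sample frame corresponds to 64 ms at an assumed 8 kHz rate; `keyed_hadamard` and the key value are hypothetical, and the seeded generator is not a cryptographic key schedule.

```python
import numpy as np

def sylvester_hadamard(n: int) -> np.ndarray:
    """Sylvester construction; n must be a power of two."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h

def keyed_hadamard(n: int, key: int) -> np.ndarray:
    """Hypothetical keyed scrambling matrix P @ H @ D / sqrt(n) (orthonormal)."""
    rng = np.random.default_rng(key)             # stand-in for a real key schedule
    h = sylvester_hadamard(n) / np.sqrt(n)       # normalized so that h @ h.T = I
    rows = rng.permutation(n)                    # keyed row reordering (P)
    signs = rng.choice([-1.0, 1.0], size=n)      # keyed column sign flips (D)
    return h[rows] * signs                       # broadcasting scales column j by signs[j]

FRAME = 512                                      # 64 ms at an assumed 8 kHz rate
A = keyed_hadamard(FRAME, key=2024)

# A short burst followed by silence, standing in for a talk spurt.
x = np.zeros(FRAME)
x[:64] = np.random.default_rng(0).standard_normal(64)

y = A @ x                                        # scramble
x_hat = A.T @ y                                  # descramble with the transpose

# Fraction of the scrambled energy that falls outside the burst's original span.
spread = np.sum(y[64:] ** 2) / np.sum(y ** 2)

# Channel noise passes through A.T with its energy unchanged.
noise = 0.01 * np.random.default_rng(1).standard_normal(FRAME)
noise_out = A.T @ noise
```

In this run the burst that originally occupied the first 64 samples ends up spread over the whole frame, so the silent interval that would reveal the speech-silence rhythm is filled, which is the energy-spreading property the text attributes to H-based scramblers.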

When the speech signal is subjected to a circulant transformation, phase distortion is introduced, and this phase distortion redistributes the signal energy over the entire frame. A longer frame increases the order of the circulant matrix, so the energy is redistributed over a wider span and the residual intelligibility drops further. Another property of this scheme is that it distorts the formant frequencies and introduces new formants, which contributes significantly to reducing the residual intelligibility. Since the row values of the circulant matrix function as the key, an infinite number of keys is theoretically possible [G.Manjunath, G.V.Anand, 2002].

An analog speech scrambler based on the wavelet transform scrambles the speech signal in both the time and frequency domains. Because it resembles a two-dimensional scrambler, very low levels of residual intelligibility are obtained. In this method, the speech signal is converted into a wavelet-analyzed signal by a filter bank based on a wavelet basis. The wavelet signals are then multiplexed and collected into frames of constant length, and scrambling consists of permuting these frames. The spectrum of the scrambled signal is highly irregular and the formant frequencies of the speech signal are completely hidden, so the scheme provides very low residual intelligibility [F. Ma, J. Cheng, Y. Wang, 1996].

A technique based on a parallel structure of two different types of wavelets with the same decomposition level has been discussed in [D. B.Sadkhan, D. Abdulmuhsen, N. F.Al-Tahan 2007]. The wavelet pairs used at the same level are Db1 with Haar, Db2 with Sym2, and Db4 with Sym4.
The speech signal is divided into two sub-frames of equal size, which are applied to the parallel wavelet structure to generate the wavelet coefficients. These coefficients are then permuted. For a level-3 Haar wavelet, the lowest Segmental SNR (SEGSNR) distance measure achieved at an SNR of 15 dB is -4.7093. These results show that wavelet transforms give lower residual intelligibility. Because the computation time is high, the wavelet structure is restricted to three levels.

In the technique that combines QAM mapping with orthogonal frequency division multiplexing (OFDM), the speech signal in PCM format is converted into complex-valued frequency components by QAM mapping. These components are permuted and then inverse transformed to obtain the time-domain signal. To control bandwidth expansion, the number of components is restricted to 93, corresponding to frequencies from 375 to 3250 Hz. The scrambling key has length 93, so the key-space is 93!. The formant and pitch information are completely removed in the scrambled speech, which lowers the residual intelligibility [D.C.Tseng, J.H.Chiu, 2007].

4.5.1 Experimental Overview

Table 5 lists the comparative values of the factors for the various algorithms. In most of the algorithms the residual intelligibility is low. In all the algorithms the usable key-space is large but limited by the number of transform coefficients selected. The encoding delay is higher because of the larger number of samples in each frame, and the bandwidth expansion is comparatively high.

For a given frame length, the residual intelligibility and key-space of the DPST and FFT based algorithms are comparable, but the encoding delay of the DPST algorithm is higher because it involves more computation. In a DCT based system with more than 256 samples per frame, the residual intelligibility decreases but the encoding delay increases.
The circulant transformation based system is capable of distorting the silent portions of the speech, thereby reducing the intelligibility to very low levels. In this scheme the phase vector is the key, so theoretically an infinite number of keys is possible over the phase range 0 to π.

4.5.2 Merits and Demerits

The algorithms in this category have very low residual intelligibility, with sentence intelligibility close to zero percent. The speech-silence rhythm is removed. The key-space is large because these algorithms operate on a considerably larger number of permutable transform coefficients, and the cryptanalytic strength is also higher. The noise components in the original signal are not enhanced and are kept at the same level. Moreover, the energy of the scrambled signal is held constant.

The encoding delay is high because these scramblers involve both time-domain and frequency-domain manipulations, and the bandwidth expansion is also higher.

4.5.3 Future Work

This academic article was published by The International Institute for Science, Technology and Education (IISTE).