3
Abstract
The goal of forensic steganalysis is to detect the presence of embedded data and eventually to extract the secret message. The paper introduces a new feature-based steganalytic method for JPEG images. The detection method is a linear classifier trained on feature vectors corresponding to cover and stego images. Each feature is calculated as the L1 norm of the difference between a specific macroscopic functional calculated from the stego image and the same functional obtained from a decompressed, cropped, and recompressed version of the stego image. The functionals are built from marginal and joint statistics of DCT coefficients. Because the features are calculated directly from DCT coefficients, conclusions can be drawn about the impact of embedding modifications on detectability. Three different steganographic schemes are tested and compared. The experimental results reveal new facts about current steganographic methods for JPEGs and suggest new design principles for more secure JPEG steganography.

4
Previous work on steganalytic methods
Chi-square attack by Westfeld – the original version could detect sequentially embedded messages and was based on first-order statistics.
Distinguishing statistic attacks – the steganalyst first inspects the embedding algorithm and then identifies a quantity (the distinguishing statistic) that changes predictably with the length of the embedded message. For JPEG images, calibration is done by decompressing the stego image, cropping by a few pixels in each direction, and recompressing using the same quantisation table. The distinguishing statistic calculated from this image is used as an estimate of the same quantity for the cover image. Using this calibration, a highly accurate and reliable estimate of the embedded message length can be constructed for many schemes.
Blind classifiers by Memon and Farid – a blind detector learns what a typical unmodified image looks like in a multidimensional feature space, and a classifier is then trained to learn the difference between cover and stego image features. The introduction of blind detectors prompted further research in steganography: Tzschoppe constructed a JPEG steganographic scheme, HPDM (Histogram Preserving Data Mapping), which was undetectable by Farid's scheme but is easily detectable using a single scalar feature – calibrated spatial blockiness – calculated from the DCT-domain representation rather than from a wavelet decomposition.

5
Proposed Research
The paper combines the concept of calibration with feature-based classification to devise a blind detector specific to JPEG images.
Calibrated features: two types of features are used in the analysis – first-order features and second-order features. All features are constructed in the following manner. A vector functional F is applied to the stego JPEG image J1; this functional could be the global DCT coefficient histogram, a co-occurrence matrix, or spatial blockiness. The stego image J1 is decompressed to the spatial domain, cropped by 4 pixels in each direction, and then recompressed with the same quantisation table as J1 to obtain J2. The same vector functional F is then applied to J2. The final feature f is obtained as the L1 norm of the difference:
f = || F(J1) – F(J2) ||_{L1}
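As a concrete illustration of this calibration pipeline (a sketch, not the paper's code), the snippet below simulates JPEG compression with a flat quantiser q on full 8 x 8 blocks, decompresses the stego image, crops it by 4 pixels, recompresses it on the shifted grid, and takes the L1 difference of the normalised global coefficient histograms as the feature f. The flat quantiser, the pure-Python DCT, and the toy image are assumptions made for self-containment.

```python
import math
import random

BS = 8  # JPEG block size

def transpose(m):
    return [list(r) for r in zip(*m)]

def dct1d(v):
    # Orthonormal 1-D DCT-II.
    n = len(v)
    return [(math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
            * sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                  for i in range(n))
            for k in range(n)]

def idct1d(v):
    # Inverse of dct1d (DCT-III with the matching normalisation).
    n = len(v)
    return [sum((math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n))
                * v[k] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for k in range(n))
            for i in range(n)]

def dct2(b):
    return transpose([dct1d(r) for r in transpose([dct1d(r) for r in b])])

def idct2(b):
    return transpose([idct1d(r) for r in transpose([idct1d(r) for r in b])])

def jpeg_coeffs(pixels, q):
    # Quantised DCT coefficients of every full 8x8 block (flat quantiser q).
    h, w = len(pixels), len(pixels[0])
    blocks = []
    for by in range(0, h - BS + 1, BS):
        for bx in range(0, w - BS + 1, BS):
            block = [[pixels[by + i][bx + j] - 128 for j in range(BS)]
                     for i in range(BS)]
            blocks.append([[round(c / q) for c in row] for row in dct2(block)])
    return blocks

def decompress(blocks, h, w, q):
    # Rebuild the spatial-domain image from quantised coefficients.
    pixels = [[0.0] * w for _ in range(h)]
    n = 0
    for by in range(0, h - BS + 1, BS):
        for bx in range(0, w - BS + 1, BS):
            spat = idct2([[c * q for c in row] for row in blocks[n]])
            n += 1
            for i in range(BS):
                for j in range(BS):
                    pixels[by + i][bx + j] = spat[i][j] + 128
    return pixels

def norm_hist(blocks):
    # Normalised global histogram of the quantised coefficients.
    hist = {}
    for b in blocks:
        for row in b:
            for c in row:
                hist[c] = hist.get(c, 0) + 1
    total = sum(hist.values())
    return {v: n / total for v, n in hist.items()}

def calibrated_feature(stego_pixels, q, crop=4):
    # f = || F(J1) - F(J2) ||_L1 with F = normalised global histogram.
    j1 = jpeg_coeffs(stego_pixels, q)
    spatial = decompress(j1, len(stego_pixels), len(stego_pixels[0]), q)
    cropped = [row[crop:] for row in spatial[crop:]]
    j2 = jpeg_coeffs(cropped, q)
    h1, h2 = norm_hist(j1), norm_hist(j2)
    return sum(abs(h1.get(v, 0) - h2.get(v, 0)) for v in set(h1) | set(h2))
```

With crop=0 the recompression lands exactly on the original grid and f vanishes, which is precisely why the pixel shift matters for calibration.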

7
The basic logic behind this choice of features is the following: cropping and recompression produce a "calibrated" image whose macroscopic features are mostly similar to those of the original cover image, because the cropped stego image is perceptually similar to the cover image and thus its DCT coefficients have approximately the same statistical properties as those of the cover image. Cropping by 4 pixels is important because the 8 x 8 grid of the recompression does not "see" the previous JPEG compression, and thus the obtained DCT coefficients are not influenced by the previous quantisation in the DCT domain. We can think of the cropped/recompressed image as an approximation to the cover image, or as side information.

8
First-order features
The simplest first-order statistic of the DCT coefficients is their histogram. Suppose the stego JPEG file is represented with a DCT coefficient array dk(i,j), k = 1, …, B (B is the total number of 8 x 8 blocks), and quantisation matrix Q(i,j). The global histogram of all 64·B DCT coefficients is denoted Hr, r = L, …, R, where L = min k,i,j dk(i,j) and R = max k,i,j dk(i,j). For a fixed DCT mode (i,j), let hr^{ij} denote the individual histogram of the values dk(i,j). To provide additional first-order macroscopic statistics for the set of functionals, the dual histogram is used, given as:
g_{ij}^d = Σ_{k=1}^{B} δ(d, dk(i,j)),
where δ(u,v) = 1 if u = v and 0 otherwise. The value g_{ij}^d is the number of times the value d occurs as the (i,j)-th DCT coefficient over all B blocks in the JPEG image.
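A minimal sketch of these first-order functionals, assuming the image is given as a list of 8 x 8 blocks of quantised DCT coefficients (the representation and helper names are assumptions of this sketch):

```python
from collections import Counter

def global_histogram(blocks):
    # H_r: counts of every quantised DCT coefficient in the image.
    return Counter(c for b in blocks for row in b for c in row)

def mode_histogram(blocks, i, j):
    # h_r^{ij}: histogram of the (i, j)-th coefficient across all blocks.
    return Counter(b[i][j] for b in blocks)

def dual_histogram(blocks, d):
    # g_{ij}^d: for each mode (i, j), in how many of the B blocks the
    # (i, j)-th coefficient equals d.
    return [[sum(1 for b in blocks if b[i][j] == d) for j in range(8)]
            for i in range(8)]
```

Summing g_{ij}^d over all values d recovers B for every mode, which is a handy sanity check on the dual histogram.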

9
Second-order features
If the corresponding DCT coefficients from different blocks were independent, then any embedding scheme that preserves the first-order statistics – the histograms – would be undetectable by Cachin's definition of steganographic security. Thus, features that capture inter-block dependencies are used, as these are likely to be violated by most steganographic algorithms. Let Ir and Ic denote the vectors of block indices while scanning the image by rows and by columns, respectively. The first functional capturing inter-block dependencies is the variation V, defined as:
V = ( Σ_{i,j=1}^{8} Σ_{k=1}^{|Ir|-1} | d_{Ir(k)}(i,j) – d_{Ir(k+1)}(i,j) | + Σ_{i,j=1}^{8} Σ_{k=1}^{|Ic|-1} | d_{Ic(k)}(i,j) – d_{Ic(k+1)}(i,j) | ) / ( |Ir| + |Ic| )
Embedding changes also increase the discontinuities along the 8 x 8 block boundaries, so two blockiness measures B_α, α = 1, 2, are included in the set of functionals:
B_α = ( Σ_{i=1}^{⌊(M-1)/8⌋} Σ_{j=1}^{N} | x_{8i,j} – x_{8i+1,j} |^α + Σ_{j=1}^{⌊(N-1)/8⌋} Σ_{i=1}^{M} | x_{i,8j} – x_{i,8j+1} |^α ) / ( N⌊(M-1)/8⌋ + M⌊(N-1)/8⌋ )
The blockiness is calculated from the decompressed JPEG image and represents an integral measure of inter-block dependency over all DCT modes over the whole image.
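These two second-order functionals can be sketched as follows. The block-grid representation (grid[r][c] is an 8 x 8 coefficient block) and the normalisation of V by the number of neighbouring pairs are simplifying assumptions of this sketch:

```python
def variation(grid):
    # V: average L1 distance between the 64 DCT modes of horizontally and
    # vertically neighbouring blocks in the block grid.
    num, pairs = 0.0, 0
    R, C = len(grid), len(grid[0])
    for r in range(R):                    # row scan (Ir)
        for c in range(C - 1):
            num += sum(abs(grid[r][c][i][j] - grid[r][c + 1][i][j])
                       for i in range(8) for j in range(8))
            pairs += 1
    for c in range(C):                    # column scan (Ic)
        for r in range(R - 1):
            num += sum(abs(grid[r][c][i][j] - grid[r + 1][c][i][j])
                       for i in range(8) for j in range(8))
            pairs += 1
    return num / pairs

def blockiness(pixels, alpha):
    # B_alpha: mean alpha-th power difference across 8x8 block boundaries
    # of the decompressed image (M x N grayscale values).
    M, N = len(pixels), len(pixels[0])
    s = 0.0
    for i in range(1, (M - 1) // 8 + 1):  # horizontal block boundaries
        for j in range(N):
            s += abs(pixels[8 * i - 1][j] - pixels[8 * i][j]) ** alpha
    for j in range(1, (N - 1) // 8 + 1):  # vertical block boundaries
        for i in range(M):
            s += abs(pixels[i][8 * j - 1] - pixels[i][8 * j]) ** alpha
    return s / (N * ((M - 1) // 8) + M * ((N - 1) // 8))
```

A perfectly flat image has zero blockiness, while any step exactly on an 8 x 8 boundary contributes fully, which matches the intent of the measure.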

10
In the expressions above, M and N are the image dimensions and x_{ij} are the grayscale values of the decompressed JPEG image. The final three functionals are calculated from the co-occurrence matrix C of neighboring DCT coefficients, a D x D matrix with D = R – L + 1, which describes the probability distribution of pairs of neighboring DCT coefficients and is defined as:
C_{st} = ( Σ_{k=1}^{|Ir|-1} Σ_{i,j=1}^{8} δ(s, d_{Ir(k)}(i,j)) δ(t, d_{Ir(k+1)}(i,j)) + Σ_{k=1}^{|Ic|-1} Σ_{i,j=1}^{8} δ(s, d_{Ic(k)}(i,j)) δ(t, d_{Ic(k+1)}(i,j)) ) / ( |Ir| + |Ic| )
Let C(J1) and C(J2) be the co-occurrence matrices of the JPEG images J1 and J2, respectively. Due to the approximate symmetry of C_{st} around (s,t) = (0,0), the differences C_{st}(J1) – C_{st}(J2) for (s,t) ∈ {(0,1),(1,0),(-1,0),(0,-1)} are strongly positively correlated, and the same is true for the group (s,t) ∈ {(1,1),(-1,1),(1,-1),(-1,-1)}.
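A sketch of the co-occurrence functional under the same block-grid assumption as before; normalising the counts to a probability distribution over coefficient pairs is one reasonable reading of the definition:

```python
from collections import Counter

def cooccurrence(grid):
    # C_st: relative frequency of the pair (s, t) occurring in the same DCT
    # mode of two neighbouring blocks (row scan plus column scan).
    counts = Counter()
    R, C = len(grid), len(grid[0])
    for r in range(R):
        for c in range(C - 1):
            for i in range(8):
                for j in range(8):
                    counts[(grid[r][c][i][j], grid[r][c + 1][i][j])] += 1
    for c in range(C):
        for r in range(R - 1):
            for i in range(8):
                for j in range(8):
                    counts[(grid[r][c][i][j], grid[r + 1][c][i][j])] += 1
    total = sum(counts.values())
    return {st: n / total for st, n in counts.items()}
```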

11
The co-occurrence matrix of the embedded image can be modeled as the convolution C * P(q), where P(q) is the probability distribution of the embedding distortion, which depends on the relative message length q. The values of C * P(q) spread out as q grows, and the following three quantities were taken as features:
N00 = C_{0,0}(J1) – C_{0,0}(J2)
N01 = Σ_{(s,t) ∈ {(0,1),(1,0),(-1,0),(0,-1)}} C_{s,t}(J1) – C_{s,t}(J2)
N11 = Σ_{(s,t) ∈ {(1,1),(-1,1),(1,-1),(-1,-1)}} C_{s,t}(J1) – C_{s,t}(J2)
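To make the spreading argument concrete, here is a toy sketch (the dict representation of the statistics is an assumption): convolving a statistic with an embedding-distortion distribution P flattens its peak, and the three features then measure how much central and near-central co-occurrence mass moved between J1 and its calibrated version J2.

```python
def convolve(stat, p):
    # (stat * P)[v] = sum_d stat[v - d] * P[d]: independent embedding noise
    # with distribution p spreads the statistic out.
    out = {}
    for v, mass in stat.items():
        for d, prob in p.items():
            out[v + d] = out.get(v + d, 0.0) + mass * prob
    return out

def cooc_features(c1, c2):
    # N00, N01, N11: differences of central / near-central co-occurrence
    # mass between the stego image J1 and its calibrated version J2.
    n00 = c1.get((0, 0), 0.0) - c2.get((0, 0), 0.0)
    n01 = sum(c1.get(st, 0.0) - c2.get(st, 0.0)
              for st in [(0, 1), (1, 0), (-1, 0), (0, -1)])
    n11 = sum(c1.get(st, 0.0) - c2.get(st, 0.0)
              for st in [(1, 1), (-1, 1), (1, -1), (-1, -1)])
    return n00, n01, n11
```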

12
The final set of 23 functionals used in the paper is summarized in the table below: the global histogram H/||H||, the five individual histograms of the lowest-frequency AC modes (h^{21}, h^{31}, h^{12}, h^{22}, h^{13}), the eleven dual histograms g^d for d = -5, …, 5, the variation V, the two blockiness measures B1 and B2, and the three co-occurrence features N00, N01, N11.

13
Experiment
The paper used the Greenspun image database consisting of 1814 images of size 780 x 540. All images were converted to grayscale, the black border frame was cropped away, and the images were compressed as 80% quality JPEGs. Three different steganographic algorithms were selected: the F5 algorithm, OutGuess 0.2, and model-based steganography for JPEG images without and with deblocking (MB1 and MB2). Each technique was analyzed separately. For a fixed relative message length, expressed in bits per non-zero DCT coefficient of the cover image (bpc), a training database of embedded images was created. A Fisher Linear Discriminant classifier was trained on 1314 cover and stego images. The generalized eigenvector obtained from this training was then used to calculate the ROC curve for the remaining 500 cover and stego images. The detection performance was evaluated using the detection reliability P, defined as P = 2A – 1, where A is the area under the ROC (Receiver Operating Characteristic) curve, also called the accuracy. The accuracy was scaled so that P = 1 for perfect detection and P = 0 when the ROC coincides with the diagonal line (where the reliability of detection is 0). The detection reliability of all three methods is shown in table 2.
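The reliability measure can be computed directly from classifier scores. A small sketch, assuming lists of cover and stego scores (the Mann-Whitney form of the AUC is a standard identity, not something specific to the paper):

```python
def roc_auc(cover_scores, stego_scores):
    # Mann-Whitney form of the area under the ROC curve: the probability
    # that a randomly chosen stego score exceeds a cover score (ties = 1/2).
    wins = sum(1.0 if s > c else 0.5 if s == c else 0.0
               for s in stego_scores for c in cover_scores)
    return wins / (len(stego_scores) * len(cover_scores))

def detection_reliability(cover_scores, stego_scores):
    # P = 2A - 1: P = 1 for perfect separation, P = 0 for chance-level output.
    return 2.0 * roc_auc(cover_scores, stego_scores) - 1.0
```

The rescaling simply maps the chance-level AUC of 0.5 to P = 0, so P reads directly as "how much better than guessing" the detector is.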

17
Table 3. Detection reliability for individual features for all three embedding algorithms for fully embedded images.

18
Conclusion
From table 2 we can see that the OutGuess algorithm is the most detectable, and it also provides the smallest capacity. Its detection reliability is relatively high even for embedding rates as small as 0.05 bpc, and the method becomes highly detectable for messages above 0.1 bpc. The F5 algorithm performs better than OutGuess; turning off its matrix embedding increases detectability, since matrix embedding improves embedding efficiency and thereby decreases the detectability of short messages. From table 2 it can also be seen that the MB1 and MB2 methods clearly have the best performance of the three tested algorithms. MB1 preserves not only the global histogram but all marginal statistics (histograms) of each individual DCT mode. The MB2 algorithm has the same embedding mechanism as MB1 but reserves one half of the capacity for modifications that bring the blockiness of the stego image back to its original value.

19
Comments
Looking at the results in tables 2 and 3, there is no doubt that model-based steganography (MB1 and MB2) is by far the most secure of the three tested paradigms. MB1 and MB2 preserve not only the global histogram but also all histograms of individual DCT modes, and hence all dual histograms are preserved as well. MB2 additionally preserves one second-order functional, the L1 blockiness. Thus, we can conclude that the more statistical measures an embedding method preserves, the more difficult it is to detect. One surprising fact revealed is that preserving a specific functional does not mean that the calibrated feature will be preserved: preserving the blockiness along the original 8 x 8 grid does not mean that the blockiness along the shifted grid will also be preserved. This is because the embedding and deblocking changes are likely to introduce distortion into the middle of the blocks and thus disturb the blockiness feature, which is the difference between the blockiness along the solid and dashed lines, as seen in figure 3.

21
It is further pointed out that the features derived from the co-occurrence matrix are very influential for all three schemes, especially for the model-based steganographic methods. MB2 is currently the only JPEG steganographic method that takes into account the inter-block dependencies between DCT coefficients, i.e., the probability distribution of coefficient pairs from neighboring blocks.