AbstractContent distribution using network coding has received a lot of attention. When we directly apply network coding technique it may be insecure because attackers can inject bogus data to corrupt the content distribution process so that information dispersal will become slow or network resource may get deplete. Hence, content verification is an important and practical issue when we make use of network coding. When random linear network coding is used, it is infeasible for the source of the content to sign all the data .Hence, the traditional hash and sign methods are no longer applicable. A new on-the-fly verification technique has been proposed which uses a classical homomorphic hash function. However, this technique is difficult to be applied to network coding. Because it has high computational and communication overhead. This issue is further explored and proposes a methods that helps to reduce both the computational and communication cost and provide the security at the same time.

For the past few years there has been an increasing interest on the application of network coding on file distribution. Various researchers have considered the benefit of using network coding on P2P networks for file distribution and multimedia streaming and also on millions of PCs around the Internet for massive distribution of new OS updates and software patches. The main focus here is the security of content distribution schemes using network coding and how to achieve the security at the same time.

Network Security is used to provide security to the authorized data which is being distributed from source to destination in the network. It also prevents unauthorized access of data by developing a secure network using security service like access, confidentiality, authentication, integrity, non-repudiation. Some of common internet attack methods used to modify the authorized data are eavesdropping, viruses, worms, Trojan, IP spoofing, denial of service.

To prevent data from such attacks technologies like cryptographic systems, firewall, intrusion detection

systems, anti malware software and Scanners and secure socket layer is used. Current development in the network security is the biometrics and smartcard which greatly reduces the unauthorized access of secure systems.

An important issue in practical large content delivery in a fully distributed environment is how to maintain the integrity of the data in the presence of link failures, transmission errors, software and hardware faults, and even malicious attackers. If malicious attackers are able to modify the data in transmission or inject arbitrary bogus data into the network they may be able to greatly slow down the content distribution or even prevent users from getting correct data entirely.

characteristics and applications of wsns

The characteristics of WSN are- Power consumption constrains for nodes using batteries or energy harvesting, Communication failures, Ability to cope with node failures, Mobility of nodes, Dynamic network topology, Heterogeneity of nodes, Scalability to large scale of deployment, Ability to withstand harsh environmental conditions, Easy of use Unattended operation.

Applications of WSN are:

Industrial automation

Automated and smart homes

Video surveillance

Traffic monitoring

Medical device monitoring

Monitoring of weather conditions

Air traffic control

Robot control.

DEFINITIONS

Content Distribution is the process of transmitting the messages or data from source to destination. Content Distribution is the act of sharing or circulating content with other websites, directories, or users. Content Distribution is a great means for product companies to circulate their products through various online means.

Content verification means verifying the contents with its strength to check whether the received content is modified by unauthorized users.

Network coding is the set of techniques or algorithm for giving security during transmission via networks. Network coding is a technique which can be used to improve a network's throughput, efficiency and scalability, as well as resilience to attacks and eavesdropping, as compared to traditional methods of OSI model or TCP/IP model.

Bogus data is to insert fake data to the original data by unauthorized users.

RELATED WORK

The maximum capacity between a source and a sink connected through a network is the same as the maximum network flow f between them. When the network can be viewed as a directed acyclic graph with unit capacity edges, f is also the min-cut of the graph between the source and the sink. However, when there is a single source and multiple sinks, the maximum network flow f may not be achieved. If the nodes in a network can perform coding on the information they receive it is possible for multiple sinks to achieve their maximum network flow bound simultaneously through the same network. Then it becomes possible to achieve the theoretical capacity bound if one allows the network nodes on the path to perform coding, instead of just the conventional tasks of routing and forwarding.

Later, Li et al. showed that, although the coding performed by the intermediate nodes does not need to be linear, linear network codes are indeed sufficient to achieve the maximum theoretical capacity in acyclic synchronous networks. In their settings, each node computes some linear combination of the information it receives from its upstream nodes, and passes the results to its downstream nodes. However, to compute the network code (i.e., the correct linear combinations) that is to be performed by the nodes, the topology of the network has to be known beforehand, and has to be fixed during the process of content distribution. Furthermore, their algorithm is exponential in the number of edges in the network.

Koetter and Medard considered the problem of linear network coding. They improved and extended the results by Li et al. and considered the problem of link failures. They found that a static linear code is sufficient to handle link failures, if the failure pattern is known beforehand. However, as mentioned by Jaggi et al. the code construction algorithm proposed by Koetter et al. still requires checking a polynomial identity with exponentially many coefficients.

Jaggi et al. proposed the first centralized code construction algorithm that runs in polynomial time in the number of edges, the number of sinks, and the minimum size of the min-cut. The network coding does not improve the achievable transmission rate when all nodes except the source are sinks, finding the optimal multicast rate without coding is NP-hard.

With the popularity of P2P networks researchers are beginning to consider the problem of on-the-fly Byzantine fault detection in content distribution using random network coding. The verification techniques proposed by Krohn et al. can be employed to protect the integrity of the data without the knowledge of the entie content. The verification techniques were originally developed for content distribution using rateless erasure codes and were based on homomorphic cryptographic hash functions.

Random network coding was proposed by Ho et al as a way to ensure the reliability of the network in a distributed setting where the nodes do not know the network topology, which could change over time. In their setting, each node would perform a random linear network coding, and the probability of successful recovery at the sinks can be tightly bounded. Chou et al, proposed a scheme for content distribution based on random network coding in a practical setting, and showed that it can achieve nearly optimal rate using simulations. Recently, Gkantsidis and Rodriguez proposed another scheme for large scale content distribution based on random network coding. They show by simulation that when applied to P2P overlay networks, using network coding can be 20 to 30 percent better than server side coding and 2 to 3 times better than uncoded forwarding, in terms of download time.

PROBLEM STATEMENT AND PROPOSED SYSTEM

In existing system, content is sent from the source to destination. By applying hash technique we get hashed content and key. Same key is used for the same content every time. Hashed content is sent to the destination through the centralized server. If the destination does not have the capacity of storing the content which is received from the source then both transmission rate and delay will be too high. Thus by sending same key for the same content unauthorized users easily modify the content.

In proposed system, three techniques to maintain integrity of the content being distributed from the source to the destination is proposed. First Random linear network coding is used to split the content and to store the content in different storage locations randomly. Thus by splitting the content the transmission rate and delay will be less and network traffic will also be avoided. Homomorphic hash function is the second technique to hash the splitted content and to generate the keys randomly. Hashed content is randomly distributed to the distributed network. Hashed content strength and keys is stored in the source.

Finally on-the-fly verification technique is used in which three methods are used. Data verification verifies hashed content and hashed content strength. Block by block downloading downloads hashed content from distributed network, random keys and hashed content strength from the source. Using keys we dehash the hashed content to get original splitted content Reconstruction reconstructs the splitted content which is received randomly to get the original content sent from the source.

A HOMOMORPHIC HASH FUNCTION BASED ON VSH The proposed basic scheme is based on the nontrapdoor

variant (VSH-DL) of the Very Smooth Hash (VSH) function.

The rationale behind VSH and its variants is that using smaller primes as group generators would greatly improve the computational efficiency of hash functions that involve many exponentiations. The VSH functions, however, do not have the homorphic property that would be required by the on-the-fly verification process. The homomorphic property is obtained by rearranging the order of the bits in the input message blocks before applying VSH-DL.

A Homomorphic VSH-DL not tampered with, c = (c1,..,cn) are the coefficients where

Let p be a large strong prime such that there is another large prime q divides p-1. That is, there is a positive integer such that p=q+1. As a result, there exist a multiplicative

i i

subgroup G in Zp* with order q. Furthermore for any x Zp,, let y=x mod p, if y 1 mod p, y must be in G.

each ci zq , x is the linear combination x = i= n cixi mod zq.

1

Each node can verify the integrity of the packet as following:

h

Compute the hash value H1 = H(x)

Let p1, . . , pm be m prime numbers, such that m

n

Compute H2 =

i=1

c mod p.

< c for some constant c. In other words, m is bounded by some polynomial in . In practice, we

Verify that H1

= H2.

can choose those primes to be the m smallest prime numbers, such that pi 1 mod p. That is, we can choose p1= 2, p2= 3 and skip those primes whose order is not q. When q is much larger than (e.g =2) the probability that a random small prime pi satisfies the condition pi 1 mod p is high. Assume that

a message is a vector of the form: x =(x1,..xm) m where xi

Zq for 1 i m .

The hash of x is computed as

THE BASIC INTEGRITY VERIFICATION SCHEME

The Basic Scheme

The proposed scheme consists of two algorithms, namely, the encoding algorithm where the original data are prepared for distribution and the verification algorithm, which is used by individual nodes to verify the integrity of the received data.

Encoding

Let the parameters p, q, m and p1 ,, pm be chosen as in Section 7.1Given any binary string X, let n be the smallest positive integer such that | x | < mn ( – 1) 1. Assume that n < poly() for some positive polynomial poly. We also assume that X is compressed, such that the bits are random.

In this way, we can always pad the original X properly (e.g., with a one followed by zeros) such that the result can be divided into small pieces of – 1 bits each. In other words, we can always encode the data into the form x=(x1,..,xn) where xi=(xi,1,.,xi,m) T and each xi,j is of length – 1, and can be considered as an element in Zq for all 1 i n and 1 j m.

We will call xi as the ith block, and each xi,j as the jth subblock of the ith block. Now, given X, the encoder computes

for each 1 i n.

Basic Verification Algorithm

During verification, each network node is given a packet (x,c) and system parameters. In the case where this packet is

It is worth to note that once an intermediate node has

received the parameters p, q, and m, it will be able to compute and p1,,pm locally. Therefore, those prime bases do not need to be distributed. The hash values, however, still need to be distributed. Nevertheless, we will see that in any reasonable setting the communication overhead is low.

BATCH VERIFICATION

The Baseline Batch Verification Scheme

To reduce the computational cost of the verification, a batch of packets can be verified at the same time. In particular, after a node has received b packets ((y1,c1),(y2,c2),.(yb,cb)) the node can verify all the packets as follows:

Randomly choose b numbers r1,.,rb zq

Compute w = i=1b riyi mod q.

Compute v = i=1b rici mod q

Verify the integrity of the packet (w,v) using the basic integrity verification scheme.

Due to the homomorphic property of the hash function, we can see that if the verification in Step 4 fails, then at least one of the packets is corrupted. However, if the batch of packets pass the verification, it is still possible that some packets are corrupted but could not be detected by the verification algorithm. Hence, we need to analyze the security more carefully.

SPARSE RANDOM LINEAR NETWORK CODING

The computation overhead involved in the content distribution consists of two parts. The first part is the cost due to the verification of the packets and the second part is the cost due to the need to compute random combinations of the data blocks.

The cost can be reduced through the use of more efficient hash functions and batch verification techniques. the second part of the cost also plays a very important role in practice, especially when the content is large (e.g., in the order of gigabytes), and it has a significant impact on the choice of parameters.

The method involves divide the content to be distributed into smaller runks and random linear network coding is applied to each trunk of content independently. Although this method works in certain application scenarios, it does not address the problem directly but instead avoids high computation overhead by applying random linear network coding to smaller problem instances.

Hence, this strategy may lose certain benefits from network coding. For example, when a node sends data to its downstream nodes, it has to decide which trunk to send. If the algorithm to make such decisions is not designed properly, it may result in a situation where a certain trunk cannot be reconstructed after a few key nodes have left the network.

A simple yet powerful alternative to avoid high computation cost when computing the random combinations is proposed. This method is referred as Sparse Random Linear Network Coding.

The idea is that, instead of computing a random combination of all the n data blocks, we can instead randomly select only of them and compute a random combination of only those blocks. When a node A needs to send a packet (x,c) to its downstream node, it performs the following steps:

Randomly choose packets from the random combinations received by A so far. Let these packets be (x1,c1),.,(x,c).

Randomly choose r1,, r zq

.Compute packet (x,c) as

Also, we require the source node to be more powerful than other nodes and still send random combinations of all n blocks. It is clear that each packet being sent over the network would still be quite random, and allow high probability of reconstruction at the receivers.

The probability of successful delivery is very high even with small constant . Therefore, the computation overhead due to the computation of random combinations can be made independent of the number of blocks, which greatly relaxes the constraints that need to be considered for practical systems.

COMPUTATION OVERHEAD

The computation cost involves both the computation of random combinations and the hash values. When sparse random linear coding is applied, for each combined block we only need to compute the combination of blocks, which makes the computation of combinations much more efficient than that of the hash values.

In Fig 1, the throughput of the computation of H is computed as the size of a data block divided by the average time it takes to compute a hash value for the block. The values are taken as the average 25 random instances. we can see that throughput increases when the number of subblocks per block is increased.

Even with relatively small values of and ( say =512 and =400), the computation of H1 cannot be very efficient compared with common hash functions such as SHA-1. With carefully designed precomputation methods the throughput can be increased by a small constant factor r at the price of a storage requirement that is 2r times that without precomputation.

Fig. 1. Computation Efficiency of H1

Fig. 2. Computation Efficiency of H2

In Fig 2, the throughput under different security parameters is the gradient of different curves in the figure. For example, for p=512 and q=400, when n= 500, the time it takes to compute H is about two seconds. This translates to a throughput of about 12.5 kilobytes per second, which is much lower than 40 kilobytes per second. For p=1024 and q=800 similar calculations show that the throughput is greatly reduced to about 5 kilobytes per second.the advantage is that the efficiency is not reduced as much when the parameters are increased for stronger security. The computational advantage is mainly due to the use of deterministically chosen small primes as the bases for exponentiations, which is the rationale behind the design of the VSH scheme.

CONCLUSION

The problem of on-the-fly verification of the integrity of the data in transit is considered. Although a previous scheme based on homomorphic hash functions is applicable, it was mainly designed for server side coding only and will be much less efficient when it is applied on random network coding. A new on-the-fly verification scheme based on a faster homomorphic hash function is proposed and proved its security.

The computation and communication cost incurred during the content distribution process is considered and identify

various sources of the cost and investigate ways to eliminate or reduce the cost. A sparse variant of the classical random linear network coding is proposed where only a small constant number of blocks are combined each time.

Experiments are conducted to examine the efficiency of the proposed hash function, as well as the effectiveness of the proposed sparse random linear network coding. The results show that the new hash function is able to achieve reasonable speed, and the sparse variant performs just as well as the random network coding using typical parameters.