Abstract

We study a lossy source coding problem with secrecy constraints in which a remote information source should be transmitted to a single destination via multiple agents in the presence of a passive eavesdropper. The agents observe noisy versions of the source and independently encode and transmit their observations to the destination via noiseless rate-limited links. The destination should estimate the remote source based on the information received from the agents within a certain mean distortion threshold. The eavesdropper, with access to side information correlated to the source, is able to listen in on one of the links from the agents to the destination in order to obtain as much information as possible about the source. This problem can be viewed as the so-called CEO problem with additional secrecy constraints. We establish inner and outer bounds on the rate-distortion-equivocation region of this problem. We also obtain the region in special cases where the bounds are tight. Furthermore, we study the quadratic Gaussian case and provide the optimal rate-distortion-equivocation region when the eavesdropper has no side information and an achievable region for a more general setup with side information at the eavesdropper.

As networks become more distributed, their vulnerability to malicious activities increases, which in turn raises concerns about their security. Consequently, information-theoretic security, as a concrete framework for analyzing secrecy in networks, has gained attention among researchers [2]. Information-theoretic security, which was initially introduced by Shannon [4], exploits the different statistical characteristics of the information received at the legitimate receiver and at the eavesdropper. Moreover, unlike traditional cryptographic approaches to secrecy, it makes no assumptions on the computational power of the eavesdropper. Later, Wyner introduced the wiretap channel model in [5] and showed that perfectly secure communication without a shared secret key is possible if the channel from the transmitter to the eavesdropper is a degraded version of the channel to the legitimate receiver. This result was generalized to broadcast channels with confidential messages by Csiszár and Körner in [6]. Subsequently, many extensions to this problem have been developed and studied in the literature (see, for instance, [2], [3], and references therein).

In this paper, we consider secrecy in a multiterminal source coding problem. In particular, we study the problem of conveying an information source to a single destination via multiple agents (encoders) in the presence of a passive eavesdropper. The agents have access to noisy observations of the source and are connected to the destination via noiseless rate-limited links. They do not cooperate or communicate with one another and are not required to estimate the source themselves. This scenario is of interest for many applications, such as sensor networks or smart grid systems, where reconstruction of the source at sensors and smart meters is not necessary. The distributed nature of such networks makes them more susceptible to eavesdropping. At each instant, the eavesdropper listens in on one of the links from the agents to the destination in order to obtain information about the source. In addition, it has access to side information correlated to the source. Since the link that will be compromised by the eavesdropper is unknown to the agents prior to their transmissions, each agent should protect its link in order to leak as little information as possible about the source. Our objective is to characterize the trade-off among the agents’ transmission rates, the distortion incurred at the destination, and the amount of information revealed to the eavesdropper. This setup can be viewed as an extension of the so-called CEO problem [7] to include secrecy constraints.

The chief executive/estimation officer (CEO) problem was motivated in [7] by a communication and distributed processing system analogous to a scenario in which a firm’s CEO is interested in information about a source that cannot be observed directly. The CEO assigns a group of agents to independently observe a corrupted version of the source and communicate their observations. The lossless variant of this setup was initially studied by Gel’fand and Pinsker [8]. It was extended by Yamamoto and Itoh [9] as well as Flynn and Gray [10] to the lossy case with only two encoders, for which an achievable rate-distortion region was derived. The model was generalized to the CEO problem with many encoders by Berger and Viswanathan [7], who studied the trade-off between the end-to-end average distortion and the sum of the rates at which the agents transmit to the CEO. Multiterminal lossy source coding problems, including the CEO problem, are still open in general. However, for the special case of the quadratic Gaussian CEO problem [11], the sum-rate-distortion function for an infinite number of agents with identical signal-to-noise ratios (SNRs) was derived by Oohama [12], and later, the complete rate-distortion region with an arbitrary number of agents and SNR values was characterized by Prabhakaran et al. [13] and Oohama [14]. More recently, Courtade and Weissman [15] gave the rate-distortion region of the CEO problem under the logarithmic-loss distortion measure.

Secure lossless source coding with uncoded side information at the legitimate decoder and the eavesdropper was studied by Prabhakaran and Ramchandran [16] under the assumption of no rate constraint on the encoder-decoder link. The minimum leakage rate was derived, and it was shown that, due to the side information at the eavesdropper, the usual Slepian-Wolf scheme [17] is not always optimal. Lossless source coding with coded side information at the decoder (the so-called one-helper problem) and no side information at the eavesdropper was studied by Tandon et al. [18], where the rate-equivocation region was characterized. This setup was extended by Gündüz et al. [19] to include side information at the eavesdropper; inner and outer bounds on the compression-equivocation rate region were derived, which do not match in general. Secure distributed lossless compression of two correlated sources, in which both sources were to be estimated at the decoder, was considered by Luh and Kundur [20] without side information at the eavesdropper and by Gündüz et al. [21] with side information at the eavesdropper. These models were generalized by Salimi et al. [22] to the case where both the legitimate receiver and the eavesdropper have access to correlated side information and the eavesdropper can choose to intercept either of the links from the encoders to the decoder at each instant. In [22], inner and outer bounds for the compression-equivocation region were provided, which were proved to be tight for several special cases.

The extension to the lossy case was considered in [23], and more recently by Villard and Piantanida [26] in which inner and outer bounds on the rate-distortion-equivocation region were derived. The optimal characterization of the rate-distortion-equivocation region was first found in [24] for the lossy case with uncoded side information. Later in [26], the optimal characterization for the lossless case was also derived. A different setup was considered by Kittichokechai et al. [27] in which the eavesdropper can only access the coded side information, and the complete region was characterized under the logarithmic-loss distortion [15]. Chia and Kittichokechai [28] studied the case when the encoder has access to the side information of the decoder. Tandon et al. [29] considered a scenario with two legitimate receivers and investigated the privacy of side information at one receiver with respect to the other one. An alternative approach to provide secrecy in source coding problems is based on having a shared secret key between the transmitter and the legitimate receiver [30], although we do not exploit this approach in our work.

Our setup differs from the aforementioned scenarios in two main respects. First, the destination (CEO) is interested in estimating the original source rather than the agents’ observations, as was the case in all prior works. Second, the secrecy constraints in our problem are on the equivocation of the eavesdropper with respect to the remote source, not the observations of the agents. In fact, our setup is a generalization of the previous cases considered for lossy secure source coding problems. We extend our previous work [33] on the lossless variant of this problem to the lossy case and derive inner and outer bounds on the rate-distortion-equivocation region of the CEO problem with secrecy constraints. We also investigate the region in special cases where the bounds are tight, and we show that for these special cases our results coincide with previous results in the literature.

In addition, we consider the quadratic Gaussian CEO problem with secrecy constraints and provide the optimal characterization of the rate-distortion-equivocation region for the case when the eavesdropper has no side information and an achievable region for a more general setup with side information at the eavesdropper.

In this paper, we use capital letters to indicate random variables, small letters to indicate realizations of random variables, and calligraphic letters to denote sets, e.g., $\mathcal{X}$, with $|\mathcal{X}|$ indicating the cardinality of the set. The notation $X^n$ denotes the sequence $\{X_1,\ldots,X_n\}$. The notation $X-Y-Z$ indicates that $X$, $Y$, and $Z$ form a Markov chain, i.e., $p(x,y,z)=p(x,y)p(z|y)$ or, equivalently, $p(x,y,z)=p(x|y)p(y,z)$. We define $\mathcal{I}_M\coloneqq\{1,\ldots,M\}$ for $M\in\mathbb{N}$, and $[x]^+\coloneqq\max\{0,x\}$ for $x\in\mathbb{R}$. Finally, $\mathbbm{1}_{\mathbb{R}_{>0}}(x):\mathcal{X}\to\{0,1\}$ denotes the indicator function such that $\mathbbm{1}_{\mathbb{R}_{>0}}(x)=1$ for $x\in\mathbb{R}_{>0}$, and $\mathbbm{1}_{\mathbb{R}_{>0}}(x)=0$ otherwise.

The rest of the paper is organized as follows: In Section 2, we describe the problem along with some definitions. Main results for inner and outer bounds on the rate-distortion-equivocation region are presented in Section 3. Then, we study some special cases of our results in Section 4 where the region is completely characterized. The rate-distortion-equivocation region for the quadratic Gaussian case is given in Section 5. Finally, the paper is concluded in Section 6.

We consider the CEO problem with secrecy constraints as depicted in Figure 1. In this setup, two non-cooperative and independent agents have access to length-$n$ observations $Y_1^n$ and $Y_2^n$, respectively, which are noisy versions of the source sequence $X^n$. These observations are conditionally independent given $X^n$. Each agent independently transmits a compressed version of its observation to the CEO over a rate-limited noiseless link. The CEO forms an estimate $\hat{X}^n$ of the source sequence based on the information received from the two agents. An eavesdropper, referred to as Eve, with access to side information $E^n$ correlated to the source sequence $X^n$, can eavesdrop on only one of the links from the agents to the CEO at each time instant to obtain as much information as possible about the source. Therefore, the agents’ transmission rates should be such that the CEO can reconstruct the source reliably within a certain mean distortion threshold while simultaneously the equivocation at Eve is maximized. Eve’s equivocation, with respect to either link, corresponds to her uncertainty about the original source when she combines her side information with the information obtained from that link. We assume that Eve cannot access both links simultaneously: since the links are noise-free, she would in that case be more powerful than the CEO in estimating the source, owing to her additional side information. The sequences $X^n$, $Y_1^n$, $Y_2^n$, and $E^n$ are independent and identically distributed (i.i.d.) according to the joint distribution $p(x,y_1,y_2,e)=p(x)p(y_1|x)p(y_2|x)p(e|x)$ over the finite alphabet $\mathcal{X}\times\mathcal{Y}_1\times\mathcal{Y}_2\times\mathcal{E}$.
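The factorization of the source model can be checked numerically on a toy instance. The following Python sketch is illustrative only: the binary alphabets, the uniform source, and the binary symmetric channels with crossover probabilities 0.1, 0.2, and 0.3 are assumptions made here, not part of the paper. It builds the joint pmf p(x)p(y1|x)p(y2|x)p(e|x), verifies that the two observations are conditionally independent given the source, and evaluates Eve's single-letter uncertainty H(X|E) from her side information alone.

```python
import itertools
import math


def bsc(p):
    """Transition probabilities of a binary symmetric channel with crossover p."""
    return {0: {0: 1 - p, 1: p}, 1: {0: p, 1: 1 - p}}


def joint_pmf(p_x, ch_y1, ch_y2, ch_e):
    """Joint pmf p(x,y1,y2,e) = p(x) p(y1|x) p(y2|x) p(e|x) on binary alphabets."""
    pmf = {}
    for x, y1, y2, e in itertools.product((0, 1), repeat=4):
        pmf[(x, y1, y2, e)] = p_x[x] * ch_y1[x][y1] * ch_y2[x][y2] * ch_e[x][e]
    return pmf


def entropy(pmf, idx):
    """Entropy in bits of the marginal on the coordinates listed in idx."""
    marginal = {}
    for key, p in pmf.items():
        sub = tuple(key[i] for i in idx)
        marginal[sub] = marginal.get(sub, 0.0) + p
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)


def cond_mutual_info(pmf, a, b, c):
    """I(A;B|C) = H(A,C) + H(B,C) - H(A,B,C) - H(C), with a, b, c index tuples."""
    return (entropy(pmf, a + c) + entropy(pmf, b + c)
            - entropy(pmf, a + b + c) - entropy(pmf, c))


# Coordinates: 0 -> X, 1 -> Y1, 2 -> Y2, 3 -> E (toy parameters, assumed here).
pmf = joint_pmf({0: 0.5, 1: 0.5}, bsc(0.1), bsc(0.2), bsc(0.3))
i_y1_y2_given_x = cond_mutual_info(pmf, (1,), (2,), (0,))  # zero up to rounding
h_x_given_e = entropy(pmf, (0, 3)) - entropy(pmf, (3,))     # H(X|E) in bits
```

With these assumed parameters, H(X|E) equals the binary entropy of the crossover probability on Eve's channel, which is her uncertainty about one source symbol before she observes any link.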

Let $d:\mathcal{X}\times\mathcal{X}\to[0,d_{\max}]$ be a finite distortion measure. We define the component-wise mean distortion between two sequences $x^n,\hat{x}^n\in\mathcal{X}^n$ as

$$d(x^n,\hat{x}^n)\coloneqq\frac{1}{n}\sum_{i=1}^{n}d(x_i,\hat{x}_i).$$
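As a minimal Python sketch, assuming the usual per-letter average $\frac{1}{n}\sum_{i=1}^{n}d(x_i,\hat{x}_i)$, the component-wise mean distortion can be computed as follows. The function names and the two example per-letter measures (Hamming and squared error) are illustrative choices, not taken from the paper.

```python
from typing import Callable, Sequence


def mean_distortion(x: Sequence, x_hat: Sequence,
                    d: Callable[[object, object], float]) -> float:
    """Component-wise mean distortion (1/n) * sum_i d(x_i, x_hat_i)."""
    if len(x) != len(x_hat) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(d(a, b) for a, b in zip(x, x_hat)) / len(x)


def hamming(a, b) -> float:
    """Per-letter Hamming distortion: 0 if the symbols agree, 1 otherwise."""
    return 0.0 if a == b else 1.0


def squared_error(a, b) -> float:
    """Per-letter quadratic distortion, as used in the Gaussian setting."""
    return (a - b) ** 2


# One mismatch out of four symbols.
example = mean_distortion([0, 1, 1, 0], [0, 1, 0, 0], hamming)
```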

The achievability scheme resulting in the inner bound is based on superposition coding and random binning at the agents, and joint decoding at the CEO. In particular, agent $j$ first transmits the bin index related to the auxiliary random variable $V_j$ with distribution $p(v_j|u_j)$ via the noiseless link. Then, the agents send the remaining information that is required for the CEO to be able to reconstruct the source based on the Wyner-Ziv scheme [34]. The detailed proof is given in Appendix Section 7; here, we provide some intuition for the results. Inequalities (?)–(?) and (?) are similar to the Berger-Tung bounds [35] that establish perfect estimation of $U_1$ and $U_2$ at the CEO, from which $X$ can be reconstructed within the distortion limit $D$. In the equivocation bounds (?) and (?), the first term corresponds to Eve’s uncertainty about the source after decoding the codeword $v_j^n$ based on the received bin index combined with her side information, and the second term is the reduction in her uncertainty when receiving the remaining information transmitted to the CEO by the agents. Finally, the last term in (?) and (?) stems from the fact that, in contrast to previous works, the secrecy constraints are on Eve’s equivocation with respect to the original source, whereas the information transmitted by the agents is a function of their respective observations and not of the source, resulting in an increase in Eve’s uncertainty. Inequalities (?) and (?) depict a trade-off between Eve’s equivocation and the transmission rates, implying that each link’s transmission rate limits the other link’s equivocation rate.

Table 1: Corner points of the inner region corresponding to different decoding orders: rates and distortion.

If Agent 1 has access to the source sequence $X^n$, our setup reduces to the lossy source coding problem with a helper and an eavesdropper who can choose to listen in on either the source-destination link or the helper-destination link.

where the auxiliary random variables $V_1$ and $U_1$ satisfy the Markov chain $V_1-U_1-X-(E,Y_2)$.

The achievability proof follows from the proof of Theorem ? by setting $Y_1=X$ and $V_2=U_2=Y_2$. Inequalities (?)–(?) are inactive for this setup. The converse proof is given in [26] for secure lossy source coding with uncoded side information. Note that if Eve intercepts the helper’s link, she can also reconstruct the helper’s sequence $Y_2^n$ losslessly.

The achievability proof follows from the proof of Theorem ? by setting $U_1=Y_1=X$ and $V_2=U_2=Y_2$. The converse proof is similar to the proof given in [16].

The achievability is a special case of Theorem ? and is obtained by setting $V_1$ and $E$ to be constants, $U_1=Y_1=X$, and $V_2=U_2$. The proof of the converse is given in Appendix Section 11.

In this section, we study the Gaussian CEO problem with secrecy constraints and quadratic distortion measure.

Let $X$ be a Gaussian source, i.e., $X\sim\mathcal{N}(0,\sigma_X^2)$. The observations at the agents are modeled as $Y_j=X+N_j$ for $j\in\{1,2\}$, with $N_j\sim\mathcal{N}(0,\sigma_{N_j}^2)$, where the Gaussian random variables $X$, $N_1$, and $N_2$ are mutually independent.
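To make the observation model concrete, the following Python sketch simulates it and compares the empirical squared error of the linear minimum mean-square-error (MMSE) estimate of the source from both uncompressed observations with the closed form $(1/\sigma_X^2+1/\sigma_{N_1}^2+1/\sigma_{N_2}^2)^{-1}$. This value is a standard property of jointly Gaussian variables and only marks the distortion floor that no rate pair can beat; it is not the region of the theorems in this section. The function names and the numerical variances are assumptions chosen for illustration.

```python
import math
import random


def mmse_from_two_observations(var_x, var_n1, var_n2):
    """MMSE of X given Y1 = X + N1 and Y2 = X + N2 (all Gaussian, independent)."""
    return 1.0 / (1.0 / var_x + 1.0 / var_n1 + 1.0 / var_n2)


def simulate_mse(var_x, var_n1, var_n2, num_samples=100_000, seed=0):
    """Monte Carlo estimate of E[(X - X_hat)^2] for the MMSE estimator."""
    rng = random.Random(seed)
    d_min = mmse_from_two_observations(var_x, var_n1, var_n2)
    # Posterior mean: X_hat = d_min * (Y1 / var_n1 + Y2 / var_n2).
    w1, w2 = d_min / var_n1, d_min / var_n2
    total = 0.0
    for _ in range(num_samples):
        x = rng.gauss(0.0, math.sqrt(var_x))
        y1 = x + rng.gauss(0.0, math.sqrt(var_n1))
        y2 = x + rng.gauss(0.0, math.sqrt(var_n2))
        total += (x - (w1 * y1 + w2 * y2)) ** 2
    return total / num_samples


# Assumed toy variances: sigma_X^2 = 1, sigma_N1^2 = 0.5, sigma_N2^2 = 1.
d_floor = mmse_from_two_observations(1.0, 0.5, 1.0)
d_empirical = simulate_mse(1.0, 0.5, 1.0)
```

Any distortion target below `d_floor` is infeasible regardless of the rates, since the CEO cannot do better than an estimator with direct access to both observations.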

First, we consider the case where the eavesdropper has no side information. The model is depicted in Figure 2 and the following theorem provides the complete rate-distortion-equivocation region for this Gaussian setup.

Figure 2: The quadratic Gaussian case with no side information at Eve.

An example of the region of Theorem ? is illustrated in Figure 4 for different distortion constraints.

Figure 3: An example of the rate-distortion-equivocation region for the quadratic Gaussian case with no side information at Eve and different distortion constraints.

Figure 4: An example of the rate-distortion-equivocation region for the quadratic Gaussian case with no side information at Eve and different distortion constraints.

Next, we consider the case where Eve has access to additional side information correlated to the source, as shown in Figure 5. We model this side information as $E=X+N_E$, where $N_E$ is a Gaussian random variable with $N_E\sim\mathcal{N}(0,\sigma_{N_E}^2)$ that is independent of $X$, $N_1$, and $N_2$. The following theorem gives an inner bound for the rate-distortion-equivocation region of the quadratic Gaussian CEO problem with secrecy constraints and side information at the eavesdropper.

Figure 5: The quadratic Gaussian case with side information at Eve.

Note that if there is no correlation between Eve’s side information and the source, i.e., $\sigma_{N_E}^2\to\infty$, the region of Theorem ? coincides with the one in Theorem ?.
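The limiting behavior can be illustrated with the information that the side information alone carries about the source. For $E=X+N_E$ with independent Gaussian $X$ and $N_E$, the mutual information is $I(X;E)=\frac{1}{2}\log_2(1+\sigma_X^2/\sigma_{N_E}^2)$ bits, which vanishes as $\sigma_{N_E}^2\to\infty$. The Python sketch below evaluates this quantity only; it does not compute the regions of the theorems, and the function name and the variance values are assumptions for illustration.

```python
import math


def side_info_leakage_bits(var_x, var_ne):
    """I(X;E) in bits for E = X + N_E with independent Gaussian X and N_E."""
    return 0.5 * math.log2(1.0 + var_x / var_ne)


# Leakage from the side information alone, for increasingly noisy side information.
noise_variances = [0.1, 1.0, 10.0, 1e3, 1e6]
leakages = [side_info_leakage_bits(1.0, v) for v in noise_variances]
```

The leakage decreases monotonically in the noise variance, which is consistent with the side information becoming useless to Eve in the limit.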

We studied an extension of the CEO problem with secrecy constraints. This setup is of interest in communication scenarios such as sensor networks or smart power grids, in which links are vulnerable to eavesdropping. We derived inner and outer bounds on the rate-distortion-equivocation region in the discrete case. We also showed that the results derived for the one-helper problem with secrecy constraints in [18] and [26] can be obtained as special cases of our results for the CEO problem with secrecy constraints. In addition, we provided the optimal region for the quadratic Gaussian case when Eve has no side information, as well as an achievable region for a more general case. In this work, we considered noise-free links from the agents to the CEO; it would be interesting to investigate the effects of noisy channels in this problem. Moreover, extending this problem to include more agents and eavesdroppers with possibly different side information is another direction worth investigating.

We first state the following lemma that we use in the proof of Theorem ?. The lemma follows from [2].

Now, we proceed to prove Theorem ?.

Let $V_1$, $V_2$, $U_1$, and $U_2$ be random variables on some finite sets $\mathcal{V}_1$, $\mathcal{V}_2$, $\mathcal{U}_1$, and $\mathcal{U}_2$ according to the joint distribution $p(x,y_1,y_2,e,v_1,v_2,u_1,u_2)=p(x)p(y_1|x)p(y_2|x)p(e|x)p(u_1|y_1)p(u_2|y_2)p(v_1|u_1)p(v_2|u_2)$, along with a function $\hat{X}:\mathcal{U}_1\times\mathcal{U}_2\to\mathcal{X}$ satisfying the conditions of Theorem ?.

Codebook generation: For fixed conditional distributions $p(u_j|y_j)$ and $p(v_j|u_j)$, $j=1,2$, randomly generate $2^{n(I(V_j;Y_j)+\epsilon_1)}$ independent codewords $v_j^n(s_j)$ of length $n$ according to $\prod_{i=1}^{n}P_{V_j}(v_{j,i}(s_j))$, where $s_j\in\{1,\ldots,2^{n(I(V_j;Y_j)+\epsilon_1)}\}$. Then, divide them into $2^{n(R_{V_j}+\epsilon_2)}$ equal-sized bins, indexed by $b_j\in\{1,\ldots,2^{n(R_{V_j}+\epsilon_2)}\}$ and denoted by $\{\mathcal{B}_j(b_j)\}$. For each codeword $v_j^n(s_j)$, randomly generate $2^{n(I(U_j;Y_j|V_j)+\epsilon_3)}$ independent sequences $u_j^n(s_j,s'_j)$ according to $\prod_{i=1}^{n}P_{U_j}(u_{j,i}(s_j,s'_j))$, and divide them into $2^{n(R_{U_j}+\epsilon_4)}$ equal-sized bins, indexed by $w_j\in\{1,\ldots,2^{n(R_{U_j}+\epsilon_4)}\}$ and denoted by $\{\mathcal{B}'_j(s_j,w_j)\}$. Define $R_j=R_{V_j}+R_{U_j}$ for $j=1,2$. The codebook is revealed to the agents, the CEO, and Eve.
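The first layer of this construction (the codewords for one agent's auxiliary variable V and their bins) can be sketched in Python for toy parameters. The sketch is illustrative only: the function name, the block length, and the exponents `cover_rate` and `bin_rate`, which stand in for $I(V_j;Y_j)+\epsilon_1$ and $R_{V_j}+\epsilon_2$, are assumptions made here. The second layer would repeat the same steps once per first-layer codeword.

```python
import math
import random


def generate_binned_codebook(n, pmf_v, cover_rate, bin_rate, seed=0):
    """Draw ceil(2^(n*cover_rate)) i.i.d. codewords and split them into bins.

    Returns the list of codewords, the bin index of each codeword, and the
    number of bins. Bins are filled in index order with equal sizes
    (the last bin may be smaller if the counts do not divide evenly).
    """
    rng = random.Random(seed)
    symbols = list(pmf_v)
    weights = [pmf_v[s] for s in symbols]
    num_codewords = math.ceil(2 ** (n * cover_rate))
    num_bins = min(num_codewords, math.ceil(2 ** (n * bin_rate)))
    # Each codeword has n entries drawn i.i.d. from the marginal pmf of V.
    codewords = [tuple(rng.choices(symbols, weights=weights, k=n))
                 for _ in range(num_codewords)]
    bin_size = math.ceil(num_codewords / num_bins)
    bin_of = [s // bin_size for s in range(num_codewords)]
    return codewords, bin_of, num_bins


# Toy instance: n = 8, 2^6 = 64 codewords, 2^4 = 16 bins of 4 codewords each.
codewords, bin_of, num_bins = generate_binned_codebook(
    n=8, pmf_v={0: 0.5, 1: 0.5}, cover_rate=0.75, bin_rate=0.5)
```

Only the bin index is sent over the link, so the transmitted rate is governed by `bin_rate`, while `cover_rate` controls how many codewords are available to match the observation.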

Encoding: Assume that the sequence $y_j^n$ is observed by Agent $j$, $j=1,2$. Find a codeword $v_j^n(s_j)$ jointly typical with $y_j^n$. If there is more than one such codeword, select one uniformly at random. If there is no such $v_j^n$, select one out of $2^{n(I(V_j;Y_j)+\epsilon_1)}$ uniformly at random. Given $v_j^n(s_j)$, find a codeword $u_j^n(s_j,s'_j)$ jointly typical with $y_j^n$. If there is more than one such codeword, select one uniformly at random. If there is no such $u_j^n$, select one out of $2^{n(I(U_j;Y_j|V_j)+\epsilon_3)}$ uniformly at random. The agent transmits the bin indices $b_j$ and $w_j$ of the codewords $v_j^n(s_j)\in\mathcal{B}_j(b_j)$ and $u_j^n(s_j,s'_j)\in\mathcal{B}'_j(s_j,w_j)$, respectively, i.e., $f_j(y_j^n)=(b_j,w_j)$.
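The first encoding step (choosing a codeword for V and reporting its bin) can be sketched in Python as below. This is a hedged illustration, not the construction used in the proof: joint typicality is tested with a robust-typicality criterion on empirical frequencies, which is one common definition and is assumed here, and the function names, the tolerance `delta`, and the toy codebook are hypothetical.

```python
import random
from collections import Counter


def is_jointly_typical(v_seq, y_seq, joint_pmf_vy, delta):
    """Robust typicality: every empirical pair frequency is within a factor
    (1 +/- delta) of its probability, and no zero-probability pair occurs."""
    if len(v_seq) != len(y_seq):
        raise ValueError("sequences must have equal length")
    n = len(v_seq)
    counts = Counter(zip(v_seq, y_seq))
    if any(joint_pmf_vy.get(pair, 0.0) <= 0.0 for pair in counts):
        return False
    return all(abs(counts.get(pair, 0) / n - p) <= delta * p
               for pair, p in joint_pmf_vy.items())


def encode_first_layer(y_seq, codewords, bin_of, joint_pmf_vy, delta, rng):
    """Pick a codeword index jointly typical with y_seq (uniformly at random
    among all matches, or among all codewords if none match) and its bin."""
    matches = [s for s, v in enumerate(codewords)
               if is_jointly_typical(v, y_seq, joint_pmf_vy, delta)]
    s = rng.choice(matches) if matches else rng.randrange(len(codewords))
    return s, bin_of[s]


# Toy usage: two codewords, one per bin; V and Y always agree under this pmf.
toy_codewords = [(0, 0, 1, 1), (1, 1, 1, 1)]
toy_bins = [0, 1]
toy_pmf = {(0, 0): 0.5, (1, 1): 0.5}
chosen = encode_first_layer((0, 0, 1, 1), toy_codewords, toy_bins,
                            toy_pmf, delta=0.1, rng=random.Random(1))
```

The second step would apply the same search over the sequences generated for the chosen first-layer codeword and report the second bin index.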

Decoding at the CEO: Given the received messages from both agents, $J_1=(b_1,w_1)$ and $J_2=(b_2,w_2)$, find a unique index tuple $(\hat{s}_1,\hat{s}'_1,\hat{s}_2,\hat{s}'_2)$ such that the codewords $(v_1^n(\hat{s}_1),u_1^n(\hat{s}_1,\hat{s}'_1),v_2^n(\hat{s}_2),u_2^n(\hat{s}_2,\hat{s}'_2))$ are jointly typical, and they are in the bin indexed by $(b_1,w_1,b_2,w_2)$. If there is such a unique index tuple, compute the source estimate component-wise as $\hat{x}_i=g_i(J_1,J_2)\coloneqq\hat{X}(u_{1,i}(\hat{s}_1,\hat{s}'_1),u_{2,i}(\hat{s}_2,\hat{s}'_2))$ for $i=1,\ldots,n$; otherwise, set the output to an arbitrary sequence in $\mathcal{X}^n$.

Error analysis: Let $(s_1,s'_1,s_2,s'_2)$ and $(\hat{s}_1,\hat{s}'_1,\hat{s}_2,\hat{s}'_2)$ be the chosen indices at the encoders and decoder, respectively. Let $\Pr(E)$ denote the probability of an error event during the encoding and decoding steps. We now show that this probability, averaged over all possible codebooks, tends to zero as $n\to\infty$ provided that the conditions of Theorem ? are satisfied. Consider the following error events in the encoding steps (for $j,j'=1,2$, and $j\neq j'$):

Finally, by the union of events bound, the probability of error in the encoding and decoding steps is upper bounded as

$$\Pr(E)\leq\Pr\Big(\bigcup_{t=0}^{8}E_t\Big)\leq\sum_{t=0}^{8}\Pr(E_t).$$

We proceed to bound each term in (?). From the properties of typical sequences, $\Pr(E_0)$ vanishes as $n\to\infty$. By the covering lemma [39], $\Pr(E_1)$ and $\Pr(E_2)$ tend to zero as $n\to\infty$. For $j,j'=1,2$, and $j\neq j'$, since $\{Y_{j'}^n|V_j^n(s_j)=v_j^n,Y_j^n=y_j^n\}\sim\prod_{i=1}^{n}p(y_{j',i}|y_{j,i})$, by the conditional typicality lemma [39], $\Pr(E_3)$ tends to zero as $n\to\infty$. Similarly, as $\{Y_{j'}^n|V_j^n(s_j)=v_j^n,U_j^n(s_j,s'_j)=u_j^n,Y_j^n=y_j^n\}\sim\prod_{i=1}^{n}p(y_{j',i}|y_{j,i})$, $\Pr(E_4)$ also vanishes as $n\to\infty$. To bound $\Pr(E_5)$, let $(v_{j'}^n,u_{j'}^n,y_1^n,y_2^n)\in\mathcal{T}_\delta^{(n)}(V_{j'},U_{j'},Y_1,Y_2)$. Then, $\Pr(V_j^n(s_j)|V_{j'}^n(s_{j'})=v_{j'}^n,U_j^n(s_j,s'_j)=u_{j'}^n,Y_1^n=y_1^n,Y_2^n=y_2^n)=p(v_j^n|y_j^n)$, and by the Markov lemma [39], $\Pr(E_5)$ tends to zero as $n\to\infty$. Using similar steps and based on the Markov lemma, $\Pr(E_6)$ and $\Pr(E_7)$ also tend to zero as $n\to\infty$.

As can be seen from $E_8$, an error occurs in the decoding step if the decoded codewords are jointly typical and lie in the bin indexed by $(b_1,w_1,b_2,w_2)$, but the decoded tuple of codeword indices $(\hat{s}_1,\hat{s}'_1,\hat{s}_2,\hat{s}'_2)$ differs from the one chosen at the encoders, i.e., $(s_1,s'_1,s_2,s'_2)$. We split this event into eight possible events (other events result in the same constraints as one of these eight events) and bound its probability using the union of events bound as follows: