For the sake of example, suppose we have a list of advertisements $\{A_i\}_{i=1}^n$, each of which have parameters $I_i$: the number of impressions, and $C_i$ the number of clicks. Then $C_i/I_i$ denotes the click-thru rate of advertisement $i$.

I'm writing a multi-armed bandit (MAB) Thompson sampler, whose job is to choose among the advertisements $\{A_i\}$. This works by drawing a random value for each arm according to:

$$\theta_i \sim \operatorname{Beta}(\alpha + C_i - 1,\ \beta + I_i - C_i - 1),$$

after which we choose advertisement $A_s$, where $s=\operatorname{argmax}_i \theta_i$.
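For concreteness, a minimal Python sketch of such a sampler (numpy assumed; I use the standard conjugate update $\operatorname{Beta}(\alpha+C_i,\ \beta+I_i-C_i)$ without the $-1$ shifts so the parameters stay positive even when $C_i=0$; the prior values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def thompson_select(clicks, impressions, alpha=1.0, beta=1.0):
    """Sample theta_i from each ad's Beta posterior and return the argmax index."""
    clicks = np.asarray(clicks, dtype=float)
    impressions = np.asarray(impressions, dtype=float)
    # Conjugate Beta update: successes = clicks, failures = impressions - clicks
    theta = rng.beta(alpha + clicks, beta + impressions - clicks)
    return int(np.argmax(theta))
```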

For security reasons, I can't expose $I_i,C_i$ anywhere in the code (this is happening client-side). So I was thinking that instead of delivering $I_i,C_i$, I could deliver transformed versions $\tilde{I}_i,\tilde{C}_i$ and exploit some of the equivalence properties of beta distributions. My worry is that this isn't enough: if an adversary knows the true values must be integers, there may be a way to recover the originals. Suggestions?
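To make the integer-recovery worry concrete: if the transform were, say, multiplying both counts by a shared integer factor (a hypothetical scheme, not one proposed in the question), an adversary who knows the originals are integers can often strip it with a gcd:

```python
from math import gcd

def descale(scaled_I, scaled_C):
    """If (I, C) were both multiplied by the same integer k and gcd(I, C) == 1,
    the gcd of the transmitted values recovers k exactly."""
    k = gcd(scaled_I, scaled_C)
    return scaled_I // k, scaled_C // k
```

Here $I=1000$, $C=13$ scaled by $k=7$ (i.e. the pair $(7000, 91)$) comes straight back, which is exactly the kind of leak being worried about.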

$\begingroup$Maybe just add some noise to $I,C$? If a reasonable amount of noise is added, it should not have a great impact on the results. You would possibly need to check (e.g. by simulation) how much noise is "acceptable" and whether it adds enough secrecy. On the other hand, this would still give your adversary a chance to learn the approximate values of $I,C$. You would also need to clarify what exactly will be visible to the adversary (e.g. previous values of $I,C$, or priors, or only the current state?). Interesting question, but at first sight it seems rather hopeless.$\endgroup$
– Tim♦ Aug 29 '17 at 22:35

$\begingroup$@Tim: I like the noise idea, and I figure I can obfuscate it like crazy if I also make it vary with time in a deterministic way. It's a bit risky, though, because $C/I$ tends to be pretty small (say 0.01), so the noise would be minimal at best$\endgroup$
– Alex R. Aug 29 '17 at 22:55

$\begingroup$@Tim: Concerning visibility, we can assume that at some point there's a call to the beta function above with some set of parameters. Right now they are plainly deobfuscated as $I$ and $C$. We can assume those parameters are visible, and in theory an adversary could collect them over time.$\endgroup$
– Alex R. Aug 29 '17 at 22:56

$\begingroup$If $C/I$ is small, then the probability of sampling that arm will also be small. I'm not an expert in multi-armed bandits, but in many cases you would not want it to be close to zero, since you would want some chance of testing each case. So there shouldn't be any problem with replacing those values with something greater than the initial value.$\endgroup$
– Tim♦ Aug 30 '17 at 9:34

1 Answer

The problem with using any symmetries is that, being symmetric, they are easy to reverse, so this sounds like a security-through-obscurity strategy. It would be enough for your adversary to learn what kind of symmetries you used to decode your data.

A simple obfuscation that can be used here is to add noise to the values of $I_i, C_i$, with the amount of noise proportional to the values being altered. If a reasonable amount of noise is added, it should not have a great impact on the results. You would possibly need to check (e.g. by simulation) how much noise is "acceptable" and whether it adds enough secrecy. On the other hand, this would still give your adversary a chance to learn the approximate values of $I_i, C_i$.
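As a sketch of what "noise proportional to the values" might look like (the multiplicative-Gaussian scheme and the 10% default are my own illustrative choices, not a vetted privacy mechanism):

```python
import numpy as np

rng = np.random.default_rng(42)

def obfuscate(impressions, clicks, rel_noise=0.1):
    """Multiply each count by an independent random factor near 1."""
    impressions = np.asarray(impressions, dtype=float)
    clicks = np.asarray(clicks, dtype=float)
    noisy_I = impressions * rng.normal(1.0, rel_noise, size=impressions.shape)
    noisy_C = clicks * rng.normal(1.0, rel_noise, size=clicks.shape)
    # Keep the results usable as Beta parameters: positive, and C <= I
    noisy_I = np.maximum(noisy_I, 1e-6)
    noisy_C = np.clip(noisy_C, 1e-6, noisy_I)
    return noisy_I, noisy_C
```

Simulating the bandit's regret with and without this obfuscation is one way to run the "how much noise is acceptable" check mentioned above.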

In the comments you worry that the $C_i/I_i$ ratio tends to be very small and that this could lead to problems. First, if you alter the values proportionally to their initial magnitudes, this should not have much impact. Second, if your aim is to sample probabilities so as to randomly allocate advertisements, then a "noisy" probability slightly larger than the true value would lead to displaying that ad more often than needed. In a multi-armed bandit scenario this just adds another layer of randomization to the allocation, which does not have to be bad. If it is slightly smaller, this should not have much impact either, since you probably would not have chosen that ad anyway. It is an accuracy-versus-privacy trade-off that you are going to face.

The idea of adding noise to the data is related to differential privacy, so you could read more about the theory behind it.
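For example, the classic Laplace mechanism from differential privacy releases a count plus Laplace noise of scale $1/\varepsilon$ (the sensitivity of a single count is 1); the $\varepsilon$ below is just an illustrative setting:

```python
import numpy as np

rng = np.random.default_rng(7)

def laplace_release(true_count, epsilon=1.0):
    """Epsilon-differentially-private release of a single count
    (sensitivity 1, so the Laplace scale is 1/epsilon)."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)
```

Smaller $\varepsilon$ means stronger privacy but noisier posteriors; calibrating it against the bandit's performance is the same kind of simulation check suggested earlier.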