I'm trying to build a cosine locality-sensitive hash so I can find candidate similar pairs of items without having to compare every possible pair. I have it basically working, but most of the pairs in ...
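
For reference, the usual construction behind a cosine LSH is random-hyperplane signatures (Charikar's SimHash): each signature bit records which side of a random hyperplane a vector falls on, and two vectors agree on a bit with probability $1 - \theta/\pi$, where $\theta$ is the angle between them. A minimal Java sketch, where the dimension, bit count, and seed are illustrative assumptions, not from the question:

```java
import java.util.Random;

// Minimal random-hyperplane LSH for cosine similarity (SimHash-style).
public class CosineLsh {
    private final double[][] planes; // one random hyperplane per signature bit

    public CosineLsh(int dim, int numPlanes, long seed) {
        Random rnd = new Random(seed);
        planes = new double[numPlanes][dim];
        for (double[] p : planes)
            for (int i = 0; i < p.length; i++)
                p[i] = rnd.nextGaussian(); // Gaussian entries give a uniform random direction
    }

    // Each bit records which side of a hyperplane the vector falls on.
    public long signature(double[] v) {
        long sig = 0L;
        for (int b = 0; b < planes.length; b++) {
            double dot = 0;
            for (int i = 0; i < v.length; i++) dot += planes[b][i] * v[i];
            if (dot >= 0) sig |= 1L << b;
        }
        return sig;
    }

    public static void main(String[] args) {
        CosineLsh lsh = new CosineLsh(3, 16, 42);
        double[] a = {1, 0, 0}, b = {0.9, 0.1, 0};
        // Similar vectors agree on most signature bits, so they land in the
        // same bucket far more often than random pairs do.
        System.out.println(Long.toBinaryString(lsh.signature(a)));
        System.out.println(Long.toBinaryString(lsh.signature(b)));
    }
}
```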

I am reading Algorithms, 4th Edition by Robert Sedgewick and am stumped by a particular problem. On page 460 of the book, the author describes a technique to hash strings and use prime numbers for ...
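
For context, the technique in that part of the book is Horner's-method modular hashing: fold in one character at a time, reducing modulo a prime table size so that every character influences the result. A sketch in that style ($R = 31$ and the prime table size $M = 997$ are the customary illustrative choices, not necessarily the book's exact code):

```java
// Horner's-method string hash in the style of Sedgewick's modular hashing
// (Algorithms, 4th ed., Section 3.4). M is kept prime so the hash uses all
// the bits of every character rather than collapsing onto a few residues.
public class StringHash {
    static int hash(String s, int M) {
        int R = 31;  // small prime multiplier
        int h = 0;
        for (int i = 0; i < s.length(); i++)
            h = (R * h + s.charAt(i)) % M;  // reduce at each step to avoid overflow
        return h;
    }

    public static void main(String[] args) {
        int M = 997;  // a prime table size
        System.out.println(hash("hello", M));
        System.out.println(hash("world", M));
    }
}
```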

Some textbook problems ask me to insert a bunch of elements and then remove a bunch of elements.
Insertion is done using linear probing, i.e. $h = (X \bmod R) + i$; when there is a collision, I ...
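
For concreteness, here is a minimal linear-probing insert matching that formula, trying slot $(X \bmod R) + i$ for $i = 0, 1, 2, \dots$ and wrapping around; the table size $R = 11$ and integer keys are assumptions for illustration. Removal is the subtle part: simply clearing a slot breaks later probe sequences, so one typically either marks a tombstone or reinserts the rest of the cluster.

```java
// Minimal linear-probing insertion: probe (x mod R + i) mod R until a free slot.
public class LinearProbing {
    static final int R = 11;
    static Integer[] table = new Integer[R];

    static void insert(int x) {
        for (int i = 0; i < R; i++) {
            int h = (x % R + i) % R;  // wrap around the table
            if (table[h] == null) { table[h] = x; return; }
        }
        throw new IllegalStateException("table full");
    }

    public static void main(String[] args) {
        int[] keys = {5, 16, 27};  // all hash to 5, so they land in slots 5, 6, 7
        for (int key : keys) insert(key);
        for (int i = 0; i < R; i++) System.out.println(i + ": " + table[i]);
    }
}
```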

I was just curious as to whether there are (or could be) any hash values that are impossible to compute due to the implementation of the algorithm. For example, SHA-256 produces a value that is 256 bits ...

"I was asked this question in an interview, How to design a bloom filter to reduce cache access in a high load and concurrency environment. I know how bloom filter works but try to understand how high ...

I am doing an exercise from a Big Data course I'm taking on Coursera (this exercise is for experimenting with a big-data problem and is not for any credit or homework). The assignment was described ...

More precisely: the input is a set of $M$ sets (most likely stored sequentially on disk) that contain partitions of the set $\{1..N\}$. I want to efficiently (as far as memory and time complexity goes) ...

Find the approximate number of messages $n$ that need to be tried before finding two that have the same message digest (size $k$) with probability 0.8. You need to find $n$ as a function of $k$. What is $n$ ...
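
For reference, this is the birthday bound. Assuming a $k$-bit digest, so $2^k$ equally likely values, the standard approximation gives

$$\Pr[\text{collision}] \approx 1 - e^{-n^2/2^{k+1}} \quad\Longrightarrow\quad n \approx \sqrt{2^{k+1}\,\ln\frac{1}{1-p}} = \sqrt{2\ln 5}\cdot 2^{k/2} \approx 1.79 \cdot 2^{k/2} \quad\text{for } p = 0.8,$$

so roughly $1.79 \cdot 2^{k/2}$ messages suffice to hit a collision with probability 0.8.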

With conventional collision-resolution methods like separate chaining and linear/quadratic probing, the probe sequence for a key can be arbitrarily long; it is simply kept short with high probability ...

The longest common substring (LCS) of two input strings $s,t$ is a common substring (in both of them) of maximum length. We can relax the constraints to generalize the problem: find a common substring ...
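
One standard hashing-based attack on the exact problem (a sketch of the folk technique, not the poster's code): binary-search the answer length $L$, and test each candidate $L$ by comparing polynomial rolling hashes of all length-$L$ windows of $s$ against those of $t$, for $O((|s|+|t|)\log\min(|s|,|t|))$ expected time. Hash collisions are ignored below, so a careful version would verify candidate matches directly.

```java
import java.util.HashSet;
import java.util.Set;

// Binary search on the answer length L; for each candidate L, compare rolling
// hashes of all length-L windows of s and t.
public class LcsByHashing {
    static final long MOD = 1_000_000_007L, BASE = 131;

    // Rolling hashes of every length-L substring of s.
    static Set<Long> windowHashes(String s, int L) {
        Set<Long> out = new HashSet<>();
        long h = 0, powL = 1;                   // powL = BASE^L mod MOD
        for (int i = 0; i < L; i++) {
            h = (h * BASE + s.charAt(i)) % MOD;
            powL = powL * BASE % MOD;
        }
        out.add(h);
        for (int i = L; i < s.length(); i++) {  // slide the window one char
            h = (h * BASE + s.charAt(i)) % MOD;
            h = (h - s.charAt(i - L) * powL % MOD + MOD) % MOD;
            out.add(h);
        }
        return out;
    }

    static boolean haveCommonSubstring(String s, String t, int L) {
        if (L == 0) return true;
        if (L > s.length() || L > t.length()) return false;
        Set<Long> hs = windowHashes(s, L);
        for (long h : windowHashes(t, L)) if (hs.contains(h)) return true;
        return false;
    }

    public static void main(String[] args) {
        String s = "hashing", t = "mishash";
        int lo = 0, hi = Math.min(s.length(), t.length());
        while (lo < hi) {                       // feasibility is monotone in L
            int mid = (lo + hi + 1) / 2;
            if (haveCommonSubstring(s, t, mid)) lo = mid;
            else hi = mid - 1;
        }
        System.out.println("LCS length = " + lo); // 4, for "hash"
    }
}
```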

A Bloom filter makes it possible to efficiently keep track of whether various values have already been encountered during processing. When there are many data items, a Bloom filter can result in ...
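
For context, the standard trade-off behind that saving: with an $m$-bit array, $k$ hash functions, and $n$ inserted items,

$$\Pr[\text{false positive}] \approx \left(1 - e^{-kn/m}\right)^{k}, \qquad k_{\text{opt}} = \frac{m}{n}\ln 2 \;\Longrightarrow\; \Pr \approx 0.6185^{\,m/n},$$

so about 10 bits per item with the optimal number of hash functions already gives a false-positive rate under 1%, regardless of how large the items themselves are.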

I was reading the following FDS paper:
https://www.usenix.org/system/files/conference/osdi12/osdi12-final-75.pdf
and it says that the following hash function does not distribute things uniformly ...

(This question is related to homework.)
I am taking a distance-learning cryptography course, and we have been given an assignment based on lattice-based cryptography. I have spent the majority ...

I'm not sure how to word this because I'm not familiar with the area, but I'm sure a process like this is rather common.
Basically, I've got members signing up for our website, and each one is assigned a ...

I want to create a fast way to detect whether two files might or might not be the same. For almost 100% certainty I would use an existing hash algorithm, e.g. SHA-256.
However, the files are expected to be ...
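
One cheap pre-filter that fits this "might or might not" framing (an assumption, not the poster's plan): compare file sizes first, then byte-compare only the first and last 64 KiB; only files that survive both checks get the full SHA-256 pass, so most mismatches are rejected after two small reads.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

// Quick "might be the same" check: size, then head and tail samples.
public class QuickFileCheck {
    static final int SAMPLE = 64 * 1024;

    static boolean maybeSame(Path a, Path b) throws IOException {
        long size = Files.size(a);
        if (size != Files.size(b)) return false;  // different size: definitely different
        long tail = Math.max(0, size - SAMPLE);
        return Arrays.equals(sample(a, 0), sample(b, 0))
            && Arrays.equals(sample(a, tail), sample(b, tail));
    }

    static byte[] sample(Path p, long offset) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(p.toFile(), "r")) {
            byte[] buf = new byte[(int) Math.min(SAMPLE, f.length() - offset)];
            f.seek(offset);
            f.readFully(buf);
            return buf;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(maybeSame(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```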

I need to process queries that hash various ranges of a character array. I am currently using Arrays.hashCode from the standard Java library, but the problem is that this method is too slow. Also, my ...
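
The usual fix, assuming polynomial-style hashes of the ranges are acceptable: precompute prefix hashes and powers once in $O(n)$, after which any range hash is $O(1)$, instead of rescanning the range the way Arrays.hashCode on a copied subarray would. A sketch (the modulus and base are illustrative choices):

```java
// Polynomial prefix hashes: pre[i] = hash of a[0..i), pow[i] = BASE^i mod MOD.
public class RangeHasher {
    static final long MOD = 1_000_000_007L, BASE = 131;
    private final long[] pre, pow;

    public RangeHasher(char[] a) {
        int n = a.length;
        pre = new long[n + 1];
        pow = new long[n + 1];
        pow[0] = 1;
        for (int i = 0; i < n; i++) {
            pre[i + 1] = (pre[i] * BASE + a[i]) % MOD;
            pow[i + 1] = pow[i] * BASE % MOD;
        }
    }

    // Hash of a[l..r) in O(1): strip the prefix a[0..l) scaled by BASE^(r-l).
    public long hash(int l, int r) {
        return ((pre[r] - pre[l] * pow[r - l] % MOD) % MOD + MOD) % MOD;
    }

    public static void main(String[] args) {
        RangeHasher h = new RangeHasher("abracadabra".toCharArray());
        System.out.println(h.hash(0, 4) == h.hash(7, 11)); // true: both "abra"
    }
}
```

If the values must equal Arrays.hashCode exactly, the same prefix trick works with base 31 and Java's natural int overflow in place of the prime modulus, with a small correction term for the leading 1 in Arrays.hashCode's recurrence.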

I'm reading about the MinHash technique to estimate the similarity between two sets: given sets $A$ and $B$, $h$ is the hash function and $h_\min(S)$ is the minimum hash of set $S$, i.e. $h_\min(S) = \min(h(s))$ for ...
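
For concreteness, a MinHash sketch under the standard setup (the number of hash functions and the universal-hash family are illustrative): since $\Pr[h_\min(A) = h_\min(B)] = J(A,B)$, the fraction of coordinates on which two signatures agree is an unbiased estimate of the Jaccard similarity.

```java
import java.util.Arrays;
import java.util.Random;

// MinHash signatures: k random hash functions h(x) = (a*x + b) mod P, and
// for each one we keep the minimum hash over the set. Elements are assumed
// to be non-negative ints.
public class MinHash {
    private static final long P = (1L << 31) - 1; // Mersenne prime 2^31 - 1
    private final long[] a, b;

    public MinHash(int k, long seed) {
        Random rnd = new Random(seed);
        a = new long[k];
        b = new long[k];
        for (int i = 0; i < k; i++) {
            a[i] = 1 + rnd.nextInt((int) P - 1);
            b[i] = rnd.nextInt((int) P);
        }
    }

    public long[] signature(int[] set) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (int x : set)
            for (int i = 0; i < a.length; i++)
                sig[i] = Math.min(sig[i], (a[i] * x + b[i]) % P); // h_min over the set
        return sig;
    }

    public static double estimateJaccard(long[] s1, long[] s2) {
        int eq = 0;
        for (int i = 0; i < s1.length; i++) if (s1[i] == s2[i]) eq++;
        return (double) eq / s1.length; // Pr[agreement] = J(A, B) per coordinate
    }

    public static void main(String[] args) {
        MinHash mh = new MinHash(256, 7);
        int[] A = {1, 2, 3, 4, 5, 6, 7, 8};
        int[] B = {1, 2, 3, 4, 5, 6, 9, 10};
        // True Jaccard = 6 / 10 = 0.6; the estimate should be close.
        System.out.println(estimateJaccard(mh.signature(A), mh.signature(B)));
    }
}
```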

The goal is to distribute approximately 100 million variable-length strings, average length 100 characters, uniformly among 100 million buckets. Perfection not required, just no egregious clumping. ...
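
One concrete way to meet that bar: any well-mixed 64-bit string hash, reduced to a bucket index with floorMod, gives expected bucket loads of about one and no egregious clumping. FNV-1a is a simple choice shown here as an assumption, not something the question prescribes:

```java
import java.nio.charset.StandardCharsets;

// 64-bit FNV-1a string hash, reduced to one of numBuckets buckets.
public class Bucketer {
    static final long FNV_OFFSET = 0xcbf29ce484222325L, FNV_PRIME = 0x100000001b3L;

    static long fnv1a64(String s) {
        long h = FNV_OFFSET;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xffL);
            h *= FNV_PRIME; // wraparound multiply is part of the algorithm
        }
        return h;
    }

    static int bucket(String s, int numBuckets) {
        return (int) Math.floorMod(fnv1a64(s), (long) numBuckets);
    }

    public static void main(String[] args) {
        int buckets = 100_000_000;
        System.out.println(bucket("example string #1", buckets));
        System.out.println(bucket("example string #2", buckets));
    }
}
```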