
Hi all,
I'm sure this shouldn't be too hard, but for whatever reason my brain just isn't clicking today [headshake]
I have a 10-component vector where each element is a floating-point value within ±X (X could be 1.0, 3.0, 10.0, 500.0, etc.).
I want to pick N evenly distributed samples as a seed for a searching algorithm. The idea being that if they're evenly distributed then I should have a fairly unbiased search that can cover most (or all) of the search space.
In context, I think my current randomly sampled approach introduces a bias if/when it clusters the initial samples too close together.
Anyone got any ideas?
Cheers,
Jack


Assume rand() is a function that generates uniform random numbers between zero and one. You want those numbers to fit into the range [-X,X], so use X*(1.0 - 2.0*rand()). Do that for each element in your array, using the appropriate X for each element.
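In Python that per-element scaling might look like this (a minimal sketch; the `bounds` list is a made-up example of per-element X values, and `random.random()` plays the role of rand()):

```python
import random

def sample_point(bounds):
    """Draw one uniform random point; bounds[k] is the X for element k,
    giving the range [-X, X] in that dimension."""
    return [b * (1.0 - 2.0 * random.random()) for b in bounds]

# Hypothetical per-element limits for a 10-component vector.
bounds = [1.0, 3.0, 10.0, 500.0, 1.0, 1.0, 2.0, 5.0, 0.5, 100.0]
point = sample_point(bounds)
```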


Quote:

Original post by jjd:
Assume rand() is a function that generates uniform random numbers between zero and one. You want those numbers to fit into the range [-X,X], so use X*(1.0 - 2.0*rand()). Do that for each element in your array, using the appropriate X for each element.

That's pretty much what I have already. I'm using a Mersenne Twister as my PRNG, so it should be uniformly distributed.

As an example of what I'm thinking...

If you had a two-dimensional search space and wanted 9 evenly distributed samples then you could easily work that out as being a 3x3 grid:

#-------#
| + + + |
| + + + |
| + + + |
#-------#

It's expanding that concept to N samples over a 10D space [smile]

Cheers,
Jack



A Sobol sequence would be sexy... [smile] but my first port of call would be the simple random search. Having said that, you might be able to improve on a simple random search like that by using adaptive Monte Carlo or Markov chain approaches. They're a bit more complicated, but not much. However, it depends upon whether you can get some kind of likelihood function to assist in the search, i.e. some measure of whether the thing you are searching for is likely to be closer or further away.
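For reference, a related low-discrepancy (quasi-random) sequence, the Halton sequence, can be sketched in pure Python. This is not a Sobol implementation, just an illustration of the same idea: deterministic samples that stay well spread out instead of clustering. The choice of 100 points is arbitrary.

```python
def halton(index, base):
    """Radical inverse of `index` in the given base; values land in [0, 1)."""
    result, f = 0.0, 1.0 / base
    i = index
    while i > 0:
        result += f * (i % base)
        i //= base
        f /= base
    return result

# The first ten primes serve as per-dimension bases for a 10D sequence.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]

def halton_point(index, dims=10):
    return [halton(index, PRIMES[d]) for d in range(dims)]

# Samples live in the unit cube; scale each coordinate into [-X, X] as needed.
points = [halton_point(i + 1) for i in range(100)]
```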


Quote:

Original post by jollyjeffers:
If you had a two-dimensional search space and wanted 9 evenly distributed samples then you could easily work that out as being a 3x3 grid:

#-------#
| + + + |
| + + + |
| + + + |
#-------#

It's expanding that concept to N samples over a 10D space [smile]

That concept in reverse is like saying "I want 3 samples on each axis, and to fill a 2-dimensional space". The number of samples you need is 3^2 = 9.

If you have a 10-dimensional space, and you want 3-samples on each axis, you'll need 3^10 = 59049 samples in total.
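That full-grid construction is straightforward to write down (a sketch, assuming every element is limited to ±1 for simplicity; `grid_samples` is a made-up helper name):

```python
from itertools import product

def grid_samples(n_per_axis, bounds):
    """Cartesian-product grid: n_per_axis slice centers in [-X, X] per dimension."""
    axes = [[x * (2.0 * (i + 0.5) / n_per_axis - 1.0) for i in range(n_per_axis)]
            for x in bounds]
    return list(product(*axes))

bounds = [1.0] * 10          # hypothetical: every element limited to +/-1
samples = grid_samples(3, bounds)
print(len(samples))          # 3**10 = 59049
```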

Going the other way, from a number of samples to where to place them, is obviously much harder, but the difficulty does not lie in the fact that you are dealing with a space of greater dimensionality. What would you do if, to copy your earlier example, "you had a two-dimensional search space and wanted 7 evenly distributed samples"?

Put another way, is there any flexibility on the number of samples? If so, you could keep a number of "patterns" and choose the one with a number of samples closest to that requested. If you are wanting a solution that will work for any arbitrary number of samples, I believe the problem is too general - your best bet will almost certainly be sticking to random samples.


Perhaps I'm missing something, but doesn't selecting N evenly distributed samples in a 10D space mean that n := ⌊N^(1/10)⌋ samples per dimension are available if all dimensions span the same range and N is an upper bound, or n := ⌈N^(1/10)⌉ if N is a lower bound?

Thinking of each sample as "covering" the same area, the range R of each dimension is divided into n slices, so that each sample sits at the center of its slice: s_i := (0.5 + i) R / n, 0 <= i < n

(That would become interesting if the ranges of each dimension are not the same...)
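The slice-center formula, sketched in Python for one dimension of range R (with differing per-dimension ranges it would just loop over dimensions; `slice_centers` is a made-up helper name):

```python
def slice_centers(R, n):
    """n evenly spaced sample positions over [0, R], each centered in its slice."""
    return [(0.5 + i) * R / n for i in range(n)]

centers = slice_centers(10.0, 4)   # -> [1.25, 3.75, 6.25, 8.75]
```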

Quote:

Original post by jjd:
Conceptually I would agree, but performance-wise it is better to use random sampling than creating a grid-like pattern. This is the so-called "curse of dimensionality."

I understood the OP to be asking for a grid because of:

Quote:

Original post by jollyjeffers:
The idea being that if they're evenly distributed then I should have a fairly unbiased search that can cover most (or all) of the search space.

In context, I think my current randomly sampled approach introduces a bias if/when it clusters the initial samples too close together.

He asked for evenly distributed samples. Given that he also mentioned random samples, I do not think he was restricting himself to a deterministic method.

Okay, reading the OP another time, you're right.

I remember the days of programming a ray-tracer ... There was a method used to avoid lumping during (more or less) randomly sampling the scene. The basic method was to use a regular grid and to add some jitter to the sample positions. There was also an extension with a distance measure to guarantee a minimum distance. But AFAIK that method was already costly for 2 dimensions...
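The jittered-grid idea described above might be sketched like this in Python for 2D (a hypothetical helper; extending it to 10D multiplies the cell count per extra dimension, which is exactly the cost mentioned):

```python
import random

def jittered_grid_2d(n_per_axis, width, height):
    """One sample per grid cell, placed at a uniformly jittered
    position inside its cell."""
    cw, ch = width / n_per_axis, height / n_per_axis
    samples = []
    for ix in range(n_per_axis):
        for iy in range(n_per_axis):
            samples.append(((ix + random.random()) * cw,
                            (iy + random.random()) * ch))
    return samples

samples = jittered_grid_2d(3, 1.0, 1.0)   # 9 samples, one per cell
```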

Quote:

If you are wanting a solution that will work for any arbitrary number of samples, I believe the problem is too general - your best bet will almost certainly be sticking to random samples.

I think you might be right here. The number of samples is semi-variable... as in, it can be changed on different invocations, but once the algorithm starts it can't vary the number itself. Specifically, if it's told to use 100 samples then it should use 100 - not 103 (etc..)

Quote:

He asked for evenly distributed samples. Given that he also mentioned random samples, I do not think he was restricting himself to a deterministic method.

My original post was possibly a bit misleading... I did want a deterministic method - my current approach is (theoretically) an evenly distributed random method.

To be honest, I'm not entirely sure what I wanted to start with [smile]

A problem I'm having is, when picking other parameters, each invocation of the algorithm generates a new random distribution - thus the starting point is not the same. Whilst the results are roughly comparable, it'd be nice if they all started in the same state, thus comparing apples-to-apples. This thought led me to devising an algorithm to distribute the samples correctly/evenly each time, and ended up leading to this thread...
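One way to get that apples-to-apples comparison while keeping random sampling is simply to fix the PRNG seed per experiment (a sketch, assuming a seedable generator such as Python's `random.Random`, which is itself a Mersenne Twister; `initial_samples` is a made-up helper):

```python
import random

def initial_samples(seed, bounds, n):
    """Same seed -> identical starting samples on every invocation."""
    rng = random.Random(seed)
    return [[b * (1.0 - 2.0 * rng.random()) for b in bounds] for _ in range(n)]

a = initial_samples(42, [1.0, 3.0, 10.0], 100)
b = initial_samples(42, [1.0, 3.0, 10.0], 100)
assert a == b   # reproducible across runs
```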

Thanks for all your help so far - some interesting ideas. I think I'll just stick with my random sampling at the moment. Maybe I could improve on it, but it's not strictly "wrong" so I'll leave it for now...

Cheers,
Jack


Quote:

Original post by haegarr:
I remember the days of programming a ray-tracer ... There was a method used to avoid lumping during (more or less) randomly sampling the scene. The basic method was to use a regular grid and to add some jitter to the sample positions. There was also an extension with a distance measure to guarantee a minimum distance. But AFAIK that method was already costly for 2 dimensions...

I've used that trick too, but in many dimensions I believe it will be quite bad. However, I really like your original idea where you want to find a well distributed set of N points in the space. Perhaps finding the points that are maximally separated. Sounds tricky, but interesting.

In that case, if you are interested, I would suggest checking out the Metropolis-Hastings algorithm.

[Edit] Metropolis-Hastings is a Markov chain Monte Carlo method, but it is also closely related to simulated annealing. You can think of your f(x) as a likelihood function: the greater the value, the more likely you have found the point you are looking for.

[Edit again] Err, for minima it should be "the lesser the value, the more likely..."
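A minimal Metropolis-Hastings sketch in Python for a 1D minimization target, using the "lesser the value, the more likely" convention from the edit above. The objective, proposal width, step count, and temperature are all made-up illustrative choices, not anything from the thread:

```python
import math
import random

def metropolis_hastings(f, x0, steps=10000, width=0.5, temperature=1.0):
    """Random-walk Metropolis sampler biased toward low values of f;
    tracks the best (lowest-f) point visited."""
    x = x0
    best = x
    for _ in range(steps):
        candidate = x + random.gauss(0.0, width)
        # Accept with probability min(1, exp(-(f(candidate) - f(x)) / T)).
        if random.random() < math.exp(min(0.0, -(f(candidate) - f(x)) / temperature)):
            x = candidate
            if f(x) < f(best):
                best = x
    return best

# Hypothetical objective with its minimum at x = 2.
best = metropolis_hastings(lambda x: (x - 2.0) ** 2, x0=10.0)
```

Downhill moves are always accepted; uphill moves are accepted with a probability that shrinks as they get worse, which is what lets the chain escape local minima.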


A hashing algorithm based on scene data could also be used to create the random generator seed, if one decides to rely on sample point sets generated per-frame. That way the same frame always generates the same sample point set.

There are a bunch of GPL MD5 C++ implementations out there.

To make the 128-bit MD5 hash key fit through a 32-bit random number generator, it can be chopped up into four 32-bit double words.

Feed the first dword in as the seed and generate a random number A; repeat with each of the three remaining dwords, storing the result in its respective variable B, C, or D.

Calculate the arithmetic mean for A, B, C and D to obtain the final seed value for the frame.
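That hash-to-seed pipeline can be sketched with Python's standard `hashlib` instead of a separate MD5 library (the averaging step follows the post as written; whether averaging is the best way to fold the four dwords is debatable, since XOR-folding would also work, and `seed_from_scene` is a made-up name):

```python
import hashlib
import random
import struct

def seed_from_scene(scene_bytes):
    """Derive a 32-bit seed: MD5 the scene data, split the 128-bit digest
    into four 32-bit dwords, run each through the PRNG once, and average."""
    digest = hashlib.md5(scene_bytes).digest()
    dwords = struct.unpack("<4I", digest)
    outputs = []
    for dw in dwords:
        rng = random.Random(dw)
        outputs.append(rng.getrandbits(32))
    return sum(outputs) // 4          # arithmetic mean as the final seed

seed = seed_from_scene(b"frame 42 scene data")
```

Because the whole chain is deterministic, the same frame data always yields the same seed, and therefore the same sample point set.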


Quote:

Original post by jjd:
[Edit] Metropolis-Hastings is a Markov chain Monte Carlo method, but it is also closely related to simulated annealing. You can think of your f(x) as a likelihood function: the greater the value, the more likely you have found the point you are looking for.

[Edit again] Err, for minima it should be "the lesser the value the more likely..."

Good point, bringing up simulated annealing. I agree that the possibilities are there.

Relaxation techniques could also be used to numerically generate an approximately even distribution. :)
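A crude version of that relaxation idea: start from random points and iteratively nudge each one away from its nearest neighbor (a pure-Python sketch in 2D on the unit square; the step size, iteration count, and `relax` helper name are all arbitrary illustrative choices):

```python
import math
import random

def relax(points, steps=50, step_size=0.05):
    """Push each point away from its nearest neighbor, clamped to [0, 1]^2."""
    pts = [list(p) for p in points]
    for _ in range(steps):
        for i, p in enumerate(pts):
            # Find this point's nearest other point.
            j = min((k for k in range(len(pts)) if k != i),
                    key=lambda k: math.dist(p, pts[k]))
            d = math.dist(p, pts[j])
            if d > 0:
                p[0] = min(1.0, max(0.0, p[0] + step_size * (p[0] - pts[j][0]) / d))
                p[1] = min(1.0, max(0.0, p[1] + step_size * (p[1] - pts[j][1]) / d))
    return pts

points = relax([[random.random(), random.random()] for _ in range(9)])
```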