I think it might be worth trying polar coordinates, so that each new point is a random distance and direction from the last. Then you can tweak your distance distributions to suit your purpose. If you need to restrict the total area, you could wrap the 2D space or just ignore any points that fall outside.

Math::Trig has handy polar conversion functions, so it would be easy to give it a try.
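For instance, a minimal sketch of such a polar walk -- the step-size cap and the wrap-around handling are my own assumptions, chosen just to make the idea concrete:

```perl
use strict;
use warnings;
use Math::Trig qw(pi);
use POSIX qw(fmod);

# Random walk in polar steps: each new point is a random distance
# and direction from the last. $max_step is an assumed tuning
# parameter; fmod wraps the 2D space so points stay in bounds.
sub polar_walk {
    my ($n, $width, $height, $max_step) = @_;
    my ($x, $y) = ($width / 2, $height / 2);    # start in the middle
    my @points;
    for (1 .. $n) {
        my $r     = rand($max_step);    # tweak this distribution to taste
        my $theta = rand(2 * pi);
        $x = fmod($x + $r * cos($theta), $width);
        $x += $width if $x < 0;         # fmod keeps the sign of its argument
        $y = fmod($y + $r * sin($theta), $height);
        $y += $height if $y < 0;
        push @points, [$x, $y];
    }
    return @points;
}
```

Making $max_step small relative to the plane keeps successive points near each other, which is what produces the clumping.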

I used to use a "density map" generator back in the day. Basically, each time you want a random coordinate, you generate the coordinate plus one extra random number. You then check that number against a density function, and if the extra random number is less than the density at that location, you return the coordinates. Otherwise you try again.
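A minimal sketch of that rejection-sampling loop -- the particular density function here (denser toward the left edge) is my own example, not one from the original post:

```perl
use strict;
use warnings;

# Rejection sampling against a density map: keep a candidate (x, y)
# only if an extra uniform draw falls under the density at that
# location. The density function must return values in [0, 1].
sub random_point {
    my ($width, $height, $density) = @_;
    while (1) {
        my ($x, $y) = (rand($width), rand($height));
        return ($x, $y) if rand() < $density->($x, $y);
    }
}

# Example density: probability falls off linearly from left to right.
my $density = sub { my ($x, $y) = @_; return 1 - $x / 500 };
my ($x, $y) = random_point(500, 500, $density);
```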

So by defining an appropriate density map function, you can create many types of distributions. The disadvantage is that your density map function may reject too many candidates, slowing things down. (The density_broken function, for example, is quite useless in this regard.)

The reason that I thought you might like it is this: For testing a project, I would create distinct distributions by simply drawing an image with MSPAINT or similar, loading that into a lookup table, and using a function like density_lookup.

I don't know your objections to your own suggestion, so I don't know if this one is interesting or not, though.

Basically, it is too directed. That is, it requires parameters to be chosen -- the ratio between clustered picks and non-clustered; the size of clustering subrange; etc. -- which means I would essentially be choosing what to test and thus excluding anything I haven't thought of.

so I don't know if this one is interesting or not

This is very interesting. I particularly like the idea of using images -- whether hand-drawn, or grabbed at random from an image search -- to bias the picking process. It has so many possibilities ...

Eg. grab a random image; process the image with a filter to reduce it to just the points of a particular color or hue; or maybe use a Conway's Life type process to manipulate the pixels until groups of similar hues reduce to single points; or a dozen other ideas; and then use those points as my dataset.

The only problem with the idea is that it has triggered so many possibilities for investigation, I might never get back to testing the algorithm :) Thanks for kicking down the doors on the box of my thought train!

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

Anyway, one problem with this approach is that the ratio of discarded points could be too high.

In that case, a more efficient way may be to divide the plane into regions (e.g. triangles), calculate the probability of every region, and then generate the random points by first picking a region and then a point inside that region with your proposed algorithm, using the conditioned density function.
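A sketch of the region-picking half of that scheme -- I've used axis-aligned rectangles rather than triangles, and uniform sampling within each region in place of the conditioned density, purely to keep the example short:

```perl
use strict;
use warnings;

# Pick a region with probability proportional to its weight, then a
# uniform point inside it. Each region is an axis-aligned rectangle
# [x0, y0, x1, y1]; rectangles and uniform infill are assumptions
# made for brevity.
sub random_point_by_region {
    my @regions = @_;    # each: { rect => [x0,y0,x1,y1], weight => w }
    my $total = 0;
    $total += $_->{weight} for @regions;
    my $pick = rand($total);
    for my $r (@regions) {
        $pick -= $r->{weight};
        if ($pick < 0) {
            my ($x0, $y0, $x1, $y1) = @{ $r->{rect} };
            return ($x0 + rand($x1 - $x0), $y0 + rand($y1 - $y0));
        }
    }
    # Unreachable in practice; guard against floating-point edge cases.
    my ($x0, $y0, $x1, $y1) = @{ $regions[-1]{rect} };
    return ($x0 + rand($x1 - $x0), $y0 + rand($y1 - $y0));
}
```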

Yes, the discard rate can be a problem. For the table lookup version in 2D, I had a speedup that worked decently: Generate the first coordinate, then use the same technique to generate the remaining coordinate. It can still be a problem, though, in the event you choose a "mostly black" line, but I never needed anything better than that.

I also tried to make a "transformational" technique that wouldn't reject any coordinates, but never got it working well enough to use. (Rather: the technique worked fine, but coming up with the warping functions for the project was more difficult (for me, at any rate), so I simply tended to run the density-mapping version before going to lunch, going home for the day, etc.)

The intention was: Generate a random coordinate, and "remap" it based on a displacement function. I was hoping to be able to turn a density function into a space warping function. The difficulties I had were primarily coming up with functions to warp space appropriately, and ensuring that I could hit any point in the desired output space without too much overhang. (If the function moved the point outside the desired range, you had to reject the point and try again anyway.)
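A toy 1D-per-axis version of that remapping idea -- the warp function here (squaring, which piles points up near the origin) is just an illustrative assumption; finding a warp that realises a *given* density is exactly the hard part described above:

```perl
use strict;
use warnings;

# Warp a uniform draw instead of rejecting it: u**2 maps uniform
# [0,1) onto [0,1) with density concentrated toward 0, so points
# clump near the top-left corner. No candidates are ever rejected.
sub warped_point {
    my ($width, $height) = @_;
    my $u = rand();
    my $v = rand();
    return ($u * $u * $width, $v * $v * $height);
}
```

Because u**2 maps [0,1) onto [0,1) exactly, this particular warp never produces the out-of-range "overhang" mentioned above; a hand-crafted warp generally would.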

Hmm, I had a similar problem a couple of years ago, trying to test worst-case scenarios, although in a totally different context. I don't claim at all to be an expert on this kind of thing, but my feeling is that if you want to test worst-case scenarios, then you have to bite the bullet and accept that your input data is not going to be random -- and there is nothing wrong with that, since you are specifically looking for non-random situations. Then, of course, the problem is that you don't necessarily know for sure what the true worst-case scenario is; sometimes it is actually very difficult to find out. For example, Shell sort and comb sort are known to be fairly efficient sorting algorithms, but they are very rarely used because nobody seems to really know what their worst-case scenario could be, and it is therefore difficult to assess their worst-case complexity.

Interesting question. You might want to be more explicit about exactly what characteristics you want in this distribution, i.e. what phenomenon generates your data. For example, I'm not a physicist, but for beams of particles with varying intensities hitting a detector plate subject to random noise:

Cluster means distributed uniformly over a rectangular area.

Number of points per cluster following an exponential distribution (or whatever the beam intensities follow).

Points in each cluster distributed as a 2D Gaussian. (With variance fixed or chosen according to some distribution?)

Number of clusters either fixed or chosen according to some distribution.

You might want to be more explicit about exactly what characteristics you want in this distribution,

That's hard. Mostly because I definitely do not want any formally defined distribution. Almost exactly the opposite in fact.

The problem with "random" is that, on average, the points will be evenly distributed over the range (a 2D plane in this case). That's almost the exact definition of a PRNG.

Whilst, with enough iterations, all possible distributions will eventually turn up -- including those where a majority of the points in the sample tend to be grouped or clustered on one side or in one corner of the plane -- with even a relatively small plane (500,500) and sample size 100, there are so many 'roughly even' distributions and so few 'lopsided' ones that I'd need to run billions of sets to ensure I'd tested a few of the lopsided ones. That's not practical.

So, I'm looking for a way to induce lopsided -- which pretty much means not 'roughly evenly distributed' -- distributions, without prescribing where or why the concentrated and sparse bits will be.

I can't think of any better description than: I want lopsided distributions that are completely randomly generated.

Not good, I know, but it's the best I've got.


Do you have sample data you can perturb, mix, or otherwise use to generate test data?

I want lopsided distributions that are completely randomly generated.

Hm... Maybe transform your PRNG through a random, monotonic, nonlinear mapping? e.g. generate a piecewise-linear function (or spline) in each dimension with steps taken from the PRNG, then generate uniform random points and apply the function to them. I suspect a Real Statistician would scoff, but I am not such a person.
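A sketch of that piecewise-linear approach -- the number of segments and the use of uniform step sizes are my assumptions:

```perl
use strict;
use warnings;

# Build a random monotonic piecewise-linear map of [0,1] onto [0,1]:
# random positive step sizes, normalised so the last knot is 1.
# Feeding uniform draws through it stretches some subranges and
# compresses others -- i.e. randomly placed clumping.
sub make_random_warp {
    my ($segments) = @_;
    my @knots = (0);
    push @knots, $knots[-1] + rand() for 1 .. $segments;
    my $max = $knots[-1];
    @knots = map { $_ / $max } @knots;    # normalise to [0,1]
    return sub {
        my ($u) = @_;
        my $i = int($u * $segments);      # which segment u falls in
        $i = $segments - 1 if $i >= $segments;
        my $frac = $u * $segments - $i;
        return $knots[$i] + $frac * ($knots[$i + 1] - $knots[$i]);
    };
}

# One independent warp per dimension, applied to uniform draws.
my $warp_x = make_random_warp(10);
my $warp_y = make_random_warp(10);
my @point  = ($warp_x->(rand()) * 500, $warp_y->(rand()) * 500);
```

Because each dimension is warped independently, the clumps come out as a grid of denser bands rather than free-form blobs -- a limitation worth knowing about.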

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

It can "plug in" (and actually also provides) different types of distribution function which may better serve your needs.

CountZero


I wanted an "aesthetic" distribution of points over a plane, so I wrote Random::PoissonDisc. This module tries to distribute points "equally" across a plane with a minimum distance between each other.

Around the time I released the module, I wrote a blog post on Random::PoissonDisc which has graphics showing the difference between the "white noise" random and the "blue noise" random that Random::PoissonDisc produces.

If the distribution properties of Random::PoissonDisc match your requirements, you will likely want to reimplement the module using PDL if you want better performance. The paper which outlines the algorithm is also linked from the module documentation.

Actually, that seems to be almost the antithesis of what I'm after. It seeks to restrain the randomness to a regular(ish) grid pattern. My problem is that randomly picking points in the plane is too uniform.

I want clumps, but I want them at a larger scale than they form naturally from purely random distribution.

Try to find pictures of copulas; google "copula plots". Those should give you ideas of how you want your data to look. Once you decide on one, there usually is an algorithm to generate the data corresponding to that copula.

Sounds like you want to generate a number of distributions, from "one big clump" to "almost uniform", at random.

What about generating several "poles" as clump attractors, each with a random weight for attractiveness. Then let each point "roll" down hill according to the attractor weights and distances. So generate 3D coordinates, and use the extra value as the weight. Something like this:
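(The code from the original post is not preserved here; the following is a sketch of the idea as described -- the pull formula inside roll_down_hill and all the constants are my assumptions.)

```perl
use strict;
use warnings;

# Random weighted "poles" act as clump attractors: each uniform
# point is pulled toward every pole in proportion to weight/distance.
sub make_poles {
    my ($n, $width, $height) = @_;
    return map { [rand($width), rand($height), rand()] } 1 .. $n;
}

sub roll_down_hill {
    my ($x, $y, @poles) = @_;
    for my $p (@poles) {
        my ($px, $py, $w) = @$p;
        my $dx = $px - $x;
        my $dy = $py - $y;
        my $d  = sqrt($dx * $dx + $dy * $dy) || 1;
        my $pull = $w / $d;         # try sub-linear or exponential variants here
        $pull = 1 if $pull > 1;     # never overshoot the pole
        $x += $dx * $pull;
        $y += $dy * $pull;
    }
    return ($x, $y);
}

my @poles  = make_poles(3, 500, 500);
my @points = map { [ roll_down_hill(rand(500), rand(500), @poles) ] } 1 .. 100;
```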

You can also play with variations of roll_down_hill to try different distributions, going from sub-linear to exponential (or maybe some of each), trying a host of different functions for transformations. And don't forget you can play with the distributions wherever you take a random number, including the number of attractors and their weights. For instance, you might find that fewer attractors are more likely to cause worst case behavior, and bias the number of attractors in the small direction.

Here's another idea that occurred to me, so I thought I'd toss it out--build a function that generates a coordinate and then randomly maps it into a "clump" for you. In order to prevent any of your area from being ignored, there's also a chance that it won't be mapped to a clump:
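(Again, the original post's code is missing here; this is a sketch of the described mapping, with the clump probability, count, and sizes all assumed.)

```perl
use strict;
use warnings;

# Predefine a few rectangular clumps; each generated point is, with
# probability $p_clump, rescaled into a randomly chosen clump, and
# otherwise left where it fell -- so no part of the plane is ever
# completely ignored.
my $p_clump = 0.8;
my @clumps  = map { [rand(450), rand(450), 20 + rand(50)] } 1 .. 4;  # x, y, size

sub clumped_point {
    my ($width, $height) = @_;
    my ($x, $y) = (rand($width), rand($height));
    if (rand() < $p_clump) {
        my ($cx, $cy, $size) = @{ $clumps[ int rand @clumps ] };
        $x = $cx + $size * $x / $width;     # squeeze the point into the clump
        $y = $cy + $size * $y / $height;
    }
    return ($x, $y);
}
```

As the post notes, nothing here stops a clump from poking past the edge of the plane; clamping the clump origins would fix that.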

The good point is that it's simple and fast. It's a proof-of-concept, though, and I didn't do any work to ensure that the clumps don't extend beyond the desired range. If you like the idea, then that task is left as an exercise for the reader ;^D.

Update: I forgot to mention: The clumps in this version are rectangular. Changing their shape is possible, but I didn't bother doing that, either.

A little late to the party, but maybe interesting. The following code generates a random set of "attractors" which tend to suck nearby randomly generated points closer to the attractor. Attractors have a radius which limits their effect. Nearby attractors fight with each other, which results in oddly shaped clumping -- most likely a desirable outcome.
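(The code itself is not preserved here; the sketch below reconstructs the described behaviour -- the linear falloff and the 0.7 pull strength are my assumptions.)

```perl
use strict;
use warnings;

# Attractors with a limited radius pull nearby points toward
# themselves; overlapping attractors tug a point in different
# directions, giving oddly shaped clumps.
sub attract {
    my ($x, $y, @attractors) = @_;
    for my $a (@attractors) {
        my ($ax, $ay, $radius) = @$a;
        my $dx = $ax - $x;
        my $dy = $ay - $y;
        my $d  = sqrt($dx * $dx + $dy * $dy);
        next if $d >= $radius || $d == 0;    # outside this attractor's reach
        my $pull = 0.7 * (1 - $d / $radius); # stronger when closer
        $x += $dx * $pull;
        $y += $dy * $pull;
    }
    return ($x, $y);
}

my @attractors = map { [rand(500), rand(500), 50 + rand(100)] } 1 .. 5;
my @points = map { [ attract(rand(500), rand(500), @attractors) ] } 1 .. 100;
```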

I played with this a little and it definitely produces clumps. The problem with it (for my purposes) is that it always leaves a background of relatively uniformly distributed points, and I couldn't see any way of preventing that.

For my purpose, I really want clear space around and between the clumps, which the weight map method both achieves and gives fine control over.

However, your "attractors" idea sparked another idea in me -- completely unrelated to the OP's purpose -- but intriguing enough to sidetrack me for a day or so.

What happens if you treat a set of clumpy random points as equally sized particles of some finite mass, and have them all exert "gravity" upon each other according to the inverse square of their distance?
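One round of that mutual attraction might look like this -- the scaling constant $G and the tiny divisor guarding against coincident points are my assumptions:

```perl
use strict;
use warnings;

# One round of mutual "gravity": every point is nudged toward every
# other point with force proportional to 1/d**2 (all masses equal,
# folded into $G). There is no collision handling, so very close
# pairs can produce the sling-shots described below.
sub gravity_step {
    my ($G, @points) = @_;
    my @next;
    for my $i (0 .. $#points) {
        my ($x, $y) = @{ $points[$i] };
        my ($fx, $fy) = (0, 0);
        for my $j (0 .. $#points) {
            next if $i == $j;
            my $dx = $points[$j][0] - $x;
            my $dy = $points[$j][1] - $y;
            my $d2 = $dx * $dx + $dy * $dy || 1e-6;
            my $f  = $G / $d2;                    # inverse square law
            my $d  = sqrt($d2);
            $fx += $f * $dx / $d;                 # unit vector toward j
            $fy += $f * $dy / $d;
        }
        push @next, [$x + $fx, $y + $fy];
    }
    return @next;
}

my @points = map { [rand(500), rand(500)] } 1 .. 50;
@points = gravity_step(1000, @points) for 1 .. 2;    # two rounds of attraction
```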

Tada. The red dots are the starting position, the cyan their position after a single round of attraction, and the blue, after the second.

Ignore the ones that 'sling-shot' off into outer space; the effect of the inverse square law is that in order to get detectable influence over any distance, you have to set the mass of the points quite high, with the consequence that once they get very close to each other, they exert huge forces that result in sling-shots, because there are no collisions.

Over many generations, the gravity will cause all the points to come together (and then be scattered in all directions), but the first few generations have the effect of concentrating whatever clusters already exist. I wonder if this wouldn't be a fruitful technique for tackling the NP-hard clustering problem without resorting to the assumptions made by K-means type algorithms?

