Picking a random Key and Value from a Map?

Here's a seemingly trivial coding task that I just can't seem to find an elegant solution for. Perhaps it's a Friday afternoon 'dumbness' thing but I'm just not seeing it.

I have a Map that represents a set of names for a group:
With content that looks something like this:
What I'd like is an elegant way to extract a random Key Value pair from that Map. For example:
There is no requirement for any strong randomness or any even distribution of the chosen values. A rough approximation is all that is required as I'm just generating some test data for a performance benchmark I'm writing for a new feature I've written at work.

Just the String value of some Map key and the String value of one of the Map value Set.

Stephan van Hulst wrote:

If an entry, do changes to it need to be reflected in the map?

No. The map mustn't change.

Stephan van Hulst wrote:

Are entries ever removed from the map?

No

Stephan van Hulst wrote:

Can we depend on a specific map implementation?

Yes, any you choose.

Tim Driven Development

Stephan van Hulst

Saloon Keeper

Posts: 8601

161

posted 11 months ago

My preferred way would be to implement a custom Map that delegates all calls to a backing Map, but also maintains a List of keys in parallel. Extend it with a method getRandomEntry() that picks a random key from the list, uses it to get the associated value, and returns a new AbstractMap.SimpleEntry.

The mind is a strange and wonderful thing. I'm not sure that it will ever be able to figure itself out, everything else, maybe. From the atom to the universe, everything, except itself.

Stephan van Hulst

Saloon Keeper

Posts: 8601

161

posted 11 months ago

This may seem a little overkill, but you can also use it to add other methods you might be missing. It has the same properties as a LinkedHashMap because it's based on one, including the time and space complexities of all methods.

The mind is a strange and wonderful thing. I'm not sure that it will ever be able to figure itself out, everything else, maybe. From the atom to the universe, everything, except itself.

Stephan van Hulst

Saloon Keeper

Posts: 8601

161

posted 11 months ago

Stephan van Hulst wrote:It has the same properties as a LinkedHashMap because it's based on one, including the time and space complexities of all methods.

Except the put(), remove() and related methods (e.g. putIfAbsent()); those have the time complexity of ArrayList.contains() and ArrayList.remove() respectively.

The mind is a strange and wonderful thing. I'm not sure that it will ever be able to figure itself out, everything else, maybe. From the atom to the universe, everything, except itself.

Jeanne, although it probably doesn't make a difference in this particular exercise, I'm uncomfortable with the statistical distribution of your random choices by calling nextInt() twice. For example, I've modified the loading of groups in your code.
Using the randomKeyIndex you have a roughly 1 in 2 chance of selecting either "Gp1" or "Grp2". So let's say it's Grp1, then with randomValueIndex each String in the list has a 1 in 10 chance of being selected, which gives us a 1 in 20 chance that selection "a" would be chosen. Conversely, If Grp2 is selected there's a 1 in 1 chance that "K" will be chosen, or 1 in 2 overall chance. This means that the random distribution over all items in all the lists would not be equal.

The most difficult part of wild life photography is getting them to sign the model release.

Stephan van Hulst

Saloon Keeper

Posts: 8601

161

posted 11 months ago

In most solutions the core idea is to use a separate list of keys. I guess the preferred solution depends on whether speed or memory usage is more important:

1) Keep track of a list of keys for the entire lifetime of the map: O(1) random entry lookup time, but doubles the memory used to hold keys for the duration of the map's life.
2) Generate a list of keys when a random entry is needed: O(n) random entry lookup time, only doubles memory used to hold keys for the duration of the lookup.

The mind is a strange and wonderful thing. I'm not sure that it will ever be able to figure itself out, everything else, maybe. From the atom to the universe, everything, except itself.

Piet Souris

Rancher

Posts: 2455

80

posted 11 months ago

Stephan van Hulst wrote:In most solutions the core idea is to use a separate list of keys. I guess the preferred solution depends on whether speed or memory usage is more important (...)

For me, elegance of the solution is much more important. If creating many extra objects is a problem, only then is it time to look at something less elegant or something more efficient. (admitted: elegance is pretty subjective, I know).

Mind you, as Carey already pointed out, if you pick out keys at random, let the chance of picking a key be dependant on the length of the value in some way.

Carey Brown wrote:Using the randomKeyIndex you have a roughly 1 in 2 chance of selecting either "Gp1" or "Grp2". So let's say it's Grp1, then with randomValueIndex each String in the list has a 1 in 10 chance of being selected, which gives us a 1 in 20 chance that selection "a" would be chosen. Conversely, If Grp2 is selected there's a 1 in 1 chance that "K" will be chosen, or 1 in 2 overall chance. This means that the random distribution over all items in all the lists would not be equal.

This was my rational for using my "explode()" approach; it solves this issue. However, I really like Piet's Java-8 approach and will modify it here to work lilke explode().

The most difficult part of wild life photography is getting them to sign the model release.

Hello everyone, sorry for the delay in reply here I posted this Friday just as I was packing up work for the week and it's been really good weather over the weekend so have spent most of it outside in the sun away from the computer. I'm back at work now so am ready to put some of these suggestions into action.

Henry wrote:Can we also give out cows too? Some of the solutions are cool

Absolutely! Distribute your cows as you wish. The solutions are cool.

Jeanne wrote:I assumed the sample data was represented and each key had the same number of values.

Ah sorry Jeanne, my sample data does suggest that but it's not representative in that way. Each key can have one to many values.

Lots of effort has gone into helping me over the last few days and I appreciate it very much. Cow time!!!

I've had some more time to mull this one over and have come up with a solution that works well enough for me.

The solutions presented here are really cool, especially the Java 8 Stream ones inspired by Piet. I have settled on a slightly different approach though. The dataset shown is a set of 'valid' data fields that can make up part of a customer Order object and for a test I'm writing I want to generate lots and lots of customer Orders that contain valid data. It doesn't matter what data it is, just that it's valid and not all the same. For this I want to select a random Key and Value from the valid data.

Given that the "getRandom" method is the heavily used on I wanted to keep that really simple so have gone with a two stage solution.

1) Flatten Map<String, Set<String>> to List<List<String, String>>, where each List item is a List containing exactly 2 items (Essentially like Piet's Pair without the dedicated Pair class, coz I'm lazy).
This turns data structured like this:Into data structured like this:
2) From the flat List structure I can select a Random index very cheaply many times over.

That's one million Orders generated with random, but valid, data.

Tim Driven Development

Piet Souris

Rancher

Posts: 2455

80

posted 11 months ago

And a cow for you for this also lovely solution.

But I must confess to feeling a bit embarrassed at the moment. I like all these flatmap java 8 streams, find them very elegant, but yesterday something sprang to mind.
I was following the Coursera course of fuctional programming with Scala, by Martin Odersky, slightly over three years ago (fantastic course, by the way! Wholeheartedly recommended),
and in lesson six the problem was to get Pair(i, j)'s with 1 <= j <= i < n, for which i + j was prime. This is part of the slide:

This works, but makes most people's head hurt. Is there a simpler way?

And next we were introduced to the powerfull scala for comprehensions. Now, I don't know about my fellow students, but I cheered loudly. These for's were so much easier than these headache causing incomprehensible flatmap filter business...
So yesterday I dusted off my eclipse with scala, started it and came up with this:

And now I don't know anymore what is elegant or not... My WorldView has collapsed.

This clearly went to sleep a while ago, and has many very imaginative answers. I also don't know what you deem "elegant", but I didn't notice the "obvious" (I could have missed it, given so many) among them (which I hope has the merit of being readily understood by junior programmers, something I think we tend to undervalue in the corporate world!). Herewith, and hoping I didn't misunderstand the requirements: