I encountered this problem while doing a little code golfing so for interests sake it would also be nice to see both a simple solution and/or an efficient one, not so much interested in my existing code.

To understand this, first consider the case l == 1. Then it's basically just returning a random.sample() of the input data in sorted order; in this case the offset variable is always 0.

The case where l > 1 is an extension of the previous case. We use random.sample() to pick up positions, but maintain an offset to shift successive results: in this way, we make sure that they are non-overlapping ranges --- i.e. they start at a distance of at least l of each other, rather than 1.

Very nice. The only thing that can be done faster (but not as elegant) is the sorting which, for this case, can work in O(n) time and O(n) space. But in Python, without real arrays, this point is rather meaningless.
–
nickieSep 5 '13 at 19:39

This works perfectly and random.sample handles the error case I was handling in my code. I guess this might seem simple but I'm having trouble understanding what len(seq) - (l - 1) * n and how you determined it.
–
RussWSep 5 '13 at 21:01

@RussW: how it works it's already stated in the answer... let me try reword it. The case of L=1 is of course trivial, just do a random sample and you're done. Now think to all the other elements as "extras" and just ignore them for a while and concentrate only on the "heads" of the subsequences. Those "extra" elements are n*(L-1) so you take them away, random-chose the "heads" and then compute the result by adding the proper "tails". Don't feel ashamed for not understanding it immediately, it happens very often when talking with Armin... he does his best but he's just too smart :-D
–
6502Sep 6 '13 at 6:43

@6502 Thanks. So (l - 1) * n represents the sum of the tails, random.sample randomly selects the indices for the heads from a range of len(seq) - sum of tails. I'm just trying to visualize it so I can easily arrive at it in future. It is actually quite simple after all.
–
RussWSep 6 '13 at 9:00

Many solutions can be hacked for this problem, but one has to be careful if the sequences are to be strictly random. For example, it's wrong to begin by picking a random number between 0 and len(seq)-n*l and say that the first sequence will start there, then work recursively.

The problem is equivalent to selecting randomly n+1 integer numbers such that their sum is equal to len(seq)-l*n. (These numbers will be the "gaps" between your sequences.) To solve it, you can see this question.

This doesn't give equal probabilities. Try for example: a = [1, 2, 3]; count = {(1,2):0, (1,3):0, (2,3):0}; for i in range(2000): [x], [y] = rand_parts(a, 2, 1); count[x, y] += 1; print count. It picks the solution [[2], [3]] about half the time, and the other two solutions only a quarter of the time each.
–
Armin RigoSep 6 '13 at 8:25

This is precisely the solution that does not generate really random answers. The probability for picking the first one should not be uniform, it should be biased by the number of answers that exist after you pick a number for the first one.
–
nickieSep 5 '13 at 17:51

First of all, I think you need to clarify what you mean by the term random.

How can you generate a truly random list of sub-sequences when you are placing specific restrictions on the sub-sequences themselves?

As far as I know, the best "randomness" anyone can achieve in this context is generating all lists of sub-sequences that satisfy your criteria, and selecting from the pool however many you need in a random fashion.

Now based on my experience from an algorithms class that I've taken a few years ago, your problem seems to be a typical example which could be solved using a greedy algorithm making these big (but likely?) assumptions about what you were actually asking in the first place:

What you actually meant by random is not that a list of sub-sequence should be generated randomly (which is kind of contradictory as I said before), but that any of the solutions that could be produced is just as valid as the rest (e.g. any of the 6 solutions is valid from input [1,2,3,4,5,6] and you don't care which one)

Restating the above, you just want any one of the possible solutions that could be generated, and you want an algorithm that can output one of these valid answers.

Assuming the above here is a greedy algorithm which generates one of the possible lists of sub-sequences in linear time (excluding sorting, which is O(n*log(n))):

One of your requirements is that there cannot be any overlaps, and this ultimately implies you need to deal with unique numbers and unique numbers only. So I use the set() operation to get rid all the duplicates. Then I sort it.

Rest is pretty straight forward imo. I just iterate over the sorted list and form sub-sequences greedily.

If the algorithm can't form enough number of sub-sequences then print "Impossible!"

Hope this was what you were looking for.

EDIT: For some reason I wrongly assumed that there couldn't be repeating values in a sub-sequence, this one allows it.

I do get what you mean. I paused and thought about that when trying to define the problem but it's not really contradictory to 'randomness'. The problem of returning a random integer i.e. random.randint(a, b), is constrained by result being an integer and the bounds a and b. Similarly, there are p different possible ways to retrieve n sub sequences each of length l that are contiguous and non-overlapping. So the problem is to return any random item from the set of p possibilities.
–
RussWSep 5 '13 at 19:58