Background

So, I wrote some straightforward Python code to download my favourite podcasts.

Next Problem

I don’t want to listen to multiple programs from the same feed in a row, because I like variety.

I don’t want to listen to them in a randomly shuffled order, because many of the feeds have an internal sequence to their podcasts, and because random shuffling will lead to occasional runs of the same feed.

I don’t want to listen to them in the order that they were downloaded, because some podcast feeds are bursty in nature, leading to multiple of the same type in a row. This is especially true after I add a new feed; I get a lot of back-issues of the feed all at once.

Instead, I want my podcasts carefully sorted so as to maximise the space between podcasts of the same feed.

First Attempt

My first solution was a simplistic one. Twice a week, a script downloads the latest podcasts. It downloads them in a round-robin fashion from all of the feeds that have new articles.

If all the feeds have a similar number of articles, this works well. However, if one feed has many more shows than the others, it will lead to several shows in a sequence at the end of the list.

It wasn’t good enough, and it was bugging me. I often found myself hand-sorting the files before burning to CD.

Second Attempt

I posed the problem to the blog readers, lightly disguised as a Wine-Gum Selection Puzzle, mainly so I could abstract away some of the complexity.

The result was some very interesting suggestions that led to me doing even more reading and thinking. (Thank you to all the people who contributed! I am very appreciative.) I tried to apply each of the pieces of advice, but I still couldn’t get my head around any neat solution.

Third Attempt

I remained suspicious that the problem is NP-complete. So, how do you address NP-complete problems?

One way is to give up! I did that for a while, but it continued to bug me whenever I listened to two similar podcasts in a row, or found myself doing re-sorting by hand.

Another way is brute force, for small values of n. This is just a search problem, right? Remembering the classic A* search algorithm, I found an open-source implementation and tried to fit this problem into it.

It didn’t go!

A* is a branch-and-bound search. However, it was all branch, and no bound! Unlike a travelling salesman example, you never visit the same node going via another node. It is a tree, not a graph. A* is wasting its time.

Fourth Attempt

I dumped the A* implementation, and wrote another basic best-first search implementation from scratch.

The implementation consisted of a collection of partial solution objects.

Partial Solutions

Each partial solution object included a list of items (wine-gum colours or feed names) so far. The object could also derive a metric describing the “interestingness” of the ordering so far, and an estimate of what the final metric would be once all the rest of the items were placed.
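A partial solution along these lines can be sketched in a few lines of Python. The class and method names are illustrative rather than the original code, and the metric shown is the product-of-gaps measure the comments below settle on:

```python
class PartialSolution:
    """One node in the search: the ordering decided so far."""

    def __init__(self, items):
        self.items = list(items)  # e.g. ["O", "R", "Y"] (colours/feeds)

    def interestingness(self):
        # Product of the gaps between consecutive items of the same
        # feed -- bigger gaps mean more variety, so bigger is better.
        score = 1
        last_seen = {}
        for pos, feed in enumerate(self.items):
            if feed in last_seen:
                score *= pos - last_seen[feed]
            last_seen[feed] = pos
        return score
```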

Interestingness and Estimation

The estimate of the final metric was a tricky procedure. I want the estimate to be as accurate as possible, but the estimate may never be an under-estimate if the search algorithm is to work. The estimate also needs to be calculated relatively quickly.

After a little thinking, my over-estimate algorithm was to notionally place each of the remaining pieces in the last possible position and compute the interestingness it added to the overall solution. The last possible position maximises the interestingness, so if I optimistically over-assume every remaining piece gets that position, I am never under-estimating the final cost.
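One plausible reading of that over-estimate, sketched below. The function name and the handling of a feed’s first item are my assumptions, not the original code:

```python
def optimistic_estimate(items, remaining, total_slots):
    """Over-estimate the final metric: pretend every remaining item
    gets the largest gap it could still achieve, i.e. the distance
    from its previous same-feed item to the final slot.  This can
    never under-estimate, which the search relies on."""
    last_pos = {}
    score = 1
    for pos, feed in enumerate(items):
        if feed in last_pos:
            score *= pos - last_pos[feed]
        last_pos[feed] = pos
    final = total_slots - 1
    for feed, count in remaining.items():
        if feed in last_pos:
            # each remaining item optimistically spans to the last slot
            score *= (final - last_pos[feed]) ** count
        elif count > 1:
            # an unplaced feed's first item opens no gap; the rest can
            # span at most the whole list
            score *= final ** (count - 1)
    return score
```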

Search Algorithm

The main algorithm plucked out the partial solution from the collection with best estimated final metric, added each of the possible next items to make a new set of partial solutions, and pushed them all back into the collection. This was repeated until one of the partial solutions actually uses up all of the items – and hence is a final solution. By the magic of best-first searching, this is guaranteed to be an optimal solution.
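The loop can be sketched with Python’s heapq module. All the names here are illustrative, and a heap stands in for whatever collection holds the partial solutions; heapq is a min-heap, so the estimate is negated to pop the most promising candidate first:

```python
import heapq

def best_first_search(start, successors, is_complete, estimate):
    """Pop the partial solution with the best estimated final metric,
    expand it, push the children back, and repeat.  Provided the
    estimate never under-estimates, the first complete solution
    popped is optimal."""
    tie = 0  # tie-breaker so the heap never compares solutions directly
    frontier = [(-estimate(start), tie, start)]
    while frontier:
        _, _, best = heapq.heappop(frontier)
        if is_complete(best):
            return best
        for child in successors(best):
            tie += 1
            heapq.heappush(frontier, (-estimate(child), tie, child))
    return None
```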

The Collection

The right choice of collection data-structure is some sort of B-tree, so I stored it in a plain list instead! I had a quick look at what Python support was available for B-trees and decided I couldn’t be bothered. I put it in a list, and just sorted the list over and over again.

Later, I decided that it was worth a teensy bit of optimisation. I moved from a list to a set (to allow easier deletions). Rather than sorting the whole list after each iteration, I just zipped through it looking for the maximum value, which was all I needed.

Result

The result wasn’t a huge surprise for an NP-complete problem. It was slow. I left it puzzling overnight on a 3-feed, 15-item problem, and it was still looking for the optimal solution the next morning.

No wonder iTunes doesn’t offer this feature!

Fourth Algorithm

I was expecting the result to be slow, and I already had a plan.

There is a third way of dealing with NP-complete problems that I hadn’t explored yet – accepting approximate answers.

Realistically, I don’t need a perfect ordering of podcasts – just something that doesn’t have too many repeats too often. I will accept an ordering that isn’t optimal, but is near enough.

I put in an escape clause so that the program would stop running after a few minutes and return the best partial solution so far. I could accept that this was close enough to the best ordering of the first few items.

I can then remove those items from the equation and run the program again on the remaining items.

This is sub-optimal, but I figure it is a good enough approximation.

First result

I ran the program for a short period and looked at the best partial solution it had found. I was shocked to discover it was an appallingly bad partial solution.

It was trying to optimise the following items: O-O-O-O-O-O-R-R-R-R-Y-Y-Y-Y, and it had got as far as O-R-Y-R-Y-R-Y-R. That was going to lead to a large consecutive sequence of Os at the end. How could that be a contender as the best solution?

I realised that my heuristic for estimating actually encouraged postponing the placement of the biggest group of items. It assumes that all 5 remaining Os are in the best possible position (the last one), so there is no need to start placing them sooner.

New Estimation Function

I replaced the estimation function with one that assumed that the remaining Os would be evenly distributed amongst remaining positions. I took care to ensure it remained an over-estimate, but it was still much more accurate when you have a large group of items from one colour/feed.
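A sketch of the revised heuristic, again with assumed names. Rounding the per-gap spacing up with `math.ceil` keeps it an over-estimate, since a feed’s product of gaps is maximised when the gaps are equal:

```python
import math

def even_spread_estimate(items, remaining, total_slots):
    """Over-estimate assuming each feed's remaining items are spread
    evenly over the room left to it, instead of all enjoying the
    maximal gap.  Much tighter when one feed dominates."""
    last_pos = {}
    score = 1
    for pos, feed in enumerate(items):
        if feed in last_pos:
            score *= pos - last_pos[feed]
        last_pos[feed] = pos
    final = total_slots - 1
    for feed, count in remaining.items():
        # a feed not yet placed contributes one fewer gap
        gaps = count if feed in last_pos else count - 1
        if gaps <= 0:
            continue
        span = final - last_pos.get(feed, 0)  # room left for this feed
        score *= math.ceil(span / gaps) ** gaps
    return score
```

For three Os in five slots, for example, this returns the exact optimum of 4 (O-?-O-?-O), where the place-everything-last heuristic returns 16.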

I ran the tests again, and discovered it ran much, much faster. The task that previously ran overnight now finished in 3 or 4 minutes. The first estimation function was leading the poor search algorithm into a number of dead-ends that the new function avoided.

Nonetheless, it was still too slow to find the optimal solution for significant values of n. The worst-case situation I needed to handle was also the first set of data it needed to handle – my current backlog of 140 podcasts from about 12 feeds.

Second result

I combined the escape clause (run for a while before returning a partial solution, removing those items from the pool and run again) with the new estimating function, and tackled the real data.

It was still too slow – the partial solutions I was reaching in a reasonable time were only a few items long.

I further constrained the problem to process a maximum of 9 feeds, 5 items per feed and 25 items overall in each pass. Again, this will lead to sub-optimal solutions, but I figure those numbers are high enough to introduce enough variety to keep the interestingness high.

After each pass where it gives up and returns the best partial solution found so far, the list of articles and feeds still available is updated, and a new set of feeds and items are selected. So, each time it only considers a limited pool of variety, but the pool changes every 10 or so podcasts, so it is less noticeable.
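The pass-based driver might look like this. It is a sketch under assumed names; `solver` stands for whichever ordering routine a pass uses:

```python
def chunked_schedule(feeds, solver, max_feeds=9, per_feed=5, max_items=25):
    """Pass-based driver: cap the size of each sub-problem, order that
    pool with `solver`, commit the result, and repeat on whatever is
    left over."""
    order = []
    pending = {name: list(shows) for name, shows in feeds.items() if shows}
    while pending:
        pool, total = {}, 0
        for name in list(pending)[:max_feeds]:
            take = min(per_feed, len(pending[name]), max_items - total)
            if take <= 0:
                break
            pool[name] = pending[name][:take]
            total += take
        order.extend(solver(pool))
        # the committed shows leave the pending pool
        for name, shows in pool.items():
            del pending[name][:len(shows)]
            if not pending[name]:
                del pending[name]
    return order
```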

Final result

I have integrated this implementation with the downloading component I already had.

The downloading component runs automatically twice a week, and stores the podcasts in neatly sorted piles by feed.

When I need another CD for my car, I run a program to discover what’s been downloaded, sort the shows into a single list using the algorithm I have described, and prepare them for burning to CD.

So, the solution appears to be working well enough for now. It only takes 5 or 10 seconds of visual inspection of the list to see some sub-optimal choices, but the mistakes don’t seem too egregious (e.g. there are no cases of three of the same track in a row, but there are situations where swapping the order of two items would improve the metric.)

So I am happy for now, but I haven’t listened to my first CD yet. Ask me again once I have been listening for a while.

Comments

Here is an algorithm in O(n*f) time, where n is the number of slots and f is the number of feeds. It is possibly only an approximation, but it should rule out anything really bad and give very consistent results.

(1) Find the feed with the largest number of shows
(2) Distribute its shows uniformly among the remaining slots, in chronological order within the feed
(3) Repeat

Using your original wine-gum example with the following feeds and counts:

R: 3
L: 3
O: 6
I: 2

So we start out with 14 remaining slots (0-based) and O as the largest feed with 6 shows. Each show goes in slot floor((14/6)*k)-1 where k = [1,6]. So the 6 shows are at slots 1, 3, 6, 8, 10, 13.

Now we have 8 remaining slots and pick R with 3 shows. Each show goes in the free slot floor((8/3)*k)-1 where k = [1,3]. To find a free slot, we have to iterate over the list of slots and skip any that are in use. So the 3 shows are at slots 2 (free slot #1), 7 (free slot #4), and 12 (free slot #7).

And so on.
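The steps above can be sketched in Python as follows. The names are assumed, and `feeds` maps a feed name to its shows in chronological order:

```python
import math

def round_robin_spread(feeds):
    """Jonathon's algorithm: take the feed with the most shows, spread
    them uniformly over the free slots (keeping chronological order
    within the feed), then repeat with the next-largest feed."""
    n = sum(len(shows) for shows in feeds.values())
    slots = [None] * n
    for name, shows in sorted(feeds.items(), key=lambda kv: -len(kv[1])):
        free = [i for i, slot in enumerate(slots) if slot is None]
        for k, show in enumerate(shows, start=1):
            # free slot number floor((free/count)*k) - 1, as above
            idx = math.floor(len(free) / len(shows) * k) - 1
            slots[free[idx]] = show
    return slots
```

Running it on the wine-gum counts above reproduces the worked example: the six Os land in slots 1, 3, 6, 8, 10 and 13.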

It does have the peculiarity of having the rarest feed first. If you want the least rare feed first, reverse the order or let k be from 0 to feedSize – 1. I *think* that randomizing the order of feeds put into the algorithm should not affect the distribution, but I can’t think of a proof offhand. Is this a sufficient solution?

I can’t quite understand why you seem so determined to work within the limitations of your hardware, instead of finding a device which can provide the flexibility you seem to want. In other words a device with a display and controls to allow you to pick whatever song/podcast you want to listen to next.

In this vein, you might also want to re-state your objections to using iTunes. Frankly, they come off as a bit petulant and whiny. Is a non-native-Windows UI really that much of a show-stopper? Can’t you just use it anyway? Grit your teeth, think of England, whatever it takes?

A new MP3 player with controls and display is what … a couple of hundred bucks at the most? Surely this is an easy expenditure to justify over all this coding effort you’re expending, unless you’re mainly interested in the algorithmic solution and associated blog fodder. (And it is interesting).

In which case I would have to say I prefer Jonathon’s algorithm to yours; mainly on the basis of simplicity. To be honest I don’t quite follow yours.

I don’t believe Julian is mainly interested in the algorithmic solution and associated blog fodder. I think he is highly interested in those things, but probably is most interested in having some automated way to gather and spread out the feeds so that he doesn’t have to select them manually, either beforehand or during listening. If only there were an MP3 player which provided “chronological wine-gum algorithm” shuffling, rather than purely random, my guess is that he’d jump on it.

I also like Jonathon’s algorithm for its simplicity. It almost feels too simple, and perhaps Julian will find something undesirable about it (even besides being insufficiently complicated to make a decent blog entry or take up enough coding time and effort); I’ve yet to play with it or with Julian’s own algorithm.

Finally, in my typical pedantic fashion, I would like to point out that every tree (in this context) is a graph.

I think the algorithm described there will give slightly better results than Jonathon’s, and I think that algorithm is O(n*log(n)), which is probably pretty close to O(n*f), so equivalent to Jonathon’s.

There’s an even easier approximate solution that is O(n). However, it sometimes performs very badly. Simply select each next show from a random feed, weighted towards the sizes of the remaining feeds. (The O(n) assumes rand(n) is constant time.)
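That weighted pick is nearly a one-liner with `random.choices`. A sketch, with names of my own choosing:

```python
import random

def weighted_random_order(feeds):
    """O(n) approximation: pick each next show from a random feed,
    weighted by how many shows that feed still has left."""
    pools = {name: list(shows) for name, shows in feeds.items() if shows}
    order = []
    while pools:
        names = list(pools)
        weights = [len(pools[name]) for name in names]
        pick = random.choices(names, weights=weights)[0]
        order.append(pools[pick].pop(0))  # chronological within a feed
        if not pools[pick]:
            del pools[pick]
    return order
```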

So the question is how do I compare a bad O(n) heuristic against a better O(n*f) heuristic against an unknown quality, unknown complexity heuristic against an O(n!) perfect solution?

Answer: I dunno, but Alastair’s argument of choosing simplicity is a pretty damn strong one. If I was leading a project team doing this for commercial software, I would tell the team to use Jonathon’s solution; it’s both simple and right enough. Luckily, for me, for my private hacking, I get to implement it the hard, fun way!

I can’t quite understand why you seem so determined to work within the limitations of your hardware, instead of finding a device which can provide the flexibility you seem to want. In other words a device with a display and controls to allow you to pick whatever song/podcast you want to listen to next.

When I am listening on my computer, I do choose exactly which song or podcast I want to listen to next. In the case of podcasts, I delete them immediately after I hear them.

When I am listening in my car, I have an MP3 CD player that has a display. If I am listening to music, I can select the songs or folders I want to listen to, as often as I want to listen to them. When I am listening to podcasts, I only want to hear each one once, and I don’t want to have to keep track of which ones I have already heard, so I want the order to be pre-determined. It isn’t about a lack of display; it is about the lack of a method to flag/delete a podcast I have heard.

When I listen to MP3s on my iPod Shuffle, I have less choice. Another MP3 player would give me more choice, but the itsy-bitsy form-factor of the iPod Shuffle is more important to me than the lack of display. The new iPod Nano may change my decision here! I am not sure if the iPod Nano would give me the ability to quickly flag/delete a podcast immediately after I have listened to it.

As for iTunes, using it makes me feel stupid. I don’t like that. If I was being paid to use it, I would indeed lie back and think of England. If it had unique features that required the unexpected UI, I might try to embrace its differences. However, given that it has so many competitors for playing music, I don’t feel the need to use it for that. Given it can’t download and shuffle with a “stable sort” for podcasts from the same feed, I don’t want to use it for downloading.

I’ll concede it’s probably petulant and whiny to mention it all in the post above; perhaps I should have just let the matter drop, but I still defend my choice to not use it.

Finally, in my typical pedantic fashion, I would like to point out that every tree (in this context) is a graph.

You are, of course, absolutely right. I should have said “It is only a tree, not an arbitrarily connected graph.” The problem was that I was having to come up with a method that would return a unique node id, so that the algorithm could detect when it had already visited a node. However, that would never happen, so I didn’t need to implement such a check.

With R:2, O:1 and L:1, this algorithm will find the answer “R-O-R-L”, where a better answer is “R-O-L-R”.

That’s easy enough to fix: instead of distributing completely uniformly, always put the last item in the last available slot and then distribute uniformly. That will stretch the items of each feed as far apart as possible.

No one claimed it’s optimal. The strategy will clearly produce degenerate results for a population consisting mostly or entirely of two-item feeds simply because of the placement strategy. The pattern you see will simply expand: A-B-C-D-E-F-G-G-F-E-D-C-B-A.

Now how problematic is that degenerate case really?

It seems to me that it’s really only an issue for the middlemost four slots that end up being A-B-B-A. If that bothers you so much, it is easy to add a post-processing step with keyhole optimisation type fix for this. (Actually you probably want to look for A-B-C-C-B-A first because turning that into A-B-C-A-B-C is sufficiently better than turning it into A-B-C-B-C-A to make it worth the effort. Beyond 6 slots, the returns diminish rapidly.)

Hmm, it strikes me that you really want to model this sort of problem in terms of fluid dynamics.

We consider each item to repel items from its own feed, with a force inversely proportional to some function of the distance – probably its square. You start with a system of randomly ordered items, and simply calculate the forces on each of the items, then swap neighbouring items where the directed force on one of the items minus the directed force on the other is greater than a certain threshold. Said threshold represents friction. Then you just iterate until the system reaches an equilibrium.

This won’t find optimal solutions either, because when the available space is much larger than a particular population of items, the fall-off of the force at a distance means they won’t repel each other strongly enough to spread across the entire available space. But the items will pull so far apart anyway that for the stated purpose of the algorithm (you don’t want to hear items from the same feed too frequently), it just doesn’t matter.
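A rough sketch of that relaxation scheme. The names, the force law and the friction threshold are all assumptions on my part; forces are recomputed each iteration and neighbours swapped when the force differential beats the threshold:

```python
def force_relax(items, iterations=200, threshold=0.5):
    """Repulsion-based ordering: same-feed items push each other apart
    with force ~ 1/d^2; swap neighbours whose force differential beats
    the friction threshold, and iterate to equilibrium."""
    items = list(items)
    for _ in range(iterations):
        # net rightward force on each item from its same-feed siblings
        forces = [0.0] * len(items)
        for i, a in enumerate(items):
            for j, b in enumerate(items):
                if i != j and a == b:
                    d = i - j
                    forces[i] += (1 if d > 0 else -1) / d ** 2
        swapped = False
        for i in range(len(items) - 1):
            if forces[i] - forces[i + 1] > threshold:
                items[i], items[i + 1] = items[i + 1], items[i]
                forces[i], forces[i + 1] = forces[i + 1], forces[i]
                swapped = True
        if not swapped:
            break  # equilibrium: friction absorbs all remaining forces
    return items
```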

Julian, I am equally bothered by a lack of an optimal solution. However I don’t think that the problem is NP-complete. One of the problems is that the metric for success is still poorly defined. You are not seeking to maximize the average distance between two of the same feed. Rather you are trying to maximize the minimum distance between two of the same feed. And minimize the number of instances of the minimum-distance. This is a much more complicated metric.

Part of my thought in trying to come up with an optimal solution was focussed on greedy/iterative algorithms. I tried to come up with an algorithm which was given an optimal sub-sequence and which added a show to the end which also yielded an optimal sub-sequence. I hoped by this to have an inductive proof of optimality. Unfortunately, this entire class of solutions is impossible because optimal sequences can contain non-optimal sub-sequences. Optimality being defined here as having every permutation result in a lower or equal minimum-distance. For instance, take R-3, B-1, G-1. One optimal arrangement is RBRGR. But the sub-sequence RBRG is not an optimal sub-sequence since within that sub-sequence RBGR is a better permutation. So no solution based on an inductive proof is possible.

Rather you are trying to maximize the minimum distance between two of the same feed. And minimize the number of instances of the minimum-distance.

The metric that I am explicitly trying to maximise is the product of the distances between every two consecutive instances of the same feed.

If you are arguing that I should change my metric for better listening pleasure, I am open to it.

I actually suspect that the product of the square roots of the distances might be more accurate to reality. However, it would probably take me half-an-hour just to come up with an example where the difference in metrics would change the outcome, so I haven’t bothered.

Unfortunately, this entire class of solutions is impossible because optimal sequences can contain non-optimal sub-sequences.

That is an insightful way to look at it! You’ve proved we can’t produce a short-sighted algorithm that incrementally adds another wine-gum to the end of the sequence without considering all of the remaining colours.

Complicated? Algorithmically it’s very simple – much easier to conceptualise than the code for any solution involving backtracking, as far as I’m concerned. It’s a bit costly in that calculating all the forces during an iteration is O(n²) – but your playlists shouldn’t be large enough to make that a problem, and if they are, you can cut off at a fixed distance, since the distance fall-off and friction threshold conspire to make the forces from distant items irrelevant. (In a universe as small as yours and with only one dimension, anyway – in the cosmos, the sheer amount of gravitating mass can make the forces of even hugely distant objects significant. We have a rendezvous with Andromeda in about 3bn years.)

Perhaps I misunderstand the dilemma, but it strikes me that if you’re using any form of graph-search algorithm (ie: A* or BFS, the two you mention), you will only get the optimal solution if your metric (a.k.a. heuristic) underestimates the true solution. That is, the algorithm tries to achieve something it can’t possibly reach, and gets closer and closer to the optimal solution. If the metric ever overestimates (and it’s not clear to me whether yours does or doesn’t, because I can’t figure out the “true” metric), then you won’t get the optimal solution.
It is however 12:07am, so take this comment with appropriate time dilation accounted for…

Yes, you are right. In the traditional description of the shortest-path algorithms, the heuristic is an under-estimate (or more strictly, never an over-estimate) of the final path length.

Because I am seeking to maximise a metric, rather than (as is more traditional) minimise a path-length, I needed to invert the traditional description, as above.

(In fact, I cheated. I used the negative of the metric defined, so I could continue to try to minimise it, and thus use the terms that you and I are familiar with. This complexity didn’t seem worth expounding in the description.)

So the question is how do I compare a bad O(n) heuristic against a better O(n*f) heuristic against an unknown quality, unknown complexity heuristic against an O(n!) perfect solution?

Answer: I dunno, but Alastair’s argument of choosing simplicity is a pretty damn strong one. If I was leading a project team doing this for commercial software, I would tell the team to use Jonathon’s solution; it’s both simple and right enough. Luckily, for me, for my private hacking, I get to implement it the hard, fun way!

Confession time!

The proof was in the pudding. I wasn’t happy with my algorithm’s output. I had put in too many tricks to reduce the size of the problem before passing it to the algorithm proper, and the results were noticeably sub-optimal.

If I am going to have a sub-optimal solution, I may as well have a fast sub-optimal solution.

In far less time than it took for one run of my algorithm, I implemented and ran the Jonathon/Aristotle algorithm, and that is the one I am using now.

For some odd reason, this post is still on my mind despite the year having incremented again! Anyhoo, at work I’ve been loosely filling my time reading papers on reinforcement learning techniques, and they seem like a good solution for this problem. Specifically, one technique called temporal-difference learning would allow you to learn the best policy for taking an unsorted list and sorting it, using whichever metric you’ve decided on as the “reward”. You’d write some kind of simulator to play a few thousand “episodes” of the problem, and hopefully over time the average metric would increase. You then take whatever weights you’ve now learned for each move, and apply this back to your real iTunes shuffling program.
The only caveat to this is that although you’ll learn a good algorithm in terms of maximising the metric, it may well be very slow. You could alter your metric so that solving the shuffling problem faster was also rewarded, and then you’d be learning a fast and effective algorithm.
At any rate, it’s an interesting way of solving odd problems – I’ve just implemented an AI for a simple snakes game using it! 🙂