Algorithm for price comparison site

I have a price comparison site where I want to calculate the cheapest combination of stores for buying all the products a person wants (up to a maximum of 10-15 products), instead of just the cheapest price for one product. The result could be: buy the first three products from Amazon, the fourth from eBay and the last four from Buy.com. At the moment I have about twenty stores.

I am now iterating over all the possible combinations between products and stores, but of course this does not scale very well.

I have been breaking my head over this for days now and I am not getting any further. Can you point me in the right direction?

beginnersmind
Wednesday, February 28, 2007

I assume right now you are stepping through each product, then for each product stepping through each store to see which has the lowest price? If you want to be sure you are getting the lowest price, I don't see any other way of doing that.

If you are worried about speed/scalability then we would need to discuss how exactly you are implementing this.

Dan

Dan Boris
Wednesday, February 28, 2007

Linear Algebra is your friend.

Sgt.Sausage
Wednesday, February 28, 2007

This sounds a bit like the famous travelling salesman problem where there are too many combinations to solve by brute force in a reasonable time.

One approach might be to build up a tree of possible options but have a way to prune out most possibilities early on. For instance narrow it down to the 4 sites with the lowest unit price first, then use brute force to explore the combinations that are left.

Colin Sanson
Wednesday, February 28, 2007

Thanks. Getting the lowest price of each product is not the problem (I am already doing that in the single-product version). The problem, as I should have explained in my question, is that the biggest variation is in the shipping costs. I thought about the traveling salesman problem and saw the similarities as well, but I think the big difference is that so much of the final price depends on the shipping costs, which are unknown until you know how many products you buy at each store. Some stores have free shipping, some have free shipping over $25, some charge $5 shipping per item, some have a fixed shipping price per order, etc. Narrowing down to the four sites with the lowest unit price is not good enough in this case.
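To make the shipping rules described here concrete, this is a minimal Python sketch of how a basket could be priced once every product has been assigned to a store. The rule names, the $4 and $6 fee amounts, and the `total_cost` helper are illustrative assumptions, not the behaviour of any real store.

```python
from collections import defaultdict

# Hypothetical shipping rules: (store subtotal, item count) -> shipping cost.
SHIPPING = {
    "free": lambda subtotal, n: 0.0,
    # Free over $25; the $4 fee below the threshold is an assumption.
    "free_over_25": lambda subtotal, n: 0.0 if subtotal > 25 else 4.0,
    "per_item": lambda subtotal, n: 5.0 * n,
    # Fixed price per order; the $6 amount is an assumption.
    "flat": lambda subtotal, n: 6.0,
}


def total_cost(assignment, prices, shipping):
    """Price a basket.

    assignment: product -> store chosen for that product
    prices:     store -> product -> unit price
    shipping:   store -> shipping rule (one of the callables above)
    """
    subtotal = defaultdict(float)
    count = defaultdict(int)
    for product, store in assignment.items():
        subtotal[store] += prices[store][product]
        count[store] += 1
    # Shipping is only known once the whole per-store basket is known.
    return sum(subtotal[s] + shipping[s](subtotal[s], count[s]) for s in subtotal)
```

Because shipping is charged per store basket rather than per product, the cost of one product depends on what else is bought at the same store, which is why picking the lowest unit price per item is not enough.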

The downside of course is that you won't always get the optimal solution, but you can get a very good solution, very cheaply.

In this scenario, one way to implement it is to assume that you want to buy all of your products from one store. You calculate the prices for each of the 40 stores. If one store is substantially below the others, i.e., more than 20% off, make that your primary choice.

Find the most expensive item on your list. See which store sells that item for the least. If it's not your primary choice, swap the seller for that item. Find the second most expensive item, and swap if needed.

Ideally, at this point, you've done 42 operations and you're probably not much more than 5% off the actual lowest possible price. The alternative, as you mentioned, is to try all possible permutations and that can easily be thousands of operations.
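The heuristic above could be sketched in Python roughly as follows. This is one possible reading, not a definitive implementation: it assumes every store sells every product, always takes the cheapest single store as the primary choice (skipping the 20% test), and only keeps a swap when the total including shipping goes down. The names `total_cost`, `greedy_plan` and `max_swaps` are made up for the example.

```python
from collections import defaultdict


def total_cost(assignment, prices, shipping):
    """Sum item prices plus per-store shipping for a product -> store plan."""
    subtotal, count = defaultdict(float), defaultdict(int)
    for product, store in assignment.items():
        subtotal[store] += prices[store][product]
        count[store] += 1
    return sum(subtotal[s] + shipping[s](subtotal[s], count[s]) for s in subtotal)


def greedy_plan(products, prices, shipping, max_swaps=2):
    stores = list(prices)
    # Step 1: price the whole basket at each single store, keep the cheapest.
    primary = min(
        stores,
        key=lambda s: total_cost({p: s for p in products}, prices, shipping),
    )
    plan = {p: primary for p in products}
    best = total_cost(plan, prices, shipping)

    # Step 2: look at the most expensive items first and try moving each
    # one to the store with the lowest unit price for it.
    by_price = sorted(products, key=lambda p: prices[primary][p], reverse=True)
    for product in by_price[:max_swaps]:
        cheapest = min(stores, key=lambda s: prices[s][product])
        if cheapest == plan[product]:
            continue
        trial = dict(plan)
        trial[product] = cheapest
        cost = total_cost(trial, prices, shipping)
        if cost < best:  # assumption: keep a swap only if the total improves
            plan, best = trial, cost
    return plan, best
```

The work is one basket evaluation per store plus one per attempted swap, so it grows linearly with the number of stores instead of exponentially with the number of products.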

TheDavid
Wednesday, February 28, 2007

I think it's an interesting intellectual question.

However, before anyone gets too carried away, ordinary consumers may not want to buy stuff from multiple sites or deal with multiple delivery dates just to save a penny. Yes, a single centralized site like beginnersmind's can handle all of the details for them, but you're still asking consumers to trust yet another anonymous, faceless "company" with their credit card and billing info.

Back to the problem.

If you're interested in including the various shipping options and special deals and package discounts and so forth, you're probably better off looking at genetic algorithms or parallel agents. Specifically, rather than trying to find an absolute maximum (or more realistically, relative maxima) you construct a way to identify relevant parameters and create a thread for each "group" of common parameters. And once you have threads so to speak, you can throw hardware at the problem quite easily.

Hmm... it's an intriguing thought. Good luck.

TheDavid
Wednesday, February 28, 2007

I think you're right, TheDavid, that most people are not that interested in saving pennies. However, other sites do exist that do exactly this, so the idea is not entirely without merit. There usually is some sort of store rating system attached, that lets people search trusted stores only.

Maybe I should just go and get a CS degree first...

beginnersmind
Wednesday, February 28, 2007

The problem seems nonlinear. For example, you get a certain price from Amazon for a product if you buy it alone, but if you buy it with something else then you get free shipping.

It really depends on the shipping aspect of it then. If the shipping costs are really arbitrary, then you might have an NP-hard problem. In that case (and if you really want this thing to scale), you should consider approximation algorithms or nonlinear optimization.

Thursday, March 01, 2007

Interesting problem. Research "knapsack problem." It's probably some sort of variant. If you set it up as an integer programming problem in the simplest way, then the addition of the shipping makes it look non-linear. I bet it's doable though, if you give yourself the option of buying a certain amount of discount, given certain constraints.

You do some setup by finding the cost of each possible combination of products from each vendor, and you're back to integer programming. There could be plenty of funky cases. For example, if you want to buy a stack of books including 2 copies of Freakonomics, it might turn out best to buy one from Amazon (to get free shipping) and one from Barnes & Noble (to get 10% off on 3+ items). However, the setup step of evaluating all the cases is going to be too expensive. Actually, integer programming in any form is likely to be too expensive.

I would look into: 1) a heuristic, e.g. the aforementioned greedy algorithm. 2) a heuristic method of setting up a restricted problem. For example, you could figure out which three vendors give you the biggest savings on a single item. Then use an exact method with just the three vendors. 3) You may be able to structure the problem to make it simpler.
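As a rough illustration of option 2, the Python sketch below shortlists a few vendors and then tries every assignment among only those. The shortlist rule used here (the k stores with the cheapest whole-basket total) is an assumption chosen for simplicity, not the savings-based criterion described above, and `total_cost`, `best_among` and `restricted_exact` are hypothetical names. It also assumes every store sells every product.

```python
from collections import defaultdict
from itertools import product as cartesian


def total_cost(assignment, prices, shipping):
    """Sum item prices plus per-store shipping for a product -> store plan."""
    subtotal, count = defaultdict(float), defaultdict(int)
    for item, store in assignment.items():
        subtotal[store] += prices[store][item]
        count[store] += 1
    return sum(subtotal[s] + shipping[s](subtotal[s], count[s]) for s in subtotal)


def best_among(products, stores, prices, shipping):
    """Exact search: try every way of assigning products to the given stores."""
    best_plan, best_cost = None, float("inf")
    for choice in cartesian(stores, repeat=len(products)):
        plan = dict(zip(products, choice))
        cost = total_cost(plan, prices, shipping)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
    return best_plan, best_cost


def restricted_exact(products, prices, shipping, k=3):
    # Assumed shortlist rule: the k stores with the cheapest single-store basket.
    ranked = sorted(
        prices,
        key=lambda s: total_cost({p: s for p in products}, prices, shipping),
    )
    return best_among(products, ranked[:k], prices, shipping)
```

With k stores and n products the search visits k^n assignments, so three vendors and 15 products is about 14 million baskets, compared with 20^15 for all twenty stores. The result is only optimal within the shortlist.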

Peter

Peter Vanderwaart
Thursday, March 01, 2007

I know one price comparison site that has a warehouse full of cheap labor manually searching sites and entering prices.

I kid you not.

Amazed
Saturday, March 03, 2007

Perhaps I misunderstand your spec, but isn't this just a matter of:

ALTER TABLE product_prices ADD INDEX (product_id, price);

SELECT min(price), shop_id FROM product_prices WHERE product_id IN (123,456,789,...) GROUP BY product_id;

(actually, getting SQL to output the shop_id for the minimum price might be a bit more tricky than this - but doable and not significantly worse in terms of performance)
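One way to get the shop_id alongside the minimum price is to join the table against its own per-product minimum. The sketch below uses Python's sqlite3 module purely so it can be run as-is; the column types and the `cheapest_shops` helper are assumptions, and only the table and column names come from the query above. Like that query, it ignores shipping.

```python
import sqlite3


def cheapest_shops(conn, product_ids):
    """Return (product_id, shop_id, price) for the cheapest shop per product.

    Ties produce one row per tied shop.
    """
    placeholders = ",".join("?" * len(product_ids))
    query = f"""
        SELECT p.product_id, p.shop_id, p.price
        FROM product_prices AS p
        JOIN (
            SELECT product_id, MIN(price) AS min_price
            FROM product_prices
            WHERE product_id IN ({placeholders})
            GROUP BY product_id
        ) AS m
          ON p.product_id = m.product_id AND p.price = m.min_price
        ORDER BY p.product_id, p.shop_id
    """
    return conn.execute(query, list(product_ids)).fetchall()


def demo_connection():
    """Build a small in-memory table with made-up prices."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product_prices (product_id INTEGER, shop_id INTEGER, price REAL)"
    )
    conn.executemany(
        "INSERT INTO product_prices VALUES (?, ?, ?)",
        [(123, 1, 10.0), (123, 2, 8.5), (456, 1, 20.0), (456, 2, 25.0), (789, 3, 5.0)],
    )
    return conn
```

Calling `cheapest_shops(demo_connection(), [123, 456])` returns one row per requested product with the shop that has the lowest listed price.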

Matt
Sunday, March 04, 2007

Or even more trivially, why don't you just cache the minimum-priced shop for each product?

Others seem to be envisaging some kind of tricky combinatorial problem, but I really don't see the need for that in your description - it seems you just want to output the cheapest shop for each of a given list of products?