I want to sum all the values, by date. So December 2007 total would be 1500 and January 2008 would be 3000.

What is the best way to do this with a collection of approx 300 of the above objects?

I'm sorry this isn't really a Friday afternoon type question!

Any help much appreciated!

Matthew Kerry
Friday, February 01, 2008


For 300 objects, it doesn't matter how you solve the problem. It'll take milliseconds to do no matter how you approach it.

A more helpful answer will probably depend on your implementation language, but the basic categories are:

1. Sort the collection, then iterate through it, accumulating results until the date changes.
2. Extract all the unique dates, then query the collection for the objects with a particular date.

Couldn't you just iterate through the objects, check each Date as it comes up and populate a separate collection with the totals? This may not be the most performant way, but it sure seems simple to me.

Some pseudocode...

For Each obj In InitialCollection
    If NewCollection.Contains(obj.Date) Then
        NewCollection.Item(obj.Date).Value += obj.Value
    Else
        NewCollection.Add(obj.Date).Value = obj.Value
    End If
Next

Rich
Friday, February 01, 2008


+1 to Rich, except use a dictionary or hashtable for the second collection. The key is the date, the value is the running sum. This works only if the hashtable/platform supports value equality on the Date type.
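Ted's dictionary approach might look something like this in Java. This is only a sketch: the `Item` record and its `date`/`value` fields are hypothetical stand-ins for the original poster's objects, whose shape isn't given.

```java
import java.time.LocalDate;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SumByDate {
    // Hypothetical stand-in for the poster's objects.
    record Item(LocalDate date, double value) {}

    static Map<LocalDate, Double> totals(List<Item> items) {
        Map<LocalDate, Double> totals = new HashMap<>();
        for (Item item : items) {
            // merge() adds the value to any existing sum, or inserts it on first sight.
            totals.merge(item.date(), item.value(), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
            new Item(LocalDate.of(2007, 12, 1), 700),
            new Item(LocalDate.of(2007, 12, 1), 800),
            new Item(LocalDate.of(2008, 1, 10), 3000));
        System.out.println(totals(items));
    }
}
```

`LocalDate` gives the value equality Ted mentions for free, since equal dates hash and compare equal.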

Ted
Friday, February 01, 2008


Mark hits the nail on the head. If there are only 300 objects, the choice of solution is nearly irrelevant. Use whatever makes the code easiest to maintain, assuming you don't have some WTF-ish code to extract the data from each object.

I would solve it using the hash table method. Normalize the date and use that as the key, with the value being the running total. Then it becomes trivial to iterate over the objects and update the running totals. That's probably easier for some languages than others. If it's not easy, you're using the wrong language :-)
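Since the question sums by month ("December 2007", "January 2008"), "normalize the date" could mean truncating each date to its year-month before using it as the key. A sketch of that in Java, again with a hypothetical `Item` record standing in for the poster's objects:

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MonthlyTotals {
    // Hypothetical stand-in for the poster's objects.
    record Item(LocalDate date, double value) {}

    static Map<YearMonth, Double> byMonth(List<Item> items) {
        Map<YearMonth, Double> totals = new HashMap<>();
        for (Item item : items) {
            // Normalize the full date down to its year-month, then accumulate.
            YearMonth key = YearMonth.from(item.date());
            totals.merge(key, item.value(), Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
            new Item(LocalDate.of(2007, 12, 1), 700),
            new Item(LocalDate.of(2007, 12, 15), 800),
            new Item(LocalDate.of(2008, 1, 10), 3000));
        System.out.println(byMonth(items)); // 2007-12 -> 1500.0, 2008-01 -> 3000.0 (map order may vary)
    }
}
```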

An anonymous voice in the crowd
Friday, February 01, 2008


> If it's not easy, you're using the wrong language :-)

SELECT ObjectDate, SUM(ObjectValue) FROM ObjectTable GROUP BY ObjectDate

Christopher Wells
Friday, February 01, 2008


Sheesh! Obviously you should serialize the collection to XML, put some HTTP around it, add some SOAP headers, create a web service, open up a socket, wait for an asynchronous response, send it to a secondary server in case one fails, compare the results, and reconstitute a result object.

What the hell are they teaching in schools these days, anyway?

Greg
Friday, February 01, 2008


@Greg, apparently they're teaching exactly that.

Lance Hampton
Friday, February 01, 2008


Depending on the language, it may be possible to define a "CompareTo" method on the objects, put them in an array or list, and sort them. The subtotals would be collected in the usual way passing over the set in sorted order.

I doubt it would be faster than using the Hashtable, but it might have other benefits; for example, the subtotals would be created in date-sorted order.
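The sort-then-accumulate pass Nutmeg describes could be sketched in Java like this (the `Item` record is a hypothetical stand-in for the poster's objects; a `Comparator` stands in for the suggested `CompareTo` method):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SortedTotals {
    // Hypothetical stand-in for the poster's objects.
    record Item(LocalDate date, double value) {}

    static Map<LocalDate, Double> totals(List<Item> items) {
        List<Item> sorted = new ArrayList<>(items);
        sorted.sort(Comparator.comparing(Item::date));

        // LinkedHashMap keeps insertion order, so the subtotals come out date-sorted.
        Map<LocalDate, Double> totals = new LinkedHashMap<>();
        LocalDate current = null;
        double running = 0;
        for (Item item : sorted) {
            if (!item.date().equals(current)) {
                // Date changed: emit the finished subtotal and start a new run.
                if (current != null) totals.put(current, running);
                current = item.date();
                running = 0;
            }
            running += item.value();
        }
        if (current != null) totals.put(current, running);
        return totals;
    }
}
```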

Nutmeg Programmer
Saturday, February 02, 2008


I think the preferred way would be to create a map, hash, collection or whatever, keyed by date. Then iterate over the list and accumulate the totals for each date found. Any other approach, such as sorting and summing each subrange, is too much low-level work and carries more risk of mistakes.

Tasks like this are legion in my day job's programming domain. We use Java. Every time I get one I weep silently that I can't use Ruby, and then write the (overly long, obnoxious) code to solve it with Java collections. Which, to be fair, were amazing when they were the best tool I had available. (Think of how annoying it would be in C++ without the STL. Not hard -- it isn't a hard problem, it's CS101 -- just annoying.)

'My goal in offering Bingo Card Creator to the world is to in my own small way improve students' education by making fun and educational activities possible.'

EFL might be more apropos? :-)

Monday, February 04, 2008


Thanks for everyone's comments.

After some wrangling I convinced the DBA to change the query to group by just the month period and it seems to work a treat.

Linq also looks like an interesting development that we need to get to soon.

Monday, February 04, 2008


> What the hell are they teaching in schools these days, anyway?

You did want all of that done in C, making as much use of pointers as possible, right?

Monday, February 04, 2008


What happens if the input grows to 30,000? 300,000? Why not choose a solution that scales? It's as easy as one that doesn't.

Someone had the right solution above: one loop with a hash lookup on each iteration. The running time is O(n), so it scales to any amount of input.

DSCheerleader
Tuesday, February 05, 2008


If you're not using .NET 3.5, this post will be of no interest to you.

Because I was curious to see how smart Linq was when it does grouping, I extended Chris Tavares's example above to compare Linq's execution of such a query with a hand-coded "manual" version. (See below.)

I was surprised to find that Linq ran *faster* than my manual method did. It occurred to me that performing ten million calls to ContainsKey might be the culprit and so I wrote a version of the manual method that doesn't use ContainsKey to check before trying to update the dictionary; instead, it catches the exception if the key's not in the dictionary and adds it.

It seems that given a comparatively small number of keys, catching the KeyNotFoundException is faster than avoiding it in the first place. And the Linq query is very slightly slower than the manual method.
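.NET specifics aside, the same check-first vs. single-lookup trade-off exists in Java, where `HashMap.get` returns null on a miss instead of throwing. A sketch for comparison only (the method and map names are illustrative, not from the post):

```java
import java.util.Map;

public class UpdateStyles {
    // Check-first: up to three hash lookups per update (containsKey, get, put).
    static void checkFirst(Map<String, Integer> totals, String key, int value) {
        if (totals.containsKey(key)) {
            totals.put(key, totals.get(key) + value);
        } else {
            totals.put(key, value);
        }
    }

    // Fewer lookups: get() returns null on a miss, so no pre-check or exception is needed.
    static void singleLookup(Map<String, Integer> totals, String key, int value) {
        Integer current = totals.get(key);
        totals.put(key, current == null ? value : current + value);
    }
}
```

Both produce the same totals; the second simply avoids the repeated hashing that the poster suspected was slowing the ContainsKey version down.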

It's also interesting to see that Linq executes the query when it's referenced, not when it's initialized.

There's something I'm seeing here that I don't understand, though: when this program dumps the Linq results, there's a perceptible delay between each line written. I think the total delay is about equal to the difference in execution times between the Linq query and the optimized query. I wonder what's up with that.

@Nigel Sampson: Exceptions can indeed cause huge problems, but the magnitude is relative to the call-stack unwind that occurs before they are caught. If they are caught very close to the throw site the penalty is minor, but the farther away (call-stack wise) the catch happens, the larger the penalty grows.