Ok, so I have a program that logs errors into a NoSQL database. Right now there is just a single model for an error, and each error is stored as a document in the database.

Basically, I want to summarize across the stored errors and produce a summary of the "types" of errors that occurred.

In a traditional SQL database this kind of summary would be done with grouping, sums and averages (GROUP BY, SUM, AVG), but in a NoSQL database I assume I need to use MapReduce.
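The GROUP BY / COUNT style summary can be expressed as a map step that emits a key per document and a reduce step that combines the values for each key. Below is a minimal, in-memory Python sketch of that pattern, not App Engine's MapReduce library; the `type` field and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List, Tuple

# Hypothetical error document: a dict with at least a "type" field.
ErrorDoc = Dict[str, Any]


def map_error(doc: ErrorDoc) -> Iterator[Tuple[str, int]]:
    """Map phase: emit (error type, 1) for every stored error document."""
    yield str(doc["type"]), 1


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: sum the partial counts emitted for one error type."""
    return key, sum(values)


def map_reduce(docs: Iterable[ErrorDoc]) -> Dict[str, int]:
    # Shuffle step: group mapped values by key, as a MapReduce framework would.
    grouped: Dict[str, List[int]] = defaultdict(list)
    for doc in docs:
        for key, value in map_error(doc):
            grouped[key].append(value)
    return dict(reduce_counts(k, v) for k, v in grouped.items())
```

For example, `map_reduce([{"type": "ValueError"}, {"type": "KeyError"}, {"type": "ValueError"}])` returns `{"ValueError": 2, "KeyError": 1}`. Averages would follow the same shape, with the reducer returning a sum and a count instead of a single total.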

My current model seems unfit for the task. How should I change the way I store "models" to make statistical analysis easy? Is a NoSQL database even the right tool for this type of problem?

I'm storing everything in Google App Engine's BigTable, so there are some limitations to consider as well.

Just curious, but shouldn't this type of question be asked on StackOverflow.com?
– funkymushroom, Dec 23 '10 at 17:49

To be honest, I'm not sure; it's more of a design question. As far as I've seen, the problem is specific to document-style databases, where the storage is modeled differently. Basically, it's something that would be discussed around a whiteboard. However, you might be right.
– Morten, Dec 27 '10 at 9:48

If you are continuously logging errors, #1 will be your best option. If you take that approach, attaching the counter to the ErrorOccurrences may not be the best method. If you don't have to provide the line numbers right away, you could make a separate counter model using the sharded counter technique to build your summary more efficiently. Keep in mind that this also means making at least 3 writes to the datastore for every error instead of just one.
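The sharded counter idea can be sketched as follows. This is a minimal in-memory Python simulation, assuming one shard "entity" per (counter name, shard index) pair; it does not use the App Engine datastore API, and `NUM_SHARDS`, `ShardedCounter` and its methods are hypothetical names.

```python
import random
from typing import Dict, Tuple

NUM_SHARDS = 20  # hypothetical shard count; more shards allow more parallel writes


class ShardedCounter:
    """In-memory stand-in for a set of counter shard entities.

    In a datastore version, each increment would update one randomly chosen
    shard entity, so concurrent writers rarely contend on the same entity.
    Reading the total sums every shard.
    """

    def __init__(self, num_shards: int = NUM_SHARDS) -> None:
        self.num_shards = num_shards
        # (counter name, shard index) -> partial count; one "entity" per key.
        self._shards: Dict[Tuple[str, int], int] = {}

    def increment(self, name: str, delta: int = 1) -> None:
        # Pick a random shard so writes are spread across entities.
        shard = random.randrange(self.num_shards)
        key = (name, shard)
        self._shards[key] = self._shards.get(key, 0) + delta

    def get_count(self, name: str) -> int:
        # The total is the sum of all partial counts for this name.
        return sum(self._shards.get((name, i), 0) for i in range(self.num_shards))
```

Calling `increment("KeyError")` once per logged error and `get_count("KeyError")` when building the summary keeps each write cheap and moves the cost to reads, which suits a summary that does not need to be current to the second.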

Great answer! It's more or less exactly the kind of answer I was looking for. It's good that you pointed out the number of writes, since I hadn't thought in those terms at all.
– Morten, Jan 5 '11 at 7:37

An error is aggregated over several fields. For example, the stack trace may show that the same error occurs at different locations in the code, so there is no way to precategorize the errors. But remember that this is not a relational database: it stores complete documents, and what I'm after is a (quite complex) aggregate over those documents.
– Morten, Dec 23 '10 at 9:17
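Since each document is complete, one way to aggregate over several fields without precategorizing is to derive a composite grouping key from each document at read time and count by that key. A minimal Python sketch, assuming a hypothetical document shape with a `type` field and a `stacktrace` list of `file:line` frames (innermost last); using the innermost frame as the "location" is also an assumption.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple

# Hypothetical document: {"type": str, "stacktrace": ["file:line", ...]}
ErrorDoc = Dict[str, Any]


def error_signature(doc: ErrorDoc) -> Tuple[str, str]:
    """Derive a grouping key from several fields of one error document."""
    frames = doc.get("stacktrace") or []
    # Use the innermost frame as the location; fall back when there is none.
    location = frames[-1] if frames else "<unknown>"
    return str(doc["type"]), str(location)


def summarize(docs: Iterable[ErrorDoc]) -> Counter:
    """Count errors per (type, innermost location) signature."""
    return Counter(error_signature(doc) for doc in docs)
```

With two `KeyError` documents whose stack traces both end at `db.py:42` and one ending at `views.py:7`, `summarize` yields a count of 2 for `("KeyError", "db.py:42")` and 1 for `("KeyError", "views.py:7")`. Changing `error_signature` changes what counts as one "type" of error without touching the stored documents.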