Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

I am making one service which needs to hold data grouped and calculated by year-month combination. I know how to calculate data and put it on new table. But I am confused what kind of data type I should use for storing month - year value. Here is what I have considered.

Two Integer columns one for year other for month (clear to understand and strong to manipulate (anybody who uses this table can easily use ranges and orders) but I think that would take more space than another solution.

Use a Date field, and always store one date for entire month. (Hard to explain to users how it works and how to create WHERE clauses)

One varchar(7) column and put there stings like 2012-02,2013-01. For me this is easy to understand but hard to manipulate.

Whick I should pick? or is there other solution. Can someone suggest according to performance which way is going to be good solution, Because most of queries is going to using data range in where solution and my new table will have around 2-5 million records.

I think you should choose one (or more) options depending on how the data needs to be retrieved.

For 5 million records, space should really not be your primary concern. For each of your options here are the pros and cons for retrieval:

(1) Two Integer Columns: This is a great approach to use if you need to compare months from different years. Extracting by month and sorting by year will be much quicker if you index year and month separately. Option 1 is best if this is an important or frequent mode of using the data. On the other hand, this mode is terrible at extracting ranges other than years and months. For example, it is not good for ranges that cross years. The WHERE clause could get more complicated than you'd ideally like it to be when date ranges cross calendar year boundaries. (Think from November 2011 to February 2012.)

(2) Date Field: You and gbn have both identified good points about this format. It is also good for sorting chronologically and for extracting ranges of months. It happens to be the most compact representation (only 3 bytes). It isn't good at all for comparing like months in various years and it isn't idea for displaying in screens and reports.

(3) char(7) YYYY-MM Field: Note that if you are really concerned about space (and you shouldn't be in your case) then you can use char instead of varchar since every item will be of a known length. Using YYYY-MM is good for sorting and filtering ranges. It isn't as good as (2) for space, but it is better for display and simplicity of the WHERE clause - unless you need to extract like months across years.

You may have noticed that there is a difference between (1) and (2)/(3) when it comes to the nature of the extraction that you need to do. If you need to do both month/year over year and month range extractions then none of these options are perfect. If that is the case, I'd suggest you consider using a combination of (1) and either (2) or (3) - I'd pick (3) myself, since I'd value ease of display/use over storage space. If you do use a combination, make one or the other a computed column(s) and index it for efficient retrieval.

Good point, when I initially fill my table, this perhaps is going to be one of two models which recommend Joel Brown
–
adopilotMar 12 '12 at 14:17

@adopilot - The approach of keeping both end-points of the range is great for retrieving the record that matches a given point in time (i.e. date/time). You could consider this as "option 4.".
–
Joel BrownMar 12 '12 at 19:50

If you also have the full date in the table and use the reduced one for aggregation only, then make a function that creates a normalized, ideally sensibly distributed value (e.g. "months since January 1970") from the date, and create an index on the result of that function to speed up the aggregation process.