Home » Performance » MySQL » Group by any date interval
Group by any date interval [message #3088] Fri, 16 May 2008 10:31
festerwim  is currently offline festerwim
Messages: 3
Registered: May 2008
Location: Belgium
Junior Member
I have a table with datestamped info (DATETIME type) and some tinyint and smallint columns. I want to calculate averages and sums over any date interval the user in my application wants.

I have tried many ways but none seem to work properly. Has anybody done this before and has a definitive solution?

The things I have tried:

1) Convert to unix time:
SELECT ... FROM ... GROUP BY FLOOR(UNIX_TIMESTAMP(TimeStamp)/86400)

Where 86400 is the number of seconds in the grouping period (e.g. if you want to group by hour, then it is 60 * 60, for grouping by 2 hours it is 2 * 60 * 60 ).

There are problems with this approach although I don't remember the exact details I'm afraid.

2) Group by year, month, day, hour separately:
SELECT  ZoneId, GROUP_CONCAT(DATE_FORMAT(timestamp, '%Y%m%d-%H:%i:%s')),STR_TO_DATE( CONCAT( DATE_FORMAT(timeStamp, '%Y'),DATE_FORMAT(timeStamp, '%m'), DATE_FORMAT(timeStamp, '%d'), (FLOOR(DATE_FORMAT(timeStamp, '%H') / 2) ) * 2) ,'%Y%m%d%H') as CalculatedTimeStamp, sum(Length*(nVehicles1+nVehicles2+nVehicles3+nVehicles4+nVehicles5))/sum(nVehicles1+nVehicles2+nVehicles3+nVehicles4+nVehicles5), sum(nVehicles1), sum(nVehicles2), sum(nVehicles3), sum(nVehicles4), sum(nVehicles5) FROM tzonedata WHERE TimeStamp >= "2008-05-09 11:28:09" AND TimeStamp <= "2008-06-08 11:28:00" AND ( zoneid=1 OR  zoneid=2 OR  zoneid=3 OR  zoneid=4 OR  zoneid=5 OR  zoneid=6 OR  zoneid=7 OR  zoneid=8) GROUP BY DATE_FORMAT(timeStamp, '%Y'),DATE_FORMAT(timeStamp, '%m'), DATE_FORMAT(timeStamp, '%d'), FLOOR(DATE_FORMAT(timeStamp, '%H') / 2), zoneid  ORDER BY CalculatedTimeStamp LIMIT 0, 5000


This example groups per 2 hours. Notice the 'CalculatedTimeStamp', this calculation makes it so that each returned row starts at the start of an hour. The returned resultset will contains times like this:

0:00
2:00
4:00
6:00
...

BUT:
-> It does not work when you have summer and wintertime! If you group by hour:
SELECT  ZoneId, GROUP_CONCAT(DATE_FORMAT(timeStamp, '%H')), STR_TO_DATE( CONCAT( DATE_FORMAT(timeStamp, '%Y'),DATE_FORMAT(timeStamp, '%m'), DATE_FORMAT(timeStamp, '%d'), DATE_FORMAT(timeStamp, '%H')) ,'%Y%m%d%H') as CalculatedTimeStamp, sum(nVehicles1), sum(nVehicles2), sum(nVehicles3), sum(nVehicles4), sum(nVehicles5) FROM tzonedata WHERE TimeStamp >= "2008-03-30 0:01:00" AND TimeStamp <= "2008-03-30 14:01:00" AND ( zoneid=1) GROUP BY DATE_FORMAT(timeStamp, '%Y'),DATE_FORMAT(timeStamp, '%m'), DATE_FORMAT(timeStamp, '%d'), DATE_FORMAT(timeStamp, '%H'), zoneid  ORDER BY CalculatedTimeStamp LIMIT 0, 5000

Then you get the following result:
The group by happens correctly, the GROUP_CONCAT column shows:
00,00,00,00
01,01,01,01
02,02,02,02
03,03,03,03

However, the calculated timestamp contains 3:00 2 times!
30-mrt-2008 00:00:00
30-mrt-2008 01:00:00
30-mrt-2008 03:00:00
30-mrt-2008 03:00:00

This is due to the fact that at that time we switch from winter to summertime.

Another problem with this approarch is that if the user selects to group by 7 days, it always gets truncated at the month. E.g.:
1 jan -> 7 jan
8 jan -> 16 jan
17 jan -> 23 jan
23 jan -> 30 jan
31 jan -> 1 feb !!!

That is ofcourse because I group by year first, then by month and then by day.


Any help would be greatly appriciated, I already spend 3 days trying to figure out the correct way to do this.

regards,

Wim

Read Message
Previous Topic:Lookup Tables not Using Indexes?
Next Topic:processing large result sets...
Goto Forum:

  



Current Time: Sun Sep 7 09:11:56 EDT 2008

Total time taken to generate the page: 0.00887 seconds
.:: Contact :: Home :: MySQL Support by Percona.com ::.

Powered by: FUDforum 2.7.5.
Copyright ©2001-2006 FUD Forum Bulletin Board Software

MySQL Performance | Forum authority Badge