Simulating First/Last aggregate functions in MySQL

By Joao

I came across an interesting post at Xaprb.com on selecting the first and last row of a group in MySQL.  This solution builds on a comment in that post about using GROUP_CONCAT to achieve this.

As a practical example of the usage of the first and last aggregate functions, consider an application converting financial data between different timeframes.

The table structure might look like this:

CREATE TABLE EURUSD_3600 (
    datetime datetime NOT NULL,
    open float NOT NULL,
    low float NOT NULL,
    high float NOT NULL,
    close float NOT NULL,
    PRIMARY KEY (datetime)
);

Assuming data in the hourly  timeframe (3600 is the number of seconds in one hour):

INSERT INTO EURUSD_3600 SELECT '2001-01-03 00:00:00',0.9507,0.9505,0.9509,0.9506;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 01:00:00',0.9506,0.9492,0.951,0.9496;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 02:00:00',0.9496,0.9495,0.9509,0.9505;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 03:00:00',0.9504,0.9498,0.9508,0.9499;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 04:00:00',0.9499,0.9499,0.9507,0.9503;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 05:00:00',0.9503,0.9503,0.9506,0.9506;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 06:00:00',0.9506,0.9505,0.9507,0.9505;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 07:00:00',0.9505,0.9503,0.9513,0.9509;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 08:00:00',0.951,0.9468,0.951,0.9489;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 09:00:00',0.949,0.9489,0.9543,0.9539;

SQL is a suitable tool to group this data into a more compact timeframe, for instance, the 2-hour timeframe. By using a formatting function on the datetime field, a standard GROUP BY SQL statment can be used to achieve this:

CONCAT(
    year(datetime), '-',
    month(datetime), '-',
    day(datetime), ' ',
    floor(hour(datetime) / 2) * 2, ':00:00'
)

Calculating the low and high of the period is easy, using the standard min/max aggregate functions:


SELECT
    CAST(
        CONCAT(
            year(datetime), '-',
            month(datetime), '-',
            day(datetime), ' ',
            floor(hour(datetime) / 2) * 2, ':00:00'
        ) AS DATETIME) AS datetime,
    MIN(low) AS low,
    MAX(high) AS high
FROM EURUSD_3600
GROUP BY CAST(
        CONCAT(
            year(datetime), '-',
            month(datetime), '-',
            day(datetime), ' ',
            floor(hour(datetime) / 2) * 2, ':00:00'
        ) AS DATETIME)

GROUP_CONCAT can be used for the opening(first) and close(last), the trick is to be sure to use the ORDER BY clause in the group concatenation, otherwise the returned first and last records might not be as expected:


SUBSTRING_INDEX(
GROUP_CONCAT(CAST(open AS CHAR) ORDER BY datetime),
',',
1
) as open

SUBSTRING_INDEX(
GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime),
',',
-1
) as close

GROUP_CONCAT returns a comma separated string of all groups values, then SUBSTRING_INDEX is used to return the first and last values of the group. This works as long as data is properly sorted and the resulting string fits in the GROUP_CONCAT buffer.

Note however that there is a problem with the preceding code when calculating the close value. SUBSTRING_INDEX looks from the end of the string, but the GROUP_CONCAT buffer has a limited size, and the it will be truncated if the concatenated rows exceed the buffer size, in which case the returned value will not be the last in the group as desired.

An easy solution is to reverse the sort order and fetch the first element instead:


SUBSTRING_INDEX(
GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime DESC),
',',
1
) as close

MySQL will still throw a warning about truncated group_concat when there are too many rows in the source table, but this can be safely ignored for the purpose of this example.

The final query turns out as:


SELECT
    CAST(
        CONCAT(
            year(datetime), '-',
            month(datetime), '-',
            day(datetime), ' ',
            floor(hour(datetime) / 2) * 2, ':00:00'
        ) AS DATETIME) AS datetime,
    SUBSTRING_INDEX(
        GROUP_CONCAT(CAST(open AS CHAR) ORDER BY datetime),
        ',',
        1
    ) as open,
    MIN(low) AS low,
    MAX(high) AS high,
    SUBSTRING_INDEX(
        GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime DESC),
        ',',
        1
    ) as close
FROM EURUSD_3600
GROUP BY CAST(
        CONCAT(
            year(datetime), '-',
            month(datetime), '-',
            day(datetime), ' ',
            floor(hour(datetime) / 2) * 2, ':00:00'
        ) AS DATETIME);

No related posts.

Related posts brought to you by Yet Another Related Posts Plugin.


, ,

categoriaTechnology commento14 Comments dataOctober 12th, 2009

About... Joao

This author published 25 posts in this site.

Share

FacebookTwitterEmailWindows LiveTechnoratiDeliciousDiggStumbleponMyspaceLikedin

Comments


Polprav
October 22nd, 2009

Hello from Russia!
Can I quote a post in your blog with the link to you?


Joao
October 26th, 2009

Sure. When you’re done send me the link so I can see it.


stephen
November 19th, 2009

I am trying to turn this into a function, but it won’t work. Please, correct me.

DELIMITER $$

DROP FUNCTION IF EXISTS `test`.`last`$$
CREATE FUNCTION `test`.`last`(f1 varchar(12),f2 varchar(12)) RETURNS varchar(12) CHARSET latin1
return substring_index(group_concat(f1 order by f2 desc),’,',1);

$$

DELIMITER ;


Joao
November 20th, 2009

Not sure you can do this as a function using the same underlying logic.

My first approach was to create an aggregate UDF, but that didn’t work because you can’t guarantee the order of data going into the udf.


Arnold Daniels
November 28th, 2009

Nice little hack :) .

You could do this with a MySQL UDF though. Like the above function you need to pass the field with the value as well as the field to order by.

SELECT agg_last(close, datetime AS `desc`) FROM …


Joao
November 28th, 2009

But how do you guarantee sort order ? That’s the bit I couldn’t figure out. Does “datetime as `desc`” somehow make it happen, or is it something that can be implemented inside the UDF ?


webmaster
April 17th, 2010

You can get results like that but you are sorting (order-ing) in calculated colums which will not be efficient with large tables.
Here is a solution that is simple elegant and uses efficiently indexes etc.
The text behid the link is specifically only about this solution of a FIRST and LAST aggragating functions wiht GROUP BY in MySQL:
http://topwebguy.com/first-and-last-in-mysql-a-working-solution/


Joao
April 17th, 2010

I’ve just tried your approach and indeed it’s more efficient. Thanks for sharing!


ty
September 26th, 2010

I’ve tried this and helped me a lot. Thank you!


Robert Eisele
May 10th, 2011

Arnold,

I’ve written these small functions and added them to my infusion UDF: https://github.com/infusion/udf_infusion

Robert


Paolo
September 30th, 2013

Hi Joao,
thanks for this great post. How would you change the query if I wanted to group days instead of hours? For example, looking at weekly data using daily data.


Joao
September 30th, 2013

You’d need to use different mysql date time functions in the “group by” and “select” clauses.

The actual function for weekly could be:

date_format(date_sub(datetime, interval weekday(datetime)+1 DAY), '%Y-%m-%d 00:00:00')

There are others implemented here:
https://github.com/joaocosta/Finance-HostedTrader/blob/master/lib/Finance/HostedTrader/Config/Timeframes.pm#L18


Paolo
December 10th, 2013

Hi Joao,
thanks for responding on Sep 30th dated post. I also used the format for the different time frames you have listed on the linked site. If I may trouble you a bit more, I have minute data on my database and if I wanted to measure a 65 min timeframe instead of the typical 60 min. How would you modify your query. My reason is a 65 min timeframe would divide the trading day in 6 equal parts.
Again, thanks for you response.


gary
July 1st, 2014

If there are other columns in the table, how to get these other column values for the Group wise Last Row.

Leave a comment