Simulating First/Last aggregate functions in MySQL
By Joao
I came across an interesting post at Xaprb.com on selecting the first and last row of a group in MySQL. This solution builds on a comment in that post about using GROUP_CONCAT to achieve this.
As a practical example of the usage of the first and last aggregate functions, consider an application converting financial data between different timeframes.
The table structure might look like this:
CREATE TABLE EURUSD_3600 (
datetime datetime NOT NULL,
open float NOT NULL,
low float NOT NULL,
high float NOT NULL,
close float NOT NULL,
PRIMARY KEY (datetime)
);
Assuming data in the hourly timeframe (3600 is the number of seconds in one hour):
INSERT INTO EURUSD_3600 SELECT '2001-01-03 00:00:00',0.9507,0.9505,0.9509,0.9506;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 01:00:00',0.9506,0.9492,0.951,0.9496;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 02:00:00',0.9496,0.9495,0.9509,0.9505;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 03:00:00',0.9504,0.9498,0.9508,0.9499;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 04:00:00',0.9499,0.9499,0.9507,0.9503;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 05:00:00',0.9503,0.9503,0.9506,0.9506;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 06:00:00',0.9506,0.9505,0.9507,0.9505;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 07:00:00',0.9505,0.9503,0.9513,0.9509;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 08:00:00',0.951,0.9468,0.951,0.9489;
INSERT INTO EURUSD_3600 SELECT '2001-01-03 09:00:00',0.949,0.9489,0.9543,0.9539;
SQL is a suitable tool to group this data into a more compact timeframe, for instance, the 2-hour timeframe. By using a formatting function on the datetime field, a standard GROUP BY SQL statment can be used to achieve this:
CONCAT(
year(datetime), '-',
month(datetime), '-',
day(datetime), ' ',
floor(hour(datetime) / 2) * 2, ':00:00'
)
Calculating the low and high of the period is easy, using the standard min/max aggregate functions:
SELECT
CAST(
CONCAT(
year(datetime), '-',
month(datetime), '-',
day(datetime), ' ',
floor(hour(datetime) / 2) * 2, ':00:00'
) AS DATETIME) AS datetime,
MIN(low) AS low,
MAX(high) AS high
FROM EURUSD_3600
GROUP BY CAST(
CONCAT(
year(datetime), '-',
month(datetime), '-',
day(datetime), ' ',
floor(hour(datetime) / 2) * 2, ':00:00'
) AS DATETIME)
GROUP_CONCAT can be used for the opening(first) and close(last), the trick is to be sure to use the ORDER BY clause in the group concatenation, otherwise the returned first and last records might not be as expected:
SUBSTRING_INDEX(
GROUP_CONCAT(CAST(open AS CHAR) ORDER BY datetime),
',',
1
) as open
SUBSTRING_INDEX(
GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime),
',',
-1
) as close
GROUP_CONCAT returns a comma separated string of all groups values, then SUBSTRING_INDEX is used to return the first and last values of the group. This works as long as data is properly sorted and the resulting string fits in the GROUP_CONCAT buffer.
Note however that there is a problem with the preceding code when calculating the close value. SUBSTRING_INDEX looks from the end of the string, but the GROUP_CONCAT buffer has a limited size, and the it will be truncated if the concatenated rows exceed the buffer size, in which case the returned value will not be the last in the group as desired.
An easy solution is to reverse the sort order and fetch the first element instead:
SUBSTRING_INDEX(
GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime DESC),
',',
1
) as close
MySQL will still throw a warning about truncated group_concat when there are too many rows in the source table, but this can be safely ignored for the purpose of this example.
The final query turns out as:
SELECT
CAST(
CONCAT(
year(datetime), '-',
month(datetime), '-',
day(datetime), ' ',
floor(hour(datetime) / 2) * 2, ':00:00'
) AS DATETIME) AS datetime,
SUBSTRING_INDEX(
GROUP_CONCAT(CAST(open AS CHAR) ORDER BY datetime),
',',
1
) as open,
MIN(low) AS low,
MAX(high) AS high,
SUBSTRING_INDEX(
GROUP_CONCAT(CAST(close AS CHAR) ORDER BY datetime DESC),
',',
1
) as close
FROM EURUSD_3600
GROUP BY CAST(
CONCAT(
year(datetime), '-',
month(datetime), '-',
day(datetime), ' ',
floor(hour(datetime) / 2) * 2, ':00:00'
) AS DATETIME);
No related posts.
Related posts brought to you by Yet Another Related Posts Plugin.
aggregate , function , mysql 


October 12th, 2009
Polprav
October 22nd, 2009
Hello from Russia!
Can I quote a post in your blog with the link to you?
Joao
October 26th, 2009
Sure. When you’re done send me the link so I can see it.
stephen
November 19th, 2009
I am trying to turn this into a function, but it won’t work. Please, correct me.
DELIMITER $$
DROP FUNCTION IF EXISTS `test`.`last`$$
CREATE FUNCTION `test`.`last`(f1 varchar(12),f2 varchar(12)) RETURNS varchar(12) CHARSET latin1
return substring_index(group_concat(f1 order by f2 desc),’,',1);
$$
DELIMITER ;
Joao
November 20th, 2009
Not sure you can do this as a function using the same underlying logic.
My first approach was to create an aggregate UDF, but that didn’t work because you can’t guarantee the order of data going into the udf.
Arnold Daniels
November 28th, 2009
Nice little hack
.
You could do this with a MySQL UDF though. Like the above function you need to pass the field with the value as well as the field to order by.
SELECT agg_last(close, datetime AS `desc`) FROM …
Joao
November 28th, 2009
But how do you guarantee sort order ? That’s the bit I couldn’t figure out. Does “datetime as `desc`” somehow make it happen, or is it something that can be implemented inside the UDF ?