Creating Custom Aggregate Functions in PostgreSQL using PL/PERL

This article describes how to create aggregate functions, similar to SQL's min and max, to summarise data for you in queries.

Postgres supports the creation of custom aggregate functions in various languages, including PL/SQL, PL/Perl, PL/Python, and PL/C. The general method is described in their documentation. Using PL/PERL for custom functions is also described. However, there is no good description of how to the aggregate function operations using Perl.

The only relevant resource I could find was how to create aggregate functions using PL/SQL. This is fine if the type you are attempting to summarise is "accessible" using SQL. Unfortunately Postgres's path data type does not have a feature allowing you to extract the actual points that compose that path (!). Hence, some text processing is necessary, and hence I used Perl.

An aggregate requires three things:

A new type that holds the running state of the aggregate's result (between the processing of each record).
A function to calculate the aggregate from the records passed to it.
A new aggregate definition that specifies the input (base) type, the state variable type, and the function to be used for calculating the aggregate.

The aggregate will normally return a tuple of the type of the state variable, but this can be changed by using a final function to post-process the result of the aggregate, and provide a more useful return type. See the pages above for details.

The function to be used for the aggregate must accept two arguments. The type of the first is the type of the state variable, whilst the type of the second is the type of the column to which the aggregate function will be applied (the base type specified when the aggregate is defined). Note that you can have polymorphic functions (see the web pages above).

An important point is that SQL is case insensitive. This means that all the fields of your custom type, as well as your function name should be lower case. Use underscores to separate words. If you do not do this, you risk errors of the form "Perl hash contains non-existant column %s", where %s is the field name in question.

The state variable is passed to the function as a reference to a hash, which then has values for keys that have the same name as the fields of the custom type you defined.

The function itself can use any of the normal Perl constructs. However, for debugging, print does not work, so one way to achieve a similar effect is to define a field of type TEXT in your custom type, which you then write into with whatever debug output you wish. This will then be printed in the result tuple.

Given the above, here is an example:

The custom type is defined first as:
```
CREATE TYPE bounding_lat_long AS (
  minlat FLOAT,
  maxlat FLOAT,
  minlong FLOAT,
  maxlong FLOAT
);
```
(Note that you could additionally have the debug TEXT line at the end).

The function is then given as below. Note that each block must be delimited, which can be using quote marks (inconvenient as then any inside the block must be escaped), or using dollar signs. The delimeters can be made different from each other by enclosing any string between the dollar signs. The function that follows is designed to find the minimum latitude and longitude values from the collection of all the points that are in all the paths of the records that are being aggregated. Note that $state of type bounding_lat_long is used to store the maximum and minimum latitude and longitude values found thus far, between records. The final contents of that hash is returned as the result.

CREATE OR REPLACE FUNCTION get_bound_lat_long(bounding_lat_long, path) RETURNS bounding_lat_long AS $f$
 my $state = $_[0];
 my $pathToEval = $_[1];
 my $count = 0;

 #my $refTest;

 my @points = split /\),\(/, $pathToEval;

 my $prevMaxLat = $state->{maxlat};
 my $prevMinLat = $state->{minlat};
 my $prevMaxLong = $state->{maxlong};
 my $prevMinLong = $state->{minlong};

 while ($count <= $#points) {

  $points[$count] =~ s/[\(\)]//g;
  $points[$count] =~ s/[\[\]]//g;
  $points[$count] =~ /^(.*?),(.*?)$/;
  my $lat = $1;
  my $long = $2;

  #$refTest = $2;

 if (!defined($prevMaxLat)) {
  $prevMaxLat = $lat;
 }
 if (!defined($prevMaxLong)) {
  $prevMaxLong = $long;
 }
 if (!defined($prevMinLat)) {
  $prevMinLat = $lat;
 }
 if (!defined($prevMinLong)) {
  $prevMinLong = $long;
 }
  
  if ($lat > $prevMaxLat) {
   $prevMaxLat = $lat;
  }
  if ($lat < $prevMinLat) {
   $prevMinLat = $lat;
  }
  if ($long > $prevMaxLong) {
   $prevMaxLong = $long;
  }
  if ($long < $prevMinLong) {
   $prevMinLong = $long;
  }
  
  $count ++;
}

$state->{maxlat} = $prevMaxLat;
$state->{minlat} = $prevMinLat;  
$state->{maxlong} = $prevMaxLong;
$state->{minlong} = $prevMinLong;
#$state->{debugx} = $refTest;

return $state;

$f$ LANGUAGE plperl;

Finally, the aggregate can be defined as:

CREATE AGGREGATE min_max_lat_long (
 sfunc = get_bound_lat_long,
 basetype = path,
 stype = bounding_lat_long
);

This aggregate can be used on any field of type path:

cambridge=# select min_max_lat_long(path) from roadlinks;
            min_max_lat_long             
-----------------------------------------
 (52.209526,52.214786,0.066648,0.111773)
(1 row)