Pivoting data is a rather common problem that comes in many different flavors. At its heart, the requirement is to transpose data from multiple rows into columns of a single row.
This requirement is particularity common in a reporting context. The following explanation is therefore based on a query that reports monthly revenues:
SELECT EXTRACT(YEAR FROM invoice_date) year
, EXTRACT(MONTH FROM invoice_date) month
, SUM(revenue) revenue
FROM invoices
GROUP BY EXTRACT(YEAR FROM invoice_date)
, EXTRACT(MONTH FROM invoice_date)
The query returns the result in a purely vertical form—one value per row. Quite often, the data is required in another form: for example, just one row per year and a separate column for each month. In other words, the rows of a year should be turned into columns.
The first step in implementing this requirement is to remove the month from the group by
and select
clauses to get one row per year:
SELECT EXTRACT(YEAR FROM invoice_date) year
, SUM(revenue) total_revenue
FROM invoices
GROUP BY EXTRACT(YEAR FROM invoice_date)
Obviously, the result not longer provides a monthly breakdown, nevertheless, this step is required to condense the result to a single row per year.
The next step is to define twelve columns where each column summarizes the revenues of one month only. To get the revenue for January, for example, the expression sum(revenue)
must take only invoices from January into account. This can be easily accomplished with the filter
clause:
sum(revenue) FILTER (WHERE EXTRACT(MONTH FROM invoice_date) = 1)
The filter
clause limits the rows aggregated to those satisfying the condition in parenthesis. In this example, only the invoices from January. The revenues of the other months can be obtained in the same way.
To make the query more literate, the extract
expression can be moved to a central location. That could be a generated column or a view so that other queries could reuse this expressions. For this example, it is sufficient to centralize the extract
expression within the query—either using the with
clause, or as an inline view:
SELECT year
, SUM(revenue) FILTER (WHERE month = 1) jan_revenue
, SUM(revenue) FILTER (WHERE month = 2) feb_revenue
...
, SUM(revenue) FILTER (WHERE month = 12) dec_revenue
FROM (SELECT invoices.*
, EXTRACT(YEAR FROM invoice_date) year
, EXTRACT(MONTH FROM invoice_date) month
FROM invoices
) invoices
GROUP BY year
Conforming Alternatives
Even though the filter
clause was introduced with SQL:2003, it is hardly supported today. Luckily, this is not a big problem because case
can be used for the very same purpose. The trick is to map values that do not satisfy the filter criteria to neutral values, which do not change the result of the aggregation. Null
is a very good choice for this because it does not change the result of any aggregate function—not even avg
. Furthermore, else null
is the default clause for case
expressions without explicit else
clause anyway—it is sufficient to skip the else
clause altogether.
SELECT year
, SUM(CASE WHEN month = 1 THEN revenue END) jan_revenue
, SUM(CASE WHEN month = 2 THEN revenue END) feb_revenue
...
, SUM(CASE WHEN month = 12 THEN revenue END) dec_revenue
FROM (SELECT invoices.*
, EXTRACT(YEAR FROM invoice_date) year
, EXTRACT(MONTH FROM invoice_date) month
FROM invoices
) invoices
GROUP BY year
The expression CASE WHEN month = 1 THEN revenue END
evaluates to the revenue for invoices from January. For other invoices, the implied else null
returns the null
value, which does not change the result of sum
. See also “Null
in Aggregate Functions (count, sum, …)” and “Conforming Alternatives to filter
”.
If you like this page, you might also like …
… to subscribe my mailing lists, get free stickers, buy my book or join a training.
The Special Case of EAV
The greatest challenge with the pivot problem is to recognize it when you run into it. This is particularity true when dealing with the so-called entity-attribute-value (EAV) model: it does not look like a pivot problem, but it can nevertheless be solved in the very same way.
The EAV model takes normalization to the extreme and no longer uses columns in the traditional way. Instead, every single value is stored in its own row. Besides the value, the row also has a column to specify which attribute the value represents and a third column to identify what entity the values belongs to. Ultimately, a three column table can hold any data without ever needing to change the table definition. The EAV model is thus often used to store dynamic attributes.
The EAV model does not come without drawbacks: It is almost impossible to use constraints for data validation, for example. However, the most puzzling issue with the EAV model is that the transformation into a one-column-per-attribute notation is almost always done using joins—quite often one outer join per attribute. This is not only cumbersome, it also results in very poor performance—a true anti-pattern.
However, turning rows into columns is the pivot problem in its purest form. Therefore, these steps should be followed again: (1) use group by
to reduce the rows to one row per entity, (2) use filter
or case
to pick the right attribute for each column.
SELECT submission_id
, MAX(CASE WHEN attribute='name' THEN value END) name
, MAX(CASE WHEN attribute='email' THEN value END) email
, MAX(CASE WHEN attribute='website' THEN value END) website
FROM form_submissions
GROUP BY submission_id
Note the use of the max
function: it is required to reduce the rows of the group (all attributes) into a single value. This is a purely syntactic requirement that is applicable regardless of the actual number of rows that are grouped.
To obtain the original value for each attribute—even though we have to use an aggregate function—the respective filter logic (case
or filter
) must not return more than one not-null
value. In the example above, it is crucial that each of the named attributes (name
, email
, website
) exists only once per submission_id
. If duplicates exist, the query returns only one of them.
The prerequisite that each attribute must not appear more than once is best enforced by a unique constraint.0 Alternatively, the query can count the aggregated rows using count(*)
and the respective case
expressions (or filter
clauses). The results can be validated in the application—if selected as additional columns—or in the having
clause: having count(*) filter (...) <= 1
.
If the prerequisite is satisfied and the aggregation is always done on a single not-null
value, every aggregate function just returns the input value. However, min
and max
have the advantage that they also work for character strings (char
, varchar
, etc.).
Limitations
SQL is a statically typed language: the query must list the result columns upfront. To pivot a table with unknown or dynamic attributes, multisets or document types (XML, JSON) can be used to represent a key-value store in a single column. See conforming alternatives to listagg
: document types.
Compatibility
Proprietary Alternatives
pivot
and unpivot
(SQL Server, Oracle)
SQL Server supports the pivot
and unpivot
clauses since version 2005 (documentation). They are also available in the Oracle Database since version 11g (documentation).
model
clause (Oracle)
The proprietary model
clause, which was introduced with Oracle Version 10g, can also be used to solve the pivot problem (documentation).
crosstab
table function (PostgreSQL)
The PostgreSQL database is delivered with a set of crosstab
table functions to pivot data (documentation).