How-to: Dynamic Pivot in Power Query / T-SQL

Nolock · ‎02-23-2020

Recently, I have been faced with a challenge of pivoting many tables. Every table was completely different, I needed some preprocessing steps, like filtering rows and so on. And I also didn’t want to pivot a huge table in Power Query. The solution? A dynamic pivot written in T-SQL and called from Power Query.

Let’s start with the dynamic pivot written in T-SQL. I have asked Uncle Google for a help. There are many different solutions on the internet. At the end, I have combined some of them and created the following stored procedure.
It has 3 parameters. The first one is a query which I want to pivot. It can contain some selections or projections. The second parameter is a column with an aggregation function used as a result of pivoting. And the last parameter is a name of a column which values are column names of the upcoming pivoted table.

The stored procedure has two parts. The first one gets distinct values of the column @ValueColumn which are new column names of the pivoted table. The other part is then pivoting of a query.

CREATE PROC [foo].[spDynamicPivot] (
	--The source query to pivot
	@Query NVARCHAR(MAX)
	--The aggregation function surrounding the column name to be pivoted
	,@AggregateFunction NVARCHAR(MAX)
	-- Values of this column are column names of the upcoming pivoted table
	,@ValueColumn NVARCHAR(MAX)
	)
AS
BEGIN
	SET XACT_ABORT ON;
	SET NOCOUNT ON;

	-- select values of @ValueColumn for pivoting
	DECLARE @pivotColumns NVARCHAR(MAX);
	DECLARE @pivotColumnsQuery NVARCHAR(MAX) = 'SELECT @pivotColumns = STUFF(
			(
				SELECT '',['' + @ValueColumn + '']'' [text()] 
				FROM (' + @Query + ') Q 
				GROUP BY ' + @ValueColumn + '
				ORDER BY ' + @ValueColumn + ' 
				FOR XML PATH ('''')
			), 
			1, 
			1, 
			''''
		)';
	DECLARE @paramDef NVARCHAR(MAX) = '@pivotColumns NVARCHAR(MAX) OUTPUT';

	EXEC sp_executesql @pivotColumnsQuery
		,@paramDef
		,@pivotColumns = @pivotColumns OUTPUT;

	-- pivot
	DECLARE @pivotQuery NVARCHAR(MAX) = 'SELECT * FROM (' + @Query + ') Q PIVOT( ' + @AggregateFunction + ' FOR ' + @ValueColumn + ' IN (' + @pivotColumns + ') ) P';

	EXEC sp_executesql @pivotQuery;
END

And now the Power Query part comes. I have created a new connection to an SQL Server using the import mode and an SQL statement. The SQL statement is a call of the above-mentioned stored procedure.

The Power Query Editor generates the following code:

let
    Source = Sql.Database(
        "<SQL Server Name>", 
        "<DB Name>", 
        [
            Query="
                EXEC foo.spDynamicPivot
                    @Query = 'SELECT * FROM foo.bar WHERE bar_id IS NOT NULL',
                    @AggregateFunction = 'MAX(content)',
                    @ValueColumn = 'product_label'",
            CreateNavigationProperties=false
        ]
    )
in
    Source

Now, I can create many pivot tables with just a few clicks. It is also a kind of “query folding” because the SQL query is executed on the SQL server and not in Power Query, which was one of my requirements.

Disadvantages

There are always some disadvantages. I have found two, but there are some more for sure.

Disadvantage Nr. 1: Dynamic columns in a Power Query table

What happens if the distinct values in the column containing pivot column names have changed? The dataset refresh fails because it expects some column names, but the query delivers different ones. You must go to the Power Query Editor, refresh the preview, and apply changes. And you must repeat that every time when the underlying table has changed in the way described earlier. Not so cool ☹
If you face this problem, you can write the whole T-SQL statement with predefined column names in SQL Server Management Studio and paste it into your database connection in Power Query editor. But it won't be dynamic anymore.

Disadvantage Nr. 2: The German letter ß vs. ss

It is the same as the disadvantage Nr. 1, but much worse to find out. The problem is in the used collation. In my case, the DB collation does not differentiate between ß and ss. It means that 2 words like groß and gross are the same and therefore you once can get a column called “groß” and other times “gross”. That is a big problem because Power Query thinks there are 2 different columns with completely different column names. And your dataset refresh fails again. You have to preprocess your data before pivoting.

That’s enough for today. If you also have similar problems, let me know down in the comments.

TimoRiikonen · ‎02-27-2020

I tried this with SQL Server 2017, but I didn't get this stored procedure to function.

First I got error "Must define @ValueColumn.

Then I changed one line as following, which fixed the error, but I got other problems:

SELECT '',[' + @ValueColumn + ']'' [text()]

Nolock · ‎03-04-2020

Hi @TimoRiikonen,

thank you very much for the bug report. There was a typo in

SELECT '',['' + @ValueColumn + '']'' [text()]

The fixed query is:

CREATE PROC [foo].[spDynamicPivot] (
	--The source query to pivot
	@Query NVARCHAR(MAX)
	--The aggregation function surrounding the column name to be pivoted
	,@AggregateFunction NVARCHAR(MAX)
	-- Values of this column are column names of the upcoming pivoted table
	,@ValueColumn NVARCHAR(MAX)
	)
AS
BEGIN
	SET XACT_ABORT ON;
	SET NOCOUNT ON;

	-- select values of @ValueColumn for pivoting
	DECLARE @pivotColumns NVARCHAR(MAX);
	DECLARE @pivotColumnsQuery NVARCHAR(MAX) = 'SELECT @pivotColumns = STUFF(
			(
				SELECT '',['' + ' + @ValueColumn + ' + '']'' [text()] 
				FROM (' + @Query + ') Q 
				GROUP BY ' + @ValueColumn + '
				ORDER BY ' + @ValueColumn + ' 
				FOR XML PATH ('''')
			), 
			1, 
			1, 
			''''
		)';
	DECLARE @paramDef NVARCHAR(MAX) = '@pivotColumns NVARCHAR(MAX) OUTPUT';

	EXEC sp_executesql @pivotColumnsQuery
		,@paramDef
		,@pivotColumns = @pivotColumns OUTPUT;

	-- pivot
	DECLARE @pivotQuery NVARCHAR(MAX) = 'SELECT * FROM (' + @Query + ') Q PIVOT( ' + @AggregateFunction + ' FOR ' + @ValueColumn + ' IN (' + @pivotColumns + ') ) P';

	EXEC sp_executesql @pivotQuery;
END

watkinnc · ‎04-06-2020

Great idea, but it sure seems like a lot. Couldn't you just query the source, and then use a reference query to get Table.ColumnNames? Then List.Count them for the # of columns? You could rename them by using each {ColumnName) & "", ""NewColumnName"", Dane idea for type changes.

Im not at my pc, but I'll post the code later. This can DEFINITELY be done in PQ, and without the drawbacks you mentioned!