Recently, I have been faced with a challenge of pivoting many tables. Every table was completely different, I needed some preprocessing steps, like filtering rows and so on. And I also didn’t want to pivot a huge table in Power Query. The solution? A dynamic pivot written in T-SQL and called from Power Query.
Let’s start with the dynamic pivot written in T-SQL. I have asked Uncle Google for a help. There are many different solutions on the internet. At the end, I have combined some of them and created the following stored procedure. It has 3 parameters. The first one is a query which I want to pivot. It can contain some selections or projections. The second parameter is a column with an aggregation function used as a result of pivoting. And the last parameter is a name of a column which values are column names of the upcoming pivoted table.
The stored procedure has two parts. The first one gets distinct values of the column @ValueColumn which are new column names of the pivoted table. The other part is then pivoting of a query.
CREATE PROC [foo].[spDynamicPivot] (
--The source query to pivot
--The aggregation function surrounding the column name to be pivoted
-- Values of this column are column names of the upcoming pivoted table
SET XACT_ABORT ON;
SET NOCOUNT ON;
-- select values of @ValueColumn for pivoting
DECLARE @pivotColumns NVARCHAR(MAX);
DECLARE @pivotColumnsQuery NVARCHAR(MAX) = 'SELECT @pivotColumns = STUFF(
SELECT '',['' + @ValueColumn + '']'' [text()]
FROM (' + @Query + ') Q
GROUP BY ' + @ValueColumn + '
ORDER BY ' + @ValueColumn + '
FOR XML PATH ('''')
DECLARE @paramDef NVARCHAR(MAX) = '@pivotColumns NVARCHAR(MAX) OUTPUT';
EXEC sp_executesql @pivotColumnsQuery
,@pivotColumns = @pivotColumns OUTPUT;
DECLARE @pivotQuery NVARCHAR(MAX) = 'SELECT * FROM (' + @Query + ') Q PIVOT( ' + @AggregateFunction + ' FOR ' + @ValueColumn + ' IN (' + @pivotColumns + ') ) P';
EXEC sp_executesql @pivotQuery;
And now the Power Query part comes. I have created a new connection to an SQL Server using the import mode and an SQL statement. The SQL statement is a call of the above-mentioned stored procedure.
The Power Query Editor generates the following code:
Source = Sql.Database(
"<SQL Server Name>",
@Query = 'SELECT * FROM foo.bar WHERE bar_id IS NOT NULL',
@AggregateFunction = 'MAX(content)',
@ValueColumn = 'product_label'",
Now, I can create many pivot tables with just a few clicks. It is also a kind of “query folding” because the SQL query is executed on the SQL server and not in Power Query, which was one of my requirements.
There are always some disadvantages. I have found two, but there are some more for sure.
Disadvantage Nr. 1: Dynamic columns in a Power Query table
What happens if the distinct values in the column containing pivot column names have changed? The dataset refresh fails because it expects some column names, but the query delivers different ones. You must go to the Power Query Editor, refresh the preview, and apply changes. And you must repeat that every time when the underlying table has changed in the way described earlier. Not so cool ☹ If you face this problem, you can write the whole T-SQL statement with predefined column names in SQL Server Management Studio and paste it into your database connection in Power Query editor. But it won't be dynamic anymore.
Disadvantage Nr. 2: The German letter ß vs. ss
It is the same as the disadvantage Nr. 1, but much worse to find out. The problem is in the used collation. In my case, the DB collation does not differentiate between ß and ss. It means that 2 words like groß and gross are the same and therefore you once can get a column called “groß” and other times “gross”. That is a big problem because Power Query thinks there are 2 different columns with completely different column names. And your dataset refresh fails again. You have to preprocess your data before pivoting.
That’s enough for today. If you also have similar problems, let me know down in the comments.