This post is about a tool which converts a Power BI dataset to a Power BI Dataflow. I have analyzed the internals of PBIT files and Power BI Dataflow JSON files in depth and created a PowerShell script which converts any PBIT into Power BI Dataflow JSON.
I have been searching for a conversion tool for a long time. The only solution I found was a manual conversion, as in this blog post by @MattAllington or this post by Reza Rad. I have also recently studied the internals of PBIT/PBIX files and tried to extract as much from them as possible. From there, it was only one more step to analyze the structure of a Power BI Dataflow JSON file.
I wanted a script that does all the repetitive work for me. I do not like assembly-line work in IT!
The mighty tool I am talking about is absolutely no magic. It parses the Power Query queries, their names, the Power Query Editor groups, and some additional properties from a PBIT file. Then it transforms the parsed information into the form used by Power BI Dataflows: a JSON file used for the import and export of dataflows. An example of such a file follows:
In this project, I use the files DataMashup and DataModelSchema, both of which are stored inside the PBIT file (a PBIT file is a ZIP archive). When you open the file DataMashup, you see only what looks like binary data.
But when you scroll to the right you see there is an XML object. That is the part of the file I am interested in. It contains all the Power Query queries and their properties.
The second file, DataModelSchema, is a JSON file. It contains all tables and their columns which are loaded into the tabular model.
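To make the two files easier to inspect, here is a minimal sketch of how they could be read straight out of a PBIT file with PowerShell 5.1. It is not taken from the conversion script: the function name Get-PbitEntryBytes is hypothetical, and decoding DataModelSchema as UTF-16 LE is an assumption that you should verify against your own file.

```powershell
# Both assemblies are needed in Windows PowerShell 5.1 to work with ZIP archives.
Add-Type -AssemblyName System.IO.Compression
Add-Type -AssemblyName System.IO.Compression.FileSystem

# Hypothetical helper: returns the raw bytes of one entry of a PBIT (ZIP) file.
function Get-PbitEntryBytes {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $EntryName
    )
    $fullPath = (Resolve-Path -LiteralPath $Path).Path
    $zip = [System.IO.Compression.ZipFile]::OpenRead($fullPath)
    try {
        $entry = $zip.GetEntry($EntryName)
        if ($null -eq $entry) { throw "Entry '$EntryName' not found in '$Path'." }
        $stream = $entry.Open()
        try {
            $buffer = New-Object System.IO.MemoryStream
            $stream.CopyTo($buffer)
            # The unary comma keeps the byte array from being unrolled on output.
            return ,$buffer.ToArray()
        }
        finally { $stream.Dispose() }
    }
    finally { $zip.Dispose() }
}

# Usage, guarded so that the sketch also runs when the file is not present.
$pbitPath = 'BaseIT Dataset v1.2.pbit'
if (Test-Path -LiteralPath $pbitPath) {
    $mashupBytes = Get-PbitEntryBytes -Path $pbitPath -EntryName 'DataMashup'
    $schemaBytes = Get-PbitEntryBytes -Path $pbitPath -EntryName 'DataModelSchema'
    # Assumption: DataModelSchema is stored as UTF-16 LE text.
    $schemaJson = [System.Text.Encoding]::Unicode.GetString($schemaBytes)
    "DataMashup: $($mashupBytes.Length) bytes; DataModelSchema: $($schemaJson.Length) characters"
}
```

The DataMashup bytes still have to be searched for the XML part described above; this sketch only gets you the raw content of both files.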
DataModelSchema also stores column properties, but many of them, like summarizeBy or Format, matter for the Power BI model and not for a dataflow. The only property that matters here is the data type of a column. The rest can be ignored.
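As an illustration of keeping only the data types, the following sketch parses a small, made-up schema and lists the type of every column. The sample JSON is hypothetical and heavily shortened; it assumes the usual tabular model layout with model.tables, columns, name, and dataType, so check the property names against your own DataModelSchema.

```powershell
# Hypothetical, shortened sample of a DataModelSchema document.
$sampleSchema = @'
{
  "name": "SampleModel",
  "model": {
    "tables": [
      {
        "name": "Sales",
        "columns": [
          { "name": "OrderId", "dataType": "int64", "summarizeBy": "none" },
          { "name": "Amount", "dataType": "double", "formatString": "0.00" }
        ]
      }
    ]
  }
}
'@

$schema = $sampleSchema | ConvertFrom-Json

# Keep table name, column name, and data type; drop every other property.
$columnTypes = foreach ($table in $schema.model.tables) {
    foreach ($column in $table.columns) {
        [pscustomobject]@{
            Table    = $table.name
            Column   = $column.name
            DataType = $column.dataType
        }
    }
}

$columnTypes | Format-Table -AutoSize
```

Properties such as summarizeBy are simply never copied, which mirrors the idea that only the data type is relevant for a dataflow.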
How to use the script
The script is written in PowerShell 5.1. Plenty of functions are defined at the beginning. Execution starts at the end of the script.
But first, navigate to the directory where your PBIT file is stored. Then go to the end of the script and set the variable $fileName to the name of your PBIT file. The output file is generated in the same directory, named after your PBIT file with “.json” appended. You can change that name too if needed. The last line calls the function GenerateMigrationString and saves its return value to the output file.
# name of the input PBIT file
$fileName = "BaseIT Dataset v1.2.pbit"
# name of the output JSON file
$jsonOutputFileName = $fileName + ".json"
# generate the migration string from a PBIT file
GenerateMigrationString $fileName | Out-File $jsonOutputFileName -Encoding utf8
The last step is an import into Power BI Dataflows as you can see in the following screenshot.
I have tested the code with a huge dataset having over 300 complex queries in its ETL process.
And the working result in Power BI Dataflows:
I would like to describe some limitations of Power BI source files and Power BI Dataflows.
Group names and group hierarchy
While analyzing the structure of a PBIT/PBIX file, I found out that I can parse a group ID of a Power Query Group, but not its name. Moreover, I could not read the hierarchy of groups.
Both of these properties are stored encrypted in the file DataMashup, as you can see in the following screenshot.
I tried to decode the value with a Base64 decoder, but I got only a binary object. My next idea was to check whether it is an encoded table, as described in Power Query Enter Data Explained. That did not work either. If somebody has an idea how to decode and interpret the group names and the group hierarchy, please let me know.
The exact order of properties
There were some stumbling blocks during the development. One of them is the order of properties. Do not ask me why, but sometimes the order of properties in the dataflow JSON import file matters. If you do not keep the exact order, the import file is rejected by Power BI Dataflows. I worked with objects which are serialized to JSON. At the beginning, I did not know how to force the JSON serializer to emit the properties in an exact order. The solution was to add the properties one by one with the Add-Member cmdlet, which keeps them in the order in which they were added.
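A minimal sketch of this technique follows. The property names and values are hypothetical and do not reflect the real dataflow schema; the point is only that properties added with Add-Member are serialized in the order in which they were added, whereas a plain hashtable gives no such guarantee.

```powershell
# Build the object property by property; the insertion order is preserved.
$entity = New-Object PSObject
$entity | Add-Member -MemberType NoteProperty -Name 'name'        -Value 'Sales'
$entity | Add-Member -MemberType NoteProperty -Name 'description' -Value ''
$entity | Add-Member -MemberType NoteProperty -Name 'queries'     -Value @('Query1', 'Query2')

# Serialize; "name" comes first, then "description", then "queries".
$json = $entity | ConvertTo-Json -Depth 10 -Compress
$json
```

An ordered dictionary created with [ordered]@{ ... } is another way to get a stable property order in PowerShell 3.0 and later.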
Do you know the record #shared? It contains all built-in and custom functions and all your custom queries. More about that for example here. The problem is that this record works only in Power BI Desktop and cannot be used in the Power BI Service. The PowerShell script therefore skips all queries containing the keyword #shared and writes a warning like “WARNING: The query 'Record Table' uses the record #shared. This is not allowed in Power BI Dataflows and the query won't be migrated.”
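Such a check could look roughly like the sketch below. It is not the code of the conversion script: the function name Test-UsesSharedRecord and the two sample queries are hypothetical. Note that a plain text match also flags #shared inside comments or string literals, which is acceptable for a conservative filter.

```powershell
# Hypothetical helper: true if the M code of a query mentions #shared.
function Test-UsesSharedRecord {
    param([string] $QueryText)
    # -cmatch is case-sensitive, like the M language itself.
    return $QueryText -cmatch '#shared\b'
}

# Hypothetical sample queries, keyed by query name.
$queries = [ordered]@{
    'Record Table' = 'let Source = Record.ToTable(#shared) in Source'
    'Sales'        = 'let Source = #table({"A"}, {{1}}) in Source'
}

# Collect the names of the queries that can be migrated; warn about the rest.
$migratable = foreach ($name in $queries.Keys) {
    if (Test-UsesSharedRecord -QueryText $queries[$name]) {
        Write-Warning "The query '$name' uses the record #shared. This is not allowed in Power BI Dataflows and the query won't be migrated."
    }
    else {
        $name
    }
}

$migratable
```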
Multiline comments in Power BI Dataflows
At the time of writing, Power BI Dataflows do not support multiline comments. There is already an official issue and the bug will be fixed in the near future.
I have a dataset containing an ETL process with more than 300 queries. If I wanted to migrate this dataset into Power BI Dataflows manually, it would take hours or even days. Thanks to this script, the job is done in minutes. And so is every subsequent dataset 😊