I'm trying to implement a range of data governance workflows in an organizational context in Power BI Desktop and the Power BI service. I want to programmatically capture transforms that occur on data, specifically aiming for column and ID/code text label transforms, and I ultimately want to manage these in CDM folders on Azure storage as entity mapping terms. In essence, it's to support a controlled taxonomy of terms.
Even if some parts require manual intervention, that's OK. I can do it all manually, but that simply won't be sustainable.
This is a long, multipart use case (a thought dump). I don't expect a full answer, but a simple... you're dreaming / yes it's possible, use this approach / look at this tool, it does similar things... would be so, so appreciated! 🙂
Request: I'm hoping for some inspiration on how to automate the extraction of a set of data transformations in the Power BI stack
Goal: Automatically (as much as possible) capture Power Query transformation actions, such as column or text pattern replacements, that occur on original SQL/CSV column names or text strings, e.g. IDs or short codes that are changed to user-friendly labels for final BI reports.
Tech requirements: It's a government agency built on .NET and Azure, and centered on Power BI, so tools in those frameworks are key dependencies.
For background... I use RDF https://www.w3.org/RDF/ , SKOS https://www.w3.org/TR/skos-reference/#labels and an enterprise-level vocabulary manager to do all this with other clients. But this specific client doesn't have the option of that level of investment... yet they still want data governance ( he bangs head against wall )
My envisaged workflow is:
1. BI report authors publish a report and/or dataset to Power BI service
2. a scheduled event... a dataflow Power Query script, an R script, an Azure service, a PowerShell script, a cron job, or something else!... runs and performs introspection on a) each table in the loaded data model and/or b) dumps a copy of the M code for parsing.
3. a script extracts key metadata about a set of transformations that I define. E.g. I want to capture all text-replace function calls (such as Table.ReplaceValue with Replacer.ReplaceText, or Table.RenameColumns) on columns or text patterns, so that textOriginal and textFinal are captured (plus some other metadata). Hence, at this stage, I see that I need to be able to parse the M query itself, not just get the "final" table headers or cell contents
3. create, append, or export this information to a persisted file or dataset, e.g.:
3a) directly create/append data to a new Power BI table; or
3b) if an intermediate step is required, a CSV (for example in SharePoint), then run a script to ingest this into a Power BI dataset/dataflow
4. export/serialize the final table/data file in the required CDM folder structure (I would most likely use Python)
5. load the custom entity mapping to the CDM folder (Python)
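For steps 4 and 5, here's a minimal sketch of what the serialization could look like, assuming the mapping rows have already been extracted. The entity name, attribute names, and partition URL are all hypothetical, and this only emits a small subset of the model.json format, so it should be checked against the current CDM folder metadata spec:

```python
import json

def build_model_json(entity_name, attribute_names, partition_url):
    """Build a minimal CDM-folder model.json describing one entity.

    Follows the documented CDM 'model.json' metadata format, but only
    a tiny subset of it -- verify field names against the current spec."""
    return {
        "name": "GovernanceMappings",
        "version": "1.0",
        "entities": [
            {
                "$type": "LocalEntity",
                "name": entity_name,
                "attributes": [
                    {"name": a, "dataType": "string"} for a in attribute_names
                ],
                "partitions": [
                    # Each partition points at a data file (e.g. a CSV in ADLS Gen2)
                    {"name": f"{entity_name}-part-0", "location": partition_url}
                ],
            }
        ],
    }

model = build_model_json(
    "TermMapping",
    ["entity", "textOriginal", "textFinal"],
    # Hypothetical ADLS Gen2 URL -- replace with your CDM folder path
    "https://myaccount.dfs.core.windows.net/powerbi/governance/TermMapping/part0.csv",
)
print(json.dumps(model, indent=2))
```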
So, what have I achieved so far...
6. I've experimented with some R scripts to export a table as CSV. This works fine, but it only gets the end-state column labels or cell contents
7. I've set up a bunch of regex functions that extract all the text-replace functions, plus NameOriginal and NameFinal, on columns or text strings from the M code.
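To illustrate what I mean in step 7, here's a minimal sketch of that kind of regex extraction, in Python for readability (the M snippet and the patterns are illustrative only; a naive regex pass like this will miss dynamic arguments, escaped quotes, and nested expressions, which is why a real M parser would be better):

```python
import re

# Hypothetical snippet of M code as it might be dumped from a dataset
m_code = '''
let
    Source = Csv.Document(File.Contents("customers.csv")),
    Renamed = Table.RenameColumns(Source, {{"CUST_ID", "Customer ID"}, {"SEG_CD", "Segment"}}),
    Replaced = Table.ReplaceValue(Renamed, "N", "No", Replacer.ReplaceText, {"Segment"})
in
    Replaced
'''

# Find the list argument of Table.RenameColumns, then each {"old", "new"} pair
RENAME_CALL = re.compile(r'Table\.RenameColumns\([^,]+,\s*\{(.*?)\}\)', re.S)
RENAME_PAIR = re.compile(r'\{"([^"]*)",\s*"([^"]*)"\}')
# Capture the old/new value arguments of Table.ReplaceValue
REPLACE_CALL = re.compile(r'Table\.ReplaceValue\([^,]+,\s*"([^"]*)",\s*"([^"]*)"')

def extract_mappings(m_text):
    """Return (textOriginal, textFinal, kind) tuples from an M script."""
    found = []
    for call in RENAME_CALL.finditer(m_text):
        for old, new in RENAME_PAIR.findall(call.group(1)):
            found.append((old, new, "columnRename"))
    for old, new in REPLACE_CALL.findall(m_text):
        found.append((old, new, "valueReplace"))
    return found

pairs = extract_mappings(m_code)
print(pairs)
```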
Next steps/ Gaps
8. work out how to export this data to a SharePoint folder
9. how to run "something" automatically to extract the M code to somewhere, e.g. a SharePoint or Power BI storage folder?
10. how to run the regex parsing on this file
10a) I assume PowerShell is the tool here. And it could be scheduled?
Can PowerShell tasks be automated on Azure (Azure Automation, perhaps)? I guess yes.
10b) It appears that R can now do regex, so I wonder if R scripts can be scheduled in the Power BI service in any way?
11. Key issue... I haven't found a likely candidate for a scheduled/event-driven process to create/append this data to a Power BI dataset or report, e.g. M code that can be scheduled to create a staging table/data
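On point 11, one possible candidate might be the Power BI REST API's push datasets, which let you POST rows into a dataset table. A hedged sketch of building that request (the dataset ID and token are placeholders, the target dataset must have been created as a push dataset beforehand, and push datasets have row/volume limits to check):

```python
import json
import urllib.request

PBI_API = "https://api.powerbi.com/v1.0/myorg"

def build_push_rows_request(dataset_id, table_name, rows, token):
    """Build the 'add rows' request for a Power BI push dataset.

    Uses the documented REST route:
    POST /datasets/{id}/tables/{table}/rows"""
    url = f"{PBI_API}/datasets/{dataset_id}/tables/{table_name}/rows"
    body = json.dumps({"rows": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )

def push_rows(request):
    # Performs the actual call; needs a valid Azure AD access token.
    with urllib.request.urlopen(request) as resp:
        return resp.status

# Hypothetical mapping rows extracted from the M code
req = build_push_rows_request(
    "00000000-0000-0000-0000-000000000000",  # placeholder dataset id
    "TermMapping",
    [{"textOriginal": "CUST_ID", "textFinal": "Customer ID"}],
    "<access-token>",  # placeholder token, not a real credential
)
```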
At this point I'm expecting to have a set of scripts running on an Azure server, likely with Python/Flask to orchestrate things.
I've connected to the tabular model of Power BI Desktop files a lot.
I know that once you're in the tabular model or .bim files, all the DAX, and I think the M code, can be obtained, so I could then parse these. But it's all manual so far as I've seen, via DAX Studio or Excel.
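To make that .bim route less manual: the file is just TMSL JSON, so the M expressions can be pulled out of each table's partitions programmatically. A sketch against a trimmed-down, hypothetical .bim fragment (real files hold much more, and the partition `expression` can be a string or a list of lines, so both shapes are handled):

```python
import json

# A hypothetical, heavily trimmed .bim (TMSL) fragment
bim_text = '''
{
  "name": "SalesModel",
  "model": {
    "tables": [
      {
        "name": "Customer",
        "partitions": [
          {
            "name": "Customer-p1",
            "source": {
              "type": "m",
              "expression": [
                "let",
                "    Source = Sql.Database(\\"srv\\", \\"db\\")",
                "in",
                "    Source"
              ]
            }
          }
        ]
      }
    ]
  }
}
'''

def extract_m_expressions(bim_json):
    """Yield (table, partition, m_code) from a TMSL .bim document."""
    for table in bim_json.get("model", {}).get("tables", []):
        for part in table.get("partitions", []):
            source = part.get("source", {})
            if source.get("type") == "m":
                expr = source.get("expression", "")
                if isinstance(expr, list):
                    expr = "\n".join(expr)
                yield table["name"], part["name"], expr

scripts = list(extract_m_expressions(json.loads(bim_text)))
print(scripts)
```

The extracted M text could then be fed straight into the regex extraction from step 7.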
Any ideas on automating this?
Can the same be achieved against the Power BI service? Or Azure Data Lake Storage Gen2?