rogersmj
Frequent Visitor

How to manage lifecycle and deployment for datasets that have common and variable elements at scale?

Our company is building a data platform and SaaS product suite, and part of that suite involves embedding Power BI.

 

For our first product, we developed a standard Power BI model and set of reports -- every customer got the same thing. An ETL engine behind the scenes brings in everyone's data and standardizes it to fit the Power BI model. That model is huge and complex, and is core to our niche. We have a template PBIX that we version control, and at deployment time we have a tool that uses the Power BI API to upload that file to the Power BI service and set its parameters for each of our customers (each customer has an instance of the same report/dataset in the service).
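For context, our per-customer deployment step is conceptually something like this (a simplified sketch, not our actual code -- the token handling, workspace IDs, and parameter names are placeholders, and it doesn't wait for the async import to finish):

import requests

API = "https://api.powerbi.com/v1.0/myorg"

def deploy_template(token, workspace_id, pbix_path, dataset_name, parameters):
    """Upload the template PBIX to a customer's workspace and set its parameters."""
    headers = {"Authorization": f"Bearer {token}"}

    # Import (or overwrite) the version-controlled template PBIX in the customer's workspace
    with open(pbix_path, "rb") as f:
        requests.post(
            f"{API}/groups/{workspace_id}/imports",
            headers=headers,
            params={"datasetDisplayName": dataset_name, "nameConflict": "CreateOrOverwrite"},
            files={"file": f},
        ).raise_for_status()

    # Find the dataset the import produced, then apply per-customer parameter values
    datasets = requests.get(f"{API}/groups/{workspace_id}/datasets", headers=headers).json()["value"]
    dataset_id = next(d["id"] for d in datasets if d["name"] == dataset_name)
    requests.post(
        f"{API}/groups/{workspace_id}/datasets/{dataset_id}/Default.UpdateParameters",
        headers={**headers, "Content-Type": "application/json"},
        json={"updateDetails": [{"name": k, "newValue": v} for k, v in parameters.items()]},
    ).raise_for_status()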

 

Now, however, we are trying to expand our offering and develop additional "modules" that involve other types of data which relate back to the core model. Not all customers will have the same modules. Some have CORE + A, some have CORE + B, some have CORE + B + C, etc. Each of these modules is particular to a domain of expertise, so each should be treated as discrete from the others and could be developed by different people/teams. But we want them all to extend/build upon the CORE model, which they cannot change. For that reason -- and for development parallelism -- we don't want to build all these different domains into one giant model. We would like to be able to reference the CORE in secondary models that add the other elements.
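To make the combinations concrete, the mapping that has to drive deployment is essentially this (illustrative only -- customer names and module letters are made up):

# Which modules each customer is entitled to (illustrative)
CUSTOMER_MODULES = {
    "customer_001": ["CORE", "A"],
    "customer_002": ["CORE", "B"],
    "customer_003": ["CORE", "B", "C"],
}

def customers_with(module):
    """All customers whose offering includes the given module."""
    return [c for c, mods in CUSTOMER_MODULES.items() if module in mods]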

 

It doesn't seem like there's any way to do this without creating a maintenance nightmare: making completely unlinked copies of the original core data model, building on them, and deploying them all as unique artifacts. And I'm talking hundreds of customers here, so we don't want to do that. We don't want to get to a place where, if we make an update to, say, a measure or table in the CORE model, we have to go manually update 100+ different Power BI files. Same with measures in one of the extension modules, for that matter -- if 25 customers have module A, we don't want to update those 25 datasets manually either.
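Even if we script the propagation, the "unlinked copies" approach means a change to CORE (or to any module) turns into rebuilding and redeploying a monolithic per-customer file for every affected customer -- roughly the loop below (a sketch reusing the helpers above; workspace_for_customer, pbix_for_customer, and params_for_customer are placeholder lookups). The upload part is easy to automate; producing 100+ rebuilt PBIX files is the part we want to avoid.

def propagate_module_update(token, module):
    """Redeploy the rebuilt per-customer PBIX to every customer that has the changed module."""
    for customer in customers_with(module):
        deploy_template(
            token,
            workspace_for_customer(customer),  # placeholder: customer -> workspace ID
            pbix_for_customer(customer),       # placeholder: the rebuilt monolithic PBIX for this customer
            f"{customer}-analytics",
            params_for_customer(customer),     # placeholder: per-customer parameter values
        )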

 

We've found that you can create a new report in Power BI Desktop and connect it to an existing dataset in the Power BI service (this is the shared datasets concept), but you can't extend it by adding new tables and relationships that exist only in your secondary report. We also haven't been able to find any way to treat the Power BI schema as code and deploy discrete objects within it to targeted datasets -- I think that would be ideal, because then we could make one change in a module's model (to a measure calculation, for example) and, with the push of a button, propagate that new object definition to all customers who use it.
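What we're imagining is roughly the following: object definitions (measures, tables, relationships) live in source control per module, and a change gets pushed to every dataset that uses that module. The push_object() call below is hypothetical -- that's exactly the piece we can't find a supported way to do:

# Hypothetical "schema as code" flow -- push_object() does not correspond to any
# real API we've found; it's only here to show the workflow we'd like.
GROSS_MARGIN_MEASURE = {
    "table": "Sales",
    "name": "Gross Margin %",
    "expression": "DIVIDE([Gross Margin], [Revenue])",
}

def push_measure_update(token, module, measure_def):
    for customer in customers_with(module):
        push_object(  # hypothetical: deploy one object definition into an existing dataset
            token,
            workspace_for_customer(customer),  # placeholder lookup
            dataset_for_customer(customer),    # placeholder lookup
            measure_def,
        )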

 

No matter which way we pull on this thread, we seem to run into a dead end. There doesn't seem to be any way to share the common elements of a "global" data model while also selectively extending it, without digging ourselves into a maintenance nightmare. Any suggestions/ideas would be most welcome.