Anonymous
Not applicable

Model Optimization: M vs. Relationships vs. DAX Tables vs. DAX Columns vs. DAX Formulas

Hi,

 

This is a PBI model optimization question.

 

Power BI provides a plethora of ways to complete the same task. As the title suggests, many things can be done at multiple stages of Power BI. Joining tables, for example: you can clean and merge them in M, or close and load and relate them through a lookup column, or, if it's a calculation, handle it with a measure, and so on.
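For the merge-in-M route, here is a minimal sketch of what the Merge Queries step looks like as code. The Orders and Customers tables, their columns, and the key are hypothetical placeholders for your own queries:

let
    // Hypothetical source tables; in practice these would be your own queries
    Orders = Table.FromRecords({
        [OrderID = 1, CustomerID = 10, Amount = 100],
        [OrderID = 2, CustomerID = 20, Amount = 250]
    }),
    Customers = Table.FromRecords({
        [CustomerID = 10, Region = "East"],
        [CustomerID = 20, Region = "West"]
    }),
    // Left outer join on the shared key (the same step the Merge Queries UI generates)
    Merged = Table.NestedJoin(Orders, {"CustomerID"}, Customers, {"CustomerID"}, "Customers", JoinKind.LeftOuter),
    // Expand only the columns you actually need from the joined table
    Expanded = Table.ExpandTableColumn(Merged, "Customers", {"Region"})
in
    Expanded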

 

Power BI can also lead to problem abstraction. That is, when working across imperfect tables, issues can be hidden and the steps needed to solve a business problem become more complex. For instance, if you don't merge tables in M and instead opt for a relationship between the two tables, then pull them into a table visualization, the row-by-row evaluation context changes how each calculation is evaluated. If later on you need an additional calculation performed over an aggregation of the combined tables, with that context excluded, you now have to work out how to override it. One way of avoiding the problem is to do as much work as possible before relationships and measures come into play.
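To make that "work before relationships and measures" idea concrete, one option is to fix the aggregation grain in M so the later calculation no longer depends on the visual's context at all. A minimal sketch, assuming a hypothetical Sales table aggregated to a Region grain:

let
    // Hypothetical merged table; replace with your own query
    Sales = Table.FromRecords({
        [Region = "East", Amount = 100],
        [Region = "East", Amount = 150],
        [Region = "West", Amount = 250]
    }),
    // Aggregate before load so the total is fixed at the Region grain
    // and no longer depends on how the report visual slices the rows
    ByRegion = Table.Group(Sales, {"Region"}, {{"TotalAmount", each List.Sum([Amount]), type number}})
in
    ByRegion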

 

My solution thus far has been to perform as many steps in M as possible: merging and appending tables, creating keys between tables, cleaning tables, and so on. I drive most of it from M.

 

This has turned out to be a suboptimal solution. M struggles with my data volumes, which run to roughly 25-100 million rows' worth of calculations. In some cases it simply freezes entirely.

 

Yet I get stuck thinking about how to solve this. Because the data involves so many transformations, it seems I have to strip away as much abstraction as possible so that issues are visible and context doesn't impede progress.

 

My question is: what is a better way to decide where, and at which stage, each step belongs within Power BI?

 

Thanks!

1 REPLY
jsh121988
Employee

I don't have a definitive answer, since, as you know, there are many methods. I always say choose the one with the best performance, and it's case by case.

 

In my case, I pull about 2-3 million rows for wider tables and around 30 million for skinnier ones (10 columns, mostly numbers). I use SQL to handle all transformations, as it's easier to bulk modify and read. I tend to avoid Power Query unless I'm exploring the data. I avoid giant flattened tables, as they impact performance on the SQL server, and instead use relationships. Moving away from M has increased performance, in my opinion.

 

If you really need to, perhaps you can have an ETL process that runs on your source and precalculates your logic. We do this in extreme cases in my org.

 

I did see a suggestion recently on a workaround for incremental refresh. You split your incoming data into two tables, one that will refresh and one that won't, then append/union the data in M. For the table that doesn't refresh, you need to make sure the data will never need a refresh. I haven't put this into practice, but with a little research it may be something that can help you.
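For what it's worth, the append step in that workaround is a single Table.Combine call. A minimal sketch, assuming hypothetical queries named HistoricalSales (the static extract that is never refreshed) and CurrentSales (the smaller query that refreshes on schedule), with matching columns:

let
    // HistoricalSales and CurrentSales are hypothetical query names defined elsewhere in the model;
    // both must share the same column names and types for the append to line up
    Combined = Table.Combine({HistoricalSales, CurrentSales})
in
    Combined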
