Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

DAX and Power BI Optimization

I m new to DAX and Power BI and need help in improving performace in Power BI.
There are 25-30 tables that contain about 15 columns each and is expected to have about 2 million rows.
We are using Direct query mode in Power BI and standard DB design optimizations have been incorporated.

 

Now we need to handle optimizations on Power BI and DAX.

 

So your answers to the below questions will help me improve performance:

 

1. Hiding unused columns

Most of the columns are not used directly in reports. They are used as part of a Measure.
Will it help improve performance if I hide such columns?

M1 = DIVIDE(SUM('TA'[ColA]), SUM('TA'[ColB]), 0)

 

2. Hierarchies and columns

Many textual fields are hierarchical in nature. For example: Region > Country > Province > City
I have created hierarchies in Power BI and then using the same for drill down, multi-level dropdown filters ..etc.

Will it help, if I hide the actual columns while using the hierarchies?

 

3. Removing default summarization from the numeric columns

 

Consider the below measure:

 

M1 = DIVIDE(SUM('TA'[ColA]), SUM('TA'[ColB]), 0)

 

Here we are executing "SUM" operation on ColA and ColB.
Should I enable summarization on these columns? I m not using these columns directly in any of the visuals.

 

4. DAX expressions on a smaller table vs large table

 

As measures are independent of the table, will creating measures in a smaller table as against the table where the columns are present impact performance?

 

5. Multiple smaller DAX expressions vs single DAX expression

 

Consider the DAX expressions for achieving MoM %change:

 

Pass % = DIVIDE(SUM('TA'[total_pass]),SUM('TA'[total_count]), 0)
Pass % Last Month = CALCULATE([Pass %], FILTER(ALL('Calendar'), 'Calendar'[month_year_key] = MAX('Calendar'[month_year_key])-1))
Pass % Second Last Month = CALCULATE([Pass %], FILTER(ALL('Calendar'), 'Calendar'[month_year_key] = MAX('Calendar'[month_year_key])-2))
Pass MoM % Change = DIVIDE([Pass % Last Month], [Pass % Second Last Month], 0) -1

 

While the values seem to be fine, I wanted to know if the approach is correct.
Should I be creating separate simple DAX measures like [Pass % Last Month] and [Pass % Second Last Month] when they are not being used independently. There are many such expressions that have been developed.


As MoM % Change measure now depends on completion of [Pass % Last Month] and [Pass % Second Last Month], will it actually take more time to complete?

 

I could have combined these two into a single expression as below and it still works:

 

Pass MoM % Change = DIVIDE( CALCULATE([Pass %], FILTER(ALL('Calendar'), 'Calendar'[month_year_key] = MAX('Calendar'[month_year_key])-1)),  

CALCULATE([Pass %], FILTER(ALL('Calendar'), 'Calendar'[month_year_key] = MAX('Calendar'[month_year_key])-2)), 0) -1


6. Computed Column vs Table in Column

Consider ColC = ColA + ColB

This can be created as a separate column in DB / we can use computed column.

Which is preferable?

 

Any assistance in this regard would be very much appreciated.

Thanks again.

1 ACCEPTED SOLUTION

Unfortunately, you are looking in the wrong direction.

 

Most of your questions are about DAX and the Power BI model but, if you use DirectQuery, there is not much that you can do in DAX to improve performance. We analyzed the performance and implications of DirectQuery in this whitepaper: https://www.sqlbi.com/articles/directquery-in-analysis-services-2016/. 

 

The quick conclusion is: try to reduce DAX usage to a minimum and super-optimize your SQL database so that it will answer quickly to the specific queries Power BI will generate. The description of the reasons is in the whitepaper.

 

Besides, the real question here is: why using DirectQuery at all? Your model is a tiny one, the data refresh should be very fast and, by avoiding DirectQuery, you would remove most of your speed issues.

 


Have fun with DAX!

Alberto Ferrari
http://www.sqlbi.com

Alberto Ferrari - SQLBI

View solution in original post

4 REPLIES 4

Unfortunately, you are looking in the wrong direction.

 

Most of your questions are about DAX and the Power BI model but, if you use DirectQuery, there is not much that you can do in DAX to improve performance. We analyzed the performance and implications of DirectQuery in this whitepaper: https://www.sqlbi.com/articles/directquery-in-analysis-services-2016/. 

 

The quick conclusion is: try to reduce DAX usage to a minimum and super-optimize your SQL database so that it will answer quickly to the specific queries Power BI will generate. The description of the reasons is in the whitepaper.

 

Besides, the real question here is: why using DirectQuery at all? Your model is a tiny one, the data refresh should be very fast and, by avoiding DirectQuery, you would remove most of your speed issues.

 


Have fun with DAX!

Alberto Ferrari
http://www.sqlbi.com

Alberto Ferrari - SQLBI
Anonymous
Not applicable

@AlbertoFerrari

 

Thanks for the response Alberto.

At this point data size might be small. But we are expecting 10 million rows of data in each table.

 

Hence Direct query seems to be the only option.

 

Dashboard is complex with about 18 data slicers and 10 visuals.

Found 30-40 queries being fired simulataneously.

 

Minimum time to load seems to be 1-2 minutes.

Hence wanted to check if DAX optimizations can bring down total number of queries, reduce complexity of queries ..etc

v-sihou-msft
Employee
Employee

@Anonymous

 

1. Hiding unused columns

 

Hidden columns are also loaded into Power BI model. Hiding just simplifies the view, it will not improve the performance. 

 

2. Hierarchies and columns

 

Also hiding columns will not impact if the columns are involved in a hierarchy. 

 

3. Removing default summarization from the numeric columns

 

Numeric columns will always be defined an aggregation. It's just a property on column. "Do not aggregate" is also a type. I suggest you not setting "Do not aggregate" on numeric column. 

 

4. DAX expressions on a smaller table vs large table

 

It's better to create measures in table where the columns are present so that Vertipaq Engine doesn't need to iterator the mapping values in anthter table.

 

5. Multiple smaller DAX expressions vs single DAX expression

 

There's no much difference. For better usage, I prefer multiple smaller DAX formula if there will be frequently referenced. 

 

6. Computed Column vs Table in Column

 

Try to avoid calculated column. They are evaluated when the model is updated. It will iterator entire table every time. For more details, refer to links below:

Avoiding calculated column in DAX

Calculated Columns

 

Regards,

 

 

Anonymous
Not applicable

The solution provided is mostly true for Import mode.

However, as @AlbertoFerrari mentioned, in Direct query mode the optimizations didn't help me much.

 

I noticed that hiding unused columns, help reduce memory footprint while using PBI Desktop and can significantly impact when you are working in import mode.

 

In Direct Query mode, I noticed that embedding simpler re-usable measures inside the queries (Ex: MAX('Column'[a]) can be made a separate measure that can be used in sevral DAX expressions) brought down the total number of queries that were sent for processing and the waiting time involved. Ths actually resulted in slightly faster execution of queries.

 

Thanks,

Mannu

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.