plainly
Regular Visitor

How do you handle huge dimension tables (DAX bottleneck)?

Hello,

 

I often have analytics use cases with millions of customers or contracts (dimension tables with millions of rows). SQLBI has a great article on the costs of relationships: https://www.sqlbi.com/articles/costs-of-relationships-in-dax/. The key message from SQLBI is that large dimension tables are a challenge: "When you design a star schema, which is the suggested design pattern for this technology, this translates in a bottleneck when you have large dimensions, where large is something that has more than 1-2 million rows."

 

I did a performance study of star schema vs snowflake schema in this type of environment:

https://www.plainlyresults.com/blog/power-bi-dax-the-cost-of-relationships-snowflake-vs-star-schema/

 

I have found the following main approaches to avoid huge dimension tables:

Can a table be grouped or aggregated to make it smaller?
Can some granular dimensions be grouped? Can a fact table be aggregated so that it still answers most business questions? Another fact table can be linked to the data model for selected use cases, such as the detailed information of a contract or customer. This could even be implemented with Power BI composite models and DirectQuery.
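As a rough sketch of the idea, an aggregated fact could be built as a calculated table like the one below (all table and column names are made up for illustration; in a real composite model you would rather build the aggregation upstream and register it as an aggregation table):

Sales Agg =
SUMMARIZECOLUMNS (
    'Date'[YearMonth],
    'Customer'[Country],
    'Product'[Category],
    // pre-aggregated values replace the row-level fact for most visuals
    "Sales Amount", SUM ( Sales[Amount] ),
    "Order Count", DISTINCTCOUNT ( Sales[OrderID] )
)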

Is it possible to turn fact data into informative dimensions?
Consider a set of fact tables describing sales channel features per order. It might be better to create parameters or a machine-learning-based segmentation that describes how customers behave in different sales channels. This makes the data model lighter. At the same time, business users get a better experience because they can concentrate on the key facts needed to improve the business.
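A simple version of this is a static segmentation as a calculated column on the customer dimension, so that reports can slice by a handful of segments instead of millions of individual customers. A minimal sketch, assuming hypothetical Sales[Channel] and Sales[Amount] columns and a Customer -> Sales relationship:

// Calculated column on the Customer table
Channel Segment =
VAR OnlineShare =
    DIVIDE (
        CALCULATE ( SUM ( Sales[Amount] ), Sales[Channel] = "Online" ),
        CALCULATE ( SUM ( Sales[Amount] ) )
    )
RETURN
    SWITCH (
        TRUE (),
        ISBLANK ( OnlineShare ), "No purchases",
        OnlineShare >= 0.8, "Mostly online",
        OnlineShare >= 0.2, "Mixed channels",
        "Mostly offline"
    )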

One can get rid of a table relationship
A high-cardinality dimension column can be combined into the fact table to get rid of a costly table relationship. With this decision, it needs to be checked that all the calculations stay logical, now and in the future. Combining a dimension into the fact table basically behaves like a bidirectional filtering relationship between the fact and the dimension, which might not be the desired behavior in some calculations.
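A sketch of what this looks like in measures (made-up names again; the denormalized column itself would typically be added in Power Query or in the source view rather than in DAX):

// Star schema: the filter travels through the Customer -> Sales relationship
High Value Sales =
CALCULATE ( SUM ( Sales[Amount] ), Customer[Segment] = "High value" )

// After moving the attribute into the fact table, the relationship (and possibly
// the whole multi-million-row Customer dimension) is no longer needed for this measure
High Value Sales Denormalized =
CALCULATE ( SUM ( Sales[Amount] ), Sales[Customer Segment] = "High value" )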

 

In practice, I often end up using the last alternative (getting rid of a table relationship), or I use a piece of snowflake schema to isolate a selected calculation "entity", instead of trying to combine everything into one huge fact table with a pure star schema.

 

How would you approach this topic? How do you handle big dimension tables, and the huge fact tables linked to them, when those dimensions are required for a lot of calculations?

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi there.

 

If one has a huge dimension table, then one should consult the book by Ralph Kimball (can't remember the name, but you can look it up here: About the Kimball Group | Data Warehouse/Business Intelligence Experts), where he explains the different approaches to modeling such situations.

 

On top of that, I also insist on people having the best possible hardware for huge tabular models. That means mainly the fastest possible CPUs, the fastest RAM and suitable NUMA settings.

 

Also, always try to remove columns you don't need, try to split values so that columns hold low-cardinality data, and try not to use PBI in ways it was not designed for. For instance, if your clients want paginated reports, then use Paginated Reports, not PBI visuals. I find that most people try to squeeze square pegs into round holes... No, this is not how it works.
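A classic case of splitting is a datetime column on a big fact table (made-up names below): splitting it into a date part and an hour part turns millions of distinct values into a few thousand plus twenty-four. Ideally do the split in Power Query or in the source, since calculated columns don't compress as well; the DAX below only illustrates the idea.

// Instead of keeping a high-cardinality Sales[OrderTimestamp] column:
Order Date = TRUNC ( Sales[OrderTimestamp] )   // date part only, formatted as Date
Order Hour = HOUR ( Sales[OrderTimestamp] )    // 24 distinct values
// ...then remove the original timestamp column from the model.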

 

Best
