
NV
Advocate I

Performance considerations when building a Power BI data model

Does anybody have a clear view on the performance considerations of the tabular data model in a Power BI report? I haven't found any good information comparing the performance and memory consumption of a tabular model in which all dimension tables have been merged into the fact table, increasing the total column count of the fact table but decreasing the number of relationships in the model.

 

The query time (data update) might be affected when merging many tables into one, but the question is how the model performs from there on, compared to a data model where all tables are stored separately and combined using data model relationships. Physically, the merged table duplicates the dimension values in the dimension columns of the extended fact table, but does e.g. the xVelocity (VertiPaq) engine store data in a way that makes the end result effectively the same as keeping all dimension values in separate tables?

 

So the question is what you should consider when choosing between:

1. A data model with one giant fact table with many columns for dimensions

2. A data model with one fact table and many dimension tables

1 ACCEPTED SOLUTION

There is no one right answer - it depends on your data.

 

Generally you should aim for a star schema. The more columns in each dimension table, the more value, generally, in normalising that dimension into a single lookup table. If your lookup table has only one primary key column and one text description column, get rid of the primary key and use the text description directly in the fact table.
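As a concrete illustration of that last point, here is a minimal DAX sketch; the 'Status' lookup table, its columns, and the 'Sales' fact table are hypothetical names, and in practice the merge is usually done upstream in Power Query or the source query rather than as a calculated column:

-- Hypothetical calculated column on the 'Sales' fact table: pull the lone
-- text description across the existing relationship to 'Status', after which
-- the integer key column can be hidden from the report layer.
Order Status = RELATED ( 'Status'[Status Description] )

A calculated column built with RELATED still needs the relationship at refresh time, which is why the upstream merge is the cleaner way to retire the lookup table entirely.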

 

Generally it is not good to use a snowflake schema, but it depends on your data. Sometimes you have to - it depends. 

 

The biggest issue in Power Pivot is the cardinality of each column. Keep this low wherever possible.
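A classic way to reduce cardinality is splitting a high-precision datetime into separate date and time columns. A minimal DAX sketch, assuming a hypothetical 'Sales' fact table with a [Timestamp] column:

-- Two calculated columns replacing one high-cardinality datetime column.
-- 'Sale Date' has at most a few thousand distinct values; rounding
-- 'Sale Time' to the minute caps it at 1,440 distinct values.
Sale Date = DATE ( YEAR ( Sales[Timestamp] ), MONTH ( Sales[Timestamp] ), DAY ( Sales[Timestamp] ) )
Sale Time = TIME ( HOUR ( Sales[Timestamp] ), MINUTE ( Sales[Timestamp] ), 0 )

Hide or remove the original [Timestamp] column afterwards, otherwise the exercise gains nothing.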

 

The next biggest problem is the width of fact tables. The more columns, the bigger the problem. 

 

Duplicate dimensions in a fact table are therefore only a problem (generally) when they have high cardinality and/or when they contribute to making the fact table really wide. If there are lots of rows, that adds to the problem too, though often less than linearly - again, it depends on your data.

 

I wrote this article, which may help: http://exceleratorbi.com.au/the-optimal-shape-for-power-pivot-data/



* Matt is an 8-time Microsoft MVP (Power BI) and author of the Power BI book Supercharge Power BI.


5 REPLIES
NV
Advocate I

Thank you all for your input. I agree that, from a design aspect, a star schema is usually the best approach for a number of valid reasons, as pointed out by @MattAllington and @greggyb. For simple, single-purpose data models where the row count is large and the number of dimensions is small, I too would probably opt for the practical, no-fuss approach of @djnww.

 

@MattAllington describes the performance aspect I was looking for quite well. I ran some tests, and it seems that when the row count is large and the cardinality of the dimensions is high, putting the dimensions in a star schema increases performance and reduces the data model size. When the cardinality of the dimensions decreases, the difference in both size and performance between the star design and the one-table design shrinks significantly.

 

So, all things considered, the star schema still seems to be the best practice from a performance aspect as well, especially for more complex data models with high-cardinality dimensions. Not to mention the readability benefits for end users when data exploration may be involved.

 

Too bad that the Quick Insights feature does not seem to perform very well against a star schema design. This is also one reason I started thinking about the considerations between a one-table design and the star schema design. Based on this information, however, I may need to consider dividing the reports as:

1) Star (snowflake) schema for performance-optimized, user-friendly data exploration reports

2) One-table design for "one purpose" reports, where Quick Insights can provide out-of-the-box data exploration


It's also worthwhile to consider the usability of the solution. We find that our clients' end users tend to do better with a star schema, as each dimension encapsulates a single business concept and its related attributes.

 

When dimensions consist of many attributes, it's easier to have them grouped together. In a single very wide fact table, people will need to know all of the attribute fields they're interested in.

In terms of data exploration / discoverability, multiple orthogonal dimensions are easier to understand than a single monolithic table. E.g. a field named [Size]: Does this describe one of several product sizes, or does this describe customer size (revenue), or does this capture a human being's shirt size?

One way to work around this would be to use logical dimension prefixes, so that someone browsing the model would be able to see all Customer attributes together. Now, you've just used a naming convention to do the job of a dimension. There's nothing wrong with this, but I'd look askance at a model implemented this way.

 

Ninja edit: And filtering on dimensions becomes a bear. Rather than being able to manipulate the filter context of a dimension, or clear that context with ALL ( 'Dim' ), you'll have to use ALL ( 'Monolith'[logical dim field 1], 'Monolith'[logical dim field 2], 'Monolith'[logical dim field 3], ... ).
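To make that concrete, here is a sketch of the two patterns side by side; the table, column, and measure names ('Product', 'Monolith', [Total Sales]) are all hypothetical:

-- Star schema: clearing the whole Product dimension is a single argument.
Sales All Products = CALCULATE ( [Total Sales], ALL ( 'Product' ) )

-- Monolith: every column belonging to the "logical" Product dimension must be
-- listed explicitly, and a forgotten column silently leaves part of the
-- filter context in place.
Sales All Products Monolith =
CALCULATE (
    [Total Sales],
    ALL ( 'Monolith'[Product Name], 'Monolith'[Product Category], 'Monolith'[Product Colour] )
)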

You may also want to take into account how you track changes to your data. This can be done a number of ways, including using columns in fact and dimension tables to manage the tracking.
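One widely used pattern along those lines (a general slowly-changing-dimension sketch, not something specified in this thread) is to add effective-dating columns to the dimension and filter on them in DAX:

-- Hypothetical 'Customer' dimension with [Is Current], [Valid From] and
-- [Valid To] columns maintained by the load process. A measure restricted
-- to the current version of each customer:
Current Customers =
CALCULATE (
    DISTINCTCOUNT ( 'Customer'[Customer Key] ),
    'Customer'[Is Current] = TRUE ()
)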


To learn more about DAX, visit aka.ms/practicalDAX

Proud to be a Datanaut!

djnww
Impactful Individual

@NV

 

Without a doubt, a flat table will be quicker than multiple tables with joins. This is why OLAP technologies predominantly work with a flat file structure.

 

Others may disagree with me, but I let our data warehouse manage the tables and relationships. Power BI is about getting the data you need for a specific purpose. It is best practice to do all your complex joins before the data makes it into Power BI. I generally only use Power BI to create joins to reference tables. Hence, I may have a report that has three non-related datasets/tables (Sales, Staff, Customer Details). I join all three to a date reference table that contains date, hour, year, day type, etc. That way, there is no confusion over which date fields to use.
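A minimal sketch of such a shared date reference table, built here as a DAX calculated table; the column names and date range are assumptions, and in a warehouse-driven setup like the one described above the table would normally come from the warehouse instead:

-- Hypothetical calculated table: one row per day, with a few rollup columns.
Date Reference =
ADDCOLUMNS (
    CALENDAR ( DATE ( 2015, 1, 1 ), DATE ( 2025, 12, 31 ) ),
    "Year", YEAR ( [Date] ),
    "Month", FORMAT ( [Date], "MMM" ),
    "Day Type", IF ( WEEKDAY ( [Date] ) IN { 1, 7 }, "Weekend", "Weekday" )
)

Each fact table then gets a single relationship to 'Date Reference'[Date], which is what removes the confusion over which date fields to use.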

 

Keep it simple wherever possible without losing too much functionality. Sorry, I don't actually have a definitive answer for you.

 

Daniel
