Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Advice required on the Pros and Cons on normalising a very large flattened table into a Star Schema

Hi,

I have been given a SQL query, by the solution team, to retrieve data needed to satisfy a report. I execute this query using a SQL Server connection in Import mode. The query joins on 6 tables with minimal functions taking place in the query. The returned dataset is 20 columns wide and 19 million rows deep.

 

I'm thinking 19 million rows deep requires the implementation of a best practice Star Schema. I see a single FACT table and 4 DIMENSION tables being the model.

 

The original flat table has it's Enabled Load setting disabled. I then reference that same flat table 5 times, once for each of the tables (1 FACT table and 4 DIMENSION tables).

 

So here's the question, each of these 5 tables has to have the full 19 million rows loaded to it. It is seriously time consuming. Yes, the Dimension tables will have duplicate rows removed and therefore end up being considerbly less than the 19 millions rows, but it is the initial loading of these 19 millions rows into each of the 5 tables each time the data is refreshed (within Power Query or through the Power BI Service). Corret me if I'm wrong.

 

How do you weigh up the Pros and Cons of creating a Star Schema (or Snow-Flake Schema for that matter)?
Where do you draw the line and say a flat table of 20 columns and 19 million rows loaded once is better than a Star Schema that has to have 19 million rows loaded 5 times to populate each table?

 

Would the quickness of the slicing/filtering on the analytics/measures outweight the repetitive loading time?

 

What are your thoughts?
Thanks.

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Thanks @v-yulgu-msft @SjoerdV for your responses. I'm going to close this post and re-ask the question in a different way with a different post title. I feel I may get more view/responses this way. Plus, the topic will be useful for pretty much everyone that does data model within Power BI.

View solution in original post

3 REPLIES 3
SjoerdV
Frequent Visitor

Kind of the same situation as mine, only with much more records.

 

What I'm (currently) doing, is importing 2 CSV fact-tables with a lot of columns. I do some ETL-stuff on those 2 tables and then disable loading them. From there I create a singe fact-table, together with some dimension-tables by extracting data from the 2 original CSV fact-tables. 

I don't know if this is the most efficient way, but it saves me from creating a wide fact-table. From what I can see so far, is that we're doing exactly the same here. I therefore wonder if the query is actually loading the dim-tables 5 times...

Anonymous
Not applicable

Thanks @v-yulgu-msft @SjoerdV for your responses. I'm going to close this post and re-ask the question in a different way with a different post title. I feel I may get more view/responses this way. Plus, the topic will be useful for pretty much everyone that does data model within Power BI.

v-yulgu-msft
Employee
Employee

Hi @Anonymous,

 

Here are some links for your reference:

Star or Snowflake for datamodel?

Star Schema vs Snowflake Schema and Why You Should Care

Pros and cons of the snowflake schema

 

Regards,

Yuliana Gu

Community Support Team _ Yuliana Gu
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors