Hi,
I have an import from a large Excel sheet containing one table with many columns. As I have learned, I am building a star schema from this table, with 1 fact table and 15 dimension tables. Now I have to choose between two approaches:
A - Power Query:
I imported the Excel table into a staging table, where I did the cleanup: removing unneeded columns, adding columns, trimming values, changing data types, etc. I then referenced the staging table once to create the fact table, and referenced it another 15 times to create the dimension tables. In each dimension table I removed the other columns and removed the duplicates, so I end up with clean, tidy dimension tables. Finally I loaded the data model and set the relationships in Power BI.
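For anyone unfamiliar with this pattern, here is a minimal sketch in Power Query M. The file path, sheet/table names, and column names are made up for illustration; the staging query would have "Enable load" switched off:

```m
// Query 1: Staging (load disabled) – the only query that touches the source
let
    Source = Excel.Workbook(File.Contents("C:\Data\BigSheet.xlsx"), null, true),
    Raw    = Source{[Item = "Table1", Kind = "Table"]}[Data],
    Kept   = Table.SelectColumns(Raw, {"OrderID", "Date", "Customer", "Amount"}),
    Typed  = Table.TransformColumnTypes(Kept,
                 {{"Date", type date}, {"Amount", type number}})
in
    Typed

// Query 2: DimCustomer – references Staging, keeps one column, de-duplicates
let
    Source   = Staging,
    Customer = Table.SelectColumns(Source, {"Customer"}),
    Distinct = Table.Distinct(Customer)
in
    Distinct
```

The 15 dimension queries all follow the shape of Query 2, differing only in which column(s) they keep.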
What I observed is that loading and refreshing the data model takes a lot of time, and every table (fact and dimension) loads the same number of MB. Because all tables reference a single staging table (which is the only query with a source connection to the Excel sheet), I had expected just one load from that staging table.
B- DAX Calculated Columns:
Another way to create the star schema is to load the Excel table as a single table in Power Query and load that into the data model. I can then create the dimension tables as DAX calculated tables (using DISTINCT or VALUES) and set up the relationships afterwards.
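A sketch of what such a calculated table could look like in DAX, assuming the imported table is named Sales and has a Customer column (both names are hypothetical):

```dax
-- One-column dimension built from the distinct values of Sales[Customer]
DimCustomer = DISTINCT ( Sales[Customer] )
```

Note that calculated tables are recomputed after every refresh of the tables they depend on, and they consume model memory just like imported tables.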
My question is: regarding the performance of loading and refreshing the data model, which is the better way to build it?
Kind regards,
Arie
Hi there, I would also love to have more insight on this matter. Does anyone have any updates on this topic? Thank you, Carlos
I would love to see an answer
@Arie, @v-huizhn-msft, @Greg_Deckler
I have the same question.
The referenced tables also seem to load the data themselves, while I had expected them simply to refer to the existing table and then apply further transforms.
I tried applying Table.Buffer, but it prolongs the loading of tables by 3-4 times: the large table takes around 1 minute to load without Table.Buffer and 3-4 minutes with it.
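For reference, this is roughly what buffering the staging query looks like (file and table names are hypothetical). A likely explanation for the slowdown: Table.Buffer holds the table in memory only within a single query evaluation, and during refresh each referencing query is evaluated separately, so every one of them re-runs the staging query and pays the buffering cost again:

```m
let
    Source   = Excel.Workbook(File.Contents("C:\Data\BigSheet.xlsx"), null, true),
    Raw      = Source{[Item = "Table1", Kind = "Table"]}[Data],
    // Buffered only for THIS evaluation; referencing queries each re-evaluate
    Buffered = Table.Buffer(Raw)
in
    Buffered
```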
In fact, I noticed that duplicating the table (creating a new query) and then removing unwanted columns and duplicates to create a DIM table loads faster than using referenced tables.
Is it due to parallel loading of tables when using separate queries?
Thanks,
Mannu
I'm interested in this question. One thought on this is why create a star schema at all versus just using the denormalized single table?
That is an interesting question too. Why not just import the table without any dimensions?
I think the benefit of extra dimension tables comes in when creating extra measures in DAX.
Hi @Arie,
When you import data in an Import model, you import the table without any dimensions. When you get data via DirectQuery or a Live connection, you connect to the data source directly, so the relationships are still there.
Power BI – Live connection vs. Import, comparison and limitations
Best Regards,
Angelia