Hi,
I have an import from a large Excel sheet containing one table with many columns. As I have learned, I am building a star schema from this table, with 1 fact table and 15 dimension tables. Now I have to choose between two approaches:
A - Power Query:
I imported the Excel table into a staging table, where I did the cleanup: removing unneeded columns, adding columns, trimming values, changing data types, etc. I then referenced the staging table once to create the fact table, and referenced it another 15 times to create the dimension tables. In each dimension table I removed the other columns and removed the duplicates, so I end up with clean, tidy dimension tables. Finally I loaded the data model and set the relationships in Power BI.
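For anyone unfamiliar with this pattern, here is a minimal sketch in Power Query M. The file path, sheet/table names, and column names are made up for illustration; the staging query would have "Enable load" switched off:

```m
// Query 1: Staging (load disabled) – the only query that touches the source
let
    Source = Excel.Workbook(File.Contents("C:\Data\BigSheet.xlsx"), null, true),
    Raw    = Source{[Item = "Table1", Kind = "Table"]}[Data],
    Kept   = Table.SelectColumns(Raw, {"OrderID", "Date", "Customer", "Amount"}),
    Typed  = Table.TransformColumnTypes(Kept,
                 {{"Date", type date}, {"Amount", type number}})
in
    Typed

// Query 2: DimCustomer – references Staging, keeps one column, de-duplicates
let
    Source   = Staging,
    Customer = Table.SelectColumns(Source, {"Customer"}),
    Distinct = Table.Distinct(Customer)
in
    Distinct
```

The 15 dimension queries all follow the shape of Query 2, differing only in which column(s) they keep.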
What I observed is that loading and refreshing the data model takes a lot of time, and every table (fact and dimension) loads the same number of MB. Because all tables reference a single staging table (which is the only query with a source connection to the Excel sheet), I had expected just one load from that staging table.
B- DAX Calculated Columns:
Another way to create the star schema is to load the Excel table as a single table in Power Query and load that into the data model. I can then create the dimension tables as DAX calculated tables (using DISTINCT or VALUES) and set up the relationships afterwards.
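A sketch of what such a calculated table could look like in DAX, assuming the imported table is named Sales and has a Customer column (both names are hypothetical):

```dax
-- One-column dimension built from the distinct values of Sales[Customer]
DimCustomer = DISTINCT ( Sales[Customer] )
```

Note that calculated tables are recomputed after every refresh of the tables they depend on, and they consume model memory just like imported tables.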
My question is: regarding the performance of loading and refreshing the data model, which is the better way to build it?
Kind regards,
Arie
Hi there, I would also love to have more insight on this matter. Does anyone have any updates on this topic? Thank you, Carlos
I would love to see an answer
@Arie, @v-huizhn-msft, @Greg_Deckler
I have the same question.
The referenced tables also seem to load the data themselves, while I had expected them simply to refer to the existing table and then apply further transforms.
I tried applying Table.Buffer, but it prolongs the loading of tables by 3-4 times: the large table takes around 1 minute to load without Table.Buffer and 3-4 minutes with it.
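For reference, this is roughly what buffering the staging query looks like (file and table names are hypothetical). A likely explanation for the slowdown: Table.Buffer holds the table in memory only within a single query evaluation, and during refresh each referencing query is evaluated separately, so every one of them re-runs the staging query and pays the buffering cost again:

```m
let
    Source   = Excel.Workbook(File.Contents("C:\Data\BigSheet.xlsx"), null, true),
    Raw      = Source{[Item = "Table1", Kind = "Table"]}[Data],
    // Buffered only for THIS evaluation; referencing queries each re-evaluate
    Buffered = Table.Buffer(Raw)
in
    Buffered
```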
In fact, I noticed that duplicating the table (creating a new query) and then removing unwanted columns and duplicates to create a DIM table loads faster than using referenced tables.
Is it due to parallel loading of tables when using separate queries?
Thanks,
Mannu
I'm interested in this question. One thought on this is why create a star schema at all versus just using the denormalized single table?
That is an interesting question too. Why not just import the table without any dimensions?
I think the benefit of extra dimension tables comes in when creating extra measures in DAX.
Hi @Arie,
When you import data in an Import model, you import the table without any dimensions. When you get data via DirectQuery or a Live connection, you connect to the data source directly, so the relationships are still there.
Power BI – Live connection vs. Import, comparison and limitations
Best Regards,
Angelia