Hi guys,
I am working with a single source CSV file, size 11.2 MB, around 43k rows and 19 columns.
The data is about incidents (tickets), so it includes agents, SLAs, assignment groups, priority, etc.
The thing is, I'm trying to create a star schema, but I only have one source file, so I'm practically duplicating my fact table for each dimension: removing columns, creating IDs, and then merging the dim table back into the fact table so the latter keeps only the IDs. So far I have created 8 dim tables.
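A minimal sketch of one such dimension as a DAX calculated table instead of a Power Query duplicate (the `Incidents` table and `AssignmentGroup` column names are assumptions, since the real schema isn't shown):

```dax
-- Hypothetical dimension built directly from the fact table.
-- Relating it back to Incidents on the natural key avoids the
-- separate surrogate-ID/merge step done in Power Query.
DimAssignmentGroup = DISTINCT ( Incidents[AssignmentGroup] )
```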
Whenever I load the model, it gets slower as I add dim tables; I'm guessing this is because I'm duplicating the source. Also, when applying changes, the file grows to 60, 70, even 80 MB.
Here's my final model:
The questions are:
Is this the right approach for a single source file with 11.2 MB size?
Should I lower the number of dim tables?
Should I keep all the information in one table and create calculated columns or measures?
How can I improve the performance? I was told to always aim for a star schema when modeling in Power BI.
Thanks guys
OV
Can you access the same data using SQL/a database? The best approach would be to have a table for each of your dimensions.
The refresh takes time because you are reading the same 43K-row source again for each of the dimensions.
If you use a star schema, the compression of your model will be better, resulting in a smaller PBIX file.
A smaller file generally means better performance when scanning a table. Of course, with a fact table of only 43K rows, the gain might not be big.
Always remember to lower the number of columns in your tables (especially the fact table), and remove columns with high cardinality (number of distinct values) if you don't need them.
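To see which columns are worth dropping, you can check cardinality with a quick measure (hypothetical names; `Incidents[Description]` is just an example of a likely high-cardinality column):

```dax
-- Hypothetical measure: a distinct count close to the row count (~43K)
-- flags a column that compresses poorly and bloats the model.
Description Cardinality = DISTINCTCOUNT ( Incidents[Description] )
```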
Thanks for the tips!
I'm not creating a connection to a database, but a CSV file.
I managed to duplicate the source only twice, and my performance is much better now!!
I'm going to create a few dim tables using DAX instead of PQ (:
@Anonymous Ok, sounds good,
Try to avoid SUMMARIZECOLUMNS when creating these tables, and be careful with the VALUES function: you might get a circular dependency error when creating the relationship.
Yes, with SUMMARIZE you'll have no issue :P.
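For illustration, either of these patterns avoids the circular-dependency issue that VALUES can trigger, because DISTINCT and SUMMARIZE over a column don't depend on the blank row a relationship adds (table and column names are assumptions):

```dax
-- Hypothetical calculated tables; either one works as a dimension.
DimPriority = DISTINCT ( Incidents[Priority] )
DimPriority = SUMMARIZE ( Incidents, Incidents[Priority] )
```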
Ideally you want a star/snowflake schema. Fact table(s) in the center and dimensions radiating outward.
If you don't want to use PQ to create dimensions you can always use DAX to create those tables. Don't listen to the bot.
Do dimensions created with DAX give the same performance in the dashboard (processing, slicers, etc.) as the ones created in PQ?
Or is it advisable to create no dim tables at all and only use one table with all 19 columns?
@Anonymous, I know Power Query is the suggested way, but with DAX the refresh can be a bit slower while loading less data.
You can create new tables using DISTINCT, SUMMARIZE, etc.:
DimCol1 = DISTINCT ( Table[Col1] )
DimCol2 = SUMMARIZE ( Table, Table[Col2], Table[Col3] )
Check which one performs better.
I think a star schema is best; in cases where you need all the rows or need to ignore some filter, a star schema comes in handy.