Hi,
I'm working with a set of CSV files in a folder, where a new file is added every day.
I started building the data model with only two files, and even then the refresh was slow; as soon as I connect the rest of the files (even 2-3 more), the refresh time is absurdly high and it even blocks the computer.
I tried to simplify the files (each one has 75 columns, but I can only delete about 10-12 of them); since I need the calculated columns, I can't get rid of many. Date intelligence is off.
I'm happy to share the pbix file or provide more info if needed, in case someone can help.
Hi
I was checking a few things, and the data does not look like too much. Since there were no CSVs, I could not really test anything, but my assumption is that you are looping through all the CSVs in a folder, and looping is always slow because it cannot parallelize any actions.
Correct me if I am wrong, but instead of looping, you can always use the Folder feature in "Get data", which is quite fast.
If you don't have that option, I have a few suggestions:
1. Your main problem is the following columns:
2. I understand you need start time and end time, but do you need them at the second level? You could keep them only at the hour and minute level.
3. Check whether the ID columns are really required, e.g. XHOTEL_ID or XRESERVA_ID.
4. The same goes for the code columns.
The columns above are the most expensive ones; try to get rid of them if you can.
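As a rough illustration of suggestion 2, here is a minimal Python/pandas sketch (the column name `start_time` and the sample timestamps are invented for the example) showing how truncating timestamps to the minute collapses near-duplicate values; lowering a column's cardinality like this is what shrinks the model and speeds up refresh:

```python
import pandas as pd

# Hypothetical reservation timestamps; the column name and values
# are made up for illustration.
df = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2023-05-01 08:12:37",
        "2023-05-01 08:12:59",
    ])
})

print(df["start_time"].nunique())  # 2 distinct values at second precision

# Truncate to the minute: near-duplicate timestamps collapse into one
# value, shrinking the column's dictionary (and thus the model size).
df["start_time"] = df["start_time"].dt.floor("min")

print(df["start_time"].nunique())  # 1 distinct value after truncation
```

In Power Query the equivalent step would be changing the column type to Time and rounding to the minute before load, so the engine never stores second-level values.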
Regards,
Sayali
Proud to be a Super User!
Thanks Sayali,
What I did to get the CSV files is Get Data > Folder > select the folder; then, in the next steps, I removed some columns I don't need.
Now I'm trying a different approach: merging all the CSVs into one file and then treating the data. It seems a bit faster, although it gives me a 299 MB file to work with and takes about 5 minutes to refresh. However, so far it doesn't block the computer or stop...
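For anyone doing this combine-and-trim step outside Power Query, the same idea can be sketched in Python with pandas (the folder path and column list here are placeholders, not the actual file's columns):

```python
import glob
import os

import pandas as pd


def combine_csv_folder(folder, keep_columns):
    """Read every CSV in `folder`, keep only the needed columns,
    and stack them into one table -- the same idea as Power Query's
    Folder connector followed by a 'remove columns' step."""
    frames = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # usecols drops the unneeded columns at read time,
        # so they never take up memory.
        frames.append(pd.read_csv(path, usecols=keep_columns))
    return pd.concat(frames, ignore_index=True)
```

Dropping columns at read time (rather than after loading everything) is what keeps the memory footprint down when the daily files accumulate.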
Hi @Anonymous ,
If all your .csv files have the same structure, you can use the Folder connector and let Power Query combine the files for you, and then apply some optimizations to the data source, such as removing unnecessary columns and rows, which would help reduce the data size and the refresh time.
Please refer: Data reduction techniques for Import modeling
Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider accepting it as the solution to help the other members find it more quickly.
You can upload it to OneDrive or Dropbox and share the link.
Regards,
Sayali
Proud to be a Super User!
It would be good to have the PBIX.
Could you please share it? It will help to understand what's going on.
Regards,
Sayali
Proud to be a Super User!
https://1drv.ms/u/s!BMuMkTAe9Ey13hvwXZQJjvyFFLCw?e=s65Kbo
I've put it here. Please let me know if it's okay.