I want to speed up an import query from a CSV file with 30,000,000 rows.
The CSV file holds two years of weekly fact data,
and I need to import it into Power BI Desktop every week.
Right now, the import takes 20 minutes.
How can I speed it up?
My test environment:
PC: Intel NUC
CPU: Core i5-4250U, 1.3 GHz, 2 cores, 4 logical processors.
MEM: 16 GB
Storage: SSD 256 GB.
During Import:
CPU average is 60%; storage read is 20 MB/s.
Regards,
Yoshihiro Kawabata
Hi @yoshihirok,
Based on my research, unchecking "Allow data preview to download in the background" under File - Options and Settings - Options - (Current File) - Data Load - Background Data may speed up the import query. Could you try it and see whether it helps?
In addition, if you're using the "Folder" connector to import your CSV files, you don't need to re-import in Desktop and republish to the service every time new CSV files are added to the folder. You can just schedule a refresh in the service, and every new file in the folder will be included in your dataset after the refresh finishes.
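As a rough sketch of the "Folder" approach in M (the folder path and the assumption that every file shares the same layout are placeholders, not the poster's actual setup):

```m
let
    // Point at the folder that holds the weekly CSV files
    Source = Folder.Files("C:\Data\WeeklyFacts"),
    // Keep only CSVs, in case other files land in the folder
    CsvFiles = Table.SelectRows(Source, each Text.EndsWith([Extension], ".csv")),
    // Parse each file's binary content and promote its header row
    Parsed = Table.AddColumn(CsvFiles, "Data",
        each Table.PromoteHeaders(Csv.Document([Content]))),
    // Stack all weekly files into one fact table
    Combined = Table.Combine(Parsed[Data])
in
    Combined
```

With this in place, a scheduled refresh in the service picks up any newly dropped weekly file automatically.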
Regards
Hi @v-ljerr-msft, thank you for the research.
"Allow data preview to download in the background":
I verified the update time today with this option both off and on.
It is the same either way.
"Folder":
I'm not using the "Folder" connector in this case;
my data is a single CSV file.
Regards,
Yoshihiro Kawabata
If @v-ljerr-msft's suggestions don't speed things up enough, you can consider an incremental-load workaround: http://www.thebiccountant.com/2017/01/11/incremental-load-in-powerbi-using-dax-union/
The prerequisite is a fact table with entries that cannot be changed once made (which is the case in many bookkeeping systems):
1) Load historical data once and set to "Don't refresh" -> retrieve cut-off-criteria
2) Create second table for UNION with current data
3) From time to time: Move your "new old data" from 2) to 1) during quieter times, so that 2) starts from scratch again
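A minimal sketch of step 2 in M, assuming a "Date" column and a hard-coded cut-off taken from the frozen historical load (both are placeholder assumptions); the static historical table and this current table are then combined with DAX UNION as described in the linked article:

```m
let
    // Cut-off retrieved from the historical table that is set to "Don't refresh"
    CutOff = #date(2017, 1, 1),
    Source = Csv.Document(File.Contents("C:\Data\Fact.csv"), [Delimiter = ","]),
    Promoted = Table.PromoteHeaders(Source),
    Typed = Table.TransformColumnTypes(Promoted, {{"Date", type date}}),
    // Only rows newer than the cut-off are loaded on each refresh,
    // so the weekly refresh reads a small slice instead of the full history
    NewRows = Table.SelectRows(Typed, each [Date] > CutOff)
in
    NewRows
```

The effect is that only the rows added since the cut-off are re-read each week, while the unchanging history stays loaded but unrefreshed.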
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Dealing with a similar issue at the moment (around 25 million rows spread across 50 .csv files), refresh time is really slow.
Based on what I've read online, I don't think there's any effective way of speeding things up within Power BI itself (at least not without doubling the data, as in your method, Imke). Therefore, a more effective solution I'm considering is:
- put the data in a Database (could use Azure Database, £10/month. Not the Data Warehouse because that is much more expensive)
- use Azure Analysis Services (which is essentially the cloud version of SQL Server Tabular) to load data from that Database and create the model with tables, relationships, measures, etc. (the problem is that this is also quite expensive for personal use, £200+/month)
- use Power BI in Live Connection mode to connect to Azure Analysis Services
To avoid costs, you can of course use the on-premise solutions instead of the cloud:
- SQL Server instead of Azure Database
- SSAS Tabular instead of Azure Analysis Services (depending on your computer configuration, because when using Live Connection, data will be in your computer's memory)
Supposedly, refreshing an SSAS Tabular model will load the data faster than Power BI does, provided you create partitions.
Note: you can also use Power Query in SSAS Tabular (at least in SQL Server 2017) to load data directly from .csv/.txt files. However, I'm not sure whether this is slower than loading from a database, though I'd imagine it can't be faster.
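For illustration, a weekly partition in SSAS Tabular 2017 could be defined with an M source query along these lines (the server, database, table, and "WeekStart" column names are all assumptions for the sketch); processing only the current week's partition then avoids re-reading the full history:

```m
let
    // Connect to the database that holds the fact data
    Source = Sql.Database("localhost", "SalesDW"),
    Fact = Source{[Schema = "dbo", Item = "FactSales"]}[Data],
    // Each partition filters to a single week; only this slice is
    // processed when the partition is refreshed
    ThisWeek = Table.SelectRows(Fact, each [WeekStart] = #date(2017, 11, 6))
in
    ThisWeek
```

One partition per week (or per month, with weekly partitions only for the recent period) keeps each Process Data operation small, which is where the refresh-time win over a single monolithic import comes from.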
I was wondering if you've implemented/worked with these solutions, @ImkeF, and what's your opinion.
Useful links:
- demo of Power BI Live Connection working with 1 billion rows
- Azure Price Calculator to determine prices essentially for all cloud services
- SSAS Tabular 2017 loading data from .csv files (here the author doesn't create partitions or anything, so you can actually see that the import is painfully slow, very similar to Power BI). So the hope is that Partitions or loading from a database instead of .csv would be faster. Or using Azure Analysis Services, not SSAS.
Hi @jb007,
How long does your "really slow" refresh actually take?
Imke Feldmann (The BIccountant)
About 20 minutes, @Imke.
I don't even have 4 minutes like Madonna, I want it to take 1 minute or less.
So I'm thinking SSAS Tabular and partitioning the table by week could help. It would load 1 week only, instead of over 6 years of data, so 300x less data.
Yes, that's the way to go then: Use partitions in SSAS tabular model.
Imke Feldmann (The BIccountant)