Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Speed Up Excel File Load

Power BI has a big issue when trying to load Excel files as a data source. The files could be very small (only a couple of megabytes), but if you do the wrong combination of steps (such as an append), the load is extremely slow. Even a straight load is very slow, for the size of the file.

 

I suggest that Power BI should load the Excel files once into memory, in Power BI's native table format (or whatever in memory), and then refer to that for all future steps.

 

Currently it will re-read the same file(s) from disk, many, many times - depending on the steps performed. I'm unsure why this overhead is needed.

 

If I was to perform the equivilant load and data maniputation using Python Pandas, it would load the excel files once, perform all the data manipulation, and be done in a fraction of the time.

 

Microsoft really need to pick up the game on this. I have no idea why they treat an excel file like a database. I lose hours of time whenever I deal with excel files, because of this. The worst is when you have a folder of excel files. 

Status: New
Comments
v-yuta-msft
Community Support

@Anonymous ,

 

Could you show more details about the performed steps? For example, have you done some operations on the excel tables,  like merge/append a excel table with table in sql server in direct query mode. 

 

Regards,

Jimmy Tao

Anonymous
Not applicable

I was using the same queries and steps that are created when you combine multiple xlsx files together - except doing it for multiple worksheets in the same file. (So having the sample work sheet, and a function that runs on each of those worksheets to apply any modifications per sheet). Then I expand all the worksheets into one table.

 

I also had one other xlsx source, that I then append onto the end.

 

Regardless of the steps, there is definately a lot of unessecary overhead when Power BI loads xlsx files. When processing the load, Power BI will reload the xlsx file multiple times. Also the speed that it loads say a 1mb or 2mb is very slow (like watching it at 1kB a second or something). 

 

Just knowing that doing the equivilant data load and manipulation in python, using pandas, would load the file in an instant, and perform the data manipulation shortly after - a fraction of the time. (I would just use that here, but Python isn't available on report server).