Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
DRoberts88
New Member

Power query model 32GB taking days to load

Hi,

 

I have a 32 GB database which is contained on a txt file on my harddrive with 150 mill rows. This database contains customer IDs, their demographics, products they purchased, price, and other redundant columns for each time they purchase in the last 10 years.

 

I want to extract from this data set the first time a customer bought a product, the date (nearest month is OK), their demographic and average price. 

 

When I imported the dataset originally I intended to use M to "clean up" and then put the logic in DAX to work out the first purchase. To import the data took 4 hours ish but it worked. However when I wanted to do my DAX logic to select out the rows which was the first purchase of the product I got an out of memory error - so I resorted to M code in power query.

 

This is how I want to extract the data from the DB using M code using groupings:

 

#"Grouped Rows" = Table.Group(#"Filtered Rows", {"CUST_ID", "PRODUCT", "DEMOGRAPHIC"}, {{"Initiation date", each List.Min([MONTH_ID]), type nullable number}, {"AggPrice", each List.Average([PRICE]), type nullable number}}).

 

However to load this query currently it has taken 4 hours to load 3.5 GB of the 32GB file, and previously I have left it 24 hours and it just loads 10-15 out of 32 GB. I have also tried grouping by two rows (so ignoring demographic) rather than three and only extracting Month (which is a number e.g. Dec 2019 is 201912 and not a datetime variable) but these are just as slow. I also tried importing two tables with the idea of joining them via DAX but again this is just as slow slow also. Does anyone else have any ideas as to how I can do the above with this dataset with a query that would take a reasnable time to load? Thanks a lot.

4 REPLIES 4
edhans
Super User
Super User

@DRoberts88 did any answers help? If so, can you mark one or more as the solution? 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
Jimmy801
Community Champion
Community Champion

Hello @DRoberts88 

 

as you only need a simple transformation of grouping, that can be done also in DAX and depending of your office installation and hardware you could also think of doing this with Excel. Probably 64bit installation needed. First of all Power Pivot will be able to save this amount of data with 10% of your file size, do to logic of data storage. The approach then could be as follows:

- load you txt-file into power pivot

- create a simple table in an excel sheet and load that to your data model too

- Go to Data --> connections and choose your simple table to load into a sheet

- Right click on this newly created table --> table --> Dax

- choose DAX instead of table and write an evaluate summarize('YourTable', "Column to group by", max('YourTable'[ColumnofDates])) statement

- store the Excel file and load the grouped table into Power BI

another limitation you have with this variant is that you cannot have more then 1 Mio customers.

 

If this post helps or solves your problem, please mark it as solution (to help other users find useful content and to acknowledge the work of users that helped you)
Kudoes are nice too

Have fun

Jimmy

 

 

 

 

StefanoGrimaldi
Resident Rockstar
Resident Rockstar

hey, 

@edhans  have a really fair point there, but if for some reasons you need as a most work in cvs (but as the post before mentions, it have its limits so cahnging database storega would be the best approach here) fine, maybe some tips can come around

- First - create dimensiaonlas tables stored in CVS containing informations likes client code (all its details) one file, another file postal code (all its details) etc, like this you could dowload less columns to the text file or from it to the power query, also this will let you work using star shcema for a better model and reduce signifcatly some of the size and perfom issues, 

- disable the auto date/time option and create a dimensional data table, this normally reduce a vast amount of models size and its a best practice. 

- try your dax formulas on DAX studio to try to optimize them the more you can (also in dax studio you can anlyze the performeza and size impact by table by column to look for opportunity for optimizations opportunities. 

 

other them this for your situation would be following @edhans recommendations. 





Did I answer your question? Mark my post as a solution! / Did it help? Give some Kudos!

Proud to be a Super User!




edhans
Super User
Super User

You need to put this in a real database. There is only so much optimization that can be done in M when working with text files. You are requiring M to do 100% of the work. it must process all 32GB of the data. It doesn't matter if your end result is a filtered table with 1 row, it will still read 32GB of data to get it.

 

If that was on SQL server or any other server that supported folding, that would take Power BI about 3 seconds to process that type of filter. Even your grouping would be handled by the server. This would even be way WAY faster in MS Access, except that is limited to 2GB per database, so your data wouldn't even fit. Heck, even SQL Server Express 2019 won't work. it is limited to 10GB. 

If you cannot get SQL Server 2019 due to pricing, consider PostgreSQL. It is open source and pretty well supported in Power BI through Power Query. 

I'll be curious if anyone has better ideas, but I cannot fathom working with 32GB of text data in Power Query for any serious transformations.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors