Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
decarsul
Helper IV
Helper IV

PC crashes when doing a left join

Gday all!

 

This week i've been having an issue with a powerquery left join.

I have 2 tables, table 1 contains an id that corresponds with a unique id in table 2.

Now i want to join just 1 column in table 1 that exists in table 2.

 

Both dataflows are roughly 500mb each.

But when i join them, all my 32GB of memory gets eaten in like 1 minute, which results in basically crashing my entire system.

 

Does anybody have any idea how this could be? (i've removed null values from both id columns, so at most, it should be a 1 to many relation).

1 ACCEPTED SOLUTION
aj1973
Community Champion
Community Champion

Ahh okay Merging is not joining. now I understand and for sure its a size issue that you are facing.

 

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

View solution in original post

19 REPLIES 19
aj1973
Community Champion
Community Champion

Hi @decarsul 

You are trying to connect 2 Dataflows with 500MB each, correct? well that's a 1GB of data for one Dataset which is not possible for a Dataset.

aj1973_0-1622738566252.png

Your system must have known the truth lol.

 

 

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

And also,

 

Does this apply only to the online workspace, or also to the desktop client?

aj1973
Community Champion
Community Champion

Answering all your 3 questions:

  1. Limits count for free and Pro licenses
  2. The 10GB size is the limite for a workspace not for a Dataset. one workspace can contain many Datasets, one dataset can't be more than 1GB size, size of all Datasets can't go over 10GB in A workspace
  3. Same answer as question 2

 

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

Than this is most likely my issue, because the dataflows combined will gross out over 1GB.

But why then does this not give a warning or error. But instead gets 'stuck' in a loop of sorts and eats up all the system resources to a point it can nolonger function?

aj1973
Community Champion
Community Champion

I don't have an aswer to this question, it could be related to many things.

But I know for sure that work can be done on your dataset and/or your dataflows to get what you need.

 

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

I know it can too. For now i've added a filter before joining, so that i don't load in 10 years of data, but instead load in the last 2 quarters, effectively making the dataset total smaller.

aj1973
Community Champion
Community Champion

here are 2 articules that can help you in your endeavour

https://towardsdatascience.com/how-to-reduce-your-power-bi-model-size-by-90-76d7c4377f2d

https://towardsdatascience.com/power-bi-how-to-fit-200-million-rows-in-less-than-1gb-5eee310064b7

 

Good luck

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

aj1973
Community Champion
Community Champion

For that you can use paramaters or if you have premium license you can use incremental refresh against your dataflows.

The issue is there and clear. reduce the size of your dataflows or move to Premium capacity and all will be good.

 

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

To followup my question. I'm asking because of this post made in 2018.

According to this, i can use and upload up to 10GB

decarsul_0-1622786831641.png

 

Hello Amine,

 

hmmm . . . well that <insert cuss word here> . . . 

I wish it would give me a warning or something, instead of completely locking up my system tho.

 

oh btw, does this limit count only for free users, or also for pro users?

Because i am the latter.

Anonymous
Not applicable

Try doing it on Power BI Service, or through Power BI Dataflows in case you have a Premium capacity or Pro License. That may reduce the usage of your memory. 

Other options would be filtering the tables by some criteria/range and then appending them. Not efficient, but still a workaround. 

The datasource are PowerBI Dataflows in a premium environment. And i have a pro license.

Yeah . . . i already skimmed as much as i could, this crashing is also the reason i am only trying left join 1 columnn instead of 8 (which is the bare minimum i need).

 

Ill try exporting the dataflows to 2 csv files and see i if can join them that way. If that would work, it should point to an issue within the DataFlow itsself, no?

Anonymous
Not applicable

If you are trying to join the dataflows in Desktop, it will still consume your system's memory. Try joining them on the Power BI Service on Get Data.

Exporting the CSVs may work but still if they have a considerable size, they'll likely crash again. 

Other thing that could work could be creating a new dataflow that joins the other 2 dataflows, you can use the concept of linked entities since you have a Premium environment.

Yeah i'm aware it takes system resources, but i don't see how a 2x 500Mb can eat up 32Gb of ram. I'm really only 'adding' 1 column with a zipcode.

 

Yeah i can try joining them in the dataflow in the premium environment. See if that works. 

aj1973
Community Champion
Community Champion

Sorry I didn't get your scenario! You have 2 dataflows, you call these dataflows to your Desktop and you pick different entities from these 2 Dataflows. In your model you try to join (relationship) 2 entities from those 2 dataflows and your system crashes?

Is this your issue?

 

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

I do the left join from PowerQuery, not from DAX relationmanager if that's what you mean.

aj1973
Community Champion
Community Champion

Sorry but why using PowerQuery for joining? Power Query is good for ETL (Extraction, transformation and loading). Modeling(joins and relationships) is done in power bi desktop and DAX.

 

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

Maybe we're not understanding. I geuss a better translation would be 'merging'?

The reason i'm doing it in Powerquery is so that i don't have to load the table that contains more data than i need to the model. But i need it to create a unique id so can i can make a relationship to another table.

aj1973
Community Champion
Community Champion

Ahh okay Merging is not joining. now I understand and for sure its a size issue that you are facing.

 

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

LearnSurvey

Fabric certifications survey

Certification feedback opportunity for the community.