Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
VivG
Frequent Visitor

Dates between merge join / Expand columns performance issue

Hi all,

 

I have a first very simple table A which contains only three columns:

- a unique ID

- a client ID

- a date

This table has 30,000 rows.

 

I have a second table B with these important columns:

- a client ID

- a start date

- an end date

- a category

This table is bigger with 300,000 rows.

 

What I want to do is to have a new table A which includes the category from table B by merging on the client ID between table A and B and where the date of table A is between the start date and the end date of table B.

 

What I did is the following:

- merge the two tables based on the client ID

#"Merge" = Table.NestedJoin(#"Table A", {"client ID"}, #"Table B", {"client ID"}, "new_column", JoinKind.LeftOuter)

- add a custom column to do the row selection based on the date range

#"Date range selection" = Table.AddColumn(#"Merge", "custom", each let mydate= [date] in Table.SelectRows([new_column],each [start date] <= mydate and [end date] >= mydate))

- then expand my new custom column

#"Result" = Table.ExpandTableColumn(#"Date range selection", "custom", {"start date", "end date", "category"}, {"start date", "end date", "category"})

 

The issue I have is that this query is really too long. I have the feeling that for each row of table A, Power Query is re-loading table B (which is quite large). Even after 2 hours the query is still loading data.

 

I'm quite new to Power Query / Power BI so maybe my query is not good and there is a better way to do this. Do you have any idea of what can be done to improve this?

 

Thanks for your help!

Vivien

 

 

 

 

 

2 ACCEPTED SOLUTIONS
edhans
Super User
Super User

This is your problem:

#"Date range selection" = Table.AddColumn(#"Merge", "custom", each let mydate= [date] in Table.SelectRows([new_column],each [start date] <= mydate and [end date] >= mydate))

 It is going to get deep into the weeds, but @ImkeF article here I think will help you.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

VivG
Frequent Visitor

Yes I read the article you linked and I tried to apply this to my situation but I was quite confused on which grouping dimension I should use on my different tables, I really have to improve my understanding of the M language...

Somebody suggested me to just invert step 2 and step 3 (so to do the expand column before doing the row selection) and it immediately solved my problem as table B is now loading only one time.

Thank you for your help @edhans and for the link!

View solution in original post

5 REPLIES 5
edhans
Super User
Super User

This is your problem:

#"Date range selection" = Table.AddColumn(#"Merge", "custom", each let mydate= [date] in Table.SelectRows([new_column],each [start date] <= mydate and [end date] >= mydate))

 It is going to get deep into the weeds, but @ImkeF article here I think will help you.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
VivG
Frequent Visitor

Hello @edhans ,

 

And thank you for your answer!

 

Indeed, I don't think that this second step of my query is the issue. 

 

If I just do step 1:

- merge the two tables based on the client ID

#"Merge" = Table.NestedJoin(#"Table A", {"client ID"}, #"Table B", {"client ID"}, "new_column", JoinKind.LeftOuter)

 

and step 2:

- add a custom column to do the row selection based on the date range

#"Date range selection" = Table.AddColumn(#"Merge", "custom", each let mydate= [date] in Table.SelectRows([new_column],each [start date] <= mydate and [end date] >= mydate))

 

my query gives a result in less than 2 minutes.

 

This is really when I add step 3:

- then expand my new custom column

#"Result" = Table.ExpandTableColumn(#"Date range selection", "custom", {"start date", "end date", "category"}, {"start date", "end date", "category"})

 

that the query never ends, even after two hours of waiting. 

 

I did a new test this morning and reduced table A to 5 rows instead of 30,000 rows and it gave me a result in 5 minutes. But that's what I feared, Power Query is reloading table B for each line of table A on step 3. Table B is 38 Mo and I saw Power Query loading 38 x 5 = 190 Mo of data. So if I want to do this with my complete table A of 30,000 rows, I would need 30,000 x 38 = 1,140 Go loaded and 500 hours! Is there any way to say to Power Query to keep table B in the RAM when it is loaded instead of re-loading it for each line of table A?

 

Thanks!

Vivien 

 

 

It is the expansion of that table you are crossjoining. It is the step I referenced that is the source of the issue. Did you read the article I linked to?



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
VivG
Frequent Visitor

Yes I read the article you linked and I tried to apply this to my situation but I was quite confused on which grouping dimension I should use on my different tables, I really have to improve my understanding of the M language...

Somebody suggested me to just invert step 2 and step 3 (so to do the expand column before doing the row selection) and it immediately solved my problem as table B is now loading only one time.

Thank you for your help @edhans and for the link!

Yes @VivG . The details Imke gets into in that article are pretty deep, but can be very helpful in cases like yours.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors