Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
RickMolenaar
New Member

Full outer join unexpected behaviour

I've been having some issues trying to get a full outer join for two tables. Setup at the time of writing (database is regularly, i.e. several times per week, expanded, but this is of no specific importance to my analysis):

 

- Table 1 has 537,168 rows and goes back to 2012, with 2 outliers before that.

- Table 2 has 1,001,147 rows and goes back to 2006.

- Both tables have subsequently been filtered to start from 2012, resulting in 537,166 rows in table 1 and 538,802 rows in table 2.

- For almost all rows in Table 1, there is a corresponding row in Table 2.

- There may be rows in Table 2 that are referred to by multiple row in Table 1, though this is quite rare. This is of no interest for my analysis and duplicates can be removed if necessary.

- For most rows in Table 2, there is a corresponding row in Table 1.

- There are no rows in Table 1 that are referred to by more than one row from Table 2.

 

Explanation using the store analogy:

- There are multiple stores, where customers can place orders.

- The orders are placed in Table 2, which is updated daily.

- Every week, the employees enter the orders which have been fulfilled in Table 1. This is not necessarily the same day for all stores; Store 1 may update on Monday, store 2 on Tuesday, etc.

- As such, in general the last entries in Table 2 do not yet have a matching row in Table 1. This will be updated when the new order data comes in. However, at times some orders which employees enter are not correctly registered, which causes some rows to never get a match.

- Sometimes, an order may be fulfilled in multiple parts, say the customer ordered product 1 and product 2. Product 1 is deliverd in part 1 of the delivery, while product 2 gets delivered in part 2. This will result in a many to one relationship from Table 1 to Table 2 (though in most cases it's a 1:1 match). As noted, this is of no interest.

- In rare cases, an incorrect order is delivered, resulting in a row in Table 1 without a matching row in Table 2.

- My specific interest is in checking when it happens that the orders are not correctly entered in Table 1, i.e. a row in Table 2 without a matching row in Table 1. You can assume that this will be visible from the data (and if even it's not visible, that would also be a legitimate result for my analysis).

- However, I would also, for completeness' sake, be able to view the incorrect orders, i.e. rows from Table 1 without a matching row in Table 2.

 

For this purpose, I am trying to make a full outer join, since this should show full matches, non-matches from Table 1 and non-matches from Table 2 (unless I gravely misunderstood the full outer join). However, I am running into 2 problems:

 

- First, using Table 1 as the first table and Table 2 as the second table for a full outer join takes an excessively long time in the query editor. I'm talking about a few hundred rows per second to start, but practically none after getting about halfway. It hasn't loaded even 1000 rows in the last 30 minutes of writing this. Due to this, I can't include it in the report load. Interestingly enough, Table 2 full outer join Table 1 completes within a reasonable time (few minutes max). It is like this consistently (tried removing and adding the joined table, restarting power bi, restart pc).

- When I somehow did get the table to load (or when looking at just the 2 join 1), the results were not quite as expected. It seems that the joined tables have exactly the same rows as table specified first in the join, plus one additional entirely empty row. Full outer join should, however, by definition return rows from both tables, and definitely the two differently joined tables should have the same size.

 

Does anyone:

-Know what causes the loading to fail?

-Have any ideas on how to fix this?

-Know where the mismatch between what I am expecting and what I am getting comes from?

-Have any ideas on how to get the table I want, i.e. all rows from both tables?

2 REPLIES 2
v-yuezhe-msft
Employee
Employee

@RickMolenaar

Full outer join in Query Editor of Power BI Desktop could fail due to the large data in the two tables. In your scenario, I would recommend you use DAX to create required table, depending on your requirements you can probably obtain a similar result using SUMMARIZECOLUMNS, or use a combination of CROSSJOIN and FILTER.

If you have any questions about the DAX, could you please share small queries(tables) with dummy data and post expected result in table format?

Regards,
Lydia

Community Support Team _ Lydia Zhang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

My 2 cents:

 

1. After joining, you have the numbers of rows from the first table, + 1. You will get additional rows once you expand the column with nested tables (from the second table).

 

2. If you are looking for mismatches, why not simply join Left-Anti and/or Right-Anti?

Specializing in Power Query Formula Language (M)

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.