Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
robertpisarek
Regular Visitor

Merge in Power Query based on conditions

How to merge the same table based on two conditions in PowerQuery?

 

For example in SQL:

 

select * from Table1 t1
inner join Table1 t2 on
t1.TransactionID = t2.TransactionID
and t1.CategoryID <> t2.CategoryID

1 ACCEPTED SOLUTION
MarcelBeug
Community Champion
Community Champion

My suggestion would be to join on TransactionID and remove the results where the CategoryID's are equal.
As you require an inner join, this also means removing empty nested tables.

 

let
    Source = Table.NestedJoin(Table1,{"TransactionID"},Table1,{"TransactionID"},"t2",JoinKind.Inner),
    FilteredOnCategory = Table.ReplaceValue(Source,each [t2],(Earlier) => Table.SelectRows(Earlier[t2],each [CategoryID] <> Earlier[CategoryID]), Replacer.ReplaceValue,{"t2"}),
    RestoredType = Value.ReplaceType(FilteredOnCategory,Value.Type(Source)),
    FilteredNoEmptyTables = Table.SelectRows(RestoredType, each not Table.IsEmpty([t2])),
    Expanded_t2 = Table.ExpandTableColumn(FilteredNoEmptyTables, "t2", {"CategoryID"}, {"t2.CategoryID"})
in
    Expanded_t2

 

The RestoredType step restores the column types from the Source table, as Table.ReplaceValues resets all column types to any.

Specializing in Power Query Formula Language (M)

View solution in original post

5 REPLIES 5
MarcelBeug
Community Champion
Community Champion

My suggestion would be to join on TransactionID and remove the results where the CategoryID's are equal.
As you require an inner join, this also means removing empty nested tables.

 

let
    Source = Table.NestedJoin(Table1,{"TransactionID"},Table1,{"TransactionID"},"t2",JoinKind.Inner),
    FilteredOnCategory = Table.ReplaceValue(Source,each [t2],(Earlier) => Table.SelectRows(Earlier[t2],each [CategoryID] <> Earlier[CategoryID]), Replacer.ReplaceValue,{"t2"}),
    RestoredType = Value.ReplaceType(FilteredOnCategory,Value.Type(Source)),
    FilteredNoEmptyTables = Table.SelectRows(RestoredType, each not Table.IsEmpty([t2])),
    Expanded_t2 = Table.ExpandTableColumn(FilteredNoEmptyTables, "t2", {"CategoryID"}, {"t2.CategoryID"})
in
    Expanded_t2

 

The RestoredType step restores the column types from the Source table, as Table.ReplaceValues resets all column types to any.

Specializing in Power Query Formula Language (M)

Hello  @MarcelBeug 

 

I am trying to wrap my head around the function you created for the 3rd argument NewValue inside Table.ReplaceValue()

Could you please explain how does this function work and especially the logic highlighted in red

 

 


FilteredOnCategory
    = Table.ReplaceValue(// table, oldValue, NewValue, replacer as function, col to search. replace one value with another in the specified columns      Source,     each [t2], // oldValue                           (Earlier) => Table.SelectRows(Earlier[t2],each [CategoryID] <> Earlier[CategoryID]), // NewValue       Replacer.ReplaceValue,{"t2"} // Replacer "t2" field name  ),

  

Thanks

"Earlier" is just the name of a parameter, it may be anything (e.g. "This", "Each", "ThisRow", "Currentt"), but I chose "Earlier" like EARLIER in DAX.

 

In fact "Earlier" is equivalent with the record that makes up each row in table Source.

In Source, the column t2 consists of nested tables (the Table1 tables after the join with itself).

In these nested tables, only the records must be selected with a Category different from the Category in Source.

So the nested tables are represented by Earlier[t2] and the CategoryID inside these tables are represented by [CategoryID].

The Source[CategoryID] is represented by Earlier[CategoryID].

 

Earlier => is equivalent with each, but each cannot be used, as a qualifier is required to distinguish between the CategoryID in Source and the CategoryID inside the nested tables in t2.

Specializing in Power Query Formula Language (M)

@MarcelBeug

 

I am familiar with EARLIER() in DAX. This is very interesing! 

 Earlier[t2] and the CategoryID  refer to [t2] table with nested Table1 but Earlier[CategoryID] looks at he Source[CategoryID]... -- this is where it's little confusing. So, we have  each [CategoryID] <> Earlier[CategoryID] . If Ealire[t2] is looking at nested table result, why wouldn't Earlier[CategoryID] look at nested result too, it has "ealier" prefix and it's logicly to amuse that. 

 

To put it in a different way, why Earlier[CategoryID] with prefix Earlier is not looking at nested tables (Earlier [t2]) but at the source?  

 

 

image.png

 

 

Thanks

An example at the Source step.

 

Earlier[CategoryID] is a column in Source.

Earlier[t2] is also a column in Source.

Earlier[Category] is not inside Earlier[t2].

 

 

Earlier not inside t2.png

Specializing in Power Query Formula Language (M)

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.