Solved: Re: Merge in Power Query based on conditions

robertpisarek · ‎12-07-2017

How to merge the same table based on two conditions in PowerQuery?

For example in SQL:

select * from Table1 t1
inner join Table1 t2 on
t1.TransactionID = t2.TransactionID
and t1.CategoryID <> t2.CategoryID

MarcelBeug · ‎12-07-2017

My suggestion would be to join on TransactionID and remove the results where the CategoryID's are equal.
As you require an inner join, this also means removing empty nested tables.

let
    Source = Table.NestedJoin(Table1,{"TransactionID"},Table1,{"TransactionID"},"t2",JoinKind.Inner),
    FilteredOnCategory = Table.ReplaceValue(Source,each [t2],(Earlier) => Table.SelectRows(Earlier[t2],each [CategoryID] <> Earlier[CategoryID]), Replacer.ReplaceValue,{"t2"}),
    RestoredType = Value.ReplaceType(FilteredOnCategory,Value.Type(Source)),
    FilteredNoEmptyTables = Table.SelectRows(RestoredType, each not Table.IsEmpty([t2])),
    Expanded_t2 = Table.ExpandTableColumn(FilteredNoEmptyTables, "t2", {"CategoryID"}, {"t2.CategoryID"})
in
    Expanded_t2

The RestoredType step restores the column types from the Source table, as Table.ReplaceValues resets all column types to any.

Specializing in Power Query Formula Language (M)

View solution in original post

MarcelBeug · ‎12-07-2017

My suggestion would be to join on TransactionID and remove the results where the CategoryID's are equal.
As you require an inner join, this also means removing empty nested tables.

let
    Source = Table.NestedJoin(Table1,{"TransactionID"},Table1,{"TransactionID"},"t2",JoinKind.Inner),
    FilteredOnCategory = Table.ReplaceValue(Source,each [t2],(Earlier) => Table.SelectRows(Earlier[t2],each [CategoryID] <> Earlier[CategoryID]), Replacer.ReplaceValue,{"t2"}),
    RestoredType = Value.ReplaceType(FilteredOnCategory,Value.Type(Source)),
    FilteredNoEmptyTables = Table.SelectRows(RestoredType, each not Table.IsEmpty([t2])),
    Expanded_t2 = Table.ExpandTableColumn(FilteredNoEmptyTables, "t2", {"CategoryID"}, {"t2.CategoryID"})
in
    Expanded_t2

The RestoredType step restores the column types from the Source table, as Table.ReplaceValues resets all column types to any.

Specializing in Power Query Formula Language (M)

NatashaSchuster · ‎12-08-2017

Hello @MarcelBeug

I am trying to wrap my head around the function you created for the 3rd argument NewValue inside Table.ReplaceValue()

Could you please explain how does this function work and especially the logic highlighted in red


FilteredOnCategory  
 = Table.ReplaceValue(// table, oldValue, NewValue, replacer as function, col to search. replace one value with another in the specified columns
     Source,     each [t2], // oldValue                     
     (Earlier) => Table.SelectRows(Earlier[t2],each [CategoryID] <> Earlier[CategoryID]), // NewValue 
     Replacer.ReplaceValue,{"t2"} // Replacer "t2" field name
 ),

Thanks

MarcelBeug · ‎12-08-2017

"Earlier" is just the name of a parameter, it may be anything (e.g. "This", "Each", "ThisRow", "Currentt"), but I chose "Earlier" like EARLIER in DAX.

In fact "Earlier" is equivalent with the record that makes up each row in table Source.

In Source, the column t2 consists of nested tables (the Table1 tables after the join with itself).

In these nested tables, only the records must be selected with a Category different from the Category in Source.

So the nested tables are represented by Earlier[t2] and the CategoryID inside these tables are represented by [CategoryID].

The Source[CategoryID] is represented by Earlier[CategoryID].

Earlier => is equivalent with each, but each cannot be used, as a qualifier is required to distinguish between the CategoryID in Source and the CategoryID inside the nested tables in t2.

Specializing in Power Query Formula Language (M)

NatashaSchuster · ‎12-08-2017

@MarcelBeug

I am familiar with EARLIER() in DAX. This is very interesing!

Earlier[t2] and the CategoryID refer to [t2] table with nested Table1 but Earlier[CategoryID] looks at he Source[CategoryID]... -- this is where it's little confusing. So, we have each [CategoryID] <> Earlier[CategoryID] . If Ealire[t2] is looking at nested table result, why wouldn't Earlier[CategoryID] look at nested result too, it has "ealier" prefix and it's logicly to amuse that.

To put it in a different way, why Earlier[CategoryID] with prefix Earlier is not looking at nested tables (Earlier [t2]) but at the source?