cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
viera00 Regular Visitor
Regular Visitor

Complex PowerQuery Merge query using substrings

Hello everyone,

 

I have a questions about a complex merge I need to do in Power Query.

Imagine the following tables:

 

Table1:

 

Column1 | Column2

1              4441

2              4442

3              4443

4              443

 

Then I have Table2 , as the following table:

 

Table2:

Column1 | Column2

1               4441098

2               4441097

3               4441011

4               4441122

5               4443111

6               4443123

 

And the merge should have the rows of Table1 and the rows of Table2 where the "prefix" of Table2.Column2 is Table1.Column2. Take into account that I do not know the number of chars of the prefix, I mean, Table1.Column2 can have any lenght not greater than 7

 

Regards GV

German Viera
http://slidemodel.com/
1 ACCEPTED SOLUTION

Accepted Solutions
Super User
Super User

Re: Complex PowerQuery Merge query using substrings

A different way to perform a lookup is to use "Table.SelectRows", although this might be slower - so it is essential that you use the Table.Buffer and for very large table try to partition also. But the basic principle can look as follows:

 

let
    Table1 = Table.Buffer(#table({"Column1", "Column2"}, {{1, "4441"}, {2, "4442"}, {3, "4443"}, {4, "443"}})),
    Table2 = #table({"Column1", "Column2"}, {{1, "4441098"}, {2, "4441097"}, {3, "4441011"}, {4, "4441122"}, {5, "4443111"}, {6, "4443123"}}),
    Lookup = Table.AddColumn(Table2, "Lookup", (outer) => Table.SelectRows(Table1, each Text.StartsWith(outer[Column2], [Column2]))),
    #"Expanded Lookup" = Table.ExpandTableColumn(Lookup, "Lookup", {"Column2"}, {"Lookup.Column2"})
in
    #"Expanded Lookup"

 

Advantage here is that you don't need an exact match, but can use all sorts of conditions, in this case "Text.StartsWith".

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries




13 REPLIES 13
Super User
Super User

Re: Complex PowerQuery Merge query using substrings

well there needs to be a rule about the length of the merged characters.  It is not important how long the second table are, but it is important how long the characters in the first table are.  Your example suggests that there can be 3 or 4 character lenght in the first table.  Assuming that is a typo, I would do this.

 

Load table 2

duplicate column 2

split column 2 at 4 characters

delete the remainder column

join table 1 on table 2 with the new column.

 

If your 3 character 443 in table is not a typo, you are in a world of trouble - unless you can identify a rule on how to split the column in table 2.



* Matt is a Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.
Super User
Super User

Re: Complex PowerQuery Merge query using substrings

A different way to perform a lookup is to use "Table.SelectRows", although this might be slower - so it is essential that you use the Table.Buffer and for very large table try to partition also. But the basic principle can look as follows:

 

let
    Table1 = Table.Buffer(#table({"Column1", "Column2"}, {{1, "4441"}, {2, "4442"}, {3, "4443"}, {4, "443"}})),
    Table2 = #table({"Column1", "Column2"}, {{1, "4441098"}, {2, "4441097"}, {3, "4441011"}, {4, "4441122"}, {5, "4443111"}, {6, "4443123"}}),
    Lookup = Table.AddColumn(Table2, "Lookup", (outer) => Table.SelectRows(Table1, each Text.StartsWith(outer[Column2], [Column2]))),
    #"Expanded Lookup" = Table.ExpandTableColumn(Lookup, "Lookup", {"Column2"}, {"Lookup.Column2"})
in
    #"Expanded Lookup"

 

Advantage here is that you don't need an exact match, but can use all sorts of conditions, in this case "Text.StartsWith".

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries




viera00 Regular Visitor
Regular Visitor

Re: Complex PowerQuery Merge query using substrings

Hi Matt,

 

Thank you for your reply, but I'm in a world of trouble, as my 3 chars is not a typo. Even worst, I've records of 5 and 2 chars.

 

Regards,

 

GV

German Viera
http://slidemodel.com/
viera00 Regular Visitor
Regular Visitor

Re: Complex PowerQuery Merge query using substrings

Hi ImkeF

 

Interesting Solution.

 

I will try it and let you know if worked. Thank you very much for your contribution.

 

Regards,

 

GV

German Viera
http://slidemodel.com/
Beckham Regular Visitor
Regular Visitor

Re: Complex PowerQuery Merge query using substrings

Lets say your unique ID has 6 different possible lengths (2, 3, 4, 5, 6, or 7 characters long)

 

A way you could possibly do this would be to duplicate your query 6 times, split off a different number of characters in each query, and merge only matching rows to the original table. Then at the end you'd need to append all the tables.

 

It might be messy and a bit slow, but it should work.

kalcey Frequent Visitor
Frequent Visitor

Re: Complex PowerQuery Merge query using substrings

Hey @ImkeF,

Could you please elaborate more on where to write this code?

Is there a way to use the GUI to do that?

 

Thanks

Super User
Super User

Re: Complex PowerQuery Merge query using substrings

Pls check this video, which contains the "less-codiest"-way I can think of (for this specific technique):

 

Did I answer your question? Mark my post as a solution!

Proud to be a Datanaut!

Imke Feldmann

How to integrate M-code into your solution -- Check out more PBI- learning resources here -- Performance Tipps for M-queries




Hermes Frequent Visitor
Frequent Visitor

Re: Complex PowerQuery Merge query using substrings

This is awesome solution.
I've spend few days trying different ways of achieving same result, but this is by far the best - fastest and cleanest.


I wonder if this could be made into custom function?

 

Also, is there a solution to speed up subsequent actions after this join? To be honest, performance of PQ after this kind of join falls of the cliff and even simple filter takes ages (and this is for a dimension kind of table, with just about 1000 rows)

Super User
Super User

Re: Complex PowerQuery Merge query using substrings

The solution below I applied similarly in an Excel / Power Query application I recently developed ( @ImkeF knows: the account matching application), which is used in an interactive way, so it had to be fast.

 

You can add key columns to Table 2, each with the first n (1-7) characters of the value in Column2, acting as key columns. I did so in below query Table2With7Keys.

 

Next, you can add a column to Table1 with the length of the value in Column2, and group on that length, so you have partitioned tables for each key length.

Now you can merge those nested tables with Table2, using the key column with the corresponding key length.

Then finish up with some expanding and removal.

 

Query Table2With7Keys:

 

let
    Source = Table2,
    #"Added Custom" = Table.AddColumn(Source, "Keys", (x) => Table.FromRows({List.Transform({1..7}, each Text.Middle(x[Column2],0,_))}, List.Transform({1..7}, each "Key"&Text.From(_)))),
    #"Expanded Keys" = Table.ExpandTableColumn(#"Added Custom", "Keys", {"Key1", "Key2", "Key3", "Key4", "Key5", "Key6", "Key7"})
in
    #"Expanded Keys"

 

Query MergedTables:

 

let
    Source = Table1,
    #"Added Custom" = Table.AddColumn(Source, "Length", each Text.Length([Column2])),
    #"Grouped Rows" = Table.Group(#"Added Custom", {"Length"}, {{"AllData", each _, type table}}),
    Merged = Table.Buffer(Table.TransformColumns(#"Grouped Rows",{{"AllData", each Table.NestedJoin(_,{"Column2"},Table2With7Keys,{"Key"&Text.From(List.Average(_[Length]))},"Table2Data",JoinKind.LeftOuter), type table}})),
    #"Expanded AllData" = Table.ExpandTableColumn(Merged, "AllData", {"Column1", "Column2", "Table2Data"}, {"Column1", "Column2", "Table2Data"}),
    #"Expanded Table2Data" = Table.ExpandTableColumn(#"Expanded AllData", "Table2Data", {"Column1", "Column2"}, {"Table2.Column1", "Table2.Column2"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Table2Data",{"Length"})
in
    #"Removed Columns"
Specializing in Power Query Formula Language (M)