Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
viera00
Helper II
Helper II

Complex PowerQuery Merge query using substrings

Hello everyone,

 

I have a questions about a complex merge I need to do in Power Query.

Imagine the following tables:

 

Table1:

 

Column1 | Column2

1              4441

2              4442

3              4443

4              443

 

Then I have Table2 , as the following table:

 

Table2:

Column1 | Column2

1               4441098

2               4441097

3               4441011

4               4441122

5               4443111

6               4443123

 

And the merge should have the rows of Table1 and the rows of Table2 where the "prefix" of Table2.Column2 is Table1.Column2. Take into account that I do not know the number of chars of the prefix, I mean, Table1.Column2 can have any lenght not greater than 7

 

Regards GV

German Viera
http://slidemodel.com/
1 ACCEPTED SOLUTION

A different way to perform a lookup is to use "Table.SelectRows", although this might be slower - so it is essential that you use the Table.Buffer and for very large table try to partition also. But the basic principle can look as follows:

 

let
    Table1 = Table.Buffer(#table({"Column1", "Column2"}, {{1, "4441"}, {2, "4442"}, {3, "4443"}, {4, "443"}})),
    Table2 = #table({"Column1", "Column2"}, {{1, "4441098"}, {2, "4441097"}, {3, "4441011"}, {4, "4441122"}, {5, "4443111"}, {6, "4443123"}}),
    Lookup = Table.AddColumn(Table2, "Lookup", (outer) => Table.SelectRows(Table1, each Text.StartsWith(outer[Column2], [Column2]))),
    #"Expanded Lookup" = Table.ExpandTableColumn(Lookup, "Lookup", {"Column2"}, {"Lookup.Column2"})
in
    #"Expanded Lookup"

 

Advantage here is that you don't need an exact match, but can use all sorts of conditions, in this case "Text.StartsWith".

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

14 REPLIES 14
Simonolsson
Regular Visitor

Hello, hope this is ok to make this question in this topic, i was the closest case i could find on this page. 

I can't manage to get this code to work, my case i just a little bit different than the origanal post. 

I have this: 

Sheet 1 = Price master

(column B) = P/N

Example: 16510-96J10

Sheet 2 = Exceptions

(Column A) = Part start

Example: 16510-
(Column E) = AG end

Example: 3

 

I want to do like a Vlookup from Price master, the result should be "search for first characters" from Exceptions. 
In excel i can do it like this, but want is integrated in my Query "=VLOOKUP(LEFT([@[P/N]];5);Exceptions!A:D;4;0)"

 

The result should be something like this: 

Search: 16510-96J10 (number found in "Price master"

The match should be for something like this, match from 2 characters to whole number. whichever comes first.

16

165

1651

16510

Numbers found in "Exceptions"

 

Outcome: New column in "Price master" with result from AG end ( "3" in this case)

 

 

 

Beckham
Advocate II
Advocate II

Lets say your unique ID has 6 different possible lengths (2, 3, 4, 5, 6, or 7 characters long)

 

A way you could possibly do this would be to duplicate your query 6 times, split off a different number of characters in each query, and merge only matching rows to the original table. Then at the end you'd need to append all the tables.

 

It might be messy and a bit slow, but it should work.

well there needs to be a rule about the length of the merged characters.  It is not important how long the second table are, but it is important how long the characters in the first table are.  Your example suggests that there can be 3 or 4 character lenght in the first table.  Assuming that is a typo, I would do this.

 

Load table 2

duplicate column 2

split column 2 at 4 characters

delete the remainder column

join table 1 on table 2 with the new column.

 

If your 3 character 443 in table is not a typo, you are in a world of trouble - unless you can identify a rule on how to split the column in table 2.



* Matt is an 8 times Microsoft MVP (Power BI) and author of the Power BI Book Supercharge Power BI.

Hi Matt,

 

Thank you for your reply, but I'm in a world of trouble, as my 3 chars is not a typo. Even worst, I've records of 5 and 2 chars.

 

Regards,

 

GV

German Viera
http://slidemodel.com/

A different way to perform a lookup is to use "Table.SelectRows", although this might be slower - so it is essential that you use the Table.Buffer and for very large table try to partition also. But the basic principle can look as follows:

 

let
    Table1 = Table.Buffer(#table({"Column1", "Column2"}, {{1, "4441"}, {2, "4442"}, {3, "4443"}, {4, "443"}})),
    Table2 = #table({"Column1", "Column2"}, {{1, "4441098"}, {2, "4441097"}, {3, "4441011"}, {4, "4441122"}, {5, "4443111"}, {6, "4443123"}}),
    Lookup = Table.AddColumn(Table2, "Lookup", (outer) => Table.SelectRows(Table1, each Text.StartsWith(outer[Column2], [Column2]))),
    #"Expanded Lookup" = Table.ExpandTableColumn(Lookup, "Lookup", {"Column2"}, {"Lookup.Column2"})
in
    #"Expanded Lookup"

 

Advantage here is that you don't need an exact match, but can use all sorts of conditions, in this case "Text.StartsWith".

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

kalcey
Frequent Visitor

Hey @ImkeF,

Could you please elaborate more on where to write this code?

Is there a way to use the GUI to do that?

 

Thanks

Pls check this video, which contains the "less-codiest"-way I can think of (for this specific technique):

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

jfreyha
Frequent Visitor

Thank you!

what's the difference between inner and outer? are those the join kinds? or is something like "earlier" in DAX?

Yes, it's like EARLIER in DAX 🙂

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous
Not applicable

Hi @ImkeF

 

This solution was very helpful. Really grateful. I have a question regarding the type of join. I would like to use a full outer join, in order to see all rows from the First and Second. I tried this, but I still keep getting only all rows from the second table. Is there a way to get around this

 

=Table.SelectRows(Table.Buffer(Table1), (FullOuter) => Text.StartsWith([Column2], FullOuter[Columns]))

Yes, the "Inner" is just a random name for the "inner" of the nested functions in that command. So you can name it anything and it wouldn't change the result.

 

If you want to have a full outer, you can simply perform a full outer after that first operation (and merge the results of the 2 columns together)

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

This is awesome solution.
I've spend few days trying different ways of achieving same result, but this is by far the best - fastest and cleanest.


I wonder if this could be made into custom function?

 

Also, is there a solution to speed up subsequent actions after this join? To be honest, performance of PQ after this kind of join falls of the cliff and even simple filter takes ages (and this is for a dimension kind of table, with just about 1000 rows)

The solution below I applied similarly in an Excel / Power Query application I recently developed ( @ImkeF knows: the account matching application), which is used in an interactive way, so it had to be fast.

 

You can add key columns to Table 2, each with the first n (1-7) characters of the value in Column2, acting as key columns. I did so in below query Table2With7Keys.

 

Next, you can add a column to Table1 with the length of the value in Column2, and group on that length, so you have partitioned tables for each key length.

Now you can merge those nested tables with Table2, using the key column with the corresponding key length.

Then finish up with some expanding and removal.

 

Query Table2With7Keys:

 

let
    Source = Table2,
    #"Added Custom" = Table.AddColumn(Source, "Keys", (x) => Table.FromRows({List.Transform({1..7}, each Text.Middle(x[Column2],0,_))}, List.Transform({1..7}, each "Key"&Text.From(_)))),
    #"Expanded Keys" = Table.ExpandTableColumn(#"Added Custom", "Keys", {"Key1", "Key2", "Key3", "Key4", "Key5", "Key6", "Key7"})
in
    #"Expanded Keys"

 

Query MergedTables:

 

let
    Source = Table1,
    #"Added Custom" = Table.AddColumn(Source, "Length", each Text.Length([Column2])),
    #"Grouped Rows" = Table.Group(#"Added Custom", {"Length"}, {{"AllData", each _, type table}}),
    Merged = Table.Buffer(Table.TransformColumns(#"Grouped Rows",{{"AllData", each Table.NestedJoin(_,{"Column2"},Table2With7Keys,{"Key"&Text.From(List.Average(_[Length]))},"Table2Data",JoinKind.LeftOuter), type table}})),
    #"Expanded AllData" = Table.ExpandTableColumn(Merged, "AllData", {"Column1", "Column2", "Table2Data"}, {"Column1", "Column2", "Table2Data"}),
    #"Expanded Table2Data" = Table.ExpandTableColumn(#"Expanded AllData", "Table2Data", {"Column1", "Column2"}, {"Table2.Column1", "Table2.Column2"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Table2Data",{"Length"})
in
    #"Removed Columns"
Specializing in Power Query Formula Language (M)

Hi ImkeF

 

Interesting Solution.

 

I will try it and let you know if worked. Thank you very much for your contribution.

 

Regards,

 

GV

German Viera
http://slidemodel.com/

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.