Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Speed up a PQ cross-table loop that takes minutes to run

 I have a sales fact table and I need to categorise each record based on matching strings in a free text field to strings in another lookup table. Here's basically how it works:

 

Fact table before transformation:

charliedata_0-1607695216813.png

 

Lookup table:

charliedata_1-1607695244518.png

 

...and I transform it to this shape for the lookup logic to work:

charliedata_3-1607695373917.png

 

 

 

Fact table after running the transformation script:

charliedata_2-1607695269133.png

 

In other words, I get the desired output with this logic.

However...

...while it's obviously fast with this small sample data, with my actual fact table (and a final lookup table with 20 records rather than 5) it takes a looooong time to refresh the dataset. I've tried to make a sample file available via the link below, but in case it doesn't work here's the PQ script that does the lookup:

charliedata_4-1607695468225.png

 

In brief, within a new column added to the fact table, each row of the lookup table is cross-checked with the [Free Text] field in the fact table, and if there's a match then a column is added to the lookup table with the value from the [Status] field in the fact table. This expanded lookup table is inserted as a nested table into the fact table's new column. Then I expand it and it's done.  

 

I've tried using buffering the different steps and tables, but it hasn't made any difference (at least, the alternatives I tried didn't...). I've also tried adapting this bulk find-replace logic, but I don't have the PQ skills to make it work.

 

Is there a more efficient logic? Perhaps using Lists, or a totally different approach to the cross-check?

 

I know @ImkeF is the Power Query guru on these forums...a solution to this would save my weekend! 

 

PBIX 

1 ACCEPTED SOLUTION
ImkeF
Super User
Super User

Hi @Anonymous ,
I would think that a simple merge would be fastest here:

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUfLJLC5JTQEynBOLlGJ1opWMgOzg/ByQkFN+YglYzBhFYUZiJkSpCUKpW35RNljMFFmpU35JSU6qUmwsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, Status = _t, #"Free Text" = _t]),
    #"Merged Queries" = Table.NestedJoin(Source, {"Free Text"}, #"Lookup Table (2)", {"Product"}, "Lookup Table (2)", JoinKind.LeftOuter),
    #"Expanded Lookup Table (2)" = Table.ExpandTableColumn(#"Merged Queries", "Lookup Table (2)", {"Category"}, {"Category"}),
    #"Added Custom" = Table.AddColumn(#"Expanded Lookup Table (2)", "Custom", each if [Category] = null then null else #table({[Category]}, {{[Status]}})),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Big Items", "Small Items"}, {"Big Items", "Small Items"})
in
    #"Expanded Custom"

 

 

Unless you want a partial match. But then additionals quesitons would have to be answered (take just the first match or all...?)
Also, if there could be multiple /dynamic number of categories (Big items, small items...) you would have to make the last step of the table expansion dynamic.

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

3 REPLIES 3
ImkeF
Super User
Super User

Hi @Anonymous ,
I would think that a simple merge would be fastest here:

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUfLJLC5JTQEynBOLlGJ1opWMgOzg/ByQkFN+YglYzBhFYUZiJkSpCUKpW35RNljMFFmpU35JSU6qUmwsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, Status = _t, #"Free Text" = _t]),
    #"Merged Queries" = Table.NestedJoin(Source, {"Free Text"}, #"Lookup Table (2)", {"Product"}, "Lookup Table (2)", JoinKind.LeftOuter),
    #"Expanded Lookup Table (2)" = Table.ExpandTableColumn(#"Merged Queries", "Lookup Table (2)", {"Category"}, {"Category"}),
    #"Added Custom" = Table.AddColumn(#"Expanded Lookup Table (2)", "Custom", each if [Category] = null then null else #table({[Category]}, {{[Status]}})),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added Custom", "Custom", {"Big Items", "Small Items"}, {"Big Items", "Small Items"})
in
    #"Expanded Custom"

 

 

Unless you want a partial match. But then additionals quesitons would have to be answered (take just the first match or all...?)
Also, if there could be multiple /dynamic number of categories (Big items, small items...) you would have to make the last step of the table expansion dynamic.

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous
Not applicable

Hi @ImkeF - thank you so much for responding

 

The actual data had a lot more text in the Free Text field - my data was too simple - but in the end a cross join using a merge column with value = 1 in both tables, followed by a text.contains, was much faster than my original solution

 

Giving you the accepted solution for prompting the merge approach

AlB
Super User
Super User

Hi @Anonymous 

Try this. Place the following M code in a blank query to see the steps. See it all at work in the attached file.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUfLJLC5JTQEynBOLlGJ1opWMgOzg/ByQkFN+YglYzBhFYUZiJkSpCUKpW35RNljMFFmpU35JSU6qUmwsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, Status = _t, #"Free Text" = _t]),

    big_ = Table.SelectRows(#"Lookup Table", each [Category] = "Big Items")[Product],
    small_ = List.Difference(#"Lookup Table"[Product], big_),
    custom1 = Table.AddColumn(Source, "Big Items", each if List.Contains(big_, [Free Text]) then [Status] else null),
    custom2 = Table.AddColumn(custom1, "Small Items", each if List.Contains(small_, [Free Text]) then [Status] else null)

in
    custom2

Please mark the question solved when done and consider giving a thumbs up if posts are helpful.

Contact me privately for support with any larger-scale BI needs, tutoring, etc.

Cheers 

SU18_powerbi_badge

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.