imranamikhan
Helper V

Buffered lists with List.Buffer()

Hi everyone,

 

I have a large table with a list of customers, and I wanted to identify, using DAX, whether any customers in the table exist in a separate keyword list of distinct customers. I have received support on the forums with an M code solution, but I have found performance to be extremely slow for 200,000+ records.

 

Alternatively, I have used Power Query's fuzzy matching feature, which is significantly faster than the custom M code, but it is not as accurate at identifying matches.
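
A rough sketch of the kind of fuzzy match this refers to, using Table.FuzzyNestedJoin with hypothetical table names, sample values, and an arbitrary similarity threshold (none of these are taken from the actual queries in this thread):

let
    // Hypothetical stand-ins for the customer table and the keyword list
    Customers = #table({"Customer Text"}, {{"Amazonn Web"}, {"Contoso Ltd"}}),
    Keywords  = #table({"Keyword"}, {{"Amazon"}, {"eBay"}}),
    // Fuzzy-join each customer row to any keyword whose similarity score
    // clears the threshold; matches land in the nested "Matches" column.
    // Note that fuzzy matching compares whole cell values, not individual words.
    Fuzzy = Table.FuzzyNestedJoin(
        Customers, {"Customer Text"},
        Keywords, {"Keyword"},
        "Matches",
        JoinKind.LeftOuter,
        [Threshold = 0.6]
    )
in
    Fuzzy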

 

I understand it is possible to use buffered lists with List.Buffer() to significantly improve performance, but I am not sure whether I am using it correctly.

 

Ref: https://community.powerbi.com/t5/Desktop/DAX-Check-if-text-value-exists-in-keyword-list-then-return/...

 

let
    Source = Table.Buffer(SharePoint.Files("https://rbcom.sharepoint.com/sites/CustomerScope", [ApiVersion = 15])),
    #"PBI_Customer_Master_Tables csv_https://rbcom sharepoint com/sites/CustomerScope/Shared Documents/1  Customer Master Cleansing/3  PowerBI/0  Power BI Master Files/0  Test Data/" = 
    Source{[Name="PBI_Customer_Master_Tables.csv",#"Folder Path"="https://rbcom.sharepoint.com/sites/CustomerScope/Shared Documents/1. Customer Master Cleansing/3. PowerBI/0. Power BI Master Files/0. Test Data/"]}[Content],
    #"Imported CSV" = Csv.Document(#"PBI_Customer_Master_Tables csv_https://rbcom sharepoint com/sites/CustomerScope/Shared Documents/1  Customer Master Cleansing/3  PowerBI/0  Power BI Master Files/0  Test Data/",[Delimiter=",", Columns=17, Encoding=65001, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Customer", type text}, {"Data Source", type text}, {"Customer Text", type text}, {"Data Origin", type text}, {"IAG", type text}, {"IAG Text", type text}, {"IBG", type text}, {"IBG Text", type text}, {"Trade Type Text", type text}, {"Trade Type", type text}, {"Parent Customer Code", type text}, {"Group Company Code", type text}, {"Record Timestamp", type date}}),
    #"Appended Query" = Table.Combine({#"Changed Type", DB_CustomerMasterData_History}),
    #"Removed Other Columns" = Table.SelectColumns(#"Appended Query",{"Customer", "Data Source", "Customer Text", "Data Origin", "IAG", "IAG Text", "IBG", "IBG Text", "Trade Type Text", "Trade Type", "Parent Customer Code", "Group Company Code", "Record Timestamp"}),
    #"Changed Type3" = Table.TransformColumnTypes(#"Removed Other Columns",{{"Trade Type", type text}}),
    #"Inserted Text After Delimiter" = Table.AddColumn(#"Changed Type3", "Text After Delimiter", each Text.AfterDelimiter([Customer], "/"), type text),
    #"Added Custom" = Table.AddColumn(#"Inserted Text After Delimiter", "eComm Keyword", each List.Accumulate(
            List.Buffer(#"Tbl_eComm"[Keyword]),
            "",
            (state, current) =>
                if List.Contains(Text.Split([Customer Text], " "), current, Comparer.OrdinalIgnoreCase) and state = ""
                then state & " " & current
                else state
    ))
in
    #"Added Custom"
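
A minimal, self-contained sketch of what the "eComm Keyword" custom column above computes. The keyword list and customer text below are made-up values standing in for Tbl_eComm[Keyword] and [Customer Text]:

let
    // Made-up sample values (assumptions, not the real data)
    Keywords = List.Buffer({"Amazon", "eBay", "Shopify"}),
    CustomerText = "Amazon Web Services EMEA",
    // Return the first keyword that appears as a whole word in the customer
    // text (prefixed with a space), or "" when no keyword matches
    Match = List.Accumulate(
        Keywords,
        "",
        (state, current) =>
            if List.Contains(Text.Split(CustomerText, " "), current, Comparer.OrdinalIgnoreCase) and state = ""
            then state & " " & current
            else state
    )
in
    Match  // " Amazon"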

 

Best regards,

Ami


9 REPLIES
imranamikhan
Helper V

Thanks @mahoneypat 

This is a super impressive formula, and I am getting exactly the same number of matches for 200k+ records as with the M query. Refresh times are equal to the M query version, so I now have a choice between the two options.

 

Thanks again all for the support.

mahoneypat
Employee

@imranamikhan Thanks for the sample data. This DAX column expression returned all the same results as the M approach. I called your tables Customer and Keyword, so you'll need to change the expression to use your actual table names. There is no relationship between the tables. It runs fast on this small dataset, but please try it on your actual data.

 

NewColumn =
VAR thiscustomer = Customer[Customer Text]
VAR matchtable =
    FILTER (
        VALUES ( Keyword[Keyword] ),
        SEARCH (
            Keyword[Keyword],
            thiscustomer,
            1,
            0
        ) > 0
    )
VAR matchkeyword =
    MAXX (
        matchtable,
        Keyword[Keyword]
    )
RETURN
    IF (
        ISBLANK ( matchkeyword ),
        "#No Match",
        matchkeyword
    )

 

Regards,

Pat







Hello @mahoneypat, I have a question. What if I don't want it to return "matchkeyword" as the result, but instead the values of a "result" column, which correspond to the segment of that data?

Keyword[Keyword]    Keyword[result]     Customer[Customer Text]
factura             Consultor Junior    Duplicidad de factura
Cobro               Consultor Senior    Cobro no aplicado
Devoluciones        Consultor Medio     Cambio de devoluciones anual no a aplicado

Thank you!!! Regards, Frank

imranamikhan
Helper V

Good to hear from you, and thanks @PhilipTreacy - this is significantly faster (down from 40 minutes to 15). I believe the rest of the time is spent loading the other queries I have running.

Hi @mahoneypat - I have added some sample data to OneDrive: Onedrive Link 

@lbendlin - thank you for the explanation.

mahoneypat
Employee

I agree you should buffer outside the List.Accumulate, but re-reading your original post, I believe this would perform faster in DAX. If you can provide some example data for the two tables/columns, a column or measure expression can be suggested.

 

Regards,

Pat







PhilipTreacy
Super User

Hi @imranamikhan 

Try buffering the list before the List.Accumulate step:

let
    Source = Table.Buffer(SharePoint.Files("https://rbcom.sharepoint.com/sites/CustomerScope", [ApiVersion = 15])),
    #"PBI_Customer_Master_Tables csv_https://rbcom sharepoint com/sites/CustomerScope/Shared Documents/1  Customer Master Cleansing/3  PowerBI/0  Power BI Master Files/0  Test Data/" = 
    Source{[Name="PBI_Customer_Master_Tables.csv",#"Folder Path"="https://rbcom.sharepoint.com/sites/CustomerScope/Shared Documents/1. Customer Master Cleansing/3. PowerBI/0. Power BI Master Files/0. Test Data/"]}[Content],
    #"Imported CSV" = Csv.Document(#"PBI_Customer_Master_Tables csv_https://rbcom sharepoint com/sites/CustomerScope/Shared Documents/1  Customer Master Cleansing/3  PowerBI/0  Power BI Master Files/0  Test Data/",[Delimiter=",", Columns=17, Encoding=65001, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(#"Imported CSV", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Customer", type text}, {"Data Source", type text}, {"Customer Text", type text}, {"Data Origin", type text}, {"IAG", type text}, {"IAG Text", type text}, {"IBG", type text}, {"IBG Text", type text}, {"Trade Type Text", type text}, {"Trade Type", type text}, {"Parent Customer Code", type text}, {"Group Company Code", type text}, {"Record Timestamp", type date}}),
    #"Appended Query" = Table.Combine({#"Changed Type", DB_CustomerMasterData_History}),
    #"Removed Other Columns" = Table.SelectColumns(#"Appended Query",{"Customer", "Data Source", "Customer Text", "Data Origin", "IAG", "IAG Text", "IBG", "IBG Text", "Trade Type Text", "Trade Type", "Parent Customer Code", "Group Company Code", "Record Timestamp"}),
    #"Changed Type3" = Table.TransformColumnTypes(#"Removed Other Columns",{{"Trade Type", type text}}),
    #"Inserted Text After Delimiter" = Table.AddColumn(#"Changed Type3", "Text After Delimiter", each Text.AfterDelimiter([Customer], "/"), type text),
    
    // Buffer the keyword list once, outside the row function, so every row reuses the same in-memory list
    BufferedList = List.Buffer(#"Tbl_eComm"[Keyword]),

    #"Added Custom" = Table.AddColumn(#"Inserted Text After Delimiter", "eComm Keyword", each List.Accumulate(
            BufferedList,
            "",
            (state, current) =>
                if List.Contains(Text.Split([Customer Text], " "), current, Comparer.OrdinalIgnoreCase) and state = ""
                then state & " " & current
                else state
    ))
in
    #"Added Custom"

regards

Phil





imranamikhan
Helper V

Hi @lbendlin - yes, I only need to use it once per Power Query refresh. Whenever new data comes in, I refresh Power Query to see the latest results.

I do not understand what you meant by re-use. If you could explain, that would be appreciated.

List.Buffer() and Table.Buffer() are pointless unless you plan to re-use their data in the same query multiple times.  All they do is pre-spool the data into memory, so that you can reference it later at better speed. If you don't reference the list or the table later in the query then you are wasting memory.

lbendlin
Super User

You only get a performance boost if you re-use your buffer (table or list). If you declare it and then don't use it at least twice, or - worse, as in your case - recreate the buffer inside the row function so that it is rebuilt for every row, then performance will be worse, not better.
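
To illustrate, a minimal sketch contrasting the two patterns. The table and column names are hypothetical, and List.ContainsAny stands in for the List.Accumulate logic purely for brevity:

// Slower pattern: List.Buffer sits inside "each", so the keyword list is
// re-buffered for every row of the table.
// Table.AddColumn(SourceTable, "Has Keyword",
//     each List.ContainsAny(Text.Split([Customer Text], " "),
//         List.Buffer(KeywordTable[Keyword]), Comparer.OrdinalIgnoreCase))

// Faster pattern: buffer once, then reference the in-memory list from each row.
let
    SourceTable = #table({"Customer Text"}, {{"Amazon Web Services"}, {"Contoso Ltd"}}),
    KeywordTable = #table({"Keyword"}, {{"Amazon"}, {"eBay"}}),
    BufferedKeywords = List.Buffer(KeywordTable[Keyword]),
    AddedMatch = Table.AddColumn(SourceTable, "Has Keyword",
        each List.ContainsAny(Text.Split([Customer Text], " "), BufferedKeywords, Comparer.OrdinalIgnoreCase))
in
    AddedMatch

The slower version pays the buffering cost once per row; the faster version pays it once for the whole query.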
