Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
Nad_PBI_21
Frequent Visitor

Categorisation through keywords slow

Hi,

I'm looking for a more efficient method to categorize dates based on a reference table that contains keywords.

In the query below, I search for keywords (Booking and Partner) twice and then merge to retrieve specific columns from the reference table. However, any modification is quite slow, even simple tasks like changing the format. The left pane indicates a high amount of MB loading. I've applied some filters to reduce the number of lines and also buffers.

Would anyone know of a smarter approach to achieve the same results?

Thank you very much,

Nad

 

Steps:

Added Booking keyword: This column is created by searching for matches between the "Booking keyword" column in a reference table named REF_KEY and the "Buchungstext" column in the Source table. If a match is found, the value from the REF_KEY table is added to the new column.

Added Partner keyword: Similar to the previous step, 

Merged Queries: Performs a left outer join between the Source table and the REF_KEY table based on three columns: "Kostenart", "Match Booking keyword", and "Match Partner keyword".

 

Query

let

    // Rename the TRANSFORM table as Source

    Source = TRANSFORM,

 

    // Add custom column for Booking keyword match

    #"added Booking keyword" = Table.AddColumn(bufferedSource, "Match Booking keyword", each 

        let

        // Filter REF_KEY table and buffer it

            filtered_REF_KEY = Table.Buffer(Table.SelectColumns(REF_KEY, {"Kostenart", "Booking keyword"})),

            condition1 = [Kostenart],

            filteredCondition = Table.SelectRows(filtered_REF_KEY, each [Kostenart] = condition1),

            whatSearch = List.RemoveNulls(filteredCondition[Booking keyword]),

            whereSearch = [Buchungstext],  // Assuming [Buchungstext] is a column in Source

            matches = List.Select(whatSearch, each Text.Contains(Text.Lower(whereSearch), Text.Lower(_)))

        in

            if List.IsEmpty(matches) then null else matches{0}

    ),

 

    // Add custom column for Partner keyword match

    #"added Partner keyword" = Table.AddColumn(#"added Booking keyword", "Match Partner keyword", each 

        let

        // Filter REF_KEY table and buffer it

            filtered_REF_KEY = Table.Buffer(Table.SelectColumns(REF_KEY, {"Kostenart", "Partner keyword"})),

            condition1 = [Kostenart],

            filteredCondition = Table.SelectRows(filtered_REF_KEY, each [Kostenart] = condition1),

            whatSearch = List.RemoveNulls(filteredCondition[Partner keyword]),

            whereSearch = [Partner],

            matches = List.Select(whatSearch, each Text.Contains(Text.Lower(whereSearch), Text.Lower(_)))

        in

            if List.IsEmpty(matches) then null else matches{0}

    ),

    #"Inserted Year" = Table.AddColumn(#"added Partner keyword", "Year", each Date.Year([Belegdatum]), Int64.Type),

    #"Merged Queries" = Table.NestedJoin(#"Inserted Year", {"Kostenart", "Match Booking keyword", "Match Partner keyword"}, REF_KEY, {"Kostenart", "Booking keyword", "Partner keyword"}, "REF_KEY", JoinKind.LeftOuter),

    #"Expanded REF_KEY" = Table.ExpandTableColumn(#"Merged Queries", "REF_KEY", {"Assumption", "Em. fact. key"}, {"Assumption", "Em. fact. key"})

in

    #"Expanded REF_KEY"

1 ACCEPTED SOLUTION
v-junyant-msft
Community Support
Community Support

Hi @Nad_PBI_21 ,

I can give you some suggestions:

1. Minimize the use of 'Table.Buffer'. While buffering can improve performance by reducing the number of times a source is accessed, overusing it, especially with large tables, can lead to high memory consumption. Consider buffering only the smallest necessary tables or the ones that are used multiple times in subsequent transformations.
2. Instead of performing the keyword search within an `each` function for every row, you could try to transform your reference table into a list of keywords and their corresponding categories once, and then use this list to categorize your main table. This way, you perform the heavy lifting of creating the keyword-category list just once.
3. Use 'Table.Join' instead of 'Table.NestedJoin' when possible. If you don't need nested tables and only require a few columns from the reference table, a regular join might be more efficient.
4.  Ensure that the columns used for joins and searches have the appropriate data types. Text operations can be slow on numeric or date fields if they are not properly typed.

Best Regards,
Dino Tao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

3 REPLIES 3
v-junyant-msft
Community Support
Community Support

Hi @Nad_PBI_21 ,

I can give you some suggestions:

1. Minimize the use of 'Table.Buffer'. While buffering can improve performance by reducing the number of times a source is accessed, overusing it, especially with large tables, can lead to high memory consumption. Consider buffering only the smallest necessary tables or the ones that are used multiple times in subsequent transformations.
2. Instead of performing the keyword search within an `each` function for every row, you could try to transform your reference table into a list of keywords and their corresponding categories once, and then use this list to categorize your main table. This way, you perform the heavy lifting of creating the keyword-category list just once.
3. Use 'Table.Join' instead of 'Table.NestedJoin' when possible. If you don't need nested tables and only require a few columns from the reference table, a regular join might be more efficient.
4.  Ensure that the columns used for joins and searches have the appropriate data types. Text operations can be slow on numeric or date fields if they are not properly typed.

Best Regards,
Dino Tao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

thanks a lot for your help, I will try to follow your recommendations @v-junyant-msft 

dufoq3
Super User
Super User

Hi @Nad_PBI_21,

 

We'd like to help you, but provide sample data as table and also expected result please.


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

Helpful resources

Announcements
PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

Top Solution Authors
Top Kudoed Authors