Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
ChemEnger
Advocate IV
Advocate IV

VERY Slow Group By

I have a query (from a Power BI Dataflow) with approximately 14000 rows.  The Primary key ([Parent], Int64.Type) can have multiple records ([Batch Number], type text) associated with it.  I need to count the number of [Batch Number]s associated with each [Parent] so that I can exclude [Parent] records with more than 20 [Batch Number] records.

 

No matter what I try, any aggregation step is VERY slow.  I have also tried buffering the data but this step is itself very slow.

 

My current steps (including re-expanding the original records are:

#"Parents Only" = Table.SelectColumns(#"Previous Step", {"Parent"}),
//Group the records, with a count **THIS IS THE SLOW STEP**
#"Grouped Rows" = Table.Group(#"Parents Only", {"Parent"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
//Add a key **Not sure I need this but thought it would speed things up??**
#"Added [Parent] Key" = Table.AddKey(#"Grouped Rows", {"Parent"}, true),
// Filtered to Parents with NGT 20 batches listed
#"Filtered Rows" = Table.SelectRows(#"Added [Parent] Key", each [Count] <= 20),
// Remove count, just leave Parent ID
#"Removed Count" = Table.RemoveColumns(#"Filtered Rows",{"Count"}),
// Re-join the list of Parent ID numbers to the filtered table, now only with Parents with NGT 20 batches
#"Re-join" = Table.NestedJoin(#"Removed Count", {"Parent"}, #"Added [Parent] Key", {"Parent"}, "Original Record"),
#"Expanded Original Record" = Table.ExpandTableColumn(#"Re-join", "Original Record", {"Batch Number"}, {"Batch Number"}),

I have tried the Table.AddKey step before the Table.Group but this but this doesn't speed up the process and is also fairly slow itself.

I can't create a separate query of just the [Parent] records, as some contain e.g. multiple [Batch Number]s separated by commas that need de-aggregating (Table.ExpandListColumn...) to multiple records within the same query.

1 ACCEPTED SOLUTION
ChemEnger
Advocate IV
Advocate IV

SOLVED.

 

The speed (or lack thereof) was due to a previous step in the query - I was filtering the the first part of the [Batch Number] field against a list of known [Batch Prefix]s.  What was happening (obvious now!) was that the call was being made to this list multiple times.  A List.Buffer wrapper therefore solved it in one go!

 

//Buffer the list of Batch Prefixes to prevent multiple calls
#"Batch Prefixes" = List.Buffer(#"Prefixes"[Batch Prefix]),
//Select only rows with batch prefixes that match the [Batch Number] Prefix
#"Filter for Known Batches" = Table.SelectRows(#"Table with Batch Prefixes", each List.Contains(#"Batch Prefixes", [Batch Prefix])),

 

One point to note is that I did try to create a list query of the Batch Prefixes up front and reference that but got a Formula.Firewall error, hence creating the list from a table within this query (#"Prefixes"[Batch Prefix])

View solution in original post

1 REPLY 1
ChemEnger
Advocate IV
Advocate IV

SOLVED.

 

The speed (or lack thereof) was due to a previous step in the query - I was filtering the the first part of the [Batch Number] field against a list of known [Batch Prefix]s.  What was happening (obvious now!) was that the call was being made to this list multiple times.  A List.Buffer wrapper therefore solved it in one go!

 

//Buffer the list of Batch Prefixes to prevent multiple calls
#"Batch Prefixes" = List.Buffer(#"Prefixes"[Batch Prefix]),
//Select only rows with batch prefixes that match the [Batch Number] Prefix
#"Filter for Known Batches" = Table.SelectRows(#"Table with Batch Prefixes", each List.Contains(#"Batch Prefixes", [Batch Prefix])),

 

One point to note is that I did try to create a list query of the Batch Prefixes up front and reference that but got a Formula.Firewall error, hence creating the list from a table within this query (#"Prefixes"[Batch Prefix])

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.