Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I have a query (from a Power BI Dataflow) with approximately 14000 rows. The Primary key ([Parent], Int64.Type) can have multiple records ([Batch Number], type text) associated with it. I need to count the number of [Batch Number]s associated with each [Parent] so that I can exclude [Parent] records with more than 20 [Batch Number] records.
No matter what I try, any aggregation step is VERY slow. I have also tried buffering the data but this step is itself very slow.
My current steps (including re-expanding the original records are:
#"Parents Only" = Table.SelectColumns(#"Previous Step", {"Parent"}),
//Group the records, with a count **THIS IS THE SLOW STEP**
#"Grouped Rows" = Table.Group(#"Parents Only", {"Parent"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
//Add a key **Not sure I need this but thought it would speed things up??**
#"Added [Parent] Key" = Table.AddKey(#"Grouped Rows", {"Parent"}, true),
// Filtered to Parents with NGT 20 batches listed
#"Filtered Rows" = Table.SelectRows(#"Added [Parent] Key", each [Count] <= 20),
// Remove count, just leave Parent ID
#"Removed Count" = Table.RemoveColumns(#"Filtered Rows",{"Count"}),
// Re-join the list of Parent ID numbers to the filtered table, now only with Parents with NGT 20 batches
#"Re-join" = Table.NestedJoin(#"Removed Count", {"Parent"}, #"Added [Parent] Key", {"Parent"}, "Original Record"),
#"Expanded Original Record" = Table.ExpandTableColumn(#"Re-join", "Original Record", {"Batch Number"}, {"Batch Number"}),
I have tried the Table.AddKey step before the Table.Group but this but this doesn't speed up the process and is also fairly slow itself.
I can't create a separate query of just the [Parent] records, as some contain e.g. multiple [Batch Number]s separated by commas that need de-aggregating (Table.ExpandListColumn...) to multiple records within the same query.
Solved! Go to Solution.
SOLVED.
The speed (or lack thereof) was due to a previous step in the query - I was filtering the the first part of the [Batch Number] field against a list of known [Batch Prefix]s. What was happening (obvious now!) was that the call was being made to this list multiple times. A List.Buffer wrapper therefore solved it in one go!
//Buffer the list of Batch Prefixes to prevent multiple calls
#"Batch Prefixes" = List.Buffer(#"Prefixes"[Batch Prefix]),
//Select only rows with batch prefixes that match the [Batch Number] Prefix
#"Filter for Known Batches" = Table.SelectRows(#"Table with Batch Prefixes", each List.Contains(#"Batch Prefixes", [Batch Prefix])),
One point to note is that I did try to create a list query of the Batch Prefixes up front and reference that but got a Formula.Firewall error, hence creating the list from a table within this query (#"Prefixes"[Batch Prefix])
SOLVED.
The speed (or lack thereof) was due to a previous step in the query - I was filtering the the first part of the [Batch Number] field against a list of known [Batch Prefix]s. What was happening (obvious now!) was that the call was being made to this list multiple times. A List.Buffer wrapper therefore solved it in one go!
//Buffer the list of Batch Prefixes to prevent multiple calls
#"Batch Prefixes" = List.Buffer(#"Prefixes"[Batch Prefix]),
//Select only rows with batch prefixes that match the [Batch Number] Prefix
#"Filter for Known Batches" = Table.SelectRows(#"Table with Batch Prefixes", each List.Contains(#"Batch Prefixes", [Batch Prefix])),
One point to note is that I did try to create a list query of the Batch Prefixes up front and reference that but got a Formula.Firewall error, hence creating the list from a table within this query (#"Prefixes"[Batch Prefix])
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
114 | |
98 | |
86 | |
70 | |
62 |
User | Count |
---|---|
151 | |
120 | |
103 | |
87 | |
68 |