
mikecrobp
Helper I

Power Query M custom aggregate function to return the most common value within a group

Can anyone help me come up with a custom aggregate function that returns the most common value of one or more columns for a set of records sharing a unique grouping? i.e. something you could use where you might use MAX, MIN, etc., but where the result is the most commonly occurring value.

 

e.g. if grouping on column G1 and aggregating on columns A1 and A2, with the first table below as input data you would get the second table as output (happy to reformat the tables if someone can point me to how):

 

Input:

G1   A1   A2
a    V1   V1
a    V2   V1
a    V2   V2
b    V1   V1
b    V2   V1

Output:

G1   A1   A2
a    V2   V1
b    V1   V1

 

It must be Power Query M - not DAX.

And I guess a formula to put in the Table.Group parameters would be OK, rather than a neat custom function.

Something which takes the resulting list of values, gets the distinct values, counts the occurrences, sorts high-to-low and returns the first value - but I can't get my head around that. (A rough sketch of that idea is below.)
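For illustration only, here is a minimal sketch of that idea written as an inline aggregation inside Table.Group, using the sample data above. The A1Mode name and the exact expressions are just an assumption of how it might look, not a tested solution:

let
    // the sample data from the tables above
    Source = #table(type table [G1 = text, A1 = text, A2 = text],
                  {{"a","V1","V1"}, {"a","V2","V1"}, {"a","V2","V2"}, {"b","V1","V1"}, {"b","V2","V1"}}),
    // the "each" aggregation receives the grouped sub-table, so [A1] is the list of A1 values for one G1 group
    Grouped = Table.Group(Source, {"G1"},
        {{"A1Mode", each
            let
                vals   = [A1],
                // one record per distinct value with its frequency in the group
                counts = List.Transform(List.Distinct(vals),
                             (v) => [Value = v, Freq = List.Count(List.Select(vals, (x) => x = v))]),
                // sort high-to-low on frequency
                sorted = List.Sort(counts, (a, b) => Value.Compare(b[Freq], a[Freq]))
            in
                sorted{0}[Value],
          type text}})
in
    Grouped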

 

Mike 

 


3 REPLIES
v-xuding-msft
Community Support

Hi @mikecrobp ,

I created a sample. Maybe it helps a little. Please have a try.

  • Duplicate the original table as Table (2) and group it.
  • Group the main Table query by its columns.
  • Merge the two tables and append the new column.
  • Remove the "Count" column and the extra rows.

Table:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlTSUQozhBCxOlC+ETa+EZifhKY+CVl9LAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [G1 = _t, A1 = _t, A2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"G1", type text}, {"A1", type text}, {"A2", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"G1", "A1"}, {{"Count", each Table.RowCount(Table.Distinct(_)), type number}}),
    #"Merged Queries" = Table.NestedJoin(#"Grouped Rows", {"G1"}, #"Table (2)", {"G1"}, "Table (2)", JoinKind.LeftOuter),
    #"Aggregated Table (2)" = Table.AggregateTableColumn(#"Merged Queries", "Table (2)", {{"Count", List.Min, "Min of Table (2).Count"}}),
    #"Removed Columns" = Table.RemoveColumns(#"Aggregated Table (2)",{"Count"}),
    #"Removed Top Rows" = Table.Skip(#"Removed Columns",1),
    #"Removed Bottom Rows" = Table.RemoveLastN(#"Removed Top Rows",1)
in
    #"Removed Bottom Rows"

 Table(2):

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlTSUQozhBCxOlC+ETa+EZifhKY+CVl9LAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [G1 = _t, A1 = _t, A2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"G1", type text}, {"A1", type text}, {"A2", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"G1", "A1"}, {{"Count", each List.Min([A2]), type text}})
in
    #"Grouped Rows"

Best Regards,
Xue Ding

If this post helps, then please consider accepting it as the solution to help the other members find it more quickly. Kudos are nice too.

Thank you Xue Ding. Of course that is how I should have presented the problem - with code using Table.FromRows!

 

My apologies - I should have said that I had been using the technique of creating a new table and merging it back in.

Based on a post on Stack Overflow: "Select row with MAX value per category Power BI". (That pattern is sketched below.)
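For reference, the merge-back pattern looks roughly like this - a sketch with made-up Category/Value column names: group to find the max per category, then join back to keep only the matching rows:

let
    // hypothetical example data
    Source = #table(type table [Category = text, Value = number],
                  {{"a", 1}, {"a", 3}, {"b", 2}}),
    // one row per category holding its maximum Value
    MaxPerCategory = Table.Group(Source, {"Category"},
                         {{"MaxValue", each List.Max([Value]), type number}}),
    // inner join back on Category + Value so only the max rows survive
    Joined = Table.NestedJoin(Source, {"Category", "Value"},
                 MaxPerCategory, {"Category", "MaxValue"}, "Max", JoinKind.Inner),
    Result = Table.RemoveColumns(Joined, {"Max"})
in
    Result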

 

I wanted to create a new aggregate function because:

1 - it will be easy to read in the code - it will just appear alongside the standard aggregate functions like MAX, SUM, etc.

2 - to see if it runs any faster

3 - it will reduce the number of queries, for clarity

 

Writing the question yesterday helped me think about the problem, and I will write something based on your test data.

Mike

Here is the answer I came up with. I have never written a Power Query M function definition before, so all feedback on style and performance is welcome.

I think it is a useful function for when your data is mostly correct but you need to take the most common value.

 

The logic of the function is:

  1.  Get the unique values and convert them to a table
  2.  Count how many times each occurs in the input
  3.  Sort descending and return the value in the first row - the most commonly occurring value

I could add something for how nulls are handled (e.g. ignore null if it is the top value and return the next one down), but this seems to work for me. (A possible null-handling variant is sketched after the test harness below.)

 

The function is MostCommon (you have to name the query MostCommon so the test harness can call it).

 

let
    fnMostCommon = (ListIn as list) =>
    let
        // distinct values from the input, as a single-column table "u"
        uniquevalues = List.Distinct(ListIn),
        result  = Table.FromList(uniquevalues, Splitter.SplitByNothing(), {"u"}),
        // count how many times each distinct value occurs in the input list
        result2 = Table.AddColumn(result, "freq", each List.Count(List.PositionOf(ListIn, [u], Occurrence.All))),
        // most frequent value first
        result3 = Table.Sort(result2, {{"freq", Order.Descending}}),
        result4 = List.First(result3[u])
    in
        result4
in
    fnMostCommon

 

And the test harness is:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlTSUQozhBCxOlC+ETa+EZifhKY+CVl9LAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [G1 = _t, A1 = _t, A2 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"G1", type text}, {"A1", type text}, {"A2", type text}}),
    // AllConcat below is a separate custom function (not shown in this post) - drop that aggregation if you only need MostCommon
    #"Grouped Rows" = Table.Group(#"Changed Type", {"G1"}, {{"Agg", each MostCommon([A1]), type text}, {"Agg2", each AllConcat([A1]), type text}}),
    a = #"Grouped Rows"{[G1="a"]}[Agg]
in
    a
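For the null handling mentioned above, here is a rough sketch of a variant (untested, and the name fnMostCommonIgnoreNulls is just illustrative) that drops nulls before counting, so a column that is partly null still returns the most common non-null value:

let
    fnMostCommonIgnoreNulls = (ListIn as list) =>
    let
        // drop nulls up front so they never win the frequency count
        nonNull      = List.RemoveNulls(ListIn),
        uniquevalues = List.Distinct(nonNull),
        counted      = Table.AddColumn(
                           Table.FromList(uniquevalues, Splitter.SplitByNothing(), {"u"}),
                           "freq",
                           each List.Count(List.PositionOf(nonNull, [u], Occurrence.All))),
        sorted       = Table.Sort(counted, {{"freq", Order.Descending}})
    in
        // return null only when every input value was null
        if List.IsEmpty(nonNull) then null else List.First(sorted[u])
in
    fnMostCommonIgnoreNulls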

 

 
