cancel
Showing results for
Did you mean:
Highlighted
Regular Visitor

## Calculate the Count of the Mode Value for each SN Group

Source Data:

I was able to use the list functions for each grouped table to get:

Min

Max

Mode

Average

Count

#"Customized" = Table.Group(#"Changed Type", "SN", {{ "Min", each List.Min([Value]), type number},
{"Max", each List.Max([Value]), type number},{"Mode", each List.Mode([Value]), type number},
{"Average", each List.Average([Value]), type number},{"Count", each List.Count([Value]), type number}},
GroupKind.Local, Comparer.OrdinalIgnoreCase)

List.Mode returns the value used most for each table, but what is the proper way to add another column to show the count of the mode value occurances for each table?

For example, adding another column showing count of the mode values:

SN      Mode_Value_Occurance

a         2

b         4

1 ACCEPTED SOLUTION

Accepted Solutions
Super User

## Re: Calculate the Count of the Mode Value for each SN Group

Something like this should work (additional code in red):

```#"Customized" = Table.Group(#"Changed Type", "SN", {{ "Min", each List.Min([Value]), type number},
{"Max", each List.Max([Value]), type number},{"Mode", each List.Mode([Value]), type number},
{"Average", each List.Average([Value]), type number},{"Count", each List.Count([Value]), type number},{"Mode_Value_Occurrence",  each let mode = List.Mode([Value]) in List.Count(List.Select([Value], each _=mode)), Int64.Type}},
GroupKind.Local, Comparer.OrdinalIgnoreCase)```

Admittedly, the Mode ends up being evaluated twice for each group, but if this performs well enough I wouldn't worry about it.

Regards,

Owen

Proud to be a Datanaut!

10 REPLIES 10
Super User

## Re: Calculate the Count of the Mode Value for each SN Group

Something like this should work (additional code in red):

```#"Customized" = Table.Group(#"Changed Type", "SN", {{ "Min", each List.Min([Value]), type number},
{"Max", each List.Max([Value]), type number},{"Mode", each List.Mode([Value]), type number},
{"Average", each List.Average([Value]), type number},{"Count", each List.Count([Value]), type number},{"Mode_Value_Occurrence",  each let mode = List.Mode([Value]) in List.Count(List.Select([Value], each _=mode)), Int64.Type}},
GroupKind.Local, Comparer.OrdinalIgnoreCase)```

Admittedly, the Mode ends up being evaluated twice for each group, but if this performs well enough I wouldn't worry about it.

Regards,

Owen

Proud to be a Datanaut!

Regular Visitor

## Re: Calculate the Count of the Mode Value for each SN Group

Thanks  @OwenAuger

It worked perfectly.

There's no other way to define the mode value to count by looking at the "Mode" column?

That's the way I was trying to do it.  Count the occurences in [Value}  defined by the values in the "Mode" column.

Super User

## Re: Calculate the Count of the Mode Value for each SN Group

You're welcome @roncruiser

One method I can think of to allow one aggregation to refer to another aggregation (e.g. Mode_Value_Occurrence to refer to Mode) is by creating a record rather than 6 separate aggregations, since the fields of a record can refer to each other. Then you just need another step to expand the record.

The code would look like this (paste into a blank query):

```let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlTSUTJUitWBsIywsIzhLBMwKwkuhsoyxC4bCwA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [SN = _t, Value = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"SN", type text}, {"Value", type number}}),
Customized = Table.Group(#"Changed Type", "SN",{{ "Group", each [ Min = List.Min([Value]), Max = List.Max([Value]), Mode = List.Mode([Value]), Average = List.Average([Value]), Count = List.Count([Value]), Mode_Value_Occurrence = List.Count(List.Select([Value], each _=Mode))], type [Min=number, Max=number, Mode=number, Average=number, Count=Int64.Type, Mode_Value_Occurrence=Int64.Type]}}, GroupKind.Local, Comparer.OrdinalIgnoreCase),
#"Expanded Group" = Table.ExpandRecordColumn(Customized, "Group", {"Min", "Max", "Mode", "Average", "Count", "Mode_Value_Occurrence"}, {"Min", "Max", "Mode", "Average", "Count", "Mode_Value_Occurrence"})
in
#"Expanded Group"```

Alternatively, it might be possible to a subsequent step using Table.AddColumn that takes the resulting Mode values and counts rows of the table just before the grouping step where Value matches the Mode. However this would be a bit awkward and would require effectively re-running the grouping. It would also be difficult since the Table.Group step has GroupKind.Local, meaning you would have separate groupings for each "run" of the same SN value.

Regards,

Owen

Proud to be a Datanaut!

Regular Visitor

## Re: Calculate the Count of the Mode Value for each SN Group

What would be more efficient in terms of processing time?  Creating a record or table.  My dataset is 150K devices (SN's).  Each device has many different parameter values.  I used "Value" as a generic term, but each device has ~100 different "Value's". From each device i would be extracting the same statistics for many different "Value" parameters. Knowing which to use, a record or a table, can give me a time advantage loading into the datamodel.  As it is, it takes a very long time.

Super User

## Re: Calculate the Count of the Mode Value for each SN Group

My intuition (without testing) is that creating and expanding the record adds some overhead that wouldn't exist with the regular grouping. Would be interested to hear about relative performance on a trimmed-down version of your dataset.

A side note - I would suggest changing GroupKind.Local to GroupKind.Global, unless you specifically wanted local grouping. This should also help with performance.

One other question - is it necessary to pre-calculate these statistics in Power Query, or could you simply have DAX measures that are evaluated when required?

Proud to be a Datanaut!

Regular Visitor

## Re: Calculate the Count of the Mode Value for each SN Group

For each device (.xml file), there are 40 different tracked parameters.  For each parameter, there are 128 different nodes.  Each node value is recorded in the .xml file for every parameter.

40 parameters x 128 nodes = 5120 recorded values in each .xml file

For 100K devices = 512M recorded values.

The xml files are stored on our network.  Not a server database like MySQL.

My first thought was to divide the 40 different parameters into 10 different excel workbooks holding 10 parameters each.   That's based on my experience or inexperience.    Pre-calculating each tracked parameter per device seemed logical rather than trying to cram it all into one super large data model.

I'm open for suggestions to speed things up.  Rather than having to reduce the sample size.

-------------------------

I'll change to GroupKin.Global to increase performance and keep grouped tables.  Thank you again.

Regular Visitor

## Re: Calculate the Count of the Mode Value for each SN Group

As an added note of information, all group bys and the statistics are caculated by a separate function.  Maybe that adds more overall processing time.  Is it better practice to have group bys and statistics done in the main query rather than going through a function to calculate the stats for each xml file?

Super User

## Re: Calculate the Count of the Mode Value for each SN Group

Interesting - is there any way you could post a small example of the actual files and existing queries you have set up?

I'm going to tag @ImkeF and @MarcelBeug who are more specialised in Power Query and may have some smarter ways of handling this.

Proud to be a Datanaut!

Regular Visitor

## Re: Calculate the Count of the Mode Value for each SN Group

Sure.  I'll prep a workbook to share with the group.

Notes: Each xml file is approximately 330KB in size.

As an experiement, late yesterday I kicked off a full load of each xml file at a reduced sample size of 50K devices  (50K xml files) into the data model.  I built a function that strips out ~1000 rows of useful data from each xml file that is 15 columns wide.  Just basically a mass data import.  Then write pertinent measures and calculations on top of that.

It doesn't seem efficient to do it this way as it's still loading this morning when I walked into work:

46 Million Rows

17hr: 28min later.

I'm still leaning towards breaking up the data into separate workbooks...

Any guidance would be appreciated.  I'll post up the sample workbook soon...  Thank you!