Faster Division/Multiplication using Merge same table Query - Power Query

I have a table of columns containing attribute counts. I have added a column that does a row-wise summation of the attribute columns (the columns denoted by numbers). I want to divide the attribute columns by the "Count" column in place (replace the values), i.e. compute a `Percentage of Row-Total`. I don't want to add new columns, do the calculation, and then delete the original columns.

 

"ID"  "CD"     "6"    "5"     "4"    "3"   "1"   "Count"

1RJ.14512453755302
2RJ.2436636579211
3RJ.32910441768258
4RJ.42646233411140
5RJ.53451412213161
6RJ.63363265813193
7RJ.7225640418167
8RJ.83493405411232
9RJ.93469345011198
10RJ.10274633428156
11RJ.113270264215185
12RJ.12225735455164
13RJ.13258339439199
14RJ.14316926508184
15RJ.152462393510170
16RJ.16117543576192
17RJ.17214521238118
18RJ.182051303212145
19RJ.19286034379168
20RJ.20263829414138
21RJ.212164234211161
22RJ.223361353811178
23RJ.23195136368150
24RJ.242265323611166
25RJ.252885354511204
26RJ.262656282711148
27RJ.27294341337153
28RJ.282257154311148
29RJ.29143022359110
30RJ.30193525278114
31RJ.311830282612114
32RJ.321354364410157
33RJ.33205919409147
34RJ.34243438235124
35RJ.35266737398177
36RJ.36274631347145
37RJ.372426273613126
38RJ.38142424211093
39RJ.393158342810161
40RJ.401845253710135

 

In one of my searches, someone suggested that the values for multiple columns could be calculated in a faster way (avoiding row-level context evaluation) by merging the table with itself together with some M code, but they did not share a code example.

Is it possible?

24 REPLIES

 

A more concise and (perhaps) more general way to write "your" code:

let 
    dim=1000000,
    Origine = Table.FromColumns(List.Transform({1..10}, each List.Transform(List.Random(dim, _), each Number.Round(_*dim,0))),{"0".."9"}),


    PercentOfRow = Table.FromRecords( Table.TransformRows(Origine, (r)=>  let sum=List.Sum(Record.FieldValues(r)) in  Record.FromList(List.Transform(Record.FieldValues(r), each _/sum), {"0".."9"})& [Counts=sum])),
    pc = Table.TransformColumnTypes(PercentOfRow,List.Transform(Table.ColumnNames(Origine),each {_, Percentage.Type}))
in
    pc
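
Applied to the sample table at the top of the thread instead of the random Origine table, the same pattern would look roughly like this (a sketch only: it assumes a prior #"Changed Type" step in which the attribute columns and Count are already numeric; the step name is illustrative):

PercentOfRow =
    Table.FromRecords(
        Table.TransformRows(
            #"Changed Type",
            (r) =>
                let vals = Record.RemoveFields(r, {"ID", "CD", "Count"})
                in  Record.SelectFields(r, {"ID", "CD"})
                    // divide every attribute field by the existing Count column, in place
                    & Record.FromList(
                        List.Transform(Record.FieldValues(vals), each _ / r[Count]),
                        Record.FieldNames(vals))
                    & [Count = r[Count]]))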


Interesting @Rocco_sprmnt21 !

 

The main problem with my solution, yours, or @Jimmy801's is that when I start adding custom calculated columns (row-level evaluation) to our final table, the loading time just keeps increasing exponentially. 😞

 

For example, if I add the following calculated columns to our final table:

Step1 = Table.AddColumn(Final, "Total", each [#"6"] + [#"4"], Percentage.Type)

RowAvgOfTotal = Number.Round(List.Average(Step1[Total]), 2)

Step2 = Table.AddColumn(Step1, "Total%", each [Total] / RowAvgOfTotal, Percentage.Type)

RowPercentileOfTotal = Number.Round(PercentileInclusive(Step2[#"Total%"], 0.8), 4)
Step3 = Table.AddColumn(Step2, "Total%Percentile", each [Total] / RowPercentileOfTotal, Percentage.Type)

PercentileInclusive Function:

//PercentileInclusive Function
let
    Source = (inputSeries as list, percentile as number) =>
let
    SeriesCount = List.Count(inputSeries),
    PercentileRank = percentile*(SeriesCount-1)+1, //percentile value between 0 and 1
    PercentileRankRoundedUp = Number.RoundUp(PercentileRank),
    PercentileRankRoundedDown = Number.RoundDown(PercentileRank),
    Percentile1 = List.Max(List.MinN(inputSeries,PercentileRankRoundedDown)),
    Percentile2 = List.Max(List.MinN(inputSeries,PercentileRankRoundedUp)),
    Percentile = Percentile1+(Percentile2-Percentile1)*(PercentileRank-PercentileRankRoundedDown)
in
    Percentile
in
    Source
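
For reference, a quick sanity check of this function on a small list (my own example, not from the thread):

SanityCheck = PercentileInclusive({1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, 0.8)  // rank = 0.8*(10-1)+1 = 8.2, result 8.2, matching Excel's PERCENTILE.INC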

Adding such calculated columns blows up the load: the query goes from reading 4 to 5 MB to reading gigabytes. It is apparent that adding the columns on top of the previous steps increases the load time each time, as I think the previous calculated columns are also getting evaluated again. Is there any alternative approach that you can suggest?
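
A minimal sketch of one possible mitigation for that re-evaluation (my own suggestion, not something raised in this thread): buffer the intermediate tables with Table.Buffer so that the scalar aggregations and the later steps reuse the materialised rows instead of re-reading the whole upstream chain, at the cost of holding the table in memory.

// Sketch only: step names match the example above; Table.Buffer pins the
// table in memory so each downstream reference does not re-evaluate it.
Buffered = Table.Buffer(Final),
Step1 = Table.Buffer(Table.AddColumn(Buffered, "Total", each [#"6"] + [#"4"], Percentage.Type)),
RowAvgOfTotal = Number.Round(List.Average(Step1[Total]), 2),
Step2 = Table.Buffer(Table.AddColumn(Step1, "Total%", each [Total] / RowAvgOfTotal, Percentage.Type)),
RowPercentileOfTotal = Number.Round(PercentileInclusive(Step2[#"Total%"], 0.8), 4),
Step3 = Table.AddColumn(Step2, "Total%Percentile", each [Total] / RowPercentileOfTotal, Percentage.Type)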


I'm not sure I understand what you want to do, but maybe it can be useful to try doing separate calculations on the same source table and then putting it all together.

ac1 = func1 (tab),
ac2 = func2 (tab),
..

 

tfc=Table.FromColumns(
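
A guess at how that sketch could be completed (func1/func2 are placeholders from the lines above; the exact call is my assumption): each function returns one calculated column as a list, and Table.FromColumns glues the original columns and the new ones back together in a single step.

ac1 = func1(tab),
ac2 = func2(tab),
tfc = Table.FromColumns(
          Table.ToColumns(tab) & {ac1, ac2},
          Table.ColumnNames(tab) & {"ac1", "ac2"})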


let
    dim=500000,
    Origine = Table.FromColumns(List.Transform({1..10}, each List.Transform(List.Random(dim, _), each Number.Round(_*dim,0))),{"0".."9"}),


    PercentOfRow = Table.FromRecords( Table.TransformRows(Origine, (r)=>  let sum=List.Sum(Record.FieldValues(r)) in  Record.FromList(List.Transform(Record.FieldValues(r), each _/sum), {"0".."9"})& [Counts=sum])),
    pc = Table.TransformColumnTypes(PercentOfRow,List.Transform(Table.ColumnNames(Origine),each {_, Percentage.Type})),
    #"Aggiunta colonna personalizzata" = Table.AddColumn(pc, "total", each [4]+[6]),
    avg =  List.Average(#"Aggiunta colonna personalizzata"[total]),

    acp1 = Table.AddColumn(#"Aggiunta colonna personalizzata", "pctot", each [total]/avg),
    pcn=pcntl(acp1[pctot],0.8),
    acp = Table.AddColumn(acp1, "pcntl", each [pctot]/pcn)
in
    acp

This, on my laptop, finishes in a few minutes.


I was tempted to generalize the following approach. For the first part it wasn't difficult.
But for the part where the type of the columns is specified, it wasn't easy.

PercentOfRow = Table.FromRecords(
    Table.TransformRows(
        #"Changed Type",
        (r) => 
        Record.TransformFields(
            r,
            {
                {"6", each r[#"6"] / r[Counts] },
                {"5", each r[#"5"] / r[Counts] },
                {"4", each r[#"4"] / r[Counts] },
                {"3", each r[#"3"] / r[Counts] },
                {"1", each r[#"1"] / r[Counts] }
            }
        )
    ),        
    type table [
               CD= Text.Type,
               ID = Int64.Type,
               #"6" = Percentage.Type, 
               #"5" = Percentage.Type,
               #"4" = Percentage.Type,
               #"3" = Percentage.Type,
               #"1" = Percentage.Type
    ]
)
in
    PercentOfRow

But here is the result anyway. A big thanks to Ben Gribaudo: now the solution is general and again a one-liner.

let 
    dim=1000000,
    Origine = Table.FromColumns(List.Transform({1..10}, each List.Transform(List.Random(dim, _), each Number.Round(_*dim,0))),{"0".."9"}),
    
    PercentOfRow = Table.FromRecords( Table.TransformRows(Origine, (r)=>  let sum=List.Sum(Record.FieldValues(r)) in  Record.FromList(List.Transform(Record.FieldValues(r), each _/sum) , {"0".."9"})& [Counts=sum]),   Expression.Evaluate("type table"& "["&Text.Combine(List.Transform({"0".."9"}, each _&"="&"Percentage.Type"),",")&", Counts=number]",#shared))

in
  PercentOfRow
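
For clarity, the text that Expression.Evaluate receives in the step above expands (written out by hand here) to:

type table[0=Percentage.Type,1=Percentage.Type,2=Percentage.Type,3=Percentage.Type,4=Percentage.Type,5=Percentage.Type,6=Percentage.Type,7=Percentage.Type,8=Percentage.Type,9=Percentage.Type, Counts=number]

Expression.Evaluate turns this text back into a table type, and #shared is passed as the environment so that Percentage.Type resolves.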


Wow! This is incredible.

 

@Rocco_sprmnt21, I created a `sample workbook` and the `sample input csv` showing the input and the desired output, just for your experienced eyes to see why the data is loading so slowly. I have added the percentile function that I had googled earlier (it seems there are no built-in functions in PQ for percentile, rank, etc.).

 

Here are the files in this TEST folder. I have removed the code that adds the other columns (as shown in the desired output), since it was taking a long time to finish, and kept the code only up to where the row percentage of total gets calculated. The calculations for the additional columns are shown in the desired output table.

 

I am not sure what you meant by creating the columns separately and then joining them. I hope you can provide some clear logic on how to speed up the loading once the entire desired output report gets created.


Hi @Rocco_sprmnt21 ,

 

Any query in M can be written as one line this way, it is just a matter of time and persistence :).

 

This is a similar but parametrised approach:

let
    Origine = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("VVRLdiQxCLtLr/PmGTD+XGGWs+3X979GCgmYZNFWUiVTIMl+v1/y+nr9+/snYPqziM5ndXuWHf/Hz4a+Pl/vl5IbMIOwVryMxfez3OenIqAaqQEaz2VE2Rmf2bHhxAs/4E5yAzTezVg0tlo8k9gkc4Dr5Hq9dKmyGm1J7JLFHha5aBLtWn3BT3OvgbvJ3VXI0cio4uCvDeoh9VQL14rq3a4aFbvk3uKu/stHj3Ypg4y0IlB3CYHWp1YTT2Mgl2+BFm/3aP2gBMw8TnY6B+R8Ud+8XAd5TZLTO6DGmwMvbtkOP+8lOc0DmtSIVHlU0ycrp3tA5GxpVUYvGF02rZb0D4g5EUi0gPbx/FJoSQOB2mlWqSihD0ml00KgjoqRjdISOslM8dJEoMaWNcpFq9zLYmlNE7XdsFNnAElCRCzJaSK7hHiz+qWJ8iPOWsdPO9BSwtlp9s7a6aIy5T1jH1mmiVJrughE/eWlha3/jTB7mjYCIcjxX2niERg0XdNGIHOxah9Snuc7204fgdqRg3SYGTuch1bTR9bqWEvH5Hfp9JH30SzDKaeXj0JFLH0ESgeUI+/OE0e09BGIULHyqYmZp2Knj9YXFu4N6DxnnwLnbWN1k1pn9VZLuHQ4TZLTRyDMZEpPpapueJDTRg4WX1+7Io0jiRF3Vk4XgT8uJ6lv7B8nxtJEoPa9jm2ME8xRxsnSRWa4O+cipUfe05Ym8s6It94XMfSmeHlkZroIhDGzXURzZD8KfD7f", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, CD = _t, #"6" = _t, #"5" = _t, #"4" = _t, #"3" = _t, #"1" = _t, Count = _t]),
    ChangeType = Table.TransformColumnTypes(Origine,{{"6", Int64.Type}, {"5", Int64.Type}, {"4", Int64.Type}, {"3", Int64.Type}, {"1", Int64.Type}}),
    #"Removed Columns" = Table.RemoveColumns(ChangeType,{"Count"}),


    mTable = #"Removed Columns",
    nonValueColumns = {"ID", "CD"},
    
    fTransformRecord = (mRecord as record, mNonValueColumns as list)=>
        let 
            recordsToProcess = Record.RemoveFields(mRecord, mNonValueColumns),
            sumTotal = List.Sum(Record.ToList(recordsToProcess)),
            TransformRecord = List.Accumulate(Record.FieldNames(recordsToProcess), recordsToProcess, (a, n)=> Record.TransformFields(a, {n, (x)=>x / sumTotal}))   & Record.SelectFields(mRecord, nonValueColumns)     
        in TransformRecord,
    
    Output = Table.FromRecords(Table.TransformRows(mTable, (x)=> fTransformRecord(x, nonValueColumns))) 

in Output
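
If the percentage formatting is wanted here as well, the type step from the earlier replies can be reapplied; a minimal sketch (added inside the let, with `in TypedOutput` at the end; deriving the value columns by excluding nonValueColumns is my own choice):

    TypedOutput = Table.TransformColumnTypes(
        Output,
        List.Transform(
            List.Difference(Table.ColumnNames(Output), nonValueColumns),
            each {_, Percentage.Type}))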

 

Kind regards,

JB


This is fast. However, once you start adding the additional calculated columns (see the sample workbook & sample CSV in the TEST folder in my previous reply), the loading time goes for a toss each time a calculated column gets added! That is my concern, and the desired output I am trying to achieve in a fast-loading manner.


Hi @jborro ,

 

this statement of yours

"Any query in M can be written as one line this way, it is just a matter of time and persistence :)." (*)

is an incontrovertible truth of which I am well aware.

 

But it was not from this general aspect that my enthusiasm for the 1-line solution arose.

 

The purpose of my previous post, after seeing that @sifar786's 1-line solution performed well, was to generalize it, for example to handle an arbitrary number of columns.

But in a first attempt I was unable to make the column-type-setting part parametric within the same expression.

After reading Ben Gribaudo's blog I was able, albeit with the use of the Expression.Evaluate function, to include the column-type-setting part in the same expression, as in the 1-line solution of @sifar786.

From these two steps:

    PercentOfRow = Table.FromRecords( Table.TransformRows(Origine, (r)=>  let sum=List.Sum(Record.FieldValues(r)) in  Record.FromList(List.Transform(Record.FieldValues(r), each _/sum), {"0".."9"})& [Counts=sum])),
    pc = Table.TransformColumnTypes(PercentOfRow,List.Transform(Table.ColumnNames(Origine),each {_, Percentage.Type}))

to this single step:

    PercentOfRow = Table.FromRecords( Table.TransformRows(Origine, (r)=>  let sum=List.Sum(Record.FieldValues(r)) in  Record.FromList(List.Transform(Record.FieldValues(r), each _/sum) , {"0".."9"})& [Counts=sum]),   Expression.Evaluate("type table"& "["&Text.Combine(List.Transform({"0".."9"}, each _&"="&"Percentage.Type"),",")&", Counts=number]",#shared))

just this 🙂

(*)

the reduction to a single line was not simply a matter of nesting one expression inside another; so, for example:

 

    pc = Table.TransformColumnTypes( Table.FromRecords( Table.TransformRows(Origine, (r)=>  let sum=List.Sum(Record.FieldValues(r)) in  Record.FromList(List.Transform(Record.FieldValues(r), each _/sum), {"0".."9"})& [Counts=sum])), List.Transform(Table.ColumnNames(Origine),each {_, Percentage.Type}))

 

I would never have been thrilled with a solution put like this.

 

PS

In general I believe that it is not the task of Power Query to do these "aesthetic" operations, by which I mean changing the number type to a percentage type.

Perhaps it is preferable to do only mash-up and not make-up.


@Rocco_sprmnt21 


@Rocco_sprmnt21 wrote:

I'm not sure I understand what you want to do, but maybe it can be useful to try doing separate calculations on the same source table and then putting it all together.

ac1 = func1 (tab),
ac2 = func2 (tab),
..

 

tfc=Table.FromColumns(


These seem useful, but I don't know how to do it without affecting the loading time. How would you do it with my given sample files to create the desired output shown in the workbook, in a speedier way? Right now, in my amateur way, the loading takes a lot of time (going into GBs).
