Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
streli
Helper I
Helper I

Cumulative sum in powerquery editor

Hi

 

I have 3 colums: a dataday, an expiry date column and a volumen column (you can see in the picture: each dataday has an expiry date and a unit value.)

How can I calculate a cumulative amount for the unit values in Power Query Editor, so that the calculation skips the datadays and unit values with an earlier expiry date than the dataday?

 

streli_0-1690203859633.png

 

For example, there is an expiry date of 8 June for a 7 June dataday (where the volume is 10) and there is an expiry date of 8 June for a 8 June dataday (where the volume is 10) and therefore I don't want it to be included in the cumulative amount of 9 June. The correct result is therefore 71 for 9 June and not 91 because the expiry dates of 8 June are excluded from the aggregation.

 

Thank you very much!

15 REPLIES 15
Resolutions
Regular Visitor

@streli To calculate the cumulative amount while excluding the unit values with earlier expiry dates in Power Query Editor, you can use the following steps:

 

Assuming your table is named "YourTable" and the columns are named "DateDay," "ExpiryDate," and "Volume," follow these steps in Power Query Editor:

Step 1: Sort the data:
Sort the table by the "DateDay" column in ascending order and then by the "ExpiryDate" column in descending order. This will ensure that the unit values with earlier expiry dates are listed first and will be excluded from the cumulative calculation.

 

Step 2: Add an Index Column:
Add an Index column to the table. This will create a unique identifier for each row, which is essential for the next steps.

 

Step 3: Duplicate the table:
Duplicate the table using the "Duplicate" feature in Power Query Editor. This will create a copy of the original table that we can use for the cumulative calculation.

 

Step 4: Merge Queries:
Merge the original table with the duplicated table based on the "DateDay" and "ExpiryDate" columns. The merge should be a Left Anti join, which will keep only the rows from the original table where there is no match in the duplicated table (i.e., where the unit values have an earlier expiry date). This will effectively exclude those rows from the cumulative calculation.

 

Step 5: Group and Aggregate:
Group the resulting table by "DateDay" and aggregate the "Volume" column using the "Sum" function. This will calculate the cumulative amount, excluding the unit values with earlier expiry dates.

 

Step 6: Merge the Results:
Merge the aggregated table back with the original table based on the "DateDay" column. This will bring the cumulative amount back to the original table.

 

Step 7: Remove Unnecessary Columns:
Remove the duplicated table and any intermediate columns created during the process, keeping only the "DateDay" and "Volume" columns.

 

Now, your table should have a new column showing the cumulative amount while excluding the unit values with earlier expiry dates.

Hi Resolutions, thank you very much for your reply, I will be able to check your solution proposal on Monday. But I have Direct Query connect, in which case I cannot add Index Column...

ronrsnfld
Super User
Super User

Your description is not clear to me. For example, you write:

  • skips the datadays and unit values with an earlier expiry date than the dataday?

But there are no examples in your data where the expiry date occurs before the dataday date on the same row

  • expiry date of 8 June for a 7 June dataday -- but 8 June occurs after 7 June not before
  • expiry date of 8 June for a 8 June dataday  --  again 8 June is not before 8 June

If what you really mean is to exclude those situations where the expiry date is the same as or before the data day, there is only a single instance (where they are the same.

 

If you mean something else, you can adjust the comparison operator in the code below.

 

  • Group by Date and Aggregate by Sum of Volume
  • Add running total column
    • In the code below I chose to use List.Generate, as it does not require an additional Index column

 

let

//This group of lines is to transform the result of pasting your screenshot into Excel into something usable.
// replace them with your actual data source
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type date}, {"Column2", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column2", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Column2.1", "Column2.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column2.1", Int64.Type}, {"Column2.2", type date}}),
    #"Renamed Columns" = Table.RenameColumns(#"Changed Type1",{{"Column1", "DataDay"}, {"Column2.1", "Volume"}, {"Column2.2", "ExpiryDay"}}),

//-----------------------------------
//Group Rows by DataDay Sum the Volume
    #"Grouped Rows" = Table.Group(#"Renamed Columns", {"DataDay"}, {
        {"Total", (t)=> List.Sum(Table.SelectRows(t, each [ExpiryDay]>[DataDay])[Volume]), type nullable number}}),
    
//Add Running Total Column
    #"Running Total" = 
        let 
            vol = List.Buffer(#"Grouped Rows"[Total]),
            runT = List.Generate(
                ()=>[rt = vol{0}, idx=0],
                each [idx] < List.Count(vol),
                each [rt = [rt] + vol{[idx]+1}, idx = [idx]+1],
                each [rt]),
            res = Table.FromColumns(
                Table.ToColumns(#"Grouped Rows") & {runT},
                type table[DataDay=date, Volume=Int64.Type, Volume Running Total=Int64.Type])
        in 
            res
in
    #"Running Total"

 

 

Results from your Data

ronrsnfld_0-1690457809130.png

If you want something else, please add sufficient detail

 

 

 

 

Hi, thank you very much for your reply, in the aggregate value for June 9 (and June 10), I would like to exclude these two values because these dates are earlier than June 9 (June 8 < June 9), so the correct value for June 9 would be 71 and the correct value for June 10 would be 92 

streli_0-1690458941201.png

 

Ok, I think I understand now. We will try the following:

  • Group by DataDay and SUM the volume
    • Sort the table in Date order ascending
  • Group by ExpiryDay and SUM the volume
    • Sort the table in Date order Ascending
  • Add Running Total columns to each of the above Grouped tables
    • Using a custom function for this so as to simplify the code
  • Correct the RT column of the DataDay table by subtracting the last amount in the ExpiryDate column where the Date in ExpiryDay RT is less than that in the DataDay RT

Note that the first lines are merely to transform your data so I can use it, since you only provided a screenshot and not text information that could be copy/pasted

 

Running Total Custom Function

Rename fnRT

 

//Running Total Returns a Table
//Rename: "fnRT"

(tbl as table, col as text)=> 


let 
    values = List.Buffer(Table.Column(tbl,col)),
    colNames= List.Buffer(Table.ColumnNames(tbl)),
    RT = List.Generate(
        ()=> [rt = values{0}, idx = 0],
        each [idx] < List.Count(values),
        each [rt = [rt] + values{[idx]+1}, idx = [idx]+1],
        each [rt]
    ),
    res = Table.FromColumns(
        Table.ToColumns(tbl) & {RT},
        colNames & {col & " Running Total"}
    )
in 
    res

 

 

 

Main Query

 

let
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"DataDay", type date}, {"Volume", Int64.Type}, {"ExpiryDay", type date}}),

//Group Rows by DataDay; Sum the Volume
    #"Grouped DataDay" = Table.Group(#"Changed Type", {"DataDay"}, {
        {"Totals by DataDay", each List.Sum([Volume]), Int64.Type}}),
    #"Sorted DataDay" = Table.Sort(#"Grouped DataDay",{{"DataDay", Order.Ascending}}),

//Group Rows by Expiry Day; Sum the Volume
    #"Group ExpiryDay" = Table.Group(#"Changed Type",{"ExpiryDay"},{
        {"Totals by ExpiryDay", each List.Sum([Volume]), Int64.Type}
    }),
    #"Sorted ExpiryDay" = Table.Sort(#"Group ExpiryDay",{{"ExpiryDay", Order.Ascending}}),

//Add Running Total Column by DataDay
    #"RT DataDay" = fnRT(#"Sorted DataDay","Totals by DataDay"),
    #"RT ExpiryDay" = fnRT(#"Sorted ExpiryDay", "Totals by ExpiryDay"),

//Subtract ExpiryDays prior or equal to dataday
    #"Corr RT" = Table.AddColumn(#"RT DataDay", "Corrected Running Total", 
        (c)=> c[Totals by DataDay Running Total] - 
            (try Table.Last(Table.SelectRows(#"RT ExpiryDay", each [ExpiryDay] < 
                c[DataDay]))[Totals by ExpiryDay Running Total] otherwise 0), Int64.Type),
    #"Removed Columns" = Table.RemoveColumns(#"Corr RT",{"Totals by DataDay", "Totals by DataDay Running Total"})

in
    #"Removed Columns"

 

 

 

 

Data

ronrsnfld_0-1690490762621.png

Results

ronrsnfld_5-1690491077120.png

 

 

 

 

Hi, thanks again for yor reply!! Can you help me how to do this?

  • Paste the custom function as a new query into the Advanced Editor
    • Rename it as per the instructions
  • Paste the main query as a new query into the Advanced Editor
    • Change the Source line to reflect your actual data source

 

  • Paste the custom function as a new query into the Advanced Editor
    • Rename it as per the instructions
  • Paste the main query as a new query into the Advanced Editor
    • Change the Source line to reflect your actual data source

 

Hi ronrsnfld, thank you very much for your reply, I will be able to check your solution proposal on Monday, but I see that the final result is correct 🙂

Rickmaurinus
Helper V
Helper V

Hi,

 

I don't fully grasp your required outcome, could you elaborate.

 

To compute a running total in Power Query you can make a regular one with List.Generate: https://gorilla.bi/power-query/running-total/

That looks something like this: 

 

 

 

List.Generate ( 
   // Start value: Set RT to 1st value in BuffValues and set RowIndex to 0
   () => [ RT = BuffValues{0}, RowIndex = 0 ],
    // Condition: Generate values as long as RowIndex is < than number of list items
    each [RowIndex] < List.Count( BuffValues ),  
   // Record with logic for next values:
    each [    
           // Calculate running total by adding current RT value and the next value
           RT = List.Sum( { [RT], BuffValues{[RowIndex] + 1} } ), 
           // Increment the RowIndex by 1 for the next iteration
           RowIndex = [RowIndex] + 1 
         ],            
   // Only return the running total (RT) value for each row
    each [RT] 
)

 

 

And you can also create a grouped running total like here: https://gorilla.bi/power-query/running-total-by-category/

 

--------------------------------------------------

 

@ me in replies or I'll lose your thread

 

Master Power Query M? -> https://powerquery.how

Read in-depth articles? -> BI Gorilla

Youtube Channel: BI Gorilla

 

If this post helps, then please consider accepting it as the solution to help other members find it more quickly.

Hi, I will try to explain exactly what I want to do. Here are my data (I have 3 million rows), this is just a sample: 

streli_0-1690370374029.png

I want to calculate a cumulative amount for a volume, but so that the calculation skips values with expiry date < dataday. I get the cumulative amount grouped by dataday:

streli_1-1690371021260.png

But I need 71 for June 9, not 91, because there are two expiry dates earlier than June 9 in the previous rows (volume 10 and 10), these should not be included in the cumulative value for June 9.

streli_2-1690371449619.png

 

streli
Helper I
Helper I

My problem is that I need to calculate the cumulative amount for more than 1 million rows (and at the same time skip the overdue amounts) and therefore I can't solve the problem with a DAX function.

Hi, @streli 

 

PQ is not the ideal tool for these kinds of calculations. But as you are interested in PQ so here is what you need to do. 

  1. Sort your "dataday" and "expiry date" in ascending order.

  2. Add an index column: Add Column > Index Column > From 1.

  3. Create a new custom column with the following formula:

= List.Sum(Table.SelectRows(MyTable, each [Index] >= _[Index] and [expiry date] >= _[expiry date]) [volume])

 

  1. Rename the new column as "Cumulative Volume".

  2. Apply the changes and load the data into Power BI.

Please note this can be quite slow on larger data sets, as it's essentially a row-by-row operation. If you find that this method is too slow, you may need to consider a different tool or method, such as using SQL.

 

rubayatyasmin_0-1689517080227.png

 


Did I answer your question? Mark my post as a solution!super-user-logo

Proud to be a Super User!


Manoj_Nair
Solution Supplier
Solution Supplier

@streli- Check this out, if this works for you.

Cumulative Amount = 
CALCULATE(
    SUM('Table'[Volumen]),
    FILTER(
        ALL('Table'),
        'Table'[ExpiryDate] > MAX('Table'[DataDay])
    )
)

 

rubayatyasmin
Super User
Super User

Hi, @streli 

 

Unfortunately, Power Query Editor doesn't support running totals directly. However, we can do it with Power Query Editor for cleaning and filtering the data, and then DAX for the cumulative sum calculation. Here's a step-by-step guide.

 

  1. Load and clean data in Power Query Editor, making sure the dataday, expiry date, and volume columns are correctly formatted.

  2. Filter rows in Power Query Editor where the expiry date is earlier than the dataday.

  3. Close & Apply to load the filtered data into Power BI.

  4. Create a new DAX measure in Power BI to calculate the cumulative sum:

Cumulative Volume =
CALCULATE (
SUM ('Table'[Volume]),
FILTER (
ALLSELECTED ('Table'[dataday]),
'Table'[dataday] <= MAX ('Table'[dataday])
)
)

 

  1. Replace 'Table' with your table name.

This will give you a cumulative sum that excludes datadays with expiry dates earlier than the dataday.

 

rubayatyasmin_0-1689517080227.png


Did I answer your question? Mark my post as a solution!super-user-logo

Proud to be a Super User!


Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.

Top Solution Authors
Top Kudoed Authors