Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The ultimate Microsoft Fabric, Power BI, Azure AI & SQL learning event! Join us in Las Vegas from March 26-28, 2024. Use code MSCUST for a $100 discount. Register Now

Vijay_A_Verma

Power Query - Methods to Refer to Previous Row - Time Performance Comparison to determine the Best

Use Case - There are many scenarios where you need to refer to previous row such as need to compute month on month growth in Sales. In this article, I talk about 3 techniques to refer to previous row and decide which one is the best. 

 

Solution - Below are 3 methods to refer to previous row. To demonstrate, I have created a table which is named Master. I have created 3 queries which reference to this Master to demonstrate 3 methods. In below method, we are referring to previous row for Units Sold column.

Method 1 (Index Method) -  In this method, we insert a 0 based index and use that to refer to a previous row. Below is the code.

 

 

 

let
    Source = Master,
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
    #"Added Custom" = Table.AddColumn(#"Added Index", "Previous", each if [Index]>0 then #"Added Index"[Units Sold]{[Index]-1} else null)
in
    #"Added Custom"

 

 

 

 Method 2 (Merge Method) - In this method, 2 indexes are inserted - One is 0 based and one is 1 based. Then we use these 2 indexes to self join the table. In this method, first record goes down to end of table, hence we need to perform a sort to maintain the original sort order. Below is the code.

 

 

 

let
    Source = Master,
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
    #"Added Index1" = Table.AddIndexColumn(#"Added Index", "Index.1", 1, 1, Int64.Type),
    #"Merged Queries" = Table.NestedJoin(#"Added Index1", {"Index"}, #"Added Index1", {"Index.1"}, "Added Index1", JoinKind.LeftOuter),
    #"Expanded Added Index1" = Table.ExpandTableColumn(#"Merged Queries", "Added Index1", {"Units Sold"}, {"Previous"}),
    #"Sorted Rows" = Table.Sort(#"Expanded Added Index1",{{"Index", Order.Ascending}}),
    #"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Index", "Index.1"})
in
    #"Removed Columns"

 

 

 

 Method 3 (List Method) - In this method, we use Table.FromColumns function to refer to previous row. Below is the code.

 

 

let
    Source = Master,
    Custom1 = Table.FromColumns(Table.ToColumns(Source) & {{null} & List.RemoveLastN(Source[Units Sold],1)},Table.ColumnNames(Source)&{"Previous"})
in
    Custom1

 

 

A sample pbix file can be downloaded from here illustrating the above

https://1drv.ms/u/s!Akd5y6ruJhvhugbERzhaB96FVlLQ?e=SmjWXg

Methodology to measure time taken by the queries - Since Power BI Desktop doesn't provide a very reliable method to measure time, the data has been loaded to Excel and Excel VBA code has been used to measure the time taken by the query.

Below VBA code has been used to measure time performance of queries. The code is credited to Charles Williams of FastExcel and has been taken from here - https://docs.microsoft.com/en-us/previous-versions/office/developer/office-2010/ff700515(v=office.14...

 

 

#If VBA7 Then
  Private Declare PtrSafe Function getFrequency Lib "kernel32" _
  Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
  Private Declare PtrSafe Function getTickCount Lib "kernel32" _
  Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long
#Else
  Private Declare Function getFrequency Lib "kernel32" _
  Alias "QueryPerformanceFrequency" (cyFrequency As Currency) As Long
  Private Declare Function getTickCount Lib "kernel32" _
  Alias "QueryPerformanceCounter" (cyTickCount As Currency) As Long
#End If
Function MicroTimer() As Double
' Returns seconds.
    Dim cyTicks1 As Currency
    Static cyFrequency As Currency
    MicroTimer = 0
' Get frequency.
    If cyFrequency = 0 Then getFrequency cyFrequency
' Get ticks.
    getTickCount cyTicks1
' Seconds
    If cyFrequency Then MicroTimer = cyTicks1 / cyFrequency
End Function

 

 

I am invoking below function and I will be using a Sub to call this function with required Query name

 

 

Function GetTime(QueryName)
    Dim Timer As Double
    Dim PreviousRefreshStatus As Boolean
    Timer = MicroTimer()
    With ThisWorkbook.Connections("Query - " & QueryName).OLEDBConnection
        PreviousRefreshStatus = .BackgroundQuery
        .BackgroundQuery = False
        .Refresh
        .BackgroundQuery = PreviousRefreshStatus
    End With
    GetTime = Format(MicroTimer() - Timer, "0.000")
End Function

 

 

This is the Sub used to populate the results of 3 approaches in one column of Result worksheet.

 

 

Sub Result()
    Dim Ws As Worksheet
    Dim i As Long, LastRow As Long
    Dim Rng As Range, Cell As Range
    
    Application.ScreenUpdating = False
    Set Ws = Worksheets("Result")
    LastRow = Ws.Range("A" & Rows.Count).End(xlUp).Row
    Set Rng = Ws.Range("A2:A" & LastRow)
    
    For Each Cell In Rng
        LastColumn = Ws.Cells(Cell.Row, Columns.Count).End(xlToLeft).Column
        Cell.Offset(0, LastColumn) = GetTime(Cell)
    Next Cell
    
    Application.ScreenUpdating = True

End Sub

 

 

Once I run the macro, it populates the result for all approaches. Hence, I ran the query 3 times to measure time in 3 rounds and took average of 3 rounds to arrive at the final time taken by a query.

NOTE - Below time results should not be taken as absolute but rather should be used to compare performance of methods. Timing depends upon computer configuration, processes being run and various other factors. Also the queries run at different times, report different timings because of this. Lower the time is, variation is more. Higher the time is, variation is lesser. 

Results - Below are the results for a dataset containing 500, 5000, 50000, half a million and 1 million records. Since Index method was clearly losing out and took 304 second just for 5000 records, hence for >5000 records, this method was discarded. 

Note - List method is referred to as ListGenerate in below pictures

1.png

 

Conclusion - Yes, Merge method is something which is the the fastest. As number of records increase, List.Generate has been slowing catching up. This is partly due to the fact that we are forced to sort in Merge method to maintain original sorting order. But if we can live up without sorting, then Merge method's performance will be all the more better.

2.png

 

All Excel files can be downloaded from below

500 Records

5000 Records

50000 Records

Half a Million Records

1 Million Records

File used for Graphs

--- End of Article ---

Comments

Great Post! Thanks for sharing!

Great analysis, thank you! I was using merge and it's great to see that was the most efficient overall, with the sorting caveat.

Hi @Vijay_A_Verma ,
thanks for providing the code for measure query execution times - that's very helpful!

With regards to your measurements, this seems to depend quite a bit on the width of the tables. If you run it on slim tables, then the "ListGenerate"-approach (which I would call "Shift", as there is no ListGenerate formula in it) will be considerably faster.  You can check out this file, where the "Shift"-method is more than 3 times faster than the merge-method (500k file): https://thebiccountant-my.sharepoint.com/:x:/g/personal/imke_thebiccountant_onmicrosoft_com/EcATEqj-...

ImkeF_0-1666642308754.png

 

Perfect @ImkeF ...Sorry, I am checking it quite late as I have been contributing to Linkedin of late, and logged in today after a long time.

Thank you for posting your observation. I think readers would need to keep this trade-off in mind while choosing a particular method. Every second is important in Power BI world.