Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

The ultimate Microsoft Fabric, Power BI, Azure AI & SQL learning event! Join us in Las Vegas from March 26-28, 2024. Use code MSCUST for a $100 discount. Register Now

Nathaniel_C

Using EARLIER() with a SMART INDEX.

Many times, we need to find a value from a different row than the one we are on, in order to use that value for a formula. In Excel the engine knows that cell C1 is in the row above cell C2 and thus we can subtract cell C2 from C1. In Power BI the sort order for each column may be non-sequential to our way of thinking. One exception to that rule is that we think of a date column as being sorted from earliest date to the latest date. The Power BI engine does as well. 

 

But what happens if we have a date column that has many duplicate dates, or times, or no date column at all?

 

There are many articles written about the use of EARLIER() because a lot of users have a hard time with it. Rob Collie suggests that it is misnamed and should be called CURRENTROW(). One of the issues is that Power BI may store data in an other-than-expected pattern for the column we need to use, and so when using EARLIER we need a date column or an Index.  An Index column is another column that humans and Power BI think is sorted in the same manner. In Power Query an Index column is sequential, starting with 0 or 1 and incrementing each row.

 

One of the solutions in using EARLIER (), is to add an Index column in Power Query so that whatever row we are on, if we want the row above, we can write DAX to filter the table so that we can retrieve the value from the row where the index is one less, (meaning the row above.) So, if we are on row 256, and want the previous value, we ask for -1 which would give us the row 255. DAX gets the value from whatever column we indicated in the previous row, or the row two above if we use -2, etc.

 

So on to the problem.

 

Our DAX pilgrim posted:

 

“I have cumulative hourly meter readings (kWh) from 01/08/2018 00:00 to 31/08/2018 00:00. Each of meter serial is unique and belongs to a certain feeder. what I want to achieve is to evaluate (using DAX column or a measure) for each meter in each feeder the hourly (kwh) from the cumulative i.e the difference between the hourly time steps readings. Thanks in advance.”

 

Nicely explained with a table included.

 

In solving this issue, one might be tempted to use EARLIER with an Index Column. However, as it stands this will not work because the previous read for a meter may be many rows away in the table, and the date column has duplicate values. One might think that using a visual like a Matrix would do the filtering for us, but if we don’t write this in DAX, we are limited in where we might use this. One might think of doing the filtering and calculations in Excel and then uploading the results would be a work around, but then the ability to update automatically from the source is lost. So, let us do this in Power BI and Power Query using DAX.

 

 

After thinking on this issue and trying various approaches, I did three things:

  1. Entered a table that conformed with the poster’s logic
  2. Created a unique identifier for the table
  3. Created a Smart Index

 

Although the poster had written that each meter serial number was unique it was not clear to me whether that meant for each feeder or overall. Better safe than sorry, I merged the columns containing the Feeder identity, and the meter serial number to create a unique identifier that covered all Feeders and all Meters.

 

The next step is the crux of the solution to my mind. In my rock-climbing days, the crux was that one essential move that you had to use to make it to the top. And like many problems, looking back on the solution, it may seem simple, but while you are hanging on the rock face trying to solve it, it is never simple.

 

Many of us know in Power Query how to click on the little arrow at the top of the column and access the sort order.  Simple, yes? But what to sort and why? I sorted first on the unique identifier. That brought each meter next to itself. Then I sorted the Date Time column to put the earlier time first.

 

And then I inserted the index. Let me repeat that. And then I inserted the index.

 

Normally, as I mentioned above, we use an index to be able to move one (or more) rows up. Like a ladder. I am here and I want to go there. However, in this case by adding the index after the sorting, the index captured the sort order. So now the index contains some explicit intelligence that a regular index only carries implicitly. We have a Smart Index!  Which is a lot more powerful! This Smart Index knows how to “pair” up the two readings that we need, and equally important, which reading came first.

 

 

Following is the column code and the PBIX that I created. Smart Index

Hourly Difference = 
                    var firstread = CALCULATE ( MAX( Meter[Hourly Reading] ), FILTER( ALL( Meter), Meter[Smart Index] = EARLIER( Meter[Smart Index] ) -1 ))  //finds the previous reading using Smart Index, so essentially sorting and filtering here 
                    var currentread = Meter[Hourly Reading]  //finds the current reading 
                    var result = IF ( Meter[Smart Index]=0, 0, currentread - firstread ) // 3 possible values, 0, blank or the difference. Here either 0 or the difference, the next line the difference or blank


                
  return   IF( Meter[Unique ID] = CALCULATE( MAX( Meter[Unique ID] ), FILTER( ALL( Meter ), Meter[Smart Index] = EARLIER( Meter[Smart Index] )-1 )), result)

In the return we want to make sure that of the pair, that we are on second row of the pair. So, we insert a test.  And what is cool, is that we can use pretty much the same DAX, except instead of the value from [Hourly reading] we are looking for the value from [Unique ID]. What we want to know is, do both rows contain the same meter value? If the value from the current [Unique ID] is the same as the value from the previous [Unique ID] then go ahead and subtract the one from the other (result) otherwise do nothing which Power BI interprets as blank because we do not want to compare one meter's value to another meter's value.

 

Using a Smart Index that encapsulates the order with which we would like to see our data used with EARLIER() is a win-win situation.  It essentially reduces the need for filtering as well. Keep your eyes open for other situations that a Smart Index could be applicable.

Comments

Hello Readers:

 

Although the PBIX file is right above the code, here is a picture of the desktop to give you an idea of what was requested.

 

Nathaniel

Meter.PNG

 

Using variables it would be better to avoid EARLIER, which was necessary when DAX did not have variables.

 

Hourly Difference = 
VAR PreviousSmartIndex = Meter[Smart Index] - 1
VAR firstread =
    CALCULATE (
        MAX ( Meter[Hourly Reading] ),
        REMOVEFILTERS ( Meter ),
        Meter[Smart Index] = PreviousSmartIndex
    ) //finds the previous reading using Smart Index, so essentially sorting and filtering here 
VAR currentread = Meter[Hourly Reading] //finds the current reading 
VAR result =
    IF (
        Meter[Smart Index] = 0,
        0,
        currentread - firstread
    ) // 3 possible values, 0, blank or the difference. Here either 0 or the difference, the next line the difference or blank
RETURN
    IF (
        Meter[Unique ID]
            = CALCULATE (
                MAX ( Meter[Unique ID] ),
                REMOVEFILTERS ( Meter ),
                Meter[Smart Index] = PreviousSmartIndex
            ),
        result
    )

 

Thank you, @marcorusso !

Anonymous

@Nathaniel_C  - This logic can also be accomplished in Power Query. Steps:

1. Sort

2. Add Index

3. Add Prevous Index by subtracting 1 from Index

4. Merge table with itself (Previous Index with Index)

5. Expand table to get previous Unique ID and Reading

6. Calculate Change

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("jZzNjhw3DIRfxdhzAE9Pj6Tu3Db3AH03/P6vEW8GhlVkfVQDySWoSGwWS6Ufzv748fH58dfH9vXv4/vz+/Oxnd+259+Px69/vn3+++u/H+fHz78sbEaNDigZq9NY+4zaHg+Ave7B2gxrB6C6REZjDUHRVx6SsRegTkHRWNtDptwJJvk/BsF+E3CVn7nNqNYWZP6POimxu8xIqJeMRSlrt6LvMhalYsg3rqh8j7UtqHyjnisq61RsW/2Vu9PlSIWxG10akexGmDiYCLMnlnajyzOJfDeyzOW/G1nmst6NLDF6kWWu6t3IciSSdifLQYGpLHMtRi4XlAuKBtOx6DNFlo1IElk2iktkOQglsjwoLpHlSFLajSwxepElFo/KMhtcpPI9558ie/oV9jOsGAEmtTgAJWM1GktUeVBgqsoGqHYrLlHlvKgoTGQ5JVZRIstOY4ksT0KpLDn7kv6+5DIu2J7L6BGey+hKnsvrFpc1SmXZF1zWY6lbUvQiy0H5Om/lS2V5UvjBLZ8RZt0yq9e5ZUtV5swy16Izy5MCE1k2QqlZppQ5sxz0jbqHTQuBM8tB3yiqzGVhzfJMC4E1y0FfqarMq6I1S5pTxsKUiSrHa0HlW0mUMlUlxaV7WEKNW5k4bo0lqhxLKq9wAgIqo43vfoH9rGFSZQ1Qek4llKhy2jko6nVrLDJLRYkqT/pG2sIqSr3yBShR5WzPIa2PW18ZjpZLLuPi77mMy7XnMhaj5zJu3DyX9Ywiy4OiF1l2YklkeVJe1Sy3BZdxUQEq4wECqHwvGKl+rFnmZFizTJ9pzTJ9pjPLRiiR5bSjURSZpaL0ZEnR6xY25dWZ5ZnK1ZplKh5rlj1VtTdL5EiVRCmTujgoNB2LUqYXPpSM1624VJVUr2SWnsoapaqkGVWVmHxVZVp6rFlOO92XX2DjlUOASS02QNHJUlGqSgqMVKkoUeV0w6eofit6UeVJ0es1LI2lB8udskqqDDBaE4HKqDdPZbxW9FRGr/RURr15KuMdh6cyKtxTGS+RPZXx4tdTeZUkiSobjRVUSdUTVHlEmD9YpsJwXjn5iKJ0LELpfU9KrL2FpbFElZ1QC1U6r2ypXJ1XZiHZW1jMqqgyM269kmF6RZM04rwy16LzyoM+gG5hPZXRBT2V8bLKUwmqtAfLFZX1jLSDBSph6Vndwja/wMbXhQAD7SpKfZdQ+jZCM75uzSiqHDQWvY0oip4sFaVe+QSUPlk+KKvqlRvByCuByniu9FTG6x5PZdzBeirj06CnMrqNpzJeMHkq66oWVXbKBF33eCrjOgBUXvEqAbiMIm9+hY1b2ACrC9seLFM27MEyZdaZ5ZG4dGZ50lhqljSW3vckjTizPGhGlWVi3JslJQzfRoDKeMnhqYzHKU9l3Oh6KkFw9smSkqEHSyoxkeWgql7IcmGWnsr4zgJUxl0zUDln/58vyNf/9X1/2C3sRjCp6wegVOENUKrKA1Cqyieg1Cx3QOnbCEWvqjwBddzKl6iyU1zhbaQTTK97xorKeVlnKufDIFM5a4SpvMr06xaWEiuqPKle1SxpLD1YUlzaSECFqFtYKkR73VNQ6VT5S7yb9cqDYGZLllHOdzNKVUmBqSp3QLVbKPXKE1Cqyg4oPVhSvvS6Z6OsqldSWoMqnysqL7P4ZCrfVUahueseptKtA5nKq0yGeiV9pF7CEkpU+Xs/yVzW4VeyNFxepUQ2dwf+Bav6eybKi/6eqbCL/p6jAUpPlgegRJa9A0pl+QKUyvIBKL3voehVlpRWlSXFFcwSYbooUsaCWRJJ7sWSqZxPqUxlFG/R3tO3BZXzLSxTWY8lssSy0JMlZUJU2TGrbgtbUHl5iVQHSwOTik3JcAfLeY2q+ntSYdv+HopfOwloRpFloxn1cSQtBMWTZUadt1C2GdbAFrL0J8tUGfZkSYNpBx+lzJ0smcqrTIbuYSkudUsqC9ffw1TOjQRMZdynV/09OFi4hv2Ti6q/Z1phq/6eB6BU4h1Qosop/qK/p9OM+jjSAKXXsDTjuDWj++VIRrkWdZNVNUuaMtz3PFdURrup2nv2BZXxBFS199CMqsoVlbGsi/aekwh3jQRMZdxcVO09SyrjFrZq78nZt2bZk5KcWQ4aTLewKbH2FjbVmPPKXGL2yZJQ+nuulH7bC0vfqL2wqSysVyYq/S0spcL+cKSgMt7kFO09GNpCldYrUyna9h4qC1Xlisq4OS3ae+ZdW9Hf0ykwlSWh0Cyr/p5pKaj6e6b1rujvGYSiK911f09GkVkW/T0Yl5rlDijtumuAcv09GaXXsJh8kWWnhAWzPFZUXmHnUPT3DEq/exxhKuN1Z9HfM0mp6O9pFL2aJUWvZklloWZJcZ23ZgyqxOS7F+MJ5k+WKbHOLLNGrFnSlLqFTTVm+3sIpU+WKRf2YEnR0xa26O9piXD7ZElxrVTpe2GTRKxZ5ipzZtkJ5Tr4mMp3KRLhet9DiVVVEkrNkspC73uSkuzBkkhynQQFlfP7Z0FlhFX9PdPSWfT3TNot+nsOQmnXHQWmXrkByv1JgoxyHeoZpao8AXXciv68Fb39OZeB6RYWRyNVFv0907a56u+hz3R/koCpjJvmor+nU/GoKil6VeVYUBkdtejvOVEgqsollfGhqGzvSRVrD5aEWijcHiwpsNetGd3PuTKq34rLtfdklPuLBBlFjQRVew/DaKdStff0pMrFwbJo78Ep91s1pqpMSlpcwq7be5jKGiWqPAil58ollVe4oKnae740/vM/", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Feeder = _t, Meter = _t, Time = _t, READING = _t]),
    AddKey = Table.AddColumn(Source, "Unique ID", each [Feeder] & [Meter]),
    #"Changed Type" = Table.TransformColumnTypes(AddKey,{{"Unique ID", type text}, {"Time", type datetime}, {"READING", Int64.Type}, {"Meter", Int64.Type}}),
    #"Sorted Rows" = Table.Sort(#"Changed Type",{{"Unique ID", Order.Ascending}, {"Time", Order.Ascending}}),
    AddIndex = Table.AddIndexColumn(#"Sorted Rows", "Smart Index", 0, 1),
    #"Added Custom" = Table.AddColumn(AddIndex, "Prev Index", each [Smart Index] - 1),
    #"Merged Queries" = Table.NestedJoin(#"Added Custom", {"Prev Index"}, #"Added Custom", {"Smart Index"}, "Added Custom", JoinKind.LeftOuter),
    #"Expanded Added Custom" = Table.ExpandTableColumn(#"Merged Queries", "Added Custom", {"READING", "Unique ID"}, {"Previous READING", "Previous Unique ID"}),
    #"Added Custom1" = Table.AddColumn(#"Expanded Added Custom", "Hourly Difference", each if [Unique ID] = [Previous Unique ID] then [READING] - [Previous READING] else null),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{"Prev Index", "Previous READING", "Previous Unique ID"})
in
    #"Removed Columns"

 

This can also be done within a Function, calling the function for each Unique ID, so that the Merge will be less expensive.

 

Anonymous

How do I do this...

 

"This can also be done within a Function, calling the function for each Unique ID, so that the Merge will be less expensive."

Anonymous

@Anonymous  - Please see this pbix.

To create and use a table function in Power Query, you need to do the following:

  1. Create a parameter for the column you want to partition. In this example, I called it FeederMeter, which is the same as the Unique ID.
  2. In a very early step, filter your query to limit to only rows where [Column] = Parameter. (Ideally, this would be part of the call to the source DB.)
  3. Right-Click the query and click "Create Function".
  4. Create a different table with the distinct list of Partition values.
  5. Add Column -> Invoke Custom Function. This will run the Function for each row in your table.

 

Hi @Nathaniel_C ,

That is a very neat post and it gave me some inspiration on how to use a smart index as a tool to use an index based on the sort order for multiple columns. Thanks for sharing!

I wanted to show an alternative to this without using PowerQuery but using DAX. The Smart Index column is then not a continous column but ranks all input based on the Time column, per Meter and Feeder. Also, I've used DAX variables rather then EARLIER() for performance. So I added two calculated columns, this is SmartIndexWithRANKX;

SmartIndexWithRANKX = 
VAR curMeter = Meter[Meter]
VAR curFeeder = Meter[Feeder]
RETURN
RANKX(FILTER(ALL(Meter), Meter[Feeder] = curFeeder && Meter[Meter] = curMeter), Meter[Time], , ASC)

The second calculated column is to calculate the difference with the previous reading:

HourlyDif Based on Rank = 
VAR curFeeder = Meter[Feeder]
VAR curMeter = Meter[Meter]
VAR prevSmartIndexRanked = Meter[SmartIndexRank] -1
RETURN
IF(prevSmartIndexRanked = 0, 
    BLANK(), 
    Meter[Hourly Reading] - CALCULATE(
        SELECTEDVALUE(Meter[Hourly Reading]), 
        FILTER(ALL(Meter), Meter[Feeder] = curFeeder && Meter[Meter] = curMeter && Meter[SmartIndexRank] = prevSmartIndexRanked))
)

Same result, just a different approach 🙂

image.png

What are your thoughts on this? I can imagine that on very large datasets, your method might be preferred as to push all calculations to the preperation of the data.

Either way, nice post and I will keep an eye out for using SmartIndexes 🙂

Kind regards,
Djerro123