Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
aviral
Advocate III
Advocate III

Rows aggregation based on various criteria

*This is not a question

 

Would like to share recent learning that I had related to one of the issues that I was facing while trying to aggregate data. There are multiple posts available which gave part solutions. The attempt here is to combine all that knowledge into a single post. 

 

I think people who are coming from the programming background or those who work on platforms which provide more evovled constructs to capture a logic might find this post useful. 

 

Problem: 

The problem faced was with regard to a table that contained sequential snapshots of programs with a timestamp with multiple records communicating the same status over time. I needed to collapse such records to provide a single records which captured the start and end time. This was more like a CDR (Call Data Record) scenario where the call status is reported at regular intervals and then aggregated by some mechanism to indicate a sequential progression of call over time between its start and end time.

 

Example Source Table: 

 

Call Identifier | Timestamp | Status

Call-1                t               Active

Call-1                t+5           Active        (remove)

Call-1                t+10         Suspended

Call-1                t+15         Suspended (remove)

Call-1                t+20         Active

Call-1                t+25         Active        (remove)

Call-1                t+30         Active        (remove)

Call-1                t+35         Active        (remove)

Call-1                t+40         Active        (remove)

Call-1                t+45         End

 

Imagine the similar data for multiple calls The objective was to conver this data to a form such that there remains one record for each staus in continuation without losing information. 

 

Example Target Table: 

 

Call Identifier | Timestamp | Status

Call-1                t               Active

Call-1                t+10         Suspended

Call-1                t+20         Active

Call-1                t+45         End

 

So we can easily understand from the aggregated table above that call

Started at t and Ended at t+45

was Active from t to t+10

was Suspended from t+10 to t+20

was Active again from t+20 to t+45 when it ended. 

 

Solution: 

To achieve this, there are a few tricks in PowerBI that we need to know but before that let us look at the sequence of steps that logically need to be performed. Here is a list of steps:

1. We need to group all the data by call identifer first. (Please note that the example table above shows data for only one call but in real life there might be a single table containing records for all calls). 

2. Next we need to sort the records within the call group by timestamp. This is required to get the sequence of status right. 

3. Next we need to remove all records where the status is same as previous record. 

 

Pretty simple... right... ? But how do you do it in PowerBI which does not provide straight looping constructs to compare the values. There are workarounds available but they can be complicated and at times difficult to understand (by my standard). 

 

The easiest way I found to achieve this was to add two index columns first starting from 1 and other starting from 0 and then doing a crossjoin of the same table on two. While doing the crossjoin please take care of the following: 

1. To compare with next value, do the crossjoin such that index starting with "1" is crossjoined with index starting with "0"

= Table.NestedJoin(#"PrevAppliedStepName",{"IndexStartingOne"},#"PrevAppliedStepName",{"IndexStartingZero"},"PrevAppliedStepName",JoinKind.LeftOuter)

 

2. To compare with previous value, do the opposite such that index starting with "0" is crossjoined wiht index starting with "1"

= Table.NestedJoin(#"PrevAppliedStepName",{"IndexStartingZero"},#"PrevAppliedStepName",{"IndexStartingOne"},"PrevAppliedStepName",JoinKind.LeftOuter)

 

Doing the crossjoin against itself, gives a column of table which can be expanded to include only the values which you want to compare and now you will have the values (to be compared) from the previous/next record in the same row. A new custom column can be created with the logic to either retain or remove the row and then table filtered on this column to keep only the relevant rows. 

 

Before doing the crossjoin however, we need to get the order in the table right and this can be achieved by creating a custom column which is a concatenation of fields in sequence in which you want to sort (in our case it would be call name + time stamp) and then sorting on this column.

 

Hope this helps someone (or me when I am looking for it in the future). 

 

In case you have a better solution or have more to add to this, please feel free to comment or add an answer. 

1 ACCEPTED SOLUTION
v-yuezhe-msft
Employee
Employee

@aviral,

Alternatively, you can perform the following steps to get target table.

1. Duplicate the Timestamp column in Query Editor, then split the duplicated column by +.
1.JPG

2. Remove Timestamp – Copy1 column and rename Timestamp -Copy2 column to interval.
2.JPG

3. Create the following calculated columns in the table.

Column = IF(Table[Interval]=BLANK(),0,Table[Interval])


Index = RANKX(FILTER(Table,Table[Call Identifier]=EARLIER(Table[Call Identifier])),Table[Column],,ASC,Dense)

Difference = var previous=CALCULATE(FIRSTNONBLANK(Table[Status],Table[Status]),FILTER(Table,Table[Call Identifier]=EARLIER(Table[Call Identifier])&& Table[Index]=EARLIER(Table[Index])-1)) return IF(ISBLANK(previous)||previous<>Table[Status],1,0)

4. Drag Difference column to Visual Level filters of the table visual and set its value to 1.
3.JPG


Regards,
Lydia

Community Support Team _ Lydia Zhang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

1 REPLY 1
v-yuezhe-msft
Employee
Employee

@aviral,

Alternatively, you can perform the following steps to get target table.

1. Duplicate the Timestamp column in Query Editor, then split the duplicated column by +.
1.JPG

2. Remove Timestamp – Copy1 column and rename Timestamp -Copy2 column to interval.
2.JPG

3. Create the following calculated columns in the table.

Column = IF(Table[Interval]=BLANK(),0,Table[Interval])


Index = RANKX(FILTER(Table,Table[Call Identifier]=EARLIER(Table[Call Identifier])),Table[Column],,ASC,Dense)

Difference = var previous=CALCULATE(FIRSTNONBLANK(Table[Status],Table[Status]),FILTER(Table,Table[Call Identifier]=EARLIER(Table[Call Identifier])&& Table[Index]=EARLIER(Table[Index])-1)) return IF(ISBLANK(previous)||previous<>Table[Status],1,0)

4. Drag Difference column to Visual Level filters of the table visual and set its value to 1.
3.JPG


Regards,
Lydia

Community Support Team _ Lydia Zhang
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.