Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Dear,
I am trying to build the below logic in powerbi. Can anyone help on this, I tried and able to get the result only one part. Data is about millions, and need to think about performance as well.
This is the wifi access analysis I am doing. I have a date field , Name and status (Success, Failure and Incomplete). Below the points to put in the logic
Input Desired O/P
DateTime | Name | Status | DateTime | Name | Status | ||||||
2/2/2017 18:58 | John | Completed | 2/2/2017 18:58 | John | Completed | ||||||
2/2/2017 19:57 | John | Failed | 4/2/2017 18:56 | John | Incomplete | ||||||
2/2/2017 20:56 | John | Incomplete | 2/2/2017 18:56 | Paul | Failed | ||||||
4/2/2017 18:56 | John | Incomplete | 2/2/2017 18:53 | Mathew | Completed | ||||||
2/2/2017 18:56 | Paul | Failed | 2/2/2017 18:51 | Jose | Incomplete | ||||||
2/2/2017 18:55 | Paul | Incomplete | 2/2/2017 18:51 | Ani | Completed | ||||||
2/2/2017 18:53 | Mathew | Completed | |||||||||
2/2/2017 18:52 | Mathew | Failed | |||||||||
2/2/2017 18:51 | Jose | Incomplete | |||||||||
2/2/2017 18:51 | Ani | Completed |
Br,
SHams
Hi @shamsuddeenvp what I would do is the following:
And I would suggest if you are looking for performance as @MarcelBeug suggested to potentially the column in the Query Editor but the rest as Calculated columns, so that you can leverage the performance of the Vertipaq engine.
If I interpret the logic correctly, then you need:
all records with status "Completed", and
all records with status "Failed"
if there is no record for the same person with status "Completed" within +/- 4 hours
all records with status "Incomplete"
if there is no record for the same person with status "Completed" or with status "Failed" within +/- 4 hours
I put your data in an Excel file and created the Power Query (M) code below.
To be honest I have my doubts about performance, but you may give it a try on some test data.
let Source = Excel.Workbook(File.Contents("C:\Users\Marcel\Documents\Forum bijdragen\Power BI Community\Help solve logic.xlsx"), null, true), Input_Table = Source{[Item="Input",Kind="Table"]}[Data], Typed = Table.TransformColumnTypes(Input_Table,{{"DateTime", type datetime}, {"Name", type text}, {"Status", type text}}), #"Added Custom" = Table.AddColumn(Typed, "Keep", (x) => if x[Status] = "Completed" then true else if x[Status] = "Failed" then 0 = Table.RowCount(Table.SelectRows(Typed, each [Name] = x[Name] and [Status] = "Completed" and [DateTime] >= x[DateTime] - #duration(0,4,0,0) and [DateTime] <= x[DateTime] + #duration(0,4,0,0))) else 0 = Table.RowCount(Table.SelectRows(Typed, each [Name] = x[Name] and ([Status] = "Completed" or [Status] = "Failed") and [DateTime] >= x[DateTime] - #duration(0,4,0,0) and [DateTime] <= x[DateTime] + #duration(0,4,0,0))) ), #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Keep] = true)), #"Removed Columns" = Table.RemoveColumns(#"Filtered Rows",{"Keep"}) in #"Removed Columns"
Thanks a lot @MarcelBeug
This is the logic what I am exactly looking for. Great.
However, it takes long time since my data is more (in lak).. I will have to try some alternate solution.
Br,
Shams
I had done this using DAX queries.
Br,
Shams
To address your performace issues, you need to split your date column in to two columns.
The other should carry the Hour or Minute of the day as an integer. This will make a massive difference over the size of your model and make loading and calculations much faster.
Then you can do the rest in DAX or PQ. When the actions are within 4 hours on the same day it's easy. If they split a day, you just need a basic IF statement to handle that.
If you have millions of rows, you have to avoid having columns that are highly unique like your datetime column.
User | Count |
---|---|
102 | |
88 | |
78 | |
71 | |
69 |
User | Count |
---|---|
113 | |
99 | |
97 | |
72 | |
68 |