Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hello,
As a relative new user of PowerBI dataflow i'm stuck with this problem:
As source there is a table with id's, orderID's, Events, date of event and if needed a plandate.
Now I want to convert this table to a table in witch only the OnHold, OffHold and Plandates are shown by order.
I have made an example in Excel for the source and the wanted result.
I tried to make a copy of the source with only the needed events and then merge it with the source but that will not do. Filtering results in a loop.
Is there anyone who can help?
main | table | wanted | result | |||||||||||||
ID | Order | Event | date | plandate | Order | Plan1ID | OnHoldID | PlanOldID | PlanNewID | PdtPlan1 | PdPlanOld | PdPlanNew | dtOnhold | dtOffhold | ||
1 | O1 | Start | 01/01/2020 | O1 | 2 | 7 | 3 | 16 | 03/07/2020 | 03/07/2020 | 15/07/2020 | 15/01/2020 | 05/02/2020 | |||
2 | O1 | Plan1 | 05/01/2020 | 03/07/2020 | O1 | 2 | 21 | 18 | 27 | 03/07/2020 | 20/07/2020 | 06/08/2020 | 07/03/2020 | 14/07/2020 | ||
3 | O1 | Plan | 05/01/2020 | 03/07/2020 | O3 | null | 11 | null | 14 | null | null | null | 20/01/2020 | 22/01/2020 | ||
4 | O2 | Start | 05/01/2020 | O3 | 13 | 17 | 14 | 23 | 08/08/2020 | 08/08/2020 | 16/08/2020 | 28/02/2020 | 04/04/2020 | |||
5 | O2 | Plan1 | 06/01/2020 | 04/02/2020 | ||||||||||||
6 | O2 | Plan | 06/01/2020 | 04/02/2020 | ||||||||||||
7 | O1 | OnHold | 15/01/2020 | |||||||||||||
8 | O2 | Finish | 16/01/2020 | |||||||||||||
9 | O3 | Start | 17/01/2020 | |||||||||||||
10 | O4 | Start | 19/01/2020 | |||||||||||||
11 | O3 | OnHold | 20/01/2020 | |||||||||||||
12 | O3 | OffHold | 22/01/2020 | |||||||||||||
13 | O3 | Plan1 | 23/01/2020 | 08/08/2020 | ||||||||||||
14 | O3 | Plan | 23/01/2020 | 08/08/2020 | ||||||||||||
15 | O1 | OffHold | 05/02/2020 | |||||||||||||
16 | O1 | Plan | 05/02/2020 | 15/07/2020 | ||||||||||||
17 | O3 | OnHold | 28/02/2020 | |||||||||||||
18 | O1 | Plan | 01/03/2020 | 20/07/2020 | ||||||||||||
19 | O4 | Plan1 | 05/03/2020 | 05/05/2020 | ||||||||||||
20 | O4 | Plan | 05/03/2020 | 05/05/2020 | ||||||||||||
21 | O1 | OnHold | 07/03/2020 | |||||||||||||
22 | O3 | OffHold | 04/04/2020 | |||||||||||||
23 | O3 | Plan | 04/04/2020 | 16/08/2020 | ||||||||||||
24 | O4 | Finish | 06/05/2020 | |||||||||||||
25 | O3 | Finish | 13/07/2020 | |||||||||||||
26 | O1 | OffHold | 14/07/2020 | |||||||||||||
27 | O1 | Plan | 14/07/2020 | 06/08/2020 | ||||||||||||
28 | O1 | Finish | 06/08/2020 |
@CroWorC ,
could you explain in more detail the logic with which you pass from the main table to the wanted one (keep in mind that, I don't know the meaning and the relations between the various acronyms used)?
As usual, @Smauro 's solutions are elegant and instructive, but it seems that his truly admirable effort of interpretation has produced a slightly different result from what you described as wanted. To hope to have an effective solution also on the size of your case, you should also say how many rows, columns (and possibly sub-groups) it is.
@Anonymous The data I have is a table from an application for following orders.
Every step an order makes in the proces is logged in this table.
The orders are projects build by diffrent contractors and in the building process there are two roughly two steps where something can get stucked.
The first part is between orderintake (Start) and communicating first promissed date (Plan1). The second part is between communicating first promissed date and order ready.
In both parts there can be an onhold situation because of legitime reason or because of unlegitime reason. The onhold time because of legitime reason will be compensated. An Onhold situation can occure more than one time per order. The time lost is in both parts diffrent. Before communicating plandate the total time between Start and communicating first promissed date minus the time OnHold is the total runtime and that is one of the KPI's
The second part is more difficult. An onhold situation frustrates the planning of the contractor so the time between the plandate by Onhold and the plandate directly given after the Offhold moment is the time to compensate. The application asks for a new plandate bij the OffHold action.
KPI: realize project on 1st promised date (including compensation)
KPI: number of replans <=2
Possible events:
wanted:
1 start
2 plan1 given
3 ready
not wanted
1 Start
2 OnHold legitime (date = 03/22/2021)
3 OffHold (date = 01/14/2021)
4 Plan given (this plandate is not relevat but after OffHold there must be a plandate)
5 Plan1 given (plandate 03/19/2021)
6 OnHold (legitime)
7 OffHold
8 Pladate (plandate 03/26/2021)
9 OnHold (not legitime)
10 OffHold
11 Plandate (plandate 04/02/2021)
12 OnHold (Legitime)
13 OffHold
14 Plandate (plandate 04/09/2021)
15 ready (date = 04/07/2021)
In the second example the compensation before giving the first plandate is 1 day,
The compensation after giving the first pandate is 7 days (step 5 and 😎 + 7 days (step 11 and 14) = 14 days.
The runtime of the order is total runtime - 15 days
The compensated plandate is the original plandate + 14 days (and in this case no score on 1th plandate)
In between the Events mentioned are diffrent other events for communication about the order.
In total there are about 150K orders with 3M events
The remoddeling of the table makes it possible for me to determine in witch part the event felt and how to compensate per order.
Thanks @Anonymous , didn't catch the error there with missing HoldDates.
Issue was that they want to use the same value on different lines. For example, a plan's ID could be relevant in more than 1 holds. More, there could be no plan1 after a hold, so the previous should be used etc etc. That would all be solved if we were to look at the whole table everytime as I did in my first solution, but since we now know that there are 3 millions of records, I tried to look at every order's record only once. Of course, some values will not be there (like PdtPlan1)
Anyways, here the table for anyone that wants to play around:
And here's a solution that uses fill down to fix those issues:
Cheers,
Hi Smauro,
Sorry for the late reaction and thanks for the possible solution. The data in my table seems to be too big for this rebuild. I have + 3 milion records in the table and the solution You gave runs out of time. So I tried to select a part of the records but it seems that Powerflow is using the original table still. It keeps on running out of time. Then I tried to get fewer records from the database using SQL statement in stead of chosing a table, but that won't work in Powerflow (same SQL works in PowerBI Dataset) So for now I'm running out of options. I wil come back if my dataproblem is solved.
Hi there, yes, this solution would be very heavy in your dataset.
Best I could hastly do is this:
Instead of buffering and filtering everytime, it sorts the table once and then groups locally. Give it a try and let me know 🙂
(YourTable is your table's query name)
Cheers,
Spyros
Hi @CroWorC ,
Well, this is a possible solution:
Try it by replacing PreviousStep with your previous step's name.
Cheers