Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hi,
I have a huge problem trying to get the client status over time.
Imagine a table like this:
Date | Client_ID | Status |
8/20/2019 | 1 | S |
8/19/2019 | 1 | S |
8/18/2019 | 1 | S |
8/20/2019 | 2 | N |
8/19/2019 | 2 | S |
8/18/2019 | 2 | S |
8/20/2019 | 3 | S |
8/19/2019 | 3 | N |
8/18/2019 | 3 | S |
My desired variable is "Desertion" which is a change in customer status from S to N, however, if i get that one I could get all the others permutations. That being said, I need to create a column that can track the client status over time (keep in mind that the table has around 40MM rows so keep the calculation simple).
The different variations would be:
IF Past_Status = S and Current_Status = S THEN 'Active'
IF Past_Status = S and Current_Status = N THEN 'Desertion'
IF Past_Status = N and Current_Status = N THEN 'Inactive'
IF Past_Status = N and Current_Status = S THEN 'Reactivated'
IF Past_Status = BLANK and Current_Status = S THEN 'New Client'
IF Past_Status = BLANK and Current_Status = N THEN 'Prospect'
IF Past_Status = N and Current_Status = BLANK THEN 'Unknown'
IF Past_Status = S and Current_Status = BLANK THEN 'Unknown'
I have written this problem several times and havent received an answer with a way to properly do it.
Hope you can help me.
Regards,
IC
Solved! Go to Solution.
@Anonymous
Hi, lets try with DAX (in Power Query will be a excellent solution too)
1, Create a Calculated Column for Rank
Ranking = VAR _ClientID = 'Table'[Client_ID] RETURN RANKX ( FILTER ( 'Table'; 'Table'[Client_ID] = _ClientID ), 'Table'[Date], , ASC )
2, Create a Calculated Column for RankPrev
RankingPrev = 'Table'[Ranking]-1
3. Create a Calculated Column for Previous Status
PastStatus = VAR _ClientID='Table'[Client_ID] VAR _RnkPrev='Table'[RankingPrev] RETURN CALCULATE(VALUES('Table'[Status]),FILTER(ALL('Table'),'Table'[Ranking]=_RnkPrev && 'Table'[Client_ID]=_ClientID))
4. Create a Calculated Column for Status to Date
(Review the conditions)
Status_To_Date = SWITCH(TRUE(),'Table'[PastStatus]="S" && 'Table'[Status]="S","Active", 'Table'[PastStatus]="S" && 'Table'[Status]="N","Desertion", 'Table'[PastStatus]="N" && 'Table'[Status]="N","Inactive", 'Table'[PastStatus]="N" && 'Table'[Status]="S","Reactivated", 'Table'[PastStatus]=BLANK() && 'Table'[Status]="S","New Client", 'Table'[PastStatus]=BLANK() && 'Table'[Status]="N","Prospect", 'Table'[PastStatus]="N" && 'Table'[Status]=BLANK(),"Unknown ", "Unknown")
5. Test it with millions of rows
Regards
Victor
Lima - Peru
The issue is that DAX is really bad at figuring out value from "previous row" in an efficent manner. Especially with 40MM rows. Due to the way DAX creates calculated columns, it has to calculate each value from scratch.
There's actually an interesting solution to the problem in Power Query here, where they sort the table, add an index, add a column which is index-1, and join the table to itself based on index and index-1. After getting rid of the rest of the columns, it ends with a current value and previous value in the same row. From there, you can create another conditional column in power query to check if current = N and previous = S and other combinations you want.
For your particular case, I would add in another condition to the join, making sure that the previous client_id is the same as the current, since the index won't reset on client_id change. If you need more help implementing this solution, please ask.
OH YES, I will need help. Actually, someone offered a similar solution with the rank workaround over here:
https://community.powerbi.com/t5/Desktop/Customer-Status-over-time/m-p/760492#M366505
I even already have the rank variable in my table, but I think your solution about bringing the value past value to the actual row would be perfect. Please help me in this matter.
The Rank variable is the following:
It looks like @Vvelarde's DAX solution will work, but I fear that with 40 million+ rows, it will suffer enormous performance problems. It will only be upon loading the data, but that data load could take a long time.
Here's the power query I would try to see if it has better performance:
let Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WckksSVXSUXLOyUzNK4n3dAGyg0sSS0qLlWJ1opUs9I0MgMjQEihsCJKCihpaYhW1wCaKMMEIiP0wTDDCaoIRVhOMsbrBGNlcCzS1sQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"(blank)" = _t, #"(blank).1" = _t, #"(blank).2" = _t]), #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]), #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{{"Date", type date}}), #"Sorted Rows" = Table.Sort(#"Changed Type",{"Client_ID",{"Date", Order.Ascending}}), #"PrevTable" = Table.RenameColumns(#"Sorted Rows",{{"Date","PrevDate"}, {"Client_ID", "PrevClient_ID"}, {"Status", "PrevStatus"}}), #"Added PrevIndex" = Table.AddIndexColumn(#"PrevTable", "PrevIndex", 1, 1), #"Added CurIndex" = Table.AddIndexColumn(#"Sorted Rows", "CurIndex", 0, 1), #"Join Prev" = Table.Join(#"Added CurIndex", "CurIndex", #"Added PrevIndex", "PrevIndex", JoinKind.LeftOuter), #"Removed Columns" = Table.RemoveColumns(#"Join Prev",{"PrevDate", "PrevIndex"}), #"Added Conditional Column" = Table.AddColumn(#"Removed Columns", "StatusOverTime", each if [PrevClient_ID] = [Client_ID] then if [Status] = "S" and [PrevStatus] = "S" then "Active" else if [Status] = "N" and [PrevStatus] = "S" then "Desertion" else if [Status] = "N" and [PrevStatus] = "N" then "Inactive" else if [Status] = "S" and [PrevStatus] = "N" then "Reactivated" else if [Status] = "S" and [PrevStatus] = null then "New Client" else if [Status] = "N" and [PrevStatus] = null then "Prospect" else if [Status] = null and [PrevStatus] = "N" then "Unknown" else if [Status] = null and [PrevStatus] = "S" then "Unknown" else "Error" else null), #"Removed Columns1" = Table.RemoveColumns(#"Added Conditional Column",{"PrevClient_ID", "PrevStatus"}) in #"Removed Columns1"
I've also attached a sample .pbix so you can see how it looks on your sample data.
@Anonymous
Hi, lets try with DAX (in Power Query will be a excellent solution too)
1, Create a Calculated Column for Rank
Ranking = VAR _ClientID = 'Table'[Client_ID] RETURN RANKX ( FILTER ( 'Table'; 'Table'[Client_ID] = _ClientID ), 'Table'[Date], , ASC )
2, Create a Calculated Column for RankPrev
RankingPrev = 'Table'[Ranking]-1
3. Create a Calculated Column for Previous Status
PastStatus = VAR _ClientID='Table'[Client_ID] VAR _RnkPrev='Table'[RankingPrev] RETURN CALCULATE(VALUES('Table'[Status]),FILTER(ALL('Table'),'Table'[Ranking]=_RnkPrev && 'Table'[Client_ID]=_ClientID))
4. Create a Calculated Column for Status to Date
(Review the conditions)
Status_To_Date = SWITCH(TRUE(),'Table'[PastStatus]="S" && 'Table'[Status]="S","Active", 'Table'[PastStatus]="S" && 'Table'[Status]="N","Desertion", 'Table'[PastStatus]="N" && 'Table'[Status]="N","Inactive", 'Table'[PastStatus]="N" && 'Table'[Status]="S","Reactivated", 'Table'[PastStatus]=BLANK() && 'Table'[Status]="S","New Client", 'Table'[PastStatus]=BLANK() && 'Table'[Status]="N","Prospect", 'Table'[PastStatus]="N" && 'Table'[Status]=BLANK(),"Unknown ", "Unknown")
5. Test it with millions of rows
Regards
Victor
Lima - Peru
Victor, usted esta casi a la altura de un dios. Mil gracias. Funciona super bien!!
Sin embargo, ahora tengo otro problema. Estos valores son para un reporte donde quiero saber la deserción semanal. Cada uno de los ranking que obtuviste en la formula utilizada vendría siendo una semana. Es decir que el rango maximo seria la semana mas actualizada.
Sabiendo esto, intenté filtrar por Ranking en Top 1, mas no pude obtener el valor deseado que es "Desercion semanal" en un visual de KPI.
Podrias ayudarme a obtener el numero deseado? Si lograste hacer lo anterior, facilmente logras esto.
Gracias anticipadas,
IC
@Anonymous
Hi, explain the case better please.
Hola, explicame el caso mejor.
Regards
Victor
Lima - Peru
Tomemos un caso como ejemplo:
Aqui podemos ver el estado de este cliente en el tiempo. Lo que quisiera hacer es poder tomar el ultimo estado disponible (Mayor fecha existente) para cada cliente, contabilizar cada uno de los estados en esa fecha y crear una visualizacion estilo KPI para las deserciones.
A fin de cuentas lo que quiero es esto:
Pero de esta manera (solo para deserción):
He escrito demasiado. Disculpame.
Saludos y gracias anticipadas,
Ivan Cespedes
Rep. Dom.
Escribeme al
Facebook vvelardeb para que me expliques la lógica desde el inicio hasta el resultado esperado.
Me parece sencillo pero con la lógica bien entendida.
Ya resolví el problema. Muchas gracias por tu ayuda. Tengo solo otra duda de otro problema pero no seguiré turbándote. Ten un buen día!
La pc no me deja entrar al Facebook. Pero por mensaje privado te puedo mandar mi whatsapp
User | Count |
---|---|
124 | |
108 | |
99 | |
62 | |
62 |
User | Count |
---|---|
137 | |
115 | |
102 | |
71 | |
61 |