Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
I have a data table with clients, and products. A client can own many different products. Each product has a start and an end date.
As long as the client owns at least one product, the client is considered to be a client. If the client ends all their products, they are no longer a client. A person might end all products and then purchase new products 2 or 3 years later - becoming a client again.
I want to count how many new clients we get in each time period.
I can count all the Active clients by using:
Active Client = CALCULATE(DISTINCTCOUNT('Products'[ClientNumber]),
FILTER('Products', ('Products'[StartDate] <= LASTDATE('DIMDate'[Date])
&& 'Products'[EndDate]>= FIRSTDATE('DIMDate'[Date]))))
but this formula only tells me how many active clients I have, not how many NEW clients have joined (or how many clients have terminated).
I tried making a client table with one record per client and counting on start dates:
New Clients= CALCULATE(COUNTROWS('ClientTable'),
FILTER('ClientTable', ('ClientTable'[FirstClientStartDate] <= LASTDATE('DIMDate'[Date])
&& 'ClientTable'[FirstClientStartDate]>= FIRSTDATE('DIMDate'[Date]))))
But this gives me the wrong number because it assumes that a Client can only become a client ONE TIME - and it neglects to show when clients REJOIN
I think what I need is a table that shows me EVERY TIME a Client starts or ends a relationship - but I can't figure out how to build this. Since clients can have many products and the products overlap in time, I am not sure how to identify when all products have been stopped within a time frame.
I have attached a link to a sample PBIX file with some records for testing.
Or here is a table with sample records
ClientNumberStartDateEndDateProduct
1 | 2003-02-02 | 2004-02-02 | A |
2 | 2005-02-02 | 2010-02-02 | B |
3 | 2002-02-02 | 2005-02-02 | C |
4 | 2002-02-02 | 2003-02-02 | B |
5 | 2003-02-02 | 2007-02-02 | B |
6 | 2001-02-02 | 2006-02-02 | A |
7 | 2005-02-02 | 2006-02-02 | C |
8 | 2002-02-02 | 2006-02-02 | B |
1 | 2009-02-02 | 2010-02-02 | C |
2 | 2001-02-02 | 2002-02-02 | A |
4 | 2003-02-02 | 2006-02-02 | C |
5 | 2005-02-02 | 2010-02-02 | C |
6 | 2003-02-02 | 2007-02-02 | C |
8 | 2004-02-02 | 2009-02-02 | A |
2 | 2001-02-02 | 2002-02-02 | C |
3 | 2002-02-02 | 2005-02-02 | B |
5 | 2005-02-02 | 2007-02-02 | A |
7 | 2001-02-02 | 2004-02-02 | B |
3 | 2003-02-02 | 2008-02-02 | A |
7 | 2001-02-02 | 2005-02-02 | A |
For the desired results I would expect the results to show as follows:
Start Year | |||||||||
2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 |
3 | 3 | 2 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
Note: 2005 shows 1 because client#2 started again as a new client in 2005 (after having ended all previous products in 2002)
and 2009 shows 1 because client#1 started again as a new client in 2009 (after having ended all previous products in 2004)
Solved! Go to Solution.
Hi,
I think i have solved the problem. Please download the file from here.
OK...I think I have figured out a solution although it seems pretty complex and I am sure could be simplified.
First I added four new calculated columns to my table:
VeryFirstStart = Calculate(Min('Products'[StartDate]),ALLEXCEPT(Products,Products[ClientNumber]))=Products[StartDate]
this flags records that contain the first ever start date of any client
MaxEndDateofEarlierDates = Calculate(Max('Products'[EndDate]),Filter(Filter(ALLEXCEPT(Products,Products[ClientNumber]),Products[EndDate]),Products[StartDate]<EARLIER(Products[StartDate])))
this shows the date of any NEW product purchases where the new start date occurs AFTER an ending date
Second+Start = And(Products[MaxEndDateofEarlierDates]<>BLANK(), Products[MaxEndDateofEarlierDates]<Products[StartDate])
this flags records where we want to count the new start date as a new client
NewStart = OR(Products[Second+Start],Products[VeryFirstStart])
this flags ANY new client start date regardless of whether it was the first or a subsequent
Finally I added this measure:
!MemberNewStarts = CALCULATE(DISTINCTCOUNT(Products[ClientNumber]),FILTER('Products', ('Products'[StartDate] <= LASTDATE('DIMDate'[Date])
&& 'Products'[StartDate]>= FIRSTDATE('DIMDate'[Date]) && Products[NewStart]=TRUE())))
It works and gives me the results I want - but seems very cumbersome. If anyone has a more efficient way to do this, I would love to know.
Here is a link to the Sample file with the current solution (I also added similar columns to calculate clients leaving).
https://1drv.ms/u/s!AslX_Cn_svS_gSB7rnmfwQJZKUKv
back to the drawing board...the approach above uses too much memory and cannot be calculated in my model which has 200000 records
Hi,
I think i have solved the problem. Please download the file from here.
This is fantastic! So much cleaner and smarter. Thank you. It works very well on my small test file,
I am currently trying to make it work on my big file (200,000 records of clients with products that span decades). I have already tried limiting the data table to just pull dates for the years that I want to chart (2014 to 2018), and it is still "applying the changes" to my query after 20 minutes. I will edit this post to let you know if it works on that much data in the end. Otherwise, maybe I will try to adapt the technique to make it work on a year by year basis rather than date to date.
Sadly the solution doesn't work for me with my big data set because of not enough memory.I tried to adapt the idea to follow the same logic but reduce the data set to years rather than days, however, although size is no longer a problem, I don't get the results I was expecting (perhaps I adapted something wrong)
I made a table with one record for each clientproductyear combination (called ClientProductYear),then adapted the measures as follows and linked the ClientProductYear table to a simple year table called years. Then adapted the measures as follows - without success
Last year of interaction with client prior to the beginning of the active year = CALCULATE(MAX(ClientProductYear[Years]),GENERATESERIES(MINX(ALL('years'),'years'[Year]),MIN('years'[Year])-1))
New Client = if(ISBLANK(COUNTROWS(FILTER(VALUES(ClientProductYear[Client Number]),[Last date of interaction with client prior to the beginning of the active year]<MIN('years'[Year])-1))),0,COUNTROWS(FILTER(VALUES(ClientProductYear[Client Number]),[Last date of interaction with client prior to the beginning of the active year]<MIN('years'[Year])-1)))
I think I have to give up and tell the management team that due to the data size, we cannot count new clients starting if they have already started and ended a client relationship with us in the past (but that despite this we had really wonderful help from the online community). Thanks anyway.
Hi,
I had not used the GENRATESERIES() function in my formula. I used DATESBETWEEN(). Try and use that. Also, in the Else part of the IF() function, you are grouping by OMA. You should group by Client Number.
Replicate my exact solution please.
Hi Ashish,
Thanks. Your solution works - but I can't use it because of my computing power. I have 200,000 records and each record represents a product that is owned for many years (decades) so the Data file becomes many millions of records.
I did replicate the solution exactly and I am sure it would have worked if my data set were smaller. It did work perfectly on a smaller test set that I tried.
What I just posted a few minutes ago with GenerateSeries was my adapted version where I was trying to relate the info by Years rather than days in order to save space (I had already made a smaller dataset which just listed products by year rather than day). Since I was breaking the data down by year long time intervals and matchng it by year, I had switched from Datesbetween to GenerateSeries. The text OMA that you saw was a typo I made in this forum but my actual measure has the correct column name.
Tomorrow I will mark your original solution as the solution - since it answers the question. The problem in my case is that my computer just can't handle the memory demands that the solution requires and I just wanted to play with it a bit longer to see if I can come up with something that will work for my situation.
Thanks again
You are welcone. Disheartened to hear that the solution did not work on 200,000 rows.
You are welcome. Look forward to getting your feedback.
Hi,
With respect to the data that you have shared, please also show the expected result.
Thanks, I have edited the original question to show what I would expect for results.
Start Year | |||||||||
2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 |
3 | 3 | 2 | 0 | 1 | 0 | 0 | 0 | 1 | 0 |
User | Count |
---|---|
140 | |
113 | |
104 | |
77 | |
65 |
User | Count |
---|---|
135 | |
117 | |
101 | |
71 | |
61 |