Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
kmclorg
Helper I
Helper I

Counting Starts (how to ignore start dates when a client relationship already exists)

I have a data table with clients, and products. A client can own many different products. Each product has a start and an end date. 

 

As long as the client owns at least one product, the client is considered to be a client. If the client ends all their products, they are no longer a client.  A person might end all products and then purchase new products 2 or 3 years later - becoming a client again.

 

I want to count how many new clients we get in each time period.

 

I can count all the Active clients by using:

Active Client = CALCULATE(DISTINCTCOUNT('Products'[ClientNumber]),
FILTER('Products', ('Products'[StartDate] <= LASTDATE('DIMDate'[Date])
&& 'Products'[EndDate]>= FIRSTDATE('DIMDate'[Date]))))

 

but this formula only tells me how many active clients I have, not how many NEW clients have joined (or how many clients have terminated).

 

I tried making a client table with one record per client and counting on start dates:

New Clients= CALCULATE(COUNTROWS('ClientTable'),
FILTER('ClientTable', ('ClientTable'[FirstClientStartDate] <= LASTDATE('DIMDate'[Date])
&& 'ClientTable'[FirstClientStartDate]>= FIRSTDATE('DIMDate'[Date]))))

 

But this gives me the wrong number because it assumes that a Client can only become a client ONE TIME - and it neglects to show when clients REJOIN

 

I think what I need is a table that shows me EVERY TIME a Client starts or ends a relationship - but I can't figure out how to build this. Since clients can have many products and the products overlap in time, I am not sure how to identify when all products have been stopped within a time frame.

 

I have attached a link to a sample PBIX file with some records for testing.

sample pbix file

 

Or here is a table with sample records

ClientNumberStartDateEndDateProduct

12003-02-022004-02-02A
22005-02-022010-02-02B
32002-02-022005-02-02C
42002-02-022003-02-02B
52003-02-022007-02-02B
62001-02-022006-02-02A
72005-02-022006-02-02C
82002-02-022006-02-02B
12009-02-022010-02-02C
22001-02-022002-02-02A
42003-02-022006-02-02C
52005-02-022010-02-02C
62003-02-022007-02-02C
82004-02-022009-02-02A
22001-02-022002-02-02C
32002-02-022005-02-02B
52005-02-022007-02-02A
72001-02-022004-02-02B
32003-02-022008-02-02A
72001-02-022005-02-02A

 

For the desired results I would expect the results to show as follows:

Start Year         
2001200220032004200520062007200820092010
3320100010

 

Note: 2005 shows 1 because client#2 started again as a new client in 2005 (after having ended all previous products in 2002)

and 2009 shows 1 because client#1 started again as a new client in 2009 (after having ended all previous products in 2004)

1 ACCEPTED SOLUTION

Hi,

 

I think i have solved the problem.  Please download the file from here.

 

Untitled.png


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

View solution in original post

11 REPLIES 11
kmclorg
Helper I
Helper I

OK...I think I have figured out a solution although it seems pretty complex and I am sure could be simplified.

 

First I added four new calculated columns to my table:

VeryFirstStart = Calculate(Min('Products'[StartDate]),ALLEXCEPT(Products,Products[ClientNumber]))=Products[StartDate]

this flags records that contain the first ever start date of any client

 

MaxEndDateofEarlierDates = Calculate(Max('Products'[EndDate]),Filter(Filter(ALLEXCEPT(Products,Products[ClientNumber]),Products[EndDate]),Products[StartDate]<EARLIER(Products[StartDate])))

this shows the date of any NEW product purchases where the new start date occurs AFTER an ending date

 

Second+Start = And(Products[MaxEndDateofEarlierDates]<>BLANK(), Products[MaxEndDateofEarlierDates]<Products[StartDate])

this flags records where we want to count the new start date as a new client

 

NewStart = OR(Products[Second+Start],Products[VeryFirstStart])

this flags ANY new client start date regardless of whether it was the first or a subsequent

 

Finally I added this measure:

!MemberNewStarts = CALCULATE(DISTINCTCOUNT(Products[ClientNumber]),FILTER('Products', ('Products'[StartDate] <= LASTDATE('DIMDate'[Date])
&& 'Products'[StartDate]>= FIRSTDATE('DIMDate'[Date]) && Products[NewStart]=TRUE())))

 

It works and gives me the results I want - but seems very cumbersome. If anyone has a more efficient way to do this, I would love to know.

 

Here is a link to the Sample file with the current solution (I also added similar columns to calculate clients leaving).

 

https://1drv.ms/u/s!AslX_Cn_svS_gSB7rnmfwQJZKUKv

 

back to the drawing board...the approach above uses too much memory and cannot be calculated in my model which has 200000 records

 

Hi,

 

I think i have solved the problem.  Please download the file from here.

 

Untitled.png


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

This is fantastic! So much cleaner and smarter. Thank you. It works very well on my small test file,

 

I am currently trying to make it work on my big file (200,000 records of clients with products that span decades). I have already tried limiting the data table to just pull dates for the years that I want to chart (2014 to 2018), and it is still "applying the changes" to my query after 20 minutes. I will edit this post to let you know if it works on that much data in the end. Otherwise, maybe I will try to adapt the technique to make it work on a year by year basis rather than date to date.

Sadly the solution doesn't work for me with my big data set because of not enough memory.I tried to adapt the idea to follow the same logic but reduce the data set to  years rather than days, however, although size is no longer a problem, I don't get the results I was expecting (perhaps I  adapted something wrong)

 

I made a table with one record for each clientproductyear  combination (called ClientProductYear),then adapted the measures as follows and linked the ClientProductYear table to a simple year table called years. Then adapted the measures as follows - without success

 

Last year of interaction with client prior to the beginning of the active year = CALCULATE(MAX(ClientProductYear[Years]),GENERATESERIES(MINX(ALL('years'),'years'[Year]),MIN('years'[Year])-1))

 

New Client = if(ISBLANK(COUNTROWS(FILTER(VALUES(ClientProductYear[Client Number]),[Last date of interaction with client prior to the beginning of the active year]<MIN('years'[Year])-1))),0,COUNTROWS(FILTER(VALUES(ClientProductYear[Client Number]),[Last date of interaction with client prior to the beginning of the active year]<MIN('years'[Year])-1)))

 

I think I have to give up and tell the management team that due to the data size, we cannot count new clients starting if they have already started and ended a client relationship with us in the past (but that despite this we had really wonderful help from the online community). Thanks anyway.

Hi,

 

I had not used the GENRATESERIES() function in my formula.  I used DATESBETWEEN().  Try and use that.  Also, in the Else part of the IF() function, you are grouping by OMA.  You should group by Client Number.

 

Replicate my exact solution please.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Hi Ashish,

Thanks. Your solution works - but I can't use it because of my computing power. I have 200,000 records and each record represents a product that is owned for many years (decades) so the Data file becomes many millions of records.

 

I did replicate the solution exactly and I am sure it would have worked if my data set were smaller. It did work perfectly on a smaller test set that I tried.

 

What I just posted a few minutes ago with GenerateSeries was my adapted version where I was trying to relate the info by Years rather than days in order to save space (I had already made a smaller dataset which just listed products by year rather than day). Since I was breaking the data down by year long time intervals and matchng it by year, I had switched from Datesbetween to GenerateSeries. The text OMA that you saw was a typo I made in this forum but my actual measure has the correct column name. 

 

Tomorrow I will mark your original solution as the solution - since it answers the question. The problem in my case is that my computer just can't handle the memory demands that the solution requires and I just wanted to play with it a bit longer to see if I can come up with something that will work for my situation.

 

Thanks again

 

You are welcone.  Disheartened to hear that the solution did not work on 200,000 rows.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

You are welcome.  Look forward to getting your feedback.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/
Ashish_Mathur
Super User
Super User

Hi,

 

With respect to the data that you have shared, please also show the expected result.


Regards,
Ashish Mathur
http://www.ashishmathur.com
https://www.linkedin.com/in/excelenthusiasts/

Thanks, I have edited the original question to show what I would expect for results.

 

Start Year         
2001200220032004200520062007200820092010
3320100010

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.