I browsed the web and found literally 100's of tutorials and answers on how to build the new and returning customers measure. However, I could not find anything that matches my data model.
I'd like to calculate the distinct count of active customers per year. What makes a customer active?
- the order date
- and the licence expiry date
For example, a customer who completes a purchase today will be active in 2019, until the licence expires. So, if the licence expires in 2021, he will be active in 2019, 2020 and 2021.
Another example. A customer who bought a licence in 2017 and the licence expires in 2018, he would be considered active in 2017 and 2018. But not on 2019, unless he buys again.
So far, I could only come up with the following
CustomersPerYear = CALCULATE(DISTINCTCOUNT(Invoices[InCustID]),Invoices)
but that will only count customers who completed a purchase during the year.
These are the tables in my query.
I believe I need to:
- first of all, establish a relationship between the Items table and the Date table
- run a distinct count on InCustID with the following conditions:
- there is a Creation Date or
- the Expiry Date is greater than or equal to the Year
Any suggestion please? Thank you.
one of my current strategies to solve such answers is by expanding the Start and End dates into a new table, most of the time I'm using a DAX statement that looks similiar to this:
Table = GENERATE( DISTINCT('Table1') , var datestart = 'Table1'[DateStart] var dateend = 'Table1'[DateEnd] return GENERATESERIES( datestart,dateend,1 ) )
The underlying assumption: both date columns are of the datatype Date (better) or Datetime.
After creating such a table I'm able to relate my Date table to the column "Value" the columnname used by GENERATESERIES, and of course all the dimension tables.
I tend to hide this table, as I'm just using this table for the counting stuff, for this reason I'm assigning all the measures to the "original" table. The reason why I'm doing this: I'm using the storage engine of the DAX engine for the filtering part and not the slower formula engine. This works well even for huge number of rows. From the description you have given, it may not work as your dateranges can span multiple years. But I still would consider to give it a try :-)
Here are some great articles how to optimize DISTINCTCOUNT:
Hopefully this provides some ideas!
Thank you @TomMartens , appreciated.
The Start and End dates are the CreationDate and ExpiryDate in the items table. Yes, they span across several year, but I'd like to give it a try. I attempted to create that table, however, because some ExpiryDates are blank, the measure fails with this error:
The arguments in GenerateSeries function cannot be blank.
I guess it happens because some ExpiryDates are blank. I can hide them in the query but I may need them in other reports. So, the question is how to generate that table and exclude blank dates, please?
as most of the time blank values are indicating that a certain event did not happen until now, you have to decide, maybe replace missing values with the date of today, or if it fits your business topic better, replace it with the max date of your calendar table (if this table is not "unreasonable" long :-))
As there are many blank values, you also might to consider to move everything to measure, using a virtual table, based on the filtered table and expand this table and count the customers. I also did this still with great to good performance on large datasets. Using a measure may get rid of some of rows with a blank value by the filtering.