Solved: Re: Accounts with both a failed payment AND a succ...

DAXisMyNemesis · ‎07-29-2019

Hi all

I don't really know how to go about searching for the solution for this as I'm not quite sure what the keywords should be...

I'm designing a model based on customer payments and am looking to produce figures for customers who have had a payment fall through ('payment reversed'), but have also made a payment successfully in the same month ('payment not reversed'). Not bothered about the order of events at this stage. I have measures to calculate counts for each separately, but I'm not sure how to go about isolating the inner portion of the Venn diagram!

I have a solution which filters based on measure outputs but the performance is horrible (as I knew it would be). I would have thought I should be

1. pulling a list of account IDs that have a reversal in the month
2. storing that in a VAR
3. applying that list of account IDs as a filter in a CALCULATE(count of successful payments)

Unfortunately my career in DAX is pretty young so I'm getting coder's block when I try to take that many steps in one fell swoop! I'm getting a mental block particularly around populating a VAR with a column of reversed Account IDs rather than pulling a full table, and it seems like a PITA to try and take a particular column out of a table VAR.

Help!

Anonymous · ‎08-06-2019

[Accounts with both Successful and Failed Direct Debits in Month] := 
CALCULATE(
	var __accountsWithSuccessfulPayment =
		calculatetable (
			summarize (
				TRANSACTIONS,
				ACCOUNTS[AccountID],
				DIMDATES[MonthEndDate]
			),
			TRANSACTIONS[HasBeenReversed_Int] = 0
		)
	var __accountsWithFailedPayment =
		calculatetable (
			summarize (
				TRANSACTIONS,
				ACCOUNTS[AccountID],
				DIMDATES[MonthEndDate]
			),
			TRANSACTIONS[HasBeenReversed_Int] = 1
		)
	var __accountsWithBoth =
		intersect (
			__accountsWithSuccessfulPayment,
			__accountsWithFailedPayment
		)
	return
		countrows ( __accountsWithBoth ),
	
	KEEPFILTERS(DIMTRANSACTIONTYPES[PaymentEntryTypeName] = "Payment Received"),
	KEEPFILTERS(DIMTRANSACTIONTYPES[PaymentMethodGroup]= "Direct Debit"),
	KEEPFILTERS(DIMTRANSACTIONCodes[GlobalAccountType] = "Mortgage"),
	
	USERELATIONSHIP(DIMDATES[DateKey], TRANSACTIONS[SKeyDatePeriodEnd])
)

Here's you measure withouth GROUPBY.

Best

Darek

View solution in original post

DAXisMyNemesis · ‎08-05-2019

Hi

I've written replies to this twice and both have never actually been posted on here. My account has a post count of 2 but is only showing 1 that I can view. I even have a "First Reply" badge, despite no reply ever going public!

Really, REALLY annoying as I put quite a lot of effort into the 2nd attempt. So I'm just going to post this, and then edit it if it actually makes it into the thread...

Thanks

Edit - HA! I'm in.

Firstly, thanks for replying and apologies for being vague, not supplying code / a file. It was half 6 on a monday evening and I just wanted shot of work for the day after toiling with this for a good few hours, so I hoped my explanation would suffice for the timebeing. I cannot supply a file as this pulls from a tabular cube and the sheer quantity of VERY sensitive customer data involved makes anonymisation more of an undertaking than building the report in the first place. I could have thrown together a very basic equivalent Power BI sample though.

Secondly - @Cmcmahan has understood what I'm asking correctly. I don't want to know who has made more payments than failures, I want to know who has made both a payment AND a failed payment in the month, so we can isolate cases where they should probably be changing their payment date but are not being proactive about it.

This works -

DDs Success And Fail in Month:=
CALCULATE(
	COUNTX(
		ADDCOLUMNS( 
			VALUES(TRANSACTIONS[AccountID])
			, "BothSuccessAndFail", CALCULATE(IF(
						    1 IN VALUES(TRANSACTIONS[HasBeenReversed_Int]) 
						&&  0 IN VALUES (TRANSACTIONS[HasBeenReversed_Int]) 				
							, 1								
							)
						)
			)
		, [BothSuccessAndFail]
	)
, USERELATIONSHIP(DIMDATES[DateKey], TRANSACTIONS[KeyDatePeriodEnd])
, KEEPFILTERS(DIMPAYMENTTYPES[PaymentEntryTypeName] = "Payment Received")
, KEEPFILTERS(DIMPAYMENTTYPES[PaymentMethod]= "Direct Debit")
, KEEPFILTERS(DIMPAYMENTCODES[AccountType] = "Mortgage")
, DATESMTD(DIMDATES[Date])

I could get the ADDCOLUMNS to work but not the SUMMARIZECOLUMNS, which throws an error "SummarizeColumns() and AddMissingItems() may not be used in this context." I tried building the calculate within the SummarizeColumns as I thought external filters might causing be the problem, but as soon as I put the measure in a pivot against month, it throws that error. I suppose that's an external filter too...

DDs Success And Fail in Month 2:=
	COUNTX(
		SUMMARIZECOLUMNS( 
			VALUES(TRANSACTIONS[AccountID]), VALUES('TRANSACTIONS'[AccountID])
			, CALCULATETABLE(
				VALUES('TRANSACTIONS'[AccountID])
				, USERELATIONSHIP(DIMDATES[DateKey], TRANSACTIONS[KeyDatePeriodEnd])
				, KEEPFILTERS(DIMPAYMENTTYPES[PaymentEntryTypeName] = "Payment Received")
				, KEEPFILTERS(DIMPAYMENTTYPES[PaymentMethod]= "Direct Debit")
				, KEEPFILTERS(DIMPAYMENTCODES[AccountType] = "Mortgage")
			  )
			, "BothSuccessAndFail", IF(
						    1 IN VALUES(TRANSACTIONS[HasBeenReversed_Int]) 
						&&  0 IN VALUES (TRANSACTIONS[HasBeenReversed_Int]) 
																
							, 1								
						)
			)
		, [BothSuccessAndFail]
	)
)

The performance of this is definitely improved from what I had - it's down to 5 seconds rather than 20 - but I wonder if it can be improved more? There are around 15 million rows involved, so even with filtering as I do with the CALCULATE, I don't know if this will be a natural side effect of using COUNTX.
One of the things not being factored in to the measure at the moment is to check for other payments within the SAME month. And then, to be able to add the total per month together to get a 12 rolling month total! By contrast what this does now if you take a total over a year is perform a count of all accounts which have had a payment and a failure at ANY time within the year (e.g. success in Feb, fail in July).

I'm guessing this would be where I need a SummarizeColumns to be able to group on more than just the account?

Sorry again, I know this is a wall of text. In short - why is the summarizecolumns not working, and how do I build monthly constraints within the measure so I that a yearly figure = month 1 + month 2 + month 3 etc?

Thanks, this has been a huge help and my late reply is not intentional!!

Anonymous · ‎08-05-2019

By the way... Optimizing DAX means "pushing as much work as possible to the Storage Engine which is multi-threaded as opposed to the Formula Engine which is always single-threaded." I can see, for instance, you've got an IF in there. IF will be calculated by the Formula Engine and this means the query will not be as fast it it could be. But fast queries are not only a matter of writing DAX but above all a matter of having the correct model. Without a good model, no amount of tuning will yield good performance. Easy as that.

Best
Darek

DAXisMyNemesis · ‎08-05-2019

Hi Darek

Thanks for your response. In which case can you advise how you would go about removing the IF logic from this measure? I would have thought the best approach would be to create a list of AccountIDs that have failed in the month and then use that to filter a list of AccountIDs that have had a payment in the month. No IF statements, no heavy formula work, just taking the inner part of the Venn diagram.

I'm not really sure how the model would be negatively affecting this situation, the model itself is fairly robustly designed albeit more for a SQL & SSRS environment than the fairly straightforward tabular cube I have built. If we can build a solution that is storage engine focussed then maybe we can better judge whether there is indeed a shortcoming in the model design.

Thanks and looking forward to hearing more from you!

Anonymous · ‎08-05-2019

[# Accounts (Pymnt & Rvrsl)] =
CALCULATE(
	var __accountsWithPymnt =
		CALCULATE(
			VALUES( Transactions[AccountID] ),
			KEEPFILTERS( TRANSACTIONS[HasBeenReversed_Int] = 0 )
		)
	var __accountsWithPymntAndRvrsl =
		CALCULATE(
			VALUES( Transactions[AccountID] )
			__accountsWithPymnt,
			KEEPFILTERS( TRANSACTIONS[HasBeenReversed_Int] = 1 )
		)
	var __countOfAcconts = 
		COUNTROWS( __accountsWithPymntAndRvrsl )
	return
		__countOfAccounts,
		
    KEEPFILTERS ( DIMPAYMENTTYPES[PaymentEntryTypeName] = "Payment Received" ),
    KEEPFILTERS ( DIMPAYMENTTYPES[PaymentMethod] = "Direct Debit" ),
    KEEPFILTERS ( DIMPAYMENTCODES[AccountType] = "Mortgage" ),
    
    USERELATIONSHIP ( DIMDATES[DateKey], TRANSACTIONS[KeyDatePeriodEnd] )
)

Try this one... You might need to re-consider the use of KEEPFILTERS depending on what you want to achieve. From this measure, it's then easy to create derivatives like, for instance, month-to-date:

[# Accounts (Pymnt & Rversl) MTD] =
	CALCULATE(
		[# Accounts (Pymnt & Rvrsl)],
		DATESMTD( DIMDATES[DateKey] )
	)

Best

Darek

DAXisMyNemesis · ‎08-06-2019

Thanks Darek.

I'd managed to put together something along the same lines, albeit messier, and the performance is far better than the IF statement equivalent. I'll give your version a go too, though I think it'll hit the same issues where it doesn't isolate payments & fails to the same month.

Would GROUPBY hit the same issues as with SUMMARIZE? I guess I could use SELECTCOLUMNS instead in the variable, I just naturally tend towards getting distinct values but there's no benefit.

Accounts with both Successful and Failed Direct Debits in Month:=
VAR FailAccounts = 
CALCULATETABLE(
	GROUPBY(TRANSACTIONS, ACCOUNTS[AccountID], DIMDATES[MonthEndDate]),
	 TRANSACTIONS[HasBeenReversed_Int]=0,
	KEEPFILTERS(DIMTRANSACTIONTYPES[PaymentEntryTypeName] = "Payment Received"),
	KEEPFILTERS(DIMTRANSACTIONTYPES[PaymentMethodGroup]= "Direct Debit"),
	KEEPFILTERS(DIMTRANSACTIONCodes[GlobalAccountType] = "Mortgage")	,
	USERELATIONSHIP(DIMDATES[DateKey], TRANSACTIONS[SKeyDatePeriodEnd])
	/*,DATESMTD(DIMDATES[DateValue])*/
	)
RETURN

CALCULATE(COUNTROWS(GROUPBY(TRANSACTIONS, ACCOUNTS[AccountID], DIMDATES[MonthEndDate]))
	, USERELATIONSHIP(DIMDATES[DateKey], TRANSACTIONS[SKeyDatePeriodEnd])
	,  TRANSACTIONS[HasBeenReversed_Int]=1
	, KEEPFILTERS(DIMTRANSACTIONTYPES[PaymentEntryTypeName] = "Payment Received")
	, KEEPFILTERS(DIMTRANSACTIONTYPES[PaymentMethodGroup]= "Direct Debit")
	, KEEPFILTERS(DIMTRANSACTIONCodes[GlobalAccountType] = "Mortgage")
	/*, DATESMTD(DIMDATES[DateValue])*/
	, FailAccounts
)

Anonymous · ‎08-06-2019

[Accounts with both Successful and Failed Direct Debits in Month] := 
CALCULATE(
	var __accountsWithSuccessfulPayment =
		calculatetable (
			summarize (
				TRANSACTIONS,
				ACCOUNTS[AccountID],
				DIMDATES[MonthEndDate]
			),
			TRANSACTIONS[HasBeenReversed_Int] = 0
		)
	var __accountsWithFailedPayment =
		calculatetable (
			summarize (
				TRANSACTIONS,
				ACCOUNTS[AccountID],
				DIMDATES[MonthEndDate]
			),
			TRANSACTIONS[HasBeenReversed_Int] = 1
		)
	var __accountsWithBoth =
		intersect (
			__accountsWithSuccessfulPayment,
			__accountsWithFailedPayment
		)
	return
		countrows ( __accountsWithBoth ),
	
	KEEPFILTERS(DIMTRANSACTIONTYPES[PaymentEntryTypeName] = "Payment Received"),
	KEEPFILTERS(DIMTRANSACTIONTYPES[PaymentMethodGroup]= "Direct Debit"),
	KEEPFILTERS(DIMTRANSACTIONCodes[GlobalAccountType] = "Mortgage"),
	
	USERELATIONSHIP(DIMDATES[DateKey], TRANSACTIONS[SKeyDatePeriodEnd])
)

Here's you measure withouth GROUPBY.

Best

Darek

DAXisMyNemesis · ‎08-07-2019

Thanks Darek. I'll go with this as the solution.

To be clear about what I wanted, I want to calculate figures which are accurate at month granularity, but which roll up so that a 12 month figure would equate to Month 1 + Month 2 + Month 3 etc.

Hence the need for the month end date in here (we do have a monthkey too, which is a six digit integer, so I've swapped to that though I don't think it makes much difference).

This has been a huge help, I've learnt a lot of really useful concepts and functions from this discussion. Especially about using VAR inside a calculate to maintain any filters throughout, and the use of INTERSECT too!

Thanks @Anonymous and @Cmcmahan

Anonymous · ‎08-06-2019

First of all, you cannot use SUMMARIZECOLUMS in measures.

Second of all, there is no need to use GROUPBY, which should only be used if you have a need to transform the rows in a group separately. You don't have this need, so this should not be used.

Thirdly, what does it mean "with both Successful and Failed Direct Debits in Month"? What does it calculate if you select one full year, say 2018, from DIMDATES? What does it calculate then?

I understand MonthEndDate is - for each day in the DIMDATES table - the last day of the month to which the date belongs. But then, would it not be easier and clearer use something like a year-month indicator ("2018-01")?

Second questions is this. What does it calculate when the selected granularity of time is greater than month? Say, a week? Or a day? Or a fortnight? Does it calculate what you need?

Best
Darek

Anonymous · ‎08-05-2019

SUMMARIZECOLUMNS should not be used in measures. It's for writing queries only. It's a long story why (you can go to www.sqlbi.com and read upon this topic). Secondly, instead of SUMMARIZE you could and should use ADDCOLUMNS/SUMMARIZE.

Best
Darek

Anonymous · ‎07-29-2019

You should have posted some data... and pics...

This is the best I can do without almost any information:

[# Customers With Pmnt & Rvrsl] =
var __accountsWithReversal =
	CALCULATETABLE (
		SUMMARIZE (
			Transactions,
			Accounts[AccountID]
		),
		Transactions[Type] = "reversal"
	)
-- To figure out if a real payment
-- was made during the period it's
-- not enough to filter by Type = "payment".
-- One has to make sure that the payment
-- did not have a reversal in the period.
-- So, we have to pull out the accounts
-- where the number of "payments" is greater
-- than the number of "reversals". This, of course,
-- is a simplification. You have to figure out
-- what it means for an account "to have a real
-- payment within a selected period of time."
var __accountsWithPaymentAndReversal =
	FILTER (
		ADDCOLUMNS(
			__accountsWithReversal,
			"Diff",
				CALCULATE(
					COUNTROWS ( Transactions ),
					Transactions[Type] = "payment"
				) - 
				CALCULATE (
					COUNTROWS ( Transactions ),
					Transactions[Type] = "reversal"
				)
		),
		[Diff] > 0
	)
var __accountCount = 
	COUNTROWS ( __accountsWithPaymentAndReversal )
RETURN
	__accountCount

Best

Dare

Cmcmahan · ‎07-29-2019

So to start with, can you share some sample of your data? We can make guesses about how it would look, but if you share what you actually have, that would be very useful, and get you a more specific answer faster. I'm guessing your table has columns similar to: {PaymentID, CustomerID, Date, Amount, Status}, but status might be a true/false flag for reversed payments, or have multiple text categories. If you could share that info, we can give you a better answer.

Also, when you say that you want to "produce figures" for these customers, what do you mean? Do you just want a count of customers that meet both these criteria in a given month? Do you just want a list of CustomerIDs? Do you want to do more advanced calculations based on other data associated with those payments?

Here's a measure that would give a count of IDs that have both a 1 and a 0 value in the Status column for the currently sliced time period:

CountFailAndSuccess = 
COUNTX(
    SUMMARIZE(
        Payments, Payments[CustomerID], 
        "BothSuccessAndFail", IF(1 IN VALUES(Payments[Status]) && 0 IN VALUES(Payments[Status]), 1)
    ),
    [BothSuccessAndFail]
)

Anonymous · ‎07-29-2019

Please do not use SUMMARIZE to calculate values. SUMMARIZE is a VERY COMPLEX function (with bugs into the bargain) that should only be used for grouping. ONLY. If you want to suffer and scratch your head later down the line for hours on end, then you can ignore this advice.

Please read this to know WHY YOU SHOULD NEVER DO WHAT YOU'VE SUGGESTED: https://www.sqlbi.com/articles/all-the-secrets-of-summarize/

Best

Darek

Cmcmahan · ‎07-29-2019

While I appreciate the chance to learn, there's no need to be so hostile about it. It would also help if you could re-write the DAX to not use SUMMARIZE in a way that angers you so much.

After reading the article linked, here's a happier version of the DAX that doesn't include SUMMARIZE:

CountFailAndSuccess = 
COUNTX(
    ADDCOLUMNS( VALUES(Payments[CustomerID]), "BothSuccessAndFail", CALCULATE(IF(1 IN VALUES(Payments[Status]) && 0 IN VALUES(Payments[Status]), 1))),
    [BothSuccessAndFail]
)

It looks like the issues with unpredictable results you would run into while using SUMMARIZE only really occured when you also used a CALCULATE expression within the calculated values, which my previous answer didn't.

In addition to all of this, in the article you linked, the VERY FIRST SENTENCE is:

The content of this article is obsolete as of January 2018. Recent versions of Excel 2016, Power BI, and Analysis Services have a SUMMARIZE behavior that is different from the one described in this article.

Since we're in a post-January 2018 world, the behavior of SUMMARIZE has changed and the entire article is a neat history lesson. However, we can still follow their current advice and use SUMMARIZECOLUMNS with newly calculated results just like we used SUMMARIZE before:

CountFailAndSuccess = 
COUNTX(
    SUMMARIZECOLUMNS(Payments[CustomerID], "BothSuccessAndFail", IF(1 IN VALUES(Payments[Status]) && 0 IN VALUES(Payments[Status]), 1)),
    [BothSuccessAndFail]
)

DAXisMyNemesis · ‎08-02-2019

Hi - sorry for the late reply!

Thanks very much for your help everyone, appreciate it a lot.

I know my original request was a bit vague, I'm working with a tabular cube (SSAS 2017) that contains a LOT of sensitive data and so knocking together a fake data / anonymized equivalent would be a huge piece of work.

@Cmcmahan's approach is what I'm looking for, rather than counts of payments minus counts of cancellations. It's a count of accounts which have both payments and cancellations in the same month, regardless of volumes of each.

I've posted the code I'm using at the moment, which is getting towards what I want...

I've got the ADDCOLUMNS approach to work, but performance is still a bit ropey (4-5 seconds, which is better than 20!). I wonder if I can improve it by filtering within the ADDCOLUMNS, or by getting the "BothSuccessAndFail" to limit to the same month?

The Time Intelligence I'm after is a bit tricky. Ideally I want the figure to accumulate over more than a month, but for the IF Statement to only look at other transactions within the same month? (So an account can have a total of 3 in a quarter, but would have to have a payment AND a fail in each month to do so).

Here's the code I'm using at the moment - I know the DATESMTD needs to go, which is why I think I'd need to use SUMMARIZECOLUMNS...

DDs Success And Fail in Month:=
CALCULATE(
	COUNTX(
		ADDCOLUMNS( 
			VALUES(TRANSACTIONS[AccountID])
			, "BothSuccessAndFail", CALCULATE(IF(
							1 IN VALUES(TRANSACTIONS[HasBeenReversed_Int]) 
						         &&  0 IN VALUES (TRANSACTIONS[HasBeenReversed_Int]) 
																
							, 1								
							)
						)
			)
		, [BothSuccessAndFail]
	)
, USERELATIONSHIP(DIMDATES[DateKey], TRANSACTIONS[KeyDatePeriodEnd])
, KEEPFILTERS(DIMPAYMENTTYPES[PaymentEntryTypeName] = "Payment Received")
, KEEPFILTERS(DIMPAYMENTTYPES[PaymentMethod]= "Direct Debit")
, KEEPFILTERS(DIMPAYMENTCODES[AccountType] = "Mortgage")
, DATESMTD(DIMDATES[DateValue])
)

But, if I use SUMMARIZECOLUMNS like below, I get an error message SummarizeColumns() and AddMissingItems() may not be used in this context. I thought putting the CALCULATETABLE within the SummarizeColumns would fix this issue but it doesn't seem to. What am I doing wrong??

(Edit: it only causes an error when I try to show values over time, rather than as a standalone unfiltered figure. The number it calculates is almost 3 times as high as it should be, so adding the date within the SummarizeColumns is obviously not having the desired effect)

DDs Success And Fail in Month 2:=
	COUNTX(
		SUMMARIZECOLUMNS( 
			'TRANSACTIONS'[AccountID], 'TRANSACTIONS'[KeyDatePeriodEnd]
			, CALCULATETABLE(
				VALUES('TRANSACTIONS'[AccountID])
				, USERELATIONSHIP(DIMDATES[DateKey], TRANSACTIONS[KeyDatePeriodEnd])
				, KEEPFILTERS(DIMPAYMENTTYPES[PaymentEntryTypeName] = "Payment Received")
				, KEEPFILTERS(DIMPAYMENTTYPES[PaymentMethod]= "Direct Debit")
				, KEEPFILTERS(DIMPAYMENTCODES[AccountType] = "Mortgage")
			  )
			, "BothSuccessAndFail", IF(
						    1 IN VALUES(TRANSACTIONS[HasBeenReversed_Int]) 
						&&  0 IN VALUES (TRANSACTIONS[HasBeenReversed_Int]) 
																
							, 1								
						)
			)
		, [BothSuccessAndFail]
	)
)

Thanks again, this has been a huge help already!!

Anonymous · ‎07-29-2019

"While I appreciate the chance to learn, there's no need to be so hostile about it." Hostile??? I don't have time for arguments, mate. Once again, please read the article in its entirety and also the discussions below (some are as young as 2 months) because you have clearly not understood it.

UPDATE 2018-01-24 : The content of this article is obsolete as of January 2018. Recent versions of Excel 2016, Power BI, and Analysis Services have a SUMMARIZE behavior that is different from the one described in this article. As noted below, using SUMMARIZE should be deprecated for aggregations and you should use SUMMARIZECOLUMNS instead. Read more in Introducing SUMMARIZECOLUMNS.

Please, when you do something, do it correctly. ALWAYS. Why is this important? Because if others follow your advice and the advice is wrong, they'll have a hard time understanding what's going on. Please save them grief and frustration.

Thanks.

Best

Darek

Anonymous · ‎07-29-2019

This does not sound overly complex 🙂 Nice exercise, in fact 🙂

Please share the file via OneDrive or Google Drive (set perms accordingly so I can access). If the file is too big, please supply a sample that is good enough to design a solution and illustrate any issues. If any data is sensitive, obfuscate.

Your calculations can be FAST but you have to build the right model. A good model is CRUCIAL. If it's **bleep**ty, your calcs will be **bleep**ty as well 🙂 Let me have a look at it.

Thanks and waiting for your file.

Best

Darek

Accounts with both a failed payment AND a succeeded payment in a month?

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024