Create a duplicate search function for customer DB...

masplin · ‎09-15-2017

actually doing this in powerpivot but the code woudl be the same. We have an issue with our employees creating duplicated cusotmers as not bothering to search for thme properly. We aslo have fragmented data as not every custoemr has the same fields populated. I've writen some code to locate people sharing mobile number sor email addresses which all works fine as the FILTEr is using all positives for the EARLIER comparison.

I've got a bit stuck on the alst step where i want to find cuostmers that sahre address and surname, but not if we also have mobile or email data that says that are different (most commonly husband and wife say)

so for example the Barkers are clearly difffent people as we have mobiles that are all different, whereas for Daley the last 2 are differnet, but the first one has a blank mobile and email so could be a duplicate of one of the others. Ideally my code would only be flagging the 4th line

Customer No	Salutation Code	First Name	Surname	Address	Address 2	Post Code	E-Mail	Mobile Phone No
1390568	MRS	Rachel	Barker	2 Nettleton Close	BD4 0SS		7929088569
2025565	MRS	Anna	Barker	2 Nettleton Close	BD4 0SS		7508610410
1392564	MS	Victoria	Barker	2 Nettleton Close	BD4 0SS	barker.ve@hotmail.com	7923240056
2004798	MR		Daley	28 Speedwell Road	BS5 7SB
2004796	MR	M	Daley	28 Speedwell Road	BS5 7SB	yaris246@blueyonder.co.uk	7776261368
1302226	MR		Daley	28 Speedwell Road	BS5 7SB	wesleydaley.wd@gmail.com	7768382597

The code I have so far is

=CALCULATE(
                        COUNTROWS(Customer),
                        FILTER(
                                       Customer,
                                       Customer[Post Code]<>"DONT USE" &&
                                       Customer[Post Code]=EARLIER(Customer[Post Code])  &&
                                       Customer[Post Code]<>BLANK() &&
                                       Customer[Address]=EARLIER(Customer[Address]) &&
                                       Customer[Sex Code]=EARLIER(Customer[Sex Code]) &&
                                       Customer[Mobile Phone No]<>EARLIER(Customer[Mobile Phone No]) && 
                                      OR( LEFT(Customer[Surname],3)=LEFT(EARLIER(Customer[Surname]),3), RIGHT(Customer[Surname],3)=RIGHT(EARLIER(Customer[Surname]),3)) 
                                                                    )
                          )

but the mobile no bit doesn't do what I need. Also we have lots of blanks. I cannot work out anyway to automatically exclude the Barkers (as the mobiles are all different). If one mobile is blank and other is populated that would need ot be flagged, so they are onyl not counted if both have the data AND it is different.

Hopefully someone more intelligent can point me in the right direction.

Thnaks for any help

Mike

Phil_Seamark · ‎09-15-2017

Hi @masplin,

This calculated column might be a start. It is a way of identifying the barkers as being safe to exclude based on the number of distinct mobiles to the number of rows. This column could then be used in a new filter to help you fine tune.

Customers To Exclude = 
var ColAddress = 'Customer'[Address]
var ColPostCode = 'Customer'[Post Code]
var RowsPerCustomer = 
                CALCULATE(
                            COUNTROWS('Customer'),
                            FILTER(
                                ALL(Customer),
                                'Customer'[Address] = ColAddress &&
                                'Customer'[Post Code] = ColPostCode
                                )
                            )
                            
var DistinctMobilesPerCustomer =  
                CALCULATE(
                            DISTINCTCOUNT('Customer'[Mobile Phone No]),
                            FILTER(
                                ALL(Customer),
                                'Customer'[Address] = ColAddress &&
                                'Customer'[Post Code] = ColPostCode && 
                                'Customer'[Mobile Phone No] <> BLANK()
                                )
                            )
RETURN  IF(RowsPerCustomer= DistinctMobilesPerCustomer,1,0)

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

masplin · ‎09-17-2017

Hi Phil

Is the clause

 'Customer'[Address] = ColAddress

doing somethnig similar to the use of EARLIER i.e grouping them by these fuields with common entries?

So i think if ALL the cusotmers in the rowspercusmter group are differnet it would exclude them e.g. Barker, but if there are either 2 mobiles the same OR one mobile blank it would not?

Seems brilliant!!!!

Phil_Seamark · ‎09-17-2017

HI @masplin

Yeah, the syntax I have used is just the same as using the EARLIER function. I use it because I find it slightly more readable but it's functionally the same.

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

masplin · ‎09-22-2017

Hi Phil. i see how it works in principal but it doesn't work in practice. i have 113,000 customers 32GB of RAM 64 bit and it just can't complete the distinct calc. I just tried this simple version and it never completes. Thisi trying find cusotmers with same mobile, gender and surname, but not where the initial is the non-blank and the same.

Guess no other optins on this?

=                             
              CALCULATE(
                        DISTINCTCOUNT(Customer[Initial]),
                        FILTER(
                                       Customer,
                                       Customer[Mobile Phone No]=EARLIER(Customer[Mobile Phone No])  &&
                                       Customer[Mobile Phone No]<>BLANK() &&
                                       Customer[Sex Code]=EARLIER(Customer[Sex Code]) &&
                                      OR( LEFT(Customer[Surname],3)=LEFT(EARLIER(Customer[Surname]),3), RIGHT(Customer[Surname],3)=RIGHT(EARLIER(Customer[Surname]),3)) &&
                                     Customer[Initial]<>BLANK()
                                       )
                          )

v-jiascu-msft · ‎10-10-2017

Hi @masplin,

Maybe you can try to add more than one Calculated Columns to get this work, which would calculate it step by step rather than one big step. Maybe this could work once a step.

1. Add a Calculated Column "Tag".

Tag =
IF (
    Customer[Mobile Phone No] <> BLANK ()
        && Customer[Initial] <> BLANK (),
    1,
    0
)

2. Add another Calculated Column.

 =
VAR mpn = Customer[Mobile Phone No]
VAR sc = Customer[Sex code]
RETURN
    CALCULATE (
        DISTINCTCOUNT ( Customer[Initial] ),
        FILTER (
            Customer,
            Customer[Mobile Phone No] = mpn
                && Customer[Sex Code] = sc
                && Customer[Tag] = 1
&& OR ( LEFT ( Customer[Surname], 3 ) = LEFT ( [Surname], 3 ), RIGHT ( Customer[Surname], 3 ) = RIGHT ( [Surname], 3 ) )
        )
    )

Best Regards!

Dale

Community Support Team _ Dale
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Create a duplicate search function for customer DB using LAX

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly