Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
masplin
Impactful Individual
Impactful Individual

Create a duplicate search function for customer DB using LAX

 actually doing this in powerpivot but the code woudl be the same.  We have an issue with our employees creating duplicated cusotmers as not bothering to search for thme properly.  We aslo have fragmented data as not every custoemr has the same fields populated.  I've writen some code to locate people sharing mobile number sor email addresses which all works fine as the FILTEr is using all positives for the EARLIER comparison.

 

I've got a bit stuck on the alst step where i want to find cuostmers that sahre address and surname, but not if we also have mobile or email data that says that are different (most commonly husband and wife say) 

 

so for example the Barkers are clearly difffent people as we have mobiles that are all different, whereas  for Daley the last 2 are differnet, but the first one has a blank mobile and email so could be a duplicate of one of the others.  Ideally my code would only be flagging the 4th line 

 

Customer NoSalutation CodeFirst NameSurnameAddressAddress 2Post CodeE-MailMobile Phone No
1390568MRSRachelBarker2 Nettleton CloseBD4 0SS 7929088569
2025565MRSAnnaBarker2 Nettleton CloseBD4 0SS 7508610410
1392564MSVictoriaBarker2 Nettleton CloseBD4 0SSbarker.ve@hotmail.com7923240056
2004798MR Daley28 Speedwell RoadBS5 7SB  
2004796MRMDaley28 Speedwell RoadBS5 7SByaris246@blueyonder.co.uk7776261368
1302226MR Daley28 Speedwell RoadBS5 7SBwesleydaley.wd@gmail.com7768382597

 

The code I have so far is

 

=CALCULATE(
                        COUNTROWS(Customer),
                        FILTER(
                                       Customer,
                                       Customer[Post Code]<>"DONT USE" &&
                                       Customer[Post Code]=EARLIER(Customer[Post Code])  &&
                                       Customer[Post Code]<>BLANK() &&
                                       Customer[Address]=EARLIER(Customer[Address]) &&
                                       Customer[Sex Code]=EARLIER(Customer[Sex Code]) &&
                                       Customer[Mobile Phone No]<>EARLIER(Customer[Mobile Phone No]) && 
                                      OR( LEFT(Customer[Surname],3)=LEFT(EARLIER(Customer[Surname]),3), RIGHT(Customer[Surname],3)=RIGHT(EARLIER(Customer[Surname]),3)) 
                                                                    )
                          )             

 but the mobile no bit doesn't do what I need. Also we have lots of blanks. I cannot work out anyway to automatically exclude the Barkers (as the mobiles are all different).  If one mobile is blank and other is populated that would need ot be flagged, so they are onyl not counted if both have the data AND it is different.

 

Hopefully someone more intelligent can point me in the right direction.

 

Thnaks for any help

 

Mike

5 REPLIES 5
Phil_Seamark
Employee
Employee

Hi @masplin,

 

This calculated column might be a start.  It is a way of identifying the barkers as being safe to exclude based on the number of distinct mobiles to the number of rows.  This column could then be used in a new filter to help you fine tune.

 

Customers To Exclude = 
var ColAddress = 'Customer'[Address]
var ColPostCode = 'Customer'[Post Code]
var RowsPerCustomer = 
                CALCULATE(
                            COUNTROWS('Customer'),
                            FILTER(
                                ALL(Customer),
                                'Customer'[Address] = ColAddress &&
                                'Customer'[Post Code] = ColPostCode
                                )
                            )
                            
var DistinctMobilesPerCustomer =  
                CALCULATE(
                            DISTINCTCOUNT('Customer'[Mobile Phone No]),
                            FILTER(
                                ALL(Customer),
                                'Customer'[Address] = ColAddress &&
                                'Customer'[Post Code] = ColPostCode && 
                                'Customer'[Mobile Phone No] <> BLANK()
                                )
                            )
RETURN  IF(RowsPerCustomer= DistinctMobilesPerCustomer,1,0)    

To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Hi Phil

 

Is the clause 

 'Customer'[Address] = ColAddress

doing somethnig similar to the use of EARLIER i.e grouping them by these fuields with common entries?

 

 So i think if ALL the cusotmers in the rowspercusmter group are differnet it would exclude them e.g. Barker, but if there are either 2 mobiles the same OR one mobile blank it would not?

 

Seems brilliant!!!!

HI @masplin

 

Yeah, the syntax I have used is just the same as using the EARLIER function.  I use it because I find it slightly more readable but it's functionally the same.


To learn more about DAX visit : aka.ms/practicalDAX

Proud to be a Datanaut!

Hi Phil.  i see how it works in principal but it doesn't work in practice. i have 113,000 customers 32GB of RAM 64 bit and it just can't complete the distinct calc.  I just tried this simple version and it never completes. Thisi trying find cusotmers with same mobile, gender and surname, but not where the initial is the non-blank and the same.

 

Guess no other optins on this?

 

=                             
              CALCULATE(
                        DISTINCTCOUNT(Customer[Initial]),
                        FILTER(
                                       Customer,
                                       Customer[Mobile Phone No]=EARLIER(Customer[Mobile Phone No])  &&
                                       Customer[Mobile Phone No]<>BLANK() &&
                                       Customer[Sex Code]=EARLIER(Customer[Sex Code]) &&
                                      OR( LEFT(Customer[Surname],3)=LEFT(EARLIER(Customer[Surname]),3), RIGHT(Customer[Surname],3)=RIGHT(EARLIER(Customer[Surname]),3)) &&
                                     Customer[Initial]<>BLANK()
                                       )
                          )             

Hi @masplin,

 

Maybe you can try to add more than one Calculated Columns to get this work, which would calculate it step by step rather than one big step. Maybe this could work once a step.

1. Add a Calculated Column "Tag".

Tag =
IF (
    Customer[Mobile Phone No] <> BLANK ()
        && Customer[Initial] <> BLANK (),
    1,
    0
)

2. Add another Calculated Column.

 =
VAR mpn = Customer[Mobile Phone No]
VAR sc = Customer[Sex code]
RETURN
    CALCULATE (
        DISTINCTCOUNT ( Customer[Initial] ),
        FILTER (
            Customer,
            Customer[Mobile Phone No] = mpn
                && Customer[Sex Code] = sc
                && Customer[Tag] = 1
&& OR ( LEFT ( Customer[Surname], 3 ) = LEFT ( [Surname], 3 ), RIGHT ( Customer[Surname], 3 ) = RIGHT ( [Surname], 3 ) ) ) )

Best Regards!

Dale

Community Support Team _ Dale
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.