mbegg
Helper II

Identify "very-close" duplicates

I have a list of 19,000 business names with some "very-close" duplicates. 

 

E.g. XYZ Pty Ltd and XYZ Pty. Ltd. or ABCDE and ABCD

 

There is no pattern to the differences, so I can't just find & replace every "." in "Pty. Ltd." and fix all of the duplicates.

 

Is there a way to identify the "very-close" duplicates? I am thinking of a function that would identify whether the current value is the same as another value in the list except for 1, 2, 3 or x characters.
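
The "same except for x characters" idea above matches what is usually called edit distance (Levenshtein distance): the minimum number of single-character insertions, deletions, or substitutions needed to turn one string into another. Below is a minimal Python sketch of that idea, run outside Power BI. The function names `edit_distance` and `close_pairs` and the `max_edits` parameter are illustrative assumptions, not part of any Power BI feature.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character edits to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or keep when equal)
            ))
        previous = current
    return previous[-1]


def close_pairs(names, max_edits=2):
    """Return (name_a, name_b, distance) for every pair within max_edits."""
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(len(a) - len(b)) > max_edits:
                continue  # the length gap alone already exceeds the threshold
            distance = edit_distance(a, b)
            if distance <= max_edits:
                pairs.append((a, b, distance))
    return pairs


names = ["XYZ Pty Ltd", "XYZ Pty. Ltd.", "ABCDE", "ABCD", "Unrelated Co"]
for a, b, distance in close_pairs(names):
    print(a, "|", b, "|", distance)
```

Note that comparing every pair is quadratic: a list of 19,000 names gives roughly 180 million pairs, so in practice it may help to compare only names that share a first letter or some other cheap key.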

1 ACCEPTED SOLUTION
v-sihou-msft
Employee

@mbegg

 

Since there's no logic to the differences between those "very-close" duplicates, it's not possible to identify them via Power Query/DAX. I suggest you try a Text Analysis API to achieve your goal.

 

Regards,


3 REPLIES

This isn't a solution! While Fuzzy match in PBI has been great, it doesn't handle fuzzy duplicates in a single column and therefore this post is not solved.

@v-sihou-msft

 

I now know that what I was trying to describe is called "fuzzy matching" in the data analytics space. I will add this as a development idea.
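
For anyone landing here with the same single-column problem, here is a rough sketch of fuzzy matching within one list, done in Python rather than Power Query/DAX. It first normalises each name (lower case, punctuation removed) and then scores each pair with `difflib.SequenceMatcher` from the Python standard library. The helper names `normalise` and `fuzzy_duplicates` and the 0.85 threshold are assumptions chosen for illustration; the threshold would need tuning against real data.

```python
import re
from difflib import SequenceMatcher


def normalise(name: str) -> str:
    """Lower-case, drop punctuation, and collapse runs of whitespace."""
    stripped = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(stripped.split())


def fuzzy_duplicates(names, threshold=0.85):
    """Return (raw_a, raw_b, score) for pairs whose similarity >= threshold."""
    keyed = [(normalise(n), n) for n in names]
    matches = []
    for i, (key_a, raw_a) in enumerate(keyed):
        for key_b, raw_b in keyed[i + 1:]:
            # ratio() is 2 * matched characters / total characters, in [0, 1]
            score = SequenceMatcher(None, key_a, key_b).ratio()
            if score >= threshold:
                matches.append((raw_a, raw_b, round(score, 2)))
    return matches


names = ["XYZ Pty Ltd", "XYZ Pty. Ltd.", "ABCDE", "ABCD", "Unrelated Co"]
for raw_a, raw_b, score in fuzzy_duplicates(names):
    print(raw_a, "|", raw_b, "|", score)
```

Normalising first means variants that differ only in punctuation or case, such as the two "XYZ" names, score as identical, while the similarity ratio catches small spelling differences such as ABCDE and ABCD.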
