I have a list of 19,000 business names with some "very-close" duplicates.
E.g. "XYZ Pty Ltd" and "XYZ Pty. Ltd.", or "ABCDE" and "ABCD".
There is no logic to the differences, so I can't just find and replace every "." in "Pty. Ltd." to fix all of the duplicates.
Is there a way to identify the "very-close" duplicates? I am thinking of a function that would identify whether the current value is the same as another value in the list except for 1, 2, 3, or x characters.
Since there's no consistent pattern in the differences between those "very-close" duplicates, it isn't possible to identify them with Power Query or DAX. I suggest trying a text analysis API to achieve your goal.
Regards,
This isn't a solution! While Fuzzy match in PBI has been great, it doesn't handle fuzzy duplicates in a single column and therefore this post is not solved.
I now know that what I was trying to describe is called "fuzzy match" in the data analytics space. I will add this as a development idea.
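For anyone landing here later: the "same except for x characters" idea from the question corresponds to edit distance (Levenshtein distance). Below is a minimal sketch in Python, run outside Power BI, of how near-duplicates within a single list could be flagged. The function names `edit_distance` and `near_duplicates` and the `max_edits` parameter are made up for illustration and are not part of any Power BI feature. It compares every pair of names, so on a list of 19,000 names it would likely be slow without further optimisation.

```python
from itertools import combinations


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest single-character inserts,
    deletes, or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def near_duplicates(names, max_edits=2):
    """Return (name_a, name_b, distance) for every pair of names
    that differ by at most max_edits characters (hypothetical helper)."""
    pairs = []
    for a, b in combinations(names, 2):
        # The length difference is a lower bound on the distance,
        # so clearly different pairs can be skipped cheaply.
        if abs(len(a) - len(b)) > max_edits:
            continue
        distance = edit_distance(a, b)
        if distance <= max_edits:
            pairs.append((a, b, distance))
    return pairs


if __name__ == "__main__":
    sample = ["XYZ Pty Ltd", "XYZ Pty. Ltd.", "ABCDE", "ABCD", "Unrelated Co"]
    for a, b, distance in near_duplicates(sample, max_edits=2):
        print(f"{a!r} ~ {b!r} (distance {distance})")
```

With the sample names from the question, this reports the two pairs "XYZ Pty Ltd" / "XYZ Pty. Ltd." and "ABCDE" / "ABCD". The `max_edits` threshold plays the role of the "x characters" in the original question; a higher value finds more candidates but also more false matches.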