I have a list of 19,000 business names with some "very-close" duplicates.
E.g. XYZ Pty Ltd and XYZ Pty. Ltd. or ABCDE and ABCD
There is no logic to the differences, so I can't just find & replace all the "." characters in "Pty. Ltd." and fix all of the duplicates.
Is there a way to identify the "very-close" duplicates? I am thinking of a function that would identify whether the current value is the same as another value in the list except for 1, 2, 3 or x characters.
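The "same except for x characters" idea is usually measured as Levenshtein (edit) distance: the minimum number of single-character insertions, deletions or substitutions needed to turn one string into another. The sketch below is plain Python, not Power Query or DAX, and the function names `levenshtein` and `near_duplicates` are hypothetical helpers written for illustration. It compares every pair of names and keeps those that differ by between 1 and `max_edits` characters.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b (classic two-row DP)."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def near_duplicates(names, max_edits=2):
    """Hypothetical helper: return (name1, name2, distance) for every pair
    whose edit distance is between 1 and max_edits."""
    pairs = []
    for a, b in combinations(names, 2):
        # The length gap is a lower bound on the distance, so skip early.
        if abs(len(a) - len(b)) > max_edits:
            continue
        d = levenshtein(a, b)
        if 0 < d <= max_edits:
            pairs.append((a, b, d))
    return pairs


if __name__ == "__main__":
    sample = ["XYZ Pty Ltd", "XYZ Pty. Ltd.", "ABCDE", "ABCD", "Other Co"]
    for a, b, d in near_duplicates(sample, max_edits=2):
        print(f"{a!r} ~ {b!r} (distance {d})")
```

With the two examples from the question, "XYZ Pty Ltd" and "XYZ Pty. Ltd." are 2 edits apart and "ABCDE" and "ABCD" are 1 edit apart. Note that this is an all-pairs comparison: 19,000 names means roughly 180 million pairs, so the length check matters and a pure-Python run will still take a while.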
Since there's no logic to the differences between those "very-close" duplicates, it's not possible to identify them via Power Query/DAX. I suggest you try a Text Analysis API to achieve your goal.
Regards,
This isn't a solution! While fuzzy matching in PBI has been great, it doesn't handle fuzzy duplicates within a single column, so this post is not solved.
I now know that what I was trying to describe is called "fuzzy match" in the data analytics space. I will add this as a development idea.
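For the single-column case, one common workaround outside Power BI is to compare the column against itself and group names whose similarity clears a threshold. The sketch below is an assumption-laden illustration, not a Power BI feature: `normalise` and `fuzzy_groups` are hypothetical helpers, the clean-up rule (uppercase, drop punctuation, collapse spaces) is one guess at what "very close" should ignore, and the 0.85 cut-off on Python's `difflib.SequenceMatcher.ratio()` is an arbitrary starting value to tune. A union-find structure merges matching pairs into groups, so A~B and B~C end up in one cluster.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations


def normalise(name: str) -> str:
    """Hypothetical clean-up rule: uppercase, drop punctuation, collapse whitespace."""
    name = re.sub(r"[^\w\s]", "", name.upper())
    return re.sub(r"\s+", " ", name).strip()


def fuzzy_groups(names, threshold=0.85):
    """Hypothetical helper: cluster one list of names by pairwise similarity.
    Returns only groups with two or more members (the probable duplicates)."""
    keys = [normalise(n) for n in names]
    parent = list(range(len(names)))  # union-find: each name starts alone

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    # Compare the column against itself, every pair once.
    for i, j in combinations(range(len(names)), 2):
        if SequenceMatcher(None, keys[i], keys[j]).ratio() >= threshold:
            parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return [g for g in groups.values() if len(g) > 1]


if __name__ == "__main__":
    sample = ["XYZ Pty Ltd", "XYZ Pty. Ltd.", "ABCDE", "ABCD", "Other Co"]
    for group in fuzzy_groups(sample):
        print(group)
```

Here the two "XYZ" spellings normalise to the same key (ratio 1.0), and "ABCDE"/"ABCD" score 2·4/9 ≈ 0.89, so both pairs are grouped while "Other Co" stays alone. Like any all-pairs approach it scales quadratically, and the threshold trades missed duplicates against false merges.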