Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I have data from a retailer, and there are multiple items with the same or similar description that in some cases have the same item number, or in others have a different item number. Typically the duplicates are in the system due to carton qty changes or some other attribute change, however at the end of the day they are the same item.
How do I combine these items (in some cases I have as many as 24 of the same like item) so that when I create a table with sales info I do not get duplicate rows of the same items sales? I would like to condesne to one row per item not the multiple rows due to the duplicates.
I would apprecite any help the community can provide.
Depending on the nature of the data and volume I'd consider....
i) if you've got every possible combination already, and it's not too onerous a task, you could create a reference table of cleansed values to join to get your cleansed version.
Item Number | Item Description | New Description |
12345 | Potato Peeler | Potato Peeler |
12345 | Potatoe Peeler | Potato Peeler |
54321 | Potato Peeler 6 pk | Potato Peeler |
ii) if it is a huge list of items or you'll be ingesting new records that could include new values, you might want to try and do some string comparison but this isn't without risk as would never be perfect. Have a look at things like Levenshtein Distance on https://en.wikipedia.org/wiki/String_metric and see if something like that would work. This is a good example https://community.powerbi.com/t5/Desktop/Levenshtein-String-Distance-Algorithm-In-DAX/m-p/959545.
iii) Failing that you need some data quality management upstream 🙂
Good luck!
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
110 | |
95 | |
76 | |
65 | |
51 |
User | Count |
---|---|
146 | |
109 | |
106 | |
88 | |
61 |