Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Roger_McCallen
Frequent Visitor

Combining like items

I have data from a retailer, and there are multiple items with the same or similar description that in some cases have the same item number, or in others have a different item number. Typically the duplicates are in the system due to carton qty changes or some other attribute change, however at the end of the day they are the same item.

How do I combine these items (in some cases I have as many as 24 of the same like item) so that when I create a table with sales info I do not get duplicate rows of the same items sales? I would like to condesne to one row per item not the multiple rows due to the duplicates.

 

I would apprecite any help the community can provide.

1 REPLY 1
cpearson
Resolver I
Resolver I

Depending on the nature of the data and volume I'd consider....

i) if you've got every possible combination already, and it's not too onerous a task, you could create a reference table of cleansed values to join to get your cleansed version.

 

Item NumberItem DescriptionNew Description
12345Potato PeelerPotato Peeler
12345Potatoe PeelerPotato Peeler
54321Potato Peeler 6 pkPotato Peeler

 

ii) if it is a huge list of items or you'll be ingesting new records that could include new values, you might want to try and do some string comparison but this isn't without risk as would never be perfect. Have a look at things like Levenshtein Distance on https://en.wikipedia.org/wiki/String_metric and see if something like that would work.  This is a good example https://community.powerbi.com/t5/Desktop/Levenshtein-String-Distance-Algorithm-In-DAX/m-p/959545.

 

iii) Failing that you need some data quality management upstream 🙂

 

Good luck!

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.