Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Find and Replace on a Large Scale

Hello, 

 

I am trying to clean a lot of data. In order to do a proper analysis over time I need to identify how many times a company shows up in our data. The trouble is, over the years people have inputted their company name a variety of ways. I thought about building out a conditional column but there are well over 1,000 companies on the lookup table and hundreds of thousands in the data table. I feel like Find and Replace would just take too long. I have an inkling that some code may be needed but I'm just not sure how to go about it. I have put an example below. Thank you for any help you can provide. 

 

List of Companies (Sample Lookup):

Cheerios 
Honey Nut Cheerios
Frosted Flakes
Honey Bunches of Oats
Cinnamon Toast Crunch
Lucky Charms 
Froot Loops 
Frosted Mini Wheats
Life
Fruity Pebbles
Raisin Bran
Special K 
Rice Krispies
Special K Red Berries
Reese’s Puffs
Raisin Bran Crunch
Cap’n Crunch
Apple Jacks
Corn Flakes

 

Sample Data (Data Table):

Cheerios Inc
HoneyNut Cheerios
Frosted Flakees
FrostedFlake
Bunches of Oats, Honey
Honey Bunches of Oats
Toast Crunsh
Cinnamon Toast Crunch
LuckyCharms
Lucky Charms 
Lucky Charm
Froot Loops 
FrootLoops
Mini Wheats
Frosted Mini Wheats
Life
Fruity Pebbles
Raisin Bran
Special K 
Rice Krispies
Special K Red Berries
Reese’s Puffs
Raisin Bran Crunch
Cap’n Crunch
Apple and Jacks
Jacks, Apple
Apple Jacks
Corn Flakess
Corn Flakes
1 ACCEPTED SOLUTION
Greg_Deckler
Super User
Super User

@Anonymous Try using a Merge query and use the Fuzzy logic option. If that fails, I did once write a really flexible fuzzy matching DAX measure if you can believe it. Fuzzy - Microsoft Power BI Community It was for a very similar scenario and the Fuzzy matching in PQ just wasn't "tunable" enough.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

View solution in original post

3 REPLIES 3
v-yingjl
Community Support
Community Support

Hi @Anonymous ,

It depends on how to want to define the match standards so that you can use fuzzy match in Power Query as Greg mentioned.

 

You can refer this community blog which introduces fuzzy match in Power Query:

How to fuzzy match the dirty data and horizontal display the corresponding table 

 

Best Regards,
Community Support Team _ Yingjie Li
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Greg_Deckler
Super User
Super User

@Anonymous Try using a Merge query and use the Fuzzy logic option. If that fails, I did once write a really flexible fuzzy matching DAX measure if you can believe it. Fuzzy - Microsoft Power BI Community It was for a very similar scenario and the Fuzzy matching in PQ just wasn't "tunable" enough.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

Thank you very much! This was incredibly helpful. @Greg_Deckler 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors