Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Compare and Bulk Replace values based on two different tables (text errors, typos, nomenclatures)

Hi,

I have a table with multiple columns and in one column (it is not formatted as table) I have more than 62000 rows. this column called Resource_column is full of errors, typos, nomenclatures, etc. I also another column called Reference_Column which is formated as table. I have to compare my Resource_column with Reference_column and replace all the errors with good ones based on the Resource_column. I have put the link to the file here: 

 

https://docs.google.com/spreadsheets/d/1RT9d8tJjvM_c23y54Yg3ASj9svxOut5Z/edit?usp=sharing&ouid=10542...

 

and added an screenshots to show the difference:

bestmbaman_0-1631214743666.png

 

bestmbaman_0-1631224036227.png

 

bestmbaman_1-1631224202123.png

 

 

 

Notice: the blank rows in the Resource_Column should not be filled at the end of replacing.

 

Thanks in advance!

 

Edit: I really appreciate your support as it looks pretty complicated issue. The main problem is that I am looking for a way to not go into Bulk Find and Replace alley as I have a lot of values to be fixed (it is not possible to make a complete find column and use it with replace)

@Greg_Deckler  ,  @amitchandak , @parry2k , @jdbuchanan71 , @ryan_mayu  , @MFelix , @selimovd 

 

9 REPLIES 9
v-shex-msft
Community Support
Community Support

HI @Anonymous,

According to your description, I think your requirement should more suitable to use AI training of text analysis instead of power query/Dax.

For this scenario, I'd like to suggest your or do data processing on the source data out of power bi side or try to use a web connector to invoke the external API of AI analysis.

Regards,
Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.
Anonymous
Not applicable

Thank you for the reply!

Would you suggest the best API or AI app/software to take advantage as this is not a recurring job but an ad hoc report!

bestmbaman

Hi @Anonymous,

If you are not familiar with these parts, I'd like to suggest you use the third-party API(e.g MS partner that provides the text analysis service) for these types of AI text analysis services.
Then you only need to use raw records to invoke the API to get the fixed results without complex training and learning steps.

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accept as solution to help other members find it more quickly.
Greg_Deckler
Super User
Super User

@Anonymous I would try the fuzzy matching in Merge queries first. If that fails then you can look at my very flexible and tunable DAX Fuzzy matching algorithm here: Fuzzy - Microsoft Power BI Community.

 

Either way though, I agree with @lbendlin that you won't be able to identify true errors unless there is some other data you have that can identify that.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

@Greg_Deckler  Thank you Greg!

by other Data, what exactly you mean?! I already shared a reference table (with only one column) that contains all the correct names. The challange is how I can figure multiple type of errors and at the same time how to match two different columns with different number of rows? 

Assuming that I figured the errors and fixed them , how can I replace the good values in a column with 406 rows to a column with +62000 rows?!

After all, what programming language other than DAX can be used here if DAX falls short here?

 

Please don't Archive this question and let other ones to try their opinion, I believe this challange comming from the real world business is not just mine.

 

Dear:

  @amitchandak , @parry2k , @jdbuchanan71 , @ryan_mayu  , @MFelix , @selimovd 

The real challange is here, I would kindly appreciate your opinion

 

Thank you @Greg_Deckler , @lbendlin  very much

@Anonymous By the data I mean I don't want to have to type all of your data from a screen shot but rather just copy and paste it. In theory you have access to Power Query and could use that to solve this problem, particularly the fuzzy matching options when performing a Merge query. Outside of that you could also possibly do this in R or Python within Power Query as well. But, once you get to the Desktop itselft then yes it is DAX for the most part. And thus you could potentially use my very flexible and tunable DAX Fuzzy matching algorithm here: Fuzzy - Microsoft Power BI Community.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
lbendlin
Super User
Super User

I think you have bigger problems than that.  How is the script supposed to be able to tell if it is a typo or an error?  Which similarity algorithm are you planning to employ?  Soundex? Levenshtein ? some other one?

 

String similarity — the basic know your algorithms guide! | by Mohit Mayank | ITNEXT

 

 

Anonymous
Not applicable

Thanks you @lbendlin ,

I am totally confused with this challange! 

In this case what is the best solution you can think of? 

Also please note that this is just a sample of the real data set manipulated and transformed only for this community to show the real problem and seek solution!

Looking forward to hearing from you

Thanks,

 

Also calling on all great heros:

@Greg_Deckler  ,  @amitchandak , @parry2k , @jdbuchanan71 , @ryan_mayu  , @MFelix , @selimovd 

There is no magic to this - you and your requestor need to agree on which algorithm to use, and what the cutoffs should be.  The shorter the words the more risky/useless the algorithms will become.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.