cancel
Showing results for 
Search instead for 
Did you mean: 

Fuzzy Merge ignores spaces after special characters

PowerBI and also Excel's Fuzzy Merge will ignore spaces after a special character even if the similarity threshold is set to 1 and should only ignore capitalization.

As seen in the following example from Excel it maches all rows except for the second one since there's a space between "SECOND" and "row".

But the space between "With-" and "Space" is ignored and it still matches with "with- space".

This problem also occurs with other special characters aside from minus.

 

The versions used are:
Power BI Report Server (on premise) Version1.14.8179.37378 (May 2022)
Excel 365 Version 2108 (Build 14326.20962)

 

TPS_0-1655991436978.png

 

Status: Delivered

Hi @TPS,

 

Since “second” and “row” in the source2 are glued together, Power BI will treat it as an entire text part. But for the “SECOND row”, they will be treated as two text parts: SECOND and row. That’s why they are not matched.

Here is a sample can help you understand the working principal.

vcazhengmsft_0-1656057299357.png

 

vcazhengmsft_1-1656057299360.png

 

vcazhengmsft_2-1656057299366.png

 

If you would like to make the “SECOND row” match with “secondrow”, you could merge like this: delete 1 from Similarity threshold(optional) input box and tick Match by combining text parts.

Match by combining text parts option will try to combine text parts such as “Micro” and “soft” into “Microsoft” to find matches. But the 1 Similarity will only allow exact matches which will prevent the match between “SECOND row” and “secondrow”.

vcazhengmsft_3-1656057299369.png

 

vcazhengmsft_4-1656057299372.png

 

Best Regards,

Community Support Team _ Caiyun

Comments
v-cazheng-msft
Community Support
Status changed to: Delivered

Hi @TPS,

 

Since “second” and “row” in the source2 are glued together, Power BI will treat it as an entire text part. But for the “SECOND row”, they will be treated as two text parts: SECOND and row. That’s why they are not matched.

Here is a sample can help you understand the working principal.

vcazhengmsft_0-1656057299357.png

 

vcazhengmsft_1-1656057299360.png

 

vcazhengmsft_2-1656057299366.png

 

If you would like to make the “SECOND row” match with “secondrow”, you could merge like this: delete 1 from Similarity threshold(optional) input box and tick Match by combining text parts.

Match by combining text parts option will try to combine text parts such as “Micro” and “soft” into “Microsoft” to find matches. But the 1 Similarity will only allow exact matches which will prevent the match between “SECOND row” and “secondrow”.

vcazhengmsft_3-1656057299369.png

 

vcazhengmsft_4-1656057299372.png

 

Best Regards,

Community Support Team _ Caiyun

TPS
Regular Visitor

Hello @v-cazheng-msft,

sorry my explanation was a bit confusing.

The bug I'm trying to report is about the last row "With- space" that matches with "with-space" even though the second entry does not have a space. 

I only included other examples to say that in normal cases the space is not being ignored (as it shouldn't).

Regards,

TPS