cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
mhutchens81 Regular Visitor
Regular Visitor

Measure equavalent for string similarity formula

I'm currently using the formula below to compare strings from two columns and give me a line-by-line % match:

let
Source = Text.ToList([Name1]),
Reference = Text.ToList([Name2]),
SourceCount = List.Count(Source),
SourceNotInReferenceCount = List.Count(List.Difference(Source, Reference, Comparer.OrdinalIgnoreCase)),
PercentSourceInReference = 1 - (SourceNotInReferenceCount / SourceCount)
in
PercentSourceInReference

 

However, I now want to apply the same formula to a calculated table, but I can't, because I'm not able to use Power Query. Is there a way to achieve the same outcome with a Measure?


Any help that can be provided would really be appreciated.

1 ACCEPTED SOLUTION

Accepted Solutions
Super User
Super User

Re: Measure equavalent for string similarity formula

@mhutchens81

 

It is remarkable how much more adept than DAX M is at cases like this. What in the query editor looks relatively straightforward becomes rather irksome here. You can create your calculated column in the table you show (Table1):

 

Edited: Removed var _Word2 which was no longer necessary

 

Similarity =
VAR _String1 =
    LOWER ( Table1[Name1] )
VAR _String2 =
    LOWER ( Table1[Name2] )
VAR _Word1 =
    ADDCOLUMNS (
        GENERATESERIES ( 1; LEN ( _String1 ) );
        "Letter"; MID ( _String1; [Value]; 1 )
    )
VAR _SourceNotInReferenceCount =
    SUMX (
        SUMMARIZE (
            _Word1;
            [Letter];
            "Occurrences"; MAX (
                ( LEN ( _String1 ) - LEN ( SUBSTITUTE ( _String1; [Letter]; "" ) ) )
                    - ( LEN ( _String2 ) - LEN ( SUBSTITUTE ( _String2; [Letter]; "" ) ) );
                0
            )
        );
        [Occurrences]
    )
VAR _SourceCount =
    LEN ( _String1 )
VAR _PercentSourceInReference =
    1 - DIVIDE ( _SourceNotInReferenceCount; _SourceCount )
RETURN
    _PercentSourceInReference

 

 

Code formatted with   www.daxformatter.com

View solution in original post

4 REPLIES 4
Super User
Super User

Re: Measure equavalent for string similarity formula

Hi @mhutchens81

 

It would help immensly if you showed an example based on your data with the expected result. So that what you need is clear.

Additionally, please always show your sample data in text-tabular format in addition to (or instead of) the screen captures. That allows people trying to help to readily copy the data and run a quick test, plus it increases the likelihood of your question being answered. Just use 'Copy table' in Power BI and paste it here.

 

 

That said, and if I understand your requirement correctly, try this, where Table1 is the table that contains the two columns [Name1] and [Name2]

 

 

Measure =
VAR SourceCount =
    COUNTROWS ( Table1 )
VAR SourceNotInReferenceCount =
    COUNTROWS ( FILTER ( Table1, Table1[Name1] = Table1[Name2] ) )
VAR PercentSourceInReference =
    1 - DIVIDE ( SourceNotInReferenceCount, SourceCount )
RETURN
    PercentSourceInReference

or, in another interpretation, since duplicate values are supported in List.Difference:

 

Measure =
VAR SourceCount =
    COUNTROWS ( Table1 )
VAR SourceNotInReferenceCount =
    SUMX (
        DISTINCT ( Table1[Name1] ),
        MIN (
            CALCULATE ( COUNT ( Table1[Name1] ) )
                - CALCULATE (
                    COUNT ( Table1[Name2] ),
                    FILTER ( Table1, Table1[Name2] = EARLIER ( Table1[Name1] ) )
                ),
            0
        )
    )
VAR PercentSourceInReference =
    1 - DIVIDE ( SourceNotInReferenceCount, SourceCount )
RETURN
    PercentSourceInReference
Highlighted
mhutchens81 Regular Visitor
Regular Visitor

Re: Measure equavalent for string similarity formula

Hi @AlB, thanks or your response, and your advice. Apologies for not being as clear as I could have been. I've mocked up some example results using the initial formula, the results of which are below:

 

Name1                                Name2                                Similarity

CatCat1
DogDog1
TreeTrees1
MarkedMarking0.666666667
HelloGoodbye0.4
Hello how are youHello how are you1
Goodbye how are youGoodbye how are you1
Stop to smell the flowersI don't care about flowers0.64
Knock on the doorRattle the window latch0.647058824

 

I've had a play around with your measures (thanks again), but unfortunately all I'm getting is a similarity rating of '1' for each line. I think I wasn't as clear as I should have been with my requirements.

 

Hopefully this helps to clear things up a bit?

Super User
Super User

Re: Measure equavalent for string similarity formula

@mhutchens81

 

It is remarkable how much more adept than DAX M is at cases like this. What in the query editor looks relatively straightforward becomes rather irksome here. You can create your calculated column in the table you show (Table1):

 

Edited: Removed var _Word2 which was no longer necessary

 

Similarity =
VAR _String1 =
    LOWER ( Table1[Name1] )
VAR _String2 =
    LOWER ( Table1[Name2] )
VAR _Word1 =
    ADDCOLUMNS (
        GENERATESERIES ( 1; LEN ( _String1 ) );
        "Letter"; MID ( _String1; [Value]; 1 )
    )
VAR _SourceNotInReferenceCount =
    SUMX (
        SUMMARIZE (
            _Word1;
            [Letter];
            "Occurrences"; MAX (
                ( LEN ( _String1 ) - LEN ( SUBSTITUTE ( _String1; [Letter]; "" ) ) )
                    - ( LEN ( _String2 ) - LEN ( SUBSTITUTE ( _String2; [Letter]; "" ) ) );
                0
            )
        );
        [Occurrences]
    )
VAR _SourceCount =
    LEN ( _String1 )
VAR _PercentSourceInReference =
    1 - DIVIDE ( _SourceNotInReferenceCount; _SourceCount )
RETURN
    _PercentSourceInReference

 

 

Code formatted with   www.daxformatter.com

View solution in original post

mhutchens81 Regular Visitor
Regular Visitor

Re: Measure equavalent for string similarity formula

"It is glaring how much more adept than DAX M is for cases like this. What in the query editor looks relatively straightforward becomes rather irksome here."

 

Looking at your solution, the statement above appears to be a bit of an understatement! And that's not meant to sound like a criticism - the solution works perfectly, thank you so much @AlB Smiley Happy

Helpful resources

Announcements
Can You Solve These Challenge

Challenge: Can You Solve These?

Find out how to participate in the first Power BI 'Can You Solve These?' challenge.

Community News & Announcements

Community News & Announcements

Get your latest community news and announcements.

Virtual Launch Event

Microsoft Business Applications October Virtual Launch Event

Join us for an in-depth look at the new innovations across Dynamics 365 and the Microsoft Power Platform.

Community Kudopalooza

Win Power BI Swag with Community Kudopalooza!

Each week, complete activities and be qualified in the drawing for cool Power BI Swag.

Users Online
Currently online: 59 members 1,222 guests
Please welcome our newest community members: