Solved: Find & Count Matching Words in Lists

TheOriginalDeeb · ‎03-17-2018

I'm having troble understanding List functions, which I think will be necessary to find a solution to my problem.

So I have basically 2 table columns: Phrase & Words

So for example, phrase 1 = '10 best aristotle quotes' and what I want to do is to compare that to a list of words AND count how many of those words appear in the phrase and create a percentage as the desired calculation

Phrase: 10 best aristotle quotes

Words: best quotes

Match: 2

Percentage: 50% (2 matches/4 words in phrase)

I'd appreaciate all ideas for a Power Query solution.

Also, if this explanation is not clear, please let me know and I will try to make it clearer. Thanks!

v-xjiin-msft · ‎03-20-2018

Hi @TheOriginalDeeb.

If you prefer M code solution. Then @Greg_Deckler's solution will be a great choice.

1. Go to Edit Queries => New Query, Blank Query => Advanced Editor.

let
    fnCountMatches = (List1 as list, List2 as list) =>

let
    MyList = List1,  //{"ten","best","aristotle","quotes"},
    MyOtherList = List2,  //{"best","quotes"},
    
    MyNumerator = List.Count(List.Difference(MyList, MyOtherList)),
    MyDenominator = List.Count(MyList),
    MyResult = MyNumerator / MyDenominator

in
    MyResult
in
    fnCountMatches

2. Then go back to your source table. Split the columns first, then create a new custom column and invoke this function with expression:

fnCountMatches(#"Changed Type2"[Phrase],#"Changed Type2"[Words])

The entire M code:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjRQSEotLlFILMosLskvyUlVKCzNL0ktVtJRAotDebGxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Phrase = _t, Words = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Phrase", type text}, {"Words", type text}}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"Phrase", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Phrase"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Phrase", type text}}),
    #"Split Column by Delimiter1" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type1", {{"Words", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Words"),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Words", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type2", "Custom", each fnCountMatches(#"Changed Type2"[Phrase],#"Changed Type2"[Words])),
    #"Changed Type3" = Table.TransformColumnTypes(#"Added Custom",{{"Custom", Percentage.Type}})
in
    #"Changed Type3"

Please be noticed that custom M function cannot be used in Power BI Service currently.

Thanks,
Xi Jin.

View solution in original post

v-xjiin-msft · ‎03-19-2018

Hi @TheOriginalDeeb,

To achieve your requirement, you can refer to following method:

Sample data is like:

Go to Edit Queries -> Select column Phrase and choose Split Column option. Same to column Words

Then Close & Apply. Create a measure or a calculated column with following expression to calculate the Percentage:

Percentage =
CALCULATE (
    DISTINCTCOUNT ( 'Table'[Words] ),
    FILTER ( 'Table', 'Table'[Phrase] = 'Table'[Words] ),
    ALLEXCEPT ( 'Table', 'Table'[ID] )
)
    / CALCULATE (
        DISTINCTCOUNT ( 'Table'[Phrase] ),
        ALLEXCEPT ( 'Table', 'Table'[ID] )
    )

Thanks,
Xi Jin.

TheOriginalDeeb · ‎03-19-2018

Thanks so much for taking the time to work on this and help me. 2 slight problems....

1. I'd really prefer an M code solution

2. I don't want to compare both columns on an individual word level, just phrase to word.

I've been able to use Text.Split to add 2 columns with the contents split out into Lists. But I'm stuck on how to count and match

v-xjiin-msft · ‎03-20-2018

Hi @TheOriginalDeeb.

If you prefer M code solution. Then @Greg_Deckler's solution will be a great choice.

1. Go to Edit Queries => New Query, Blank Query => Advanced Editor.

let
    fnCountMatches = (List1 as list, List2 as list) =>

let
    MyList = List1,  //{"ten","best","aristotle","quotes"},
    MyOtherList = List2,  //{"best","quotes"},
    
    MyNumerator = List.Count(List.Difference(MyList, MyOtherList)),
    MyDenominator = List.Count(MyList),
    MyResult = MyNumerator / MyDenominator

in
    MyResult
in
    fnCountMatches

2. Then go back to your source table. Split the columns first, then create a new custom column and invoke this function with expression:

fnCountMatches(#"Changed Type2"[Phrase],#"Changed Type2"[Words])

The entire M code:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjRQSEotLlFILMosLskvyUlVKCzNL0ktVtJRAotDebGxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Phrase = _t, Words = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Phrase", type text}, {"Words", type text}}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"Phrase", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Phrase"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Phrase", type text}}),
    #"Split Column by Delimiter1" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type1", {{"Words", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Words"),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Words", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type2", "Custom", each fnCountMatches(#"Changed Type2"[Phrase],#"Changed Type2"[Words])),
    #"Changed Type3" = Table.TransformColumnTypes(#"Added Custom",{{"Custom", Percentage.Type}})
in
    #"Changed Type3"

Please be noticed that custom M function cannot be used in Power BI Service currently.

Thanks,
Xi Jin.

TheOriginalDeeb · ‎03-20-2018

Thanks Xi Jin.

I'm still new to Power Query, and not too good with functions and parameters. I can follow Greg's example with the hard-coded list.

But I'm not sure how to pass in my own data (from a table) into the function.

I did get the formulas to work for the most part. But there is a problem with stemming. It does not match plural to singular (It won't find 'quote' in 'quotes')

I'm wondering if there any other List functions that might work?

v-xjiin-msft · ‎03-21-2018

Hi @TheOriginalDeeb,

You can refer to my sample report here: https://1drv.ms/u/s!AlqSnZZUVHmshBwJn8RyiDli-bVl

And for "quote" and "quotes". As I know there's no List function which can do the job. To resolve this issue, I think you need to hard code these words like create a new custom column with expression like below, then pass new column to the function.

if Text.From([Phrase])="quotes" then "quote" else [Phrase]

Thanks,
Xi Jin.

Greg_Deckler · ‎03-18-2018

So, something like this?

let
    fnCountMatches = () =>

let
    MyList = {"ten","best","aristotle","quotes"},
    MyOtherList = {"best","quotes"},
    
    MyNumerator = List.Count(List.Difference(MyList, MyOtherList)),
    MyDenominator = List.Count(MyList),
    MyResult = MyNumerator / MyDenominator

in
    MyResult
in
    fnCountMatches

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!: The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...