Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
TheOriginalDeeb
Frequent Visitor

Find & Count Matching Words in Lists

I'm having troble understanding List functions, which I think will be necessary to find a solution to my problem.

 

So I have basically 2 table columns: Phrase & Words

 

So for example, phrase 1 = '10 best aristotle quotes' and what I want to do is to compare that to a list of words AND count how many of those words appear in the phrase and create a percentage as the desired calculation

 

Phrase: 10 best aristotle quotes

Words: best quotes

Match: 2

Percentage: 50% (2 matches/4 words in phrase)

I'd appreaciate all ideas for a Power Query solution.

Also, if this explanation is not clear, please let me know and I will try to make it clearer. Thanks!

 

1 ACCEPTED SOLUTION

Hi @TheOriginalDeeb.

 

If you prefer M code solution. Then @Greg_Deckler's solution will be a great choice.

 

1. Go to Edit Queries => New Query, Blank Query => Advanced Editor. 

 

let
    fnCountMatches = (List1 as list, List2 as list) =>

let
    MyList = List1,  //{"ten","best","aristotle","quotes"},
    MyOtherList = List2,  //{"best","quotes"},
    
    MyNumerator = List.Count(List.Difference(MyList, MyOtherList)),
    MyDenominator = List.Count(MyList),
    MyResult = MyNumerator / MyDenominator

in
    MyResult
in
    fnCountMatches

2. Then go back to your source table. Split the columns first, then create a new custom column and invoke this function with expression:

 

fnCountMatches(#"Changed Type2"[Phrase],#"Changed Type2"[Words])

3.PNG

 

The entire M code:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjRQSEotLlFILMosLskvyUlVKCzNL0ktVtJRAotDebGxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Phrase = _t, Words = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Phrase", type text}, {"Words", type text}}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"Phrase", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Phrase"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Phrase", type text}}),
    #"Split Column by Delimiter1" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type1", {{"Words", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Words"),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Words", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type2", "Custom", each fnCountMatches(#"Changed Type2"[Phrase],#"Changed Type2"[Words])),
    #"Changed Type3" = Table.TransformColumnTypes(#"Added Custom",{{"Custom", Percentage.Type}})
in
    #"Changed Type3"

Please be noticed that custom M function cannot be used in Power BI Service currently.

 

Thanks,
Xi Jin.

 

 

View solution in original post

7 REPLIES 7
v-xjiin-msft
Solution Sage
Solution Sage

Hi @TheOriginalDeeb,

 

To achieve your requirement, you can refer to following method:

 

Sample data is like:

 

1.PNG

 

Go to Edit Queries -> Select column Phrase and choose Split Column option. Same to column Words

 

2.PNG3.PNG

 

Then Close & Apply.  Create a measure or a calculated column with following expression to calculate the Percentage:

 

Percentage =
CALCULATE (
    DISTINCTCOUNT ( 'Table'[Words] ),
    FILTER ( 'Table', 'Table'[Phrase] = 'Table'[Words] ),
    ALLEXCEPT ( 'Table', 'Table'[ID] )
)
    / CALCULATE (
        DISTINCTCOUNT ( 'Table'[Phrase] ),
        ALLEXCEPT ( 'Table', 'Table'[ID] )
    )

 

4.PNG

 

Thanks,
Xi Jin.

Thanks so much for taking the time to work on this and help me. 2 slight problems....

 

1. I'd really prefer an M code solution

 

2. I don't want to compare both columns on an individual word level, just phrase to word.

 

I've been able to use Text.Split to add 2 columns with the contents split out into Lists. But I'm stuck on how to count and match

Hi @TheOriginalDeeb.

 

If you prefer M code solution. Then @Greg_Deckler's solution will be a great choice.

 

1. Go to Edit Queries => New Query, Blank Query => Advanced Editor. 

 

let
    fnCountMatches = (List1 as list, List2 as list) =>

let
    MyList = List1,  //{"ten","best","aristotle","quotes"},
    MyOtherList = List2,  //{"best","quotes"},
    
    MyNumerator = List.Count(List.Difference(MyList, MyOtherList)),
    MyDenominator = List.Count(MyList),
    MyResult = MyNumerator / MyDenominator

in
    MyResult
in
    fnCountMatches

2. Then go back to your source table. Split the columns first, then create a new custom column and invoke this function with expression:

 

fnCountMatches(#"Changed Type2"[Phrase],#"Changed Type2"[Words])

3.PNG

 

The entire M code:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMjRQSEotLlFILMosLskvyUlVKCzNL0ktVtJRAotDebGxAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Phrase = _t, Words = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Phrase", type text}, {"Words", type text}}),
    #"Split Column by Delimiter" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type", {{"Phrase", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Phrase"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Phrase", type text}}),
    #"Split Column by Delimiter1" = Table.ExpandListColumn(Table.TransformColumns(#"Changed Type1", {{"Words", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), "Words"),
    #"Changed Type2" = Table.TransformColumnTypes(#"Split Column by Delimiter1",{{"Words", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type2", "Custom", each fnCountMatches(#"Changed Type2"[Phrase],#"Changed Type2"[Words])),
    #"Changed Type3" = Table.TransformColumnTypes(#"Added Custom",{{"Custom", Percentage.Type}})
in
    #"Changed Type3"

Please be noticed that custom M function cannot be used in Power BI Service currently.

 

Thanks,
Xi Jin.

 

 

Thanks Xi Jin.

 

I'm still new to Power Query, and not too good with functions and parameters. I can follow Greg's example with the hard-coded list.

But I'm not sure how to pass in my own data (from a table) into the function.

 

I did get the formulas to work for the most part. But there is a problem with stemming. It does not match plural to singular (It won't find 'quote' in 'quotes')

 

I'm wondering if there any other List functions that might work?

Hi @TheOriginalDeeb,

 

You can refer to my sample report here: https://1drv.ms/u/s!AlqSnZZUVHmshBwJn8RyiDli-bVl

 

And for "quote" and "quotes". As I know there's no List function which can do the job. To resolve this issue, I think you need to hard code these words like create a new custom column with expression like below, then pass new column to the function. 

 

if Text.From([Phrase])="quotes" then "quote" else [Phrase]

Thanks,
Xi Jin.

 

Greg_Deckler
Super User
Super User

So, something like this?

 

let
    fnCountMatches = () =>

let
    MyList = {"ten","best","aristotle","quotes"},
    MyOtherList = {"best","quotes"},
    
    MyNumerator = List.Count(List.Difference(MyList, MyOtherList)),
    MyDenominator = List.Count(MyList),
    MyResult = MyNumerator / MyDenominator

in
    MyResult
in
    fnCountMatches

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Hi Greg,

 

I haven't used a lot of functions, so I'm not sure about this setup for analyzing multiple phrases.

 

On thing that I should have pointed out before is that the list or [Words] will vary by row most of the time.

 

So, if there are 100 rows in the [Phrase column], I'll also have 100 different lists in each row of [Words].

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.