Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I would like to extract the keywords that are matched between two columns of text in a dataset, but to have an exact match on the full word only.
See the linked Excel file, the output desired for UID 2 is only "software", not "war, software", and UID 3 would have no match for "drive" in the Abstract as the word there is "driveway"
The example includes keywords “war” and “software”. War should not be considered a match for software.
Using a List approach I can return a TRUE / FALSE for the exact match, but I also want to show the user the keywords that were matched.
#"Added List of Title and Abstract Edited" = Table.AddColumn(#"Inserted Merged Column", "Title and Abstract Edited as List", each Text.Split([Title and Abstract Edited]," ")), #"Find Keywords in Title and Abstract List" = Table.AddColumn(#"Added List of Title and Abstract Edited", "Title and Abstract Edited Keyword Match", each List.ContainsAny([Title and Abstract Edited as List], KeywordsList)),
I have tried two other approaches to obtain the matches but they return the false positives.
#"Added Keyword Matches as list" = Table.AddColumn(#"Filtered Rows1", "Keyword Match List", (x) => List.Select(KeywordsList, each Text.PositionOf(x[#"Title and Abstract Edited"], _) >= 0)), #"Add Abstract Keyword Table" = Table.AddColumn(#"Extracted Values", "Keywords Found", each let CurrentText = [Abstract] in Table.SelectRows(#"Keywords In Use",each Text.Contains(CurrentText, [Keywords], Comparer.OrdinalIgnoreCase))),
Is there a way to return only exact word matches, like the List approach that returns TRUE / FALSE?
Also, this query takes a considerable time to run, with the processing indicating 200MB for the keywords, which are a list of 300 words, so seems like it is looping through that table many times. Is there a better approach to eliminate this?
Appreciate any assistance.
Solved! Go to Solution.
Yes, List.Instersect would do that job here:
#"Added List of Title and Abstract Edited" = Table.AddColumn(#"Inserted Merged Column", "Title and Abstract Edited as List", each Text.Split([Title and Abstract Edited]," ")), #"Find Keywords in Title and Abstract List" = Table.AddColumn(#"Added List of Title and Abstract Edited", "Title and Abstract Edited Keyword Match", each List.Intersect( { [Title and Abstract Edited as List], KeywordsList } ))
Then you can either expand or use Text.Combine to create a string with all keywords.
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Yes, List.Instersect would do that job here:
#"Added List of Title and Abstract Edited" = Table.AddColumn(#"Inserted Merged Column", "Title and Abstract Edited as List", each Text.Split([Title and Abstract Edited]," ")), #"Find Keywords in Title and Abstract List" = Table.AddColumn(#"Added List of Title and Abstract Edited", "Title and Abstract Edited Keyword Match", each List.Intersect( { [Title and Abstract Edited as List], KeywordsList } ))
Then you can either expand or use Text.Combine to create a string with all keywords.
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Works perfectly, thank you for the quick response.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
102 | |
48 | |
19 | |
13 | |
11 |