Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
kscwp20
Regular Visitor

Keyword search of multiple columns with multiple keywords

Hi,

 

Sorry in advance for the rather long question and on the use of terminology (or lack thereof)! I've looked at the similar posts in relation to keywords and they're useful but slightly off - primarily because of the multiple element of my query.

 

I'm trying to perform a keyword search (ideally choosing between fuzzy, contains and identical search matches) in which there are multiple lists (currently 😎 of keywords each containing words or phrases related to the theme of each list. e.g Colours List has red, blue, green; Shapes List has square, circle, yellow etc. Some lists however, contain the same words eg. the Fruits List has orange which also appears in the Colours List.

 

Using each of the lists, I'm trying to search a large dataset and specifically a few columns (currently 2) for which I want to search each record to see if it contains the words of each list.

 

The current solution I'm using involves the creation of these Lists and then using the following M code in order to search the two columns in the dataset- primarily using the Invoke Custom Function ("ICF") feature. Note - 'MyKeywords' in the language below is replaced by the name of the corresponding List.

 

= (String) =>
let
//check if values in MyKeywords is in String
MatchFound = List.Transform(List.Buffer(#"MyKeywords"), each Text.Contains(String, _, Comparer.OrdinalIgnoreCase)),
//index position of match found
Position = List.PositionOf(MatchFound, true),
//return null if Position is negative
Return = if Position < 0 then null else #"MyKeywords"{Position}
in
Return

 

The output of this is then a column where the keyword found related to the function is output against each record or an ERROR if no word is found. What this then rolls up into is 8 different functions which are used against the 2 separate columns giving a total of 16 ICF steps in my query. Currently, for a dummy data set of 20 rows, the time taken to process is fine, however for an end use case of approx. 12k rows I'm concerned it won't be quite as quick.

 

Additionally, I'm trying to also count how many times a keyword appears (across all the lists) so that they can be counted up and totalled - including if a record contains two keywords from one list. Currently if a record contains keywords from different lists, it shows as a "hit" as it will result in a match in the corresponding list output column. However, I can't tell if the record contains two or more keywords from one list (as mentioned previously) or if on rarer occasions, the record contains multiple keywords from multiple lists.

 

The context to understanding what keywords from which list are found along with the number of times each word from what list is found is that a scoring/weighting system is then applied to each of the records - dependent on which list each keyword is from and the number of keywords found in the record.

 

I've experimented with using IF TEXTCONTAINS and SEARCH functions, but being a few days into PowerBI - my experience doesn't take me too far.

 

Ideally, I wouldn't need to create Lists and can select a column in a table equivalent to lists and then use that column to do a VLOOKUP/INDEX MATCH type function on the multiple columns - that will reduce the steps in my query to begin with!

 

Any help or pointers would be really really helpful - thank you!

 

1 ACCEPTED SOLUTION

Hi @kscwp20 ,

 

1.Go to query editor>select "Key words table">select all column>unpivot columns:

Annotation 2020-04-16 181832.png

2. Create a ColumnHead table:

Annotation 2020-04-16 182014.png

3.Create 3 measures as below:

 

KeywordColor = 
var  pn = SELECTEDVALUE('Table'[Project Name (PN)])
var pt = SELECTEDVALUE('Table'[Project Type (PT)])
return CONCATENATEX(FILTER(DISTINCT('Keywords table'[Value]),CONTAINSSTRING(pn,[Value]) || CONTAINSSTRING(pt,[Value])),[Value],",")
Count = 
var  pn = SELECTEDVALUE('Table'[Project Name (PN)])
var pt = SELECTEDVALUE('Table'[Project Type (PT)])
return COUNTROWS(FILTER(DISTINCT('Keywords table'[Value]),CONTAINSSTRING(pn,[Value]) || CONTAINSSTRING(pt,[Value])))

 

 

Measure 2 = 

    SWITCH (
        SELECTEDVALUE ( 'CoulmnHeader'[Value] ),
        "keyword", [KeywordColor],
        "count", ""& [Count]
    )

 

 

Finally you will see:

Annotation 2020-04-16 182216.png

For the related .pbix file,pls click here.

 

Best Regards,
Kelly
Did I answer your question? Mark my post as a solution!

View solution in original post

3 REPLIES 3
Greg_Deckler
Super User
Super User

Sample data and expected output would be fantastic. Please see this post regarding How to Get Your Question Answered Quickly: https://community.powerbi.com/t5/Community-Blog/How-to-Get-Your-Question-Answered-Quickly/ba-p/38490

 

I'm sure it would help @ImkeF and @edhans or someone like them to answer your question.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Sorry Greg - thanks for the advice.

 

I've detailed below sample data (a few rows) and my current methodology and expected/ideal output.

 

Happy to clarify further if needed.

 

Thanks

 

 

Sample Data:

 Dataset - Columns to look through
Record No.Project Name (PN)Project Type (PT)
1Desert SnakeRed Services
2Green ShootsHound Dog
3Purple TigerBear and Tuna Forensics
4Holy CowFast Apple

 

Keyword Lists - each list contains keywords
ColourFruitsAnimalsFish
RedAppleSnakeTuna
BlueBananaLionShark
GreenOrangeTigerMackerel
OrangeGrapeBearSardine
PurpleLemonDogDolphin

 

Current output:

Colour - PNColour - PTFruits - PNFruits - PTAnimals - PNAnimals - PTFish - PNFish - PT
 null RednullnullSnakenullnullnull
GreennullnullnullDognullnullnull
PurplenullnullnullTigerBearnullTuna
nullnullnullAppleCownullnullnull

 

To get the count, I then unpivot the dataset to create the "Attribute" as the Keyword List Title and "Value" as the keyword.

Then Group rows by Project Name/Project Type and counting the number of rows (i.e number of keywords).

 

Ideal/expected output:

 

Keywords   Count    
ColourFruitsAnimalsFishColourFruitsAnimalsFishTotal Count
Red Snake 1 1 2
Green Dog 1 1 2
Purple Tiger, BearTuna1 214
 AppleCow  11 2

 

Hi @kscwp20 ,

 

1.Go to query editor>select "Key words table">select all column>unpivot columns:

Annotation 2020-04-16 181832.png

2. Create a ColumnHead table:

Annotation 2020-04-16 182014.png

3.Create 3 measures as below:

 

KeywordColor = 
var  pn = SELECTEDVALUE('Table'[Project Name (PN)])
var pt = SELECTEDVALUE('Table'[Project Type (PT)])
return CONCATENATEX(FILTER(DISTINCT('Keywords table'[Value]),CONTAINSSTRING(pn,[Value]) || CONTAINSSTRING(pt,[Value])),[Value],",")
Count = 
var  pn = SELECTEDVALUE('Table'[Project Name (PN)])
var pt = SELECTEDVALUE('Table'[Project Type (PT)])
return COUNTROWS(FILTER(DISTINCT('Keywords table'[Value]),CONTAINSSTRING(pn,[Value]) || CONTAINSSTRING(pt,[Value])))

 

 

Measure 2 = 

    SWITCH (
        SELECTEDVALUE ( 'CoulmnHeader'[Value] ),
        "keyword", [KeywordColor],
        "count", ""& [Count]
    )

 

 

Finally you will see:

Annotation 2020-04-16 182216.png

For the related .pbix file,pls click here.

 

Best Regards,
Kelly
Did I answer your question? Mark my post as a solution!

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.