Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
BWilkinson
Frequent Visitor

String lookup with 0 to N partial matches

In my table, I have a column meant for comments which has a short descriptor paragraph. Within this paragraph there may be certain keywords, which are mapped to specific string value categories (I currently have them in a separate lookup table). What I am looking to do is add a column that is a concatenation of all the different categories for which the corresponding keyword is found in the paragraph. 

 

Here is the pseudo-code I have written out for my problem:

For each word in the paragraph
    If the word is in the keyword column
        Append the corresponding category

 

Here is an example of the results I'm looking for:
PowerBI Multiple String Lookup Example.png

 

 

 

 

 

 

Is this possible in PowerBI? Any help would be greatly appreciated. 

 

1 ACCEPTED SOLUTION

Hi @BWilkinson 

 

 Thanks for giving a toguh problem to solve, it took me almost  3 hours to resolve.

 

It is a bit convuluted way of working I could come out in DAX. 

 

I had followed Marc Russo's article

http://sqlblog.com/blogs/marco_russo/archive/2011/12/30/string-comparison-in-dax.aspx

 

go through the comments and reply by Marc Russo to Tommy.

 

Ok based on that.

 

1. I created the raw text file as my source data called -  TextData containing the column Text  values same as your sample.

2. I created another table called Words which contains two colums, Word and ID.  Word is the list of search terms and ID is a running sequence number.  Example -  Dog,1 ;  Test,2  etc,

3. I created a column in TextData table as under.

  

                        Found =    calculate(

                                                                    if(   countrows(values('Words'[ID]))>1,  0,  values(('Words'[ID]))  ),

                                                       filter(

                                                                       all('Words'[Word]),

                                                                                 search('Words'[Word],TextData[Text],1,0)<>0

                                                                )

                                                  )

 For explanation see the MArc Russo's article.

 

4. I created a table called Category containing ID and Cat example - 1, Animal Food ;  2 , Paragraph

5. Created a link between the Found and the Category table on ID using manage relationship.

6. Built a table report

   Capture12.GIF

 

7. Hope this is what you wanted.

 

8. There are few issues you should be aware that this will only search for the occurance of the search word in the data file. If it has more than one word in the current row then it will not display the corresponding category.

 

Let me know if this works for you.

 

Please accept this as solution if it works and give KUDOS.

 

Cheers

 

CheenuSing

  

 

 

Did I answer your question? Mark my post as a solution and also give KUDOS !

Proud to be a Datanaut!

View solution in original post

7 REPLIES 7
smpa01
Super User
Super User

This is how I solved it (using M)

 

Table 4

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcq1IzC3ISVXSUQpJLS5RitWJVnLMy8xNzAGKuOSngwXc8vNTgNyAzKqqRKXYWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Keyword = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Category", type text}, {"Keyword", type text}})
in
    #"Changed Type"

Table4Table4

List X1

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45Wcq1IzC3ISVXSUQpJLS5RitWJVnLMy8xNzAGKuOSngwXc8vNTgNyAzKqqRKXYWAA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Category = _t, Keyword = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Category", type text}, {"Keyword", type text}}),
    #"Removed Other Columns" = Table.SelectColumns(#"Changed Type",{"Keyword"}),
    Keyword = #"Removed Other Columns"[Keyword]
in
    Keyword

X1X1

Table 3

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WCsnILFYAokSFktTiEoWCxKLE9KLEggylWJ1oJd9KhZT8dIXEklSF3EqFgsyqqkSwOFhTRmKxQl6+QnZqZXl+UUqxUmwsAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Paragraph = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Paragraph", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Custom", each (X1)),
    #"Expanded Custom" = Table.ExpandListColumn(#"Added Custom", "Custom"),
    #"Added Custom1" = Table.AddColumn(#"Expanded Custom", "Custom.1", each Text.Contains([Paragraph],[Custom],Comparer.OrdinalIgnoreCase)),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom1", each ([Custom.1] = true)),
    #"Merged Queries" = Table.NestedJoin(#"Filtered Rows",{"Custom"},Table4,{"Keyword"},"Table4",JoinKind.LeftOuter),
    #"Expanded Table4" = Table.ExpandTableColumn(#"Merged Queries", "Table4", {"Category"}, {"Category"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Table4",{"Custom.1"})
in
    #"Removed Columns"

Table3Table3

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs
v-huizhn-msft
Employee
Employee

Hi @BWilkinson,

Use DAX lookup value need the two fields are same. For your scenairo, all th data type is text. So the upper and lowercase values of each letter is very important. For example, "Test" is not equal to "test", so I create the following sample table.

1.PNG

First, you need to split the Paragraph by the split column button or Power Query statement.

2.PNG

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("NYwxCsMwFEOvIv6cS3Rot27djAeBTWwa5xvb0Dqn70+gICR4POScvFLusBAj9oHKxrWxJlnkBDb3L0vdovjFyXMi6AqOiDJR83HQDEPWtz0Xbpd2nSZ27Ip3nB9toZvx9x+qQbz/AQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Paragraph = _t, Keyword = _t, Category = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Paragraph", type text}, {"Keyword", type text}, {"Category", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Paragraph", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Paragraph.1", "Paragraph.2", "Paragraph.3", "Paragraph.4", "Paragraph.5"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Paragraph.1", type text}, {"Paragraph.2", type text}, {"Paragraph.3", type text}, {"Paragraph.4", type text}, {"Paragraph.5", type text}})
in
    #"Changed Type1"

You will get the following formula.

4.PNG

For here you need to compare very new column to the very word in Keyword column value. It's really complex, because the length of [Paragraph] is not sure. And we usually campare different value in each row, like [Paragraph].This to [Keyword].test, you need to campare the  [Paragraph].This to all the values in [Keyword][test, dog, pizza], [there is regular rules. And DAX is unsupported fuzzy seach, so it's impossible to do it using DAX. Thanks for understanding.

Best Regards,
Angelia

Thanks for the response, but this is not what I'm looking for. Based on your suggested solution I would have to manually create M*N lookup columns, where M is the length of my longest paragraph and N is the number of keywords. 

 

At that point I might as well just use a massive if contains statement in one power query column. 

Hi @BWilkinson 

 

 Thanks for giving a toguh problem to solve, it took me almost  3 hours to resolve.

 

It is a bit convuluted way of working I could come out in DAX. 

 

I had followed Marc Russo's article

http://sqlblog.com/blogs/marco_russo/archive/2011/12/30/string-comparison-in-dax.aspx

 

go through the comments and reply by Marc Russo to Tommy.

 

Ok based on that.

 

1. I created the raw text file as my source data called -  TextData containing the column Text  values same as your sample.

2. I created another table called Words which contains two colums, Word and ID.  Word is the list of search terms and ID is a running sequence number.  Example -  Dog,1 ;  Test,2  etc,

3. I created a column in TextData table as under.

  

                        Found =    calculate(

                                                                    if(   countrows(values('Words'[ID]))>1,  0,  values(('Words'[ID]))  ),

                                                       filter(

                                                                       all('Words'[Word]),

                                                                                 search('Words'[Word],TextData[Text],1,0)<>0

                                                                )

                                                  )

 For explanation see the MArc Russo's article.

 

4. I created a table called Category containing ID and Cat example - 1, Animal Food ;  2 , Paragraph

5. Created a link between the Found and the Category table on ID using manage relationship.

6. Built a table report

   Capture12.GIF

 

7. Hope this is what you wanted.

 

8. There are few issues you should be aware that this will only search for the occurance of the search word in the data file. If it has more than one word in the current row then it will not display the corresponding category.

 

Let me know if this works for you.

 

Please accept this as solution if it works and give KUDOS.

 

Cheers

 

CheenuSing

  

 

 

Did I answer your question? Mark my post as a solution and also give KUDOS !

Proud to be a Datanaut!

Thank you very much for linking that article and for providing your solution. I was able to slightly modify your solution to get what I was looking for.

 

Is there any particular reason you split my lookup table into two queries? By keeping my one lookup table, and changing the "0" in the if statement to a CONCATENATEX(Words, [Category], " "), I was able to properly show multiple categories. 

Did you experience issues with the IF statement where it returned index values of zero when there was an actual match? If so, can you please share your solution? I don’t need to concatenate just want to lookup the partial string against the lookup table and return the corresponding index number.

Thank you

Hi @BWilkinson,

Yeah, because the length of pragrah is not sure, so there is no formula to operate them at once. I am not specific for Power Query, you can post it to Power Query forum to ge more professional support.

Best Regards,
Angelia

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.