Solved: Re: Match Word List with another Table and get the...

Anonymous · ‎02-24-2019

A) How do i find the Item Counts?

I have a single column table named WordList containing a unique list of values, whose each word value i want to do a partial match with phrases in multiple Attribute columns (more than 255 columns) in another table named Data Extract and get a count of the Item codes that show these word values in their rows. I want to count only the First occurence of a word value for each item in a row.

e.g. A word value may occur in a single phrase multiple times OR across multiple Attributes for that Item in a row, but it should be counted as 1. So we will be counting Items and not how many times a value was found in an item row.

WordList Table:

Words

auto

bus

ingenius

natural

color

DataExtract Table:

ItemID Attr1 Attr2 Attr3 Attr4

10001 natural and soft put in auto mode so natural! get auto; get auto mode

10002 bus at stop natural icecream boy is ingenius more power

10003 let me out! promote goodwill simple simon auto model

10004 matching color natural color natural essence figure it!

10005 god will meet at bus stop similar cases rosetta stone

10006 mud ringworm helper columns rainbow color geneva convention

10007 masked man tenor voice silica gel natural smell

10008 microcosm guns of navarone bloated cell enigmatic man

10009 cellular sky natural color color coated ingenius of him!

10010 ingenius body maker chambers old story minor changes

Question: Is it good to transform i.e. unpivot the Attribute columns first as column Attributes and Value?

Output Table 1:

Words ItemCount %ItemCount

auto 2 =2/10

bus 2 =2/10

ingenius 3 =3/10

natural 5 =5/10

color 3 =3/10

B) If i click on any of the words in the WordList, each word should drilldown to the following Output Table:

Output Table 2:

e.g. If i clicked on the word Natural in above list, i should get the Attribute names that contained it, the unique Attribute Values that contained it and the Total Count of the Attribute Values found under that Attribute Name.

AttributeName AttributeValue AttributeValueCount

Attr1 natural and soft 1

Attr2 natural icecream 1

Attr2 natural color 2

Attr3 so natural! 1

Attr3 natural essence 1

Attr4 natural smell 1

ImkeF · ‎02-24-2019

This is the query for your first part, where you want to see a 5 for the "natural" keyword:

let
    Source = DataExtract,
    #"Inserted Lowercased Text" = Table.AddColumn(Source, "Words", each List.Transform(Text.Split([AttributeValue], " "), Text.Lower), type list),
    #"Expanded Words" = Table.ExpandListColumn(#"Inserted Lowercased Text", "Words"),
    #"Merged Queries" = Table.NestedJoin(#"Expanded Words",{"Words"},Table.TransformColumns(WordList, {"Words", Text.Lower}),{"Words"},"WordList",JoinKind.LeftOuter),
    #"Expanded WordList" = Table.ExpandTableColumn(#"Merged Queries", "WordList", {"Words"}, {"Words.1"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded WordList", each ([Words.1] <> null)),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"ItemID"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),
    #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"AttributeName", "AttributeValue", "Words"}, {"AttributeName", "AttributeValue", "Words"})
in
    #"Expanded Result"

For the second part (where you expect a 7), just delete the last 2 steps of the query above.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

Zubair_Muhammad · ‎02-24-2019

@Anonymous

And this is the MEASURE which is used as VISUAL filter to get OUTPUT B

Measure =
IF (
    SEARCH (
        SELECTEDVALUE ( WordList[Words] ),
        SELECTEDVALUE ( DataExtract[AttributeValue] ),
        1,
        0
    ) > 0,
    1,
    0
)

Regards
Zubair

Please try my custom visuals

Anonymous · ‎02-25-2019

Hi @Zubair_Muhammad ,

myword is defined by not used anywhere. SELECTEDVALUE is not present in Excel Power Query. Also, when i enter the ItemCount_ calculated column, it goes on calculating and Excel PowerPivot screen freezes.

Can an optimized Measure be used instead of a calculated column?
Why is ItemCount suffixed with a "_" ?
ItemCount is also not used anywhere.

Zubair_Muhammad · ‎02-25-2019

@Anonymous

You can use MIN, MAX in place of selectedvalue
You can remove the "_"
I will write a MEASURE for you

Regards
Zubair

Please try my custom visuals

Anonymous · ‎02-25-2019

Hi @Zubair_Muhammad ,

For Output 1, the Item Count for each Word is not show in your pbix file. Only the Attributes count is shown when a Word is clicked in the Slicer. Is it supposed to be a separate Output Table and how to display the Attribute Count?

e.g.

Output Table 1 lists all the Words and the Item Counts for each of those words.

Output Table 2 lists all the Attribute Names, Attribute values and their Counts, related to word selected in Slicer.

Zubair_Muhammad · ‎02-25-2019

@Anonymous

Here is the MEASURE.

See the attached file as well

ItemCount =
VAR myword =
    MIN ( WordList[Words] )
RETURN
    COUNTROWS (
        FILTER (
            VALUES ( DataExtract[ItemID] ),
            NOT (
                ISEMPTY (
                    FILTER (
                        CALCULATETABLE ( VALUES ( DataExtract[AttributeValue] ) ),
                        SEARCH ( myword, [AttributeValue], 1, 0 )
                    )
                )
            )
        )
    )

Regards
Zubair

Please try my custom visuals

Anonymous · ‎02-25-2019

Hi @Zubair_Muhammad ,

MIN doesn't work with text fields in Excel and gives an error. Not sure how to make this work in Excel 2016. If you could show me in Excel, that would really help.

Zubair_Muhammad · ‎02-25-2019

HI @Anonymous

You can also use VALUES in place of Selectedvalue

Measure =
IF (
    SEARCH (
        VALUES ( WordList[Words] ),
        VALUES ( DataExtract[AttributeValue] ),
        1,
        0
    ) > 0,
    1,
    0
)

Regards
Zubair

Please try my custom visuals

Anonymous · ‎02-25-2019

@Zubair_Muhammad ,

Also, i think SEARCH does a partial searching of the word instead of finding the whole word in the Attribute Values. How can this measures be modified to search for whole words only?

Anonymous · ‎02-25-2019

@Zubair_Muhammad putting VALUES in the Measure gives an error "a table of multiple values was supplied where a single value was expected".

Is it possible for you to demo me an excel example?

Anonymous · ‎02-24-2019

Thanks @Zubair_Muhammad. I will have to go thru this to understand better how you achieved it, as i using Excel 2016 Pro 64 bit. Is it possible to recreate this solution in Excel using Slicer as a visual Filter?

Also, is it possible to achieve the same using Power Query transformation in Excel, as the DataExtract is huge (20+ million Item Rows & more than 255 Attribute columns)? Will there be a performance hit using the DAX measures?

Zubair_Muhammad · ‎02-24-2019

@Anonymous

I will try it later tonight with Power Query alone

But I think this solution which i provided above will be more efficient than a whole sole Power Query Solution

Regards
Zubair

Please try my custom visuals

ImkeF · ‎02-24-2019

Hi @Anonymous ,

still the open questions:

1) Shall the match be case sensitive?

2) What matches should be returned on "Autobus"?

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous · ‎02-24-2019

Hi @ImkeF ,

1) Case-insensitive

2) no match. No fuzzy logic needed.

ImkeF · ‎02-24-2019

This is the query for your first part, where you want to see a 5 for the "natural" keyword:

let
    Source = DataExtract,
    #"Inserted Lowercased Text" = Table.AddColumn(Source, "Words", each List.Transform(Text.Split([AttributeValue], " "), Text.Lower), type list),
    #"Expanded Words" = Table.ExpandListColumn(#"Inserted Lowercased Text", "Words"),
    #"Merged Queries" = Table.NestedJoin(#"Expanded Words",{"Words"},Table.TransformColumns(WordList, {"Words", Text.Lower}),{"Words"},"WordList",JoinKind.LeftOuter),
    #"Expanded WordList" = Table.ExpandTableColumn(#"Merged Queries", "WordList", {"Words"}, {"Words.1"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded WordList", each ([Words.1] <> null)),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"ItemID"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),
    #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"AttributeName", "AttributeValue", "Words"}, {"AttributeName", "AttributeValue", "Words"})
in
    #"Expanded Result"

For the second part (where you expect a 7), just delete the last 2 steps of the query above.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous · ‎02-25-2019

Hi @ImkeF ,

Can this solution be replicated in Power Query for Excel 2016?

Initially why have you created a List column of Word values instead of just Duplicating the AttributeValue column and then splitting the words in the new column and then unpivoting them again? Is it to reduce the steps like Unpivoting again and again?
Also, i didn't understand why the Words.1 column was generated and why Group By doesn't show the distinct counts in the column.
Once the final step is generated, do i have to create pivot tables from the final transformed InputTable data and take counts?
I understood the unpivoting, splitting and merging part. But the rest of steps, how does it work?

Anonymous · ‎02-26-2019

Hi @ImkeF ,

I tried your solution as well as the modified one with the duplicated column. Here are the 2 solutions pasted from Advanced Editor:

Solution with Duplicated Column:

let
    Source = Excel.Workbook(File.Contents(FileName), null, true),
    Table1_Table = Source{[Item="Table1",Kind="Table"]}[Data],
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Table1_Table, {"UPC"}, "Attribute", "Value"),
    #"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"UPC", type text}, {"Attribute", type text}, {"Value", type text}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type",{"Attribute"},ExcludeList,{"ExcludeColumns"},"ExcludeList",JoinKind.LeftAnti),
    #"Duplicated Column" = Table.DuplicateColumn(#"Merged Queries", "Value", "Value - Copy"),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Duplicated Column", "Value - Copy", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Value - Copy.1", "Value - Copy.2", "Value - Copy.3", "Value - Copy.4", "Value - Copy.5", "Value - Copy.6", "Value - Copy.7", "Value - Copy.8", "Value - Copy.9", "Value - Copy.10", "Value - Copy.11", "Value - Copy.12", "Value - Copy.13", "Value - Copy.14", "Value - Copy.15", "Value - Copy.16", "Value - Copy.17"}),
    #"Unpivoted Other Columns1" = Table.UnpivotOtherColumns(#"Split Column by Delimiter", {"UPC", "Attribute", "Value"}, "Attribute.1", "Value.1"),
    #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns1",{"Attribute.1"}),
    #"Merged Queries1" = Table.NestedJoin(#"Removed Columns",{"Value.1"},WordList,{"Words"},"WordList",JoinKind.LeftOuter),
    #"Expanded WordList" = Table.ExpandTableColumn(#"Merged Queries1", "WordList", {"Words"}, {"Words"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded WordList", each ([Words] <> null)),

    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"UPC"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),
    #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"Attribute", "Value", "Words"}, {"Attribute", "Value", "Words"})
in
    #"Expanded Result"

Your Solution:

let
    Source = Excel.Workbook(File.Contents(FileName), null, true),
    Table1_Table = Source{[Item="Table1",Kind="Table"]}[Data],
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Table1_Table, {"UPC"}, "Attribute", "Value"),
    #"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"UPC", type text}, {"Attribute", type text}, {"Value", type text}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type",{"Attribute"},ExcludeList,{"ExcludeColumns"},"ExcludeList",JoinKind.LeftAnti),
    #"Inserted Uppercased Text" = Table.AddColumn(#"Merged Queries", "Words", each List.Transform(Text.Split([Value], " "), Text.Upper), type list),
    #"Expanded Words" = Table.ExpandListColumn(#"Inserted Uppercased Text", "Words"),
    #"Merged Queries1" = Table.NestedJoin(#"Expanded Words",{"Words"},Table.TransformColumns(WordList, {"Words", Text.Upper}),{"Words"},"WordList",JoinKind.LeftOuter),
    #"Expanded WordList" = Table.ExpandTableColumn(#"Merged Queries1", "WordList", {"Words"}, {"Words.1"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded WordList", each ([Words.1] <> null)),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"UPC"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),
    #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"Attribute", "Value", "Words"}, {"Attribute", "Value", "Words"})
in
    #"Expanded Result"

I have only added another step to using an ExcludeList to Exclude some unwanted Attribute columns from Attribute names column using MERGE with LEFT ANTI.

Your query seems concise but i am getting different results from both your Query and duplicate columns one - the latter giving correct results.
How can the Duplicate column query be made more concise with less steps?
In the Duplicate column query, i had to split duplicated column, due to which i am getting hardcoded names e.g. "Value - Copy.1", "Value - Copy.2", "Value - Copy.3"... Can some of these steps be made more dynamic and concise without hardcoding names?

Once i know that the queries and pivots are giving the right output, eventually I plan to automate loading data to power query and data model.

Thanks.

ImkeF · ‎02-26-2019

Please post link to sample file or adapt your solution into the file that I've attached in my 1st answer, so I can follow your steps. (As you're using different table- and column names than in my example).

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous · ‎02-26-2019

Hi @ImkeF ,

It is the same data, only the ITEM ID column is renamed to UPC. AttributeName column in Excel automatically becomes Attribute column and AttributeValue column becomes Value column when unpivoted.

ExcludeList table only consists of Attribute columns to remove, since original pivoted data may contain many unwanted columns like:

ExcludeList Table

ExcludeColumn

Attr13

Attr15
Attr20

Attr25

ImkeF · ‎02-26-2019

Sorry, but if you don't provide a sample file where your queries from above (both) work, I cannot help you any further here.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous · ‎02-26-2019

Hi @ImkeF ,

I have tried replicating your solution in Excel 2016 with the sample data. Here is the SampleFile for your perusal. I have included your solution as well as my duplicated column solution, but your solution looks concise, dynamic and cool! I have followed your instructions of removing the last 2 steps and now getting the desired output. I still have to check if this will work with the original data as it was taking a lot of time and did not complete till now. Is there a way to make the query steps more concise and optimized as i will be using VBA to input these steps as strings?

Could you explain your below steps?

   #"Inserted Uppercased Text" = Table.AddColumn(#"Merged Queries", "Words", each List.Transform(Text.Split([Value], " "), Text.Upper), type list),

and

    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"UPC"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),

For demo purpose, i included the External file as an INPUT Sheet TABLE. But i plan to automate this via VBA. If the INPUT TABLE was in an External file, then how do i create a parameter table (as shown in sheet) that records the Path & FileName and add this as an input to a STAGING QUERY say INPUT QUERY and then reference this query in the INPUT TABLE QUERY? This is to avoid some errors i am currently getting in my main file while doing this process.

Thanks.

Match Word List with another Table and get the Count of Words

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly