Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Match Word List with another Table and get the Count of Words

Hi @ImkeF / @MarcelBeug ,

 

A) How do i find the Item Counts?

 

I have a single column table named WordList containing a unique list of values, whose each word value i want to do a partial match with phrases in multiple Attribute columns (more than 255 columns) in another table named Data Extract and get a count of the Item codes that show these word values in their rows. I want to count only the First occurence of a word value for each item in a row.

 

e.g. A word value may occur in a single phrase multiple times OR across multiple Attributes for that Item in a row, but it should be counted as 1. So we will be counting Items and not how many times a value was found in an item row.

 

WordList Table:

 

Words

auto

bus

ingenius

natural

color

 

 

DataExtract Table:

 

ItemID    Attr1                       Attr2                        Attr3                      Attr4

10001      natural and soft     put in auto mode      so natural!               get auto; get auto mode

10002      bus at stop             natural icecream       boy is ingenius        more power

10003      let me out!             promote goodwill     simple simon           auto model

10004      matching color      natural color              natural essence       figure it!

10005      god will                  meet at bus stop       similar cases            rosetta stone

10006      mud ringworm       helper columns         rainbow color          geneva convention

10007      masked man          tenor voice                silica gel                   natural smell

10008      microcosm             guns of navarone      bloated cell              enigmatic man

10009      cellular sky             natural color              color coated            ingenius of him!

10010      ingenius body        maker chambers       old story                  minor changes

 

Question: Is it good to transform i.e. unpivot the Attribute columns first as column Attributes and Value?


Output Table 1
:

Words         ItemCount          %ItemCount

auto                   2                              =2/10

bus                     2                              =2/10

ingenius             3                              =3/10

natural               5                              =5/10

color                  3                              =3/10

 

 

 

 

B) If i click on any of the words in the WordList, each word should drilldown to the following Output Table:

 

Output Table 2:

 

e.g. If i clicked on the word Natural in above list, i should get the Attribute names that contained it, the unique Attribute Values that contained it and the Total Count of the Attribute Values found under that Attribute Name.

 

AttributeName         AttributeValue               AttributeValueCount

Attr1                           natural and soft                         1

Attr2                           natural icecream                        1

Attr2                           natural color                               2

Attr3                          so natural!                                   1

Attr3                          natural essence                           1

Attr4                          natural smell                               1

 

1 ACCEPTED SOLUTION

This is the query for your first part, where you want to see a 5 for the "natural" keyword:

 

let
    Source = DataExtract,
    #"Inserted Lowercased Text" = Table.AddColumn(Source, "Words", each List.Transform(Text.Split([AttributeValue], " "), Text.Lower), type list),
    #"Expanded Words" = Table.ExpandListColumn(#"Inserted Lowercased Text", "Words"),
    #"Merged Queries" = Table.NestedJoin(#"Expanded Words",{"Words"},Table.TransformColumns(WordList, {"Words", Text.Lower}),{"Words"},"WordList",JoinKind.LeftOuter),
    #"Expanded WordList" = Table.ExpandTableColumn(#"Merged Queries", "WordList", {"Words"}, {"Words.1"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded WordList", each ([Words.1] <> null)),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"ItemID"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),
    #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"AttributeName", "AttributeValue", "Words"}, {"AttributeName", "AttributeValue", "Words"})
in
    #"Expanded Result"

For the second part (where you expect a 7), just delete the last 2 steps of the query above.

 

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

29 REPLIES 29
Zubair_Muhammad
Community Champion
Community Champion

@Anonymous 

 

And this is the MEASURE which is used as VISUAL filter to get OUTPUT B

 

Measure =
IF (
    SEARCH (
        SELECTEDVALUE ( WordList[Words] ),
        SELECTEDVALUE ( DataExtract[AttributeValue] ),
        1,
        0
    ) > 0,
    1,
    0
)

Regards
Zubair

Please try my custom visuals
Anonymous
Not applicable

Hi @Zubair_Muhammad ,

 

myword is defined by not used anywhere. SELECTEDVALUE is not present in Excel Power Query. Also, when i enter the ItemCount_ calculated column, it goes on calculating and Excel PowerPivot screen freezes.

 

  • Can an optimized Measure be used instead of a calculated column?
  • Why is ItemCount suffixed with a "_" ?
  • ItemCount is also not used anywhere.

 

@Anonymous 

 

You can use MIN, MAX in place of selectedvalue
You can remove the "_"
I will write a MEASURE for you


Regards
Zubair

Please try my custom visuals
Anonymous
Not applicable

Hi @Zubair_Muhammad ,

 

For Output 1, the Item Count for each Word is not show in your pbix file. Only the Attributes count is shown when a Word is clicked in the Slicer. Is it supposed to be a separate Output Table and how to display the Attribute Count?

 

e.g.

Output Table 1 lists all the Words and the Item Counts for each of those words.

Output Table 2 lists all the Attribute Names, Attribute values and their Counts, related to word selected in Slicer.

 

@Anonymous 

 

Here is the MEASURE.

See the attached file as well

 

ItemCount =
VAR myword =
    MIN ( WordList[Words] )
RETURN
    COUNTROWS (
        FILTER (
            VALUES ( DataExtract[ItemID] ),
            NOT (
                ISEMPTY (
                    FILTER (
                        CALCULATETABLE ( VALUES ( DataExtract[AttributeValue] ) ),
                        SEARCH ( myword, [AttributeValue], 1, 0 )
                    )
                )
            )
        )
    )

Regards
Zubair

Please try my custom visuals
Anonymous
Not applicable

Hi @Zubair_Muhammad ,

 

MIN doesn't work with text fields in Excel and gives an error. Not sure how to make this work in Excel 2016. If you could show me in Excel, that would really help.

HI @Anonymous 

 

You can also use VALUES in place of Selectedvalue

Measure =
IF (
    SEARCH (
        VALUES ( WordList[Words] ),
        VALUES ( DataExtract[AttributeValue] ),
        1,
        0
    ) > 0,
    1,
    0
)

Regards
Zubair

Please try my custom visuals
Anonymous
Not applicable

@Zubair_Muhammad ,

 

Also, i think SEARCH does a partial searching of the word instead of finding the whole word in the Attribute Values. How can this measures be modified to search for whole words only?

Anonymous
Not applicable

@Zubair_Muhammad putting VALUES in the Measure gives an error "a table of multiple values was supplied where a single value was expected".

 

Is it possible for you to demo me an excel example?

Anonymous
Not applicable

Thanks @Zubair_Muhammad. I will have to go thru this to understand better how you achieved it, as i using Excel 2016 Pro 64 bit. Is it possible to recreate this solution in Excel using Slicer as a visual Filter?

 

Also, is it possible to achieve the same using Power Query transformation in Excel, as the DataExtract is huge (20+ million Item Rows & more than 255 Attribute columns)? Will there be a performance hit using the DAX measures?

 

@Anonymous 

 

I will try it later tonight with Power Query alone

But I think this solution which i provided above will be more efficient than a whole sole Power Query Solution


Regards
Zubair

Please try my custom visuals

Hi @Anonymous ,

still the open questions:

1) Shall the match be case sensitive?

2) What matches should be returned on "Autobus"?

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous
Not applicable

Hi @ImkeF ,

 

1) Case-insensitive

2) no match. No fuzzy logic needed.

 

This is the query for your first part, where you want to see a 5 for the "natural" keyword:

 

let
    Source = DataExtract,
    #"Inserted Lowercased Text" = Table.AddColumn(Source, "Words", each List.Transform(Text.Split([AttributeValue], " "), Text.Lower), type list),
    #"Expanded Words" = Table.ExpandListColumn(#"Inserted Lowercased Text", "Words"),
    #"Merged Queries" = Table.NestedJoin(#"Expanded Words",{"Words"},Table.TransformColumns(WordList, {"Words", Text.Lower}),{"Words"},"WordList",JoinKind.LeftOuter),
    #"Expanded WordList" = Table.ExpandTableColumn(#"Merged Queries", "WordList", {"Words"}, {"Words.1"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded WordList", each ([Words.1] <> null)),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"ItemID"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),
    #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"AttributeName", "AttributeValue", "Words"}, {"AttributeName", "AttributeValue", "Words"})
in
    #"Expanded Result"

For the second part (where you expect a 7), just delete the last 2 steps of the query above.

 

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous
Not applicable

Hi @ImkeF ,

 

Can this solution be replicated in Power Query for Excel 2016?

 

  • Initially why have you created a List column of Word values instead of just Duplicating the AttributeValue column and then splitting the words in the new column and then unpivoting them again? Is it to reduce the steps like Unpivoting again and again?
  • Also, i didn't understand why the Words.1 column was generated and why Group By doesn't show the distinct counts in the column.
  • Once the final step is generated, do i have to create pivot tables from the final transformed InputTable data and take counts?
  • I understood the unpivoting, splitting and merging part. But the rest of steps, how does it work?
Anonymous
Not applicable

Hi @ImkeF ,

 

I tried your solution as well as the modified one with the duplicated column. Here are the 2 solutions pasted from Advanced Editor:

 

Solution with Duplicated Column:

let
    Source = Excel.Workbook(File.Contents(FileName), null, true),
    Table1_Table = Source{[Item="Table1",Kind="Table"]}[Data],
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Table1_Table, {"UPC"}, "Attribute", "Value"),
    #"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"UPC", type text}, {"Attribute", type text}, {"Value", type text}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type",{"Attribute"},ExcludeList,{"ExcludeColumns"},"ExcludeList",JoinKind.LeftAnti),
    #"Duplicated Column" = Table.DuplicateColumn(#"Merged Queries", "Value", "Value - Copy"),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Duplicated Column", "Value - Copy", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Value - Copy.1", "Value - Copy.2", "Value - Copy.3", "Value - Copy.4", "Value - Copy.5", "Value - Copy.6", "Value - Copy.7", "Value - Copy.8", "Value - Copy.9", "Value - Copy.10", "Value - Copy.11", "Value - Copy.12", "Value - Copy.13", "Value - Copy.14", "Value - Copy.15", "Value - Copy.16", "Value - Copy.17"}),
    #"Unpivoted Other Columns1" = Table.UnpivotOtherColumns(#"Split Column by Delimiter", {"UPC", "Attribute", "Value"}, "Attribute.1", "Value.1"),
    #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns1",{"Attribute.1"}),
    #"Merged Queries1" = Table.NestedJoin(#"Removed Columns",{"Value.1"},WordList,{"Words"},"WordList",JoinKind.LeftOuter),
    #"Expanded WordList" = Table.ExpandTableColumn(#"Merged Queries1", "WordList", {"Words"}, {"Words"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded WordList", each ([Words] <> null)),

    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"UPC"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),
    #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"Attribute", "Value", "Words"}, {"Attribute", "Value", "Words"})
in
    #"Expanded Result"

Your Solution:

let
    Source = Excel.Workbook(File.Contents(FileName), null, true),
    Table1_Table = Source{[Item="Table1",Kind="Table"]}[Data],
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Table1_Table, {"UPC"}, "Attribute", "Value"),
    #"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"UPC", type text}, {"Attribute", type text}, {"Value", type text}}),
    #"Merged Queries" = Table.NestedJoin(#"Changed Type",{"Attribute"},ExcludeList,{"ExcludeColumns"},"ExcludeList",JoinKind.LeftAnti),
    #"Inserted Uppercased Text" = Table.AddColumn(#"Merged Queries", "Words", each List.Transform(Text.Split([Value], " "), Text.Upper), type list),
    #"Expanded Words" = Table.ExpandListColumn(#"Inserted Uppercased Text", "Words"),
    #"Merged Queries1" = Table.NestedJoin(#"Expanded Words",{"Words"},Table.TransformColumns(WordList, {"Words", Text.Upper}),{"Words"},"WordList",JoinKind.LeftOuter),
    #"Expanded WordList" = Table.ExpandTableColumn(#"Merged Queries1", "WordList", {"Words"}, {"Words.1"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded WordList", each ([Words.1] <> null)),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"UPC"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),
    #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"Attribute", "Value", "Words"}, {"Attribute", "Value", "Words"})
in
    #"Expanded Result"

 

I have only added another step to using an ExcludeList to Exclude some unwanted Attribute columns from Attribute names column using MERGE with LEFT ANTI.

 

  • Your query seems concise but i am getting different results from both your Query and duplicate columns one - the latter giving correct results.
  • How can the Duplicate column query be made more concise with less steps?
  • In the Duplicate column query, i had to split duplicated column, due to which i am getting hardcoded names e.g. "Value - Copy.1", "Value - Copy.2", "Value - Copy.3"... Can some of these steps be made more dynamic and concise without hardcoding names?

Once i know that the queries and pivots are giving the right output, eventually I plan to automate loading data to power query and data model. 

 

Thanks.

Please post link to sample file or adapt your solution into the file that I've attached in my 1st answer, so I can follow your steps. (As you're using different table- and column names than in my example).

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous
Not applicable

Hi @ImkeF ,

 

It is the same data, only the ITEM ID column is renamed to UPC. AttributeName column in Excel automatically becomes Attribute column and AttributeValue column becomes Value column when unpivoted.

 

ExcludeList table only consists of Attribute columns to remove, since original pivoted data may contain many unwanted columns like:

 

ExcludeList Table

 

ExcludeColumn

Attr13

Attr15
Attr20

Attr25

 

Sorry, but if you don't provide a sample file where your queries from above (both) work, I cannot help you any further here.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous
Not applicable

Hi @ImkeF ,

 

I have tried replicating your solution in Excel 2016 with the sample data. Here is the SampleFile for your perusal. I have included your solution as well as my duplicated column solution, but your solution looks concise, dynamic and cool! I have followed your instructions of removing the last 2 steps and now getting the desired output. I still have to check if this will work with the original data as it was taking a lot of time and did not complete till now. Is there a way to make the query steps more concise and optimized as i will be using VBA to input these steps as strings?

 

Could you explain your below steps?

   #"Inserted Uppercased Text" = Table.AddColumn(#"Merged Queries", "Words", each List.Transform(Text.Split([Value], " "), Text.Upper), type list),

and

    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"UPC"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),

For demo purpose, i included the External file as an INPUT Sheet TABLE. But i plan to automate this via VBA. If the INPUT TABLE was in an External file, then how do i create a parameter table (as shown in sheet) that records the Path & FileName and add this as an input to a STAGING QUERY say INPUT QUERY and then reference this query in the INPUT TABLE QUERY? This is to avoid some errors i am currently getting in my main file while doing this process.

 

Thanks.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.