Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Match Word List with another Table and get the Count of Words

Hi @ImkeF / @MarcelBeug ,

 

A) How do i find the Item Counts?

 

I have a single column table named WordList containing a unique list of values, whose each word value i want to do a partial match with phrases in multiple Attribute columns (more than 255 columns) in another table named Data Extract and get a count of the Item codes that show these word values in their rows. I want to count only the First occurence of a word value for each item in a row.

 

e.g. A word value may occur in a single phrase multiple times OR across multiple Attributes for that Item in a row, but it should be counted as 1. So we will be counting Items and not how many times a value was found in an item row.

 

WordList Table:

 

Words

auto

bus

ingenius

natural

color

 

 

DataExtract Table:

 

ItemID    Attr1                       Attr2                        Attr3                      Attr4

10001      natural and soft     put in auto mode      so natural!               get auto; get auto mode

10002      bus at stop             natural icecream       boy is ingenius        more power

10003      let me out!             promote goodwill     simple simon           auto model

10004      matching color      natural color              natural essence       figure it!

10005      god will                  meet at bus stop       similar cases            rosetta stone

10006      mud ringworm       helper columns         rainbow color          geneva convention

10007      masked man          tenor voice                silica gel                   natural smell

10008      microcosm             guns of navarone      bloated cell              enigmatic man

10009      cellular sky             natural color              color coated            ingenius of him!

10010      ingenius body        maker chambers       old story                  minor changes

 

Question: Is it good to transform i.e. unpivot the Attribute columns first as column Attributes and Value?


Output Table 1
:

Words         ItemCount          %ItemCount

auto                   2                              =2/10

bus                     2                              =2/10

ingenius             3                              =3/10

natural               5                              =5/10

color                  3                              =3/10

 

 

 

 

B) If i click on any of the words in the WordList, each word should drilldown to the following Output Table:

 

Output Table 2:

 

e.g. If i clicked on the word Natural in above list, i should get the Attribute names that contained it, the unique Attribute Values that contained it and the Total Count of the Attribute Values found under that Attribute Name.

 

AttributeName         AttributeValue               AttributeValueCount

Attr1                           natural and soft                         1

Attr2                           natural icecream                        1

Attr2                           natural color                               2

Attr3                          so natural!                                   1

Attr3                          natural essence                           1

Attr4                          natural smell                               1

 

1 ACCEPTED SOLUTION

This is the query for your first part, where you want to see a 5 for the "natural" keyword:

 

let
    Source = DataExtract,
    #"Inserted Lowercased Text" = Table.AddColumn(Source, "Words", each List.Transform(Text.Split([AttributeValue], " "), Text.Lower), type list),
    #"Expanded Words" = Table.ExpandListColumn(#"Inserted Lowercased Text", "Words"),
    #"Merged Queries" = Table.NestedJoin(#"Expanded Words",{"Words"},Table.TransformColumns(WordList, {"Words", Text.Lower}),{"Words"},"WordList",JoinKind.LeftOuter),
    #"Expanded WordList" = Table.ExpandTableColumn(#"Merged Queries", "WordList", {"Words"}, {"Words.1"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded WordList", each ([Words.1] <> null)),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"ItemID"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),
    #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"AttributeName", "AttributeValue", "Words"}, {"AttributeName", "AttributeValue", "Words"})
in
    #"Expanded Result"

For the second part (where you expect a 7), just delete the last 2 steps of the query above.

 

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

29 REPLIES 29
Anonymous
Not applicable

Hi @ImkeF,

 

Sorry for bothering you again. I know i said no partial matches before. But is it possible to do a partial match above instead of an exact match? How will the above query change in such scenario?

 

e.g. the Word ALLERGEN may be found as ALLERGENICS, ALLERGENIC, ALLERGENS, ALLERGEN, HYPOALLERGENIC, HYPO-ALLERGENIC.

Thanks.

Hi @Anonymous,

the solution for the partial match would look like so:

 

let
    Source = Input,
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"UPC"}, "Attribute", "Value"),
    #"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"UPC", type text}, {"Attribute", type text}, {"Value", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Words", each List.Select(WordList[Words], (l) => Text.Contains([Value], l, Comparer.OrdinalIgnoreCase))),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Words] <> {})),
    #"Expanded Words" = Table.ExpandListColumn(#"Filtered Rows", "Words"),
    #"Grouped Rows" = Table.Group(#"Expanded Words", {"UPC"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),
    #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"Attribute", "Value", "Words"}, {"Attribute", "Value", "Words"})
in
    #"Expanded Result"

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous
Not applicable

Hi @ImkeF ,

 

I tried your concise solution, but not getting the partial match results in 2nd pivot table. I was able to write some M code that does the job, though it is not so quick, concise and ingenius as your solution. I observed that your code does not exclude the columns from the ExcludeColumns table but my code excludes.

 

I am attaching both Tools (your solution & my solution) for your perusal. Also attaching the Dummy Data.xlsx file that i have used with both the Tools. You can see the difference of results in the 2nd pivot table. I calculated the partial matches for my result and they are ok. But your code is somehow excluding some of the results, but don't know why as i didn't completely understand your code.

 

Finally the Grand Total in 1st Pivot should be 20 distinct UPCs, but it is showing only 15. Also the % is coming incorrect.

 

Thanks.

The Exclude-aspect is new to me. Could you please elaborate what you're trying to achieve with it?

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous
Not applicable

Sure @ImkeF ,

 

Using a Left-Anti Merge, i am able to remove from InputTable columns, the unwanted columns listed in the ExcludeColumns table e.g. Char8 column to be excluded from analysis.

 

 

P.S: As per my original query, the GrandTotals should be showing 20 (but showing 15) and the % should be calculated based on UPC count for a word / 20 (but showing UPC count for a word / 15). This is wrong in both the files.


I think after power query transforms the table, the rows reduce from 20 to 15 and thats the Distinct Count of UPC thats being picked up instead of the Original Distinct Count of UPC i.e. 20.

 

How to solve this problem with Power Query or DAX?

 

e.g. for the 1st pivot it should show:

 

Row Labels                      Distinct Count of UPC         %Distinct Count of UPC2

OIL840.00%
NO735.00%
GLUTEN525.00%
ALLERGEN525.00%
HYPO420.00%
PARABEN315.00%
SODIUM210.00%
JOJOBA15.00%

Grand Total                             20                             100.00%

Excluding column has nothing to do with with the request to detect the words, so these are steps that could be executed everywhere.

 

Your application is becoming too complex for me here, so I hope that someone else steps in.

I cannot help you any further here.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous
Not applicable

Hi @ImkeF ,

 

Thank you very much for your proactive help. I would really appreciate if you take a last look at the latest version of the Word Partial Match Tool and let me know how it could be optimized to run better.

 

I have written 2 DAX formulae to get the correct Grand Totals and Percentage of Grand Totals for distinct UPCs.

 

I also found out why the Grand Total was showing 15 Distinct UPCs instead of 20; the original distinct count of UPCs. At the time of Unpivoting the Other columns, any UPC rows that had all NULL values in them, were being removed automatically by Unpivot. So the count was showing less.

 

Current Issues to tackle :

 

  1. Tool M code is not optimized and takes a lot of time to load final Input Table data to Data Model. I imported 50K x 268 columns crosstab data. Tool excluded 168 columns out of it. So 50K x 117 columns data got unpivoted and other transformations were applied to this data. Finally, Partial matches were done on this data using a WordList of 40 words. However, the tool was running whole night since the time i hit bed till i got up in the morning and it is still running now. Works fast with a very small dataset, but fails on a large one. What is going wrong here?
  2. Another issue i see is when i import a new Excel workbook and Refresh All, its path and filename gets updated in the Parameter table, but in the staging Input table in Power Query editor, columns from an earlier import are observed as residue. How to seamlessly remove earlier data from tables and load new data in Power Query? I could be doing something wrong in the steps or missing some steps. Could this be the reason why tools taking time?

The VBA code in the tool does an import of excel file, stores path and filename to the Parameter Table cell and only does a Refresh All (Data menu > Refresh All).

 

Appreciate all your help.

Performance tuning can bedome a very time-consuming task, that I don't do for free.

You might find some guidance to do it yourself here: https://www.thebiccountant.com/speedperformance-aspects/ 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Zubair_Muhammad
Community Champion
Community Champion

@Anonymous 

 

Please see attached file. It hopefully solves both problems.

 

Steps are as follows

 

1) Unpivot Attribute columns

2) Use Calculated Column for item Count as

 

ItemCount_ =
VAR myword = [Words]
RETURN
    COUNTROWS (
        FILTER (
            VALUES ( DataExtract[ItemID] ),
            NOT (
                ISEMPTY (
                    FILTER (
                        CALCULATETABLE ( VALUES ( DataExtract[AttributeValue] ) ),
                        SEARCH ( [Words], [AttributeValue], 1, 0 )
                    )
                )
            )
        )
    )

 

 

sifar1.png


Regards
Zubair

Please try my custom visuals

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.