Solved: Re: Match Word List with another Table and get the... - Page 2

Anonymous · ‎02-24-2019

A) How do i find the Item Counts?

I have a single column table named WordList containing a unique list of values, whose each word value i want to do a partial match with phrases in multiple Attribute columns (more than 255 columns) in another table named Data Extract and get a count of the Item codes that show these word values in their rows. I want to count only the First occurence of a word value for each item in a row.

e.g. A word value may occur in a single phrase multiple times OR across multiple Attributes for that Item in a row, but it should be counted as 1. So we will be counting Items and not how many times a value was found in an item row.

WordList Table:

Words

auto

bus

ingenius

natural

color

DataExtract Table:

ItemID Attr1 Attr2 Attr3 Attr4

10001 natural and soft put in auto mode so natural! get auto; get auto mode

10002 bus at stop natural icecream boy is ingenius more power

10003 let me out! promote goodwill simple simon auto model

10004 matching color natural color natural essence figure it!

10005 god will meet at bus stop similar cases rosetta stone

10006 mud ringworm helper columns rainbow color geneva convention

10007 masked man tenor voice silica gel natural smell

10008 microcosm guns of navarone bloated cell enigmatic man

10009 cellular sky natural color color coated ingenius of him!

10010 ingenius body maker chambers old story minor changes

Question: Is it good to transform i.e. unpivot the Attribute columns first as column Attributes and Value?

Output Table 1:

Words ItemCount %ItemCount

auto 2 =2/10

bus 2 =2/10

ingenius 3 =3/10

natural 5 =5/10

color 3 =3/10

B) If i click on any of the words in the WordList, each word should drilldown to the following Output Table:

Output Table 2:

e.g. If i clicked on the word Natural in above list, i should get the Attribute names that contained it, the unique Attribute Values that contained it and the Total Count of the Attribute Values found under that Attribute Name.

AttributeName AttributeValue AttributeValueCount

Attr1 natural and soft 1

Attr2 natural icecream 1

Attr2 natural color 2

Attr3 so natural! 1

Attr3 natural essence 1

Attr4 natural smell 1

ImkeF · ‎02-24-2019

This is the query for your first part, where you want to see a 5 for the "natural" keyword:

let
    Source = DataExtract,
    #"Inserted Lowercased Text" = Table.AddColumn(Source, "Words", each List.Transform(Text.Split([AttributeValue], " "), Text.Lower), type list),
    #"Expanded Words" = Table.ExpandListColumn(#"Inserted Lowercased Text", "Words"),
    #"Merged Queries" = Table.NestedJoin(#"Expanded Words",{"Words"},Table.TransformColumns(WordList, {"Words", Text.Lower}),{"Words"},"WordList",JoinKind.LeftOuter),
    #"Expanded WordList" = Table.ExpandTableColumn(#"Merged Queries", "WordList", {"Words"}, {"Words.1"}),
    #"Filtered Rows" = Table.SelectRows(#"Expanded WordList", each ([Words.1] <> null)),
    #"Grouped Rows" = Table.Group(#"Filtered Rows", {"ItemID"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),
    #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"AttributeName", "AttributeValue", "Words"}, {"AttributeName", "AttributeValue", "Words"})
in
    #"Expanded Result"

For the second part (where you expect a 7), just delete the last 2 steps of the query above.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

Anonymous · ‎02-27-2019

Hi @ImkeF,

Sorry for bothering you again. I know i said no partial matches before. But is it possible to do a partial match above instead of an exact match? How will the above query change in such scenario?

e.g. the Word ALLERGEN may be found as ALLERGENICS, ALLERGENIC, ALLERGENS, ALLERGEN, HYPOALLERGENIC, HYPO-ALLERGENIC.

Thanks.

ImkeF · ‎02-27-2019

Hi @Anonymous,

the solution for the partial match would look like so:

let
    Source = Input,
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Source, {"UPC"}, "Attribute", "Value"),
    #"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"UPC", type text}, {"Attribute", type text}, {"Value", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Words", each List.Select(WordList[Words], (l) => Text.Contains([Value], l, Comparer.OrdinalIgnoreCase))),
    #"Filtered Rows" = Table.SelectRows(#"Added Custom", each ([Words] <> {})),
    #"Expanded Words" = Table.ExpandListColumn(#"Filtered Rows", "Words"),
    #"Grouped Rows" = Table.Group(#"Expanded Words", {"UPC"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),
    #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"Attribute", "Value", "Words"}, {"Attribute", "Value", "Words"})
in
    #"Expanded Result"

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous · ‎02-28-2019

Hi @ImkeF ,

I tried your concise solution, but not getting the partial match results in 2nd pivot table. I was able to write some M code that does the job, though it is not so quick, concise and ingenius as your solution. I observed that your code does not exclude the columns from the ExcludeColumns table but my code excludes.

I am attaching both Tools (your solution & my solution) for your perusal. Also attaching the Dummy Data.xlsx file that i have used with both the Tools. You can see the difference of results in the 2nd pivot table. I calculated the partial matches for my result and they are ok. But your code is somehow excluding some of the results, but don't know why as i didn't completely understand your code.

Finally the Grand Total in 1st Pivot should be 20 distinct UPCs, but it is showing only 15. Also the % is coming incorrect.

Thanks.

ImkeF · ‎02-28-2019

The Exclude-aspect is new to me. Could you please elaborate what you're trying to achieve with it?

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous · ‎02-28-2019

Sure @ImkeF ,

Using a Left-Anti Merge, i am able to remove from InputTable columns, the unwanted columns listed in the ExcludeColumns table e.g. Char8 column to be excluded from analysis.

P.S: As per my original query, the GrandTotals should be showing 20 (but showing 15) and the % should be calculated based on UPC count for a word / 20 (but showing UPC count for a word / 15). This is wrong in both the files.

I think after power query transforms the table, the rows reduce from 20 to 15 and thats the Distinct Count of UPC thats being picked up instead of the Original Distinct Count of UPC i.e. 20.

How to solve this problem with Power Query or DAX?

e.g. for the 1st pivot it should show:

Row Labels Distinct Count of UPC %Distinct Count of UPC2

OIL	8	40.00%
NO	7	35.00%
GLUTEN	5	25.00%
ALLERGEN	5	25.00%
HYPO	4	20.00%
PARABEN	3	15.00%
SODIUM	2	10.00%
JOJOBA	1	5.00%

Grand Total 20 100.00%

ImkeF · ‎02-28-2019

Excluding column has nothing to do with with the request to detect the words, so these are steps that could be executed everywhere.

Your application is becoming too complex for me here, so I hope that someone else steps in.

I cannot help you any further here.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous · ‎03-01-2019

Hi @ImkeF ,

Thank you very much for your proactive help. I would really appreciate if you take a last look at the latest version of the Word Partial Match Tool and let me know how it could be optimized to run better.

I have written 2 DAX formulae to get the correct Grand Totals and Percentage of Grand Totals for distinct UPCs.

I also found out why the Grand Total was showing 15 Distinct UPCs instead of 20; the original distinct count of UPCs. At the time of Unpivoting the Other columns, any UPC rows that had all NULL values in them, were being removed automatically by Unpivot. So the count was showing less.

Current Issues to tackle :

Tool M code is not optimized and takes a lot of time to load final Input Table data to Data Model. I imported 50K x 268 columns crosstab data. Tool excluded 168 columns out of it. So 50K x 117 columns data got unpivoted and other transformations were applied to this data. Finally, Partial matches were done on this data using a WordList of 40 words. However, the tool was running whole night since the time i hit bed till i got up in the morning and it is still running now. Works fast with a very small dataset, but fails on a large one. What is going wrong here?
Another issue i see is when i import a new Excel workbook and Refresh All, its path and filename gets updated in the Parameter table, but in the staging Input table in Power Query editor, columns from an earlier import are observed as residue. How to seamlessly remove earlier data from tables and load new data in Power Query? I could be doing something wrong in the steps or missing some steps. Could this be the reason why tools taking time?

The VBA code in the tool does an import of excel file, stores path and filename to the Parameter Table cell and only does a Refresh All (Data menu > Refresh All).

Appreciate all your help.

ImkeF · ‎03-01-2019

Performance tuning can bedome a very time-consuming task, that I don't do for free.

You might find some guidance to do it yourself here: https://www.thebiccountant.com/speedperformance-aspects/

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Zubair_Muhammad · ‎02-24-2019

@Anonymous

Please see attached file. It hopefully solves both problems.

Steps are as follows

1) Unpivot Attribute columns

2) Use Calculated Column for item Count as

ItemCount_ =
VAR myword = [Words]
RETURN
    COUNTROWS (
        FILTER (
            VALUES ( DataExtract[ItemID] ),
            NOT (
                ISEMPTY (
                    FILTER (
                        CALCULATETABLE ( VALUES ( DataExtract[AttributeValue] ) ),
                        SEARCH ( [Words], [AttributeValue], 1, 0 )
                    )
                )
            )
        )
    )

Regards
Zubair

Please try my custom visuals

Match Word List with another Table and get the Count of Words

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly