Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hi @ImkeF / @MarcelBeug ,
A) How do i find the Item Counts?
I have a single column table named WordList containing a unique list of values, whose each word value i want to do a partial match with phrases in multiple Attribute columns (more than 255 columns) in another table named Data Extract and get a count of the Item codes that show these word values in their rows. I want to count only the First occurence of a word value for each item in a row.
e.g. A word value may occur in a single phrase multiple times OR across multiple Attributes for that Item in a row, but it should be counted as 1. So we will be counting Items and not how many times a value was found in an item row.
WordList Table:
Words
auto
bus
ingenius
natural
color
DataExtract Table:
ItemID Attr1 Attr2 Attr3 Attr4
10001 natural and soft put in auto mode so natural! get auto; get auto mode
10002 bus at stop natural icecream boy is ingenius more power
10003 let me out! promote goodwill simple simon auto model
10004 matching color natural color natural essence figure it!
10005 god will meet at bus stop similar cases rosetta stone
10006 mud ringworm helper columns rainbow color geneva convention
10007 masked man tenor voice silica gel natural smell
10008 microcosm guns of navarone bloated cell enigmatic man
10009 cellular sky natural color color coated ingenius of him!
10010 ingenius body maker chambers old story minor changes
Question: Is it good to transform i.e. unpivot the Attribute columns first as column Attributes and Value?
Output Table 1:
Words ItemCount %ItemCount
auto 2 =2/10
bus 2 =2/10
ingenius 3 =3/10
natural 5 =5/10
color 3 =3/10
B) If i click on any of the words in the WordList, each word should drilldown to the following Output Table:
Output Table 2:
e.g. If i clicked on the word Natural in above list, i should get the Attribute names that contained it, the unique Attribute Values that contained it and the Total Count of the Attribute Values found under that Attribute Name.
AttributeName AttributeValue AttributeValueCount
Attr1 natural and soft 1
Attr2 natural icecream 1
Attr2 natural color 2
Attr3 so natural! 1
Attr3 natural essence 1
Attr4 natural smell 1
Solved! Go to Solution.
This is the query for your first part, where you want to see a 5 for the "natural" keyword:
let Source = DataExtract, #"Inserted Lowercased Text" = Table.AddColumn(Source, "Words", each List.Transform(Text.Split([AttributeValue], " "), Text.Lower), type list), #"Expanded Words" = Table.ExpandListColumn(#"Inserted Lowercased Text", "Words"), #"Merged Queries" = Table.NestedJoin(#"Expanded Words",{"Words"},Table.TransformColumns(WordList, {"Words", Text.Lower}),{"Words"},"WordList",JoinKind.LeftOuter), #"Expanded WordList" = Table.ExpandTableColumn(#"Merged Queries", "WordList", {"Words"}, {"Words.1"}), #"Filtered Rows" = Table.SelectRows(#"Expanded WordList", each ([Words.1] <> null)), #"Grouped Rows" = Table.Group(#"Filtered Rows", {"ItemID"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}), #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"AttributeName", "AttributeValue", "Words"}, {"AttributeName", "AttributeValue", "Words"}) in #"Expanded Result"
For the second part (where you expect a 7), just delete the last 2 steps of the query above.
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
@Anonymous
And this is the MEASURE which is used as VISUAL filter to get OUTPUT B
Measure = IF ( SEARCH ( SELECTEDVALUE ( WordList[Words] ), SELECTEDVALUE ( DataExtract[AttributeValue] ), 1, 0 ) > 0, 1, 0 )
Hi @Zubair_Muhammad ,
myword is defined by not used anywhere. SELECTEDVALUE is not present in Excel Power Query. Also, when i enter the ItemCount_ calculated column, it goes on calculating and Excel PowerPivot screen freezes.
@Anonymous
You can use MIN, MAX in place of selectedvalue
You can remove the "_"
I will write a MEASURE for you
Hi @Zubair_Muhammad ,
For Output 1, the Item Count for each Word is not show in your pbix file. Only the Attributes count is shown when a Word is clicked in the Slicer. Is it supposed to be a separate Output Table and how to display the Attribute Count?
e.g.
Output Table 1 lists all the Words and the Item Counts for each of those words.
Output Table 2 lists all the Attribute Names, Attribute values and their Counts, related to word selected in Slicer.
@Anonymous
Here is the MEASURE.
See the attached file as well
ItemCount = VAR myword = MIN ( WordList[Words] ) RETURN COUNTROWS ( FILTER ( VALUES ( DataExtract[ItemID] ), NOT ( ISEMPTY ( FILTER ( CALCULATETABLE ( VALUES ( DataExtract[AttributeValue] ) ), SEARCH ( myword, [AttributeValue], 1, 0 ) ) ) ) ) )
Hi @Zubair_Muhammad ,
MIN doesn't work with text fields in Excel and gives an error. Not sure how to make this work in Excel 2016. If you could show me in Excel, that would really help.
HI @Anonymous
You can also use VALUES in place of Selectedvalue
Measure = IF ( SEARCH ( VALUES ( WordList[Words] ), VALUES ( DataExtract[AttributeValue] ), 1, 0 ) > 0, 1, 0 )
Also, i think SEARCH does a partial searching of the word instead of finding the whole word in the Attribute Values. How can this measures be modified to search for whole words only?
@Zubair_Muhammad putting VALUES in the Measure gives an error "a table of multiple values was supplied where a single value was expected".
Is it possible for you to demo me an excel example?
Thanks @Zubair_Muhammad. I will have to go thru this to understand better how you achieved it, as i using Excel 2016 Pro 64 bit. Is it possible to recreate this solution in Excel using Slicer as a visual Filter?
Also, is it possible to achieve the same using Power Query transformation in Excel, as the DataExtract is huge (20+ million Item Rows & more than 255 Attribute columns)? Will there be a performance hit using the DAX measures?
@Anonymous
I will try it later tonight with Power Query alone
But I think this solution which i provided above will be more efficient than a whole sole Power Query Solution
Hi @Anonymous ,
still the open questions:
1) Shall the match be case sensitive?
2) What matches should be returned on "Autobus"?
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
This is the query for your first part, where you want to see a 5 for the "natural" keyword:
let Source = DataExtract, #"Inserted Lowercased Text" = Table.AddColumn(Source, "Words", each List.Transform(Text.Split([AttributeValue], " "), Text.Lower), type list), #"Expanded Words" = Table.ExpandListColumn(#"Inserted Lowercased Text", "Words"), #"Merged Queries" = Table.NestedJoin(#"Expanded Words",{"Words"},Table.TransformColumns(WordList, {"Words", Text.Lower}),{"Words"},"WordList",JoinKind.LeftOuter), #"Expanded WordList" = Table.ExpandTableColumn(#"Merged Queries", "WordList", {"Words"}, {"Words.1"}), #"Filtered Rows" = Table.SelectRows(#"Expanded WordList", each ([Words.1] <> null)), #"Grouped Rows" = Table.Group(#"Filtered Rows", {"ItemID"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}), #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"AttributeName", "AttributeValue", "Words"}, {"AttributeName", "AttributeValue", "Words"}) in #"Expanded Result"
For the second part (where you expect a 7), just delete the last 2 steps of the query above.
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Hi @ImkeF ,
Can this solution be replicated in Power Query for Excel 2016?
Hi @ImkeF ,
I tried your solution as well as the modified one with the duplicated column. Here are the 2 solutions pasted from Advanced Editor:
Solution with Duplicated Column:
let Source = Excel.Workbook(File.Contents(FileName), null, true), Table1_Table = Source{[Item="Table1",Kind="Table"]}[Data], #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Table1_Table, {"UPC"}, "Attribute", "Value"), #"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"UPC", type text}, {"Attribute", type text}, {"Value", type text}}), #"Merged Queries" = Table.NestedJoin(#"Changed Type",{"Attribute"},ExcludeList,{"ExcludeColumns"},"ExcludeList",JoinKind.LeftAnti), #"Duplicated Column" = Table.DuplicateColumn(#"Merged Queries", "Value", "Value - Copy"), #"Split Column by Delimiter" = Table.SplitColumn(#"Duplicated Column", "Value - Copy", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Value - Copy.1", "Value - Copy.2", "Value - Copy.3", "Value - Copy.4", "Value - Copy.5", "Value - Copy.6", "Value - Copy.7", "Value - Copy.8", "Value - Copy.9", "Value - Copy.10", "Value - Copy.11", "Value - Copy.12", "Value - Copy.13", "Value - Copy.14", "Value - Copy.15", "Value - Copy.16", "Value - Copy.17"}), #"Unpivoted Other Columns1" = Table.UnpivotOtherColumns(#"Split Column by Delimiter", {"UPC", "Attribute", "Value"}, "Attribute.1", "Value.1"), #"Removed Columns" = Table.RemoveColumns(#"Unpivoted Other Columns1",{"Attribute.1"}), #"Merged Queries1" = Table.NestedJoin(#"Removed Columns",{"Value.1"},WordList,{"Words"},"WordList",JoinKind.LeftOuter), #"Expanded WordList" = Table.ExpandTableColumn(#"Merged Queries1", "WordList", {"Words"}, {"Words"}), #"Filtered Rows" = Table.SelectRows(#"Expanded WordList", each ([Words] <> null)), #"Grouped Rows" = Table.Group(#"Filtered Rows", {"UPC"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}), #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"Attribute", "Value", "Words"}, {"Attribute", "Value", "Words"}) in #"Expanded Result"
Your Solution:
let Source = Excel.Workbook(File.Contents(FileName), null, true), Table1_Table = Source{[Item="Table1",Kind="Table"]}[Data], #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(Table1_Table, {"UPC"}, "Attribute", "Value"), #"Changed Type" = Table.TransformColumnTypes(#"Unpivoted Other Columns",{{"UPC", type text}, {"Attribute", type text}, {"Value", type text}}), #"Merged Queries" = Table.NestedJoin(#"Changed Type",{"Attribute"},ExcludeList,{"ExcludeColumns"},"ExcludeList",JoinKind.LeftAnti), #"Inserted Uppercased Text" = Table.AddColumn(#"Merged Queries", "Words", each List.Transform(Text.Split([Value], " "), Text.Upper), type list), #"Expanded Words" = Table.ExpandListColumn(#"Inserted Uppercased Text", "Words"), #"Merged Queries1" = Table.NestedJoin(#"Expanded Words",{"Words"},Table.TransformColumns(WordList, {"Words", Text.Upper}),{"Words"},"WordList",JoinKind.LeftOuter), #"Expanded WordList" = Table.ExpandTableColumn(#"Merged Queries1", "WordList", {"Words"}, {"Words.1"}), #"Filtered Rows" = Table.SelectRows(#"Expanded WordList", each ([Words.1] <> null)), #"Grouped Rows" = Table.Group(#"Filtered Rows", {"UPC"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}), #"Expanded Result" = Table.ExpandTableColumn(#"Grouped Rows", "Result", {"Attribute", "Value", "Words"}, {"Attribute", "Value", "Words"}) in #"Expanded Result"
I have only added another step to using an ExcludeList to Exclude some unwanted Attribute columns from Attribute names column using MERGE with LEFT ANTI.
Once i know that the queries and pivots are giving the right output, eventually I plan to automate loading data to power query and data model.
Thanks.
Please post link to sample file or adapt your solution into the file that I've attached in my 1st answer, so I can follow your steps. (As you're using different table- and column names than in my example).
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Hi @ImkeF ,
It is the same data, only the ITEM ID column is renamed to UPC. AttributeName column in Excel automatically becomes Attribute column and AttributeValue column becomes Value column when unpivoted.
ExcludeList table only consists of Attribute columns to remove, since original pivoted data may contain many unwanted columns like:
ExcludeList Table
ExcludeColumn
Attr13
Attr15
Attr20
Attr25
Sorry, but if you don't provide a sample file where your queries from above (both) work, I cannot help you any further here.
Imke Feldmann (The BIccountant)
If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!
How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries
Hi @ImkeF ,
I have tried replicating your solution in Excel 2016 with the sample data. Here is the SampleFile for your perusal. I have included your solution as well as my duplicated column solution, but your solution looks concise, dynamic and cool! I have followed your instructions of removing the last 2 steps and now getting the desired output. I still have to check if this will work with the original data as it was taking a lot of time and did not complete till now. Is there a way to make the query steps more concise and optimized as i will be using VBA to input these steps as strings?
Could you explain your below steps?
#"Inserted Uppercased Text" = Table.AddColumn(#"Merged Queries", "Words", each List.Transform(Text.Split([Value], " "), Text.Upper), type list),
and
#"Grouped Rows" = Table.Group(#"Filtered Rows", {"UPC"}, {{"Result", each Table.Distinct(_, {"Words"}), type table}}),
For demo purpose, i included the External file as an INPUT Sheet TABLE. But i plan to automate this via VBA. If the INPUT TABLE was in an External file, then how do i create a parameter table (as shown in sheet) that records the Path & FileName and add this as an input to a STAGING QUERY say INPUT QUERY and then reference this query in the INPUT TABLE QUERY? This is to avoid some errors i am currently getting in my main file while doing this process.
Thanks.
User | Count |
---|---|
103 | |
87 | |
78 | |
71 | |
69 |
User | Count |
---|---|
113 | |
99 | |
97 | |
72 | |
68 |