Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I have 2 tables that i have imported to Excel 2016 PowerPivot Data Model.
Input Table (unpivoted):
UPC Attribute Value 1 Char1 OIL FREE 1 Char2 NO GLUTEN 1 Char3 ZERO PARABEN 1 Char4 HYPO ALLERGENIC 1 Char5 NO SODIUM; NO OIL; ORGANIC 1 Char8 NOT 2 Char6 CONTAINS JOJOBA OIL 3 Char2 NO GLUTEN 5 Char4 DOES NOT CONTAIN COMMON ALLERGENS 5 Char8 TO 7 Char1 FREE OF OIL 7 Char3 NO PARABEN 7 Char5 ZERO OIL; ZERO PARABEN; NO GLUTEN 8 Char8 BE 9 Char6 HAS NO OIL 10 Char2 GLUTEN FREE 10 Char3 FREE OF PARABEN 10 Char4 HYPOALLERGENIC 11 Char7 3% SODIUM 12 Char4 HYPO-ALLERGENIC 12 Char8 USED 13 Char1 CONTAINS NO OIL 14 Char4 HYPO ALLERGENIC 15 Char8 ANY 16 Char5 FREE OF GLUTEN 17 Char1 OIL FREE 18 Char1 OIL FREE 18 Char8 MORE 20 Char1 DOES NOT HAVE OIL 20 Char7 NO OIL
WordsList Table:
Words PARABEN OIL NO HYPO ALLERGEN SODIUM GLUTEN JOJOBA
I have to do a Partial Match of each Word in the WordsList table with the Value column in the InputTable using DAX and not Power Query. I have 2 Pivot Tables as shown below that i need to update with measures:
1st Pivot (based on WordList table, Words column) :
Words DistinctCountOfUPC %DistinctCountOfUPC ------------------------------------------------------------------------ OIL 8 53.33% NO 7 46.67% GLUTEN 5 33.33% ALLERGEN 5 33.33% HYPO 4 26.67% PARABEN 3 20.00% SODIUM 2 13.33% JOJOBA 1 6.67% ------------------------------------------------------------------------ Grand Total 20 100.00% ------------------------------------------------------------------------
2nd Pivot (based on InputTable - Attribute & Value columns):
This table gets filtered based on what Word gets selected in a Slicer shown below.
Attribute Value Count of Value ------------------------------------------------------------------------------- Char5 ZERO OIL; ZERO PARABEN; NO GLUTEN 1 Char5 NO SODIUM; NO OIL; ORGANIC 1 Char1 DOES NOT HAVE OIL 1 Char1 CONTAINS NO OIL 1 Char2 NO GLUTEN 2 Char6 HAS NO OIL 1 Char7 NO OIL 1 Char3 NO PARABEN 1 Char4 DOES NOT CONTAIN COMMON ALLERGENS 1 ------------------------------------------------------------------------ Grand Total 10 ------------------------------------------------------------------------
Slicer :
I can do a Partial Match in Power Query and then load to Data Model, but it will increase the number of Rows from 5M (50,954 Rows x 116 Columns) to 236 M (50,954 Rows x 116 Columns x 40 Words) i.e. for my original data and take an entire day to load to Data Model. So trying this way.
The problem i am facing is :
Any help would be most appreciated.
Thanks.
Solved! Go to Solution.
So you could use a pattern like the following for the Count of UPC and Value count measures. It seems to work on the small sample data set in this post, but I would be really concerned about the performance over larger datasets
Distinct UPC Count = COUNTROWS( CALCULATETABLE( DISTINCT(Input[UPC]) , FILTER(Input, IFERROR(SEARCH(SELECTEDVALUE(Wordslist[Words],""),Input[value]),-1)> 0) ) )
Value Count = COUNTROWS( CALCULATETABLE( Input , FILTER(Input, IFERROR(SEARCH(SELECTEDVALUE(Wordslist[Words],""),Input[value]),-1)> 0) ) )
Even though it will take longer to load, you may get much better query performance from the 230M row data set. How often does the data change, can you unpivot it and store it in a database to minimize the manipulation you have to do as you load it? If you are streaming that many rows direct from a database you should be able to load in less than 1.5 hours (assuming you have a 64bit machine otherwise you are likely to run into memory issues)
So you could use a pattern like the following for the Count of UPC and Value count measures. It seems to work on the small sample data set in this post, but I would be really concerned about the performance over larger datasets
Distinct UPC Count = COUNTROWS( CALCULATETABLE( DISTINCT(Input[UPC]) , FILTER(Input, IFERROR(SEARCH(SELECTEDVALUE(Wordslist[Words],""),Input[value]),-1)> 0) ) )
Value Count = COUNTROWS( CALCULATETABLE( Input , FILTER(Input, IFERROR(SEARCH(SELECTEDVALUE(Wordslist[Words],""),Input[value]),-1)> 0) ) )
Even though it will take longer to load, you may get much better query performance from the 230M row data set. How often does the data change, can you unpivot it and store it in a database to minimize the manipulation you have to do as you load it? If you are streaming that many rows direct from a database you should be able to load in less than 1.5 hours (assuming you have a 64bit machine otherwise you are likely to run into memory issues)
Hi @d_gosbell,
Just to give you a background - It is a 50K data extract that i Unpivot in Power Query, which increases the rows to 5M. Earlier i was doing the Join of a 40 Words Wordlist table with these 5M rows internally in Power Query and then loading to the Excel PowerPivot Data Model, which was making the rows increase to 236M thereby increasing the load time to >24 hours. I think this was happening because i was importing 1 source table from an external file and 2 source tables from within the tool into Power Query and then merging the sources. This design created multiple caches. Now when i moved all the source tables into a single External Source workbook, the load time reduced from >24 hours to 2 hours as a single cache gets created now. To further reduce the load time, i am now not merging the 40 Words Wordlist table with the 5M rows Input table in Power Query, but want to achieve the merge and calculations using DAX. Doing this, i am only loading the 5M rows to PowerPivot Data Model and then do the calculations. Hope you are getting my point.
BTW, i am using Excel 2016 64 bit on Windows 10 64 bit with all the latest updates. SELECTEDVALUE function is not present for Excel so the measures give errors. Is there an Alternative function that can be used as i am currently using FIRSTNONBLANK function which i think is giving me the desired results.
The imported data extract takes 2 minutes to load and then the 2 DAX (UPC Count & Value Count) calculations take some 1-2 minutes to calculate each word (with a 50K Rows x 116 Columns x 40 comparison Words file), but it is a huge improvement! Thanks.
UPC Count:
VAR GT = DISTINCTCOUNT(InputTable[UPC]) RETURN IF( HASONEFILTER(WordList[Words]), COUNTROWS( CALCULATETABLE( DISTINCT(InputTable[UPC]) , FILTER(InputTable, IFERROR(SEARCH(FIRSTNONBLANK(WordList[Words],""),InputTable[Value]),-1)> 0) ) ) , GT )
Value Count:
VAR GT = COUNTA(InputTable[Value]) RETURN IF(HASONEFILTER(WordList[Words]), COUNTROWS( CALCULATETABLE( InputTable , FILTER( InputTable, IFERROR(SEARCH(FIRSTNONBLANK(WordList[Words],1), InputTable[Value] ), -1)> 0 ) ) ) , GT )
Here are the 2 files for your perusal - Sample Tool, Dummy Data.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
113 | |
99 | |
75 | |
73 | |
49 |
User | Count |
---|---|
146 | |
108 | |
106 | |
90 | |
62 |