Optimizing Measures to avoid Out of memory errors

Anonymous · ‎11-21-2018

I need help in understanding how to optimize some measures and have a few questions. My system configuration is :

OS : Windows 10 (64 bit)
Memory : 8 GB RAM
MSOffice : Office 2016 Pro (64 bit)
Page File Size (Minimum MB) : 12888 MB
Page File Size (Maximum MB) : 49152 MB
"Disable Hardware Graphics Accelaration" is checked.
"Ignore other applications that use Dynamic Data Exchange (DDE)".

I am using PowerQuery to import 1.12 GB of external CSV data from multiple CSV files. After transforming that data by removing unnecessary columns, changing formats etc and adding to PowerPivot Data Model, the size is now reduced to 555MB.

I have created 2 tables in PowerPivot:

fData table from multiple csv's.
dDate table from a linked spreadsheet table. I have then deleted the Spreadsheet table and removed the link from dDate table.
dDate is linked to fData on Date ---> Item Appearance Date.

I have created a Simple Pivot and added Slicers on sheet for the following:

Year (from dDate table)
Country, Category, Brand Owner, Char Description, Char Value (from fData table)

I have the following measures created:

DistinctCount:=DISTINCTCOUNT(fData[Char Value])

TotalCountWithAllFiltersRemoved:=IF(CALCULATE( COUNTA('fData'[Char Value]), ALL(
'fData'[Category],
'fData'[Country],
'fData'[Brand Owner],
'fData'[Item Appearance Date],
'fData'[Char Description]
 )
) = COUNTA('fData'[CHAR VALUE]), COUNTA('fData'[Char Value])
)

UniquesCount:=IF(
	HASONEFILTER('fData'[Category]) 
	&& HASONEFILTER('fData'[Country]) 
	&& 	HASONEFILTER('fData'[Brand Owner]) 
	&& HASONEFILTER('fData'[Item Appearance Date])
	&& HASONEFILTER('fData'[Char Description]) 
	&& HASONEFILTER('fData'[Char Value])
,
[TotalCountWithAllFiltersRemoved],
    SUMX (
        SUMMARIZE (
		'fData', 
		[Category],
		[Country],
		[Brand Owner],
		[Item Appearance Date],
		[Char Description],
		[Char Value] 
        ),
        [TotalCountWithAllFiltersRemoved]
    )
)

UniqueChars:=IF(CALCULATE( COUNTA('fData'[Char Value]), ALL(
'fData'[Category],
'fData'[Country],
'fData'[Brand Owner],
'fData'[Item Appearance Date],
'fData'[Char Description]
 )
) = COUNTA('fData'[Char Value]), 1
)

UniqueCharsCount:=IF(
		HASONEFILTER('fData'[Category]) 
	&& HASONEFILTER('fData'[Country]) 
	&& 	HASONEFILTER('fData'[Brand Owner]) 
	&& HASONEFILTER('fData'[Item Appearance Date]) 
	&& HASONEFILTER('fData'[Char Description]) 
	&& HASONEFILTER('fData'[Char Value]) 
,
[UniqueChars],
    SUMX (
        SUMMARIZE (
		'fData', 
		[Category],
		[Country],
		[Brand Owner],
		[Item Appearance Date],
		[Char Description],
		[Char Value]
        ),
        [UniqueChars]
    )
)

My Questions:

Measures Issues:
- Is it correct to base the Year Slicer on the Year field in the dDate table, instead of Item Appearance Date from fData table?
- The above measures "UniquesCount" and "UniqueCharsCount" when applied to Pivot, do not give the Unique Char Values that appeared every Year. This is evident when i apply a Conditional formatting on the Char Values list for Uniques. Such Char Values newly appeared in that Year i.e. they do not repeat every year. How do i create such a Meaure?
- I would also like to find the Unique Counts of Char Values for SAMEPERIODLASTYEAR. Such Char Values newly appeared in that Year i.e. they do not repeat every year. How do i create such a Meaure?
- How do i also show the FirstDate a Char Value appeared in a particular Country (i.e. Market) or Category or for a Brand Owner (i.e. Manufacturer)? How do i create such a measure?

Memory issues:
- If i add the DistinctCount measure or add a "Distinct Count of Char Value" or a "Count of Char Value" in Values section of Pivot, the Pivot table calculates and renders quickly. But if i add any of the other Measures, it mostly results in the following Memory error:
- How do i optimize the above measures for better and quick performance?
- Why should the file size show 555MB when i have loaded external data as Connection to Model? Is it because i had clicked saved data to model? How can i conveniently share this file easily with others excluding the large size?

Any help would be most appreciated.

Anonymous · ‎11-27-2018

Hi,

Can anybody from the PowerBI community go through the comments and help quickly?

Here is the sample testdata.xlsx file.

Stachu · ‎12-03-2018

hi @Anonymous, I didn't have time to look into it earlier
can you have a look at this measure, I believe it gives the correct values e.g. 0.12OZ has 1 and 0.15OZ has null as it's was sold in 2004 & 2005)

UniquesCount =
VAR _Chars =
    VALUES ( fData[Char Value] )
VAR _CharsAllYears =
    CALCULATETABLE ( fData, _Chars, ALL ( dDate[Year] ) )
VAR _CharsSummary =
    GROUPBY (
        _CharsAllYears,
        fData[Char Value],
        "MaxYear", MAXX ( CURRENTGROUP (), RELATED ( dDate[Year] ) ),
        "MinYear", MINX ( CURRENTGROUP (), RELATED ( dDate[Year] ) )
    )
VAR _SameYearChars =
    FILTER ( _CharsSummary, [MaxYear] = [MinYear] )
RETURN
    COUNTROWS ( _SameYearChars )

Did I answer your question? Mark my post as a solution!
Thank you for the kudos 🙂

Anonymous · ‎12-03-2018

Hi @Stachu,

Thanks for your quick response.

This measure is pretty fast!

A few observations:

It works well when i apply Year from dDate table and Char Value from fData table to Rows section of Pivot. I check using Conditional Formatting on the Char Values. However, if i add Country, Category or any other columns to the Columns section of pivot, i do not get unique Char Values in Rows anymore. I get a yellow message in Pivot Filter section saying "Relationships need to be created". This i think is because of Many-to-Many relationship in the fData table (see my previous posts or messages in your Inbox for the columns). I tried solving this by creating the addtional single-column tables for relationship with fData table e.g. dBrandOwner, dDate, dCharDescription, dCountry, dCategory etc which contain distinct values collected from fData only. But i think i does not solve the issue. How do i resolve such M2M issues?
Will any other measures i create work properly, if i use this measure in the other measures? e.g. =CALCULATE ( [UniquesCount], filter, filter, filter....)
Can you please also answer my questions from my 1st post & 2nd post?
Can you please briefly explain how this measure you have created works?

Best,

Sifar

Stachu · ‎12-04-2018

I'm still not clear what you want to measure exactly - e.g. if there is [Char Value] in 1999 in France and in 2000 in DE should it be counted as unique or not?
regarding the error message - do you have relationships between dimensions tables? in order for them to work there should be 1 to many relationship, I don't really see why it would be many to many, and I don't think they're necessary in fData if you have them in dimension tables

Did I answer your question? Mark my post as a solution!
Thank you for the kudos 🙂

Anonymous · ‎12-04-2018

Hi @Stachu,

I have updated the TestData Workbook with dimension tables (and a Metrics table to avoid clutter) that i have created from the Facts table i.e. fData table, so it acts as a Cross table, just to solve the M2M relationship issue.
- e.g. there exists M2M between :
  - Category and Country
  - Country and BrandOwner
  - Category and BrandOwner
  - Char Description and Char Value

You can see it in the relationship window in PowerPivot. Am i doing it correctly?

If you see the pivots i have created, the 1st pivot shows the Unique Char values (names) per year, but the UniquesCount metric does not show how many times they appeared in that year. That is missing!

In the 2nd Pivot, if i put Country dimension in Columns section of pivot to see in which Country this Unique Char value originated, then the Char Values show Duplicates also (i have used Conditional Formatting to highlight Uniques). e.g. POWDER found in 2010, 2012.

if we get 2 Years of data, can we compare Past Year vs Previous Year to determine "Emerging" or "Trending" values?

How do i show the FirstDate (Item Appearance Date) when DISTINCT Char Values launched in a Country OR a Category to see a timeline Trend of how it travelled from one Country OR Category to another over the years?

Anything else interesting that might inform what additional insights we can find from this data.

Hope this makes it clear.

Stachu · ‎12-04-2018

OK, so I guess something like this should work, as long as you add the additional criteria you want to ignore to the
_CharsAll

UniquesCount:=
VAR _Chars =
 VALUES ( fData[Char Value] )
VAR _CharsAll =
 CALCULATETABLE ( fData, _Chars, ALL ( dDate[Year] ), ALL(dCountry[Country]) ) --add other filters here
VAR _CharsSummary =
 GROUPBY (
 _CharsAll,
 fData[Char Value],
 "MaxYear", MAXX ( CURRENTGROUP (), RELATED ( dDate[Year] ) ),
 "MinYear", MINX ( CURRENTGROUP (), RELATED ( dDate[Year] ) ),
 "NrOfChars", COUNTAX(CURRENTGROUP(), fData[Char Value])
 )
VAR _SameYearChars =
 FILTER ( _CharsSummary, [MaxYear] = [MinYear] )
RETURN
 SUMX(_SameYearChars,[NrOfChars])

Did I answer your question? Mark my post as a solution!
Thank you for the kudos 🙂

Anonymous · ‎12-05-2018

Hi @Stachu,

I am getting the following memory error everytime (tried restarting excel also) when applying the measure on my original dataset, though i am using Excel 2016 Pro 64 bit, with 8 GB RAM on a Windows 10 64 bit machine.

UniquesCount:=

// get unique Char values
VAR _Chars = VALUES ( fData[Char Value] ) 

// Filter fData table based on only the unique Char values
VAR _CharsAll =
 CALCULATETABLE ( 
	fData, 
	_Chars, 
	ALL ( dDate[Year] ), 
	ALL ( dDate[Date] ),
	ALL ( dCountry[Country] ), 
	ALL ( dCategory[Category] ), 
	ALL ( dBrandOwner[BrandOwner] ), 
	ALL ( dBrand[Brand] ),
	ALL ( dSubBrand[SubBrand] ),
	ALL ( dInnType[InnType] ),
	ALL ( dInnSubType[InnSubType] ),
	ALL ( dCharDescription[CharDescription] )
)

// 
VAR _CharsSummary =
	GROUPBY ( 
		_CharsAll, 
		fData[Char Value],
		"MaxYear", MAXX ( CURRENTGROUP (), RELATED ( dDate[Year] ) ),
		"MinYear", MINX ( CURRENTGROUP (), RELATED ( dDate[Year] ) ),
		"NrOfChars", COUNTAX( CURRENTGROUP (), fData[Char Value] )
 )

//
VAR _SameYearChars = FILTER ( _CharsSummary, [MaxYear] = [MinYear] )

RETURN SUMX( _SameYearChars, [NrOfChars] )

I have added the additional filters. Can this measure be optimized?

Anonymous · ‎12-07-2018

Hi @Stachu ,

Any update on the issue?

v-cherch-msft · ‎11-21-2018

Hi @Anonymous

Here is the document about Optimising Memory Usage for your reference. Simplified sample data and expected output will be helpful to provide an accurate solution.

https://www.linkedin.com/pulse/ten-techniques-optimising-memory-usage-microsoft-power-dejan-pajk

Regards,

Cherie

Community Support Team _ Cherie Chen
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Stachu · ‎11-21-2018

my guess the measure are not performaing beacause of the number of conditionals and iterator function you use

what do you want to measure in business terms with e.g. UniquesCount and UniqueCharsCount? why use IF and 1 as alternative resource?
can you also share anonymised sample of your data in copyable format here?

Did I answer your question? Mark my post as a solution!
Thank you for the kudos 🙂

Anonymous · ‎11-21-2018

Hi @Stachu,

Thanks for your quick response.

UniquesCount : is the count of unique Char Values that occurred first time within a year and not found in subsequent years. They can occur more than once within the year. I need to know their Firstdate of appearance (Item Appearance Date).

UniqueCharsCount : is the same as above. The only thing is i am counting their more than once occurence within the year as 1.

I need to know when UNIQUE Char Values appeared First time i.e FirstDate

for a BrandOwner (Manufacturer)
in a Country
in a Category

I also need to know when DISTINCT Char Values appeared First time i.e FirstDate for :

other Brand Owners (Manufacturers) and when (FirstDate)
in other Countries and when (FirstDate)
in other Categories and when (FirstDate)

i.e. how their paths have traversed across different Countries, Categories, Manufacturers over the years.

Also, is there a way to reduce file size for sharing purposes?

P.S: How do i anonymize this large data stretching different countries, categories, brand owners and years? Is there some free tool to do this?

Stachu · ‎11-21-2018

can you just share the Top20 rows from your data table and the expected output for the shared sample?
here is a post on how to share:
https://community.powerbi.com/t5/Community-Blog/How-to-Get-Your-Question-Answered-Quickly/ba-p/38490

regarding the measures:
does UniqueCount give you correct results? for the first date of appearance SUMMARIZE by Char with ALL and MIN on date should work

Did I answer your question? Mark my post as a solution!
Thank you for the kudos 🙂

Anonymous · ‎11-24-2018

Hi @stacu,

I have mailed you the sample testdata.xlsx file yesterday. It has the 2 tables added to data model ( links removed).

As you will see, there is a many-to-many relationship between country and category as well as brand owner. Also, you will see, some characteristic description are common across categories. I think this is the reason why uniques are not getting identified correctly. How do we tackle this M2M?

Anonymous · ‎11-21-2018

Hi @Stachu,

I have shared the sample anonymized testdata workbook for your inspection that you can download. I have added the 2 tables to the Data Model + the measures and created a sample pivot with Slicers. You can add Unique conditional formatting to E column to know if Char Values are indeed unique across Categories, Countries, BrandOwners & Years.

With the huge amount of data, i am not sure i am getting Uniques correctly. This is because a Char Value may appear in multiple Countries or Categories.

Regarding the FirstCharValDateofAppearance, i am not sure how to write this measure or represent it in pivot. Would need some help here.

Let me know if this is sufficient or you need anything else.

Optimizing Measures to avoid Out of memory errors

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly