Solved: Re: Waterfall chart - Unpivoting data for a large ...

Anonymous · ‎04-22-2021

Hello,

I'm facing an issue where the size of my data is potentially too large for the type of visual I want to create. I have a dataset that is around 45M records, and I need to create a waterfall chart that would, to my mind, require some unpivoting of the data, as some of the fields we are bringing into the dataset are aggregates of other fields. For example:

Date	Market	30 days past due	60 days past due
1/1/2021	US	7,000,000	8,000,401
1/1/2021	CA	5,300,000	500,130

Would then become the following in order to be able to do a waterfall breakdown by determining days past due based on distinct aging segments:

Date	Market	Metric	Value
1/1/2021	US	30 days past due	7,000,000
1/1/2021	US	60 days past due	8,000,401
1/1/2021	CA	30 days past due	5,300,000
1/1/2021	CA	60 days past due	500,130

The problem is the size of my dataset. It's already fairly large (45M) and I'm potentially going to have to blow it up much further to create the waterfall views I need to create unless I a) limit certain segmentations or b) collapse certain segments together before I bring the dataset into PowerBI. Neither is ideal. Is there a way to do a waterfall chart in the way that I'm thinking of without completely blowing up my dataset or report?

Anonymous · ‎04-28-2021

Hi @v-kelly-msft,

No worries at all. After doing some research I was able to come up with something that I think will work. Essentially, what I've found is that if I can omit any and all records with a zero value, the dataset becomes much more manageable. The way to do it would be to create a calculated table as such:

UnpivotedTable = UNION(CALCULATETABLE(
                             SUMMARIZE(OriginalTable,
                                       OriginalTable[date],
                                       OriginalTable[segment1],
                                       OriginalTable[segment2],
                                       OriginalTable[segment3],
                                       OriginalTable[segment4],
                                       OriginalTable[segment5],
                                       OriginalTable[segment6],
                                       OriginalTable[segment7],
                                       OriginalTable[segment8],
                                       OriginalTable[segment9],
                                       OriginalTable[segment10],
                                       OriginalTable[segment11],
                                       OriginalTable[segment12],
                                       OriginalTable[segment13],
                                       OriginalTable[segment14],
                                       "Metric",
                                       "30 days past due",
                                       "Value",
                                       sum(OriginalTable[30 days past due])),
                                       FILTER(OriginalTable,OriginalTable[30 days past due] <> 0)),
                CALCULATETABLE(
                             SUMMARIZE(OriginalTable,
                                       OriginalTable[date],
                                       OriginalTable[segment1],
                                       OriginalTable[segment2],
                                       OriginalTable[segment3],
                                       OriginalTable[segment4],
                                       OriginalTable[segment5],
                                       OriginalTable[segment6],
                                       OriginalTable[segment7],
                                       OriginalTable[segment8],
                                       OriginalTable[segment9],
                                       OriginalTable[segment10],
                                       OriginalTable[segment11],
                                       OriginalTable[segment12],
                                       OriginalTable[segment13],
                                       OriginalTable[segment14],
                                       "Metric",
                                       "60 days past due",
                                       "Value",
                                       sum(OriginalTable[60 days past due])),
                                       FILTER(OriginalTable,OriginalTable[60 days past due] <> 0)),

Whereas before unpivoting left me with a dataset 2-5 times the size of my original dataset, this one leaves me with a much smaller dataset that still contains all of the relevant data required for a working waterfall chart.

View solution in original post

v-kelly-msft · ‎04-26-2021

Hi @Anonymous ,

How about create a duplicated table then using unpivoting the columns ?

Best Regards,
Kelly

Did I answer your question? Mark my post as a solution!

Anonymous · ‎04-26-2021

Hi @v-kelly-msft,

Wouldn't this still be an enormous table that I would have to store somewhere? That is really the issue...it would be technically possible to duplicate and unpivot a table in M or DAX but would I even be able to store that in my file given that the main dataset itself is already 45M records? The only thing I could possibly think of would be to bring in the data through Direct Query (hence not storing it in PBI) which is simply not feasible because of the limitations of that method.

v-kelly-msft · ‎04-27-2021

Hi @Anonymous ,

If so,first create a dim table as below:

Then create a calculated table as below:

Table 2 = CROSSJOIN(SELECTCOLUMNS('Table',"Date",[Date],"Market",[Market]),VALUES('dim table'[Column1]))

And a measure as below:

Measure = 
SWITCH(
    SELECTEDVALUE('Table 2'[Column1]),
    "30 days past due",CALCULATE(SUM('Table'[30 days past due]),FILTER(ALL('Table'),'Table'[Date]=MAX('Table 2'[Date])&&'Table'[Market]=MAX('Table 2'[Market]))),
    "60 days past due",CALCULATE(SUM('Table'[60 days past due]),FILTER(ALL('Table'),'Table'[Date]=MAX('Table 2'[Date])&&'Table'[Market]=MAX('Table 2'[Market]))),BLANK())

Finally you will see:

For the related .pbix file,pls see attached.

Best Regards,
Kelly

Did I answer your question? Mark my post as a solution!

Anonymous · ‎04-27-2021

Hi @v-kelly-msft,

Unfortunately this option still blows my dataset up to an unsustainable size. I think I will need to handle this on the DB side. Thank you for your help.

v-kelly-msft · ‎04-28-2021

Hi @Anonymous ，

Sorry for not being able to help in the end.

Best Regards,
Kelly

Did I answer your question? Mark my post as a solution!

Anonymous · ‎04-28-2021

Hi @v-kelly-msft,

No worries at all. After doing some research I was able to come up with something that I think will work. Essentially, what I've found is that if I can omit any and all records with a zero value, the dataset becomes much more manageable. The way to do it would be to create a calculated table as such:

UnpivotedTable = UNION(CALCULATETABLE(
                             SUMMARIZE(OriginalTable,
                                       OriginalTable[date],
                                       OriginalTable[segment1],
                                       OriginalTable[segment2],
                                       OriginalTable[segment3],
                                       OriginalTable[segment4],
                                       OriginalTable[segment5],
                                       OriginalTable[segment6],
                                       OriginalTable[segment7],
                                       OriginalTable[segment8],
                                       OriginalTable[segment9],
                                       OriginalTable[segment10],
                                       OriginalTable[segment11],
                                       OriginalTable[segment12],
                                       OriginalTable[segment13],
                                       OriginalTable[segment14],
                                       "Metric",
                                       "30 days past due",
                                       "Value",
                                       sum(OriginalTable[30 days past due])),
                                       FILTER(OriginalTable,OriginalTable[30 days past due] <> 0)),
                CALCULATETABLE(
                             SUMMARIZE(OriginalTable,
                                       OriginalTable[date],
                                       OriginalTable[segment1],
                                       OriginalTable[segment2],
                                       OriginalTable[segment3],
                                       OriginalTable[segment4],
                                       OriginalTable[segment5],
                                       OriginalTable[segment6],
                                       OriginalTable[segment7],
                                       OriginalTable[segment8],
                                       OriginalTable[segment9],
                                       OriginalTable[segment10],
                                       OriginalTable[segment11],
                                       OriginalTable[segment12],
                                       OriginalTable[segment13],
                                       OriginalTable[segment14],
                                       "Metric",
                                       "60 days past due",
                                       "Value",
                                       sum(OriginalTable[60 days past due])),
                                       FILTER(OriginalTable,OriginalTable[60 days past due] <> 0)),

Whereas before unpivoting left me with a dataset 2-5 times the size of my original dataset, this one leaves me with a much smaller dataset that still contains all of the relevant data required for a working waterfall chart.