Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Waterfall chart - Unpivoting data for a large dataset (10M+ records)

Hello,

 

I'm facing an issue where the size of my data is potentially too large for the type of visual I want to create.  I have a dataset that is around 45M records, and I need to create a waterfall chart that would, to my mind, require some unpivoting of the data, as some of the fields we are bringing into the dataset are aggregates of other fields.  For example:

 

DateMarket30 days past due60 days past due
1/1/2021US7,000,0008,000,401
1/1/2021CA5,300,000500,130

 

Would then become the following in order to be able to do a waterfall breakdown by determining days past due based on distinct aging segments:

 

DateMarketMetricValue
1/1/2021US30 days past due7,000,000
1/1/2021US60 days past due8,000,401
1/1/2021CA30 days past due5,300,000
1/1/2021CA60 days past due500,130

 

The problem is the size of my dataset.  It's already fairly large (45M) and I'm potentially going to have to blow it up much further to create the waterfall views I need to create unless I a) limit certain segmentations or b) collapse certain segments together before I bring the dataset into PowerBI.  Neither is ideal.  Is there a way to do a waterfall chart in the way that I'm thinking of without completely blowing up my dataset or report? 

1 ACCEPTED SOLUTION
Anonymous
Not applicable

Hi @v-kelly-msft,

 

No worries at all.  After doing some research I was able to come up with something that I think will work.  Essentially, what I've found is that if I can omit any and all records with a zero value, the dataset becomes much more manageable.  The way to do it would be to create a calculated table as such:

 

UnpivotedTable = UNION(CALCULATETABLE(
                             SUMMARIZE(OriginalTable,
                                       OriginalTable[date],
                                       OriginalTable[segment1],
                                       OriginalTable[segment2],
                                       OriginalTable[segment3],
                                       OriginalTable[segment4],
                                       OriginalTable[segment5],
                                       OriginalTable[segment6],
                                       OriginalTable[segment7],
                                       OriginalTable[segment8],
                                       OriginalTable[segment9],
                                       OriginalTable[segment10],
                                       OriginalTable[segment11],
                                       OriginalTable[segment12],
                                       OriginalTable[segment13],
                                       OriginalTable[segment14],
                                       "Metric",
                                       "30 days past due",
                                       "Value",
                                       sum(OriginalTable[30 days past due])),
                                       FILTER(OriginalTable,OriginalTable[30 days past due] <> 0)),
                CALCULATETABLE(
                             SUMMARIZE(OriginalTable,
                                       OriginalTable[date],
                                       OriginalTable[segment1],
                                       OriginalTable[segment2],
                                       OriginalTable[segment3],
                                       OriginalTable[segment4],
                                       OriginalTable[segment5],
                                       OriginalTable[segment6],
                                       OriginalTable[segment7],
                                       OriginalTable[segment8],
                                       OriginalTable[segment9],
                                       OriginalTable[segment10],
                                       OriginalTable[segment11],
                                       OriginalTable[segment12],
                                       OriginalTable[segment13],
                                       OriginalTable[segment14],
                                       "Metric",
                                       "60 days past due",
                                       "Value",
                                       sum(OriginalTable[60 days past due])),
                                       FILTER(OriginalTable,OriginalTable[60 days past due] <> 0)),

 

Whereas before unpivoting left me with a dataset 2-5 times the size of my original dataset, this one leaves me with a much smaller dataset that still contains all of the relevant data required for a working waterfall chart.

View solution in original post

7 REPLIES 7
v-kelly-msft
Community Support
Community Support

Hi @Anonymous ,

 

How about create a duplicated table then using unpivoting the columns ?

 

Best Regards,
Kelly

Did I answer your question? Mark my post as a solution!

Anonymous
Not applicable

Hi @v-kelly-msft,

 

Wouldn't this still be an enormous table that I would have to store somewhere?  That is really the issue...it would be technically possible to duplicate and unpivot a table in M or DAX but would I even be able to store that in my file given that the main dataset itself is already 45M records?  The only thing I could possibly think of would be to bring in the data through Direct Query (hence not storing it in PBI) which is simply not feasible because of the limitations of that method.

Hi  @Anonymous ,

 

If so,first create a dim table as below:

v-kelly-msft_0-1619515614582.png

Then create a calculated table as below:

Table 2 = CROSSJOIN(SELECTCOLUMNS('Table',"Date",[Date],"Market",[Market]),VALUES('dim table'[Column1]))

And a measure as below:

Measure = 
SWITCH(
    SELECTEDVALUE('Table 2'[Column1]),
    "30 days past due",CALCULATE(SUM('Table'[30 days past due]),FILTER(ALL('Table'),'Table'[Date]=MAX('Table 2'[Date])&&'Table'[Market]=MAX('Table 2'[Market]))),
    "60 days past due",CALCULATE(SUM('Table'[60 days past due]),FILTER(ALL('Table'),'Table'[Date]=MAX('Table 2'[Date])&&'Table'[Market]=MAX('Table 2'[Market]))),BLANK())

Finally you will see:

v-kelly-msft_1-1619515704190.png

For the related .pbix file,pls see attached.

 

Best Regards,
Kelly

Did I answer your question? Mark my post as a solution!

Anonymous
Not applicable

Hi @v-kelly-msft,

 

Unfortunately this option still blows my dataset up to an unsustainable size.  I think I will need to handle this on the DB side.  Thank you for your help. 

Hi @Anonymous ,

 

Sorry for not being able to help in the end.

 

 

Best Regards,
Kelly

Did I answer your question? Mark my post as a solution!

Anonymous
Not applicable

Hi @v-kelly-msft,

 

No worries at all.  After doing some research I was able to come up with something that I think will work.  Essentially, what I've found is that if I can omit any and all records with a zero value, the dataset becomes much more manageable.  The way to do it would be to create a calculated table as such:

 

UnpivotedTable = UNION(CALCULATETABLE(
                             SUMMARIZE(OriginalTable,
                                       OriginalTable[date],
                                       OriginalTable[segment1],
                                       OriginalTable[segment2],
                                       OriginalTable[segment3],
                                       OriginalTable[segment4],
                                       OriginalTable[segment5],
                                       OriginalTable[segment6],
                                       OriginalTable[segment7],
                                       OriginalTable[segment8],
                                       OriginalTable[segment9],
                                       OriginalTable[segment10],
                                       OriginalTable[segment11],
                                       OriginalTable[segment12],
                                       OriginalTable[segment13],
                                       OriginalTable[segment14],
                                       "Metric",
                                       "30 days past due",
                                       "Value",
                                       sum(OriginalTable[30 days past due])),
                                       FILTER(OriginalTable,OriginalTable[30 days past due] <> 0)),
                CALCULATETABLE(
                             SUMMARIZE(OriginalTable,
                                       OriginalTable[date],
                                       OriginalTable[segment1],
                                       OriginalTable[segment2],
                                       OriginalTable[segment3],
                                       OriginalTable[segment4],
                                       OriginalTable[segment5],
                                       OriginalTable[segment6],
                                       OriginalTable[segment7],
                                       OriginalTable[segment8],
                                       OriginalTable[segment9],
                                       OriginalTable[segment10],
                                       OriginalTable[segment11],
                                       OriginalTable[segment12],
                                       OriginalTable[segment13],
                                       OriginalTable[segment14],
                                       "Metric",
                                       "60 days past due",
                                       "Value",
                                       sum(OriginalTable[60 days past due])),
                                       FILTER(OriginalTable,OriginalTable[60 days past due] <> 0)),

 

Whereas before unpivoting left me with a dataset 2-5 times the size of my original dataset, this one leaves me with a much smaller dataset that still contains all of the relevant data required for a working waterfall chart.

Hi @Anonymous ,

 

Thanks for sharing.

 

Best Regards,
Kelly

Did I answer your question? Mark my post as a solution!

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.