alexmauer
Frequent Visitor

Add a dynamic number of rows based on multiple conditions

I have a dataset with the following columns (among others): System, Location, Schedule Point, Cost. There are 2 systems, A and B, each being installed at 4 locations, 1, 2, 3, and 4. Schedule Points are a function of time based on the expected duration of the install. There will always be at least 10 Schedule Points between the start and finish, but there may be more (not shown in the example dataset). If a given System at a given Location has fewer than 10 Schedule Points, I'd like to add enough rows to have 10 Schedule Points (a separate column will be created to calculate an estimated cost for each newly created Schedule Point). 

 

I've seen a number of questions where people have asked how to add a static number of rows based on a condition. I, however, need to add a dynamic number of rows based on the current maximum Schedule Point for a given System and Location. 

 

This is an example of my current data vs. what I'm looking to achieve (see attached image: current vs goal dataset example.JPG).

For Location 3, it would add 3 rows to reach Schedule Point 10. For Location 4, it would add 7 rows to reach Schedule Point 10.

 

Potentially further complicating this, the data is updated quarterly, and each update adds new Schedule Points to the dataset. So while Location 3 currently has 7 Schedule Points in the dataset, it may have 8 or 9 at the next data pull, and Location 4 may have 6 or 7.

 

Also, while 10 Schedule Points is the maximum number of Schedule Points we will report on, there may be additional Schedule Points in the dataset for some Systems if the installation is completed after the expected completion date. 

 

This example dataset includes the 4 important columns, but the actual dataset is 40 columns, and I'd simply like to copy the data from the remaining 36 columns to the newly created rows (it's various financial data that isn't used in this report but that I've been instructed to retain in the file).

 

I'm not that good with Power Query or DAX, so I'm struggling a bit to figure out how to do this. Any help would be appreciated.

5 REPLIES
AlexisOlson
Super User

You can probably do this by generating a cross-join { A, B } x { 1, 2, 3, 4 } x { 1, 2, ..., 10 } and then merging in the data you do have (plus some additional work to accommodate more than 10 Schedule Points).
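As a rough sketch of that idea in Power Query M (untested against your data; it assumes a query named Source with columns named System, Location, and Schedule Point — adjust names to match yours):

```
// Hedged sketch: cross-join then left-merge in Power Query M.
// "Source" and the column names are assumptions, not your actual query.
let
    Systems   = List.Distinct(Source[System]),
    Locations = List.Distinct(Source[Location]),
    Points    = {1..10},

    // Cross-join System x Location into one row per pair
    Pairs = Table.FromRecords(
        List.TransformMany(
            Systems,
            each Locations,
            (s, l) => [System = s, Location = l])),

    // Expand each pair to one row per Schedule Point 1..10
    Scaffold = Table.ExpandListColumn(
        Table.AddColumn(Pairs, "Schedule Point", each Points),
        "Schedule Point"),

    // Left-join the existing rows back in;
    // combinations with no data get nulls to fill with estimates later
    Merged = Table.NestedJoin(
        Scaffold, {"System", "Location", "Schedule Point"},
        Source,   {"System", "Location", "Schedule Point"},
        "Data", JoinKind.LeftOuter)
in
    Merged
```

Note this scaffold stops at 10 points per pair; rows beyond Schedule Point 10 from the real data would need to be appended separately, which is the "additional work" mentioned above.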

 

However, I don't think expanding your data table is the best way to handle this. Creating dimension tables for these columns should make things much easier in the long run.

 

I recommend reading these articles for an intro to dimensional modeling:

Basics of Modeling in Power BI: What is a Dimension Table and Why Say No to a Single Big Table

Power BI Basics of Modeling: Star Schema and How to Build it

I wasn't familiar with this as a path forward, but before I go too far down the rabbit hole researching it, I'd like to present the full scope of the dataset and see if you think it would still be feasible. The Schedule Points that we report estimates for actually run from -1.2 to +1.2, incrementing by 0.01. The theoretical maximum Schedule Point is 18.25 (for a maximum of 1,945 individual Schedule Points). We only estimate costs out to Schedule Point 1.2 because most costs beyond that point are too insignificant to bother estimating; however, for completed installations we track every cost we have, out to whatever the maximum may be for that Location and System. There are actually 126 Systems being installed at 29 Locations (and the number of Locations will be increasing).

 

Further, this will eventually be replicated for other divisions, each of which have a similar number of systems and locations. 

 

I'm not sure what set-up this strategy requires; I greatly simplified the problem in an attempt to make it easier to explain. However, if you think something like this is possible with a dataset like this, I'm willing to give it a shot.

The more columns and the more possible values per column you have, the more costly a cross-join becomes, since the size is multiplicative. Expanding your dataset to the dimensions 1,945 x 126 x 29 means dealing with more than 7.1 million rows (1,945 x 126 x 29 = 7,107,030) for no particularly good reason.

 

Basically, the bigger and more complex the model, the more important it is to have it cleanly structured into the star schema dimensional modeling approach, both for performance reasons and for writing simpler DAX calculations. You'll be glad you did in the long run.
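To make the dimension-table idea concrete, here is an illustrative sketch in Power Query M of generating a Schedule Point dimension for the -1.2 to +1.2 reporting range discussed above, instead of materializing those values in the fact table (table and column names are placeholders):

```
// Hedged sketch: a standalone Schedule Point dimension in Power Query M.
// The -1.2..+1.2 step-0.01 range comes from the thread; names are assumed.
let
    // Generate as integers {-120..120} to avoid floating-point drift,
    // then divide by 100 at the end
    Points = List.Transform({-120..120}, each _ / 100),
    DimSchedulePoint = Table.FromColumns(
        {Points},
        type table [#"Schedule Point" = number])
in
    DimSchedulePoint
```

A System dimension and a Location dimension would be built the same way (from the distinct values in the source), with the 36 System-specific reference/funding columns living on the System dimension rather than being repeated on every fact row.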

ronrsnfld
Super User

You write, "I'd simply like to copy the data from the remaining 36 columns to the newly created rows." Are you sure that's the case for all 36 columns? I ask because you are NOT doing that for the new rows in columns 4 and 5 (Cost and Estimated Cost): you show empty cells for the extra Cost rows and a token for the extra Estimated Cost rows.

Most of the columns are reference document and funding document data that are System specific, but not Location, Schedule Point, or Cost specific (like title/date/folder path for files that authorize the system or approve funding for the system). A few are dates that are Cost/Schedule Point specific, but these won't be used in this PBI file so it doesn't matter if they're copied/pasted. I assumed (possibly incorrectly) that fewer exclusions would be simpler.

 

The Cost column is System, Location, and Schedule Point specific. Estimated Cost is a function of the average cumulative cost through a Schedule Point for completed System installations (in this case, the average cumulative cost through a Schedule Point for Locations 1 and 2), unless that cost is less than the current cumulative cost for the specific System, Location, and maximum Schedule Point in the dataset. For example, Location 3, Schedule Point 8 would have an estimated cost of 11: the average cumulative cost of Locations 1 and 2 through Schedule Point 8 is 24, the cumulative cost of Location 3 through Schedule Point 7 is 13, and 24 - 13 = 11.

 

The calculations are kind of difficult to explain, so if this doesn't make sense or doesn't help, let me know and I can try to explain it differently.
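One way to sketch the cumulative-cost building block for that calculation in Power Query M (untested; query and column names are assumptions): group by System and Location, sort each group by Schedule Point, and add a running total that the estimated-cost rule can then compare against.

```
// Hedged sketch: running total of Cost per System/Location in Power Query M.
// Assumes a query named Source with System, Location, Schedule Point, Cost.
let
    Grouped = Table.Group(
        Source, {"System", "Location"},
        {{"Rows", (t) =>
            let
                Sorted = Table.Sort(t, {{"Schedule Point", Order.Ascending}}),
                Costs  = Sorted[Cost],
                // O(n^2) but simple: cumulative sum of the first n costs
                Running = List.Transform(
                    {1..List.Count(Costs)},
                    (n) => List.Sum(List.FirstN(Costs, n))),
                // Append the running total as a new column
                WithCum = Table.FromColumns(
                    Table.ToColumns(Sorted) & {Running},
                    Table.ColumnNames(Sorted) & {"Cumulative Cost"})
            in
                WithCum}}),
    Result = Table.Combine(Grouped[Rows])
in
    Result
```

With a Cumulative Cost column in place, the estimated cost for a missing Schedule Point is then the average Cumulative Cost of the completed Locations through that point, minus the Location's own Cumulative Cost at its current maximum point, as described above.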
