Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hi
I have two tables that need to be combined. One table contains basic information about deals and another table information about the associated products.Each row consists of a state a deal was in from Start Date Deal to End Date Deal. If one of the attributes changes, I have a new row with the new information and the date range for which this information held true. The same applies to the products table.
In order to work with the data, I need a table that contains the information from both tables. The problem is that I have Date Ranges and this approach should be robust to this.
Here's an example of my deals-table with one single deal (but I have many in my data):
Deal ID | Start Date Deal | End Date Deal | Deal attributes |
1 | 1.1.2021 | 1.2.2021 | status 1 |
1 | 2.2.2021 | 3.3.2021 | status 2 |
1 | 4.3.2021 | 15.6.2021 | status 3 |
And this is my table with the products:
Deal ID | Product ID | Start Date Product | End Date Product | Product attributes |
1 | 234 | 15.1.2021 | 2.2.2021 | Price 1 |
1 | 234 | 3.2.2021 | 15.6.2021 | Price 2 |
1 | 456 | 15.1.2021 | 4.2.2021 | Price 5 |
1 | 456 | 5.2.2021 | 10.6.2021 | Price 6 |
And the desired input would look like this:
Deal ID | Product ID | Start Date Phase | End Date Phase | Deal attributes | Product attributes |
1 | 234 | 1.1.2021 | 14.1.2021 | status 1 | (None) |
1 | 234 | 15.1.2021 | 1.2.2021 | status 1 | Price 1 |
1 | 234 | 2.2.2021 | 2.2.2021 | status 2 | Price1 |
1 | 234 | 3.2.2021 | 3.3.2021 | status 2 | Price 2 |
1 | 234 | 4.3.2021 | 15.6.2021 | status 3 | Price 2 |
2 | 456 | 1.1.2021 | 14.1.2021 | status 1 | (None) |
2 | 456 | 15.1.2021 | 1.2.2021 | status 1 | Price 5 |
2 | 456 | 2.2.2021 | 4.2.2021 | status 2 | Price 5 |
2 | 456 | 5.2.2021 | 3.3.2021 | status 2 | Price 6 |
2 | 456 | 4.3.2021 | 10.6.2021 | status 3 | Price 6 |
2 | 456 | 11.6.2021 | 15.6.2021 | status 3 | (None) |
The challenge for me is that I have to create rows for each phase, irrespective of whether something changed on the deal side or the product side.
I would really appreciated your help.
Thank in advance
Solved! Go to Solution.
I would expand these tables to have 1 deal/product row per day.
You can do this with by adding a custom column that lists all the dates between the start and end period.
= List.Generate(() => [Start Date], each _ <= [End Date], each _ + #duration(1, 0, 0, 0))
After doing that just click the new column and choose expand list value. From this point you should be able to merge the tables, and add some logic for the Product attributes.
If you going to create some kind of interactive power bi report with this, I would leave the data in this format. If your loading it for quick viewing, you will need to do a grouping operator with the min and max of the date column to get back the starting and ending date.
thank you @artemus and @Anonymous .
In the end I resorted to create expanded tables and merge those. The performance so far seems ok.
"Thank you very much for your suggestion. That was actually what I wanted to do at first. However, this is obviously just a small example and this table will grow huge as I have thousands of deals and products. That is why I wanted to look for another solution and try to avoid this."
"In the end I resorted to create expanded tables and merge those. The performance so far seems ok."
To make the discussion of general use, could you explain which solution in the end is the one that solved the problem?
Could you post the script you used?
I would expand these tables to have 1 deal/product row per day.
You can do this with by adding a custom column that lists all the dates between the start and end period.
= List.Generate(() => [Start Date], each _ <= [End Date], each _ + #duration(1, 0, 0, 0))
After doing that just click the new column and choose expand list value. From this point you should be able to merge the tables, and add some logic for the Product attributes.
If you going to create some kind of interactive power bi report with this, I would leave the data in this format. If your loading it for quick viewing, you will need to do a grouping operator with the min and max of the date column to get back the starting and ending date.
Thank you very much for your suggestion. That was actually what I wanted to do at first. However, this is obviously just a small example and this table will grow huge as I have thousands of deals and products. That is why I wanted to look for another solution and try to avoid this.
could you give the dimensions of your tables in terms of rows and columns?
the solution I proposed does not use list.generate and could be sufficiently performing.
Otherwise you should look for other tools.
DataFrames.jl in the julia data ecosystem, for example.
These two tables each have about 8 columns and approximately 3000 rows. With all the timestamps and changes I already get to more than 300'000 rows each for these two tables.
I'm a bit afraid that the data will grow really fast and will soon reach 1+ millions of rows so that performance could be an issue.
give a try, starting from a subset of db (f.i from 500 or 1k rows) and let's know the execution time.
Could you provide a full db (CSV or XLSX) with tokenized sensible data?
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.