Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hi everyone,
I have a table of produced units by date, channel and type. I receive a daily excel file from a client, e.g:
plant | as of date | production date | channel | type 1 | type 2 | |||
michigan | 7/25/2021 | 7/20/2021 | calls | 1 | 2 | |||
ohio | 7/25/2021 | 7/20/2021 | webpage | 2 | 3 | |||
florida | 7/25/2021 | 7/20/2021 | sales rep | 2 | 3 | |||
Michigan | 7/25/2021 | 7/20/2021 | webpage | 1 | 2 | |||
Ohio | 7/25/2021 | 7/20/2021 | sales rep | 1 | 1 | |||
Florida | 7/25/2021 | 7/20/2021 | calls | 3 | 4 | |||
Michigan | 7/25/2021 | 7/20/2021 | sales rep | 1 | 2 | |||
Ohio | 7/25/2021 | 7/20/2021 | calls | 0 | 1 | |||
Florida | 7/25/2021 | 7/20/2021 | webpage | 2 | 1 | |||
Michigan | 7/25/2021 | 7/21/2021 | calls | 1 | 3 | |||
Ohio | 7/25/2021 | 7/22/2021 | calls | 2 | 3 | |||
Florida | 7/25/2021 | 7/22/2021 | webpage | 1 | 3 |
When there are no units producted type 1 and 2, the report does not specify that the values are 0, like in the example above. In the production date 7/21/2021, for example, Ohio and Florida didn't have any quantities from calls, sales reps and webpage.
How can I make a M query, so the channels "calls", "webpage", and "sales rep", are always shown for each one of the plants, and production dates, even if there are no quantities produced? The expected table would look like:
plant | as of date | production date | channel | type 1 | type 2 | |||
Michigan | 7/25/2021 | 7/20/2021 | calls | 1 | 2 | |||
Ohio | 7/25/2021 | 7/20/2021 | webpage | 2 | 3 | |||
Florida | 7/25/2021 | 7/20/2021 | sales rep | 2 | 3 | |||
Michigan | 7/25/2021 | 7/20/2021 | webpage | 1 | 2 | |||
Ohio | 7/25/2021 | 7/20/2021 | sales rep | 1 | 1 | |||
Florida | 7/25/2021 | 7/20/2021 | calls | 3 | 4 | |||
Michigan | 7/25/2021 | 7/20/2021 | sales rep | 1 | 2 | |||
Ohio | 7/25/2021 | 7/20/2021 | calls | 0 | 1 | |||
Florida | 7/25/2021 | 7/20/2021 | webpage | 2 | 1 | |||
Michigan | 7/25/2021 | 7/21/2021 | calls | 1 | 3 | |||
Ohio | 7/25/2021 | 7/21/2021 | webpage | 0 | 0 | |||
Florida | 7/25/2021 | 7/21/2021 | sales rep | 0 | 0 | |||
Michigan | 7/25/2021 | 7/21/2021 | webpage | 0 | 0 | |||
Ohio | 7/25/2021 | 7/21/2021 | sales rep | 0 | 0 | |||
Florida | 7/25/2021 | 7/21/2021 | calls | 0 | 0 | |||
Michigan | 7/25/2021 | 7/21/2021 | sales rep | 0 | 0 | |||
Ohio | 7/25/2021 | 7/21/2021 | calls | 0 | 0 | |||
Florida | 7/25/2021 | 7/21/2021 | webpage | 0 | 0 | |||
Michigan | 7/25/2021 | 7/22/2021 | calls | 0 | 0 | |||
Ohio | 7/25/2021 | 7/22/2021 | webpage | 0 | 0 | |||
Florida | 7/25/2021 | 7/22/2021 | sales rep | 0 | 0 | |||
Michigan | 7/25/2021 | 7/22/2021 | webpage | 0 | 0 | |||
Ohio | 7/25/2021 | 7/22/2021 | sales rep | 0 | 0 | |||
Florida | 7/25/2021 | 7/22/2021 | calls | 0 | 0 | |||
Michigan | 7/25/2021 | 7/22/2021 | sales rep | 0 | 0 | |||
Ohio | 7/25/2021 | 7/22/2021 | calls | 2 | 3 | |||
Florida | 7/25/2021 | 7/22/2021 | webpage | 1 | 3 |
Thank you
Solved! Go to Solution.
Create a new blank query and paste this over the default code in Advanced Editor:
let
// Define Date.Today
Date.Today = Date.From(DateTime.LocalNow()),
Source = { Number.From(Date.StartOfYear(Date.AddYears(Date.Today, -2)))..Number.From(Date.Today) },
convToTable = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
chgDateType = Table.TransformColumnTypes(convToTable, {{"Column1", type date}}),
renCol = Table.RenameColumns(chgDateType,{{"Column1", "asOfDate"}}),
addProductionDate = Table.AddColumn(renCol, "productionDate", each List.Transform({ Number.From(Date.StartOfYear([asOfDate]))..Number.From(Date.EndOfYear(Date.AddYears([asOfDate], 1))) }, each Date.From(_))),
expandProductionDate = Table.ExpandListColumn(addProductionDate, "productionDate"),
addPlantCJ = Table.AddColumn(expandProductionDate, "plant", each plant),
expandPlantCJ = Table.ExpandTableColumn(addPlantCJ, "plant", {"plant"}, {"plant"}),
addChannelCJ = Table.AddColumn(expandPlantCJ, "channel", each channel),
expandChannelCJ = Table.ExpandTableColumn(addChannelCJ, "channel", {"channel"}, {"channel"}),
addUniqueRowID = Table.AddColumn(expandChannelCJ, "uniqueRowID", each Text.Combine({Text.From(Number.From([asOfDate]), "en-GB"), Text.From(Number.From([productionDate]), "en-GB"), [plant], [channel]}, "-"), type text),
chgTypes = Table.TransformColumnTypes(addUniqueRowID,{{"productionDate", type date}, {"plant", type text}, {"channel", type text}})
in
chgTypes
You will need your plant unique list and channel unique list already set up in Power Query (and called exactly 'plant' & 'channel') for this to work.
You can now follow the steps I took to generate this template table.
Pete
Proud to be a Datanaut!
Hi @dannyboc ,
I would start by building the template table before adding in the values via merge in Power Query, or relationships in the data model.
Start by identifying how you will dynamically create your [as of date] and [production date] columns. There's not enough information in your post for me to understand where these come from, but you should be able to use your calendar table or similar to create two columns that contain every combination of [as of date] and [production date] that you need in your table.
Once you have those, then you will want to crossjoin your plants and channels into this table. You will need a plants table that contains a unique list of all the plants you want to see, and the same for channels.
In your template table, add a new column, call it 'plants', and in the code window just type the name of the table that contains all the plant names. Expand this new column and you will see that it duplicates your date columns for each unique plant value.
Do the same for channels.
Now you have a template table that contains all possible combinations of these dimensions.
From here you would create a unique Id field for each row in both your template table and your fact table, maybe by combining the values of each field into a single field, then either merge on your fact table to get the values where they exist and replace nulls with 0, or use your unique ID field in the data model to relate this template table to your fact table for the same effect.
Pete
Proud to be a Datanaut!
Hi Pete,
Thank you for the idea you provided. I was hoping there was a query I could run to generate such table.
The data comes from an ERP as a daily excel file, unfortunately there is no other way to get the data at this time. I save the daily files in a SharePoint folder, so Power BI can merge them.
I will generate a template table first.
Do you know how to generate a combination of "as of date" and "production date", starting with as of date (January 1, 2019), until today? For each "as of date" there are production dates of the as of date year, up to the next year. E.g. the "as of date" of January 1st, 2019 contains production dates from January 1st, 2019 up to December 31, 2020.
"as of date" January 2, 2019 contains production dates from January 1,2019 up to December 31,2020.
Please let me know, I have done research but nothing comes close.
Thank you
Create a new blank query and paste this over the default code in Advanced Editor:
let
// Define Date.Today
Date.Today = Date.From(DateTime.LocalNow()),
Source = { Number.From(Date.StartOfYear(Date.AddYears(Date.Today, -2)))..Number.From(Date.Today) },
convToTable = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
chgDateType = Table.TransformColumnTypes(convToTable, {{"Column1", type date}}),
renCol = Table.RenameColumns(chgDateType,{{"Column1", "asOfDate"}}),
addProductionDate = Table.AddColumn(renCol, "productionDate", each List.Transform({ Number.From(Date.StartOfYear([asOfDate]))..Number.From(Date.EndOfYear(Date.AddYears([asOfDate], 1))) }, each Date.From(_))),
expandProductionDate = Table.ExpandListColumn(addProductionDate, "productionDate"),
addPlantCJ = Table.AddColumn(expandProductionDate, "plant", each plant),
expandPlantCJ = Table.ExpandTableColumn(addPlantCJ, "plant", {"plant"}, {"plant"}),
addChannelCJ = Table.AddColumn(expandPlantCJ, "channel", each channel),
expandChannelCJ = Table.ExpandTableColumn(addChannelCJ, "channel", {"channel"}, {"channel"}),
addUniqueRowID = Table.AddColumn(expandChannelCJ, "uniqueRowID", each Text.Combine({Text.From(Number.From([asOfDate]), "en-GB"), Text.From(Number.From([productionDate]), "en-GB"), [plant], [channel]}, "-"), type text),
chgTypes = Table.TransformColumnTypes(addUniqueRowID,{{"productionDate", type date}, {"plant", type text}, {"channel", type text}})
in
chgTypes
You will need your plant unique list and channel unique list already set up in Power Query (and called exactly 'plant' & 'channel') for this to work.
You can now follow the steps I took to generate this template table.
Pete
Proud to be a Datanaut!
Hi @BA_Pete ,
The method you previoulsy posted worked for me!
I manually created some tables like "plant" and "channel" in blank queries. Before I was extracting the unique plant and channel values from my original table, so that is why I was facing a memory error.
Thank you so much!
Hi Pete,
Thank you for sharing this! On the preview mode, the query looks to work.
Replicating with current data, I found out that the new table created (Date) with the query would take tens of gigabytes. I'm not sure if I'm doing something wrong. Can you please check the pbi file attached?
https://drive.google.com/file/d/1iuPOFAmyT8rKXKPKNAjhXxsaaV8R9OG2/view?usp=sharing
Thanks
Hi @dannyboc ,
I can't seem to download the files you shared, but I don't think I actually need to. What you are describing sounds plausible due to the crossjoins.
In order to show a row even if there's no data, the row needs to exist in the source data (not always, but I'll explain this in a minute, it might work better for you with limitations).
The code that I shared effectively does this:
- [asOfDate] = Start with every date for up to three years, maximum 1,096 rows
- [productionDate] = For each [asOfDate], crossjoin every date of another two years (731 rows).
- [plant] = For each [productionDate], crossjoin every possible plant (3 rows in example data).
- [channel] = For each [plant], crossjoin every possible channel (3 rows in example data).
The resultant row expansion, therefore, looks like this:
- Date fields: 1,096 * 731 = 801,176 rows
- 801,176 * 3 plants = 2,403,528 rows
- 2,403,528 * 3 channels = 7,210,584 rows
You can see quite quickly that just adding one more plant or channel adds 1,096 * 731 * 3 rows to the table.
If your real data has maybe 20 plants and 15 channels, we're getting 240m+ rows. So your large data size sounds plausible.
However, there may be another way: Instead of pre-building the rows in Power Query, you can create them at runtime instead. You would need to make sure that your [plant] and [channel] dimension tables are related to your fact table, and you would also need to create a new dimension table from the [asOfDate] and [productionDate] steps from my code and relate this to the fact table too. Ensure that it is the dimension table fields that are used in any visualisations. Then you just need to add ' + 0' to the end of your Type1 and Type2 measures. It might look something like this:
_type1 = SUM(yourFactTable[Type1]) + 0
This is going to force the visual to apply a zero value to all dimensions, whether there's values or not.
HOWEVER!!
There's a very good reason I didn't suggest this seemingly simple solution in the first place: doing this will still require all these rows to be materialised somewhere in order to be displayed. The difference with this solution is that it's going to use end-user RAM to do it, rather than having your gateway machine do it overnight then just serve it pre-made. I can't say for certain, but my gut-feel is that this will grind your end-users computer/your report/the universe to a halt.
This could possibly be mitigated by using parameters or fixed report filters to limit the rows needing to be materialised at runtime, but this would require specific scenario testing and is getting outside the scope of this thread.
Pete
Proud to be a Datanaut!
User | Count |
---|---|
140 | |
113 | |
104 | |
77 | |
65 |
User | Count |
---|---|
136 | |
118 | |
101 | |
71 | |
61 |