Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hi PBI and PQ community,
i learn a lot from all the postings.
But for the following situation i did not find anything yet. May be you can help.
I want to improve the data that is loaded into PBI in Power Query.
On the left side: the data that is loaded from a system, but it has missing values.
On the right side: the data how i need to have it. additional rows are added, with the previous value. (PQs Filldown might help here).
I have already calender tables that state all the available dates, but i do not know how to get those missing dates in this list
Any thoughts?
Sam.
date: 21-05-2018:
Previously i posted this message. Interkoubess replied with nice insights,
But i have a new challenge on this. When use the proposed method, the performance is low, as the table grow to over 20Gb and takes hours to process.
I added an additional sample below
The challenge is now that i have 1000 products and multiyear data.
This really slows down the preparation of the data in PBI.
I found that there is quite some insights on single product solutions (like A only in the above sample), but a list with all products i cannot find.
Any suggestions to solve this in M or PBI?
thanks,
Sam12
below the sample data:
date | product | price |
1-2-2017 | A | 120 |
2-2-2017 | A | 130 |
3-2-2017 | A | 140 |
6-2-2017 | A | 150 |
7-2-2017 | A | 160 |
8-2-2017 | A | 170 |
9-2-2017 | A | 160 |
10-2-2017 | A | 150 |
13-2-2017 | A | 140 |
14-2-2017 | A | 150 |
15-2-2017 | A | 145 |
1-2-2017 | B | 60 |
2-2-2017 | B | 52 |
3-2-2017 | B | 54 |
6-2-2017 | B | 49 |
7-2-2017 | B | 51 |
8-2-2017 | B | 46 |
9-2-2017 | B | 49 |
10-2-2017 | B | 51 |
13-2-2017 | B | 50 |
14-2-2017 | B | 48 |
15-2-2017 | B | 46 |
Solved! Go to Solution.
Hi @Sam12,
Please try this code in the advanced query editor after exporting your data and arrange the corresponding fields:
- First I have field Date and Amount ( from your raw data)
1) Make sure your Date is type Date
2) Create an index
3) Find the difference of day(s) between consecutive date
4) Create a list of date
5) Expand the list created and removed the unnecessary fields ( index, customs)
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], #"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Amount", Int64.Type}}), #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1), #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each try #"Added Index"[Date]{[Index]}-[Date] otherwise 1), #"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Custom", Int64.Type}}), #"Added Custom1" = Table.AddColumn(#"Changed Type1", "Custom.1", each List.Dates(Date.From([Date]),[Custom],#duration(1,0,0,0))), #"Expanded Custom.1" = Table.ExpandListColumn(#"Added Custom1", "Custom.1"), #"Reordered Columns" = Table.ReorderColumns(#"Expanded Custom.1",{"Custom.1", "Amount", "Date", "Index", "Custom"}), #"Removed Other Columns" = Table.SelectColumns(#"Reordered Columns",{"Custom.1", "Amount"}), #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Custom.1", "Date"}}) in #"Renamed Columns"
Hi @Sam12,
Please try this code in the advanced query editor after exporting your data and arrange the corresponding fields:
- First I have field Date and Amount ( from your raw data)
1) Make sure your Date is type Date
2) Create an index
3) Find the difference of day(s) between consecutive date
4) Create a list of date
5) Expand the list created and removed the unnecessary fields ( index, customs)
let Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content], #"Changed Type" = Table.TransformColumnTypes(Source,{{"Date", type date}, {"Amount", Int64.Type}}), #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1), #"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each try #"Added Index"[Date]{[Index]}-[Date] otherwise 1), #"Changed Type1" = Table.TransformColumnTypes(#"Added Custom",{{"Custom", Int64.Type}}), #"Added Custom1" = Table.AddColumn(#"Changed Type1", "Custom.1", each List.Dates(Date.From([Date]),[Custom],#duration(1,0,0,0))), #"Expanded Custom.1" = Table.ExpandListColumn(#"Added Custom1", "Custom.1"), #"Reordered Columns" = Table.ReorderColumns(#"Expanded Custom.1",{"Custom.1", "Amount", "Date", "Index", "Custom"}), #"Removed Other Columns" = Table.SelectColumns(#"Reordered Columns",{"Custom.1", "Amount"}), #"Renamed Columns" = Table.RenameColumns(#"Removed Other Columns",{{"Custom.1", "Date"}}) in #"Renamed Columns"
Hi Interkoubess,
I have played around with your suggestion. My original csv file has 65000 rows. In itself this file reads quickly.
I find that the following lines let the query explode in time to "Apply query changes".
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1),
#"Added Custom" = Table.AddColumn(#"Added Index", "Custom", each try #"Added Index"[Date]{[Index]}-[Date] otherwise 1),
The resulting PQ-apply-process gets to 12Gb before throwing "OLE DB or ODBC errors. We cannot convert the value null to type number."
Have you seen this before?
I think it would be great if we can make the additions you propose, but then saving this as hardcoded value, so that no calculation is needed anymore.
any thoughts?
Sam12
Hi @Sam12 ,
I have never seen this message>
But what about checking the field ( removing null or replacing by a constant) and then proceed....
Please let me know if it does not help, and we can figure out something different.
Ninter
Hi, i just posted the message.
I tried to check this (null) but scolling through 65k rows is not that fast in PQ. Is there a way to extract it from PQ? Like copy-paste into Excel with all rows and not only 999 rows? Let me know.
P.
Hi @Sam12,
Yeah you can extract all your data from PQ simply by close-and-load ( In the top left corner ).
Please let me know if I missed something.
But what about replacing the null value in PQ, it did not work?
Thank you.
Ninter
Hi, I did a replace, but it did not have an effect, still errors.
What about splitting the data set in smaller lists of approx 10000 rows (original file is 65000 row: 500products*130rows), and apply the proposed script. We can make sub selections using another table that states which products to include (innerjoin?)
Can we append the queries afterwards back to the total list again of 65000 rows? Does this reduce the load on the system?
thanks for you ideas.
P.
Hi Interkoubess this is indeed what I need. I played around a bit and it worked. Thanks!
User | Count |
---|---|
96 | |
85 | |
77 | |
66 | |
63 |
User | Count |
---|---|
110 | |
95 | |
95 | |
64 | |
59 |