Hello everybody,
I'm having trouble with the following task.
I want to add the column "Expected Outcome" in Power Query. I was able to add the column in Power Pivot, but for that I have to load all the rows (ca. 3 million). And for the case where ZAECO = 2, where I have to pick the BTRST from the earlier date and multiply it by -1, I had to create a new calculated column.
If there is a way to add this column via Power Query, I can delete the columns BTRST, Index, and ZAECO and filter out the null values, which makes the data model more efficient. I don't think a simple IF statement would be sufficient.
Thanks a lot for every reply.
Regards
BjoernSchaefer
FAZID | ZAECO | DATUM | Index | BTRST | Expected Outcome | Description of "Expected Outcome" |
---|---|---|---|---|---|---|
SYS0000009 | null | 11.10.1994 | 3 | 10018 | null | |
SYS0000009 | null | 11.10.1994 | 4 | 10018 | null | |
SYS0000009 | null | 11.10.1994 | 5 | 10018 | null | |
SYS0000009 | null | 25.10.1995 | 6 | 10569 | null | |
SYS0000009 | null | 29.01.1998 | 7 | 11766 | null | |
SYS0000009 | null | 30.04.1998 | 8 | 11953 | null | |
SYS0000009 | null | 20.10.1999 | 9 | 12959 | null | |
SYS0000009 | null | 03.12.1999 | 10 | 12960 | 12960 | <- if ZAECO = null then BTRST by FAZID with MAX DATUM |
SYS0000210 | null | 04.05.2000 | 715 | 82 | null | |
SYS0000210 | null | 09.05.2001 | 716 | 158 | null | |
SYS0000210 | null | 06.05.2002 | 717 | 246 | null | |
SYS0000210 | null | 19.05.2003 | 718 | 364 | null | |
SYS0000210 | null | 02.06.2004 | 719 | 444 | null | |
SYS0000210 | null | 28.06.2005 | 720 | 632 | null | |
SYS0000210 | null | 18.07.2006 | 721 | 706 | null | |
SYS0000210 | null | 25.07.2007 | 722 | 844 | null | |
SYS0000210 | null | 30.07.2008 | 723 | 928 | 928 | <- if ZAECO=2 then BTRST where Index = Index-1 |
SYS0000210 | 2 | 06.10.2009 | 724 | 136 | -136 | <- if ZAECO=2 then BTRST *-1 |
SYS0000210 | null | 28.10.2010 | 725 | 243 | null | |
SYS0000210 | null | 14.07.2011 | 726 | 296 | null | |
SYS0000210 | null | 09.07.2012 | 727 | 366 | null | |
SYS0000210 | null | 18.09.2013 | 728 | 439 | null | |
SYS0000210 | null | 26.09.2014 | 729 | 497 | null | |
SYS0000210 | null | 07.10.2015 | 730 | 562 | null | |
SYS0000210 | null | 28.09.2016 | 731 | 641 | null | |
SYS0000210 | null | 20.07.2017 | 732 | 698 | null | |
SYS0000210 | null | 19.07.2018 | 733 | 771 | 771 | <- if ZAECO = null then BTRST by FAZID with MAX DATUM |
SYS0000337 | null | 14.02.2001 | 1030 | 5277 | null | |
SYS0000337 | null | 16.07.2001 | 1031 | 6231 | null | |
SYS0000337 | null | 20.02.2002 | 1032 | 7485 | null | |
SYS0000337 | 1 | 11.09.2002 | 1033 | 8477 | 8477 | <- if ZAECO = 1 then BTRST |
SYS0000337 | null | 16.01.2003 | 1034 | 4376 | null | |
SYS0000337 | null | 20.05.2003 | 1035 | 4564 | 4564 | <- if ZAECO = null then BTRST by FAZID with MAX DATUM |
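To make the three annotated rules concrete, here is a minimal sketch of the logic in plain Python (illustration only; the `expected_outcome` helper is a hypothetical name, not part of the original post, and rows are assumed to be sorted by FAZID and Index):

```python
from datetime import date

def expected_outcome(rows):
    """Compute the 'Expected Outcome' values for rows sorted by FAZID and Index.

    Each row is a dict with keys FAZID, ZAECO, DATUM (a date), Index, BTRST.
    """
    # Latest DATUM per FAZID, taken over the ZAECO = null rows only
    last_null_datum = {}
    for r in rows:
        if r["ZAECO"] is None:
            f = r["FAZID"]
            if f not in last_null_datum or r["DATUM"] > last_null_datum[f]:
                last_null_datum[f] = r["DATUM"]
    # (FAZID, Index) lookup so we can peek at the following row
    by_index = {(r["FAZID"], r["Index"]): r for r in rows}
    out = []
    for r in rows:
        val = None
        if r["ZAECO"] == 1:
            val = r["BTRST"]                 # ZAECO = 1 -> BTRST as-is
        elif r["ZAECO"] == 2:
            val = -r["BTRST"]                # ZAECO = 2 -> BTRST * -1
        else:                                # ZAECO = null
            nxt = by_index.get((r["FAZID"], r["Index"] + 1))
            if nxt is not None and nxt["ZAECO"] == 2:
                val = r["BTRST"]             # row directly before a ZAECO = 2 row
            elif r["DATUM"] == last_null_datum[r["FAZID"]]:
                val = r["BTRST"]             # latest date within the FAZID group
        out.append(val)
    return out
```

Running this over the sample rows above reproduces the "Expected Outcome" column, including the 928 / -136 pair around the ZAECO = 2 row.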
Try the code below. It uses grouping to identify the specific scenarios and multiple conditionals to get the final result. I'm not sure about performance with 3 million rows, though.
```
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("rZVPaxsxEMW/ivCxJItmRn9Whxy2SQo5hEDtQps0l6QuDgQfSkLab1/NW4l6s0ZOaQ1GWOvfzOjNG+3NzWL5ZWnxSYujxfb58TEvRB3ZjlJy+YfohrXU59Usbo/egLi/R/ybEPYF0b8HID6kNpI6S4po5Ii0MYQmIrazriI9kOSlncWWwnRPv8TJtwuz0hFXhOzIBF2/Plsr98fm4bu5Ph9Or8yJUcY8bdZb8371cbkyd7/Mh+H64sy8PDxtzOXw2ZwNq0+Xu8kYMWsy11nfcd5WFUjl6/l1eVMiFYJAQGw/684UCQVhICo2u5nUE4RqFgGi8SW4dhbucqKMOCCqnnNthPuC6Lkj65Mg7eNTRqIiAQhEsO2zZHOOSASi8fsDhanTgMCcrCIk7l9ZYDg/vTrh3e6/bNY/1uZi+239M3sD6zHNk/DYk2xMHp0XGZMp4WCGd/vi/VETIbEX2aPNs/GYqunGcxJ05IC5bKup/gMCM3GEMw6YKfcsKQIzQUcnsyGcniUUBGZimCnFdmGxHB9mEn3iQ9tMXAuDmURFCI7aSHEGpijCrCG1x4+qYjCTQIRI/+0+EYnThnK9HciOKnCcKTeFQnF7gaADy0yICaRCcL1UMgQ7uN7vhxCZRrUrIRhE1LZHCdqRoVk51Wsqh8RLTuLMjrPK/Q6kJTuP2+1fOnL7Gw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [FAZID = _t, ZAECO = _t, DATUM = _t, Index = _t, BTRST = _t, #"Description of ""Expected Outcome""" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"FAZID", type text}, {"ZAECO", Int64.Type}, {"DATUM", type date}, {"Index", Int64.Type}, {"BTRST", Int64.Type}, {"Description of ""Expected Outcome""", type text}}),
    Buffer = Table.Buffer(#"Changed Type"),
    A_null_only = Table.SelectRows(Buffer, each ([ZAECO] = null)),
    A_null_only_grouped = Table.Group(A_null_only, {"FAZID", "ZAECO"}, {{"DATUM", each List.Max([DATUM]), type date}}),
    B_2_only = Table.SelectRows(Buffer, each ([ZAECO] = 2)),
    B_2_only_index = Table.SelectColumns(B_2_only, {"FAZID", "Index"}),
    B_2_only_new_index = Table.TransformColumns(B_2_only_index, {{"Index", each _ - 1, type number}}),
    Custom = Buffer,
    A_merged = Table.NestedJoin(Custom, {"FAZID", "ZAECO", "DATUM"}, A_null_only_grouped, {"FAZID", "ZAECO", "DATUM"}, "Custom1", JoinKind.LeftOuter),
    A_expanded = Table.ExpandTableColumn(A_merged, "Custom1", {"FAZID"}, {"A"}),
    B_merged = Table.NestedJoin(A_expanded, {"FAZID", "Index"}, B_2_only_new_index, {"FAZID", "Index"}, "Expanded Custom1", JoinKind.LeftOuter),
    B_expanded = Table.ExpandTableColumn(B_merged, "Expanded Custom1", {"FAZID"}, {"B"}),
    #"Merged Queries" = Table.NestedJoin(B_expanded, {"FAZID", "Index"}, B_2_only_index, {"FAZID", "Index"}, "B_expanded", JoinKind.LeftOuter),
    #"Expanded B_expanded" = Table.ExpandTableColumn(#"Merged Queries", "B_expanded", {"FAZID"}, {"B_minus"}),
    #"Added Custom" = Table.AddColumn(#"Expanded B_expanded", "Value", each
        if [FAZID] = [A] then [BTRST]
        else if [FAZID] = [B] then [BTRST]
        else if [FAZID] = [B_minus] then [BTRST] * -1
        else if [ZAECO] = 1 then [BTRST]
        else null)
in
    #"Added Custom"
```
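For readers untangling the query: the M code boils down to three lookup sets and one conditional. A rough Python equivalent of that flow (an illustration of the logic only, not a translation of the M code; `add_value_column` is a hypothetical name):

```python
from datetime import date

def add_value_column(rows):
    """Mirror the M steps: A = latest null-ZAECO DATUM per FAZID (A_null_only_grouped),
    B = ZAECO = 2 rows keyed by (FAZID, Index) (B_2_only_index),
    B_prev = the same keys shifted by -1 (B_2_only_new_index)."""
    a_keys = {}
    for r in rows:
        if r["ZAECO"] is None:
            f = r["FAZID"]
            if f not in a_keys or r["DATUM"] > a_keys[f]:
                a_keys[f] = r["DATUM"]
    b_keys = {(r["FAZID"], r["Index"]) for r in rows if r["ZAECO"] == 2}
    b_prev = {(f, i - 1) for (f, i) in b_keys}
    for r in rows:
        key = (r["FAZID"], r["Index"])
        if r["ZAECO"] is None and a_keys.get(r["FAZID"]) == r["DATUM"]:
            r["Value"] = r["BTRST"]          # matched the A join (latest null row)
        elif key in b_prev:
            r["Value"] = r["BTRST"]          # row directly before a ZAECO = 2 row
        elif key in b_keys:
            r["Value"] = -r["BTRST"]         # the ZAECO = 2 row itself
        elif r["ZAECO"] == 1:
            r["Value"] = r["BTRST"]
        else:
            r["Value"] = None
    return rows
```

The M version does the same thing with three left outer joins instead of dictionary lookups, which is why buffering the source table first (`Table.Buffer`) matters for performance.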
Hi @Stachu,
Thank you very much. I'm still trying to figure out how the code works, but the performance is extremely slow: it has been trying to finish the preview in Power Query for nearly 2 hours.
There is one phenomenon that doesn't make sense to me. As soon as I add the index column to the query, the size of my file increases nearly fourfold. Before the index column, the file, which includes 2.6 million rows, was 13 MB. With the index column it rises to 53 MB! Do you have a clue why?
Regards
BjoernSchaefer
The index cannot be compressed as well as the original data. I'd suggest adding it and then removing it again once it's no longer needed:
https://powerpivotpro.com/2015/12/compression-with-power-pivot/
Seems like you want to use EARLIER. See my article on Mean Time Between Failure (MTBF), which uses EARLIER: http://community.powerbi.com/t5/Community-Blog/Mean-Time-Between-Failure-MTBF-and-Power-BI/ba-p/3395...
Hi @Greg_Deckler,
Thanks a lot for your reply. I want to do this in Power Query. When I add a calculated column in Power Pivot, the size of my file is almost twice as big as without that column.
Regards
BjoernSchaefer