Hello everybody,
I'm having trouble with the following task.
I want to add the column "Expected Outcome" in Power Query. I was able to add the column in Power Pivot, but for that I have to load all the rows (ca. 3 million). And for the case where ZAECO = 2 and I have to pick the BTRST from the earlier date and multiply it by -1, I had to create an additional calculated column.
If there is a way to add this column via Power Query, I can delete the columns BTRST, Index and ZAECO and filter out the null values, which makes the data model more efficient. I don't think a simple IF statement would be sufficient.
Thanks a lot for every reply.
Regards
BjoernSchaefer
FAZID | ZAECO | DATUM | Index | BTRST | Expected Outcome | Description of "Expected Outcome" |
---|---|---|---|---|---|---|
SYS0000009 | null | 11.10.1994 | 3 | 10018 | null | |
SYS0000009 | null | 11.10.1994 | 4 | 10018 | null | |
SYS0000009 | null | 11.10.1994 | 5 | 10018 | null | |
SYS0000009 | null | 25.10.1995 | 6 | 10569 | null | |
SYS0000009 | null | 29.01.1998 | 7 | 11766 | null | |
SYS0000009 | null | 30.04.1998 | 8 | 11953 | null | |
SYS0000009 | null | 20.10.1999 | 9 | 12959 | null | |
SYS0000009 | null | 03.12.1999 | 10 | 12960 | 12960 | <- if ZAECO = null then BTRST by FAZID with MAX DATUM |
SYS0000210 | null | 04.05.2000 | 715 | 82 | null | |
SYS0000210 | null | 09.05.2001 | 716 | 158 | null | |
SYS0000210 | null | 06.05.2002 | 717 | 246 | null | |
SYS0000210 | null | 19.05.2003 | 718 | 364 | null | |
SYS0000210 | null | 02.06.2004 | 719 | 444 | null | |
SYS0000210 | null | 28.06.2005 | 720 | 632 | null | |
SYS0000210 | null | 18.07.2006 | 721 | 706 | null | |
SYS0000210 | null | 25.07.2007 | 722 | 844 | null | |
SYS0000210 | null | 30.07.2008 | 723 | 928 | 928 | <- if ZAECO=2 then BTRST where Index = Index-1 |
SYS0000210 | 2 | 06.10.2009 | 724 | 136 | -136 | <- if ZAECO=2 then BTRST *-1 |
SYS0000210 | null | 28.10.2010 | 725 | 243 | null | |
SYS0000210 | null | 14.07.2011 | 726 | 296 | null | |
SYS0000210 | null | 09.07.2012 | 727 | 366 | null | |
SYS0000210 | null | 18.09.2013 | 728 | 439 | null | |
SYS0000210 | null | 26.09.2014 | 729 | 497 | null | |
SYS0000210 | null | 07.10.2015 | 730 | 562 | null | |
SYS0000210 | null | 28.09.2016 | 731 | 641 | null | |
SYS0000210 | null | 20.07.2017 | 732 | 698 | null | |
SYS0000210 | null | 19.07.2018 | 733 | 771 | 771 | <- if ZAECO = null then BTRST by FAZID with MAX DATUM |
SYS0000337 | null | 14.02.2001 | 1030 | 5277 | null | |
SYS0000337 | null | 16.07.2001 | 1031 | 6231 | null | |
SYS0000337 | null | 20.02.2002 | 1032 | 7485 | null | |
SYS0000337 | 1 | 11.09.2002 | 1033 | 8477 | 8477 | <- if ZAECO = 1 then BTRST |
SYS0000337 | null | 16.01.2003 | 1034 | 4376 | null | |
SYS0000337 | null | 20.05.2003 | 1035 | 4564 | 4564 | <- if ZAECO = null then BTRST by FAZID with MAX DATUM |
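For reference, the four rules in the table above can be sketched in plain Python (a hypothetical helper, not actual Power Query code; column names are taken from the sample data):

```python
# Minimal sketch of the "Expected Outcome" rules from the sample table.
from datetime import date

def expected_outcome(rows):
    """rows: list of dicts with keys FAZID, ZAECO, DATUM, Index, BTRST.

    Returns one Expected Outcome value per row:
      - ZAECO = 1                                    -> BTRST
      - ZAECO = 2                                    -> BTRST * -1
      - the row directly before a ZAECO = 2 row
        (same FAZID, Index - 1)                      -> its own BTRST
      - ZAECO = null with the latest DATUM per FAZID -> BTRST
      - otherwise                                    -> None (null)
    """
    # latest DATUM per FAZID among the ZAECO = null rows
    latest = {}
    for r in rows:
        if r["ZAECO"] is None:
            f = r["FAZID"]
            if f not in latest or r["DATUM"] > latest[f]:
                latest[f] = r["DATUM"]

    # (FAZID, Index) pairs that directly precede a ZAECO = 2 row
    before_two = {(r["FAZID"], r["Index"] - 1) for r in rows if r["ZAECO"] == 2}

    result = []
    for r in rows:
        if r["ZAECO"] == 1:
            result.append(r["BTRST"])
        elif r["ZAECO"] == 2:
            result.append(-r["BTRST"])
        elif (r["FAZID"], r["Index"]) in before_two:
            result.append(r["BTRST"])
        elif r["ZAECO"] is None and r["DATUM"] == latest.get(r["FAZID"]):
            result.append(r["BTRST"])
        else:
            result.append(None)
    return result
```

For the SYS0000210 rows above, this yields 928 for Index 723, -136 for Index 724, and 771 for Index 733, matching the expected values.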
Try the code below. It uses grouping to identify the specific scenarios and multiple conditionals to get the final result. Not sure about performance with 3 million rows.
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("rZVPaxsxEMW/ivCxJItmRn9Whxy2SQo5hEDtQps0l6QuDgQfSkLab1/NW4l6s0ZOaQ1GWOvfzOjNG+3NzWL5ZWnxSYujxfb58TEvRB3ZjlJy+YfohrXU59Usbo/egLi/R/ybEPYF0b8HID6kNpI6S4po5Ii0MYQmIrazriI9kOSlncWWwnRPv8TJtwuz0hFXhOzIBF2/Plsr98fm4bu5Ph9Or8yJUcY8bdZb8371cbkyd7/Mh+H64sy8PDxtzOXw2ZwNq0+Xu8kYMWsy11nfcd5WFUjl6/l1eVMiFYJAQGw/684UCQVhICo2u5nUE4RqFgGi8SW4dhbucqKMOCCqnnNthPuC6Lkj65Mg7eNTRqIiAQhEsO2zZHOOSASi8fsDhanTgMCcrCIk7l9ZYDg/vTrh3e6/bNY/1uZi+239M3sD6zHNk/DYk2xMHp0XGZMp4WCGd/vi/VETIbEX2aPNs/GYqunGcxJ05IC5bKup/gMCM3GEMw6YKfcsKQIzQUcnsyGcniUUBGZimCnFdmGxHB9mEn3iQ9tMXAuDmURFCI7aSHEGpijCrCG1x4+qYjCTQIRI/+0+EYnThnK9HciOKnCcKTeFQnF7gaADy0yICaRCcL1UMgQ7uN7vhxCZRrUrIRhE1LZHCdqRoVk51Wsqh8RLTuLMjrPK/Q6kJTuP2+1fOnL7Gw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [FAZID = _t, ZAECO = _t, DATUM = _t, Index = _t, BTRST = _t, #"Description of ""Expected Outcome""" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"FAZID", type text}, {"ZAECO", Int64.Type}, {"DATUM", type date}, {"Index", Int64.Type}, {"BTRST", Int64.Type}, {"Description of ""Expected Outcome""", type text}}),
    Buffer = Table.Buffer(#"Changed Type"),
    // A: latest DATUM per FAZID among the ZAECO = null rows
    A_null_only = Table.SelectRows(Buffer, each ([ZAECO] = null)),
    A_null_only_grouped = Table.Group(A_null_only, {"FAZID", "ZAECO"}, {{"DATUM", each List.Max([DATUM]), type date}}),
    // B: Index - 1 of every ZAECO = 2 row
    B_2_only = Table.SelectRows(Buffer, each ([ZAECO] = 2)),
    B_2_only_index = Table.SelectColumns(B_2_only, {"FAZID", "Index"}),
    B_2_only_new_index = Table.TransformColumns(B_2_only_index, {{"Index", each _ - 1, type number}}),
    Custom = Buffer,
    A_merged = Table.NestedJoin(Custom, {"FAZID", "ZAECO", "DATUM"}, A_null_only_grouped, {"FAZID", "ZAECO", "DATUM"}, "Custom1", JoinKind.LeftOuter),
    A_expanded = Table.ExpandTableColumn(A_merged, "Custom1", {"FAZID"}, {"A"}),
    B_merged = Table.NestedJoin(A_expanded, {"FAZID", "Index"}, B_2_only_new_index, {"FAZID", "Index"}, "Expanded Custom1", JoinKind.LeftOuter),
    B_expanded = Table.ExpandTableColumn(B_merged, "Expanded Custom1", {"FAZID"}, {"B"}),
    #"Merged Queries" = Table.NestedJoin(B_expanded, {"FAZID", "Index"}, B_2_only_index, {"FAZID", "Index"}, "B_expanded", JoinKind.LeftOuter),
    #"Expanded B_expanded" = Table.ExpandTableColumn(#"Merged Queries", "B_expanded", {"FAZID"}, {"B_minus"}),
    #"Added Custom" = Table.AddColumn(#"Expanded B_expanded", "Value", each
        if [FAZID] = [A] then [BTRST]
        else if [FAZID] = [B] then [BTRST]
        else if [FAZID] = [B_minus] then [BTRST] * -1
        else if [ZAECO] = 1 then [BTRST]
        else null)
in
    #"Added Custom"
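The same grouping-and-merge strategy (build two small lookup tables, left-join them back, then apply conditionals) can be sketched with pandas for comparison. This is a hypothetical illustration, assuming pandas and the column names from the sample data, not the poster's actual code:

```python
import pandas as pd

def add_expected_outcome(df):
    # A: latest DATUM per FAZID among ZAECO-null rows (like Table.Group)
    a = (df[df["ZAECO"].isna()]
           .groupby("FAZID", as_index=False)["DATUM"].max()
           .assign(is_latest=True))
    # B: Index - 1 of every ZAECO = 2 row (like the shifted-index lookup)
    b = df.loc[df["ZAECO"] == 2, ["FAZID", "Index"]].copy()
    b["Index"] -= 1
    b["before_two"] = True

    # left-join both lookups back (like Table.NestedJoin + expand)
    out = (df.merge(a, on=["FAZID", "DATUM"], how="left")
             .merge(b, on=["FAZID", "Index"], how="left"))

    def rule(r):
        if r["ZAECO"] == 1:
            return r["BTRST"]
        if r["ZAECO"] == 2:
            return -r["BTRST"]
        if r["before_two"] == True:  # noqa: E712 -- NaN-safe for non-matches
            return r["BTRST"]
        if pd.isna(r["ZAECO"]) and r["is_latest"] == True:  # noqa: E712
            return r["BTRST"]
        return None

    out["Expected Outcome"] = out.apply(rule, axis=1)
    return out.drop(columns=["is_latest", "before_two"])
```

Unlike the M version, the flag columns here are booleans rather than expanded FAZID copies, but the join keys and the conditional cascade mirror the query step for step.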
Hi @Stachu,
thank you very much. I'm still trying to figure out how the code works, but the performance is extremely slow: it has been trying to finish the preview in Power Query for nearly 2 hours.
There is one phenomenon that doesn't make sense to me. As soon as I add the Index column to the query, the size of my file increases nearly fourfold. Without the Index column, the file, which contains 2.6 million rows, was 13 MB; with it, the size rises to 53 MB! Do you have a clue why?
Regards
BjoernSchaefer
The index cannot be compressed as well as the original data. I'd suggest adding it and then removing it again once it's no longer needed.
https://powerpivotpro.com/2015/12/compression-with-power-pivot/
Seems like you want to use EARLIER. See my article on Mean Time Before Failure (MTBF) which uses EARLIER: http://community.powerbi.com/t5/Community-Blog/Mean-Time-Between-Failure-MTBF-and-Power-BI/ba-p/3395...
Hi @Greg_Deckler,
thanks a lot for your reply. I want to do this in Power Query. When I add a calculated column in Power Pivot, the size of my file is almost twice as big as without that column.
Regards
BjoernSchaefer