Lead function, with partition by and order by in P...

fabriciofds · ‎06-27-2022

Hello guys, i need help to replicate the same function i have in sql to power query. I need to bring the result of the previous row of the column "ORGANIZATION_UNIT_ID" partitioned by a cluster which in this case is the "GLOBAL_EMPLOYEE_ID" of the person and ordered by the date (LOAD DATE).

here is the expected result:

LOAD_DATE	GLOBAL_EMPLOYEE_ID	ORGANIZATION_UNIT_ID	Last_Organization_Unit_ID
31/10/2021	2000	1	null
30/11/2021	2000	1	1
31/12/2021	2000	1	1
31/01/2022	2000	2	1
28/02/2022	2000	2	2
31/03/2022	2000	2	2
30/04/2022	2000	1	2
31/05/2022	2000	1	1
30/06/2022	2000	3	1
31/10/2021	1000	1	null
30/11/2021	1000	1	1
31/12/2021	1000	2	1
31/01/2022	1000	2	2
28/02/2022	1000	3	2
31/03/2022	1000	5	3
30/04/2022	1000	5	5
31/05/2022	1000	5	5
30/06/2022	1000	5	5

This is the code I use in sql:

SELECT
  LOAD_DATE,
  ORGANIZATION_UNIT_ID,
  GLOBAL_EMPLOYEE_ID,
  LEAD(ORGANIZATION_UNIT_ID, 1)
  OVER (PARTITION BY GLOBAL_EMPLOYEE_ID
  ORDER BY LOAD_DATE DESC
  ) AS Last_Org_Unit_ID

FROM Table

I've already tried to merge the table with it even using load date -1, but the mass of data I have is very large and makes the process of becoming a data flow unfeasible

If anyone has any ideas, I'd appreciate it.

watkinnc · ‎06-27-2022

You can write that SQL right into a dataflow.

--Nate

I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

watkinnc · ‎06-27-2022

I always find that writing the LEAD/LAG SQL is faster than any other way of doing previous row stuff. If you are using PBI Desktop, you can add "EnableFolding = true" to the Value.NativeQuery parameter to keep the query folding.

--Nate

I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

fabriciofds · ‎06-27-2022

Hi @watkinnc. I totally agree with you. In SQL to do this same treatment takes a maximum of 2 minutes. In power query, it takes hours. The problem with this and the reason is that in a few days the base that I do this in sql will be made available to me only through a data flow.

watkinnc · ‎06-27-2022

You could also do the classic indexes starting from zero and then starting from one and then merge the table to itself.

--Nate

I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

Vijay_A_Verma · ‎06-27-2022

See the working here - Open a blank query - Home - Advanced Editor - Remove everything from there and paste the below code to test

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dc9BDoAgDETRu7Am6UwR4l2I97+GSAxooSsWLxP6aw2JQohCGWJQAO1huGITCLmXtlFP0Dc6RbvoKdCtPJu0FwgOI/OfvJe2KUbSuHqU0i2lW0p79SilW0pzwSx9Ja+lf/mWLpuyynUD", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [LOAD_DATE = _t, GLOBAL_EMPLOYEE_ID = _t, ORGANIZATION_UNIT_ID = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"LOAD_DATE", type text}, {"GLOBAL_EMPLOYEE_ID", Int64.Type}, {"ORGANIZATION_UNIT_ID", Int64.Type}}),
    ListOfIDs = List.Buffer(#"Changed Type"[GLOBAL_EMPLOYEE_ID]),
    ListOfValues = List.Buffer(#"Changed Type"[ORGANIZATION_UNIT_ID]),
    GenList = List.Generate(()=>[x=null,i=0], each [i]<List.Count(ListOfValues), each [i=[i]+1, x=(if ListOfIDs{i}=ListOfIDs{[i]} then ListOfValues{[i]} else null)], each [x]),
    Result = Table.FromColumns(Table.ToColumns(#"Changed Type")&{GenList},Table.ColumnNames(#"Changed Type")&{"Last_Organization_Unit_ID"})
in
    Result

fabriciofds · ‎06-27-2022

Hi @Vijay_A_Verma. Thanks a lot for the solution. It brought the expected result, but not in the required performance. I applied it to my dataset and it's been two hours processing the data flow.

I'm not an expert on the subject, but I've been trying to evolve from a grouping made by "Global ID". I can add an index inside the virtual table. The expected is that it was already possible to create a column referencing the previous result. But according to some tutorials I've seen this is only possible by referencing the previous step. (https://www.youtube.com/watch?v=UPddzZnsf5w)

Due to the complexity of the code you gave me, I'm sure you could do something like that. I believe it can be more performative for the data profile I'm working on (a few million lines) kkk.

here is the code i am using:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("dc9BDoAgDETRu7Am6UwR4l2I97+GSAxooSsWLxP6aw2JQohCGWJQAO1huGITCLmXtlFP0Dc6RbvoKdCtPJu0FwgOI/OfvJe2KUbSuHqU0i2lW0p79SilW0pzwSx9Ja+lf/mWLpuyynUD", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [LOAD_DATE = _t, GLOBAL_EMPLOYEE_ID = _t, ORGANIZATION_UNIT_ID = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"LOAD_DATE", type date}, {"GLOBAL_EMPLOYEE_ID", Int64.Type}, {"ORGANIZATION_UNIT_ID", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"GLOBAL_EMPLOYEE_ID"}, {{"Data", each _, type table [LOAD_DATE=nullable date, GLOBAL_EMPLOYEE_ID=nullable number, ORGANIZATION_UNIT_ID=nullable number]}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "DataIndex", each Table.AddIndexColumn([Data],"Index",1))
in
    #"Added Custom"

If I could do something along those lines, that would be great.

in advance, thank you very much

artpil · ‎06-27-2022

Hi Vijay,

This is great code. Can you explain what's going on in this code?

List.Generate(()=>[x=null,i=0], each [i]<List.Count(ListOfValues), each [i=[i]+1, x=(if ListOfIDs{i}=ListOfIDs{[i]} then ListOfValues{[i]} else null)], each [x])

What I undestood is:

()=>[x=null,i=0] - this creates record object with values x=null and i=0

each [i]<List.Count(ListOfValues) - this checks if iteration is smaller then number of items in list

each [i=[i]+1, x=(if ListOfIDs{i}=ListOfIDs{[i]} - this creates record with column i equal i + 1 and column x - here I don't understand what is happening. ListOfIDs{i} means get element i from list ListOfIDs but what does it does ListOfIDs{[i]} ?

last part returns x calculated earlier.

Thanks in advance for your explanantion.

Artur

Vijay_A_Verma · ‎06-27-2022

i = [i]+1, hence in ListOfIDs{i}=ListOfIDs{[i]}, I can also use ListOfIDs{i}=ListOfIDs{i-1} => I am checking wether current value of list is = previous value of list or not. If it is equal, it means same group continues. If it is unequal, then a new group has started. If a new group is started, then result should be null. If within same group, then value is ListOfValues{[i]} where [i] means i-1. So I could have written it ListOfValues{i-1} as well.

Lead function, with partition by and order by in Power Query

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024