Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
tempranello
Advocate I
Advocate I

Inefficient function?

Hi there

 

I wonder if anyone can give me a steer regarding my function code (below).  While it's functional, it cripples the speed of my query.

 

I've a source of Timesheets data where all staff record their time against various projects.  I've no issues obtaining and using the data.  However each staff member's team membership at the time of their timesheet entry is not reliable due to various reasons. 

 

To combat this I have a seperate data source, Staff Movements that simply lists each staff member, their team name and the data on which they joined that team.  That means the table may have multiple records for a staff member, recording their movement through the company.  Eg.

 

Timesheets

Name           Project      Date

Joe Bloggs    Project X   01/01/2014

Joe Bloggs    Project Y   10/04/2015

 

Staff Movements

Name           Team                Date Joined

Joe Bloggs    Project Office    01/01/2014

Joe Bloggs    HR                    10/04/2015

 

My aim is to produce a Timesheets query that includes a custom column, Timesheets.Team that pulls the Staff Movements.Team value where (please excuse the psuedocode) Timesheet.Name = Staff Movements.Namen AND Timesheets.Date >= Staff Movements.Date Joined.

 

I thought I had this licked by creating a power query function that I called from the Timesheets.Team customer column: 

 

fnLookupTeam([Name],[Date],"Team",#"Staff Movements")

 

 

My function:

 

/*  
      Purpose:  Filter the lookup_table by lookup_name and lookup_date, and return the value in the specified column
    */
(lookup_name as text, lookup_date as date, return_column as text, lookup_table as table) as any =>
let
        FilterTable = Table.SelectRows(lookup_table, each Text.Contains([Name], lookup_name) and [Joined] <= lookup_date),
        ReturnResult = 
            if Table.IsEmpty(FilterTable)=true
            then
            "NO TEAM"
            else
            Record.Field(Table.First(Table.Sort(FilterTable,{"Joined",Order.Descending})), return_column)
in ReturnResult#

 

The problem is:  My Timesheets query works blisteringly fast if I omit the function call.  That tells me that data source and my links to it are fine.  When I introduce the function call, the query becomes glacial.  Now I know what I'm asking Power Query to do, and its a bit of logic for each row in a 97,000 query, but I'm a bit surprised that the performance is soooo bad.  As a work around I've simply stood up the two seperate querys (Timesheets and Staff Movements) and performed a DAX query in a customer colum:

 

Team = CALCULATE(values('Staff movements'[team]),TOPN(1,FILTER('Staff movements','Timesheets'[Name]='Staff movements'[Name]&&'Timesheets'[Date]>'Staff movements'[Joined]),'Staff movements'[Joined]))

 

This works incredbibly fast, but means that I cannot use the custom team column in any power queries 😞

 

I'd appreciate anyone's thoughts!

 

Thanks

1 ACCEPTED SOLUTION

Hi Aaron,

if that's lightning-fast, let's make match just one row. Therefore you have to create "TeamLookup"-Table like this:

 

 

let

fnFillDateIntervalls = (Table as table, DateColumn as text, FillUpUntil as date)=>

let

//DebugParameters:
//Table = #"Staff Movements",
//DateColumn = "Joined",

    Source = Table,
    #"Sorted Rows" = Table.Sort(Source,{{DateColumn, Order.Descending}}),
    Index = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
    Change = Table.TransformColumnTypes(Index,{{DateColumn, Int64.Type}}),
    #"Added Custom" = Table.AddColumn(Change, "Custom", each [Index]+1),
    #"Merged Queries" = Table.NestedJoin(#"Added Custom",{"Index"},#"Added Custom",{"Custom"},"NewColumn",JoinKind.LeftOuter),
    #"Expanded NewColumn" = Table.ExpandTableColumn(#"Merged Queries", "NewColumn", {DateColumn}, {"NewColumn.Date"}),
    #"Added Custom1" = Table.AddColumn(#"Expanded NewColumn", "Custom.1", each try {Record.Field(_,DateColumn)..[NewColumn.Date]} otherwise {Record.Field(_,DateColumn)..Number.From(FillUpUntil)}),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{DateColumn, "Index", "Custom", "NewColumn.Date"}),
    #"Expanded Custom.1" = Table.ExpandListColumn(#"Removed Columns", "Custom.1"),
    #"Renamed Columns" = Table.RenameColumns(#"Expanded Custom.1",{{"Custom.1", DateColumn}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{DateColumn, type date}})
in
    #"Changed Type",

    Source = #"Staff Movements",
    #"Grouped Rows" = Table.Group(Source, {"Name"}, {{"All", each fnFillDateIntervalls(_, "Joined", Date.From(DateTime.LocalNow())), type table}}),
    #"Expanded All" = Table.ExpandTableColumn(#"Grouped Rows", "All", {"Joined", "Team"}, {"Joined", "Team"})
in
    #"Expanded All"

 

 

This query transforms your "Staff Movements"-table into a table with one row for each day. You should disable load to the datamodel. So although this lengthens your table considerably, the result is still very fast, as it used the simple query you mentioned in your last post.

 

Edit: Don't use Internet-Explorer or Edge to copy this code but preferrably Firefox. Otherwise the code will break (this time will greet you with a request for a comma where there is already one) !!!

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

7 REPLIES 7
ImkeF
Super User
Super User

There are some things you can try:

 

1) Group your fact table (Timesheets) on Name and return "All Rows" (Edit: See description here: https://blog.crossjoin.co.uk/2015/05/11/nested-calculations-in-power-query/). This will sort-of partition your big table: Returning only those rows of the table whith that name. You can then omit that condition in the SelectRows-step.

 

I would think that this will speed up considerably and no further actions necessary.

 

Other ideas: The sorting is expensive - so you could sort the lookup-table once as a start, pass an index-column to it and buffer that before passing to the function. You then select the MAX index-column instead in your last step.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Hi ImkeF

 

Thank you for your input.  It's been very helpful.  I reviewed the link and I had a bash and the resulting query is a little faster.  I wonder if it's not too much to ask for your opinion of how I complete the code?  I've found that grouping the data by name does, as you say, chunk the table up nicely.  It allows me to operate on a table per Name, which is great!

 

My problem is that I've not found a way to then work through each row of the table, comparing the Date with the Joined date of the Staff Movements table.  The only way I can do that is to still make a call to the original function that now doesn't require the Text.Contains([Name], lookup_name) and is therefore a bit faster.

 

As the result remains quite slow, am I missing a more elegant way to check the Timesheets.Date against the Staff Movements.Joined field?

 

My Timesheets query:

let
    Source = Sql.Databases("XXXX"),
    SystemDB = Source{[Name="SystemDB"]}[Data],
    dbo_Timesheets = SystemDB{[Schema="dbo",Item="Timesheets"]}[Data],
    //UNIMPORTANT CLEAN UP STEPS HERE
    #"Changed Type" = Table.TransformColumnTypes(dbo_Timesheets,{{"Date", type date}}),
    //Group by Name
    GroupedByName = Table.Group(#"Changed Type", {"Name"}, {{"AllRows", each _, type table}}),
    FindTeamFunction = (tabletopopulate as table) as table =>
        let
	    AssignTeam =  Table.AddColumn(tabletopopulate, "Team", each if [Name] <> "" then fnLookupTeam([Date],"Team",#"Staff movements lookup") else "NO TEAM", type text)
        in
    AssignTeam,
    //Apply the function to the AllRows column
    AddedTeam = Table.TransformColumns(GroupedByName, {"AllRows", each FindTeamFunction(_)}),
    #"Expanded AllRows" = Table.ExpandTableColumn(AddedTeam, "AllRows", {"Name", "Project", "Date", "Team"})
in
    #"Expanded AllRows"

 

I've modified the fnLookupTeam function to omit the Name search:

(lookup_date as date, return_column as text, lookup_table as table) as any =>
let
        /*FilterTable = Table.SelectRows(lookup_table, each Text.Contains([Name], lookup_name) and [Joined] <= lookup_date),*/
        FilterTable = Table.SelectRows(lookup_table, each [Joined] <= lookup_date),
        ReturnResult = 
            if Table.IsEmpty(FilterTable)=true
            /*if Table.RowCount(FilterTable)=0*/
            then
            "NO TEAM"
            else
            Record.Field(Table.First(Table.Sort(FilterTable,{"Joined",Order.Descending})), return_column)
in ReturnResult

 

If I've reached the limit of what can be done, then it's cool as I'll still have the DAX workaround; it's just a bummer as I'd much prefer to have Timesheets.Team in place in the query rather than in the table later.

 

Thanks for your help!

Aaron

Ah, this is interesting.

 

(paraphrasing)

Question:

Which way there will be a better performance:
1) If I use Table.NestedJoin
2) if I use Table.SelectRows

 

Answer:

Table.NestedJoin is probably the way to go. Table.SelectRows will end up scanning Table2 once for each row of Table1

 

I'm a bit stumped at how to make Table.NestedJoin work as the Formula Reference has no example of how the keyEqualityComparers work. 

 

I'll keep bashing away at this.

I wonder if I'm on the right track here:  I've attempted a nested join, which seems to be lightning-fast but leaves me stumped on how to filter the result by date.

 

let
    Source = Timesheets,
    TeamLookup = #"Staff Movements",
    AssignTeam = Table.NestedJoin(Source, {"Name"}, TeamLookup, {"Name"},"Team",JoinKind.LeftOuter)
in
    AssignTeam

 

Reminder of the tables I have

Timesheets

Name           Project      Date

Joe Bloggs    Project X   01/01/2014

Joe Bloggs    Project Y   10/04/2015

Fred Jones    Project A   01/02/2016

 

Staff Movements

Name           Team                Date Joined

Joe Bloggs    Project Office    01/01/2014

Joe Bloggs    HR                    10/04/2015

Fred Jones    Finance             30/12/2015

 

 

The code above works beautifully if a staff member has only one entry in the Staff Movements table, ie. if I expand the new "Team" column, I'd get a single row for Timesheet entries for Fred Jones as he's only one team.  However, for people like Joe Bloggs, he nested join understandably returns two rows.

 

What I'm struggling with here is how to introduce a filter to the new "Team" column to reduce the number of rows it returns to be only one, ie. where the Timesheets.Date value >= Staff Movements.Date Joined.

 

Any ideas?

 

Thanks!

Aaron

Hi Aaron,

if that's lightning-fast, let's make match just one row. Therefore you have to create "TeamLookup"-Table like this:

 

 

let

fnFillDateIntervalls = (Table as table, DateColumn as text, FillUpUntil as date)=>

let

//DebugParameters:
//Table = #"Staff Movements",
//DateColumn = "Joined",

    Source = Table,
    #"Sorted Rows" = Table.Sort(Source,{{DateColumn, Order.Descending}}),
    Index = Table.AddIndexColumn(#"Sorted Rows", "Index", 0, 1),
    Change = Table.TransformColumnTypes(Index,{{DateColumn, Int64.Type}}),
    #"Added Custom" = Table.AddColumn(Change, "Custom", each [Index]+1),
    #"Merged Queries" = Table.NestedJoin(#"Added Custom",{"Index"},#"Added Custom",{"Custom"},"NewColumn",JoinKind.LeftOuter),
    #"Expanded NewColumn" = Table.ExpandTableColumn(#"Merged Queries", "NewColumn", {DateColumn}, {"NewColumn.Date"}),
    #"Added Custom1" = Table.AddColumn(#"Expanded NewColumn", "Custom.1", each try {Record.Field(_,DateColumn)..[NewColumn.Date]} otherwise {Record.Field(_,DateColumn)..Number.From(FillUpUntil)}),
    #"Removed Columns" = Table.RemoveColumns(#"Added Custom1",{DateColumn, "Index", "Custom", "NewColumn.Date"}),
    #"Expanded Custom.1" = Table.ExpandListColumn(#"Removed Columns", "Custom.1"),
    #"Renamed Columns" = Table.RenameColumns(#"Expanded Custom.1",{{"Custom.1", DateColumn}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Renamed Columns",{{DateColumn, type date}})
in
    #"Changed Type",

    Source = #"Staff Movements",
    #"Grouped Rows" = Table.Group(Source, {"Name"}, {{"All", each fnFillDateIntervalls(_, "Joined", Date.From(DateTime.LocalNow())), type table}}),
    #"Expanded All" = Table.ExpandTableColumn(#"Grouped Rows", "All", {"Joined", "Team"}, {"Joined", "Team"})
in
    #"Expanded All"

 

 

This query transforms your "Staff Movements"-table into a table with one row for each day. You should disable load to the datamodel. So although this lengthens your table considerably, the result is still very fast, as it used the simple query you mentioned in your last post.

 

Edit: Don't use Internet-Explorer or Edge to copy this code but preferrably Firefox. Otherwise the code will break (this time will greet you with a request for a comma where there is already one) !!!

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

That's perfect, ImkeF!

 

Thank you very much for your help.

Also: I've never used Table.IsEmpty - maybe Table.Countrows is faster - but just a vague guess.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.