I am having a performance problem with a table of 1.5 million rows of time series data, where I use the timestamp to look up an event ID from a table of events. The secondary tables each have an event ID column plus a start time column and an end time column. Note that not all rows have an event ID. With the event ID columns added to the time series table, a relationship is created back to the event tables so that we can filter charts of the time series data by clicking on the events.
It takes 2 minutes to load all the data and set the types on the time series table, but when I add the custom columns and replace errors (the #"Added Custom" and #"Replaced Errors1" steps below), it takes hours to load.
Is there a better way to do this? I tried creating these columns as a Calculated Column (which is much faster), but a relationship cannot be created between the calculated column and the original table because of a circular dependency.
Note: The TimestampDT column exists because all these datetimes are in Date/Time/Zone format, and I was not able to do the <= / >= comparisons in Power Query without first converting to a Date/Time column.
let
    Source = TimeSeriesDataSource,
    // Set every column except the first (Timestamp) to numeric
    #"Changed Type" = Table.TransformColumnTypes(Source, List.Transform(List.Skip(Table.ColumnNames(Source), 1), each {_, Number.Type})),
    #"Changed Type2" = Table.TransformColumnTypes(#"Changed Type", {{"Timestamp", type datetimezone}}),
    // Duplicate the timestamp as plain Date/Time so it can be compared to the event start/end columns
    #"Duplicated Column" = Table.DuplicateColumn(#"Changed Type2", "Timestamp", "TimestampDT"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Duplicated Column", {{"TimestampDT", type datetime}}),
    // Per-row interval lookup: find the event whose start/end range contains this row's timestamp
    #"Added Custom" = Table.AddColumn(#"Changed Type1", "EventType1ID", each (let rowTime = [TimestampDT] in Table.SelectRows(Event1Table, each [StartDateTimeDT] <= rowTime and [EndDateTimeDT] >= rowTime)){0}[BusinessKey]),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "EventType2ID", each (let rowTime = [TimestampDT] in Table.SelectRows(Event2Table, each [StartDateTimeDT] <= rowTime and [EndDateTimeDT] >= rowTime)){0}[BusinessKey]),
    // {0}[BusinessKey] errors when no event matches the timestamp, so blank those out
    #"Replaced Errors1" = Table.ReplaceErrorValues(#"Added Custom1", {{"EventType1ID", ""}, {"EventType2ID", ""}})
in
    #"Replaced Errors1"
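One thing worth noting about the query above (a sketch of a common mitigation, not the fix the thread settled on): each per-row Table.SelectRows can cause Power Query to re-evaluate the event table's source, once per time-series row. Buffering the small event tables once with Table.Buffer keeps them in memory, so the 1.5M per-row scans hit an in-memory copy. This assumes Event1Table and Event2Table are small enough to fit in memory; the other steps are unchanged from the query above.

```
let
    Source = TimeSeriesDataSource,
    // Load the small event tables into memory once, instead of
    // re-evaluating their sources for every time-series row
    BufferedEvents1 = Table.Buffer(Event1Table),
    BufferedEvents2 = Table.Buffer(Event2Table),
    #"Changed Type2" = Table.TransformColumnTypes(Source, {{"Timestamp", type datetimezone}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Changed Type2", "Timestamp", "TimestampDT"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Duplicated Column", {{"TimestampDT", type datetime}}),
    // Same interval lookup as before, but against the buffered copies
    #"Added Custom" = Table.AddColumn(#"Changed Type1", "EventType1ID", each (let rowTime = [TimestampDT] in Table.SelectRows(BufferedEvents1, each [StartDateTimeDT] <= rowTime and [EndDateTimeDT] >= rowTime)){0}[BusinessKey]),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "EventType2ID", each (let rowTime = [TimestampDT] in Table.SelectRows(BufferedEvents2, each [StartDateTimeDT] <= rowTime and [EndDateTimeDT] >= rowTime)){0}[BusinessKey]),
    #"Replaced Errors1" = Table.ReplaceErrorValues(#"Added Custom1", {{"EventType1ID", ""}, {"EventType2ID", ""}})
in
    #"Replaced Errors1"
```

Even buffered, this is still an O(rows × events) scan inside the mashup engine, which is why moving the lookup into the model can win by a large margin.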
Thanks in advance for the help.
The solution ended up being calculated columns, moving the lookup out of Power Query.
To avoid the circular references, I created duplicates of the event tables and do the lookup from those. Then it is possible to create a relationship between the Time Series EventID column (sourced from the lookup copy of the event table) and the original event table.
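The calculated-column lookup described above can be sketched roughly like this in DAX. Table and column names here are assumptions based on the thread: Event1Lookup stands for the duplicated copy of the event table used to break the circular dependency, and the time series table is called TimeSeriesData.

```
EventType1ID =
VAR RowTime = TimeSeriesData[TimestampDT]
RETURN
    -- Scan the duplicated event table for the interval containing this
    -- row's timestamp; MAXX returns BLANK when no event matches
    MAXX (
        FILTER (
            Event1Lookup,
            Event1Lookup[StartDateTimeDT] <= RowTime
                && Event1Lookup[EndDateTimeDT] >= RowTime
        ),
        Event1Lookup[BusinessKey]
    )
```

The relationship is then created from this TimeSeriesData[EventType1ID] column to the key column of the original event table, which is what the duplicate-table trick makes possible.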
I'm not sure if this will help but I ran SQL Server Profiler and this is the item that is taking all the time (7+ hours):
This is the TextData from that row:
let __AS_Query__ = TimeSeriesData,
__AS_Table__ = Table.FromValue(__AS_Query__),
__AS_Compact__ = Table.RemoveColumns(__AS_Table__, Table.ColumnsOfType(__AS_Table__, { type table, type record, type list })),
__AS_Effective__ = Table.TransformColumnNames(__AS_Compact__, Text.Clean)
in __AS_Effective__
Hi @AaronApexPeak ,
I think your issue is caused by the large size of the data or a complex query. I suggest you optimize your data model or update your query.
For reference: Optimization guide for Power BI - Power BI | Microsoft Learn
Best Regards,
Rico Zhou
If this post helps, then please consider accepting it as the solution to help other members find it more quickly.