Hi,
I need to work with a very large data table on Impala.
Incremental refresh works fine, but the problem I'm facing is the initial load.
We need a rolling one year of data in this table, and this can be very large (we only have two months of data in the Impala table so far).
The workspace was recently upgraded to P2 Premium capacity, and we cannot upgrade to a new plan as of now.
No other query is being executed in this workspace at the time of refresh, so I need a workaround that can help me load the large amount of data on the first load.
I tried the approach below:
1. Created a static table with one column containing business units (the filter criteria for the result set).
2. Created a function that queries for the filtered data and invoked it as a custom column (this works fine without incremental refresh).
3. Enabled incremental refresh on a datetime column that is part of the result set, and I get the error shown below.
Error: PipelineException: Access was denied. .. RootActivityId = b7f50a68-a02d-41f0-875d-9e23052b6000.Param1 = PipelineException: Access was denied. Request ID: 55c5e57a-ea3c-5fd8-6f84-c66617914a9d.
Note: While I could create separate tables (one per BU), load that data with incremental refresh, and then append the tables, I would like to avoid creating multiple tables. I am also interested in knowing the reason for the failure in the approach above.
Final Output Query:
let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcjM0UIrVAdGGUNoIShuDaXeovDtU3h0q7w6V94DKe0HlvaF8bzjfCEoD1ccCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [busunit = _t]),
    #"Added custom" = Table.AddColumn(Source, "Custom", each Query([busunit])),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added custom", "Custom", {"calday"}, {"calday"}),
    #"Transform columns" = Table.TransformColumnTypes(#"Expanded Custom", {{"calday", type datetime}}),
    #"Replace errors" = Table.ReplaceErrorValues(#"Transform columns", {{"calday", null}}),
    #"Table-63616C646179-autogenerated_for_incremental_refresh" = Table.SelectRows(#"Replace errors", each DateTime.From([calday]) >= RangeStart and DateTime.From([calday]) < RangeEnd)
in
    #"Table-63616C646179-autogenerated_for_incremental_refresh"
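A likely contributor to the failure is that invoking a function per row (Query([busunit])) breaks query folding, so the RangeStart/RangeEnd filter cannot be pushed down to Impala and each refresh partition re-executes one source query per business unit. As a sketch only (reusing the Server parameter and the MySchema/MyTable names from the function below; the BusinessUnits list is a hypothetical stand-in for the static table), a fold-friendlier shape filters the Impala table directly:

```m
// Sketch, not the poster's exact query. Filtering the source table
// in a single step keeps the query foldable, so the incremental
// refresh range filter may be pushed down to Impala.
let
    // Hypothetical list of business units; replace with the real values
    // (or reference the static table's column instead).
    BusinessUnits = {"BU1", "BU2", "BU3"},
    Source = Impala.Database(Server),
    Schema = Source{[Name = "MySchema", Kind = "Schema"]}[Data],
    Data = Schema{[Name = "MyTable", Kind = "Table"]}[Data],
    // Single foldable filter instead of one function call per BU row.
    FilteredBU = Table.SelectRows(Data, each List.Contains(BusinessUnits, [busunit])),
    Typed = Table.TransformColumnTypes(FilteredBU, {{"calday", type datetime}}),
    // Incremental refresh window applied directly on the typed column.
    InRange = Table.SelectRows(Typed, each [calday] >= RangeStart and [calday] < RangeEnd)
in
    InRange
```

Whether the filters actually fold depends on the Impala connector; checking "View Native Query" on the last step would confirm it.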
Get Query (Function):
let
    Query = (BU as text) =>
        let
            DB = Impala.Database(@Server),
            Schema = DB{[Name = "MySchema", Kind = "Schema"]}[Data],
            Data = Schema{[Name = "MyTable", Kind = "Table"]}[Data],
            #"Filtered rows" = Table.SelectRows(Data, each [busunit] = BU),
            #"Removed other columns" = Table.SelectColumns(#"Filtered rows", {"calday"}),
            #"Removed duplicates" = Table.Distinct(#"Removed other columns")
        in
            #"Removed duplicates"
in
    Query
Thanks & regards,
Mannu
Hi @modi123p,
I think the buffer functions may be suitable for your scenario, but I'm not sure whether they have enough permission to execute on the Power BI Premium server.
How and When to use List & Table Buffer?
For your requirement, you can try the following steps and see if they work.
Steps:
1. Create a static table to store the large amount of old records, and use on-demand refresh instead of configuring scheduled refresh on this data source.
2. Create a dynamic table that loads a smaller amount of records; exclude the date ranges already covered by the first table to reduce the amount of data processed. (You can configure incremental refresh on this one.)
3. Add a custom query table that merges the records of the two tables above, without any other advanced operations.
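The third step above can be sketched as a simple append query. This is an illustration only; the table names HistoricStatic and RecentIncremental are assumptions standing in for the two tables created in steps 1 and 2:

```m
// Sketch: append the on-demand historic table to the
// incrementally refreshed recent table. Both source queries
// are assumed to expose the same column schema.
let
    Combined = Table.Combine({HistoricStatic, RecentIncremental})
in
    Combined
```

Keeping this query free of further transformations, as suggested, avoids reprocessing the large historic partition on every refresh.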
Regards,
Xiaoxin Sheng