modi123p
Helper III

Incremental refresh on Data Flow failing with Error: PipelineException: Access is denied

Hi,

 

I need to work with a very large data table on Impala.

Incremental refresh works fine, but the problem I'm facing is the initial load.

We need a rolling one year of data in this table, and this can be very large (we currently have only two months of data in the Impala table).

 

The workspace was recently upgraded to P2 Premium capacity, and we cannot move to a higher plan right now.

No other query is executed in this workspace at the time of refresh, so I need a workaround that lets me load the large amount of data on the first load.

 

I tried the approach below:

 

1. I created a static table with one column containing business units (the filter criteria for the result set).

2. I created a function that queries for the filtered data and invoked it as a custom column (this works fine without incremental refresh).

3. I enabled incremental refresh on a datetime column that is part of the result set, and I get the error shown below.

 

Error: PipelineException: Access was denied. RootActivityId = b7f50a68-a02d-41f0-875d-9e23052b6000. Param1 = PipelineException: Access was denied. Request ID: 55c5e57a-ea3c-5fd8-6f84-c66617914a9d.

 

Note: I could create separate tables (one per BU), load the data with incremental refresh, and then append those tables, but I would like to avoid creating multiple tables. I'm also interested in knowing why the approach above fails.

 

Final Output Query:

let
    // Static list of business units, stored inline as compressed JSON.
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WcjM0UIrVAdGGUNoIShuDaXeovDtU3h0q7w6V94DKe0HlvaF8bzjfCEoD1ccCAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [busunit = _t]),
    // Invoke the Query function once per business unit and expand the results.
    #"Added custom" = Table.AddColumn(Source, "Custom", each Query([busunit])),
    #"Expanded Custom" = Table.ExpandTableColumn(#"Added custom", "Custom", {"calday"}, {"calday"}),
    #"Transform columns" = Table.TransformColumnTypes(#"Expanded Custom", {{"calday", type datetime}}),
    #"Replace errors" = Table.ReplaceErrorValues(#"Transform columns", {{"calday", null}}),
    // Partition filter injected by the incremental refresh configuration.
    #"Table-63616C646179-autogenerated_for_incremental_refresh" = Table.SelectRows(#"Replace errors", each DateTime.From([calday]) >= RangeStart and DateTime.From([calday]) < RangeEnd)
in
    #"Table-63616C646179-autogenerated_for_incremental_refresh"

 

 

Get Query (Function):

let
    Query = (BU as text) =>
        let
            // Connect to Impala and navigate to the schema and table.
            DB = Impala.Database(@Server),
            Schema = DB{[Name = "MySchema", Kind = "Schema"]}[Data],
            Data = Schema{[Name = "MyTable", Kind = "Table"]}[Data],
            // Keep only the rows for the requested business unit.
            #"Filtered rows" = Table.SelectRows(Data, each [busunit] = BU),
            #"Removed other columns" = Table.SelectColumns(#"Filtered rows", {"calday"}),
            #"Removed duplicates" = Table.Distinct(#"Removed other columns")
        in
            #"Removed duplicates"
in
    Query

 

 

Thanks & regards,

Mannu

1 ACCEPTED SOLUTION
v-shex-msft
Community Support

Hi @modi123p,

I think the buffer functions may be suitable for your scenario, but I'm not sure whether they have enough permissions to execute on the Power BI Premium server.

How and When to use List & Table Buffer? 
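For reference, Table.Buffer pins a table in memory so that downstream steps reuse a single evaluation instead of re-reading the source. A minimal sketch of the idea (the inline lookup table here is illustrative, not your actual BU list):

let
    // Illustrative business-unit lookup; replace with the real static table.
    BusinessUnits = #table(type table [busunit = text], {{"BU01"}, {"BU02"}}),
    // Table.Buffer forces one in-memory evaluation of the table, so a
    // per-row function call does not re-query the source for each row.
    Buffered = Table.Buffer(BusinessUnits),
    // Stand-in transformation; in the original query this would be the
    // per-row call to Query([busunit]).
    WithData = Table.AddColumn(Buffered, "Custom", each Text.Upper([busunit]))
in
    WithData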

For your requirement, you can try the following steps and see if they work.

Steps:
1. Create a static table to store the large amount of old records, and use on-demand refresh instead of configuring scheduled refresh on this data source.
2. Create a dynamic table that loads a smaller amount of records; exclude the date range covered by the first table to reduce the amount of data processed. (You can configure incremental refresh on this one.)
3. Add a custom query table that merges the records of the above two tables without any other advanced operations (see the sketch after this list).
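A minimal sketch of steps 2 and 3, assuming the two queries are named HistoricalStatic and RecentIncremental (both names, and the cutoff date, are placeholders):

let
    // Hypothetical cutoff separating the one-off bulk load from the live range.
    Cutoff = #datetime(2023, 1, 1, 0, 0, 0),
    // RecentIncremental is the incrementally refreshed query; this extra
    // filter keeps it from overlapping the static table (step 2).
    Recent = Table.SelectRows(RecentIncremental, each [calday] >= Cutoff),
    // Step 3: a plain append of the two ranges, with no other transformations.
    Combined = Table.Combine({HistoricalStatic, Recent})
in
    Combined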

Regards,

Xiaoxin Sheng

Community Support Team _ Xiaoxin
If this post helps, please consider accepting it as the solution to help other members find it more quickly.

modi123p
Helper III

Yeah, I ended up creating staging tables to bring the data in week by week, and then merged these tables. I enabled incremental refresh on the new data.

However, I had to create 30 staging tables because the data was extremely large. I wanted to avoid this approach (hence the idea of breaking up the tables by BU, dates, etc.).

Not an ideal solution, but it worked for me!
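A minimal sketch of this workaround, assuming the weekly staging queries are named Staging_Week01 through Staging_Week30 (placeholder names):

let
    // Each Staging_WeekNN is a separate query holding one week's bulk load.
    // Only the first few are listed here for brevity; the real dataflow had 30.
    Merged = Table.Combine({Staging_Week01, Staging_Week02, Staging_Week03})
in
    Merged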
