
V55
Helper I

Dataflow error for huge records - Evaluation resulted in a stack overflow

I have a table with lakhs (hundreds of thousands) of records. 'Time' is one of the fields. I have created a function FetchLatest to get the latest record based on 'time'. As you can see in the code below, the function groups the table by a unique key (GroupByIds) and creates a column 'MaxDateTime', then filters to the records where time = MaxDateTime.

 

I validated the function with a table having thousands of records and it worked fine. But when I did scale testing with lakhs of records, I got the error 'We encountered an error during evaluation. Errors: Evaluation resulted in a stack overflow and cannot continue'. When I manually tested each step, it happens at the Table.Group step.

 

let
  FetchLatest = (Table, AllProperties, GroupByIds) =>
  let
    // Non-key columns to restore after grouping
    Properties = List.Difference(AllProperties, GroupByIds),
    // Group by the unique key; keep the max 'time' and all rows per group
    GroupRowsById = Table.Group(Table, GroupByIds, {{"MaxDateTime", each List.Max([time]), type datetime}, {"All rows", each _, type table}}),
    // Re-expand every row of every group
    ExpandGroupedRows = Table.ExpandTableColumn(GroupRowsById, "All rows", Properties, Properties),
    // Keep only the rows whose 'time' equals the group maximum
    FilterMaxTimeRow = Table.SelectRows(ExpandGroupedRows, each [time] = [MaxDateTime]),
    RemoveExtraColumns = Table.RemoveColumns(FilterMaxTimeRow, {"MaxDateTime"}),
    // With no key columns, return the input table unchanged
    Result =
      if List.IsEmpty(GroupByIds)
      then Table
      else RemoveExtraColumns
  in
    Result
in
  FetchLatest

 

 

I have shared the pbix and Excel files at this link: pbix and excel table

 

Please note that I used Dataflows and the code was written in Power Query Online.

To test the functionality in Power BI Desktop, this is what I did:

1. Fetched the table from the blob source in the dataflow without calling the FetchLatest function.

2. Imported the entire table into Power BI Desktop by connecting to the dataflow.

3. Called the FetchLatest function in Power BI Desktop in M code, and it worked!

 

When I tried this in Power BI Desktop for two tables (the dataflow has around 10 tables), my local machine hit its CPU and memory limits.

In Dataflows, I am getting this stack overflow error, and only for huge record counts, not for small tables.

It is most likely due to high memory consumption. I also confirmed that the Premium capacity in Power BI exceeded 80% utilization.

 

How can I optimize the M code to get the latest records from the huge table?
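One direction I am considering (just a sketch, not yet validated at lakh scale) is to have Table.Group keep only the max-time row per group via Table.Max, instead of keeping an 'All rows' table that then has to be expanded and re-filtered. FetchLatestAlt and InputTable below are placeholder names:

let
  FetchLatestAlt = (InputTable as table, AllProperties as list, GroupByIds as list) as table =>
  let
    Properties = List.Difference(AllProperties, GroupByIds),
    // Table.Max returns, per group, the record with the largest [time],
    // so no other rows need to be carried along and expanded
    GroupRowsById = Table.Group(InputTable, GroupByIds, {{"Latest", each Table.Max(_, "time"), type record}}),
    // Expand only the non-key fields of that single record per group
    ExpandLatest = Table.ExpandRecordColumn(GroupRowsById, "Latest", Properties, Properties),
    Result = if List.IsEmpty(GroupByIds) then InputTable else ExpandLatest
  in
    Result
in
  FetchLatestAlt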

5 REPLIES
v-yuta-msft
Community Support

Hi V55,

 

This kind of issue may be caused by the data size. Is the data source Azure Data Lake Storage Gen2? Please refer to the considerations and limitations here:

https://docs.microsoft.com/en-us/power-bi/service-dataflows-configure-workspace-storage-settings#con....

 

Regards,

Jimmy Tao

Hi @v-yuta-msft,

 

Thanks for the response.

 

I am assuming all the dataflow data gets stored in ADLS Gen2.

 

My data source is blob storage, but I am actually storing it in a different dataflow.

Then I created another dataflow to fetch data from the above dataflow, transform the data, and then call the FetchLatest function.

 

I just want to highlight that I did not get the error when the data size was small. When I have lakhs of records, I am getting this error.

 

The error returned by the dataflow can be seen here: Dataflow Error. (WCUS)

curth
Power BI Team

Does the data file you shared represent the full data set or just a partial data set? Is it the same as the case which fails in Dataflows?

 

When you successfully load the data in Desktop, what's the maximum value reached for the commit size of the process Microsoft.Mashup.Container.NetFx45.exe? (You can see that in the Details tab of the Task Manager, but you'll probably need to right-click on the column header and select the "Commit size" column.)

Hi @curth,

 

I tested with much larger record counts. This is what I noticed on my local machine; the highlighted value in the attached screenshot (Capture.PNG) is the commit size.

 

Since it is not executing in the dataflow and is giving the stack overflow error, could it be related to the dataflow capacity?

I am using Power BI Premium shared capacity, but unfortunately I don't know the dataflow capacity settings.

(Screenshot Capture1.JPG: commit size when running the same operation for three tables)

 

Actually, the issue seems to be due to the heavy FetchLatest function running on tables with lakhs of records.

In the post above, I shared the commit size when I loaded a single table in Power BI Desktop. The image above shows the commit size when I tried the same operation for three tables in Power BI Desktop, and my CPU and memory hit their peaks.

I also confirmed that memory thrashing on the Premium capacity exceeded 80%.

 

Wondering if there is a better way to optimize the FetchLatest function?
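In case it helps anyone suggest a fix, this is another pattern I have seen recommended and am planning to try (a sketch, not validated at this scale; FetchLatestSorted and InputTable are placeholder names): sort by 'time' descending, buffer the sorted table so the order is pinned, and let Table.Distinct keep the first (latest) row per key. Note that Table.Buffer loads the whole table into memory, so this may trade the stack overflow for memory pressure:

let
  FetchLatestSorted = (InputTable as table, GroupByIds as list) as table =>
  let
    // Newest rows first
    Sorted = Table.Sort(InputTable, {{"time", Order.Descending}}),
    // Buffering pins the sort order so Table.Distinct keeps the first row per key
    Buffered = Table.Buffer(Sorted),
    // One row per key combination = the latest row
    Latest = Table.Distinct(Buffered, GroupByIds),
    Result = if List.IsEmpty(GroupByIds) then InputTable else Latest
  in
    Result
in
  FetchLatestSorted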
