We have Premium, and I've set up a dataflow that pulls data from a SQL DB. One of the entities, with 30+ million rows, has been configured for incremental refresh.
The incremental refresh is working as expected, and the dataflow refreshes in ~2 minutes.
However, when the dataset connected to the dataflow refreshes, it now takes around 30 minutes, because it is essentially loading ALL the partitioned flat files off ADLS with no query folding at all.
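For reference, an incremental refresh policy on the dataset side only injects a RangeStart/RangeEnd filter into the query; against the Dataflows connector that filter does not fold back to storage, so every partition refresh still scans the flat files. A rough sketch in Power Query (M) of what the generated query looks like — the workspace, dataflow, entity, and date column names are hypothetical, and the navigation steps are only an approximation of what the connector generates:

```m
// Sketch only: "My Workspace", "My Dataflow", "FactSales", and [LoadDate]
// are hypothetical names, and the navigation steps below approximate
// what the PowerPlatform.Dataflows connector generates.
let
    Source = PowerPlatform.Dataflows(null),
    Workspaces = Source{[Id = "Workspaces"]}[Data],
    Workspace = Workspaces{[workspaceName = "My Workspace"]}[Data],
    Dataflow = Workspace{[dataflowName = "My Dataflow"]}[Data],
    Entity = Dataflow{[entity = "FactSales", version = ""]}[Data],
    // RangeStart/RangeEnd are the reserved datetime parameters that the
    // incremental refresh policy binds per partition. This filter is
    // applied locally -- it does not fold to the dataflow's ADLS storage,
    // so each partition refresh still reads the underlying flat files.
    Filtered = Table.SelectRows(Entity,
        each [LoadDate] >= RangeStart and [LoadDate] < RangeEnd)
in
    Filtered
```

This is the same filter shape that folds nicely into a WHERE clause when the source is SQL, which is why the direct-to-DB refresh is so much faster.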
I understand that my dataflow itself is much quicker to refresh, but what is the point if my dataset still takes forever? I could, of course, connect to the SQL DB directly, but then what would be the point of using dataflows? I implemented dataflows so that others can utilise them and there is no need to hit the DB anymore.
I'm running into the same problem. The original source is a SQL DB that gets loaded into a dataflow with incremental refresh (2 to 3 minutes to refresh); incremental refresh works great for this step. The next step is consuming the dataflow entity into a dataset, also with incremental refresh, and this is terribly slow (1 hour to refresh). I'm considering removing the dataflow from the equation. How have others handled this situation?
Digging up an old topic here, but I have the same issue. I love the idea of shared, incrementally refreshed dataflows, but not at the expense of ballooning dataset refresh times. In my case, I have a dataset that takes about 6-8 minutes to refresh against SQL directly but almost 30 minutes when calling the dataflows that contain the same logic.
Has anyone found a fix for this? I'm about ready to ditch the dataflows for this project, which would be a shame because they have so much promise, but 30-minute refreshes for no apparent reason are hard to sell to the business.