I am trying to join two linked entities together in a dataflow. Entity A is 19 million rows, entity B is 30,000.
The dataflow refresh kept failing at first with what I assumed were memory issues (it worked with smaller volumes but when I increased them I got the error message "There was a problem refreshing your dataflow"). However, after I turned the enhanced compute engine on it worked ok. Both linked entities at this testing phase were within the same workspace as the dataflow.
I am now trying to do the same thing but with entity B in a different workspace. Now it runs for a very long time before failing again. Why does it perform differently when one of the linked entities is sourced from another workspace? Is it no longer using the enhanced compute engine for some reason? I can't see why it would work differently.
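For readers following along, here is a minimal sketch of the kind of join described in this post. The table names, column names, and sample rows are all hypothetical stand-ins (the original post does not show its query); in a real dataflow, EntityA and EntityB would be references to the linked entities rather than inline tables.

```powerquery
let
    // Hypothetical stand-in for the large linked entity (19 million rows in the post).
    EntityA = #table(
        type table [OrderId = Int64.Type, CustomerId = Int64.Type],
        {{1, 10}, {2, 20}, {3, 10}}
    ),
    // Hypothetical stand-in for the small linked entity (30,000 rows in the post).
    EntityB = #table(
        type table [CustomerId = Int64.Type, CustomerName = text],
        {{10, "Contoso"}, {20, "Fabrikam"}}
    ),
    // Left outer join of the large entity to the small one on the shared key.
    Joined = Table.NestedJoin(
        EntityA, {"CustomerId"},
        EntityB, {"CustomerId"},
        "B", JoinKind.LeftOuter
    ),
    // Bring across only the columns needed from entity B.
    Expanded = Table.ExpandTableColumn(Joined, "B", {"CustomerName"})
in
    Expanded
```

The M itself is the same whichever workspace entity B comes from; only the linked entity reference behind EntityB would change.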
I'm literally working on this today.
I have some dataflow entities with 5m-151m records. These are activity records. When an "activity" is performed multiple times, my analyst users often want only the first activity, or the most recent activity, in their analysis. I'd like to create "unique" activity dataflows for these use cases.
I'm doing this now, within one workspace. I have an "ingest" dataflow, a Linked Entity "final" dataflow, and an "unlinked" entity "unique" dataflow (it uses Linked Entities but with "Load Enabled" unchecked).
If I chain my "unique" dataflow by enabling load on the entities, "unique" will have to complete before "ingest" and "final" can finish. If "unique" fails, "ingest" will fail, and my data will have to be re-fetched from the data source, even if that step actually succeeded. This is why "unique" isn't chained.
Since "unique" isn't chained, I believe I'm not leveraging the Enhanced Dataflows Compute Engine (EDCE).
So today I'm going to try creating my "unique" dataflows in a separate workspace where I can leave "Load Enabled" checked, hopefully engaging EDCE without making it a dependency of the ingest.
I will report back!
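As an illustration of the "first activity" logic described in this post, a minimal sketch might look like the following. The column names and sample rows are hypothetical (the post does not show its schema), and this version keeps only the grouping keys plus the earliest date rather than the whole source row.

```powerquery
let
    // Hypothetical activity records; a real dataflow would reference the
    // linked entity here instead of an inline table.
    Activities = #table(
        type table [UserId = Int64.Type, Activity = text, ActivityDate = date],
        {
            {1, "Login", #date(2024, 1, 5)},
            {1, "Login", #date(2024, 1, 2)},
            {2, "Login", #date(2024, 1, 3)}
        }
    ),
    // One row per user and activity, carrying the earliest date in each group.
    FirstActivity = Table.Group(
        Activities,
        {"UserId", "Activity"},
        {{"FirstActivityDate", each List.Min([ActivityDate]), type date}}
    )
in
    FirstActivity
```

Swapping List.Min for List.Max would give the most recent activity per group instead.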
I have "Load enabled" ticked on all my linked entities. If you untick it then you no longer get the icon to indicate it is a linked entity and my understanding is that it does not use the enhanced compute engine. I did try unticking it at first as a way to hide my staging/transformation entities from the user, but the performance was very poor. It's a bit frustrating as it means not only do you have to separate out your ingestion into a separate dataflow but also the joining/transformation into a separate dataflow as well.
If my linked entity is from another workspace I do still get the "linked entity" icon.
With regard to Source = PowerBI.Dataflows() vs Source = PowerBI.Dataflows([IncludeGroups = false, SourceDataflowId = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"]), I don't believe that makes a difference. When I first add a linked entity it produces the latter M code, but when I save/close and re-open the dataflow, it reverts to the former code.
Also, does your Power Query M look like this:
let
Source = PowerBI.Dataflows(),
...
Or like this:
let
Source = PowerBI.Dataflows([IncludeGroups = false, SourceDataflowId = "xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx"]),
...
I'm not sure, but I've wondered if the latter has an impact on the Linked Entity functionality.