Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
KrisF
Frequent Visitor

PowerBI Create a new Dataflow from 2 separate DataFlows through Merging - Errors out

 Hi all, 

 

I would like to create a new Dataflow sourced from merging 2 previously created Dataflows however when i try to complete the "Merge Queries" step, Power Query errors at the evaluation stage ater about 2 hours.

 

My desired result:

Dataflow 1 (DF1): Unique key table that has an account number as the unique key with other dimension columns

Dataflow 2 (DF2): Account history table showing monthly snapshots of data

 

Desired result Dataflow 3 (DF3): DF2 table with attributes from DF1 added 

 

in SQL Server this would be a simple join.

 

I am using the following process:

  • Create new "Account Number" Dataflow sourcing from a single table, and performing basic filtering removing some columns and rows only
  • Create a "Reference" Query in the flow
  • Save Dataflow, refresh and repeat for (DF2) "Account History" source requirement.
  • Create new DataFlow (DF1) linking to "Account Number" Dataflow > create "Reference" > perform ETL removing potential duplicates and retuning only required columns leaving "Account Number" as unique.
  • Create new Dataflow (DF3) linking to DF1 and DF2 > Create "Reference" for both > Merge Queries > choose "Inner Join"  and correct join columns
  • Runs for extreme period of time and then errors. 

The max rows for the history table is 20m and the account table has about 2m

 

Trouble shooting actions taken:

  • Changed Project options to on prem gateway - Error "The remote name could not be resolved: 'api.powerbi.com'"
  • Changed Project options to allow multiple sources - fail continues
  • Toggled and refreshed Enhanced compute engine settings on all 3 types - fail continues
  • Updated gateway settings - fail continues
  • Recreated flows - fail continues
  • Threatened my computer with fire - Blue screen of death. 

 

Not sure what else i can try to get this to work, 

 

Thanks in advance for your assistance. 

1 ACCEPTED SOLUTION

That's tough.  What you are trying to do is total denormalization (one flat table).  That is pretty much the opposite of what Power BI likes  (reasonably normalized star or snowflake schema). Merges in Power Query are very expensive.

 

Dataflows would be the right vehicle as they are basically just that - flat tables supplied by Parquet files. Can you do all the heavy lifting in your upstream system (SQL Server for example)  and then spool the result either into a materialized view or into a dataflow?

View solution in original post

4 REPLIES 4
lbendlin
Super User
Super User

Do you have to merge these dataflows?  Could they be consumed independently and then joined in the in-memory data model?

@lbendlin thanks for your reply. 

 

Indeed this is possible, the challenge i am trying to solve for in this instance is that the end users are not analysts, they are marketers, sales people, basically users from all walks of professional life. Due to this, as a whole they have a very limited skill or exposure to PBI and i unfortunatly dont have the bandwidth to train them at the present.

 

The table i am trying to build is basically a single curated collection of common data from multiple sources which the end user can pick up with confidence and minimal training to answer their common questions. In effect this is an extension of the company ETL, where i am making the specific data as accessable as possible as easily as possible. The tables available at the moment are easily managed for any BA, just not something a data novice would spend the time to learn etc hence my hope of making a DataFlow for these users available.

That's tough.  What you are trying to do is total denormalization (one flat table).  That is pretty much the opposite of what Power BI likes  (reasonably normalized star or snowflake schema). Merges in Power Query are very expensive.

 

Dataflows would be the right vehicle as they are basically just that - flat tables supplied by Parquet files. Can you do all the heavy lifting in your upstream system (SQL Server for example)  and then spool the result either into a materialized view or into a dataflow?

@lbendlin 

I think i have found both the issue and (in some instances) a work around. 

I definetly want to provide access using a fairly simple Star schema but i need to make sure the IDs being joined on are masked, so creating a fake key is a solution i worked through for this. The other element is that the volume of data i was trying to manage just pushed everything too far. So for bigger datasets, i broke them down into logical smaller chunks such as date, or product type for example, this seemed to solve all the immediate ETL errors. Basically PowerQuery was groning under the weight of the data. I also created a few custom queries which solved some of the particularly troublesome flows. 

In the end, a combination of custom queries and logically chunked down flows appended or merged later solved the issue. I will continue to make this more efficent over time and thanks for your input again.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors