Hi all,
I would like to create a new Dataflow by merging two previously created Dataflows; however, when I try to complete the "Merge Queries" step, Power Query errors at the evaluation stage after about 2 hours.
My desired result:
Dataflow 1 (DF1): Unique key table that has an account number as the unique key with other dimension columns
Dataflow 2 (DF2): Account history table showing monthly snapshots of data
Desired result Dataflow 3 (DF3): DF2 table with attributes from DF1 added
In SQL Server this would be a simple join.
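To make the intended result concrete, here is a minimal sketch of that join in plain Python. The column names (`account_id`, `segment`, `month`, `balance`) are hypothetical stand-ins for the real DF1/DF2 columns, which the post does not list.

```python
# Left-join monthly snapshot rows (DF2) onto the unique account
# table (DF1), keyed on account_id, to produce the flat DF3 table.
# All field names here are illustrative assumptions.

def merge_history_with_dimensions(history, accounts):
    # Index the dimension table by its unique key for O(1) lookups.
    by_id = {row["account_id"]: row for row in accounts}
    merged = []
    for snap in history:
        dims = by_id.get(snap["account_id"], {})
        extra = {k: v for k, v in dims.items() if k != "account_id"}
        merged.append({**snap, **extra})
    return merged

accounts = [{"account_id": 1, "segment": "Retail"}]
history = [{"account_id": 1, "month": "2024-01", "balance": 100}]
print(merge_history_with_dimensions(history, accounts))
# [{'account_id': 1, 'month': '2024-01', 'balance': 100, 'segment': 'Retail'}]
```

This is the same left-outer-join semantics that the Power Query "Merge Queries" step applies, just spelled out row by row.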
I am using the following process:
The history table has a maximum of about 20M rows, and the account table has about 2M.
Troubleshooting actions taken:
Not sure what else I can try to get this to work.
Thanks in advance for your assistance.
That's tough. What you are trying to do is total denormalization (one flat table). That is pretty much the opposite of what Power BI likes (reasonably normalized star or snowflake schema). Merges in Power Query are very expensive.
Dataflows would be the right vehicle as they are basically just that - flat tables supplied by Parquet files. Can you do all the heavy lifting in your upstream system (SQL Server for example) and then spool the result either into a materialized view or into a dataflow?
Do you have to merge these dataflows? Could they be consumed independently and then joined in the in-memory data model?
@lbendlin thanks for your reply.
Indeed this is possible. The challenge I am trying to solve in this instance is that the end users are not analysts: they are marketers, salespeople, and users from all walks of professional life. As a result, they have very limited skill with, or exposure to, Power BI, and I unfortunately don't have the bandwidth to train them at present.
The table I am trying to build is basically a single curated collection of common data from multiple sources, which the end user can pick up with confidence and minimal training to answer their common questions. In effect this is an extension of the company ETL, where I am making the specific data as accessible and as easy to use as possible. The tables available at the moment are easily managed by any BA, but they are not something a data novice would spend the time to learn, hence my hope of making a Dataflow available for these users.
@lbendlin
I think I have found both the issue and (in some instances) a workaround.
I definitely want to provide access using a fairly simple star schema, but I need to make sure the IDs being joined on are masked, so creating a fake (surrogate) key is the solution I worked through for this. The other element is that the volume of data I was trying to manage simply pushed everything too far. So for the bigger datasets, I broke them down into smaller logical chunks, by date or product type for example, and this seemed to solve all the immediate ETL errors. Basically, Power Query was groaning under the weight of the data. I also created a few custom queries, which solved some of the particularly troublesome flows.
In the end, a combination of custom queries and logically chunked-down flows, appended or merged later, solved the issue. I will continue to make this more efficient over time. Thanks again for your input.
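For anyone curious about the "fake key" part of this workaround, a deterministic masked surrogate key can be sketched as below. The salt value and helper name are illustrative assumptions, not the poster's actual implementation; the key property is that the same real ID always produces the same masked value, so tables masked with the same salt still join correctly.

```python
import hashlib

def mask_key(account_id, salt="demo-salt"):
    """Deterministic surrogate key: hashing the salted real ID means the
    masked values still line up across tables, while the original
    account numbers are never exposed downstream."""
    digest = hashlib.sha256(f"{salt}:{account_id}".encode()).hexdigest()
    return digest[:16]  # a shortened hex key is enough for joining

print(mask_key(12345))  # same input always yields the same masked key
```

Applying `mask_key` to the join column in both the dimension flow and the history flow (with the same salt) preserves the join while hiding the real account numbers.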