Hi all,
I would like to create a new Dataflow by merging two previously created Dataflows; however, when I try to complete the "Merge Queries" step, Power Query errors at the evaluation stage after about 2 hours.
My desired result:
Dataflow 1 (DF1): Unique key table that has an account number as the unique key with other dimension columns
Dataflow 2 (DF2): Account history table showing monthly snapshots of data
Desired result Dataflow 3 (DF3): DF2 table with attributes from DF1 added
In SQL Server this would be a simple join.
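To make the intended result concrete, here is a minimal sketch of that join in plain Python. The column names (`account_id`, `segment`, `month`, `balance`) are hypothetical stand-ins for the real DF1/DF2 columns, which the post does not list.

```python
# Left-join monthly snapshot rows (DF2) onto the unique account
# table (DF1), keyed on account_id, to produce the flat DF3 table.
# All field names here are illustrative assumptions.

def merge_history_with_dimensions(history, accounts):
    # Index the dimension table by its unique key for O(1) lookups.
    by_id = {row["account_id"]: row for row in accounts}
    merged = []
    for snap in history:
        dims = by_id.get(snap["account_id"], {})
        extra = {k: v for k, v in dims.items() if k != "account_id"}
        merged.append({**snap, **extra})
    return merged

accounts = [{"account_id": 1, "segment": "Retail"}]
history = [{"account_id": 1, "month": "2024-01", "balance": 100}]
print(merge_history_with_dimensions(history, accounts))
# [{'account_id': 1, 'month': '2024-01', 'balance': 100, 'segment': 'Retail'}]
```

This is the same left-outer-join semantics that the Power Query "Merge Queries" step applies, just spelled out row by row.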
I am using the following process:
The history table has a maximum of about 20M rows, and the account table has about 2M.
Troubleshooting actions taken:
Not sure what else I can try to get this to work.
Thanks in advance for your assistance.
That's tough. What you are trying to do is total denormalization (one flat table). That is pretty much the opposite of what Power BI likes (reasonably normalized star or snowflake schema). Merges in Power Query are very expensive.
Dataflows would be the right vehicle as they are basically just that - flat tables supplied by Parquet files. Can you do all the heavy lifting in your upstream system (SQL Server for example) and then spool the result either into a materialized view or into a dataflow?
Do you have to merge these dataflows? Could they be consumed independently and then joined in the in-memory data model?
@lbendlin thanks for your reply.
Indeed this is possible. The challenge I am trying to solve in this instance is that the end users are not analysts: they are marketers, salespeople, and users from all walks of professional life. As a result, they have very limited skill with, or exposure to, Power BI, and I unfortunately don't have the bandwidth to train them at present.
The table I am trying to build is basically a single curated collection of common data from multiple sources, which the end user can pick up with confidence and minimal training to answer their common questions. In effect this is an extension of the company ETL, where I am making the specific data as accessible and as easy to use as possible. The tables available at the moment are easily managed by any BA, but they are not something a data novice would spend the time to learn, hence my hope of making a Dataflow available for these users.
@lbendlin
I think I have found both the issue and (in some instances) a workaround.
I definitely want to provide access using a fairly simple star schema, but I need to make sure the IDs being joined on are masked, so creating a fake (surrogate) key is the solution I worked through for this. The other element is that the volume of data I was trying to manage simply pushed everything too far. So for the bigger datasets, I broke them down into smaller logical chunks, by date or product type for example, and this seemed to solve all the immediate ETL errors. Basically, Power Query was groaning under the weight of the data. I also created a few custom queries, which solved some of the particularly troublesome flows.
In the end, a combination of custom queries and logically chunked-down flows, appended or merged later, solved the issue. I will continue to make this more efficient over time. Thanks again for your input.
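For anyone curious about the "fake key" part of this workaround, a deterministic masked surrogate key can be sketched as below. The salt value and helper name are illustrative assumptions, not the poster's actual implementation; the key property is that the same real ID always produces the same masked value, so tables masked with the same salt still join correctly.

```python
import hashlib

def mask_key(account_id, salt="demo-salt"):
    """Deterministic surrogate key: hashing the salted real ID means the
    masked values still line up across tables, while the original
    account numbers are never exposed downstream."""
    digest = hashlib.sha256(f"{salt}:{account_id}".encode()).hexdigest()
    return digest[:16]  # a shortened hex key is enough for joining

print(mask_key(12345))  # same input always yields the same masked key
```

Applying `mask_key` to the join column in both the dimension flow and the history flow (with the same salt) preserves the join while hiding the real account numbers.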