Jkaelin
Resolver I

Can Power Query handle this much data?

Good morning,

 

I have one dimension table and 8 fact tables. Each table has approximately 300,000 records.

 

Before I create this data model: can Power Query efficiently join (left outer join) all 8 tables into 1 large flattened table? Or should I expect immediate performance issues, timeouts, etc.?

 

(I'm flattening to work around Power Pivot performance issues I've run into.)
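For context, what I have in mind is a chain of left outer merges in Power Query M, roughly like the sketch below. The table names (Fact1, Fact2) and key columns (StockID, Date) are just placeholders for my real ones; the join/expand pair would repeat for each remaining fact table.

let
    // Placeholder sources; the real queries load each fact table
    Fact1 = Excel.CurrentWorkbook(){[Name = "Fact1"]}[Content],
    Fact2 = Excel.CurrentWorkbook(){[Name = "Fact2"]}[Content],
    // Left outer join Fact2 onto Fact1 on the shared keys
    Merged = Table.NestedJoin(Fact1, {"StockID", "Date"}, Fact2, {"StockID", "Date"}, "Fact2", JoinKind.LeftOuter),
    // Expand only the columns that are actually needed from Fact2
    Expanded = Table.ExpandTableColumn(Merged, "Fact2", {"PE", "PS"})
in
    Expanded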

 

My computer has an i7-3770K processor, 16 GB of RAM, and Excel 2016 Professional Plus.

 

Thank you,

James K

 

 


7 REPLIES
vanessafvg
Super User

@Jkaelin, one dimension and 8 facts? Sounds like the issue might be with the modeling, although I might be wrong.

 

It really depends on a variety of things:

1) Your hardware (you seem to have sufficient hardware).

2) The modeling of your data. What is in your facts from a data type perspective? How wide are they? Do you need all the columns? 300,000 rows is not a lot in my opinion; it sounds like the design is the issue here.

 

Are you doing this in Power BI or Power Pivot?









@vanessafvg

 

What is in your facts?

I am pulling in various stock characteristic items: P/E, P/S, Total Return, etc., for various indexes across time (1990 to the present).

 

How wide are they?

They are about 10 columns wide for each fact table.  

 

Do I need all the columns?

Not all at once; probably 20-30 columns at any given time.

 

I'm doing the work in Power Pivot, using various DAX measures (percentiles, PRODUCTX, etc.) that have to iterate over the data set to find the top third, middle third, etc., and calculate stock performance.

 

How could I improve the data modeling?

The data feed limits how many columns I can pull at once, so I've had to harvest them separately into various fact tables. Which stinks.

 

Any thoughts would be very much appreciated, because I've been banging my head against a wall trying to optimize the data performance.

vanessafvg
Super User

ACCEPTED SOLUTION

@Jkaelin, what are your data types in your facts? I really don't see an issue in combining them if they are mostly numerical; why don't you just try it and see what happens? Essentially it's mostly up to your machine. And because this is in-memory technology, you want to standardize your data as much as possible: the fewer unique values you have, the better the compression you'll get. Start with only what you need rather than everything.
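As a rough illustration only (the column names are made up), rounding high-precision decimals in Power Query is one way to cut down the unique values before the data hits the in-memory engine:

let
    // Tiny stand-in for the combined fact table (placeholder data)
    FlattenedFacts = #table({"StockID", "PE", "TotalReturn"}, {{"A", 15.2379, 0.081542}, {"B", 22.9114, -0.034217}}),
    // Fewer distinct values compress better in the in-memory engine
    Rounded = Table.TransformColumns(FlattenedFacts, {{"PE", each Number.Round(_, 2), type number}, {"TotalReturn", each Number.Round(_, 4), type number}})
in
    Rounded

Only round away precision you can afford to lose, of course; the point is just that lower cardinality means better compression.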









@vanessafvg

 

Data types are all whole numbers & decimals.  

 

Is flattening okay? Sometimes I have to create a financial ratio such as Sales / Assets, where Sales is in fact table 1 and Assets is in fact table 2. I need that data already combined into a ratio before I can calculate statistics on it, such as top third, middle third, etc.
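For example, once Sales and Assets land on the same row after the merge, the ratio could be added as a custom column in Power Query; this is just a sketch with stand-in data:

let
    // Tiny stand-in for the merged table: one row per stock per period
    Merged = #table({"StockID", "Sales", "Assets"}, {{"A", 100, 400}, {"B", 50, 0}}),
    // Guard against null or zero denominators before dividing
    WithRatio = Table.AddColumn(Merged, "SalesToAssets", each if [Assets] = null or [Assets] = 0 then null else [Sales] / [Assets], type number)
in
    WithRatio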

I really appreciate your help & thank you!

@Jkaelin, like I said, I don't think there is an issue with combining them. As long as it's all at the same grain, it might even make sense to do so: it's not your typical relational data, and you have only separated it out due to source constraints. Your fields are mostly numeric, so no issue there, and 300,000 rows is not a biggie. Denormalizing makes sense, but like I say, take out everything you don't need (see the sketch below). Leave the dimension as a dimension and only combine the facts. With my limited knowledge of what you're doing, I don't see an issue.
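Something along these lines trims a fact table down before you combine it; the query and column names here are examples only:

let
    Source = Excel.CurrentWorkbook(){[Name = "Fact3"]}[Content],
    // Keep only the key columns and the measures actually used downstream
    Trimmed = Table.SelectColumns(Source, {"StockID", "Date", "TotalReturn"})
in
    Trimmed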









KHorseman
Community Champion

Not sure why you would want to flatten something like that. That's like...the opposite of how this whole program is supposed to work.

 

What specific performance issues are you talking about?









@KHorseman

 

I currently have a snowflake schema. I've read that poor performance can be caused by the lack of a star schema, and that star schemas perform best with flattened tables and fewer relationships. So I'm trying to flatten a few tables to help build an optimized star schema.

 

Am I doing it all wrong? I need help! I'm more of a finance guy trying to thrive in a data analyst world some days.

 

Thank you!

James
