Re: merging tables very slow performance

kaarel · ‎02-03-2017

Hi!

I imported 2 SQL views with 20000rows each into PowerBI Desktop. Then in the query editor I'm trying to merge them together using inner join based on 2 key fields. One is DATETIME data type and other is VARCHAR(50) data type. In the merged table when I try to expand some columns from the merged table it doesn't complete in 2-3 minutes. Shouldn't merge be done in memory? I don't want PowerBI to go back to the SQL views, which indeed are very slow when doing the join there...

Best Wishes,
Kaarel.

Anonymous · ‎04-15-2019

Hi Kareel,

we're seeing the exact same issue - but it's MUCH worse when dealing with tables with tens of millions of rows in it. The issue we are seeing is this - the "merge" step itself DOESN'T push down to the database...it pulls in the whole table, which on a 15 million row dimension table takes forever. Once the merge step is done and you click on the "expand column" icon to expand out a few columns - that DOES get pushed to the database, so it's very fast.

This is killing our development time...we can sit for upwards of an hour waiting for the merge step just to list columns so we can do the expand. Once we get past that it's fine...but this is killing productivity for our PBI developers. I'm opening a premium ticket with Microsoft to see what can be done.

One note - if you are using tables with primary and foreign keys, you don't hit this issue because you don't need to use MERGE operation to bring in related tables. But without them (in our case we use views to implement row level security - so we can't have PK/FK) it's bad to the point of being unusable.

Hopefully I can get an acceptable answer / workaround from Microsoft.

Scott

ImkeF · ‎04-15-2019

Agree, performance tuning is not much fun currently unfortunately.

With regards to wait-time during the design-process, this technique might help: https://www.thebiccountant.com/2016/11/08/speed-powerbi-power-query-design-process/

Otherwise, you might find some hints for additional improvements here: https://www.thebiccountant.com/speedperformance-aspects/

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous · ‎04-15-2019

The thing that is frustrating to me is that if we have PK/FK involved it pushes the query down, but if we merge two tables, selecting the columns and choosing inner, left, etc. types of joins, it doesn't push the query down. This makes absolutely no sense to me at all. If it can write SQL when PK/FK are present, it should write the same exact SQL when they aren't.

In fact, since the output of the "merge" operation is just the name of the table you've joined in along with it's columns...it shouldn't have to write any SQL at all. Power Query already knows what the table being merged in looks like...

Scott

ImkeF · ‎04-15-2019

Yes, very interested as well to hear the answer on it.

Could this technique be an alternative option for you?: https://community.powerbi.com/t5/Desktop/Parameterized-SQL-Query-with-query-folding/td-p/171503

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Anonymous · ‎04-15-2019

Unfortunately parameterizing the queries isn't really what we're looking for, for a couple of reasons:

1. It prevents Power BI incremental updates

2. The things users filter by (departments, dates, patients, diagnoses, surgical procedures, biometric info, etc.) varies a lot depending on their use case.

3. Mostly because I want something end users can do. PK/FK relationships are simple even for someone with no SQL experience. Setting up parameters and embedding in SQL is definitely not where I'd like to go.

I do appreciate the input though! I'll definitely share if we find a better solution.

Scott

Anonymous · ‎02-20-2019

Also very slow loading for an inner joined merged query. Tables have less than 20k rows. I've done all filtering prior to performing the inner join. Any ideas?

avisingh · ‎02-03-2017

Kaarel,

Query engine may automatically do query folding - which means instead of handling the steps in PBI Desktop, it may push the query up and run it on the SQL Server. A simple example of that is
e.g.

1. You connect to a SQL table in Query (PBI Desktop)

2. Then you add the next step to say Filter by Country = "United States"

It's very likely that the Query engine would instead of doing "Select * from table" and then filtering it, would instead run a query ass "Select * from table where Coutry = 'United States'. The assumption is that usually servers such as SQL Servers are WAY faster at such operations than the end-user's computers.

Ok, with that said...
> You said, " I don't want PowerBI to go back to the SQL views, which indeed are very slow when doing the join there..."
That is a bit surprising. SQL should not even blink at this. Perhaps the source SQL server is slow.
Is the performance without the merge really fast and it only slows down if you try the merge?

Hope some of it helps.

Power On!
-Avi Singh
Join me for a Live Power BI Q&A Session: http://avising.com/talkpowerbi

kaarel · ‎02-03-2017

Hi!

Is there a way to force PowerBI to make the JOIN in memory?

Lets assume that I don't have access to SQL and therefore can't do any optimizations in the data source.

Best Wishes,
Kaarel.

ImkeF · ‎02-04-2017

You can use Table.Buffer to prevent query-folding to happen. (But unfortunately this might not prevent the SQL-query being passed multiple times to the server, see here: https://blog.crossjoin.co.uk/2016/11/20/referenced-queries-and-caching-in-power-bi-and-power-query/)

Actually, there is a bug in M, that prevents query-folding altogether that might have hit you here: http://www.thebiccountant.com/2015/09/17/filter-sql-server-queries-with-excel-tables-query-folding-l...

So if you filter your data before the Inner-Join, you should buffer these steps before passing them on to the join-operation.

In addition to that, you can disable "Allow data preview": https://blog.crossjoin.co.uk/2016/12/05/power-query-power-bi-and-the-allow-data-preview-to-download-...

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

nirvana_moksh · ‎05-17-2018

What if I declare 'Table.Buffer' in the query editor on the step of 'Union' wherein I am doing a union of 17 views and want all of it done in memory opposed to it hitting SQL. These 17 views in total take 2 mins to run on SSMS but after all transformations and everything the whole refresh on PBI is taking 29 mins!

Anonymous · ‎02-13-2019

same issue here. any solution?

merging tables very slow performance

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly