Solved: Re: More rows than filtered in Power query

aj1973 · ‎10-11-2019

Hello,

I am gonna make it simple; I have a table "X" in (ODBC) that contains 10 rows, in Power Query i use a filter on column to see only 5 rows in my desktop. When I close & apply power query Popup shows me that it extracts or runs through 10 rows but i see only 5 rows in my desktop which is correct however what is not correct is the double time it takes for power query to load my query! why is that? am I doing something wrong?

Now if i relate my table X to another table Y that contains 100 rows and according to my query the power query should only load in my desktop 20 rows from table Y but instead of only 20 , all the 100 rows have been loaded! is that correct?

then in my desktop i have to check box the filter in my Table X to see the correct totals of the 20 rows from Table Y otherwise i will get the totals of 100 rows.

Last question is once i publish my PBIX file and the model in power bi service, at the time of the schedule refresh (through gateway of course)does the service also do the same work as the refresh from the desktop or Power query? if so then there is a problem

Thanks

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

edhans · ‎11-01-2019

I think the problem is the ODBC driver. I just connected to my SQL server via ODBC instead of directly and it does not support query folding in Power Query. So what that means is Power Query is importing your ENTIRE TABLE contents for each table you connect to, and then it works with everything in RAM. It is very inefficient, and little better than working with CSV files. In fact, it is probably about the same, except your tables are coming in structured and the ODBC driver is hopefully passing along data types and other relevant metadata.

But Power Query isn't generating nice neat SQL statements and sending it back for processing. That is assuming it is a relational database to begin with. It may not be, in which case no folding will happen regardless of connection type.

You can validate this though. Pull in 1 table. Do 1 simple filter. Right-click on that step. Does it say "View Native Query?" If that is grayed out, you aren't getting the advantage of a relational database connection. If it is, my whole theory on your speed issue is shot - other than I don't know what your ODBC driver is doing exactly and it could be doing a lot of interpretation and processing slowing things down.

I had an issue a few years ago I had to connect to an old SQL server (2000 I believe) that didn't support features required by Power Query so I had to pull in entire tables for Power Query to work with. It was on site, so processing about 4M records across about 10 tables and doing joins and what-not took 20-25min start to finish on a PC with 16GB of RAM to return 7,000 relevant records. Your record count is smaller, but I've no clue how many columns you have or the column content. Plus you said you were working remotely, so the transmission speeds wouldn't match what I was getting via ethernet.

If it is a relational database on the other end, and you cannot do a direct connection vs bypassing the ODBC driver, you could create a view on the server. You can see here what data sources Power BI supports natively (no need for ODBC). Sources like Amazon Redshift, IBM DB2, Oracle, SQL, SAP HANA, PostgreSQL, Vertica and many others support query folding. A view would let you apply filters and remove unnecessary columns, radically reducing data transmission times and local processing times.

Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling

Proud to be a Super User!

MCSA: BI Reporting

View solution in original post

edhans · ‎10-11-2019

When a query runs, it will tell you what it is doing. I've seen it count to over 1M rows for large tables, but after grouping, filtering, summarizing, it may only load a few hundred. This is especially true for non-SQL sources or SQL sources where folding isn't working as Power Query is doing all of the work. So in your example, it has to process 10 rows to show you the 5 you filtered for.

As to your 100 rows loaded vs the 20 you think, we'd need to see data and the M code to see how your join is set up. Joins can increase, decrease, or leave row count unchanged. Depends on the join type and the data. For example, if you have the following tables with these records (pretend they are the customer number in the table):

Customer

1

2

Sales

1

2

if you join Sales to the Customer table, you'll get 4 rows in your customer table query.

And yes, the Mashup engine in the gateway is supposed to be the same as the engine in the Power BI Desktop file. That is why it is important to update them both in the same month. Usually the gateway release trails the desktop by 1-2 weeks.

Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling

Proud to be a Super User!

MCSA: BI Reporting

aj1973 · ‎10-11-2019

Hi Edhans,

Yes the query shows that it is running through 10 rows but it loads only 5 into the desktop, fine but it takes time consuming.

regarding the other table of 100 rows, I myself don't know how to do the joins with M code inside Power Query, i am not into coding so i have to simply rely on my Desktop to do the work but that after i get all the 100 rows already loaded in. This behaviour is time and size consuming.

the other dilemma in this whole process is when for example, in my table X i have 2 accounts that don't have any transactions today in Table Y and in power query I selected these 2 accounts to appear whenever new transactions are created. So in my desktop i don't see these to 2 accounts today in order to check box them in my Filter Page and i will have to verify every day if new transactions are created in order to be able to do so. As i said previously if i dont check box the account nembers that i need to see their Total sales then i will see the Total sales in the Table Y.

Not sure if Microsoft have a solution to this behavior.

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

edhans · ‎10-20-2019

You don't need to know how to code to do joins/merges in Power Query. It is all in the UI. See this article for more. As for the other issues, I'm having trouble understanding what you are trying to accomplish with your description, so perhaps sharing your file would help.

Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling

Proud to be a Super User!

MCSA: BI Reporting

aj1973 · ‎11-01-2019

Hi,

Can the Indexes in Query Editor solve the issue? meaning when I refresh in desktop the uploading query runs through only filtered rows instead of all the 100 rows in my table. so the refresh does not take longer!

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

edhans · ‎11-01-2019

No. Adding index columns will slow it down, because you are adding index columns. In PQ, adding an index is just a column of numbers usually starting at 0 or 1 and increment. It useful for matching with a previous row, etc. It is not like adding an index to a relational database (SQL, Access, etc) that can speed up the queries.

Not sure what you are doing exactly, but no way 100 records should take more than an eye-blink to read in unless your data is on an online site that is really slow. For example, if you have a CSV file with 100 records you are reading in from a file that is stored in SharePoint and you are accessing it via SharePoint.Files(), it has to read in every single file name and metadata before it even opens your CSV file.

But without more specifics I am just guessing here. As I said, if you can post a sample file, or give a complete picture of what you are doing (the data source, where it is, how many rows, how many columns, etc) it would be helpful.

Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling

Proud to be a Super User!

MCSA: BI Reporting

aj1973 · ‎11-01-2019

I truly appreciate your patient with me and your answer is helpful.

The source is ODBC, and i connect to it remotely.

From the database i only need 4 tables.

One of the table(dimension) contains 20 000 rows

Another Table A (Value) contains 500 000 rows

Another Table B (Value) contains 300 000 rows

In Query Editor I used filters so I can only upload to my PowerBI desktop 10 000 From Dim table, 190 000 rows from Table A and 70 000 rows from Table B.

The Model is in place in my Power BI desktop.

When I refresh the Data-set or my report in the Desktop, in the Popup that shows me the refresh process, you see that the query is running through all the rows of all Tables(20K, 500K, and the 300K) but of course only what it was filtered in Query Editor is what i get in my report. The refresh process takes 17 to 20 mins. That's the time consuming i was talking about and that's why i thought of adding the index.

I hope the picture is much clearer now!

Thanks again Edhans

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

edhans · ‎11-01-2019

I think the problem is the ODBC driver. I just connected to my SQL server via ODBC instead of directly and it does not support query folding in Power Query. So what that means is Power Query is importing your ENTIRE TABLE contents for each table you connect to, and then it works with everything in RAM. It is very inefficient, and little better than working with CSV files. In fact, it is probably about the same, except your tables are coming in structured and the ODBC driver is hopefully passing along data types and other relevant metadata.

But Power Query isn't generating nice neat SQL statements and sending it back for processing. That is assuming it is a relational database to begin with. It may not be, in which case no folding will happen regardless of connection type.

You can validate this though. Pull in 1 table. Do 1 simple filter. Right-click on that step. Does it say "View Native Query?" If that is grayed out, you aren't getting the advantage of a relational database connection. If it is, my whole theory on your speed issue is shot - other than I don't know what your ODBC driver is doing exactly and it could be doing a lot of interpretation and processing slowing things down.

I had an issue a few years ago I had to connect to an old SQL server (2000 I believe) that didn't support features required by Power Query so I had to pull in entire tables for Power Query to work with. It was on site, so processing about 4M records across about 10 tables and doing joins and what-not took 20-25min start to finish on a PC with 16GB of RAM to return 7,000 relevant records. Your record count is smaller, but I've no clue how many columns you have or the column content. Plus you said you were working remotely, so the transmission speeds wouldn't match what I was getting via ethernet.

If it is a relational database on the other end, and you cannot do a direct connection vs bypassing the ODBC driver, you could create a view on the server. You can see here what data sources Power BI supports natively (no need for ODBC). Sources like Amazon Redshift, IBM DB2, Oracle, SQL, SAP HANA, PostgreSQL, Vertica and many others support query folding. A view would let you apply filters and remove unnecessary columns, radically reducing data transmission times and local processing times.

Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling

Proud to be a Super User!

MCSA: BI Reporting

aj1973 · ‎11-05-2019

Hi Edhan,

your reply helped me better understand my issue:

The data source is supported by the Power BI native query. I tested and it worked, I simply applied an SQL script in the data source settings and it imported only the rows that I filtered.

However when i apply "Filtered Rows" in Query Editor the "View native query" is greyed out when I right click on the Applied step!!

This means that the query folding will not take effect at this step in the server and therefore the hard work will be done by my desktop and of course more time is consumed.

Is there a solution to this?

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

edhans · ‎11-05-2019

Well, maybe.

If you typed something in the data source settings, NOTHING will fold after that. You are hardcoding the starting point.

What happens if you just connect to the data source, the database then a table? You should get two steps, a Source, and Navigation. If you right-click on Navigation and you can View Native Query, you should see something like this (This is the Person table in AdventureWorks2017.

select [$Table].[BusinessEntityID] as [BusinessEntityID],
    [$Table].[PersonType] as [PersonType],
    [$Table].[NameStyle] as [NameStyle],
    [$Table].[Title] as [Title],
    [$Table].[FirstName] as [FirstName],
    [$Table].[MiddleName] as [MiddleName],
    [$Table].[LastName] as [LastName],
    [$Table].[Suffix] as [Suffix],
    [$Table].[EmailPromotion] as [EmailPromotion],
    [$Table].[AdditionalContactInfo] as [AdditionalContactInfo],
    [$Table].[Demographics] as [Demographics],
    [$Table].[rowguid] as [rowguid],
    [$Table].[ModifiedDate] as [ModifiedDate]
from [Person].[Person] as [$Table]

At that point, you should be able to filter.

If you do not see View Native Query (grayed out) then it could be your data source and Power Query just doesn't understand how to create native statements for it.

If that is the case, then you just need to modify the advanced query when you are doing the connection to have your first step or two in the statement. This is filtering just for "EM" person type in the same table as above.

select [_].[BusinessEntityID],
    [_].[PersonType],
    [_].[NameStyle],
    [_].[Title],
    [_].[FirstName],
    [_].[MiddleName],
    [_].[LastName],
    [_].[Suffix],
    [_].[EmailPromotion],
    [_].[AdditionalContactInfo],
    [_].[Demographics],
    [_].[rowguid],
    [_].[ModifiedDate]
from [Person].[Person] as [_]
where [_].[PersonType] = 'EM'

That will prefilter your data. You can also eliminate columns here you don't need.

I've never seen an advanced statement work in the connection dialog but folding not work if you don't use an advanced statement, so I'd have to actually see your connection and play with it, which obviously isn't possible. It sounds like you just might have a data source that accepts a SQL statement, but not one that Power Query can generate, and it could be the ODBC driver being the issue. It may not be passing along enough info for PQ to know what it is, or it simply is a source PQ doesn't support for folding.

Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling

Proud to be a Super User!

MCSA: BI Reporting

aj1973 · ‎11-05-2019

The advanced query works fine with the ODBC Driver, the problem is that the report is built and deployed for a client and the report contains many tables so imagine the Hard coding that i have to go through..

In my Power query the 2 steps Source and Navigation are there for all tables but the problem is that i am able to see the code but i can't edit it in order to add some filters.

I just did another test using Filtered rows and connecting to another server. By elimination the problem is the ODBC Driver 32 Bits like you said " it could be the ODBC driver being the issue. It may not be passing along enough info for PQ to know what it is, or it simply is a source PQ doesn't support for folding."

I think Microsoft needs to find us a solution to this.

Regards
Amine Jerbi

If I answered your question, please mark this thread as accepted
and you can follow me on
My Website, LinkedIn and Facebook

More rows than filtered in Power query

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024