Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

Best Way to Filter Million Rows of Data

Hello,

 

One of our users is getting data from an Excel file (stored in network drive) and importing data from SAP HANA database. What he’s doing is filtering the SAP HANA queries based on the table from the excel file which he converted to ‘list’. He is using List.Contains() function to do the filter. The Excel file contains 2000+ rows and it is used to filter multiple SAP HANA queries with million rows.

 

Any suggestions on what other ways we could try to filter a list of million things down to 2000+? 

 

List.Contains() function seems not efficient and causes slow performance in Power Query.

 

Thank you!

10 REPLIES 10
Jimmy801
Community Champion
Community Champion

Hello @Anonymous 

 

I've using a lot List.Contains to filter tables. The trick you have to apply ist to surround your list by List.Buffer(YourList). Otherwise it has to load this list for every row, and this will be sloooooooooooooow. I don't know how this would work on millions of rows filtering against 2000 criteria. Try it out and let me know... I would be very curious how this influences the performance

 

All the best

 

Jimmy

I do what @Jimmy801 does, I wrap the list in a List.Buffer first. 

 

NewStep = Table.SelectRows(#"Previous Step", let _list = List.Buffer(ExcelList) in each List.Contains(_list, [HanaColumn]))

Does that fold with a SAP Hana db @justinh ? I don't have enough experience with that, nor the means to test.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

@edhans 
Yes it folds with Hana. It can result in a stack overflow if the list is too large. 2000 items may be cutting it close.

Thanks @justinh - useful info. I've done it on SQL Server with more than that with no issues. Each database and how Power Query operates with it is different.

Merging is still faster, but you must be 100% on the server for that to fold.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

List.Contains() will not be good for this. It is essentially an IN operator, so even on SQL Server, filtering a million rows where a field is IN a list of 2,000 items, It. Will. Be. SLOW.

 

An inner merge would be faster if you can get the item list on the server so folding happens. List.Contains() will fold even with an Excel list though if it is converted to a list first. A merge of any kind though will not.

 

Did you convert the Excel numbers to a list prior to using it @Anonymous ? If you refer to the list via List.Contains(ExcelFile[Field], [ComparisonField]) that will not fold and be way slower than an inner merge.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
Anonymous
Not applicable

@edhans what do you mean by "An inner merge would be faster if you can get the item list on the server so folding happens"?  

How can I get the item list on the server?

 

Did you convert the Excel numbers to a list prior to using it @Bernadette ? If you refer to the list via List.Contains(ExcelFile[Field], [ComparisonField]) that will not fold and be way slower than an inner merge. - Yes, it was converted to a list first.

 

Thank you!

You would need to work with your IT department/Database admins to get the list natively on your server. There are ways using Power Automate, stored procedures, scripts, etc. to get that data on a SQL server so it is now native to the database and will allow the server to do all of the work via folding. But that will be up to your DBA and the possibilities with the SAP Hana database.

 

If you are in a SQL environment, List.Contains() will fold if done properly. I have an article about this here. I cannot say whether or not that will fold in a SAP Hana environment. If you can see the native query after the step with List.Contains() you will know that is folding. It is possible in SAP Hana that isn't folding either. 



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting
artemus
Employee
Employee

Maybe something like this:

 

 

= Table.AddColumn(Source, "MatchFound", each Function.ScalarVector(type function(tbl as table [Key=text]) as list, each List.Accumulate([tbl], [results={}, lookuplist = InputList], (current, next) => [results = current[results] & {current[lookuplist]{0}? = next[Key]}, lookuplist = if Value.Compare(next[Key], current[lookuplist]{0}?) > -1 then List.RemoveFirstN(current[lookuplist], 1) else current[lookuplist]])[results])(_))

 

 

Then you would just filter on MatchFound...

 

Variables:

Source = The big table

InputList = The filter list

Key = Your key column that you are selecting on.

 

I haven't used SAP Hana, so I'm not sure if this works well or not?

 

Edit: Both the list and the table must be sorted.

 

Actually, this might be faster... depending on the size of your dataset:

Table.FromRecords(List.Transform(InputList, each Source{[Key = _}]), Value.Type(Source))

 

Greg_Deckler
Super User
Super User

@Anonymous - Don't use List.Contains but rather a merge query? Table.Join. But I will leave it to the experts in this such as @ImkeF and @edhans I think that biaccountant.com or maybe Chris Webb's blog had some articles on speeding up joins and such.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors