Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
pade
Advocate III
Advocate III

Query Folding code for: SELECT * FROM T1 WHERE EXIST (SELECT 1 FROM T2 WHERE T1.C=T2.C)

I'm looking for a M-code that minic this SQL code:

SELECT *
FROM DimensionTable as d
WHERE EXISTS (SELECT 1 FROM FactTable as f WHERE f.customerId = d.userId)
 
Edit: Please note that Query Folding is an absolute requirement for speed performance on large fact and dimension tables. The defined columns in this example have indexes defined in the database. Dimensions are ~100'000 rows and facts are between 1'000'000 and 50'000'000 rows. Using Direct Query is kind of a must due to that I'm processig and showing some fact columns of text type with Summaries and Description contexts, and these kind of data shall not be stored in the cube
 

--

Note: While writing this, I figured out a way to solve this. But it's kind of not how I expected this to be solved, and a bit more complex then I expected. But I add it here for reference and if someone have a better and easier solution, please show how.

Even though the solution was based on a M-code that was not that obvious, I'm finally impressed how the Power Query Engine managed to optimize the final SQL code

--

 

So I have two tables looking like this:
DimensionTable:
userId  Name
1Donald Trump
2Joe Biden

 

FactTable:

Fact  customerId
3.141
0.7071

 

Running the SQL query above will select only thouse rows in the DimensionTable that has an existing customerId in the FactTable like below:

DimensionTable:
userId  Name
1Donald Trump

 

I can't use Native SQL queries since that will stop query folding.

I also can't create a view in the SQL database that executes the SQL query for me, else this would have been a solution

So this needs to be solved using pure M.

 

What I first tried was to creat two none loading queries like below:

// DimensionTable
let
  Source = Sql.Database(Server1, Database1),
  DimensionTable = Source{[Schema=Schema1,Item="DimensionTable"]}[Data]
in
  DimensionTable

 

// FactTable
let
  Source = Sql.Database(Server1, Database1),
  FactTable = Source{[Schema=Schema1,Item="FactTable"]}[Data]
in
  FactTable

 

I can then referens these two queries in my first attempt

// CustomerTable1

let

  Source = Table.AddJoinColumn(DimensionTable, "userId", FactTable, "customerId", "facts"),

  // Or using this: Source = Table.NestedJoin(DimensionTable, "userId", FactTable, "customerId", "facts"),

  CustomersOnly = Table.SelectRows(Source, each Table.IsEmpty([facts])<>true)

in

  CustomersOnly

 

This kind of works (delivers the expecte result), although this doesn't support query folding and will be to slow. But maybe there's some other solution that will support Query Folding

 

The other approach would be to mimic the other SQL pattern doing the exact same thing:

SELECT *
FROM DimensionTable AS d
WHERE d.userId IN (SELECT customerId from FactTable)
 
The expression "SELECT customerId from FactTable" is easily mimiced by this M-code:
FactTable[customer]
But I can't figure out any M-code that mimic the WHERE-IN statement
 
--
So the solution I found
---
 
I finally ended up with was something that mimic this SQL code
SELECT DISTINCT d.*
FROM DimensionTable as d
INNER JOIN
(
    SELECT customerId
    FROM FactTable as t
as f on (d.userId = f.customerId)
 
This SQL code generates the same query plan on my SQL server as the other SQL queries, but I'm not sure it will on all SQL servers
The M-code I used for generating this:
// CustomerTable2
let
  Custom1 = Table.AddJoinColumn(DimensionTable, "userId", FactTable, "customerId", "test"),
  Expanded = Table.ExpandTableColumn(Custom1, "test", {"customer"}, {"customer"}),
  Filtered = Table.SelectRows(Expanded, each [customer] <> null),
  Cleaned = Table.RemoveColumns(Filtered,{"customer"}),
  DistinctTable = Table.Distinct(Cleaned)
in
  DistinctTable
 
Sooo. What I did was:
  1. Left joining the DimensionTable with the FactTable
  2. Expanding the customerId
  3. Removing each row where there was no match
  4. Removing the joined/expanded customer column
  5. Using Distinct on the whole table

I expected this to work and I also expected it to do Query Folding. But I didn't expect it to generate this SQL code, and I never expected it to generate the same query plans as the other SQL codes does.

 

But as I said in the beginning. If anyone knows a better M-code, please show how

4 REPLIES 4
v-alq-msft
Community Support
Community Support

Hi, @pade 

 

Based on your description, I created data to reproduce your scenario, The pbix file is attached in the end,

Dimension:

d1.png

 

Fact:

d2.png

 

Power Query:

As is suggested by @wdx223_Daniel , you may create a new query with the following codes in 'Advanced Editor'.

let
    Source = Table.SelectRows(Dimension,each List.Contains(Fact[CustomerId],[UserId]))
in
    Source

 

Result:

d3.png

 

DAX:

You may create a calculated table as below.

Table = 
FILTER(
    'Dimension',
    [UserId] in DISTINCT('Fact'[CustomerId])
)

 

Result:

d4.png

 

Best Regards

Allan

 

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Thanks @v-alq-msft .

 

Although, I can't use this solution either, and I realize it's my fault. I didn't mention that I'm using Direct Query mode, hense I can't create calculated tables.

I will update my initial post so this is more clear

 

 

wdx223_Daniel
Super User
Super User

= Table.SelectRows(DimensionTable,each List.Contains(FactTable[customerId],[userId]))

Thanks @wdx223_Daniel .

Although, I can't see that this supports Query Folding, so it will take a very long time to execute on large tables

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors