Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
AlB
Super User
Super User

Doubts with INNERJOINs and lineage

Hi all,

 

I am reading this article on joining tables in DAX and I am quite confused as to how these INNERJOINs work. In the section called "Using NATURALLEFTOUTERJOIN and NATURALINNERJOIN without Relationships" the authors claim:

 

You cannot join these two tables by using ProductKey, because such a these columns havehas the same name but different data lineages in the model. In fact, the following code generates an error:

EVALUATE
NATURALLEFTOUTERJOIN( P_A, P_B )

 

Ok, so from that I understand that the columns need a common lineage. But then a few lines later they state:

 

In order to join two columns with the same name and no relationships, it is necessary that these columns do not have a data lineage. To obtain that, it is necessary to write the column using an expression that breaks the data lineage, as in the following example.

 

So with this it seems we have to break the lineage (??). Later, again, the example with the TREATAS seems to be used to create rather than break the lineage. This is where I'm getting lost. Is it break or create lineage that you need for the INNERJOINS to work?

If anyone knows about this I would really appreciate it if you can clarify it for me, or direct me to other resources where I can read about it. I've checked it in another book but it doesn't really go into the details.


Many thanks for your help.

 

1 ACCEPTED SOLUTION

@AlB

Man... you ask very difficult questionsSmiley Wink

 

My limited understanding is following

 

1) InnerJoin can be created between tables that do not physically exist in the data model. In that case the tables should have a column with same name and same type. For example you can create temporary tables using variables using DATATABLE, Row or other such functions

2) When using INNERJOIN with Tables existing in the Data Model, there should already be a relationship beween these tables.


If there there is no relationship , we can still use INNERJOIN
                                but first
we need to remove the data lineage (i.e. the tag which identifies the original column in the data model that the values of a column originated from). Otherwise it can create confusion querying physical tables in the data model.


This tag is not removed simply by changing the names of the column using SELECTCOLUMNS or any other simialr function. becasue DAX engine is smart enough to remember where the values originated from. Data lineage is maintained as long as an expression is only made up of one column reference. To beak that lineage we use techniques like adding empty string or a 0 to the column expression.


Regards
Zubair

Please try my custom visuals

View solution in original post

9 REPLIES 9
v-piga-msft
Resident Rockstar
Resident Rockstar

Hi @AlB,

 

Hope the blogs below will help you.

 

https://curbal.com/blog/glossary/naturalleftouterjoin-dax

https://curbal.com/blog/glossary/naturalleftouterjoin-dax

 

In addition, we could use DAX Studio trace the dax query to analyze the query better.

 

Best Regards,

Cherry

Community Support Team _ Cherry Gao
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Hi @v-piga-msft

 

Thanks for your reply. I checked the videos but, unfortunately,  they don't answer my question. In fact the speaker says that she doesn't know how the joins work with no relationships and she is asking for tips herself.

Maybe @Zubair_Muhammad can give us a hand here?

Many thanks

 

 

@AlB

Man... you ask very difficult questionsSmiley Wink

 

My limited understanding is following

 

1) InnerJoin can be created between tables that do not physically exist in the data model. In that case the tables should have a column with same name and same type. For example you can create temporary tables using variables using DATATABLE, Row or other such functions

2) When using INNERJOIN with Tables existing in the Data Model, there should already be a relationship beween these tables.


If there there is no relationship , we can still use INNERJOIN
                                but first
we need to remove the data lineage (i.e. the tag which identifies the original column in the data model that the values of a column originated from). Otherwise it can create confusion querying physical tables in the data model.


This tag is not removed simply by changing the names of the column using SELECTCOLUMNS or any other simialr function. becasue DAX engine is smart enough to remember where the values originated from. Data lineage is maintained as long as an expression is only made up of one column reference. To beak that lineage we use techniques like adding empty string or a 0 to the column expression.


Regards
Zubair

Please try my custom visuals

@AlB  thanks for this great question !!!

@Zubair_Muhammad  thanks for the amazing explanation !!!

Did I answer your question? Mark my post as a solution!
Proud to be a Super User!
My custom visualization projects
Plotting Live Sound: Viz1
Beautiful News:Viz1, Viz2, Viz3
Visual Capitalist: Working Hrs

Hi  @Zubair_Muhammad

Thanks very much for your reply. I guess it's a good sign that the questions are difficult Smiley Wink

 

Your tips were very useful, particularly the difference between physical tables and those created dynamically. After testing a few things my conclusion is that:

 

1. Tables created dynamically from scratch, such as with DATATABLE or the table constructor:  { ("a", 1) , ("b", 2) }, do not have that lineage tag.

2. Columns in physical tables always have a lineage tag. 

 

In the first case  there's no lineage so there's no problem for the joins ; it seems like "no lineage" on both columns is considered "same lineage" by the engine. In the second, there's lineage and it's different. It needs to be removed so that we have two columns with no lineage that can now be joined.

What do you think?

 

Thanks a lot

 

@Zubair_Muhammad

 

And since you like tough questions Smiley Happy I have something else regarding the same article on SQLBI.

(Disclaimer: I have posted this question there as well, in case you prefer to wait for their answer. That might be soon or take quite long, depending on how busy they are). 

 

They show an interesting way of creating common lineage through TREATAS before joining the tables:

 

EVALUATE
VAR B_TreatAs =
TREATAS ( P_A, P_B[ProductKey], P_A[Code], P_A[Color] )
VAR Result =
NATURALLEFTOUTERJOIN ( B_TreatAs, P_B )
RETURN
Result

 

I have two variations of this code and corresponding questions:

 

1. The following does not work:

 

EVALUATE
VAR Result =
CALCULATETABLE(NATURALLEFTOUTERJOIN ( P_A, P_B ), TREATAS ( P_A, P_B[ProductKey], P_A[Code], P_A[Color] ))
RETURN
Result

 

Here we perform the TREATAS in an argument of the CALCULATETABLE. We are providing the CALCULATETABLE with a table (result of the TREATAS) with the adequate lineage changes. Any idea why this wouldn't work?

 

2. This does not work either

 

EVALUATE
VAR B_TreatAs =
TREATAS ( P_A, P_B[ProductKey], P_A[Code], P_A[Color] )
VAR Result =
NATURALLEFTOUTERJOIN ( P_A, P_B )
RETURN
Result

 

The only difference with the initial code we are using P_A as the first argument of the INNERJOIN instead of the variable B_TreatAs. Here is my reasoning as to why this should work:
TREATAS properly modifies the lineage when the variable B_treatAs is created. So P_B[ProductKey] has already "received" the lineage of P_A[ProductKey]. Once that has happened, the Innerjoin should work with both P_A or B_TreatAs.

What is wrong with my reasoning? Or is it that the variable B_TreatAs is actually not created if the variable is not used/consumed?

 

Many thanks

Hi AIB

Will get back to you.
Will have to look into this topic.

Regards
Zubair

Please try my custom visuals

@Zubair_Muhammad 

Just wanted to say thanks. Adding a text value to the column values solved my issue. 

Without the "C", the below didn't worked :). 

 

NATURALINNERJOIN(
   SELECTCOLUMNS(
      filter (
         'Model N Query'
         , 'Model N Query'[Date check product] = "Current"
            && 'Model N Query'[Date check header] = "Current"
            && 'Model N Query'[Doc Status Desc] = "Implemented"
            && 'Model N Query'[Sub Type Desc] = "Adjusted Cost"
         )
      , "InnerJoin" , "C" & 'Model N Query'[Contract Numb]
      , "Product" , 'Model N Query'[PROD_NUM]
      , "Currency" , 'Model N Query'[Tier1 Curr]
      )
   , SELECTCOLUMNS(
      filter (
         'Inclusion query'
         , 'Inclusion query'[Date check inclusion] = "Current"
         )
      , "InnerJoin" , "C" & 'Inclusion query'[Contract ID]
      , "Customer" , 'Inclusion query'[Customer ID]
      , "Wholesaler" , 'Inclusion query'[Wholesaler Name]
      )
   )

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.