Re: Single crossfiltering ignored for columns?

SuraMan · ‎02-13-2019

Hello,

I understand that when relationships are set to Single Direction Crossfiltering, filtering from the many side does not flow to the one side. For example, in the attached data model, filters on Store should not flow to District. This PBIX was taken from https://docs.microsoft.com/en-us/power-bi/sample-retail-analysis.

Then the question is, why the two visualisations show conflicting values.

First visualisation table: Store.[Chain], District.[District]

Second visualisation table: Store.[Chain], DISTINCTCOUNT( District.[District])

Why is the first visualisation table filtering appropriate District.[District]s, but when changed it to aggragate Count(Distinct), filtering doesn't flow to District.[District]?

Single direction crossfiltering

v-jiascu-msft · ‎03-04-2019

Hi @SuraMan ,

Could you please mark the proper answers as solutions?

Best Regards,

Community Support Team _ Dale
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

SuraMan · ‎03-04-2019

@v-jiascu-msft wrote:
Hi @SuraMan ,

Could you please mark the proper answers as solutions?

Best Regards,

Hi @v-jiascu-msft ,

No, a proper explanation has not been given.

1. Display the below two columns in a table:

Store.[Chain], District.[District] --> District.[District] is filtered appropriately in each row.

2. Display the below two columns in a table:

Store.[Chain], DISTINCTCOUNT( District.[District]) --> District.[District] is not filtered.

I would like an explanation preferrably from someone who understands this. For example, do "expanded tables" have anything to do with this? Is the expanded version of Store table used in the first situation? If so, why doesn't DISTINCTCOUNT() not apply on the expanded version of Store?

If you want to know what expanded tables are, please see:

https://www.sqlbi.com/articles/expanded-tables-in-dax/

Please note that expanded tables are created by ignoring the cross filtering direction.

Thanks

Regards

d_gosbell · ‎03-07-2019

I would like an explanation preferrably from someone who understands this. For example, do "expanded tables" have anything to do with this? Is the expanded version of Store table used in the first situation? If so, why doesn't DISTINCTCOUNT() not apply on the expanded version of Store?

Yes, expanded tables do come into play here. But the set of expanded tables is not static, it depends on the context of your calculation and which columns you are calculating over.

Tables only expand into other tables that are "pointing" at them. So if you do any calculations over columns in the Store table it will consider columns in the District table because the arrow in the relationship from the District table points at Store.

But if you do calculations over District it will not include any of the columns from the Store table in it's expanded tables as the relationship is pointing in the other direction (the arrow is pointing away from District). So doing DistinctCount( District[District] ) will not "group by" any columns in Store as the Store table is not part of the expanded tables for District so all the engine can do is to repeat the unfiltered total for each row.

SuraMan · ‎03-07-2019

So if you do any calculations over columns in the Store table it will consider columns in the District table because the arrow in the relationship from the District table points at Store.

But if you do calculations over District it will not include any of the columns from the Store table in it's expanded tables as the relationship is pointing in the other direction (the arrow is pointing away from District). So doing DistinctCount( District[District] ) will not "group by" any columns in Store as the Store table is not part of the expanded tables for District so all the engine can do is to repeat the unfiltered total for each row.

Column District[District] appears in two tables; (1) District table, and (2) expanded Store table.

So, it is incorrect to assume that DISTINCTCOUNT( District[District]) will operate on District table; it has the option of operating on either of the above tables. Which table is chosen and why, appears to be a Power-BI's internal implementation decision which needs an explanation. In my opinion, a consistent answer should be able to address the two table examples ("First visualisation" and "Second visualisation") I have given in the first first post, and consistently explain why the first visualisation used the expanded Store table but the second visualisation did not even though it could have.

d_gosbell · ‎03-07-2019

Column District[District] appears in two tables; (1) District table, and (2) expanded Store table.

This is not correct there are two district columns District[District] and Store[District]

So, it is incorrect to assume that DISTINCTCOUNT( District[District]) will operate on District table; it has the option of operating on either of the above tables.

No it is not. You are explicity using the District column from the District table

If you have a look back at my original answer I suggested changing your calcuation to DISTINCTCOUNT( Store[District]) so that it would use the District from the Store table instead of using the column from the District table in order to get it working the way you wanted.

SuraMan · ‎03-07-2019

So, it is incorrect to assume that DISTINCTCOUNT( District[District]) will operate on District table; it has the option of operating on either of the above tables.

No it is not. You are explicity using the District column from the District table

This is why I think that a successful explanation needs to explain both "First visualisation" and "Second visualisation" consistently. Both the first and second visualisations explicitly use District[District] reference, but Power BI chooses the expanded Store table and populates column District[District] from the expanded Store table for the first visual and it doesn't do so in the second visualisation.

Just because there is a preceeding "District" in "District[District]" does not mean it refers to District table. That is not how table expansion works. Please see the URL below and observe that TopSellerProduct[Product] is also a part of the expanded Product table.

https://www.sqlbi.com/articles/expanded-tables-in-dax/

PS:

I have read Kimball's and Chris Adamson's books on Dimensional Modelling and I know workarounds to get this to work in the way I expect. This is a purposefully constructed example to highlight what I think an inconsistency that I think needs deeper technical explanation from those who understand the internals.

v-jiascu-msft · ‎03-06-2019

Hi @SuraMan ,

I think the explanation of Expanded tables is too technical. To be honest, the relationship and filtering are the most complex parts in DAX, especially when they are nested.

Some of my thoughts are as follows.

1. There could be two parts of the connection between the two tables. One is the relationship, which connects two tables. The other one is filtering.

2. Actually, both the two fields are the context in the first visual. They are just connected by the relationship.

3. In the second visual, the context is the [Chain]. The filter direction works now. We can ask like "how many districts of each Chain?". But the context [Chain] can't filter the other table. So it counts all the [District] that it connects to.

I hope these above can give you some ideas.

Best Regards,

Community Support Team _ Dale
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

SuraMan · ‎03-07-2019

Hi @v-jiascu-msft

The last two posts I posted in this thread were marked by "SPAM" by someone, and as a result, they were removed.

Can you please investigate this abusive behaviour of marking legitimate posts as "spam" to remove them and take action against those who did it? Can you also restore my posts that were removed?

Thanks

v-jiascu-msft · ‎03-07-2019

Hi @SuraMan ,

I have removed them from the Spams. This is a system function without human intervention. Please feel free to @me if you have such issues.

Best Regards,

Community Support Team _ Dale
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

d_gosbell · ‎03-07-2019

Hmm, I did wonder where that reply went. 🙂 I got the text below in the email notification of your response:

This is why I think that a successful explanation needs to explain both "First visualisation" and "Second visualisation" consistently. Both the first and second visualisations explicitly uses District[District] reference, but Power BI chooses the expanded Store table and populate District[District] from it for the first visual and it doesn't do so in the second visual.

So "Table Expansion" is not a phyical thing, it's a conceptual tool used to help explain certain behaviours. In this case I think it's just confusing the issue.

With standard single direction relationships, tables on the "one" side filter tables on the "many" side (so District filters Store)

But tables on the "many" side DO NOT filter tables on the "one" side (so Stores does not filter Disticts)

In Visual 1 the order of the columns on the visual is irrelevant, because of the relationships in the model District[District] is filtering Store[Chain]. One conceptual way of thinking of this is that the engine evaluates all the combinations of Store[Chain] and District[District] and it finds that District[District] is in the extended tables for Store so it uses the current District[District] to filter records from the Stores table.

But in Visual 2 we have a set of rows from Store[Chain] to calculate a measure we have to pass the values from Store[Chain] as a filter context to the column that is referenced in the distinctcount. In order to apply the filter context it looks through the extended tables associated with the District table to see if Store[Chain] can be reached, but because Store is on the "many" side of the relationship the Store table is not part of the extended tables for the District table. Each table will have a different set of extended tables, notice in Alberto's article that Sales expands to Product and TopSellerProduct, but Product only expands to TopSellerProduct .

Just because there is a preceeding "District" in "District[District]" does not mean it refers to District table.

This is not correct. The preceeding district does mean that we are explicitly referencing a column in the District table.

SuraMan · ‎03-07-2019

Just because there is a preceding "District" in "District[District]" does not mean it refers to District table.
This is not correct. The preceeding district does mean that we are explicitly referencing a column in the District table.

If you observe Alberto's example, Product[Product] appears in all 3 expanded tables and not limited to Product table. Just because there was a preceding Product in Product[Product] did not prevent that column being present in all 3 expanded tables.

In Visual 1 the order of the columns on the visual is irrelevant, because of the relationships in the model District[District] is filtering Store[Chain].

How that works is through expanded tables. Please see below from Alberto's article.

The filter applied on Product[Product] follows the relationship between Product and Sales, thus filtering the Sales table too. A better description of that same filter propagation uses the concept of expanded tables. When you filter Product[Product], all the tables that contain that column – either native or related – are filtered.
Thus, Sales is filtered by Product[Product] because the expanded version of Sales contains Product[Product].

Each expanded table contains both native and related columns. Native columns are the ones originally present in the table. Related columns are all the columns of related tables, added to the original table through table expansion.

What the above explanation from Alberto says is that, even in the First Visual, the filtering you describe happened through the expanded Store table because it contained District[District] in it as a related column.

Now to summarise;

Both First visual and Second visual uses the same two columns only. Store[Chain] and District[District]. Nothing else.

expanded Store table contains both of the above columns.

So, both visuals had the option of choosing the expanded Store table.

Clearly, the first visual chose it, but the second visual did not. Second visual could also have easily chosen the expanded Store table because both the required columns (native column Store[Chain] and related column District[District]) are in it, so why it refused to chose the expanded Store table needs a consistent explanation.

d_gosbell · ‎03-07-2019

What the above explanation from Alberto says is that, even in the First Visual, the filtering you describe happened through the expanded Store table because it contained District[District] in it.

Yes that is what I was saying in my previous response. So I think we in agreement here. That visual 1 works as expected because District is in the expanded Store table. So let's focus now on the behaviour of Visual 2

How that works is through expanded tables. Please see below from Alberto's article.

The filter applied on Product[Product] follows the relationship between Product and Sales, thus filtering the Sales table too. A better description of that same filter propagation uses the concept of expanded tables. When you filter Product[Product], all the tables that contain that column – either native or related – are filtered.
Thus, Sales is filtered by Product[Product] because the expanded version of Sales contains Product[Product].

Each expanded table contains both native and related columns. Native columns are the ones originally present in the table. Related columns are all the columns of related tables, added to the original table through table expansion.

Yes, I did the technical review on Alberto and Marco's first book on the tabular engine. I know the concepts of extended tables and filter propogation well. When I read the above I see that it describes the behaviour that we are seeing in Visual 2.

If you translate the statements above to your schema - Store is filtered by District because District is in the extended table for Store.

But District is NOT filtered by Store as Store is not in the extended table for District.

So when you do DISTINCTCOUNT( District[District] ) , because you are explicitly referencing the District table, then engine looks in the extended tables for District to see if it can propogate any of the current filters, but it cannot because District does not have any extended tables in your schema (ie. there are no tables on "one" side of a "one to many" relationship)

SuraMan · ‎03-07-2019

Yes, I did the technical review on Alberto and Marco's first book on the tabular engine. I know the concepts of extended tables and filter propogation well.

So then, we are in agreement that District[District] is also a column of expanded Store table.

Now let's try to guess what Power BI rendering engine thinks.

First Visual:

Power BI engine thinks: "This visual needs Store[Chain] and District[District]. I could have chosen expanded District table but it does not have Store[Chain], so I decide not to chose it. So I decide to choose expanded Store table because it contains both columns."

Second Visual:

Power BI engine thinks: "This visual also needs the same two columns as above. I can choose expanded Store table as before because it contains both columns, but I will NOT chose it this time! Instead, I will chose expanded District table and then say; I can't get to Store[Chain] from this!!"

It is this contradicting "thinking" behaviour that needs a consistent explanation.

Input columns for both visuals are the same, so it is logical to assume that the both visuals should chose the same expanded table.

We are not being consistent if we say that for the First Visual, the rendering engine chooses an expanded table that has both the required columns but for the Second Visual, the rendering engine doesn't do that. That is the problem I have with @d_gosbell answer.

d_gosbell · ‎03-08-2019

So then, we are in agreement that District[District] is also a column of expanded Store table.

Yes, it was never in contention that Store can expand to District

Now let's try to guess what Power BI rendering engine thinks.

We don't really need to guess, Alberto's article has a pretty good explanation.

I could have chosen expanded District table

This is not correct. In your schema District cannot be expanded at all. It does not have any relationships pointing in the right direction to allow it to expand.

Second Visual:
Power BI engine thinks: "This visual also needs the same two columns as above. I can choose expanded Store table as before because it contains both columns,

No, it does not think this at all. The second visual has one column Store[Chain] and one measure. Measures are calculated in a filter context. So as it calculates each row of the table it picks up the current value of Store[Chain] and puts that in the filter context. It then looks at the fact that it's doing a distinct count over District[District] it looks at district and realizes that it cannot expand District to Store as the relationship is pointing the wrong way so it cannot apply the current value of Store[Chain] to the District table, so the District table is effectively unfiltered at this point so the DistinctCount that is returned is for the entire table.

This behaviour is consistent and explainable although I would agree that sometimes it can seem inconsistent until you dig into it.

I hope this makes sense, but if my attempts at explaining this are still not making sense then I'm sorry and I might bow out and let someone else see if they can attempt to explain these concepts in a different way.

SuraMan · ‎03-07-2019

Hi @v-jiascu-msft ,

Thank you for trying to help.

Are you able to get this answered by a Microsoft Power BI expert who understands this behaviour and can explain authoritatively why and how this happens?

Thanks

d_gosbell · ‎02-13-2019

> Why is the first visualisation table filtering appropriate District.[District]s, but when changed it to aggragate Count(Distinct), filtering doesn't flow to District.[District]?

In your first visual Chain is not actually filtering District, the District is filtering the Chains (the order on the visual is not important, this is driven by the relationships between the tables)

One possible "fix" for this is to not create a measure on a table on the "one" side of a relationship. If you did DISTINCTCOUNT( Store[DistrictID] ) I think you will get the result you are expecting.

SuraMan · ‎02-13-2019

@d_gosbell wrote:

In your first visual Chain is not actually filtering District, the District is filtering the Chains (the order on the visual is not important, this is driven by the relationships between the tables)

Hi @d_gosbell,

Thanks for the quick reply.

Is there a rule to find out which tables filters which and when? I understand that we can hypothesise that, in this case, Power BI knows that Store cannot filter District, therefore, it magically decides to go other way and use District to filter Store. Then why does DISTINCTCOUNT(District[District]) not give the filtered output? It looks as if that the measure is evaluated before two tables are joined and filtered for display. You see, these are only our guesses based on what we see. Are there definite rules to explain how these things work?

PS:

I know that DISTINCTCOUNT on Store tables gives the desired output, however, I am using this to highlight what I see as an inconsistency.

Secondly, as a design pattern, is it possible to avoid defining Measures on columns from "one side"? Suppose there is a column in District called [State] and we want the distinct count of [State]?

thanks

d_gosbell · ‎02-14-2019

> Is there a rule to find out which tables filters which and when?

The filtering follows the relationships. See how the arrow on the relationship points from District to Store? This indicates how the filters flow from one table to the other.

>Then why does DISTINCTCOUNT(District[District]) not give the filtered output?

Because your second visual only has an attribute from stores in it. So the District table therefore cannot be filtered, so it returns the count of all district rows for each chain

> Secondly, as a design pattern, is it possible to avoid defining Measures on columns from "one side"?

Yes, if you use dimensional modelling techniques to build your schema this is nearly always possible. A guy called Ralph Kimball wrote a number of excellent books on dimensional modelling, they were written many years ago before PowerBI existed, but the theory in those books is still valid today

> Suppose there is a column in District called [State] and we want the distinct count of [State]?

Typically you don't find business scenarios like this, there is not a lot of value in just counting states in isolation. Typically you'll want a count of distinct customers by state, or total sales amount by state or count of orders by state and in all these scenarios state is used as a grouping column, not as something that you apply an aggregate to

Single crossfiltering ignored for columns?

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly