
BUG Assume Referential Integrity

Yes, I read the article about going to the support site and submitting a bug report. I'm not a Pro user, so I don't get free support, and I'm not about to charge my credit card just to confirm this is a bug and then have the charge refunded.

 

The Assume Referential Integrity option in the model does not work. I have two tables, OrderHeader and Payment. They have a foreign key defined as OrderHeader.OrderID = Payment.OrderID. This foreign key is active and enforced, so it is impossible for a value to exist in Payment.OrderID that does not exist in the OrderHeader table. However, when I select Assume Referential Integrity, I get an error back saying that there are rows in the Payment table that do not exist in the OrderHeader table. This is false.
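For reference, an enforced foreign key of the kind described would look something like this (table and column names are from the post; the constraint name is illustrative):

```sql
-- Illustrative DDL; the constraint name FK_Payment_OrderHeader is assumed.
ALTER TABLE dbo.Payment WITH CHECK
ADD CONSTRAINT FK_Payment_OrderHeader
    FOREIGN KEY (OrderID) REFERENCES dbo.OrderHeader (OrderID);

-- Created WITH CHECK and left enabled, SQL Server treats the constraint as
-- trusted: no Payment.OrderID can exist without a matching OrderHeader.OrderID.
```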

 

The code being used is not a valid way to perform this type of check. It ran the following 3 queries:

SELECT COUNT_BIG(*) AS [a0]
FROM (
    (select [OrderID] as [OrderID],
     --additional columns deleted
     from [dbo].[OrderHeader] as [$Table])
) AS [t1]

 

SELECT COUNT_BIG(DISTINCT [t12].[OrderID]) AS [a0]
FROM (
    (select [PaymentID] as [PaymentID],
     [OrderID] as [OrderID],
     --additional columns deleted
     from [dbo].[Payment] as [$Table])
) AS [t12]

 

SELECT COUNT_BIG(*) AS [a0]
FROM (
    (select [OrderID] as [OrderID],
     --additional columns deleted
     from [OnlineSales].[OrderHeader] as [$Table])
) AS [t1]

 

It appears that instead of checking whether there are actually values in the Payment table that do not exist in the OrderHeader table, it runs a pair of counts, compares them, and if they do not match, reports that referential integrity is not enforced. This is invalid. This set of queries asserts that there MUST be a payment for EVERY value of OrderID. It does NOT verify that there is an OrderHeader.OrderID value for every Payment.OrderID.
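A check that actually matched the semantics of the relationship would look for orphaned keys on the many side instead of comparing counts. A sketch, using the table names from the post:

```sql
-- Count Payment rows whose OrderID has no match in OrderHeader.
-- If this returns 0, referential integrity holds for the relationship,
-- regardless of whether every OrderHeader row has a payment.
SELECT COUNT_BIG(*) AS [orphans]
FROM [dbo].[Payment] AS p
WHERE NOT EXISTS (
    SELECT 1
    FROM [dbo].[OrderHeader] AS o
    WHERE o.[OrderID] = p.[OrderID]
);
```

Note that the count-comparison approach fails in both directions: counts can differ while integrity holds (an order with no payment), and they could even match while integrity is violated.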

Status: Needs Info
Comments
datamodel
Helper I

This is still a bug 😞

obs-shand
New Member

This is likely because Power BI executes the following query for the two tables:

SELECT TOP (20000) [t0].[Id]
FROM (
    (SELECT ... FROM [dbo].[Table] AS [$Table])
) AS [t0]
GROUP BY [t0].[Id]
ORDER BY [t0].[Id] ASC

I assume Power BI is comparing the keys from the first 20000 rows in one table with the first 20000 rows in the other table. This is a terrible idea, as the table on the one side of the relationship may have more rows than the table on the many side. If the first 20000 keys on the one side have no matching rows on the many side, this test will fail incorrectly.

This can be reproduced by creating two tables with auto-identity turned on. Insert more than 20000 rows into the table on the one side of the relationship, then insert rows into the table on the many side whose RelatedId values are all greater than 20000.
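That repro could be sketched as follows (table names Parent and Child and the row counts are illustrative, not from the original post):

```sql
-- One side of the relationship: more than 20000 rows.
CREATE TABLE dbo.Parent (
    Id INT IDENTITY(1,1) PRIMARY KEY
);
INSERT INTO dbo.Parent DEFAULT VALUES;
GO 25000  -- SSMS batch-repeat: runs the insert 25000 times

-- Many side: only reference parents beyond the first 20000 keys.
CREATE TABLE dbo.Child (
    Id INT IDENTITY(1,1) PRIMARY KEY,
    RelatedId INT NOT NULL
        REFERENCES dbo.Parent (Id)
);
INSERT INTO dbo.Child (RelatedId)
SELECT Id FROM dbo.Parent WHERE Id > 20000;

-- Referential integrity genuinely holds here, but comparing the TOP (20000)
-- keys of each table yields non-overlapping sets, so the check fails.
```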


gmckinnon
Frequent Visitor

I have also found this bug. I've tried rearranging the tables to order the join keys in a way that allows all the unique values to be read by Power BI, but it has not worked.

mattcarter865
Frequent Visitor

Figured out a workaround: use Tabular Editor and edit the Rely On Referential Integrity option on the relationship.