Duplication and Missing records after inner join M...

scylld · ‎03-15-2018

I have issue with using Power BI desktop Merge Inner Join. I am pulling one dim table, services purchased, that has a unique key with 103829 unique values. I need to reduce this data set to be only the records in my fact table. The fact table, customer orders, has 59142 records but the ones that connect to my dim table is only 6228 unique values. I load both and do a merge of the dim table with the fact to reduce the table size. Post merge the number of records fluctuates from 5989 to 6293 and the number of unique values ranges from 5977to 6191. The amount of duplication can be as low as 3 and as high 105. It fluctuates every load, the data is coming from our data warehouse via SQL which only updates prior to 6 am daily so the data is not changing. I would consider using a remove duplicates but that will not resolve the issue that I am also getting a record shortage as well. The table should have exactly 6228 unique values. Every order by a customer is something in the services purchased list. I pull the data twice, once with the merge and once with out and the values on the one that is not merged do not change on each load. That table, non-merged, has a relationship with the fact table. The merged table relationship would break on every reload since the values are no longer unique.

Any help would be appreciated.

v-xjiin-msft · ‎03-16-2018

Hi @scylld,

When you Merge the fact table and dim table, you are using Inner join type. Right? Have verified that the relationship between two tables are one to one?

And since we cannot see the actual data in your scenario. It's hard to say where the issue is. Also I would suggest you to join two tables at SQL side. In my opinion, the join in SQL is more powerful than the join in Power Query.

Thanks,
Xi Jin.

scylld · ‎03-16-2018

I can join in SQL and it is a 1 to many relationship. The reason I do not currently do that is that this is only currently for one business unit. So my initial load on my fact table is for the entire business, then I eliminate only the rows that fit the business unit desired using a filter. This way if other business units want the report as well it only requires one change in one location to alter the load. If I make the change in the SQL I would have to make it to every SQL statement. I want to use the internal system to BI Desktop to manipulate the data. I have two loads for both tables with the same SQL, one with out merge or filter and one with. I am using tables on each doing a count and count distinct to verify the issue.

--Fact

--ActiveTransOrderID is Primary Key and will be Unique

--ActiveCustomerServiceID is non-unique

--After Load in BI Desktop we eliminate the BRANDCD that we no longer need

Select Distinct

ATO.ActiveTransOrderID,

ATO.ActiveCustomerServiceID,

ATO.CustomerID,

VDS.BrandCd

from ActiveTransOrder ATO

Inner Join ActiveCustomerService ACS

on ATO.ActiveCustomerServiceID = ACS.ActiveCustomerServiceID

Inner Join v_dim_ServiceUnit VDS

on ACS.SyCustomUnitID = VDS.CustomUnitID

Where ACS.OrderDate > '7/1/2017'

--Pre Filter Load

--(299,281)

--Post Filter Load

--(59,142 records returned with 6206 unique values that join to the dim)

--Dim

--ActiveCustomerServiceID is Primary Key for table and Unique

Select Distinct

ACS.ActiveCustomerServiceID,

ACS.Code,

ACS.ProviderID

from ActiveCustomerService ACS

Where OrderDate > '7/1/2017'

--Pre Merge Load

--(103,829)

--Post Merge Load

--(6230 with 6203 unique 1st Run)

--(6195with 6175 unique 2nd Run)

--(6222with 6189 unique 3rd Run)

--all runs were about 15 minutes apart

Duplication and Missing records after inner join Merge

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024

How to Get Your Question Answered Quickly