I have a PBIX file where when I go to refresh my data from a SQL server the refresh eventually fails with an error message that says one of my tables contains duplicate IDs. I went to my actual SQL DB and ran a query on the table for the exact ID provided by the error message. Only one row is returned by the straight query, therefore there are no duplicate values in my table.
I'm thinking this is some sort of red herring and that the real issue lies either in a relationship or somewhere else in the data. Any suggestions on how to troubleshoot? I've updated PowerBI to the latest released version.
I am having the same issue where I query data from app insights with
Data is fetched into a tables which contain one row per day having the date as a unique column set in the model. All tables also have 1:1 relation to a date-table which is the date master table.
When refreshing with Power BI desktop, everything works as expected. When doing a scheduled (incremental) refresh in the cloud, it fails with error:
Column '<pii>Date</pii>' in Table '<pii>SomeTable</pii>' contains a duplicate value '<pii>10/1/2021</pii>' and this is not allowed for columns on the one side of a many-to-one relationship or for columns that are used as the primary key of a table.
This is 100% deterministic and it fails always on the same date (10/1/2021).
I tried to make the Date column non-unique and changed the relation to be 1:*. This made the scheduled update work but the data was wrong since I had to apply a sum() function to get 1 row per day. 10/1/2021 was x2 then.
One observation I made:
When setting incremental refreshs to.
refresh: 1 month
detect changes: Date column
The error is deterministic with 10/1/2021 occuring twice.
If I change to
refresh: 2 weeks
detect changes: Date column
The error occurs with other dates (more than 1). In fact a series of 5 days is x2.
I, of course, checked the result by querying directly in app insights. The data is correct with no duplictate rows.
I have also had this issue when I am pulling data from an API. I assumed it was tempormentality of the API requests but I am consistently able to manually refresh or refresh in the desktop version whereas the PowerBI scheduled refresh seems to fail very regularly so I am now convinced there is a flaw in the way the cacheing of some queries works. My primary key is a integer ID column and whilst I don't personally manage the database I know that it is the primary key of an SQL table so it is almost inconceivable that there could be a genuine duplicate.
I am using the staging table suggested by mahidharsridhar below. Microsoft do need to look into this.
On many occasions I've also come across and been able to clear this issue via issuing an additional manual refresh but today a manual refresh wouldn't clear it.
Finally found my issue was caused by a Upper/Lower CASE SENSITIVITY ISSUE...
I use a bridge table to create a usable relationship between two *:* tables (instead creates a 1:* in both directions) and it seems I had a repeated value (only differing in case) when merging the two sources for the bridge. Power query would properly perform the merge when I was in the "Transform Data" window but I think it became case sensitive when running a refresh of the entire file...
Weird right? Anyway; hope this helps someone out there.
I had this issue with my SSAS Tabular, and have solved by "Processing it in Full"
I'm sure you can apply this with your PBI refresh.
Alternatively remove the Unique Key constraint within PBI (SSAS has this feature)
We have about 20 Power BI app workspaces scheduled to refresh on a regular bases... we get the false duplicate error message about 1 in every 100 refreshes. For those with the false duplicate error reported, we perform a manual on-demand refresh from the service and this solves the false reported duplicate 100% of the times.
It has been confirmed as a "false" duplicate error, but must be some other error.
Maybe its something to do with not clearing the table completely before reloading.
We have also seen the same behavior with PBI desktop, and again just doing another immediate refresh the reported duplicate is gone. Weird
This "false duplicate" error is still occurring for us. However it is increasing in frequency, we are now seeing this in more and more PBI apps, and the false error is happening about 20% of the time during scheduled refreshes. We are now seeing it reported as a problem experienced by an increased number of content producers too.
We can't reproduce the false error on-demand which limits our ability to troubleshoot. We are sharing this information here in hopes of keeping MS and the community aware of the problem.
We currently ignore the false error messages as it disappears with the next refresh cycle. However it appears the symptom is much more broad, varied, and not localized to our tenant or premium capacity.
Can you give more information on "we perform a manual on-demand refresh from the service and this solves the false reported duplicate 100% of the times."
I am having the same issue and need to try this, but I am not sure what to do. Thanks.
I should NOT have said it solves the problem... it just proves that there is not really a duplicate ID 100% of the time.
The problem is still happening for us and how at a much greater frequency.
Can confirm this is still an issue, started facing this recently. Also confirmed the source data has no duplicates at all, when this error pops up, in 99% of cases just re-doing the refresh makes it go away. Definitely a bug where the string matching routine within PBI thinks something is a dup, when in reality it isn't
For a workaround, please see below
I am experiencing the same issue when importing a file. The file format has not changed for months and so has relationships to other files.
The column that returns the error only contians unique values in form of ID's, validated that using excel. For somereason powerbi claims that the column contains ,,,None,Functional which are values from another column in the data set. the columns are 9 columns appart in the data set.
Error Message is below.
Column 'ID' in Table 'ReportExport' contains a duplicate value ',,,None,Functional' and this is not allowed for columns on the one side of a many-to-one relationship or for columns that are used as the primary key of a table. Table: ReportExport.Cluster URI:WABI-US-NORTH-CENTRAL-redirect.analysis.windows.netActivity ID:debdaefc-f0cb-4ae0-b698-b77851d4e999Request ID:2305575e-f6be-c22d-0bb4-549b25ac04f4Time:2019-01-25 15:59:14Z
I have no idea how to solve the suddenly introduced issue.
Hi, did you ever find out why this was happening? I encountered the same error message when I was trying to load the query to the model so I removed the column containing the duplicate values -->successfully loaded to the model-->went back to the query and deleted the step for removing the column---> successfully loaded back to the model containing the column with duplicates.
I have since tested several dashboards and gone through the same step of loading the model with the product ID (which continues to contain duplicates) and the query has loaded successfully without me having to go through the additional step of removing the column and adding back in. I'm not able to understand why it happened or why I couldn't replicate the error message. I am worried that any auto-refreshes once I publish might cause the query to fail at some point.
Can you share the exact error message that you are getting? Also, duplicates in DAX are not necessarily the same as in SQL so can you share the format of some of your ID's? DAX may be considering things to be duplicates that are not duplicates in SQL.
Hi, here are images from PowerBI and SQL.
There is not much DAX in this data model. There are a series of Measures to cacluate the "Latest" SWC since each row has a unique data time stamp. The SWC table has some modifications in the Query editor, some look-up columns are expanded and the date column is duplicated so that I can have the date in multiple user readable formats.
Try doing a Trim and Clean on that field in Power Query. If you have it in there with a trailing space, for example, or maybe a leading zero I believe the model will consider that a duplicate but your SQL will not find that.
No dice, which I can't say I'm suprised. We authored the application that is filling the SQL DB and while our app is not perfect, one part that is rock solid is the SWC ID. Its the unique key for that table in SQL too, so SQL will not allow duplicates (though I agree there can be situtations where something might not be seen as a dupe by SQL).
Here's what I would do. Create a new Power BI file and copy over your query for this table. Run the import/refresh. Since it is not part of a relationship, you will not get that error and then you will be able to better assess what is causing the duplicate issue. I can't imagine you are getting that error on a lark, something is causing it.
Yeah, that is more or less where I've gotten to. Made a copy of the PBIX and started to isolate and delete stuff to see what came of it. Looks like I found a relationship causing the issue. We'll see.
Thanks for replying though!
Click here to read more about the December 2021 Updates!
Don't miss the User Group Leader meetings on January, 24th & 25th, 2022.
Mark your calendars and join us for our next Power BI Dev Camp!