
Syndicate_Admin (Administrator)

How do you prevent a dataflow from creating duplicates?

I have a datasource I'm using to load data into a Dataverse table, but I'm finding that it creates duplicate records with each import, since the original records still exist in the source table. What is the best way to prevent the creation of duplicate records?

 

Here's an example with some sample data:

 

[Screenshot: sample data in the source table]

I know I could do a standard Merge step that only includes items from the source not present in the Dataverse table, but the issue is that a value for an existing record could theoretically change between imports, so I want to validate the data in the most performant way possible.

 


8 REPLIES
Syndicate_Admin (Administrator)

Also, @MischaSz and @romanduda: in the future when you have issues like this, I suggest you create a new forum post, and just give people a link back to the old one. This post is already resolved, so most people on the forums will just ignore it. By creating a new post, you will get the attention of more users who can help you out, faster.

Syndicate_Admin (Administrator)

If you want the dataflow to always yield an exact copy, inclusive of delete operations, then what you are looking for is a FULL refresh instead of an INCREMENTAL refresh. Full will completely sync the target table with the source so they match identically every time; incremental will only push incremental changes. But you will also need to check the box in your dataflow designer that says, "Delete rows that no longer exist in the query output".

See this old forum post here for more info on refresh behavior and deleting: Solved: Dataflows - Refresh behavior - Power Platform Community (microsoft.com)

Syndicate_Admin (Administrator)

I have the same question. My table should be an exact copy of the datasource (multiple price lists merged together). The app I want to create is a lookup tool for those merged price lists. I'm currently getting duplicates with every refresh. I understand that I can solve this by creating keys from some columns. However, I can't figure out how to automatically remove rows from the table if they no longer exist in the datasource.

Syndicate_Admin (Administrator)

Hey cchannon, the update of the rows works fine. After a refresh, only the new rows were added, but the refresh ended in an error. It also does not delete rows that were deleted in the datasource. Do you have any advice?

Best regards, Roman

Syndicate_Admin (Administrator)

Make sure your target table has a Key set up that uniquely identifies each row (maybe just the name, or a composite of name and type?). The dataflow will pick up on the key on the table, and for all the rows to be imported it will use the key to upsert instead of just inserting.

Thanks for the reply, @cchannon. Very interesting. Is that different from the default key that is created for Dataverse tables (e.g., Name)? Also, how do these map onto items being imported? If, say, the data is coming from an Excel spreadsheet, how would those rows be identified, or is that handled behind the scenes by the dataflow?

Yes, it is different from the default Primary Column because the default does not enforce uniqueness, whereas a configured Key does.

 

To set one up, just navigate to the table you want to configure in the make.powerapps UI and you'll find Keys in the left nav. Select that, then create a new key, and you can then select all the columns you want to use as a composite to form a unique key. For example, in the screenshot below, I have a table called Assignment and I am creating a key that enforces uniqueness by a combination of Name, Due, and Type:

[Screenshot: creating a composite key on the Assignment table from the Name, Due, and Type columns]

As far as linking them together with your import, that's actually the easy part! Dataflow automatically detects that there is a Key associated with the table you are importing and takes the field mapping you provide to build that key and validate it against what is already in the table. If the key for the imported row already exists in the table, Dataverse does an Update; otherwise it Inserts (and you don't need to set up any of the logic - it just works!)

That's awesome! Unfortunately, I'm currently using a Dataverse for Teams table, which apparently doesn't allow users to define keys in tables. 😢

 

Great to know this is possible, though. Much appreciated.
