Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
edwardrmiles
Helper III
Helper III

Duplicates values exist in a column when in report / data view but not in query editor

Hi, I have encountered a problem whereby a column which is unique in query editor appears to lose uniqueness in the report view. I'm a certain as I can be the the column is unique in the data source and query editor but it is definitely not unique in report view because - amongst other checks - I cannot use it to create a relationship and the data view shows row count > distinct values

 

Has anyone seen this before? I've seen posts about similar issues here but they didn't get resolved.

 

Other useful info:

 

- The column is the primary key of a table in a PostgresSQL database and is of type int8 (and there are no leading zeroes)

- Power query imports the column as a whole number (without an additional 'changed type' query step)

- Using Count Values & Count Distinct Values (from Transform > Statistics) returns an identical value and the value matches a row count in the database

- I've also tried doing a full refresh, a preview refresh, and a restart

- It affects another dimension table from this database but not all of them

 

PS unfortunately I cannot share the PBIX

Thanks

 

 

13 REPLIES 13
MarcelBeug
Community Champion
Community Champion

If the column is of Int8 type than you have a maximum of distinct 256 values, so if you sort and check in the data view, you should easily find the duplicates (or any empty values)?

 

I know a phenomenon where adding a column with Int8.Type displays values in Power Query, even if they are outside the range, but they won't load into the data model.

I raised an issue some time ago, but it is qualified as "by design".

Try the simple query below and you'll see results in the Power Query editor that won't load into the data model.

 

let
    Source = {125..130},
    #"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), null, null, ExtraValues.Error),
    #"Added Custom" = Table.AddColumn(#"Converted to Table", "Custom", each [Column1], Int8.Type)
in
    #"Added Custom"
Specializing in Power Query Formula Language (M)

Thanks Marcel, the Int8 is the data type in PostgresSQL, which is a 64 bit int, so the values shouldn't be out of range. The value in query editor was 'whole number'. I decided to try explicty converting the column to type Int.64 but that didn't appear to make a difference

You can try and create a column chart with the count of the values, so anything >1 will be visible immediately.

Specializing in Power Query Formula Language (M)

Thanks Marcel; I'm able to identify the duplicates, but it isn't clear why they are being duplicated. The final result in the query editor should be identical to the data table in the report view, or at least I'm not aware of any reason why it shouldn't be...

Well then please share the information so we'll be better able to help you.

Which value is duplicated and what are the corresponding values in the Query Editor?

 

 

 

Specializing in Power Query Formula Language (M)

In the report view there are 403,320 rows and 399,226 distinct values, so 4,094 duplicates (all the duplictaes have 2 rows). There doesn't appear to be a pattern in the duplicated ids

 

As you can see from the 2 images below, id 35 occurs twice in report view, once in the query editor

 

duplicates.PNG

 

duplicates2.PNG

 

 

Hm, that doesn't look like Int8 as these range from -128 thru 127.

 

Anyhow, this is what I would do to analyse: add an Index column to the table and check which index values show up with id 35 in the data model and compare this with the id's in the Query Editor for those index values.

Specializing in Power Query Formula Language (M)

Int8 is the data type of the column in PostgresSQL, which is an 64 bit interger (8 bytes).  The data type of the column in query editor is 'whole number' which I think defaults to Int.64. So I don't think that should cause a problem

 

I can find the duplicate values no problem, it's why they are bing duplicated I'm struggling with. Interestingly, when I do a full refresh the duplicate ids change.

Please read part 2 of my previous post, with a suggestion how to analyze. That should give you the values in the Power Query editor that are duplicates in the data model. Maybe that can lead to an explanation why these are distinct in the Query Editor and duplicate in the data model. 

Specializing in Power Query Formula Language (M)

I'm able to see which values are being duplicated. There isn't a pattern that I can see. Interestingly, having done a full refresh different ids are being duplicated (e.g. id 35 is now unique)

Maybe something is happening in your query that might explain this behaviour.

 

I would focus on using volatile data like random numbers / current time stamps, different number formats (e.g. a duration of 35 days and number 35 would both load to number 35 in the data model).

 

One should also realize that query steps may be evaluated differently than might be expected. Maybe data is read from your source more than once.

 

How stable is the data in the source database? If it's highly volatile, then it may be that data is changing while loading.

 

Maybe you can use Table.Buffer to freeze your data, although this might harm performance as it will kill any query folding.

 

Or are you using Direct Query?

 

Maybe you can share the query code from the advanced editor (any sensitive parts like field names anonimized)?

 

As you may notice, it is hard to give a general answer, so we definitely need your help with supplying all relevant information as far as reasonably possible. I understand and appreciate fully that you can't share any sensitive data.

Specializing in Power Query Formula Language (M)

Some intersting suggestions.  I'm querying a production database (not ideal I know...), so there will be new IDs every refresh, and it is also possible that the source data will change during loading.

 

I'm not using direct query so I'll try your Table.Buffer suggestion; lack of query folding won't be a problem because I'm extracting the entire table and the load is pretty quick anyway

 

I won't have access to the database though until next week. If your suggestions don't work I will try and create a PBIX with just the id column and share that. In the meantime, thanks for your help!

 

 

Just a guess: Skip the sorting-operation and see if you still get dups.

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.