cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Highlighted
Helper I
Helper I

Remove duplicates without changing values in calculated column

I have a bit of a tricky situation and I am not able to figure out how to solve it. So I have a huge data set (million rows) and many duplicate IDs. The duplicate IDs are within the table because it shows results over different years. Without the duplicates I should have around 100 000 rows. However, I've added some calculated columns to the table that sums values of an ID over all the years combined. 

 

Is there a way to remove the duplicate IDs without changing the result of the calculated columns?

1 ACCEPTED SOLUTION

Accepted Solutions
Highlighted
Helper I
Helper I

Re: Remove duplicates without changing values in calculated column

I managed to solve this by uploading again the same data set under a different name. I've removed the duplicates in the second data set and used FIRSTNONBLANK function to add the calculated Tot column of the first data set to the second data set. A bit of an unconventional approach I would say, but it does the trick!

 

This idea was inspired by @az38. Thank you!

View solution in original post

7 REPLIES 7
Highlighted
Super User VI
Super User VI

Re: Remove duplicates without changing values in calculated column

@bamba98 

if you created new columns with DAX expressions - no, dataset won't be removed correct in any case


do not hesitate to give a kudo to useful posts and mark solutions as solution
LinkedIn
Highlighted
Helper I
Helper I

Re: Remove duplicates without changing values in calculated column

I don't get your answer. Perhaps my problem was not very clear. Allow me to elaborate.

 

Consider the following example with just one ID:

bamba98_0-1594581393648.png

where Tot is calculated as follows: Tot = calculate(sum(val), allexcept(ID)).

 

I want to remove the duplicate IDs and just keep one ID without the Tot column to change. Now, if I remove the duplicate ID rows with edit query, I get that Tot=10, but I want the final result to look like this: 

bamba98_1-1594581589472.png

 

Highlighted
Super User VI
Super User VI

Re: Remove duplicates without changing values in calculated column

@bamba98 

what's rule for duplicates definition? why do you remove exactly 2019, 2018 and 2017 rows and not 2020?


do not hesitate to give a kudo to useful posts and mark solutions as solution
LinkedIn
Highlighted
Helper I
Helper I

Re: Remove duplicates without changing values in calculated column

@az38 Because I want to keep the latest period an ID was active and after removing the duplicates I don't care much about the column containing Val.

Highlighted
Super User VI
Super User VI

Re: Remove duplicates without changing values in calculated column

@bamba98 

try to create new calculated table like

Table 2 = 
ADDCOLUMNS(
SUMMARIZE('Table', 'Table'[ID], "Period", MAX('Table'[Period])),
"Val", CALCULATE(SUM('Table'[Val]), FILTER('Table', 'Table'[ID] = EARLIER([ID]) && 'Table'[Period] = EARLIER([Period]))),
"Tot", CALCULATE(SUM('Table'[Tot]), FILTER('Table', 'Table'[ID] = EARLIER([ID]) && 'Table'[Period] = EARLIER([Period])))

)

do not hesitate to give a kudo to useful posts and mark solutions as solution
LinkedIn
Highlighted
Community Support
Community Support

Re: Remove duplicates without changing values in calculated column

Hi @bamba98 ,

 

In Power Query Editor, you remove the duplicate ID rows, so they are not existed. You cannot get the Tot=45, because the real data is gone.

You need to get the Tot in Power Query Editor, and you can refer the following steps.

 

1. Group by ID.

 

R1.jpg

 

2. Add a custom column to get the max period.

 

R2.jpg

 

3. Expand the custom column.

 

R3.jpg

 

4. At last delete the Table column.

 

R4.jpg

 

R5.jpg

 

The complete M Query is as follows:

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUTIyMDIAUoYGSrE6MBFDSyBliixgAVKCImKOEDFCGIMsgGSKEZIpBsgiMFNiAQ==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, Period = _t, Val = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", Int64.Type}, {"Period", Int64.Type}, {"Val", Int64.Type}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"ID"}, {{"Table", each _, type table [ID=nullable number, Period=nullable number, Val=nullable number]}, {"New Tot", each List.Sum([Val]), type nullable number}}),
    #"Added Custom" = Table.AddColumn(#"Grouped Rows", "Custom.1", each Table.Max([Table],"Period")),
    #"Expanded Custom.1" = Table.ExpandRecordColumn(#"Added Custom", "Custom.1", {"Period"}, {"Custom.1.Period"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Custom.1",{"Table"})
in
    #"Removed Columns"

 

BTW, pbix as attached.

 

Best regards,

 

Community Support Team _ zhenbw

If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Highlighted
Helper I
Helper I

Re: Remove duplicates without changing values in calculated column

I managed to solve this by uploading again the same data set under a different name. I've removed the duplicates in the second data set and used FIRSTNONBLANK function to add the calculated Tot column of the first data set to the second data set. A bit of an unconventional approach I would say, but it does the trick!

 

This idea was inspired by @az38. Thank you!

View solution in original post

Helpful resources

Announcements

August Community Highlights

Check out a full recap of the month!

August 2020 CYST Challenge

Check out the winners of the recent 'Can You Solve These?' community challenge!

Experience what’s next for Power BI

Join us for an in-depth look at the new Power BI features and capabilities at the free Microsoft Business Applications Launch Event.

Top Kudoed Authors