Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
vjnvinod
Impactful Individual
Impactful Individual

Need to create a calculate coloumn to identify duplicates lines

Hi,

 

I have a table, with so many rows and coloumns,

i need to create a calculate coloumn, to segregate "Original and Duplicate entries"

Certain coloumns

 

GlobalService

DunsNumber

OpportunityID

Opportunity

PipelineOpen_constant

 

if the above coloumns , for the lines(Rows) are equal, then mark these as duplicate, except 1line(which is original)

that means if there are 3 lines same, mark 2 of them as duplicate and 1 as original.

 

let me know if this is possible to do and also i want a table to be created for those entries.

 

1 ACCEPTED SOLUTION
v-frfei-msft
Community Support
Community Support

Hi @vjnvinod ,

 

I have created a sample for your reference, please check the following steps as below.

 

1. Insert an index column in power query. M code for your reference.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlSK1YGRSUhkMg4yBUymgsk0pdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [catgeory = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"catgeory", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1)
in
    #"Added Index"

 

2. Close and apply, then wen can get the excepted result by a calculated column.

Column =
VAR inde = 'Table'[Index]
VAR previous = inde - 1
RETURN
    IF (
        CALCULATE (
            MAX ( 'Table'[catgeory] ),
            FILTER ( 'Table', 'Table'[Index] = previous )
        ) = 'Table'[catgeory],
        "Duplicated",
        "Original"
    )

Capture.PNG

 

Pbix as attached.

 

 

Community Support Team _ Frank
If this post helps, then please consider Accept it as the solution to help the others find it more quickly.

View solution in original post

2 REPLIES 2
v-frfei-msft
Community Support
Community Support

Hi @vjnvinod ,

 

I have created a sample for your reference, please check the following steps as below.

 

1. Insert an index column in power query. M code for your reference.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WSlSK1YGRSUhkMg4yBUymgsk0pdhYAA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [catgeory = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"catgeory", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1)
in
    #"Added Index"

 

2. Close and apply, then wen can get the excepted result by a calculated column.

Column =
VAR inde = 'Table'[Index]
VAR previous = inde - 1
RETURN
    IF (
        CALCULATE (
            MAX ( 'Table'[catgeory] ),
            FILTER ( 'Table', 'Table'[Index] = previous )
        ) = 'Table'[catgeory],
        "Duplicated",
        "Original"
    )

Capture.PNG

 

Pbix as attached.

 

 

Community Support Team _ Frank
If this post helps, then please consider Accept it as the solution to help the others find it more quickly.
danextian
Super User
Super User

Hi @vjnvinod ,

 

This is achievable.

 

  • First create a calculated column to concatenate all these columns.

 

Concatenate =
//you may also use "&" instead of COMBINEVALUES() to concatenate text strings
COMBINEVALUES (
    " ",
    'Table'[GlobalService],
    'Table'[DunsNumber],
    'Table'[OpportunityID],
    'Table'[Opportunity],
    'Table'[PipelineOpen_constant]
)
​

 

 

  • Now, there should some kind of a column to identify which among the duplicate concatenated values come first like datetimestamp, etc. If there isn't, to to the Query Editor (Designer Home Tab >> Transform Data) -->Add Column tab -->Index Column and then apply the changes. Create another calculated column:
    index.png
    KeepThisRow =
    CALCULATE (
        MIN ( 'Table'[Index/DateTimeStamp] ),
        ALLEXCEPT ( 'Table', 'Table'[Concatenate] )
    ) = 'Table'[Index/DateTimeStamp]
    ​

     

Another way to do this, you don't want  to keep all other rows except the first one based on several columns is do Remove Duplicates in Power Query which is very similar to Excel.  In Power Query, select all the columns in your list. Right click and then select Remove Duplicates.

remove duplicates.png

 

 

 










Did I answer your question? Mark my post as a solution!


Proud to be a Super User!









"Tell me and I’ll forget; show me and I may remember; involve me and I’ll understand."
Need Power BI consultation, get in touch with me on LinkedIn or hire me on UpWork.
Learn with me on YouTube @DAXJutsu or follow my page on Facebook @DAXJutsuPBI.

Helpful resources

Announcements
Europe Fabric Conference

Europe’s largest Microsoft Fabric Community Conference

Join the community in Stockholm for expert Microsoft Fabric learning including a very exciting keynote from Arun Ulag, Corporate Vice President, Azure Data.

RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel1

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.