vjnvinod Member

Power query tricky question

Dear community.

 

I have a table whose columns are below:

 

Source.Name, Fiscal year, Account, AccountChannel, AccountSegment, AccountSubSegment, AccountSector, AccountSubSector1, Area1, Region, GCSP1, Country, ServiceLine, SubServiceLine, TER_FYTD_USD, TER_FYTD_Constant, RevenuePlanFY_Constant, NER_FYTD_Constant, MarginFYTD_Constant, MarginPFYTD_Constant, TER_PFYTD_Constant, SaleFYTD_constant, PipelineWeightedOpen_constant, Region

 

I need to create an additional column with a condition: if all of my row data matches except for the Category column, then I want the row to be flagged as a duplicate.

For instance, if there are 3 duplicate rows, it should show 2 rows as duplicate and 1 row as original.

 

Let me know how to achieve this in Power Query.

12 REPLIES
adetogni Established Member

Re: Power query tricky question

As a rule of thumb, each "row" in Power Query cannot reference "other rows"; in particular, there is no concept of "next" and "previous" rows.
BUT if your objective is to remove duplicate rows, you can simply remove the Category column and then group the data, as in the sketch below.
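
A minimal sketch of that approach, assuming YourTable is the name of the existing query and the differing column is literally named "Category" (adjust both names to your model):

let
    // assumption: YourTable is the existing query/table to de-duplicate
    Source = YourTable,
    // drop the column that is allowed to differ between otherwise identical rows
    #"Removed Category" = Table.RemoveColumns(Source, {"Category"}),
    // keep only one copy of each remaining combination of values
    #"Removed Duplicates" = Table.Distinct(#"Removed Category")
in
    #"Removed Duplicates"

Note this removes the duplicates rather than flagging them, which is what the grouping approach below addresses.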

vjnvinod Member

Re: Power query tricky question

@adetogni

 

So you are telling me there is no way we can achieve this?

My objective is to identify the rows which are duplicates and flag them in a separate column as "Duplicate" or "Original", but I don't want to remove them, for obvious reasons.

adetogni Established Member

Re: Power query tricky question

Well, for something like that I would suggest DAX, as you can easily create a calculated column with a FILTER over all the rows of a group and then use a RANK to order them. I don't know exactly how you would decide which one of the three is the original and which ones are duplicates, though.

ElenaN Member

Re: Power query tricky question

Hello,

 

A solution would be the one below; however, I am not sure how practical it is considering how many columns you have to group by.

 

The steps would be:

1. Create an index for the rows in your table (ideally with the rows sorted so that the 1st row is the original one, in case that matters or there is a rule for it).

2. Group by all columns except the category one and return the row count as well as the previously created index from the nested table, so you know which rows are duplicated (the ones with count > 1).

3. Group again by the new count column and all the columns except the category one, and compute the minimum index within each group of duplicated rows.

4. Create a new column that compares the initial index with the minimum index of its group, to check which row is the 1st one, and assign the text "Original" or "Duplicate" accordingly.

5. Remove the fields that are no longer needed.

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUoEYkelWJ1oJSMgKwmIncA8YyArGYidwTyYSicUHlAuFgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Column3 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}, {"Column2", type text}, {"Column3", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
    #"Grouped Rows" = Table.Group(#"Added Index", {"Column1", "Column2"}, {{"Count", each Table.RowCount(_), type number}, {"Details", each _, type table}}),
    #"Subtracted from Column" = Table.TransformColumns(#"Grouped Rows", {{"Count", each _ - 1, type number}}),
    #"Expanded Details" = Table.ExpandTableColumn(#"Subtracted from Column", "Details", {"Column3", "Index"}, {"Details.Column3", "Details.Index"}),
    #"Renamed Columns" = Table.RenameColumns(#"Expanded Details",{{"Details.Index", "Duplicates Index"}}),
    #"Grouped Rows1" = Table.Group(#"Renamed Columns", {"Column1", "Column2", "Count"}, {{"Min", each List.Min([Duplicates Index]), type number}, {"Details", each _, type table}}),
    #"Expanded Details1" = Table.ExpandTableColumn(#"Grouped Rows1", "Details", {"Details.Column3", "Duplicates Index"}, {"Details.Details.Column3", "Details.Duplicates Index"}),
    #"Renamed Columns1" = Table.RenameColumns(#"Expanded Details1",{{"Min", "Min Index"}, {"Details.Duplicates Index", "Index"}}),
    #"Added Conditional Column" = Table.AddColumn(#"Renamed Columns1", "Is Duplicate", each if [Min Index] = [Index] then "Original" else "Duplicate"),
    #"Removed Columns" = Table.RemoveColumns(#"Added Conditional Column",{"Index", "Min Index", "Count"})
in
    #"Removed Columns"

 

I tested on the following data set:

dataset.png

where Column1 and Column2 can be duplicated across rows, while Column3 can have different values (like the category column you mentioned).

 

Hope it helps,

ElenaN 

vjnvinod Member

Re: Power query tricky question

@adetogni

 

Can you help with the DAX? I am not familiar with DAX.

Also, regarding which row is the duplicate: it doesn't matter at all. For instance, if you have 3 rows repeating the same values, we can mark any 2 of them as duplicates; it doesn't matter which.

vjnvinod Member

Re: Power query tricky question

@ElenaN

 

Can you share the PBIX file?

ElenaN Member

Re: Power query tricky question

adetogni Established Member

Re: Power query tricky question

Maybe Elena's solution will work anyway.

 

In DAX, create a calculated column:

 

Ordinal =
VAR thisColumn1=YourTable[Column1]
VAR thisColumn2=YourTable[Column2]
-- create a var for each column that you have to group by

 

RETURN
RANKX(
     FILTER(YourTable,
     YourTable[Column1]=thisColumn1 &&
     YourTable[Column2]=thisColumn2

     -- && repeat for each variable 
     ), YourTable[Category]
)

This will create a number from 1 to N for each row within the same "group". Then you add a second conditional column: if this column is 1, the row is the valid (original) one; otherwise it is a duplicate.

 

There is also another solution that should produce exactly the result you asked for:

IsDuplicate =
VAR thisColumn1 = YourTable[Column1]
VAR thisColumn2 = YourTable[Column2]
VAR thisCategory = YourTable[Category]
-- create a var for each column that you have to group by
RETURN
    IF (
        CALCULATE (
            MIN ( YourTable[Category] ),
            FILTER (
                ALL ( YourTable ),
                YourTable[Column1] = thisColumn1
                    && YourTable[Column2] = thisColumn2
                -- && repeat for each variable
            )
        )
            = thisCategory,
        "Original",
        "Duplicate"
    )

 

These solutions, however, probably won't work if categories can be identical within a group, as RANKX and MIN will find the same values. If so, follow Elena's suggestion and create an Index column, then apply RANKX to that column instead of the Category one.

vjnvinod Member

Re: Power query tricky question

@ElenaN @adetogni

 

Super helpful, and both solutions work.

 

I have another tricky question on the same topic.

 

I have a table "cross-Domain extract file" and the 3 columns stated below:

1) "ClientSectorDesc"

2) "Sector as per Account list"

3) "Account Name_Lookup"

 

In the table "cross-Domain extract file":

Filter on Banking & Capital Market in the column "ClientSectorDesc", and

select null in the "Sector as per Account list" column, then replace null with BCM.

With the same filters on, also replace null in the column "Account Name_Lookup" with Others.

After the changes are made, unfilter everything (like we do in Excel).

 

How do I achieve this? I wasn't successful using the find-and-replace method.
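
One approach that may work, as a minimal M sketch: instead of filtering, do the replacement conditionally per row with Table.ReplaceValue, so only the Banking & Capital Market rows with nulls are changed and there is nothing to unfilter afterwards. The step name #"cross-Domain extract file", the column names above, and the literal text "Banking & Capital Markets" are assumptions to adjust to the real data:

let
    // assumption: #"cross-Domain extract file" is the query/step holding the table
    Source = #"cross-Domain extract file",
    // replace null in "Sector as per Account list" with "BCM", but only on rows where
    // ClientSectorDesc is "Banking & Capital Markets"; all other rows are left untouched
    #"Replaced Sector" = Table.ReplaceValue(
        Source,
        each [#"Sector as per Account list"],
        each if [ClientSectorDesc] = "Banking & Capital Markets"
                and [#"Sector as per Account list"] = null
             then "BCM"
             else [#"Sector as per Account list"],
        Replacer.ReplaceValue,
        {"Sector as per Account list"}
    ),
    // same pattern for "Account Name_Lookup": replace null with "Others" on those rows
    #"Replaced Account" = Table.ReplaceValue(
        #"Replaced Sector",
        each [#"Account Name_Lookup"],
        each if [ClientSectorDesc] = "Banking & Capital Markets"
                and [#"Account Name_Lookup"] = null
             then "Others"
             else [#"Account Name_Lookup"],
        Replacer.ReplaceValue,
        {"Account Name_Lookup"}
    )
in
    #"Replaced Account"

Because nothing is ever filtered out, the full table is preserved, which mirrors the "filter, replace, unfilter" sequence from Excel in a single step per column.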