vjnvinod
Impactful Individual

Power Query tricky question

Dear community.

 

I have a table whose columns are below:

 

Source.Name, Fiscal year, Account, AccountChannel, AccountSegment, AccountSubSegment, AccountSector, AccountSubSector1, Area1, Region, GCSP1, Country, ServiceLine, SubServiceLine, TER_FYTD_USD, TER_FYTD_Constant, RevenuePlanFY_Constant, NER_FYTD_Constant, MarginFYTD_Constant, MarginPFYTD_Constant, TER_PFYTD_Constant, SaleFYTD_constant, PipelineWeightedOpen_constant, Region

 

I need to create an additional column with a condition: if all of my row data matches except for the category, then I want the row to show as a duplicate.

For instance, if there are 3 duplicate rows, then it should show 2 rows as Duplicate and 1 row as Original.

 

Let me know how to achieve this in Power Query.

12 REPLIES
ElenaN
Resolver V

Hello,

 

A solution would be the one below; however, I am not sure how practical it is, considering you have a lot of columns to group by.

 

The steps would be:

1. Create an index for the rows in your table (ideally with the rows sorted so that the first row is the original one, in case that matters or there is a rule for it).

2. Group by all columns except the category one, returning the row count and the previously created index from the nested table, so you know which rows are duplicated (the ones with count > 1).

3. Group again by the new count column and all columns except the category one, and compute the minimum index inside each group of duplicated rows.

4. Create a new column comparing the initial index with the minimum index inside a group, to check which one is the first duplicated row, and assign the text Duplicate or Original.

5. Remove the fields that are no longer needed.

 

let
    // Sample data set: three columns, where Column3 plays the role of the category
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WMlTSUUoEYkelWJ1oJSMgKwmIncA8YyArGYidwTyYSicUHlAuFgA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Column3 = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", Int64.Type}, {"Column2", type text}, {"Column3", type text}}),
    // Step 1: index every row
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1),
    // Step 2: group by everything except the category column, keeping the row count and the nested detail rows
    #"Grouped Rows" = Table.Group(#"Added Index", {"Column1", "Column2"}, {{"Count", each Table.RowCount(_), type number}, {"Details", each _, type table}}),
    #"Subtracted from Column" = Table.TransformColumns(#"Grouped Rows", {{"Count", each _ - 1, type number}}),
    #"Expanded Details" = Table.ExpandTableColumn(#"Subtracted from Column", "Details", {"Column3", "Index"}, {"Details.Column3", "Details.Index"}),
    #"Renamed Columns" = Table.RenameColumns(#"Expanded Details",{{"Details.Index", "Duplicates Index"}}),
    // Step 3: group again and compute the minimum index inside each group
    #"Grouped Rows1" = Table.Group(#"Renamed Columns", {"Column1", "Column2", "Count"}, {{"Min", each List.Min([Duplicates Index]), type number}, {"Details", each _, type table}}),
    #"Expanded Details1" = Table.ExpandTableColumn(#"Grouped Rows1", "Details", {"Details.Column3", "Duplicates Index"}, {"Details.Details.Column3", "Details.Duplicates Index"}),
    #"Renamed Columns1" = Table.RenameColumns(#"Expanded Details1",{{"Min", "Min Index"}, {"Details.Duplicates Index", "Index"}}),
    // Step 4: the row holding the minimum index is the original; the rest are duplicates
    #"Added Conditional Column" = Table.AddColumn(#"Renamed Columns1", "Is Duplicate", each if [Min Index] = [Index] then "Original" else "Duplicate"),
    // Step 5: drop the helper columns
    #"Removed Columns" = Table.RemoveColumns(#"Added Conditional Column",{"Index", "Min Index", "Count"})
in
    #"Removed Columns"

 

I tested on the following data set:

[Image: dataset.png — sample table with columns Column1, Column2, Column3]

where Column1 and Column2 can be duplicated across rows, while Column3 can have different values (like the category column you mentioned).

 

Hope it helps,

ElenaN 

vjnvinod
Impactful Individual

@ElenaN

 

Can you share the PBIX file?

vjnvinod
Impactful Individual

@ElenaN @Anonymous

 

Super helpful, and both solutions work.

 

I have another tricky question on the same topic.

 

I have a table "cross-Domain extract file" and the 3 columns stated below:

1) "ClientSectorDesc"

2) "Sector as per Account list"

3) "Account Name_Lookup"

 

In the table "cross-Domain extract file":

Filter Banking & Capital Market in the column "ClientSectorDesc", and

select null in the "Sector as per Account list" column, replacing null with BCM;

with the same filters, also replace null in the column "Account Name_Lookup" with Others.

After the changes are made, unfilter everything (like we do in Excel).

How do I achieve this? I wasn't successful using the find-and-replace method.

 

 

Anonymous
Not applicable

Use a custom column: if [ClientSectorDesc] = "Banking & Capital Market" then "your non-null value" else [your original column].
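
For instance, a minimal sketch of that idea using conditional replaces (the exact sector label "Banking & Capital Market" is an assumption; adjust it and the query name to your data):

let
    Source = #"cross-Domain extract file",
    // Replace null in "Sector as per Account list" with "BCM", but only on rows
    // where ClientSectorDesc is Banking & Capital Market; other rows keep their value
    #"Replaced Sector" = Table.ReplaceValue(
        Source,
        each [Sector as per Account list],
        each if [ClientSectorDesc] = "Banking & Capital Market" and [Sector as per Account list] = null then "BCM" else [Sector as per Account list],
        Replacer.ReplaceValue,
        {"Sector as per Account list"}
    ),
    // Same pattern: replace null in "Account Name_Lookup" with "Others" under the same condition
    #"Replaced Account" = Table.ReplaceValue(
        #"Replaced Sector",
        each [Account Name_Lookup],
        each if [ClientSectorDesc] = "Banking & Capital Market" and [Account Name_Lookup] = null then "Others" else [Account Name_Lookup],
        Replacer.ReplaceValue,
        {"Account Name_Lookup"}
    )
in
    #"Replaced Account"

Because the condition is evaluated per row, there is no need to filter and unfilter: rows that don't match simply keep their original value.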
vjnvinod
Impactful Individual

@Anonymous

 

thanks

Not sure if this helps; I have 3 columns, as mentioned in my post.

Can you let me know the steps, or can you replicate this and share the PBIX?

Anonymous
Not applicable

Can you share your PBIX (or part of it) so I don't have to create it from scratch?
Anonymous
Not applicable

As a rule of thumb, each "row" in Power Query cannot reference other rows, especially as there is no concept of "next" and "previous" rows.
BUT if your objective is to remove duplicate rows, you can simply remove the Category column and then group the data, as in the sketch below.
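
A minimal sketch of that remove-duplicates route (YourTable and Category are placeholder names for your query and category column):

let
    Source = YourTable,
    // Drop the category column so it cannot distinguish otherwise-identical rows
    #"Removed Category" = Table.RemoveColumns(Source, {"Category"}),
    // Keep one row per distinct combination of the remaining values
    #"Removed Duplicates" = Table.Distinct(#"Removed Category")
in
    #"Removed Duplicates"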

vjnvinod
Impactful Individual

@Anonymous

 

So you are telling me there is no way we can achieve this?

My objective is to identify the rows that are duplicates and flag them in a separate column as "Duplicate" or "Original", but I don't want to remove them, for obvious reasons.

Anonymous
Not applicable

Well, for doing something like that I suggest DAX, as you can easily create a calculated column with a FILTER over all the rows of a group and then use a RANK to sort them. I don't know exactly how you can decide which one of the three is the original and which ones are duplicates, though.

vjnvinod
Impactful Individual

@Anonymous

 

Can you help with the DAX? I am not familiar with DAX.

Also, regarding the duplicates, it doesn't matter at all: for instance, if you have 3 rows repeating the same values, we can mark 2 rows as duplicate, and it doesn't matter which rows.

Anonymous
Not applicable

Maybe Elena's solutions will work anyway.

In DAX, create a calculated column:

 

Ordinal =
VAR thisColumn1 = YourTable[Column1]
VAR thisColumn2 = YourTable[Column2]
-- create a VAR for each column that you have to group by
RETURN
    RANKX (
        FILTER (
            YourTable,
            YourTable[Column1] = thisColumn1
                && YourTable[Column2] = thisColumn2
            -- && repeat for each variable
        ),
        YourTable[Category]
    )

This will create a number from 1 to N for each row within the same "group". Then you add a second conditional column: if this column is 1, the row is the valid one; otherwise it is a duplicate, e.g. IF ( YourTable[Ordinal] = 1, "Original", "Duplicate" ).

 

There is also another solution that should produce exactly the result you asked for:

IsDuplicate =
VAR thisColumn1 = YourTable[Column1]
VAR thisColumn2 = YourTable[Column2]
VAR thisCategory = YourTable[Category]
-- create a VAR for each column that you have to group by; the row's own
-- Category must also be captured in a VAR, because a bare MIN inside a
-- calculated column would aggregate the whole table, not the current row
RETURN
    IF (
        CALCULATE (
            MIN ( YourTable[Category] ),
            FILTER (
                ALL ( YourTable ),
                YourTable[Column1] = thisColumn1
                    && YourTable[Column2] = thisColumn2
                -- repeat for all variables
            )
        )
            = thisCategory,
        "Original",
        "Duplicate"
    )

 

These solutions, however, probably won't work if categories within a group can be identical, since RANKX and MIN will then find the same values. If so, follow Elena's suggestion: create an Index column, and RANKX on that one instead of the category, as in the sketch below.
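
For reference, a sketch of that Index-based variant (assuming an Index column was added in Power Query; table and column names are placeholders):

Ordinal =
VAR thisColumn1 = YourTable[Column1]
VAR thisColumn2 = YourTable[Column2]
RETURN
    RANKX (
        FILTER (
            YourTable,
            YourTable[Column1] = thisColumn1
                && YourTable[Column2] = thisColumn2
        ),
        YourTable[Index],
        ,
        ASC    -- the smallest Index in each group gets rank 1, i.e. the Original
    )

Rows with Ordinal = 1 are then flagged Original and everything else Duplicate, exactly as above; since the Index is unique, ties are impossible even when categories repeat.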
