Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Grow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.

Reply
KH_Mike
Helper III
Helper III

Create new column to check the duplication with certain condition

Hello All,

 

In Power Query, is it possible to create a new column based on below condition?

There are "Data Group" either "Revenue" or "Cost". If the "Data Code" under "Data Group" is unique, then it will show "Y". If the "Data Code" under "Data Group" is not unique (like HHH in Data Code), then check the "Date Type" and show "Y" in "Revenue Actual". For "Data Group" equal to "Cost" also apply the same. 

 

 

IDData GroupData TypeData CodeCurrencyAmountChecking
A123B456RevenueRevenue ActualAAAEUR10Y
A123B456RevenueRevenue ActualBBBEUR30Y
A123B456RevenueRevenue AccrualCCCUSD40Y
A123B456RevenueRevenue AccrualDDDUSD50Y
A123B456RevenueRevenue ActualEEEUSD50Y
A123B456RevenueRevenue AccrualFFFUSD1268Y
A123B456RevenueRevenue ActualGGGUSD1268Y
A123B456RevenueRevenue AccrualHHHEUR1732N
A123B456RevenueRevenue ActualHHHEUR1732Y
A123B456CostCost ActualAAAEUR10Y
A123B456CostCost ActualBBBEUR30Y
A123B456CostCost AccrualCCCUSD40Y
A123B456CostCost AccrualDDDUSD50Y
A123B456CostCost ActualEEEUSD50Y
A123B456CostCost AccrualFFFUSD1268Y
A123B456CostCost ActualGGGUSD1268Y
A123B456CostCost AccrualHHHEUR1732N
A123B456CostCost ActualHHHEUR1732Y
5 REPLIES 5
spinfuzer
Super User
Super User

This is not exactly what you asked for, but I think it achieves what you are probably looking to do.  

 

spinfuzer_1-1702866425788.png

 

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZI9C8IwEIb/imTuYNIPXfPVZnKodJDSQUo3qaCtv98DbYp4kEuX90Lg4Tm4t22Z5CJVWV6whNXDaxjnYX3tZD/N1xt8SCkhbVND8j3EhXUJFVZKeTglw/3jQ2utIZuzgcyiaWOMp/PYxa21G+BFXZalp7kojpHyqqo24YveObee7JAKGKcIPYL/6fX9OX1HTFVQLFySX4xeD5wLFwNdM1wJXEcpAyqk1ABXUgqAKvHTd28=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, #"Data Group" = _t, #"Data Type" = _t, #"Data Code" = _t, Currency = _t, Amount = _t, Checking = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", type text}, {"Data Group", type text}, {"Data Type", type text}, {"Data Code", type text}, {"Currency", type text}, {"Amount", Int64.Type}, {"Checking", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Checking"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Removed Columns", "Data Type", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv), {"Data Type.1", "Data Type.2"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Data Type.1", type text}, {"Data Type.2", type text}}),
    #"Pivoted Column" = Table.Pivot(#"Changed Type1", List.Distinct(#"Changed Type1"[#"Data Type.2"]), "Data Type.2", "Amount", List.Sum),
    #"Added Custom" = Table.AddColumn(#"Pivoted Column", "Custom", each if List.NonNullCount({[Actual],[Accrual]}) = 2 then "Duplication" else null),
    #"Added Custom1" = Table.AddColumn(#"Added Custom", "Custom.1", each [Actual] ?? [Accrual])
in
    #"Added Custom1"

 

Hi @spinfuzer ,

 

I got your point while I try to do it in my actual data but it take long time to perform this task. I guess maybe the data size too big (more than 30 millions rows of records). Any ideas how to speed up ? Thank you.

This REQUIRES that the data is sorted exactly how it is shown in your example.  It also assumes Accrual always sorts before Actual as well.  That should be the case if it is in alphabetical order.  All this does it shift the table up one row and compares the current row values to the next row's values.

 

 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("nZI9C8IwEIb/imTuYNIPXfPVZnKodJDSQUo3qaCtv98DbYp4kEuX90Lg4Tm4t22Z5CJVWV6whNXDaxjnYX3tZD/N1xt8SCkhbVND8j3EhXUJFVZKeTglw/3jQ2utIZuzgcyiaWOMp/PYxa21G+BFXZalp7kojpHyqqo24YveObee7JAKGKcIPYL/6fX9OX1HTFVQLFySX4xeD5wLFwNdM1wJXEcpAyqk1ABXUgqAKvHTd28=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [ID = _t, #"Data Group" = _t, #"Data Type" = _t, #"Data Code" = _t, Currency = _t, Amount = _t, Checking = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"ID", type text}, {"Data Group", type text}, {"Data Type", type text}, {"Data Code", type text}, {"Currency", type text}, {"Amount", Int64.Type}, {"Checking", type text}}),
    prior_step = Table.RemoveColumns(#"Changed Type",{"Checking"}),
    tbl = Table.SelectColumns(prior_step, {"ID","Data Group","Data Type", "Data Code"}),
    temp_columns = Table.FromColumns(
        Table.ToColumns(prior_step) & Table.ToColumns(Table.RemoveFirstN(tbl,1)),
        Table.ColumnNames(prior_step) & Table.ColumnNames(Table.PrefixColumns(tbl,"temp"))
        ),
    check_column = Table.AddColumn(temp_columns, "Check", each if [Data Code] <> [temp.Data Code] then "Y" else if [ID] = [temp.ID] and [Data Group] = [temp.Data Group] then "N" else "Y"),
    remove_temp_columns = Table.RemoveColumns(check_column, {"temp.ID","temp.Data Group","temp.Data Type","temp.Data Code"}) 
in
    remove_temp_columns

 

 

 

wdx223_Daniel
Super User
Super User

=Table.Combine(Table.Group(YourTable,{"ID","Data Group"},{"n",each Table.Combine(Table.Group(_,"Data Code",{"t",each let a=Table.RowCount(_) in Table.AddColumn(_,"Checking",each if a=1 then "Y" else if Text.EndsWith([Data Type]," Actual") then "Y" else "N")})[t])})[n])

Hi @wdx223_Daniel 

 

Sorry that I don't really able to use your formula. Can you elebrate more?

Helpful resources

Announcements
RTI Forums Carousel3

New forum boards available in Real-Time Intelligence.

Ask questions in Eventhouse and KQL, Eventstream, and Reflex.

MayPowerBICarousel1

Power BI Monthly Update - May 2024

Check out the May 2024 Power BI update to learn about new features.

Top Solution Authors
Top Kudoed Authors