Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
fadile
Regular Visitor

Remove duplicate data from a column (but keep the row) if repeated more than "n" number of times

Hello Everyone,

I am currently trying to come up with a way to remove duplicate data from a column (while keeping the row) if the data is repeated immediately one another more than "n" number of times.

 

For example the table I have currently is like this 

 

Time Stamp Air Flow (cfm)
12/1/2021 10:50 AM624

12/1/2021 11:00 AM

620
12/1/2021 11:10 AM621
12/1/2021 11:20 AM623
12/1/2021 11:30 AM623
12/1/2021 11:40 AM623
12/1/2021 11:50 AM623
12/1/2021 12:00 PM623
12/1/2021 12:10 PM 621
12/1/2021 12:20 PM620
12/1/2021 12:30 PM619
12/1/2021 12:40 PM623
12/1/2021 12:50 PM623
12/1/2021 1:00 PM626

 

As you can see, At 12/1/2021 11:20 AM the sensor being used to measure airflow begins to send wrong data to the data source, repeating the same value until 12/1/2021 12:00 PM.

So I want to create a Query to remove wrong data like this and return blank cells after the first value until a new value is returned.

I however want this to only happen if the 3 repetitions or more occur in a row and I do not want to remove other reoccurences of the repeated number (since 613 cfm is still possibly reading).

 

So i would have a table like this 

 

Time Stamp Air Flow (cfm)
12/1/2021 10:50 AM624

12/1/2021 11:00 AM

620
12/1/2021 11:10 AM621
12/1/2021 11:20 AM623
12/1/2021 11:30 AM 
12/1/2021 11:40 AM 
12/1/2021 11:50 AM 
12/1/2021 12:00 PM 
12/1/2021 12:10 PM621
12/1/2021 12:20 PM620
12/1/2021 12:30 PM619
12/1/2021 12:40 PM623
12/1/2021 12:50 PM623
12/1/2021 1:10 PM626

 

I'm trying to this in power query editor so it is perfomed automatically on all new incoming data.

Could anyone help me with this, thank you in advance for your help.

 

1 ACCEPTED SOLUTION
parry2k
Super User
Super User

@fadile here is the PQ Script, start a new blank query -> click advanced editor and paste the following code

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fc87CsAgEIThq8jWAXfWB8QuBwjYi/e/RpAESdis9VfMP60RxMMLCxy4JHbHSRtlidS3D6LwRNaIidAoE4PGsMK4wmSjjNpqIh50P7kycqtxVEbujdg1xtVosvFVm6n3Cw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Time Stamp " = _t, #"Air Flow (cfm)" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Time Stamp ", type datetime}, {"Air Flow (cfm)", Int64.Type}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    #"Added Previous Row Air Flow Value" = Table.AddColumn(#"Added Index", "Previous Air Flow", each try #"Added Index" [#"Air Flow (cfm)"] {[Index] - 1} otherwise null),
    #"Added Group" = Table.AddColumn(#"Added Previous Row Air Flow Value", "Group", each if [#"Air Flow (cfm)"] <> [Previous Air Flow] then [Index] else null),
    #"Filled Down" = Table.FillDown(#"Added Group",{"Group"}),
    #"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Index", "Previous Air Flow"}),
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"Group"}, {{"Count", each Table.RowCount(_), Int64.Type},{"Rank", each Table.AddIndexColumn(_, "Rank", 1)} }),
    #"Expanded Rank" = Table.ExpandTableColumn(#"Grouped Rows", "Rank", {"Time Stamp ", "Air Flow (cfm)", "Rank"}, {"Time Stamp ", "Air Flow (cfm)", "Rank"}),
    #"Added Custom2" = Table.AddColumn(#"Expanded Rank", "New Air Flow (cfm)", each if [Rank] <> 1 and [Count] > 3 then null else [#"Air Flow (cfm)"]),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom2",{"Time Stamp ", "New Air Flow (cfm)"})
in
    #"Removed Other Columns"

 

and the output of the above query based on your sample data:

 

parry2k_0-1640363427506.png

 

Follow us on LinkedIn

 

Learn about conditional formatting at Microsoft Reactor

My latest blog post The Power of Using Calculation Groups with Inactive Relationships (Part 1) (perytus.com) I would  Kudos if my solution helped. 👉 If you can spend time posting the question, you can also make efforts to give Kudos to whoever helped to solve your problem. It is a token of appreciation!

 

Visit us at https://perytus.com, your one-stop-shop for Power BI-related projects/training/consultancy.

 



Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!

Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo

If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤


Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.

View solution in original post

1 REPLY 1
parry2k
Super User
Super User

@fadile here is the PQ Script, start a new blank query -> click advanced editor and paste the following code

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fc87CsAgEIThq8jWAXfWB8QuBwjYi/e/RpAESdis9VfMP60RxMMLCxy4JHbHSRtlidS3D6LwRNaIidAoE4PGsMK4wmSjjNpqIh50P7kycqtxVEbujdg1xtVosvFVm6n3Cw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Time Stamp " = _t, #"Air Flow (cfm)" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Time Stamp ", type datetime}, {"Air Flow (cfm)", Int64.Type}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    #"Added Previous Row Air Flow Value" = Table.AddColumn(#"Added Index", "Previous Air Flow", each try #"Added Index" [#"Air Flow (cfm)"] {[Index] - 1} otherwise null),
    #"Added Group" = Table.AddColumn(#"Added Previous Row Air Flow Value", "Group", each if [#"Air Flow (cfm)"] <> [Previous Air Flow] then [Index] else null),
    #"Filled Down" = Table.FillDown(#"Added Group",{"Group"}),
    #"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Index", "Previous Air Flow"}),
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"Group"}, {{"Count", each Table.RowCount(_), Int64.Type},{"Rank", each Table.AddIndexColumn(_, "Rank", 1)} }),
    #"Expanded Rank" = Table.ExpandTableColumn(#"Grouped Rows", "Rank", {"Time Stamp ", "Air Flow (cfm)", "Rank"}, {"Time Stamp ", "Air Flow (cfm)", "Rank"}),
    #"Added Custom2" = Table.AddColumn(#"Expanded Rank", "New Air Flow (cfm)", each if [Rank] <> 1 and [Count] > 3 then null else [#"Air Flow (cfm)"]),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom2",{"Time Stamp ", "New Air Flow (cfm)"})
in
    #"Removed Other Columns"

 

and the output of the above query based on your sample data:

 

parry2k_0-1640363427506.png

 

Follow us on LinkedIn

 

Learn about conditional formatting at Microsoft Reactor

My latest blog post The Power of Using Calculation Groups with Inactive Relationships (Part 1) (perytus.com) I would  Kudos if my solution helped. 👉 If you can spend time posting the question, you can also make efforts to give Kudos to whoever helped to solve your problem. It is a token of appreciation!

 

Visit us at https://perytus.com, your one-stop-shop for Power BI-related projects/training/consultancy.

 



Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!

Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo

If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤


Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.