cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
fadile
Regular Visitor

Remove duplicate data from a column (but keep the row) if repeated more than "n" number of times

Hello Everyone,

I am currently trying to come up with a way to remove duplicate data from a column (while keeping the row) if the data is repeated immediately one another more than "n" number of times.

 

For example the table I have currently is like this 

 

Time Stamp Air Flow (cfm)
12/1/2021 10:50 AM624

12/1/2021 11:00 AM

620
12/1/2021 11:10 AM621
12/1/2021 11:20 AM623
12/1/2021 11:30 AM623
12/1/2021 11:40 AM623
12/1/2021 11:50 AM623
12/1/2021 12:00 PM623
12/1/2021 12:10 PM 621
12/1/2021 12:20 PM620
12/1/2021 12:30 PM619
12/1/2021 12:40 PM623
12/1/2021 12:50 PM623
12/1/2021 1:00 PM626

 

As you can see, At 12/1/2021 11:20 AM the sensor being used to measure airflow begins to send wrong data to the data source, repeating the same value until 12/1/2021 12:00 PM.

So I want to create a Query to remove wrong data like this and return blank cells after the first value until a new value is returned.

I however want this to only happen if the 3 repetitions or more occur in a row and I do not want to remove other reoccurences of the repeated number (since 613 cfm is still possibly reading).

 

So i would have a table like this 

 

Time Stamp Air Flow (cfm)
12/1/2021 10:50 AM624

12/1/2021 11:00 AM

620
12/1/2021 11:10 AM621
12/1/2021 11:20 AM623
12/1/2021 11:30 AM 
12/1/2021 11:40 AM 
12/1/2021 11:50 AM 
12/1/2021 12:00 PM 
12/1/2021 12:10 PM621
12/1/2021 12:20 PM620
12/1/2021 12:30 PM619
12/1/2021 12:40 PM623
12/1/2021 12:50 PM623
12/1/2021 1:10 PM626

 

I'm trying to this in power query editor so it is perfomed automatically on all new incoming data.

Could anyone help me with this, thank you in advance for your help.

 

1 ACCEPTED SOLUTION
parry2k
Super User
Super User

@fadile here is the PQ Script, start a new blank query -> click advanced editor and paste the following code

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fc87CsAgEIThq8jWAXfWB8QuBwjYi/e/RpAESdis9VfMP60RxMMLCxy4JHbHSRtlidS3D6LwRNaIidAoE4PGsMK4wmSjjNpqIh50P7kycqtxVEbujdg1xtVosvFVm6n3Cw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Time Stamp " = _t, #"Air Flow (cfm)" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Time Stamp ", type datetime}, {"Air Flow (cfm)", Int64.Type}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    #"Added Previous Row Air Flow Value" = Table.AddColumn(#"Added Index", "Previous Air Flow", each try #"Added Index" [#"Air Flow (cfm)"] {[Index] - 1} otherwise null),
    #"Added Group" = Table.AddColumn(#"Added Previous Row Air Flow Value", "Group", each if [#"Air Flow (cfm)"] <> [Previous Air Flow] then [Index] else null),
    #"Filled Down" = Table.FillDown(#"Added Group",{"Group"}),
    #"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Index", "Previous Air Flow"}),
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"Group"}, {{"Count", each Table.RowCount(_), Int64.Type},{"Rank", each Table.AddIndexColumn(_, "Rank", 1)} }),
    #"Expanded Rank" = Table.ExpandTableColumn(#"Grouped Rows", "Rank", {"Time Stamp ", "Air Flow (cfm)", "Rank"}, {"Time Stamp ", "Air Flow (cfm)", "Rank"}),
    #"Added Custom2" = Table.AddColumn(#"Expanded Rank", "New Air Flow (cfm)", each if [Rank] <> 1 and [Count] > 3 then null else [#"Air Flow (cfm)"]),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom2",{"Time Stamp ", "New Air Flow (cfm)"})
in
    #"Removed Other Columns"

 

and the output of the above query based on your sample data:

 

parry2k_0-1640363427506.png

 

Follow us on LinkedIn

 

Learn about conditional formatting at Microsoft Reactor

My latest blog post The Power of Using Calculation Groups with Inactive Relationships (Part 1) (perytus.com) I would  Kudos if my solution helped. 👉 If you can spend time posting the question, you can also make efforts to give Kudos to whoever helped to solve your problem. It is a token of appreciation!

 

Visit us at https://perytus.com, your one-stop-shop for Power BI-related projects/training/consultancy.

 






Did I answer your question? Mark my post as a solution.

Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.





View solution in original post

1 REPLY 1
parry2k
Super User
Super User

@fadile here is the PQ Script, start a new blank query -> click advanced editor and paste the following code

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fc87CsAgEIThq8jWAXfWB8QuBwjYi/e/RpAESdis9VfMP60RxMMLCxy4JHbHSRtlidS3D6LwRNaIidAoE4PGsMK4wmSjjNpqIh50P7kycqtxVEbujdg1xtVosvFVm6n3Cw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Time Stamp " = _t, #"Air Flow (cfm)" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Time Stamp ", type datetime}, {"Air Flow (cfm)", Int64.Type}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
    #"Added Previous Row Air Flow Value" = Table.AddColumn(#"Added Index", "Previous Air Flow", each try #"Added Index" [#"Air Flow (cfm)"] {[Index] - 1} otherwise null),
    #"Added Group" = Table.AddColumn(#"Added Previous Row Air Flow Value", "Group", each if [#"Air Flow (cfm)"] <> [Previous Air Flow] then [Index] else null),
    #"Filled Down" = Table.FillDown(#"Added Group",{"Group"}),
    #"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Index", "Previous Air Flow"}),
    #"Grouped Rows" = Table.Group(#"Removed Columns", {"Group"}, {{"Count", each Table.RowCount(_), Int64.Type},{"Rank", each Table.AddIndexColumn(_, "Rank", 1)} }),
    #"Expanded Rank" = Table.ExpandTableColumn(#"Grouped Rows", "Rank", {"Time Stamp ", "Air Flow (cfm)", "Rank"}, {"Time Stamp ", "Air Flow (cfm)", "Rank"}),
    #"Added Custom2" = Table.AddColumn(#"Expanded Rank", "New Air Flow (cfm)", each if [Rank] <> 1 and [Count] > 3 then null else [#"Air Flow (cfm)"]),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom2",{"Time Stamp ", "New Air Flow (cfm)"})
in
    #"Removed Other Columns"

 

and the output of the above query based on your sample data:

 

parry2k_0-1640363427506.png

 

Follow us on LinkedIn

 

Learn about conditional formatting at Microsoft Reactor

My latest blog post The Power of Using Calculation Groups with Inactive Relationships (Part 1) (perytus.com) I would  Kudos if my solution helped. 👉 If you can spend time posting the question, you can also make efforts to give Kudos to whoever helped to solve your problem. It is a token of appreciation!

 

Visit us at https://perytus.com, your one-stop-shop for Power BI-related projects/training/consultancy.

 






Did I answer your question? Mark my post as a solution.

Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.





Helpful resources

Announcements
June 2022 update 768X460.jpg

Check it out!

Click here to read more about the June 2022 updates!

Power Platform Conf 2022 768x460.jpg

Join us for Microsoft Power Platform Conference

The first Microsoft-sponsored Power Platform Conference is coming in September. 100+ speakers, 150+ sessions, and what's new and next for Power Platform.

Power BI Dev Camp Session 23 768x460.jpg

Check it Out!

Mark your calendars and join us on Thursday, June 30 at 11a PDT for a great session with Ted Pattison!

June 20 episode 7 with aka link 768x460.jpg

The Power BI Community Show

Join us on June 20 at 11 am PDT when Kim Manis shares the latest on Azure Synapse analytics, the Microsoft Intelligent Data Platform, and notable Power BI Updates from Microsoft Build 2022.

Top Solution Authors
Top Kudoed Authors