Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hello Everyone,
I am currently trying to come up with a way to remove duplicate data from a column (while keeping the row) if the data is repeated immediately one another more than "n" number of times.
For example the table I have currently is like this
Time Stamp | Air Flow (cfm) |
12/1/2021 10:50 AM | 624 |
12/1/2021 11:00 AM | 620 |
12/1/2021 11:10 AM | 621 |
12/1/2021 11:20 AM | 623 |
12/1/2021 11:30 AM | 623 |
12/1/2021 11:40 AM | 623 |
12/1/2021 11:50 AM | 623 |
12/1/2021 12:00 PM | 623 |
12/1/2021 12:10 PM | 621 |
12/1/2021 12:20 PM | 620 |
12/1/2021 12:30 PM | 619 |
12/1/2021 12:40 PM | 623 |
12/1/2021 12:50 PM | 623 |
12/1/2021 1:00 PM | 626 |
As you can see, At 12/1/2021 11:20 AM the sensor being used to measure airflow begins to send wrong data to the data source, repeating the same value until 12/1/2021 12:00 PM.
So I want to create a Query to remove wrong data like this and return blank cells after the first value until a new value is returned.
I however want this to only happen if the 3 repetitions or more occur in a row and I do not want to remove other reoccurences of the repeated number (since 613 cfm is still possibly reading).
So i would have a table like this
Time Stamp | Air Flow (cfm) |
12/1/2021 10:50 AM | 624 |
12/1/2021 11:00 AM | 620 |
12/1/2021 11:10 AM | 621 |
12/1/2021 11:20 AM | 623 |
12/1/2021 11:30 AM | |
12/1/2021 11:40 AM | |
12/1/2021 11:50 AM | |
12/1/2021 12:00 PM | |
12/1/2021 12:10 PM | 621 |
12/1/2021 12:20 PM | 620 |
12/1/2021 12:30 PM | 619 |
12/1/2021 12:40 PM | 623 |
12/1/2021 12:50 PM | 623 |
12/1/2021 1:10 PM | 626 |
I'm trying to this in power query editor so it is perfomed automatically on all new incoming data.
Could anyone help me with this, thank you in advance for your help.
Solved! Go to Solution.
@fadile here is the PQ Script, start a new blank query -> click advanced editor and paste the following code
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fc87CsAgEIThq8jWAXfWB8QuBwjYi/e/RpAESdis9VfMP60RxMMLCxy4JHbHSRtlidS3D6LwRNaIidAoE4PGsMK4wmSjjNpqIh50P7kycqtxVEbujdg1xtVosvFVm6n3Cw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Time Stamp " = _t, #"Air Flow (cfm)" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Time Stamp ", type datetime}, {"Air Flow (cfm)", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
#"Added Previous Row Air Flow Value" = Table.AddColumn(#"Added Index", "Previous Air Flow", each try #"Added Index" [#"Air Flow (cfm)"] {[Index] - 1} otherwise null),
#"Added Group" = Table.AddColumn(#"Added Previous Row Air Flow Value", "Group", each if [#"Air Flow (cfm)"] <> [Previous Air Flow] then [Index] else null),
#"Filled Down" = Table.FillDown(#"Added Group",{"Group"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Index", "Previous Air Flow"}),
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Group"}, {{"Count", each Table.RowCount(_), Int64.Type},{"Rank", each Table.AddIndexColumn(_, "Rank", 1)} }),
#"Expanded Rank" = Table.ExpandTableColumn(#"Grouped Rows", "Rank", {"Time Stamp ", "Air Flow (cfm)", "Rank"}, {"Time Stamp ", "Air Flow (cfm)", "Rank"}),
#"Added Custom2" = Table.AddColumn(#"Expanded Rank", "New Air Flow (cfm)", each if [Rank] <> 1 and [Count] > 3 then null else [#"Air Flow (cfm)"]),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom2",{"Time Stamp ", "New Air Flow (cfm)"})
in
#"Removed Other Columns"
and the output of the above query based on your sample data:
✨ Follow us on LinkedIn
Learn about conditional formatting at Microsoft Reactor
My latest blog post The Power of Using Calculation Groups with Inactive Relationships (Part 1) (perytus.com) I would ❤ Kudos if my solution helped. 👉 If you can spend time posting the question, you can also make efforts to give Kudos to whoever helped to solve your problem. It is a token of appreciation!
⚡ Visit us at https://perytus.com, your one-stop-shop for Power BI-related projects/training/consultancy.⚡
Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!
Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo
If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤
Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.
@fadile here is the PQ Script, start a new blank query -> click advanced editor and paste the following code
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fc87CsAgEIThq8jWAXfWB8QuBwjYi/e/RpAESdis9VfMP60RxMMLCxy4JHbHSRtlidS3D6LwRNaIidAoE4PGsMK4wmSjjNpqIh50P7kycqtxVEbujdg1xtVosvFVm6n3Cw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Time Stamp " = _t, #"Air Flow (cfm)" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Time Stamp ", type datetime}, {"Air Flow (cfm)", Int64.Type}}),
#"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 0, 1, Int64.Type),
#"Added Previous Row Air Flow Value" = Table.AddColumn(#"Added Index", "Previous Air Flow", each try #"Added Index" [#"Air Flow (cfm)"] {[Index] - 1} otherwise null),
#"Added Group" = Table.AddColumn(#"Added Previous Row Air Flow Value", "Group", each if [#"Air Flow (cfm)"] <> [Previous Air Flow] then [Index] else null),
#"Filled Down" = Table.FillDown(#"Added Group",{"Group"}),
#"Removed Columns" = Table.RemoveColumns(#"Filled Down",{"Index", "Previous Air Flow"}),
#"Grouped Rows" = Table.Group(#"Removed Columns", {"Group"}, {{"Count", each Table.RowCount(_), Int64.Type},{"Rank", each Table.AddIndexColumn(_, "Rank", 1)} }),
#"Expanded Rank" = Table.ExpandTableColumn(#"Grouped Rows", "Rank", {"Time Stamp ", "Air Flow (cfm)", "Rank"}, {"Time Stamp ", "Air Flow (cfm)", "Rank"}),
#"Added Custom2" = Table.AddColumn(#"Expanded Rank", "New Air Flow (cfm)", each if [Rank] <> 1 and [Count] > 3 then null else [#"Air Flow (cfm)"]),
#"Removed Other Columns" = Table.SelectColumns(#"Added Custom2",{"Time Stamp ", "New Air Flow (cfm)"})
in
#"Removed Other Columns"
and the output of the above query based on your sample data:
✨ Follow us on LinkedIn
Learn about conditional formatting at Microsoft Reactor
My latest blog post The Power of Using Calculation Groups with Inactive Relationships (Part 1) (perytus.com) I would ❤ Kudos if my solution helped. 👉 If you can spend time posting the question, you can also make efforts to give Kudos to whoever helped to solve your problem. It is a token of appreciation!
⚡ Visit us at https://perytus.com, your one-stop-shop for Power BI-related projects/training/consultancy.⚡
Subscribe to the @PowerBIHowTo YT channel for an upcoming video on List and Record functions in Power Query!!
Learn Power BI and Fabric - subscribe to our YT channel - Click here: @PowerBIHowTo
If my solution proved useful, I'd be delighted to receive Kudos. When you put effort into asking a question, it's equally thoughtful to acknowledge and give Kudos to the individual who helped you solve the problem. It's a small gesture that shows appreciation and encouragement! ❤
Did I answer your question? Mark my post as a solution. Proud to be a Super User! Appreciate your Kudos 🙂
Feel free to email me with any of your BI needs.
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.
User | Count |
---|---|
105 | |
93 | |
75 | |
62 | |
50 |
User | Count |
---|---|
146 | |
109 | |
106 | |
88 | |
61 |