Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and more.
Get startedGrow your Fabric skills and prepare for the DP-600 certification exam by completing the latest Microsoft Fabric challenge.
Hi everyone,
Sorry for my English, but I will try to do my best to make myself understood, because I am French.
I have a data group that starts every day in the morning and ends in the evening. Each start and end of the day I also have a variable which is 0.
I would like to delete the start and end of day lines which have variable 0, and only these lines. Because in my data during the day there is this variable which is at 0 which I do not want to delete
I repeat it only at the beginning and end of the day until the variable is no longer at 0.
So I know that my explanations and information are incomplete, but I hope that you will help me, because my level with Power Query is very low.
Tell me what you need so you can help me
Thanks in advance.
Thank you both,
I've learned a lot about the M language and I'm going to try to learn even more, because it looks very interesting.
Hi @treza88
Glad to hear you've learned a lot,if the solutions @ronrsnfld and @dufoq3 offered help you, please consider to mark them as a solution, and if you need more help, you can also provide some information or start a new post.
Best Regards!
Yolo Zhu
@dufoq3 : Thank you for pointing that out.
Exploring the steps, using the diagnostic tools in Power BI, it seems there is a lot of time taken up by sorting, as well as by the computed date column.
By assuming the data is sorted descending as it is in the data provided by @treza88 , and by doing the Table.Group on the "jour","mois" and "annee" columns instead of computing, and then removing a date column, execution speeds up considerably.
let
Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\raw data.csv"),[Delimiter=",", Columns=10, Encoding=1252, QuoteStyle=QuoteStyle.None]),
#"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{
{"ind", Int64.Type}, {"jour", Int64.Type}, {"mois", Int64.Type},
{"annee", Int64.Type}, {"heure", Int64.Type}, {"minute", Int64.Type},
{"seconde", Int64.Type}, {"vitesse", type number},
{"etat_chaine", type text}, {"date_heure_server", type datetime}}),
//Group by Date Columns, then generate lists of dates to be removed at
// end and start of dates
#"Grouped Rows" = Table.Group(#"Changed Type", {"jour","mois","annee"}, {
{"Day Start", (t)=> let rc = Table.RowCount(t)-1,
rws = List.Generate(
()=>[r=if t[vitesse]{rc} < 0.1 then t{rc} else null, idx=rc],
each [r] <> null,
each [r = if t[vitesse]{[idx]-1} < 0.1 then t{[idx]-1} else null, idx=[idx]-1],
each [r])
in rws, type {[]}},
{"Day End", (t)=>
List.Generate(
()=>[r=if t[vitesse]{0} < 0.1 then t{0} else null, idx=0],
each [r] <> null,
each [r = if t[vitesse]{[idx]+1} < 0.1 then t{[idx]+1} else null, idx=[idx]+1],
each [r]), type {[]}}}),
#"Rows to Remove" = List.Combine(#"Grouped Rows"[Day Start] & #"Grouped Rows"[Day End]),
#"Remove Rows" = Table.RemoveMatchingRows(#"Changed Type",#"Rows to Remove")
in
#"Remove Rows"
It is much faster now @ronrsnfld, but still 2-3 times slower. I'm not saying this because I want to be rude or sometinig... I like to test queries, because this is how I learn.
I've found another issue with your query. Query should delete rows where[vitese] is less than 0,1 only if it is at the beginning or end of the date. You can see my sample data here (or just check your code with my sample data below).
Below is your query with my sample data and you can see that in Result it is missing Source row no 11. (It is less than 0.1 but it is not at the beginning or end of the day)
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("rdK7EcMgDADQVXLUPltfhNNlgEzg8/5rRFwcJwXCFCkkPsU7JLFtiS2vlKZkHuxB4gmLp+yh9QpmQF8edWsL8EJAcsNyB0v79CYwJOQg4IqAkOBBoqwhQaNECQnsE/kkLC6k282vkJuCfgbCM9Xzs0doSMhB2BUhIcGjr+CQoD8V4u3UNqEn0f7f+jNTaRMySsSFOLG/AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [index = _t, jour = _t, mois = _t, annee = _t, heure = _t, minute = _t, seconde = _t, vitesse = _t, etat_chaine = _t, date_heure_server = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{
{"index", Int64.Type}, {"jour", Int64.Type}, {"mois", Int64.Type},
{"annee", Int64.Type}, {"heure", Int64.Type}, {"minute", Int64.Type},
{"seconde", Int64.Type}, {"vitesse", type number},
{"etat_chaine", type text}, {"date_heure_server", type datetime}}, "en-US"),
//Group by Date Columns, then generate lists of dates to be removed at
// end and start of dates
#"Grouped Rows" = Table.Group(#"Changed Type", {"jour","mois","annee"}, {
{"Day Start", (t)=> let rc = Table.RowCount(t)-1,
rws = List.Generate(
()=>[r=if t[vitesse]{rc} < 0.1 then t{rc} else null, idx=rc],
each [r] <> null,
each [r = if t[vitesse]{[idx]-1} < 0.1 then t{[idx]-1} else null, idx=[idx]-1],
each [r])
in rws, type {[]}},
{"Day End", (t)=>
List.Generate(
()=>[r=if t[vitesse]{0} < 0.1 then t{0} else null, idx=0],
each [r] <> null,
each [r = if t[vitesse]{[idx]+1} < 0.1 then t{[idx]+1} else null, idx=[idx]+1],
each [r]), type {[]}}}),
#"Rows to Remove" = List.Combine(#"Grouped Rows"[Day Start] & #"Grouped Rows"[Day End]),
#"Remove Rows" = Table.RemoveMatchingRows(#"Changed Type",#"Rows to Remove")
in
#"Remove Rows"
@dufoq3 That is only because your sample data is not sorted as is the OP's data. Once you sort it, that row is retained. Also, although not contributing to this problem, you have duplicate "Index" rows in your sample which I do not believe are present in the original data
You're wright, if the data is sorted by descending order, your query works. Regarding index. I know that it is not necessary to have additional index, but I use it because sometimes it is not possible to rely with user index. Without adding this additional index and sorting at the end my query will be even faster, but for this purpose it is better to use it with additional index (just for sure).
@dufoq3 I was not referring to the added index columns, rather to the Index column in your sample data (the first column) which is mimicing the "ind" column from the OP. That column has duplicate values which do not exist in the original.
Hi @treza88, I've changed sample data a bit to determine more situations.
Let me know if this is what you want.
Selected lines will be deleted:
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("rdK7EcMgDADQVXLUPltfhNNlgEzg8/5rRFwcJwXCFCkkPsU7JLFtiS2vlKZkHuxB4gmLp+yh9QpmQF8edWsL8EJAcsNyB0v79CYwJOQg4IqAkOBBoqwhQaNECQnsE/kkLC6k282vkJuCfgbCM9Xzs0doSMhB2BUhIcGjr+CQoD8V4u3UNqEn0f7f+jNTaRMySsSFOLG/AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [index = _t, jour = _t, mois = _t, année = _t, heure = _t, minute = _t, seconde = _t, vitesse = _t, etat_machine = _t, date_heure_server = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source, {{"date_heure_server", type datetime}}, "fr-FR"),
#"Changed Type1" = Table.TransformColumnTypes(#"Changed Type",{{"vitesse", type number}}, "en-US"),
#"Inserted Date" = Table.AddColumn(#"Changed Type1", "Date", each DateTime.Date([date_heure_server]), type date),
#"Added Index" = Table.AddIndexColumn(#"Inserted Date", "Index_1", 1, 1, Int64.Type),
#"Grouped Rows" = Table.Group(#"Added Index", {"Date"}, {{"Min", each List.Min([date_heure_server]), type nullable datetime}, {"Max", each List.Max([date_heure_server]), type nullable datetime}, {"all", each Table.AddIndexColumn(Table.Sort(_, {{"date_heure_server", Order.Ascending}, {"vitesse", Order.Ascending}}), "IndexHelper", 0 , 1, Int64.Type)}, {"Result", each
[ a = Table.AddIndexColumn(Table.Sort(_, {{"date_heure_server", Order.Ascending}, {"vitesse", Order.Ascending}}), "IndexMin", 0 , 1, Int64.Type),
b = Table.AddIndexColumn(Table.Sort(a, {{"date_heure_server", Order.Ascending}, {"vitesse", Order.Descending}}), "IndexMax", 0 , 1, Int64.Type),
min = Table.First(Table.SelectRows(a, (x)=> x[vitesse] >= 0.1))[IndexMin],
max = Table.Last(Table.SelectRows(b, (x)=> x[vitesse] >= 0.1))[IndexMax],
result = Table.SelectRows(b, (x)=> x[IndexMin] >= min and x[IndexMax] <= max)
][result] , type table}}),
#"Combine Result" = Table.Combine(#"Grouped Rows"[Result]),
#"Sorted Rows" = Table.Sort(#"Combine Result",{{"Index_1", Order.Ascending}}),
#"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Index_1", "IndexMin", "IndexMax"})
in
#"Removed Columns"
Thank you Dufoq3 for your reply
I've just figured out how to pass a file, and I'm still trying to figure out how to integrate your request.
In the raw data file there's all the data and with the deleted data file lines 38137 to 38443 have been deleted, and that's what I want to do, but that's where I'm lost.
The deleted data are only at the end of March 7 and the beginning of March 8, the rest of the data remain unchanged.
Raw data file:
https://1drv.ms/u/s!Aum4XePOkLNetUfMM2bqgkp3wXs8?e=csAQcd
Deleted data file:
Here is a somewhat different method that, in your sample, deletes the rows with ind 38137-38443
let
//===================================
//Read in and process data
//change these lines to reflect your own data source
//to the point where you show a table with headers and properly typed
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
#"Promoted Headers" = Table.PromoteHeaders(#"Split Column by Delimiter", [PromoteAllScalars=true]),
#"Changed Type2" = Table.TransformColumnTypes(#"Promoted Headers",{{"ind", Int64.Type}, {"jour", Int64.Type}, {"mois", Int64.Type}, {"annee", Int64.Type}, {"heure", Int64.Type}, {"minute", Int64.Type}, {"seconde", Int64.Type}, {"vitesse", type number}, {"etat_chaine", type text}, {"date_heure_server", type datetime}}),
//==============================
//Paste code below after the above part of your own query
//ensure ind sorted ascending
#"Sorted Rows" = Table.Sort(#"Changed Type2",{{"ind", Order.Ascending}}),
//Add date based on "jour, mois, annee" columns
#"Add ind Date" = Table.AddColumn(#"Sorted Rows", "Date Ind", each #date([annee]+2000,[mois],[jour]), type date),
//Group by ind Date, then generate lists of dates to be removed at
// end and start of dates
#"Grouped Rows" = Table.Group(#"Add ind Date", {"Date Ind"}, {
{"Day Start", (t)=>
List.Generate(
()=>[r=if t[vitesse]{0} < 0.1 then t{0} else null, idx=0],
each [r] <> null,
each [r = if t[vitesse]{[idx]+1} < 0.1 then t{[idx]+1} else null, idx=[idx]+1],
each [r]), type {[]}},
{"Day End", (t)=>
List.Generate(
()=>[r=if t[vitesse]{Table.RowCount(t)-1} < 0.1 then t{Table.RowCount(t)-1} else null, idx=Table.RowCount(t)-1],
each [r] <> null,
each [r = if t[vitesse]{[idx]-1} < 0.1 then t{[idx]-1} else null, idx=[idx]-1],
each [r]), type {[]}}}),
#"Rows to Remove" = List.Combine(#"Grouped Rows"[Day Start] & #"Grouped Rows"[Day End]),
#"Remove Rows" = Table.RemoveMatchingRows(#"Add ind Date",#"Rows to Remove"),
#"Remove Date ind column" = Table.RemoveColumns(#"Remove Rows",{"Date Ind"}),
//Return to original order
#"Sorted Rows1" = Table.Sort(#"Remove Date ind column",{{"ind", Order.Descending}})
in
#"Sorted Rows1"
Here are some other links.
Raw data file:
Deleted data file:
I have recreated the deleted data file, I think it is better
Deleted data file:
It works. My query above filters out same lines (38137 - 38443) 🙂
Just remember that it filters out [vitesse] less than 0.10. If you want to change it, you can do it here:
Can you explain to me how to use your query because I can't get it to work, even following the link that explains how to do it.
I created an empty query, then in source, I pasted your query, but it doesn't work.
Thanks for everything, I managed to put my csv file as a source file.
Is your query in DAX language?
Could you please explain how you managed to delete rows only at the beginning and end of each day?
Because I'm trying to decrypt your query, but I can't see the piece of code that does the job.
@treza88, if you don't know how to replace that Source step. Try this.
Just replace this address with your csv file address.
let
Source = Csv.Document(File.Contents("Address\raw data.csv"),[Delimiter="|", Columns=1, Encoding=1250, QuoteStyle=QuoteStyle.None]),
#"Split Column by Delimiter" = Table.SplitColumn(Source, "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
#"Promoted Headers" = Table.PromoteHeaders(#"Split Column by Delimiter", [PromoteAllScalars=true]),
#"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers", {{"date_heure_server", type datetime}}, "fr-FR"),
#"Changed Type1" = Table.TransformColumnTypes(#"Changed Type",{{"vitesse", type number}}, "en-US"),
#"Inserted Date" = Table.AddColumn(#"Changed Type1", "Date", each DateTime.Date([date_heure_server]), type date),
#"Added Index" = Table.AddIndexColumn(#"Inserted Date", "Index_1", 1, 1, Int64.Type),
#"Grouped Rows" = Table.Group(#"Added Index", {"Date"}, {{"Min", each List.Min([date_heure_server]), type nullable datetime}, {"Max", each List.Max([date_heure_server]), type nullable datetime}, {"all", each Table.AddIndexColumn(Table.Sort(_, {{"date_heure_server", Order.Ascending}, {"vitesse", Order.Ascending}}), "IndexHelper", 0 , 1, Int64.Type)}, {"Result", each
[ a = Table.AddIndexColumn(Table.Sort(_, {{"date_heure_server", Order.Ascending}, {"vitesse", Order.Ascending}}), "IndexMin", 0 , 1, Int64.Type),
b = Table.AddIndexColumn(Table.Sort(a, {{"date_heure_server", Order.Ascending}, {"vitesse", Order.Descending}}), "IndexMax", 0 , 1, Int64.Type),
min = Table.First(Table.SelectRows(a, (x)=> x[vitesse] >= 0.1))[IndexMin],
max = Table.Last(Table.SelectRows(b, (x)=> x[vitesse] >= 0.1))[IndexMax],
result = Table.SelectRows(b, (x)=> x[IndexMin] >= min and x[IndexMax] <= max)
][result] , type table}}),
#"Combine Result" = Table.Combine(#"Grouped Rows"[Result]),
#"Sorted Rows" = Table.Sort(#"Combine Result",{{"Index_1", Order.Ascending}}),
#"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Index_1", "IndexMin", "IndexMax", "Date"})
in
#"Removed Columns"