
treza88
Frequent Visitor

Delete data at the start and end of the day

Hi everyone,

 

Sorry for my English; I will do my best to make myself understood, because I am French.

 

I have a group of data that starts every morning and ends every evening. At the start and end of each day there is also a variable that is 0.

I would like to delete only the start-of-day and end-of-day rows where this variable is 0. During the day there are also rows where this variable is 0, and I do not want to delete those.

To repeat: only the rows at the beginning and end of the day should be removed, up to the point where the variable is no longer 0.

 

I know my explanation and the information are incomplete, but I hope you can help me, because my level with Power Query is very low.

Tell me what you need in order to help me.

 

Thanks in advance.

treza88
Frequent Visitor

Thank you both,

I've learned a lot about the M language and I'm going to try to learn even more, because it looks very interesting.

Hi @treza88 

Glad to hear you've learned a lot. If the solutions @ronrsnfld and @dufoq3 offered helped you, please consider marking them as solutions, and if you need more help, you can provide more information or start a new post.

 

Best Regards!

Yolo Zhu

ronrsnfld
Super User

@dufoq3 : Thank you for pointing that out.

Stepping through with the diagnostic tools in Power BI, it appears that a lot of time is taken up by the sorting, as well as by the computed date column.

 

By assuming the data is sorted descending, as it is in the data provided by @treza88, and by doing the Table.Group on the "jour", "mois" and "annee" columns instead of computing and then removing a date column, execution speeds up considerably.

 

 

let
    Source = Csv.Document(File.Contents("C:\Users\ron\Desktop\raw data.csv"),[Delimiter=",", Columns=10, Encoding=1252, QuoteStyle=QuoteStyle.None]),
    #"Promoted Headers" = Table.PromoteHeaders(Source, [PromoteAllScalars=true]),

    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers",{
        {"ind", Int64.Type}, {"jour", Int64.Type}, {"mois", Int64.Type}, 
        {"annee", Int64.Type}, {"heure", Int64.Type}, {"minute", Int64.Type}, 
        {"seconde", Int64.Type}, {"vitesse", type number}, 
        {"etat_chaine", type text}, {"date_heure_server", type datetime}}),

//Group by date columns, then generate lists of rows to be removed at
//   the end and start of each day
    #"Grouped Rows" = Table.Group(#"Changed Type", {"jour","mois","annee"}, {
        {"Day Start", (t)=> let rc = Table.RowCount(t)-1,
            rws = List.Generate(
                ()=>[r=if t[vitesse]{rc} < 0.1 then t{rc} else null, idx=rc],
                each [r] <> null,
                each [r = if t[vitesse]{[idx]-1} < 0.1 then t{[idx]-1} else null, idx=[idx]-1],
                each [r])
                    in rws, type {[]}},
        {"Day End", (t)=>
            List.Generate(
                ()=>[r=if t[vitesse]{0} < 0.1 then t{0} else null, idx=0],
                each [r] <> null,
                each [r = if t[vitesse]{[idx]+1} < 0.1 then t{[idx]+1} else null, idx=[idx]+1],
                each [r]), type {[]}}}),
    #"Rows to Remove" = List.Combine(#"Grouped Rows"[Day Start] & #"Grouped Rows"[Day End]),
    #"Remove Rows" = Table.RemoveMatchingRows(#"Changed Type",#"Rows to Remove")
in
    #"Remove Rows"

 

 

It is much faster now @ronrsnfld, but still 2-3 times slower than my query. I'm not saying this because I want to be rude or something... I like to test queries, because this is how I learn.

 

I've found another issue with your query. The query should delete rows where [vitesse] is less than 0.1 only if they are at the beginning or end of the day. You can see my sample data here (or just check your code against my sample data below).

@treza88 mentioned it here:

(screenshot: dufoq3_0-1710080472054.png)

 

Below is your query with my sample data; you can see that the result is missing source row no. 11 (its [vitesse] is less than 0.1, but it is not at the beginning or end of the day).

(screenshot: dufoq3_1-1710080665666.png)

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("rdK7EcMgDADQVXLUPltfhNNlgEzg8/5rRFwcJwXCFCkkPsU7JLFtiS2vlKZkHuxB4gmLp+yh9QpmQF8edWsL8EJAcsNyB0v79CYwJOQg4IqAkOBBoqwhQaNECQnsE/kkLC6k282vkJuCfgbCM9Xzs0doSMhB2BUhIcGjr+CQoD8V4u3UNqEn0f7f+jNTaRMySsSFOLG/AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [index = _t, jour = _t, mois = _t, annee = _t, heure = _t, minute = _t, seconde = _t, vitesse = _t, etat_chaine = _t, date_heure_server = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{
        {"index", Int64.Type}, {"jour", Int64.Type}, {"mois", Int64.Type}, 
        {"annee", Int64.Type}, {"heure", Int64.Type}, {"minute", Int64.Type}, 
        {"seconde", Int64.Type}, {"vitesse", type number}, 
        {"etat_chaine", type text}, {"date_heure_server", type datetime}}, "en-US"),

//Group by date columns, then generate lists of rows to be removed at
//   the end and start of each day
    #"Grouped Rows" = Table.Group(#"Changed Type", {"jour","mois","annee"}, {
        {"Day Start", (t)=> let rc = Table.RowCount(t)-1,
            rws = List.Generate(
                ()=>[r=if t[vitesse]{rc} < 0.1 then t{rc} else null, idx=rc],
                each [r] <> null,
                each [r = if t[vitesse]{[idx]-1} < 0.1 then t{[idx]-1} else null, idx=[idx]-1],
                each [r])
                    in rws, type {[]}},
        {"Day End", (t)=>
            List.Generate(
                ()=>[r=if t[vitesse]{0} < 0.1 then t{0} else null, idx=0],
                each [r] <> null,
                each [r = if t[vitesse]{[idx]+1} < 0.1 then t{[idx]+1} else null, idx=[idx]+1],
                each [r]), type {[]}}}),
    #"Rows to Remove" = List.Combine(#"Grouped Rows"[Day Start] & #"Grouped Rows"[Day End]),
    #"Remove Rows" = Table.RemoveMatchingRows(#"Changed Type",#"Rows to Remove")
in
    #"Remove Rows"

 


Note: Check this link to learn how to use my query.
Check this link if you don't know how to provide sample data.

@dufoq3 That is only because your sample data is not sorted the way the OP's data is. Once you sort it, that row is retained. Also, although not contributing to this problem, you have duplicate "Index" rows in your sample which I do not believe are present in the original data.

You're right: if the data is sorted in descending order, your query works. Regarding the index: I know an additional index is not strictly necessary, but I use one because sometimes it is not possible to rely on the user's own index. Without the additional index and the sort at the end, my query would be even faster, but for this purpose it is safer to keep the additional index (just to be sure).



@dufoq3 I was not referring to the added index columns, but rather to the Index column in your sample data (the first column), which mimics the "ind" column from the OP. That column has duplicate values which do not exist in the original.

dufoq3
Super User

Hi @treza88, I've changed the sample data a bit to cover more situations.

Let me know if this is what you want.

 

Selected lines will be deleted:

(screenshot: dufoq3_0-1709995865580.png)

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("rdK7EcMgDADQVXLUPltfhNNlgEzg8/5rRFwcJwXCFCkkPsU7JLFtiS2vlKZkHuxB4gmLp+yh9QpmQF8edWsL8EJAcsNyB0v79CYwJOQg4IqAkOBBoqwhQaNECQnsE/kkLC6k282vkJuCfgbCM9Xzs0doSMhB2BUhIcGjr+CQoD8V4u3UNqEn0f7f+jNTaRMySsSFOLG/AA==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [index = _t, jour = _t, mois = _t, année = _t, heure = _t, minute = _t, seconde = _t, vitesse = _t, etat_machine = _t, date_heure_server = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source, {{"date_heure_server", type datetime}}, "fr-FR"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Changed Type",{{"vitesse", type number}}, "en-US"),
    #"Inserted Date" = Table.AddColumn(#"Changed Type1", "Date", each DateTime.Date([date_heure_server]), type date),
    #"Added Index" = Table.AddIndexColumn(#"Inserted Date", "Index_1", 1, 1, Int64.Type),
    #"Grouped Rows" = Table.Group(#"Added Index", {"Date"}, {{"Min", each List.Min([date_heure_server]), type nullable datetime}, {"Max", each List.Max([date_heure_server]), type nullable datetime}, {"all", each Table.AddIndexColumn(Table.Sort(_, {{"date_heure_server", Order.Ascending}, {"vitesse", Order.Ascending}}), "IndexHelper", 0 , 1, Int64.Type)}, {"Result", each 
     [ a = Table.AddIndexColumn(Table.Sort(_, {{"date_heure_server", Order.Ascending}, {"vitesse", Order.Ascending}}), "IndexMin", 0 , 1, Int64.Type),
       b = Table.AddIndexColumn(Table.Sort(a, {{"date_heure_server", Order.Ascending}, {"vitesse", Order.Descending}}), "IndexMax", 0 , 1, Int64.Type),
       min = Table.First(Table.SelectRows(a, (x)=> x[vitesse] >= 0.1))[IndexMin],
       max = Table.Last(Table.SelectRows(b, (x)=> x[vitesse] >= 0.1))[IndexMax],
       result = Table.SelectRows(b, (x)=> x[IndexMin] >= min and x[IndexMax] <= max)
     ][result] , type table}}),
    #"Combine Result" = Table.Combine(#"Grouped Rows"[Result]),
    #"Sorted Rows" = Table.Sort(#"Combine Result",{{"Index_1", Order.Ascending}}),
    #"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Index_1", "IndexMin", "IndexMax"})
in
    #"Removed Columns"

 



Thank you, dufoq3, for your reply.

 

I've just figured out how to share a file, and I'm still trying to figure out how to integrate your query.

 

The raw data file contains all the data; in the deleted data file, lines 38137 to 38443 have been deleted. That is what I want to do, but that is where I'm lost.

 

The deleted data is only at the end of March 7 and the beginning of March 8; the rest of the data remains unchanged.

 

Raw data file:

https://1drv.ms/u/s!Aum4XePOkLNetUfMM2bqgkp3wXs8?e=csAQcd

 

Deleted data file:

https://1drv.ms/u/s!Aum4XePOkLNetUjSp5W4H7dOWXJs?e=yf7Ocu

Here is a somewhat different method that, in your sample, deletes the rows with ind 38137-38443:

  • Group a sorted list by date
  • For each date, starting at the beginning and working down (or at the end and working up), generate a list of records (rows) which meet the criterion of a very small vitesse (I used 0.1, but you can adjust that), until a record no longer meets that criterion
  • Then use that list of records in the Table.RemoveMatchingRows function.
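The List.Generate pattern used in the query below can be seen in isolation on a plain list. This is a minimal sketch: MyList and the 0.1 threshold are illustrative, and a List.Count guard is added so the generator also stops cleanly when every item is below the threshold.

```
// Collect the leading run of items below 0.1, stopping at the first
// item that is not (or at the end of the list).
LeadingSmall = (MyList as list) as list =>
    List.Generate(
        ()=> [v = if MyList{0} < 0.1 then MyList{0} else null, idx = 0],
        each [v] <> null,
        each [v = if [idx] + 1 < List.Count(MyList) and MyList{[idx] + 1} < 0.1
                  then MyList{[idx] + 1} else null,
              idx = [idx] + 1],
        each [v])
// LeadingSmall({0, 0.05, 2, 0, 3}) returns {0, 0.05}
```

The same pattern run from the last index downward collects the trailing run, which is how the "Day Start"/"Day End" columns below are built.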

 

let

//===================================
//Read in and process data
//change these lines to reflect your own data source
//to the point where you show a table with headers and properly typed
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Changed Type", "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Split Column by Delimiter", [PromoteAllScalars=true]),
    #"Changed Type2" = Table.TransformColumnTypes(#"Promoted Headers",{{"ind", Int64.Type}, {"jour", Int64.Type}, {"mois", Int64.Type}, {"annee", Int64.Type}, {"heure", Int64.Type}, {"minute", Int64.Type}, {"seconde", Int64.Type}, {"vitesse", type number}, {"etat_chaine", type text}, {"date_heure_server", type datetime}}),
//==============================

//Paste code below after the above part of your own query

//ensure ind sorted ascending
    #"Sorted Rows" = Table.Sort(#"Changed Type2",{{"ind", Order.Ascending}}),

//Add date based on "jour, mois, annee" columns
    #"Add ind Date" = Table.AddColumn(#"Sorted Rows", "Date Ind", each #date([annee]+2000,[mois],[jour]), type date),

//Group by ind Date, then generate lists of rows to be removed at
//   the end and start of each day
    #"Grouped Rows" = Table.Group(#"Add ind Date", {"Date Ind"}, {
        {"Day Start", (t)=> 
            List.Generate(
                ()=>[r=if t[vitesse]{0} < 0.1 then t{0} else null, idx=0],
                each [r] <> null,
                each [r = if t[vitesse]{[idx]+1} < 0.1 then t{[idx]+1} else null, idx=[idx]+1],
                each [r]), type {[]}},
        {"Day End", (t)=>
            List.Generate(
                ()=>[r=if t[vitesse]{Table.RowCount(t)-1} < 0.1 then t{Table.RowCount(t)-1} else null, idx=Table.RowCount(t)-1],
                each [r] <> null,
                each [r = if t[vitesse]{[idx]-1} < 0.1 then t{[idx]-1} else null, idx=[idx]-1],
                each [r]), type {[]}}}),
    #"Rows to Remove" = List.Combine(#"Grouped Rows"[Day Start] & #"Grouped Rows"[Day End]),
    #"Remove Rows" = Table.RemoveMatchingRows(#"Add ind Date",#"Rows to Remove"),
    #"Remove Date ind column" = Table.RemoveColumns(#"Remove Rows",{"Date Ind"}),

//Return to original order
    #"Sorted Rows1" = Table.Sort(#"Remove Date ind column",{{"ind", Order.Descending}})
    
in
    #"Sorted Rows1"

 

It is, but if you use e.g. Table.Repeat (10 times) on the source table and compare speeds, there is a huge difference.
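As a sketch of that kind of stress test (assuming the typed source step is named #"Changed Type", as in the queries above), you can stack copies of the table before the grouping steps and compare refresh times:

```
// Hypothetical benchmarking step: repeat the source table 10 times
// so differences in query speed become easier to measure.
#"Repeated Source" = Table.Repeat(#"Changed Type", 10)
```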



Could you upload the deleted data file somewhere else? I'm trying to download it, but it is still processing:

Clicking on the green button doesn't solve the problem.

(screenshot: dufoq3_0-1709999896939.png)

Raw data link works.



I have recreated the deleted data file; I think it is better.

Deleted data file:

https://www.dropbox.com/scl/fi/hk2awdiemvgooxs9krigv/deleted-data2.csv?rlkey=axzqgik0ml7f84tvlxveipy...

It works. My query above filters out the same lines (38137-38443) 🙂

Just remember that it filters out [vitesse] less than 0.10. If you want to change that, you can do it here:

(screenshot: dufoq3_0-1710003016989.png)
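In M terms, the 0.10 cutoff sits in these two lines of the query (copied from the Result step above); change the value in both places to use a different threshold:

```
// rows are trimmed up to the first, and after the last, row of the day
// whose [vitesse] is at least 0.1 — adjust the 0.1 here:
min = Table.First(Table.SelectRows(a, (x)=> x[vitesse] >= 0.1))[IndexMin],
max = Table.Last(Table.SelectRows(b, (x)=> x[vitesse] >= 0.1))[IndexMax],
```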

 



Can you explain how to use your query? I can't get it to work, even following the link that explains how to do it.

I created a blank query, then pasted your query into the Source step, but it doesn't work.

  1. Create blank query
  2. Open Advanced Editor and replace whole code with one I created
  3. Close Advanced Editor
  4. Select Source step (on the right panel)
  5. Now you should see my code for Source step
  6. Select the whole code and replace it with your table reference, e.g. if you have a table called Table1, replace the whole code with = Table1, or = #"Table 1" if the table name contains special characters or spaces.
    (screenshot: dufoq3_0-1710003495800.png)
  7. Your table must be like this:
    (screenshot: dufoq3_2-1710003671471.png)
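In the Advanced Editor, step 6 amounts to this (a sketch assuming your table is named Table1; the generated Source line will look different in your own file):

```
// Before (generated sample data):
//     Source = Table.FromRows(Json.Document(Binary.Decompress(...))),
// After (a reference to your own table, here assumed to be Table1):
    Source = Table1,
```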

     



Thanks for everything, I managed to set my csv file as the source.

Is your query in DAX language?

Could you please explain how you managed to delete rows only at the beginning and end of each day?

Because I'm trying to decipher your query, but I can't see the piece of code that does the job.

It is not DAX but M (Power Query). You need the whole code except the Source step, which you have to replace with your table reference, as I mentioned before...
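For reference, the piece of code that does the job is the Result aggregation inside Table.Group in the query above. Annotated (a and b are the step names used there):

```
// a: the day's rows sorted by time, indexed 0..n as "IndexMin"
// b: the same rows re-indexed 0..n as "IndexMax" (ties on the same
//    timestamp broken by vitesse descending)
// min: index of the FIRST row of the day with vitesse >= 0.1
min = Table.First(Table.SelectRows(a, (x)=> x[vitesse] >= 0.1))[IndexMin],
// max: index of the LAST row of the day with vitesse >= 0.1
max = Table.Last(Table.SelectRows(b, (x)=> x[vitesse] >= 0.1))[IndexMax],
// keep only rows between those two positions: this drops the runs of
// vitesse < 0.1 at the start and end of each day, while keeping any
// vitesse < 0.1 rows in the middle of the day
result = Table.SelectRows(b, (x)=> x[IndexMin] >= min and x[IndexMax] <= max)
```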



@treza88, if you don't know how to replace that Source step, try this.

 

Just replace this address with your csv file address.

(screenshot: dufoq3_0-1710008932383.png)

 

let
    Source = Csv.Document(File.Contents("Address\raw data.csv"),[Delimiter="|", Columns=1, Encoding=1250, QuoteStyle=QuoteStyle.None]),
    #"Split Column by Delimiter" = Table.SplitColumn(Source, "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), {"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7", "Column1.8", "Column1.9", "Column1.10"}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Split Column by Delimiter", [PromoteAllScalars=true]),
    #"Changed Type" = Table.TransformColumnTypes(#"Promoted Headers", {{"date_heure_server", type datetime}}, "fr-FR"),
    #"Changed Type1" = Table.TransformColumnTypes(#"Changed Type",{{"vitesse", type number}}, "en-US"),
    #"Inserted Date" = Table.AddColumn(#"Changed Type1", "Date", each DateTime.Date([date_heure_server]), type date),
    #"Added Index" = Table.AddIndexColumn(#"Inserted Date", "Index_1", 1, 1, Int64.Type),
    #"Grouped Rows" = Table.Group(#"Added Index", {"Date"}, {{"Min", each List.Min([date_heure_server]), type nullable datetime}, {"Max", each List.Max([date_heure_server]), type nullable datetime}, {"all", each Table.AddIndexColumn(Table.Sort(_, {{"date_heure_server", Order.Ascending}, {"vitesse", Order.Ascending}}), "IndexHelper", 0 , 1, Int64.Type)}, {"Result", each 
     [ a = Table.AddIndexColumn(Table.Sort(_, {{"date_heure_server", Order.Ascending}, {"vitesse", Order.Ascending}}), "IndexMin", 0 , 1, Int64.Type),
       b = Table.AddIndexColumn(Table.Sort(a, {{"date_heure_server", Order.Ascending}, {"vitesse", Order.Descending}}), "IndexMax", 0 , 1, Int64.Type),
       min = Table.First(Table.SelectRows(a, (x)=> x[vitesse] >= 0.1))[IndexMin],
       max = Table.Last(Table.SelectRows(b, (x)=> x[vitesse] >= 0.1))[IndexMax],
       result = Table.SelectRows(b, (x)=> x[IndexMin] >= min and x[IndexMax] <= max)
     ][result] , type table}}),
    #"Combine Result" = Table.Combine(#"Grouped Rows"[Result]),
    #"Sorted Rows" = Table.Sort(#"Combine Result",{{"Index_1", Order.Ascending}}),
    #"Removed Columns" = Table.RemoveColumns(#"Sorted Rows",{"Index_1", "IndexMin", "IndexMax", "Date"})
in
    #"Removed Columns"

 


