cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
Termite
Frequent Visitor

SOLVED: Count rows based on column values greater than record value (different column)

Hi All,

 

I'm able to do quite a bit (altough basic) in PowerBI however this riddle got me stunt ... I was able to produce the desired outcome in excel but it's inconvenient as the size of the dataset cripples my machine and so i need to make this work in PBI and so i'm seeking help.

 

This is what the data looks like and what i wand to do.

Order IDOrder dateShipping dateOrder typeongoing ordersongoing shirt orderongoind pants order
11/1/2021 12:04:00 AM1/3/2021 10:00:00 AMshirt000
21/3/2021 8:57:00 AM1/6/2021 2:00 PMshirt110
32/4/2021 7:23:00 AM2/6/2021 11:30:00 PMshirt000
42/5/2021 12:02:00 PM2/8/2021 7:00:00 AMpants110
52/6/2021 11:03:00 PM2/9/2021 7:00:00 AMpants211

 

The idea behind column "ongoing order" is to count the number of rows where the "shipping date" is after the current record's "order date", the other 2 column would split that same information based on "order type".

 

I tried a number of things but most of the time I end up with a circular reference.

 

 

Is there a way to even do this in Power BI ?

1 ACCEPTED SOLUTION

let
    Origine = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fY/BCsIwDIZfZfQ8aPKndTM3H0DYvfTgTS8ibu+Pdd26zoGHkPCRfElCMGxaw5YtCNwwlJwSNZfrjGXBlFjB4/3xnlKmErENBvVAr76rNKdM8UVDreAlskJSBetyc6eQosCqYFahg2Z/iZsH/PbRthe2X+3VQ6/bcxoP1/ifvSSV5vxHg6KK8QM=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Order ID" = _t, #"Order date" = _t, #"Shipping date" = _t, #"Order type" = _t, #"ongoing orders" = _t, #"ongoing shirt order" = _t, #"ongoind pants order" = _t]),
    #"Modificato tipo" = Table.TransformColumnTypes(Origine,{{"Order ID", Int64.Type}, {"Order date", type datetime}, {"Shipping date", type datetime}, {"Order type", type text}, {"ongoing orders", Int64.Type}, {"ongoing shirt order", Int64.Type}, {"ongoind pants order", Int64.Type}},"en-US"),
    #"Aggiunta colonna personalizzata" = Table.AddColumn(#"Modificato tipo", "oo", each Table.RowCount(Table.SelectRows(#"Modificato tipo", (r)=>r[Shipping date]>[Order date] and r[Order date]<[Order date] ))),
    #"Aggiunta colonna personalizzata1" = Table.AddColumn(#"Aggiunta colonna personalizzata", "oos", each Table.RowCount(Table.SelectRows(#"Modificato tipo", (r)=>r[Shipping date]>[Order date] and r[Order date]<[Order date] and r[Order type]="shirt" ))),
    #"Aggiunta colonna personalizzata2" = Table.AddColumn(#"Aggiunta colonna personalizzata1", "oop", each Table.RowCount(Table.SelectRows(#"Modificato tipo", (r)=>r[Shipping date]>[Order date] and r[Order date]<[Order date] and r[Order type]="pants" )))
in
    #"Aggiunta colonna personalizzata2"

View solution in original post

15 REPLIES 15
Termite
Frequent Visitor

Thanks a million for the speedy response, clarifications:

 

1) the date format is M/D/Y

 

2) on a specific row, the "order date" needs to be verified against all other record's "shipping date" and i need a count of record where "shipping date" is > "order date". I don't know if i'm being clear 🙂

 

3) The count needs to be on the whole table.

 

My table has about 40k record per day and i will need to keep the data to report trend for 13months. currently i have 43 days of record and the excel file is starting to slow down (8mn to open, 3mn to calulate) i only show a few columns here, but there are 23 columns in the set.

 

Thanks again for the help !

2) on a specific row, the "order date" needs to be verified against all other record's "shipping date" and i need a count of record where "shipping date" is > "order date". I don't know if i'm being clear

 

I don't understand how, with this rule, the values of the [Ongoing Order] column are obtained.

 

Could you give a more specific example of this calculation, in the case, for example, of the 2 of line 5?

 

 

 

Hello @Rocco_sprmnt21 ,

 

For line 2:  The "order date" is : 1/3/2021 8:57:00 AM, i need to check all other orders where the shipping date / time is greater than this , the only one that fits is order 1, "shippping date" is 1/3/2021 10:00:00 AM therefore the count is 1

 

For line 5: The "order date" is : 2/6/2021 11:03:00 PM, if you check all other order where the shipping date is greater, the order that fits are: 3 & 4 as their respective  "shipping date" are 2/6/2021 11:30:00 PM and 2/8/2021 7:00:00 AM which are both greater than the record order date so the count is 2. Order 3 is shirt and order 4 is pants so the numbers get into the right columns as well 🙂

 

Thanks !

 

mmhh line 5 goes well, line 2 less well.

Try this, to check (if and where)the logic(fail):

 

after that we can try to  find a more performatn algorithm

 

let
    Origine = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fY/BCsIwDIZfZfQ8aPKndTM3H0DYvfTgTS8ibu+Pdd26zoGHkPCRfElCMGxaw5YtCNwwlJwSNZfrjGXBlFjB4/3xnlKmErENBvVAr76rNKdM8UVDreAlskJSBetyc6eQosCqYFahg2Z/iZsH/PbRthe2X+3VQ6/bcxoP1/ifvSSV5vxHg6KK8QM=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Order ID" = _t, #"Order date" = _t, #"Shipping date" = _t, #"Order type" = _t, #"ongoing orders" = _t, #"ongoing shirt order" = _t, #"ongoind pants order" = _t]),
    #"Modificato tipo" = Table.TransformColumnTypes(Origine,{{"Order ID", Int64.Type}, {"Order date", type datetime}, {"Shipping date", type datetime}, {"Order type", type text}, {"ongoing orders", Int64.Type}, {"ongoing shirt order", Int64.Type}, {"ongoind pants order", Int64.Type}}),
    #"Aggiunta colonna personalizzata" = Table.AddColumn(#"Modificato tipo", "oo", each Table.RowCount(Table.SelectRows(#"Modificato tipo", (r)=>r[Shipping date]>[Order date]))-1)
in
    #"Aggiunta colonna personalizzata"
Termite
Frequent Visitor

Wait, i believe I understand what does not make sense !

 

The count, will count all of the ongoing orders which actually means, the counted record will have an "order date" < current record "order date" and a "shipping date" > current record "Order date"...

 

A "small" thing i completely forgot to mention 🙂

 

I have been at this for too long ... 2 weeks trying to figure out this makes me feel a bit dumb and my brain is starting to become mush ^^

arrrrg...

 

let
    Origine = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fY/BCsIwDIZfZfQ8aPKndTM3H0DYvfTgTS8ibu+Pdd26zoGHkPCRfElCMGxaw5YtCNwwlJwSNZfrjGXBlFjB4/3xnlKmErENBvVAr76rNKdM8UVDreAlskJSBetyc6eQosCqYFahg2Z/iZsH/PbRthe2X+3VQ6/bcxoP1/ifvSSV5vxHg6KK8QM=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Order ID" = _t, #"Order date" = _t, #"Shipping date" = _t, #"Order type" = _t, #"ongoing orders" = _t, #"ongoing shirt order" = _t, #"ongoind pants order" = _t]),
    #"Modificato tipo" = Table.TransformColumnTypes(Origine,{{"Order ID", Int64.Type}, {"Order date", type datetime}, {"Shipping date", type datetime}, {"Order type", type text}, {"ongoing orders", Int64.Type}, {"ongoing shirt order", Int64.Type}, {"ongoind pants order", Int64.Type}},"en-US"),
    #"Aggiunta colonna personalizzata" = Table.AddColumn(#"Modificato tipo", "oo", each Table.RowCount(Table.SelectRows(#"Modificato tipo", (r)=>r[Shipping date]>[Order date] and r[Order date]<[Order date] )))
in
    #"Aggiunta colonna personalizzata"

let
    Origine = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("fY/BCsIwDIZfZfQ8aPKndTM3H0DYvfTgTS8ibu+Pdd26zoGHkPCRfElCMGxaw5YtCNwwlJwSNZfrjGXBlFjB4/3xnlKmErENBvVAr76rNKdM8UVDreAlskJSBetyc6eQosCqYFahg2Z/iZsH/PbRthe2X+3VQ6/bcxoP1/ifvSSV5vxHg6KK8QM=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [#"Order ID" = _t, #"Order date" = _t, #"Shipping date" = _t, #"Order type" = _t, #"ongoing orders" = _t, #"ongoing shirt order" = _t, #"ongoind pants order" = _t]),
    #"Modificato tipo" = Table.TransformColumnTypes(Origine,{{"Order ID", Int64.Type}, {"Order date", type datetime}, {"Shipping date", type datetime}, {"Order type", type text}, {"ongoing orders", Int64.Type}, {"ongoing shirt order", Int64.Type}, {"ongoind pants order", Int64.Type}},"en-US"),
    #"Aggiunta colonna personalizzata" = Table.AddColumn(#"Modificato tipo", "oo", each Table.RowCount(Table.SelectRows(#"Modificato tipo", (r)=>r[Shipping date]>[Order date] and r[Order date]<[Order date] ))),
    #"Aggiunta colonna personalizzata1" = Table.AddColumn(#"Aggiunta colonna personalizzata", "oos", each Table.RowCount(Table.SelectRows(#"Modificato tipo", (r)=>r[Shipping date]>[Order date] and r[Order date]<[Order date] and r[Order type]="shirt" ))),
    #"Aggiunta colonna personalizzata2" = Table.AddColumn(#"Aggiunta colonna personalizzata1", "oop", each Table.RowCount(Table.SelectRows(#"Modificato tipo", (r)=>r[Shipping date]>[Order date] and r[Order date]<[Order date] and r[Order type]="pants" )))
in
    #"Aggiunta colonna personalizzata2"

View solution in original post

Ok, so this is odd, what you are giving me above works like a charm, i changed the data around a little and it works perfectly !!

 

Only issue is when I put that into my larger dataset ... it never resolves. it spins around in circle "applying changes" but it never succeeds, I waited for almost 2h and then i gave up 🙂

 

I'm going to let it run overnight to see if it works and then revert back.

 

Thanks again for all the help !!!

No. I think that for large dataset we need to change in some way strategy.

We need to limit at some little subgroup of rows the range where table.selectrows acts.

If, for example, the dataset has 100K lines the naive algorithm foresees the checking of curr_ord_date with all the other ord_dates and with all the other ship_dates. So, about, 2X100kX100k controls. If, on the other hand, the dataset is ordered by increasing order date, the checks to be made for each record can be limited to the previous records only. Therefore 100k X (100k + 1) / 2. ie about 1/4 of the previous number.

 

 

 

Ok i think i understand.. Is there a way to filter the table on the records that are "prior" to the current record, so for instance, we could keep this bit: 

 

(r)=>r[Shipping date]>[Order date]

 

 

 

 but then have the table we run this on (#"Modificato tipo") modified to only include record in the table where [Order date] < current record [Order date].

 

Is this possible perhaps?

 

this line of code

 

 

    #"Aggiunta colonna personalizzata" = Table.AddColumn(#"Modificato tipo", "oo", each Table.RowCount(Table.SelectRows(#"Modificato tipo", (r)=>r[Shipping date]>[Order date] and r[Order date]<[Order date] ))),

 

 

this line of code runs for each row of the table (let's say 100k times) and for each turn it checks the [order date] value with all the others [order date] and [order shipping] (so 200k checks). If the table is sorted by increasing values of [order date] we can we can do about a quarter of the checks.  If for each record instead of checking [the shipping date] in all the records of the table we check it only on 1/100 of the rows (the 1000 rows preceding the current record, for example) we have reduced the checks by 200 times. And the execution time will be reduced by a lot (not 200 times, but a lot anyway)

 

operationally this can be done by adding an index column and using, for example, the Table.Range function to restrict the field in which to search. But this way depends on the actual situation of your data.

Ok, now I understand ... there are, generaly speaking, less than 500 transaction per day and the difference between Order Time and Shipping time is always less than 3h as those are micro-transaction delivered online.

 

How do I limit the control for that ?

 

I am also quite comfortable to force the data set to be sorted by date/time prior to adding this new column so that it always "look upwards" if that makes any sense.

 

You have been awesome, and i believe i have pretty much everything i need to suceed. I'm gonna give it a go and let you know soon !

 

Thanks a million !

 

"

Ok, now I understand ... there are, generaly speaking, less than 500 transaction per day and the difference between Order Time and Shipping time is always less than 3h as those are micro-transaction delivered online.

How do I limit the control for that ?"

 

 

500 transactions per day and maximum 3h of time between order and shipping. Assuming that 500 orders are processed uniformly in 10h. This implies that 150 transactions are handled in 3 hours. So checking backwards for 200 transactions should be enough to verify all shipping timestamps. In the first place, I suggest you try only sorting by order date / time first and experiment on a 10K dataset measure the time and then gradually increase 8 by 10k in 10k, for example) the size of the dataset and see how time grows

 

 

you have to give some more info about struct of your dataset.

 

For example, it might be useful to use the information of the maximum difference between shipping time and order time. If this duration is a maximum of 5 days, there is no need to check records with order_date older than 5 days for a given record.

Assuming that we are certain that in 5 days there are at most 1k orders, we can limit the control to the previous 1000 order_dates. So we can do 100k X 1K checks instead of 2 X 100k X 100k that is 1/200-th

 

 

 

Rocco_sprmnt21
Super User II
Super User II

some clarifications:

1) is the date format d / m / y or m / d / y?

2) what do I mean by "the" shipping date "is after the current record's" order date ","? for example 02/06/2021 07:00:00 is after 02/06/2021 23:03:00?

3) should the count be done on the whole table or limited to some subgroup of records? 

 

PS

how many rowsXcolumns has your table?

 

Helpful resources

Announcements
PBI User Groups

Welcome to the User Group Public Preview

Check out new user group experience and if you are a leader please create your group!

MBAS Attendee Badge

Claim Your Badge & Digital Swag!

Check out how to claim yours today!

secondImage

Are You Ready?

Test your skills now with the Cloud Skills Challenge.

Top Solution Authors