Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Anonymous
Not applicable

Need help with duplicated data

Hi everyone, i've just started using Power BI and seen parts of it's capabilities. However, i've gotten into a problem with structuring my data. 

 

I'm interested in making Power BI count the amount of specific veterinary treatments per week per county. My raw data is public records, so I can not continiusly be restructuring it. I've made a simple graph and made Power BI count the amount of treatments, but due to the data, treatments are counted several times 

 

In my dataset each treatment often have several inputs (one per week) as they usualy are ongoing over several weeks. Normally just removing duplicates would solve this issue. However, sites can treat multiple times a year, so this option would also give me the wrong count. 

 

So, is there a way to get power BI to ignore treatments with less than 5 weeks between them? I've attached a small sample from my sheet below. 

 

https://ibb.co/k5pcQmg

 

1 ACCEPTED SOLUTION

@Anonymous

Ok, I refined a bit what we had yesterday. No real changes. Following the logic explained earlier, we have three calculated columns to come up with an ID per treatment that you can then use as you please. See below. Here's your .pbix modified, with the three new columns. Does this solve your issue?

 

1. First column

 

Table1[TreatmentChange] = 
VAR MaxWeekGap = 2
VAR CurrentWeek = Table1[Uke]
VAR PreviousWeek =
    CALCULATE (
        MAX ( Table1[Uke] );
        Table1[Uke] < CurrentWeek;
        ALLEXCEPT ( Table1; Table1[Lokalitetsnummer])
    )
RETURN
    IF (
        ISBLANK ( PreviousWeek );
        1;
        IF (
            ( CurrentWeek - PreviousWeek )
                > MaxWeekGap;
            1;
            0
        )
    )

2. Second column. The number of treatment is determined as you explained. For a specific farm (Lokalitetsnummer), we start with treatment number 1 and go only to treatment number 2 when we encounter that more than two weeks have elapsed since the last time the site reported.    

 

Table1[TreatmentNumber] = 
VAR CurrentWeek = Table1[Uke]
RETURN
    CALCULATE (
        SUM ( Table1[TreatmentChange] );
        Table1[Uke] <= CurrentWeek;
        ALLEXCEPT ( Table1; Table1[Lokalitetsnummer] )
    )

3. Third column with the treatment ID. It is a concatenation of Lokalitetsnummer and the number of treatment. Thus, for example for site with Lokalitetsnummer 11116  we happen to have 11116_1, 11116_2 and 11116_3. Three different treatments.

 

Table1[TreatmentID] = Table1[Lokalitetsnummer] & "_" & Table1[TreatmentNumber] 

 

View solution in original post

19 REPLIES 19
AlB
Super User
Super User

Hi @Anonymous

Where in your data can we see how long a treatment is? And the ID for the treatment?

You should also elaborate on what exactly you mean by ignore treatments with less than 5 weeks between them. I'm interpreting to be that you want to keep treatments that took 5 weeks or more from the beginning to end. 

I suspect Uke is the number of the week on which the treatment was administered and Lokalitetsnummer a valid ID (unique) for the treatment. If so, you could use something like this (ID is the column with the treatment identifier) 

 

COUNTROWS (
    FILTER (
        SUMMARIZECOLUMNS (
            Table[ID];
            "WeekLenght"; CALCULATE ( MAX ( Table[Uke] ) ) - CALCULATE ( MIN ( Table[Uke] ) )
                + 1
        );
        [WeekLenght] >= 5
    )
)

 

     

Anonymous
Not applicable

Hi, thank you for taking interest! 

 

Sadly the way a treatment is registered is with an individual input each week without an uniqe ID for the treatment. So each treatment usually has 2-3 inputs in the goverment list. This is due to the way farmers are reporting to the goverment. Which in essense can be seen as: are you treating? if yes there is additional data

The data collected can be seen as: week, year, site-id, site name, action, type of treatment, chemical

 

https://ibb.co/KGdbV1r

 

As you can see from the attached picture, treatments often range over several weeks, which makes it hard to figure out how many uniqe treatments there actually has been. A regular remove duplicate would make earlier treatments disapear. 

 

So, could you think of a way to get power bi to count just one of the inputs per treatment? 

 

Could for example, making it count the treatment if there is more than 5 weeks since the last treatment at that site be a solution? 

 

Hope that clarifies a bit,

Thank you!

 

@Anonymous

I'm a bit lost here.

We first need to clarify what a "treatment" is.  You say they do not have IDs but at the same time you claim for instance that "...each treatment usually has 2-3 inputs in the goverment list..."

How do you tell those 2-3 inputs from the rest??? 

Anonymous
Not applicable

Hi, I can see that it is a bit messy written, sorry for that. 

 

A treatment in this case is when a farm has reported in their weekly report to the authorities that they are using a medical compound. These usualy ranges from 1-3 weeks, and generates a row on the public list per week the treatment is ongoing. 

 

I am interested in generating statistics on how many treatments has been conducted. Therefore, the weekly entries messes up the data as what is actually one treatment is counted 2-3 times by Power Bi. 

 

to clarify, attached is a small part of the list, with two scenarios. 

1. In this case you can see that the name of the site treating is Aldeøyna. both of the rows is in this case the same treatment (which is what messes up the counting). However, it is ranging over two weeks and therefore has two rows (one per week reportend ongoing treatment)

 

2. In this case the site Austenesåa is treating. The entries for week 1 and 2 is the same treatment (ongoing for two weeks), but since there is more than 2 weeks since the last reported treatment, you can see that in week (uke) 6 a new treatment is initiated. 

 

https://ibb.co/rZGmGfW

 

The way to seperate between what is the same treatment and a new one is to look at the weeknumbers. if there is more than 2 weeks since the site (lokalitetsnavn) reported a treatment, it should be considered a new one.

 

Hope this clarifies

 

 

@Anonymous

Ok, that clarifies it. Can you share a sample of your data? 

Hey @Anonymous

 

I think you can do the following.

Bear in mind that this is not tested at all (i could have, had you provided data) so it will need refinement but you can get the general idea. Please always share a sample of your data to make things easier for people trying help; you will be more likely to get responses if you do so.

 

1. Create a calculated column in your table. It will have a 1 when a new treatment is initiated (if more than two weeks since last entry for that farmer) and a zero otherwise.  

 

Table[TreatmentChange] =
VAR MaxWeekGap = 2
VAR Current = Table[Uke]
VAR Previous =
    CALCULATE (
        MAX ( Table[Uke] );
        Table[Uke] < Current;
        ALLEXCEPT ( Table; Table[Lockalitetsnummer] )
    )
RETURN
    IF (
        ISBLANK ( Previous );
        1;
        //Watch out with the edge value. Correct?
        IF (
            ( Current - Previous )
                > MaxWeekGap;
            1;
            0
        )
    )

2. Create a second calculated column that uses the previous one. In each row, Table[TreatmentNumber] will have the number of treatment for that particular farmer.

 

Table[TreatmentNumber] =
VAR CurrentWeek = Table[Uke]
RETURN
    CALCULATE (
        SUM ( Table[TreatmentChange] );
        Table[Uke] <= CurrentWeek;
        ALLEXCEPT ( Table; Table[Lockalitetsnummer] )
    )

 

 

3. You can then create a third column that concatenates Table[Lockalitetsnummer] and Table[TreatmentNumber]. With that you have an ID for each treatment and can do a count on it or whatever you need.  

Anonymous
Not applicable

Hi, i tried the formula but i'm getting "token EOF expected" for  "MaxWeekGap"

@Anonymous

Ok, I refined a bit what we had yesterday. No real changes. Following the logic explained earlier, we have three calculated columns to come up with an ID per treatment that you can then use as you please. See below. Here's your .pbix modified, with the three new columns. Does this solve your issue?

 

1. First column

 

Table1[TreatmentChange] = 
VAR MaxWeekGap = 2
VAR CurrentWeek = Table1[Uke]
VAR PreviousWeek =
    CALCULATE (
        MAX ( Table1[Uke] );
        Table1[Uke] < CurrentWeek;
        ALLEXCEPT ( Table1; Table1[Lokalitetsnummer])
    )
RETURN
    IF (
        ISBLANK ( PreviousWeek );
        1;
        IF (
            ( CurrentWeek - PreviousWeek )
                > MaxWeekGap;
            1;
            0
        )
    )

2. Second column. The number of treatment is determined as you explained. For a specific farm (Lokalitetsnummer), we start with treatment number 1 and go only to treatment number 2 when we encounter that more than two weeks have elapsed since the last time the site reported.    

 

Table1[TreatmentNumber] = 
VAR CurrentWeek = Table1[Uke]
RETURN
    CALCULATE (
        SUM ( Table1[TreatmentChange] );
        Table1[Uke] <= CurrentWeek;
        ALLEXCEPT ( Table1; Table1[Lokalitetsnummer] )
    )

3. Third column with the treatment ID. It is a concatenation of Lokalitetsnummer and the number of treatment. Thus, for example for site with Lokalitetsnummer 11116  we happen to have 11116_1, 11116_2 and 11116_3. Three different treatments.

 

Table1[TreatmentID] = Table1[Lokalitetsnummer] & "_" & Table1[TreatmentNumber] 

 

Anonymous
Not applicable

Fantastic! Thank you for all the help. By giving the treatments ID i can use exactly the same approach for all parts of the bigger dataset that I have. Again, thank you!

 

Cheers,

@Anonymous

My pleasure

Anonymous
Not applicable

Hi @AlB, thank you for all the previous help. From your formula i have managed to build significant improvments on my dataprocessing. However, could I ask you to look into one more thing? 

 

I have expanded the dataset from just 2018 to also include previous years. Naturally this bring another variable to the "treatmentchange" collumn as this just seperates per week. So if a treatment is started in closely realted week in a previous year it isn't picked up.

 

The sheet is exactly the same as you allready have downloaded, execpt that the "År" collumn now also contains 2017 ,2016 etc. 

 

Kind regards

Hey @Anonymous

 

What first comes to mind, and in order to reutilise to the max the approach we have already have, is to create a new "global" week number, i.e.  that gives the week number not for a year but for the whole period you are considering. For instance, imagine your first year is 2016:

- Week 1 of 2016 is your "beginning of time" and would be assigned GlobalWeekNum=1.

- Week number 13 of 2017 would be assigned GlobalWeekNum=NumberOfWeeksIn2016+13=52+13=65    

And so on...

so the GlobalWeekNumber doesn't go back to one at the change of the year like the standard week number does, but just continues on. 

If you have an extra column with that global week num you would use it instead of [Uke] in the previous code and I believe it should work . Give it a good thought. I'm just saying this quickly.

 

Be careful with years with 53 weeks and especially with the ALLEXCEPT( ) we had in the code as it might need some changes.

 

CALCULATE (
        MAX ( Table1[Uke] );
        Table1[Uke] < CurrentWeek;
        ALLEXCEPT ( Table1; Table1[Lokalitetsnummer])
    )

 

 

Anonymous
Not applicable

@AlBI like your thought, and i agree that it might work, and i don't see any hickups down the line of where I want to go with my data. 

As you probably well have realized i'm not exactly fluent in DAX, so do you have a suggested code for the Globalweek collumn? 

 

Cheers, 

@Anonymous

I'm a bit busy right now. You give it a try and if it doesn't work post what you have  and we'll take a look. 

It should be possible with code quite similar to what you would use in excel.

Anonymous
Not applicable

Thank you for the pointers,

Globalweek =
VAR MaxWeeks = SUMMARIZE(ALL('Slice 2016-2018');'Slice 2016-2018'[År];"MaxWeek";MAX('Slice 2016-2018'[Uke]))
VAR MyYear = [År]
VAR MyStart = SUMX(FILTER(MaxWeeks;[år]<MyYear);[MaxWeek])
VAR firstYear = CALCULATE(FIRSTNONBLANK('Slice 2016-2018'[År];1);ALL('Slice 2016-2018'))
VAR myNum = IF(MyYear=firstYear;[Uke];MyStart+[Uke])
RETURN myNum
 
Appears to have to have done the trick. 
 
Cheers,

@Anonymous

Aha... so "not exactly fluent in DAX", eh? You were trying to trick me. Smiley Very Happy

That looks pretty awesome.  Well done!

you gotten the whole thing to work?

@Anonymous

By the way, not sure if you know, you can use the "insert code" button to paste code keeping the formatting here. It's the one with the </> sign, next to the emoticon in the editor.

 

A tool you might find interesting is the DAX formatter by the Italian gurus. It formats/indents your code nicely, making it more readable and thus more maintainable.   

Anonymous
Not applicable

Haha, AIB i'm not going to claim credit for those lines of code (yet). After trying for a while i got some help from someone more experienced. I'm a veterinarian, not a computer scientist so my baseline DAX knowledge is low. However, with the help of yourself and the others in this great community i'm quickly progressing. Allready my Power BI sheets give me tenfold the information my old Excel sheets do. Everytime one of you help me with a line of code, i analyze it to understand why it is working. 

 

Again, thank you!

 

And for the future, any pointers on how to progress is much appreciated! 

Anonymous
Not applicable

Thank you for continiuing to look into this. Sorry for the late reply. I've attached the filtered public list in my Power BI. 

I'll get on to trying your solution aswell.

 

www.filedropper.com/emamectinbenzoatsorted

 

Cheers, 

edit@ Everything to the right of "virkestoff" is colums that i use to georaphicly place the treatment, and irrelevant here. 

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.