Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
AnnaKSkantz
Regular Visitor

Prepare data to count no of month occurences for HouseholdID in a dataset

Hello!

I'm stuck on something that sounds pretty easy but when it comes to practise I only get confused.

I have a table called 'Bidragstid' which contains data abount households. The table have column [URVALFROM] of type date which is a monthly period for facts about the households. Although its a date type, there are no running dates, but shows rather a kind of year-month where the day always is day one in the month: YYYY-MM-01. The dataset contains monthly data from 2019 january to 2023 october. I have also created a real ÅÅÅÅ-MM column from [URVALFROM].

Furthermore my table has the column [HushållsID] which contains unique IDs of households. 
Here are the first rows shown in PQ:

AnnaKSkantz_0-1699841805517.png

 

Now, what I want to do is to transform the data so that I can count no of households by month occurences for the last 12 months. That is, I need to visualise (in Power BI), how many households that occured 1 month, 2 months, 3 months... to 12 months in the last 12 months for every unique time period in [URVALFROM] (or [ÅrMån], which is obviously the same thing). 

My idea is that the result will be this, where "AntalFörekomster" is no of month occurences, [Mätperiod] is every unique value in URVALFROM repeated for every category of no of month occurences and, finally, [Count] has the the number of households for each combined value of the prevous columns:

AnnaKSkantz_1-1699842349982.png


How do I do this? 

If needed I have made a list of every uniqe URVALFROM and another one for every uniqe HushållsId.

Note1 if one household occures, say 5 times (5 months) the last 12 months, it should not be in the lower categories of no month occurences, that is 4 times or below.
Note2. I have already removed duplicates of HushållsID and URVALFROM combined, so any household only occure once in a specific month, if at all. 

The only thing i've managed so far is to get the result for only the last 12 months in the dataset, which is wrong, as i need to have the Count for every time period i want to measure. See the code below:
-------------------------

#"Inserted Text Before Delimiter" = Table.AddColumn(#"Removed Other Columns1", "ÅrMån", each Text.BeforeDelimiter(Text.From([URVALFROM], "sv-SE"), "-", 1), type text),
#"Removed Duplicates" = Table.Distinct(#"Inserted Text Before Delimiter", {"URVALFROM", "HushållsID"}),
#"Grouped Rows" = Table.Group(#"Removed Duplicates", {"HushållsID", "ÅrMån", "URVALFROM"}, {{"AntalFörekomster", each Table.RowCount(_), Int64.Type}}),
#"Filtered Rows" = Table.SelectRows(#"Grouped Rows", each Date.IsInPreviousNMonths([URVALFROM], 12)),
#"Grouped Rows1" = Table.Group(#"Filtered Rows", {"HushållsID"}, {{"Förekomster", each Table.RowCount(_), Int64.Type}}),
#"Grouped Rows2" = Table.Group(#"Grouped Rows1", {"Förekomster"}, {{"Count", each Table.RowCount(_), Int64.Type}}),
#"Sorted Rows" = Table.Sort(#"Grouped Rows2",{{"Förekomster", Order.Ascending}})
in
#"Sorted Rows"

----------------------------------------

I assume this could be done in DAX also, but, considering there has to be an expression that calculates a new table, I would rather have it done in PQ. 


I'd be more than grateful if someone can help me sort things out here :-))

Best regards, Anna

1 ACCEPTED SOLUTION

Hello, @AnnaKSkantz I would get a list of periods and travel over the list to  select rows from table and do your grouping. Fold it into a function with single parameter - period. And use either List.Transform or List.Accumulate. Here is sample code with List.Transform ( I am using my own table of 2 columns - period and id):

let
    Source = period_id_table,
    date_type = Table.TransformColumnTypes(Source,{{"period", type date}}),
    data = Table.Buffer(Table.Distinct(date_type)),
    dates = List.Buffer(List.Distinct(data[period])),
    f = (d) => 
        [select = Table.SelectRows(data, each ([period] >= Date.AddMonths(d, -12)) and ([period] < d)),
        gr_by_id = Table.Group(select, "id", {"count", Table.RowCount}),
        g_by_count = Table.Group(gr_by_id, "count", {{"id_count", Table.RowCount}}),
        add_period = Table.AddColumn(g_by_count, "period", each d)][add_period],
    txform = 
        List.Transform(
            dates,
            f
        ),
    combine = Table.Combine(txform)
in
    combine

List.Accumulate may use the same function but it's slower than List.Transform even though we have to Table.Combine in the end. 

To make it clear: for every period you are looking for previous 12 months. So that the very first period won't have any calculated result, correct?

View solution in original post

8 REPLIES 8
AnnaKSkantz
Regular Visitor

Hi there, and thank you very much for taking your time.

Unfortunately, I don't see the solution, or more correctly, understand what to write/do. I tried to write a code but didn't succeed  

 Could you please fill me in at the end of this code?
...
#"Inserted Text Before Delimiter" = Table.AddColumn(#"Removed Other Columns1", "ÅrMån", each Text.BeforeDelimiter(Text.From([URVALFROM], "sv-SE"), "-", 1), type text),

#"Removed Duplicates" = Table.Distinct(#"Inserted Text Before Delimiter", {"URVALFROM", "HushållsID"}),
#"Grouped Rows" = Table.Group(#"Removed Duplicates", {"HushållsID", "ÅrMån", "URVALFROM"}, {{"Antal", each Table.RowCount(_), Int64.Type}}),
#"Changed Type2" = Table.TransformColumnTypes(#"Grouped Rows",{{"URVALFROM", type date}}),

#"TestSolution" = Table.AddColumn each Table.SelectRows(#"Changed Type2", "TableColumn" each ?
in
#"TestSolution"

🙈😎

So the easiest way I can think of is to first, Group By HushallsID and URVALFROM, and for your aggregation, choose "All Rows" and name the new column "Details" in the Group BY dialogue.  This will yield a table with one row for each ID and month combination, along with the full table on each row (the "Details" column).

 

Next, you need to filter the tables in the table column so that each table's ID is equal to only the ID for that row, and the date is >= 1 year before [URVALFROM].  To do this, add a new step, let's name it "Filters":

 

= Table.TransformColumns(#"Grouped by", {{"Details", each _[URVALFROM] >= Date.AddYears([URVALFROM], -1)}})

 

Then count the rows:

 

= Table.AddColumn(Filters, each Table.RowCount(_[Details]))

 

Now you have ID, MatPeriod, Details, AND Counts (which is the number of times the ID appeared in a month in the last 12 months, from 0-12.

 

Now Group again, by Matperiod and Counts, and add an aggregation column, named "Histogram", using the Count aggregation to count the values in the "Counts" columns.  

 

That should do it.  If not, kindly paste some of those data rows into this thread; I didn't feel like typing all the data that I'd need to replicate your data.

 

--Nate


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

Hello, and thank you for such a detailed step by step explanation! 

Now I've added the the step below, so far, but I get only Error rows in the Details table column after the last step, and therefor the next step you added wont work either, what do you think went wrong? The first row Error says "Expression.Error: The Date value must contain the Date component.
Details:
[List]"

#"Inserted Text Before Delimiter" = Table.AddColumn(#"Removed Other Columns1", "ÅrMån", each Text.BeforeDelimiter(Text.From([URVALFROM], "sv-SE"), "-", 1), type text),
#"Removed Duplicates" = Table.Distinct(#"Inserted Text Before Delimiter", {"URVALFROM", "HushållsID"}),
#"Grouped by" = Table.Group(#"Removed Duplicates", {"URVALFROM", "HushållsID"}, {{"Details", each _, type table [URVALFROM=nullable date, HushållsID=text, ÅrMån=text]}}),
#"Filters" = Table.TransformColumns(#"Grouped by", {{"Details", each _[URVALFROM] >= Date.AddYears([URVALFROM], -1)}})
in
#"Filters"

Hello, @AnnaKSkantz I would get a list of periods and travel over the list to  select rows from table and do your grouping. Fold it into a function with single parameter - period. And use either List.Transform or List.Accumulate. Here is sample code with List.Transform ( I am using my own table of 2 columns - period and id):

let
    Source = period_id_table,
    date_type = Table.TransformColumnTypes(Source,{{"period", type date}}),
    data = Table.Buffer(Table.Distinct(date_type)),
    dates = List.Buffer(List.Distinct(data[period])),
    f = (d) => 
        [select = Table.SelectRows(data, each ([period] >= Date.AddMonths(d, -12)) and ([period] < d)),
        gr_by_id = Table.Group(select, "id", {"count", Table.RowCount}),
        g_by_count = Table.Group(gr_by_id, "count", {{"id_count", Table.RowCount}}),
        add_period = Table.AddColumn(g_by_count, "period", each d)][add_period],
    txform = 
        List.Transform(
            dates,
            f
        ),
    combine = Table.Combine(txform)
in
    combine

List.Accumulate may use the same function but it's slower than List.Transform even though we have to Table.Combine in the end. 

To make it clear: for every period you are looking for previous 12 months. So that the very first period won't have any calculated result, correct?

Hi, and thank you very much!
It works perfectly according to how I put my question... and I asked ChatGPT to explain the steps, so that I actually learn something 😆
However, you asked: "for every period you are looking for previous 12 months. So that the very first period won't have any calculated result, correct?" 
Sorry for not being clear. For every period I'm looking for current period and previous 11 months... and I guess thats why i get the wrong result. Would it be easy to adjust the code a little? 🙏

@AnnaKSkantz play with <, >, <= and >=

Thank you very much, it all works and counts exactly as I wish 🙂

watkinnc
Super User
Super User

This is a very interesting problem to solve. Not being near my computer, I would offer this restatement.

 

For each unique ID that appears on any given month, you have to attach a table of rows containing that unique ID where the date is >= the beginning of the month of the ([row date] - 12 months. So you would Use Table.AddColumn each Table.SelectRows(PriorStepOrTableName, "TableColumn", each  //date is >= the beginning of the month of the ([row date] - 12 months //([Date])))

 

You can then add a column that counts the rows in each table in the table column. Now for each ID and month, you have the number of times each ID appears in the preceding 12 months.

 

Now you can group by date and the count column that you made, and "count the count", so apply the Count aggregation to the count column that you made.

 

I think that should be all you need (conceptually), except to throw that data into a histogram and show them the business!

 

--Nate  


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors