cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
M_Aird
Frequent Visitor

Dynamic data transformation with complex pivoted dataset

I have a data table taken from the Australian bureau of statistics (ABS) website which has multiple pivoted tables within it that have repeated rows and columns, that I'm trying to transform into something useable. I have done so using very manual steps that are very reliant on the structure and content of the data not changing. Unfortunately though, there tends to be changes made each time it is released and my queries no longer work (and it's time consuming to fix them). I'd also like a solution that I can apply to the other tables in the dataset. My main issue is extracting the headers text from the value columns, or the heading/sub-heading from the first column and splitting into separate columns. At the moment I do it quite manually:

eg. #"Added Conditional Column" = Table.AddColumn(Source, "Custom", each if [Column1] = "Residential" then "Residential" else if [Column1] = "Community" then "Community" else if [Column1] = "Other location" then "Other location" else if [Column1] = "Total" then "Total" else null), etc...

 

Sometimes they change the categories included though so it might not pick them all up next time.

 

This is what one of the tables looks like (sorry about the gory subject matter!), and the second one is what I want the end result to be.

 
 

Capture1.JPGCapture2.JPG

 

1 ACCEPTED SOLUTION

the code per the proportion part.

Only the second line is changed:

from ->    colsNum=List.FirstN(Table.ColumnNames(Foglio2_Sheet),numCols),

to ->        colsNum=List.RemoveRange(Table.ColumnNames(Foglio2_Sheet),1,numCols-1),

 

And the name NUMBER to PC in the third last line.

 

 

 

    numCols=List.PositionOf(Record.FieldValues(Foglio2_Sheet{2}),"PROPORTION (%)"),
    colsNum=List.RemoveRange(Table.ColumnNames(Foglio2_Sheet),1,numCols-1),
    tabNum=Table.SelectColumns(Foglio2_Sheet,colsNum),
    #"Removed Top Rows" = Table.Skip(tabNum,3),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),
    tab = Table.AddColumn(#"Promoted Headers", "Offence", each let noNull=List.RemoveMatchingItems( Record.FieldValues(_),{null,""})   in   if List.Count(noNull)=1 then noNull{0} else null),
    cols=Table.ColumnNames(tab),
    supLoc = List.Accumulate(tab[Location],{},(s,c)=> if c="" or List.Last(s)= "" or c="Total" then s&{c} else s&{List.Last(s)}),
   tab1= Table.FromColumns( Table.ToColumns(tab)&{supLoc},cols&{"supLocation"}),
    #"Filled Down" = Table.FillDown(tab1,{"Offence"}),
    #"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([supLocation] <> null and [supLocation] <>"")),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Filtered Rows", {"Offence", "supLocation", "Location"}, "Year", "Value"),
    #"Sorted Rows" = Table.Sort(#"Unpivoted Other Columns",{{"Year", Order.Ascending},{"Offence", Order.Ascending},{"supLocation", Order.Ascending}}),
    #"Added Custom" = Table.AddColumn(#"Sorted Rows", "Measure", each "PC"),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Custom",{"Year", "supLocation", "Location", "Offence", "Measure", "Value"})
in
    #"Reordered Columns"

 

 

 

 

View solution in original post

12 REPLIES 12
Rocco_sprmnt21
Community Champion
Community Champion

Hi @M_Aird ,

I think that, in order to give a tray, could be usefull  show what the original table looks like when loaded into PBI.

The rows, for instance, where are MURDER, HOMICIDE how are filled in PBI?
A second clarification is useful to understand how to distinguish  the location from the sub location. The position? The indentation?
Finally have an example of how categories can change:
do they change their name, position number?

waiting for your information, I made assumptions about the logic of the structure of the input table and, for the transformation, I used the following code,which you should hang at the end of the loaded table

 

 

   
....

 tab = Table.AddColumn(yourTab, "Offence", each if List.Count(List.RemoveNulls( Record.FieldValues(_)))=1 then List.RemoveNulls( Record.FieldValues(_)){0} else null),
    cols=Table.ColumnNames(tab),
    supLoc = List.Accumulate(tab[Location],{},(s,c)=> if c=null or List.Last(s)= null or c="Total" then s&{c} else s&{List.Last(s)}),
   tab1= Table.FromColumns( Table.ToColumns(tab)&{supLoc},cols&{"supLocation"}),
    #"Filled Down" = Table.FillDown(tab1,{"Offence"}),
    #"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([supLocation] <> null)),
    #"Reordered Columns" = Table.ReorderColumns(#"Filtered Rows",{"Offence", "supLocation", "Location", "2010", "2011", "2012", "2013"}),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Reordered Columns", {"Offence", "supLocation", "Location"}, "Attribute", "Value")
in
    #"Unpivoted Other Columns"

 

 

 

here a screen shot of the result

 

image.png

This code is great thanks! The only thing is I had to delete the top row (which has the Measure field in it....it's almost like two tables side by side with cols 2 -10 being NUMBER and columns 11-19 being PROPORTION (%)) and promote the headers to get it to work, but then it's missing the measure and I can't work out how to include it....hopefully with the data source I sent it might make more sense? I did try the code with another table which didn't have that additional layer and it worked! so that's awesome.

 

 

 

 

 

the code that makes use of the complete tables and transformation criteria you have provided.
This is relative to the measure = number, for the measure = proportion if needed it is obtained with quick changes.

 

 

 

numCols=List.PositionOf(Record.FieldValues(Foglio2_Sheet{2}),"PROPORTION (%)"),
    colsNum=List.FirstN(Table.ColumnNames(Foglio2_Sheet),numCols),
    tabNum=Table.SelectColumns(Foglio2_Sheet,colsNum),
    #"Removed Top Rows" = Table.Skip(tabNum,3),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),
    tab = Table.AddColumn(#"Promoted Headers", "Offence", each let noNull=List.RemoveMatchingItems( Record.FieldValues(_),{null,""})   in   if List.Count(noNull)=1 then noNull{0} else null),
    cols=Table.ColumnNames(tab),
    supLoc = List.Accumulate(tab[Location],{},(s,c)=> if c="" or List.Last(s)= "" or c="Total" then s&{c} else s&{List.Last(s)}),
   tab1= Table.FromColumns( Table.ToColumns(tab)&{supLoc},cols&{"supLocation"}),
    #"Filled Down" = Table.FillDown(tab1,{"Offence"}),
    #"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([supLocation] <> null and [supLocation] <>"")),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Filtered Rows", {"Offence", "supLocation", "Location"}, "Year", "Value"),
    #"Sorted Rows" = Table.Sort(#"Unpivoted Other Columns",{{"Year", Order.Ascending},{"Offence", Order.Ascending},{"supLocation", Order.Ascending}}),
    #"Added Custom" = Table.AddColumn(#"Sorted Rows", "Measure", each "NUMBER"),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Custom",{"Year", "supLocation", "Location", "Offence", "Measure", "Value"})
in
    #"Reordered Columns"

 

 

 

 

 

Hi @M_Aird ,

about the other tableyou mentioned here:"This is another example of the variations of table layout I'm hoping this might work with....

 

is a totally different story.

 

Only to give an example, but there are many other differences:
as you can see and as far as I can interpret, the location and sub-location is not identified by the empty line but by the indentation.
So the logic for grouping should be adapted.

the code per the proportion part.

Only the second line is changed:

from ->    colsNum=List.FirstN(Table.ColumnNames(Foglio2_Sheet),numCols),

to ->        colsNum=List.RemoveRange(Table.ColumnNames(Foglio2_Sheet),1,numCols-1),

 

And the name NUMBER to PC in the third last line.

 

 

 

    numCols=List.PositionOf(Record.FieldValues(Foglio2_Sheet{2}),"PROPORTION (%)"),
    colsNum=List.RemoveRange(Table.ColumnNames(Foglio2_Sheet),1,numCols-1),
    tabNum=Table.SelectColumns(Foglio2_Sheet,colsNum),
    #"Removed Top Rows" = Table.Skip(tabNum,3),
    #"Promoted Headers" = Table.PromoteHeaders(#"Removed Top Rows", [PromoteAllScalars=true]),
    tab = Table.AddColumn(#"Promoted Headers", "Offence", each let noNull=List.RemoveMatchingItems( Record.FieldValues(_),{null,""})   in   if List.Count(noNull)=1 then noNull{0} else null),
    cols=Table.ColumnNames(tab),
    supLoc = List.Accumulate(tab[Location],{},(s,c)=> if c="" or List.Last(s)= "" or c="Total" then s&{c} else s&{List.Last(s)}),
   tab1= Table.FromColumns( Table.ToColumns(tab)&{supLoc},cols&{"supLocation"}),
    #"Filled Down" = Table.FillDown(tab1,{"Offence"}),
    #"Filtered Rows" = Table.SelectRows(#"Filled Down", each ([supLocation] <> null and [supLocation] <>"")),
    #"Unpivoted Other Columns" = Table.UnpivotOtherColumns(#"Filtered Rows", {"Offence", "supLocation", "Location"}, "Year", "Value"),
    #"Sorted Rows" = Table.Sort(#"Unpivoted Other Columns",{{"Year", Order.Ascending},{"Offence", Order.Ascending},{"supLocation", Order.Ascending}}),
    #"Added Custom" = Table.AddColumn(#"Sorted Rows", "Measure", each "PC"),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Custom",{"Year", "supLocation", "Location", "Offence", "Measure", "Value"})
in
    #"Reordered Columns"

 

 

 

 

This is great - thanks so much!

Hi @M_Aird 

I tryed to send a private message tu you, but seems not possible for the moment 😕.

So here it is ..

 

I'm glad the code works for you. I will use the file you sent me to continue the development and add the management of the other aspects that I had left out pending your clarifications.
But I want to answer in the forum and for this you have to give me permission to use your data in the forum.
Bye

rocco

and I provided some clarifications around your initial questions but can't see them on here now? Are you still waiting on anything from me?

That's awesome thanks Rocco.

Yep that is fine - it is all publicly available data.

 

thanks again for your time!!

M_Aird
Frequent Visitor

This is another example of the variations of table layout I'm hoping this might work with....

Thanks so much Rocco_sprmnt21 for your responses, and apologies for the delay in getting the info back to you.  When loading to Power BI this is how it looks.....

In regards to determining whether column1 is a Location category or sub-category the rules are if it immediately follows a null value or = 'total' then it is a category, then the subsequent ones are sub-categories under that particular category, until you hit another null value. I hope that makes sense. In regards to the Offence category, these are all text fields in column 2. Measures (Number and %) are in separate columns in the second row.

When the data changes they can sometimes add in additional categories/sub-categories and offences, or they add in an additional year which means extra columns etc.

 

Here is a link to the excel file which has the source data along with some queries that shows my [basic] data transfromations. If you look at table_4 you'll see that it has 3 levels of categories which are only discernible by indentations which don't carry through to the loaded table....

Am having some issues with your code which I'll keep playing around with...

 

thanks!

 

Capture3.JPG

 

 

Helpful resources

Announcements
Power Platform Conf 2022 768x460.jpg

Join us for Microsoft Power Platform Conference

The first Microsoft-sponsored Power Platform Conference is coming in September. 100+ speakers, 150+ sessions, and what's new and next for Power Platform.

Power BI Dev Camp Session 23 768x460.jpg

Check it Out!

Mark your calendars and join us on Thursday, June 30 at 11a PDT for a great session with Ted Pattison!

June 20 episode 7 with aka link 768x460.jpg

The Power BI Community Show

Join us on June 20 at 11 am PDT when Kim Manis shares the latest on Azure Synapse analytics, the Microsoft Intelligent Data Platform, and notable Power BI Updates from Microsoft Build 2022.

Top Solution Authors