PhilC
Resolver I

Split varying number of columns into 4 columns each

Hi, 

I need to split a dynamic number of columns, each into 4 new columns with specific names (Surname, Initial, EmployeeID, Other).

 

A link to an Excel file showing the source data and the required output is below.  There is a query that does the initial transformation to produce the dynamic columns.  I just need to know how to then split all those columns, irrespective of how many there are.

 

Sample file with Source data and required output 

 

Thanks in advance.

Phil

 

1 ACCEPTED SOLUTION

Hi @PhilC 

Excellent, that counts as an excuse 😉 (Otherwise: Everything that @Greg_Deckler  said...)

Here we go:

let
    // Sample data pasted via "Enter Data" (compressed JSON); replace with your own source
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("VY/NDoIwEIRfxfTMQYq/R3/DAQ0REw+EQ4MrJRRICvp03nwxd5eAcuh0kq+704lj4QpHHKrMgG3RHfG40kPdiMTpqJ9nWYN3yGw2YgG88pYGTwznqFuGEt1N2UaD4QGiC9QBRqmGolQV2gvTJeqeMcVHn/dd1w90V6arIZZoCC3Yia+qBmhBwE/WqLvfAp0bLNWv96Z9OJfD4JqqMXL/0VkVqnxaRSuYytG/fDBmQF5XN/kC", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [ID = _t, Surname = _t, Initial = _t, EmployeeID = _t, Other = _t]),
    // Nest all author rows of each publication into "IDTable" and count them
    #"Grouped Rows" = Table.Group(Source, {"ID"}, {{"IDTable", each Table.RemoveColumns(_, {"ID"})}, {"Count", each Table.RowCount(_), type number}}),
    // Generate numbered column names per author: Surname1, Initial1, ..., Surname2, ...
    AddColumnNames = Table.AddColumn(#"Grouped Rows", "NewColumnNames", each List.Combine( List.Transform({1..[Count]}, (x) => List.Transform(Table.ColumnNames([IDTable]), (t) => t & Text.From(x))))),
    // Flatten each nested table into a single wide row using those names
    AddNewTable = Table.AddColumn(AddColumnNames, "NewTable", each Table.FromRows({List.Combine(Table.ToRows([IDTable]))}, [NewColumnNames])),
    // Expand with the union of all generated names (covers the row with the most authors)
    #"Expanded NewTable" = Table.ExpandTableColumn(AddNewTable, "NewTable", List.Distinct(List.Combine(AddNewTable[NewColumnNames]))),
    // Drop the helper columns
    CleanUp = Table.RemoveColumns(#"Expanded NewTable",{"IDTable", "Count", "NewColumnNames"})
in
    CleanUp

You were on the right path with the grouping.
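For anyone who wants to follow the reshaping logic outside of Power Query: the query groups the source rows by ID, then flattens each group's author fields into numbered columns (Surname1, Initial1, ..., Surname2, ...). A rough Python sketch of the same idea, with made-up sample rows rather than the actual file:

```python
from itertools import groupby

# One row per (publication, author) pair -- same shape as the sample data.
rows = [
    {"ID": "P1", "Surname": "Smith", "Initial": "J", "EmployeeID": "100", "Other": ""},
    {"ID": "P1", "Surname": "Jones", "Initial": "A", "EmployeeID": "101", "Other": ""},
    {"ID": "P2", "Surname": "Brown", "Initial": "K", "EmployeeID": "102", "Other": ""},
]

FIELDS = ["Surname", "Initial", "EmployeeID", "Other"]

def widen(rows):
    """Group by ID, then emit one record per ID with numbered author columns."""
    out = []
    # Like the M query's Table.Group; groupby assumes rows are sorted by ID.
    for pub_id, group in groupby(rows, key=lambda r: r["ID"]):
        record = {"ID": pub_id}
        for i, author in enumerate(group, start=1):
            for f in FIELDS:
                record[f"{f}{i}"] = author[f]  # e.g. Surname1, Initial1, ...
        out.append(record)
    return out

wide = widen(rows)
```

A publication with fewer authors simply ends up with fewer numbered columns; in the M query the expand step pads the missing ones with nulls.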

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

View solution in original post

8 REPLIES
Greg_Deckler
Super User

In general, it is preferable to work with the data as you have it in your source data, so I would be curious why you want it in your desired output format. You will definitely need to do a Group By operation. After that you will have to get some help from someone like @ImkeF , the Power Query guru extraordinaire. 



Thanks Greg,

 

It is indeed an odd layout.  The data is for upload into an academic software package. 

 

Each row represents a publication.

There may be any number of authors (hence the dynamic number of columns).

For each author, there are 4 columns to be uploaded: Surname, Initial, ID, Other (not sure what this last one is yet, just know it is needed).

 

Yes, I was hoping @ImkeF or @MarcelBeug might take an interest 🙂

 


Thanks very much for the incredibly quick response @ImkeF.

 

It is working as hoped with the small dataset; I will test it in a couple of days with the full one.

 

Cheers

Phil

You're welcome @PhilC ,

if your data is already sorted by ID, you can speed up the processing considerably by using GroupKind.Local, as described here: https://blog.crossjoin.co.uk/2014/01/03/aggregating-by-local-groups-in-power-query/ 
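The intuition behind the speedup: with the default global grouping, Table.Group must scan the whole table to collect every group, while GroupKind.Local closes a group as soon as the key changes, which becomes a single streaming pass when the data is already sorted by ID. As an analogy (not M), Python's itertools.groupby behaves like the local variant:

```python
from itertools import groupby

# Already sorted by ID: each group is emitted as soon as the key changes,
# so the input is consumed in one pass without buffering other groups.
rows = [("P1", "Smith"), ("P1", "Jones"), ("P2", "Brown")]
local_groups = [(key, [name for _, name in grp])
                for key, grp in groupby(rows, key=lambda r: r[0])]

# The caveat that also applies to GroupKind.Local: if the data is NOT
# sorted, equal keys that are not adjacent land in separate groups.
unsorted = [("P1", "Smith"), ("P2", "Brown"), ("P1", "Jones")]
split_keys = [key for key, _ in groupby(unsorted, key=lambda r: r[0])]
```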

 

Imke Feldmann (The BIccountant)


Hi @ImkeF ,

 

I have been using the above solution for a while, and while it is slow, it has been manageable.  I push this step out to a table so that downstream queries are quicker.

 

I now have to add a lot more columns per author (a total of 15), for around 2500 publications and up to 41 authors (this is the maximum; most have under 5 authors).

 

It is now extremely slow (not 100% sure how long, as I ran it and went to bed, but over 30 mins and counting at the moment), so I was wondering if your suggestion re Chris's approach might help.

 

Trouble is, your code is above my understanding, and I cannot work out how to incorporate the GroupKind.Local approach into your solution.

 

Any assistance would be very much appreciated.

 

Cheers, Phil

Hi @PhilC ,
please try this:

 

 

let
    Source = Table.FromRows(
        Json.Document(
            Binary.Decompress(
                Binary.FromText(
                    "VY/NDoIwEIRfxfTMQYq/R3/DAQ0REw+EQ4MrJRRICvp03nwxd5eAcuh0kq+704lj4QpHHKrMgG3RHfG40kPdiMTpqJ9nWYN3yGw2YgG88pYGTwznqFuGEt1N2UaD4QGiC9QBRqmGolQV2gvTJeqeMcVHn/dd1w90V6arIZZoCC3Yia+qBmhBwE/WqLvfAp0bLNWv96Z9OJfD4JqqMXL/0VkVqnxaRSuYytG/fDBmQF5XN/kC",
                    BinaryEncoding.Base64
                ),
                Compression.Deflate
            )
        ),
        let
            _t = ((type text) meta [Serialized.Text = true])
        in
            type table [ID = _t, Surname = _t, Initial = _t, EmployeeID = _t, Other = _t]
    ),
    // Build the complete wide row (ID + numbered author columns) directly
    // inside the aggregation, avoiding the separate expand step
    Custom2 = Table.Group(
        Source,
        {"ID"},
        {
            {
                "IDTable",
                each Table.FromRows(
                    {{[ID]{0}} & List.Combine(Table.ToRows(Table.RemoveColumns(_, {"ID"})))},
                    {"ID"} & List.Combine(
                        List.Transform(
                            {1 .. Table.RowCount(_)},
                            (x) => {
                                "Surname" & Text.From(x),
                                "Initial" & Text.From(x),
                                "EmployeeID" & Text.From(x),
                                "Other" & Text.From(x)
                            }
                        )
                    )
                )
            }
        }
    ),
    Custom3 = Table.Combine(Custom2[IDTable])
in
    Custom3

 

 

or, if the sorting is correct already, this might be even faster:

 

let
    Source = Table.FromRows(
        Json.Document(
            Binary.Decompress(
                Binary.FromText(
                    "VY/NDoIwEIRfxfTMQYq/R3/DAQ0REw+EQ4MrJRRICvp03nwxd5eAcuh0kq+704lj4QpHHKrMgG3RHfG40kPdiMTpqJ9nWYN3yGw2YgG88pYGTwznqFuGEt1N2UaD4QGiC9QBRqmGolQV2gvTJeqeMcVHn/dd1w90V6arIZZoCC3Yia+qBmhBwE/WqLvfAp0bLNWv96Z9OJfD4JqqMXL/0VkVqnxaRSuYytG/fDBmQF5XN/kC",
                    BinaryEncoding.Base64
                ),
                Compression.Deflate
            )
        ),
        let
            _t = ((type text) meta [Serialized.Text = true])
        in
            type table [ID = _t, Surname = _t, Initial = _t, EmployeeID = _t, Other = _t]
    ),
    Custom2 = Table.Group(
        Source,
        {"ID"},
        {
            {
                "IDTable",
                each Table.FromRows(
                    {{[ID]{0}} & List.Combine(Table.ToRows(Table.RemoveColumns(_, {"ID"})))},
                    {"ID"} & List.Combine(
                        List.Transform(
                            {1 .. Table.RowCount(_)},
                            (x) => {
                                "Surname" & Text.From(x),
                                "Initial" & Text.From(x),
                                "EmployeeID" & Text.From(x),
                                "Other" & Text.From(x)
                            }
                        )
                    )
                )
            }
        }
        // GroupKind.Local: a group closes as soon as the ID changes, so the data must already be sorted by ID
        , GroupKind.Local
    ),
    Custom3 = Table.Combine(Custom2[IDTable])
in
    Custom3

 

 

Imke Feldmann (The BIccountant)

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark this post as a solution. Thanks!

How to integrate M-code into your solution -- How to get your questions answered quickly -- How to provide sample data -- Check out more PBI- learning resources here -- Performance Tipps for M-queries

Hi @ImkeF , thanks for having a look at that.  I tried both options and the speed has not noticeably changed.  I have 12 columns now.

 

Out of interest, is the only real difference from the original solution effectively hard-coding the column names?

 

Please do not spend any more time on this, your initial solution is operational and I can manage the timing so I do not have to run it urgently.

 

Appreciate all your time and assistance.

 

Cheers, Phil
