cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
rsimcoe Regular Visitor
Regular Visitor

Help converting XML CData

Hello,

 

Does anyone have expirence manipulating XML CDATA in Power BI? In Power Query, I am executing the following:

 

let
Source = Xml.Document(Web.Contents("https://api.adaptiveinsights.com/api/v20",
[Content=File.Contents("Path of my xml file")]))
in
Source

 

This returns an XML document. When I click into the Value Table that is returned I end up with a column of data that contains all of the data from the request. The data is transactional level data with the headers and then all subsequent row level data. Each column in the row is seperated by a comma and rows are seperated by a space. I added an example of sample data below. Does anybody know how to go about parsing this into a data table that would be usable? My first thought was to turn it into a list but I am still struggling with how to parse it into something usable.

 

Value (column Name)

account name, account code, level name, 2014,2015,2016,2017 "1111 Account",1111,100,150,150,200"11112 Account",1112,200,250,100,100"11113 Account",11113,500,500,500,500

 

Any help is greatly appreciated.

 

 

 

1 ACCEPTED SOLUTION

Accepted Solutions
Nolock Established Member
Established Member

Re: Help converting XML CData

Hi @rsimcoe,

the double quotes are correct, you use them inside a string in PowerQuery like in other programmic languages.

"foo ""bar""" is actually foo "bar".

I've improved the code, now you can have many columns which contain quoted strings.

 

let
    // source - header line and rows in one string
    Source = "account name, account code, level name, 2014,2015,2016,2017 ""1111 Account"",""foo bar"",315,100,150,150,200 ""11112 Account"",1112,315,200,250,100,100 ""11113 Account"",11113,315,500,500,500,500",
    
    // get position of the first quote - the position where header ends and rows data start
    PositionOfFirstQuote = Text.PositionOf(Source, """"),

    // header line doesn't contain strings in quotes and I have to process it separately
    HeaderLine = Text.Middle(Source, 0, PositionOfFirstQuote),
    // rows data
    Body = Text.Middle(Source, PositionOfFirstQuote),

    // headers are separated by comma, split them
    Headers = Text.Split(HeaderLine, ","),

    // create a table with one row and one column
    BodyTable = #table(1, {{Body}}),
    // split lines separated by space into columns (consider quotes too)
    BodyTableRowsAsColumns = Table.SplitColumn(BodyTable, "Column1", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv)),
    // transpose table (switch rows and columns)
    BodyTableTransposed = Table.Transpose(BodyTableRowsAsColumns),
    // split every row in column by comma
    SplitToColumns = Table.SplitColumn(BodyTableTransposed, "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), Headers)
in
    SplitToColumns
7 REPLIES 7
Nolock Established Member
Established Member

Re: Help converting XML CData

Hi @rsimcoe,

it is possible for sure but I'm not sure that the provided data are correct. Please correct me if I am wrong but there are 7 header columns but the 3 following rows contain only 6 columns.

 

account name, account code, level name, 2014,2015,2016,2017

"1111 Account",1111,100,150,150,200

"11112 Account",1112,200,250,100,100

"11113 Account",11113,500,500,500,500

rsimcoe Regular Visitor
Regular Visitor

Re: Help converting XML CData

@NolockMy Apologies. The data would look something like this.

 

account name, account code, level name, 2014,2015,2016,2017

"1111 Account",1111,315,100,150,150,200

"11112 Account",1112,315,200,250,100,100

"11113 Account",11113,315,500,500,500,500

 

This is how the data comes across but its all combined into 1 column of data and I am struggling to find a way to parse it out.

Nolock Established Member
Established Member

Re: Help converting XML CData

Hi @rsimcoe,

I've written a short M-expression which should do what you need. Every line is commented and I hope you'll understand how it works.

By the way, Power BI can handle XML and I think the best way is to parse the XML file directly instead of taking all text of all ancessors and parse it as text. There is a way for sure. You can also create an example of an XML file which you would like to parse and I can help you to handle it.

 

let
    // source - header line and rows in one string
    Source = "account name, account code, level name, 2014,2015,2016,2017 ""1111 Account"",1111,315,100,150,150,200 ""11112 Account"",1112,315,200,250,100,100 ""11113 Account"",11113,315,500,500,500,500",
    
    // header line doesn't contain strings in apostrophs and I have to process it separately
    HeaderLine = Text.Middle(Source, 0, 60),
    // rows data
    Body = Text.Middle(Source, 60),

    // headers are separated by comma, split them
    Headers = Text.Split(HeaderLine, ","),

    // rows are separated by space+apostroph
    Rows = Text.Split(Body, " """),
    // columns are separated by comma
    RowsWithColumns = List.Transform(Rows, each Text.Split(_, ",")),

    // create a table from rows and headers
    ResultTable = Table.FromRows(RowsWithColumns, Headers),

    // remove remaining apostrophes from the column "account name"
    RemoveApostrophes = Table.ReplaceValue(ResultTable,"""","",Replacer.ReplaceText,{"account name"})
in
    RemoveApostrophes

Capture.PNG

rsimcoe Regular Visitor
Regular Visitor

Re: Help converting XML CData

@Nolock  This almost worked! Thank you for posting this as it was really helpful for me in just learning some more about M.

 

When I send the api post request I am getting back the following. This is an example from the documentation.

 

Capture.PNG

 

In the Power Query Editor my Source data looks like the following.Capture2.PNG

 

In the output field is 1 row containing all of the data based on the api request I sent (you have already seen the sample data). Your solution worked perfect for the headers but the row level data was still getting parsed incorrectly. I noticed in your solution below you are showing two apostrophes before the Account number (ex: ""1111 Account"") but there is only 1 set. I also just noticed that the level column also includes apostrophes around the level name (systems name for department for my organization). Updated example below. I apolgize for this.

 

account name, account code, level name, 2014,2015,2016,2017

"1111 Account",1111,"315",100,150,150,200

"11112 Account",1112,"315",200,250,100,100

"11113 Account",11113,"315",500,500,500,500

 

Let me know if this is helpful. I am trying to get your solution to work as it gets me halfway there but I am still trying to separate the row data. Thanks again for your help.

Nolock Established Member
Established Member

Re: Help converting XML CData

Hi @rsimcoe,

the double quotes are correct, you use them inside a string in PowerQuery like in other programmic languages.

"foo ""bar""" is actually foo "bar".

I've improved the code, now you can have many columns which contain quoted strings.

 

let
    // source - header line and rows in one string
    Source = "account name, account code, level name, 2014,2015,2016,2017 ""1111 Account"",""foo bar"",315,100,150,150,200 ""11112 Account"",1112,315,200,250,100,100 ""11113 Account"",11113,315,500,500,500,500",
    
    // get position of the first quote - the position where header ends and rows data start
    PositionOfFirstQuote = Text.PositionOf(Source, """"),

    // header line doesn't contain strings in quotes and I have to process it separately
    HeaderLine = Text.Middle(Source, 0, PositionOfFirstQuote),
    // rows data
    Body = Text.Middle(Source, PositionOfFirstQuote),

    // headers are separated by comma, split them
    Headers = Text.Split(HeaderLine, ","),

    // create a table with one row and one column
    BodyTable = #table(1, {{Body}}),
    // split lines separated by space into columns (consider quotes too)
    BodyTableRowsAsColumns = Table.SplitColumn(BodyTable, "Column1", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv)),
    // transpose table (switch rows and columns)
    BodyTableTransposed = Table.Transpose(BodyTableRowsAsColumns),
    // split every row in column by comma
    SplitToColumns = Table.SplitColumn(BodyTableTransposed, "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), Headers)
in
    SplitToColumns
rsimcoe Regular Visitor
Regular Visitor

Re: Help converting XML CData

@NolockThank you so much for your help! This ended up working great. Much appreciated.

Nolock Established Member
Established Member

Re: Help converting XML CData

Glad to hear that Smiley Happy