Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!
Hello,
Does anyone have expirence manipulating XML CDATA in Power BI? In Power Query, I am executing the following:
let
Source = Xml.Document(Web.Contents("https://api.adaptiveinsights.com/api/v20",
[Content=File.Contents("Path of my xml file")]))
in
Source
This returns an XML document. When I click into the Value Table that is returned I end up with a column of data that contains all of the data from the request. The data is transactional level data with the headers and then all subsequent row level data. Each column in the row is seperated by a comma and rows are seperated by a space. I added an example of sample data below. Does anybody know how to go about parsing this into a data table that would be usable? My first thought was to turn it into a list but I am still struggling with how to parse it into something usable.
Value (column Name)
account name, account code, level name, 2014,2015,2016,2017 "1111 Account",1111,100,150,150,200"11112 Account",1112,200,250,100,100"11113 Account",11113,500,500,500,500
Any help is greatly appreciated.
Solved! Go to Solution.
Hi @rsimcoe,
the double quotes are correct, you use them inside a string in PowerQuery like in other programmic languages.
"foo ""bar""" is actually foo "bar".
I've improved the code, now you can have many columns which contain quoted strings.
let // source - header line and rows in one string Source = "account name, account code, level name, 2014,2015,2016,2017 ""1111 Account"",""foo bar"",315,100,150,150,200 ""11112 Account"",1112,315,200,250,100,100 ""11113 Account"",11113,315,500,500,500,500", // get position of the first quote - the position where header ends and rows data start PositionOfFirstQuote = Text.PositionOf(Source, """"), // header line doesn't contain strings in quotes and I have to process it separately HeaderLine = Text.Middle(Source, 0, PositionOfFirstQuote), // rows data Body = Text.Middle(Source, PositionOfFirstQuote), // headers are separated by comma, split them Headers = Text.Split(HeaderLine, ","), // create a table with one row and one column BodyTable = #table(1, {{Body}}), // split lines separated by space into columns (consider quotes too) BodyTableRowsAsColumns = Table.SplitColumn(BodyTable, "Column1", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv)), // transpose table (switch rows and columns) BodyTableTransposed = Table.Transpose(BodyTableRowsAsColumns), // split every row in column by comma SplitToColumns = Table.SplitColumn(BodyTableTransposed, "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), Headers) in SplitToColumns
Hi @rsimcoe,
it is possible for sure but I'm not sure that the provided data are correct. Please correct me if I am wrong but there are 7 header columns but the 3 following rows contain only 6 columns.
account name, account code, level name, 2014,2015,2016,2017
"1111 Account",1111,100,150,150,200
"11112 Account",1112,200,250,100,100
"11113 Account",11113,500,500,500,500
@NolockMy Apologies. The data would look something like this.
account name, account code, level name, 2014,2015,2016,2017
"1111 Account",1111,315,100,150,150,200
"11112 Account",1112,315,200,250,100,100
"11113 Account",11113,315,500,500,500,500
This is how the data comes across but its all combined into 1 column of data and I am struggling to find a way to parse it out.
Hi @rsimcoe,
I've written a short M-expression which should do what you need. Every line is commented and I hope you'll understand how it works.
By the way, Power BI can handle XML and I think the best way is to parse the XML file directly instead of taking all text of all ancessors and parse it as text. There is a way for sure. You can also create an example of an XML file which you would like to parse and I can help you to handle it.
let // source - header line and rows in one string Source = "account name, account code, level name, 2014,2015,2016,2017 ""1111 Account"",1111,315,100,150,150,200 ""11112 Account"",1112,315,200,250,100,100 ""11113 Account"",11113,315,500,500,500,500", // header line doesn't contain strings in apostrophs and I have to process it separately HeaderLine = Text.Middle(Source, 0, 60), // rows data Body = Text.Middle(Source, 60), // headers are separated by comma, split them Headers = Text.Split(HeaderLine, ","), // rows are separated by space+apostroph Rows = Text.Split(Body, " """), // columns are separated by comma RowsWithColumns = List.Transform(Rows, each Text.Split(_, ",")), // create a table from rows and headers ResultTable = Table.FromRows(RowsWithColumns, Headers), // remove remaining apostrophes from the column "account name" RemoveApostrophes = Table.ReplaceValue(ResultTable,"""","",Replacer.ReplaceText,{"account name"}) in RemoveApostrophes
@Nolock This almost worked! Thank you for posting this as it was really helpful for me in just learning some more about M.
When I send the api post request I am getting back the following. This is an example from the documentation.
In the Power Query Editor my Source data looks like the following.
In the output field is 1 row containing all of the data based on the api request I sent (you have already seen the sample data). Your solution worked perfect for the headers but the row level data was still getting parsed incorrectly. I noticed in your solution below you are showing two apostrophes before the Account number (ex: ""1111 Account"") but there is only 1 set. I also just noticed that the level column also includes apostrophes around the level name (systems name for department for my organization). Updated example below. I apolgize for this.
account name, account code, level name, 2014,2015,2016,2017
"1111 Account",1111,"315",100,150,150,200
"11112 Account",1112,"315",200,250,100,100
"11113 Account",11113,"315",500,500,500,500
Let me know if this is helpful. I am trying to get your solution to work as it gets me halfway there but I am still trying to separate the row data. Thanks again for your help.
Hi @rsimcoe,
the double quotes are correct, you use them inside a string in PowerQuery like in other programmic languages.
"foo ""bar""" is actually foo "bar".
I've improved the code, now you can have many columns which contain quoted strings.
let // source - header line and rows in one string Source = "account name, account code, level name, 2014,2015,2016,2017 ""1111 Account"",""foo bar"",315,100,150,150,200 ""11112 Account"",1112,315,200,250,100,100 ""11113 Account"",11113,315,500,500,500,500", // get position of the first quote - the position where header ends and rows data start PositionOfFirstQuote = Text.PositionOf(Source, """"), // header line doesn't contain strings in quotes and I have to process it separately HeaderLine = Text.Middle(Source, 0, PositionOfFirstQuote), // rows data Body = Text.Middle(Source, PositionOfFirstQuote), // headers are separated by comma, split them Headers = Text.Split(HeaderLine, ","), // create a table with one row and one column BodyTable = #table(1, {{Body}}), // split lines separated by space into columns (consider quotes too) BodyTableRowsAsColumns = Table.SplitColumn(BodyTable, "Column1", Splitter.SplitTextByDelimiter(" ", QuoteStyle.Csv)), // transpose table (switch rows and columns) BodyTableTransposed = Table.Transpose(BodyTableRowsAsColumns), // split every row in column by comma SplitToColumns = Table.SplitColumn(BodyTableTransposed, "Column1", Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv), Headers) in SplitToColumns
@NolockThank you so much for your help! This ended up working great. Much appreciated.
Glad to hear that 🙂