Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
SamTrexler
Helper IV
Helper IV

Extracting data from multiple XML files

I'm following the method in http://exceleratorbi.com.au/combine-excel-workbooks-power-query-method-1/ to extract fields from multiple XML files in a directory structure, and it works great. I've created three functions to process the three types of XML files I encounter. Each works perfectly for the files of it's type - including chasing down multiple files in folders and sub-folders of a directory tree.

 

However, I'm having trouble putting it all together into a single query of the top level folder that will call one of the three functions to process each file. 

 

Problem: A simple syntax issue, probably - I'm fairly new to M language. All of the files have similar structure, and the same preliminary steps are needed for all of them (open the source, remove and rename some columns, etc.), so I created a function that performs those steps and then will call one function or another, based on data it has found.

 

The preliminary steps result in a single column, with a table type that contains the portion of the XML document I'm interested in. I'd like to pass that table to any of the three functions (so I don't have to open the file again, etc.), but I can't seem to get the syntax right. All of the functions take a single parameter of type table, but I'm not seeing how to pass the table in and how to use it as the source table for the first  statement in the function. Everything I've tried hasn't worked and I haven't found any examplse of how to do this.

 

I'm also wondering about the best way to determine which function to call. One set of files has a particular extension so they're easy to identify and process. But the two others have the same file extension, so I need to see if the file contains a particular node in order to determine which of the other two functions to call. Can anyone guide me in how to set up the code for this? What are the steps to perform this sort of nested conditional logic?

 

Thanks in advance.

 

Sam Trexler

1 ACCEPTED SOLUTION

Hi,

 

I changed my xml to make my returned columns looks similiar as yours.

 

1.png

 

Formula Query4:

 

3.png

Query3 without invoking Query4:

 

5.png

This is the result:

 

6.png

 

Query3 invoking Query4:

2.png

This is the result:

 

4.png

 

Query4 works well in my local. The second column's name has updated to "init-param".

So I hope you can change your parameter of Query4 to table and try again.

 

Best Regards

Alex

View solution in original post

7 REPLIES 7
SamTrexler
Helper IV
Helper IV

@AlexChen, by the way - in the previous post I was a little unclear at one point. If I remove the "DataSources" column to pass just "DataSets", it works if I make the definition of Query4 

 

(DataSetTable as table) =>

 

and change the name of the table in the function call appropriately. In the sample I gave, I've changed it to "list" to try to get the whole record. I also tried changing it to "record".

 

I think what I need is the right definition for the function, and the right syntax in the function call to pass the record. But I haven't been able to figure that out.

Hi,

 

You need to define your parameter using table type:  (DataSetTable as table) =>.

 

1.png

 

Because the return type of Table.RemoveColumns() is a table and there is no automatical conversion for table/list/record. You have to convert it yourself.

 

2.png

 

I changed your code in Query4 to show parameter only.

 

3.png


This is my xml:

 

4.png

 

This is the result:

 

5.png

 

If you defined you parameter of Query4 using record type parameter,

 

6.png

 

You must change your input parameter to record. See sample code below.

 

7.png

 

This is the result:

 

8.png

 

It is the same for list type. If you defined you parameter of Query4 using list type parameter,

 

9.png

 

You have to send a list type value to it.

 

10.png

 

This is result:

 

11.png

 

Best Regards

Alex

@AlexChen, thanks. That helps me understand the language a little better.

 

But your example is basically different from mine. You have three columns, each a text field, and have to convert it to a table in order to pass it to Query4 (when that query accepts a table as the parameter).

 

However, my example already has the columns containing tables. As you can see in the screen shot highlighting "Removed Columns1", there are two columns and each contains a table. I can click the icon at the upper right of each column to expand the table. But I don't because I want to pass each of these as a table and let Query4 deal with the entire table.

 

If I remove the "DataSources" column also (in the "Removed Columns1" step), that leaves just the "DataSets" column which contains a table. And the step to execute Query4 works perfectly: "Removed Columns1" is a table which displays as a row with one column that contains a table that can be expanded, and that table is passed to Query4 perfectly.

 

However, if i keep the "DataSources" column so that the row has two columns each of which contains a table, I can't find a way to pass that row to Query4 so it can process the two tables just as if it had opened the XML file and removed the extraneous columns.

 

Any ideas how to do this? I need it to pass a flexible record structure (e.g., sometimes the "DataSources" column will be present and sometimes it won't) and let Query4 handle that structure.

 

Thanks for your help.

Hi,

 

I changed my xml to make my returned columns looks similiar as yours.

 

1.png

 

Formula Query4:

 

3.png

Query3 without invoking Query4:

 

5.png

This is the result:

 

6.png

 

Query3 invoking Query4:

2.png

This is the result:

 

4.png

 

Query4 works well in my local. The second column's name has updated to "init-param".

So I hope you can change your parameter of Query4 to table and try again.

 

Best Regards

Alex

@AlexChen, it works perfectly now. I know I tested with Query4 taking a table as a parameter - even as recently as yesterday. I'm not sure what's changed, but it's working now with the help you've given. Thanks so much.

AlexChen
Employee
Employee

Hi,

 

Can you show me the detailed context about your 3 functions and its parameters? What table are you going to pass to the 3 functions ?

 

If possible also please share me a sample code and tell me what syntax issue or error message have you got.

 

Best Regards

Alex

 

@AlexChen, thanks for your reply.

 

The context is extracting data from SSRS RDL files - which are XML files. Specifically, I want to retrieve the query in each file. So I've set up a top-level query that retrieves the directory listing, and filters it to *.rdl. Then I open each file and begin parsing it.

 

That works great - as long as the query is in the RDL file itself. But if it uses a Shared Dataset, the nodes in the RDL file are different, and I need to retrieve the query from the linked RSD file.

 

So here's the query that processes a file. (Note - the real query will pass in the path and file name - I have that working, but I'm using this for testing.)

 

Query3:

let
Source = Xml.Tables(File.Contents("C:\Users\sam.trexler\Documents\Reporting\NY Sample Reports\State\Amanda C's Folder\All DEC 2014 MY 2001.rdl")),
#"Removed Columns" = Table.RemoveColumns(Source,{"AutoRefresh"}),
#"Removed Columns1" = Table.RemoveColumns(#"Removed Columns",{"ReportSections", "http://schemas.microsoft.com/SQLServer/reporting/reportdesigner", "ReportParameters"}),
#"Process RDL" = Query4(#"Removed Columns1")
in
#"Process RDL"

 

And here's the start of the function to process the file:

 

Query4:

(DataSetTable as list) =>
let
#"Expanded DataSets" = Table.ExpandTableColumn(DataSetTable, "DataSets", {"DataSet"}, {"DataSets.DataSet"}),
#"Expanded DataSets.DataSet" = Table.ExpandTableColumn(#"Expanded DataSets", "DataSets.DataSet", {"Attribute:Name", "Query"}, {"DataSets.DataSet.Attribute:Name", "DataSets.DataSet.Query"}),
#"Expanded DataSets.DataSet.Query" = Table.ExpandTableColumn(#"Expanded DataSets.DataSet", "DataSets.DataSet.Query", {"CommandText"}, {"DataSets.DataSet.Query.CommandText"}),
#"Split Column by Delimiter" = Table.SplitColumn(#"Expanded DataSets.DataSet.Query","DataSets.DataSet.Query.CommandText",Splitter.SplitTextByDelimiter("SELECT", QuoteStyle.Csv),{"DataSets.DataSet.Query.CommandText.1", "DataSets.DataSet.Query.CommandText.2"}),
#"Removed Columns2" = Table.RemoveColumns(#"Split Column by Delimiter",{"DataSets.DataSet.Query.CommandText.1"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns2",{{"DataSets.DataSet.Query.CommandText.2", "ColumnList"}}),
#"Split Column by Delimiter1" = Table.SplitColumn(#"Renamed Columns","ColumnList",Splitter.SplitTextByDelimiter("FROM", QuoteStyle.Csv),{"ColumnList.1", "ColumnList.2"}),
#"Renamed Columns1" = Table.RenameColumns(#"Split Column by Delimiter1",{{"ColumnList.1", "ColumnList"}, {"ColumnList.2", "TableList"}})
in
#"Renamed Columns1"

 

Now here's the problem: At step "Removed Columns1" Query3 looks like this:

 

Query3.jpg

 

I've found that if I elminate the DataSources column, I can pass the DataSets table just fine, and process it. But I'd really like to pass the whole row. And here's the error I get:

Query3Error.jpg

 

I haven't been able to figure out the syntax to pass a record of tables - neither "list" nor "record" nor "table" work in the definition of Query4, regardless of the syntax I use in the call to the function. Is there a way to do this that I haven't tried?

 

Once I pass the whole record in and perform some processing on it, I'll need to check to see if the "DataSet" node exists, or the "SharedDataSet" node. Based on an "if" statement, I'll need to call one of two other functions to do the remaining processing. I haven't written an "if" yet, but I think I can handle that.

 

Any help is greatly appreciated. Thanks,

 

Sam

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.