Good evening!
I need to extract data from multiple PDF files (I'm trying to use the Folder connector), and each file has a different number of pages. My limited knowledge of M code is all the more evident during this process: I'm stuck on the "sample file" step, which asks me to select a "page" to transform. Basically, I can't work out how to import all the pages from each file so I can reach all the relevant data.
I've attached a zip file with three PDFs (each with a different number of pages) as a sample. I need to get access to all the pages from each file to be able to work on the transformations.
Any guidance will be a huge help!
Many thanks for your time.
Best,
Paul.
Proud to be a Super User!
Paul on Linkedin.
Hi Paul,
Did you resolve the problem above? There are 28 tables in your first file, so I just want to confirm whether all the PDF files have the same number of tables and the same structure. Also, do you want to extract all 28 tables or a specific one?
Thanks
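To check whether every file has the same number of pages and tables, a small helper query can list the counts per file. This is a minimal sketch: the folder path is a placeholder, and it assumes Pdf.Tables returns its navigation rows with a Kind column holding "Page" or "Table" (the same split the navigator shows as Page001, Table001, and so on).

```powerquery
// Hypothetical helper query: count the "Page" and "Table" entries Pdf.Tables finds in each PDF
let
    // Placeholder path; point it at your own folder
    Source = Folder.Files("C:\PDFs"),
    PdfFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".pdf"),
    // Navigation table per file: one row per detected page and per detected table
    WithNav = Table.AddColumn(PdfFiles, "Nav", each Pdf.Tables([Content], [Implementation = "1.3"])),
    WithPages = Table.AddColumn(WithNav, "Pages",
        each Table.RowCount(Table.SelectRows([Nav], (r) => r[Kind] = "Page")), Int64.Type),
    WithTables = Table.AddColumn(WithPages, "Tables",
        each Table.RowCount(Table.SelectRows([Nav], (r) => r[Kind] = "Table")), Int64.Type),
    // One summary row per file
    Summary = Table.SelectColumns(WithTables, {"Name", "Pages", "Tables"})
in
    Summary
```

If the Tables count differs between files, picking a fixed table Id (such as Table001) in the sample file is unlikely to work for every file.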
Hi all
Just joining the conversation now. Has a solution been found for multiple PDFs with multiple pages? I have exactly the same situation and am looking for a solution.
You could always delete the sample query, and also delete the invocation of the sample query in your main query. That leaves you with just a list of the tables, which you can then select individually with another query and transform separately.
--Nate
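The approach Nate describes can be sketched as a single query that calls Pdf.Tables on every file directly, with no sample file or helper function involved. The folder path is a placeholder, and the column names Id, Kind and Data are assumed from the Pdf.Tables navigation table:

```powerquery
// Sketch of the "no sample query" approach: list every page/table of every PDF
let
    // Placeholder path; point it at your own folder
    Source = Folder.Files("C:\PDFs"),
    PdfFiles = Table.SelectRows(Source, each Text.Lower([Extension]) = ".pdf"),
    // Call Pdf.Tables directly instead of going through a Transform File function
    WithNav = Table.AddColumn(PdfFiles, "Nav", each Pdf.Tables([Content], [Implementation = "1.3"])),
    Slim = Table.SelectColumns(WithNav, {"Name", "Nav"}),
    // One row per (file, page or table); [Data] still holds the nested table
    Expanded = Table.ExpandTableColumn(Slim, "Nav", {"Id", "Kind", "Data"})
in
    Expanded
```

Each row is one page or table of one file. You can filter on Kind or Id here, or reference this query and drill into a single [Data] table to transform it separately.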
@watkinnc Thanks for the suggestion, but I'm not sure what you mean. The source data is 34 PDF files with at least half a dozen pages each, where rows/text I don't need are mixed with data I need to transform.
When I use the Folder connector, I have to select a "page" in the sample file. That leads to the following code in the Transform File function:
= (Parameter1 as binary) =>
let
    Source = Pdf.Tables(Parameter1, [Implementation="1.3"]),
    // Only the first page (Id "Page001") is picked, so later pages are never read
    Page001 = Source{[Id="Page001"]}[Data],
    #"Changed Type1" = Table.TransformColumnTypes(Page001,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Changed Type1",{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}})
in
    #"Changed Type"
In this code the page is fixed to "Page001". The final query that loads the data starts with this code:
let
    Source = Folder.Files("D:\OneDrive - In2-Action.com\Biniarbolla\Informes MB\Estructura corta"),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    // Call the Transform File function on each file's binary content
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
    // Column names to expand are taken from the sample file's result only
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}})
in
    #"Changed Type"
This only loads the first page for each file. How can I change it to load every page from each file?
Many thanks!
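One way to get every page is to stop the Transform File function from picking a single page. A minimal sketch, assuming Pdf.Tables marks page entries with Kind = "Page": it keeps the page Id so each row can be traced back, and expands using the union of column names because pages can have different column counts.

```powerquery
// Sketch: Transform File returning every page of the PDF instead of only Page001
(Parameter1 as binary) =>
let
    Source = Pdf.Tables(Parameter1, [Implementation = "1.3"]),
    // Keep page-level entries only; "Table" entries are extracted from the same pages
    Pages = Table.SelectRows(Source, each [Kind] = "Page"),
    // Keep the page Id alongside the page content
    PagesOnly = Table.SelectColumns(Pages, {"Id", "Data"}),
    // Pages may have different column counts, so expand the union of all column names
    AllColumns = List.Union(List.Transform(PagesOnly[Data], Table.ColumnNames)),
    Expanded = Table.ExpandTableColumn(PagesOnly, "Data", AllColumns)
in
    Expanded
```

The two type-change steps are dropped here because later pages may not have exactly Column1 to Column8; set types once the final columns are known.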
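Even with every page returned, the main query expands only the columns found in the sample file (Table.ColumnNames(#"Transform File"(#"Sample File"))), so extra columns that appear only in other files are silently left out. A sketch of the main query that expands the union of column names from all files instead, assuming #"Transform File" has been changed to return all pages:

```powerquery
// Sketch: main query expanding the columns of every file, not just the sample file
let
    Source = Folder.Files("D:\OneDrive - In2-Action.com\Biniarbolla\Informes MB\Estructura corta"),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
    // Collect column names from every file's result
    AllColumns = List.Union(List.Transform(#"Removed Other Columns1"[Transform File], Table.ColumnNames)),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", AllColumns)
in
    #"Expanded Table Column1"
```

Columns missing from a given file come through as null for that file's rows, so any type changes are safer applied after this step.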