Re: Extracting multiple PDF files from a folder

PaulDBrown · ‎12-19-2022

Good evening!

I need to extract data from multiple PDF files (I'm trying to use the Folder connector), where each file has a different number of pages. My limited knowledge of M code is all the more evident during this process: I'm stuck on the "sample file" step, which asks me to select a "page" to transform. Basically, I can't work out how to import every page from each file so that I can get at all the "relevant" data.
I've attached a zip file with three PDFs (each with a different number of pages) as a sample. I need to get access to all the pages from each file to be able to work on the transformations.
Any guidance will be a huge help!

Many thanks for your time.

Best,

Paul.

Did I answer your question? Mark my post as a solution!
In doing so, you are also helping me. Thank you!

Proud to be a Super User!
Paul on Linkedin.

nidheshtiwari · ‎04-02-2023

Hi Paul,

Did you resolve the problem above? There are 28 tables in your first file, so I just want to confirm whether all the PDF files have the same number of tables and the same structure. Also, do you wish to extract all 28 tables or a specific table?

Thanks

tmhwk · ‎12-20-2022

Hi all

Just joining the conversation now. Has a solution been found for multiple PDFs with multiple pages? I have exactly the same situation and am looking for a solution.

watkinnc · ‎12-19-2022

You could always delete the sample query, and also delete the invocation of the sample query in your main query, which will give you just a list of the tables. You can then select them individually with another query and transform each one separately.
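A rough sketch of what that might look like, untested against the actual files. The folder path is a placeholder and the step names (VisibleFiles, AddedTables and so on) are made up for illustration. It assumes Pdf.Tables returns its usual navigation table with Id, Name, Kind and Data columns:

```powerquery
let
    // Placeholder path: substitute the folder that holds the PDFs
    Source = Folder.Files("C:\path\to\pdf-folder"),
    VisibleFiles = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    // Call Pdf.Tables directly on each file's binary instead of invoking the sample function
    AddedTables = Table.AddColumn(VisibleFiles, "Tables", each Pdf.Tables([Content], [Implementation="1.3"])),
    RenamedFile = Table.RenameColumns(AddedTables, {"Name", "Source.Name"}),
    KeptColumns = Table.SelectColumns(RenamedFile, {"Source.Name", "Tables"}),
    // One row per table or page found in each file; the contents stay nested in the Data column
    Expanded = Table.ExpandTableColumn(KeptColumns, "Tables", {"Id", "Kind", "Data"})
in
    Expanded
```

From there you can filter on Id or Kind and drill into the Data column of whichever rows you need.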

--Nate

I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

PaulDBrown · ‎12-20-2022

@watkinnc Thanks for the suggestion, but I'm not too sure what you mean. The source data is 34 PDF files with at least half a dozen pages each (where there are rows/text which I don't need mixed with data which I need to transform).

This is the interface I get when I select the folder connector. I need to select a "page" to access the sample file:

which leads to the following in the Transform file code:

= (Parameter1 as binary) => let
    Source = Pdf.Tables(Parameter1, [Implementation="1.3"]),
    Page001 = Source{[Id="Page001"]}[Data],
    #"Changed Type1" = Table.TransformColumnTypes(Page001,{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}}),
    #"Changed Type" = Table.TransformColumnTypes(#"Changed Type1",{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}})
in
    #"Changed Type"

where the code is hard-coded to "Page001". The final query that loads the data begins with this code:

Source = Folder.Files("D:\OneDrive - In2-Action.com\Biniarbolla\Informes MB\Estructura corta"),
    #"Filtered Hidden Files1" = Table.SelectRows(Source, each [Attributes]?[Hidden]? <> true),
    #"Invoke Custom Function1" = Table.AddColumn(#"Filtered Hidden Files1", "Transform File", each #"Transform File"([Content])),
    #"Renamed Columns1" = Table.RenameColumns(#"Invoke Custom Function1", {"Name", "Source.Name"}),
    #"Removed Other Columns1" = Table.SelectColumns(#"Renamed Columns1", {"Source.Name", "Transform File"}),
    #"Expanded Table Column1" = Table.ExpandTableColumn(#"Removed Other Columns1", "Transform File", Table.ColumnNames(#"Transform File"(#"Sample File"))),
    #"Changed Type" = Table.TransformColumnTypes(#"Expanded Table Column1",{{"Source.Name", type text}, {"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}, {"Column6", type text}, {"Column7", type text}, {"Column8", type text}}),

This only loads the first page for each file. How can I change it to load every page from each file?

Many thanks!

Poohkrd · ‎12-20-2022

Hi, Paul! Try this .pbix file as an example.

Extracting multiple PDF files from a folder
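
For anyone who cannot open the attachment, here is a minimal sketch of one way to change the Transform File function so that it returns every page instead of only Page001. This is not necessarily what the .pbix does. It assumes that the Kind column returned by Pdf.Tables marks whole pages as "Page", and the step names Pages and Combined are made up for illustration:

```powerquery
(Parameter1 as binary) =>
let
    Source = Pdf.Tables(Parameter1, [Implementation="1.3"]),
    // Keep every entry that represents a whole page, rather than picking Page001 by Id
    Pages = Table.SelectRows(Source, each [Kind] = "Page"),
    // Stack all pages into one table; Table.Combine matches columns by name
    // and fills with null where a page has fewer columns
    Combined = Table.Combine(Pages[Data])
in
    Combined
```

The main query can stay as it is, but note that its expand step takes the column names from the sample file only, so columns that appear only in other files would probably be dropped unless that step is adjusted.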