chezmo Occasional Visitor

Re: PDFs as a data source

Extracting table data from PDF documents can be really tricky, for example if a table spans several pages or if your PDF file is actually a scanned image. There are, however, PDF parser solutions on the market that can batch convert PDF to Excel. One I know of is called Docparser.

sharmv Visitor

Re: PDFs as a data source

Hello Jim. We need a solution that pulls a PDF document from a website on a recurring basis and imports it into Power BI.

Can you please provide further details on how the VBA solution can help?

Thanks.

Jim_Philips Frequent Visitor

Re: PDFs as a data source

As I mentioned in my reply to your Private Message, if I can get a copy of the PDF file and a good description of the information you would like to extract from it to Excel, I could provide a detailed response. 

Ehren Regular Visitor

Re: PDFs as a data source

Just to close the loop on this request: importing from PDF files is currently a preview feature in Power BI Desktop. Please try it out and let us know what you think!

 

Ehren

mike_honey
Advisor

Re: PDFs as a data source

I've used the new preview feature quite a bit on one project. After a pothole in the December update it is now (Feb 2019 update) working quite effectively. Due to the nature of the data source, it's always going to be more art than science and need a lot of supporting work in your queries, but this is a very good option to look at.

Jam54 Frequent Visitor

Re: PDFs as a data source

Now that the new release has made the PDF connector generally available (GA), I was wondering the following:

Is there a function or script that can automate the extraction of table values from PDFs, similar to scraping data from HTML websites, but for a bulk set of PDF files?

For example: if I select a folder of PDFs, can it look through all of them for tables containing the referenced values/words and download only those tables automatically?

I haven't found a function that enables this.

Does anyone have any advice?

Super User

Re: PDFs as a data source

@Jam54 ,

To my knowledge, all tables would be downloaded first, and you can then filter them by their content.

Just create a table with one URL in each row and add a column where you extract all tables first. Then add another column that filters the list of tables in that column (use Table.Contains if you want to search for a word in all columns).
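
The filter step described above can be sketched outside Power Query to make the logic concrete. The following is a minimal Python illustration, not M code and not the connector's API: it assumes the tables have already been extracted and are held as plain lists of rows. The URLs, the sample data, and the function names `table_contains` and `filter_tables` are all hypothetical.

```python
from typing import Dict, List

# A table is a list of rows; each row is a list of cell strings.
Table = List[List[str]]


def table_contains(table: Table, keyword: str) -> bool:
    """Return True if any cell in any column contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return any(needle in cell.lower() for row in table for cell in row)


def filter_tables(tables_by_source: Dict[str, List[Table]], keyword: str) -> Dict[str, List[Table]]:
    """Keep, per source, only the tables that mention the keyword.

    Sources with no matching table are dropped from the result.
    """
    result: Dict[str, List[Table]] = {}
    for source, tables in tables_by_source.items():
        matches = [t for t in tables if table_contains(t, keyword)]
        if matches:
            result[source] = matches
    return result


# Hypothetical, hand-written stand-in for the "all tables per URL" column.
extracted = {
    "https://example.com/a.pdf": [
        [["Region", "Revenue"], ["North", "120"]],
        [["Name", "Phone"], ["Ann", "555-0100"]],
    ],
    "https://example.com/b.pdf": [
        [["Item", "Qty"], ["Bolt", "40"]],
    ],
}

# Second step: filter the extracted tables by a word of interest.
kept = filter_tables(extracted, "revenue")
print(kept)
```

The point of the two-step shape is that extraction and filtering stay separate: every table is pulled once, and the keyword test can be changed afterwards without re-reading the files.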

If you liked my solution, please give it a thumbs up. And if I did answer your question, please mark my post as a solution. Thanks!

Proud to be a Datanaut!

Imke Feldmann

dfox New Member

Re: PDFs as a data source

Does anyone know when this functionality will be available for Power Query in Excel?

guyhunkin Occasional Visitor

Re: PDFs as a data source

Hi,

My team is currently working on this. I hope the PDF connector in Excel will be available for Office 365 subscribers early next year.

Guy

- Excel Team
