cancel
Showing results for 
Search instead for 
Did you mean: 
Reply
lizlizliz
Regular Visitor

Pulling data from multiple pdf forms, into workable format for PBI/PQ?

Hi all,
I have a set of completed pdf forms from which I need to extract the data and generate some analysis.

I have used acrobat's "merge to excel" function, which results in a single sheet with a row for each question and a separate column of responses for each file.

This format has proven difficult to analyse, so in the past I've manually moved the data around so all the responses are in a single column, in order to be able to graph it / chuck into a pivot table, etc.

 

0KXMw.png

Image description: example demonstrating how data looks when merged from pdf (table A) and also how it looks after I've manually reformatted it (table B).

 

Now, this isn't a particularly elegant solution, but it has worked ok for our purposes to date. However, our volume of data is about to increase from half a dozen files to around 100, so I really don't want to be dealing with having to reformat data manually in that volume.

I know I could throw some VBA at the situation to automate reformatting the data, however I feel like this shouldn't be necessary at all - that I ought  to be able to work with the data in its existing format using power query... however I just can't figure out how to make this work.

Can anyone please explain to me how to get data from table A in a way that can be used in PBI (or power pivot)? Or point me to any useful resources if this is well documented somewhere I've not managed to google my own way towards...

 

Many thanks in advance.

 

(nb: i've also asked this over on stackexchange, so please forgive me if you are seeing the question twice!)

1 ACCEPTED SOLUTION
Eyelyn9
Community Support
Community Support

Hi @lizlizliz ,

 

Data sample:

Eyelyn9_0-1634190300313.png

 

1.Get data from Excel

Eyelyn9_1-1634190351697.png

 

2.Choose Transform Data to go into Power Query dialog.

Eyelyn9_2-1634190402072.png

 

3.Click Column1 ,then choose "Unpivot other columns"

Eyelyn9_3-1634190473419.png

4. Below is the final output after changing column names

Eyelyn9_4-1634190631905.png

Best Regards,
Eyelyn Qin
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

2 REPLIES 2
Eyelyn9
Community Support
Community Support

Hi @lizlizliz ,

 

Data sample:

Eyelyn9_0-1634190300313.png

 

1.Get data from Excel

Eyelyn9_1-1634190351697.png

 

2.Choose Transform Data to go into Power Query dialog.

Eyelyn9_2-1634190402072.png

 

3.Click Column1 ,then choose "Unpivot other columns"

Eyelyn9_3-1634190473419.png

4. Below is the final output after changing column names

Eyelyn9_4-1634190631905.png

Best Regards,
Eyelyn Qin
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

Ehren
Microsoft
Microsoft

It sounds like you're looking for Unpivot. Try searching for this and you should find a number of helpful tutorials.

 

One other tip: you can use the Folder connector to connect to the PDF files, transform them using one of them as an example, apply the example file transformations to all the files, and then combine the results together into a single table.

Helpful resources

Announcements
UG GA Amplification 768x460.png

Launching new user group features

Learn how to create your own user groups today!

Power BI October Update 2021.jpg

Power BI Release

Click here to read more about the October 2021 Release!

Community Connections 768x460.jpg

Community & How To Videos

Check out the new Power Platform Community Connections gallery!

Teds Dev Camp Oct. 2021 768x460.jpg

Power BI Dev Camp - October 28th, 2021

Mark your calendars and join us for our next Power BI Dev Camp!

Top Solution Authors
Top Kudoed Authors