Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
lizlizliz
Regular Visitor

Pulling data from multiple pdf forms, into workable format for PBI/PQ?

Hi all,
I have a set of completed pdf forms from which I need to extract the data and generate some analysis.

I have used acrobat's "merge to excel" function, which results in a single sheet with a row for each question and a separate column of responses for each file.

This format has proven difficult to analyse, so in the past I've manually moved the data around so all the responses are in a single column, in order to be able to graph it / chuck into a pivot table, etc.

 

0KXMw.png

Image description: example demonstrating how data looks when merged from pdf (table A) and also how it looks after I've manually reformatted it (table B).

 

Now, this isn't a particularly elegant solution, but it has worked ok for our purposes to date. However, our volume of data is about to increase from half a dozen files to around 100, so I really don't want to be dealing with having to reformat data manually in that volume.

I know I could throw some VBA at the situation to automate reformatting the data, however I feel like this shouldn't be necessary at all - that I ought  to be able to work with the data in its existing format using power query... however I just can't figure out how to make this work.

Can anyone please explain to me how to get data from table A in a way that can be used in PBI (or power pivot)? Or point me to any useful resources if this is well documented somewhere I've not managed to google my own way towards...

 

Many thanks in advance.

 

(nb: i've also asked this over on stackexchange, so please forgive me if you are seeing the question twice!)

1 ACCEPTED SOLUTION
v-eqin-msft
Community Support
Community Support

Hi @lizlizliz ,

 

Data sample:

Eyelyn9_0-1634190300313.png

 

1.Get data from Excel

Eyelyn9_1-1634190351697.png

 

2.Choose Transform Data to go into Power Query dialog.

Eyelyn9_2-1634190402072.png

 

3.Click Column1 ,then choose "Unpivot other columns"

Eyelyn9_3-1634190473419.png

4. Below is the final output after changing column names

Eyelyn9_4-1634190631905.png

Best Regards,
Eyelyn Qin
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

View solution in original post

2 REPLIES 2
v-eqin-msft
Community Support
Community Support

Hi @lizlizliz ,

 

Data sample:

Eyelyn9_0-1634190300313.png

 

1.Get data from Excel

Eyelyn9_1-1634190351697.png

 

2.Choose Transform Data to go into Power Query dialog.

Eyelyn9_2-1634190402072.png

 

3.Click Column1 ,then choose "Unpivot other columns"

Eyelyn9_3-1634190473419.png

4. Below is the final output after changing column names

Eyelyn9_4-1634190631905.png

Best Regards,
Eyelyn Qin
If this post helps, then please consider Accept it as the solution to help the other members find it more quickly.

Ehren
Employee
Employee

It sounds like you're looking for Unpivot. Try searching for this and you should find a number of helpful tutorials.

 

One other tip: you can use the Folder connector to connect to the PDF files, transform them using one of them as an example, apply the example file transformations to all the files, and then combine the results together into a single table.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors