Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
diego_martinezn
Frequent Visitor

Losing a column when importing PDF

I'm working with a folder where I have some PDFs, all with the same structure. 

I've developed a function to just upload the files to that folder and automatically apply all transformations and operations to the table. However, I've noticed that power query sometimes recognize the PDF as 2 tables and sometimes as 3 tables. 

My main problem is that when it recognizes 3 tables, it skips one column of data that I need. 

Does this problem have any solution? 

Thank you

10 REPLIES 10
v-yanjiang-msft
Community Support
Community Support

Hi @diego_martinezn ,

Could you please share your PDF without any sensitive data here? It will helpful to solve your problem.

Best Regards,
Community Support Team _ kalyj

watkinnc
Super User
Super User

To prevent this, I always add a ColumnCount column that tells me the number of columns in each pdf table.  Then I sort by greatest to least.  This way, your sample file has the maximum number of columns, and when you expand them, you will no longer lose any data!

 

--Nate


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!

Hi @watkinnc , first of all, thank u for answering.

Here it's what I've done based on your answer. I add a personalized column where I count the number of columns of each table and sort them, so what I get is three tables sorted as follows: {21columns, 11 columns, 9columns}.

Then I click on the expand button, and as you said the whole new table has 21 columns but it fills with "nulls" where my missing data should be. 

I think this procedure is what you meant right? Maybe I'm skipping some steps or don't do it correctly. Let me know 🙂

edhans
Super User
Super User

I'm not sure about the PDF connector, but check out your M code in the Advanced editor. The first line for  a CSV import though might look like this:

= Csv.Document(#"C:\Users\FileList txt",[Delimiter=":", Columns=2, Encoding=1252, QuoteStyle=QuoteStyle.None])

It doesn't matter if that file gets a 3rd column added, it will always only read the first two. I remove that.

= Csv.Document(#"C:\Users\FileList txt",[Delimiter=":", Encoding=1252, QuoteStyle=QuoteStyle.None])

 

See if your PDF connection has that same setting.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Hi @edhans , thanks for answering :).

For the PDF the code generated automatically is this:

= Pdf.Tables(File.Contents("C:\whatever direction i have the file"), [Implementation="1.3"])

 Any thoughts on how to solve the issue?

Are all columns coming in, but your M code isn't handleing a 2 or 3 column table depending on the file? If so, check out this blog post. Removing/Selecting Columns - Pitfall #5 in #PowerBI and #PowerQuery - DataChant - particularly the MissingField.Ignore discussion.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Hi @edhans 

Depending on the file (all have the same structure) it recognizes 2 or 3 tables.

When there are 2 tables there are no problems, but when it recognizes 3 there is a column that is not loaded I don't know why. It's not that I deleted it during transformation or something like that, it is missing from the beginning.

Without code or samples, next to impossible for me to help here. Not sure what the issue is. How to get good help fast. Help us help you.

How To Ask A Technical Question If you Really Want An Answer

How to Get Your Question Answered Quickly - Give us a good and concise explanation
How to provide sample data in the Power BI Forum - Provide data in a table format per the link, or share an Excel/CSV file via OneDrive, Dropbox, etc.. Provide expected output using a screenshot of Excel or other image. Do not provide a screenshot of the source data. I cannot paste an image into Power BI tables.



Did I answer your question? Mark my post as a solution!
Did my answers help arrive at a solution? Give it a kudos by clicking the Thumbs Up!

DAX is for Analysis. Power Query is for Data Modeling


Proud to be a Super User!

MCSA: BI Reporting

Okay @edhans , I'll try to explain better the issue:

So, I have a folder with some PDFs, while loading them, power query sometimes detects 2 tables and sometimes 3 tables. PWBI_1.PNG

 I'm not going to focus a lot on this image cause when I get 2 tables there are no issues.

What we need to know here is that both tables are loaded with 21 columns and the same structure (after loading I need to do some transformations but that's not a problem).

Let's see when I get 3 tables:

PWBI_2.PNG

This is the case where I'm having problems. I get 3 tables instead of 2 for a different PDF, but it has the same structure as the previous one. 

In this particular case, the first table has 21 columns with some null columns I need to delete etc. After some transformations, I end up with these columns: {Delegaciones, Total Revenues, Personnel Expenses, Subcontract, Other Expenses, Total Expenses, Gross Margin B. Activity, Indirect Costs Location, Gross Margin Location, Indirect Costs Location, Overheads, MB3}. No issues with this first table.

The problems come with the second and third tables. First of all, instead of taking 21 columns with all the data (as it does with the first table or with the previous PDF) it takes in the second table 9 columns and in the third table 11 columns. 

The fact that doesn't let me continue is that when this happens, the column "Gross Margin B. Activity" disappears, it never gets loaded neither in table 2 or 3.

I hope I've explained myself and help you help me 🙂 

Depending on the file (all have the same structure) it recognizes 2 or 3 tables.

When there are 2 tables there are no problems, but when it recognizes 3 there is a column that is not loaded I don't know why. It's not that I deleted it during transformation or something like that, it is missing from the beginning.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors