Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
Hello,
I have a use case that I would like some help with
Every period, I will receive a PDF statement, and my task is to extract the content of that PDF and populate it to Excel.
The problem is, sometimes there might be slight deviation in the PDF statement after I initially read it using power query, for example
Variation 1
Column1 | Column2 | Column3 | Column4 |
xx | xx | xx | xx |
xx | xx | xx | xx |
xx | xx | xx | xx |
Name | Date | Class | Category |
abc | 1-Jan | xyz | dd |
Variation 2
Column1 | Column2 | Column3 | Column4 | Column 5 |
xx | xx | xx | xx | xx |
xx | xx | xx | xx | xx |
xx | xx | xx | xx | xx |
xx | Name | Date | Class | Category |
abc | abc | 1-Jan | xyz | dd |
What I need is to find the position of the row with value "Name", and then delete whatever rows are above them.
Is there a way in power query to use an IF statement to update a variable?
For example,
If I read "column1" for any value with "Name", I wouldn't find it and it will get the position of -1. If such case, then I want power query to find row with "Name" on "column2".
Thank you
Solved! Go to Solution.
pos = List.PositionOf(
Table.ToRows(Source),
"Name",
Occurrence.First,
(x, y) => List.Contains(x, y)
)
finds first occurrence of "Name" in ANY column. If you really want narrow your search by first 2 columns (and make it less dynamic) then make this change in the code: List.Contains(List.FirstN(x, 2), y)
Hi @npust333, different approach here.
Before
After (add the code as a new step and replace Source with your previous step reference)
Table.Skip(Source, each not List.Contains(Record.ToList(_), "Name"))
Whole code with sample data
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("i45WqqhQ0sFCxOpQJOWXmJsKpFwSS0CUc05icTGIBnLT84sqweoSk5KBQhDSUNcrMQ9kRGUVkExJUYqNBQA=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Column1 = _t, Column2 = _t, Column3 = _t, Column4 = _t, Column5 = _t]),
#"Removed Top Rows" = Table.Skip(Source, each not List.Contains(Record.ToList(_), "Name"))
in
#"Removed Top Rows"
pos = List.PositionOf(
Table.ToRows(your_table),
{"Name", "Date", "Class", "Category"},
Occurrence.First,
(x, y) => List.ContainsAll(x, y)
)
Hi @AlienSx , thanks for the answer.
Is there a way to make it dynamic as we don't know how many columns will be there?
One pattern i notice from the files i received is that the "Name" value that i will look at is either in column1 or column 2 only, so the search can be reduced to only that scope
pos = List.PositionOf(
Table.ToRows(Source),
"Name",
Occurrence.First,
(x, y) => List.Contains(x, y)
)
finds first occurrence of "Name" in ANY column. If you really want narrow your search by first 2 columns (and make it less dynamic) then make this change in the code: List.Contains(List.FirstN(x, 2), y)
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.