Find everything you need to get certified on Fabric—skills challenges, live sessions, exam prep, role guidance, and a 50 percent discount on exams.
Get startedEarn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.
Hi Experts,
I have the following messy data and unorganized column extracted from PDF which requires separating the data in [ ] to another column.
For example,
AF-20KC - Airframe 20,000 FC Structural Check - [SN: 556] -
Result: AF-20KC - Airframe 20,000 FC Structural Check in column 1 and 556 in another column.
Also, I have these rows in which I don't want -Correction *** words. I need only upto SN: 556 and then separating into another column.
AF-192M - Airframe 192 Month Structural Check - [SN: 556] -Correction from period 01-Dec-2023 to 31-Dec-2023
There are some rows which don't have space for the SN like below. I would like to separate SN from that row too.
EN-PR - EN-PR - Engine Full Performance Restoration
P783272]
The complete data is below:
Description |
AF-96M - Airframe 96 Month Structural Check - [SN: 555] |
AF-BC - Airframe Basic Check - [SN: 555] |
AF-20KC - Airframe 20,000 FC Structural Check - [SN: 555] |
LG-OH - LG-OH - Landing Gear Leg Overhaul |
AF-192M - Airframe 192 Month Structural Check - [SN: 555] |
AP-OH - AP-OH - APU Overhaul |
EN-PR - EN-PR - Engine Full Performance Restoration P783243] |
EN-LP - EN-LP - Engine LLP Replacement |
EN-PR - EN-PR - Engine Full Performance Restoration P783245] |
LG-OH - LG-OH - Landing Gear Leg Overhaul - [SN: 556] -Correction from period 01-Dec-2023 to 31-Dec-2023 |
AF-192M - Airframe 192 Month Structural Check - [SN: 556] -Correction from period 01-Dec-2023 to 31-Dec-2023 |
AF-BC - Airframe Basic Check - [SN: 556] - Correction from period 01-Dec-2023 to 31-Dec-2023 |
AF-96M - Airframe 96 Month Structural Check - [SN: 556] -Correction from period 01-Dec-2023 to 31-Dec-2023 |
AF-20KC - Airframe 20,000 FC Structural Check - [SN: 556] - |
Correction from period 01-Dec-2023 to 31-Dec-2023 |
AP-OH - AP-OH - APU Overhaul - [SN: PWC-ZD0103] - Correction from period 01-Dec-2023 to 31-Dec-2023 |
- [SN: EN-LP - EN-LP - Engine LLP Replacement P783246]Correction from period 01-Dec-2023 to 31-Dec-2023 |
- [SN: EN-PR - EN-PR - Engine Full Performance Restoration P783246] - Correction from period 01-Dec-2023 to 31-Dec-2023 |
- [SN: P783255] EN-LP - EN-LP - Engine LLP Replacement Correction from period 01-Dec-2023 to 31-Dec-2023 |
- [SN: EN-PR - EN-PR - Engine Full Performance Restoration P783255] - Correction from period 01-Dec-2023 to 31-Dec-2023 |
AF-96M - Airframe 96 Month Structural Check - [SN: 557] |
AF-192M - Airframe 192 Month Structural Check - [SN: 557] |
AF-20KC - Airframe 20,000 FC Structural Check - [SN: 557] |
AF-BC - Airframe Basic Check - [SN: 557] |
EN-PR - EN-PR - Engine Full Performance Restoration P783258] |
EN-PR - EN-PR - Engine Full Performance Restoration P783259] |
AF-BC - Airframe Basic Check - [SN: 558] |
AF-96M - Airframe 96 Month Structural Check - [SN: 558] |
AF-192M - Airframe 192 Month Structural Check - [SN: 558] |
AF-20KC - Airframe 20,000 FC Structural Check - [SN: 558] |
EN-PR - EN-PR - Engine Full Performance Restoration P783257] |
EN-PR - EN-PR - Engine Full Performance Restoration P783260] |
AF-96M - Airframe 96 Month Structural Check - [SN: 561] |
AF-192M - Airframe 192 Month Structural Check - [SN: 561] |
AF-20KC - Airframe 20,000 FC Structural Check - [SN: 561] |
AF-BC - Airframe Basic Check - [SN: 561] |
EN-PR - EN-PR - Engine Full Performance Restoration P783274] |
EN-PR - EN-PR - Engine Full Performance Restoration P783277] |
AF-192M - Airframe 192 Month Structural Check - [SN: 559] |
AF-96M - Airframe 96 Month Structural Check - [SN: 559] |
AF-192M - Airframe 192 Month Structural Check - [SN: 556] |
AF-BC - Airframe Basic Check - [SN: 556] |
AF-96M - Airframe 96 Month Structural Check - [SN: 556] |
AF-20KC - Airframe 20,000 FC Structural Check - [SN: 556] |
EN-PR - EN-PR - Engine Full Performance Restoration P783246] |
EN-PR - EN-PR - Engine Full Performance Restoration P783255] |
AF-96M - Airframe 96 Month Structural Check - [SN: 560] |
AF-192M - Airframe 192 Month Structural Check - [SN: 560] |
AF-20KC - Airframe 20,000 FC Structural Check - [SN: 560] |
AF-BC - Airframe Basic Check - [SN: 560] |
EN-PR - EN-PR - Engine Full Performance Restoration P783247] |
EN-PR - EN-PR - Engine Full Performance Restoration P783271] |
AF-20KC - Airframe 20,000 FC Structural Check - [SN: 559] |
AF-BC - Airframe Basic Check - [SN: 559] |
EN-PR - EN-PR - Engine Full Performance Restoration P783272] |
EN-PR - EN-PR - Engine Full Performance Restoration P783273] @ |
@dufoq3 Thanks! It works now. Just last thing!
"Restoration" word for the Engine Full Performance is going into the second column as well. Whereas, I would like it to remain in the first column.
EN-PR - EN-PR - Engine Full Performance | Restoration P783243 |
Sample data does have those rows. For example, if you search this text in Bold in the dataset, you will find the related row. I am also providing the sample data again.
EN-PR - EN-PR - Engine Full Performance Restoration
P783277]
One additional question.
The sample data I had provided had only one column but in my actual data set, I have multiple columns which I want to populate in the final step. Is it the group by where the column names have to be added to show them as well? Please review my sample data.
DescriptionUtilisationColumn4RateAmount
|
@vineshparekh, try this. If there is still some issue, upload your pdf somewhere and provide me a download link please.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("1VlbT9swGP0rFs+u5ftlb1BWkFZYBZsmDfUhKgGq9YJC2e+fY6dt0lwaXG/SIiDYgu98t3P8JX14ODsfDYy8AQNwPs+esmSZAiPBzXq1eQH3m+x9tnnPkgUYvqSzX/aPHu5vPwEhxPQMnhGEsb3dfLu2P4G7FKZ+E2yvYmcKHdLFsAx0kbzNZ42mKSeI5nZG12VjTJWsa2goR3xnm+IvFesUQ4wxGA2PxWGEtzoalrEAcMZLO5zobSjjq8HXa2tmd09Wj/PVM7hKkwyM02fw9XeavSTvi4Y0UShFOUu7dREHMbRSDrsOrock+KAexY7Dmnjn9/fvZb9tWkj+r3a7UgQqKs4zpZDwBj/fDiZ31tDuvnqer1Iwel8swCTNntbZMlnNUnCXvm3WWbKZr1fWzv4rtzFRmlHO2pqAsD06o1DIXeYs6Hjiwf3dg4/t4i59XSSzdJmuNs3lJrkZ7ZdEQGYUkrFDau3r9pB6d9m+EeQUDIbrLEtnuSvgKVsvwWuazdePAJPBZTqzNKEMbNaA7Zf73gF/r0XjeHa0p3tpTO4LOHCmVrY+zlGJkfsFnLXJlIGSaiTI1r2Pi220zB1T5xAFzZ3LSWUOwZol1PZ6wayQiEDl+5iGbZ2c/BgOfl5iglm17LWK9/bDYMRq0R7oIhU7XSz86K1QTfIhpyEJq5elInUSYi6QoQdeRlK8Jpb18rqBVRWRFJAw29yi7LaDtOdgaJb/t+zmof6D7AYJlsqPOozM8elQWiR90rHShlU/JQQ3J2qdw6L9hsX80qXYepxKzjwhvoE6Bl8Hhw0yMu6IInTugPS57Jz6XM8oGn3uE6YlBeUGtSvI7RD3odzqjtK5awBAw67fDyaCDgonhAYx4guhhG6iX318ZFpuA4zXLAVfMJKqmzAcYsORMnHxJfaxO/RumePKDgw8uI8kaUGqixzHGqlTOsljcdxP5YiSp2lqAVccpJ2iQyD1696s98abOqRMQMKhsfFyEbc9FO9KZC9OxvOllSrtiQgUInNy1EFKa5oYcqhDBCrbr5qfEp/0QJwdJT3DdraRH3oqbTZdZzm1YybD4dlySET1nWQUL51OYQ+J1qAQiNJuoaZ2YmaI8MivXhw8Nf7Ny9HBRrHd24SY03pzBqqTgIJc8JJ+f/ykwF2V7Uu/kGMDBwUYcmJECbHH4YGb6FiXFGLFjkWexHnxVEOOvfnK96I3qyKN6HWFy/dyUp2gDKYOdZhiR1IpSkA9tNS0fUBBGcKyQne/EzeD1ClO/vFEw7OcRoRsNzQUlhCGRcZ378916+caglUdEu5xf/oH", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Description = _t, Utilisation = _t, Column4 = _t, Rate = _t, Amount = _t]),
SourceColNames = List.Buffer(Table.ColumnNames(Source)),
StepBack = Source,
TrimmedText = Table.TransformColumns(StepBack, List.Transform(SourceColNames, (colName)=> {colName, Text.Trim, type text})),
ReplacedValue = Table.ReplaceValue(TrimmedText, null, null,
(x,y,z)=> if List.Contains({"", "-"}, x) then null else x,
SourceColNames),
AddedIndex = Table.AddIndexColumn(ReplacedValue, "Index", 0, 1, Int64.Type),
Ad_GroupHelper = Table.AddColumn(AddedIndex, "GroupHelper", each if (try Text.Range([Description], 2, 1) otherwise null) = "-" then [Index] else null, type text),
FilledDown = Table.FillDown(Ad_GroupHelper,{"GroupHelper"}),
GroupedRows = Table.Group(FilledDown, {"GroupHelper"}, {{"Description", each Text.Combine([Description], " "), type text}, {"FirstRow", each Table.FirstN(Table.SelectColumns(Table.FillUp(_, SourceColNames), List.Skip(SourceColNames)), 1) , type table}}),
ExtractedTextBeforeDelimiter = Table.TransformColumns(GroupedRows, {{"Description", each
[ a = Text.PositionOf(_, "]", Occurrence.First) +1,
b = Text.Range(_, 0, if a > 0 then a else null )
][b], type text}}),
Ad_SN = Table.AddColumn(ExtractedTextBeforeDelimiter, "SN", each
[ a = Text.PositionOf([Description], "[SN") +1,
b = Text.PositionOf([Description], " ", Occurrence.Last),
c = Text.Range([Description], if a > 0 then a else if Text.Contains([Description], "]") then b else Text.Length([Description]))
][c], type text),
CleanDescription = Table.ReplaceValue(Ad_SN,
each [SN],
null,
(x,y,z)=> if y = "" then x else Text.TrimEnd(Text.Replace(x, y, ""), {"[", " ", "-"}),
{"Description"}),
CleanSN = Table.ReplaceValue(CleanDescription,
null,
null,
(x,y,z)=> if x = "" then null else Text.Replace(Text.Trim(x, {" ", "]"}), "SN: ", ""),
{"SN"}),
ReorderedColumns = Table.ReorderColumns(CleanSN,{"GroupHelper", "Description", "SN", "FirstRow"}),
ExpandedFirstRow = Table.ExpandTableColumn(ReorderedColumns, "FirstRow", {"Utilisation", "Column4", "Rate", "Amount"}, {"Utilisation", "Column4", "Rate", "Amount"}),
RemovedGroupHelper = Table.RemoveColumns(ExpandedFirstRow,{"GroupHelper"}),
ChangedType = Table.TransformColumnTypes(RemovedGroupHelper,{{"SN", type text}, {"Column4", type text}, {"Utilisation", Currency.Type}, {"Rate", Currency.Type}, {"Amount", Currency.Type}}, "en-US")
in
ChangedType
Thanks @dufoq3
This step is pulling null in the "GroupHelper" column and because of that, description gets combined for all those lines. I have attached screenshot of those lines. Thanks!
Table.AddColumn(#"Filtered Rows", "GroupHelper", each if (try Text.Range([Description], 2, 1) otherwise null) = "-" then [Index] else null, type text)
Hi @dufoq3
Thank you! I am sharing the actual data and the results with the queries.
Source Data: (Two Months)
Source.Name | Description | Column2 | Utilisation | Column4 | Rate | Amount |
Dec23-555.pdf | LG-OH - LG-OH - Landing Gear Leg Overhaul - [SN: 190555] -Correction from period 01-Dec-2023 to 31-Dec-2023 | 1.00 | 2652 | 2652 | ||
Dec23-555.pdf | AF-192M - Airframe 192 Month Structural Check - [SN: 190555] -Correction from period 01-Dec-2023 to 31-Dec-2023 | 1.00 | 610 | 610 | ||
Dec23-555.pdf | AF-BC - Airframe Basic Check - [SN: 190555] - Correction from period 01-Dec-2023 to 31-Dec-2023 | 260.23 | 37 | 9628.51 | ||
Dec23-555.pdf | AF-96M - Airframe 96 Month Structural Check - [SN: 190555] -Correction from period 01-Dec-2023 to 31-Dec-2023 | 1.00 | 702 | 702 | ||
Dec23-555.pdf | AF-20KC - Airframe 20,000 FC Structural Check - [SN: 190555] - | 99.00 | 4.4 | 435.6 | ||
Dec23-555.pdf | Correction from period 01-Dec-2023 to 31-Dec-2023 | |||||
Dec23-555.pdf | AP-OH - AP-OH - APU Overhaul - [SN: PWC-ZD0103] - Correction from period 01-Dec-2023 to 31-Dec-2023 | 90.30 | 25 | 2257.5 | ||
Dec23-555.pdf | - [SN: EN-LP - EN-LP - Engine LLP Replacement P783246]Correction from period 01-Dec-2023 to 31-Dec-2023 | 99.00 | 162.08 | 16045.92 | ||
Dec23-555.pdf | - [SN: EN-PR - EN-PR - Engine Full Performance Restoration P783246] - Correction from period 01-Dec-2023 to 31-Dec-2023 | 260.23 | 135 | 35131.05 | ||
Dec23-555.pdf | - [SN: P783255] EN-LP - EN-LP - Engine LLP Replacement Correction from period 01-Dec-2023 to 31-Dec-2023 | 99.00 | 162.08 | 16045.92 | ||
Dec23-555.pdf | - [SN: EN-PR - EN-PR - Engine Full Performance Restoration P783255] - Correction from period 01-Dec-2023 to 31-Dec-2023 | 260.23 | 135 | 35131.05 | ||
Dec23-555.pdf | ||||||
Nov23-555.pdf | LG-OH - LG-OH - Landing Gear Leg Overhaul | - [SN: 190555] | 1.00 | MTH | 2652 | 2652 |
Nov23-555.pdf | AF-192M - Airframe 192 Month Structural Check - [SN: 190555] | 1.00 | MTH | 610 | 610 | |
Nov23-555.pdf | AF-BC - Airframe Basic Check - [SN: 190555] | 263.40 | FH | 37 | 9745.8 | |
Nov23-555.pdf | AF-96M - Airframe 96 Month Structural Check - [SN: 190555] | 1.00 | MTH | 702 | 702 | |
Nov23-555.pdf | AF-20KC - Airframe 20,000 FC Structural Check - [SN: 190555] | 102.00 | FC | 4.4 | 448.8 | |
Nov23-555.pdf | AP-OH - AP-OH - APU Overhaul | - [SN: PWC-ZD0103] | 96.60 | APUH | 25 | 2415 |
Nov23-555.pdf | EN-LP - EN-LP - Engine LLP Replacement | - [SN: P783246] | 102.00 | FC | 162.08 | 16532.16 |
Nov23-555.pdf | EN-PR - EN-PR - Engine Full Performance Restoration P783246] | - [SN: | 263.40 | FH | 135 | 35559 |
Nov23-555.pdf | EN-LP - EN-LP - Engine LLP Replacement | - [SN: P783255] | 102.00 | FC | 162.08 | 16532.16 |
Nov23-555.pdf | EN-PR - EN-PR - Engine Full Performance Restoration P783255] | - [SN: | 263.40 | FH | 135 | 35559 |
Nov23-555.pdf |
Query Steps applied:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Filtered Rows1" = Table.SelectRows(Source, each ([Source.Name] = "Dec23-555.pdf" or [Source.Name] = "Nov23-555.pdf")),
#"Reordered Columns" = Table.ReorderColumns(#"Filtered Rows1",{"Description", "Source.Name", "Column2", "Utilisation", "Column4", "Rate", "Amount"}),
SourceColNames = List.Buffer(Table.ColumnNames(#"Reordered Columns")),
StepBack = #"Reordered Columns",
ReplacedValue = Table.ReplaceValue(StepBack, null, null,
(x,y,z)=> if List.Contains({"", "-"}, x) then null else x,
SourceColNames),
AddedIndex = Table.AddIndexColumn(ReplacedValue, "Index", 0, 1, Int64.Type),
Ad_GroupHelper = Table.AddColumn(AddedIndex, "GroupHelper", each if (try Text.Range([Description], 2, 1) otherwise null) = "-" then [Index] else null, type text),
#"Changed Type" = Table.TransformColumnTypes(Ad_GroupHelper,{{"GroupHelper", type number}}),
FilledDown = Table.FillDown(Ad_GroupHelper,{"GroupHelper"}),
GroupedRows = Table.Group(FilledDown, {"GroupHelper"}, {{"Description", each Text.Combine([Description], "-"), type text}, {"FirstRow", each Table.FirstN(Table.SelectColumns(Table.FillUp(_, SourceColNames), List.Skip(SourceColNames)), 1) , type table}}),
#"Changed Type1" = Table.TransformColumnTypes(GroupedRows,{{"GroupHelper", type number}})
in
#"Changed Type1"
Results:
GroupHelperDescription
|
Hi @vineshparekh,
check whether this is what you need.
Result (top 20 rows)
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("xVdbb4IwFP4rJz6PpIIU2NMUp0vmhWiWJTN9aLAqGYKpuN8/hCm4S7ScJnuhqfR837l9p7hYtLoDw6NjMKAbyZXkWwEehXGaZBuYZ/IQZgfJY/A3InzPDy3mk3uwbZu12F1h2/Prpj2+j8K/D5vk+eK4Se4IITDwr1ONhsb0Kf/1vPJkGSVrGAouYSTWMP0QcsMP8Ymr7ZkXUeX7W8PixfNxYgSz/N15TdZRImBwiGMIhFylcsuTUMBM7LNU8ixKk8IucFzL7FjsBDIKSpByLUFG+WYmdjEPxVYkmQY+1TRVQVMGhp9KKcIjJKxkuoWdkFG6BNI2+iLMy2ZakKVgVVtMllGEN/TbER++ERTmN5OoCwIVUxNZHAkL+4asQdkf1fryozWCV99465M2sS4TWgAocH3BKUihbGnKmsVW8SH09FsTqfEXOPlIUQn8/wI+Oto04EaKcRhmhjioO8VRub4chh/PtqsDxFNx22XNa+OiauOiaqMlUzpqRknzDNI2JoM16wYZrFlfb5PTYVSmnI4OENxA8BDt7qGYqYoqKcJNitIV1VCjjg6Q2v8BdV0RlK4ISldERVdER7p1jDEHNUzU7hxPh7+mDpD8k/Whxdgn", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Description = _t]),
AddedIndex = Table.AddIndexColumn(Source, "Index", 0, 1, Int64.Type),
Ad_GroupHelper = Table.AddColumn(AddedIndex, "GroupHelper", each if (try Text.Range([Description], 2, 1) otherwise null) = "-" then [Index] else null, type text),
FilledDown = Table.FillDown(Ad_GroupHelper,{"GroupHelper"}),
GroupedRows = Table.Group(FilledDown, {"GroupHelper"}, {{"Description", each Text.Combine([Description], " "), type text}}),
ExtractedTextBeforeDelimiter = Table.TransformColumns(GroupedRows, {{"Description", each
[ a = Text.PositionOf(_, "]", Occurrence.First) +1,
b = Text.Range(_, 0, if a > 0 then a else null )
][b], type text}}),
TrimmedText = Table.TransformColumns(ExtractedTextBeforeDelimiter,{{"Description", Text.Trim, type text}}),
Ad_SN = Table.AddColumn(TrimmedText, "SN", each
[ a = Text.PositionOf([Description], "[SN") +1,
b = Text.PositionOf([Description], " ", Occurrence.Last),
c = Text.Range([Description], if a > 0 then a else if Text.Contains([Description], "]") then b else Text.Length([Description]))
][c], type text),
CleanDescription = Table.ReplaceValue(Ad_SN,
each [SN],
null,
(x,y,z)=> if y = "" then x else Text.TrimEnd(Text.Replace(x, y, ""), {"[", " ", "-"}),
{"Description"}),
CleanSN = Table.ReplaceValue(CleanDescription,
null,
null,
(x,y,z)=> if x = "" then null else Text.Replace(Text.Trim(x, {" ", "]"}), "SN: ", ""),
{"SN"}),
RemovedGroupHelper = Table.RemoveColumns(CleanSN,{"GroupHelper"})
in
RemovedGroupHelper
Hi @dufoq3
Thank you for the response!
I am getting this error when I use the M code you provided. This is coming from Ad_GroupHelper step.
Expression.Error: The 'offset' argument is out of range.