Solved: Re: Transforming rows to columns from a report out...

sevenhills · ‎03-16-2022

I am trying to transform rows to columns from report output file(s).

[Original post is updated after @BA_Pete reply, to cover duplicate medicine scenario]

I categorize as

Page Row (1 row - Title row containing "Pharmacy Report")
Header Rows (5 rows - "Patient ID" to "Pharmicist")
Items (Dynamic rows - "Prescription Items:" to "Items Count:", can span multiple pages for a given "Patient ID" )
Footer Rows (5 rows - "Prescription Date" to "Total Due")

Report data

001               Pharmacy Report     Date: 01-MAR-2022   Page: 1
Patient ID: 100
Name: Brenda, Chad T
Address: 123, East 12th Avenue, City
Pharmacy: Ground floor pharmacy
Pharmacist: Nyugen Tony

Prescription Items:
Medicine: Tylenol 500mg
Dosage: 2.0
Usage: Morning and Daily 1 tablet after food

Medicine: Oflaxicin 250mg
Dosage: 1.5
Usage: Morning 1 and night 0.5

Items Count: 2 

Prescription Date: 10-Feb-2022 10:30AM
Old Balance:    $0.00
Bill Amount:    $145.00
Bill Paid:      $145.00
Total Due:      $0.00

002               Pharmacy Report     Date: 01-MAR-2022   Page: 1
Patient ID: 101
Name: Joleen, Wright T
Address: 123, East 23th Avenue, City
Pharmacy: Ground floor pharmacy
Pharmacist:  Nyugen Tony

Prescription Items:
Medicine: Tylenol 500mg
Dosage: 2.0
Usage: Morning and Daily 1 tablet after food

Medicine: Bactrim DS 250mg
Dosage: 1.5
Usage: Morning 1 and night 0.5

Medicine: Thyorixn 50mcg
Dosage: 1.0
Usage: Morning 1.0

Medicine: Zyrtec
Dosage: 2.0
Usage: Morning 1 and night 1

003               Pharmacy Report     Date: 01-MAR-2022   Page: 2
Patient ID: 101
Name: Joleen, Wright T
Address: 123, East 23th Avenue, City
Pharmacy: Ground floor pharmacy
Pharmacist:  Nyugen Tony

Prescription Items:
Medicine: Hemarrhoid cream
Dosage: N/A 
Usage: Apply after each bowel movement

Items Count: 5 

Prescription Date: 11-Feb-2022 10:30AM
Old Balance:    $0.00
Bill Amount:    $245.00
Bill Paid:      $0.00
Total Due:      $245.00

004               Pharmacy Report     Date: 01-MAR-2022   Page: 1
Patient ID: 102
Name: Paul, Peter Chan
Address: 123, East 12th Avenue, City
Pharmacy: Emergency pharmacy
Pharmacist: Alan Sheppard

Prescription Items:
Medicine: Tylenol 250mg
Dosage: 1.0
Usage: Morning and Daily 1 tablet after food

Medicine: Amox 250mg
Dosage: 1.5
Usage: Morning 1 and night 0.5

Items Count: 2 

Prescription Date: 14-Feb-2022 02:30PM
Old Balance:    $0.00
Bill Amount:    $110.00
Bill Paid:      $110.00
Total Due:      $0.00

005               Pharmacy Report     Date: 01-MAR-2022   Page: 1
Patient ID: 103
Name: Cathy, Peter Chan
Address: 123, East 12th Avenue, City
Pharmacy: Emergency pharmacy
Pharmacist: Alan Sheppard

Prescription Items:
Medicine: Tylenol 250mg
Dosage: 1.0
Usage: Morning and Daily 1 tablet after food

Medicine: Amox 250mg
Dosage: 1.5
Usage: Morning 1 and night 0.5

Medicine: Amox 250mg
Dosage: 1.5
Usage: Morning 1 and night 0.5

Items Count: 3 

Prescription Date: 18-Feb-2022 02:30PM
Old Balance:    $0.00
Bill Amount:    $160.00
Bill Paid:      $160.00
Total Due:      $0.00

006               Pharmacy Report     Date: 01-MAR-2022   Page: 1
Patient ID: 109
Name: Jack, Musk E
Address: 123, East 23th Avenue, City
Pharmacy: Ground floor pharmacy
Pharmacist:  Nyugen Tony

Prescription Items:
Medicine: Tylenol 500mg
Dosage: 2.0
Usage: Morning and Daily 1 tablet after food

Medicine: Bactrim DS 250mg
Dosage: 1.5
Usage: Morning 1 and night 0.5

Medicine: Thyorixn 50mcg
Dosage: 1.0
Usage: Morning 1.0

Medicine: Bactrim DS 250mg
Dosage: 1.5
Usage: Morning 1 and night 0.5


007               Pharmacy Report     Date: 01-MAR-2022   Page: 2
Patient ID: 109
Name: Jack, Musk E
Address: 123, East 23th Avenue, City
Pharmacy: Ground floor pharmacy
Pharmacist:  Nyugen Tony

Prescription Items:
Medicine: Hemarrhoid cream
Dosage: N/A 
Usage: Apply after each bowel movement

Medicine: Zyrtec
Dosage: 2.0
Usage: Morning 1 and night 1

Medicine: Tylenol 500mg
Dosage: 2.0
Usage: Morning and Daily 1 tablet after food

Items Count: 7 

Prescription Date: 11-Feb-2022 10:30AM
Old Balance:    $0.00
Bill Amount:    $345.00
Bill Paid:      $0.00
Total Due:      $345.00

Rules:

* If Dosage is blank or N/A then chnage it to 1

Output expected: (or all data elements)

Seq Number	Report Name	Date	Page	Patient ID	Name	Pharmacy	Medicine	Dosage	Items Count	Prescription Date	Bill Amount
001	Pharmacy Report	01-MAR-2022	1	100	Brenda, Chad T	Ground floor pharmacy	Tylenol 500mg	2	2	10-Feb-2022 10:30AM	$145.00
001	Pharmacy Report	01-MAR-2022	1	100	Brenda, Chad T	Ground floor pharmacy	Oflaxicin 250mg	1.5	2	10-Feb-2022 10:30AM	$145.00
002	Pharmacy Report	01-MAR-2022	1	101	Joleen, Wright T	Ground floor pharmacy	Tylenol 500mg	2	5	11-Feb-2022 10:30AM	$245.00
002	Pharmacy Report	01-MAR-2022	1	101	Joleen, Wright T	Ground floor pharmacy	Bactrim DS 250mg	1.5	5	11-Feb-2022 10:30AM	$245.00
002	Pharmacy Report	01-MAR-2022	1	101	Joleen, Wright T	Ground floor pharmacy	Thyorixn 50mcg	1	5	11-Feb-2022 10:30AM	$245.00
002	Pharmacy Report	01-MAR-2022	1	101	Joleen, Wright T	Ground floor pharmacy	Zyrtec	2	5	11-Feb-2022 10:30AM	$245.00
003	Pharmacy Report	01-MAR-2022	2	101	Joleen, Wright T	Ground floor pharmacy	Hemarrhoid cream	1	5	11-Feb-2022 10:30AM	$245.00
004	Pharmacy Report	01-MAR-2022	1	102	Paul, Peter Chan	Emergency pharmacy	Tylenol 250mg	1	2	14-Feb-2022 02:30PM	$110.00
004	Pharmacy Report	01-MAR-2022	1	102	Paul, Peter Chan	Emergency pharmacy	Amox 250mg	1.5	2	14-Feb-2022 02:30PM	$110.00
005	Pharmacy Report	01-MAR-2022	1	103	Cathy, Peter Chan	Emergency pharmacy	Tylenol 250mg	1	3	18-Feb-2022 02:30PM	$160.00
005	Pharmacy Report	01-MAR-2022	1	103	Cathy, Peter Chan	Emergency pharmacy	Amox 250mg	1.5	3	18-Feb-2022 02:30PM	$160.00
005	Pharmacy Report	01-MAR-2022	1	103	Cathy, Peter Chan	Emergency pharmacy	Amox 250mg	1.5	3	18-Feb-2022 02:30PM	$160.00
006	Pharmacy Report	01-MAR-2022	1	109	Jack, Musk E	Ground floor pharmacy	Tylenol 500mg	2	7	11-Feb-2022 10:30AM	$345.00
006	Pharmacy Report	01-MAR-2022	1	109	Jack, Musk E	Ground floor pharmacy	Bactrim DS 250mg	1.5	7	11-Feb-2022 10:30AM	$345.00
006	Pharmacy Report	01-MAR-2022	1	109	Jack, Musk E	Ground floor pharmacy	Thyorixn 50mcg	1	7	11-Feb-2022 10:30AM	$345.00
006	Pharmacy Report	01-MAR-2022	1	109	Jack, Musk E	Ground floor pharmacy	Bactrim DS 250mg	1.5	7	11-Feb-2022 10:30AM	$345.00
007	Pharmacy Report	01-MAR-2022	2	109	Jack, Musk E	Ground floor pharmacy	Hemarrhoid cream	1	7	11-Feb-2022 10:30AM	$345.00
007	Pharmacy Report	01-MAR-2022	2	109	Jack, Musk E	Ground floor pharmacy	Zyrtec	2	7	11-Feb-2022 10:30AM	$345.00
007	Pharmacy Report	01-MAR-2022	2	109	Jack, Musk E	Ground floor pharmacy	Tylenol 500mg	2	7	11-Feb-2022 10:30AM	$345.00

19 rows

I started the power query as below:

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZbdT9swEMD/lRPaY2G2Q2HrW0rZxqRCBZ0mDfFwJNfGwrEjx2Xkv8dN2pFWhM90YhN5iJK7i+/Dv9z5/HyLMQ6r1yhBm2JUwCllxrpSNkBHPWB8exiebgsmxNwOp17Gty4651sjdJK0g6OBlzBWyo4x9fq+JR1jBw4SjGFcKsI4tpTn3lIEHTjE3Pknl0B4TXpG3lS6olp1EUkPvloz0zFMlDEWsoW4biNz14PjYjYlDWOjK11l4H1FVmZOGg1HjtK8V8qHFMtIah/iuFCkjYIuY+m01A1MXiYndqpUflSvQ2O11FNAH8oApSqAg8NLRQ5w4sjCxJj4zvOdh5OJwpv5M4juug++073PBy+9aDlNHLCFSXkrU4ADXw+fsYCGRKsN42z7C11WG8ZZL2DhsDQ9UTH0UaGOvJG/PrCdxab1pVIQptXqcw3f7a7oRijjXgVKXTc2DhUMZrTU/VmxvDEmWqeM1yj7bhSR7sBPWxaskTMRtMHZ2wWtj5GzMoXBWQuk1QJPCmPljfaRp9H6ovdGvhSvLfSrsI6ip6Rej4rXQQpeBZL4L0D6RilamxgZQ2QJ05WCHn8MoV7RMMs8QBU3hFECl+Y3KUjNNaW+CA2dpftIZ+Gv7izigc7CGvtK7asFELutdxZRA2KEM9WBEc2r52eYfukEO0zJ+q32YTXSEPqywVlCWYY2fg4Oy75y3y/fTl/x+3bzt2fX7h1hTHjCRs+fXZw1zy7ezNj67Oq2TlhQI+wAXVK8I9YSYptlNniE2U+vZ3bvAWb3ns7sXuvMfq6PSYyuOjCc5Vdw+DZH5PtZq9X4FlTtt374+reo2szBq8Xz8eZ/gJV+uL/xU2LwolPi8quLWw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Line = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Line", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Index",{"Index", "Line"}),
    #"Added Conditional Column" = Table.AddColumn(#"Reordered Columns", "Section", each if Text.Contains([Line], "Page:") then "Page Row" else if Text.Contains([Line], "Patient ID") 
        then "Header Patient" else if Text.Contains([Line], "Prescription Items:") then "Items" else if Text.Contains([Line], "Prescription Date:") then "Footer" else if Text.Contains([Line], "Total Due:") then "End Patient" else null),
    #"Sorted Rows" = Table.Sort(#"Added Conditional Column",{{"Index", Order.Ascending}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Sorted Rows", "Section", "Section - Begin"),
    #"Filtered Rows" = Table.SelectRows(#"Duplicated Column", each [Line] <> null and [Line] <> ""),
    #"Filled Down" = Table.FillDown(#"Filtered Rows",{"Section"}),
    #"Added Suffix" = Table.TransformColumns(#"Filled Down", {{"Section - Begin", each _ & " - Begin", type text}}),

    #"Added Conditional Column1" = Table.AddColumn(#"Added Suffix", "Seq Number", each if [#"Section - Begin"] = "Page Row - Begin" then Text.Trim(Text.Start([Line], 5)) else null, type nullable text),
    #"Filled Down1" = Table.FillDown(#"Added Conditional Column1",{"Seq Number"}),
    #"Added Conditional Column2" = Table.AddColumn(#"Filled Down1", "Patient ID", 
        each if [#"Section - Begin"] = "Header Patient - Begin" then Text.Trim(Text.AfterDelimiter([Line], "Patient ID: ")) else null  , type nullable text),
    #"Filled Down2" = Table.FillDown(#"Added Conditional Column2",{"Patient ID"}),
    #"Replaced Value" = Table.ReplaceValue(#"Filled Down2", each [Patient ID], each if [#"Section - Begin"] = "Page Row - Begin" then null else [Patient ID],Replacer.ReplaceValue,{"Patient ID"})
in
    #"Replaced Value"

from here, not sure how to take further ...

trying to think to use group by using one or two columns and split ...

Any thoughts?

BA_Pete · ‎03-17-2022

Hi @sevenhills ,

I've gone about it a slightly different way from you i.e. I've built up all the columns from the original table, then filled them around the [medicine] and [dosage] columns (as these are the only ones that can be distinct within a group).

I think the addition of the STOP row could be slicker, maybe using a record function, but couldn't get my head in gear for it.

The only difference I got to your example output was on poor old Joleen's special cream, where your output stated a dosage of 1, but the data had N/A. Probably easy enough for you to add a step into my code to fix this if required.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("3ZXdb9owEMD/lRPaIzDbwAtvoem2ToJGLdWkoT6Y5CCW/BE5piP//YwDIyCyrSuapuUhSu4u9+Vf7haLDiEUTq8k51bxtIIHLIx1QRZzh2MgtDeNHnqMMLaz42svo53n7qKTcCdQO7iLvYSQIJtx5fUTizrjXbjJeQbzoIiyzGJZeks26MItL51/cjlEL6g36E2Fq2qv+0zG8NGajc5gJY2xUOzFTRtRujHMqs0aNcyNrnW1gY+VWlE4YTTcOVTlOMinmIlUaJ/ivJKojYQRIWoddLEpQ3GsX5fyVL9OjdVCr4H7VGIuZAUUHF9KdMBXDi2sjMmOkY8R7leSb3fPwEbnMWh/dCkGDVG0WOcOyN4k3EIJcOP74Stm0FJofWCU9D7gsj4wSsYDEk2D6b3MYMIl16k38tc70t8f2kRICZGqve80dDg60SVcZOMalKZubhyXEG/woPvhMdwIYVenjDYo+2wkou7CFxsa1soZG1yDs38XtAlPnRUK4scrkNZIPK+MFVvtM1fpudOLmR/EZ46+VtZh+julN7OiTZAGbwKJ/RcgfULFrc2NyCC1yNVJQ2fvI2h2NCoKD1DNDfI0h6X5hhKUeUHlm9AyWUa/mCz0zZOF/WSykNa50vhqD8Tw6pOFNYBI+EZ2IcFd9/wO03+6wW4VWn/UPq1WGiLfNnjMsSi4zV6Dw2GuXPrlrzNX/Llt//buGh4JI8wTlrx+d1HSvrtoO2O15vk7", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Line = _t]),
    addSeqID = Table.AddColumn(Source, "seqID", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Page:") then Text.BeforeDelimiter([Line], " ") else null, type text),
    addReportName = Table.AddColumn(addSeqID, "reportName", each if Text.Contains([Line], "Total Due:") then "STOP" else if [seqID] <> null then Text.BetweenDelimiters([Line], [seqID], "Date:") else null, type text),
    addDate = Table.AddColumn(addReportName, "date", each if Text.Contains([Line], "Total Due:") then "STOP" else if [seqID] <> null then Text.BetweenDelimiters([Line], "Date:", "Page:") else null),
    addPage = Table.AddColumn(addDate, "page", each if Text.Contains([Line], "Total Due:") then "STOP" else if [seqID] <> null then Text.AfterDelimiter([Line], "Page:") else null, type text),
    addPatientID = Table.AddColumn(addPage, "patientID", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Patient ID:") then Text.AfterDelimiter([Line], "Patient ID:") else null, type text),
    addName = Table.AddColumn(addPatientID, "name", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Name:") then Text.AfterDelimiter([Line], "Name:") else null, type text),
    addPharmacy = Table.AddColumn(addName, "pharmacy", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Pharmacy:") then Text.AfterDelimiter([Line], "Pharmacy:") else null, type text),
    addMedicine = Table.AddColumn(addPharmacy, "medicine", each if Text.Contains([Line], "Total Due:") or Text.Contains([Line], "Usage:") then "STOP" else if Text.Contains([Line], "Medicine:") then Text.AfterDelimiter([Line], "Medicine:") else null, type text),
    addDosage = Table.AddColumn(addMedicine, "dosage", each if Text.Contains([Line], "Total Due:") or Text.Contains([Line], "Usage:") then "STOP" else if Text.Contains([Line], "Dosage:") then Text.AfterDelimiter([Line], "Dosage:") else null),
    addItemsCount = Table.AddColumn(addDosage, "itemCount", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Items Count:") then Text.AfterDelimiter([Line], "Items Count:") else null),
    addPrescDate = Table.AddColumn(addItemsCount, "prescriptionDate", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Prescription Date:") then Text.AfterDelimiter([Line], "Prescription Date:") else null),
    addBillAmount = Table.AddColumn(addPrescDate, "billAmount", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Bill Amount:") then Text.AfterDelimiter([Line], "Bill Amount:") else null),
    trimTextTypeCols = Table.TransformColumns(addBillAmount,{{"seqID", Text.Trim, type text}, {"reportName", Text.Trim, type text}, {"date", Text.Trim, type text}, {"page", Text.Trim, type text}, {"patientID", Text.Trim, type text}, {"name", Text.Trim, type text}, {"pharmacy", Text.Trim, type text}, {"medicine", Text.Trim, type text}}),
    remLineCol = Table.RemoveColumns(trimTextTypeCols,{"Line"}),
    fillDownToMedicine = Table.FillDown(remLineCol,{"seqID", "reportName", "date", "page", "patientID", "name", "pharmacy", "medicine"}),
    fillUpFromDosage = Table.FillUp(fillDownToMedicine,{"dosage", "itemCount", "prescriptionDate", "billAmount"}),
    filterMedStopNull = Table.SelectRows(fillUpFromDosage, each ([medicine] <> null and [medicine] <> "STOP")),
    tableDistinct = Table.Distinct(filterMedStopNull)
in
    tableDistinct

Pete

Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!

View solution in original post

Anonymous · ‎03-18-2022

try change at those points in the way you see in the pictures

regarding your questions about using the table.group function it takes some time to make a detailed explanation.

You can find somethink here

View solution in original post

sevenhills · ‎03-22-2022

Hi @BA_Pete

Final code:
Raw Data

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZbdT9swEMD/lRPaY2G2Q2HrW0rZxqRCBZ0mDfFwJNfGwrEjx2Xkv8dN2pFWhM90YhN5iJK7i+/Dv9z5/HyLMQ6r1yhBm2JUwCllxrpSNkBHPWB8exiebgsmxNwOp17Gty4651sjdJK0g6OBlzBWyo4x9fq+JR1jBw4SjGFcKsI4tpTn3lIEHTjE3Pknl0B4TXpG3lS6olp1EUkPvloz0zFMlDEWsoW4biNz14PjYjYlDWOjK11l4H1FVmZOGg1HjtK8V8qHFMtIah/iuFCkjYIuY+m01A1MXiYndqpUflSvQ2O11FNAH8oApSqAg8NLRQ5w4sjCxJj4zvOdh5OJwpv5M4juug++073PBy+9aDlNHLCFSXkrU4ADXw+fsYCGRKsN42z7C11WG8ZZL2DhsDQ9UTH0UaGOvJG/PrCdxab1pVIQptXqcw3f7a7oRijjXgVKXTc2DhUMZrTU/VmxvDEmWqeM1yj7bhSR7sBPWxaskTMRtMHZ2wWtj5GzMoXBWQuk1QJPCmPljfaRp9H6ovdGvhSvLfSrsI6ip6Rej4rXQQpeBZL4L0D6RilamxgZQ2QJ05WCHn8MoV7RMMs8QBU3hFECl+Y3KUjNNaW+CA2dpftIZ+Gv7izigc7CGvtK7asFELutdxZRA2KEM9WBEc2r52eYfukEO0zJ+q32YTXSEPqywVlCWYY2fg4Oy75y3y/fTl/x+3bzt2fX7h1hTHjCRs+fXZw1zy7ezNj67Oq2TlhQI+wAXVK8I9YSYptlNniE2U+vZ3bvAWb3ns7sXuvMfq6PSYyuOjCc5Vdw+DZH5PtZq9X4FlTtt374+reo2szBq8Xz8eZ/gJV+uL/xU2LwolPi8quLWw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Line = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Line", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Index",{"Index", "Line"}),
    #"Added Conditional Column" = Table.AddColumn(#"Reordered Columns", "Section", each if Text.Contains([Line], "Page:") then "Page Row" else if Text.Contains([Line], "Patient ID") 
        then "Header Patient" else if Text.Contains([Line], "Prescription Items:") then "Items" else if Text.Contains([Line], "Items Count:") then "Footer" else if Text.Contains([Line], "Total Due:") then "End Patient" else null),
    #"Sorted Rows" = Table.Sort(#"Added Conditional Column",{{"Index", Order.Ascending}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Sorted Rows", "Section", "Section - Begin"),
    #"Filtered Rows" = Table.SelectRows(#"Duplicated Column", each [Line] <> null and [Line] <> ""),
    #"Filled Down" = Table.FillDown(#"Filtered Rows",{"Section"}),
    #"Added Suffix" = Table.TransformColumns(#"Filled Down", {{"Section - Begin", each _ & " - Begin", type text}}),

    #"Added Conditional Column1" = Table.AddColumn(#"Added Suffix", "Seq Number", each if [#"Section - Begin"] = "Page Row - Begin" then Text.Trim(Text.Start([Line], 5)) else null, type nullable text),
    #"Filled Down1" = Table.FillDown(#"Added Conditional Column1",{"Seq Number"}),
    #"Added Conditional Column2" = Table.AddColumn(#"Filled Down1", "Patient ID", 
        each if [#"Section - Begin"] = "Header Patient - Begin" then Text.Trim(Text.AfterDelimiter([Line], "Patient ID: ")) else null  , type nullable text),
    #"Filled Down2" = Table.FillDown(#"Added Conditional Column2",{"Patient ID"}),
    #"Replaced Value" = Table.ReplaceValue(#"Filled Down2", each [Patient ID], each if [#"Section - Begin"] = "Page Row - Begin" then null else [Patient ID],Replacer.ReplaceValue,{"Patient ID"}),
    #"Grouped Rows" = Table.Group(#"Replaced Value", {"Section", "Patient ID"}, {{"Rows Count", each Table.RowCount(_), Int64.Type}, {"First Row", each List.Min([Index]), type number}, {"End Row", each List.Max([Index]), type number}, {"TempTable100", each _, type table [Index=number, Line=nullable text, Section=text, #"Section - Begin"=text, Seq Number=nullable text, Patient ID=nullable text]}})
in
    #"Grouped Rows"

Data Prep - Page Row

let
    Source = #"Raw Data",
    #"Filtered Rows" = Table.SelectRows(Source, each ([Section] = "Page Row")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"TempTable100"}),
    #"Expanded TempTable100" = Table.ExpandTableColumn(#"Removed Other Columns", "TempTable100", {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}, {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}),
    #"Split Column by Positions" = Table.SplitColumn(#"Expanded TempTable100", "Line", Splitter.SplitTextByPositions({0, 18, 44, 58, 64}), {"Line.1", "Line.2", "Line.3", "Line.4", "Line.5"}),
    #"Replaced Value" = Table.ReplaceValue(#"Split Column by Positions","     Date: ","",Replacer.ReplaceText,{"Line.2"}),
    #"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"Line.4", "Section", "Section - Begin", "Patient ID", "Line.1"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Line.5", "Page"}, {"Line.3", "Date"}, {"Line.2", "Title"}}),
    #"Reordered Columns" = Table.ReorderColumns(#"Renamed Columns",{"Seq Number", "Index", "Title", "Date", "Page"})
in
    #"Reordered Columns"

Data Prep - HF Rows

et
    Source = #"Raw Data",
    #"Filtered Rows" = Table.SelectRows(Source, each ([Section] <> "Page Row" and [Section] <> "Items" )),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"TempTable100"}),
    #"Expanded TempTable100" = Table.ExpandTableColumn(#"Removed Other Columns", "TempTable100", {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}, {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Expanded TempTable100", "Line", Splitter.SplitTextByDelimiter(": ", QuoteStyle.Csv), {"Line.1", "Line.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Line.1", type text}, {"Line.2", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Index", "Section", "Section - Begin", "Seq Number"}),
    #"Trimmed Text" = Table.TransformColumns(#"Removed Columns",{{"Line.1", Text.Trim, type text}, {"Line.2", Text.Trim, type text}}),
    #"Filtered Rows1" = Table.SelectRows(#"Trimmed Text", each ([Line.1] <> "Patient ID")),
    #"Removed Duplicates" = Table.Distinct(#"Filtered Rows1"),
    #"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Line.1]), "Line.1", "Line.2")
in
    #"Pivoted Column"

Data Prep - Items

let
    Source = #"Raw Data",
    #"Filtered Rows" = Table.SelectRows(Source, each ([Section] = "Items" )),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"TempTable100"}),
    #"Expanded TempTable100" = Table.ExpandTableColumn(#"Removed Other Columns", "TempTable100", {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}, {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}),
    #"Filtered Rows1" = Table.SelectRows(#"Expanded TempTable100", each ([Line] <> "Prescription Items:") and not Text.StartsWith([Line], "Usage")),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows1", "Line", Splitter.SplitTextByDelimiter(": ", QuoteStyle.Csv), {"Line.1", "Line.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Line.1", type text}, {"Line.2", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Section", "Section - Begin"}),
    #"Trimmed Text" = Table.TransformColumns(#"Removed Columns",{{"Line.1", Text.Trim, type text}, {"Line.2", Text.Trim, type text}}),
    #"Added Conditional Column" = Table.AddColumn(#"Trimmed Text", "G1", 
          each 
          if [Line.1] = "Medicine" then "G1_L_" & Number.ToText([Index])
          else if [Line.1] = "Items Count" then "G2_" & [Patient ID]
          else null),
    #"Sorted Rows" = Table.Sort(#"Added Conditional Column",{{"Index", Order.Ascending}}),
    #"Filled Down" = Table.FillDown(#"Sorted Rows",{"G1"}),
    #"Removed Columns1" = Table.RemoveColumns(#"Filled Down",{"Index"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns1", List.Distinct(#"Removed Columns1"[Line.1]), "Line.1", "Line.2"),
    #"Replaced Value" = Table.ReplaceValue(#"Pivoted Column","N/A","1.0",Replacer.ReplaceText,{"Dosage"})
in
    #"Replaced Value"

Data

let
    Source = Table.NestedJoin(#"Data Prep - HF Rows", {"Patient ID"}, #"Data Prep - Items", {"Patient ID"}, "Raw Data - Items", JoinKind.LeftOuter),
    #"Expanded Raw Data - Items" = Table.ExpandTableColumn(Source, "Raw Data - Items", {"Seq Number", "Medicine", "Dosage"}, {"Seq Number", "Medicine", "Dosage"}),
    #"Merged Queries" = Table.NestedJoin(#"Expanded Raw Data - Items", {"Seq Number"}, #"Data Prep - Page Row", {"Seq Number"}, "Raw Data - Page Row", JoinKind.LeftOuter),
    #"Expanded Raw Data - Page Row" = Table.ExpandTableColumn(#"Merged Queries", "Raw Data - Page Row", {"Title", "Date", "Page"}, {"Title", "Date", "Page"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Expanded Raw Data - Page Row",{"Patient ID", "Name", "Items Count", "Prescription Date", "Medicine", "Dosage", "Old Balance", "Bill Amount", "Bill Paid", "Total Due", "Seq Number", "Title", "Date", "Page", "Address", "Pharmacy", "Pharmacist"})
in
    #"Reordered Columns"

I will be happy to hear comments/updates/suggestions.

Converted as functions, used to process each file in the folder.

View solution in original post

sevenhills · ‎03-22-2022

I am able to process the data and got what I needed.

In a given file, I classified as sections and processed as three segments.

a) For a given Patient ID, identify the static rows & sections

i.e., header, footer, end row. Use split by ": " and get the info needed

b) For a given Patient ID, identify the dynamic rows
i.e., items. Use sort, split by ": ", fill down and fill up.

c) For a given Patient ID, identify the extended info,
i.e., rows with Seq Number, PAGE: , spanned across multiple pages and get info needed.

Join ((a) + (b)) + (c)

Question I have to figure out is the best way to add these steps in order to process all files in the folder.

Thank you @Anonymous , @BA_Pete

Anonymous · ‎03-22-2022

import-data-from-a-folder-with-multiple-files-power-query

sevenhills · ‎03-22-2022

Thank you.

I have been using the same for long time in many projects. As I used mockup data for this post, actual files has lot more combinations. Working on it for nested functions / transformation at each file level and clubbing them. 🙂

sevenhills · ‎03-24-2022

Hi @Anonymous

I am closing the thread and marking as solution.

Couple of issues:
I have been using the link approach i.e., files list, combine and then do transformation in many projects. No issues. Here, since each file is huge, I am getting into very slowness during processing in the dataflow. Better is to do each file separate processing and then do the join of each file later.

Other thing I noticed is Text.Combine works different sort order in the dataflow vs. dataset/pbix code on the Power BI service, which is outside the post context.

Kudos to your lambda expressions experience

BA_Pete · ‎03-22-2022

Hi @sevenhills ,

Any chance you can drop your final M with a JSON source in here please?

I'm interested to see/learn how you completed this in the end.

Ta,

Pete

Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!

sevenhills · ‎03-22-2022

Hi @BA_Pete

Final code:
Raw Data

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZbdT9swEMD/lRPaY2G2Q2HrW0rZxqRCBZ0mDfFwJNfGwrEjx2Xkv8dN2pFWhM90YhN5iJK7i+/Dv9z5/HyLMQ6r1yhBm2JUwCllxrpSNkBHPWB8exiebgsmxNwOp17Gty4651sjdJK0g6OBlzBWyo4x9fq+JR1jBw4SjGFcKsI4tpTn3lIEHTjE3Pknl0B4TXpG3lS6olp1EUkPvloz0zFMlDEWsoW4biNz14PjYjYlDWOjK11l4H1FVmZOGg1HjtK8V8qHFMtIah/iuFCkjYIuY+m01A1MXiYndqpUflSvQ2O11FNAH8oApSqAg8NLRQ5w4sjCxJj4zvOdh5OJwpv5M4juug++073PBy+9aDlNHLCFSXkrU4ADXw+fsYCGRKsN42z7C11WG8ZZL2DhsDQ9UTH0UaGOvJG/PrCdxab1pVIQptXqcw3f7a7oRijjXgVKXTc2DhUMZrTU/VmxvDEmWqeM1yj7bhSR7sBPWxaskTMRtMHZ2wWtj5GzMoXBWQuk1QJPCmPljfaRp9H6ovdGvhSvLfSrsI6ip6Rej4rXQQpeBZL4L0D6RilamxgZQ2QJ05WCHn8MoV7RMMs8QBU3hFECl+Y3KUjNNaW+CA2dpftIZ+Gv7izigc7CGvtK7asFELutdxZRA2KEM9WBEc2r52eYfukEO0zJ+q32YTXSEPqywVlCWYY2fg4Oy75y3y/fTl/x+3bzt2fX7h1hTHjCRs+fXZw1zy7ezNj67Oq2TlhQI+wAXVK8I9YSYptlNniE2U+vZ3bvAWb3ns7sXuvMfq6PSYyuOjCc5Vdw+DZH5PtZq9X4FlTtt374+reo2szBq8Xz8eZ/gJV+uL/xU2LwolPi8quLWw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Line = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Line", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Index",{"Index", "Line"}),
    #"Added Conditional Column" = Table.AddColumn(#"Reordered Columns", "Section", each if Text.Contains([Line], "Page:") then "Page Row" else if Text.Contains([Line], "Patient ID") 
        then "Header Patient" else if Text.Contains([Line], "Prescription Items:") then "Items" else if Text.Contains([Line], "Items Count:") then "Footer" else if Text.Contains([Line], "Total Due:") then "End Patient" else null),
    #"Sorted Rows" = Table.Sort(#"Added Conditional Column",{{"Index", Order.Ascending}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Sorted Rows", "Section", "Section - Begin"),
    #"Filtered Rows" = Table.SelectRows(#"Duplicated Column", each [Line] <> null and [Line] <> ""),
    #"Filled Down" = Table.FillDown(#"Filtered Rows",{"Section"}),
    #"Added Suffix" = Table.TransformColumns(#"Filled Down", {{"Section - Begin", each _ & " - Begin", type text}}),

    #"Added Conditional Column1" = Table.AddColumn(#"Added Suffix", "Seq Number", each if [#"Section - Begin"] = "Page Row - Begin" then Text.Trim(Text.Start([Line], 5)) else null, type nullable text),
    #"Filled Down1" = Table.FillDown(#"Added Conditional Column1",{"Seq Number"}),
    #"Added Conditional Column2" = Table.AddColumn(#"Filled Down1", "Patient ID", 
        each if [#"Section - Begin"] = "Header Patient - Begin" then Text.Trim(Text.AfterDelimiter([Line], "Patient ID: ")) else null  , type nullable text),
    #"Filled Down2" = Table.FillDown(#"Added Conditional Column2",{"Patient ID"}),
    #"Replaced Value" = Table.ReplaceValue(#"Filled Down2", each [Patient ID], each if [#"Section - Begin"] = "Page Row - Begin" then null else [Patient ID],Replacer.ReplaceValue,{"Patient ID"}),
    #"Grouped Rows" = Table.Group(#"Replaced Value", {"Section", "Patient ID"}, {{"Rows Count", each Table.RowCount(_), Int64.Type}, {"First Row", each List.Min([Index]), type number}, {"End Row", each List.Max([Index]), type number}, {"TempTable100", each _, type table [Index=number, Line=nullable text, Section=text, #"Section - Begin"=text, Seq Number=nullable text, Patient ID=nullable text]}})
in
    #"Grouped Rows"

Data Prep - Page Row

let
    Source = #"Raw Data",
    #"Filtered Rows" = Table.SelectRows(Source, each ([Section] = "Page Row")),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"TempTable100"}),
    #"Expanded TempTable100" = Table.ExpandTableColumn(#"Removed Other Columns", "TempTable100", {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}, {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}),
    #"Split Column by Positions" = Table.SplitColumn(#"Expanded TempTable100", "Line", Splitter.SplitTextByPositions({0, 18, 44, 58, 64}), {"Line.1", "Line.2", "Line.3", "Line.4", "Line.5"}),
    #"Replaced Value" = Table.ReplaceValue(#"Split Column by Positions","     Date: ","",Replacer.ReplaceText,{"Line.2"}),
    #"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"Line.4", "Section", "Section - Begin", "Patient ID", "Line.1"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Line.5", "Page"}, {"Line.3", "Date"}, {"Line.2", "Title"}}),
    #"Reordered Columns" = Table.ReorderColumns(#"Renamed Columns",{"Seq Number", "Index", "Title", "Date", "Page"})
in
    #"Reordered Columns"

Data Prep - HF Rows

et
    Source = #"Raw Data",
    #"Filtered Rows" = Table.SelectRows(Source, each ([Section] <> "Page Row" and [Section] <> "Items" )),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"TempTable100"}),
    #"Expanded TempTable100" = Table.ExpandTableColumn(#"Removed Other Columns", "TempTable100", {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}, {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Expanded TempTable100", "Line", Splitter.SplitTextByDelimiter(": ", QuoteStyle.Csv), {"Line.1", "Line.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Line.1", type text}, {"Line.2", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Index", "Section", "Section - Begin", "Seq Number"}),
    #"Trimmed Text" = Table.TransformColumns(#"Removed Columns",{{"Line.1", Text.Trim, type text}, {"Line.2", Text.Trim, type text}}),
    #"Filtered Rows1" = Table.SelectRows(#"Trimmed Text", each ([Line.1] <> "Patient ID")),
    #"Removed Duplicates" = Table.Distinct(#"Filtered Rows1"),
    #"Pivoted Column" = Table.Pivot(#"Removed Duplicates", List.Distinct(#"Removed Duplicates"[Line.1]), "Line.1", "Line.2")
in
    #"Pivoted Column"

Data Prep - Items

let
    Source = #"Raw Data",
    #"Filtered Rows" = Table.SelectRows(Source, each ([Section] = "Items" )),
    #"Removed Other Columns" = Table.SelectColumns(#"Filtered Rows",{"TempTable100"}),
    #"Expanded TempTable100" = Table.ExpandTableColumn(#"Removed Other Columns", "TempTable100", {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}, {"Index", "Line", "Section", "Section - Begin", "Seq Number", "Patient ID"}),
    #"Filtered Rows1" = Table.SelectRows(#"Expanded TempTable100", each ([Line] <> "Prescription Items:") and not Text.StartsWith([Line], "Usage")),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Filtered Rows1", "Line", Splitter.SplitTextByDelimiter(": ", QuoteStyle.Csv), {"Line.1", "Line.2"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Line.1", type text}, {"Line.2", type text}}),
    #"Removed Columns" = Table.RemoveColumns(#"Changed Type",{"Section", "Section - Begin"}),
    #"Trimmed Text" = Table.TransformColumns(#"Removed Columns",{{"Line.1", Text.Trim, type text}, {"Line.2", Text.Trim, type text}}),
    #"Added Conditional Column" = Table.AddColumn(#"Trimmed Text", "G1", 
          each 
          if [Line.1] = "Medicine" then "G1_L_" & Number.ToText([Index])
          else if [Line.1] = "Items Count" then "G2_" & [Patient ID]
          else null),
    #"Sorted Rows" = Table.Sort(#"Added Conditional Column",{{"Index", Order.Ascending}}),
    #"Filled Down" = Table.FillDown(#"Sorted Rows",{"G1"}),
    #"Removed Columns1" = Table.RemoveColumns(#"Filled Down",{"Index"}),
    #"Pivoted Column" = Table.Pivot(#"Removed Columns1", List.Distinct(#"Removed Columns1"[Line.1]), "Line.1", "Line.2"),
    #"Replaced Value" = Table.ReplaceValue(#"Pivoted Column","N/A","1.0",Replacer.ReplaceText,{"Dosage"})
in
    #"Replaced Value"

Data

let
    Source = Table.NestedJoin(#"Data Prep - HF Rows", {"Patient ID"}, #"Data Prep - Items", {"Patient ID"}, "Raw Data - Items", JoinKind.LeftOuter),
    #"Expanded Raw Data - Items" = Table.ExpandTableColumn(Source, "Raw Data - Items", {"Seq Number", "Medicine", "Dosage"}, {"Seq Number", "Medicine", "Dosage"}),
    #"Merged Queries" = Table.NestedJoin(#"Expanded Raw Data - Items", {"Seq Number"}, #"Data Prep - Page Row", {"Seq Number"}, "Raw Data - Page Row", JoinKind.LeftOuter),
    #"Expanded Raw Data - Page Row" = Table.ExpandTableColumn(#"Merged Queries", "Raw Data - Page Row", {"Title", "Date", "Page"}, {"Title", "Date", "Page"}),
    #"Reordered Columns" = Table.ReorderColumns(#"Expanded Raw Data - Page Row",{"Patient ID", "Name", "Items Count", "Prescription Date", "Medicine", "Dosage", "Old Balance", "Bill Amount", "Bill Paid", "Total Due", "Seq Number", "Title", "Date", "Page", "Address", "Pharmacy", "Pharmacist"})
in
    #"Reordered Columns"

I will be happy to hear comments/updates/suggestions.

Converted as functions, used to process each file in the folder.

Anonymous · ‎03-17-2022

I don't know how different my idea is from those already proposed (they are too long to read 😁).
There is certainly something to clean or refine in this one.
If considered valid, you can work on it a little longer.

sevenhills · ‎03-17-2022

Thank you @Anonymous

I am thinking along these lines
(The report file looks like detailed report output, which is similar to consolidated invoice / order details report for all customers ~ based on my oracle reports knowledge. but this is pharmacy domain)

a) Load the file and add line number.

b) Categorize into 5 sections, as each may have one row (or) begin and end rows

Section: Page Row ~ 1 row. Unique is "001", "002". To identify the begin of each page
Section: Header Patient ~ Number of rows fixed. Repeats on every page.
Section: Items ~ Dynamic section with variable number of rows and combinations. Spans across pages for a given patient.
Section: Footer ~ Number of rows fixed. Repeats at the end of a given patient, which can be on the same page or in other page. It all depends on number of items.
Section: End Patient ~ 1 row. To identify the end for each patient (Total Due: )
- which I borrowed the concept from @BA_Pete

c) Modularizing by passing sections to the functions

Similar to the function you did

I started with these, In a given file,

Uniqueness is Seq Number "001", "002", ... in each page. Added a column "Seq Number".
"Patient ID", to identify the rows across one or multiple pages
Similarly parse each of the above sections and club them back

Still working on it ...

- Club them back and remove unwanted rows and columns and present

Coming back to your solution, I don't know Spanish and tying to understand and adapt the concept.

Anonymous · ‎03-17-2022

I try to summarize the steps of my idea.
I have divided the report into groups identified by the "Total Due" line.
Since I used the Equation Criteria parameter of the Table.Group () function, I had to insert a dummy line of this type at the beginning of the report which becomes the first line of each group (not the last) but is eliminated with Table.Skip (). If this info is needed, it can be obtained from the Bill lines.
On each of these groups I have applied the trans () function.

I am attaching a file with an application of the trans function to the more complex group (the one that has two different Pharmacy Reports).
The important part is the indexing by subgroups of the Patient Items part, so that you can pivot correctly.
I did some filldowns (but it might be useful to do some fillups too), before deleting the superfluous lines.

Some cleaning and tidying up of the labels and column names remain to be done.
For example, since I used ":" to divide the initial text into columns, the term "Page" remained with the values of the "Line.2" field ie the date. It should be deleted from there and added to the column where the page number "Line.3" is.

PS

My language is Italian not Spanish.

I don't know Spanish either 😁

sevenhills · ‎03-18-2022

Thanks @Anonymous

Sorry, I thought it is Spanish and tried translating your M query code online and tried to understand it 🙂

I looked into your latest file and my raw data (check my original post) and added the Totat due line and tried applying your code and it is not doing correctly. Please could you check.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZZfT9swEMC/ygntsWW2Q2HrW0q7jUmFCjpNWsXDkbiNhWNHjsvIt5+bBOFWDX/TiU30oUruzue78y93ns32ptqihOGS9wE+kH1C9i47sz1CKKz/JgmaFKMCznmmjS1lQ7RuGaHdcXjeZYSxlR0unIyWXiZoBVcWToZOUns+xdTpB4arGDtwnGAM01IRxrHhee4sWdCBEebWPdkEwhuultyZCltUXutI+vDV6KWKYS61NpDVYt9G5LYPp8VywRVMtap0lYHbKzIis0IrOLE8zfulfMxjEQnlQpwWkistoUdIuih1Q52XybH9KpUf1etYGyXUAtCFMkQhC6Bg8UpyCzi33MBc6/h+5/sdzuYSb1fPwHqbe9D93rY9aLmLEovEAqlNyr8yBTh29XAZM2hItDowSrpf+FV1YJT0AxKOS9MzGcMAJapoBYOPw0BICWFaeV9p6EFvTTdBEfcrUHydD9e6x5oy1jpl1KPsu5acqw78NGXBGjljQRucvV3QBhhZI1IYXrRAmhd4UmgjbpWLPI02nW6N/E684ehXYSyPnpK6HxX1QQpeBRL7L0D6xlM0JtEihshwTNcKevoxBL+iYZY5gCpuOEYJXOnfXEKqb3jqitDQWXqPdBb66s7CHugspLGveKtqIA5a7yzMA2KCS9mBCV9Vz80w9dIJNkq5cUftwmqkIXRlg4uEZxma+Dk43PWVbZ98O33Fndvt355dB/eEEeYImzx/dlHSPLtoM2Obs6vXOmGBR9gx2qR4R6wlxHbLbPAIs59ez+zhA8wePp3Zw9aZ/eyPSYyuOzBe5tcwepsj8v2u1Wp8NVVHrV++/i2qdnPxavF+vPsPYK0fHu38lhi86JZ4t+ryDw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Line = _t]),
    #"Split Column by Delimiter" = Table.SplitColumn(Source, "Line", Splitter.SplitTextByDelimiter(": ", QuoteStyle.Csv), {"Line.1", "Line.2", "Line.3"}),
    #"Removed Blank Rows" = Table.SelectRows(#"Split Column by Delimiter", each not List.IsEmpty(List.RemoveMatchingItems(Record.FieldValues(_), {"", null}))),
    #"Filled Down" = Table.FillDown(#"Removed Blank Rows",{"Line.3"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Filled Down",{{"Line.1", type text}, {"Line.2", type text}, {"Line.3", type text}}),
    #"Grouped Rows" = Table.Group(#"Changed Type", {"Line.1"}, {{"all", each _}}, GroupKind.Local, (x,y)=>Number.From(x[#"Line.1"]="Total Due" and y = x) ),
    #"Patient ID 101" = #"Grouped Rows"{1}[all],
    #"Removed Top Rows" = Table.Skip(#"Patient ID 101",1),
    #"Grouped Rows1" = Table.Group(#"Removed Top Rows", {"Line.1", "Line.2"}, {{"all", each _}}, GroupKind.Local, (x,y) => Number.From( Text.Contains(x[#"Line.1"], "Pharmacy") and Text.End(x[#"Line.1"],10) = Text.End(y[#"Line.1"],10) )),
    #"Expanded all" = Table.ExpandTableColumn(#"Grouped Rows1", "all", {"Line.1", "Line.2", "Line.3"}, {"Line.1.1", "Line.2.1", "Line.3"}),
    #"Grouped Rows2" = Table.Group(#"Expanded all", {"Line.1.1"}, {{"all", each Table.AddIndexColumn(_, "Pat_Index1", 0, 1) }}),
    #"Expanded all1" = Table.ExpandTableColumn(#"Grouped Rows2", "all", {"Line.1", "Line.2", "Line.1.1", "Line.2.1", "Line.3", "Pat_Index1"}, {"Line.1", "Line.2", "Line.1.1.1", "Line.2.1", "Line.3", "Pat_Index1"}),
    #"Pivoted Column" = Table.Pivot(#"Expanded all1", List.Distinct(#"Expanded all1"[Line.1.1]), "Line.1.1", "Line.2.1", (x) => x{0}?),
    #"Filled Down1" = Table.FillDown(#"Pivoted Column",{"Patient ID", "Name", "Address", "Pharmacy", "Pharmacist", "Items Count", "Prescription Date", "Old Balance", "Bill Amount", "Bill Paid"}),
    #"Filtered Rows" = Table.SelectRows(#"Filled Down1", each [Medicine] <> null and [Medicine] <> "")
in
    #"Filtered Rows"

Also, please explain these lines what you are doing here, not able to get it

    #"Grouped Rows" = Table.Group(#"Changed Type", {"Line.1"}, {{"all", each _}}, GroupKind.Local, (x,y)=>Number.From(x[#"Line.1"]="Total Due" and y = x) ),
    
    #"Grouped Rows1" = Table.Group(#"Removed Top Rows", {"Line.1", "Line.2"}, {{"all", each _}}, GroupKind.Local, (x,y) => Number.From( Text.Contains(x[#"Line.1"], "Pharmacy") and Text.End(x[#"Line.1"],10) = Text.End(y[#"Line.1"],10) )),

    #"Pivoted Column" = Table.Pivot(#"Expanded all1", List.Distinct(#"Expanded all1"[Line.1.1]), "Line.1.1", "Line.2.1", (x) => x{0}?),

Once I understand this and make it working, I will go back functions implementation above.

My reason for going to functions are

~ my data has lot of lines in each section

~ each file has 10K+ lines and 100+ patient records

~ easy in long term to add more custom logic in the PBI Service dataflow

Anonymous · ‎03-18-2022

try change at those points in the way you see in the pictures

regarding your questions about using the table.group function it takes some time to make a detailed explanation.

You can find somethink here

sevenhills · ‎03-21-2022

Thank you, I will look into the lamda explanation and try to understand.

Able to move forward further along. Appreciated , will update the post by the end of the day

BA_Pete · ‎03-17-2022

Hi @sevenhills ,

I've gone about it a slightly different way from you i.e. I've built up all the columns from the original table, then filled them around the [medicine] and [dosage] columns (as these are the only ones that can be distinct within a group).

I think the addition of the STOP row could be slicker, maybe using a record function, but couldn't get my head in gear for it.

The only difference I got to your example output was on poor old Joleen's special cream, where your output stated a dosage of 1, but the data had N/A. Probably easy enough for you to add a step into my code to fix this if required.

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("3ZXdb9owEMD/lRPaIzDbwAtvoem2ToJGLdWkoT6Y5CCW/BE5piP//YwDIyCyrSuapuUhSu4u9+Vf7haLDiEUTq8k51bxtIIHLIx1QRZzh2MgtDeNHnqMMLaz42svo53n7qKTcCdQO7iLvYSQIJtx5fUTizrjXbjJeQbzoIiyzGJZeks26MItL51/cjlEL6g36E2Fq2qv+0zG8NGajc5gJY2xUOzFTRtRujHMqs0aNcyNrnW1gY+VWlE4YTTcOVTlOMinmIlUaJ/ivJKojYQRIWoddLEpQ3GsX5fyVL9OjdVCr4H7VGIuZAUUHF9KdMBXDi2sjMmOkY8R7leSb3fPwEbnMWh/dCkGDVG0WOcOyN4k3EIJcOP74Stm0FJofWCU9D7gsj4wSsYDEk2D6b3MYMIl16k38tc70t8f2kRICZGqve80dDg60SVcZOMalKZubhyXEG/woPvhMdwIYVenjDYo+2wkou7CFxsa1soZG1yDs38XtAlPnRUK4scrkNZIPK+MFVvtM1fpudOLmR/EZ46+VtZh+julN7OiTZAGbwKJ/RcgfULFrc2NyCC1yNVJQ2fvI2h2NCoKD1DNDfI0h6X5hhKUeUHlm9AyWUa/mCz0zZOF/WSykNa50vhqD8Tw6pOFNYBI+EZ2IcFd9/wO03+6wW4VWn/UPq1WGiLfNnjMsSi4zV6Dw2GuXPrlrzNX/Llt//buGh4JI8wTlrx+d1HSvrtoO2O15vk7", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Line = _t]),
    addSeqID = Table.AddColumn(Source, "seqID", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Page:") then Text.BeforeDelimiter([Line], " ") else null, type text),
    addReportName = Table.AddColumn(addSeqID, "reportName", each if Text.Contains([Line], "Total Due:") then "STOP" else if [seqID] <> null then Text.BetweenDelimiters([Line], [seqID], "Date:") else null, type text),
    addDate = Table.AddColumn(addReportName, "date", each if Text.Contains([Line], "Total Due:") then "STOP" else if [seqID] <> null then Text.BetweenDelimiters([Line], "Date:", "Page:") else null),
    addPage = Table.AddColumn(addDate, "page", each if Text.Contains([Line], "Total Due:") then "STOP" else if [seqID] <> null then Text.AfterDelimiter([Line], "Page:") else null, type text),
    addPatientID = Table.AddColumn(addPage, "patientID", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Patient ID:") then Text.AfterDelimiter([Line], "Patient ID:") else null, type text),
    addName = Table.AddColumn(addPatientID, "name", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Name:") then Text.AfterDelimiter([Line], "Name:") else null, type text),
    addPharmacy = Table.AddColumn(addName, "pharmacy", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Pharmacy:") then Text.AfterDelimiter([Line], "Pharmacy:") else null, type text),
    addMedicine = Table.AddColumn(addPharmacy, "medicine", each if Text.Contains([Line], "Total Due:") or Text.Contains([Line], "Usage:") then "STOP" else if Text.Contains([Line], "Medicine:") then Text.AfterDelimiter([Line], "Medicine:") else null, type text),
    addDosage = Table.AddColumn(addMedicine, "dosage", each if Text.Contains([Line], "Total Due:") or Text.Contains([Line], "Usage:") then "STOP" else if Text.Contains([Line], "Dosage:") then Text.AfterDelimiter([Line], "Dosage:") else null),
    addItemsCount = Table.AddColumn(addDosage, "itemCount", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Items Count:") then Text.AfterDelimiter([Line], "Items Count:") else null),
    addPrescDate = Table.AddColumn(addItemsCount, "prescriptionDate", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Prescription Date:") then Text.AfterDelimiter([Line], "Prescription Date:") else null),
    addBillAmount = Table.AddColumn(addPrescDate, "billAmount", each if Text.Contains([Line], "Total Due:") then "STOP" else if Text.Contains([Line], "Bill Amount:") then Text.AfterDelimiter([Line], "Bill Amount:") else null),
    trimTextTypeCols = Table.TransformColumns(addBillAmount,{{"seqID", Text.Trim, type text}, {"reportName", Text.Trim, type text}, {"date", Text.Trim, type text}, {"page", Text.Trim, type text}, {"patientID", Text.Trim, type text}, {"name", Text.Trim, type text}, {"pharmacy", Text.Trim, type text}, {"medicine", Text.Trim, type text}}),
    remLineCol = Table.RemoveColumns(trimTextTypeCols,{"Line"}),
    fillDownToMedicine = Table.FillDown(remLineCol,{"seqID", "reportName", "date", "page", "patientID", "name", "pharmacy", "medicine"}),
    fillUpFromDosage = Table.FillUp(fillDownToMedicine,{"dosage", "itemCount", "prescriptionDate", "billAmount"}),
    filterMedStopNull = Table.SelectRows(fillUpFromDosage, each ([medicine] <> null and [medicine] <> "STOP")),
    tableDistinct = Table.Distinct(filterMedStopNull)
in
    tableDistinct

Pete

Now accepting Kudos! If my post helped you, why not give it a thumbs-up?

Proud to be a Datanaut!

sevenhills · ‎03-17-2022

Thank you @BA_Pete

--------

I updated the raw data and output in the original post(, helps the question in long term viewers.)

--------

Few points I missed in the original post, after looking into your solution.

a) In a given report file,

i) No patient id duplicates in a given file.

ii) For a given patient,

Medicine, Dosage, Usage combinations can have duplicates.

Medicine, Dosage, Usage will always stay as combination in a given page. (no split of pages of these three row combination)

b) I went back to the requirements and saw that if dosage is blank or N/A then change to 1. So, you are right, I can add a step to fix it. Sorry, I did not mention in the original post, as I was not aware when I did the first post.

--------

My each data file is upto 25000 lines, and get each week one file. Data combinations are many.

Let me check and get back to you, (will update this reply)

sevenhills · ‎03-17-2022

I am thinking along these lines, for modularity

Work in progress
Raw Data - Step 1

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("7ZbdT9swEMD/lRPaY2G2Q2HrW0rZxqRCBZ0mDfFwJNfGwrEjx2Xkv8dN2pFWhM90YhN5iJK7i+/Dv9z5/HyLMQ6r1yhBm2JUwCllxrpSNkBHPWB8exiebgsmxNwOp17Gty4651sjdJK0g6OBlzBWyo4x9fq+JR1jBw4SjGFcKsI4tpTn3lIEHTjE3Pknl0B4TXpG3lS6olp1EUkPvloz0zFMlDEWsoW4biNz14PjYjYlDWOjK11l4H1FVmZOGg1HjtK8V8qHFMtIah/iuFCkjYIuY+m01A1MXiYndqpUflSvQ2O11FNAH8oApSqAg8NLRQ5w4sjCxJj4zvOdh5OJwpv5M4juug++073PBy+9aDlNHLCFSXkrU4ADXw+fsYCGRKsN42z7C11WG8ZZL2DhsDQ9UTH0UaGOvJG/PrCdxab1pVIQptXqcw3f7a7oRijjXgVKXTc2DhUMZrTU/VmxvDEmWqeM1yj7bhSR7sBPWxaskTMRtMHZ2wWtj5GzMoXBWQuk1QJPCmPljfaRp9H6ovdGvhSvLfSrsI6ip6Rej4rXQQpeBZL4L0D6RilamxgZQ2QJ05WCHn8MoV7RMMs8QBU3hFECl+Y3KUjNNaW+CA2dpftIZ+Gv7izigc7CGvtK7asFELutdxZRA2KEM9WBEc2r52eYfukEO0zJ+q32YTXSEPqywVlCWYY2fg4Oy75y3y/fTl/x+3bzt2fX7h1hTHjCRs+fXZw1zy7ezNj67Oq2TlhQI+wAXVK8I9YSYptlNniE2U+vZ3bvAWb3ns7sXuvMfq6PSYyuOjCc5Vdw+DZH5PtZq9X4FlTtt374+reo2szBq8Xz8eZ/gJV+uL/xU2LwolPi8quLWw==", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type nullable text) meta [Serialized.Text = true]) in type table [Line = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Line", type text}}),
    #"Added Index" = Table.AddIndexColumn(#"Changed Type", "Index", 1, 1, Int64.Type),
    #"Reordered Columns" = Table.ReorderColumns(#"Added Index",{"Index", "Line"}),
    #"Added Conditional Column" = Table.AddColumn(#"Reordered Columns", "Section", each if Text.Contains([Line], "Page:") then "Page Row" else if Text.Contains([Line], "Patient ID") 
        then "Header Patient" else if Text.Contains([Line], "Prescription Items:") then "Items" else if Text.Contains([Line], "Prescription Date:") then "Footer" else if Text.Contains([Line], "Total Due:") then "End Patient" else null),
    #"Sorted Rows" = Table.Sort(#"Added Conditional Column",{{"Index", Order.Ascending}}),
    #"Duplicated Column" = Table.DuplicateColumn(#"Sorted Rows", "Section", "Section - Begin"),
    #"Filtered Rows" = Table.SelectRows(#"Duplicated Column", each [Line] <> null and [Line] <> ""),
    #"Filled Down" = Table.FillDown(#"Filtered Rows",{"Section"}),
    #"Added Suffix" = Table.TransformColumns(#"Filled Down", {{"Section - Begin", each _ & " - Begin", type text}}),

    #"Added Conditional Column1" = Table.AddColumn(#"Added Suffix", "Seq Number", each if [#"Section - Begin"] = "Page Row - Begin" then Text.Trim(Text.Start([Line], 5)) else null, type nullable text),
    #"Filled Down1" = Table.FillDown(#"Added Conditional Column1",{"Seq Number"}),
    #"Added Conditional Column2" = Table.AddColumn(#"Filled Down1", "Patient ID", 
        each if [#"Section - Begin"] = "Header Patient - Begin" then Text.Trim(Text.AfterDelimiter([Line], "Patient ID: ")) else null  , type nullable text),
    #"Filled Down2" = Table.FillDown(#"Added Conditional Column2",{"Patient ID"}),
    #"Replaced Value" = Table.ReplaceValue(#"Filled Down2", each [Patient ID], each if [#"Section - Begin"] = "Page Row - Begin" then null else [Patient ID],Replacer.ReplaceValue,{"Patient ID"})
in
    #"Replaced Value"

Raw Data Prep - Step 2

let
    Source = #"Raw Data - Step 1",
    addPageRowInfo = Table.AddColumn(Source, "Page Row Info", each if [#"Section - Begin"] = "Page Row - Begin" then Parse_PageRow([Line]) else null, type table),
    addPatientEndRowInfo = Table.AddColumn(addPageRowInfo, "addPageRowInfo", each if [#"Section - Begin"] = "End Patient - Begin" then Parse_PatientEndRow([Line]) else null, type table),
    #"Grouped Rows" = Table.Group(addPatientEndRowInfo, {"Seq Number", "Section"}, {{"Rows Count", each Table.RowCount(_), Int64.Type}, {"First Row", each List.Min([Index]), type number}, {"End Row", each List.Max([Index]), type number}, {"TempTable100", each _, type table [Index=number, Line=nullable text, Section=text, #"Section - Begin"=text, Seq Number=nullable text, Patient ID=nullable text, Page Row Info=table, addPageRowInfo=table]}})
in
    #"Grouped Rows"

For each row, took the "TempTable100" column (and to modularize,) added these functions and later clubbed them all. it is the same logic we did for the whole file.

...
Function: Parse_HeaderPatientRows
Function: Parse_Items

Function: Parse_FooterRows

Function: Parse_PatientEndRow

= Table.AddColumn(#"Grouped Rows", "Data Cols Info", each
if [#"Section"] = "End Patient" then Parse_PatientEndRow([allrows]{0}[Line])
else if [#"Section"] = "Footer" then Parse_FooterRows([allrows])
else if [#"Section"] = "Header Patient" then Parse_HeaderPatientRows([allrows])
else if [#"Section"] = "Items" then Parse_Items([allrows])
else null

, type table)

Note: All the functions code can be done as one function too. Because of lot more data in each row and few more transformations, I adapted as separate functions.

Thank you @BA_Pete . Stopper row identification is a cool technique.

Transforming rows to columns from a report output

Helpful resources

Microsoft Fabric Learn Together

Power BI Monthly Update - April 2024

Fabric Community Update - April 2024