Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.
I am extracting the Table of Contents from a PDF and want to create page ranges (e.g., 1-5) for each item that operates by subtracting 1 from the entry after. Here is the sample data:
TOC Entry | Page Number |
Lesson 1: Getting Started | 1 |
Topic A: Navigate | 2 |
Topic B: View | 20 |
Topic C: Create | 25 |
Topic D : Navigate | 30 |
Lesson 2: Developing Content | 50 |
Topic A: Create | 51 |
Topic B: Edit | 60 |
Topic C: Work | 65 |
Topic D : Design | 70 |
This is what I would like to see:
TOC Entry | Page Number |
Lesson 1: Getting Started | 1 |
Topic A: Navigate | 2-19 |
Topic B: View | 20-24 |
Topic C: Create | 25-29 |
Topic D : Navigate | 30-49 |
Lesson 2: Developing Content | 50 |
Topic A: Create | 51-59 |
Topic B: Edit | 60-64 |
Topic C: Work | 65-69 |
Topic D : Design | 70 |
Any ideas/suggestions would be greatly appreciated.
Solved! Go to Solution.
For the first one:
Page Number 2 =
VAR __Current = [Page Number]
VAR __Next = MAXX(FILTER('Table',[Index] = (EARLIER([Index]) + 1)),[Page Number])
RETURN
SWITCH(TRUE(),
__Next < __Current,__Current & "",
__Next - __Current = 1,__Current & "",
__Current & "-" & __Next - 1
)
For the second one, yes:
Page Number 2 =
VAR __Current = [Page Number]
VAR __Next = MAXX(FILTER('Table',[Index] = (EARLIER([Index]) + 1) && [Document] = EARLIER([Document])),[Page Number])
RETURN
SWITCH(TRUE(),
__Next < __Current,__Current & "",
__Next - __Current = 1,__Current & "",
__Current & "-" & __Next - 1
)
Hey there. This should work for just about any TOC. This assumes your table is in an Excel sheet, if not, just amend the source, then copy paste these onto the queries pane:
// Table1
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"TOC Entry", type text}, {"Page Number", Int64.Type}})
in
#"Changed Type"
// LessonEntry
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Filtered Rows" = Table.SelectRows(Source, each Text.StartsWith([TOC Entry], "Lesson")),
#"Added Index" = Table.AddIndexColumn(#"Filtered Rows", "Index", 1, 1),
#"Added Index1" = Table.AddIndexColumn(#"Added Index", "Index.1", 0, 1),
#"Merged Queries" = Table.NestedJoin(#"Added Index1", {"Index"}, #"Added Index1", {"Index.1"}, "Added Index1", JoinKind.LeftOuter),
#"Expanded Added Index2" = Table.ExpandTableColumn(#"Merged Queries", "Added Index1", {"TOC Entry", "Page Number"}, {"TOC Entry.1", "Page Number.1"}),
#"Removed Columns1" = Table.RemoveColumns(#"Expanded Added Index2",{"Index", "Index.1"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns1",{{"Page Number.1", "ToPage"}}),
#"Subtracted from Column1" = Table.TransformColumns(#"Renamed Columns", {{"ToPage", each _ - 1, type number}}),
#"Replaced Value" = Table.ReplaceValue(#"Subtracted from Column1",null,List.Max(Table1[Page Number]),Replacer.ReplaceValue,{"ToPage"}),
#"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"TOC Entry.1"}),
#"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Page Number", type text}, {"ToPage", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Page_Number", each [Page Number] &" - " &[ToPage], type text),
#"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"Page Number", "ToPage"}),
#"Changed Type1" = Table.TransformColumnTypes(#"Removed Columns2",{{"TOC Entry", type text}})
in
#"Changed Type1"
// TOC
let
Source = Table1,
#"Added Index" = Table.AddIndexColumn(Source, "Index", 1, 1),
#"Added Index1" = Table.AddIndexColumn(#"Added Index", "Index.1", 0, 1),
#"Merged Queries" = Table.NestedJoin(#"Added Index1", {"Index"}, #"Added Index1", {"Index.1"}, "Added Index1", JoinKind.LeftOuter),
#"Expanded Added Index1" = Table.ExpandTableColumn(#"Merged Queries", "Added Index1", {"Page Number"}, {"Page Number.1"}),
#"Removed Columns" = Table.RemoveColumns(#"Expanded Added Index1",{"Index", "Index.1"}),
#"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Page Number.1", "ToPage"}}),
#"Replaced Value" = Table.ReplaceValue(#"Renamed Columns",null, List.Max(Source[Page Number]),Replacer.ReplaceValue,{"ToPage"}),
#"Changed Type" = Table.TransformColumnTypes(#"Replaced Value",{{"Page Number", type text}, {"ToPage", type text}}),
#"Added Custom" = Table.AddColumn(#"Changed Type", "Page_Number", each [Page Number]&" - "&[ToPage],type text),
#"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"Page Number", "ToPage"}),
#"Merged Queries1" = Table.NestedJoin(#"Removed Columns1", {"TOC Entry"}, LessonEntry, {"TOC Entry"}, "LessonEntry", JoinKind.LeftOuter),
#"Expanded LessonEntry" = Table.ExpandTableColumn(#"Merged Queries1", "LessonEntry", {"Page_Number"}, {"Page_Number.1"}),
#"Added Conditional Column" = Table.AddColumn(#"Expanded LessonEntry", "New_Page Number", each if [Page_Number.1] = null then [Page_Number] else [Page_Number.1], type text),
#"Removed Columns2" = Table.RemoveColumns(#"Added Conditional Column",{"Page_Number", "Page_Number.1"}),
#"Renamed Columns1" = Table.RenameColumns(#"Removed Columns2",{{"New_Page Number", "Page Number"}})
in
#"Renamed Columns1"
//Let me know how it goes!
Add an Index column in Power Query and then you can create this column using DAX:
Page Number 2 =
VAR __Current = [Page Number]
VAR __Next = MAXX('Table',[Index] = EARLIER([Index]) + 1),[Page Number])
RETURN
IF([Next] - [Current] = 1,__Current,__Current & "-" & __Next - 1)
@Greg_Deckler, thanks for the fast response. I am getting errors when I use the DAX code you provided.
This is what happens when I don't test the code, sorry, stray paren )
Page Number 2 =
VAR __Current = [Page Number]
VAR __Next = MAXX('Table',[Index] = (EARLIER([Index]) + 1),[Page Number])
RETURN
IF([Next] - [Current] = 1,__Current,__Current & "-" & __Next - 1)
@Greg_Deckler, I know how it is. I tried that code again; same result. I'm getting an "Unexpected Param" for the [Page Number] in the MAXX variable.
Also, noticed the [Next] and [Current] columns are marked as columns but wondered if those should be the variable names (e.g., __Current).
OK, let's try this again. I actually tested this one:
Page Number 2 =
VAR __Current = [Page Number]
VAR __Next = MAXX(FILTER('Table',[Index] = (EARLIER([Index]) + 1)),[Page Number])
RETURN
IF(__Next - __Current = 1,__Current & "",__Current & "-" & __Next - 1)
@Greg_Deckler, that worked, and it is fantastic. Couple of questions for you.
First, the last item in the list shows 219-0, which I assume is due to it looping around to the first INDEX item. Is there any way to just make it say 219, which is the last page number in the list?
Second, if there are multiple PDFs in use, is there a way this DAX formula could be configured to "restart" when the text value changes? For example:
Document | TOC Entry | Page Number | Page Range |
Orientation | Lesson 1: Getting Started | 1 | 1 |
Orientation | Topic A: Navigate | 2 | 2-19 |
Orientation | Topic B: Create | 20 | 20 |
Content Creation | Lesson 1: Developing Content | 1 | 1 |
Content Creation | Topic A: Create | 2 | 2-9 |
Content Creation | Topic B: Edit | 10 | 10 |
For the first one:
Page Number 2 =
VAR __Current = [Page Number]
VAR __Next = MAXX(FILTER('Table',[Index] = (EARLIER([Index]) + 1)),[Page Number])
RETURN
SWITCH(TRUE(),
__Next < __Current,__Current & "",
__Next - __Current = 1,__Current & "",
__Current & "-" & __Next - 1
)
For the second one, yes:
Page Number 2 =
VAR __Current = [Page Number]
VAR __Next = MAXX(FILTER('Table',[Index] = (EARLIER([Index]) + 1) && [Document] = EARLIER([Document])),[Page Number])
RETURN
SWITCH(TRUE(),
__Next < __Current,__Current & "",
__Next - __Current = 1,__Current & "",
__Current & "-" & __Next - 1
)
@Greg_Deckler, wow. That's awesome.
Okay, last one. Is it possible to ALSO have each "lesson" range include all page numbers minus the start of the next lesson/appendix? For example:
TOC Entry | Page Number | Page Range |
Lesson 1: Getting Started | 1 | 1-19 |
Topic A: Navigate | 2 | 2-19 |
Topic B: Create | 20 | 20-29 |
Lesson 2: Developing Content | 30 | 30-39 |
Topic A: Create | 31 | 31-39 |
Appendix A | 40 | 40 |
Create this column:
@Greg_Deckler, thanks for providing that. The Lesson Number worked wonderfully, but the adjustments to the Page Number column aren't working properly. Rather than show the Lesson 1 page range, it just removed the page range for all items at the end of a particular lesson.
TOC Entry | Page Number | Page Range |
Lesson 1: Getting Started | 1 | 1 |
Topic A: Navigate | 2 | 2-19 |
Topic B: Create | 20 | 20 |
Lesson 2: Developing Content | 30 | 30 |
Topic A: Create | 31 | 31-39 |
What I'm hoping is possible would be this result (if this makes sense):
TOC Entry | Page Number | Page Range |
Lesson 1: Getting Started | 1 | 1-29 |
Topic A: Navigate | 2 | 2-19 |
Topic B: Create | 20 | 20-29 |
Lesson 2: Developing Content | 30 | 30-39 |
Topic A: Create | 31 | 31-39 |
@Anonymous - I feel like I've answered this like 12 times now and you keep changing the goal posts. So, you are just going to have to play with the pattern to meet your ever-changing requirements.
@Greg_Deckler, I have marked the fourth response as a solution since that contains the code that worked for the original question.
The "changing of the goal posts" resulted from either my noticing a problem with the code you gave that needed to be resolved or thinking of a new value-add item that builds on the solution you provided. Assistance is by no means required, but that is the nature of an online help forum: people ask for help if they do not know how to solve something.
I do appreciate the help you did provide and wish you well.
Hi @Anonymous,
This is M version rather than DAX, if this is an acceptable way of solving it. Plug your source into the SourceTable step.
let
Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("XY7BCsIwEER/Zcm5h7YShdw0ES/iRdFD6SHYJYRKIs1Sf98tFml6nTczvKYRZ0wpBqgUnJDIBwdXsgNhJwpRibZoxC2+/RP2Ci529M4SMqkX5KDg7vEzpeUi1gr0gHNdLoCB7GrzG80etQKDI764ySo6BsJA3JJl7vK/llWucuz81N+uVB5x6Kd4LWIweReY7HjQfgE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"TOC Entry" = _t, #"Page Number" = _t]),
#"Changed Type" = Table.TransformColumnTypes(Source,{{"TOC Entry", type text}, {"Page Number", Int64.Type}}),
SourceTable = #"Changed Type",
MakeLastPage = List.Transform(List.Skip (SourceTable[Page Number]), (x)=> x-1),
MakeTable = Table.FromColumns(Table.ToColumns(SourceTable) & {MakeLastPage}, type table [#"TOC"=text, #"First"=Int64.Type, #"Last"=Int64.Type]),
AddPageRange = Table.AddColumn(MakeTable, "Page Number", each if [First] = [Last] or [Last]=null then Number.ToText([First]) else Number.ToText([First]) & "-" & Number.ToText([Last]), type text),
Output = Table.SelectColumns(AddPageRange,{"TOC", "Page Number"})
in
Output
Kind regards,
John
@Anonymous, This is great. I really wanted an M solution as I prefer edits being made all in one space. I did have to make a slight change to allow for additional columns in my source data:
MakeTable = Table.FromColumns(Table.ToColumns(SourceTable) & {MakeLastPage}, type table [#"TOC"=text, #"First"=Int64.Type, #"Course"=text,#"Index"=Int64.Type,#"Last"=Int64.Type]),
Beyond that, the only other item I'd like to solve is creating a page range for the "Lesson" sections to cover all of the "topics" within them.
Fortunately, your solution allowed me to solve this by creating a separate table containing only the Lesson entries, applying your M query to the filtered table, and then merging to the TOC table. Worked like a charm. Kudos coming your way.
Hi @Anonymous
Thanks, mate. Happy to help 🙂
Kind regards,
JB
Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City
Check out the April 2024 Power BI update to learn about new features.