Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
Anonymous
Not applicable

How can I create a number range from a list of page numbers from a PDF?

I am extracting the Table of Contents from a PDF and want to create page ranges (e.g., 1-5) for each item that operates by subtracting 1 from the entry after. Here is the sample data:

 

TOC EntryPage Number
Lesson 1: Getting Started1
Topic A: Navigate2
Topic B: View20
Topic C: Create25
Topic D : Navigate30
Lesson 2: Developing Content50
Topic A: Create51
Topic B: Edit60
Topic C: Work65
Topic D : Design70

 

This is what I would like to see:

 

TOC EntryPage Number
Lesson 1: Getting Started1
Topic A: Navigate2-19
Topic B: View20-24
Topic C: Create25-29
Topic D : Navigate30-49
Lesson 2: Developing Content50
Topic A: Create51-59
Topic B: Edit60-64
Topic C: Work65-69
Topic D : Design70

 

Any ideas/suggestions would be greatly appreciated.

1 ACCEPTED SOLUTION

For the first one:

Page Number 2 = 
    VAR __Current = [Page Number]
    VAR __Next = MAXX(FILTER('Table',[Index] = (EARLIER([Index]) + 1)),[Page Number])
RETURN
    SWITCH(TRUE(),
        __Next < __Current,__Current & "",
        __Next - __Current = 1,__Current & "",
        __Current & "-" & __Next - 1
    )

 

For the second one, yes:

 

Page Number 2 = 
    VAR __Current = [Page Number]
    VAR __Next = MAXX(FILTER('Table',[Index] = (EARLIER([Index]) + 1) && [Document] = EARLIER([Document])),[Page Number])
RETURN
    SWITCH(TRUE(),
        __Next < __Current,__Current & "",
        __Next - __Current = 1,__Current & "",
        __Current & "-" & __Next - 1
    )

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...

View solution in original post

16 REPLIES 16
watkinnc
Super User
Super User

Hey there.   This should work for just about any TOC.  This assumes your table is in an Excel sheet, if not, just amend the source, then copy paste these onto the queries pane:

 

// Table1
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"TOC Entry", type text}, {"Page Number", Int64.Type}})
in
    #"Changed Type"

// LessonEntry
let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Filtered Rows" = Table.SelectRows(Source, each Text.StartsWith([TOC Entry], "Lesson")),
    #"Added Index" = Table.AddIndexColumn(#"Filtered Rows", "Index", 1, 1),
    #"Added Index1" = Table.AddIndexColumn(#"Added Index", "Index.1", 0, 1),
    #"Merged Queries" = Table.NestedJoin(#"Added Index1", {"Index"}, #"Added Index1", {"Index.1"}, "Added Index1", JoinKind.LeftOuter),
    #"Expanded Added Index2" = Table.ExpandTableColumn(#"Merged Queries", "Added Index1", {"TOC Entry", "Page Number"}, {"TOC Entry.1", "Page Number.1"}),
    #"Removed Columns1" = Table.RemoveColumns(#"Expanded Added Index2",{"Index", "Index.1"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns1",{{"Page Number.1", "ToPage"}}),
    #"Subtracted from Column1" = Table.TransformColumns(#"Renamed Columns", {{"ToPage", each _ - 1, type number}}),
    #"Replaced Value" = Table.ReplaceValue(#"Subtracted from Column1",null,List.Max(Table1[Page Number]),Replacer.ReplaceValue,{"ToPage"}),
    #"Removed Columns" = Table.RemoveColumns(#"Replaced Value",{"TOC Entry.1"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Removed Columns",{{"Page Number", type text}, {"ToPage", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Page_Number", each [Page Number] &" - " &[ToPage], type text),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Custom",{"Page Number", "ToPage"}),
    #"Changed Type1" = Table.TransformColumnTypes(#"Removed Columns2",{{"TOC Entry", type text}})
in
    #"Changed Type1"

// TOC
let
    Source = Table1,
    #"Added Index" = Table.AddIndexColumn(Source, "Index", 1, 1),
    #"Added Index1" = Table.AddIndexColumn(#"Added Index", "Index.1", 0, 1),
    #"Merged Queries" = Table.NestedJoin(#"Added Index1", {"Index"}, #"Added Index1", {"Index.1"}, "Added Index1", JoinKind.LeftOuter),
    #"Expanded Added Index1" = Table.ExpandTableColumn(#"Merged Queries", "Added Index1", {"Page Number"}, {"Page Number.1"}),
    #"Removed Columns" = Table.RemoveColumns(#"Expanded Added Index1",{"Index", "Index.1"}),
    #"Renamed Columns" = Table.RenameColumns(#"Removed Columns",{{"Page Number.1", "ToPage"}}),
    #"Replaced Value" = Table.ReplaceValue(#"Renamed Columns",null, List.Max(Source[Page Number]),Replacer.ReplaceValue,{"ToPage"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Replaced Value",{{"Page Number", type text}, {"ToPage", type text}}),
    #"Added Custom" = Table.AddColumn(#"Changed Type", "Page_Number", each [Page Number]&" - "&[ToPage],type text),
    #"Removed Columns1" = Table.RemoveColumns(#"Added Custom",{"Page Number", "ToPage"}),
    #"Merged Queries1" = Table.NestedJoin(#"Removed Columns1", {"TOC Entry"}, LessonEntry, {"TOC Entry"}, "LessonEntry", JoinKind.LeftOuter),
    #"Expanded LessonEntry" = Table.ExpandTableColumn(#"Merged Queries1", "LessonEntry", {"Page_Number"}, {"Page_Number.1"}),
    #"Added Conditional Column" = Table.AddColumn(#"Expanded LessonEntry", "New_Page Number", each if [Page_Number.1] = null then [Page_Number] else [Page_Number.1], type text),
    #"Removed Columns2" = Table.RemoveColumns(#"Added Conditional Column",{"Page_Number", "Page_Number.1"}),
    #"Renamed Columns1" = Table.RenameColumns(#"Removed Columns2",{{"New_Page Number", "Page Number"}})
in
    #"Renamed Columns1"

 

//Let me know how it goes!


I’m usually answering from my phone, which means the results are visualized only in my mind. You’ll need to use my answer to know that it works—but it will work!!
Greg_Deckler
Super User
Super User

Add an Index column in Power Query and then you can create this column using DAX:

 

 

Page Number 2 = 
  VAR __Current = [Page Number]
  VAR __Next = MAXX('Table',[Index] = EARLIER([Index]) + 1),[Page Number])
RETURN
  IF([Next] - [Current] = 1,__Current,__Current & "-" & __Next - 1)
  

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

@Greg_Deckler, thanks for the fast response. I am getting errors when I use the DAX code you provided.

 

DaxError.png

This is what happens when I don't test the code, sorry, stray paren )

 

Page Number 2 = 
  VAR __Current = [Page Number]
  VAR __Next = MAXX('Table',[Index] = (EARLIER([Index]) + 1),[Page Number])
RETURN
  IF([Next] - [Current] = 1,__Current,__Current & "-" & __Next - 1)

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

@Greg_Deckler, I know how it is. I tried that code again; same result. I'm getting an "Unexpected Param" for the [Page Number] in the MAXX variable.

 

Also, noticed the [Next] and [Current] columns are marked as columns but wondered if those should be the variable names (e.g., __Current).

 

DaxError.png

OK, let's try this again. I actually tested this one:

 

Page Number 2 = 
  VAR __Current = [Page Number]
  VAR __Next = MAXX(FILTER('Table',[Index] = (EARLIER([Index]) + 1)),[Page Number])
RETURN
  IF(__Next - __Current = 1,__Current & "",__Current & "-" & __Next - 1)

 

 


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

@Greg_Deckler, that worked, and it is fantastic. Couple of questions for you.

 

First, the last item in the list shows 219-0, which I assume is due to it looping around to the first INDEX item. Is there any way to just make it say 219, which is the last page number in the list?

 

DaxError.png

 

Second, if there are multiple PDFs in use, is there a way this DAX formula could be configured to "restart" when the text value changes? For example:

 

DocumentTOC EntryPage NumberPage Range
OrientationLesson 1: Getting Started11
OrientationTopic A: Navigate22-19
OrientationTopic B: Create2020
Content CreationLesson 1: Developing Content11
Content CreationTopic A: Create22-9
Content CreationTopic B: Edit1010

 

For the first one:

Page Number 2 = 
    VAR __Current = [Page Number]
    VAR __Next = MAXX(FILTER('Table',[Index] = (EARLIER([Index]) + 1)),[Page Number])
RETURN
    SWITCH(TRUE(),
        __Next < __Current,__Current & "",
        __Next - __Current = 1,__Current & "",
        __Current & "-" & __Next - 1
    )

 

For the second one, yes:

 

Page Number 2 = 
    VAR __Current = [Page Number]
    VAR __Next = MAXX(FILTER('Table',[Index] = (EARLIER([Index]) + 1) && [Document] = EARLIER([Document])),[Page Number])
RETURN
    SWITCH(TRUE(),
        __Next < __Current,__Current & "",
        __Next - __Current = 1,__Current & "",
        __Current & "-" & __Next - 1
    )

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

@Greg_Deckler, wow. That's awesome.

 

Okay, last one. Is it possible to ALSO have each "lesson" range include all page numbers minus the start of the next lesson/appendix? For example:

 

TOC EntryPage NumberPage Range
Lesson 1: Getting Started11-19
Topic A: Navigate22-19
Topic B: Create2020-29
Lesson 2: Developing Content3030-39
Topic A: Create3131-39
Appendix A4040

Create this column:

Lesson Number =
COUNTROWS(FILTER('Table',[Index]<=EARLIER([Index]) && SEARCH("Lesson",[TOC Entry],,FALSE())))
 
and then this column:
Page Number 3 =
VAR __Current = [Page Number]
VAR __Next = MAXX(FILTER('Table',[Index] = (EARLIER([Index]) + 1) && [Lesson Number] = EARLIER('Table'[Lesson Number])),[Page Number])
RETURN
SWITCH(TRUE(),
__Next < __Current,__Current & "",
__Next - __Current = 1,__Current & "",
__Current & "-" & __Next - 1
)

@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

@Greg_Deckler, thanks for providing that. The Lesson Number worked wonderfully, but the adjustments to the Page Number column aren't working properly. Rather than show the Lesson 1 page range, it just removed the page range for all items at the end of a particular lesson.

 

TOC EntryPage NumberPage Range
Lesson 1: Getting Started11
Topic A: Navigate22-19
Topic B: Create2020
Lesson 2: Developing Content3030
Topic A: Create3131-39

 

What I'm hoping is possible would be this result (if this makes sense):

 

TOC EntryPage NumberPage Range
Lesson 1: Getting Started11-29
Topic A: Navigate22-19
Topic B: Create2020-29
Lesson 2: Developing Content3030-39
Topic A: Create3131-39

 

 

@Anonymous - I feel like I've answered this like 12 times now and you keep changing the goal posts. So, you are just going to have to play with the pattern to meet your ever-changing requirements.


@ me in replies or I'll lose your thread!!!
Instead of a Kudo, please vote for this idea
Become an expert!: Enterprise DNA
External Tools: MSHGQM
YouTube Channel!: Microsoft Hates Greg
Latest book!:
The Definitive Guide to Power Query (M)

DAX is easy, CALCULATE makes DAX hard...
Anonymous
Not applicable

@Greg_Deckler, I have marked the fourth response as a solution since that contains the code that worked for the original question.

 

The "changing of the goal posts" resulted from either my noticing a problem with the code you gave that needed to be resolved or thinking of a new value-add item that builds on the solution you provided. Assistance is by no means required, but that is the nature of an online help forum: people ask for help if they do not know how to solve something.

 

I do appreciate the help you did provide and wish you well.

Anonymous
Not applicable

Hi @Anonymous,

 

This is M version rather than DAX, if this is an acceptable way of solving it. Plug your source into the SourceTable step. 

 

let
    Source = Table.FromRows(Json.Document(Binary.Decompress(Binary.FromText("XY7BCsIwEER/Zcm5h7YShdw0ES/iRdFD6SHYJYRKIs1Sf98tFml6nTczvKYRZ0wpBqgUnJDIBwdXsgNhJwpRibZoxC2+/RP2Ci529M4SMqkX5KDg7vEzpeUi1gr0gHNdLoCB7GrzG80etQKDI764ySo6BsJA3JJl7vK/llWucuz81N+uVB5x6Kd4LWIweReY7HjQfgE=", BinaryEncoding.Base64), Compression.Deflate)), let _t = ((type text) meta [Serialized.Text = true]) in type table [#"TOC Entry" = _t, #"Page Number" = _t]),
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"TOC Entry", type text}, {"Page Number", Int64.Type}}),
    
    SourceTable = #"Changed Type",
    MakeLastPage = List.Transform(List.Skip (SourceTable[Page Number]), (x)=> x-1),
    MakeTable = Table.FromColumns(Table.ToColumns(SourceTable) & {MakeLastPage}, type table [#"TOC"=text, #"First"=Int64.Type, #"Last"=Int64.Type]),
    AddPageRange = Table.AddColumn(MakeTable, "Page Number", each if [First] = [Last] or [Last]=null then Number.ToText([First]) else Number.ToText([First]) & "-" & Number.ToText([Last]), type text),
    Output = Table.SelectColumns(AddPageRange,{"TOC", "Page Number"})

in
    Output

 

Kind regards,

John

 

Anonymous
Not applicable

@Anonymous, This is great. I really wanted an M solution as I prefer edits being made all in one space. I did have to make a slight change to allow for additional columns in my source data:

 

MakeTable = Table.FromColumns(Table.ToColumns(SourceTable) & {MakeLastPage}, type table [#"TOC"=text, #"First"=Int64.Type, #"Course"=text,#"Index"=Int64.Type,#"Last"=Int64.Type]),

 

Beyond that, the only other item I'd like to solve is creating a page range for the "Lesson" sections to cover all of the "topics" within them.

 

Fortunately, your solution allowed me to solve this by creating a separate table containing only the Lesson entries, applying your M query to the filtered table, and then merging to the TOC table. Worked like a charm. Kudos coming your way.

Anonymous
Not applicable

Hi @Anonymous 

 

Thanks, mate. Happy to help 🙂

 

Kind regards,

JB

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors