Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn the coveted Fabric Analytics Engineer certification. 100% off your exam for a limited time only!

Reply
Jrussi1301
Frequent Visitor

Wordpress Tablepress multi-page table do not import all rows

I need to import all rows of a tablepress (wordpress plugin) page, about 200 rows. The page shows  30 rows each time, but the URL is the same, so, can´t merge multiple datasources. Power BI Desktop gets only the first 30 rows.

 

The table is on the web source, all rows, but the presentation is 30 rows each time.

Tried to use powershell to get a .html file, but the parsing become wrong (some <br /> inside the columns).

Any hint?

 

1 ACCEPTED SOLUTION

No, it didn't work to us. The point is that with this Wordpress Plugin, the URL doesn't change when you change the page. Our solution (after a lot of search on the web) was to adapt a Powershell script we found, in order to filter only the 'table ... /table" section, cleaning its fields, transforming it on a .CSV file. Unfortunately I can't find the original URL, so here is our script, after adaptations:

$url ="https://some_url/some_page/"

$r=Invoke-WebRequest $url
$r.Content |Out-File C:\temp\some_page.html



$Path = 'C:\temp\some_page.html'


[regex]$regex = "(?s)<table id=.*?</table>"
$tables = $regex.matches((GC $Path -raw)).groups.value

ForEach($String in $tables){
$String = $String|ForEach{$_ -replace '(?s)<br.*?</td>','/<td>'}
$table = $String.split("`n")

$CurTable = @()
#$CurTableName = ([regex]'table id="([^"]*)"').matches($table[0]).groups[1].value
$CurTable += ($table[1] -replace "<.th><th class=.*?>",";") -replace "</?(th.*?|b.*?)>"
$CurTable += $table[2..($table.count-2)]|ForEach{$_ -replace '(?s)<br.*?<br' -replace '(?s)<br.*?<','<' -replace "(`r|`n|`t)" -replace "( )+", " " -replace "&nbsp;" -replace "</t(h|d)><t(h|d) class=.*?>",";" -replace "<\tr>","`n" -replace "<.?t(d.*?|b.*?|r.*?|h.*?)>" }


$CurTable | convertfrom-csv -Delimiter ";" | export-csv "C:\temp\some_page.csv" -notype -Encoding UTF8 -Force 
}

 

Of course, for each different page, probably you will need to change this code.

View solution in original post

2 REPLIES 2

No, it didn't work to us. The point is that with this Wordpress Plugin, the URL doesn't change when you change the page. Our solution (after a lot of search on the web) was to adapt a Powershell script we found, in order to filter only the 'table ... /table" section, cleaning its fields, transforming it on a .CSV file. Unfortunately I can't find the original URL, so here is our script, after adaptations:

$url ="https://some_url/some_page/"

$r=Invoke-WebRequest $url
$r.Content |Out-File C:\temp\some_page.html



$Path = 'C:\temp\some_page.html'


[regex]$regex = "(?s)<table id=.*?</table>"
$tables = $regex.matches((GC $Path -raw)).groups.value

ForEach($String in $tables){
$String = $String|ForEach{$_ -replace '(?s)<br.*?</td>','/<td>'}
$table = $String.split("`n")

$CurTable = @()
#$CurTableName = ([regex]'table id="([^"]*)"').matches($table[0]).groups[1].value
$CurTable += ($table[1] -replace "<.th><th class=.*?>",";") -replace "</?(th.*?|b.*?)>"
$CurTable += $table[2..($table.count-2)]|ForEach{$_ -replace '(?s)<br.*?<br' -replace '(?s)<br.*?<','<' -replace "(`r|`n|`t)" -replace "( )+", " " -replace "&nbsp;" -replace "</t(h|d)><t(h|d) class=.*?>",";" -replace "<\tr>","`n" -replace "<.?t(d.*?|b.*?|r.*?|h.*?)>" }


$CurTable | convertfrom-csv -Delimiter ";" | export-csv "C:\temp\some_page.csv" -notype -Encoding UTF8 -Force 
}

 

Of course, for each different page, probably you will need to change this code.

Helpful resources

Announcements
April AMA free

Microsoft Fabric AMA Livestream

Join us Tuesday, April 09, 9:00 – 10:00 AM PST for a live, expert-led Q&A session on all things Microsoft Fabric!

March Fabric Community Update

Fabric Community Update - March 2024

Find out what's new and trending in the Fabric Community.