Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Earn a 50% discount on the DP-600 certification exam by completing the Fabric 30 Days to Learn It challenge.

Reply
JFGC08
Frequent Visitor

Web scraping

Hello community,

 

I need to weekly store data from a web SCADA, I am trying to use power query but when I drill down in the HTML document I find the element but it is empty, any guess what could be the problem?

 

I attach the M code and some screenshots

 

let
Source = Web.Page(Web.Contents("http://192.168.111.111/?dse=engine  ")),
Data = Source{27}[Data],
Children = Data{0}[Children],
Children1 = Children{1}[Children],
#"Filtered Rows" = Table.SelectRows(Children1, each ([Kind] = "Element") and ([Name] = "DIV")),
Children2 = #"Filtered Rows"{4}[Children],
Children3 = Children2{22}[Children],
Children4 = Children3{0}[Children],
Children5 = Children4{1}[Children],
Children6 = Children5{0}[Children],
Children7 = Children6{7}[Children],
Children8 = Children7{2}[Children]
in
Children8

 

pq.JPGweb.JPG

 

4 REPLIES 4
PhilipTreacy
Super User
Super User

Hi @JFGC08 

 

Yes unfortunately Web.BrowserContents is only in PBI - but you posted in the Power BI forums??  🙂

 

If the element you needed is dynamically generted, that is, for example it is loaded via JavaScript/AJAX, then PBI won't be able to see it.  You certain it's dynamically loaded?

 

I can't test this as I don't have access to your internal web server.

 

Phil



Did I answer your question? Then please mark my post as the solution.
If I helped you, click on the Thumbs Up to give Kudos.


Blog :: YouTube Channel :: Connect on Linkedin


Proud to be a Super User!


Hi @PhilipTreacy 

 

Yes I posted in BI forums because I did not find an specific forum for power query, my bad. Now I am testing in PBI.

 

Yes I am sure the content is generated dyamically, I found a couple of web pages for testing purposes, one with static content and another with dynamic content.

 

If I inspect the static page in elements, the values of the tables are there, the same if I ispect in sources.

If I inspect the dynamic page in elements, the values of the tables are there, but if I use sorces what I see is the JS co

de.

 

How to Convert a Static HTML Table to a Dynamic JavaScript Data Grid (freecodecamp.org)

static content

Table Example (eviltester.github.io)

Dynamic content

Table Example (eviltester.github.io)

 

So, when I retrieve the data by using  Web.BrowserContents what I get is the code no the values. I could not attach the PBI file here because the .pbix is not supported. 

 

Below is the data I get in PBI by using Web.BrowserContents

Dynamic table

<html><head>
    <title>Table Example</title>
</head>

<body>

    <style>
        table {
            border-collapse: collapse;
            width: 100%;
        }

        td,
        th {
            border: 1px solid #000000;
            text-align: left;
            padding: 8px;
        }
    </style>

    <h1>TODO List</h1>

    <div id="data-table">
        <table id="html-data-table">
            <tbody><tr>
                <th>userId</th>
                <th>id</th>
                <th>title</th>
                <th>completed</th>
            </tr>
        </tbody></table>    
    </div>

    <script type="text/javascript" charset="utf-8">

        fetch('https://jsonplaceholder.typicode.com/todos')
            .then(function (response) {
                return response.json();
            }).then(function (apiJsonData) {
                console.log(apiJsonData);
                renderDataInTheTable(apiJsonData);
            })

        function renderDataInTheTable(todos) {
            const mytable = document.getElementById("html-data-table");
            todos.forEach(todo => {
                let newRow = document.createElement("tr");
                Object.values(todo).forEach((value) => {
                    let cell = document.createElement("td");
                    cell.innerText = value;
                    newRow.appendChild(cell);
                })
                mytable.appendChild(newRow);
            });
        }
    </script>


</body></html>

Static table 

<html><head>
    <title>Table Example</title>
</head>

<body>

    <style>
        table {
            border-collapse: collapse;
            width: 100%;
        }

        td,
        th {
            border: 1px solid #000000;
            text-align: left;
            padding: 8px;
        }
    </style>

    <h1>TODO List</h1>

    <div id="data-table">
        <table id="html-data-table">
            <tbody><tr>
                <th>userId</th>
                <th>id</th>
                <th>title</th>
                <th>completed</th>
            </tr>
            <tr>
                <td>1</td>
                <td>1</td>
                <td>My todo 1</td>
                <td>false</td>
            </tr>
        </tbody></table>    
    </div>



</body></html>

  

Regards.

PhilipTreacy
Super User
Super User

Hi @JFGC08 

 

If I start a new Web Query it uses Web.BrowserContents not Web.Page(Web.Contents())

 

If you use Web.BrowserContents you should get presented with a list of tables on the page, like this

 

web-tabs.png

 

You can then select the table you want, rather than navigating though the query inside PQ as you are currently.

 

Here's my query

 

let
    Source = Web.BrowserContents("https://www.microsoft.com"),
    #"Extracted Table From Html" = Html.Table(Source, {{"Column1", ".carouselbase .card A"}, {"Column2", ".mb-5 P"}, {"Column3", ".carouselbase H2"}, {"Column4", "H3"}, {"Column5", ".highlightcarousel .mb-4"}}, [RowSelector=".carouselbase .card"]),
    #"Changed Type" = Table.TransformColumnTypes(#"Extracted Table From Html",{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}})
in
    #"Changed Type"

 

Regards

 

Phil

 



Did I answer your question? Then please mark my post as the solution.
If I helped you, click on the Thumbs Up to give Kudos.


Blog :: YouTube Channel :: Connect on Linkedin


Proud to be a Super User!


Hi @PhilipTreacy 

 

Many thanks for your answer. Yor example code works perfectly.

 

From my side I noticed the following problems.

 

- I was using excel power query where the funtion Web.BrowserContents is not supported. It only works on Power BI desktop. Troubleshooting the Power Query Web connector - Power Query | Microsoft Learn

- I tested the funtion Web.BrowserContents in power BI desktop and I found the element I need,  but it still empty because I think it is generated dynamically

**Code start**

    <section id="engine" class="dynamic-content" style="display: block;">

        <div class="container settingsContainer">

            <div class="col-lg-6">

                &nbsp;

                <table class="engtable">

                    <tbody><tr class="toprow">

                        <th><strong>Engine Instrumentation</strong></th>

                        <th> </th>

                        <th> </th>

                    </tr>

.

.

.

                            <p>Engine Hours</p>

                        </td>

                        <td class="tval">

                            <p id="ehrs"></p>

                        </td>

                    </tr>

                </tbody></table>

            </div>

            <div class="col-lg-6">

                &nbsp;

**Code end**

 

Regards.

Helpful resources

Announcements
PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

Top Solution Authors
Top Kudoed Authors