Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
JFGC08
Frequent Visitor

Web scraping

Hello community,

 

I need to weekly store data from a web SCADA, I am trying to use power query but when I drill down in the HTML document I find the element but it is empty, any guess what could be the problem?

 

I attach the M code and some screenshots

 

let
Source = Web.Page(Web.Contents("http://192.168.111.111/?dse=engine  ")),
Data = Source{27}[Data],
Children = Data{0}[Children],
Children1 = Children{1}[Children],
#"Filtered Rows" = Table.SelectRows(Children1, each ([Kind] = "Element") and ([Name] = "DIV")),
Children2 = #"Filtered Rows"{4}[Children],
Children3 = Children2{22}[Children],
Children4 = Children3{0}[Children],
Children5 = Children4{1}[Children],
Children6 = Children5{0}[Children],
Children7 = Children6{7}[Children],
Children8 = Children7{2}[Children]
in
Children8

 

pq.JPGweb.JPG

 

4 REPLIES 4
PhilipTreacy
Super User
Super User

Hi @JFGC08 

 

Yes unfortunately Web.BrowserContents is only in PBI - but you posted in the Power BI forums??  🙂

 

If the element you needed is dynamically generted, that is, for example it is loaded via JavaScript/AJAX, then PBI won't be able to see it.  You certain it's dynamically loaded?

 

I can't test this as I don't have access to your internal web server.

 

Phil



Did I answer your question? Then please mark my post as the solution.
If I helped you, click on the Thumbs Up to give Kudos.


Blog :: YouTube Channel :: Connect on Linkedin


Proud to be a Super User!


Hi @PhilipTreacy 

 

Yes I posted in BI forums because I did not find an specific forum for power query, my bad. Now I am testing in PBI.

 

Yes I am sure the content is generated dyamically, I found a couple of web pages for testing purposes, one with static content and another with dynamic content.

 

If I inspect the static page in elements, the values of the tables are there, the same if I ispect in sources.

If I inspect the dynamic page in elements, the values of the tables are there, but if I use sorces what I see is the JS co

de.

 

How to Convert a Static HTML Table to a Dynamic JavaScript Data Grid (freecodecamp.org)

static content

Table Example (eviltester.github.io)

Dynamic content

Table Example (eviltester.github.io)

 

So, when I retrieve the data by using  Web.BrowserContents what I get is the code no the values. I could not attach the PBI file here because the .pbix is not supported. 

 

Below is the data I get in PBI by using Web.BrowserContents

Dynamic table

<html><head>
    <title>Table Example</title>
</head>

<body>

    <style>
        table {
            border-collapse: collapse;
            width: 100%;
        }

        td,
        th {
            border: 1px solid #000000;
            text-align: left;
            padding: 8px;
        }
    </style>

    <h1>TODO List</h1>

    <div id="data-table">
        <table id="html-data-table">
            <tbody><tr>
                <th>userId</th>
                <th>id</th>
                <th>title</th>
                <th>completed</th>
            </tr>
        </tbody></table>    
    </div>

    <script type="text/javascript" charset="utf-8">

        fetch('https://jsonplaceholder.typicode.com/todos')
            .then(function (response) {
                return response.json();
            }).then(function (apiJsonData) {
                console.log(apiJsonData);
                renderDataInTheTable(apiJsonData);
            })

        function renderDataInTheTable(todos) {
            const mytable = document.getElementById("html-data-table");
            todos.forEach(todo => {
                let newRow = document.createElement("tr");
                Object.values(todo).forEach((value) => {
                    let cell = document.createElement("td");
                    cell.innerText = value;
                    newRow.appendChild(cell);
                })
                mytable.appendChild(newRow);
            });
        }
    </script>


</body></html>

Static table 

<html><head>
    <title>Table Example</title>
</head>

<body>

    <style>
        table {
            border-collapse: collapse;
            width: 100%;
        }

        td,
        th {
            border: 1px solid #000000;
            text-align: left;
            padding: 8px;
        }
    </style>

    <h1>TODO List</h1>

    <div id="data-table">
        <table id="html-data-table">
            <tbody><tr>
                <th>userId</th>
                <th>id</th>
                <th>title</th>
                <th>completed</th>
            </tr>
            <tr>
                <td>1</td>
                <td>1</td>
                <td>My todo 1</td>
                <td>false</td>
            </tr>
        </tbody></table>    
    </div>



</body></html>

  

Regards.

PhilipTreacy
Super User
Super User

Hi @JFGC08 

 

If I start a new Web Query it uses Web.BrowserContents not Web.Page(Web.Contents())

 

If you use Web.BrowserContents you should get presented with a list of tables on the page, like this

 

web-tabs.png

 

You can then select the table you want, rather than navigating though the query inside PQ as you are currently.

 

Here's my query

 

let
    Source = Web.BrowserContents("https://www.microsoft.com"),
    #"Extracted Table From Html" = Html.Table(Source, {{"Column1", ".carouselbase .card A"}, {"Column2", ".mb-5 P"}, {"Column3", ".carouselbase H2"}, {"Column4", "H3"}, {"Column5", ".highlightcarousel .mb-4"}}, [RowSelector=".carouselbase .card"]),
    #"Changed Type" = Table.TransformColumnTypes(#"Extracted Table From Html",{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}})
in
    #"Changed Type"

 

Regards

 

Phil

 



Did I answer your question? Then please mark my post as the solution.
If I helped you, click on the Thumbs Up to give Kudos.


Blog :: YouTube Channel :: Connect on Linkedin


Proud to be a Super User!


Hi @PhilipTreacy 

 

Many thanks for your answer. Yor example code works perfectly.

 

From my side I noticed the following problems.

 

- I was using excel power query where the funtion Web.BrowserContents is not supported. It only works on Power BI desktop. Troubleshooting the Power Query Web connector - Power Query | Microsoft Learn

- I tested the funtion Web.BrowserContents in power BI desktop and I found the element I need,  but it still empty because I think it is generated dynamically

**Code start**

    <section id="engine" class="dynamic-content" style="display: block;">

        <div class="container settingsContainer">

            <div class="col-lg-6">

                &nbsp;

                <table class="engtable">

                    <tbody><tr class="toprow">

                        <th><strong>Engine Instrumentation</strong></th>

                        <th> </th>

                        <th> </th>

                    </tr>

.

.

.

                            <p>Engine Hours</p>

                        </td>

                        <td class="tval">

                            <p id="ehrs"></p>

                        </td>

                    </tr>

                </tbody></table>

            </div>

            <div class="col-lg-6">

                &nbsp;

**Code end**

 

Regards.

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors