Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
gulrog
New Member

Azure DocumentDB and slow refresh

I'm testing PowerBi and the azure documentdb datasource. It works fine, but when I try to Refresh the data it takes forever. I have 500 000 records in this database, and it uses around 15 minutes to refresh. In a production environment it would be 25 million records in this database and I would like it to update as often as possible (every 1 hour it seems), but with this refresh speed it would not finnish an update within 1 hour. Am I doing someting wrong? I was thinking that PowerBi should handle large amount of data, but it looks like every time I update it fetches all records from the database again (even if none of them has changed).

1 ACCEPTED SOLUTION
ankitpatira
Community Champion
Community Champion

@gulrog Below is the screenshot of data refresh rate in power bi based on subscription. Even if you have Pro it is going to take a lot to refresh 25 million records. Data refresh in power bi is not delta so each time you refresh it gets all the data.

 

I would suggest you look at limiting your data you import. Also power bi has direct query feature that doesn't import but connects live to data source. DirectQuery is available with azure sql db and azure sql dw but i don't think it is yet available with document db since it is still in beta. You can submit that as an idea to get it prioritised.

 

Capture.PNG

View solution in original post

4 REPLIES 4
ankitpatira
Community Champion
Community Champion

@gulrog Below is the screenshot of data refresh rate in power bi based on subscription. Even if you have Pro it is going to take a lot to refresh 25 million records. Data refresh in power bi is not delta so each time you refresh it gets all the data.

 

I would suggest you look at limiting your data you import. Also power bi has direct query feature that doesn't import but connects live to data source. DirectQuery is available with azure sql db and azure sql dw but i don't think it is yet available with document db since it is still in beta. You can submit that as an idea to get it prioritised.

 

Capture.PNG

Hmm... I get a feeling that the whole product is in beta. I was first trying to use the REST API, but it turned out it was not possible to do something as simple as search out records from yesterday (because of missing DAX support). I then I then decided to use Azure stream analytics together with power bi to analyze realtime events and have an auto updating dashboard. It worked fine, but I also need ot analyze the historical data, so I tried with azure documentdb, but as it turns out it is not a good match with PowerBI, so how I see it now, only traditional sql server will work, but then I need to investigate a lot into tuning indexes and partitions to make it perform good enough with 25 million records. SQL DW could be an option, but I fear the learning curve is a bit steep since I don't know anything about DW. 

 

I read that the team behind PowerBI had a goal that people that see the product should Wow within 5 minutes. I really did, but sadly that Wow turned into a dissapointment when digging a bit deeper.

 

Anyhow - thank you for taking time to answer my question.

Anonymous
Not applicable

Filtering DocumentDB data based on relative times at the source is possible with some tweaking:

 

1. Create a DocumentDB data source as normal. Do not enter any custom SQL.

2. Right click the query and select "Advanced Editor"

3. Immediately after the let line add in your custom query such as the example below. This uses the Power Query functions to construct a string on the fly that will be sent to the database. In this case it works out the date for 30 days ago and embeds that date and the current date into the string to get data for the last 30 days only. If done correctly you should see a new "querystring" step added under Query Settings.

 

let

   querystring = "SELECT * FROM yourdoc WHERE (yourdoc.datetimefield BETWEEN '" & DateTimeZone.ToText(DateTimeZone.UtcNow() - #duration(30,0,0,0),"yyyy-MM-ddTHH:mm:ss.000Z" ) & "' AND '" & DateTimeZone.ToText(DateTimeZone.UtcNow(),"yyyy-MM-ddTHH:mm:ss.000Z") & "')",

 

*** rest of original script ***

 

4. Under query settings go to the Source step. Should look something like:

 

= DocumentDB.Contents("https://yourserver.documents.azure.com", "yourdb", "yourcollection")

 

5. Add a 4th parameter to the end exactly as shown here-> [Query = querystring]

 

= DocumentDB.Contents("https://yourserver.documents.azure.com", "yourdb", "yourcollection", [Query = querystring])

 

PowerBI will now use a filtered query against DocumentDB

Dear all, 

 

How should my query look like when I specifically need to get data where  "Document.LaneId"=1

 

 

 
let
    Source = DocumentDB.Contents("https://xxx-iot-service-data.documents.azure.com/"),
    iotreportdata = Source{[id="iotreportdata"]}[Collections],
    iotreportdata_xxxTestingAggregatedBy5Minutes =      
    iotreportdata{[db_id="iotreportdata",id="XXXTestingAggregatedBy5Minutes"]}[Documents],
    #"Expanded Document" = Table.ExpandRecordColumn(iotreportdata_XXXTestingAggregatedBy5Minutes, "Document", 
    #"Changed Type2" = Table.TransformColumnTypes(#"Renamed Columns",{{"Document.ProductsInRange", Int64.Type}, {"Time", type time}})
in
    #"Changed Type2"

 

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.

Top Solution Authors
Top Kudoed Authors