Skip to main content
cancel
Showing results for 
Search instead for 
Did you mean: 

Register now to learn Fabric in free live sessions led by the best Microsoft experts. From Apr 16 to May 9, in English and Spanish.

Reply
juju
Helper III
Helper III

Datasource questions for PowerBI

 

I am having some trouble figuring out the most efficient way to first, setup my data and second, send into powerbi.  The sheer number of options adds to the confusion as well - in regards to determining whats best for my use case.

 

I have data from energy monitoring devices being logged into an sql database on AWS every 5 minutes. The data is growing very fast very quickly. We are up to about 2 Mlillion rows in a month.  I need to send that data into Powerbi.  We have created an API endpoint for the data which we use in Powerbi to get the data we need.  Worked well for awhile, then started slowing waaay down on refreshes.

 

We are debating the following:

 

  1. Is an AWS sql db the best place to send this data? What works best with powerbi for this kind of data? Maybe Azure sql db works better with powerbi? I have seen discussions around Tabular models, Azure blobs, Azure tables etc Wondering if my data should be stored there instead of the aws sql db.
  2. What if we created a view in the db? Does this improve performance vs hitting the actual db itself?
  3. Should we use directquery or import?  Dashboard refreshes by the hour is what we are shooting for.  I have seem folks talk about several million rows of data imported without an issue and I am inclined to go the import route

You can tell I am pretty new to all this. I am more of an analyst than a db person.  I am just curious how you guys would have set things up with the kind of data and requirements I have. 

4 REPLIES 4
erik_tarnvik
Solution Specialist
Solution Specialist

I should mention that there is an easy way to test this. Export the data you already have in your DB into a CSV format your Power BI model can digest. Then just see how long it takes Power BI to suck that data in compared to accessing the DB. Should get you a quick read on where to go.

erik_tarnvik
Solution Specialist
Solution Specialist

I'm not an authority on performance but I strongly suspect that your best option could be to have the devices spit the data into plain vanilla CSV files in a folder without doing anything fancy at all, and then just import the files using the "Folder" method. That's right, don't even store the files in a DB unless there is some other reason why you would do that. I've had some pretty surprising results in this regard. If it is easy to test, this is what I would do.

hello @erik_tarnvik 

 

The raw data requires some cleaning and processing ( calculations ) before its in usable format for powerbi hence the route through the sql database. The data actually comes in xml format.  I have considered doing all the cleaning and calculations in powerbi but decided against it because I thought shifting all that to the server side is better for performance.

 

After I clean up the data, my options are:

 

  1. Use directquery 
  2. dump the output of the server, hourly, in a csv format into an Azure blob for powerbi to access
  3. import the entire db directly into powerbi hourly ( doesnt sound efficient  )

 

I am hoping someone can point me in the right direction of whats best / optimal

 

 

erik_tarnvik
Solution Specialist
Solution Specialist

I recently did some testing on a reasonably large dataset (>3m lines) and found Power BI to be quite efficient at reading CSV files, although there was not much cleaning involved in my dataset. I also had some non-intuitive results. For example, I would have expected reading one CSV file with >3m lines to be more effecient than reading 500 files with the same total amount of data split among those 500 files.

 

That was not the case, Power BI read 500 files in about half the time it took to read the exact same amount of data in one file. My conclusion would be that there is no real substutute to testing various approaches before selecting one that works for you. I would go ahead and test your method 2 and see if you get adequate performance that way. I would be interested to hear whqt you learn if you do, I have some similar questions of my own and your input would be interesting!

Helpful resources

Announcements
Microsoft Fabric Learn Together

Microsoft Fabric Learn Together

Covering the world! 9:00-10:30 AM Sydney, 4:00-5:30 PM CET (Paris/Berlin), 7:00-8:30 PM Mexico City

PBI_APRIL_CAROUSEL1

Power BI Monthly Update - April 2024

Check out the April 2024 Power BI update to learn about new features.

April Fabric Community Update

Fabric Community Update - April 2024

Find out what's new and trending in the Fabric Community.