I am having some trouble figuring out the most efficient way to, first, set up my data and, second, send it into Power BI. The sheer number of options adds to the confusion when it comes to determining what's best for my use case.
I have data from energy monitoring devices being logged into an SQL database on AWS every 5 minutes. The data is growing very fast; we are up to about 2 million rows in a month. I need to send that data into Power BI. We created an API endpoint for the data, which we use in Power BI to pull what we need. It worked well for a while, then refreshes started slowing way down.
We are debating the following:
You can tell I am pretty new to all this; I am more of an analyst than a DB person. I am just curious how you all would have set things up given the kind of data and requirements I have.
I should mention that there is an easy way to test this: export the data you already have in your DB into a CSV format your Power BI model can digest, then see how long it takes Power BI to pull that data in compared to accessing the DB. That should give you a quick read on where to go.
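For a quick way to run that test, something like the sketch below dumps a table to CSV. It uses an in-memory SQLite table as a stand-in for your AWS SQL database, and the table name and columns are made up for illustration; swap in your own driver (pyodbc, pymysql, etc.) and schema.

```python
import csv
import sqlite3

# Hypothetical: a local SQLite copy stands in for the AWS SQL database.
# Table/column names are invented for the example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (device_id TEXT, ts TEXT, kwh REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [("meter-1", "2024-04-01T00:00", 1.2),
     ("meter-1", "2024-04-01T00:05", 1.3)],
)

# Stream the table out to a CSV that Power BI can import directly.
with open("readings.csv", "w", newline="") as f:
    writer = csv.writer(f)
    cursor = conn.execute("SELECT device_id, ts, kwh FROM readings")
    writer.writerow([col[0] for col in cursor.description])
    writer.writerows(cursor)
```

Once you have the file, import it in Power BI and compare the refresh time against the API/DB route.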
I'm not an authority on performance but I strongly suspect that your best option could be to have the devices spit the data into plain vanilla CSV files in a folder without doing anything fancy at all, and then just import the files using the "Folder" method. That's right, don't even store the files in a DB unless there is some other reason why you would do that. I've had some pretty surprising results in this regard. If it is easy to test, this is what I would do.
Hello @erik_tarnvik,
The raw data requires some cleaning and processing (calculations) before it's in a usable format for Power BI, hence the route through the SQL database. The data actually arrives in XML format. I considered doing all the cleaning and calculations in Power BI but decided against it, because I thought shifting that work to the server side would be better for performance.
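For what it's worth, that server-side cleaning step can be quite lightweight. A minimal sketch, assuming a made-up XML payload shape (your devices' real schema will differ), parsing the readings and doing one example calculation before storage:

```python
import xml.etree.ElementTree as ET

# Hypothetical payload shape; the real device XML will differ.
payload = """
<readings>
  <reading device="meter-1" ts="2024-04-01T00:00" wh="1200"/>
  <reading device="meter-1" ts="2024-04-01T00:05" wh="1310"/>
</readings>
"""

rows = []
for el in ET.fromstring(payload).iter("reading"):
    # Example calculation: convert watt-hours to kWh up front,
    # so Power BI receives already-usable numbers.
    rows.append({
        "device": el.get("device"),
        "ts": el.get("ts"),
        "kwh": int(el.get("wh")) / 1000,
    })
```

Whether this runs on ingest or as a scheduled job, the point is that Power BI only ever sees the flat, cleaned rows.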
After I clean up the data, my options are:
I am hoping someone can point me in the right direction as to what's best / optimal.
I recently did some testing on a reasonably large dataset (>3M rows) and found Power BI to be quite efficient at reading CSV files, although there was not much cleaning involved in my dataset. I also had some non-intuitive results. For example, I would have expected reading one CSV file with >3M rows to be more efficient than reading 500 files with the same total amount of data split among them.
That was not the case: Power BI read the 500 files in about half the time it took to read the exact same amount of data from one file. My conclusion would be that there is no real substitute for testing various approaches before selecting one that works for you. I would go ahead and test your method 2 and see if you get adequate performance that way. I would be interested to hear what you learn if you do; I have some similar questions of my own, and your input would be interesting!
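If you want to reproduce that one-file-vs-many-files comparison yourself, here is a rough sketch that splits a large CSV into smaller parts (each keeping the header) so you can point Power BI's Folder connector at the output directory. File names and chunk size are arbitrary choices for the example:

```python
import csv
import itertools
import os

def split_csv(path, out_dir, rows_per_file):
    """Split one large CSV into many smaller ones, repeating the header."""
    os.makedirs(out_dir, exist_ok=True)
    with open(path, newline="") as f:
        reader = csv.reader(f)
        header = next(reader)
        for i in itertools.count():
            chunk = list(itertools.islice(reader, rows_per_file))
            if not chunk:
                break
            part = os.path.join(out_dir, f"part_{i:04d}.csv")
            with open(part, "w", newline="") as out:
                w = csv.writer(out)
                w.writerow(header)
                w.writerows(chunk)

# Demo: write a 10-row CSV and split it into files of up to 3 rows each.
with open("big.csv", "w", newline="") as f:
    w = csv.writer(f)
    w.writerow(["ts", "kwh"])
    w.writerows([[i, i * 0.1] for i in range(10)])

split_csv("big.csv", "parts", rows_per_file=3)
```

Timing the refresh of the single file against the folder of parts should tell you whether the many-small-files effect holds on your data too.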