Dear Community,
I have a question:
In a Big Data architecture, which solution would you recommend for transforming data from a Data Lake Store:
Can you give some advantages and disadvantages?
Thank you for your answers.
@Anonymous Huh... I responded to this on Friday, but apparently the site didn't take it (although it notified me of other comments, which is weird). In any case, here are my thoughts. In a nutshell, our organization is going through this right now, and our choice is Databricks. Multiple language support, the ability to scale up and down for different jobs, and direct access to the store from Spark are too good to pass up. ADF can control the process flow, but Databricks is where we're doing all the heavy lifting. The downside is that the other tools have robust GUIs, whereas Databricks requires understanding or learning a new language, though there are a bunch of options.
Thank you for your answers.
In the end, we will go with this architecture:
Data Lake Store ---> Databricks ---> Azure SQL DWH ---> Azure Analysis Services ---> Power BI
We will use LiveConnect.
My questions:
1. In your opinion, does this architecture seem relevant?
2. Is LiveConnect better than DirectQuery?
3. Is there a data transformation solution you find better than Azure Analysis Services?
Thank you !
@Anonymous
1. In your opinion, does this architecture seem relevant? Yes.
2. Is LiveConnect better than DirectQuery? Yes.
3. Is there a data transformation solution you find better than Azure Analysis Services? AS shouldn't do any heavy lifting in terms of ETL transformation; by the time you hit that stage, you should have your objects pretty well laid out. It does a fantastic job of modeling to the needs of your report, as well as adding the business-side outputs in measures and calculations. In addition to providing this semantic layer, it compresses the data, so you get much better performance on large models, and the live connection is very performant as a result.
Hi @Anonymous,
I think you could transform the data source in Edit Queries instead of using these.
https://docs.microsoft.com/en-us/power-bi/desktop-query-overview
https://docs.microsoft.com/en-us/power-bi/desktop-common-query-tasks
Best Regards,
Lin
@Anonymous My company is going down this route right now. We're going to be doing all ETL in Databricks. From my perspective, the ability to process data quickly at scale and to scale resources down when jobs are not running outweighs a lot of the other methods. It comes at the cost of needing people who know the languages, whereas SSIS or ADF are easier to implement. Overall, for the enterprise data workstreams we're using Databricks; for overall job execution (not ETL) we're using ADF for orchestration; and we will likely throw in some Power BI dataflows at some point for quick-win reports or analysis.
I don't have direct experience with HDI, but I hear it's a complex beast, and I don't know Snowflake.
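To make the division of labour discussed in this thread a bit more concrete, here is a minimal plain-Python sketch. It is not Databricks or ADF code and does not call any Azure API. An orchestrator function (standing in for ADF) only sequences the steps, while the transform step (standing in for a Databricks job) does the cleaning and aggregation before anything reaches the warehouse or the semantic layer. All names (`extract`, `transform`, `load`, `run_pipeline`) and the sample rows are hypothetical and chosen purely for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, object]


def extract() -> List[Row]:
    """Stand-in for reading raw files from the data lake (hypothetical sample data)."""
    return [
        {"region": "EU", "amount": "10.5"},
        {"region": "EU", "amount": "4.5"},
        {"region": "US", "amount": "7"},
        {"region": "US", "amount": None},  # bad record, dropped during transform
    ]


def transform(rows: Iterable[Row]) -> List[Row]:
    """Stand-in for the heavy lifting: drop bad rows, cast types, aggregate."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        if row["amount"] is None:
            continue  # skip records with a missing amount
        totals[str(row["region"])] += float(row["amount"])
    # One output row per region, sorted by region name for a stable order.
    return [{"region": r, "total_amount": t} for r, t in sorted(totals.items())]


def load(rows: List[Row], warehouse: List[Row]) -> None:
    """Stand-in for writing the shaped rows to a warehouse table."""
    warehouse.extend(rows)


def run_pipeline(warehouse: List[Row]) -> None:
    """Stand-in for the orchestrator: it sequences steps but transforms nothing."""
    raw = extract()
    shaped = transform(raw)
    load(shaped, warehouse)


if __name__ == "__main__":
    warehouse_table: List[Row] = []
    run_pipeline(warehouse_table)
    print(warehouse_table)
```

In this sketch the table that lands in the warehouse is already aggregated and typed, so the modeling layer on top of it would only need to add measures and calculations rather than perform ETL.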