Hi everyone,
I would like to build a dashboard in Power BI from a Parquet file stored in Azure Data Lake blob storage.
The file contains 4 columns (Date, ID, Product Price, Number of Stores), and the dashboard would later be filtered by ID.
I thought I had found the answer by using the Azure Data Lake Storage Gen2 "Get Data" connector and entering the Parquet file's URL in the data lake, as below:
https://<accountname>.dfs.core.windows.net/<container>/<subfolder>
I get access to the correct file and I can combine and load the data.
Unfortunately, only Import mode is available, and the file is far too large (~70 GB) to be imported into Power BI Desktop.
I then found some documentation stating that it is possible to use dataflows with DirectQuery to avoid importing the data and still build reports by querying Azure Data Lake Storage directly.
I figured out how to connect Azure to Power BI and build dataflows, but I can't manage to reach the Parquet file I need.
I can't find a way to create the Common Data Model (CDM) folder needed to attach to the dataflow so it can read the data directly from ADLS (although I did find how to create a dataflow from an existing CDM folder).
Could you please help me create a CDM folder in Azure Data Lake Storage?
I also saw in the documentation that only CSV files are permitted in a CDM folder. Is that correct? Can I not use my Parquet file as a data source?
If this is not the right approach, could you please suggest a solution for querying/importing large datasets (~70 GB) into Power BI Desktop from Azure Data Lake Storage without modifying the original dataset?
Thank you for your help,
AP
I'd agree with @TomMartens.
As an alternative to a Spark SQL pool, you could give serverless SQL in Synapse a go. You can create a view over your existing Parquet files in the data lake and then use DirectQuery against it.
Have a look at https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/create-use-views
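As a rough sketch, the serverless SQL view could look something like the following. The storage path reuses the placeholders from your post, and the column aliases are assumptions based on the four columns you described:

```sql
-- Run in the Synapse serverless SQL pool (the built-in pool in the workspace).
-- Path placeholders (<accountname>, <container>, <subfolder>) are from the
-- original post; replace them with your actual values.
CREATE VIEW dbo.ProductPrices
AS
SELECT
    [Date],
    [ID],
    [Product Price]     AS ProductPrice,
    [number of stores]  AS NumberOfStores
FROM OPENROWSET(
    BULK 'https://<accountname>.dfs.core.windows.net/<container>/<subfolder>/*.parquet',
    FORMAT = 'PARQUET'
) AS rows;
```

You can then point Power BI at the workspace's serverless SQL endpoint and use DirectQuery against this view, so the 70 GB never has to be imported; serverless SQL bills per amount of data processed by each query.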
Sorry to ask a follow-up question in an old thread. May I know what the license requirements are to set up serverless SQL in Synapse? Fabric was recently released, and it seems Synapse is under Fabric's umbrella. Also, do we need a SQL Server license in this case?
Fabric is in preview, and you should not use it in production.
Synapse is a separate product. To use the serverless pool you just need to create a Synapse workspace in Azure. It comes with a default serverless SQL pool that you can use. It's pay-per-use: you pay per query and per data movement.
Is the below the correct link I should look into?
Pricing - Azure Synapse Analytics | Microsoft Azure
Yes, it is.
Serverless is listed under the data warehousing workloads.
Dedicated is under the same category but is optional.
Up
Hey @a_pereira ,
when you are starting to create a new dataflow, select "Define new tables" instead of "Attach a Common Data Model folder".
I assume this will help, but please be aware that there may be data size limits as well. DirectQuery to a dataflow does not mean the Parquet files will be queried in DQ mode; instead, the data the dataflow has already ingested will be queried.
If you have a Spark/Databricks cluster, I would recommend using DirectQuery against a Spark SQL table.
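For illustration only, an external Spark SQL table over the existing Parquet folder could be defined roughly like this (the `abfss` path reuses the placeholders from the original post, and the table name is made up; this assumes the cluster already has access to the storage account):

```sql
-- Create an external (unmanaged) table pointing at the existing Parquet folder;
-- no data is copied, Spark reads the files in place.
CREATE TABLE IF NOT EXISTS product_prices
USING PARQUET
LOCATION 'abfss://<container>@<accountname>.dfs.core.windows.net/<subfolder>/';

-- Power BI in DirectQuery mode would then issue queries like:
-- SELECT * FROM product_prices WHERE ID = <some id>;
```

Because Parquet is columnar and the filter is pushed down, only the relevant data for the selected ID needs to be scanned per report interaction.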
Hopefully, this provides some ideas on how to tackle your challenge.
Regards,
Tom