a_pereira
New Member

Reading large Azure Data Lake Parquet files in Power BI

Hi everyone,

I would like to build a dashboard in Power BI using a Parquet file stored in Azure Data Lake Storage (blob storage).

The file contains 4 columns (Date, ID, Product Price, Number of Stores), and the dashboard will later be filtered by ID.

I thought I had found the answer by using the Azure Data Lake Storage Gen2 connector under "Get Data" and adding the Parquet file URL in the data lake, as below:

https://<accountname>.dfs.core.windows.net/<container>/<subfolder>
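
For reference, the URL above follows the ADLS Gen2 `dfs` endpoint pattern (account, then container, then folder path). A minimal Python sketch that fills in those placeholders could look like this; the function name and the example account, container and folder names are hypothetical.

```python
def adls_gen2_url(account: str, container: str, path: str = "") -> str:
    """Build an ADLS Gen2 (dfs endpoint) URL like the one pasted into Get Data."""
    base = f"https://{account}.dfs.core.windows.net/{container}"
    # Drop leading/trailing slashes so the result has no doubled or trailing "/".
    path = path.strip("/")
    return f"{base}/{path}" if path else base


if __name__ == "__main__":
    # Hypothetical names; substitute your own storage account, container and folder.
    print(adls_gen2_url("myaccount", "mycontainer", "sales/prices"))
```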

I can access the correct file, and I can combine and load the data.

Unfortunately, only Import mode is available, and the file (~70 GB) is far too large to import into Power BI Desktop.

I then found documentation stating that it is possible to use dataflows with DirectQuery to avoid importing the data, while still building reports that query Azure Data Lake Storage directly.

I found out how to connect Azure to Power BI and build dataflows, but I can't manage to reach the Parquet file I need.

I can't find a way to create the Common Data Model (CDM) folder that the dataflow needs in order to get the data directly from ADLS (though I did find how to create a dataflow from a CDM folder).

Could you please help me create a CDM file in Azure Data Lake Storage?

I also saw in the documentation that only CSV files are permitted in a CDM file. Is that so? Can I not use my Parquet file as a data source?

If this is not the right approach, could you please suggest a way to query or import large datasets (~70 GB) from Azure Data Lake Storage into Power BI Desktop without modifying the original dataset?

Thank you for your help,

AP

bcdobbs
Super User

I'd agree with @TomMartens

As an alternative to a Spark SQL pool, you could give serverless SQL in Synapse a go. You can create a view over your existing data lake Parquet files and then use DirectQuery against that view.

Have a look at https://docs.microsoft.com/en-us/azure/synapse-analytics/sql/create-use-views
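
To make the serverless SQL suggestion concrete: such a view is typically a `CREATE VIEW` over `OPENROWSET(BULK ..., FORMAT = 'PARQUET')`. The sketch below is a hedged Python helper that only builds that T-SQL text; the view name, account, container and folder are hypothetical placeholders. It assumes the view is created in a user database in the serverless pool (not `master`) and that your identity already has read access to the storage account.

```python
def build_parquet_view_sql(view_name: str, account: str, container: str, folder: str) -> str:
    """Return a T-SQL CREATE VIEW statement for a Synapse serverless SQL pool.

    The view reads every *.parquet file under the given ADLS Gen2 folder via
    OPENROWSET, so Power BI can query the view in DirectQuery mode instead of
    importing the whole file.
    """
    # Wildcard path: all Parquet files in the (hypothetical) folder.
    location = (
        f"https://{account}.dfs.core.windows.net/"
        f"{container}/{folder.strip('/')}/*.parquet"
    )
    return (
        f"CREATE VIEW {view_name} AS\n"
        "SELECT *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{location}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


if __name__ == "__main__":
    # Hypothetical names; replace with your own account, container and folder.
    print(build_parquet_view_sql("dbo.ProductPrices", "myaccount", "mycontainer", "sales/prices"))
```

You would then run the generated statement against the serverless endpoint (for example in Synapse Studio) and connect Power BI to the view in DirectQuery mode, for example through the Azure Synapse Analytics SQL connector.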



Ben Dobbs

Sorry to ask a follow-up question in an old thread. What are the license requirements to set up serverless SQL in Synapse? Fabric was recently released, and it seems Synapse is now under Fabric's umbrella. Also, do we need a SQL Server license in this case?

R1k91
Continued Contributor

Fabric is in preview, and you should not use it in production.
Synapse is a separate product. To use the serverless pool, you just need to create a Synapse workspace in Azure. It comes with a default serverless SQL pool that you can use. It's pay-per-use: you pay per query, based on the data processed and moved.
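
As a rough illustration of pay-per-use: serverless SQL is billed by the amount of data processed, so a back-of-the-envelope estimate is bytes processed times a per-TB price. The price below is a hypothetical placeholder, not a quoted rate (check the pricing page for your region), and the sketch assumes decimal terabytes and ignores any per-query minimums or rounding.

```python
def estimate_query_cost(bytes_processed: float, price_per_tb: float) -> float:
    """Rough per-query cost, assuming cost scales linearly with data processed."""
    # Assumption: 1 TB = 10**12 bytes; check how your bill defines a TB.
    terabytes = bytes_processed / 1e12
    return terabytes * price_per_tb


if __name__ == "__main__":
    placeholder_price = 5.0  # hypothetical USD per TB, not an official rate
    full_scan = 70e9         # reading the whole ~70 GB file once
    print(f"Full scan: ~${estimate_query_cost(full_scan, placeholder_price):.2f}")
```

Because Parquet is columnar, a query that reads only the columns it needs may process much less than the full 70 GB. On the other hand, every DirectQuery visual refresh sends queries, so the per-query cost adds up with usage.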

Is the link below the correct one to look at?
Pricing - Azure Synapse Analytics | Microsoft Azure

R1k91
Continued Contributor

Yes, it is.

Serverless is listed under the data warehousing workloads.

Dedicated SQL pools are under the same category but are optional.

a_pereira
New Member

Up

Hey @a_pereira,

When you start creating a new dataflow, select "Define new tables" instead of "Attach a Common Data Model folder".

I assume this will help, but please be aware that there may be data size limits as well. DirectQuery to a dataflow does not mean the Parquet files will be queried in DirectQuery (DQ) mode; instead, the dataflow will be queried.

If you have a Spark/Databricks cluster, I would recommend using DQ against a Spark SQL query.

Hopefully, this provides some ideas on how to tackle your challenge.

Regards,
Tom
Hamburg, Germany