Hello,
I have multiple CSV files in SharePoint (about 100 files). I want to combine all the CSVs into one file and generate a report in Power BI.
In all there are about 1,000 million (1 billion) rows.
How should I process such a huge dataset in Power BI with low latency?
Your suggestions/comments would be valuable.
Thank you in advance.
Hello @Anonymous,
so we're talking about 1 billion rows in 100 files, right?
From my experience, that is too much to load directly into Power BI.
I personally would store the CSVs in Blob Storage and try Azure Synapse:
Instead of ETL, design ELT - Azure Synapse Analytics | Microsoft Docs
Or, if you want to stay in the SQL world, I would store the CSVs in Blob Storage and then load them, either directly or with Data Factory, into an Azure SQL database:
Bulk access to data in Azure Blob storage - SQL Server | Microsoft Docs
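Whichever backend you land on, the "combine 100 CSVs into one file" step is cheap to do outside Power BI before loading. A minimal Python sketch (file names and demo data here are hypothetical, not from the thread) that streams the files row by row so memory stays flat even at a billion rows:

```python
import csv
import glob
import os
import tempfile

def combine_csvs(input_glob, output_path):
    """Stream many CSV files into one, writing the header only once."""
    paths = sorted(glob.glob(input_glob))
    with open(output_path, "w", newline="") as out:
        writer = None
        for path in paths:
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader)  # skip per-file header
                if writer is None:
                    writer = csv.writer(out)
                    writer.writerow(header)
                for row in reader:
                    writer.writerow(row)

# Tiny demo with two generated files (stand-ins for the 100 SharePoint CSVs).
tmp = tempfile.mkdtemp()
for i in range(2):
    with open(os.path.join(tmp, f"part{i}.csv"), "w", newline="") as f:
        w = csv.writer(f)
        w.writerow(["id", "value"])
        w.writerows([[i * 2 + j, j] for j in range(2)])

combined = os.path.join(tmp, "combined.csv")
combine_csvs(os.path.join(tmp, "part*.csv"), combined)
```

In practice you would run something like this (or a Data Factory copy activity) against the files once they are in Blob Storage, then bulk-load the single result.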
Hi @selimovd,
Thank you for the quick reply.
Can I use a dataflow? If yes, how should I use it?
Hey @Anonymous ,
I guess that was a double post:
https://community.powerbi.com/t5/Desktop/Processing-of-huge-file-in-power-bi/m-p/1846991
Best regards
Denis
Hey @Anonymous,
you can definitely give it a try.
Be aware that there are some limitations. Check the documentation; for example:
"Shared dataflows have a refresh limit of two hours per entity, and three per dataflow. So if you have two entities, and each takes two hours, you shouldn't put them in the same dataflow."
I think dataflows have some throttling, and from my experience SharePoint tends to be slow. But give it a try: in the best case it works, in the worst case it doesn't.