Hello
I have multiple CSV files in SharePoint (about 100 files). I want to combine all the CSVs into one file and generate a report in Power BI.
In all there are about 1,000 million (1 billion) rows.
How should I process such a huge dataset in Power BI with low latency?
Your suggestions/comments would be valuable.
Thank you in advance.
Hello @Anonymous ,
so we're talking about 1 billion rows in 100 files, right?
From my experience that seems to be too much to load directly into Power BI.
I personally would store the CSVs in Blob storage and try Azure Synapse:
Instead of ETL, design ELT - Azure Synapse Analytics | Microsoft Docs
Or, if you want to stay in the SQL world, I would store the CSVs in Blob storage and then load them directly or with Data Factory into an Azure SQL Server:
Bulk access to data in Azure Blob storage - SQL Server | Microsoft Docs
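Whichever route you pick, a common first step is consolidating the 100 SharePoint CSVs into a single staging file before uploading it to Blob storage. A minimal sketch of that step, assuming all files share the same header (the function name `combine_csvs` and the directory layout are illustrative, not from the thread):

```python
import csv
from pathlib import Path

def combine_csvs(input_dir: str, output_path: str) -> int:
    """Stream-concatenate every *.csv in input_dir into one file,
    keeping the header row only from the first file.
    Rows are copied one at a time, so memory use stays flat even
    for very large inputs. Returns the number of data rows written."""
    rows_written = 0
    header_written = False
    with open(output_path, "w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(Path(input_dir).glob("*.csv")):
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader, None)
                if header is None:
                    continue  # skip empty files
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

At a billion rows you would run something like this outside Power BI (locally or in Data Factory), then bulk-load the combined file rather than importing each CSV through Power Query.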
Hi @selimovd ,
Thank you for the quick reply.
Can I use a dataflow? If yes, how should I be using it?
Hey @Anonymous ,
I guess that was a double post:
https://community.powerbi.com/t5/Desktop/Processing-of-huge-file-in-power-bi/m-p/1846991
Best regards
Denis
Hey @Anonymous ,
you can definitely give it a try.
Be aware there are some limitations. Check the documentation, for example:
"Shared dataflows have a refresh limit of two hours per entity, and three hours per dataflow. So if you have two entities, and each takes two hours, you shouldn't put them in the same dataflow."
I think dataflows have some throttling, and from my experience SharePoint tends to be kind of slow. But give it a try: in the best case it works, in the worst case it doesn't.
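The quoted limits mean you need to group entities so no single entity exceeds two hours and no dataflow's combined refresh exceeds three. A hedged sketch of that grouping logic as a greedy first-fit (the function `plan_dataflows` and the duration estimates are my own illustration, not a Power BI API):

```python
def plan_dataflows(entity_hours: dict,
                   entity_limit: float = 2.0,
                   dataflow_limit: float = 3.0) -> list:
    """Split entities into dataflows so that no entity exceeds
    entity_limit and no dataflow's total exceeds dataflow_limit.
    Uses first-fit decreasing: place the longest entities first,
    each into the first dataflow that still has room."""
    for name, hours in entity_hours.items():
        if hours > entity_limit:
            raise ValueError(f"{name} ({hours}h) exceeds the per-entity limit")
    groups, totals = [], []
    for name, hours in sorted(entity_hours.items(), key=lambda kv: -kv[1]):
        for i, total in enumerate(totals):
            if total + hours <= dataflow_limit:
                groups[i].append(name)
                totals[i] += hours
                break
        else:
            # no existing dataflow has room; start a new one
            groups.append([name])
            totals.append(hours)
    return groups
```

So two entities that each take two hours land in separate dataflows, exactly as the documentation quote advises.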