For me a big dataset should be those size are too large for Power BI desktop to handle, and would casue the system in bad performance.
Yes, I agree with that. And Direct Query in Power BI are somehow not very smart to optimize the Query from the data source, it will translate the query language in a "simple" way. Well I suspect the product team are already deal with that part. Hope this part could be better in future.
Has anyone furthered this researched? If I choose the exact same attributes and measures for DirectQuery and Import, the same data is being "pulled in"; it's just done in a different way. With DirectQuery it's just compressed and put into memory (for Desktop). I imagine it's put into Azure for Online. With Import it's put directly in the pbix file. Since it appears that the compression method with DirectQuery is better, I do see some benefit there.
Obviously for SQL Server - we'll choose Tab or MD over connection directly for the datamart. With SAP HANA though, this presents a major obstacle. I can't filter on date and then aggregate by year effectively, because Power BI wants to pull in every record since I have chosen date as an attribute. My expectation would be multiple SQL requests to populate distinct attributes and then subsequent calls to populate the other visualizations.
We're working with billions of records (without aggregateion), so this is something we are really trying to get a handle on. For example, we have an SAP HANA database that has 1.2B distinct records when aggregated by date with primary key and 8M records when aggregated by year with primary key. If I simply want to display total sales (exactly 1 value) while filtering, I have to "pull in" all 1.2B or 8M records.
To me the true point of DirectQuery is missing here.
Based on this text I don't believe DirectQuery is behaving the way it is intended - or I'm not using it correctly.
"The differences between selecting Import and DirectQuery are the following:
Import – the selected tables and columns are imported into Power BI Desktop. As you create or interact with a visualization, Power BI Desktop uses the imported data. You must refresh the data, which imports the full data set again, to see any changes that occurred to the underlying data since the initial import or the most recent refresh.
DirectQuery – no data is imported or copied into Power BI Desktop. The selected tables and columns appear in the Fields list.
As you create or interact with a visualization, Power BI Desktop queries the underlying data source, which means you’re always viewing current data."