If I have a snowflake-schema model in my data warehouse, do I need to flatten/denormalize it to a star schema before connecting Power BI, to improve performance or storage? I assume that Power BI uses VertiPaq (in-memory columnstore) storage internally, similar to MOLAP. In that case, will there be a performance difference or a storage-cost difference between a star-schema DW and a snowflake one? Which is the better candidate?
I see that Power BI can also connect directly to OLTP databases and report against them. So if I only need operational reports, is there any point in restructuring the data into a read-optimized form?
Power BI utilizes the SSAS Tabular storage and formula engines. This includes very powerful compression based on the columnstore technology utilized by the storage engine.
Snowflake vs star isn't too important in terms of storage. You won't see the storage-space benefits of snowflake over star that you might in a SQL database, because the column-based compression makes duplicate values in a field nearly costless in terms of storage.
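To see why repeated values are nearly free, here is a minimal sketch of dictionary encoding, the basic idea behind columnstore compression (this is an illustration of the concept, not the actual VertiPaq algorithm; the data is made up):

```python
def dictionary_encode(column):
    """Return (dictionary, indexes): each distinct value stored once,
    with each row reduced to a small integer index into the dictionary."""
    dictionary = []
    positions = {}  # value -> index in dictionary
    indexes = []
    for value in column:
        if value not in positions:
            positions[value] = len(dictionary)
            dictionary.append(value)
        indexes.append(positions[value])
    return dictionary, indexes

# A denormalized (star) product dimension repeats the category on every row,
# but after encoding each distinct string is stored only once:
categories = ["Bikes", "Bikes", "Bikes", "Clothing", "Clothing", "Bikes"]
dictionary, indexes = dictionary_encode(categories)
print(dictionary)  # ['Bikes', 'Clothing']
print(indexes)     # [0, 0, 0, 1, 1, 0]
```

Adding a million more "Bikes" rows grows only the list of tiny indexes, not the stored strings, which is why denormalizing a snowflake into a star costs very little extra space in the Tabular engine.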
Snowflake sees a slight performance hit compared to star in query run time, because joins are expensive. Certainly they're optimized and quite fast in the Tabular storage engine, but fewer joins will always be faster than more joins.
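The extra join cost can be pictured as extra hops at lookup time. A conceptual sketch (hypothetical data, not engine internals): in a star schema the category is one hop from the product key, while a snowflaked schema chains through intermediate tables:

```python
# Star: product key -> one fully flattened dimension row.
star_product = {1: {"name": "Road Bike", "category": "Bikes"}}

# Snowflake: product -> subcategory -> category, normalized into 3 tables.
snow_product = {1: {"name": "Road Bike", "subcategory_key": 10}}
snow_subcategory = {10: {"name": "Road Bikes", "category_key": 100}}
snow_category = {100: {"name": "Bikes"}}

def category_star(product_key):
    # 1 hop
    return star_product[product_key]["category"]

def category_snowflake(product_key):
    # 3 hops: each extra normalized table is another join to resolve
    sub_key = snow_product[product_key]["subcategory_key"]
    cat_key = snow_subcategory[sub_key]["category_key"]
    return snow_category[cat_key]["name"]

print(category_star(1))       # Bikes
print(category_snowflake(1))  # Bikes (same answer, more work)
```

Each individual hop is fast, but a query touching millions of rows repeats that extra work at scan time, which is where the snowflake penalty shows up.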
Snowflake vs star can be important for user-friendliness, depending on how extreme your snowflaking is. In my experience, non-technical users find a star more approachable and easier to use.
My rule of thumb is to stick to a star schema unless I've got a good reason not to (and there are plenty of good reasons).
DirectQuery against SQL sources is possible, but you cannot define measures against these sources, which limits a lot of the functionality. The DAX-to-SQL translation layer is also not the most efficient, though it is currently undergoing performance enhancements, so this should improve. Using Power BI against a normalized OLTP schema seems a recipe for frustration to me, though it should still be doable.
The use case is more for going against a SQL data warehouse in a dimensional schema, to be able to meet simple reporting needs quickly.
But with a Tabular model you cannot create a relationship on a non-key dimension attribute in order to work at another granularity.
For this reason, you sometimes have to use snowflake modeling.
Nowadays, I prefer to use a snowflake schema in my SQL DWH and flatten the dimensions by using views.
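A minimal sketch of that views approach, using an in-memory SQLite database as a stand-in for the SQL DWH (table and column names are made up for illustration): the snowflaked tables stay normalized, and a view presents a flattened, star-style dimension for the model to import.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Snowflaked dimension tables, kept normalized in the warehouse.
    CREATE TABLE dim_category (category_key INTEGER PRIMARY KEY,
                               category_name TEXT);
    CREATE TABLE dim_subcategory (subcategory_key INTEGER PRIMARY KEY,
                                  subcategory_name TEXT,
                                  category_key INTEGER);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY,
                              product_name TEXT,
                              subcategory_key INTEGER);

    INSERT INTO dim_category VALUES (100, 'Bikes');
    INSERT INTO dim_subcategory VALUES (10, 'Road Bikes', 100);
    INSERT INTO dim_product VALUES (1, 'Road Bike', 10);

    -- Flattened, star-style view that the reporting model imports instead.
    CREATE VIEW v_dim_product AS
    SELECT p.product_key, p.product_name,
           s.subcategory_name, c.category_name
    FROM dim_product p
    JOIN dim_subcategory s ON s.subcategory_key = p.subcategory_key
    JOIN dim_category c ON c.category_key = s.category_key;
""")

row = conn.execute("SELECT * FROM v_dim_product").fetchone()
print(row)  # (1, 'Road Bike', 'Road Bikes', 'Bikes')
```

This keeps the warehouse normalized for loading and maintenance while the model sees one wide dimension, so the joins are paid once in the view rather than in every report query.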
On the other hand, DirectQuery is not so bad in terms of performance when you combine clustered columnstore indexes on your SQL data warehouse with the "super DAX" querying available in SQL Server 2016.