I am setting up a star schema from transactional data. An important metric is active users at a given time of day for a location, and who they are. There is a lot of overlap between this and the need to calculate the number of open tickets, current employees, etc. What I would like is a star schema that makes it easy and efficient to calculate active users/visitors.
Relevant reading & background info:
A lot of the information about how to do this is spread out across various forums and community posts, so I will aim to bring together what I have found and share my thoughts too. Some of my ideas may be faulty and may be causing issues in the metric calculations. Yes, I could get clarification on the star schema separately, but the design challenges seem to overlap with the SCD challenges. This post is intended both for the person who may have the answer to my current issues and for readers who are new to some of these topics.
The source for the fact table is essentially:
User FK   Location FK   Status   Date FK
1         1             Enter    date time
1         1             Exit     date time
Star Schema info:
https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dimen... and, of course, guyinacube's SCD video https://www.youtube.com/watch?v=tKeaQpWynzg .
A visitor can enter and exit often, so this would be a Slowly Changing Dimension. But which type? Since I need to keep the history, Type 2 or Type 4 would make sense to me. I have currently decided to try Type 4 (history/mini-dimension) because of how frequently users could enter and exit.
The next question for the fact table is date and time. Readings so far (including Kimball) seem to suggest keeping time of day in a separate dimension from the date to reduce table size. The disadvantage is that a fixed time-of-day dimension assumes every day has the same length, whereas days are actually shorter or longer when daylight saving time changes.
This would result in a fact table like so, with the Visit Dimension having the User's history:
User FK   Visit FK   Location FK   Status   Date FK   Time FK
1         1          1             Enter    date      time
1         1          1             Exit     date      time
The Visit dimension table would then be:
Visit PK   Start       End
1          date time   date time
Thoughts and challenges with this schema shape:
It seems to be generally recommended that the date key is stored as a Date type, and I assume the same applies to the time key: https://www.sqlbi.com/articles/choosing-between-date-or-integer-to-represent-dates-in-power-bi-and-t....
Information on whether or not the Visit table should also include a User FK seems mixed. Having it does seem to be an advantage, but I also get the sense it would be better to calculate distinct visitors from the fact table so the metric can also be filtered by location or other dimension tables.
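As a rough illustration of the fact-table option, a distinct visitor count could look like the measure below. This is only a sketch: FactVisits is a hypothetical name for the fact table, and it assumes the usual one-to-many relationships from the dimension tables (location, date, time) to the fact table.

```dax
// Sketch only: FactVisits is a hypothetical fact table name.
// Filters on related dimensions (e.g. location) flow to the fact table
// through the normal one-to-many relationships, so the count respects them.
Distinct Visitors =
DISTINCTCOUNT ( FactVisits[User FK] )
```

Because the count is taken on the fact table, any slicer on a related dimension narrows it without extra DAX.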
Calculate active visitors
Also referred to as active/current users, employees, visitors, or tickets.
amitchandak provides this blog and youtube video:
the awesome Greg_Deckl also provides some good blogs:
Enterprise DNA's youtube video:
In summary, these are all interesting methods. They seem to fall into three groups: creating a calculated table that blows out the data (Greg's Open Tickets), creating a "table" inside the DAX measure (DNA's version), and a SQL-like filter (v-yuezhe-msft). From a pure efficiency standpoint, if data is to be blown out, it would seem better to have the source/warehouse blow out the fact table to meet this need. But a regular data mart is going to have more than one Slowly Changing Dimension, and it would appear excessive to have a data warehouse handle them all like that. It would also go against the Kimball method, which still appears to be best practice for a data mart and Power BI. So it looks like an in-memory version should be considered and tried. However, most of these examples do not consider the impact of a fact table or of Date and Time dimensions. I understand that they want a simple data model to show the measure, but if it is too simple then it might not work well in an enterprise data mart/star schema.
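To make "blowing out" concrete, here is a minimal sketch of a calculated table that produces one row per visit per calendar day the visit spans. It is not taken from any of the posts above. It assumes DimVisits has StartDatetime and EndDatetime columns, that DimDate[Date] holds one row per day with no time part, and that a blank EndDatetime means the visit is still open; the table name "Visit Days Sketch" and the column "Active Date" are made up for illustration.

```dax
// Sketch only: one row per visit per day the visit overlaps.
Visit Days Sketch =
VAR lastDate = MAX ( DimDate[Date] )    // open visits run to the end of the calendar
RETURN
    GENERATE (
        DimVisits,
        VAR s = DimVisits[StartDatetime]
        VAR e = COALESCE ( DimVisits[EndDatetime], lastDate )
        // strip the time part so whole days can be compared
        VAR startDay = DATE ( YEAR ( s ), MONTH ( s ), DAY ( s ) )
        VAR endDay = DATE ( YEAR ( e ), MONTH ( e ), DAY ( e ) )
        RETURN
            SELECTCOLUMNS (
                FILTER (
                    ALL ( DimDate[Date] ),
                    DimDate[Date] >= startDay && DimDate[Date] <= endDay
                ),
                "Active Date", DimDate[Date]
            )
    )
```

The row count grows with visit length, which is the storage cost weighed above, and the day grain alone does not answer time-of-day questions.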
So I built a PBI data set with a small star schema with all the PKs, FKs and the SCDs. See here:
I have also added a Type 2 SCD for location, as might be expected of a location dimension table.
A visit can last anywhere from 0 seconds to a couple of weeks, and can happen more than once a day. This should be reasonably well represented in the fake data across the 10 days.
I do wonder if the fact table could be designed better but so far it seems to fit with star schema methods.
Advice, ideas and suggestions are very much welcome.
OK, I am getting much closer to a solution. Here is what I have so far.
Let's start with the schema.
The key component is that there is no active relationship between DimTime or DimDate and DimVisits.
To calculate active users/visitors for datetime calculations, two methods can be used:
Current Staff Datetime =
VAR result =
    CALCULATE (
        DISTINCTCOUNT ( DimVisits[User FK] ),
        // visit started on or before the end of the selected window
        FILTER (
            VALUES ( DimVisits[StartDatetime] ),
            DimVisits[StartDatetime] <= MAX ( DimDate[Date] ) + MAX ( DimTime[Time] )
        ),
        // visit ended on or after the start of the window, or is still open
        FILTER (
            VALUES ( DimVisits[EndDatetime] ),
            OR (
                DimVisits[EndDatetime] >= MIN ( DimDate[Date] ) + MIN ( DimTime[Time] ),
                ISBLANK ( DimVisits[EndDatetime] )
            )
        )
    )
RETURN
    IF ( result = 0, 0, result )    // show 0 instead of blank
Current Members Datetime =
VAR minDate = FIRSTDATE ( 'DimDate'[Date] )
VAR maxDate = LASTDATE ( 'DimDate'[Date] )
VAR minTime = MIN ( DimTime[Time] )
VAR maxTime = MAX ( DimTime[Time] )
// visits that overlap the selected date/time window, reduced to their user keys
VAR tmpTable =
    SELECTCOLUMNS (
        FILTER (
            DimVisits,
            [StartDatetime] <= maxDate + maxTime
                && [EffectiveDatetime] >= minDate + minTime
        ),
        "User FK", [User FK]
    )
// one row per distinct user
VAR tmpT2 = GROUPBY ( tmpTable, [User FK] )
VAR result = COUNTROWS ( tmpT2 )
RETURN
    IF ( result = 0, 0, result )    // show 0 instead of blank
But neither works for time-only calculations.
For time only calculations:
Active Employee Time =
VAR currentTime = MAX ( DimTime[Time] )
VAR result =
    CALCULATE (
        DISTINCTCOUNT ( DimVisits[User FK] ),
        // visits whose time-of-day range contains the selected time
        FILTER (
            DimVisits,
            DimVisits[STime] <= currentTime
                && DimVisits[ETime] >= currentTime
        )
    )
RETURN
    IF ( result = 0, 0, result )    // show 0 instead of blank
However, this doesn't filter by date (date slicer).
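One possible direction, offered only as an untested sketch, is to test each visit against the selected time of day on each date left visible by the date slicer. It assumes the StartDatetime and EndDatetime columns from the datetime measures, that DimDate[Date] has no time part, that adding DimDate[Date] and DimTime[Time] yields a datetime (as in the measures above), and that DimDate does not filter DimVisits through any relationship. The measure name is made up.

```dax
// Sketch only: a user is counted if they were active at the selected
// time of day on at least one of the dates in the current date selection.
Active Visitors Time By Date Sketch =
VAR currentTime = MAX ( DimTime[Time] )
VAR selectedDates = VALUES ( DimDate[Date] )    // dates allowed by the slicer
VAR activeVisits =
    FILTER (
        DimVisits,
        NOT ISEMPTY (
            FILTER (
                selectedDates,
                VAR instant = DimDate[Date] + currentTime
                RETURN
                    DimVisits[StartDatetime] <= instant
                        && (
                            ISBLANK ( DimVisits[EndDatetime] )
                                || DimVisits[EndDatetime] >= instant
                        )
            )
        )
    )
VAR result =
    CALCULATE ( DISTINCTCOUNT ( DimVisits[User FK] ), activeVisits )
RETURN
    IF ( ISBLANK ( result ), 0, result )
```

The nested iteration checks every visit against every selected date, so it may be slow on large visit tables or wide date ranges.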
Hi @AaronC,
When I click the link you shared, it returns an error.
Hi @v-luwang-msft, thank you for this. I have replaced the dropbox link with a google drive link. https://drive.google.com/file/d/1GHuUHwVHK8TWDJWXzanNOM90AG69M7A_/view?usp=sharing