Let’s create dimensional model for the stories feature of Facebook. There are two different processes related to Facebook stories – one is the story creation process and second is the story consumption/viewing process. Hence we will create two separate star schemas for these two processes.
One basic mistake dimensional modelers do is that they start with a particular metric question in mind and then create the star schema to solve that metric. However the design of the dimension model should always start from the business process. If you design the star schema based on the complete business process in scope then you can answer a lot of other metrics which the product managers might need in the future.
The best book to learn Dimensional Modeling is The Data Warehousing Toolkit by Ralph Kimball. I would highly recommend this book for anyone who wants to learn various design patterns involved in Dimensional Modelling.
Before we start designing the dimension models lets understand the step by step process involved in both Story use cases.
Story Creation Process
Lets’s write the steps involved in creating a story. I am keeping the workflow simple for our modelling purposes, hence I am leaving out steps which are ordinarily not needed to post a story (for ex setting privacy or saving instead of posting etc.)
- Open Facebook App
- Tap on Create a Story
- Select the type of Story
- Photo/Video from Camera Roll or Facebook Camera App
- Text
- Music
- Boomerang
- Mood
- Selfie
- Poll
- Select the items to include in your story
- Embellish your story with following possible widgets
- Stickers/GIFs/Emojis
- Text
- Drawing
- Effects
- Tagging People
- Animation
- Press Share to Story button
- Add more to story
- Stories are time sensitive and they disappear after 24 hours
Story Consumption/Viewing
- Open Facebook App
- Scroll on Story Carousel
- Tap on a story to view
- User Action
- Tap Forward
- Tap Backward
- Swipe to next Story
- React – Like, Heart, love, Laugh, Wow, Cry and Angry
- Reply
- Share – Share to your story, Send to Friends via messenger
- Mute user
- Flag and report
Ok, now Let’s start with Story Creation Process Dimension modelling. First thing we will do is identify the grain of the fact table. The best practice is to choose the lowest level grain for the fact table which will enable most number of dimensions and subsequently enable data scientists to perform much deeper analysis. However we also need to consider the growth of our fact table and choose a grain wisely.
Fact table Grain
For our purpose I will choose the grain as each step of the story creation process. Therefore as the user moves through the story creation process flow, at each step there will be a row inserted into our Fact table
Now that our grain is selected, we will choose the dimensions involved at the level of fact grain.
Dimensions
Let’s list all of the dimensions that we can think of
- Date Story Posted On
- Time Story Poster On
- User who posted the story
- User Session used to post the story
- User IP Location
- User Device
- User Browser or App (Method)
- Story
- Story Type
- Step
Facts
We will now list what we can measure at this grain of each step
- Step Duration
All count fields will be set to zero for steps for which they are not applicable
Following are the type of queries the product managers can perform on this fact table
- How many users post stories on a daily basis? i.e. DAU for Stories, similarly monthly, weekly, stickiness etc( See my earlier blog post for more details).
- How much time it takes for users to complete each step for each story type on average?
- How many users complete each step or which step causes most users to leave the process incomplete?
- How many items does a story contain on average?
- How many widgets does a story contain on average?
- Which location has the highest story publication?
- Which type of users publish the most stories?
- What is the education level of the users types publishing most stories?
Story Consumtion/Viewing Process Modelling
Let’s choose the fact table grain. I am tempted to select each user action as the gain of this fact table. That means when user touches the screen a row will be inserted in this fact table
Dimensions
The dimensions at this grain are :-
- Date Story Opened On
- Time Story Opened On
- User who opened the story
- User Session used to view the story
- User IP Location
- User Device
- User Browser or App (Method)
- Story
- Story Type
- User Action Type
- Open Story
- Tap and hold
- Right Tap to move to next item in same story
- Left Tap to go back to previous item in the same story
- Swipe left to move to next story
- Swipe right to move to previous story
- React – Like, love etc
- Share
Fact
Duration of each action type
Now I will create star schema for each of these facts with all attributes of dimension tables as well
Another thing we will need to define the type of our dimensions. Which technique will be used to handle changes in dimension attributes.
Could you please share the star schema related to this example?