Data warehouses (in all their forms and iterations) have become the backbone of almost every organizations analytics, data science and BI departments.
Over the years they have evolved and taken on new shapes as modern infrastructure has been developed to better manage analytical queries and workloads.
In our last few articles and videos we discussed why data warehouses are used as well as a high level for what they are used for.
Now we wanted to provide more depth to a data warehouse.
Beyond just talking about the high level reasons why you might want to incorporate a data warehouse into your infrastructure. We wanted to discuss the key design components.
A common interview question for data engineers and BI developers is to develop a data warehouse. Personally, I have been asked to design a parking lot data warehouse, a college courses data warehouse and several others for interviews.
I suggest that most people watch a few videos on the topic of data modeling as well as read up on Kimball’s data warehouse design book. After that, then you should think about a few workflows you might enjoy and practice modeling them.
For example, let’s walk through designing a data warehouse for a food delivery app.
How would you approach this design?
Generally, a good way to start is to list out the entities you would consider being part of a food delivery app.
For example here is a list:
* Menu Items(And possibly add-ons)
* Drop Off Locations
* Persons(Customers and drivers, since a driver in the future might be a customer and visa versa)
This would be a good high level set of entities to start with. Especially in an interview. You don’t want to focus on every possible issue and detail.
I would list these out and prod the interviewer to see if this is all the entities they were concerned with. Often times interviewers have a specific set of questions they want to ask. So if you don’t include all the entities, they may ask about different parts of the workflow you might have forgotten.
From here, the dimensions it may be obvious. You have menu items, restaurants, drop off locations, cars and persons.
These are all dimensional items because they represent entities that don’t change often, don’t contain measurable data and can be used to pivot and break down your future reports.
If you want to learn more, then watch the rest of the video! Or consider watching some of our other content.
What is a data warehouse and why build it?
Why invest in a data warehouse?