Database vs Data Warehouse vs Data Lake | What is the Difference?
Database vs Data Warehouse vs Data Lake | Today we take a look at these 3 different ways to store data and the differences between them.
____________________________________________
SUBSCRIBE!
Do you want to become a Data Analyst? That’s what this channel is all about! My goal is to help you learn everything you need in order to start your career or even switch your career into Data Analytics. Be sure to subscribe to not miss out on any content!
____________________________________________
RESOURCES:
Coursera Courses:
Google Data Analyst Certification: https://coursera.pxf.io/5bBd62
Data Analysis with Python – https://coursera.pxf.io/BXY3Wy
IBM Data Analysis Specialization – https://coursera.pxf.io/AoYOdR
Tableau Data Visualization – https://coursera.pxf.io/MXYqaN
Udemy Courses:
Python for Data Analysis and Visualization- https://bit.ly/3hhX4LX
Statistics for Data Science – https://bit.ly/37jqDbq
SQL for Data Analysts (SSMS) – https://bit.ly/3fkqEij
Tableau A-Z – http://bit.ly/385lYvN
*Please note I may earn a small commission for any purchase through these links – Thanks for supporting the channel!*
____________________________________________
SUPPORT MY CHANNEL – PATREON/MERCH
Patreon Page – https://www.patreon.com/AlexTheAnalyst
Alex The Analyst Shop – https://teespring.com/stores/alex-the-analyst-shop
____________________________________________
Websites:
Website: AlexTheAnalyst.com
GitHub: https://github.com/AlexTheAnalyst
Instagram: @Alex_The_Analyst
____________________________________________
*All opinions or statements in this video are my own and do not reflect the opinion of the company I work for or have ever worked for*
source
Leave us a comment Cancel reply
COMMON QUESTIONS
Categories
- Agile Coach
- ARTIFICIAL INTELLIGENCE
- Block chain
- Business Analyst BA
- DATA science
- Development
- Devops Automation
- Dotnet
- Education
- Free Job training and assistance
- Information
- Infrastructure automation
- Interview questions
- JAVA
- jobs
- Learn
- Machine learning
- Python Automation
- QA Automation
- RPA Automation
- SAP crm
- Scrum master Agile
- Skill
- UI Design
- Uncategorized
38 Comments
I bet you read my username and like it
May 23, 2022 02:41 amIts 2:00am – Why am I watching a video about data lakes?
James Hughes
May 23, 2022 02:41 amThis was an Excellent Summary, Alex. I'll be showing this to the young'ins next time I need to train one =D
B G
May 23, 2022 02:41 amThough often times by database people do mean a RDBMS (Relational Data Base Management System), and use SQL, a database is any organized data store and goes well beyond just relational data models. One of the more common generic alternatives are hierarchical databases. Some simple examples of that are xml with xquery or the windows registry. I have even used what was tab delineated files and a file system hierarchy to do bioinformatics research.
A second thing is that databases don't need to be persistent, ACID, or on disk. The point being keep your mind open to things past RDBs. A different data store and query system may provide a better solution to the problem at hand.
If you do use RDBs, please bring it to third normal form, preferably BCNF. I cannot tell you the number of times I have come into a company to help solve a problem and it turns out that their data doesn't even fit first normal form.
Walter White
May 23, 2022 02:41 amI think this guy's trying to look like me.
Dustin
May 23, 2022 02:41 amThe "transaction" in OLTP doesn't mean retail sale. It means that a specific and complete interaction with the database has occurred. Yes, it works like a retail sale, but using such an analogy tends to confuse people that are unfamiliar with databases.
Zach Wiggin
May 23, 2022 02:41 amNot sure why this video was in my feed but it was interesting! I now know enough to be stupid.
Rajib Roy
May 23, 2022 02:41 amLike your Explanation Sir.
Stk
May 23, 2022 02:41 amAs a dev its a bit confusing with the data warehouse. I mean it's just a concept and a design idea right? I mean if I create a new database in SQL server and optimized for Data warehouse its an data warehouse right?
Andrij Demianczuk
May 23, 2022 02:41 amAlso have a look at Delta Lake. It’s about providing warehousing capabilities on a data lake, relying on a hive metastore and parquet normalization for column features. Delta lake is OSS and becoming more widely supported 🙂
Classicracer Uk
May 23, 2022 02:41 amDon’t forget Data Marts
Charlie
May 23, 2022 02:41 amGreat video.
MastJedi
May 23, 2022 02:41 amI like very much work with Databases rather than Warehouses or Lakes. Especially when I can prove (many times did), that my report is right, and report based on Warehouse/Lake is wrong. Work with databases give me freedom and very fast answer on problems. Investigate, find issue and ask data owner to fix input data (usually).
When chat with owners of reports based on Warehouse/Lakes is always the same: – I dont know.
My report – headcount 167 ppl.
Lake 152.
Where did you take 152 from? I dont know
I can give you 167 names with numbers of hours, when, where, how long. Can you do the same with 152 so we can compare our data? – I dont have this data. But 152 is THE number.
Taken all data from Data Lake and check it took them TWO Months. lol.
Welcome to the big corpo life.
Martijn Vos
May 23, 2022 02:41 amNice breakdown. From your explanation, I get the impression that the neo4j graph database we used on my previous project was actually a data warehouse for us, because we filled it with data we drew from many other databases, and structured it in a specific way (relations between the many different data elements from the different systems) for the reporting tool we built on top of it.
And much of our data didn't come directly from those other databases, but drew it daily from an intermediary portal that contained all sorts of different kinda of data from different systems, which I guess was a data lake of some sort.
Andrew Whitworth
May 23, 2022 02:41 amIt's worth mentioning, and I don't think you said this explicitly, that the three things can all be running the same underlying software. You can run a Database, Data Warehouse and (at least part) of a Data Lake using the same SQL engine, for example. The big difference is in how you use it, which has more to do with convention (and configuration) than any kind of different technical solution.
WECON Technology
May 23, 2022 02:41 amWecon IIOT V-BOX could support PLC connecting with database.
Richard Sanders
May 23, 2022 02:41 amMaybe a video on how the cluster/cloud companies charge the customer based on processing/RAM/data storage/bytes transferred, etc. 😀
VadimChes
May 23, 2022 02:41 amDon't put your face video to the right upper corner, where you close your slide text…
Knight Errant
May 23, 2022 02:41 amI disagree with your descriptions in some ways. Especially the bullet point and mention of the data warehouse's data being summarized. You seemed to say, without saying the actual word, that the database data was being aggregated when the ETL process brought the data into the data warehouse. Ok. That is fine and in my experience happens that way quite often by default. The issue I have with that is you also describe the Data Warehouse as where the the analysis and reporting of the data occurs. The sort of 'analysis' you can perform with aggregated data is severely limited. As in 'kindergarden through grade school' limited. Any sort of analysis with aggregated data will be comparing volumes of X and volumes of Y against dates when known incidents occurred or certain conditions obtained (i.e. an outage, inventory shortfall, holiday weekend, etc) at most. And that is pretty much the limit of what can be done when performing an analysis with aggregated data. I – myself – as a systems data analyst and data architect, have come to regard these terms as having become 'executive-speak' as of 2022 and are not very useful in describing the concepts required for conceptualizing modern data storage, processing, and analytics – especially in light of the revolution slowly unfolding with the emergence and evolution of cloud (distributed) technologies. Hmm. Perhaps I need to make some videos of my own (or collaborate on making a few) to put this new conceptualization model out there? What do you think, Alex? And my fellow YouTube video consumers and data nerds? 🙂
Random Debris
May 23, 2022 02:41 amData lake makes me think of Apple's Newton's data storage (Soups)
Jesse Lima
May 23, 2022 02:41 amBy the first time, those concepts got clear to me. Thanks!
おちゃむ
May 23, 2022 02:41 amI develop DWH.It's good video.
Moses Varghese
May 23, 2022 02:41 amAlex, it is awesome to see these videos man! Very informative.
Rik Killeen
May 23, 2022 02:41 amAs a foodie, I propose we rename "data lake" to "data ragu."
Liam Dennehy
May 23, 2022 02:41 amYou typically don't find a relational database with a flexible schema – they're most often mutually exclusive. A few other generalisations need attention. I understand this is a short video, but still…
D Lama
May 23, 2022 02:41 amYour head is covering parts of the slide.
Matt B
May 23, 2022 02:41 amAny videos on Data visualization?
Ou8y2k2
May 23, 2022 02:41 amOne or all three? What about just two?
milligoree
May 23, 2022 02:41 amGreat video, nice and clear. The only thing that could have been a helpful addition would be the names of the main tools for each.
DanDirindon
May 23, 2022 02:41 amSorry but you have described a Data Lake in a too generic way. It sounds only like a place where you can put everything, and nothing more! So does any company that holds data own a Data Lake? I do not think so! In my humble opinion, a Data Like is more an idea than a well-defined concept, appreciated more by the technology vendors to sell huge systems. The term was coined by the Chief Technology Officer of Pentaho, James Dixon, but then it evolved depending on the interests of users, vendors and academic gurus. A serious Data Lake should have the technology, tools, space, resources to load and manage any kind of data mainly from the scratch, with no predetermined or predefined structure or schema. Including real-time data, raw, not cleaned data, or unstructured data, videos, voice, text or clickstream! Independently from the true presence of all these kinds of data in the company and from the real usage of this data. It is strategic to maintain a central repository with metatags and the references to all the data stored in the Data Lake to avoid losing large amount of time to look for something. The same for a system of Data Security to prevent unauthorized accesses to each portion of data. And a lot of effort to avoid that the Data Lake evolves in a Data Swamp! To be more precise, a Data Lake is useful but not strictly necessary for machine learning or artificial intelligence like many people say. And finally, let me state that at least 99% of the companies in the world do not need a real Data Lake, no matter what the technology vendors suggest.
amitava paul
May 23, 2022 02:41 amvery comprehensive video..Thanks!
Łukasz Smykał
May 23, 2022 02:41 amclear, straightforward, fantastic 😉
thx
Katlego
May 23, 2022 02:41 amSQL Developer all the way from South Africa 👋🏾. These lessons really help, thank you. But they (Datawarehouse Architects) don't trust me yet with WRITE access to the Datawarehouse 😂
EndMan7
May 23, 2022 02:41 amThere is also a Data Lakehouse
FrostSpike
May 23, 2022 02:41 amData in the core of the warehouse is normally full-grain (albeit not necessarily all the tables/columns – it's a subset not a summary). It's highly normalised to keep the size down as much as possible. Overlaying the data warehouse are the data marts which are structured (with summaries) for easy analytical consumption. Some people conflate the two and see the data marts (sometimes called information marts) as being part of the data warehouse (and, depending on the platform, might just be views over the curated store) but they're logically a separate layer performing a different function. The Data Lake fulfils the same function as the Data Warehouse core but is more flexible on the types of content it can capture. There's a new-ish architecture called a "Data Lakehouse" which sort of composites the two concepts.
CodingStream
May 23, 2022 02:41 amyou could have included some examples of each, that would be of great help !!
and also maybe a real life usage of these, I know you discussed it for database and datawarehouse but for data-lake as well in a littlr but more detail. datalake was very vaguely covered
JaeHo Yun
May 23, 2022 02:41 amThanks Alex. This is simple and useful for me
Joshua Morley
May 23, 2022 02:41 amAnd now we see the latest iteration of a data topology, the lakehouse 🙂
Great video! i really like that you mention the summarisation as a point of differentiation. Analytics on a data warehouse is typically done by a data analyst with traditional data analyst skillsets (SQL, R etc). Analytics on a data lake is typically done by a data scientist (Python, ML etc)
An important point to note is "if you have all this data and you have no idea what to do with it" for data lake, a vital thing to focus on when creating a data lake is the structure of your data in the lake (as in directory and file structure)
if you just dump it all in, it will become a data swamp and waste of money
Anas Farah
May 23, 2022 02:41 amWould it be fair to say that a database goal is to optimize for write operations while a DW is to optimize for read operations?