What is the difference between Database vs. Data lake vs. Warehouse?



In this video, we will describe the differences between a database, a data lake, and a data warehouse.

Databases are typically structured with a defined schema. Data is organized as a set of tables with columns and rows: columns represent attributes, and each row represents an object or entity.
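As a minimal sketch of what a defined schema looks like in practice, the snippet below uses Python's built-in sqlite3 module. The customers table and its columns are hypothetical and chosen only for illustration. Each column is an attribute, each row is one entity, and a row that violates the schema is rejected.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        city TEXT NOT NULL
    )
    """
)

# Each tuple is one row (one entity); each position is one column (attribute).
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Alice", "Toronto"), (2, "Bob", "Ottawa")],
)

# Column names come from the schema, not from the data.
columns = [info[1] for info in conn.execute("PRAGMA table_info(customers)")]
rows = conn.execute("SELECT id, name, city FROM customers ORDER BY id").fetchall()

# The schema is enforced on write: a missing name violates NOT NULL.
try:
    conn.execute("INSERT INTO customers (id, name, city) VALUES (3, NULL, 'Paris')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```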

Databases are typically designed for transactional workloads; they are not designed to perform data analytics.
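To illustrate what "transactional" means here, the following sketch (again with sqlite3, using a hypothetical accounts table and transfer helper) groups two updates into a single transaction. If either update fails, both are rolled back, so the data never ends up half-changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        # `with conn` commits if the block succeeds and rolls back on an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed (overdraft), so the credit to dst is undone too.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer would overdraw the source account, so the credit that was already applied to the destination is rolled back along with it.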

A data warehouse sits on top of several databases and is used for business intelligence. It consumes data from all of these databases and creates a layer optimized for data analytics. The schema is applied on import (schema on write).
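The idea can be sketched on a toy scale as follows. Two in-memory SQLite databases stand in for separate operational systems, and a third stands in for the warehouse. All table names, column names, and values are hypothetical. Note how each source is transformed into the warehouse schema at import time, before any analytical query runs.

```python
import sqlite3


def make_source(create_sql, insert_sql, rows):
    """Create a small in-memory source database (a stand-in for a real system)."""
    db = sqlite3.connect(":memory:")
    db.execute(create_sql)
    db.executemany(insert_sql, rows)
    return db


# Two operational databases with different shapes and units.
web_db = make_source(
    "CREATE TABLE orders (order_id INTEGER, amount_cents INTEGER)",
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 1250), (2, 4000)],
)
store_db = make_source(
    "CREATE TABLE sales (receipt TEXT, total_dollars REAL)",
    "INSERT INTO sales VALUES (?, ?)",
    [("R-1", 20.0), ("R-2", 7.5)],
)

# The warehouse defines one analytics-friendly schema up front.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_sales ("
    "channel TEXT NOT NULL, source_id TEXT NOT NULL, amount_cents INTEGER NOT NULL)"
)

# Extract and transform: every source is reshaped to the warehouse schema on import.
web_rows = [
    ("web", str(order_id), cents)
    for order_id, cents in web_db.execute("SELECT order_id, amount_cents FROM orders")
]
store_rows = [
    ("store", receipt, round(dollars * 100))
    for receipt, dollars in store_db.execute("SELECT receipt, total_dollars FROM sales")
]
warehouse.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", web_rows + store_rows)

# Load is done; analytical queries now run against a single consistent table.
revenue_by_channel = warehouse.execute(
    "SELECT channel, SUM(amount_cents) FROM fact_sales GROUP BY channel ORDER BY channel"
).fetchall()
```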

A data lake is a centralized repository for storing structured and unstructured data. Data lakes can store raw data as is, without any structure (schema), so there is no need to run ETL or transformation jobs before storing it. You can store many types of data, such as images, text, files, and videos, as well as machine learning model artifacts, real-time data, and analytics outputs. Processing can be done on export, so the schema is defined on read.
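Schema on read can be sketched with nothing more than a folder of raw files. In the example below, a temporary directory stands in for the lake (in practice this would often be object storage), and the file names, fields, and the read_clicks helper are all hypothetical. Files are written exactly as they arrive, and a schema is applied only when a consumer reads them.

```python
import json
import tempfile
from pathlib import Path

# A temporary directory stands in for the data lake.
lake = Path(tempfile.mkdtemp(prefix="lake_"))

# Store raw data as is: no validation, no transformation, mixed formats.
(lake / "events").mkdir()
(lake / "events" / "2024-01-01.jsonl").write_text(
    '{"user": "alice", "clicks": "3"}\n'
    '{"user": "bob", "clicks": 5, "device": "mobile"}\n',
    encoding="utf-8",
)
(lake / "images").mkdir()
(lake / "images" / "logo.bin").write_bytes(b"placeholder image bytes")


def read_clicks(lake_root):
    """Apply a (user: str, clicks: int) schema while reading the raw events."""
    records = []
    for path in sorted((lake_root / "events").glob("*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            raw = json.loads(line)
            # The schema lives in the reader: pick fields, coerce types, drop extras.
            records.append({"user": str(raw["user"]), "clicks": int(raw["clicks"])})
    return records


clicks = read_clicks(lake)
```

A different consumer could read the same files with a different schema, for example keeping the device field, without anything in the lake being rewritten.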

I hope you enjoyed the video. Please subscribe for more videos!
https://www.youtube.com/channel/UC76VWNgXnU6z0RSPGwSkNIg

Thanks!

#database #datalake #datawarehouse #s3

44 thoughts on “What is the difference between Database vs. Data lake vs. Warehouse?”
  1. It's very annoying when we don't get the difference between a database and a database management system. MSSQL, Oracle, etc. are all DBMSs. A hard disk or SSD can be considered a database.

  2. I'm a bike courier (Data) & I've had this job since I was a kid and now fresh out of my 20's it's kept key moments with your viewing pleasure for support heck… That's even afterwards of my salvage company job and that awareness YOU LIVE & YOU LEARN so good luck with your lifestyle choices thanks 👍👋🎯

  3. I never was much a fan of the distinction "structured" vs "unstructured" data. Tabular vs. non-tabular format would be more apt. Expecting data in formal 1970s style SQL data format is a lazy IT centric view. Digital data always has a structure the moment it is materialized on a storage media, or you wouldn't be able to do NLP (based on ASCII/ANSI text encoding) or Computer Vision (based on JPG/PNG file structures). What is really more relevant is the semantic structure, and that is often not given even in tabular structured data sets.

  4. I have a quick question that might sound dumb. If the data lake contains both unstructured and structured data, there is this last point which says "Processing of data can be done where schema is defined on read". But since we have both types of data, will the schema be made just from the structured data, or will it be able to make a schema for the unstructured data as well?

  5. Amazing job you did here !!! Very clear and to the point. Keep up the good work brother 🙂

  6. Really good explanation. Probably just need to fix the sound quality, but the content is really good. Thanks

  7. Nice job and a good show, but it is out of date and could use a refresh, as cloud data warehousing solutions like Redshift and Snowflake can persist both structured and semi-structured data without storage or compute limitations. These endless storage/compute capabilities in modern cloud DW offerings knocked data lakes out of the "Data Lake as a Data Warehouse" game and placed them back into the domain of Big Data, where unstructured data (pdfs, jpegs, mv4, mp3, etc.) or data with high velocity and volume (IoT sensor data, web logs) are the norm.

  8. Very good explanation with one nitpick: The first section on databases says the data must be structured and it shows a typical database "table". This is no longer the case since noSQL options have become popular. You can have a noSQL database that is not structured.
