What is the difference between Database vs. Data lake vs. Warehouse?

In this video, we will describe the differences between a database, a data lake, and a data warehouse.

Databases are typically structured with a defined schema. Items are organized as a set of tables with columns and rows. Columns hold attributes, and each row represents an object or entity.
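To make the table idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` table and its sample rows are made up for illustration and do not come from the video.

```python
import sqlite3

# In-memory database; the "customers" table is a hypothetical example.
conn = sqlite3.connect(":memory:")

# The schema is defined up front: each column is an attribute with a type.
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# Each row is one entity (here, one customer).
conn.executemany(
    "INSERT INTO customers (id, name, city) VALUES (?, ?, ?)",
    [(1, "Alice", "Toronto"), (2, "Bob", "Ottawa")],
)

# PRAGMA table_info lists the columns; index 1 of each result row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
rows = conn.execute("SELECT id, name, city FROM customers ORDER BY id").fetchall()

print(columns)
print(rows)
```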

Databases are typically designed to be transactional; they are not designed to perform data analytics.
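As a rough illustration of what "transactional" means, the sketch below moves money between two accounts so that either both updates happen or neither does. It again uses sqlite3; the `accounts` table and the `transfer` helper are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; the CHECK constraint forbids negative balances.
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates apply or neither does."""
    try:
        # Used as a context manager, the connection commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the whole transfer was rolled back.
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds
failed = transfer(conn, 1, 2, 500)  # would overdraw account 1, so it is undone
balances = dict(conn.execute("SELECT id, balance FROM accounts"))
print(ok, failed, balances)
```

Note that the second transfer credits account 2 before the debit fails, yet the credit does not survive. That all-or-nothing behavior is what transactional workloads rely on.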

A data warehouse sits on top of several databases and is used for business intelligence. It consumes data from all of these databases and creates a layer optimized for data analytics. The schema is applied on import (schema on write).
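The idea of consuming data from several databases into one analytics layer can be sketched in a few lines. This is a toy example, not how a production warehouse is built: the two source databases, the `fact_sales` table, and all values are assumptions made for illustration, and sqlite3 stands in for real systems.

```python
import sqlite3

# Two hypothetical operational databases, each owned by a different application.
orders_db = sqlite3.connect(":memory:")
orders_db.execute(
    "CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount_cents INTEGER)"
)
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 10, 2500), (2, 11, 4000), (3, 10, 1500)],
)

crm_db = sqlite3.connect(":memory:")
crm_db.execute("CREATE TABLE customers (customer_id INTEGER, region TEXT)")
crm_db.executemany("INSERT INTO customers VALUES (?, ?)", [(10, "East"), (11, "West")])

# The warehouse schema is fixed before loading: schema on import.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE fact_sales ("
    "order_id INTEGER PRIMARY KEY, "
    "region TEXT NOT NULL, "
    "amount_cents INTEGER NOT NULL)"
)

# Extract from both sources, transform (attach region to each order), then load.
region_by_customer = dict(crm_db.execute("SELECT customer_id, region FROM customers"))
for order_id, customer_id, amount_cents in orders_db.execute(
    "SELECT order_id, customer_id, amount_cents FROM orders"
):
    warehouse.execute(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        (order_id, region_by_customer[customer_id], amount_cents),
    )
warehouse.commit()

# An analytics query that neither source database could answer on its own.
revenue_by_region = warehouse.execute(
    "SELECT region, SUM(amount_cents) FROM fact_sales GROUP BY region ORDER BY region"
).fetchall()
print(revenue_by_region)
```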

A data lake is a centralized repository for structured and unstructured data storage. Data lakes can be used to store raw data as is, without any structure (schema). No ETL or transformation jobs are needed before storing it. You can store many types of data, such as images, text, files, and videos. You can also store machine learning model artifacts, real-time data, and analytics outputs in data lakes. Processing can be done on export, so the schema is defined on read.
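Here is a small sketch of schema on read, assuming a local temporary folder plays the role of the lake (in practice this would be object storage such as S3). The file name, the sample events, and the `read_with_schema` helper are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

# A temporary folder stands in for the data lake.
lake = Path(tempfile.mkdtemp())

# Raw events are written as is: inconsistent types, extra fields, missing fields.
raw_events = [
    '{"user": "alice", "temp_c": "21.5", "extra": {"fw": "1.2"}}',
    '{"user": "bob", "temp_c": 19}',
    '{"user": "carol"}',
]
(lake / "events.jsonl").write_text("\n".join(raw_events), encoding="utf-8")


def read_with_schema(path, schema):
    """Apply a schema (field name -> type cast) only when the data is read."""
    records = []
    for line in path.read_text(encoding="utf-8").splitlines():
        raw = json.loads(line)
        records.append(
            {
                field: (cast(raw[field]) if field in raw else None)
                for field, cast in schema.items()
            }
        )
    return records


# The reader decides which fields matter and how to type them.
readings = read_with_schema(lake / "events.jsonl", {"user": str, "temp_c": float})
print(readings)
```

A different consumer could read the same file with a different schema, which is the flexibility the raw storage buys you.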

I hope you guys enjoyed my videos. Please subscribe for more videos!


#database #datalake #datawarehouse #s3


44 thoughts on “What is the difference between Database vs. Data lake vs. Warehouse?”
  1. It's very annoying when we don't get the difference between a database and a database management system. MSSQL, Oracle, etc. are all DBMSs. A hard disk or SSD can be considered a database.

  2. I'm a bike courier (Data) & I've had this job since I was a kid and now fresh out of my 20's it's kept key moments with your viewing pleasure for support heck… That's even afterwards of my salvage company job and that awareness YOU LIVE & YOU LEARN so good luck with your lifestyle choices thanks 👍👋🎯

  3. I was never much of a fan of the distinction "structured" vs "unstructured" data. Tabular vs. non-tabular format would be more apt. Expecting data in formal 1970s-style SQL data format is a lazy, IT-centric view. Digital data always has a structure the moment it is materialized on a storage medium, or you wouldn't be able to do NLP (based on ASCII/ANSI text encoding) or Computer Vision (based on JPG/PNG file structures). What is really more relevant is the semantic structure, and that is often not given even in tabular structured data sets.

  4. I did have a quick question. It might sound dumb, but still: the data lake contains both unstructured and structured data, and the last point says "Processing of data can be done where schema is defined on read". Since we have both types of data, will the schema be made just from the structured data, or will it be able to make a schema for unstructured data too?

  5. Amazing job you did here !!! Very clear and to the point. Keep up the good work brother 🙂

  6. Really good explanation. Probably just need to fix the sound quality, but the content is really good. Thanks

  7. Nice job and a good show, but it is out of date and could use a refresh, as Cloud Data Warehousing solutions like Redshift and Snowflake can persist both structured and semi-structured data without storage or compute limitations. These endless storage/compute capabilities in modern Cloud DW offerings knocked Data Lakes out of the "Data Lake as a Data Warehouse" game and placed them back into the domain of Big Data, where unstructured data (pdfs, jpegs, mv4, mp3, etc.) or data with high velocity and volume (IoT sensor data, web logs) are the norm.

  8. Very good explanation with one nitpick: the first section on databases says the data must be structured, and it shows a typical database "table". This is no longer the case since NoSQL options have become popular. You can have a NoSQL database that is not structured.
