Hadoop Tutorial For Beginners | Hadoop Ecosystem Explained in 20 min! – Frank Kane


    Explore the full course on Udemy (a special discount is included in the link).

    Hadoop and its associated distributions from Hortonworks, Cloudera, and MapR include a dizzying array of technologies. We will start with the origins and history of Hadoop, and then look at how all the different open-source systems that surround Hadoop clusters fit together. After this video, you will have a high-level overview of the biggest systems in the world of Hadoop today and see how they interoperate.

    Apache projects tend to have cryptic names, and we will decipher what they all really do. We will talk briefly about the core components of Hadoop: HDFS, YARN, and MapReduce. And then we will touch upon all the other systems built up around them, including Apache Spark, Hive, Pig, Ambari, Oozie, Zookeeper, Sqoop, Flume, Kafka, Mesos, HBase, Storm, Hue, Presto, Zeppelin, MySQL, Cassandra, MongoDB, Drill, and Phoenix.
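    To give a flavor of the MapReduce model mentioned above, here is a minimal pure-Python sketch of its map, shuffle, and reduce phases. This is only an illustration of the programming model, not real Hadoop code; on a cluster, these phases run distributed across many machines, with the shuffle happening over the network.

    ```python
    from collections import defaultdict

    # Map phase: emit (word, 1) pairs for every word in every input line.
    def map_phase(lines):
        for line in lines:
            for word in line.split():
                yield word.lower(), 1

    # Shuffle phase: group all emitted values by key. On a real Hadoop
    # cluster this grouping happens across the network between mappers
    # and reducers.
    def shuffle(pairs):
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    # Reduce phase: combine each key's list of values into a final count.
    def reduce_phase(groups):
        return {word: sum(counts) for word, counts in groups.items()}

    lines = ["big data is big", "hadoop handles big data"]
    result = reduce_phase(shuffle(map_phase(lines)))
    print(result["big"])   # 3
    print(result["data"])  # 2
    ```

    The same three-phase structure underlies Pig and Hive as well: both compile their higher-level scripts and queries down to jobs of this shape.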

    My larger course covers each technology in more depth, but at the end of this video, these terms should at least make sense to you and you will be less confused when people talk about them all.

    Course Description
    The world of Hadoop and “Big Data” can be intimidating – hundreds of different technologies with cryptic names form the Hadoop ecosystem. With this course, you’ll not only understand what those systems are and how they fit together – but you’ll go hands-on and learn how to use them to solve real business problems!

    Learn and master the most popular big data technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We’ll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.

    Install and work with a real Hadoop installation right on your desktop with Hortonworks and the Ambari UI
    Manage big data on a cluster with HDFS and MapReduce
    Write programs to analyze data on Hadoop with Pig and Spark
    Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto
    Design real-world systems using the Hadoop ecosystem
    Learn how your cluster is managed with YARN, Mesos, Zookeeper, Oozie, Zeppelin, and Hue
    Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm

    Understanding Hadoop is a highly valuable skill for anyone working at companies with large amounts of data.
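    As a taste of the streaming topics in the list above, here is a tiny pure-Python sketch of the micro-batch idea behind Spark Streaming: incoming events are grouped into small batches, and running results are updated one batch at a time. This is an illustrative toy, not the Spark Streaming API.

    ```python
    from collections import Counter

    def micro_batches(stream, batch_size):
        """Group an event stream into fixed-size micro-batches,
        the processing model Spark Streaming is built on."""
        batch = []
        for event in stream:
            batch.append(event)
            if len(batch) == batch_size:
                yield batch
                batch = []
        if batch:  # flush any leftover partial batch
            yield batch

    # Running counts updated one micro-batch at a time.
    events = ["error", "ok", "error", "ok", "ok", "error", "warn"]
    totals = Counter()
    for batch in micro_batches(events, 3):
        totals.update(batch)

    print(totals["error"])  # 3
    print(totals["warn"])   # 1
    ```

    Systems like Storm and Flink instead process events one at a time (true streaming), which is one of the trade-offs the course compares.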

    Almost every large company you might want to work at uses Hadoop in some way, including Amazon, eBay, Facebook, Google, LinkedIn, IBM, Spotify, Twitter, and Yahoo! And it’s not just technology companies that need Hadoop; even the New York Times uses Hadoop for processing images.

    This course is comprehensive, covering over 25 different technologies in over 14 hours of video lectures. It’s filled with hands-on activities and exercises, so you get some real experience in using Hadoop – it’s not just theory.

    You’ll find a range of activities in this course for people at every level. If you’re a project manager who just wants to learn the buzzwords, there are web UIs for many of the activities in the course that require no programming knowledge. If you’re comfortable with command lines, we’ll show you how to work with them too. And if you’re a programmer, I’ll challenge you with writing real scripts on a Hadoop system using Scala, Pig Latin, and Python.

    You’ll walk away from this course with a real, deep understanding of Hadoop and its associated distributed systems, and you can apply Hadoop to real-world problems. Plus a valuable completion certificate is waiting for you at the end!

    Please note the focus of this course is on application development, not Hadoop administration, although you will pick up some administration skills along the way.

    I hope to see you in the course soon!


    Who is the target audience?
    Software engineers and programmers who want to understand the larger Hadoop ecosystem, and use it to store, analyze, and vend “big data” at scale.
    Project, program, or product managers who want to understand the lingo and high-level architecture of Hadoop.
    Data analysts and database administrators who are curious about Hadoop and how it relates to their work.
    System architects who need to understand the components available in the Hadoop ecosystem, and how they fit together.

    Your instructor is Frank Kane, who spent nine years at Amazon.com and IMDb.com as a senior engineer and a senior manager. Frank’s job included extracting meaning from their massive data sets to recommend products to Amazon’s customers, and movies to IMDb users.

