If you have an interview coming up and you want to revise the 10 most important machine learning algorithms real quick, you will not find a better video than this. Let's go ahead and do the revision of the 10 most frequently used ML algorithms. These are the 10 algorithms I am going to explain: how they work and what their pros and cons are. As you can see, the first five algorithms are in one color, the next three are in a different color, and the last two are in yet another color. There is a reason for that, guys, which I will tell you in a moment. But before that, let's answer two basic questions: what is machine learning, and what are algorithms?

I'll start with a non-bookish definition and one simple example. Suppose you want to travel from Bangalore to Hyderabad. For this you can take a train, a flight, or a bus, or maybe you can drive your own car. Two things to understand here, guys: what is the task, and what is the approach? The task at hand is to go from Bangalore to Hyderabad, and the approach is any of the options I just listed.

Now relate this to the world of machine learning. In machine learning the task can be of different kinds: it can be a regression task, a classification task, or an unsupervised learning problem. On the approach side, we can take different approaches depending on whether we are solving a regression problem, a classification problem, or a particular case of unsupervised learning. In regression alone there is not just one approach — I can take approach one, two, three, four or five. In classification I can take several approaches, and in unsupervised learning I can also take multiple approaches. That is why the color coding is there: the first five algorithms will be explained with a regression use case, the next three with a classification use case, and the last two with an unsupervised learning problem.

So let's go ahead, guys, and try to understand this with a simple input data set — I have taken a sample input here.

Without any delay, let's start with the first algorithm: linear regression. Machine learning is all about learning patterns from data using algorithms. So if we use the algorithm known as linear regression, what will happen? Let's try to understand that.

Suppose this is the employee data of an organization: you have an age column and a salary column. A 22-year-old earns 23,000, and so on and so forth. Suppose we use the linear regression approach to solve this regression problem (as I told you, the first five algorithms will be understood using a regression problem). What linear regression will do is take this data and see how it is plotted on an XY plane: for example, salary on the y axis and age on the x axis. I am just roughly placing these points — the first point, 22 and 23,000, may come somewhere here; the second data point, 41 and 80,000, somewhere here; and the third data point, 58 and 150k, maybe somewhere here.

What linear regression will do is try to fit a line. Ideally, the assumption is that all these points fall on the same line: a line like this can be plotted, or a line like this. But that ideal only exists in an ideal world; in the real world it never happens. So what linear regression does is fit something known as the best fit line. How is this best fit line computed? It tries to minimize the distances of all the points from the line, measured parallel to the y axis. Call these distances E1, E2 and E3. Linear regression will try to minimize E1² + E2² + E3², and whichever line gives the minimum value of E1² + E2² + E3² is called the model.

Now, as you know from basic mathematics, a straight line has an equation, in its simplest form y = mx + c. In our case I can say salary = m × age + c, where c is the intercept — let's give it some random number, say 2000. Imagine this line, which is the model for linear regression, has this formula. The next question: tomorrow, once the pattern has been learned and a new age comes in, say age 50, what will be the salary for that person? Very simple: the model plugs the numbers into the formula. For example, if m is 0.2, age is 50 and the intercept is 2000, then whatever that calculation gives will be the predicted salary for age 50. It's a very simple mathematical model, and the assumption is that there is a linear relation between the independent variable and the target variable. So it fits what it calls the best fit line — the line where this squared error is minimum — and once the best fit line is found, prediction happens exactly like this.

Obviously every algorithm and every model has pros and cons. The pros of linear regression: it's simple to understand, it's a mathematical model you can explain to someone. The cons: your data will not always be simple enough to fall on, or close to, a line. Because it's such a simple model, a lot of real-world problems may be difficult to solve with plain linear regression. There are also varieties of linear regression; I have created videos on those which you can watch. But simply, this is how linear regression works — that's our first approach, meaning our first algorithm.
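To make this concrete, here is a minimal sketch in Python (assuming numpy and scikit-learn are available) that fits the same kind of best fit line on the three toy (age, salary) pairs from the example; the numbers and the resulting prediction are purely illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy employee data from the example: age -> salary
X = np.array([[22], [41], [58]])             # age (single feature)
y = np.array([23_000, 80_000, 150_000])      # salary (target)

# Fits salary = m * age + c by minimizing the sum of squared errors (E1² + E2² + E3²)
model = LinearRegression()
model.fit(X, y)

print("slope m:", model.coef_[0])
print("intercept c:", model.intercept_)
print("predicted salary for age 50:", model.predict([[50]])[0])
```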

Now let's go ahead and see how a decision tree approaches the same problem. If you give the same data to a decision tree and ask it, "hey, learn the pattern from this data," what the decision tree will do is try to break the data. How does it break the data? It creates a rule, for example: age <= 30. Some records will satisfy this rule and some will not, and that is how the data breaks. If you check here — age less than or equal to 30 — how many records? Only one. More than 30? Two records. So only one record comes to the left side: 22 and 23k. On the right side I will write 41 and 80k, and there is one more record, 58 and 150k. Understand this carefully, guys, because this is the base for the next few algorithms.

So the decision tree splits your data like this: you had three records in total at the beginning, one record on this side and two records on that side. This is the first level of the split. The tree can definitely split one more time — here there are a limited number of records, but imagine there were more; there could be another filter, maybe age < 40 or something like that. I will not take that now, as it would make the tree complex. So this is your model: breaking your data based on some conditions is nothing but your model. If somebody asks you what the model is in a decision tree, this is it.

Now the important question: suppose tomorrow somebody asks, for a person with age 50, what is your prediction? Very, very important concept to understand, guys. The decision tree will check where age 50 lands — it falls into the right branch. In that branch there are two records, so the decision tree will take the average of those two salaries: for age 50 the prediction will be (80k + 150k) / 2. That is how a decision tree makes a prediction. Suppose you ask the tree what the salary of a person with age 21 will be: it will not go to the right side, it will go to the left branch, because that is the branch it belongs to, and it will directly say 23k, since there is only one record there; if there were two records it would take the average. So you see how these two approaches differ for solving the same regression problem: in one a mathematical line is fit, and in the other the data is broken into multiple pieces and the prediction is made from those pieces.
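As a quick illustration, here is a hedged sketch with scikit-learn's DecisionTreeRegressor on the same toy data; restricting the tree to max_depth=1 reproduces the single split described above, and the prediction for age 50 comes out as the average of the two salaries in the right branch.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[22], [41], [58]])
y = np.array([23_000, 80_000, 150_000])

# A depth-1 tree gives exactly one split (roughly "age <= 30" on this data)
tree = DecisionTreeRegressor(max_depth=1)
tree.fit(X, y)

print(tree.predict([[50]]))   # ~115,000 = (80k + 150k) / 2, average of the right branch
print(tree.predict([[21]]))   # 23,000, the single record in the left branch
```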

Remember, guys, the decision tree is the base for many other advanced algorithms, and our third algorithm in the list is known as random forest. What random forest says to the decision tree is: "you have done a good job, but there is a chance of overfitting the data." We did not discuss the pros and cons of the decision tree: the pro is that it's a simple model — you don't need to do a lot of mathematics — and the con is the chance of overfitting, because if there is a little change in the data, your model may change totally. That's the risk with a decision tree. So random forest comes and says: "hey, you are taking the right approach, but there is a chance of overfitting, so why don't you fit multiple trees?"

What random forest will do is create multiple trees. This is your tree one, like the decision tree we just saw; this is your tree two; and similarly there can be n trees — call them T1, T2, and so on, for example 500 trees. Random forest says to the decision tree: if you fit one tree, there is a chance of the result being biased, of overfitting, or of the model not being stable; but I will fit 500 trees. And how it makes the prediction is very important to understand, guys: the prediction of a random forest is the average of all these individual predictions. For example, if we are trying to predict the salary for age 50, random forest will take the prediction from tree one, plus the prediction from tree two, plus ... plus the prediction from tree 500, and average them all. What are we trying to achieve here? Suppose one decision tree is overfitting, not performing well, or is biased; since you are taking feedback from 500 different trees, that overfitting or model-instability problem may not be there. That is how random forest differs from a decision tree.

Also remember that these individual trees will not all use all the data. For example, suppose your data has one thousand rows and 10 columns — just an example. The trees will not necessarily use all the records: it may be that tree one uses 100 records and three randomly selected columns, while tree two (T2) uses 200 records and three randomly selected columns. That is the advantage of random forest: each tree may learn a different kind of pattern, and when you take the aggregated result you get all the flavors. This kind of learning is known as ensemble learning. Remember, guys, at Unfold Data Science you will find a big playlist explaining all the ensemble learning algorithms in detail; I will paste the link in the description — do check it if you have any confusion about how ensemble learning works.

But there is more to ensemble learning. What happened just now in random forest is known as the parallel way of learning. Why parallel, guys? Because tree one, tree two and tree three are independent of each other: when you train a random forest model, tree one can start building by taking a subsample of the data, tree two can start building by taking its own subsample, and they do not depend on each other — so all of this can happen in parallel, and hence we call it parallel learning.
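A minimal sketch of this with scikit-learn's RandomForestRegressor follows; the toy data is far too small for a real forest, so treat the output as purely illustrative of the "average over many bootstrapped trees" idea.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.array([[22], [41], [58]])
y = np.array([23_000, 80_000, 150_000])

# 500 trees, each trained on a bootstrap sample of the rows (and, with more
# features, a random subset of columns); the forest's prediction is the
# average of the 500 individual tree predictions
forest = RandomForestRegressor(n_estimators=500, random_state=42)
forest.fit(X, y)

print(forest.predict([[50]]))   # average prediction over all trees
```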

Now the question is: is there another way of learning in ensembles? Yes — and that brings us to our next algorithm, AdaBoost, which stands for adaptive boosting. Let me write the data here one more time (I may write some different numbers; the exact values are not important, only the concept): age 22 with 22,000, age 42 with 50,000, and age 58 with 150,000. This is your input data.

Boosting is another technique in the ensemble category. In boosting, and especially in AdaBoost, a weight is assigned to every observation. Suppose this is your original training data, with salary as the target column. The initial weights will be the same for all records — there are three records, so 1/3, 1/3, 1/3 — meaning all rows are equally important. Try to understand the concept, guys: in AdaBoost, in the first iteration, all rows are equally important. But the way AdaBoost works — it's right there in the name, "adaptive" — is that it adapts to the mistakes of the previous model.

Why am I talking about a previous model and a next model? One thing you have to always remember: AdaBoost is a sequential learning process. Remember how I just told you random forest is a parallel learning process, where tree one and tree two are independent of each other, each taking its own subsample and having nothing to do with the other? In AdaBoost and other boosting techniques it's a sequential model: multiple models are fitted to the data — model 1, model 2, model 3, model 4, and so on — but not in parallel; they happen in sequence.

The important thing to understand is how this sequence is generated. Model 1 you can think of as a base model, and remember, in AdaBoost the decision trees look like stumps: trees whose depth does not go beyond one level. That is called a stump in the language of machine learning, and multiple stumps will be created. Suppose your model 1 is the first stump. Model 1 comes and makes some prediction about the salary, so we get another column, salary_prediction, and this prediction comes from model 1, the first model. Obviously there will be some mistakes: 22,000 may be predicted as 21,900, 50,000 may be predicted as, say, 52,000, and 150,000 as, say, 200,000, based on this first stump. So there will be differences between actual and predicted, and from these come the residuals — residuals means errors. What will the errors be? Actual minus predicted: 22,000 − 21,900 = 100, 50,000 − 52,000 = −2,000, and 150,000 − 200,000 = −50,000. Those are the errors from the first model.

Now, those were the initial weights. What happens when the next model, M2, is fitted? The weights are changed, and more preference is given to the observations where the residuals are larger. I am repeating one more time, guys: M1 predicts, then the residuals (errors) are computed; when M2 is trained, the weights are no longer the same for all three records — the weight is increased where you got more error and decreased where you got less error. M2 then predicts, residuals are computed, weights are adjusted again; M3 comes, predicts, residuals are calculated, weights are adjusted; and so on. Finally, your final model is a combination: base model + M1 + M2 + M3 + ... Remember, this is not an exact mathematical equation, it's just indicative; if you want the mathematics behind it, please click the link I'm giving in the description. And all these models will not have an equal say in the final output — their say will differ. In random forest all the models had an equal say (we divided by 500), but here they have unequal say.

What are the pros and cons of this model? It may give you a better result than many models because it adapts to its mistakes, but if you have a larger data set it may need more resources to train, and it is somewhat of a black-box model: apart from some hyperparameters, you don't have much explanation of what is going on inside.
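A hedged sketch with scikit-learn's AdaBoostRegressor on the toy rows follows (default settings; the exact prediction is illustrative only — the point is the sequential, reweighted training).

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor

X = np.array([[22], [42], [58]])
y = np.array([22_000, 50_000, 150_000])

# Sequential ensemble: each new tree is trained with higher weights on the
# observations that the previous trees predicted badly (large residuals),
# and the trees get unequal say in the final combined prediction
ada = AdaBoostRegressor(n_estimators=50, random_state=0)
ada.fit(X, y)

print(ada.predict([[50]]))
```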

Let's move ahead to the last algorithm in the regression category, known as gradient boost. Remember, guys, none of the algorithms I am explaining here is rarely used — all of them are used heavily. I will take a simple data set: age 21 with salary 20k, age 40 with salary 42k, and age 58 with salary 60k. This is your input data and you want to run gradient boost on it. Understand, guys, this is again sequential learning, not parallel learning.

First there will be a base prediction for all these rows. What is the base prediction? It's a kind of dumb model: it assumes that for every row the prediction is simply the average of the three salaries, (20k + 42k + 60k) / 3, which is roughly 41k. So this base prediction of about 41k is put against every row. Then a residual is computed: the residual is the difference between the actual and predicted values, whatever those numbers come out to be.

Now comes the interesting part — how gradient boost differs from AdaBoost and the other algorithms. What gradient boost will do is fit a model on these residuals and try to minimize them. So there is a base model, then a next model you can call residual model 1, then residual model 2, and so on. Residuals are computed, and based on what the residual model predicts, the base prediction is updated. For example, the residual for the first row is 20k − 41k = −21k; age acts as the independent column and this residual acts as the target column. Suppose the residual model's prediction for this row comes out as, say, −15k; then the base prediction for that row gets updated by that amount, and the process repeats. It's a more involved model — if you want more detail, there are links in the description; please click on them and it will be very clear to you. So the final model is: base model + residual model 1 + residual model 2 + ..., and there are parameters that assign weights to these models, so, as I said, they do not all get an equal vote in the final output.

This is gradient boost, one of the famous algorithms for winning Kaggle competitions. There is also a variant of gradient boost known as XGBoost — extreme gradient boosting. Please go ahead and read about that algorithm, guys; I am not covering it here because there is only a slight difference between gradient boost and XGBoost, and you can read about that as well.
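To make the residual-fitting loop concrete, here is a small hand-rolled sketch (shallow scikit-learn trees as the residual models, a fixed learning rate, toy numbers); it is only meant to mirror the idea described above, not to be a production gradient booster.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[21], [40], [58]])                   # age
y = np.array([20_000, 42_000, 60_000], float)      # salary

base = y.mean()                                     # dumb base prediction ~= 40.7k
pred = np.full_like(y, base)
lr, trees = 0.1, []

for _ in range(100):
    residuals = y - pred                            # errors of the current ensemble
    t = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
    trees.append(t)
    pred += lr * t.predict(X)                       # nudge predictions toward the targets

def predict(x_new):
    # final model = base prediction + weighted sum of all the residual models
    return base + lr * sum(t.predict(x_new) for t in trees)

print(predict(np.array([[50]])))
```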

Fine, let's move ahead to the second category of algorithms: classification algorithms. The first algorithm I am going to cover in this category is logistic regression. Very, very important, guys — please pay attention and try to understand how logistic regression works for a given scenario. It's a mathematical model, hence it's important for you to understand it.

Suppose this is employee data: a 21-year-old making 22k who leaves the organization (1), a 40-year-old making, say, 42k who does not leave (0), and a 58-year-old making, say, 60k who leaves (1) — just as an example. This is a classification problem where we are trying to predict whether an employee will leave the organization or not; the last column you see is your target column. It's called a classification problem because the objective of the model is: tomorrow I give you the age of an employee, for example 31, and the salary, for example 34k, and I ask the model, "hey, will this person leave the organization or not?"

How does logistic regression take on this problem? We have to understand some mathematical concepts here. The target column contains only 1s and 0s, which means y — our target — can be either 0 or 1 and nothing else; understand this, it's a very important concept. But age and salary can take any real number: X can be any value between minus infinity and plus infinity. So we have to somehow create a relation that enables us to predict y given X. The problem is that on one side we have X (the independent features) with the range minus infinity to plus infinity, and on the other side y can only be 0 or 1 — not 0 to 1, just 0 or 1.

So we do not directly predict y; we predict something else. In place of y, we predict probabilities — the probability of an observation belonging to class 1. Then the range becomes 0 to 1, since, as you know, a probability lies between 0 and 1. But this range is still not minus infinity to plus infinity, so we do one more transformation and convert the probability to odds, p / (1 − p), whose range is 0 to infinity. Still not minus infinity to plus infinity, so we take the log of the odds, and then the range becomes minus infinity to plus infinity. The equation now looks like this: log(p / (1 − p)) = β0 + β1·x1 + ... This is the base equation of logistic regression.

Now one important concept to understand here, guys: log(p / (1 − p)) is the logit function, and the inverse of the logit function is the sigmoid function. If you don't know the sigmoid function, it is f(x) = 1 / (1 + e^(−x)); on the XY plane it is an S-shaped curve that always stays between 0 and 1, passing through 0.5 in the middle. If you apply the sigmoid — the inverse of the logit — to both sides of the equation, on the left-hand side you are left with just p, and you get p = 1 / (1 + e^(−(β0 + β1·x1 + ...))). Remember, guys, the logit equation (equation one) and the sigmoid equation (equation two) express the same relationship in two forms. For our example it can be written as p = 1 / (1 + e^(−(β0 + β1·age + β2·salary))) — that is nothing but your logistic regression equation. And since this is a sigmoid, its output will always be between 0 and 1, which means you get a probability, and based on this probability you can say whether the employee leaves or does not leave.

Logistic regression is again a very important and not-so-easy concept. As you can see, we are modeling a categorical variable against real numbers, hence we need to do these transformations, and that is how it relates to the probability I just explained. Pros and cons: it's a mathematical model and not very difficult to understand; the cons are that it assumes a lot of things about the data which may or may not hold, so it may not give a great result all the time — but it is a very famous and very important algorithm to understand.
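Here is a minimal sketch, assuming scikit-learn, that fits a LogisticRegression on the three toy rows and returns a leave probability for a new employee; the labels and numbers are illustrative only, and in practice you would also scale the features first.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# [age, salary], target: 1 = leaves, 0 = does not leave
X = np.array([[21, 22_000], [40, 42_000], [58, 60_000]])
y = np.array([1, 0, 1])

clf = LogisticRegression()
clf.fit(X, y)

# Sigmoid output: p = 1 / (1 + e^-(b0 + b1*age + b2*salary))
print(clf.predict_proba([[31, 34_000]]))   # [P(does not leave), P(leaves)]
print(clf.predict([[31, 34_000]]))         # predicted class
```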

The next algorithm in the classification category is a simple one I want to cover here: K nearest neighbors, or KNN. It's a pretty simple algorithm. Suppose on the same data you want to build a KNN model; since I already have the data here, I will explain it right here. It will plot the data on an XY plane — age on one axis and, let's say, salary on the other. Of these three employees, the 21-year-old falls here, the 40-year-old falls here, and the 58-year-old falls somewhere here. What K nearest neighbors does is work out which observations are neighbors of which: this is observation one, this is observation two, and this is observation three. Observation one has no close neighbors, but two is a neighbor of three and three is a neighbor of two.

Now suppose tomorrow a prediction request comes — again I will take age 50, and this time salary as well, since salary is also a feature here, say 61k. KNN will see where this new point (age 50, salary 61k) falls and who its nearest neighbors are — maybe this employee and this employee. It then simply takes the mode of the neighbors' target values. For example, these two neighbors have labels 0 and 1, so it takes whichever is in the majority; in this tiny case there is no clear mode, but with larger data there obviously will be. Suppose there are 30 neighbors, of which 20 are 1 and 10 are 0 — then the prediction for this new point is the mode, which is 1. So, as I told you, KNN is a pretty simple algorithm: it just places your data in the feature space, finds the nearest neighbors, and when a new observation arrives you tell it how many neighbors to consider and it predicts based on them. Nothing complex in it, which is why I covered it quickly on the same slide.
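A hedged sketch with scikit-learn's KNeighborsClassifier on the same toy rows follows; with only three points and k = 2 the vote can tie, so the output is illustrative, and in practice you would also scale the features, since salary otherwise dominates the Euclidean distance.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[21, 22_000], [40, 42_000], [58, 60_000]])
y = np.array([1, 0, 1])

# k = 2: look at the two closest employees and take the majority label
knn = KNeighborsClassifier(n_neighbors=2)
knn.fit(X, y)

print(knn.predict([[50, 61_000]]))       # majority vote of the 2 nearest neighbors
print(knn.kneighbors([[50, 61_000]]))    # distances and indices of those neighbors
```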

Now let's try to understand another classification technique known as support vector machines, or SVMs. What an SVM does is plot your data on whatever axes you have — suppose age is one axis and salary is the other — and I will take a few more data points here: these are some data points, and these are some more data points. The SVM will try to create something known as a decision boundary. How is this different from linear regression? In linear regression a pure mathematical equation (a line) is fit to the data; here there is the concept of something known as a hyperplane. For example, if I draw a line between these groups, all the black points you can think of as "leaves" (target column is 1), and all the others as "does not leave" (target column is 0). If your data is like this, the SVM will plot what, in the language of SVM, is called a decision boundary.

In this case your data looks pretty simple and well separated, hence the decision boundary can be as simple as a line. But in most real-world scenarios the decision boundary cannot be this simple: there will be some black dots here, some black circles there, and some blue crosses on this side. In that case a straight-line boundary is not doing justice, so the boundary needs to change, and that is where the concepts of hyperplanes and kernels come in — two very important concepts in SVM, guys, if you want to explore further. When your data becomes complex, simple decision boundaries cannot separate it well, so you need a more complex decision boundary, and that is what hyperplanes and kernels give you.

But just to give you an idea of how SVM works: it creates a decision boundary, and tomorrow when a new case comes — for example, for a person with age 50 and salary 60k, will the person leave or not? — the SVM model sees on which side of the decision boundary this point falls. If it falls on this side, it says "does not leave"; if it falls on the other side, it says "leaves". So in SVM, remember these concepts: decision boundaries, hyperplanes, kernels and the kernel trick.
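A minimal sketch with scikit-learn's SVC on a slightly larger made-up set follows; a linear kernel is used here purely for illustration, and swapping in a non-linear kernel such as "rbf" is what the kernel-trick discussion above refers to.

```python
import numpy as np
from sklearn.svm import SVC

# [age, salary], 1 = leaves, 0 = does not leave (toy, made-up points)
X = np.array([[21, 22_000], [25, 25_000], [30, 30_000],
              [40, 42_000], [50, 55_000], [58, 60_000]])
y = np.array([1, 1, 1, 0, 0, 0])

svm = SVC(kernel="linear")      # kernel="rbf" would give a non-linear boundary
svm.fit(X, y)

# Which side of the decision boundary does the new employee fall on?
print(svm.predict([[50, 60_000]]))
```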

So we have covered three algorithms for classification scenarios and five for regression scenarios. Let's go ahead and look at some unsupervised learning problems. What is the meaning of unsupervised learning? Until now we always had a target column, but in unsupervised learning we may not have a target column at all. Suppose for the same employee data we have age and salary, and somebody comes and asks you: "hey, can you tell me if there are different buckets of employees in my organization?" Different buckets means, say, some people with less age and more salary, some people with more age and less salary. How do you solve that problem? By using something known as clustering, or segmentation.

So suppose the task at hand is this: here there are only three records, but in a real-world scenario there can be many more, and what I am interested in knowing is whether there are natural clusters in my organization. This is my organization's data — age on one axis, salary on the other — and I am plotting a few extra data points just for demonstration. There is nothing to predict, but the employer is interested in knowing whether there are buckets, meaning whether a few employees are close to each other in terms of their characteristics. For example, these employees are close to each other, so you can call them bucket one; these employees are close to each other, so you can call them bucket two, or segment two.

How is this implemented? One technique for implementing this bucketing is K-means clustering — there can be other techniques for segmentation as well, but K-means is one of them. In this technique, the distance between the various employees is computed. For example, this is employee one and this is employee two; suppose I ask you how similar employee one is to employee two. There are different similarity metrics you can compute — Euclidean distance, Manhattan distance, cosine similarity, and so on; I have detailed videos on these as well, linked in the description. Simply put, the Euclidean distance between E1 and E2 is sqrt((21 − 40)² + (20k − 42k)²): on every dimension you take the difference, square it, add the squares up and take the square root. Suppose the Euclidean distance between E1 and E2 is small while the distance between E1 and E3 is large; in that case you say E1 and E2 are closer to each other. In this way you keep finding the employees that are close to each other and call them one bucket, and another close group you call another bucket. Remember, I have explained this in simple terms, but there is a very important concept in K-means known as the centroid. Please go ahead and watch the detailed Unfold Data Science video on K-means clustering — you will understand how the centroid is defined and how this algorithm works at a mathematical level. I will link that video; please ensure you watch it.
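Here is a small sketch with scikit-learn's KMeans on made-up age/salary points; choosing k = 2 is an assumption for illustration, and in practice you would scale the features and pick k with something like the elbow method.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy employee data: [age, salary]
X = np.array([[21, 20_000], [23, 22_000], [25, 24_000],
              [40, 42_000], [55, 58_000], [58, 60_000]])

km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

print(labels)               # which bucket/segment each employee falls into
print(km.cluster_centers_)  # the centroid of each bucket

# The Euclidean distance described above, between the first and fourth employee
print(np.sqrt(np.sum((X[0] - X[3]) ** 2)))
```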

So this is about K-means clustering. Now, last but not the least, guys, you might have seen on Amazon and Flipkart that different products get recommended to you — for example, if you buy a laptop, it will tell you, "hey, go ahead and buy this laptop bag as well." That is nothing but a recommendation. On Netflix, if you watch one action movie, let's say Mission Impossible, it may go and recommend the Jack Ryan series to you. This is a recommendation system running in the background. How does this system work? One simple yet powerful technique for recommender systems is known as collaborative filtering.

What collaborative filtering does is take users and items — try to understand this simple concept, it's pretty easy. The users can be Aman, John and Doe, and the items can be, let's say, Mission Impossible, Jack Ryan, a James Bond movie, Spider-Man, and a comedy movie, say Home Alone. Now, which movies has Aman watched? Mission Impossible he has watched, Jack Ryan he has watched, but the James Bond movie he has not watched — I will put a zero — and the Spider-Man movie he has not watched either. There is another guy, John, who has watched Mission Impossible, Jack Ryan, the James Bond movie and the Spider-Man movie as well. And there is another guy, Doe, who has not watched any of these movies but has watched Home Alone, the comedy movie.

Which users are similar to which users is computed based on one of the user similarity metrics — I told you about cosine similarity; it can also be other kinds of distance metrics. As common sense also tells you, Aman watches action movies, and John also watches action movies — Mission Impossible and Jack Ryan — but Aman has not watched the James Bond movie or the Spider-Man movie. Since Aman and John are similar to each other, the system says: go ahead and recommend to Aman the movies that John has watched but Aman has not, because both their tastes are similar. So what recommendations go to Aman? The James Bond movie and the Spider-Man movie. Now imagine this as a large matrix of many users and many items: the system finds which users' tastes are similar to each other, and then a user who has not watched a movie is recommended movies or series based on the watching history of similar users. This is a pretty simple but powerful technique known as collaborative filtering.
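Here is a small numpy sketch of user-based collaborative filtering with cosine similarity on the toy watch matrix above; the names, the binary watched/not-watched values and the resulting recommendations are illustrative only.

```python
import numpy as np

users = ["Aman", "John", "Doe"]
items = ["Mission Impossible", "Jack Ryan", "James Bond", "Spider-Man", "Home Alone"]

# 1 = watched, 0 = not watched
R = np.array([
    [1, 1, 0, 0, 0],   # Aman
    [1, 1, 1, 1, 0],   # John
    [0, 0, 0, 0, 1],   # Doe
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)

target = 0  # recommend for Aman
sims = np.array([cosine(R[target], R[u]) for u in range(len(users))])
sims[target] = -1                      # ignore self-similarity
most_similar = int(np.argmax(sims))    # John, in this toy example

# Recommend items the similar user watched that the target user has not
recs = [items[i] for i in range(len(items))
        if R[most_similar, i] == 1 and R[target, i] == 0]
print(f"Recommend to {users[target]}: {recs}")   # James Bond, Spider-Man
```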

So let's revise once, guys, what all we discussed — a long discussion, but very fruitful for revising a few fundamental concepts. For regression we discussed linear regression, decision tree, random forest, AdaBoost and gradient boost. For classification I explained logistic regression, how SVM works and how KNN works. And I explained two unsupervised techniques: K-means and collaborative filtering. I did not go into too much detail, because it's not possible to cover all the details of 10 algorithms in a short time, but treat this as a refresher and please go ahead and click the links for whichever algorithm you are more interested in learning — all the videos are there on Unfold Data Science.

I request you guys to please press the like button if you liked this video, and press the subscribe button and the bell icon if you want me to create more videos like this. See you all in the next video, guys. Wherever you are, stay safe and take care.


00:00 if you have an interview coming up and

00:02 you want to revise 10 most important

00:04 machine learning algorithms real quick

00:07 you will not find a better video than

00:09 this let’s go ahead and do the revision

00:11 of 10 most frequent used ml algorithms

00:14 these are the 10 algorithms I am going

00:17 to explain you how they work and what

00:19 are their pros and cons okay and as you

00:23 can see first five algorithms is in one

00:25 color next three is in a different color

00:27 and last two is in a different color

00:29 there is a reason for that guys I will

00:31 tell you in a moment but before that

00:33 let’s try to answer two basic questions

00:37 okay let’s try to answer what is machine

00:40 learning and what are algorithms okay so

00:44 I’ll start with a non-bookish definition

00:47 and I will give you one simple example

00:49 suppose you want to travel from

00:52 Bangalore to Hyderabad okay where you

00:55 want to go you want to go from Bangalore

00:57 to Hyderabad

00:58 for this you can either take a train or

01:03 you can either take a flight or you can

01:07 take a bus as well or maybe you can

01:10 drive your own car as well okay so two

01:13 things we have to understand here guys

01:15 what is the task okay and what is the

01:19 approach

01:21 fine

01:22 so the task in hand is we have to go

01:25 from Bangalore to Hyderabad okay and the

01:28 approach is all these three options that

01:31 I told you just now

01:32 now related to the world of machine

01:34 learning in machine learning the task

01:37 can be different kinds of tasks okay for

01:41 example it can be a regression task

01:44 okay or it can be a classification task

01:48 okay

01:49 or it can be a unsupervised learning

01:53 problem I will just write unsupervised

01:55 okay so in approach section we can have

02:00 different different approaches based on

02:02 if we are solving a regression problem

02:05 or we are solving a classification or we

02:07 are solving a particular case of

02:09 unsupervised learning okay in regression

02:12 also we can take many approaches for

02:16 example in regression there is not only

02:17 one approach in regression I can take

02:20 approach one approach two approach 3

02:22 approach 4 approach five in

02:24 classification I can take this approach

02:25 this approach this approach in

02:27 unsupervised also I can take multiple

02:29 approaches so that is why this color

02:32 coding is there

02:33 the first five algorithms that you see

02:36 here

02:37 will solve I will explain you for

02:39 regression use case Okay so there we

02:42 will take a regression use case and try

02:44 to understand how to solve that using

02:46 these five algorithms okay the next

02:49 three that you see I am going to explain

02:51 you with a classification use case so

02:54 these approaches are for classification

02:56 problem okay

02:58 and last two I am going to explain you

03:00 for a unsupervised learning problem how

03:03 that will be this these algorithms will

03:04 be used to solve unsupervised learning

03:06 problem okay

03:08 so let’s go ahead guys and try to

03:10 understand with a simple input data I

03:13 have taken a sample input data here and

03:15 let’s without any delay start on the

03:18 first algorithm known as linear

03:19 regression so machine learning is all

03:22 about learning pattern from the data

03:24 using algorithms okay so if we are using

03:28 a algorithm known as linear regression

03:30 then what will happen let’s try to

03:33 understand that so first algorithm of

03:36 our list linear regression okay now

03:39 suppose this is the employee data of an

03:40 organization you have a age column you

03:42 have a salary column fine so 22 years

03:45 person earns 23 000 and so on and so

03:48 forth suppose we using the linear

03:50 regression approach to solve this

03:52 regression problem now as I told you

03:54 first five problems will be regression

03:56 problems first five algorithms you will

03:58 understand using regression problem okay

03:59 come here this is your data so what

04:02 linear regression will do is

04:04 it will just take this data and it will

04:08 see how the data is plotted on a XY

04:11 plane like this for example on one axis

04:14 we can take salary okay on y axis

04:18 and on x axis we can take Edge okay and

04:22 I am just roughly pointing these points

04:24 okay first point 22 and 23 000 maybe it

04:28 can come somewhere here on x axis if you

04:31 put h on Y axis salary I am just putting

04:33 here second data point can come

04:36 somewhere here let’s say 41 and 80 000

04:38 data points and third data point 58 and

04:41 150k this data point can come maybe

04:43 somewhere here I can say okay

04:47 so what linear regression will do is it

04:50 will try to plot a line okay ideally

04:53 what the assumption is all these points

04:56 should fall on same line

04:58 a line like this can be plotted or a

05:00 line like this can be plotted but the

05:03 Assumption here is

05:04 ideally in an Ideal World all these

05:07 points will fall in the same line but it

05:09 will never happen in the real world so

05:11 what logistic linear regression will do

05:14 is it will try to fit something known as

05:16 a best fit line okay so this is your

05:18 best fit line let’s assume that how this

05:21 best fit line is computed it will try to

05:24 minimize the distance from all these

05:27 points together so distance from this

05:29 point is this distance from this point

05:31 is this parallel to Y axis distance from

05:34 this point is this okay so you can call

05:36 this even you can call this E2 you can

05:38 call this E3 okay so what linear

05:41 regression will do is it will try to

05:44 minimize even Square

05:46 plus E2 square plus E3 Square

05:49 for whichever line it finds the minimum

05:52 even Square E2 Square E3 Square it will

05:55 call that line as the model

05:57 okay it will call that line as the model

06:00 now as you know from your normal

06:02 understanding of mathematics

06:04 this straight line will have a equation

06:07 in the form of mostly simplest we can

06:09 write Y is equal to MX plus C right in

06:13 our case I can say salary is equal to M

06:17 times h m times of H this is

06:20 multiplication plus c c can be an

06:23 intercept let’s give some number here

06:25 some random number I will give let’s say

06:27 2000 okay

06:29 so imagine this line which is the model

06:32 for linear regression has this formula

06:35 okay now the next question comes

06:38 tomorrow when the pattern has been

06:40 learned and a new age comes let’s say

06:43 age is 50.

06:44 so what will be the salary for that

06:46 person so very simple the model will

06:49 come here and put the numbers here for

06:52 example if for M we can put any number

06:54 let’s say 0.2

06:56 then age will be 50 and then salary will

07:00 be intercept will be 2000 whatever this

07:03 calculation comes that will be the

07:05 prediction of the salary for this 50

07:07 okay very simple very simple

07:10 mathematical model the assumption is

07:13 there is a linear relation between

07:15 independent variable and Target variable

07:17 okay that’s the Assumption so what it

07:20 will do it will try to plot a line what

07:22 it will call as a best fit line wherever

07:24 it finds this value as minimum once the

07:27 best fit line comes then how the

07:29 prediction happens like this okay

07:32 obviously there will be pros and cons of

07:34 all the algorithms all the models so

07:36 what is the pros and cons of linear

07:38 regression the the pluses or Pros for

07:41 this model will be it’s a simple to

07:42 understand model it’s a mathematical

07:44 model you can explain to someone but the

07:46 cons will be

07:48 um it’s not necessary that your data

07:51 will always be this simple that can be

07:53 fit in a line right or close to a line

07:55 so it’s a simple model hence lot of real

07:59 world problems it may be difficult to

08:00 solve with simple linear regression

08:02 there can be a varieties in linear

08:04 regression that

08:05 um I have created videos you can watch

08:07 through those videos but simply linear

08:10 regression works like this okay this is

08:12 one first approach

08:14 first approach means first algorithm now

08:16 let’s go ahead and try to see how

08:18 decision tree will approach the same

08:21 problem okay how decision tree will

08:23 approach this same problem

08:25 so if you give this same data okay if

08:28 you give the same data to decision tree

08:30 and you ask hey learn pattern from this

08:33 data what decision tree will do is

08:36 it will just try to break the data how

08:38 it will break the data is it will create

08:40 a rule like this okay so I can write a

08:43 rule here for example I can say is less

08:47 than equals to 30 this is a rule okay so

08:51 some records will satisfy this rule okay

08:54 some records will satisfy and some

08:56 records will not satisfy this way data

08:59 will break okay if you come here is less

09:02 than 30 how many records only one record

09:04 is more than 30 two records so how many

09:08 records will come this side only one

09:10 record will come okay so let’s say that

09:13 record is

09:15 I should not write the wrong Numbers 22

09:17 23k 4180k

09:20 so I will write here 22 and 23 K and

09:25 here I will write 41 and 80k okay and

09:28 there is one more record let me take the

09:30 numbers 58 and 150k

09:32 58 and

09:34 150k understand this carefully guys

09:36 because for next next algorithms this is

09:38 the base okay

09:40 so decision tree will split your data

09:42 like this so you had total how many

09:44 records in the beginning three records

09:46 here how many records you are having one

09:48 record here how many records you are

09:49 having two records okay so this is first

09:51 level of split now definitely can split

09:54 it one more time okay

09:56 so tree can make here there are limited

09:59 number of Records but imagine if there

10:00 are more records there can be one more

10:02 split here saying you know another

10:05 filter is is maybe less than 40 or

10:08 something like this okay but I will not

10:10 take that now that will make the tree

10:12 complex okay so this is your model

10:15 breaking your data based on some

10:18 conditions is nothing but your model so

10:21 somebody asks you what is a model in

10:22 decision tree this is your model now the

10:25 important question is suppose tomorrow

10:27 somebody comes and asks for a person

10:30 with age 50 what is your prediction for

10:33 a person with age 50 what is your

10:35 prediction very very important concept

10:37 to understand guys decision tree will

10:39 come and check what is this for age 50

10:43 okay so age 50 will come in which

10:45 category will come in this line okay in

10:49 this line how many records are there two

10:51 records so decision tree will go ahead

10:53 and take the average of these two

10:55 salaries so for age 50 your prediction

10:58 will be what will be the prediction guys

11:00 for age 50 prediction will be 80k plus

11:04 150k divided by 2. okay this is how

11:08 decision tree will be making the

11:09 prediction

11:10 suppose you ask through this entry hey

11:13 what will be the salary of a person with

11:15 age 21 so it will not go to right hand

11:17 side it will go to left hand side

11:19 because this is the tree branch in which

11:21 it should go it will directly say 23k in

11:23 this case because there is only one

11:24 record Suppose there are two records it

11:26 will take the average okay so you see

11:29 how these two approaches are different

11:31 for solving same regression problem here

11:33 a mathematical line will be fit and here

11:36 a decision tree you know data will be

11:39 broken into multiple pieces and

11:40 prediction will be made okay remember

11:43 guys decision tree is based for many

11:45 other Advanced algorithms and our third

11:48 algorithm in the list is something non

11:50 as a random Forest okay a random Forest

11:54 what random Forest will do is it will

11:56 say decision tree okay you have done a

11:58 good job but

12:00 uh there is a chances of overfitting of

12:03 the data so we did not discuss pros and

12:05 cons of this process it’s a simple model

12:07 you know you don’t need to do a lot of

12:09 mathematics Etc and cons is there is a

12:12 chances of overfitting because you know

12:14 if there is a little change in the data

12:16 your model may change totally that’s a

12:18 risk here in decision tree so

12:20 overfitting

12:21 So Random Forest will come and say Hey

12:24 you are taking a right approach but

12:26 there is a chances of overfitting so why

12:28 don’t you fit multiple trees so what

12:31 random Forest will do is it will come

12:33 and create multiple trees this is your

12:35 tree one okay like the way we saw

12:37 decision tree this is your for example

12:39 tree one okay this is your for example

12:42 tree two

12:44 okay

12:45 and similarly there can be n number of

12:48 trees okay similarly there can be n

12:51 number of trees so we will call this as

12:54 T1 we will call this as T2 and that

12:58 there can be you know 500 trees for

13:01 example

13:02 so what random Forest will do is it will

13:05 say to decision tree hey if you are

13:06 fitting one tree there is a chance of

13:08 result being biased or there is a chance

13:11 of overfitting or there is a chance of

13:13 model not being stable but what I will

13:15 do is I will fit 500 trees okay and how

13:18 I will make the prediction is very

13:20 important to understand here guys

13:22 prediction of random Forest will be

13:26 average of all these prediction for

13:28 example if we are trying to predict for

13:31 the age 50 right for the age 50 what

13:34 will be the salary if we are trying to

13:36 predict okay then in random Forest it

13:40 will take prediction from tree one plus

13:44 prediction from tree two plus

13:48 prediction from tree 500 okay it will

13:52 take all the predictions and it will

13:54 take average of that
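
(A small scikit-learn sketch of that averaging, assuming the same three toy rows; the library's exact numbers will differ from any hand-drawn trees, the point is only that the forest output equals the mean over its individual trees.)

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

X = np.array([[22], [41], [58]])           # age
y = np.array([23_000, 80_000, 150_000])    # salary

rf = RandomForestRegressor(n_estimators=500, random_state=0)  # 500 trees, as above
rf.fit(X, y)

# The forest's answer for age 50 is the average of each individual tree's answer.
per_tree = [tree.predict(np.array([[50.0]]))[0] for tree in rf.estimators_]
print(rf.predict([[50]])[0], np.mean(per_tree))   # the two values match
```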

13:56 what is the what is the thing that we

13:58 are trying to achieve here suppose in

14:01 one decision tree your tree is

14:03 overfitting or not performing well or is

14:05 biased okay so what may happen in

14:07 random forest since you are taking a

14:09 feedback from 500 different trees so

14:11 that overfitting problem or model

14:13 instability problem may not be there okay

14:15 so this is how random Forest is

14:17 different from decision tree remember

14:19 all these individual trees will not be

14:22 using all the data for example

14:25 suppose in your data there is one

14:28 thousand rows and 10 columns okay just

14:31 an example I am giving so all these all

14:34 these trees will not use necessarily all

14:38 the records it may be possible that tree

14:41 One is using 100 records and three

14:44 columns randomly selected tree two T2

14:47 is using two hundred records and

14:49 three columns randomly selected okay and

14:52 that is the advantage of this random

14:54 Forest that all these trees may

14:57 learn a different kind of pattern and

14:59 when you take a aggregated result then

15:02 you will have all the flavors okay this

15:04 kind of learning that I just explained

15:06 you is known as Ensemble learning okay
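
(A tiny numpy sketch of that row and column subsampling, using the assumed 1000-row, 10-column example; real libraries do this internally for you.)

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))            # 1000 rows, 10 columns as in the example

def sample_for_one_tree(n_rows=100, n_cols=3):
    rows = rng.choice(X.shape[0], size=n_rows, replace=True)    # bootstrap rows
    cols = rng.choice(X.shape[1], size=n_cols, replace=False)   # random column subset
    return X[np.ix_(rows, cols)]

print(sample_for_one_tree(100, 3).shape)   # (100, 3) -> what tree one might see
print(sample_for_one_tree(200, 3).shape)   # (200, 3) -> what tree two might see
```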

15:10 remember guys at unfold data science you

15:13 will find a big playlist explaining all

15:15 the algorithms of Ensemble learning in

15:17 detail I will paste the link in the

15:19 description you must check if you have

15:21 any confusion on how Ensemble learning

15:23 works okay

15:25 but there is more to Ensemble learning

15:28 what happened just now in random Forest

15:30 is known as parallel way of learning

15:33 okay parallel way of learning

15:37 parallel way of learning why parallel

15:40 way of learning guys because here tree

15:44 one and tree two and tree three are

15:45 independent of each other when you call

15:47 a random forest model tree one can start

15:50 building by taking a sub sample of the

15:52 data tree two can start building by taking a

15:54 subsample of the data they are not

15:55 dependent on each other okay so all

15:58 these things can happen parallely hence

16:00 we call it a parallel learning now the

16:03 question is is there another way of

16:04 learning in Ensemble yes there comes our

16:07 next algorithm known as AdaBoost okay

16:10 Ada boost standing for adaptive boosting

16:14 so what Ada boost will do is

16:16 let me write the data here let me write

16:19 the data one more time and I may be

16:21 writing some different numbers

16:23 so that’s not important just

16:25 understanding the concept is important

16:27 okay so 42 I will write 50 000 and let’s

16:31 say 58 I will write 150 000 just as an

16:36 example this is your input data

16:38 so boosting is another technique

16:40 boosting is another technique of

16:43 Ensemble category okay in boosting

16:47 especially at a boost what will happen

16:49 is it will assign a weight to all your

16:52 observations okay suppose this is your

16:55 original data for training salary being

16:56 your target column so initial weights

17:00 initial weights

17:03 okay

17:04 and

17:06 what the initial weights will be it will

17:08 be the same weight for all your records

17:10 for example there are three records so

17:12 one by three I am saying one by three I

17:14 am saying one by three I am saying so

17:16 all the rows are equally important okay

17:19 try to understand the concept guys in

17:21 Ada boost in the beginning first

17:23 iteration all the rows are equally

17:25 important okay but how Ada boost works

17:29 is in the name only there is adaptive it

17:32 adapts to the mistakes of the previous

17:34 model now why I am saying a previous

17:36 model and next model is one thing you

17:39 have to always remember AdaBoost is a

17:42 sequential learning process you you

17:45 remember how I just now told random

17:46 Forest is a parallel learning process

17:49 so in random Forest

17:51 tree one and tree two are independent

17:53 of each other okay it will take a sub

17:55 sample and create it will take a sub

17:56 sample and create nothing to do with

17:58 each other

17:59 but in AdaBoost or other boosting

18:01 techniques

18:02 it’s a sequential model so there will be

18:04 a multiple models in this so there will

18:07 be multiple models fitted to the data I

18:09 will tell you in a moment what these

18:10 models will be model 1 model 2 model 3

18:14 Model 4 and so on and so forth how many

18:17 ever model comes but it will not happen

18:19 parallely okay it will happen in

18:21 sequence

18:22 now the important thing to understand is

18:24 how this sequence will be generated okay

18:27 so what will happen is this model one

18:31 you can think of as a base model this

18:33 model one you can think of as a base

18:35 model and remember in Ada boost your

18:38 decision trees will look like stumps

18:40 stumps means there will be a tree like

18:42 this and there will be another tree like

18:44 this so the depth of the tree

18:47 will not be Beyond one level okay so

18:49 this is called stumps in the language of

18:51 machine learning

18:52 so multiple stumps will be created now

18:55 suppose your model 1 is this first stump

18:58 what is your model one guys this first

19:00 stump okay

19:02 model one comes and make some prediction

19:05 about the salary model one comes and

19:08 make some predictions about this salary

19:09 okay so what we will have is another

19:12 column called as salary underscore

19:16 prediction and where from this

19:18 prediction Comes This prediction comes

19:19 from model one the first model okay so

19:22 obviously there will be some mistakes so

19:24 22 000 may be said as 21 900 and 50 000

19:29 can be said as

19:32 let's say 52 000 okay and 150 000 can be

19:36 said as let’s say two hundred thousand

19:37 based on this first model first decision

19:40 tree that it is creating which I am

19:42 calling a stump so there will be some

19:45 differences between actual and predicted

19:48 and from this there will be a residual

19:51 coming residual means errors right

19:53 residual means errors okay so what will

19:56 be the errors 22 000 minus 21 900 right

20:00 so it will be for example I can say a

20:03 hundred

20:04 actual minus predicted it is minus two

20:06 thousand and it is minus 50 000

20:10 because predicted is higher okay so these are the

20:13 errors these are the actual values and

20:16 the first model what it predicts right

20:18 those are the errors from the first

20:20 model OKAY twenty two thousand minus

20:22 twenty one nine hundred is one hundred

20:24 and so on and so forth

20:26 now

20:27 these are the initial weights okay

20:29 so what will happen in the next model

20:31 when the M2 is fitted right these

20:34 initial weights will be changed and more

20:37 preference will be given to the

20:39 observations where these residuals are

20:41 more okay I am repeating one more time

20:43 guys M1 will predict this and then

20:47 residuals or errors will come when the

20:49 M2 is trained right then the weights

20:52 will not be same for all these three

20:54 records rather weight will be increased

20:57 for this because you are getting more

21:00 errors here and weight will be decreased

21:03 for this because you are getting less

21:04 error here okay

21:06 and so on and so forth M2 will come

21:09 compute the residual then again

21:12 weights will be adjusted M3 will come

21:14 predict residual will be calculated

21:16 weights will be adjusted and finally

21:19 what you will get is a combination of

21:22 what will be your final model your final

21:25 model will be a combination of base

21:29 model I am calling it the first model

21:30 okay plus M1 plus M2 plus M3 plus so on

21:37 and so forth remember this this is not a

21:40 mathematical equation this is just

21:41 indicative equation I am giving you okay

21:43 if you want to understand more

21:45 mathematics behind it please go ahead

21:47 and click on the link I’m giving you in

21:49 the description okay and all these

21:52 things will not have equal say in the

21:54 final output their say also will be

21:56 different in the final output for

21:58 example in random Forest you saw all the

22:01 models have equal say in the final output

22:03 we are dividing by 500 okay

22:05 but here all these models will not have

22:07 equal say they will have an unequal say

22:09 okay
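
(If you want to try this, here is a minimal scikit-learn sketch with stumps as the weak learners; the three rows are the toy numbers from above, and the parameter is named estimator in recent scikit-learn releases, base_estimator in older ones.)

```python
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

X = np.array([[22], [42], [58]])          # age
y = np.array([22_000, 50_000, 150_000])   # salary

# Stumps (depth-1 trees) fitted one after another; each round re-weights the
# rows the previous rounds predicted badly, and the stumps are then combined
# with unequal say in the final output.
ada = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=1),   # base_estimator in older versions
    n_estimators=50,
    random_state=0,
)
ada.fit(X, y)
print(ada.predict([[50]]))
```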

22:10 let’s move ahead to another what is the

22:13 pros and cons for this model again this

22:15 model may give you a

22:17 better result than most of the models

22:18 because it is adapting to the changes

22:20 but if you have a larger data size it

22:23 may it may need more resources to train

22:25 and also it is one kind of Black Box

22:28 model some kind of Black Box model means

22:31 you don’t have much explanation of what

22:32 is going on inside apart from some hyper

22:34 parameters okay

22:36 let’s move ahead to the last algorithm

22:38 in regression category known as gradient

22:40 boost okay what is the last algorithm

22:43 in regression category gradient boost

22:45 remember guys all these algorithms that

22:47 I’m explaining you I have not taken

22:49 anything that is used less all are used

22:51 more only okay

22:53 so I will take a simple data age

22:56 salary age 21 salary let's say 20K age 40

23:01 salary let's say 42k age 58 salary is

23:06 let's say 60k this is your input data

23:08 and you want to run a gradient boost on

23:09 this

23:10 what will happen is understand guys this

23:13 is again a sequential learning not a

23:15 parallel learning okay so there will be

23:18 a base prediction

23:20 for all these data base prediction okay

23:23 base prediction

23:24 what is the base prediction guys base

23:26 prediction is nothing but it’s a kind of

23:29 dumb model it will assume that for all

23:31 these guys it will be a average of you

23:34 know all these three records so what is

23:36 the average of this 20 plus 42 plus 60

23:40 divided by 3 right so roughly 40.7k

23:45 right let's say assume for Simplicity

23:49 this is 40k okay so the base prediction

23:52 will be put here 40k 40k 40k once the

23:57 base prediction comes then there will be

23:59 a residual computed okay residual will

24:02 be the difference between actual and

24:03 predicted values whatever these numbers

24:05 are

24:06 fine now comes the interesting part how

24:09 gradient boost is different from Ada

24:11 boost or other algorithms so what

24:14 gradient boost will do is it will try to

24:17 fit a model on this residual okay it

24:21 will try to fit a model on this residual

24:23 and try to minimize these residuals so

24:26 that will be called as a base model okay

24:29 and then there will be next model you

24:33 can call it residual model one okay and

24:36 then there will be a next model you can

24:38 call it residual model 2 and so on and

24:41 so forth okay so what will happen is

24:43 residuals will be computed and then

24:46 whatever the residual comes based on

24:48 that base prediction will be updated so

24:51 for example let’s say your residual here

24:53 is how much 20 minus 40 that is minus 20 is your

24:57 residual right

24:58 so this will act as a independent column

25:02 and this residual will act as a Target

25:04 column

25:05 and then let’s say in the prediction

25:07 this minus 20 comes as let's say

25:11 minus 10. so what will happen is this

25:14 base prediction will get updated by this

25:16 this base prediction will get updated

25:18 again it’s a complicated model if you

25:21 want to understand more details there

25:23 are links in the description please

25:24 click on that it will be very clear to

25:26 you okay so what will happen base model

25:29 plus residual model 1 plus residual

25:31 model 2 so on and so forth and there

25:34 will be some parameters which will

25:35 assign weight to all these models so as

25:37 I say all these models will not have

25:39 equal vote in the final output there

25:41 will be different votes in this fine
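
(Here is a hand-rolled sketch of exactly that loop, using the 21/40/58 toy rows; the learning rate and the number of rounds are made-up choices just to show the mechanics.)

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

X = np.array([[21], [40], [58]])
y = np.array([20_000, 42_000, 60_000], dtype=float)

learning_rate = 0.1
pred = np.full_like(y, y.mean())     # base prediction: ~40.7k for every row

for _ in range(100):                 # residual model 1, residual model 2, ...
    residuals = y - pred             # what the current combined model still gets wrong
    stump = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
    pred += learning_rate * stump.predict(X)   # update the base prediction

print(pred.round())                  # close to 20k, 42k, 60k after enough rounds
```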

25:43 so this is about gradient boost one of

25:45 the famous algorithm for winning kaggle

25:47 competitions and most of the things so

25:49 gradient boost and there is another

25:51 variant of gradient boost known as xgb

25:54 extreme gradient boost please go ahead

25:56 and read about this algorithm guys I am

25:58 not covering because there is a slight

26:00 difference between gradient boost and

26:01 xgb you can read about that as well fine

26:04 let’s move ahead to the second category

26:06 of algorithms known as classification

26:08 algorithms so in classification

26:10 algorithms the first algorithm that I am

26:12 going to cover is logistic regression

26:15 now very very important guys please pay

26:17 attention here and try to understand how

26:19 logistic regression is going to work for

26:22 any given scenario it’s a mathematical

26:24 model hence it is important for you to

26:27 understand okay suppose this is an

26:29 employee data and you have 21 22k

26:33 whether the employee leaves the

26:35 organization or does not leave the

26:36 organization just I am saying 1 0 okay

26:39 and then 40 year guy makes let’s say 42k

26:42 leave 0 no 58 year guy makes let’s say

26:45 60k just for example leaves yes one so

26:49 this is a classification problem where

26:51 we are trying to predict whether a

26:53 employee will leave the organization or

26:54 does not leave the organization the last

26:57 column that you see is your target

26:58 column the last column that you see is

27:00 your target column this type of problem

27:03 is called a classification problem

27:05 because what this what the objective of

27:07 this model is tomorrow I give you age of

27:09 the employee for example 31 salary of

27:12 the employee for example 34k and I asked

27:15 to the model hey Will the guy leave or

27:17 not leave the organization okay so this

27:20 is a classification problem how logistic

27:22 regression will take this problem is we

27:24 have we have to understand some

27:25 mathematical Concepts here so if you see

27:28 here the target column is 1 0 only so

27:32 that is either one or zero one or zero

27:34 okay so which means that Y which is our

27:38 Target can be understand this is very

27:40 important concept guys

27:43 can be either 0 or 1 it cannot be

27:47 anything else your target cannot be

27:49 anything else apart from 0 or 1 but your

27:51 age and salary can take any real number

27:54 X can be

27:57 any value between minus infinity to plus

27:59 infinity right

28:01 so X can be any value between minus

28:03 infinity to plus infinity y can be only 0

28:05 or 1 okay

28:07 so what we have to understand here is we

28:11 have to somehow create a relation that

28:13 will enable us to predict y given X okay

28:16 the problem here is on the left hand

28:19 side we have minus infinity to plus

28:22 infinity range that is X range okay so I

28:24 will write here x x means independent

28:27 features on the right hand side your

28:30 values can be only 0 to 1 0 or 1 not 0

28:33 to 1 okay

28:35 so what we do is we do not directly

28:38 predict y rather we predict something

28:40 else what is that something else that we

28:43 predict so in place of predicting y we

28:47 predict probabilities okay probabilities

28:50 of an observation falling in y

28:52 probabilities

28:54 okay so

28:57 what we will do is we will predict

28:59 probabilities then the range will be 0

29:01 to 1 as you know probability can take

29:03 the range between 0 to 1 okay

29:06 now this range is also not what we are

29:08 looking for minus infinity to plus

29:10 infinity so what we will do is we will

29:12 do one more transformation and we will

29:14 make this as odds so what is the range

29:16 of odds 0 to Infinity okay but still we

29:20 are not in minus infinity to plus infinity

29:21 range so what we will do is we will take

29:24 log of odds okay log of odds

29:29 okay and then the range will become

29:31 minus infinity to plus infinity

29:33 how your equation will look like here is

29:36 when you say log of odds right so on the

29:39 right hand side it will be log of P by 1

29:42 minus p

29:43 okay on the left hand side you will have

29:46 beta naught plus beta 1 x 1 plus so on and

29:51 so forth okay this equation that you see

29:54 in front of you now is called the base

29:56 equation for the logistic regression now

29:58 one important concept to understand here

30:00 guys this is a logit function okay and

30:04 inverse of logit function is the sigmoid

30:06 function okay suppose you take

30:10 the inverse of this or sigmoid of this

30:12 so what will happen is if you apply

30:14 sigmoid at both sides so if you don't know

30:16 what is sigmoid function then sigmoid

30:18 function f x looks like this 1 by 1 plus

30:22 e to the power minus X this is your

30:24 sigmoid function on XY plane how it will

30:27 look like is this suppose this is your 0

30:28 this is your 1 and this is your 0.5 okay

30:32 so sigmoid will look like this so it will

30:36 always be between 0 to 1 okay so your

30:40 logistic regression this equation will

30:42 be changed in the form of sigmoid

30:44 function so your f x or P okay P will

30:49 look like if you take if you take

30:51 sigmoid on both sides right then on the

30:54 right hand side you will just have p and

30:56 here we will have 1 by 1 plus e to the

30:59 power minus

31:00 beta0 plus beta 1 x 1 okay remember guys

31:05 this equation is equation 1 and this

31:09 equation is equation two both the

31:11 equations are same the difference is

31:14 this is a logit equation and this is a

31:16 sigmoid equation okay now take it if you

31:20 if you take an inverse of logit that is

31:22 nothing but sigmoid okay understand this

31:24 carefully and now this equation from our

31:27 example how it can be written is 1 by 1

31:31 plus e to the power minus beta 0

31:34 plus beta 1 into age okay plus beta 2

31:40 into salary

31:42 this is nothing but your logistic

31:44 regression equation okay and as you know

31:46 as I told you this is a sigmoid function

31:48 so the output of what you see here

31:51 output of this will always be between 0

31:54 to 1 which means you can get a

31:56 probability and then you can say that

31:58 based on this probability I can say

32:00 whether the employee leaves or does not

32:02 leave okay logistic regression is again

32:05 a very important and not easy to

32:07 understand concept okay so as you can

32:10 see we are modeling a categorical

32:12 variable against real numbers hence we

32:15 need to do certain Transformations these

32:17 are the Transformations that we need to

32:19 do and how it relates to the probability

32:21 I just explained you now okay pros and

32:24 cons mathematical model not very

32:26 difficult to understand cons it again

32:28 assumes a lot of things about the data

32:30 which may or may not be correct hence it

32:33 may not give a great result all the time

32:35 okay but very famous and very important

32:38 algorithm to understand
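
(A tiny Python sketch of that sigmoid form; the beta values here are made up purely to show how a probability between 0 and 1 comes out for the 31-year-old earning 34k from the example.)

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

beta0, beta1, beta2 = -2.0, 0.08, -0.0001    # hypothetical coefficients
age, salary = 31, 34_000

# p = 1 / (1 + e^-(beta0 + beta1*age + beta2*salary)), always between 0 and 1
p_leave = sigmoid(beta0 + beta1 * age + beta2 * salary)
print(p_leave)                                   # predicted probability of leaving
print("leaves" if p_leave >= 0.5 else "does not leave")
```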

32:40 next algorithm in the category in the

32:43 classification category one simple one I

32:45 want to cover here that is known as K

32:47 nearest neighbor okay it’s a pretty

32:49 simple algorithm suppose in the same

32:51 data on this data you want to build a

32:54 KNN algorithm okay so since I have data

32:57 here so I will explain here only so what

32:59 can happen is it will plot a XY plane

33:01 like this okay and it’s a

33:04 three-dimensional data so you can have

33:06 one more axis for salary

33:08 or you can have two axes only because

33:10 from two axes we can predict okay

33:12 so let’s come here age and let’s say

33:15 here salary okay

33:17 out of these three employees let’s say

33:19 the first employee aged 21 falls here

33:22 and second employee 40 Falls here and 58

33:26 Falls somewhere here okay so what K

33:28 nearest neighbor will do is it will try

33:31 to allocate neighbors to all these

33:33 individual observations for example this

33:37 is your observation one this is your

33:38 observation two and this is your

33:39 observation three okay so one does not

33:43 have any neighbors but 2 is the neighbor

33:46 of 3 and 3 is the neighbor of two okay

33:48 so tomorrow some prediction comes for

33:51 let’s say age 50 again I will take 50

33:53 example

33:54 50 example

33:55 so what it will do is it will try to see

33:58 and I will take salary also because in

34:00 this case salary is also there so salary

34:02 is let’s say 61k okay so what it will do

34:06 is it will try to see where can I fit

34:08 this age 50 and salary 61k maybe

34:13 who are the nearest neighbor to that guy

34:15 so the nearest neighbor to that guy may

34:17 be this guy and this guy suppose that

34:19 new guy comes somewhere here okay so who

34:22 are the neighbors for this this is the

34:24 first neighbor this is the second

34:25 neighbor okay so it will simply go ahead

34:28 and take the you know mode of results

34:31 for example these two guys are the

34:34 nearest neighbors right I mean

34:36 two neighbors of that so it will take 0

34:38 and 1 which is maximum so in this case

34:40 there is no mode of the data but

34:42 obviously if you take a larger data

34:44 there will be modes of the data okay so

34:46 whichever mode for example Suppose there

34:48 are 30 records out of that 20 is 1 and

34:50 10 is 0. so the prediction for this guy

34:52 will be whatever is maximum or whatever

34:55 is mode so if the mode is one or zero

34:58 whatever it is that will be the

34:59 prediction for KNN okay so as I told you

35:03 KNN is a pretty simple algorithm it

35:05 will just plot your data try to find the

35:07 nearest neighbors and then when a new

35:09 observation comes you give how many how

35:12 many Observer how many neighbors you

35:13 want for that record and it will make the

35:15 prediction based on that okay so KNN is a

35:18 simple to understand algorithm nothing

35:19 complex in that so I covered it quickly in

35:21 that slide itself okay
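
(A short scikit-learn sketch of that neighbour voting, using the three toy rows; with only three records the two nearest labels can tie, which is why k is set to 3 here, and with real data the mode of the neighbours is what you would read off.)

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[21, 22_000], [40, 42_000], [58, 60_000]])   # age, salary
y = np.array([1, 0, 1])                                     # leaves / does not leave

knn = KNeighborsClassifier(n_neighbors=3)   # you choose how many neighbours to ask
knn.fit(X, y)

# The new 50-year-old earning 61k is placed on the same plane and the
# majority (mode) of the nearest neighbours' labels is returned.
print(knn.predict([[50, 61_000]]))
```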

35:24 now let's try to understand another classification

35:26 technique known as support Vector

35:28 machines or svms

35:30 so what svms will do is it will plot

35:33 your data in whatever axis you have

35:36 suppose age is one axis and salary is

35:38 one axis okay

35:40 and your data points I will take little

35:42 more data points okay your data points

35:45 look like this so these are some data

35:47 points and this is these are some more

35:49 data points okay

35:51 so what svm will do is it will try to

35:54 create something known as a decision

35:56 boundary okay how this decision boundary

35:59 is different from linear regression

36:00 decision boundary is in linear regression

36:03 there is a pure mathematical equation

36:04 involved here there is a concept of

36:07 something known as a hyper plane okay

36:08 for example if I draw a line between

36:11 this right so all these black dots

36:14 you can think leaves or Target column is

36:17 one

36:17 all these blue crosses you can think does not

36:20 leave or Target column is zero does not

36:22 leave

36:24 okay

36:25 suppose your data is like this so what

36:28 will happen is your svm will plot this

36:30 is called in the language of svm this is

36:32 called a decision boundary okay decision

36:35 boundary so in this case your data looks

36:38 pretty simple pretty separated hence the

36:40 decision boundary can be as simple as a

36:43 line okay but in most of the scenarios

36:46 real world scenarios decision boundary

36:49 cannot be as simple as this okay so

36:51 there will be some black dots here there

36:54 will be some black circles here okay and

36:57 there can be some this Cross Blue Cross

37:00 this side right so in this case decision

37:03 boundary is not doing Justice so

37:05 decision boundary need to change and

37:08 that is where the concept of hyper

37:10 planes and kernels two very important

37:12 Concept in svm guys if you want to

37:14 explore more on svm hyper planes okay

37:18 and kernels

37:20 so when your data become complex then

37:23 simple decision boundaries cannot

37:25 predict it well okay so you need to have

37:28 a complex decision boundary and

37:30 that is where hyperplane and kernels

37:32 concept come but just to give you an

37:34 idea of how svm works it will create a

37:37 decision boundary and tomorrow any

37:39 prediction any new result come for

37:41 example somebody asks what is the um you

37:44 know for a person with for a person with

37:46 age 50

37:47 and salary is 60k whether the person

37:50 will leave or not leave so this svm

37:53 model will see on which side of decision

37:55 boundary this guy is falling if this guy

37:57 falls on this side of decision boundary

37:58 it says do not leave if this guy falls

38:01 on this side of decision boundary it

38:02 says leaves okay so in svm remember

38:05 concept of decision boundaries hyper

38:08 planes kernels and kernel tricks okay
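
(A rough scikit-learn sketch of the decision-boundary idea; the points are made up so that a simple linear boundary separates them, and kernel="rbf" or similar is what you would reach for when the boundary needs to bend.)

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[25, 30_000], [30, 35_000], [28, 32_000],    # leaves (1)
              [50, 80_000], [55, 90_000], [60, 95_000]])   # does not leave (0)
y = np.array([1, 1, 1, 0, 0, 0])

svm = SVC(kernel="linear")   # simple, well-separated data -> a straight line is enough
svm.fit(X, y)

# Prediction just asks which side of the decision boundary the new point falls on.
print(svm.predict([[50, 60_000]]))
```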

38:11 so we have covered three things from the

38:14 classification scenarios and five things

38:17 from the regression scenarios let’s go

38:20 ahead and try to see some unsupervised

38:22 learning problems okay so what is the

38:24 meaning of unsupervised learning till

38:26 now we are having a Target column but in

38:29 unsupervised learning we may not have a

38:31 Target column okay suppose for the same

38:33 employee data we have age and salary

38:37 but somebody comes to you and tells you

38:39 that hey can you tell me if there are

38:42 different buckets of employees existing

38:44 in my organization

38:45 different buckets means some people with

38:48 less age and more salary some people

38:50 with more age and less salary so are there

38:53 different buckets somebody can can come

38:56 and ask you okay

38:58 so how you will solve that problem is by

39:01 using something known as clustering or

39:02 segmentation okay so suppose the task in

39:06 hand is here there are three records

39:08 only but there can be more records right

39:10 in the real world scenario what I am

39:12 interested in knowing is if there are

39:14 natural clusters in my organization so

39:17 this is my organization data on one axis

39:20 I have age on other axis I have salary

39:22 okay and I have multiple data points

39:26 here three data points only but I am

39:28 plotting more data points just for

39:30 demonstration okay so there is nothing

39:33 to predict but the employer is interested in

39:36 knowing if there are buckets means if

39:37 few employees are closer to each other

39:39 in terms of their characteristics so for

39:42 example these employees are closer you

39:43 can call bucket one these employees are

39:45 closer you can call bucket two or

39:47 segment 2. okay but how this will be

39:50 implemented is in K means clustering so

39:54 one technique for implementing bucketing

39:55 is K means clustering okay there can be

39:57 other techniques also for segmentation

39:59 or bucketing one technique is K means

40:01 clustering in this technique what will

40:03 happen is the distance between the

40:06 various employees will be computed for

40:08 example this is your employee one and

40:11 this is your employee two okay suppose I

40:13 ask you how similar is employee one from

40:16 employee two so there can be different

40:18 similarity metric that you can compute

40:21 for example euclidean distance or

40:23 Manhattan distance or cosine similarity

40:25 Etc I have detailed video on these

40:28 things as well I will link it in the

40:29 description but suppose I tell you a

40:32 simple way to see how

40:35 these two employees

40:36 are similar or different so you will say

40:39 21 minus 40 whole Square

40:42 plus 20K

40:45 minus 42k whole Square so on all the

40:48 dimensions you are taking the distance

40:50 between them squaring it and taking the

40:52 square root this is called euclidean

40:53 distance between E1 and E2 whatever

40:55 number you get it okay
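
(The same arithmetic written out with numpy, using employee one and employee two from the toy data.)

```python
import numpy as np

e1 = np.array([21, 20_000])   # employee one: age, salary
e2 = np.array([40, 42_000])   # employee two

# sqrt((21 - 40)^2 + (20k - 42k)^2): square the gap on every dimension,
# add them up, take the square root.
dist = np.sqrt(np.sum((e1 - e2) ** 2))
print(dist)   # ~22000, dominated by salary, which is why features are often scaled first
```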

40:57 so suppose E1 and E2 euclidean

41:00 distance is less and E1 and E3

41:03 euclidean distance is more so in that

41:04 case you say E1 and E2 are closer to

41:07 each other okay and in the similar way

41:10 you start finding the employees which

41:11 are closer to each other and then you

41:14 call this as one bucket similarly this

41:16 group you call as another bucket

41:18 okay remember I have explained you in

41:21 simple terms but there is a very

41:23 important Concept in k-means known as

41:25 centroid concept okay so please go ahead

41:28 and watch unfold data science detailed

41:31 the video on k-means clustering you will

41:33 understand all the details of how

41:35 centroid is defined and how this

41:36 algorithm works at a mathematical level

41:38 okay I will link that video please

41:41 ensure you watch that
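
(A minimal scikit-learn sketch of this bucketing; the extra rows are invented so that two buckets actually show up, and the features are scaled first because salary would otherwise dominate the distance.)

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

X = np.array([[21, 20_000], [40, 42_000], [58, 60_000],
              [23, 25_000], [55, 65_000]])           # age, salary (partly made up)

X_scaled = StandardScaler().fit_transform(X)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)

print(km.labels_)            # which bucket / segment each employee lands in
print(km.cluster_centers_)   # the centroids the k-means video goes into
```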

41:43 so this is about k-means clustering now

41:46 last but not the least guys you might

41:48 have seen in Amazon and Flipkart that

41:51 there are different different uh

41:52 products that is recommended to you for

41:54 example if you buy a laptop then it will

41:57 tell you hey go ahead and buy this

41:59 laptop bag as well so this is nothing

42:01 but a recommendation okay

42:04 in the Netflix if you watch let’s say

42:06 one movie one action movie let’s say if

42:09 you watch Mission Impossible then it

42:12 will go and recommend you Jack Ryan

42:13 series maybe okay

42:16 so this is called a recommendation

42:18 system that is running in background

42:20 okay so how this system works one simple

42:23 uh yet powerful technique for

42:26 recommender system is known as

42:28 collaborative filtering collaborative

42:30 filtering

42:32 okay

42:33 so what collaborative filtering does is

42:36 it will take users okay users

42:40 and it will take items

42:43 try to understand this simple concept

42:44 Edge it’s pretty simple to understand so

42:47 users can be Aman

42:49 and users can be John

42:51 and users can be Doe okay and in the

42:55 items we can have let’s say Mission

42:58 Impossible

42:59 in the items we can have Jack Ryan in

43:02 the items we can have any movie

43:04 of James Bond Series in the items we can

43:07 have Spiderman okay in the items we can

43:09 have any comedy movies for example home

43:12 alone you can say okay

43:14 so Aman which movie Aman watches or

43:17 which movie Aman has watched for example

43:19 Mission Impossible Aman has watched Jack

43:21 Ryan he has watched but he has not

43:24 watched let's say the James Bond movie so zero I will

43:28 say okay and this movie

43:31 he has also not watched okay the

43:33 Spider-Man movie there is another guy

43:35 John who has watched Mission Impossible

43:38 Jack Ryan James Bond movie and

43:41 Spider-Man movie as well

43:43 there is another guy Doe who has not

43:45 watched any of these movies but has

43:47 watched Home Alone the comedy movie okay

43:52 so

43:53 now which users are similar to which

43:56 user will be computed based on one of

43:59 the user similarity metric so what are

44:01 the user similarity metric I told you

44:03 cosine similarity it can be different

44:05 kind of distance metric so as you can

44:08 think from the common sense also here

44:10 Aman watches action movies if you can

44:12 see here and John also watches action

44:15 movies more

44:18 Mission Impossible and Jack Ryan but

44:21 Aman has not watched James Bond movie and

44:23 Aman has not watched Spider-Man movie so

44:26 what will happen is since Aman and Jon

44:29 are similar to each other so go ahead

44:31 and recommend the movies that John has

44:34 watched but Aman has not watched because

44:37 Aman and John both tastes are similar so

44:40 go ahead and recommend what John has

44:43 watched but Aman has not watched so what

44:45 will be the recommendation going to Aman

44:47 James Bond movie and Spider-Man movie

44:50 Okay now imagine this is a large matrix

44:53 of large users and large items so it

44:57 will be seen which users tastes are

45:00 similar to each other okay and then

45:03 the other user which has not watched

45:05 that movie will be recommended the

45:07 movies or series based on the similar

45:09 users watching history okay this is

45:13 pretty simple but powerful technique

45:15 known as collaborative filtering
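
(A toy numpy sketch of that user-user idea using the Aman / John / Doe table above; cosine similarity is one of the similarity metrics mentioned, and the recommendation is simply what the most similar user watched that Aman has not.)

```python
import numpy as np

items = ["Mission Impossible", "Jack Ryan", "James Bond", "Spiderman", "Home Alone"]
users = ["Aman", "John", "Doe"]
watched = np.array([
    [1, 1, 0, 0, 0],   # Aman
    [1, 1, 1, 1, 0],   # John
    [0, 0, 0, 0, 1],   # Doe
])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

aman = watched[0]
sims = [cosine(aman, watched[i]) for i in (1, 2)]      # similarity to John and Doe
best = 1 + int(np.argmax(sims))                        # John has the closest taste
recommend = [items[j] for j in range(len(items))
             if watched[best, j] == 1 and aman[j] == 0]
print(users[best], recommend)   # John's watched-but-not-by-Aman movies
```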

45:17 so let’s revise once guys what all we

45:20 discussed long discussion but very very

45:22 fruitful for you to revise few

45:24 fundamental concepts linear regression

45:26 decision tree random Forest AdaBoost

45:28 gradient boost for regression we

45:29 discussed classification I explained you

45:32 logistic regression how svm works and

45:34 how KNN works and I explained you two

45:37 unsupervised techniques k-means and

45:38 collaborative filtering now not in too

45:40 much detail I went because it’s not

45:42 possible to go in all the details of 10

45:45 algorithms in short time but read this

45:47 as a refresher and please go ahead and

45:50 click all the links of whichever

45:51 algorithm you are more interested in

45:54 learning all the videos are there on

45:56 Unfold Data Science okay

45:58 I request you guys please press the like

46:00 button if you like this video and please

46:01 press the Subscribe button and the bell

46:03 icon

46:04 if you want me to create more videos

46:06 like this see you all in the next video

46:07 guys wherever you are stay safe and take

46:09 care
