Introduction:
There are two major aspects of supervised machine learning - Regression and Classification. Linear Regression is our friend when it comes to predicting continuous values, like house prices, or price of a car. However, there are some problems where we might need a discrete output, like Yes/No. This aspect of Machine Learning is called Classification. Just like Linear Regression for regression, Logistic Regression is a simple way to classify binary outputs. Scikit-Learn offers a simple function that helps us understand and implement Logistic Regression faster than ever.
What is Logistic Regression:
Logistic Regression is a method used to predict binary outcomes, meaning it's used when the dependent variable (what you're trying to predict) can only have two possible outcomes. It's called "logistic" because it uses the logistic function to model the probability of a certain class or event occurring. In simpler terms, imagine you want to predict if it will rain tomorrow based on weather data. Logistic Regression would help you decide if the answer is "yes" or "no". It's like fitting a curve to your data that best separates the two possible outcomes (rain or no rain), based on the input variables you provide (like temperature, humidity, etc.).
Real-World Example:
Let us take an example. Let us assume we have to predict if a student passes an exam or not. We have noticed that the hours they studied and attempting a practice test are the only determining factors for this (this wouldn't happen in real-life). Plotting our data, we get this:
Student | Hours Studied | Completed Practice Test | Pass (1 = Yes, 0 = No) |
---|
1 | 2 | 0 | 0 |
2 | 1 | 1 | 0 |
3 | 3 | 1 | 1 |
4 | 5 | 1 | 1 |
5 | 4 | 1 | 1 |
6 | 2 | 0 | 0 |
We can clearly make a few conclusions on our own, like the importance of the practice-test. However, for a huge set of data, we can't make such predictions on our own. So, let us try to build a Logistic Regression model to predict whether a student passes or not.
How it works:
Logistic Regression has two steps - The first step is to derive a linear equation that is a function of our independent variables, or the values we know. In our problem, we know the values of Hours Studied, and Completion of Practice Test. These are our independent variables. Now, we need to derive a line equation that is a function of these two values, and returns some value y. For our dataset, the line equation would be somewhere around this:
Code:
from sklearn.linear_model import LogisticRegression # Assuming X contains your input variables (hours studied and completed practice test) X = [[2, 0], [1, 1], [3, 1], [5, 1], [4, 1], [2, 0]] # Assuming y contains your output variable (pass exam) y = [0, 0, 1, 1, 1, 0] # Create a Logistic Regression model model = LogisticRegression() # Fit the model model.fit(X, y) #Predict the result for (2,1) result = model.predict([[3,1]]) print(result)
[1]
And that's all there is. You have your very own, working Logistic Regression Model. Even though there are numerous, highly advanced classification models in the world now, Logistic Regression still remains a very useful algorithm for binary classifications and thanks to Scikit-Learn, we can develop our very own Logistic Regression model in barely any time.
If you haven't yet checked out the Linear Regression post: Linear Regression
If you want to explore Scikit-Learn further: scikit-learn: machine learning in Python — scikit-learn 1.3.0 documentation
Comments
Post a Comment