# Ridge Regression With Examples

## 1. What is Ridge Regression?

Ridge Regression is a regularized form of linear regression that helps prevent overfitting. It is similar to ordinary least squares regression, but adds a penalty term to the cost function. The strength of the penalty is set by a regularization parameter, denoted `alpha`, which controls how much the regression coefficients are shrunk towards zero. The higher the value of alpha, the greater the shrinkage: the variance of the model drops, at the cost of some added bias.

### 1.1 The Ridge Regression cost function is given by:

``````
J(θ) = MSE(θ) + α * L2_norm(θ)
``````

Where `MSE(θ)` is the mean squared error of the regression, `L2_norm(θ)` is the squared `L2` norm (i.e., the sum of squares) of the regression coefficients, and `α` is the regularization parameter.
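As a rough illustration of the formula above, the sketch below evaluates the cost with NumPy. The function name `ridge_cost` and the toy data are assumptions made for this example, and the intercept is left out for simplicity.

```python
import numpy as np

def ridge_cost(X, y, theta, alpha):
    """Return MSE(theta) + alpha * sum(theta_j ** 2) for a model without intercept."""
    residuals = y - X @ theta
    mse = np.mean(residuals ** 2)       # mean squared error
    l2_penalty = np.sum(theta ** 2)     # squared L2 norm of the coefficients
    return mse + alpha * l2_penalty

# Toy data (hypothetical): theta fits y exactly, so the MSE term is zero
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([5.0, 11.0])
theta = np.array([1.0, 2.0])

cost_no_penalty = ridge_cost(X, y, theta, alpha=0.0)    # only the MSE term
cost_with_penalty = ridge_cost(X, y, theta, alpha=0.1)  # adds 0.1 * (1 + 4)
print(cost_no_penalty, cost_with_penalty)
```

With a perfect fit, the whole cost comes from the penalty, which is why larger coefficients are discouraged even when they fit the training data well.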

The L2 norm of the regression coefficients acts as a penalty term that forces the model to have smaller coefficients, which in turn reduces the model’s complexity and helps to prevent overfitting. The value of α controls the strength of this penalty term and can be adjusted to obtain the best model performance on the validation set.
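One possible way to choose α is sketched below. It is an assumption of this example, not part of the original workflow, that cross-validation via scikit-learn's `RidgeCV` stands in for a single validation set, and that the bundled diabetes dataset and the candidate grid `alphas` are used purely for illustration.

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

# Bundled dataset, used here only as a stand-in
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Candidate penalty strengths (an illustrative grid, not a recommendation)
alphas = [0.001, 0.01, 0.1, 1.0, 10.0]

# RidgeCV fits the model for each alpha and keeps the best by cross-validation
model = RidgeCV(alphas=alphas).fit(X_train, y_train)

print("Selected alpha:", model.alpha_)
print("Test R^2:", model.score(X_test, y_test))
```

The test split is held back until the end so that the reported score is not influenced by the choice of α.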

### 1.2 Example of how to use Ridge Regression in Python:

To implement Ridge Regression in Python, we can use the `Ridge` class from the `sklearn.linear_model` module.

Here is the data set link: https://github.com/Narenderbeniwal/Spark-By-Example

``````
# Imports
from sklearn.linear_model import Ridge
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

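# Load the Boston Housing dataset
# Note: load_boston was removed in scikit-learn 1.2, so this example needs an older version
boston = load_boston()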

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(boston.data, boston.target, test_size=0.3, random_state=42)

# Instantiate the Ridge Regression model with alpha = 0.1
ridge = Ridge(alpha=0.1)

# Fit the model to the training data
ridge.fit(X_train, y_train)

# Make predictions on the testing data
y_pred = ridge.predict(X_test)

# Calculate the mean squared error of the predictions
mse = mean_squared_error(y_test, y_pred)

# Print the mean squared error
print("Mean Squared Error:", mse)

``````

This yields the following output.

``````
## Output
Mean Squared Error: 21.5851159150243
``````

In this example, we first loaded the Boston Housing dataset and split it into training and testing sets. We then instantiated the Ridge Regression model with an alpha value of 0.1 and fit the model to the training data. We made predictions on the testing data and calculated the mean squared error of the predictions. Finally, we printed the mean squared error to evaluate the performance of the Ridge Regression model.

## 2. Evaluation Metrics for Ridge Regression

The evaluation metrics for Ridge Regression are similar to those used for ordinary least squares regression. The most commonly used evaluation metrics for Ridge Regression are:

### 2.1 Mean Squared Error (MSE):

The mean squared error is the average of the squared differences between the actual values and the predicted values. It is given by:

``````
MSE = (1/n) * Σ(y - y_hat)^2
``````

Where `n` is the number of observations, `y` is the actual value, and `y_hat` is the predicted value.

### 2.2 R-squared (R2):

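The R-squared value represents the proportion of variance in the dependent variable that is explained by the independent variables in the model. It is at most 1, with higher values indicating a better fit; on the training data of a least squares fit it lies between 0 and 1, but on held-out data it can be negative when the model predicts worse than the mean of the target. It is given by: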

``````
R2 = 1 - (RSS/TSS)
``````

Where `RSS` is the residual sum of squares and `TSS` is the total sum of squares.

### 2.3 Root Mean Squared Error (RMSE):

The root mean squared error is the square root of the mean squared error. It is given by:

``````
RMSE = sqrt((1/n) * Σ(y - y_hat)^2)
``````

### 2.4 Mean Absolute Error (MAE):

The mean absolute error is the average of the absolute differences between the actual values and the predicted values. It is given by:

``````
MAE = (1/n) * Σ|y - y_hat|
``````

These evaluation metrics can be calculated using Python libraries such as scikit-learn or NumPy. In scikit-learn, the `mean_squared_error`, `r2_score`, and `mean_absolute_error` functions calculate MSE, R2, and MAE, respectively; RMSE is the square root of the MSE.
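A minimal sketch of the four metrics on a small, made-up pair of arrays (the values of `y_true` and `y_pred` are assumptions chosen so the arithmetic is easy to check by hand):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual and predicted values
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 9.0])

mse = mean_squared_error(y_true, y_pred)   # (1 + 0 + 1 + 0) / 4
rmse = np.sqrt(mse)                        # square root of the MSE
mae = mean_absolute_error(y_true, y_pred)  # (1 + 0 + 1 + 0) / 4
r2 = r2_score(y_true, y_pred)              # 1 - RSS/TSS = 1 - 2/20

print("MSE: ", mse)
print("RMSE:", rmse)
print("MAE: ", mae)
print("R2:  ", r2)
```

Taking the square root with NumPy avoids relying on any version-specific RMSE helper in scikit-learn.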

## 3. Key Benefits of Ridge Regression Over Linear Regression

Ridge Regression is a regularization technique used to prevent overfitting in linear regression models. Here are some key benefits of using Ridge Regression:

### 3.1 Reduces overfitting:

Ridge Regression adds a regularization term to the objective function that penalizes large values of the regression coefficients. This helps prevent overfitting by reducing the variance of the coefficient estimates, so the model is less sensitive to noise in the training data.

### 3.2 Handles multicollinearity:

Ridge Regression can handle multicollinearity (correlation between independent variables) by shrinking the regression coefficients towards zero. This helps reduce the variance of the estimates and makes the model more stable.
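The effect can be sketched on synthetic data. Everything below (the two nearly identical features, the noise levels, and `alpha=1.0`) is an assumption made for illustration; exact coefficient values will depend on the random seed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
n = 200

# Two almost identical predictors: x2 is x1 plus a tiny amount of noise
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)
X = np.column_stack([x1, x2])

# The target depends on the shared signal only
y = 3.0 * x1 + rng.normal(scale=0.5, size=n)

ols = LinearRegression().fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# OLS is free to split the effect between x1 and x2 in an unstable way;
# the ridge penalty keeps the coefficient vector small
print("OLS coefficients:  ", ols.coef_)
print("Ridge coefficients:", ridge.coef_)
print("OLS norm:  ", np.linalg.norm(ols.coef_))
print("Ridge norm:", np.linalg.norm(ridge.coef_))
```

In both fits the two coefficients should add up to roughly 3, but the ridge coefficient vector is never longer than the least squares one.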

### 3.3 Improves model performance:

By reducing the variance of the coefficient estimates and handling multicollinearity, Ridge Regression can improve the predictive performance of a linear regression model on unseen data.

### 3.4 Works well with large datasets:

Ridge Regression can handle large datasets efficiently because it has a closed-form solution and can also be fit with iterative solvers.

### 3.5 Provides a range of solutions:

Ridge Regression provides a range of solutions depending on the value of the regularization parameter (alpha), allowing for flexibility in controlling the balance between bias and variance in the model.
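To make the "range of solutions" concrete, the sketch below fits the same model for several alpha values and records the size of the coefficient vector. The synthetic dataset from `make_regression` and the particular alpha grid are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic regression problem (illustrative only)
X, y = make_regression(n_samples=100, n_features=5, noise=10.0, random_state=0)

alphas = [0.01, 1.0, 100.0, 10000.0]
norms = []
for alpha in alphas:
    coef = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(np.linalg.norm(coef))  # overall size of the coefficients
    print(f"alpha={alpha:>8}: coefficient norm = {norms[-1]:.3f}")
```

Each alpha gives a different fitted model; the coefficient norm shrinks as alpha grows, moving from a low-bias, higher-variance fit towards a high-bias, low-variance one.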

Overall, Ridge Regression is a useful technique for improving the performance and stability of linear regression models, especially in situations where multicollinearity or overfitting are concerns.

## 4. Assumptions of Ridge Regression

Ridge Regression is a regularized regression technique used to prevent overfitting in linear regression models. Like any other regression technique, Ridge Regression makes some assumptions about the data being used. Here are the key assumptions of Ridge Regression:

### 4.1 Linear Relationship

Ridge Regression assumes that there is a linear relationship between the independent variables and the dependent variable.

### 4.2 Homoscedasticity:

Ridge Regression assumes that the variance of the errors is constant across all levels of the independent variables.

### 4.3 Independence of errors:

Ridge Regression assumes that the errors are independent of each other, i.e., the errors are not correlated.

### 4.4 Normality of errors:

Ridge Regression assumes that the errors follow a normal distribution.

### 4.5 No multicollinearity:

Ordinary least squares regression assumes that the independent variables are not highly correlated with each other. Ridge Regression relaxes this assumption: when multicollinearity exists, it is a preferred technique because the penalty stabilizes coefficient estimates that ordinary least squares would leave highly variable.

### 4.6 The number of predictors should be less than the number of observations:

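Ridge Regression is usually applied when the number of predictors is less than the number of observations. With alpha greater than zero it still has a unique solution when this condition is not met, unlike ordinary least squares, but the estimates then rely heavily on the penalty and the model may not perform well.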

It is important to check for these assumptions before using Ridge Regression. Violations of these assumptions can lead to biased or inefficient estimates, and can impact the accuracy of the model.

## 5. What are the disadvantages of Ridge Regression?

Ridge Regression is a regularization technique used to address the limitations and challenges of linear regression. However, like any other technique, it has its own disadvantages. Here are some of them:

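1. It includes all the predictors in the final model.
2. It is not capable of performing feature selection, because coefficients are shrunk but not set exactly to zero.
3. It shrinks all coefficients towards zero, which makes them harder to interpret as effect sizes.
4. It trades variance for bias: the estimates are biased, and too large an alpha can cause underfitting.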
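The first two points can be checked with a small experiment. The sketch below is illustrative only: the synthetic data, the alpha values, and the comparison with `Lasso` (an L1-penalized model that is not otherwise covered in this article) are assumptions added to show the contrast.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 10 predictors, of which only 3 actually influence the target
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

ridge = Ridge(alpha=10.0).fit(X, y)
lasso = Lasso(alpha=5.0).fit(X, y)

# Count coefficients that are exactly zero in each model
ridge_zeros = int(np.sum(ridge.coef_ == 0))
lasso_zeros = int(np.sum(lasso.coef_ == 0))

print("Ridge coefficients equal to zero:", ridge_zeros)
print("Lasso coefficients equal to zero:", lasso_zeros)
```

Ridge keeps every predictor with a small but nonzero weight, while the L1 penalty drops uninformative predictors entirely.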

## 6. What is Shrinkage & Regularization in the Ridge Regression Model?

Shrinkage and regularization are key concepts in Ridge Regression, which is a type of linear regression that uses regularization to address some of the limitations and challenges of standard linear regression.

Shrinkage refers to the process of shrinking the estimated regression coefficients towards zero. This is done by adding a penalty term, called the regularization term, to the sum of squared residuals that the regression minimizes. The regularization term is proportional to the squared magnitude of the regression coefficients, and it is controlled by a tuning parameter, usually denoted as λ or alpha. The higher the value of λ, the more the coefficients are shrunk towards zero. Shrinkage helps to reduce the variance of the estimates and can improve the prediction accuracy of the model.
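For the penalized sum of squared residuals described above, the minimizer has the well-known closed form θ = (XᵀX + λI)⁻¹Xᵀy when no intercept is fitted. The sketch below implements that formula with NumPy and compares it with scikit-learn's `Ridge`; the helper name `ridge_closed_form` and the synthetic data are assumptions of this example.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

def ridge_closed_form(X, y, lam):
    """Solve (X^T X + lam * I) theta = X^T y for theta (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

theta_ols = ridge_closed_form(X, y, lam=0.0)     # no penalty: least squares
theta_ridge = ridge_closed_form(X, y, lam=10.0)  # penalized: shrunk estimate

# scikit-learn minimizes the same objective when fit_intercept=False
sk_ridge = Ridge(alpha=10.0, fit_intercept=False).fit(X, y)

print("Least squares:", theta_ols)
print("Ridge (closed form):", theta_ridge)
print("Matches scikit-learn:", np.allclose(theta_ridge, sk_ridge.coef_))
```

Adding λ to the diagonal of XᵀX is what pulls the solution towards zero. Note that this objective uses the sum of squared residuals rather than the mean, so the same numeric value of the penalty parameter is a weaker penalty here than in the MSE-based cost of section 1.1.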

Regularization refers to the process of adding a penalty term to the objective that the regression minimizes in order to prevent overfitting. Overfitting occurs when the model is too complex and fits the training data too closely, resulting in poor performance on new data. Regularization helps to simplify the model and improve its generalizability by shrinking the coefficients towards zero.

Ridge Regression uses L2 regularization, which adds a penalty term proportional to the squared magnitude of the coefficients to the sum of squared residuals. This penalty term encourages the model to have smaller coefficients, which reduces the variance of the estimates and can improve the prediction accuracy of the model. The amount of regularization is controlled by the tuning parameter alpha.

Overall, shrinkage and regularization are important concepts in Ridge Regression, which help to reduce the variance of the estimates and prevent overfitting, resulting in a more robust and accurate model.

## Conclusion

In conclusion, Ridge Regression is a powerful technique for regularized linear regression that addresses some of the limitations and challenges of standard linear regression. It uses L2 regularization to shrink the coefficients towards zero and prevent overfitting, resulting in a more robust and accurate model. The amount of regularization is controlled by the tuning parameter alpha, which balances the tradeoff between bias and variance.