Regularization with Linear and Logistic Regression
%load_ext oct2py.ipython
Objective
● To implement regularized linear regression and regularized logistic regression.
● To apply regularized logistic regression to predict which passengers survived the Titanic shipwreck tragedy.
Regularized linear regression
Data Plot
%%octave -f svg
x = load('ML4/ml4Linx.dat');
y = load('ML4/ml4Liny.dat');
m = length(x);
plot(x,y,'go', 'MarkerFaceColor','auto')
xlabel('x');
ylabel('y');
Hypothesis

We fit a fifth-order polynomial, so each example is mapped to six polynomial features, $[1,\; x,\; x^2,\; x^3,\; x^4,\; x^5]$, with six model parameters $\theta_0, \ldots, \theta_5$:

$$h_\theta(x) = \theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3 + \theta_4 x^4 + \theta_5 x^5$$

Linear Regression Cost Function

$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

Linear Regression with Regularization Cost Function (to guard against overfitting!)

$$J(\theta) = \frac{1}{2m} \left[ \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2 + \lambda \sum_{j=1}^{n} \theta_j^2 \right]$$

where $\lambda$ is the regularization parameter.
Regularization parameter
● The regularization parameter λ controls how strongly the fitting parameters are penalized.
● As the magnitudes of the fitting parameters increase, the penalty added to the cost function increases.
● This penalty depends on the squares of the parameters as well as on the magnitude of λ (see the sketch after this list).
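Putting the cost and the penalty together, a minimal Octave sketch of the regularized cost (assuming X already holds the six polynomial feature columns with the intercept in column 1, and theta is a 6x1 parameter vector):

% Sketch: regularized linear regression cost
J = (1/(2*m)) * sum((X*theta - y).^2) ...
    + (lambda/(2*m)) * sum(theta(2:end).^2);   % theta(1), the intercept, is not penalized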
Normal equations
Now we will find the best parameters of our model using the normal equations. Recall that the normal-equations solution to regularized linear regression is

$$\theta = \left( X^T X + \lambda \begin{bmatrix} 0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1 \end{bmatrix} \right)^{-1} X^T y$$

where the first diagonal entry of the regularization matrix is zero, so the intercept term $\theta_0$ is not regularized.
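As a minimal sketch, this closed-form solution could be wrapped in a small helper (the function name normal_eq_reg is illustrative, not part of the exercise files):

function theta = normal_eq_reg(X, y, lambda)
  % X: m-by-n design matrix with a leading column of ones, y: m-by-1 targets
  L = lambda * eye(size(X, 2));
  L(1, 1) = 0;                       % leave the intercept term unregularized
  theta = (X' * X + L) \ (X' * y);   % solve (X'X + L) theta = X'y
end

The three cases below could then be computed as theta = normal_eq_reg(x, y, lambda) once x has been expanded to the six polynomial columns.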
Problem
Using the normal equations, find values for θ for each of the three regularization parameters below:
a. λ = 0 (this is the same case as nonregularized linear regression)
b. λ = 1
c. λ = 10
a. λ = 0 (this is the same case as nonregularized linear regression)
%%octave -f svg
% Regularization parameter
lambda = 0;
x = load('ML4/ml4Linx.dat');
y = load('ML4/ml4Liny.dat');
m = length(x);
%Plot
plot(x,y,'go', 'MarkerFaceColor','auto');
hold on;
% Features
x = [ ones(m,1), x, x.^2, x.^3, x.^4, x.^5 ];
theta = zeros(size(x, 2), 1); % one parameter per feature column of x
% Closed form solution (Normal Equations)
L = lambda.*eye(6);
L(1) = 0; % the intercept term is not regularized
theta = (x' * x + L)\x' * y;
% Denser x values
x_vals = (linspace(-1,1,1000))';
features = [ ones(size(x_vals)), x_vals, x_vals.^2, x_vals.^3, x_vals.^4, x_vals.^5 ];
size(features)
size(theta)
% Hypothesis
h = features*theta;
%Plot
plot(x_vals, h, 'r--', 'Linewidth', 3);
xlabel('x');
ylabel('y');
title('Hypothesis at λ = 0');
legend('Training data', '5th order fit');
ans =
1000 6
ans =
6 1
b. λ = 1
%%octave -f svg
% Regularization parameter
lambda = 1;
x = load('ML4/ml4Linx.dat');
y = load('ML4/ml4Liny.dat');
m = length(x);
%Plot
plot(x,y,'go', 'MarkerFaceColor','auto');
hold on;
% Features
x = [ ones(m,1), x, x.^2, x.^3, x.^4, x.^5 ];
theta = zeros(size(x, 2), 1); % one parameter per feature column of x
% Closed form solution (Normal Equations)
L = lambda.*eye(6);
L(1) = 0; % the intercept term is not regularized
theta = (x' * x + L)\x' * y;
% Denser x values
x_vals = (linspace(-1,1,1000))';
features = [ ones(size(x_vals)), x_vals, x_vals.^2, x_vals.^3, x_vals.^4, x_vals.^5 ];
size(features)
size(theta)
% Hypothesis
h = features*theta;
%Plot
plot(x_vals, h, 'r--', 'Linewidth', 3);
xlabel('x');
ylabel('y');
title('Hypothesis at λ = 1');
legend('Training data', '5th order fit');
ans =
1000 6
ans =
6 1
c. λ = 10
%%octave -f svg
% Regularization parameter
lambda = 10;
x = load('ML4/ml4Linx.dat');
y = load('ML4/ml4Liny.dat');
m = length(x);
%Plot
plot(x,y,'go', 'MarkerFaceColor','auto');
hold on;
% Features
x = [ ones(m,1), x, x.^2, x.^3, x.^4, x.^5 ];
theta = zeros(size(x, 2), 1); % one parameter per feature column of x
% Closed form solution (Normal Equations)
L = lambda.*eye(6);
L(1) = 0; % the intercept term is not regularized
theta = (x' * x + L)\x' * y;
% Denser x values
x_vals = (linspace(-1,1,1000))';
features = [ ones(size(x_vals)), x_vals, x_vals.^2, x_vals.^3, x_vals.^4, x_vals.^5 ];
size(features)
size(theta)
% Hypothesis
h = features*theta;
%Plot
plot(x_vals, h, 'r--', 'Linewidth', 3);
xlabel('x');
ylabel('y');
title('Hypothesis at λ = 10');
legend('Training data', '5th order fit');
ans =
1000 6
ans =
6 1
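Since the three cells above differ only in the value of lambda, the same fit could also be written once as a loop; a sketch reusing the feature construction from those cells:

lambdas = [0 1 10];
x_raw = load('ML4/ml4Linx.dat');
y = load('ML4/ml4Liny.dat');
m = length(x_raw);
X = [ ones(m,1), x_raw, x_raw.^2, x_raw.^3, x_raw.^4, x_raw.^5 ];
for k = 1:length(lambdas)
    L = lambdas(k) * eye(6);
    L(1) = 0;                          % leave the intercept unregularized
    theta = (X' * X + L) \ (X' * y);   % closed-form (normal equations) solution
    % ... plot the resulting hypothesis exactly as in the cells above ...
end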
Regularized logistic regression
Implement regularized logistic regression using Newton’s Method.
Data Plot
%%octave -f svg
x = load('ML4/ml4Logx.dat');
y = load('ML4/ml4Logy.dat');
% Distinguish positive and negative samples
negative = find(y == 0);
positive = find(y == 1);
plot( x(positive,1), x(positive,2),'r+');
hold on;
plot( x(negative,1), x(negative,2),'bo');
xlabel('u');
ylabel('v');
legend(' y = 1 ', ' y = 0 ');
%cat ML4/map_feature.m
function out = map_feature(feat1, feat2)
% MAP_FEATURE Feature mapping function for Exercise 5
%
% map_feature(feat1, feat2) maps the two input features
% to higher-order features as defined in Exercise 5.
%
% Returns a new feature array with more features
%
% Inputs feat1, feat2 must be the same size
%
% Note: this function is only valid for Ex 5, since the degree is
% hard-coded in.
degree = 6;
out = ones(size(feat1(:,1)));
for i = 1:degree
    for j = 0:i
        out(:, end+1) = (feat1.^(i-j)).*(feat2.^j);
    end
end
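For degree 6, the mapping produces 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28 terms per example, so mapping two feature columns yields a 28-column matrix. A quick check (a sketch; it assumes map_feature is on the path):

u = [0.5; -0.2];
v = [1.0;  0.3];
size(map_feature(u, v))   % expected: 2 rows, 28 columns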
Recall the Logistic Regression Cost function, where the hypothesis is $h_\theta(x) = g(\theta^T x)$ and $g(z) = \frac{1}{1 + e^{-z}}$ is the sigmoid function:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right]$$

Logistic Regression with Regularization Cost function:

$$J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \left[ -y^{(i)} \log\left(h_\theta(x^{(i)})\right) - (1 - y^{(i)}) \log\left(1 - h_\theta(x^{(i)})\right) \right] + \frac{\lambda}{2m} \sum_{j=1}^{n} \theta_j^2$$

Newton's Method Update Rule

$$\theta^{(t+1)} = \theta^{(t)} - H^{-1} \nabla_\theta J$$

where $\nabla_\theta J$ is the gradient and $H$ is the Hessian of the regularized cost, with the intercept term $\theta_0$ excluded from the regularization (as in the code below).
Cost function plot
%%octave -f svg
x = load('ML4/ml4Logx.dat');
y = load('ML4/ml4Logy.dat');
% Distinguish positive and negative samples
negative = find(y == 0);
positive = find(y == 1);
addpath ML4/
% Add polynomial features
x = map_feature(x(:,1), x(:,2)); % Increase the no. of features from 2 to 28
[m, n] = size(x)
[m n]
% Initialize theta
theta = zeros(n, 1);
% Sigmoid
g = @(z) 1.0 ./ (1.0 + exp(-z)); % sigmoid function
% Regularization parameter
lambda = 0
% Newton's method
MAX_ITER = 16;
J = zeros(MAX_ITER, 1);
for i = 1:MAX_ITER
% hypothesis
h = g(x*theta); % Sigmoid
% cost function
J(i) = (1/m)*sum(-y.*log(h) - (1-y).*log(1 - h)) + (lambda/(2*m)) * norm(theta([2:end]))^2;
% gradient
G = (lambda/m).*theta;
G(1) = 0; % Note: the intercept term is not regularized
grad = ((1/m).*x' * (h - y)) + G;
% hessian
L = (lambda/m).*eye(n);
L(1) = 0;
H = ((1/m).*x' * diag(h) * diag(1-h) * x) + L;
%theta update
theta = theta - H\grad;
endfor
plot(0:MAX_ITER-1, J, 'ro--','MarkerFaceColor','auto')
xlim([0 15]);
xlabel('Iteration');
ylabel('J')
m = 117
n = 28
ans =
117 28
lambda = 0
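A note on the Hessian line in the cell above: x' * diag(h) * diag(1-h) * x builds two m-by-m diagonal matrices. For m = 117 this is harmless, but an equivalent form that avoids them (a sketch, assuming h is the m-by-1 vector of predictions and relying on Octave's automatic broadcasting) is:

w = h .* (1 - h);                    % m-by-1 vector of weights h(1-h)
H = (1/m) .* (x' * (x .* w)) + L;    % same Hessian, without the m-by-m matrices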
a. Decision Boundary at λ = 0 (this is the same case as nonregularized logistic regression)
%%octave -f svg
x = load('ML4/ml4Logx.dat');
y = load('ML4/ml4Logy.dat');
% Distinguish positive and negative samples
negative = find(y == 0);
positive = find(y == 1);
plot( x(positive,1), x(positive,2),'r+');
hold on;
plot( x(negative,1), x(negative,2),'bo');
xlabel('u');
ylabel('v');
legend(' y = 1 ', ' y = 0 ');
addpath ML4/
% Add polynomial features
x = map_feature(x(:,1), x(:,2)); % Increase the no. of features from 2 to 28
[m, n] = size(x)
[m n]
% Initialize theta
theta = zeros(n, 1);
% Sigmoid
g = @(z) 1.0 ./ (1.0 + exp(-z)); % sigmoid function
% Regularization parameter
lambda = 0
% Newton's method
MAX_ITER = 16;
J = zeros(MAX_ITER, 1);
for i = 1:MAX_ITER
% hypothesis
h = g(x*theta); % Sigmoid
% cost function
J(i) = (1/m)*sum(-y.*log(h) - (1-y).*log(1 - h)) + (lambda/(2*m)) * norm(theta([2:end]))^2;
% gradient
G = (lambda/m).*theta;
G(1) = 0; % Note: the intercept term is not regularized
grad = ((1/m).*x' * (h - y)) + G;
% hessian
L = (lambda/m).*eye(n);
L(1) = 0;
H = ((1/m).*x' * diag(h) * diag(1-h) * x) + L;
%theta update
theta = theta - H\grad;
endfor
u = linspace(-1, 1.5, 300);
v = linspace(-1, 1.5, 300);
z = zeros(length(u), length(v));
for i = 1:length(u)
    for j = 1:length(v)
        z(i,j) = map_feature(u(i), v(j))*theta;
    endfor
endfor
contour(u, v, z', [0, 0],'m-','LineWidth', 4)
legend('y = 1', 'y = 0', 'Decision Boundary')
m = 117
n = 28
ans =
117 28
lambda = 0
b. Decision Boundary at λ = 1
%%octave -f svg
x = load('ML4/ml4Logx.dat');
y = load('ML4/ml4Logy.dat');
% Distinguish positive and negative samples
negative = find(y == 0);
positive = find(y == 1);
plot( x(positive,1), x(positive,2),'r+');
hold on;
plot( x(negative,1), x(negative,2),'bo');
xlabel('u');
ylabel('v');
legend(' y = 1 ', ' y = 0 ');
addpath ML4/
% Add polynomial features
x = map_feature(x(:,1), x(:,2)); % Increase the no. of features from 2 to 28
[m, n] = size(x)
[m n]
% Initialize theta
theta = zeros(n, 1);
% Sigmoid
g = @(z) 1.0 ./ (1.0 + exp(-z)); % sigmoid function
% Regularization parameter
lambda = 1
% Newton's method
MAX_ITER = 16;
J = zeros(MAX_ITER, 1);
for i = 1:MAX_ITER
% hypothesis
h = g(x*theta); % Sigmoid
% cost function
J(i) = (1/m)*sum(-y.*log(h) - (1-y).*log(1 - h)) + (lambda/(2*m)) * norm(theta([2:end]))^2;
% gradient
G = (lambda/m).*theta;
G(1) = 0; % Note: the intercept term is not regularized
grad = ((1/m).*x' * (h - y)) + G;
% hessian
L = (lambda/m).*eye(n);
L(1) = 0;
H = ((1/m).*x' * diag(h) * diag(1-h) * x) + L;
%theta update
theta = theta - H\grad;
endfor
u = linspace(-1, 1.5, 300);
v = linspace(-1, 1.5, 300);
z = zeros(length(u), length(v));
for i = 1:length(u)
    for j = 1:length(v)
        z(i,j) = map_feature(u(i), v(j))*theta;
    endfor
endfor
contour(u, v, z', [0, 0],'m-','LineWidth', 4)
legend('y = 1', 'y = 0', 'Decision Boundary')
m = 117
n = 28
ans =
117 28
lambda = 1
c. Decision Boundary at λ = 10
%%octave -f svg
x = load('ML4/ml4Logx.dat');
y = load('ML4/ml4Logy.dat');
% Distinguish positive and negative samples
negative = find(y == 0);
positive = find(y == 1);
plot( x(positive,1), x(positive,2),'r+');
hold on;
plot( x(negative,1), x(negative,2),'bo');
xlabel('u');
ylabel('v');
legend(' y = 1 ', ' y = 0 ');
addpath ML4/
% Add polynomial features
x = map_feature(x(:,1), x(:,2)); % Increase the no. of features from 2 to 28
[m, n] = size(x)
[m n]
% Initialize theta
theta = zeros(n, 1);
% Sigmoid
g = @(z) 1.0 ./ (1.0 + exp(-z)); % sigmoid function
% Regularization parameter
lambda = 10
% Newton's method
MAX_ITER = 16;
J = zeros(MAX_ITER, 1);
for i = 1:MAX_ITER
% hypothesis
h = g(x*theta); % Sigmoid
% cost function
J(i) = (1/m)*sum(-y.*log(h) - (1-y).*log(1 - h)) + (lambda/(2*m)) * norm(theta([2:end]))^2;
% gradient
G = (lambda/m).*theta;
G(1) = 0; % Note: the intercept term is not regularized
grad = ((1/m).*x' * (h - y)) + G;
% hessian
L = (lambda/m).*eye(n);
L(1) = 0;
H = ((1/m).*x' * diag(h) * diag(1-h) * x) + L;
%theta update
theta = theta - H\grad;
endfor
u = linspace(-1, 1.5, 300);
v = linspace(-1, 1.5, 300);
z = zeros(length(u), length(v));
for i = 1:length(u)
    for j = 1:length(v)
        z(i,j) = map_feature(u(i), v(j))*theta;
    endfor
endfor
contour(u, v, z', [0, 0],'m-','LineWidth', 4)
legend('y = 1', 'y = 0', 'Decision Boundary')
m = 117
n = 28
ans =
117 28
lambda = 10
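Finally, a sketch of how the fitted parameters could be used to measure training-set accuracy (assuming x, y, theta, and the sigmoid g are still in scope from the cell above; the 0.5 threshold corresponds to the decision boundary plotted above):

p = g(x * theta) >= 0.5;   % predicted class labels
accuracy = mean(p == y)    % fraction of training examples classified correctly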
(to be continued)