Q2: Training a Support Vector Machine

As in Q1, we use the CIFAR-10 dataset, so I won't go over the data loading again.
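
As a quick reminder, the loading step is only a couple of lines. This is a minimal sketch, assuming the assignment's cs231n.data_utils helper and its usual dataset path (the path here is an assumption, not taken from this post):

from cs231n.data_utils import load_CIFAR10

# Standard CS231n location for the extracted CIFAR-10 batches (assumed path).
cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

print('Training data shape: ', X_train.shape)  # (50000, 32, 32, 3)
print('Test data shape: ', X_test.shape)       # (10000, 32, 32, 3)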


Splitting the dataset

# Split the data into train, val, and test sets. In addition we will
# create a small development set as a subset of the training data;
# we can use this for development so our code runs faster.
num_training = 49000
num_validation = 1000
num_test = 1000
num_dev = 500

# Our validation set will be num_validation points from the original
# training set.
mask = range(num_training, num_training + num_validation)
X_val = X_train[mask]
y_val = y_train[mask]

# Our training set will be the first num_train points from the original
# training set.
mask = range(num_training)
X_train = X_train[mask]
y_train = y_train[mask]

# We will also make a development set, which is a small subset of
# the training set.
mask = np.random.choice(num_training, num_dev, replace=False)
X_dev = X_train[mask]
y_dev = y_train[mask]

# We use the first num_test points of the original test set as our
# test set.
mask = range(num_test)
X_test = X_test[mask]
y_test = y_test[mask]


# Preprocessing: reshape the image data into rows
X_train = np.reshape(X_train, (X_train.shape[0], -1))
X_val = np.reshape(X_val, (X_val.shape[0], -1))
X_test = np.reshape(X_test, (X_test.shape[0], -1))
X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))

print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
print('dev data shape: ', X_dev.shape)

Train data shape:  (49000, 3072)
Train labels shape:  (49000,)
Validation data shape:  (1000, 3072)
Validation labels shape:  (1000,)
Test data shape:  (1000, 3072)
Test labels shape:  (1000,)

Preprocessing the data

# Preprocessing: subtract the mean image
# first: compute the image mean based on the training data
mean_image = np.mean(X_train, axis=0)
print(mean_image[:10]) # print a few of the elements
plt.figure(figsize=(4,4))
plt.imshow(mean_image.reshape((32,32,3)).astype('uint8')) # visualize the mean image
plt.show()

# second: subtract the mean image from train and test data
# subtract the mean to zero-center the data
X_train -= mean_image
X_val -= mean_image
X_test -= mean_image
X_dev -= mean_image

# third: append the bias dimension of ones (i.e. bias trick) so that our SVM
# only has to worry about optimizing a single weight matrix W.
X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])
X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])
X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])
X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])

print(X_train.shape, X_val.shape, X_test.shape, X_dev.shape)

[130.64189796 135.98173469 132.47391837 130.05569388 135.34804082 131.75402041 130.96055102 136.14328571 132.47636735 131.48467347]

(49000, 3073) (1000, 3073) (1000, 3073) (500, 3073)
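
As the bias-trick comment in the code above says, appending a constant 1 to every input folds the bias b into the weight matrix, so the classifier only has to learn a single matrix W:

s = xW + b \;\Longrightarrow\; s = \begin{bmatrix} x & 1 \end{bmatrix} \begin{bmatrix} W \\ b \end{bmatrix}

Here x is one flattened image (1 × 3072), W is 3072 × 10, and b is 1 × 10, which is exactly why the shapes printed above all end in 3073.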

Writing svm_loss_naive

Here the loss computation is essentially spelled out for us; all that's left is to derive the gradient by hand. But hand-coding gradients is painful, so I imported PyTorch to track the tensors and let a single backward() call save the day. Why not be lazy? 🤣
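
For reference, the loss that the loop below accumulates is the standard multiclass SVM (hinge) loss from the course notes, with Δ = 1 and an L2 penalty whose strength is the reg argument:

L = \frac{1}{N}\sum_{i}\sum_{j \ne y_i} \max\!\left(0,\; s_{i,j} - s_{i,y_i} + \Delta\right) + \lambda \sum_{k,l} W_{k,l}^{2}, \qquad s_i = x_i W

PyTorch only enters the picture for the backward pass; the forward computation is exactly this formula.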

def svm_loss_naive(W, X, y, reg):
    """
    Structured SVM loss function, naive implementation (with loops).

    Inputs have dimension D, there are C classes, and we operate on minibatches
    of N examples.

    Inputs:
    - W: A numpy array of shape (D, C) containing weights.
    - X: A numpy array of shape (N, D) containing a minibatch of data.
    - y: A numpy array of shape (N,) containing training labels; y[i] = c means
      that X[i] has label c, where 0 <= c < C.
    - reg: (float) regularization strength

    Returns a tuple of:
    - loss as single float
    - gradient with respect to weights W; an array of same shape as W
    """
    import torch as t
    dW = np.zeros(W.shape)  # initialize the gradient as zero

    # Wrap everything in torch tensors so autograd can track the graph through W.
    X = t.from_numpy(X)
    W = t.from_numpy(W)
    W.requires_grad = True
    y = t.from_numpy(y)

    # compute the loss and the gradient
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = t.tensor([0.0], requires_grad=True)
    for i in range(num_train):
        scores = t.mm(X[i].unsqueeze(0), W)
        correct_class_score = scores[0][y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[0][j] - correct_class_score + 1  # note delta = 1
            if margin > 0:
                loss = margin + loss

    # Right now the loss is a sum over all training examples, but we want it
    # to be an average instead so we divide by num_train.
    loss = loss / num_train

    # Add regularization to the loss.
    loss = reg * ((W * W).sum()) + loss

    #############################################################################
    # TODO:
    # Compute the gradient of the loss function and store it in dW.
    # Rather than first computing the loss and then computing the derivative,
    # it may be simpler to compute the derivative at the same time that the
    # loss is being computed. As a result you may need to modify some of the
    # code above to compute the gradient.
    #############################################################################
    # *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    # One backward pass gives the gradient of the loss with respect to W.
    loss.backward()
    dW = W.grad

    # *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****

    return loss, dW

Trying it out

# Evaluate the naive implementation of the loss we provided for you:
from cs231n.classifiers.linear_svm import svm_loss_naive
import time

# generate a random SVM weight matrix of small numbers
W = np.random.randn(3073, 10) * 0.0001

loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.000005)
print('loss: %f' % (loss, ))

loss: 8.751775

Gradient check

Let's see whether the gradient we computed is actually correct.

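For context, grad_check_sparse picks a handful of random coordinates of W, estimates the gradient there with a centered finite difference, and reports the relative error against the analytic value, roughly:

f'_{num}(w) \approx \frac{f(w+h) - f(w-h)}{2h}, \qquad \text{relative error} = \frac{|f'_{ana} - f'_{num}|}{|f'_{ana}| + |f'_{num}|}

As a rough rule of thumb from the course notes, errors around 1e-7 or smaller are comfortable, while anything near 1e-2 deserves a closer look.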
# Once you've implemented the gradient, recompute it with the code below
# and gradient check it with the function we provided for you

# Compute the loss and its gradient at W.
loss, grad = svm_loss_naive(W, X_dev, y_dev, 0.0)

# Numerically compute the gradient along several randomly chosen dimensions, and
# compare them with your analytically computed gradient. The numbers should match
# almost exactly along all dimensions.
from cs231n.gradient_check import grad_check_sparse
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 0.0)[0]
grad_numerical = grad_check_sparse(f, W, grad)

# do the gradient check once again with regularization turned on
# you didn't forget the regularization gradient did you?
loss, grad = svm_loss_naive(W, X_dev, y_dev, 5e1)
f = lambda w: svm_loss_naive(w, X_dev, y_dev, 5e1)[0]
grad_numerical = grad_check_sparse(f, W, grad)

numerical: 9.059906 analytic: 8.921706, relative error: 7.685618e-03
numerical: 9.346008 analytic: 9.287930, relative error: 3.116827e-03
numerical: 6.532669 analytic: 6.546950, relative error: 1.091799e-03
numerical: 10.538101 analytic: 10.352420, relative error: 8.888305e-03
numerical: 8.440018 analytic: 8.401690, relative error: 2.275789e-03
numerical: 3.290176 analytic: 3.357421, relative error: 1.011565e-02
numerical: 5.531311 analytic: 5.485640, relative error: 4.145520e-03
numerical: -46.300888 analytic: -45.812960, relative error: 5.296994e-03
numerical: 7.772446 analytic: 7.616360, relative error: 1.014283e-02
numerical: 11.014938 analytic: 11.025092, relative error: 4.606967e-04
numerical: 2.670288 analytic: 3.042669, relative error: 6.518187e-02
numerical: -11.301041 analytic: -11.615975, relative error: 1.374236e-02
numerical: 0.667572 analytic: 0.818471, relative error: 1.015444e-01
numerical: -45.633316 analytic: -45.023512, relative error: 6.726490e-03
numerical: 20.170212 analytic: 20.151709, relative error: 4.588866e-04
numerical: 11.539459 analytic: 11.752489, relative error: 9.146033e-03
numerical: 5.769730 analytic: 5.840804, relative error: 6.121552e-03
numerical: -12.159348 analytic: -12.446800, relative error: 1.168215e-02
numerical: -10.490417 analytic: -10.358632, relative error: 6.320931e-03
numerical: 13.065338 analytic: 13.197099, relative error: 5.017111e-03

Compared with the hand-derived implementations floating around online, the precision here is admittedly a bit weaker, but the numerical and analytic values still track each other closely. Why exactly the precision is lower, I honestly don't know; maybe that's a question for torch? One plausible culprit (mentioned in the course notes) is the kinks of the hinge loss: whenever a margin sits very close to zero, the finite-difference step can cross the max(0, ·) boundary while the analytic gradient does not, which inflates the relative error for that coordinate.
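
For reference, here is a sketch of the usual hand-derived gradient that I skipped in favor of autograd. It is the standard textbook derivation, not the code actually used above, and the function name svm_loss_naive_manual is my own: inside the same double loop, every positive margin adds X[i] to the wrong class's column of dW and subtracts it from the correct class's column, and the L2 regularizer contributes 2·reg·W.

import numpy as np

def svm_loss_naive_manual(W, X, y, reg):
    """Same interface as svm_loss_naive, but with a hand-derived gradient."""
    dW = np.zeros(W.shape)
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = X[i].dot(W)
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # delta = 1
            if margin > 0:
                loss += margin
                dW[:, j] += X[i]       # each positive margin pushes the wrong class up...
                dW[:, y[i]] -= X[i]    # ...and the correct class down
    loss = loss / num_train + reg * np.sum(W * W)
    dW = dW / num_train + 2 * reg * W
    return loss, dW

Swapping this in for the autograd version computes exactly the same loss, avoids the torch dependency inside an otherwise NumPy-only assignment, and is still subject to the same kink caveat during gradient checking.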