I'm in the early stages of understanding
backpropagation and I attempted to implement it myself.
The dataset I'm working with is the Iris dataset, which has shape (150, 4).
I'm only worried about
backpropagation and not gradient descent, so I'm just trying my algorithm on one example to see if I can get a seemingly proper output.
However, when I try to compute the gradients for my first weight matrix, I get an error with the shapes.
My code is below. The error occurs on the last line: x has shape (4, 1) and delta2 has shape (8, 8), so I can't take the dot product. I just don't understand how I'm supposed to end up with a correctly shaped delta2 if I'm following the algorithm the way other sources describe it.
```python
from sklearn.datasets import load_iris
from keras.utils import to_categorical
import numpy as np

# LOAD DATA
data = load_iris()
X = data.data[:-20]
y = to_categorical(data.target[:-20])
# hold out only 20 samples because we have a small dataset
X_test = data.data[-20:]
y_test = to_categorical(data.target[-20:])

# INIT WEIGHTS - will try to add bias later on
h_neurons = 8  # hidden layer size (inferred from the (8, 8) shape mentioned above)
w1 = np.random.rand(np.shape(X)[1], h_neurons)
w2 = np.random.rand(h_neurons, 3)

def sigmoid(x, deriv=False):
    if deriv:
        return sigmoid(x) * (1 - sigmoid(x))
    else:
        return 1 / (1 + np.exp(-x))

# Feed forward (one example)
x = X[0].reshape(4, 1)
z1 = w1.T.dot(x)  # need to transpose weight matrix
a1 = sigmoid(z1)
z2 = w2.T.dot(a1)
y_hat = sigmoid(z2, deriv=True)  # output

# BACKPROP
y_ = y[0].reshape(3, 1)
delta3 = np.multiply((y_hat - y_), sigmoid(z2, deriv=True))
dJdW2 = a1.dot(delta3)  ## ERROR !!!
delta2 = np.dot(delta3, w2.T) * sigmoid(z1, deriv=True)
dJdW1 = np.dot(x.T, delta2)  ## ERROR !!!
```
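For comparison, here is a sketch of what I *think* the shapes should be at each step, based on the sources I've been reading (assuming 8 hidden neurons and a column-vector convention; the input `x` and one-hot target `y` here are dummy placeholders). The gradient shapes in this version come out matching the weight shapes, so the places where it differs from my code above may be where my mistake is, but I'm not certain this is right either:

```python
import numpy as np

rng = np.random.default_rng(0)

# A 4 -> 8 -> 3 network, column-vector convention
w1 = rng.random((4, 8))   # (inputs, hidden)
w2 = rng.random((8, 3))   # (hidden, outputs)

def sigmoid(x, deriv=False):
    s = 1 / (1 + np.exp(-x))
    return s * (1 - s) if deriv else s

x = rng.random((4, 1))               # one sample, shape (4, 1)
y = np.array([[1.0], [0.0], [0.0]])  # one-hot target, shape (3, 1)

# Feed forward
z1 = w1.T.dot(x)     # (8, 4) @ (4, 1) -> (8, 1)
a1 = sigmoid(z1)     # (8, 1)
z2 = w2.T.dot(a1)    # (3, 8) @ (8, 1) -> (3, 1)
y_hat = sigmoid(z2)  # (3, 1) -- plain sigmoid here, not its derivative

# Backprop
delta3 = (y_hat - y) * sigmoid(z2, deriv=True)     # (3, 1)
dJdW2 = a1.dot(delta3.T)                           # (8, 1) @ (1, 3) -> (8, 3), matches w2
delta2 = w2.dot(delta3) * sigmoid(z1, deriv=True)  # (8, 3) @ (3, 1) -> (8, 1)
dJdW1 = x.dot(delta2.T)                            # (4, 1) @ (1, 8) -> (4, 8), matches w1
```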
I thought I implemented
backpropagation correctly, but apparently not. Can someone please point out where I went wrong?
I'm stuck; I've looked at various sources, and the code to compute dJdW (the derivative of the cost with respect to the weights) is roughly the same everywhere.