Neural Networks III: How would I implement one?

Image: https://i.imgur.com/FC1QvBY.jpg

In this third article of the series, I try to keep everything fairly detailed and explain every step as I dive into the actual implementation of a Feed-Forward Neural Network. To make the implementation easier to follow, the article is divided into 5 sub-segments:

- Making a simple, Feed-Forward Neural Network structure
- Fixing the Neural Network’s bias/weight initial values
- Adding a learning algorithm to the Neural Network
- Multiple Input and Output Sets for our Neural Network
- Training for handwriting recognition with MNIST data set

Let’s jump right in!

I will use Java in this case, but any other programming language would follow the exact same train of thought. I’m not going to use any exotic, language-specific solutions, and will try to keep everything as generic as possible.

Making a simple, Feed-Forward Neural Network structure:

The structure of our Neural Network is supposed to be very simple, just like the theoretical example in the previous article. I expect the entire codebase to come in at around 150-200 lines of code, plus the helper utility classes.

First, let’s create a Network.java class, which represents our NN object.

We need a few integer constants here, which define our network and don’t change over the program’s lifetime.

“NETWORK_LAYER_SIZES” contains the number of neurons in each of our layers.

“NETWORK_SIZE” contains the number of layers in our NN. We set it to the length of the NETWORK_LAYER_SIZES array.

“INPUT_SIZE” contains the number of input neurons. The input layer is the first in the network, so the first number in NETWORK_LAYER_SIZES represents this value.

“OUTPUT_SIZE” contains the number of output neurons in our NN. The output is the last layer, so it is represented by the last element of the NETWORK_LAYER_SIZES array.

Now we will declare a couple of variables to work with:

“output” contains the calculated output value of every neuron in the network. This value needs to be as precise as possible to give us accurate results, so we use double as the datatype. A two dimensional array is sufficient here: the first index is the layer, and the second is the neuron within that layer.

“weights” stores all the weight data of the network. Note that this needs to be a three dimensional array to store all the necessary positions. The first index is the layer, the second is the neuron, and the third is the neuron in the previous layer that the weight is connected to. We need this previous neuron index because, as we learned in the previous article, every neuron in a given layer is connected to all the neurons of the adjacent previous layer.

“bias” is a two dimensional array, just like output, because every neuron has one bias value as well.

public class Network {
	public final int[] NETWORK_LAYER_SIZES;
	public final int NETWORK_SIZE;
	public final int INPUT_SIZE;
	public final int OUTPUT_SIZE;
 
	private double[][] output; //layer, neuron
	private double[][][] weights; //layer, neuron, previousNeuron 
	private double[][] bias; //layer, neuron
}

Our constructor will receive the NETWORK_LAYER_SIZES value from the initialization method, and all the rest of the constant and variable data can be calculated from it.

For initializing the output, weight and bias values, we assign NETWORK_SIZE as the size of the first dimension. We also need a FOR loop to initialize the second dimension for each layer. Note that while every neuron has an output and a bias, the very first layer doesn’t have weights (being the input layer), so we only start assigning weight arrays from the second layer in this loop.

public Network(int[] NETWORK_LAYER_SIZES) {
	this.NETWORK_LAYER_SIZES = NETWORK_LAYER_SIZES;
	this.NETWORK_SIZE = NETWORK_LAYER_SIZES.length;
	this.INPUT_SIZE = NETWORK_LAYER_SIZES[0];
	this.OUTPUT_SIZE = NETWORK_LAYER_SIZES[NETWORK_SIZE - 1];
 
	this.output = new double[NETWORK_SIZE][];
	this.weights = new double[NETWORK_SIZE][][];
	this.bias = new double[NETWORK_SIZE][];
 
	for (int i = 0; i < NETWORK_SIZE; i++) {
 
		this.output[i] = new double[NETWORK_LAYER_SIZES[i]];
		this.bias[i] = new double[NETWORK_LAYER_SIZES[i]];
 
		if (i > 0) {
			weights[i] = new double[NETWORK_LAYER_SIZES[i]][NETWORK_LAYER_SIZES[i - 1]];
		}
	}
}

We now have a basic initialization constructor, but we also need a method that calculates the FEED-FORWARD values. Let’s call it “calculate”. This method takes an array of doubles as its input parameter and returns an array of doubles as its output.

public double[] calculate(double input[]) {
	if (input.length != this.INPUT_SIZE) {
		return null;
	}
 
	this.output[0] = input;
	for (int layer = 1; layer < NETWORK_SIZE; layer++) {
		for (int neuron = 0; neuron < NETWORK_LAYER_SIZES[layer]; neuron++) {
			double sum = bias[layer][neuron];
			for (int prevNeuron = 0; prevNeuron < NETWORK_LAYER_SIZES[layer - 1]; prevNeuron++) {
				sum += output[layer - 1][prevNeuron] * weights[layer][neuron][prevNeuron];
			}
			output[layer][neuron] = sigmoid(sum);
		}
	}
 
	return output[NETWORK_SIZE - 1];
}

The very first IF block makes sure that the input array’s size matches our network’s previously set INPUT_SIZE constant. If it doesn’t, we cannot do any calculations.
The next line simply stores the input values as the output of the first layer, since the input layer doesn’t do any calculations on them.
After that, we have a nested FOR loop that iterates through all the remaining layers and through all the neurons in each layer. For each neuron, we start the sum with the bias, then add each previous-layer output multiplied by its weight in yet another FOR loop. Finally, we apply the sigmoid function to this summed value.

The math for the sigmoid function can be represented by this in Java:

private static double sigmoid(double x) {
	return 1d / (1 + Math.exp( - x));
}

We can quickly make a main method to test the current version of our network with some arbitrary values. Let’s instantiate our network with 5 neurons each in the input and output layers, and two hidden layers containing 4 and 3 neurons. We can feed in some arbitrary input values like 0.2, 0.3, 0.1, 0.2, 0.5. Java has a good number of built-in helper methods to make our life as programmers easier, and “Arrays.toString” (from java.util.Arrays, which needs to be imported) prints all the values of the given array as a nicely formatted, comma-separated list.

public static void main(String[] args) {
	Network net = new Network(new int[]{5,4,3,5});
 
	double[] output = net.calculate(new double[]{0.2,0.3,0.1,0.2,0.5});
 
	System.out.println(Arrays.toString(output));
}

When we run this program, we notice an issue right away. No matter what input values we enter, all five output values are always exactly 0.5. This occurs because Java initializes all the weight and bias values to 0, so every weighted sum is 0 as well, and the sigmoid function returns 0.5 for an input of 0.
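The 0.5 result can be reproduced in isolation. The following is a minimal sketch of a single neuron with all-zero weights and bias; the class name ZeroWeightDemo and the neuronOutput helper are made up for this illustration and are not part of Network.java:

```java
// Hypothetical standalone demo, not part of Network.java.
class ZeroWeightDemo {

	static double sigmoid(double x) {
		return 1d / (1 + Math.exp(-x));
	}

	// Output of a single neuron: sigmoid(bias + sum(input[i] * weights[i])).
	static double neuronOutput(double[] input, double[] weights, double bias) {
		double sum = bias;
		for (int i = 0; i < input.length; i++) {
			sum += input[i] * weights[i];
		}
		return sigmoid(sum);
	}

	public static void main(String[] args) {
		double[] zeroWeights = new double[5]; // Java fills new double arrays with 0.0

		// Two very different inputs, same result: the weighted sum is 0 both times.
		System.out.println(neuronOutput(new double[]{0.2, 0.3, 0.1, 0.2, 0.5}, zeroWeights, 0));
		System.out.println(neuronOutput(new double[]{0.9, 0.9, 0.9, 0.9, 0.9}, zeroWeights, 0));
	}
}
```

Because every input is multiplied by 0, the input values never reach the sigmoid at all, which is why changing them has no effect on the output.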

Fixing the Neural Network’s bias/weight initial values:

We could change the weight/bias initialization lines in our constructor to start with 1, but I will go one step further and randomize those values within a certain range. This gives us more flexible control over the network’s behavior right from the start.

I’m creating a helper class called “NetworkTools” to hold the array-creating, randomizing and related utility methods. These methods will come in handy as we go on, and they are very straightforward to understand. I’ve commented what each one does above its header:

public class NetworkTools {
 
	//every value in this generated array will be the init_value
	public static double[] createArray(int size, double init_value) {
		if (size < 1) {
			return null;
		}
 
		double[] ar = new double[size];
		for (int i = 0; i < size; i++) {
			ar[i] = init_value;
		}
 
		return ar;
	}
 
	//every value in this generated 1 dimensional array will be random number, within a lower and upper bound
	public static double[] createRandomArray(int size, double lower_bound, double upper_bound) {
		if (size < 1) {
			return null;
		}
 
		double[] ar = new double[size];
		for (int i = 0; i < size; i++) {
			ar[i] = randomValue(lower_bound, upper_bound);
		}
		return ar;
	}
 
	//every value in this generated 2 dimensional array will be random number, within a lower and upper bound
	public static double[][] createRandomArray(int sizeX, int sizeY, double lower_bound, double upper_bound) {
		if (sizeX < 1 || sizeY < 1) {
			return null;
		}
 
		double[][] ar = new double[sizeX][sizeY];
		for (int i = 0; i < sizeX; i++) {
			ar[i] = createRandomArray(sizeY, lower_bound, upper_bound);
		}
 
		return ar;
	}
 
	//returns a random double within the desired lower and upper bound
	public static double randomValue(double lower_bound, double upper_bound) {
		return Math.random() * (upper_bound - lower_bound) + lower_bound;
	}
 
	//returns a specific amount of random unique (cannot appear more than once) integers, between the lower and upper bound (both inclusive)
	public static Integer[] randomValues(int lowerBound, int upperBound, int amount) {
 
		//number of distinct integers available in [lowerBound, upperBound]
		int range = upperBound - lowerBound + 1;
 
		if (amount < 0 || amount > range) {
			return null;
		}
 
		Integer[] values = new Integer[amount];
		for (int i = 0; i < amount; i++) {
			//scale first, then shift, so every value in the range is equally likely
			int n = lowerBound + (int)(Math.random() * range);
			while (containsValue(values, n)) {
				n = lowerBound + (int)(Math.random() * range);
			}
			values[i] = n;
		}
		return values;
	}
 
	//receives any datatype array as a first parameter, and checks if the provided second parameter value is contained in it
	public static < T extends Comparable < T >> boolean containsValue(T[] ar, T value) {
		for (int i = 0; i < ar.length; i++) {
			if (ar[i] != null) {
				if (value.compareTo(ar[i]) == 0) {
					return true;
				}
			}
 
		}
		return false;
	}
 
	//returns the highest value's index within the provided double array.
	public static int indexOfHighestValue(double[] values) {
		int index = 0;
		for (int i = 1; i < values.length; i++) {
			if (values[i] > values[index]) {
				index = i;
			}
		}
		return index;
	}
}

So let’s go back to the Network constructor and change the weight and bias initialization lines to produce some random values. The exact ranges don’t matter right now, and the values can be positive or negative:

public Network(int[] NETWORK_LAYER_SIZES) {
	//...
	for (int i = 0; i < NETWORK_SIZE; i++) {
		//...
		this.bias[i] = NetworkTools.createRandomArray(NETWORK_LAYER_SIZES[i], -0.5, 0.7);
 
		if (i > 0) {
			weights[i] = NetworkTools.createRandomArray(NETWORK_LAYER_SIZES[i], NETWORK_LAYER_SIZES[i - 1], -1, 1);
		}
	}
}

Now every time we run the program and make one pass of feed-forwarding calculations, it will give us random output values, proving that the network works as intended. Without a learning algorithm however, the network is fairly useless in this state, so let’s tackle that issue as well.

Adding a learning algorithm to the Neural Network:

For every given input value combination, we need a “target” output value combination as well. With these values we can measure how far the currently calculated output values are from the ones we want. Let’s say that for the previously declared input values (0.2,0.3,0.1,0.2,0.5) we ideally want the network to output (1,0,0,0,0) instead of seemingly random numbers.

As explained in the previous article, this is where Backpropagation, our learning mechanism, comes in. It tries to work out the weight and bias values of each layer that would produce the desired final output. It starts from the last layer and tries to find the weight and bias combination that would bring the output closer to the desired results. Once that is figured out, it jumps to the adjacent previous layer and does the same again, and so on until it gets to the starting layer. We start from the last layer because its weight and bias values have the greatest influence over the final output. Note that by the rules of the network, we can only change the bias and weight values to influence the output values; we cannot change any other values directly.

The differences between the desired output and the current output are called the “error signal”. Before making any changes to the weight/bias values, we finish the backpropagation completely and store this error signal for each layer (except the very first input layer).

Intuitively this seems like a very easy task. Just subtract the target value from the current value, nudge the weights/biases by some positive amount and measure whether we got closer to or further from the desired output. If we got closer, keep adding positive amounts until we reach the desired output; if we got further away, start applying negative amounts instead and keep doing so. This would work perfectly for a simple input range, representing a straightforward curve:

Image: https://i.imgur.com/W77WOyG.png

But unfortunately, as we get more and more input values, the function represented by the whole Neural Network gets significantly more complex as well, and predicting the exact “right” position and the path towards it becomes less and less obvious:

Image: https://i.imgur.com/3LgjejG.png

As you can see, having multiple local minima can easily “fool” the algorithm into thinking that it is going the right way, while in reality it may just be pursuing a local one, which is never going to produce the desired output. You can think of the algorithm as a “heavy ball” for weights. This ball runs down the slope it started on and stops at whatever bottom it happens to find. For instance, if the initial weight value is somewhere between 0 and 0.5, then no matter where exactly it starts, a naïve “heavy ball” approach would slide down to around 0.65 and stop there, while we can clearly see that this would always produce a wrong result. This is the primary reason we use randomized values each time we start the training process instead of setting them to 1, so that every run gives the weights a new chance to settle at the proper global minimum values.
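The “heavy ball” behavior can be tried out on a one-dimensional curve. This is only a rough sketch: the error curve below is made up for the illustration (it is not the curve from the image above), and the class and method names are hypothetical. It has a shallow minimum near w = 0.93 and a deeper one near w = -1.06:

```java
// Hypothetical standalone demo of the "heavy ball" problem, not part of Network.java.
class LocalMinimumDemo {

	// Made-up error curve with two valleys: a shallow one on the right
	// and a deeper (global) one on the left.
	static double error(double w) {
		return w * w * w * w - 2 * w * w + 0.5 * w;
	}

	// Derivative (slope) of the curve above.
	static double slope(double w) {
		return 4 * w * w * w - 4 * w + 0.5;
	}

	// Plain gradient descent: keep stepping downhill from the starting weight.
	static double descend(double start, double learningRate, int steps) {
		double w = start;
		for (int i = 0; i < steps; i++) {
			w -= learningRate * slope(w);
		}
		return w;
	}

	public static void main(String[] args) {
		double fromRight = descend(0.5, 0.01, 10000);
		double fromLeft = descend(-0.5, 0.01, 10000);

		System.out.printf("start  0.5 -> w = %.3f, error = %.3f%n", fromRight, error(fromRight));
		System.out.printf("start -0.5 -> w = %.3f, error = %.3f%n", fromLeft, error(fromLeft));
	}
}
```

The run that starts at 0.5 settles in the shallow valley and never finds the deeper one, while the run that starts at -0.5 does. The only difference between the two runs is the starting weight, which is the argument for random initialization.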

Furthermore, to successfully tackle backpropagation, each neuron needs an “error signal” and an “output derivative” value besides its regular output value. Our backpropagation error function, which calculates how far we are from the target output values, looks like this:
E = ½ (target-output)^2.
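For readers who want the one step that connects this formula to the code, here is a short derivation for a single output neuron. The symbols are my own shorthand, not taken from the article: t is the target, o is the neuron's output, and z is its weighted sum (bias included), so that o = sigmoid(z).

```latex
\begin{aligned}
E &= \tfrac{1}{2}\,(t - o)^2, \qquad o = \sigma(z) = \frac{1}{1 + e^{-z}} \\
\frac{\partial E}{\partial o} &= -(t - o) = o - t \\
\frac{\partial o}{\partial z} &= \sigma(z)\,\bigl(1 - \sigma(z)\bigr) = o\,(1 - o) \\
\frac{\partial E}{\partial z} &= \frac{\partial E}{\partial o}\cdot\frac{\partial o}{\partial z} = (o - t)\; o\,(1 - o)
\end{aligned}
```

The last line is the product of “output minus target” and the “output derivative” o(1 - o), which is why each neuron stores both values.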

I am not going to go into the details of all the related math in this article, because it’s a fairly large subject of its own. But for anyone interested, I can refer you to Ryan Harris on YouTube. He has an excellent tutorial series on backpropagation algorithms that makes the whole concept easier to comprehend:

Neural network tutorial: The back-propagation algorithm (Part 1)

There are many well-written articles on the net about it, and Wikipedia is a great detailed source as well:
https://en.wikipedia.org/wiki/Backpropagation

Alright, going back to coding. We need to declare these two additional variables and initialize them at the constructor before we can use them:

public class Network {
	//...
	private double[][] errorSignal;
	private double[][] outputDerivative;
}
 
public Network(int[] NETWORK_LAYER_SIZES) {
	//...
	this.errorSignal = new double[NETWORK_SIZE][];
	this.outputDerivative = new double[NETWORK_SIZE][];
 
	for (int i = 0; i < NETWORK_SIZE; i++) {
		//...
		this.errorSignal[i] = new double[NETWORK_LAYER_SIZES[i]];
		this.outputDerivative[i] = new double[NETWORK_LAYER_SIZES[i]];
	}
}

The feed-forwarding calculation needs to be updated with these variables as well:

public double[] calculate(double input[]) {
	//...
	for (int layer = 1; layer < NETWORK_SIZE; layer++) {
		for (int neuron = 0; neuron < NETWORK_LAYER_SIZES[layer]; neuron++) {
			//...
			for (int prevNeuron = 0; prevNeuron < NETWORK_LAYER_SIZES[layer - 1]; prevNeuron++) {
				//...
			}
			output[layer][neuron] = sigmoid(sum);
			outputDerivative[layer][neuron] = output[layer][neuron] * (1 - output[layer][neuron]);
		}
	}
	//...
}
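The expression output * (1 - output) is the derivative of the sigmoid function, written in terms of the sigmoid's own output. A quick way to convince yourself is to compare it with a numerical slope estimate. The sketch below does that; the class and method names are hypothetical and only serve this check:

```java
// Hypothetical standalone check, not part of Network.java.
class SigmoidDerivativeCheck {

	static double sigmoid(double x) {
		return 1d / (1 + Math.exp(-x));
	}

	// The shortcut used in calculate(): derivative expressed through the output.
	static double derivativeFromOutput(double out) {
		return out * (1 - out);
	}

	// Central difference estimate of the slope of sigmoid at x.
	static double numericalDerivative(double x) {
		double h = 1e-5;
		return (sigmoid(x + h) - sigmoid(x - h)) / (2 * h);
	}

	public static void main(String[] args) {
		for (double x : new double[]{-2, 0, 0.5, 3}) {
			System.out.printf("x = %5.2f  shortcut = %.8f  numerical = %.8f%n",
					x, derivativeFromOutput(sigmoid(x)), numericalDerivative(x));
		}
	}
}
```

The two columns should agree to many decimal places. The practical benefit of the shortcut is that no extra call to Math.exp is needed, because the output is already available at that point in the feed-forward pass.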

Calculating the error needs another method. We will call it “backpropError”; it receives the target output array and does the error calculations for each layer, starting from the last one:

public void backpropError(double[] target) {
	for (int neuron = 0; neuron < NETWORK_LAYER_SIZES[NETWORK_SIZE - 1]; neuron++) { //for output layer neurons
		errorSignal[NETWORK_SIZE - 1][neuron] = (output[NETWORK_SIZE - 1][neuron] - target[neuron]) * outputDerivative[NETWORK_SIZE - 1][neuron];
	}
 
	for (int layer = NETWORK_SIZE - 2; layer > 0; layer--) { //for hidden layer neurons
		for (int neuron = 0; neuron < NETWORK_LAYER_SIZES[layer]; neuron++) {
			double sum = 0;
			for (int nextNeuron = 0; nextNeuron < NETWORK_LAYER_SIZES[layer + 1]; nextNeuron++) {
				sum += weights[layer + 1][nextNeuron][neuron] * errorSignal[layer + 1][nextNeuron];
			}
			this.errorSignal[layer][neuron] = sum * outputDerivative[layer][neuron];
		}
	}
}

Once we have these values, we can finally update the weights and biases of our network. We need another method for this; let’s call it “updateWeightsAndBiases”. It receives one parameter, the “learning rate”. The learning rate is just a ratio that indicates how bold the learning algorithm should be when nudging those values in a positive or negative direction. Setting it too small produces a much slower learning process, but setting it too high may make the updates overshoot and produce anomalies in the calculations, which slows the whole learning process down again.

public void updateWeightsAndBiases(double learningRate) {
	for (int layer = 1; layer < NETWORK_SIZE; layer++) {
		for (int neuron = 0; neuron < NETWORK_LAYER_SIZES[layer]; neuron++) {
			//for bias
			double delta = -learningRate * errorSignal[layer][neuron];
			bias[layer][neuron] += delta;
 
			//for weights
			for (int prevNeuron = 0; prevNeuron < NETWORK_LAYER_SIZES[layer - 1]; prevNeuron++) {
				//weights[layer, neuron, prevNeuron]
				weights[layer][neuron][prevNeuron] += delta * output[layer - 1][prevNeuron];
			}
		}
	}
}
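To get a feel for the learning rate, the same update rule can be applied to a single neuron with a single input. This is a simplified sketch with made-up starting values (weight 0.5, bias 0, input 1, target 1) and a hypothetical class name. It only demonstrates the “too small is slow” side; the overshooting that a too-large rate can cause on a full network does not show up in such a tiny setup:

```java
// Hypothetical standalone demo, not part of Network.java.
class LearningRateDemo {

	static double sigmoid(double x) {
		return 1d / (1 + Math.exp(-x));
	}

	// Trains one sigmoid neuron for the given number of steps and
	// returns the remaining error E = 1/2 * (target - output)^2.
	static double trainSingleNeuron(double learningRate, int steps) {
		double weight = 0.5, bias = 0.0; // arbitrary starting values
		double input = 1.0, target = 1.0; // arbitrary training example

		for (int i = 0; i < steps; i++) {
			double out = sigmoid(weight * input + bias);
			double errorSignal = (out - target) * out * (1 - out);

			// Same update rule as in updateWeightsAndBiases.
			double delta = -learningRate * errorSignal;
			bias += delta;
			weight += delta * input;
		}

		double out = sigmoid(weight * input + bias);
		return 0.5 * (target - out) * (target - out);
	}

	public static void main(String[] args) {
		for (double rate : new double[]{0.01, 0.3, 1.0}) {
			System.out.printf("learning rate %.2f -> error after 100 steps: %.6f%n",
					rate, trainSingleNeuron(rate, 100));
		}
	}
}
```

With the same 100 steps, the smallest rate leaves the largest error behind, because each individual nudge is tiny.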

Next, let’s have a method that makes our life easier and connects all these learning functionalities together. Let’s call it “train”. It receives the input array, the target array and a learning rate, and runs all the previously mentioned calculations:

public void train(double[] input, double[] target, double learningRate) {
	if (input.length != INPUT_SIZE || target.length != OUTPUT_SIZE) {
		return;
	}
 
	calculate(input);
	backpropError(target);
	updateWeightsAndBiases(learningRate);
}

We are now set to use our learning algorithm! Let’s change the main method to do so. We can use a similar network setup, for instance an input layer of 3 neurons with the values 0.1, 0.5, 0.2, and an output layer of 5 neurons, where the expected output for this input combination is 0, 1, 0, 0, 0. The FOR loop sets the number of times we run the learning algorithm and apply the changes to the weights/biases.

public static void main(String[] args) {
	Network net = new Network(new int[]{3,4,3,5});
 
	double[] input = new double[]{0.1, 0.5, 0.2};
    double[] expected = new double[]{0, 1, 0, 0, 0};
 
	for (int i = 0; i < 1; i++) {
		net.train(input, expected, 1); //input, target, learningRate
	}
 
	double[] output = net.calculate(input);
 
	System.out.print("  current output neuron values: ");
	for (double neuronValue: output) {
		System.out.printf("%02.3f  ", neuronValue);
	}
 
	System.out.printf("\n expected output neuron values: ");
	for (double neuronValue: expected) {
		System.out.printf("%02.3f  ", neuronValue);
	}
}

Running the program gives us a result that seems far from the desired one:

output:
  current output neuron values: 0.744  0.667  0.566  0.576  0.388  
 expected output neuron values: 0.000  1.000  0.000  0.000  0.000

This is reasonable again: we ran the learning algorithm only once, and as we already know, it’s virtually impossible to “guess” the right weight/bias combination in one go. The network needs to try and measure over and over until it gets closer and closer. Let’s try running the learning algorithm 10 times, for instance, by changing the FOR loop limit:

output:
  current output neuron values: 0.143  0.791  0.226  0.224  0.242  
 expected output neuron values: 0.000  1.000  0.000  0.000  0.000 

We can see that the output is actually getting closer to the desired values this time. The ones that should be zero are around 0.2, and the one that should be 1 is almost 0.8. OK, let’s try running the learning algorithm, say, 10,000 times:

output:
  current output neuron values: 0.004  0.996  0.004  0.004  0.004  
 expected output neuron values: 0.000  1.000  0.000  0.000  0.000  

We can see that the values are getting really close to the desired ones, and the more we train the network, the more accurate it becomes. It is up to us how close we want to get to the desired values before we can safely say that the network knows the right output for the right input, and how much processing power we want to trade in for the training. You can imagine that with a large network and a large amount of input data, a couple of million iterations can take hours or even days.
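One way to put a number on “how close” is to condense the output and target arrays into a single value, for example the mean squared error, and stop training once it drops below a threshold of our choosing. The helper below is a suggestion of mine rather than part of the article's codebase; the class and method names are hypothetical. The sample values are the three outputs printed above:

```java
// Hypothetical helper, not part of the article's Network or NetworkTools classes.
class ErrorMeasure {

	// Mean of the squared differences between the network output and the target.
	static double meanSquaredError(double[] output, double[] target) {
		if (output.length != target.length) {
			throw new IllegalArgumentException("output and target must have the same length");
		}
		double sum = 0;
		for (int i = 0; i < output.length; i++) {
			double diff = output[i] - target[i];
			sum += diff * diff;
		}
		return sum / output.length;
	}

	public static void main(String[] args) {
		double[] expected = {0, 1, 0, 0, 0};

		// Outputs copied from the three runs shown in the article.
		double[] afterOne = {0.744, 0.667, 0.566, 0.576, 0.388};
		double[] afterTen = {0.143, 0.791, 0.226, 0.224, 0.242};
		double[] afterTenThousand = {0.004, 0.996, 0.004, 0.004, 0.004};

		System.out.println("after 1 iteration:       " + meanSquaredError(afterOne, expected));
		System.out.println("after 10 iterations:     " + meanSquaredError(afterTen, expected));
		System.out.println("after 10,000 iterations: " + meanSquaredError(afterTenThousand, expected));
	}
}
```

A training loop could then call net.calculate(input) every so often and break out as soon as the error falls below, say, 0.0001. The threshold itself is a trade-off between accuracy and training time, as described above.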

Multiple Input and Output Sets for our Neural Network:

In most cases, we have a large number of different input sets, and each of them needs to produce a given target output set. We could have a separate variable for each input array and desired output array, but you can imagine that this would get tedious even for a couple of hundred values, not to mention millions.

To tackle this issue, we need to be as efficient as possible and create a new class that can contain and work with a great many input arrays and their corresponding expected output values. Let’s call it “TrainSet”. I’m only going to talk briefly about a few of its methods, because most of them are straightforward to understand just by looking at them.

We have a constructor that accepts the input and output size, which represent the number of neurons in the network’s input and output layers.

“addData(input[], expected[])” expects two parameters: the first is the input array to insert, and the second is the expected output array for it. You can call this method as many times as you need and add as many input/expected array combinations to the set as you like, for instance with a simple FOR loop.

“getInput(index)” and “getOutput(index)” return the input/expected array stored at the given index.

“extractBatch” gives us the ability to extract only a randomly chosen part of the preloaded set instead of all of it. This can be handy, for instance, if we have 7000 entries in the set but would like to work with only 20 for a given task.

The main method just generates some random input/expected values, stores them in the set with the help of a FOR loop, and prints them at the end as a demonstration.

import java.util.ArrayList;
import java.util.Arrays;
 
public class TrainSet {
 
	public final int INPUT_SIZE;
	public final int OUTPUT_SIZE;
 
	//double[][] <- index1: 0 = input, 1 = output || index2: index of element
	private ArrayList < double[][] > data = new ArrayList < >();
 
	public TrainSet(int INPUT_SIZE, int OUTPUT_SIZE) {
		this.INPUT_SIZE = INPUT_SIZE;
		this.OUTPUT_SIZE = OUTPUT_SIZE;
	}
 
	//adds new data to the data set
	public void addData(double[] in, double[] expected) {
		if (in.length != INPUT_SIZE || expected.length != OUTPUT_SIZE) return;
		data.add(new double[][] {in, expected});
	}
 
	public TrainSet extractBatch(int size) {
		if (size > 0 && size <= this.size()) {
			TrainSet set = new TrainSet(INPUT_SIZE, OUTPUT_SIZE);
			Integer[] ids = NetworkTools.randomValues(0, this.size() - 1, size);
			for (Integer i: ids) {
				set.addData(this.getInput(i), this.getOutput(i));
			}
			return set;
		} else return this;
	}
 
	public static void main(String[] args) {
		TrainSet set = new TrainSet(3, 2);
 
		for (int i = 0; i < 8; i++) {
			double[] a = new double[3];
			double[] b = new double[2];
			for (int k = 0; k < 3; k++) {
				a[k] = (double)((int)(Math.random() * 10)) / (double) 10;
				if (k < 2) {
					b[k] = (double)((int)(Math.random() * 10)) / (double) 10;
				}
			}
			set.addData(a, b);
		}
 
		System.out.println(set);
	}
 
	public String toString() {
		String s = "TrainSet [" + INPUT_SIZE + " ; " + OUTPUT_SIZE + "]\n";
		int index = 0;
		for (double[][] r: data) {
			s += index + ":   " + Arrays.toString(r[0]) + "  >-||-<  " + Arrays.toString(r[1]) + "\n";
			index++;
		}
		return s;
	}
 
	//how many data sets we got
	public int size() {
		return data.size();
	}
 
	//gets the input set from a certain index on the data set 
	public double[] getInput(int index) {
		if (index >= 0 && index < size()) return data.get(index)[0];
		else return null;
	}
 
	//gets the output set from a certain index on the data set 
	public double[] getOutput(int index) {
		if (index >= 0 && index < size()) return data.get(index)[1];
		else return null;
	}
 
	public int getINPUT_SIZE() {
		return INPUT_SIZE;
	}
 
	public int getOUTPUT_SIZE() {
		return OUTPUT_SIZE;
	}
}

Going back to our Network class, let’s create a method called “trainWithSet”. This method will accept a whole trainset to work with, a number of training loops we would like to go through the whole set, and the batchSize we would like to work with:

public void trainWithSet(TrainSet set, int loops, int batchSize) {
	if (set.INPUT_SIZE != INPUT_SIZE || set.OUTPUT_SIZE != OUTPUT_SIZE) {
		return;
	}
 
	for (int i = 0; i < loops; i++) {
		TrainSet batch = set.extractBatch(batchSize);
 
		//use the batch's real size: extractBatch returns the whole set when batchSize is out of range
		for (int b = 0; b < batch.size(); b++) {
			this.train(batch.getInput(b), batch.getOutput(b), 0.3);
		}
	}
}

We need a new main method to handle training sets, so let’s make one:

public static void main(String[] args) {
	Network net = new Network(new int[] {
		5,
		3,
		3,
		2
	});
 
	TrainSet set = new TrainSet(5, 2);
 
	set.addData(new double[]{0.1,0.2,0.3,0.4,0.5}, new double[]{0.9,0.1});
	set.addData(new double[]{0.9,0.8,0.7,0.6,0.2}, new double[]{0.1,0.9});
	set.addData(new double[]{0.3,0.8,0.7,0.4,0.1}, new double[]{0.3,0.7});
	set.addData(new double[]{0.9,0.3,0.4,0.5,0.6}, new double[]{0.7,0.3});
	set.addData(new double[]{0.2,0.9,0.4,0.2,0.4}, new double[]{0.2,0.4});
    set.addData(new double[]{0.1,0.1,0.9,0.9,0.9}, new double[]{0.5,0.5});
 
 
	net.trainWithSet(set, 1, 6);
 
	for (int i = 0; i < 6; i++) {
		System.out.println(Arrays.toString(net.calculate(set.getInput(i))));
	}
}

I’ve made a network with 5 neurons in the input layer, 2 in the output layer and 3 in each of the hidden layers. For this example this is sufficient, but this is the point where we need to think about the size of the hidden layers. If we define too few neurons here, the network won’t have enough room to “store” a very large number of data combinations, because new input values that adjust the weights and biases can override the already properly set ones, resulting in never-ending trial and error iterations that never produce an accurate result for all the desired values.

On the other hand, an overly large network is extremely slow to work with, and significantly slower to train as well.
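To make the size trade-off more tangible, we can count how many adjustable values (weights plus biases) a given layer layout contains. The helper below is my own illustration with a hypothetical class name, and the wider {5, 30, 30, 2} layout is made up purely for comparison. It counts only the values that training actually changes; the Network class also allocates a bias array for the input layer, but never uses or updates it.

```java
// Hypothetical helper, not part of the article's Network class.
class ParameterCount {

	// Counts the weights and biases of every layer after the input layer.
	static int countParameters(int[] layerSizes) {
		int total = 0;
		for (int layer = 1; layer < layerSizes.length; layer++) {
			total += layerSizes[layer] * layerSizes[layer - 1]; // one weight per connection
			total += layerSizes[layer];                         // one bias per neuron
		}
		return total;
	}

	public static void main(String[] args) {
		System.out.println(countParameters(new int[]{5, 3, 3, 2}));   // the network used above
		System.out.println(countParameters(new int[]{5, 30, 30, 2})); // made-up wider variant
	}
}
```

The small network has 38 adjustable values, while the made-up wider one has 1172. Every training step has to touch all of them, so the cost per step grows with this number.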

So we instantiated a new TrainSet in the main method, with the same input and output sizes as our network. We added 6 data entries, each containing an input set and the expected output set for it.

Finally, we feed each input set to the network (in our example, that’s 6 entries to loop through) and verify whether it produces the expected output values after the training.

If we run our program, we can see that the values are nowhere near the expected ones:

output:
[0.5804082848557195, 0.611540224159821]
[0.5793365763365534, 0.6152768910621293]
[0.5792965979397438, 0.6135348171779963]
[0.5798916531162135, 0.6149473277301244]
[0.5799083251987178, 0.6139233850164606]
[0.5825736971539314, 0.6127042166996858]

This is fully expected now that we know how the training process works. Let’s crank up the training to run 1000 times:

output:
[0.8269183006326297, 0.1488344744465759]
[0.13914142176572553, 0.7727892885721617]
[0.16591764093195452, 0.7372080691272994]
[0.6711814111500592, 0.27195100535567884]
[0.3606707137762615, 0.5284685374836402]
[0.4670048361628152, 0.43512742680336075]

We can see that the numbers converge closer and closer to the expected output values the more training we do before testing. This is fully expected again. Let’s try 100,000 training iterations anyway:

output:
[0.9000000328097575, 0.10000019245068295]
[0.10000119473600338, 0.9000012484104274]
[0.29999933445396226, 0.6999994676152578]
[0.6999999906432556, 0.29999997748811974]
[0.2000002150797167, 0.39999998997508834]
[0.5000000439132515, 0.500000037083972]

Yep, as expected, all the tested output values get extremely close to the expected values after this many training iterations.

You can see where we are going with this. Referring back to the previous article again, we can use this to train the network with a large number of handwritten single-digit numbers, so that it can “guess” the handwritten sample that we provide to it.

Finally, training for handwriting recognition with MNIST data set:

MNIST is a large, open and free database of handwritten digit images and their corresponding labels. It has a training set of 60,000 examples and a test set of 10,000 examples:
http://yann.lecun.com/exdb/mnist/
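Since our network works with arrays of doubles, each digit label has to be turned into a target array before training, and the network's output has to be turned back into a digit afterwards. The sketch below shows one common way of doing this. It assumes an output layer of 10 neurons, one per digit, which the article has not specified at this point. The class name and targetForDigit are hypothetical, while indexOfHighestValue mirrors the helper from NetworkTools:

```java
import java.util.Arrays;

// Hypothetical helper, assuming a 10-neuron output layer (one neuron per digit).
class DigitTargets {

	// Turns a label 0-9 into a target array with 1.0 at the label's position.
	static double[] targetForDigit(int digit) {
		if (digit < 0 || digit > 9) {
			throw new IllegalArgumentException("digit must be between 0 and 9");
		}
		double[] target = new double[10]; // all zeros by default
		target[digit] = 1.0;
		return target;
	}

	// Same logic as NetworkTools.indexOfHighestValue: the strongest neuron wins.
	static int indexOfHighestValue(double[] values) {
		int index = 0;
		for (int i = 1; i < values.length; i++) {
			if (values[i] > values[index]) {
				index = i;
			}
		}
		return index;
	}

	public static void main(String[] args) {
		System.out.println(Arrays.toString(targetForDigit(3)));

		// Made-up network output where neuron 3 is the most active one.
		double[] output = {0.01, 0.02, 0.10, 0.93, 0.01, 0.05, 0.00, 0.02, 0.08, 0.01};
		System.out.println("recognized digit: " + indexOfHighestValue(output));
	}
}
```

This is the same shape as the (0, 1, 0, 0, 0) target used earlier, just with 10 positions instead of 5.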

Let’s download the training set of images and the training set of labels and store them in the /res folder. We can make another file here, called number.png, a 28*28 pixel image that will eventually contain our own handwritten digit to test.

We will make several classes to work with the MNIST dataset values and connect them with our network. First the “MnistDbFile.java” to help us work with the database files:

import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.RandomAccessFile;
 
/**
 * MNIST database file containing entries that can represent image or label
 * data. Extends the standard random access file with methods for navigating
 * over the entries. The file format is basically idx with specific header
 * information. This includes a magic number for determining the type of stored
 * entries, count of entries.
 */
public abstract class MnistDbFile extends RandomAccessFile {
	private int count;
 
	/**
     * Creates new instance and reads the header information.
     * 
     * @param name
     *            the system-dependent filename
     * @param mode
     *            the access mode
     * @throws IOException
     * @throws FileNotFoundException
     * @see RandomAccessFile
     */
	public MnistDbFile(String name, String mode) throws IOException {
		super(name, mode);
		if (getMagicNumber() != readInt()) {
			throw new RuntimeException("This MNIST DB file " + name + " should start with the number " + getMagicNumber() + ".");
		}
		count = readInt();
	}
 
	/**
     * MNIST DB files start with unique integer number.
     * 
     * @return integer number that should be found in the beginning of the file.
     */
	protected abstract int getMagicNumber();
 
	/**
     * The current entry index. Entries are counted from 1.
     * 
     * @return long
     * @throws IOException
     */
	public long getCurrentIndex() throws IOException {
		return (getFilePointer() - getHeaderSize()) / getEntryLength() + 1;
	}
 
	/**
     * Set the required current entry index.
     * 
     * @param curr
     *            the entry index, from 1 to the number of entries
     */
	public void setCurrentIndex(long curr) {
		try {
			// index 0 would seek into the header, so the valid range starts at 1
			if (curr < 1 || curr > count) {
				throw new RuntimeException(curr + " is not in the range 1 to " + count);
			}
			seek(getHeaderSize() + (curr - 1) * getEntryLength());
		} catch(IOException e) {
			throw new RuntimeException(e);
		}
	}
 
	public int getHeaderSize() {
		return 8; // two integers
	}
 
	/**
     * Number of bytes for each entry.
     * Defaults to 1.
     * 
     * @return int
     */
	public int getEntryLength() {
		return 1;
	}
 
	/**
     * Move to the next entry.
     * 
     * @throws IOException
     */
	public void next() throws IOException {
		if (getCurrentIndex() < count) {
			skipBytes(getEntryLength());
		}
	}
 
	/**
     * Move to the previous entry.
     * 
     * @throws IOException
     */
	public void prev() throws IOException {
		// the first entry has index 1; stepping back from it would land in the header
		if (getCurrentIndex() > 1) {
			seek(getFilePointer() - getEntryLength());
		}
	}
 
	public int getCount() {
		return count;
	}
}
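
The header check in the constructor above can be tried without downloading anything. Below is a minimal sketch that builds a tiny label file in memory and reads it back. It assumes the layout the classes in this article rely on: a big-endian magic number (2049 for labels), then the entry count, then one byte per label. The class and method names (IdxHeaderDemo, buildLabelFile, readLabelFile) are made up for this illustration and are not part of the article's code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

class IdxHeaderDemo {
    // same magic number that MnistLabelFile.getMagicNumber() returns
    static final int LABEL_MAGIC = 2049;

    /** Writes magic number, entry count and one byte per label into a byte array. */
    static byte[] buildLabelFile(int[] labels) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(LABEL_MAGIC);      // 4 bytes, big-endian
            out.writeInt(labels.length);    // 4 bytes, big-endian
            for (int label : labels) {
                out.writeByte(label);       // 1 byte per entry
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Checks the magic number, reads the count, then reads that many labels. */
    static int[] readLabelFile(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int magic = in.readInt();
            if (magic != LABEL_MAGIC) {
                throw new IllegalArgumentException("Unexpected magic number " + magic);
            }
            int count = in.readInt();
            int[] labels = new int[count];
            for (int i = 0; i < count; i++) {
                labels[i] = in.readUnsignedByte();
            }
            return labels;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] labels = readLabelFile(buildLabelFile(new int[]{5, 0, 4}));
        System.out.println(Arrays.toString(labels)); // prints [5, 0, 4]
    }
}
```

The 8 header bytes written here are the same "two integers" that getHeaderSize() skips over.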

Next is the “MnistImageFile.java” to work with the images in the database:

import java.io.FileNotFoundException;
import java.io.IOException;
 
/**
 * 
 * MNIST database image file. Contains additional header information for the
 * number of rows and columns per each entry.
 * 
 */
public class MnistImageFile extends MnistDbFile {
	private int rows;
	private int cols;
 
	/**
     * Creates new MNIST database image file ready for reading.
     * 
     * @param name
     *            the system-dependent filename
     * @param mode
     *            the access mode
     * @throws IOException
     * @throws FileNotFoundException
     */
	public MnistImageFile(String name, String mode) throws FileNotFoundException,
	IOException {
		super(name, mode);
 
		// read header information
		rows = readInt();
		cols = readInt();
	}
 
	/**
     * Reads the image at the current position.
     * 
     * @return matrix representing the image
     * @throws IOException
     */
	public int[][] readImage() throws IOException {
		int[][] dat = new int[getRows()][getCols()];
		// pixels are stored row by row, so the outer loop runs over the rows
		for (int i = 0; i < getRows(); i++) {
			for (int j = 0; j < getCols(); j++) {
				dat[i][j] = readUnsignedByte();
			}
		}
		return dat;
	}
 
	/**
     * Move the cursor to the next image.
     * 
     * @throws IOException
     */
	public void nextImage() throws IOException {
		super.next();
	}
 
	/**
     * Move the cursor to the previous image.
     * 
     * @throws IOException
     */
	public void prevImage() throws IOException {
		super.prev();
	}
 
	@Override
	protected int getMagicNumber() {
		return 2051;
	}
 
	/**
     * Number of rows per image.
     * 
     * @return int
     */
	public int getRows() {
		return rows;
	}
 
	/**
     * Number of columns per image.
     * 
     * @return int
     */
	public int getCols() {
		return cols;
	}
 
	@Override
	public int getEntryLength() {
		return cols * rows;
	}
 
	@Override
	public int getHeaderSize() {
		return super.getHeaderSize() + 8; // two more integers - rows and columns
	}
}

Next is the “MnistLabelFile.java” to help us work with the labels over the database:

import java.io.FileNotFoundException;
import java.io.IOException;
 
/**
 * 
 * MNIST database label file.
 * 
 */
public class MnistLabelFile extends MnistDbFile {
 
	/**
     * Creates new MNIST database label file ready for reading.
     * 
     * @param name
     *            the system-dependent filename
     * @param mode
     *            the access mode
     * @throws IOException
     * @throws FileNotFoundException
     */
	public MnistLabelFile(String name, String mode) throws IOException {
		super(name, mode);
	}
 
	/**
     * Reads the integer at the current position.
     * 
     * @return integer representing the label
     * @throws IOException
     */
	public int readLabel() throws IOException {
		return readUnsignedByte();
	}
 
	/** Read the specified number of labels from the current position*/
	public int[] readLabels(int num) throws IOException {
		int[] out = new int[num];
		for (int i = 0; i < num; i++) out[i] = readLabel();
		return out;
	}
 
	@Override
	protected int getMagicNumber() {
		return 2049;
	}
}

And finally, the “Mnist.java” file, that will contain our main method to run the training algorithms, connect them with the training sets, and finally test our handwritten number and try to guess its value:

import java.awt.Color;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
 
import javax.imageio.ImageIO;
 
public class Mnist {
 
	public static void main(String[] args) {
		Network network = new Network(new int[]{784, 70, 35, 10});
 
		TrainSet set = createTrainSet(0, 5000);
		trainData(network, set, 10, 20, 5000);
 
		testMyImage(network);
	}
 
	public static void testMyImage(Network net) {
		BufferedImage img = null;
		try {
			img = ImageIO.read(new File("res/number.png"));
		} catch(IOException e) {
			e.printStackTrace();
		}
		if (img == null) {
			// reading failed, or ImageIO found no decoder for the file
			System.out.println("Could not load res/number.png, skipping the test.");
			return;
		}
		double[] input = new double[784];
		for (int i = 0; i < 28; i++) {
			for (int n = 0; n < 28; n++) {
				// i is the column (x), n is the row (y); the red channel serves as the gray value, scaled to 0-1
				input[n * 28 + i] = new Color(img.getRGB(i, n)).getRed() / 256d;
			}
		}
 
		System.out.print("output neuron values: ");
		double[] output = net.calculate(input);
		for (double neuronValue: output) {
			System.out.printf("%02.3f  ", neuronValue);
		}
		System.out.println();
		System.out.print("corresponding number:     0      1      2      3      4      5      6      7      8      9");
		System.out.println();
 
		System.out.println("I think, that the handwritten number is: " + NetworkTools.indexOfHighestValue(output) + "!");
	}
 
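	public static TrainSet createTrainSet(int start, int end) {
 
		TrainSet set = new TrainSet(28 * 28, 10);
 
		String path = new File("").getAbsolutePath();
 
		// open both files read-only; try-with-resources closes them when we are done
		try (MnistImageFile m = new MnistImageFile(path + "/res/trainImage.idx3-ubyte", "r");
				MnistLabelFile l = new MnistLabelFile(path + "/res/trainLabel.idx1-ubyte", "r")) {
 
			// entry indices in MnistDbFile are counted from 1, so entry "start" sits at index start + 1
			m.setCurrentIndex(start + 1);
			l.setCurrentIndex(start + 1);
 
			for (int i = start; i <= end; i++) {
				if (i % 100 == 0) {
					System.out.println("prepared: " + i);
				}
 
				double[] input = new double[28 * 28];
				double[] output = new double[10];
 
				// only the output neuron of the correct digit is set to 1
				output[l.readLabel()] = 1d;
				for (int j = 0; j < 28 * 28; j++) {
					input[j] = (double) m.read() / (double) 256; // pixel values are 0-255, but we want 0-1 for our network to learn.
				}
 
				set.addData(input, output);
				// no next() calls here: the reads above already moved both files to the next entry,
				// calling next() as well would skip every second image and label
			}
		} catch(Exception e) {
			e.printStackTrace();
		}
 
		return set;
	}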
 
	public static void trainData(Network net, TrainSet set, int epochs, int loops, int batch_size) {
		for (int e = 0; e < epochs; e++) {
			net.trainWithSet(set, loops, batch_size);
			System.out.println(">>>>>>>>>>>>>>>>>>>>>>>>> epoch: " + e + " <<<<<<<<<<<<<<<<<<<<<<<<<<");
		}
	}
}

Let’s take a look at the main method in the “Mnist.java” class. The input layer has 784 neurons, each one representing a single pixel value of a written digit (from the 28*28 images). The output layer has 10 neurons, each one representing a single digit from 0 to 9. The two hidden layers are set to 70 and 35 neurons in this case.
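
To make the "one output neuron per digit" idea concrete, here is a small sketch of how a label can be turned into the 10-element target array, the same way createTrainSet does it with output[l.readLabel()] = 1d. The class and method names (OneHotDemo, oneHot) are invented for this example.

```java
import java.util.Arrays;

class OneHotDemo {

    /** Returns an array of zeros with a single 1 at the position of the label. */
    static double[] oneHot(int label, int classes) {
        if (label < 0 || label >= classes) {
            throw new IllegalArgumentException("label " + label + " is outside 0.." + (classes - 1));
        }
        double[] target = new double[classes]; // Java fills new arrays with 0.0
        target[label] = 1d;
        return target;
    }

    public static void main(String[] args) {
        // the digit 3 becomes a target where only output neuron 3 should fire
        System.out.println(Arrays.toString(oneHot(3, 10)));
        // prints [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    }
}
```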

The “createTrainSet” method lets us load only a range of the 60,000 training values. Keeping this range reasonably small reduces the time needed to load the training data. In this example, we are using the values from 0 to 5000 out of the 60,000.

The “trainData” method does all the training steps. We pass it the created neural network, then the created TrainSet. After that, we pass the number of “epochs” we want to loop through, then the number of “iterations”. Finally, we pass the batch size we would like to work with. We preloaded 5000 images, so we might as well pass all of those, but we can always choose a smaller or bigger number.

By “iteration” we mean the number of times we loop through the training method, as we did in the previous examples. “Epoch”, on the other hand, means the number of times we rerun all the iteration loops over the whole working dataset. You can think of the two terms as nested loops, with “iterations” being the inner loop and “epochs” being the outer loop. The final results get more accurate the more training iterations and epochs we run, and the more unique data we train on. Naturally, the larger these numbers get, the slower the whole training process becomes.
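
The nested loops can be sketched as a simple counter, which also shows why training gets slow quickly. This is only an illustration and it assumes that each iteration runs one training step for every sample in the batch; the real work happens inside net.trainWithSet. The names EpochLoopDemo and countTrainingSteps are made up for this example.

```java
class EpochLoopDemo {

    /** Counts the single-sample training steps for the given settings. */
    static long countTrainingSteps(int epochs, int loops, int batchSize) {
        long steps = 0;
        for (int e = 0; e < epochs; e++) {              // outer loop: epochs
            for (int i = 0; i < loops; i++) {           // inner loop: iterations
                for (int b = 0; b < batchSize; b++) {   // assumed: one step per sample in the batch
                    steps++;                            // stands in for one train(input, target) call
                }
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        // the values passed to trainData in the main method: 10 epochs, 20 loops, batch of 5000
        System.out.println(countTrainingSteps(10, 20, 5000)); // prints 1000000
    }
}
```

Under that assumption, doubling any one of the three numbers doubles the training time.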

The “testMyImage” method loads our custom handwritten image and tries to guess its digit value, based on the network’s trained knowledge. Let’s write any digit with the mouse, using a white brush over a completely black background. The pixels will become input values in the range from 0 (completely black) to 1 (completely white). In my case, I wrote the number 3:

Image: https://i.imgur.com/00CJKGH.png
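
The conversion from such an image to network input can be tried on its own. The sketch below draws a single white pixel on a black 28*28 image in memory and flattens it row by row, using the red channel and the same division by 256 as testMyImage. The class and method names (ImageToInputDemo, toInput) are invented for this illustration.

```java
import java.awt.Color;
import java.awt.image.BufferedImage;

class ImageToInputDemo {

    /** Flattens an image row by row into values between 0 (black) and just below 1 (white). */
    static double[] toInput(BufferedImage img) {
        int width = img.getWidth();
        int height = img.getHeight();
        double[] input = new double[width * height];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                // on a gray image all three channels are equal, so red is enough
                int red = new Color(img.getRGB(x, y)).getRed();
                input[y * width + x] = red / 256.0;
            }
        }
        return input;
    }

    public static void main(String[] args) {
        // a new TYPE_INT_RGB image starts out completely black
        BufferedImage img = new BufferedImage(28, 28, BufferedImage.TYPE_INT_RGB);
        img.setRGB(14, 3, Color.WHITE.getRGB()); // one white pixel at column 14, row 3

        double[] input = toInput(img);
        System.out.println(input.length);        // prints 784
        System.out.println(input[3 * 28 + 14]);  // prints 0.99609375
    }
}
```

Because of the division by 256, a fully white pixel ends up at 255/256 rather than exactly 1, which matches how the training images are scaled in createTrainSet.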

Let’s run the program. As you can see, this takes significantly longer than our previous small examples, since we are working with a larger amount of data over a larger network. This is a good time to mention that training normally only needs to happen once, even if it takes hours or days. Once the network is properly trained, all the weight and bias values can be serialized and saved to a file, for instance, so every time we want to read a new handwritten image and ask the network for its guessed digit value, it can process it almost instantaneously. I will not go into the details of serialization in this article, but will assume that the reader at this level knows what I’m talking about, or can figure it out very easily.
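
For readers who want a starting point anyway, here is one possible sketch using plain Java serialization. It assumes the trained values live in a double[][][] for the weights and a double[][] for the biases; the Parameters holder and the ParameterStoreDemo class are hypothetical and not part of the article's Network class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

class ParameterStoreDemo {

    /** Hypothetical holder for the trained values. */
    static class Parameters implements Serializable {
        private static final long serialVersionUID = 1L;
        final double[][][] weights; // assumed layout: [layer][neuron][previous neuron]
        final double[][] biases;    // assumed layout: [layer][neuron]

        Parameters(double[][][] weights, double[][] biases) {
            this.weights = weights;
            this.biases = biases;
        }
    }

    /** Serializes the parameters into a byte array. */
    static byte[] toBytes(Parameters parameters) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(parameters);
        } catch (IOException e) {
            throw new IllegalStateException("could not serialize parameters", e);
        }
        return bytes.toByteArray();
    }

    /** Restores parameters that were written by toBytes. */
    static Parameters fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (Parameters) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not restore parameters", e);
        }
    }

    public static void main(String[] args) {
        Parameters trained = new Parameters(
                new double[][][]{{{0.5, -0.25}}},
                new double[][]{{0.1}});
        Parameters restored = fromBytes(toBytes(trained));
        System.out.println(Arrays.deepToString(restored.weights)); // prints [[[0.5, -0.25]]]
    }
}
```

The resulting byte array can be written to and read from disk with java.nio.file.Files.write and Files.readAllBytes, so the slow training run only has to happen once.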

output:
prepared: 0
prepared: 100
prepared: 200
prepared: 300
prepared: 400
prepared: 500
prepared: 600
prepared: 700
prepared: 800
prepared: 900
prepared: 1000
prepared: 1100
prepared: 1200
prepared: 1300
prepared: 1400
prepared: 1500
prepared: 1600
prepared: 1700
prepared: 1800
prepared: 1900
prepared: 2000
prepared: 2100
prepared: 2200
prepared: 2300
prepared: 2400
prepared: 2500
prepared: 2600
prepared: 2700
prepared: 2800
prepared: 2900
prepared: 3000
prepared: 3100
prepared: 3200
prepared: 3300
prepared: 3400
prepared: 3500
prepared: 3600
prepared: 3700
prepared: 3800
prepared: 3900
prepared: 4000
prepared: 4100
prepared: 4200
prepared: 4300
prepared: 4400
prepared: 4500
prepared: 4600
prepared: 4700
prepared: 4800
prepared: 4900
prepared: 5000
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 0 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 1 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 2 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 3 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 4 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 5 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 6 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 7 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 8 <<<<<<<<<<<<<<<<<<<<<<<<<<
>>>>>>>>>>>>>>>>>>>>>>>>> epoch: 9 <<<<<<<<<<<<<<<<<<<<<<<<<<
 output neuron values: 0.030  0.001  0.000  0.256  0.000  0.002  0.000  0.001  0.000  0.000  
 corresponding number:     0      1      2      3      4      5      6      7      8      9 
I think, that the handwritten number is: 3!
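
The last line of the output comes from NetworkTools.indexOfHighestValue, which picks the output neuron with the largest value. A minimal sketch of that idea is below; it is an assumed re-implementation for illustration (in a made-up class called ArgMaxDemo), fed with the neuron values printed above.

```java
class ArgMaxDemo {

    /** Returns the index of the largest value; the first one wins on a tie. */
    static int indexOfHighestValue(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("values must not be empty");
        }
        int best = 0;
        for (int i = 1; i < values.length; i++) {
            if (values[i] > values[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // the output neuron values from the run above
        double[] output = {0.030, 0.001, 0.000, 0.256, 0.000, 0.002, 0.000, 0.001, 0.000, 0.000};
        System.out.println(indexOfHighestValue(output)); // prints 3
    }
}
```

Note that the winning value is only 0.256 here: the network is not very sure, it just finds every other digit even less likely.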

Did the network produce an accurate result? If yes, excellent! If not, don’t give up! Keep fiddling with the parameter values, give the network more capacity in the form of extra neurons, more training data, or more iteration/epoch loops, and retry until you manage to get it right.

Got any ideas where else you could use this technology?

Edited by: psishock - 2018 Apr 02


If you get stuck somewhere while coding, feel free to ask, I will try to help.

Here is all the code for download, in case someone wants to see the whole thing in one place:
http://www.rewired.hu/sites/default/files/hiriro/code/Network.java
http://www.rewired.hu/sites/default/files/hiriro/code/NetworkTools.java
http://www.rewired.hu/sites/default/files/hiriro/code/TrainSet.java
http://www.rewired.hu/sites/default/files/hiriro/code/Mnist.java
http://www.rewired.hu/sites/default/files/hiriro/code/MnistDbFile.java
http://www.rewired.hu/sites/default/files/hiriro/code/MnistImageFile.java
http://www.rewired.hu/sites/default/files/hiriro/code/MnistLabelFile.java

However, I recommend that instead of downloading it, you program it yourselves step by step while following the article, so that you understand exactly the why and the how of things. :)
