Author: Pratik Sharma

Project 7 - Neural Networks

Recognizing multi-digit numbers in photographs captured at street level is an important component of modern map making. A classic example of a corpus of such street-level photographs is Google’s Street View imagery, comprising hundreds of millions of geo-located 360-degree panoramic images. The ability to automatically transcribe an address number from a geo-located patch of pixels and associate it with a known street address helps pinpoint, with a high degree of accuracy, the location of the building it represents.

More broadly, recognizing numbers in photographs is a problem of interest to the optical character recognition community. While OCR on constrained domains like document processing is well studied, arbitrary multi-character text recognition in photographs remains highly challenging. The difficulty arises from the wide variability in the visual appearance of text in the wild: a large range of fonts, colors, styles, orientations, and character arrangements. The recognition problem is further complicated by environmental factors such as lighting, shadows, specularities, and occlusions, as well as by image acquisition factors such as resolution, motion blur, and focus blur.

In this project we will use a dataset of images centered on a single digit (many of the images contain distractor digits at the sides). Although this sample of the data is simpler than the full multi-digit problem, it is still more complex than MNIST because of those distractors.

The Street View House Numbers (SVHN) Dataset

SVHN is a real-world image dataset for developing machine learning and object recognition algorithms, with minimal requirements on data formatting. It comes from a significantly harder, unsolved, real-world problem: recognizing digits and numbers in natural scene images. SVHN is obtained from house numbers in Google Street View images.

Link to the dataset

Acknowledgement for the datasets

Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, Andrew Y. Ng. Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011. http://ufldl.stanford.edu/housenumbers

The objective of this project is to learn how to implement a simple image classification pipeline based on a deep neural network.

  • Understand the basic image classification pipeline and the data-driven approach (train/predict stages)
  • Fetch the data and understand the train/val/test splits
  • Implement and apply a deep neural network classifier (feed-forward neural network, ReLU activations)
  • Understand and be able to implement (vectorized) backpropagation (stochastic gradient descent, cross-entropy loss, cost functions)
  • Implement batch normalization for training the neural network
  • Print the classification accuracy metrics
In [1]:
# Mounting Google Drive
from google.colab import drive
drive.mount('/content/drive')
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
In [0]:
# Setting the current working directory
import os; os.chdir('drive/My Drive/Great Learning/Neural Network')

Import Packages

In [3]:
# Imports
import pandas as pd, numpy as np, matplotlib.pyplot as plt, seaborn as sns, h5py
import matplotlib.style as style; style.use('fivethirtyeight')
%matplotlib inline

# Metrics and preprocessing
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix, precision_score, recall_score, f1_score, precision_recall_curve, auc
from sklearn.model_selection import train_test_split
from sklearn import preprocessing

# TF and Keras
from tensorflow.keras.layers import Dense, Dropout, BatchNormalization, Activation
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras import optimizers

# Checking if GPU is found
import tensorflow as tf
device_name = tf.test.gpu_device_name()
if device_name != '/device:GPU:0':
  raise SystemError('GPU device not found')
print('Found GPU at: {}'.format(device_name))

# TF 1.x graph reset and seed, for reproducibility
tf.reset_default_graph()
tf.set_random_seed(42)

The default version of TensorFlow in Colab will soon switch to TensorFlow 2.x.
We recommend you upgrade now or ensure your notebook will continue to use TensorFlow 1.x via the %tensorflow_version 1.x magic: more info.

Found GPU at: /device:GPU:0
In [4]:
!ls '/content/drive/My Drive/Great Learning/Neural Network'
'07_Neural Network.ipynb'   SVHN_single_grey1.h5

Load train, val and test datasets from h5py file

In [5]:
# Read the h5 file
h5_SVH = h5py.File('SVHN_single_grey1.h5', 'r')

# Load the training, validation and test sets
X_train = h5_SVH['X_train'][:]
y_train_o = h5_SVH['y_train'][:]
X_val = h5_SVH['X_val'][:]
y_val_o = h5_SVH['y_val'][:]
X_test = h5_SVH['X_test'][:]
y_test_o = h5_SVH['y_test'][:]

# Close the file
h5_SVH.close()

print('Training set', X_train.shape, y_train_o.shape)
print('Validation set', X_val.shape, y_val_o.shape)
print('Test set', X_test.shape, y_test_o.shape)

print('\n')
print('Unique labels in y_train:', np.unique(y_train_o))
print('Unique labels in y_val:', np.unique(y_val_o))
print('Unique labels in y_test:', np.unique(y_test_o))
Training set (42000, 32, 32) (42000,)
Validation set (60000, 32, 32) (60000,)
Test set (18000, 32, 32) (18000,)


Unique labels in y_train: [0 1 2 3 4 5 6 7 8 9]
Unique labels in y_val: [0 1 2 3 4 5 6 7 8 9]
Unique labels in y_test: [0 1 2 3 4 5 6 7 8 9]

Observation 1 - Dataset Shapes

  • Number of samples: training set 42k, validation set 60k, test set 18k (note that the validation split is larger than the training split here)
  • Size of the images: 32x32
  • Number of classes: 10

Visualizing first 10 images

In [6]:
# Visualizing first 10 images in the dataset and their labels
plt.figure(figsize = (15, 4.5))
for i in range(10):  
    plt.subplot(1, 10, i+1)
    plt.imshow(X_train[i].reshape((32, 32)),cmap = plt.cm.binary)
    plt.axis('off')
plt.subplots_adjust(wspace = -0.1, hspace = -0.1)
plt.show()

print('Label for each of the above images: %s' % (y_train_o[0 : 10]))
Label for each of the above images: [2 6 7 4 4 0 3 0 7 3]
In [7]:
print('Checking first image and label in training set'); print('--'*40)
plt.imshow(X_train[0], cmap = plt.cm.binary)    
plt.show()
print('Label:', y_train_o[0])
Checking first image and label in training set
--------------------------------------------------------------------------------
Label: 2
In [8]:
print('Checking first image and label in validation set'); print('--'*40)
plt.imshow(X_val[0], cmap = plt.cm.binary)    
plt.show()
print('Label:', y_val_o[0])
Checking first image and label in validation set
--------------------------------------------------------------------------------
Label: 0
In [9]:
print('Checking first image and label in test set'); print('--'*40)
plt.imshow(X_test[0], cmap = plt.cm.binary)    
plt.show()
print('Label:', y_test_o[0])
Checking first image and label in test set
--------------------------------------------------------------------------------
Label: 1

Flatten and normalize the images for Keras

In [10]:
print('Reshaping X data: (n, 32, 32) => (n, 1024)'); print('--'*40)
X_train = X_train.reshape((X_train.shape[0], -1))
X_val = X_val.reshape((X_val.shape[0], -1))
X_test = X_test.reshape((X_test.shape[0], -1))

print('Casting the values to float so that division yields decimals'); print('--'*40)
X_train = X_train.astype('float32')
X_val = X_val.astype('float32')
X_test = X_test.astype('float32')

print('Normalizing the pixel values by dividing by the max pixel value (255)'); print('--'*40)
X_train /= 255
X_val /= 255
X_test /= 255

print('Converting y data into categorical (one-hot encoding)'); print('--'*40)
y_train = to_categorical(y_train_o)
y_val = to_categorical(y_val_o)
y_test = to_categorical(y_test_o)
Reshaping X data: (n, 32, 32) => (n, 1024)
--------------------------------------------------------------------------------
Casting the values to float so that division yields decimals
--------------------------------------------------------------------------------
Normalizing the pixel values by dividing by the max pixel value (255)
--------------------------------------------------------------------------------
Converting y data into categorical (one-hot encoding)
--------------------------------------------------------------------------------
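
As a tiny illustration of the one-hot encoding (an illustrative snippet, not part of the pipeline), to_categorical maps the label 2 to a 10-dimensional indicator vector:

print(to_categorical([2], num_classes = 10))
# [[0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]]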
In [11]:
print('X_train shape:', X_train.shape)
print('X_val shape:', X_val.shape)
print('X_test shape:', X_test.shape)

print('\n')
print('y_train shape:', y_train.shape)
print('y_val shape:', y_val.shape)
print('y_test shape:', y_test.shape)

print('\n')
print('Number of images in X_train', X_train.shape[0])
print('Number of images in X_val', X_val.shape[0])
print('Number of images in X_test', X_test.shape[0])
X_train shape: (42000, 1024)
X_val shape: (60000, 1024)
X_test shape: (18000, 1024)


y_train shape: (42000, 10)
y_val shape: (60000, 10)
y_test shape: (18000, 10)


Number of images in X_train 42000
Number of images in X_val 60000
Number of images in X_test 18000

Modelling - Babysitting the learning process

Fully connected linear layer

In [0]:
class Linear():
    def __init__(self, in_size, out_size):
        self.W = np.random.randn(in_size, out_size) * 0.01
        self.b = np.zeros((1, out_size))
        self.params = [self.W, self.b]
        self.gradW = None
        self.gradB = None
        self.gradInput = None        

    def forward(self, X):
        self.X = X
        self.output = np.dot(X, self.W) + self.b
        return self.output

    def backward(self, nextgrad):
        self.gradW = np.dot(self.X.T, nextgrad)
        self.gradB = np.sum(nextgrad, axis=0)
        self.gradInput = np.dot(nextgrad, self.W.T)
        return self.gradInput, [self.gradW, self.gradB]
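
As a quick shape check (an illustrative snippet under the definitions above, not part of the pipeline), forward maps (n, in_size) to (n, out_size) and backward returns a gradient with the input's shape:

lin = Linear(4, 3)
out = lin.forward(np.ones((2, 4)))
grad_in, _ = lin.backward(np.ones((2, 3)))
print(out.shape, grad_in.shape)   # (2, 3) (2, 4)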

ReLU

In [0]:
class ReLU():
    def __init__(self):
        self.params = []
        self.gradInput = None

    def forward(self, X):
        self.output = np.maximum(X, 0)
        return self.output

    def backward(self, nextgrad):
        self.gradInput = nextgrad.copy()
        self.gradInput[self.output <= 0] = 0
        return self.gradInput, []

Softmax function

In [0]:
def softmax(x):
    exp_x = np.exp(x - np.max(x, axis=1, keepdims=True))
    return exp_x / np.sum(exp_x, axis=1, keepdims=True)
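
Subtracting the row-wise max before exponentiating keeps np.exp from overflowing without changing the result; a quick illustration with large logits:

print(softmax(np.array([[1000.0, 1001.0]])))
# ≈ [[0.269 0.731]], identical to softmax(np.array([[0.0, 1.0]]))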

Cross entropy loss

In [0]:
class CrossEntropy:
    def forward(self, X, y):
        self.m = y.shape[0]
        self.p = softmax(X)
        cross_entropy = -np.log(self.p[range(self.m), y]+1e-16)
        loss = np.sum(cross_entropy) / self.m
        return loss
    
    def backward(self, X, y):
        grad = softmax(X)
        grad[range(self.m), y] -= 1
        grad /= self.m
        return grad
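
As a sanity check on the analytic gradient (a minimal numeric-gradient sketch with made-up shapes, not part of the training pipeline), we can compare backward against finite differences:

ce = CrossEntropy()
X_chk = np.random.randn(3, 10) * 0.1      # 3 samples, 10 classes
y_chk = np.array([2, 7, 0])               # integer labels, as used later
loss0 = ce.forward(X_chk, y_chk)
analytic = ce.backward(X_chk, y_chk)
numeric = np.zeros_like(X_chk)
eps = 1e-5
for r in range(X_chk.shape[0]):
    for c in range(X_chk.shape[1]):
        X_chk[r, c] += eps
        numeric[r, c] = (ce.forward(X_chk, y_chk) - loss0) / eps
        X_chk[r, c] -= eps
print(np.max(np.abs(numeric - analytic)))  # should be tiny, ~1e-5 or smaller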

NN class that runs forward and backward propagation through the entire network

In [0]:
class NN():
    def __init__(self, lossfunc = CrossEntropy(), mode = 'train'):
        self.params = []
        self.layers = []
        self.loss_func = lossfunc
        self.grads = []
        self.mode = mode
        
    def add_layer(self, layer):
        self.layers.append(layer)
        self.params.append(layer.params)

    def forward(self, X):
        for layer in self.layers:
            X = layer.forward(X)
        return X
    
    def backward(self, nextgrad):
        self.clear_grad_param()
        for layer in reversed(self.layers):
            nextgrad, grad = layer.backward(nextgrad)
            self.grads.append(grad)
        return self.grads
    
    def train_step(self, X, y):
        out = self.forward(X)
        loss = self.loss_func.forward(out,y)
        nextgrad = self.loss_func.backward(out,y)
        grads = self.backward(nextgrad)
        return loss, grads
    
    def predict(self, X):
        X = self.forward(X)
        p = softmax(X)
        return np.argmax(p, axis=1)
    
    def predict_scores(self, X):
        X = self.forward(X)
        p = softmax(X)
        return p
    
    def clear_grad_param(self):
        self.grads = []

Update function - SGD with momentum

In [0]:
def update(velocity, params, grads, learning_rate=0.01, mu=0.9):
    # grads were appended in reverse layer order during backward,
    # so reverse them here to align with params
    for v, p, g in zip(velocity, params, reversed(grads)):
        for i in range(len(g)):
            v[i] = (mu * v[i]) - (learning_rate * g[i])
            p[i] += v[i]
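
One momentum step on a toy scalar (illustrative numbers, not from the project): with mu = 0.9 and learning_rate = 0.01,

v_old, grad, p = 0.0, 2.0, 1.0
v_new = 0.9 * v_old - 0.01 * grad   # -0.02
p += v_new                          # p is now 0.98
print(v_new, p)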

Get minibatches

In [0]:
def minibatch(X, y, minibatch_size):
    n = X.shape[0]
    minibatches = []
    permutation = np.random.permutation(X.shape[0])
    X = X[permutation]
    y = y[permutation]
    
    for i in range(0, n , minibatch_size):
        X_batch = X[i:i + minibatch_size, :]
        y_batch = y[i:i + minibatch_size, ]
        minibatches.append((X_batch, y_batch))
        
    return minibatches
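
For instance (a quick check under the shapes above), 42000 training samples with minibatch_size = 200 yield exactly 210 shuffled minibatches:

batches = minibatch(X_train, y_train_o, 200)
print(len(batches), batches[0][0].shape)   # 210 (200, 1024)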

The training loop

In [0]:
def train(net, X_train, y_train, minibatch_size, epoch, learning_rate, mu = 0.9, X_val = None, y_val = None, Lambda = 0, verb = True):
    val_loss_epoch = []
    minibatches = minibatch(X_train, y_train, minibatch_size)
    minibatches_val = minibatch(X_val, y_val, minibatch_size)
    
    for i in range(epoch):
        loss_batch = []
        val_loss_batch = []
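        # note: the velocity buffers are re-initialized here, at the start of
        # every epoch, so momentum does not carry over across epochs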
        velocity = []
        for param_layer in net.params:
            p = [np.zeros_like(param) for param in list(param_layer)]
            velocity.append(p)
            
        # iterate over mini batches
        for X_mini, y_mini in minibatches:
            loss, grads = net.train_step(X_mini, y_mini)
            loss_batch.append(loss)
            update(velocity, net.params, grads, learning_rate=learning_rate, mu=mu)

        for X_mini_val, y_mini_val in minibatches_val:
            val_loss, _ = net.train_step(X_mini_val, y_mini_val)
            val_loss_batch.append(val_loss)
        
        # accuracy of model at end of epoch after all mini batch updates
        m_train = X_train.shape[0]
        m_val = X_val.shape[0]
        y_train_pred = []
        y_val_pred = []
        y_train1 = []
        y_val1 = []
        for ii in range(0, m_train, minibatch_size):
            X_tr = X_train[ii:ii + minibatch_size, : ]
            y_tr = y_train[ii:ii + minibatch_size,]
            y_train1 = np.append(y_train1, y_tr)
            y_train_pred = np.append(y_train_pred, net.predict(X_tr))

        for ii in range(0, m_val, minibatch_size):
            X_va = X_val[ii:ii + minibatch_size, : ]
            y_va = y_val[ii:ii + minibatch_size,]
            y_val1 = np.append(y_val1, y_va)
            y_val_pred = np.append(y_val_pred, net.predict(X_va))
            
        train_acc = check_accuracy(y_train1, y_train_pred)
        val_acc = check_accuracy(y_val1, y_val_pred)
        
        ## mean losses over the epoch (note: the Lambda argument is accepted
        ## but L2 regularization is never actually added to the cost here)
        mean_train_loss = sum(loss_batch) / float(len(loss_batch))
        mean_val_loss = sum(val_loss_batch) / float(len(val_loss_batch))
        
        val_loss_epoch.append(mean_val_loss)
        if verb:
            if i % 50 == 0:
                print("Epoch {}/{}: Loss = {} | Training Accuracy = {}".format(i, epoch, mean_train_loss, train_acc))
    return net, val_acc

Checking the accuracy of the model

In [0]:
def check_accuracy(y_true, y_pred):
    return np.mean(y_pred == y_true)

Invoking everything we have created so far

In [0]:
# Invoking the model
## input size
input_dim = X_train.shape[1]

def train_and_test_loop(iterations, lr, Lambda, verb = True):
    ## hyperparameters
    learning_rate = lr
    output_nodes = 10

    ## define neural net (a single linear layer, i.e. a softmax classifier)
    nn = NN()
    nn.add_layer(Linear(input_dim, output_nodes))

    ## note: the test split serves as validation data during this babysitting phase
    nn, val_acc = train(nn, X_train, y_train_o, minibatch_size = 200, epoch = iterations, learning_rate = learning_rate,\
                      X_val = X_test, y_val = y_test_o, Lambda = Lambda, verb = verb)
    return val_acc

Double-check that the loss is reasonable: disable regularization
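
With 10 balanced classes and near-zero random weights, the softmax outputs are roughly uniform, so the expected initial cross-entropy is about ln(10):

print(-np.log(1.0 / 10))   # ≈ 2.3026, close to the ~2.31 observed below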

In [22]:
lr = 0.00001
Lambda = 0
train_and_test_loop(1, lr, Lambda)
Epoch 0/1: Loss = 2.3117883972138844 | Training Accuracy = 0.09335714285714286
Out[22]:
0.09438888888888888

Now, let's crank up Lambda (regularization) and check what it does to our loss function

In [23]:
lr = 0.00001
Lambda = 1e3
train_and_test_loop(1, lr, Lambda)
Epoch 0/1: Loss = 2.3082000425724667 | Training Accuracy = 0.08478571428571428
Out[23]:
0.08188888888888889

Now, let's overfit a small subset of our dataset, in this case 20 images

In [24]:
X_train_subset = X_train[0:20]
y_train_subset = y_train_o[0:20]

X_train = X_train_subset
y_train_o = y_train_subset

X_train.shape, y_train_o.shape
Out[24]:
((20, 1024), (20,))

Make sure that you can overfit a very small portion of the training data

So, set a small learning rate and turn regularization off. In the code below:

  • Take the first 20 examples
  • turn off regularization (Lambda = 0)
  • use plain SGD (note that our update rule keeps its default momentum of 0.9)
In [25]:
# note: %time on its own line times only that empty statement;
# %%time at the top of the cell would time the whole run
%time
lr = 0.001
Lambda = 0
train_and_test_loop(5000, lr, Lambda)
CPU times: user 0 ns, sys: 8 µs, total: 8 µs
Wall time: 8.11 µs
Epoch 0/5000: Loss = 2.343525441008851 | Training Accuracy = 0.0
Epoch 50/5000: Loss = 1.9368840865311359 | Training Accuracy = 0.3
Epoch 100/5000: Loss = 1.8502138907049706 | Training Accuracy = 0.3
Epoch 150/5000: Loss = 1.795704759770493 | Training Accuracy = 0.35
Epoch 200/5000: Loss = 1.7505141963680764 | Training Accuracy = 0.4
Epoch 250/5000: Loss = 1.7098591715226046 | Training Accuracy = 0.45
Epoch 300/5000: Loss = 1.6720807405140872 | Training Accuracy = 0.45
Epoch 350/5000: Loss = 1.6364256819871694 | Training Accuracy = 0.5
Epoch 400/5000: Loss = 1.602489808977548 | Training Accuracy = 0.55
Epoch 450/5000: Loss = 1.5700278004175436 | Training Accuracy = 0.55
Epoch 500/5000: Loss = 1.5388749934942745 | Training Accuracy = 0.6
Epoch 550/5000: Loss = 1.5089113416081257 | Training Accuracy = 0.6
Epoch 600/5000: Loss = 1.480043412413456 | Training Accuracy = 0.6
Epoch 650/5000: Loss = 1.4521947854019437 | Training Accuracy = 0.65
Epoch 700/5000: Loss = 1.425300621663516 | Training Accuracy = 0.65
Epoch 750/5000: Loss = 1.3993044229509757 | Training Accuracy = 0.65
Epoch 800/5000: Loss = 1.3741559975684867 | Training Accuracy = 0.65
Epoch 850/5000: Loss = 1.3498101223720105 | Training Accuracy = 0.65
Epoch 900/5000: Loss = 1.3262256235641494 | Training Accuracy = 0.65
Epoch 950/5000: Loss = 1.303364719491542 | Training Accuracy = 0.65
Epoch 1000/5000: Loss = 1.2811925334058236 | Training Accuracy = 0.65
Epoch 1050/5000: Loss = 1.2596767202488166 | Training Accuracy = 0.65
Epoch 1100/5000: Loss = 1.2387871723562258 | Training Accuracy = 0.65
Epoch 1150/5000: Loss = 1.218495781389982 | Training Accuracy = 0.65
Epoch 1200/5000: Loss = 1.1987762414312768 | Training Accuracy = 0.7
Epoch 1250/5000: Loss = 1.1796038829744615 | Training Accuracy = 0.7
Epoch 1300/5000: Loss = 1.1609555306717911 | Training Accuracy = 0.7
Epoch 1350/5000: Loss = 1.1428093797371377 | Training Accuracy = 0.7
Epoch 1400/5000: Loss = 1.1251448873080339 | Training Accuracy = 0.7
Epoch 1450/5000: Loss = 1.1079426760244366 | Training Accuracy = 0.8
Epoch 1500/5000: Loss = 1.0911844477557584 | Training Accuracy = 0.85
Epoch 1550/5000: Loss = 1.0748529058881875 | Training Accuracy = 0.85
Epoch 1600/5000: Loss = 1.058931684932634 | Training Accuracy = 0.85
Epoch 1650/5000: Loss = 1.0434052864697851 | Training Accuracy = 0.85
Epoch 1700/5000: Loss = 1.0282590206397022 | Training Accuracy = 0.85
Epoch 1750/5000: Loss = 1.013478952527564 | Training Accuracy = 0.85
Epoch 1800/5000: Loss = 0.9990518529073633 | Training Accuracy = 0.85
Epoch 1850/5000: Loss = 0.9849651528906417 | Training Accuracy = 0.85
Epoch 1900/5000: Loss = 0.9712069020941513 | Training Accuracy = 0.85
Epoch 1950/5000: Loss = 0.9577657299933456 | Training Accuracy = 0.85
Epoch 2000/5000: Loss = 0.9446308101711862 | Training Accuracy = 0.85
Epoch 2050/5000: Loss = 0.9317918272064819 | Training Accuracy = 0.85
Epoch 2100/5000: Loss = 0.9192389459746264 | Training Accuracy = 0.85
Epoch 2150/5000: Loss = 0.9069627831576392 | Training Accuracy = 0.85
Epoch 2200/5000: Loss = 0.8949543807808057 | Training Accuracy = 0.85
Epoch 2250/5000: Loss = 0.8832051816107626 | Training Accuracy = 0.9
Epoch 2300/5000: Loss = 0.8717070062651777 | Training Accuracy = 0.95
Epoch 2350/5000: Loss = 0.8604520318976363 | Training Accuracy = 0.95
Epoch 2400/5000: Loss = 0.849432772333326 | Training Accuracy = 0.95
Epoch 2450/5000: Loss = 0.8386420595418531 | Training Accuracy = 0.95
Epoch 2500/5000: Loss = 0.828073026343203 | Training Accuracy = 0.95
Epoch 2550/5000: Loss = 0.8177190902516587 | Training Accuracy = 0.95
Epoch 2600/5000: Loss = 0.8075739383704768 | Training Accuracy = 0.95
Epoch 2650/5000: Loss = 0.7976315132574238 | Training Accuracy = 0.95
Epoch 2700/5000: Loss = 0.7878859996879619 | Training Accuracy = 0.95
Epoch 2750/5000: Loss = 0.778331812248966 | Training Accuracy = 0.95
Epoch 2800/5000: Loss = 0.7689635837014647 | Training Accuracy = 1.0
Epoch 2850/5000: Loss = 0.7597761540560017 | Training Accuracy = 1.0
Epoch 2900/5000: Loss = 0.7507645603089181 | Training Accuracy = 1.0
Epoch 2950/5000: Loss = 0.7419240267921232 | Training Accuracy = 1.0
Epoch 3000/5000: Loss = 0.7332499560928596 | Training Accuracy = 1.0
Epoch 3050/5000: Loss = 0.7247379205035223 | Training Accuracy = 1.0
Epoch 3100/5000: Loss = 0.716383653964874 | Training Accuracy = 1.0
Epoch 3150/5000: Loss = 0.7081830444689625 | Training Accuracy = 1.0
Epoch 3200/5000: Loss = 0.700132126890769 | Training Accuracy = 1.0
Epoch 3250/5000: Loss = 0.6922270762200713 | Training Accuracy = 1.0
Epoch 3300/5000: Loss = 0.6844642011672761 | Training Accuracy = 1.0
Epoch 3350/5000: Loss = 0.6768399381190008 | Training Accuracy = 1.0
Epoch 3400/5000: Loss = 0.6693508454210667 | Training Accuracy = 1.0
Epoch 3450/5000: Loss = 0.6619935979682519 | Training Accuracy = 1.0
Epoch 3500/5000: Loss = 0.6547649820817112 | Training Accuracy = 1.0
Epoch 3550/5000: Loss = 0.6476618906563633 | Training Accuracy = 1.0
Epoch 3600/5000: Loss = 0.6406813185618502 | Training Accuracy = 1.0
Epoch 3650/5000: Loss = 0.6338203582818351 | Training Accuracy = 1.0
Epoch 3700/5000: Loss = 0.62707619577748 | Training Accuracy = 1.0
Epoch 3750/5000: Loss = 0.6204461065619331 | Training Accuracy = 1.0
Epoch 3800/5000: Loss = 0.613927451973544 | Training Accuracy = 1.0
Epoch 3850/5000: Loss = 0.6075176756363615 | Training Accuracy = 1.0
Epoch 3900/5000: Loss = 0.6012143000972181 | Training Accuracy = 1.0
Epoch 3950/5000: Loss = 0.5950149236294051 | Training Accuracy = 1.0
Epoch 4000/5000: Loss = 0.5889172171935882 | Training Accuracy = 1.0
Epoch 4050/5000: Loss = 0.5829189215472004 | Training Accuracy = 1.0
Epoch 4100/5000: Loss = 0.5770178444941011 | Training Accuracy = 1.0
Epoch 4150/5000: Loss = 0.5712118582668013 | Training Accuracy = 1.0
Epoch 4200/5000: Loss = 0.5654988970340131 | Training Accuracy = 1.0
Epoch 4250/5000: Loss = 0.5598769545267331 | Training Accuracy = 1.0
Epoch 4300/5000: Loss = 0.5543440817764675 | Training Accuracy = 1.0
Epoch 4350/5000: Loss = 0.5488983849595841 | Training Accuracy = 1.0
Epoch 4400/5000: Loss = 0.5435380233421363 | Training Accuracy = 1.0
Epoch 4450/5000: Loss = 0.538261207319821 | Training Accuracy = 1.0
Epoch 4500/5000: Loss = 0.5330661965480605 | Training Accuracy = 1.0
Epoch 4550/5000: Loss = 0.5279512981574574 | Training Accuracy = 1.0
Epoch 4600/5000: Loss = 0.5229148650501726 | Training Accuracy = 1.0
Epoch 4650/5000: Loss = 0.5179552942730058 | Training Accuracy = 1.0
Epoch 4700/5000: Loss = 0.513071025463205 | Training Accuracy = 1.0
Epoch 4750/5000: Loss = 0.5082605393632545 | Training Accuracy = 1.0
Epoch 4800/5000: Loss = 0.5035223564010968 | Training Accuracy = 1.0
Epoch 4850/5000: Loss = 0.49885503533244346 | Training Accuracy = 1.0
Epoch 4900/5000: Loss = 0.4942571719420129 | Training Accuracy = 1.0
Epoch 4950/5000: Loss = 0.4897273978007071 | Training Accuracy = 1.0
Out[25]:
0.1381111111111111

Loading the original dataset again

In [26]:
h5_SVH = h5py.File('SVHN_single_grey1.h5', 'r')
# Load the training, validation and test sets
X_train = h5_SVH['X_train'][:]
y_train_o = h5_SVH['y_train'][:]
X_val = h5_SVH['X_val'][:]
y_val_o = h5_SVH['y_val'][:]
X_test = h5_SVH['X_test'][:]
y_test_o = h5_SVH['y_test'][:]

print('Reshaping X data: (n, 32, 32) => (n, 1024)'); print('--'*40)
X_train = X_train.reshape((X_train.shape[0], -1))
X_val = X_val.reshape((X_val.shape[0], -1))
X_test = X_test.reshape((X_test.shape[0], -1))

print('Casting the values to float so that division yields decimals'); print('--'*40)
X_train = X_train.astype('float32')
X_val = X_val.astype('float32')
X_test = X_test.astype('float32')

print('Normalizing the pixel values by dividing by the max pixel value (255)'); print('--'*40)
X_train /= 255
X_val /= 255
X_test /= 255

print('Converting y data into categorical (one-hot encoding)'); print('--'*40)
y_train = to_categorical(y_train_o)
y_val = to_categorical(y_val_o)
y_test = to_categorical(y_test_o)
Reshaping X data: (n, 32, 32) => (n, 1024)
--------------------------------------------------------------------------------
Casting the values to float so that division yields decimals
--------------------------------------------------------------------------------
Normalizing the pixel values by dividing by the max pixel value (255)
--------------------------------------------------------------------------------
Converting y data into categorical (one-hot encoding)
--------------------------------------------------------------------------------

Start with small regularization and find a learning rate that makes the loss go down.

  • we start with a small regularization: Lambda = 1e-7
  • we start with a small learning rate: 1e-7
In [27]:
lr = 1e-7
Lambda = 1e-7
train_and_test_loop(500, lr, Lambda)
Epoch 0/500: Loss = 2.317728157047642 | Training Accuracy = 0.104
Epoch 50/500: Loss = 2.3127425227478797 | Training Accuracy = 0.10471428571428572
Epoch 100/500: Loss = 2.3094828415428403 | Training Accuracy = 0.1055
Epoch 150/500: Loss = 2.307345836509397 | Training Accuracy = 0.10542857142857143
Epoch 200/500: Loss = 2.305942805671862 | Training Accuracy = 0.10428571428571429
Epoch 250/500: Loss = 2.3050204802732313 | Training Accuracy = 0.10395238095238095
Epoch 300/500: Loss = 2.3044129184583078 | Training Accuracy = 0.10254761904761905
Epoch 350/500: Loss = 2.3040111864235984 | Training Accuracy = 0.101
Epoch 400/500: Loss = 2.303743794017468 | Training Accuracy = 0.10019047619047619
Epoch 450/500: Loss = 2.3035638954300524 | Training Accuracy = 0.09911904761904762
Out[27]:
0.09555555555555556

Let's now try training with a learning rate of 0.001

In [28]:
lr = 0.001
Lambda = 1e-7
train_and_test_loop(500, lr, Lambda)
Epoch 0/500: Loss = 2.3045206865294667 | Training Accuracy = 0.11473809523809524
Epoch 50/500: Loss = 2.259865884155515 | Training Accuracy = 0.20614285714285716
Epoch 100/500: Loss = 2.251416078728257 | Training Accuracy = 0.21633333333333332
Epoch 150/500: Loss = 2.2471058220607465 | Training Accuracy = 0.22138095238095237
Epoch 200/500: Loss = 2.2442164094973287 | Training Accuracy = 0.22454761904761905
Epoch 250/500: Loss = 2.2420426056295346 | Training Accuracy = 0.22678571428571428
Epoch 300/500: Loss = 2.240300552374621 | Training Accuracy = 0.22766666666666666
Epoch 350/500: Loss = 2.2388468080995563 | Training Accuracy = 0.22914285714285715
Epoch 400/500: Loss = 2.237598607772921 | Training Accuracy = 0.2300952380952381
Epoch 450/500: Loss = 2.2365039099696107 | Training Accuracy = 0.23076190476190475
Out[28]:
0.21433333333333332

Hyperparameter Optimization

In [29]:
import math
# note: range(1, 10) runs only 9 of the 10 intended random-search tries
for k in range(1, 10):
    lr = math.pow(10, np.random.uniform(-3.0, -2.0))
    Lambda = math.pow(10, np.random.uniform(-5, 2))
    best_acc = train_and_test_loop(100, lr, Lambda, False)
    print("Try {0}/{1}: Best_val_acc: {2}, lr: {3}, Lambda: {4}\n".format(k, 10, best_acc, lr, Lambda))
Try 1/10: Best_val_acc: 0.18288888888888888, lr: 0.005248163657617819, Lambda: 64.48340786832672

Try 2/10: Best_val_acc: 0.2053888888888889, lr: 0.006679712019881258, Lambda: 0.3646857771377305

Try 3/10: Best_val_acc: 0.18827777777777777, lr: 0.0046008725625766725, Lambda: 0.03253300599381195

Try 4/10: Best_val_acc: 0.19616666666666666, lr: 0.004120077105146745, Lambda: 13.396751846027179

Try 5/10: Best_val_acc: 0.18988888888888888, lr: 0.0015322728061813945, Lambda: 0.05780821685783054

Try 6/10: Best_val_acc: 0.2031111111111111, lr: 0.003223010681924104, Lambda: 19.57233858291238

Try 7/10: Best_val_acc: 0.18738888888888888, lr: 0.007976192792141505, Lambda: 0.0004221599617830911

Try 8/10: Best_val_acc: 0.19411111111111112, lr: 0.009793789314692974, Lambda: 1.7025515288536694

Try 9/10: Best_val_acc: 0.19827777777777778, lr: 0.0023285365617220083, Lambda: 5.1261156557902146

Observation 2 - Babysitting the neural network for SVHN
  • Best validation accuracy achieved with this method after hyperparameter optimization: about 21%.

Modelling - Neural Network API

NN model, sigmoid activations, SGD optimizer

In [30]:
print('NN model with sigmoid activations'); print('--'*40)
# Initialize the neural network classifier
model1 = Sequential()

# Input layer - dense layer over the 1024 flattened pixels
model1.add(Dense(128, input_shape = (1024, )))
# Adding activation function - sigmoid
model1.add(Activation('sigmoid'))

# Hidden layer 1 - adding first hidden layer
model1.add(Dense(64))
# Adding activation function - sigmoid
model1.add(Activation('sigmoid'))

# Output layer - one node per digit class (10)
model1.add(Dense(10))
# Adding activation function - softmax for multiclass classification
model1.add(Activation('softmax'))
NN model with sigmoid activations
--------------------------------------------------------------------------------
WARNING:tensorflow:From /usr/local/lib/python3.6/dist-packages/tensorflow_core/python/ops/resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
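
As a sanity check on the parameter counts in the summary below: a Dense layer holds in_size × out_size weights plus out_size biases, so the three layers contribute

print(1024*128 + 128, 128*64 + 64, 64*10 + 10)   # 131200 8256 650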
In [31]:
model1.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 128)               131200    
_________________________________________________________________
activation (Activation)      (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                8256      
_________________________________________________________________
activation_1 (Activation)    (None, 64)                0         
_________________________________________________________________
dense_2 (Dense)              (None, 10)                650       
_________________________________________________________________
activation_2 (Activation)    (None, 10)                0         
=================================================================
Total params: 140,106
Trainable params: 140,106
Non-trainable params: 0
_________________________________________________________________
In [32]:
# compiling the neural network classifier, sgd optimizer
sgd = optimizers.SGD(lr = 0.01)
model1.compile(optimizer = sgd, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model1.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 3s 64us/sample - loss: 2.3172 - acc: 0.1011 - val_loss: 2.3029 - val_acc: 0.1029
Epoch 2/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3030 - acc: 0.1024 - val_loss: 2.3030 - val_acc: 0.1024
Epoch 3/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3031 - acc: 0.0996 - val_loss: 2.3028 - val_acc: 0.0991
Epoch 4/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3030 - acc: 0.0988 - val_loss: 2.3028 - val_acc: 0.1019
Epoch 5/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3029 - acc: 0.0998 - val_loss: 2.3027 - val_acc: 0.1009
Epoch 6/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3028 - acc: 0.0989 - val_loss: 2.3029 - val_acc: 0.0998
Epoch 7/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3028 - acc: 0.1022 - val_loss: 2.3026 - val_acc: 0.1011
Epoch 8/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.3027 - acc: 0.1015 - val_loss: 2.3027 - val_acc: 0.0998
Epoch 9/100
42000/42000 [==============================] - 1s 28us/sample - loss: 2.3027 - acc: 0.1011 - val_loss: 2.3028 - val_acc: 0.1001
Epoch 10/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3027 - acc: 0.1016 - val_loss: 2.3026 - val_acc: 0.1012
Epoch 11/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3027 - acc: 0.1014 - val_loss: 2.3024 - val_acc: 0.1020
Epoch 12/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3026 - acc: 0.0993 - val_loss: 2.3024 - val_acc: 0.0998
Epoch 13/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3026 - acc: 0.1008 - val_loss: 2.3023 - val_acc: 0.1032
Epoch 14/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3025 - acc: 0.1026 - val_loss: 2.3023 - val_acc: 0.1045
Epoch 15/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3024 - acc: 0.1019 - val_loss: 2.3022 - val_acc: 0.1067
Epoch 16/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3024 - acc: 0.1028 - val_loss: 2.3022 - val_acc: 0.0962
Epoch 17/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3024 - acc: 0.1019 - val_loss: 2.3022 - val_acc: 0.1077
Epoch 18/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3023 - acc: 0.1036 - val_loss: 2.3021 - val_acc: 0.1125
Epoch 19/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3022 - acc: 0.1033 - val_loss: 2.3022 - val_acc: 0.1000
Epoch 20/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3023 - acc: 0.1028 - val_loss: 2.3021 - val_acc: 0.1013
Epoch 21/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3022 - acc: 0.1040 - val_loss: 2.3021 - val_acc: 0.1061
Epoch 22/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3021 - acc: 0.1048 - val_loss: 2.3020 - val_acc: 0.1111
Epoch 23/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3021 - acc: 0.1043 - val_loss: 2.3020 - val_acc: 0.1078
Epoch 24/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.3020 - acc: 0.1039 - val_loss: 2.3021 - val_acc: 0.1015
Epoch 25/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3021 - acc: 0.1037 - val_loss: 2.3019 - val_acc: 0.1015
Epoch 26/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3020 - acc: 0.1052 - val_loss: 2.3018 - val_acc: 0.1037
Epoch 27/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.3020 - acc: 0.1053 - val_loss: 2.3018 - val_acc: 0.1041
Epoch 28/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3019 - acc: 0.1047 - val_loss: 2.3018 - val_acc: 0.1081
Epoch 29/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3019 - acc: 0.1064 - val_loss: 2.3018 - val_acc: 0.1072
Epoch 30/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3019 - acc: 0.1028 - val_loss: 2.3018 - val_acc: 0.1039
Epoch 31/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.3018 - acc: 0.1055 - val_loss: 2.3017 - val_acc: 0.1143
Epoch 32/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3017 - acc: 0.1055 - val_loss: 2.3018 - val_acc: 0.1142
Epoch 33/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3016 - acc: 0.1061 - val_loss: 2.3018 - val_acc: 0.1019
Epoch 34/100
42000/42000 [==============================] - 1s 28us/sample - loss: 2.3018 - acc: 0.1057 - val_loss: 2.3015 - val_acc: 0.1066
Epoch 35/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3017 - acc: 0.1079 - val_loss: 2.3015 - val_acc: 0.1040
Epoch 36/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3017 - acc: 0.1078 - val_loss: 2.3015 - val_acc: 0.1039
Epoch 37/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3014 - acc: 0.1084 - val_loss: 2.3018 - val_acc: 0.1082
Epoch 38/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.3015 - acc: 0.1052 - val_loss: 2.3017 - val_acc: 0.0992
Epoch 39/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3014 - acc: 0.1075 - val_loss: 2.3017 - val_acc: 0.1029
Epoch 40/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3015 - acc: 0.1057 - val_loss: 2.3013 - val_acc: 0.1061
Epoch 41/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3014 - acc: 0.1077 - val_loss: 2.3015 - val_acc: 0.1013
Epoch 42/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3014 - acc: 0.1063 - val_loss: 2.3013 - val_acc: 0.1096
Epoch 43/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3014 - acc: 0.1086 - val_loss: 2.3013 - val_acc: 0.1166
Epoch 44/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3014 - acc: 0.1085 - val_loss: 2.3012 - val_acc: 0.1049
Epoch 45/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3013 - acc: 0.1101 - val_loss: 2.3013 - val_acc: 0.1107
Epoch 46/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3013 - acc: 0.1059 - val_loss: 2.3012 - val_acc: 0.1158
Epoch 47/100
42000/42000 [==============================] - 1s 28us/sample - loss: 2.3012 - acc: 0.1110 - val_loss: 2.3012 - val_acc: 0.1061
Epoch 48/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3012 - acc: 0.1094 - val_loss: 2.3011 - val_acc: 0.1081
Epoch 49/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3012 - acc: 0.1083 - val_loss: 2.3013 - val_acc: 0.1005
Epoch 50/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3011 - acc: 0.1099 - val_loss: 2.3011 - val_acc: 0.1132
Epoch 51/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3010 - acc: 0.1121 - val_loss: 2.3013 - val_acc: 0.1023
Epoch 52/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3011 - acc: 0.1110 - val_loss: 2.3010 - val_acc: 0.1064
Epoch 53/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3011 - acc: 0.1106 - val_loss: 2.3009 - val_acc: 0.1101
Epoch 54/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3010 - acc: 0.1100 - val_loss: 2.3009 - val_acc: 0.1094
Epoch 55/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3009 - acc: 0.1097 - val_loss: 2.3009 - val_acc: 0.1133
Epoch 56/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3009 - acc: 0.1113 - val_loss: 2.3009 - val_acc: 0.1195
Epoch 57/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3009 - acc: 0.1130 - val_loss: 2.3008 - val_acc: 0.1099
Epoch 58/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3009 - acc: 0.1145 - val_loss: 2.3008 - val_acc: 0.1093
Epoch 59/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3009 - acc: 0.1117 - val_loss: 2.3008 - val_acc: 0.1138
Epoch 60/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3008 - acc: 0.1125 - val_loss: 2.3006 - val_acc: 0.1204
Epoch 61/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3008 - acc: 0.1122 - val_loss: 2.3007 - val_acc: 0.1053
Epoch 62/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3007 - acc: 0.1095 - val_loss: 2.3008 - val_acc: 0.1187
Epoch 63/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3007 - acc: 0.1140 - val_loss: 2.3007 - val_acc: 0.1177
Epoch 64/100
42000/42000 [==============================] - 1s 28us/sample - loss: 2.3007 - acc: 0.1117 - val_loss: 2.3005 - val_acc: 0.1193
Epoch 65/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3006 - acc: 0.1120 - val_loss: 2.3005 - val_acc: 0.1189
Epoch 66/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3006 - acc: 0.1135 - val_loss: 2.3005 - val_acc: 0.1089
Epoch 67/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3006 - acc: 0.1143 - val_loss: 2.3005 - val_acc: 0.1201
Epoch 68/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3004 - acc: 0.1117 - val_loss: 2.3005 - val_acc: 0.1099
Epoch 69/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3005 - acc: 0.1138 - val_loss: 2.3004 - val_acc: 0.1221
Epoch 70/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3004 - acc: 0.1152 - val_loss: 2.3004 - val_acc: 0.1177
Epoch 71/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3004 - acc: 0.1198 - val_loss: 2.3005 - val_acc: 0.1110
Epoch 72/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3004 - acc: 0.1152 - val_loss: 2.3003 - val_acc: 0.1209
Epoch 73/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3004 - acc: 0.1161 - val_loss: 2.3002 - val_acc: 0.1174
Epoch 74/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3003 - acc: 0.1162 - val_loss: 2.3002 - val_acc: 0.1155
Epoch 75/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3002 - acc: 0.1153 - val_loss: 2.3002 - val_acc: 0.1095
Epoch 76/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3002 - acc: 0.1156 - val_loss: 2.3001 - val_acc: 0.1209
Epoch 77/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3002 - acc: 0.1168 - val_loss: 2.3001 - val_acc: 0.1187
Epoch 78/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3001 - acc: 0.1217 - val_loss: 2.3003 - val_acc: 0.1151
Epoch 79/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3001 - acc: 0.1150 - val_loss: 2.3000 - val_acc: 0.1183
Epoch 80/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3001 - acc: 0.1196 - val_loss: 2.3001 - val_acc: 0.1110
Epoch 81/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3001 - acc: 0.1145 - val_loss: 2.3000 - val_acc: 0.1260
Epoch 82/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.3000 - acc: 0.1196 - val_loss: 2.3001 - val_acc: 0.1094
Epoch 83/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.3000 - acc: 0.1144 - val_loss: 2.2999 - val_acc: 0.1208
Epoch 84/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2999 - acc: 0.1197 - val_loss: 2.2998 - val_acc: 0.1211
Epoch 85/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.2999 - acc: 0.1178 - val_loss: 2.2998 - val_acc: 0.1247
Epoch 86/100
42000/42000 [==============================] - 1s 28us/sample - loss: 2.2998 - acc: 0.1186 - val_loss: 2.2997 - val_acc: 0.1234
Epoch 87/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.2998 - acc: 0.1185 - val_loss: 2.2998 - val_acc: 0.1148
Epoch 88/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2998 - acc: 0.1168 - val_loss: 2.2997 - val_acc: 0.1222
Epoch 89/100
42000/42000 [==============================] - 1s 28us/sample - loss: 2.2998 - acc: 0.1189 - val_loss: 2.2997 - val_acc: 0.1289
Epoch 90/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2997 - acc: 0.1195 - val_loss: 2.2998 - val_acc: 0.1131
Epoch 91/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2997 - acc: 0.1196 - val_loss: 2.2997 - val_acc: 0.1146
Epoch 92/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.2996 - acc: 0.1196 - val_loss: 2.2995 - val_acc: 0.1250
Epoch 93/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2996 - acc: 0.1202 - val_loss: 2.2995 - val_acc: 0.1207
Epoch 94/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2995 - acc: 0.1209 - val_loss: 2.2995 - val_acc: 0.1238
Epoch 95/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2995 - acc: 0.1215 - val_loss: 2.2995 - val_acc: 0.1158
Epoch 96/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.2994 - acc: 0.1203 - val_loss: 2.2996 - val_acc: 0.1181
Epoch 97/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2994 - acc: 0.1227 - val_loss: 2.2995 - val_acc: 0.1090
Epoch 98/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.2994 - acc: 0.1210 - val_loss: 2.2994 - val_acc: 0.1194
Epoch 99/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2994 - acc: 0.1224 - val_loss: 2.2993 - val_acc: 0.1255
Epoch 100/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.2993 - acc: 0.1243 - val_loss: 2.2994 - val_acc: 0.1126
In [33]:
print('Evaluate NN model with sigmoid activations'); print('--'*40)
results1 = model1.evaluate(X_val, y_val)
print('Validation accuracy: {}%'.format(round(results1[1]*100, 2)))
Evaluate NN model with sigmoid activations
--------------------------------------------------------------------------------
60000/60000 [==============================] - 3s 47us/sample - loss: 2.2994 - acc: 0.1126
Validation accuracy: 11.26%
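
As a quick visual check (a minimal sketch using the history object captured above; the TF 1.x metric keys are 'acc' and 'val_acc', matching the logs), we can plot the accuracy curves:

plt.plot(history.history['acc'], label = 'train')
plt.plot(history.history['val_acc'], label = 'validation')
plt.xlabel('Epoch'); plt.ylabel('Accuracy'); plt.legend(); plt.show()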

NN model, sigmoid activations, SGD optimizer, changing learning rate

In [34]:
print('NN model with sigmoid activations - changing learning rate'); print('--'*40)
# compiling the neural network classifier with a lower-lr sgd optimizer
# note: re-compiling does not reset the weights, so the fit below continues
# from the model's previous state
sgd = optimizers.SGD(lr = 0.001)
model1.compile(optimizer = sgd, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model1.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
NN model with sigmoid activations - changing learning rate
--------------------------------------------------------------------------------
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 1s 28us/sample - loss: 2.2991 - acc: 0.1169 - val_loss: 2.2993 - val_acc: 0.1169
Epoch 2/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2990 - acc: 0.1221 - val_loss: 2.2992 - val_acc: 0.1207
Epoch 3/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.2990 - acc: 0.1222 - val_loss: 2.2992 - val_acc: 0.1231
Epoch 4/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2990 - acc: 0.1248 - val_loss: 2.2992 - val_acc: 0.1244
Epoch 5/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2990 - acc: 0.1266 - val_loss: 2.2992 - val_acc: 0.1257
Epoch 6/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2990 - acc: 0.1285 - val_loss: 2.2992 - val_acc: 0.1260
Epoch 7/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2990 - acc: 0.1264 - val_loss: 2.2991 - val_acc: 0.1277
Epoch 8/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2990 - acc: 0.1262 - val_loss: 2.2991 - val_acc: 0.1290
Epoch 9/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2990 - acc: 0.1290 - val_loss: 2.2991 - val_acc: 0.1291
Epoch 10/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2990 - acc: 0.1296 - val_loss: 2.2991 - val_acc: 0.1291
Epoch 11/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2990 - acc: 0.1286 - val_loss: 2.2991 - val_acc: 0.1295
Epoch 12/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2989 - acc: 0.1320 - val_loss: 2.2991 - val_acc: 0.1291
Epoch 13/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.2989 - acc: 0.1301 - val_loss: 2.2991 - val_acc: 0.1289
Epoch 14/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2989 - acc: 0.1279 - val_loss: 2.2991 - val_acc: 0.1287
Epoch 15/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2989 - acc: 0.1305 - val_loss: 2.2991 - val_acc: 0.1286
Epoch 16/100
42000/42000 [==============================] - 1s 28us/sample - loss: 2.2989 - acc: 0.1282 - val_loss: 2.2991 - val_acc: 0.1289
Epoch 17/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2989 - acc: 0.1305 - val_loss: 2.2991 - val_acc: 0.1286
Epoch 18/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2989 - acc: 0.1276 - val_loss: 2.2991 - val_acc: 0.1293
Epoch 19/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2989 - acc: 0.1284 - val_loss: 2.2991 - val_acc: 0.1300
Epoch 20/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.2989 - acc: 0.1305 - val_loss: 2.2991 - val_acc: 0.1300
Epoch 21/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2989 - acc: 0.1314 - val_loss: 2.2991 - val_acc: 0.1294
Epoch 22/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2989 - acc: 0.1301 - val_loss: 2.2991 - val_acc: 0.1296
Epoch 23/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.2989 - acc: 0.1304 - val_loss: 2.2991 - val_acc: 0.1293
Epoch 24/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.2989 - acc: 0.1315 - val_loss: 2.2991 - val_acc: 0.1279
Epoch 25/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.2989 - acc: 0.1285 - val_loss: 2.2991 - val_acc: 0.1289
Epoch 26/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.2989 - acc: 0.1281 - val_loss: 2.2991 - val_acc: 0.1292
Epoch 27/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.2989 - acc: 0.1303 - val_loss: 2.2991 - val_acc: 0.1293
Epoch 28/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.2989 - acc: 0.1315 - val_loss: 2.2991 - val_acc: 0.1281
Epoch 29/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.2989 - acc: 0.1288 - val_loss: 2.2991 - val_acc: 0.1285
Epoch 30/100
42000/42000 [==============================] - 1s 28us/sample - loss: 2.2989 - acc: 0.1285 - val_loss: 2.2990 - val_acc: 0.1287
Epoch 31/100
42000/42000 [==============================] - 1s 28us/sample - loss: 2.2989 - acc: 0.1293 - val_loss: 2.2990 - val_acc: 0.1290
Epoch 32/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2989 - acc: 0.1287 - val_loss: 2.2990 - val_acc: 0.1294
Epoch 33/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2989 - acc: 0.1316 - val_loss: 2.2990 - val_acc: 0.1288
Epoch 34/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2989 - acc: 0.1297 - val_loss: 2.2990 - val_acc: 0.1289
Epoch 35/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2989 - acc: 0.1300 - val_loss: 2.2990 - val_acc: 0.1294
Epoch 36/100
42000/42000 [==============================] - 1s 25us/sample - loss: 2.2989 - acc: 0.1313 - val_loss: 2.2990 - val_acc: 0.1283
Epoch 37/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.2988 - acc: 0.1293 - val_loss: 2.2990 - val_acc: 0.1284
Epoch 38/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.2988 - acc: 0.1300 - val_loss: 2.2990 - val_acc: 0.1288
Epoch 39/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2988 - acc: 0.1285 - val_loss: 2.2990 - val_acc: 0.1293
Epoch 40/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2988 - acc: 0.1292 - val_loss: 2.2990 - val_acc: 0.1298
Epoch 41/100
42000/42000 [==============================] - 1s 29us/sample - loss: 2.2988 - acc: 0.1310 - val_loss: 2.2990 - val_acc: 0.1293
Epoch 42/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2988 - acc: 0.1317 - val_loss: 2.2990 - val_acc: 0.1290
Epoch 43/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2988 - acc: 0.1301 - val_loss: 2.2990 - val_acc: 0.1291
Epoch 44/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2988 - acc: 0.1301 - val_loss: 2.2990 - val_acc: 0.1285
Epoch 45/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2988 - acc: 0.1310 - val_loss: 2.2990 - val_acc: 0.1288
... (epochs 46-99 omitted: loss stays flat at ~2.299 and accuracy at ~0.13 on both the training and validation sets)
Epoch 100/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2986 - acc: 0.1308 - val_loss: 2.2988 - val_acc: 0.1307
In [35]:
print('Evaluate NN model with sigmoid activations - changing learning rate'); print('--'*40)
results1 = model1.evaluate(X_val, y_val)
print('Validation accuracy: {}%'.format(round(results1[1]*100, 2)))
Evaluate NN model with sigmoid activations - changing learning rate
--------------------------------------------------------------------------------
60000/60000 [==============================] - 3s 45us/sample - loss: 2.2988 - acc: 0.1307
Validation accuracy: 13.07%

Observation 3 - NN model with sigmoid activations
  • Validation accuracy is very low (~13%), and lowering the learning rate reduces it slightly further; the sigmoid network is barely learning.
  • The network still needs optimization to learn the patterns in this dataset.
  • The best of the sigmoid variants above is the one trained with the SGD optimizer at the lower learning rate.
  • Next, let's switch to relu activations and see if the score improves (a short sketch of why sigmoid stalls follows below).
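
A plausible explanation for the stall is sigmoid saturation: the derivative of sigmoid(x) is sigmoid(x)*(1 - sigmoid(x)), which never exceeds 0.25 and is near zero for large |x|, so gradients shrink as they flow back through stacked sigmoid layers. A quick illustrative check (this snippet is an aside, not part of the original pipeline):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 5)
grad = sigmoid(x) * (1 - sigmoid(x))  # derivative of the sigmoid
print(grad)  # peaks at 0.25 (at x = 0); ~0 for large |x|, where units saturate and stop learning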

NN model, relu activations, SGD optimizer

In [36]:
%time
# (note: bare %time only times this empty line; %%time as the cell's first line would time the whole cell)
print('NN model with relu activations and sgd optimizer'); print('--'*40)
# Initialize the neural network classifier
model2 = Sequential()

# Input Layer - 128 units on the 1024-dimensional flattened input
model2.add(Dense(128, input_shape = (1024, )))
# Adding activation function
model2.add(Activation('relu'))

# Hidden Layer 1 - adding first hidden layer with 64 units
model2.add(Dense(64))
# Adding activation function
model2.add(Activation('relu'))

# Output Layer - adding output layer with 10 nodes, one per digit class
model2.add(Dense(10))
# Adding activation function - softmax for multiclass classification
model2.add(Activation('softmax'))
CPU times: user 2 µs, sys: 1 µs, total: 3 µs
Wall time: 6.68 µs
NN model with relu activations and sgd optimizer
--------------------------------------------------------------------------------
In [37]:
model2.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_3 (Dense)              (None, 128)               131200    
_________________________________________________________________
activation_3 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_4 (Dense)              (None, 64)                8256      
_________________________________________________________________
activation_4 (Activation)    (None, 64)                0         
_________________________________________________________________
dense_5 (Dense)              (None, 10)                650       
_________________________________________________________________
activation_5 (Activation)    (None, 10)                0         
=================================================================
Total params: 140,106
Trainable params: 140,106
Non-trainable params: 0
_________________________________________________________________
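
As a sanity check on the Param # column, each Dense layer has (inputs × units) weights plus units biases: 1024×128 + 128 = 131,200, 128×64 + 64 = 8,256, and 64×10 + 10 = 650, which together give the 140,106 total above (the Activation layers add no parameters).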
In [38]:
# compiling the neural network classifier, sgd optimizer
sgd = optimizers.SGD(lr = 0.01)
model2.compile(optimizer = sgd, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model2.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.3001 - acc: 0.1166 - val_loss: 2.2903 - val_acc: 0.1328
Epoch 2/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2854 - acc: 0.1465 - val_loss: 2.2786 - val_acc: 0.1673
Epoch 3/100
42000/42000 [==============================] - 1s 26us/sample - loss: 2.2722 - acc: 0.1781 - val_loss: 2.2643 - val_acc: 0.1904
... (epochs 4-99 omitted: training loss falls steadily from ~2.26 to ~0.71 while validation accuracy climbs from ~0.22 to ~0.76)
Epoch 100/100
42000/42000 [==============================] - 1s 25us/sample - loss: 0.7070 - acc: 0.7895 - val_loss: 0.7015 - val_acc: 0.7965
In [39]:
print('Evaluate NN model with relu activations'); print('--'*40)
results2 = model2.evaluate(X_val, y_val)
print('Validation accuracy: {}%'.format(round(results2[1]*100, 2)))
Evaluate NN model with relu activations
--------------------------------------------------------------------------------
60000/60000 [==============================] - 3s 48us/sample - loss: 0.7015 - acc: 0.7965
Validation accuracy: 79.65%

NN model, relu activations, SGD optimizer, changing learning rate
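
Note: the cell below recompiles the same model2 instance, so training resumes from the weights already learned above rather than from scratch; that is why epoch 1 starts near 80% accuracy. For a fully independent comparison of optimizers and learning rates, the network could be rebuilt before each run so it starts from fresh random weights; a minimal sketch reusing the same layers:

def build_model():
    # Same architecture as model2, rebuilt so each experiment starts
    # from freshly initialized weights instead of the previous run's
    model = Sequential()
    model.add(Dense(128, input_shape = (1024, )))
    model.add(Activation('relu'))
    model.add(Dense(64))
    model.add(Activation('relu'))
    model.add(Dense(10))
    model.add(Activation('softmax'))
    return model

# e.g. model2 = build_model() before each compile/fit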

In [40]:
%time
print('NN model with relu activations and sgd optimizer - changing learning rate'); print('--'*40)
# compiling the neural network classifier, sgd optimizer
sgd = optimizers.SGD(lr = 0.001)
model2.compile(optimizer = sgd, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model2.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
CPU times: user 2 µs, sys: 1e+03 ns, total: 3 µs
Wall time: 15.5 µs
NN model with relu activations and sgd optimizer - changing learning rate
--------------------------------------------------------------------------------
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.6704 - acc: 0.8051 - val_loss: 0.6922 - val_acc: 0.8003
Epoch 2/100
42000/42000 [==============================] - 1s 25us/sample - loss: 0.6680 - acc: 0.8054 - val_loss: 0.6917 - val_acc: 0.7996
... (epochs 3-99 omitted: gains are marginal at this learning rate, with loss easing from ~0.67 to ~0.64 and validation accuracy inching from ~0.800 to ~0.809)
Epoch 100/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.6348 - acc: 0.8156 - val_loss: 0.6611 - val_acc: 0.8088
In [41]:
print('Evaluate NN model with relu activations - changing learning rate'); print('--'*40)
results2 = model2.evaluate(X_val, y_val)
print('Validation accuracy: {}%'.format(round(results2[1]*100, 2)))
Evaluate NN model with relu activations - changing learning rate
--------------------------------------------------------------------------------
60000/60000 [==============================] - 3s 51us/sample - loss: 0.6611 - acc: 0.8088
Validation accuracy: 80.88%

NN model, relu activations, adam optimizer

In [42]:
%time
print('NN model with relu activations and adam optimizer'); print('--'*40)
# compiling the neural network classifier, adam optimizer
adam = optimizers.Adam(lr = 0.01)
model2.compile(optimizer = adam, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model2.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
CPU times: user 2 µs, sys: 0 ns, total: 2 µs
Wall time: 5.01 µs
NN model with relu activations and adam optimizer
--------------------------------------------------------------------------------
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 1s 34us/sample - loss: 3.3943 - acc: 0.1031 - val_loss: 2.3027 - val_acc: 0.1000
Epoch 2/100
42000/42000 [==============================] - 1s 30us/sample - loss: 2.3013 - acc: 0.1004 - val_loss: 2.3027 - val_acc: 0.1014
Epoch 3/100
42000/42000 [==============================] - 1s 29us/sample - loss: 2.2895 - acc: 0.1191 - val_loss: 2.2743 - val_acc: 0.1331
... (epochs 4-99 omitted: loss drops to ~1.3 by epoch 10 and then plateaus near 1.0, while validation accuracy fluctuates between roughly 0.60 and 0.70)
Epoch 100/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.9940 - acc: 0.6921 - val_loss: 1.0037 - val_acc: 0.6911
In [43]:
print('Evaluate NN model with relu activations and adam optimizer'); print('--'*40)
results2 = model2.evaluate(X_val, y_val)
print('Validation accuracy: {}%'.format(round(results2[1]*100, 2)))
Evaluate NN model with relu activations and adam optimizer
--------------------------------------------------------------------------------
60000/60000 [==============================] - 3s 47us/sample - loss: 1.0037 - acc: 0.6911
Validation accuracy: 69.11%

NN model, relu activations, adam optimizer, changing learning rate
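
Rather than re-running fit by hand at each learning rate, the rate can be reduced automatically when validation loss stalls. This is not done in this notebook, but a sketch using standard Keras callbacks (ReduceLROnPlateau and EarlyStopping) would look like:

from tensorflow.keras.callbacks import ReduceLROnPlateau, EarlyStopping

callbacks = [
    # Halve the learning rate whenever val_loss fails to improve for 5 epochs
    ReduceLROnPlateau(monitor = 'val_loss', factor = 0.5, patience = 5),
    # Stop after 10 stagnant epochs and keep the best weights seen so far
    EarlyStopping(monitor = 'val_loss', patience = 10, restore_best_weights = True)
]
# history = model2.fit(X_train, y_train, validation_data = (X_val, y_val),
#                      batch_size = 200, epochs = 100, callbacks = callbacks)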

In [44]:
%time
print('NN model with relu activations and adam optimizer - changing learning rate'); print('--'*40)
# compiling the neural network classifier, adam optimizer
adam = optimizers.Adam(lr = 0.001)
model2.compile(optimizer = adam, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model2.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
CPU times: user 2 µs, sys: 1e+03 ns, total: 3 µs
Wall time: 4.77 µs
NN model with relu activations and adam optimizer - changing learning rate
--------------------------------------------------------------------------------
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.9207 - acc: 0.7166 - val_loss: 0.9223 - val_acc: 0.7184
Epoch 2/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.9098 - acc: 0.7203 - val_loss: 0.9175 - val_acc: 0.7193
... (epochs 3-50 omitted: loss edges down from ~0.91 to ~0.89 and validation accuracy hovers around 0.72 with little movement)
Epoch 51/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8890 - acc: 0.7261 - val_loss: 0.9017 - val_acc: 0.7247
Epoch 52/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8873 - acc: 0.7271 - val_loss: 0.8991 - val_acc: 0.7254
Epoch 53/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8879 - acc: 0.7272 - val_loss: 0.8996 - val_acc: 0.7247
Epoch 54/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8880 - acc: 0.7273 - val_loss: 0.9110 - val_acc: 0.7222
Epoch 55/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8882 - acc: 0.7275 - val_loss: 0.9008 - val_acc: 0.7245
Epoch 56/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8879 - acc: 0.7273 - val_loss: 0.9052 - val_acc: 0.7240
Epoch 57/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.8860 - acc: 0.7280 - val_loss: 0.9027 - val_acc: 0.7250
Epoch 58/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8857 - acc: 0.7295 - val_loss: 0.8991 - val_acc: 0.7258
Epoch 59/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8899 - acc: 0.7264 - val_loss: 0.9094 - val_acc: 0.7238
Epoch 60/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.8868 - acc: 0.7277 - val_loss: 0.9053 - val_acc: 0.7228
Epoch 61/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8864 - acc: 0.7288 - val_loss: 0.8992 - val_acc: 0.7254
Epoch 62/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8879 - acc: 0.7276 - val_loss: 0.8937 - val_acc: 0.7282
Epoch 63/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8860 - acc: 0.7279 - val_loss: 0.9124 - val_acc: 0.7200
Epoch 64/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8867 - acc: 0.7286 - val_loss: 0.9027 - val_acc: 0.7235
Epoch 65/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8852 - acc: 0.7286 - val_loss: 0.8949 - val_acc: 0.7265
Epoch 66/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8861 - acc: 0.7287 - val_loss: 0.8954 - val_acc: 0.7266
Epoch 67/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8840 - acc: 0.7283 - val_loss: 0.9055 - val_acc: 0.7222
Epoch 68/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8838 - acc: 0.7285 - val_loss: 0.8979 - val_acc: 0.7258
Epoch 69/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8849 - acc: 0.7277 - val_loss: 0.8946 - val_acc: 0.7275
Epoch 70/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8830 - acc: 0.7291 - val_loss: 0.8960 - val_acc: 0.7247
Epoch 71/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8848 - acc: 0.7285 - val_loss: 0.8995 - val_acc: 0.7262
Epoch 72/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.8834 - acc: 0.7289 - val_loss: 0.9017 - val_acc: 0.7239
Epoch 73/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8826 - acc: 0.7281 - val_loss: 0.8945 - val_acc: 0.7273
Epoch 74/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8825 - acc: 0.7292 - val_loss: 0.8963 - val_acc: 0.7259
Epoch 75/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8826 - acc: 0.7280 - val_loss: 0.8977 - val_acc: 0.7272
Epoch 76/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8830 - acc: 0.7291 - val_loss: 0.8935 - val_acc: 0.7268
Epoch 77/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8825 - acc: 0.7294 - val_loss: 0.8943 - val_acc: 0.7264
Epoch 78/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8835 - acc: 0.7282 - val_loss: 0.8985 - val_acc: 0.7260
Epoch 79/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8825 - acc: 0.7271 - val_loss: 0.8932 - val_acc: 0.7282
Epoch 80/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8817 - acc: 0.7296 - val_loss: 0.8885 - val_acc: 0.7285
Epoch 81/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8824 - acc: 0.7278 - val_loss: 0.8929 - val_acc: 0.7276
Epoch 82/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8817 - acc: 0.7291 - val_loss: 0.8922 - val_acc: 0.7279
Epoch 83/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8800 - acc: 0.7310 - val_loss: 0.9002 - val_acc: 0.7251
Epoch 84/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8794 - acc: 0.7299 - val_loss: 0.8957 - val_acc: 0.7252
Epoch 85/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8807 - acc: 0.7294 - val_loss: 0.8998 - val_acc: 0.7242
Epoch 86/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8795 - acc: 0.7294 - val_loss: 0.8992 - val_acc: 0.7249
Epoch 87/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8797 - acc: 0.7305 - val_loss: 0.8921 - val_acc: 0.7282
Epoch 88/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.8812 - acc: 0.7295 - val_loss: 0.8967 - val_acc: 0.7238
Epoch 89/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8798 - acc: 0.7304 - val_loss: 0.8901 - val_acc: 0.7291
Epoch 90/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.8821 - acc: 0.7296 - val_loss: 0.8958 - val_acc: 0.7253
Epoch 91/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8812 - acc: 0.7286 - val_loss: 0.8913 - val_acc: 0.7284
Epoch 92/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8796 - acc: 0.7296 - val_loss: 0.8986 - val_acc: 0.7248
Epoch 93/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8789 - acc: 0.7305 - val_loss: 0.8969 - val_acc: 0.7255
Epoch 94/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8808 - acc: 0.7296 - val_loss: 0.8978 - val_acc: 0.7260
Epoch 95/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8788 - acc: 0.7307 - val_loss: 0.8898 - val_acc: 0.7299
Epoch 96/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8781 - acc: 0.7309 - val_loss: 0.8893 - val_acc: 0.7282
Epoch 97/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8786 - acc: 0.7301 - val_loss: 0.8924 - val_acc: 0.7273
Epoch 98/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8804 - acc: 0.7296 - val_loss: 0.8923 - val_acc: 0.7278
Epoch 99/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.8776 - acc: 0.7310 - val_loss: 0.8898 - val_acc: 0.7301
Epoch 100/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.8787 - acc: 0.7312 - val_loss: 0.9003 - val_acc: 0.7231
In [45]:
print('Evaluate NN model with relu activations'); print('--'*40)
results2 = model2.evaluate(X_val, y_val)
print('Validation accuracy: {}'.format(round(results2[1]*100, 2)))
Evaluate NN model with relu activations
--------------------------------------------------------------------------------
60000/60000 [==============================] - 3s 47us/sample - loss: 0.9003 - acc: 0.7231
Validation accuracy: 72.31

Observation 4 - NN model with relu activations
  • Switching to relu activations improves the scores considerably.
  • Best accuracy achieved so far (72.31%) uses relu activations with the SGD optimizer and a learning rate of 0.001.
  • The validation metrics plateau long before epoch 100, so early stopping would save most of the training time; see the sketch below.
  • Next, let's change the number of activators (the units per hidden layer) and see if the score improves.
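
Since both the training and validation curves flatten out early, training could be cut short automatically. A minimal early-stopping sketch, assuming TensorFlow's bundled Keras (its EarlyStopping callback and restore_best_weights flag) and the same model2 and data splits used above; this is not part of the original run:

# Stop training once val_loss stops improving (a sketch)
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor = 'val_loss',          # watch validation loss
                           patience = 10,                 # allow 10 stagnant epochs
                           restore_best_weights = True)   # roll back to the best epoch

history = model2.fit(X_train, y_train, validation_data = (X_val, y_val),
                     batch_size = 200, epochs = 100, verbose = 1,
                     callbacks = [early_stop])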

NN model, relu activations, changing number of activators, SGD optimizer

In [46]:
print('NN model with relu activations and changing number of activators'); print('--'*40)
# Initialize the neural network classifier
model3 = Sequential()

# Input Layer - adding input layer of 256 units for the 1024 flattened pixel values
model3.add(Dense(256, input_shape = (1024, )))
# Adding relu activation function
model3.add(Activation('relu'))

# Hidden Layer 1 - adding first hidden layer with 128 units
model3.add(Dense(128))
# Adding relu activation function
model3.add(Activation('relu'))

# Hidden Layer 2 - adding second hidden layer with 64 units
model3.add(Dense(64))
# Adding relu activation function
model3.add(Activation('relu'))

# Output Layer - adding output layer which is of 10 nodes (digits)
model3.add(Dense(10))
# Adding activation function - softmax for multiclass classification
model3.add(Activation('softmax'))
NN model with relu activations and changing number of activators
--------------------------------------------------------------------------------
In [47]:
model3.summary()
Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_6 (Dense)              (None, 256)               262400    
_________________________________________________________________
activation_6 (Activation)    (None, 256)               0         
_________________________________________________________________
dense_7 (Dense)              (None, 128)               32896     
_________________________________________________________________
activation_7 (Activation)    (None, 128)               0         
_________________________________________________________________
dense_8 (Dense)              (None, 64)                8256      
_________________________________________________________________
activation_8 (Activation)    (None, 64)                0         
_________________________________________________________________
dense_9 (Dense)              (None, 10)                650       
_________________________________________________________________
activation_9 (Activation)    (None, 10)                0         
=================================================================
Total params: 304,202
Trainable params: 304,202
Non-trainable params: 0
_________________________________________________________________
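
The parameter counts above follow directly from the layer shapes: a Dense layer with n inputs and m units holds n*m weights plus m biases. For example, dense_6 has 1024*256 + 256 = 262,400 parameters and dense_9 has 64*10 + 10 = 650, which together with the two hidden layers (32,896 and 8,256) gives the 304,202 total.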
In [48]:
# compiling the neural network classifier, sgd optimizer
sgd = optimizers.SGD(lr = 0.01)
model3.compile(optimizer = sgd, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model3.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 1s 30us/sample - loss: 2.2970 - acc: 0.1352 - val_loss: 2.2856 - val_acc: 0.1681
Epoch 2/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.2796 - acc: 0.1758 - val_loss: 2.2716 - val_acc: 0.2027
Epoch 3/100
42000/42000 [==============================] - 1s 27us/sample - loss: 2.2635 - acc: 0.2046 - val_loss: 2.2535 - val_acc: 0.2214
... [epochs 4-98 omitted for brevity: training loss falls steadily from 2.2420 to 0.6151 as val_acc climbs from 0.2569 to 0.8116] ...
Epoch 99/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.6118 - acc: 0.8143 - val_loss: 0.6255 - val_acc: 0.8143
Epoch 100/100
42000/42000 [==============================] - 1s 26us/sample - loss: 0.6101 - acc: 0.8161 - val_loss: 0.6550 - val_acc: 0.8033
In [49]:
print('Evaluate NN model with relu activations and changing the number of activators'); print('--'*40)
results3 = model3.evaluate(X_val, y_val)
print('Validation accuracy: {}'.format(round(results3[1]*100, 2)))
Evaluate NN model with relu activations and changing the number of activators
--------------------------------------------------------------------------------
60000/60000 [==============================] - 3s 49us/sample - loss: 0.6550 - acc: 0.8033
Validation accuracy: 80.33
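
Rather than reading the epoch log line by line, the History object returned by fit() can be plotted. A minimal sketch, assuming matplotlib is already imported as plt (as in the setup cell) and the 'acc'/'val_acc' metric keys shown in the log above:

# Visualize the learning curves recorded by model3.fit() (a sketch)
plt.figure(figsize = (10, 4))
plt.plot(history.history['acc'], label = 'train accuracy')
plt.plot(history.history['val_acc'], label = 'validation accuracy')
plt.xlabel('epoch'); plt.ylabel('accuracy'); plt.legend(); plt.show()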

NN model, relu activations, changing number of activators, Adam optimizer

In [50]:
# Re-compiling the same model3 with the adam optimizer; note that compile()
# keeps the weights already learned during the SGD run above
adam = optimizers.Adam(lr = 0.001)
model3.compile(optimizer = adam, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model3.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 1s 32us/sample - loss: 0.9431 - acc: 0.7111 - val_loss: 0.7650 - val_acc: 0.7681
Epoch 2/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.7697 - acc: 0.7615 - val_loss: 0.8645 - val_acc: 0.7370
Epoch 3/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.7513 - acc: 0.7701 - val_loss: 0.7942 - val_acc: 0.7514
... [epochs 4-98 omitted for brevity: training loss drops from 0.7354 to 0.2430 as val_acc rises from 0.7655 to 0.8961] ...
Epoch 99/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.2386 - acc: 0.9211 - val_loss: 0.3641 - val_acc: 0.9031
Epoch 100/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.2458 - acc: 0.9195 - val_loss: 0.3539 - val_acc: 0.9066
In [51]:
print('Evaluate NN model with relu activations and changing the number of activators'); print('--'*40)
results3 = model3.evaluate(X_val, y_val)
print('Validation accuracy: {}'.format(round(results3[1]*100, 2)))
Evaluate NN model with relu activations and changing the number of activators
--------------------------------------------------------------------------------
60000/60000 [==============================] - 3s 49us/sample - loss: 0.3539 - acc: 0.9066
Validation accuracy: 90.66

Observation 5 - NN model with relu activations and changing activators
  • Changing the number of activators lifts validation accuracy further, to 90.66%.
  • Best accuracy achieved so far uses relu activations, the changed number of activators, and the Adam optimizer with a learning rate of 0.001.
  • One caveat: re-compiling does not reset weights, so this Adam run continued from the weights already trained with SGD rather than starting fresh; a sketch for a from-scratch comparison follows.
  • Next, let's try adding weight initialization.
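
For a cleaner optimizer comparison, the architecture can be rebuilt with fresh weights before compiling with Adam. A sketch, assuming tf.keras's clone_model, which copies the architecture but re-initializes the weights:

# Re-create model3 with newly initialized weights before switching optimizers (a sketch)
from tensorflow.keras.models import clone_model

model3_fresh = clone_model(model3)   # same layers, fresh random weights
model3_fresh.compile(optimizer = optimizers.Adam(lr = 0.001),
                     loss = 'categorical_crossentropy', metrics = ['accuracy'])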

With Weight Initializers

Changing the weight-initialization scheme can significantly improve training by mitigating the vanishing-gradient problem to some degree.
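
He initialization, used below, draws each weight from a zero-mean Gaussian with standard deviation sqrt(2/fan_in), which keeps the variance of relu activations roughly constant from layer to layer. A hand-rolled NumPy sketch of the idea, for illustration only (Keras's 'he_normal' string does this internally):

# He-normal initialization by hand: W ~ N(0, 2/fan_in)
import numpy as np

def he_normal_init(fan_in, fan_out):
    std = np.sqrt(2.0 / fan_in)                  # variance 2/fan_in suits relu units
    return np.random.normal(0.0, std, size = (fan_in, fan_out))

W = he_normal_init(1024, 256)                    # shape of the input layer's kernel below
print(round(W.std(), 4))                         # close to sqrt(2/1024), i.e. ~0.0442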

NN model, relu activations, SGD optimizer with weight initializers

In [52]:
print('NN model with weight initializers'); print('--'*40)
# Initialize the neural network classifier
model4 = Sequential()

# Input Layer - adding input layer with relu activation and He weight initializer
model4.add(Dense(256, input_shape = (1024, ), kernel_initializer = 'he_normal'))
# Adding relu activation function
model4.add(Activation('relu'))

# Hidden Layer 1 - adding first hidden layer with 128 units
model4.add(Dense(128, kernel_initializer = 'he_normal', bias_initializer = 'he_uniform'))
# Adding relu activation function
model4.add(Activation('relu'))

# Hidden Layer 2 - adding second hidden layer with 64 units
model4.add(Dense(64, kernel_initializer = 'he_normal', bias_initializer = 'he_uniform'))
# Adding relu activation function
model4.add(Activation('relu'))

# Hidden Layer 3 - adding third hidden layer with 32 units
model4.add(Dense(32, kernel_initializer = 'he_normal', bias_initializer = 'he_uniform'))
# Adding relu activation function
model4.add(Activation('relu'))

# Output Layer - adding output layer which is of 10 nodes (digits)
model4.add(Dense(10, kernel_initializer = 'he_normal', bias_initializer = 'he_uniform'))
# Adding activation function - softmax for multiclass classification
model4.add(Activation('softmax'))
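
One caveat on the cell above: 'he_uniform' is applied to the biases as well. Keras accepts that, but He scaling was derived for weight matrices, and biases are conventionally left at 'zeros' (the Dense default). The more common form of a layer would be, as a sketch:

# Conventional variant: He initialization for the kernel only, zero biases (the default)
layer = Dense(128, kernel_initializer = 'he_normal', bias_initializer = 'zeros')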
NN model with weight initializers
--------------------------------------------------------------------------------
In [53]:
model4.summary()
Model: "sequential_3"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_10 (Dense)             (None, 256)               262400    
_________________________________________________________________
activation_10 (Activation)   (None, 256)               0         
_________________________________________________________________
dense_11 (Dense)             (None, 128)               32896     
_________________________________________________________________
activation_11 (Activation)   (None, 128)               0         
_________________________________________________________________
dense_12 (Dense)             (None, 64)                8256      
_________________________________________________________________
activation_12 (Activation)   (None, 64)                0         
_________________________________________________________________
dense_13 (Dense)             (None, 32)                2080      
_________________________________________________________________
activation_13 (Activation)   (None, 32)                0         
_________________________________________________________________
dense_14 (Dense)             (None, 10)                330       
_________________________________________________________________
activation_14 (Activation)   (None, 10)                0         
=================================================================
Total params: 305,962
Trainable params: 305,962
Non-trainable params: 0
_________________________________________________________________
In [54]:
# compiling the neural network classifier, sgd optimizer
sgd = optimizers.SGD(lr = 0.01)
model4.compile(optimizer = sgd, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model4.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 1s 31us/sample - loss: 2.3023 - acc: 0.1175 - val_loss: 2.2772 - val_acc: 0.1445
Epoch 2/100
42000/42000 [==============================] - 1s 28us/sample - loss: 2.2583 - acc: 0.1590 - val_loss: 2.2352 - val_acc: 0.1811
Epoch 3/100
42000/42000 [==============================] - 1s 28us/sample - loss: 2.2025 - acc: 0.1988 - val_loss: 2.1630 - val_acc: 0.2169
... [epochs 4-88 omitted for brevity: training loss falls from 2.1205 to 0.5611 as val_acc climbs from 0.2791 to 0.8231] ...
Epoch 89/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.5546 - acc: 0.8317 - val_loss: 0.5673 - val_acc: 0.8303
Epoch 90/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.5512 - acc: 0.8315 - val_loss: 0.5769 - val_acc: 0.8254
Epoch 91/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.5524 - acc: 0.8309 - val_loss: 0.5750 - val_acc: 0.8269
Epoch 92/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.5496 - acc: 0.8310 - val_loss: 0.5701 - val_acc: 0.8285
Epoch 93/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.5473 - acc: 0.8340 - val_loss: 0.5738 - val_acc: 0.8263
Epoch 94/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.5401 - acc: 0.8360 - val_loss: 0.5505 - val_acc: 0.8365
Epoch 95/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.5413 - acc: 0.8350 - val_loss: 0.6173 - val_acc: 0.8122
Epoch 96/100
42000/42000 [==============================] - 1s 27us/sample - loss: 0.5368 - acc: 0.8359 - val_loss: 0.5707 - val_acc: 0.8304
Epoch 97/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.5295 - acc: 0.8385 - val_loss: 0.5667 - val_acc: 0.8296
Epoch 98/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.5249 - acc: 0.8381 - val_loss: 0.5843 - val_acc: 0.8244
Epoch 99/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.5273 - acc: 0.8393 - val_loss: 0.5687 - val_acc: 0.8294
Epoch 100/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.5213 - acc: 0.8399 - val_loss: 0.5648 - val_acc: 0.8295
In [55]:
print('NN with weight initializers - sgd optimizer'); print('--'*40)
results4 = model4.evaluate(X_val, y_val)
print('Validation accuracy: {}%'.format(round(results4[1]*100, 2)))
NN with weight initializers - sgd optimizer
--------------------------------------------------------------------------------
60000/60000 [==============================] - 3s 51us/sample - loss: 0.5648 - acc: 0.8295
Validation accuracy: 82.95%

NN model, relu activations, Adam optimizers with weight initializers
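
Note, before running the next cell: calling compile on model4 again does not reset its weights, so the Adam run below warm-starts from the SGD-trained weights above rather than from a fresh initialization. For a cleaner optimizer comparison, one could re-run the layer initializers on a copy of the architecture — a minimal sketch, assuming the tensorflow.keras imports used elsewhere in this notebook:

from tensorflow.keras.models import clone_model

# clone_model rebuilds the same architecture with freshly initialized weights
fresh_model4 = clone_model(model4)
fresh_model4.compile(optimizer = optimizers.Adam(lr = 0.001),
                     loss = 'categorical_crossentropy', metrics = ['accuracy'])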

In [56]:
# compiling the neural network classifier, adam optimizer
adam = optimizers.Adam(lr = 0.001)
# Categorical cross-entropy loss for multiclass classification (the softmax output layer is already part of the model)
model4.compile(optimizer = adam, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model4.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 1s 33us/sample - loss: 0.8991 - acc: 0.7259 - val_loss: 0.7777 - val_acc: 0.7567
Epoch 2/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.7556 - acc: 0.7651 - val_loss: 0.7145 - val_acc: 0.7837
Epoch 3/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.7233 - acc: 0.7769 - val_loss: 0.7429 - val_acc: 0.7717
Epoch 4/100
42000/42000 [==============================] - 1s 31us/sample - loss: 0.7134 - acc: 0.7808 - val_loss: 0.7203 - val_acc: 0.7836
Epoch 5/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.7006 - acc: 0.7834 - val_loss: 0.7579 - val_acc: 0.7643
Epoch 6/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.6854 - acc: 0.7876 - val_loss: 0.6947 - val_acc: 0.7862
Epoch 7/100
42000/42000 [==============================] - 1s 31us/sample - loss: 0.6375 - acc: 0.8048 - val_loss: 0.6920 - val_acc: 0.7847
Epoch 8/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.6482 - acc: 0.7993 - val_loss: 0.7172 - val_acc: 0.7798
Epoch 9/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.6429 - acc: 0.8015 - val_loss: 0.6664 - val_acc: 0.7970
Epoch 10/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.6220 - acc: 0.8086 - val_loss: 0.5867 - val_acc: 0.8212
Epoch 11/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.6078 - acc: 0.8123 - val_loss: 0.6032 - val_acc: 0.8152
Epoch 12/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.5877 - acc: 0.8190 - val_loss: 0.5871 - val_acc: 0.8198
Epoch 13/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.6041 - acc: 0.8137 - val_loss: 0.6405 - val_acc: 0.8071
Epoch 14/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.5615 - acc: 0.8272 - val_loss: 0.6205 - val_acc: 0.8112
Epoch 15/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.5613 - acc: 0.8273 - val_loss: 0.5388 - val_acc: 0.8364
Epoch 16/100
42000/42000 [==============================] - 1s 31us/sample - loss: 0.5540 - acc: 0.8296 - val_loss: 0.5700 - val_acc: 0.8283
Epoch 17/100
42000/42000 [==============================] - 1s 31us/sample - loss: 0.5484 - acc: 0.8299 - val_loss: 0.6048 - val_acc: 0.8130
Epoch 18/100
42000/42000 [==============================] - 1s 32us/sample - loss: 0.5484 - acc: 0.8280 - val_loss: 0.5539 - val_acc: 0.8290
Epoch 19/100
42000/42000 [==============================] - 1s 33us/sample - loss: 0.5517 - acc: 0.8289 - val_loss: 0.5933 - val_acc: 0.8200
Epoch 20/100
42000/42000 [==============================] - 1s 31us/sample - loss: 0.5333 - acc: 0.8333 - val_loss: 0.5172 - val_acc: 0.8419
Epoch 21/100
42000/42000 [==============================] - 1s 31us/sample - loss: 0.5145 - acc: 0.8397 - val_loss: 0.5776 - val_acc: 0.8227
Epoch 22/100
42000/42000 [==============================] - 1s 31us/sample - loss: 0.5038 - acc: 0.8428 - val_loss: 0.5885 - val_acc: 0.8203
Epoch 23/100
42000/42000 [==============================] - 1s 31us/sample - loss: 0.5021 - acc: 0.8435 - val_loss: 0.5563 - val_acc: 0.8313
Epoch 24/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.4995 - acc: 0.8436 - val_loss: 0.5187 - val_acc: 0.8443
Epoch 25/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.4902 - acc: 0.8461 - val_loss: 0.5196 - val_acc: 0.8422
Epoch 26/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4710 - acc: 0.8533 - val_loss: 0.4911 - val_acc: 0.8530
Epoch 27/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4936 - acc: 0.8451 - val_loss: 0.5424 - val_acc: 0.8342
Epoch 28/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4797 - acc: 0.8485 - val_loss: 0.5177 - val_acc: 0.8429
Epoch 29/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4690 - acc: 0.8533 - val_loss: 0.4953 - val_acc: 0.8504
Epoch 30/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4609 - acc: 0.8557 - val_loss: 0.5042 - val_acc: 0.8469
Epoch 31/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4545 - acc: 0.8564 - val_loss: 0.4685 - val_acc: 0.8587
Epoch 32/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.4528 - acc: 0.8584 - val_loss: 0.5315 - val_acc: 0.8411
Epoch 33/100
42000/42000 [==============================] - 1s 31us/sample - loss: 0.4509 - acc: 0.8584 - val_loss: 0.5534 - val_acc: 0.8298
Epoch 34/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.4374 - acc: 0.8635 - val_loss: 0.5036 - val_acc: 0.8473
Epoch 35/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4344 - acc: 0.8632 - val_loss: 0.4816 - val_acc: 0.8529
Epoch 36/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4361 - acc: 0.8620 - val_loss: 0.6043 - val_acc: 0.8112
Epoch 37/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4331 - acc: 0.8619 - val_loss: 0.5036 - val_acc: 0.8474
Epoch 38/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4255 - acc: 0.8643 - val_loss: 0.4520 - val_acc: 0.8638
Epoch 39/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4218 - acc: 0.8663 - val_loss: 0.4762 - val_acc: 0.8538
Epoch 40/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4104 - acc: 0.8693 - val_loss: 0.4917 - val_acc: 0.8511
Epoch 41/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4133 - acc: 0.8700 - val_loss: 0.4658 - val_acc: 0.8579
Epoch 42/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4138 - acc: 0.8682 - val_loss: 0.4311 - val_acc: 0.8703
Epoch 43/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.4022 - acc: 0.8722 - val_loss: 0.5085 - val_acc: 0.8441
Epoch 44/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3980 - acc: 0.8724 - val_loss: 0.4468 - val_acc: 0.8643
Epoch 45/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3898 - acc: 0.8754 - val_loss: 0.4190 - val_acc: 0.8749
Epoch 46/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3825 - acc: 0.8784 - val_loss: 0.4479 - val_acc: 0.8654
Epoch 47/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3758 - acc: 0.8804 - val_loss: 0.4472 - val_acc: 0.8663
Epoch 48/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3744 - acc: 0.8803 - val_loss: 0.4578 - val_acc: 0.8621
Epoch 49/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3732 - acc: 0.8777 - val_loss: 0.4057 - val_acc: 0.8789
Epoch 50/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3702 - acc: 0.8817 - val_loss: 0.4784 - val_acc: 0.8530
Epoch 51/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3802 - acc: 0.8784 - val_loss: 0.4168 - val_acc: 0.8766
Epoch 52/100
42000/42000 [==============================] - 1s 31us/sample - loss: 0.3640 - acc: 0.8855 - val_loss: 0.4354 - val_acc: 0.8693
Epoch 53/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3655 - acc: 0.8817 - val_loss: 0.4408 - val_acc: 0.8706
Epoch 54/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3629 - acc: 0.8839 - val_loss: 0.4955 - val_acc: 0.8507
Epoch 55/100
42000/42000 [==============================] - 1s 33us/sample - loss: 0.3575 - acc: 0.8848 - val_loss: 0.4405 - val_acc: 0.8676
Epoch 56/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3460 - acc: 0.8889 - val_loss: 0.4270 - val_acc: 0.8721
Epoch 57/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3463 - acc: 0.8882 - val_loss: 0.4216 - val_acc: 0.8741
Epoch 58/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3444 - acc: 0.8886 - val_loss: 0.4267 - val_acc: 0.8738
Epoch 59/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3352 - acc: 0.8926 - val_loss: 0.4318 - val_acc: 0.8720
Epoch 60/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3576 - acc: 0.8835 - val_loss: 0.4341 - val_acc: 0.8686
Epoch 61/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3416 - acc: 0.8905 - val_loss: 0.4321 - val_acc: 0.8719
Epoch 62/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3321 - acc: 0.8934 - val_loss: 0.4097 - val_acc: 0.8776
Epoch 63/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3265 - acc: 0.8952 - val_loss: 0.3944 - val_acc: 0.8839
Epoch 64/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3281 - acc: 0.8930 - val_loss: 0.4291 - val_acc: 0.8740
Epoch 65/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3175 - acc: 0.8973 - val_loss: 0.4325 - val_acc: 0.8718
Epoch 66/100
42000/42000 [==============================] - 1s 32us/sample - loss: 0.3356 - acc: 0.8910 - val_loss: 0.3925 - val_acc: 0.8855
Epoch 67/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3264 - acc: 0.8937 - val_loss: 0.4110 - val_acc: 0.8832
Epoch 68/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3193 - acc: 0.8961 - val_loss: 0.4201 - val_acc: 0.8763
Epoch 69/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3213 - acc: 0.8955 - val_loss: 0.4312 - val_acc: 0.8732
Epoch 70/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3135 - acc: 0.8977 - val_loss: 0.4648 - val_acc: 0.8613
Epoch 71/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3129 - acc: 0.8984 - val_loss: 0.3997 - val_acc: 0.8838
Epoch 72/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3039 - acc: 0.9007 - val_loss: 0.4164 - val_acc: 0.8781
Epoch 73/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.3088 - acc: 0.9005 - val_loss: 0.4063 - val_acc: 0.8815
Epoch 74/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3041 - acc: 0.9022 - val_loss: 0.3704 - val_acc: 0.8949
Epoch 75/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2971 - acc: 0.9039 - val_loss: 0.4283 - val_acc: 0.8779
Epoch 76/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3031 - acc: 0.9018 - val_loss: 0.4305 - val_acc: 0.8752
Epoch 77/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2963 - acc: 0.9041 - val_loss: 0.4077 - val_acc: 0.8816
Epoch 78/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.2873 - acc: 0.9086 - val_loss: 0.4098 - val_acc: 0.8811
Epoch 79/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2926 - acc: 0.9044 - val_loss: 0.3991 - val_acc: 0.8887
Epoch 80/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.2840 - acc: 0.9085 - val_loss: 0.3795 - val_acc: 0.8923
Epoch 81/100
42000/42000 [==============================] - 1s 31us/sample - loss: 0.2877 - acc: 0.9056 - val_loss: 0.4328 - val_acc: 0.8731
Epoch 82/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2810 - acc: 0.9073 - val_loss: 0.3815 - val_acc: 0.8924
Epoch 83/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.2881 - acc: 0.9066 - val_loss: 0.4516 - val_acc: 0.8707
Epoch 84/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2902 - acc: 0.9050 - val_loss: 0.3960 - val_acc: 0.8873
Epoch 85/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2647 - acc: 0.9148 - val_loss: 0.3972 - val_acc: 0.8877
Epoch 86/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2760 - acc: 0.9100 - val_loss: 0.3952 - val_acc: 0.8874
Epoch 87/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2739 - acc: 0.9095 - val_loss: 0.4250 - val_acc: 0.8813
Epoch 88/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2745 - acc: 0.9099 - val_loss: 0.4271 - val_acc: 0.8791
Epoch 89/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2576 - acc: 0.9163 - val_loss: 0.4173 - val_acc: 0.8834
Epoch 90/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2657 - acc: 0.9130 - val_loss: 0.3729 - val_acc: 0.8973
Epoch 91/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.2493 - acc: 0.9177 - val_loss: 0.3755 - val_acc: 0.8954
Epoch 92/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2611 - acc: 0.9147 - val_loss: 0.3596 - val_acc: 0.9012
Epoch 93/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2543 - acc: 0.9182 - val_loss: 0.3893 - val_acc: 0.8908
Epoch 94/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2478 - acc: 0.9176 - val_loss: 0.4036 - val_acc: 0.8882
Epoch 95/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2613 - acc: 0.9129 - val_loss: 0.4178 - val_acc: 0.8851
Epoch 96/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2535 - acc: 0.9170 - val_loss: 0.3966 - val_acc: 0.8928
Epoch 97/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2520 - acc: 0.9163 - val_loss: 0.3765 - val_acc: 0.8971
Epoch 98/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2548 - acc: 0.9166 - val_loss: 0.4093 - val_acc: 0.8857
Epoch 99/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.2535 - acc: 0.9170 - val_loss: 0.4349 - val_acc: 0.8797
Epoch 100/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.2450 - acc: 0.9200 - val_loss: 0.4037 - val_acc: 0.8873
In [57]:
print('NN with weight initializers - adam optimizer'); print('--'*40)
results4 = model4.evaluate(X_val, y_val)
print('Validation accuracy: {}%'.format(round(results4[1]*100, 2)))
NN with weight initializers - adam optimizer
--------------------------------------------------------------------------------
60000/60000 [==============================] - 3s 53us/sample - loss: 0.4037 - acc: 0.8873
Validation accuracy: 88.73%
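
evaluate above reports the final epoch's weights, but the training log peaks earlier (val_acc 0.9012 at epoch 92 versus 0.8873 at epoch 100). A hedged sketch of keeping the best validation epoch with a checkpoint callback — ModelCheckpoint is a standard Keras callback; 'best_model4.h5' is a hypothetical file name:

from tensorflow.keras.callbacks import ModelCheckpoint

# Save weights whenever validation accuracy improves
checkpoint = ModelCheckpoint('best_model4.h5', monitor = 'val_acc',
                             save_best_only = True, save_weights_only = True)
# history = model4.fit(X_train, y_train, validation_data = (X_val, y_val),
#                      batch_size = 200, epochs = 100, callbacks = [checkpoint])
# model4.load_weights('best_model4.h5')  # restore the best epoch before evaluating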

Observation 6 - Weight initializers
  • Adding weight initializers didn't improve the score (82.95% validation accuracy with SGD; 88.73% with Adam).
  • ReLU activations, a change in the number of units per layer, and the Adam optimizer still give the best score of the configurations tried so far.
  • Next, let's try batch normalization.

Batch Normalization

Batch normalization, one of the methods proposed to counter the "internal covariate shift" problem, has proven highly effective in practice: each mini-batch is normalized per feature to zero mean and unit variance, then scaled and shifted by learned parameters, before the nonlinearity is applied.
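
A minimal numpy sketch (not part of the pipeline) of the transform a BatchNormalization layer applies to one mini-batch during training; gamma and beta are the learned scale and shift, shown here at their initial values:

x = np.random.randn(200, 256)              # hypothetical mini-batch of pre-activations
mu = x.mean(axis = 0)                      # per-feature mini-batch mean
var = x.var(axis = 0)                      # per-feature mini-batch variance
x_hat = (x - mu) / np.sqrt(var + 1e-3)     # normalize to zero mean, unit variance
gamma, beta = np.ones(256), np.zeros(256)  # learned scale and shift parameters
y = gamma * x_hat + beta                   # result is fed to the ReLU nonlinearity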

NN model, relu activations, SGD optimizers with weight initializers and batch normalization

In [58]:
print('NN model with batch normalization'); print('--'*40)
# Initialize the neural network classifier
model5 = Sequential()

# Input Layer - adding input layer and activation functions relu and weight initializer
model5.add(Dense(256, input_shape = (1024, ), kernel_initializer = 'he_normal'))
# Adding batch normalization
model5.add(BatchNormalization())
# Adding activation function
model5.add(Activation('relu'))

#Hidden Layer 1 - adding first hidden layer
model5.add(Dense(128, kernel_initializer = 'he_normal', bias_initializer = 'he_uniform'))
# Adding batch normalization
model5.add(BatchNormalization())
# Adding activation function
model5.add(Activation('relu'))

#Hidden Layer 2 - adding second hidden layer
model5.add(Dense(64, kernel_initializer = 'he_normal', bias_initializer = 'he_uniform'))
# Adding batch normalization
model5.add(BatchNormalization())
# Adding activation function
model5.add(Activation('relu'))

#Hidden Layer 3 - adding third hidden layer
model5.add(Dense(32, kernel_initializer = 'he_normal', bias_initializer = 'he_uniform'))
# Adding batch normalization
model5.add(BatchNormalization())
# Adding activation function
model5.add(Activation('relu'))

# Output Layer - adding output layer which is of 10 nodes (digits)
model5.add(Dense(10, kernel_initializer = 'he_normal', bias_initializer = 'he_uniform'))
# Adding activation function
model5.add(Activation('softmax'))
NN model with batch normalization
--------------------------------------------------------------------------------
In [59]:
model5.summary()
Model: "sequential_4"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_15 (Dense)             (None, 256)               262400    
_________________________________________________________________
batch_normalization (BatchNo (None, 256)               1024      
_________________________________________________________________
activation_15 (Activation)   (None, 256)               0         
_________________________________________________________________
dense_16 (Dense)             (None, 128)               32896     
_________________________________________________________________
batch_normalization_1 (Batch (None, 128)               512       
_________________________________________________________________
activation_16 (Activation)   (None, 128)               0         
_________________________________________________________________
dense_17 (Dense)             (None, 64)                8256      
_________________________________________________________________
batch_normalization_2 (Batch (None, 64)                256       
_________________________________________________________________
activation_17 (Activation)   (None, 64)                0         
_________________________________________________________________
dense_18 (Dense)             (None, 32)                2080      
_________________________________________________________________
batch_normalization_3 (Batch (None, 32)                128       
_________________________________________________________________
activation_18 (Activation)   (None, 32)                0         
_________________________________________________________________
dense_19 (Dense)             (None, 10)                330       
_________________________________________________________________
activation_19 (Activation)   (None, 10)                0         
=================================================================
Total params: 307,882
Trainable params: 306,922
Non-trainable params: 960
_________________________________________________________________
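
As a sanity check on the summary: each BatchNormalization layer holds four parameters per feature — a learned scale (gamma) and shift (beta), which are trainable, plus a moving mean and moving variance, which are not. A quick arithmetic check of the totals:

dense = (1024*256 + 256) + (256*128 + 128) + (128*64 + 64) + (64*32 + 32) + (32*10 + 10)
bn_trainable = 2 * (256 + 128 + 64 + 32)   # gamma and beta per BN layer
bn_frozen = 2 * (256 + 128 + 64 + 32)      # moving mean and variance per BN layer
print(dense + bn_trainable + bn_frozen)    # 307882, matching Total params above
print(bn_frozen)                           # 960, matching Non-trainable params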
In [60]:
# compiling the neural network classifier, sgd optimizer
sgd = optimizers.SGD(lr = 0.01)
# Categorical cross-entropy loss for multiclass classification (the softmax output layer is already part of the model)
model5.compile(optimizer = sgd, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model5.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 2s 58us/sample - loss: 2.3150 - acc: 0.1844 - val_loss: 2.2163 - val_acc: 0.1867
Epoch 2/100
42000/42000 [==============================] - 2s 48us/sample - loss: 1.8823 - acc: 0.3717 - val_loss: 1.8685 - val_acc: 0.3833
Epoch 3/100
42000/42000 [==============================] - 2s 47us/sample - loss: 1.6338 - acc: 0.4850 - val_loss: 1.6083 - val_acc: 0.4873
Epoch 4/100
42000/42000 [==============================] - 2s 48us/sample - loss: 1.4471 - acc: 0.5612 - val_loss: 1.4532 - val_acc: 0.5435
Epoch 5/100
42000/42000 [==============================] - 2s 47us/sample - loss: 1.3017 - acc: 0.6110 - val_loss: 1.2819 - val_acc: 0.6060
Epoch 6/100
42000/42000 [==============================] - 2s 48us/sample - loss: 1.1888 - acc: 0.6459 - val_loss: 1.2110 - val_acc: 0.6357
Epoch 7/100
42000/42000 [==============================] - 2s 51us/sample - loss: 1.0931 - acc: 0.6720 - val_loss: 1.1470 - val_acc: 0.6414
Epoch 8/100
42000/42000 [==============================] - 2s 48us/sample - loss: 1.0188 - acc: 0.6929 - val_loss: 1.0641 - val_acc: 0.6710
Epoch 9/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.9576 - acc: 0.7099 - val_loss: 1.0080 - val_acc: 0.6921
Epoch 10/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.9052 - acc: 0.7238 - val_loss: 0.9584 - val_acc: 0.7056
Epoch 11/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.8609 - acc: 0.7344 - val_loss: 0.9839 - val_acc: 0.6924
Epoch 12/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.8205 - acc: 0.7494 - val_loss: 0.9355 - val_acc: 0.7078
Epoch 13/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.7844 - acc: 0.7591 - val_loss: 0.8979 - val_acc: 0.7123
Epoch 14/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.7555 - acc: 0.7667 - val_loss: 0.9690 - val_acc: 0.6953
Epoch 15/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.7297 - acc: 0.7736 - val_loss: 0.8284 - val_acc: 0.7423
Epoch 16/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.7034 - acc: 0.7813 - val_loss: 0.9007 - val_acc: 0.7136
Epoch 17/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.6805 - acc: 0.7895 - val_loss: 0.9803 - val_acc: 0.6925
Epoch 18/100
42000/42000 [==============================] - 2s 46us/sample - loss: 0.6578 - acc: 0.7969 - val_loss: 0.7991 - val_acc: 0.7457
Epoch 19/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.6425 - acc: 0.8005 - val_loss: 0.8517 - val_acc: 0.7338
Epoch 20/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.6255 - acc: 0.8049 - val_loss: 0.8117 - val_acc: 0.7400
Epoch 21/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.6085 - acc: 0.8114 - val_loss: 0.8065 - val_acc: 0.7500
Epoch 22/100
42000/42000 [==============================] - 2s 46us/sample - loss: 0.5910 - acc: 0.8164 - val_loss: 0.7252 - val_acc: 0.7730
Epoch 23/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.5801 - acc: 0.8195 - val_loss: 0.8067 - val_acc: 0.7415
Epoch 24/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.5632 - acc: 0.8253 - val_loss: 0.8137 - val_acc: 0.7422
Epoch 25/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.5505 - acc: 0.8272 - val_loss: 0.6932 - val_acc: 0.7826
Epoch 26/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.5378 - acc: 0.8322 - val_loss: 0.9460 - val_acc: 0.7073
Epoch 27/100
42000/42000 [==============================] - 2s 46us/sample - loss: 0.5254 - acc: 0.8360 - val_loss: 0.6978 - val_acc: 0.7786
Epoch 28/100
42000/42000 [==============================] - 2s 51us/sample - loss: 0.5168 - acc: 0.8379 - val_loss: 0.8921 - val_acc: 0.7368
Epoch 29/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.5062 - acc: 0.8424 - val_loss: 0.7901 - val_acc: 0.7526
Epoch 30/100
42000/42000 [==============================] - 2s 52us/sample - loss: 0.4920 - acc: 0.8462 - val_loss: 0.6657 - val_acc: 0.7909
Epoch 31/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.4873 - acc: 0.8475 - val_loss: 0.9750 - val_acc: 0.7089
Epoch 32/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.4793 - acc: 0.8509 - val_loss: 0.7757 - val_acc: 0.7625
Epoch 33/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.4699 - acc: 0.8526 - val_loss: 0.7678 - val_acc: 0.7598
Epoch 34/100
42000/42000 [==============================] - 2s 46us/sample - loss: 0.4629 - acc: 0.8545 - val_loss: 0.7930 - val_acc: 0.7597
Epoch 35/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.4515 - acc: 0.8596 - val_loss: 0.6978 - val_acc: 0.7844
Epoch 36/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.4392 - acc: 0.8636 - val_loss: 0.6710 - val_acc: 0.7926
Epoch 37/100
42000/42000 [==============================] - 2s 52us/sample - loss: 0.4332 - acc: 0.8650 - val_loss: 0.7249 - val_acc: 0.7742
Epoch 38/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.4249 - acc: 0.8675 - val_loss: 0.6876 - val_acc: 0.7842
Epoch 39/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.4179 - acc: 0.8695 - val_loss: 0.6177 - val_acc: 0.8071
Epoch 40/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.4102 - acc: 0.8732 - val_loss: 1.1939 - val_acc: 0.6915
Epoch 41/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.4061 - acc: 0.8729 - val_loss: 0.7878 - val_acc: 0.7606
Epoch 42/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.4005 - acc: 0.8746 - val_loss: 0.6239 - val_acc: 0.8039
Epoch 43/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.3936 - acc: 0.8773 - val_loss: 0.6273 - val_acc: 0.8059
Epoch 44/100
42000/42000 [==============================] - 2s 46us/sample - loss: 0.3872 - acc: 0.8802 - val_loss: 0.5841 - val_acc: 0.8211
Epoch 45/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.3789 - acc: 0.8835 - val_loss: 0.5863 - val_acc: 0.8203
Epoch 46/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.3709 - acc: 0.8839 - val_loss: 0.6137 - val_acc: 0.8091
Epoch 47/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.3669 - acc: 0.8873 - val_loss: 0.6152 - val_acc: 0.8077
Epoch 48/100
42000/42000 [==============================] - 2s 46us/sample - loss: 0.3640 - acc: 0.8870 - val_loss: 0.6704 - val_acc: 0.7876
Epoch 49/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.3575 - acc: 0.8888 - val_loss: 0.6125 - val_acc: 0.8124
Epoch 50/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.3488 - acc: 0.8913 - val_loss: 0.7482 - val_acc: 0.7721
Epoch 51/100
42000/42000 [==============================] - 2s 46us/sample - loss: 0.3417 - acc: 0.8925 - val_loss: 0.6436 - val_acc: 0.8008
Epoch 52/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.3388 - acc: 0.8943 - val_loss: 0.6630 - val_acc: 0.8002
Epoch 53/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.3363 - acc: 0.8953 - val_loss: 1.0303 - val_acc: 0.7077
Epoch 54/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.3302 - acc: 0.8977 - val_loss: 0.6981 - val_acc: 0.7801
Epoch 55/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.3246 - acc: 0.8993 - val_loss: 0.7736 - val_acc: 0.7778
Epoch 56/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.3225 - acc: 0.9000 - val_loss: 0.9034 - val_acc: 0.7348
Epoch 57/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.3214 - acc: 0.9002 - val_loss: 0.9498 - val_acc: 0.7229
Epoch 58/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.3085 - acc: 0.9051 - val_loss: 0.5623 - val_acc: 0.8292
Epoch 59/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.3056 - acc: 0.9064 - val_loss: 0.9083 - val_acc: 0.7462
Epoch 60/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.3017 - acc: 0.9069 - val_loss: 0.6753 - val_acc: 0.7982
Epoch 61/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.2960 - acc: 0.9097 - val_loss: 0.8200 - val_acc: 0.7688
Epoch 62/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.2921 - acc: 0.9085 - val_loss: 0.7007 - val_acc: 0.7875
Epoch 63/100
42000/42000 [==============================] - 2s 46us/sample - loss: 0.2870 - acc: 0.9111 - val_loss: 0.5730 - val_acc: 0.8280
Epoch 64/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.2868 - acc: 0.9124 - val_loss: 0.6654 - val_acc: 0.8033
Epoch 65/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.2802 - acc: 0.9140 - val_loss: 0.9222 - val_acc: 0.7542
Epoch 66/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.2723 - acc: 0.9172 - val_loss: 0.7212 - val_acc: 0.7878
Epoch 67/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.2724 - acc: 0.9156 - val_loss: 0.5668 - val_acc: 0.8287
Epoch 68/100
42000/42000 [==============================] - 2s 46us/sample - loss: 0.2662 - acc: 0.9178 - val_loss: 0.6644 - val_acc: 0.8062
Epoch 69/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.2692 - acc: 0.9163 - val_loss: 0.5836 - val_acc: 0.8216
Epoch 70/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.2586 - acc: 0.9201 - val_loss: 0.6806 - val_acc: 0.7996
Epoch 71/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.2588 - acc: 0.9211 - val_loss: 0.6559 - val_acc: 0.8069
Epoch 72/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.2556 - acc: 0.9227 - val_loss: 0.6227 - val_acc: 0.8114
Epoch 73/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.2572 - acc: 0.9206 - val_loss: 0.5525 - val_acc: 0.8367
Epoch 74/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.2480 - acc: 0.9237 - val_loss: 1.9892 - val_acc: 0.6235
Epoch 75/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.2453 - acc: 0.9242 - val_loss: 0.9709 - val_acc: 0.7505
Epoch 76/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.2417 - acc: 0.9260 - val_loss: 0.5700 - val_acc: 0.8296
Epoch 77/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.2377 - acc: 0.9266 - val_loss: 0.5419 - val_acc: 0.8381
Epoch 78/100
42000/42000 [==============================] - 2s 46us/sample - loss: 0.2350 - acc: 0.9288 - val_loss: 0.7219 - val_acc: 0.7956
Epoch 79/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.2356 - acc: 0.9276 - val_loss: 0.6147 - val_acc: 0.8211
Epoch 80/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.2337 - acc: 0.9287 - val_loss: 0.6259 - val_acc: 0.8096
Epoch 81/100
42000/42000 [==============================] - 2s 46us/sample - loss: 0.2274 - acc: 0.9311 - val_loss: 0.5695 - val_acc: 0.8381
Epoch 82/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.2224 - acc: 0.9329 - val_loss: 0.5421 - val_acc: 0.8424
Epoch 83/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.2269 - acc: 0.9294 - val_loss: 0.8632 - val_acc: 0.7679
Epoch 84/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.2149 - acc: 0.9343 - val_loss: 0.6775 - val_acc: 0.8163
Epoch 85/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.2183 - acc: 0.9321 - val_loss: 0.6102 - val_acc: 0.8263
Epoch 86/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.2136 - acc: 0.9340 - val_loss: 0.6119 - val_acc: 0.8273
Epoch 87/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.2130 - acc: 0.9346 - val_loss: 0.9060 - val_acc: 0.7550
Epoch 88/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.2108 - acc: 0.9350 - val_loss: 0.7187 - val_acc: 0.8004
Epoch 89/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.2061 - acc: 0.9364 - val_loss: 0.6345 - val_acc: 0.8180
Epoch 90/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.2057 - acc: 0.9381 - val_loss: 0.4849 - val_acc: 0.8626
Epoch 91/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.1968 - acc: 0.9413 - val_loss: 0.6472 - val_acc: 0.8304
Epoch 92/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.1917 - acc: 0.9424 - val_loss: 0.6683 - val_acc: 0.8133
Epoch 93/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.1934 - acc: 0.9407 - val_loss: 0.7798 - val_acc: 0.7920
Epoch 94/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.1913 - acc: 0.9417 - val_loss: 0.7867 - val_acc: 0.7778
Epoch 95/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.1907 - acc: 0.9420 - val_loss: 0.6375 - val_acc: 0.8199
Epoch 96/100
42000/42000 [==============================] - 2s 51us/sample - loss: 0.1893 - acc: 0.9419 - val_loss: 0.6015 - val_acc: 0.8279
Epoch 97/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.1853 - acc: 0.9440 - val_loss: 0.5841 - val_acc: 0.8378
Epoch 98/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.1819 - acc: 0.9442 - val_loss: 0.7618 - val_acc: 0.8002
Epoch 99/100
42000/42000 [==============================] - 2s 46us/sample - loss: 0.1804 - acc: 0.9435 - val_loss: 0.7315 - val_acc: 0.8016
Epoch 100/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.1793 - acc: 0.9457 - val_loss: 0.7381 - val_acc: 0.8078
In [61]:
print('NN with batch normalization - sgd optimizer'); print('--'*40)
results5 = model5.evaluate(X_val, y_val)
print('Validation accuracy: {}%'.format(round(results5[1]*100, 2)))
NN with batch normalization - sgd optimizer
--------------------------------------------------------------------------------
60000/60000 [==============================] - 4s 68us/sample - loss: 0.7381 - acc: 0.8078
Validation accuracy: 80.78%

NN model, relu activations, Adam optimizers with weight initializers and batch normalization

In [62]:
# compiling the neural network classifier, adam optimizer
# Note: model5 keeps the weights from the SGD run above, so this Adam run is a warm start rather than a fresh initialization
adam = optimizers.Adam(lr = 0.001)
# Categorical cross-entropy loss for multiclass classification (the softmax output layer is already part of the model)
model5.compile(optimizer = adam, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model5.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 3s 62us/sample - loss: 0.7504 - acc: 0.7711 - val_loss: 3.2994 - val_acc: 0.3926
Epoch 2/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.5840 - acc: 0.8120 - val_loss: 1.2989 - val_acc: 0.5782
Epoch 3/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.5254 - acc: 0.8286 - val_loss: 1.5073 - val_acc: 0.5588
Epoch 4/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.4785 - acc: 0.8432 - val_loss: 1.4805 - val_acc: 0.5900
Epoch 5/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.4628 - acc: 0.8490 - val_loss: 1.5200 - val_acc: 0.5520
Epoch 6/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.4416 - acc: 0.8556 - val_loss: 1.1569 - val_acc: 0.6578
Epoch 7/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.4216 - acc: 0.8632 - val_loss: 1.1616 - val_acc: 0.6521
Epoch 8/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.3979 - acc: 0.8718 - val_loss: 1.0359 - val_acc: 0.6737
Epoch 9/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.3967 - acc: 0.8716 - val_loss: 1.4157 - val_acc: 0.6081
Epoch 10/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.3868 - acc: 0.8749 - val_loss: 1.1907 - val_acc: 0.6233
Epoch 11/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.3657 - acc: 0.8803 - val_loss: 1.0921 - val_acc: 0.6475
Epoch 12/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.3518 - acc: 0.8872 - val_loss: 1.0731 - val_acc: 0.6741
Epoch 13/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.3442 - acc: 0.8874 - val_loss: 1.1014 - val_acc: 0.6781
Epoch 14/100
42000/42000 [==============================] - 2s 54us/sample - loss: 0.3349 - acc: 0.8899 - val_loss: 1.2347 - val_acc: 0.6433
Epoch 15/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.3198 - acc: 0.8963 - val_loss: 1.0597 - val_acc: 0.6864
Epoch 16/100
42000/42000 [==============================] - 2s 52us/sample - loss: 0.3151 - acc: 0.8965 - val_loss: 1.3385 - val_acc: 0.6374
Epoch 17/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.3006 - acc: 0.9024 - val_loss: 1.2399 - val_acc: 0.6477
Epoch 18/100
42000/42000 [==============================] - 2s 51us/sample - loss: 0.2937 - acc: 0.9026 - val_loss: 1.7937 - val_acc: 0.6011
Epoch 19/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.2807 - acc: 0.9075 - val_loss: 0.8963 - val_acc: 0.7358
Epoch 20/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.2783 - acc: 0.9098 - val_loss: 0.9110 - val_acc: 0.7418
Epoch 21/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.2730 - acc: 0.9120 - val_loss: 0.9502 - val_acc: 0.7330
Epoch 22/100
42000/42000 [==============================] - 2s 52us/sample - loss: 0.2706 - acc: 0.9109 - val_loss: 0.9985 - val_acc: 0.7336
Epoch 23/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.2596 - acc: 0.9153 - val_loss: 1.4751 - val_acc: 0.6206
Epoch 24/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.2459 - acc: 0.9192 - val_loss: 1.0131 - val_acc: 0.7315
Epoch 25/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.2410 - acc: 0.9215 - val_loss: 1.3443 - val_acc: 0.6587
Epoch 26/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.2471 - acc: 0.9187 - val_loss: 1.0037 - val_acc: 0.7185
Epoch 27/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.2468 - acc: 0.9188 - val_loss: 0.8069 - val_acc: 0.7687
Epoch 28/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.2256 - acc: 0.9253 - val_loss: 0.7956 - val_acc: 0.7738
Epoch 29/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.2232 - acc: 0.9268 - val_loss: 0.9224 - val_acc: 0.7399
Epoch 30/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.2189 - acc: 0.9274 - val_loss: 0.8176 - val_acc: 0.7669
Epoch 31/100
42000/42000 [==============================] - 2s 53us/sample - loss: 0.2169 - acc: 0.9281 - val_loss: 0.8483 - val_acc: 0.7552
Epoch 32/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.2185 - acc: 0.9276 - val_loss: 0.9837 - val_acc: 0.7370
Epoch 33/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.2079 - acc: 0.9323 - val_loss: 0.8542 - val_acc: 0.7669
Epoch 34/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.2026 - acc: 0.9330 - val_loss: 0.9887 - val_acc: 0.7467
Epoch 35/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.1968 - acc: 0.9350 - val_loss: 1.0074 - val_acc: 0.7355
Epoch 36/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.1947 - acc: 0.9348 - val_loss: 0.8603 - val_acc: 0.7645
Epoch 37/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.1904 - acc: 0.9379 - val_loss: 0.7189 - val_acc: 0.7949
Epoch 38/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.1789 - acc: 0.9404 - val_loss: 0.9522 - val_acc: 0.7534
Epoch 39/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.1776 - acc: 0.9414 - val_loss: 0.7963 - val_acc: 0.8006
Epoch 40/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.1767 - acc: 0.9409 - val_loss: 0.9288 - val_acc: 0.7571
Epoch 41/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.1766 - acc: 0.9411 - val_loss: 1.1659 - val_acc: 0.7217
Epoch 42/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.1626 - acc: 0.9458 - val_loss: 0.9868 - val_acc: 0.7480
Epoch 43/100
42000/42000 [==============================] - 2s 51us/sample - loss: 0.1718 - acc: 0.9429 - val_loss: 0.7413 - val_acc: 0.7992
Epoch 44/100
42000/42000 [==============================] - 2s 51us/sample - loss: 0.1626 - acc: 0.9460 - val_loss: 1.3095 - val_acc: 0.6971
Epoch 45/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.1561 - acc: 0.9485 - val_loss: 0.8414 - val_acc: 0.7886
Epoch 46/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.1580 - acc: 0.9470 - val_loss: 0.7729 - val_acc: 0.8021
Epoch 47/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.1604 - acc: 0.9472 - val_loss: 1.3858 - val_acc: 0.7020
Epoch 48/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.1538 - acc: 0.9488 - val_loss: 1.4758 - val_acc: 0.6984
Epoch 49/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.1425 - acc: 0.9529 - val_loss: 1.3721 - val_acc: 0.7090
Epoch 50/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.1504 - acc: 0.9494 - val_loss: 1.0456 - val_acc: 0.7430
Epoch 51/100
42000/42000 [==============================] - 2s 52us/sample - loss: 0.1506 - acc: 0.9496 - val_loss: 1.0405 - val_acc: 0.7593
Epoch 52/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.1414 - acc: 0.9524 - val_loss: 0.8466 - val_acc: 0.7840
Epoch 53/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.1373 - acc: 0.9538 - val_loss: 0.9790 - val_acc: 0.7692
Epoch 54/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.1469 - acc: 0.9508 - val_loss: 0.6021 - val_acc: 0.8406
Epoch 55/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.1327 - acc: 0.9560 - val_loss: 1.0881 - val_acc: 0.7641
Epoch 56/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.1317 - acc: 0.9556 - val_loss: 1.0091 - val_acc: 0.7638
Epoch 57/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.1291 - acc: 0.9558 - val_loss: 1.0545 - val_acc: 0.7565
Epoch 58/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.1291 - acc: 0.9569 - val_loss: 0.6812 - val_acc: 0.8264
Epoch 59/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.1252 - acc: 0.9581 - val_loss: 0.8298 - val_acc: 0.8017
Epoch 60/100
42000/42000 [==============================] - 2s 51us/sample - loss: 0.1217 - acc: 0.9590 - val_loss: 1.1110 - val_acc: 0.7636
Epoch 61/100
42000/42000 [==============================] - 2s 51us/sample - loss: 0.1219 - acc: 0.9591 - val_loss: 0.8744 - val_acc: 0.7934
Epoch 62/100
42000/42000 [==============================] - 2s 51us/sample - loss: 0.1249 - acc: 0.9574 - val_loss: 1.2907 - val_acc: 0.7352
Epoch 63/100
42000/42000 [==============================] - 2s 47us/sample - loss: 0.1188 - acc: 0.9600 - val_loss: 0.9428 - val_acc: 0.7811
Epoch 64/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.1189 - acc: 0.9592 - val_loss: 1.0607 - val_acc: 0.7717
Epoch 65/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.1173 - acc: 0.9614 - val_loss: 0.8250 - val_acc: 0.8036
Epoch 66/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.1102 - acc: 0.9630 - val_loss: 0.8915 - val_acc: 0.7794
Epoch 67/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.1150 - acc: 0.9620 - val_loss: 0.8700 - val_acc: 0.7963
Epoch 68/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.1051 - acc: 0.9645 - val_loss: 0.8872 - val_acc: 0.7971
Epoch 69/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.1120 - acc: 0.9622 - val_loss: 1.2268 - val_acc: 0.7523
Epoch 70/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.1117 - acc: 0.9622 - val_loss: 0.8054 - val_acc: 0.8112
Epoch 71/100
42000/42000 [==============================] - 2s 53us/sample - loss: 0.1154 - acc: 0.9614 - val_loss: 0.8851 - val_acc: 0.7975
Epoch 72/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.1071 - acc: 0.9640 - val_loss: 1.2763 - val_acc: 0.7354
Epoch 73/100
42000/42000 [==============================] - 2s 53us/sample - loss: 0.1113 - acc: 0.9617 - val_loss: 1.0526 - val_acc: 0.7878
Epoch 74/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.0973 - acc: 0.9672 - val_loss: 0.9450 - val_acc: 0.7733
Epoch 75/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.1012 - acc: 0.9657 - val_loss: 1.3132 - val_acc: 0.7354
Epoch 76/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.1073 - acc: 0.9642 - val_loss: 1.2652 - val_acc: 0.7547
Epoch 77/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.0974 - acc: 0.9671 - val_loss: 0.6400 - val_acc: 0.8382
Epoch 78/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.0929 - acc: 0.9692 - val_loss: 1.3082 - val_acc: 0.7578
Epoch 79/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.0951 - acc: 0.9686 - val_loss: 0.6770 - val_acc: 0.8472
Epoch 80/100
42000/42000 [==============================] - 2s 52us/sample - loss: 0.0961 - acc: 0.9680 - val_loss: 0.8655 - val_acc: 0.8092
Epoch 81/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.0991 - acc: 0.9665 - val_loss: 0.8122 - val_acc: 0.8212
Epoch 82/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.0931 - acc: 0.9688 - val_loss: 0.7419 - val_acc: 0.8227
Epoch 83/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.0910 - acc: 0.9695 - val_loss: 1.1053 - val_acc: 0.7612
Epoch 84/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.1015 - acc: 0.9658 - val_loss: 0.8460 - val_acc: 0.8165
Epoch 85/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.0871 - acc: 0.9707 - val_loss: 0.8258 - val_acc: 0.8185
Epoch 86/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.0931 - acc: 0.9679 - val_loss: 0.7424 - val_acc: 0.8282
Epoch 87/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.0896 - acc: 0.9692 - val_loss: 1.8734 - val_acc: 0.7012
Epoch 88/100
42000/42000 [==============================] - 2s 48us/sample - loss: 0.0949 - acc: 0.9678 - val_loss: 0.9185 - val_acc: 0.7866
Epoch 89/100
42000/42000 [==============================] - 2s 54us/sample - loss: 0.0945 - acc: 0.9676 - val_loss: 1.1041 - val_acc: 0.7891
Epoch 90/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.0850 - acc: 0.9713 - val_loss: 1.0702 - val_acc: 0.7694
Epoch 91/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.0888 - acc: 0.9695 - val_loss: 0.8052 - val_acc: 0.8224
Epoch 92/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.0860 - acc: 0.9713 - val_loss: 0.7990 - val_acc: 0.8161
Epoch 93/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.0729 - acc: 0.9757 - val_loss: 1.1168 - val_acc: 0.7737
Epoch 94/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.0783 - acc: 0.9731 - val_loss: 0.7263 - val_acc: 0.8390
Epoch 95/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.0854 - acc: 0.9707 - val_loss: 0.6956 - val_acc: 0.8449
Epoch 96/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.0763 - acc: 0.9735 - val_loss: 1.0002 - val_acc: 0.8003
Epoch 97/100
42000/42000 [==============================] - 2s 50us/sample - loss: 0.0863 - acc: 0.9700 - val_loss: 0.9083 - val_acc: 0.8045
Epoch 98/100
42000/42000 [==============================] - 2s 51us/sample - loss: 0.0833 - acc: 0.9722 - val_loss: 1.1478 - val_acc: 0.7791
Epoch 99/100
42000/42000 [==============================] - 2s 49us/sample - loss: 0.0819 - acc: 0.9722 - val_loss: 0.9367 - val_acc: 0.8083
Epoch 100/100
42000/42000 [==============================] - 2s 52us/sample - loss: 0.0795 - acc: 0.9729 - val_loss: 1.1002 - val_acc: 0.7901
In [63]:
print('NN with batch normalization - adam optimizer'); print('--'*40)
results5 = model5.evaluate(X_val, y_val)
print('Validation accuracy: {}%'.format(round(results5[1]*100, 2)))
NN with batch normalization - adam optimizer
--------------------------------------------------------------------------------
60000/60000 [==============================] - 4s 68us/sample - loss: 1.1002 - acc: 0.7901
Validation accuracy: 79.01%
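
The validation loss above swings widely between epochs (e.g. 0.6021 at epoch 54 versus 1.8734 at epoch 87). Since each fit call returns a history object, the instability is easy to visualize — a minimal sketch using the matplotlib import from the top of the notebook:

plt.figure(figsize = (10, 4))
plt.plot(history.history['loss'], label = 'training loss')
plt.plot(history.history['val_loss'], label = 'validation loss')
plt.xlabel('epoch'); plt.ylabel('categorical cross-entropy'); plt.legend()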

Observation 7 - Batch Normalization
  • Batch normalization didn't improve the score (80.78% validation accuracy with SGD, 79.01% with Adam), and the validation loss fluctuated noticeably more from epoch to epoch.
  • ReLU activations, a change in the number of units per layer, and the Adam optimizer (without batch normalization) still achieve the best score.
  • Next, let's try batch normalization combined with dropout.

Dropout

Dropout randomly deactivates a fraction of the units on each training update (20% here), which discourages co-adaptation between neurons and acts as a regularizer; at inference time all units stay active.
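
A minimal numpy sketch (not part of the pipeline) of what the Dropout(0.2) layers below do to an activation vector during training — inverted dropout, which is how Keras implements the layer:

rate = 0.2
a = np.random.rand(8)                      # hypothetical activations
mask = np.random.rand(a.shape[0]) >= rate  # keep each unit with probability 0.8
a_dropped = a * mask / (1.0 - rate)        # rescale kept units so the expected value is unchanged
# At inference time no mask is applied, so activations match their training-time expectation.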

NN model, relu activations, SGD optimizers with weight initializers, batch normalization and dropout

In [64]:
print('NN model with dropout - sgd optimizer'); print('--'*40)
# Initialize the neural network classifier
model6 = Sequential()
# Input Layer - adding input layer and activation functions relu and weight initializer
model6.add(Dense(512, input_shape = (1024, ), kernel_initializer = 'he_normal'))
# Adding batch normalization
model6.add(BatchNormalization()) 
# Adding activation function
model6.add(Activation('relu'))
# Adding dropout layer
model6.add(Dropout(0.2))

#Hidden Layer 1 - adding first hidden layer
model6.add(Dense(256, kernel_initializer = 'he_normal', bias_initializer = 'he_uniform'))
# Adding batch normalization
model6.add(BatchNormalization())
# Adding activation function
model6.add(Activation('relu'))
# Adding dropout layer
model6.add(Dropout(0.2))

#Hidden Layer 2 - adding second hidden layer
model6.add(Dense(128, kernel_initializer = 'he_normal', bias_initializer = 'he_uniform'))
# Adding batch normalization
model6.add(BatchNormalization())
# Adding activation function
model6.add(Activation('relu'))
# Adding dropout layer
model6.add(Dropout(0.2))

#Hidden Layer 3 - adding third hidden layer
model6.add(Dense(64, kernel_initializer = 'he_normal', bias_initializer = 'he_uniform'))
# Adding batch normalization
model6.add(BatchNormalization())
# Adding activation function
model6.add(Activation('relu'))
# Adding dropout layer
model6.add(Dropout(0.2))

#Hidden Layer 4 - adding fourth hidden layer
model6.add(Dense(32, kernel_initializer = 'he_normal', bias_initializer = 'he_uniform'))
# Adding batch normalization
model6.add(BatchNormalization())
# Adding activation function
model6.add(Activation('relu'))
# Adding dropout layer
model6.add(Dropout(0.2))

# Output Layer - adding output layer which is of 10 nodes (digits)
model6.add(Dense(10, kernel_initializer = 'he_normal',bias_initializer = 'he_uniform'))
# Adding activation function
model6.add(Activation('softmax'))
NN model with dropout - sgd optimizer
--------------------------------------------------------------------------------
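A note on the he_normal initializer used above: it draws weights from a truncated normal distribution with standard deviation sqrt(2 / fan_in), which helps keep relu activations from shrinking or exploding from layer to layer. A quick numeric check for the first Dense layer:

import numpy as np
fan_in = 1024                    # inputs to the first Dense layer
print(np.sqrt(2.0 / fan_in))     # he_normal stddev, roughly 0.0442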
In [65]:
model6.summary()
Model: "sequential_5"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense_20 (Dense)             (None, 512)               524800    
_________________________________________________________________
batch_normalization_4 (Batch (None, 512)               2048      
_________________________________________________________________
activation_20 (Activation)   (None, 512)               0         
_________________________________________________________________
dropout (Dropout)            (None, 512)               0         
_________________________________________________________________
dense_21 (Dense)             (None, 256)               131328    
_________________________________________________________________
batch_normalization_5 (Batch (None, 256)               1024      
_________________________________________________________________
activation_21 (Activation)   (None, 256)               0         
_________________________________________________________________
dropout_1 (Dropout)          (None, 256)               0         
_________________________________________________________________
dense_22 (Dense)             (None, 128)               32896     
_________________________________________________________________
batch_normalization_6 (Batch (None, 128)               512       
_________________________________________________________________
activation_22 (Activation)   (None, 128)               0         
_________________________________________________________________
dropout_2 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_23 (Dense)             (None, 64)                8256      
_________________________________________________________________
batch_normalization_7 (Batch (None, 64)                256       
_________________________________________________________________
activation_23 (Activation)   (None, 64)                0         
_________________________________________________________________
dropout_3 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_24 (Dense)             (None, 32)                2080      
_________________________________________________________________
batch_normalization_8 (Batch (None, 32)                128       
_________________________________________________________________
activation_24 (Activation)   (None, 32)                0         
_________________________________________________________________
dropout_4 (Dropout)          (None, 32)                0         
_________________________________________________________________
dense_25 (Dense)             (None, 10)                330       
_________________________________________________________________
activation_25 (Activation)   (None, 10)                0         
=================================================================
Total params: 703,658
Trainable params: 701,674
Non-trainable params: 1,984
_________________________________________________________________
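As a sanity check on the summary, the parameter counts can be reproduced by hand: a Dense layer has inputs * units weights plus units biases, and each BatchNormalization layer carries 4 parameters per unit (trainable gamma and beta, plus non-trainable moving mean and variance). A small sketch:

dense_params = lambda n_in, n_out: n_in * n_out + n_out
print(dense_params(1024, 512))              # 524800, matches dense_20
print(4 * 512)                              # 2048, matches batch_normalization_4
print(2 * (512 + 256 + 128 + 64 + 32))      # 1984, matches the non-trainable total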
In [66]:
# compiling the neural network classifier, sgd optimizer
sgd = optimizers.SGD(lr = 0.01)
model6.compile(optimizer = sgd, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model6.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 3s 71us/sample - loss: 2.6054 - acc: 0.1100 - val_loss: 2.3237 - val_acc: 0.1209
Epoch 2/100
42000/42000 [==============================] - 2s 55us/sample - loss: 2.4142 - acc: 0.1332 - val_loss: 2.2139 - val_acc: 0.2095
Epoch 3/100
42000/42000 [==============================] - 2s 56us/sample - loss: 2.3151 - acc: 0.1548 - val_loss: 2.1209 - val_acc: 0.2677
Epoch 4/100
42000/42000 [==============================] - 2s 56us/sample - loss: 2.2315 - acc: 0.1876 - val_loss: 2.0331 - val_acc: 0.3157
Epoch 5/100
42000/42000 [==============================] - 3s 60us/sample - loss: 2.1531 - acc: 0.2181 - val_loss: 1.9579 - val_acc: 0.3408
Epoch 6/100
42000/42000 [==============================] - 2s 56us/sample - loss: 2.0815 - acc: 0.2495 - val_loss: 1.8626 - val_acc: 0.3965
Epoch 7/100
42000/42000 [==============================] - 2s 56us/sample - loss: 2.0079 - acc: 0.2789 - val_loss: 1.7922 - val_acc: 0.4313
Epoch 8/100
42000/42000 [==============================] - 2s 56us/sample - loss: 1.9351 - acc: 0.3115 - val_loss: 1.7376 - val_acc: 0.4426
Epoch 9/100
42000/42000 [==============================] - 2s 56us/sample - loss: 1.8709 - acc: 0.3382 - val_loss: 1.6199 - val_acc: 0.4975
Epoch 10/100
42000/42000 [==============================] - 2s 56us/sample - loss: 1.8116 - acc: 0.3619 - val_loss: 1.5512 - val_acc: 0.5238
Epoch 11/100
42000/42000 [==============================] - 2s 54us/sample - loss: 1.7518 - acc: 0.3841 - val_loss: 1.4736 - val_acc: 0.5408
Epoch 12/100
42000/42000 [==============================] - 2s 55us/sample - loss: 1.7028 - acc: 0.4066 - val_loss: 1.3984 - val_acc: 0.5655
Epoch 13/100
42000/42000 [==============================] - 2s 59us/sample - loss: 1.6439 - acc: 0.4270 - val_loss: 1.3576 - val_acc: 0.5828
Epoch 14/100
42000/42000 [==============================] - 2s 55us/sample - loss: 1.5967 - acc: 0.4430 - val_loss: 1.3111 - val_acc: 0.5965
Epoch 15/100
42000/42000 [==============================] - 2s 56us/sample - loss: 1.5617 - acc: 0.4607 - val_loss: 1.2542 - val_acc: 0.6111
Epoch 16/100
42000/42000 [==============================] - 2s 55us/sample - loss: 1.5186 - acc: 0.4753 - val_loss: 1.2091 - val_acc: 0.6265
Epoch 17/100
42000/42000 [==============================] - 2s 57us/sample - loss: 1.4842 - acc: 0.4868 - val_loss: 1.1793 - val_acc: 0.6374
Epoch 18/100
42000/42000 [==============================] - 2s 55us/sample - loss: 1.4473 - acc: 0.5011 - val_loss: 1.1519 - val_acc: 0.6433
Epoch 19/100
42000/42000 [==============================] - 2s 54us/sample - loss: 1.4101 - acc: 0.5162 - val_loss: 1.1057 - val_acc: 0.6606
Epoch 20/100
42000/42000 [==============================] - 2s 55us/sample - loss: 1.3852 - acc: 0.5309 - val_loss: 1.0990 - val_acc: 0.6641
Epoch 21/100
42000/42000 [==============================] - 2s 55us/sample - loss: 1.3534 - acc: 0.5418 - val_loss: 1.0614 - val_acc: 0.6762
Epoch 22/100
42000/42000 [==============================] - 2s 55us/sample - loss: 1.3285 - acc: 0.5496 - val_loss: 1.0304 - val_acc: 0.6890
Epoch 23/100
42000/42000 [==============================] - 2s 57us/sample - loss: 1.2992 - acc: 0.5609 - val_loss: 1.0085 - val_acc: 0.6887
Epoch 24/100
42000/42000 [==============================] - 2s 57us/sample - loss: 1.2786 - acc: 0.5706 - val_loss: 0.9750 - val_acc: 0.7053
Epoch 25/100
42000/42000 [==============================] - 2s 57us/sample - loss: 1.2606 - acc: 0.5766 - val_loss: 0.9636 - val_acc: 0.7069
Epoch 26/100
42000/42000 [==============================] - 2s 58us/sample - loss: 1.2394 - acc: 0.5840 - val_loss: 0.9422 - val_acc: 0.7091
Epoch 27/100
42000/42000 [==============================] - 2s 55us/sample - loss: 1.2132 - acc: 0.5979 - val_loss: 0.9353 - val_acc: 0.7169
Epoch 28/100
42000/42000 [==============================] - 2s 56us/sample - loss: 1.1932 - acc: 0.6047 - val_loss: 0.9152 - val_acc: 0.7215
Epoch 29/100
42000/42000 [==============================] - 2s 57us/sample - loss: 1.1793 - acc: 0.6109 - val_loss: 0.8742 - val_acc: 0.7368
Epoch 30/100
42000/42000 [==============================] - 2s 56us/sample - loss: 1.1572 - acc: 0.6158 - val_loss: 0.9592 - val_acc: 0.7013
Epoch 31/100
42000/42000 [==============================] - 2s 56us/sample - loss: 1.1357 - acc: 0.6254 - val_loss: 0.8361 - val_acc: 0.7450
Epoch 32/100
42000/42000 [==============================] - 2s 56us/sample - loss: 1.1246 - acc: 0.6317 - val_loss: 0.8718 - val_acc: 0.7318
Epoch 33/100
42000/42000 [==============================] - 2s 56us/sample - loss: 1.1089 - acc: 0.6334 - val_loss: 0.8291 - val_acc: 0.7463
Epoch 34/100
42000/42000 [==============================] - 2s 55us/sample - loss: 1.0959 - acc: 0.6414 - val_loss: 0.8557 - val_acc: 0.7293
Epoch 35/100
42000/42000 [==============================] - 2s 56us/sample - loss: 1.0835 - acc: 0.6456 - val_loss: 0.8170 - val_acc: 0.7472
Epoch 36/100
42000/42000 [==============================] - 2s 56us/sample - loss: 1.0635 - acc: 0.6545 - val_loss: 0.8103 - val_acc: 0.7564
Epoch 37/100
42000/42000 [==============================] - 2s 55us/sample - loss: 1.0554 - acc: 0.6572 - val_loss: 0.7692 - val_acc: 0.7686
Epoch 38/100
42000/42000 [==============================] - 3s 63us/sample - loss: 1.0389 - acc: 0.6655 - val_loss: 0.7873 - val_acc: 0.7632
Epoch 39/100
42000/42000 [==============================] - 3s 60us/sample - loss: 1.0189 - acc: 0.6723 - val_loss: 0.7743 - val_acc: 0.7618
Epoch 40/100
42000/42000 [==============================] - 2s 57us/sample - loss: 1.0079 - acc: 0.6775 - val_loss: 0.7309 - val_acc: 0.7797
Epoch 41/100
42000/42000 [==============================] - 2s 57us/sample - loss: 1.0044 - acc: 0.6761 - val_loss: 0.7411 - val_acc: 0.7771
Epoch 42/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.9949 - acc: 0.6807 - val_loss: 0.7256 - val_acc: 0.7826
Epoch 43/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.9799 - acc: 0.6888 - val_loss: 0.7309 - val_acc: 0.7801
Epoch 44/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.9662 - acc: 0.6906 - val_loss: 0.7218 - val_acc: 0.7818
Epoch 45/100
42000/42000 [==============================] - 2s 54us/sample - loss: 0.9578 - acc: 0.6942 - val_loss: 0.6908 - val_acc: 0.7904
Epoch 46/100
42000/42000 [==============================] - 2s 54us/sample - loss: 0.9417 - acc: 0.7012 - val_loss: 0.7344 - val_acc: 0.7721
Epoch 47/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.9413 - acc: 0.6993 - val_loss: 0.7060 - val_acc: 0.7821
Epoch 48/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.9270 - acc: 0.7054 - val_loss: 0.7181 - val_acc: 0.7824
Epoch 49/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.9109 - acc: 0.7130 - val_loss: 0.6612 - val_acc: 0.7988
Epoch 50/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.9078 - acc: 0.7140 - val_loss: 0.6873 - val_acc: 0.7903
Epoch 51/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.8992 - acc: 0.7155 - val_loss: 0.6665 - val_acc: 0.8009
Epoch 52/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.8844 - acc: 0.7209 - val_loss: 0.6310 - val_acc: 0.8111
Epoch 53/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.8843 - acc: 0.7214 - val_loss: 0.6561 - val_acc: 0.7992
Epoch 54/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.8734 - acc: 0.7273 - val_loss: 0.6311 - val_acc: 0.8083
Epoch 55/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.8673 - acc: 0.7281 - val_loss: 0.6193 - val_acc: 0.8108
Epoch 56/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.8486 - acc: 0.7355 - val_loss: 0.6420 - val_acc: 0.8012
Epoch 57/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.8503 - acc: 0.7352 - val_loss: 0.6994 - val_acc: 0.7861
Epoch 58/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.8442 - acc: 0.7357 - val_loss: 0.5837 - val_acc: 0.8247
Epoch 59/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.8391 - acc: 0.7389 - val_loss: 0.6363 - val_acc: 0.8028
Epoch 60/100
42000/42000 [==============================] - 2s 54us/sample - loss: 0.8335 - acc: 0.7413 - val_loss: 0.5961 - val_acc: 0.8199
Epoch 61/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.8294 - acc: 0.7413 - val_loss: 0.5941 - val_acc: 0.8199
Epoch 62/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.8155 - acc: 0.7469 - val_loss: 0.6761 - val_acc: 0.7928
Epoch 63/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.8109 - acc: 0.7496 - val_loss: 0.6227 - val_acc: 0.8094
Epoch 64/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.8007 - acc: 0.7499 - val_loss: 0.6400 - val_acc: 0.8033
Epoch 65/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.8021 - acc: 0.7507 - val_loss: 0.5850 - val_acc: 0.8219
Epoch 66/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.7899 - acc: 0.7548 - val_loss: 0.5739 - val_acc: 0.8261
Epoch 67/100
42000/42000 [==============================] - 2s 54us/sample - loss: 0.7775 - acc: 0.7590 - val_loss: 0.6175 - val_acc: 0.8099
Epoch 68/100
42000/42000 [==============================] - 2s 54us/sample - loss: 0.7847 - acc: 0.7580 - val_loss: 0.5935 - val_acc: 0.8196
Epoch 69/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.7709 - acc: 0.7607 - val_loss: 0.7921 - val_acc: 0.7597
Epoch 70/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.7697 - acc: 0.7607 - val_loss: 0.5952 - val_acc: 0.8155
Epoch 71/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.7648 - acc: 0.7630 - val_loss: 0.5414 - val_acc: 0.8394
Epoch 72/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.7613 - acc: 0.7660 - val_loss: 0.5473 - val_acc: 0.8335
Epoch 73/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.7515 - acc: 0.7674 - val_loss: 0.5540 - val_acc: 0.8316
Epoch 74/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.7506 - acc: 0.7693 - val_loss: 0.5334 - val_acc: 0.8379
Epoch 75/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.7394 - acc: 0.7717 - val_loss: 0.5623 - val_acc: 0.8301
Epoch 76/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.7449 - acc: 0.7692 - val_loss: 0.5112 - val_acc: 0.8475
Epoch 77/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.7367 - acc: 0.7758 - val_loss: 0.5499 - val_acc: 0.8302
Epoch 78/100
42000/42000 [==============================] - 2s 54us/sample - loss: 0.7347 - acc: 0.7757 - val_loss: 0.5693 - val_acc: 0.8232
Epoch 79/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.7263 - acc: 0.7772 - val_loss: 0.6090 - val_acc: 0.8162
Epoch 80/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.7270 - acc: 0.7768 - val_loss: 0.6102 - val_acc: 0.8118
Epoch 81/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.7203 - acc: 0.7798 - val_loss: 0.5505 - val_acc: 0.8335
Epoch 82/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.7166 - acc: 0.7812 - val_loss: 0.5441 - val_acc: 0.8350
Epoch 83/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.7112 - acc: 0.7829 - val_loss: 0.5885 - val_acc: 0.8156
Epoch 84/100
42000/42000 [==============================] - 2s 54us/sample - loss: 0.7027 - acc: 0.7845 - val_loss: 0.5343 - val_acc: 0.8360
Epoch 85/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.6959 - acc: 0.7876 - val_loss: 0.5293 - val_acc: 0.8361
Epoch 86/100
42000/42000 [==============================] - 2s 54us/sample - loss: 0.6922 - acc: 0.7890 - val_loss: 0.5124 - val_acc: 0.8444
Epoch 87/100
42000/42000 [==============================] - 2s 53us/sample - loss: 0.6874 - acc: 0.7903 - val_loss: 0.5271 - val_acc: 0.8380
Epoch 88/100
42000/42000 [==============================] - 2s 53us/sample - loss: 0.6906 - acc: 0.7891 - val_loss: 0.5241 - val_acc: 0.8409
Epoch 89/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.6743 - acc: 0.7943 - val_loss: 0.5262 - val_acc: 0.8401
Epoch 90/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.6803 - acc: 0.7909 - val_loss: 0.5525 - val_acc: 0.8313
Epoch 91/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.6779 - acc: 0.7928 - val_loss: 0.5164 - val_acc: 0.8436
Epoch 92/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.6767 - acc: 0.7927 - val_loss: 0.4986 - val_acc: 0.8492
Epoch 93/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.6632 - acc: 0.7985 - val_loss: 0.5105 - val_acc: 0.8431
Epoch 94/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.6640 - acc: 0.7991 - val_loss: 0.4897 - val_acc: 0.8518
Epoch 95/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.6698 - acc: 0.7959 - val_loss: 0.5073 - val_acc: 0.8418
Epoch 96/100
42000/42000 [==============================] - 2s 54us/sample - loss: 0.6591 - acc: 0.8000 - val_loss: 0.6676 - val_acc: 0.7981
Epoch 97/100
42000/42000 [==============================] - 2s 55us/sample - loss: 0.6608 - acc: 0.7998 - val_loss: 0.6813 - val_acc: 0.7984
Epoch 98/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.6470 - acc: 0.8018 - val_loss: 0.4924 - val_acc: 0.8505
Epoch 99/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.6573 - acc: 0.8001 - val_loss: 0.5130 - val_acc: 0.8419
Epoch 100/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.6478 - acc: 0.8029 - val_loss: 0.4758 - val_acc: 0.8560
In [67]:
print('NN model with dropout - sgd optimizer'); print('--'*40)
results6 = model6.evaluate(X_val, y_val)
print('Validation accuracy: {}%'.format(round(results6[1]*100, 2)))
NN model with dropout - sgd optimizer
--------------------------------------------------------------------------------
60000/60000 [==============================] - 4s 73us/sample - loss: 0.4758 - acc: 0.8560
Validation accuracy: 85.6%
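The training loss was still falling at epoch 100, so this SGD run may simply be under-trained. One untried variant (a sketch only, not run in this notebook) is adding Nesterov momentum, which the same optimizers module supports:

# hypothetical variant: same learning rate, with Nesterov momentum
sgd_momentum = optimizers.SGD(lr = 0.01, momentum = 0.9, nesterov = True)
model6.compile(optimizer = sgd_momentum, loss = 'categorical_crossentropy', metrics = ['accuracy'])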
In [68]:
# compiling the neural network classifier, adam optimizer
adam = optimizers.Adam(lr = 0.001)
model6.compile(optimizer = adam, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model6.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 3s 75us/sample - loss: 1.0483 - acc: 0.6722 - val_loss: 1.5585 - val_acc: 0.4775
Epoch 2/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.9086 - acc: 0.7194 - val_loss: 1.3776 - val_acc: 0.5337
Epoch 3/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.8480 - acc: 0.7403 - val_loss: 1.0837 - val_acc: 0.6561
Epoch 4/100
42000/42000 [==============================] - 3s 62us/sample - loss: 0.7984 - acc: 0.7575 - val_loss: 1.1266 - val_acc: 0.6141
Epoch 5/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.7714 - acc: 0.7648 - val_loss: 1.2182 - val_acc: 0.5853
Epoch 6/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.7336 - acc: 0.7772 - val_loss: 1.0197 - val_acc: 0.6763
Epoch 7/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.7149 - acc: 0.7826 - val_loss: 1.1113 - val_acc: 0.6252
Epoch 8/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.6941 - acc: 0.7898 - val_loss: 0.9268 - val_acc: 0.7488
Epoch 9/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.6799 - acc: 0.7955 - val_loss: 1.3377 - val_acc: 0.5425
Epoch 10/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.6518 - acc: 0.8032 - val_loss: 0.8574 - val_acc: 0.7210
Epoch 11/100
42000/42000 [==============================] - 3s 63us/sample - loss: 0.6379 - acc: 0.8080 - val_loss: 1.2970 - val_acc: 0.5655
Epoch 12/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.6319 - acc: 0.8079 - val_loss: 0.9649 - val_acc: 0.7167
Epoch 13/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.6119 - acc: 0.8154 - val_loss: 1.2310 - val_acc: 0.5864
Epoch 14/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.6004 - acc: 0.8187 - val_loss: 0.8527 - val_acc: 0.7557
Epoch 15/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.5864 - acc: 0.8228 - val_loss: 1.2250 - val_acc: 0.5955
Epoch 16/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.5891 - acc: 0.8222 - val_loss: 0.9921 - val_acc: 0.6742
Epoch 17/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.5695 - acc: 0.8276 - val_loss: 1.0117 - val_acc: 0.6686
Epoch 18/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.5593 - acc: 0.8319 - val_loss: 0.8131 - val_acc: 0.7855
Epoch 19/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.5542 - acc: 0.8344 - val_loss: 1.1605 - val_acc: 0.6218
Epoch 20/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.5464 - acc: 0.8354 - val_loss: 0.9370 - val_acc: 0.6977
Epoch 21/100
42000/42000 [==============================] - 3s 63us/sample - loss: 0.5382 - acc: 0.8398 - val_loss: 0.7988 - val_acc: 0.7521
Epoch 22/100
42000/42000 [==============================] - 3s 61us/sample - loss: 0.5289 - acc: 0.8412 - val_loss: 0.9359 - val_acc: 0.6881
Epoch 23/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.5178 - acc: 0.8447 - val_loss: 0.8229 - val_acc: 0.7292
Epoch 24/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.5133 - acc: 0.8455 - val_loss: 1.0570 - val_acc: 0.6738
Epoch 25/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.5046 - acc: 0.8481 - val_loss: 0.7736 - val_acc: 0.7540
Epoch 26/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.5038 - acc: 0.8490 - val_loss: 0.8040 - val_acc: 0.7357
Epoch 27/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.4964 - acc: 0.8505 - val_loss: 1.0628 - val_acc: 0.6397
Epoch 28/100
42000/42000 [==============================] - 3s 64us/sample - loss: 0.4956 - acc: 0.8516 - val_loss: 1.0051 - val_acc: 0.6658
Epoch 29/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.4762 - acc: 0.8577 - val_loss: 0.9671 - val_acc: 0.6847
Epoch 30/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.4803 - acc: 0.8545 - val_loss: 0.9152 - val_acc: 0.7100
Epoch 31/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.4699 - acc: 0.8582 - val_loss: 0.7210 - val_acc: 0.7599
Epoch 32/100
42000/42000 [==============================] - 3s 61us/sample - loss: 0.4675 - acc: 0.8602 - val_loss: 0.7587 - val_acc: 0.7574
Epoch 33/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.4615 - acc: 0.8604 - val_loss: 0.7750 - val_acc: 0.7526
Epoch 34/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.4583 - acc: 0.8603 - val_loss: 0.8713 - val_acc: 0.7247
Epoch 35/100
42000/42000 [==============================] - 3s 62us/sample - loss: 0.4579 - acc: 0.8632 - val_loss: 0.7976 - val_acc: 0.7362
Epoch 36/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.4483 - acc: 0.8645 - val_loss: 0.7683 - val_acc: 0.7563
Epoch 37/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.4392 - acc: 0.8675 - val_loss: 0.7715 - val_acc: 0.7536
Epoch 38/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.4412 - acc: 0.8669 - val_loss: 0.8139 - val_acc: 0.7285
Epoch 39/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.4326 - acc: 0.8701 - val_loss: 0.6890 - val_acc: 0.7885
Epoch 40/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.4206 - acc: 0.8730 - val_loss: 0.8502 - val_acc: 0.7258
Epoch 41/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.4262 - acc: 0.8727 - val_loss: 0.6450 - val_acc: 0.7916
Epoch 42/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.4188 - acc: 0.8747 - val_loss: 0.5895 - val_acc: 0.8100
Epoch 43/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.4113 - acc: 0.8745 - val_loss: 0.7252 - val_acc: 0.7735
Epoch 44/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.4156 - acc: 0.8732 - val_loss: 1.1334 - val_acc: 0.6520
Epoch 45/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.4083 - acc: 0.8765 - val_loss: 0.6479 - val_acc: 0.7949
Epoch 46/100
42000/42000 [==============================] - 3s 61us/sample - loss: 0.4091 - acc: 0.8773 - val_loss: 0.7718 - val_acc: 0.7485
Epoch 47/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.4042 - acc: 0.8782 - val_loss: 0.7290 - val_acc: 0.7611
Epoch 48/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.3953 - acc: 0.8811 - val_loss: 0.6882 - val_acc: 0.7742
Epoch 49/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.3882 - acc: 0.8829 - val_loss: 0.8469 - val_acc: 0.7211
Epoch 50/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.3840 - acc: 0.8865 - val_loss: 0.6448 - val_acc: 0.7923
Epoch 51/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.3824 - acc: 0.8846 - val_loss: 0.7429 - val_acc: 0.7559
Epoch 52/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.3770 - acc: 0.8862 - val_loss: 0.6541 - val_acc: 0.7947
Epoch 53/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.3806 - acc: 0.8847 - val_loss: 0.7222 - val_acc: 0.7739
Epoch 54/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.3762 - acc: 0.8865 - val_loss: 0.6802 - val_acc: 0.7789
Epoch 55/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.3678 - acc: 0.8901 - val_loss: 0.5836 - val_acc: 0.8153
Epoch 56/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.3635 - acc: 0.8897 - val_loss: 0.7012 - val_acc: 0.7723
Epoch 57/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.3637 - acc: 0.8902 - val_loss: 0.5000 - val_acc: 0.8412
Epoch 58/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.3613 - acc: 0.8913 - val_loss: 0.7912 - val_acc: 0.7452
Epoch 59/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.3617 - acc: 0.8900 - val_loss: 0.7234 - val_acc: 0.7732
Epoch 60/100
42000/42000 [==============================] - 3s 62us/sample - loss: 0.3663 - acc: 0.8914 - val_loss: 0.7428 - val_acc: 0.7548
Epoch 61/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.3541 - acc: 0.8931 - val_loss: 0.4835 - val_acc: 0.8514
Epoch 62/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.3497 - acc: 0.8946 - val_loss: 0.9309 - val_acc: 0.7019
Epoch 63/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.3509 - acc: 0.8935 - val_loss: 0.6272 - val_acc: 0.8097
Epoch 64/100
42000/42000 [==============================] - 3s 62us/sample - loss: 0.3499 - acc: 0.8937 - val_loss: 0.6166 - val_acc: 0.8006
Epoch 65/100
42000/42000 [==============================] - 3s 62us/sample - loss: 0.3408 - acc: 0.8973 - val_loss: 0.5850 - val_acc: 0.8120
Epoch 66/100
42000/42000 [==============================] - 3s 62us/sample - loss: 0.3443 - acc: 0.8963 - val_loss: 0.5576 - val_acc: 0.8225
Epoch 67/100
42000/42000 [==============================] - 3s 61us/sample - loss: 0.3357 - acc: 0.8967 - val_loss: 0.5877 - val_acc: 0.8114
Epoch 68/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.3436 - acc: 0.8964 - val_loss: 0.8347 - val_acc: 0.7368
Epoch 69/100
42000/42000 [==============================] - 3s 62us/sample - loss: 0.3296 - acc: 0.8996 - val_loss: 0.5776 - val_acc: 0.8166
Epoch 70/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.3298 - acc: 0.9002 - val_loss: 0.7237 - val_acc: 0.7641
Epoch 71/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.3314 - acc: 0.8998 - val_loss: 0.7168 - val_acc: 0.7644
Epoch 72/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.3229 - acc: 0.9024 - val_loss: 0.6814 - val_acc: 0.7736
Epoch 73/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.3281 - acc: 0.8997 - val_loss: 0.5008 - val_acc: 0.8452
Epoch 74/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.3208 - acc: 0.9037 - val_loss: 0.5365 - val_acc: 0.8301
Epoch 75/100
42000/42000 [==============================] - 3s 61us/sample - loss: 0.3230 - acc: 0.9022 - val_loss: 0.6199 - val_acc: 0.8008
Epoch 76/100
42000/42000 [==============================] - 3s 65us/sample - loss: 0.3145 - acc: 0.9044 - val_loss: 0.5949 - val_acc: 0.8097
Epoch 77/100
42000/42000 [==============================] - 3s 61us/sample - loss: 0.3178 - acc: 0.9046 - val_loss: 0.6488 - val_acc: 0.7916
Epoch 78/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.3136 - acc: 0.9048 - val_loss: 0.5016 - val_acc: 0.8428
Epoch 79/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.3124 - acc: 0.9065 - val_loss: 0.6061 - val_acc: 0.8069
Epoch 80/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.3083 - acc: 0.9069 - val_loss: 0.5600 - val_acc: 0.8191
Epoch 81/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.3097 - acc: 0.9063 - val_loss: 0.5729 - val_acc: 0.8153
Epoch 82/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.3036 - acc: 0.9097 - val_loss: 0.4584 - val_acc: 0.8562
Epoch 83/100
42000/42000 [==============================] - 3s 62us/sample - loss: 0.3066 - acc: 0.9073 - val_loss: 0.5082 - val_acc: 0.8383
Epoch 84/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.2970 - acc: 0.9109 - val_loss: 0.7692 - val_acc: 0.7568
Epoch 85/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.3044 - acc: 0.9073 - val_loss: 0.6092 - val_acc: 0.8047
Epoch 86/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.2958 - acc: 0.9103 - val_loss: 0.4906 - val_acc: 0.8454
Epoch 87/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.2953 - acc: 0.9108 - val_loss: 0.4721 - val_acc: 0.8436
Epoch 88/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.2943 - acc: 0.9105 - val_loss: 0.6549 - val_acc: 0.7836
Epoch 89/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.2943 - acc: 0.9100 - val_loss: 0.6813 - val_acc: 0.7893
Epoch 90/100
42000/42000 [==============================] - 2s 56us/sample - loss: 0.2928 - acc: 0.9110 - val_loss: 0.7207 - val_acc: 0.7628
Epoch 91/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.2925 - acc: 0.9115 - val_loss: 0.8072 - val_acc: 0.7591
Epoch 92/100
42000/42000 [==============================] - 2s 58us/sample - loss: 0.2900 - acc: 0.9117 - val_loss: 0.8668 - val_acc: 0.7243
Epoch 93/100
42000/42000 [==============================] - 3s 63us/sample - loss: 0.2853 - acc: 0.9132 - val_loss: 0.5118 - val_acc: 0.8381
Epoch 94/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.2833 - acc: 0.9132 - val_loss: 0.6805 - val_acc: 0.7838
Epoch 95/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.2863 - acc: 0.9134 - val_loss: 0.5222 - val_acc: 0.8342
Epoch 96/100
42000/42000 [==============================] - 2s 59us/sample - loss: 0.2776 - acc: 0.9172 - val_loss: 0.6173 - val_acc: 0.7970
Epoch 97/100
42000/42000 [==============================] - 2s 57us/sample - loss: 0.2856 - acc: 0.9139 - val_loss: 0.4914 - val_acc: 0.8441
Epoch 98/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.2740 - acc: 0.9163 - val_loss: 0.4306 - val_acc: 0.8653
Epoch 99/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.2711 - acc: 0.9185 - val_loss: 0.6014 - val_acc: 0.8122
Epoch 100/100
42000/42000 [==============================] - 3s 60us/sample - loss: 0.2757 - acc: 0.9154 - val_loss: 0.5253 - val_acc: 0.8297
In [69]:
print('NN model with dropout - adam optimizer'); print('--'*40)
results6 = model6.evaluate(X_val, y_val)
print('Validation accuracy: {}%'.format(round(results6[1]*100, 2)))
NN model with dropout - adam optimizer
--------------------------------------------------------------------------------
60000/60000 [==============================] - 4s 72us/sample - loss: 0.5253 - acc: 0.8297
Validation accuracy: 82.97%
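Note that evaluate reports the final-epoch weights (82.97%), even though validation accuracy peaked at 86.53% at epoch 98. A checkpoint or early-stopping callback would keep the best weights instead. A minimal sketch, assuming a tf.keras version that supports restore_best_weights and the 'val_acc' metric name seen in these logs (the filename is hypothetical):

from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping

callbacks = [
    ModelCheckpoint('model6_best.h5', monitor = 'val_acc', save_best_only = True),
    EarlyStopping(monitor = 'val_acc', patience = 10, restore_best_weights = True)
]
# model6.fit(X_train, y_train, validation_data = (X_val, y_val),
#            batch_size = 200, epochs = 100, verbose = 1, callbacks = callbacks)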

Observation 8 - Batch Normalization and Dropout
  • Adding dropout on top of batch normalization didn't improve the score.
  • The NN model with relu activations, a changed number of activators, and the Adam optimizer (Model 3) is still the best model.
  • Next, let's rebuild Model 3 and use it to predict on the test dataset.

Prediction on the test dataset using Model 3 - relu activations, Adam optimizer

In [70]:
print('NN model with relu activations and changing number of activators'); print('--'*40)
# Initialize the neural network classifier
model3 = Sequential()

# Input Layer - adding input layer and activation functions relu
model3.add(Dense(256, input_shape = (1024, )))
# Adding activation function
model3.add(Activation('relu'))

#Hidden Layer 1 - adding first hidden layer
model3.add(Dense(128))
# Adding activation function
model3.add(Activation('relu'))

#Hidden Layer 2 - Adding second hidden layer
model3.add(Dense(64))
# Adding activation function
model3.add(Activation('relu'))

# Output Layer - adding output layer which is of 10 nodes (digits)
model3.add(Dense(10))
# Adding activation function - softmax for multiclass classification
model3.add(Activation('softmax'))
NN model with relu activations and changing number of activators
--------------------------------------------------------------------------------
In [71]:
# compiling the neural network classifier, adam optimizer
adam = optimizers.Adam(lr = 0.001)
model3.compile(optimizer = adam, loss = 'categorical_crossentropy', metrics = ['accuracy'])

# Fitting the neural network for training
history = model3.fit(X_train, y_train, validation_data = (X_val, y_val), batch_size = 200, epochs = 100, verbose = 1)
Train on 42000 samples, validate on 60000 samples
Epoch 1/100
42000/42000 [==============================] - 2s 39us/sample - loss: 2.2450 - acc: 0.1494 - val_loss: 1.9481 - val_acc: 0.3398
Epoch 2/100
42000/42000 [==============================] - 1s 30us/sample - loss: 1.5502 - acc: 0.4844 - val_loss: 1.3632 - val_acc: 0.5493
Epoch 3/100
42000/42000 [==============================] - 1s 29us/sample - loss: 1.2826 - acc: 0.5840 - val_loss: 1.1971 - val_acc: 0.6190
Epoch 4/100
42000/42000 [==============================] - 1s 29us/sample - loss: 1.1456 - acc: 0.6382 - val_loss: 1.0864 - val_acc: 0.6639
Epoch 5/100
42000/42000 [==============================] - 1s 29us/sample - loss: 1.0588 - acc: 0.6704 - val_loss: 1.0618 - val_acc: 0.6653
Epoch 6/100
42000/42000 [==============================] - 1s 29us/sample - loss: 1.0035 - acc: 0.6882 - val_loss: 0.9490 - val_acc: 0.7054
Epoch 7/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.9524 - acc: 0.7060 - val_loss: 0.9711 - val_acc: 0.7005
Epoch 8/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.9115 - acc: 0.7183 - val_loss: 0.8688 - val_acc: 0.7336
Epoch 9/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.8822 - acc: 0.7289 - val_loss: 0.8723 - val_acc: 0.7294
Epoch 10/100
42000/42000 [==============================] - 1s 32us/sample - loss: 0.8557 - acc: 0.7358 - val_loss: 0.8526 - val_acc: 0.7390
Epoch 11/100
42000/42000 [==============================] - 1s 31us/sample - loss: 0.8294 - acc: 0.7449 - val_loss: 0.8516 - val_acc: 0.7340
Epoch 12/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.8123 - acc: 0.7475 - val_loss: 0.7854 - val_acc: 0.7562
Epoch 13/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.7956 - acc: 0.7556 - val_loss: 0.8098 - val_acc: 0.7527
Epoch 14/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.7827 - acc: 0.7574 - val_loss: 0.7563 - val_acc: 0.7686
Epoch 15/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.7541 - acc: 0.7695 - val_loss: 0.7475 - val_acc: 0.7719
Epoch 16/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.7410 - acc: 0.7709 - val_loss: 0.7352 - val_acc: 0.7752
Epoch 17/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.7287 - acc: 0.7764 - val_loss: 0.7410 - val_acc: 0.7713
Epoch 18/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.7150 - acc: 0.7795 - val_loss: 0.7157 - val_acc: 0.7821
Epoch 19/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.7025 - acc: 0.7840 - val_loss: 0.7085 - val_acc: 0.7825
Epoch 20/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.6890 - acc: 0.7884 - val_loss: 0.7198 - val_acc: 0.7801
Epoch 21/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.6722 - acc: 0.7940 - val_loss: 0.7024 - val_acc: 0.7853
Epoch 22/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.6632 - acc: 0.7958 - val_loss: 0.7599 - val_acc: 0.7661
Epoch 23/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.6577 - acc: 0.7969 - val_loss: 0.6656 - val_acc: 0.7979
Epoch 24/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.6323 - acc: 0.8049 - val_loss: 0.6612 - val_acc: 0.7972
Epoch 25/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.6246 - acc: 0.8080 - val_loss: 0.6253 - val_acc: 0.8116
Epoch 26/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.6241 - acc: 0.8078 - val_loss: 0.6045 - val_acc: 0.8176
Epoch 27/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.6085 - acc: 0.8138 - val_loss: 0.6033 - val_acc: 0.8179
Epoch 28/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.5945 - acc: 0.8184 - val_loss: 0.5880 - val_acc: 0.8226
Epoch 29/100
42000/42000 [==============================] - 1s 32us/sample - loss: 0.5854 - acc: 0.8206 - val_loss: 0.5891 - val_acc: 0.8232
Epoch 30/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.5819 - acc: 0.8211 - val_loss: 0.5972 - val_acc: 0.8191
Epoch 31/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.5704 - acc: 0.8244 - val_loss: 0.5885 - val_acc: 0.8224
Epoch 32/100
42000/42000 [==============================] - 1s 32us/sample - loss: 0.5728 - acc: 0.8236 - val_loss: 0.5906 - val_acc: 0.8203
Epoch 33/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.5547 - acc: 0.8282 - val_loss: 0.5743 - val_acc: 0.8260
Epoch 34/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.5482 - acc: 0.8301 - val_loss: 0.5979 - val_acc: 0.8173
Epoch 35/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.5397 - acc: 0.8349 - val_loss: 0.5557 - val_acc: 0.8319
Epoch 36/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.5328 - acc: 0.8364 - val_loss: 0.5550 - val_acc: 0.8320
Epoch 37/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.5315 - acc: 0.8366 - val_loss: 0.5342 - val_acc: 0.8388
Epoch 38/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.5190 - acc: 0.8395 - val_loss: 0.5484 - val_acc: 0.8354
Epoch 39/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.5133 - acc: 0.8424 - val_loss: 0.5546 - val_acc: 0.8307
Epoch 40/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.5245 - acc: 0.8354 - val_loss: 0.5637 - val_acc: 0.8249
Epoch 41/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.5054 - acc: 0.8429 - val_loss: 0.5232 - val_acc: 0.8415
Epoch 42/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.5053 - acc: 0.8436 - val_loss: 0.5410 - val_acc: 0.8354
Epoch 43/100
42000/42000 [==============================] - 1s 32us/sample - loss: 0.5047 - acc: 0.8423 - val_loss: 0.5490 - val_acc: 0.8328
Epoch 44/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.4911 - acc: 0.8472 - val_loss: 0.5196 - val_acc: 0.8423
Epoch 45/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4943 - acc: 0.8452 - val_loss: 0.5047 - val_acc: 0.8465
Epoch 46/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4728 - acc: 0.8534 - val_loss: 0.5193 - val_acc: 0.8446
Epoch 47/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4728 - acc: 0.8541 - val_loss: 0.5411 - val_acc: 0.8332
Epoch 48/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4783 - acc: 0.8500 - val_loss: 0.5254 - val_acc: 0.8409
Epoch 49/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.4661 - acc: 0.8535 - val_loss: 0.4862 - val_acc: 0.8540
Epoch 50/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.4548 - acc: 0.8591 - val_loss: 0.4810 - val_acc: 0.8564
Epoch 51/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4544 - acc: 0.8582 - val_loss: 0.5049 - val_acc: 0.8487
Epoch 52/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4551 - acc: 0.8577 - val_loss: 0.4743 - val_acc: 0.8570
Epoch 53/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.4540 - acc: 0.8572 - val_loss: 0.4703 - val_acc: 0.8590
Epoch 54/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4449 - acc: 0.8603 - val_loss: 0.5001 - val_acc: 0.8488
Epoch 55/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4390 - acc: 0.8627 - val_loss: 0.4796 - val_acc: 0.8548
Epoch 56/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4353 - acc: 0.8651 - val_loss: 0.4902 - val_acc: 0.8526
Epoch 57/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.4404 - acc: 0.8614 - val_loss: 0.5234 - val_acc: 0.8398
Epoch 58/100
42000/42000 [==============================] - 1s 33us/sample - loss: 0.4391 - acc: 0.8615 - val_loss: 0.4935 - val_acc: 0.8507
Epoch 59/100
42000/42000 [==============================] - 1s 31us/sample - loss: 0.4318 - acc: 0.8632 - val_loss: 0.4953 - val_acc: 0.8497
Epoch 60/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4265 - acc: 0.8662 - val_loss: 0.5158 - val_acc: 0.8422
Epoch 61/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4205 - acc: 0.8655 - val_loss: 0.5047 - val_acc: 0.8468
Epoch 62/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.4207 - acc: 0.8683 - val_loss: 0.4942 - val_acc: 0.8505
Epoch 63/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4136 - acc: 0.8702 - val_loss: 0.4649 - val_acc: 0.8625
Epoch 64/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4127 - acc: 0.8694 - val_loss: 0.4480 - val_acc: 0.8663
Epoch 65/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4169 - acc: 0.8680 - val_loss: 0.4424 - val_acc: 0.8684
Epoch 66/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4004 - acc: 0.8735 - val_loss: 0.4501 - val_acc: 0.8655
Epoch 67/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4046 - acc: 0.8707 - val_loss: 0.4404 - val_acc: 0.8692
Epoch 68/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4078 - acc: 0.8700 - val_loss: 0.4328 - val_acc: 0.8717
Epoch 69/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.4011 - acc: 0.8722 - val_loss: 0.4715 - val_acc: 0.8596
Epoch 70/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3974 - acc: 0.8742 - val_loss: 0.4439 - val_acc: 0.8680
Epoch 71/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3904 - acc: 0.8752 - val_loss: 0.4707 - val_acc: 0.8602
Epoch 72/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3903 - acc: 0.8760 - val_loss: 0.4233 - val_acc: 0.8751
Epoch 73/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3848 - acc: 0.8788 - val_loss: 0.4433 - val_acc: 0.8659
Epoch 74/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3820 - acc: 0.8785 - val_loss: 0.4359 - val_acc: 0.8703
Epoch 75/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3744 - acc: 0.8806 - val_loss: 0.4497 - val_acc: 0.8642
Epoch 76/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3889 - acc: 0.8754 - val_loss: 0.4522 - val_acc: 0.8640
Epoch 77/100
42000/42000 [==============================] - 1s 32us/sample - loss: 0.3780 - acc: 0.8807 - val_loss: 0.5019 - val_acc: 0.8474
Epoch 78/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3741 - acc: 0.8813 - val_loss: 0.4329 - val_acc: 0.8713
Epoch 79/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3686 - acc: 0.8828 - val_loss: 0.4347 - val_acc: 0.8714
Epoch 80/100
42000/42000 [==============================] - 1s 32us/sample - loss: 0.3624 - acc: 0.8852 - val_loss: 0.4339 - val_acc: 0.8716
Epoch 81/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3730 - acc: 0.8813 - val_loss: 0.4617 - val_acc: 0.8615
Epoch 82/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3621 - acc: 0.8837 - val_loss: 0.4089 - val_acc: 0.8799
Epoch 83/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3633 - acc: 0.8843 - val_loss: 0.4048 - val_acc: 0.8813
Epoch 84/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3598 - acc: 0.8852 - val_loss: 0.4693 - val_acc: 0.8583
Epoch 85/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3580 - acc: 0.8858 - val_loss: 0.4362 - val_acc: 0.8702
Epoch 86/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3572 - acc: 0.8849 - val_loss: 0.4603 - val_acc: 0.8621
Epoch 87/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3524 - acc: 0.8885 - val_loss: 0.4158 - val_acc: 0.8780
Epoch 88/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3495 - acc: 0.8895 - val_loss: 0.4046 - val_acc: 0.8816
Epoch 89/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3430 - acc: 0.8909 - val_loss: 0.4195 - val_acc: 0.8769
Epoch 90/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3433 - acc: 0.8904 - val_loss: 0.4739 - val_acc: 0.8565
Epoch 91/100
42000/42000 [==============================] - 1s 32us/sample - loss: 0.3410 - acc: 0.8909 - val_loss: 0.4222 - val_acc: 0.8758
Epoch 92/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3433 - acc: 0.8912 - val_loss: 0.4033 - val_acc: 0.8818
Epoch 93/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3392 - acc: 0.8909 - val_loss: 0.4610 - val_acc: 0.8605
Epoch 94/100
42000/42000 [==============================] - 1s 28us/sample - loss: 0.3302 - acc: 0.8946 - val_loss: 0.3992 - val_acc: 0.8845
Epoch 95/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3377 - acc: 0.8926 - val_loss: 0.4248 - val_acc: 0.8755
Epoch 96/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3321 - acc: 0.8932 - val_loss: 0.4212 - val_acc: 0.8764
Epoch 97/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3279 - acc: 0.8958 - val_loss: 0.4042 - val_acc: 0.8802
Epoch 98/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3348 - acc: 0.8917 - val_loss: 0.3966 - val_acc: 0.8849
Epoch 99/100
42000/42000 [==============================] - 1s 29us/sample - loss: 0.3188 - acc: 0.8980 - val_loss: 0.4341 - val_acc: 0.8735
Epoch 100/100
42000/42000 [==============================] - 1s 30us/sample - loss: 0.3311 - acc: 0.8932 - val_loss: 0.3852 - val_acc: 0.8892
In [72]:
print('NN model with relu activations - adam optimizer'); print('--'*40)
results3 = model3.evaluate(X_val, y_val)
print('Validation accuracy: {}%'.format(round(results3[1]*100, 2)))
NN model with relu activations - adam optimizer
--------------------------------------------------------------------------------
60000/60000 [==============================] - 3s 53us/sample - loss: 0.3852 - acc: 0.8892
Validation accuracy: 88.92%
In [73]:
print('Testing the model on test dataset')
predictions = model3.predict_classes(X_test)
score = model3.evaluate(X_test, y_test)
print('Test loss :', score[0])
print('Test accuracy :', score[1])
Testing the model on test dataset
18000/18000 [==============================] - 1s 52us/sample - loss: 0.6316 - acc: 0.8357
Test loss : 0.6316036958562004
Test accuracy : 0.8357222
In [74]:
print('Classification Report'); print('--'*40)
print(classification_report(y_test_o, predictions))
Classification Report
--------------------------------------------------------------------------------
              precision    recall  f1-score   support

           0       0.84      0.88      0.86      1814
           1       0.85      0.86      0.86      1828
           2       0.86      0.85      0.86      1803
           3       0.79      0.79      0.79      1719
           4       0.86      0.88      0.87      1812
           5       0.76      0.83      0.80      1768
           6       0.83      0.81      0.82      1832
           7       0.91      0.85      0.88      1808
           8       0.80      0.79      0.80      1812
           9       0.84      0.82      0.83      1804

    accuracy                           0.84     18000
   macro avg       0.84      0.84      0.84     18000
weighted avg       0.84      0.84      0.84     18000
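The macro avg row is just the unweighted mean of the ten per-class scores; for example, the macro F1 can be reproduced from the f1-score column above:

f1_per_class = [0.86, 0.86, 0.86, 0.79, 0.87, 0.80, 0.82, 0.88, 0.80, 0.83]
print(sum(f1_per_class) / len(f1_per_class))    # 0.837, which rounds to the 0.84 shown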

In [75]:
print('Visualizing the confusion matrix')
plt.figure(figsize = (15, 7.2))
sns.heatmap(confusion_matrix(y_test_o, predictions), annot = True)
Visualizing the confusion matrix
Out[75]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f7fa60fd0b8>
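Raw counts can be hard to compare when the class supports differ; a row-normalized variant (a sketch reusing the imports above) puts each class's recall on the diagonal:

cm = confusion_matrix(y_test_o, predictions)
cm_norm = cm / cm.sum(axis = 1, keepdims = True)    # each row now sums to 1
plt.figure(figsize = (15, 7.2))
sns.heatmap(cm_norm, annot = True, fmt = '.2f', cmap = 'Blues')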
In [76]:
# summarize history for accuracy
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Model3 Accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['Train', 'Validation'], loc = 'upper left')
plt.show()
# summarize history for loss
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model3 Loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['Train', 'Validation'], loc = 'upper left')
plt.show()
In [77]:
model3.predict_classes(X_test)[5]
Out[77]:
9
In [78]:
# Showing the test image at index 20
plt.imshow(X_test[20].reshape(32, 32), cmap = 'gray')
Out[78]:
<matplotlib.image.AxesImage at 0x7f7f9f3c72b0>
In [79]:
model3.predict_classes(X_test)[20]
Out[79]:
0
In [82]:
plt.imshow(X_test[10].reshape(32, 32), cmap = 'gray')
Out[82]:
<matplotlib.image.AxesImage at 0x7f7f9f2fd198>
In [83]:
model3.predict_classes(X_test)[10]
Out[83]:
9
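Rather than spot-checking one index at a time, a small grid makes the comparison easier. A sketch, assuming (as in the cells above) that each row of X_test is a flattened 32x32 grayscale image and that y_test_o holds the integer test labels:

preds = model3.predict_classes(X_test[:10])
fig, axes = plt.subplots(2, 5, figsize = (15, 6))
for i, ax in enumerate(axes.ravel()):
    ax.imshow(X_test[i].reshape(32, 32), cmap = 'gray')
    ax.set_title('pred: {} / true: {}'.format(preds[i], y_test_o[i]))
    ax.axis('off')
plt.show()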

Conclusion

Evaluated the accuracy using two methods, i.e. babysitting the NN and building the NN through the Keras API. Followed all the required steps, starting with loading the datasets through to performing hyperparameter optimization and running a search over a finer range. Explored different options for optimizers, numbers of activators, learning rates, and activation functions in the NN built through the API. The babysitting process achieved a best accuracy of 21% using hyperparameter optimization; it might have been improved further, but that is the trade-off against the time taken to run the script. The NN built through the API achieved a best validation accuracy of 88.92% (Model 3 with relu activations and the Adam optimizer) and a test accuracy of 83.57%. Also printed the classification report, visualized the confusion matrix, and summarized the training history for accuracy and loss.