5.1.1 Hardware
Processor: Intel Core i3, 1.70 GHz
RAM: 4 GB

5.1.2 Software
Operating System: Windows 10 (64-bit)
Programming Language: Python
Platform: Python 3.7 (Keras)

Dept. of CSE, DSCE, Bangalore 78
Facial Expression Recognition using Neural Networks

5.2 Implementation Details

5.2.1 Organization of implementation files

5.2.2 Installation
Initially, install the following files in Python 3.7:

5.2.3 Dataset Collection
The data consists of 48×48 pixel grayscale images of faces. The faces have been automatically registered so that each face is more or less centered and occupies about the same amount of space in each image.
The task is to categorize each face, based on the emotion shown in the facial expression, into one of seven categories (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral). The dataset is divided in the ratio 80:20 for training and testing, respectively. The training set consists of 35,888 examples. train.csv contains two columns, emotion and pixels. The emotion column contains a numeric code ranging from 0 to 6, inclusive, for the emotion that is present in the image.
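The label coding and the 80:20 split described above can be written as a short Python/NumPy sketch. The function name split_80_20, the random shuffling and the fixed seed are assumptions made for illustration; the report does not state how the split was drawn.

```python
import numpy as np

# Emotion codes used in the "emotion" column (0-6), as listed in 5.2.3.
EMOTIONS = {0: "Angry", 1: "Disgust", 2: "Fear", 3: "Happy",
            4: "Sad", 5: "Surprise", 6: "Neutral"}


def split_80_20(faces, labels, seed=0):
    """Shuffle the samples and split them 80:20 into training and test sets.

    faces:  array of shape (N, 48, 48); labels: array of shape (N,).
    Hypothetical helper: shuffling with a fixed seed is an assumed choice.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(faces))
    cut = int(0.8 * len(faces))  # first 80% of the shuffled indices -> training
    train_idx, test_idx = order[:cut], order[cut:]
    return (faces[train_idx], labels[train_idx]), (faces[test_idx], labels[test_idx])
```

Every sample ends up in exactly one of the two sets, so the label distribution of the whole dataset is preserved across training and test data taken together.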
The pixels column contains a string surrounded by quotes for each image. The contents of this string are space-separated pixel values in row-major order.

For gender recognition we used the IMDB gender dataset, which contains 460,723 RGB images, each belonging to the class woman or man; the gender model achieved an accuracy of 96% on this dataset.

5.2.4 Loading the FER Dataset
1. def load_fer2013: reads the CSV file and converts the pixel sequence of each row into an image of dimension 48×48. It returns the faces and the emotion labels.
2. def preprocess_input: a standard way to pre-process images by scaling them to the range [-1, 1]. The images are first scaled to [0, 1] by dividing by 255; subtracting 0.5 and multiplying by 2 then changes the range to [-1, 1]. [-1, 1] has been found to be a better range for neural network models in computer vision problems.

In the original dataset provided at the Kaggle link, each image is given as a string, i.e. a 1×2304 row that stores the 48×48 image as a row vector.

5.2.5 Training the CNN Model: Mini Xception
The following techniques are used while training the CNN model.
1. Data Augmentation: More data is generated from the training set by applying transformations. This is needed when the training set is not large enough for the network to learn good representations. New image data is generated by transforming the actual training images through rotation, cropping, shifting, shearing, zooming, flipping, reflection, normalization, etc.
2. Kernel regularizer: It applies penalties on layer parameters during optimization. These penalties are incorporated into the loss function that the network optimizes. The regularizer argument used in the convolution layers applies L2 regularization to the weights. This penalizes peaky weights and makes sure that all the inputs are considered.
3. BatchNormalization: It normalizes the activations of the previous layer at each batch, i.e. it applies a transformation that keeps the mean activation close to 0 and the activation standard deviation close to 1. It addresses the problem of internal covariate shift. It also acts as a regularizer, in some cases eliminating the need for Dropout, and it speeds up the training process.
4. Global Average Pooling: It reduces each feature map to a scalar value by taking the average over all elements in the feature map. The averaging operation forces the network to extract global features from the input image.
5. Depthwise Separable Convolution: These convolutions are composed of two different layers: depth-wise convolutions and point-wise convolutions. Depth-wise separable convolutions reduce the computation with respect to standard convolutions by reducing the number of parameters.

5.2.6 Testing
While performing tests on the trained model, we found that the model classifies the emotion of a face as neutral if the expression is not distinct enough. The output layer of the trained mini-Xception CNN model gives the probability of each emotion class.

6 Testing
Testing is carried out with 20% of the data. For testing static images we ran the following command from the src directory (E:tryproject/faceclassificationmaster/src):

python image_emotion_gender_demo.py ../images/test_image.jpg

7 Results
We evaluated the accuracy of the proposed deep neural network architecture in two different experiments, viz. subject-independent and cross-database evaluation. In the subject-independent experiment, the databases are split into training, validation, and test sets in a strictly subject-independent manner. We used K-fold cross-validation with K = 5 to evaluate the results.
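"Subject-independent" means that no subject contributes images to both the training and the test portion of a fold. A minimal sketch of such a 5-fold split in Python with NumPy follows, assuming a per-sample array of subject identifiers is available. The function name subject_independent_folds and the round-robin assignment of subjects to folds are illustrative choices, not the project's code, and the validation split is omitted for brevity.

```python
import numpy as np


def subject_independent_folds(subject_ids, k=5, seed=0):
    """Yield (train_idx, test_idx) index arrays for K-fold cross-validation
    in which every subject's images fall into exactly one test fold.

    subject_ids: one subject identifier per sample (assumed to be available).
    """
    subject_ids = np.asarray(subject_ids)
    subjects = np.unique(subject_ids)
    rng = np.random.default_rng(seed)
    rng.shuffle(subjects)  # randomize which subjects end up in which fold
    # Assign the shuffled subjects to the k folds in round-robin order.
    fold_of_subject = {s: i % k for i, s in enumerate(subjects)}
    sample_fold = np.array([fold_of_subject[s] for s in subject_ids])
    for fold in range(k):
        test_idx = np.flatnonzero(sample_fold == fold)
        train_idx = np.flatnonzero(sample_fold != fold)
        yield train_idx, test_idx
```

Splitting by subject rather than by image keeps the same person's expressions out of both sets, so the measured accuracy reflects generalization to unseen people. A validation set could be carved out of train_idx in the same subject-wise way.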
In FERA and SFEW, the training and test sets are defined in the database release, and the results are evaluated on the database-defined test set without performing K-fold cross-validation. Since some databases contain different numbers of samples per emotion per subject, the training, validation and test sets have slightly different sizes in each fold. On average we used 175K samples for training, 56K samples for validation, and 64K samples for testing. The proposed architecture was trained for 200 epochs (i.e. 150K iterations on mini-batches of 250 samples). Table 3 gives the average accuracy when classifying the images into the six basic expressions and the neutral expression. The average confusion matrix for the subject-independent experiments is given in Table 4.

Here, we also report the top-2 expression classes. As Table 3 shows, the top-2 accuracy is 15% higher than the top-1 accuracy in most cases, especially on the in-the-wild datasets (i.e. FERA, SFEW, FER2013). We believe that assigning a single expression to an image can be ambiguous when there is a transition between expressions or the expression is not at its peak; therefore, the top-2 expressions can give better classification performance when evaluating image sequences.

Table 3. Average accuracy (%) for subject-independent evaluation.

Table 4. Average (%) confusion matrix for subject-independent evaluation.

In the cross-database experiment, one database is used for evaluation and the rest of the databases are used to train the network. Because every database has a unique fingerprint (lighting, pose, emotions, etc.), the cross-database task makes it much more difficult to extract features that transfer (both for traditional SVM approaches and for neural networks).
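Two evaluation ideas from this part of the results can be sketched in Python with NumPy: the top-k accuracy behind the top-1/top-2 figures of Table 3, and the leave-one-database-out protocol of the cross-database experiment. This is a minimal sketch, not the evaluation code behind the reported numbers. The function names are hypothetical, and the database list in the usage loop is only an illustrative subset of the databases named in the text.

```python
import numpy as np


def top_k_accuracy(probs, labels, k=1):
    """Fraction of samples whose true label is among the k most probable classes.

    probs:  array of shape (N, C) with one probability per emotion class.
    labels: array of shape (N,) with the true class index of each sample.
    """
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    # argsort is ascending, so the last k columns hold the k highest-scoring classes.
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = np.any(top_k == labels[:, None], axis=1)
    return float(hits.mean())


def leave_one_database_out(databases):
    """Yield (held_out, training_databases) pairs: each database is used once
    for evaluation while the network is trained on all the others."""
    for held_out in databases:
        training = [db for db in databases if db != held_out]
        yield held_out, training


# Hypothetical usage with an illustrative subset of the databases named above.
for test_db, train_dbs in leave_one_database_out(["CK+", "MMI", "FERA", "SFEW", "FER2013"]):
    print(f"train on {train_dbs}, evaluate on {test_db}")
```

With k=2, a prediction counts as correct when the true label is either of the two most probable classes. This is why the top-2 accuracy can never be lower than the top-1 accuracy.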
The proposed architecture was trained for 100 epochs in each experiment. Table 5 gives the average cross-database accuracy when classifying the six basic expressions as well as the neutral expression. A previously reported cross-database experiment was performed by training the model on one of the CK+, MMI or FEEDTUM databases and testing it on the others; the result reported in Table 5 is the average for the CK+ and MMI databases. In another approach, multiple features are fused via a Multiple Kernel Learning algorithm, and the cross-database experiment is trained on CK+ and evaluated on MMI, and vice versa. Comparing the results of our proposed approach with these state-of-the-art methods, we conclude that our network generalizes well for the FER problem. Unfortunately, there is no study on cross-database evaluation of more challenging datasets such as FERA, SFEW and FER2013. We believe that this work can serve as a baseline for cross-database evaluation on these challenging datasets.

Figure 3. Training loss and classification accuracy on the validation set.

Table 5. Average accuracy (%) on cross-database evaluation.

Table 6. Subject-independent comparison with AlexNet results (% accuracy).

As a benchmark for our proposed solution, we trained a full AlexNet from scratch (as opposed to fine-tuning an already trained network) using the same protocol as for our own network. As shown in Table 6, our proposed architecture performs better on MMI and FER2013 and comparably on the rest of the databases. The advantage of the proposed solution over the AlexNet architecture is its training time: our version of AlexNet performed more than 100M operations, whereas the proposed network performs about 25M operations.
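Operation counts such as the 100M and 25M figures above are commonly estimated by summing multiply-accumulate (MAC) operations layer by layer. The sketch below shows that arithmetic for standard convolution and fully connected layers. The helper names and the example layer sizes are hypothetical and are not taken from AlexNet or from the proposed network.

```python
def conv2d_macs(h_out, w_out, c_in, c_out, k_h, k_w):
    """MACs of a standard 2-D convolution: each of the h_out * w_out * c_out
    output values needs k_h * k_w * c_in multiply-accumulates (bias ignored)."""
    return h_out * w_out * c_out * k_h * k_w * c_in


def dense_macs(n_in, n_out):
    """MACs of a fully connected layer: one per weight (bias ignored)."""
    return n_in * n_out


# Hypothetical layers, chosen only to illustrate the arithmetic.
layers = [
    conv2d_macs(h_out=48, w_out=48, c_in=1, c_out=32, k_h=3, k_w=3),  # 663,552 MACs
    dense_macs(n_in=4096, n_out=7),  # 28,672 MACs
]
total = sum(layers)
```

Applied to the figures quoted above, 100M / 25M gives a ratio of about 4. In other words, the proposed network needs roughly a quarter of the operations of the AlexNet version, or less, since that version needed more than 100M.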