What’s face recognition?
Face recognition is the task of comparing an unknown individual’s face to images in a database of stored records. The mapping can be one-to-one or one-to-many, depending on whether we are running face verification (“is this person X?”) or face identification (“who is this person?”).
In this tutorial, we are interested in building a face identification system that will check whether an image, generally known as the probe image, exists within a pre-existing database of faces, generally known as the evaluation set.
Intuition
There are four main steps involved in building such a system:
1. Detect faces in an image
Available face detection models include MTCNN, RetinaFace, Dlib, etc.
2. Crop & align faces for uniformity
The OpenCV library provides all the tools we need for this step.
3. Find a vector representation for each face
Since programs can’t work with jpg or png files directly, we need some way of translating images to numbers. In this tutorial, we will use the InsightFace model to create a multi-dimensional (512-d) embedding for each face, such that it encapsulates useful semantic information pertaining to the face.
To tackle the first three steps with a single library, we will be using insightface. In particular, we will be working with InsightFace’s ArcFace model.
InsightFace is an open-source deep face analysis library for face recognition, face detection, and face alignment tasks.
4. Compare the embeddings
Once we have translated each unique face into a vector, comparing faces essentially boils down to comparing the corresponding embeddings. We will make use of these embeddings to train a scikit-learn model.
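For intuition, here is a minimal sketch of what step 4 boils down to. The embeddings and the 0.7 threshold below are placeholder assumptions for illustration, not values from the tutorial:

import numpy as np

emb_a = np.random.rand(512)  # placeholder embeddings; in practice these
emb_b = np.random.rand(512)  # come from the face-embedding model

# cosine similarity: close to 1 means the faces likely belong to the same person
cos_sim = np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
same_person = cos_sim > 0.7  # illustrative threshold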
P.S. If you’d like to follow along, the code is available on Github.
Setup
Create a virtual environment (optional):
python3 -m venv face_search_env
Activate this environment:
source face_search_env/bin/activate
Necessary installations within this environment:
pip install mxnet==1.8.0.post0
pip install -U insightface==0.2.1
pip install onnx==1.10.1
pip install onnxruntime==1.8.1
More importantly, once you are done pip installing insightface:
– Download the antelope model release from onedrive. (It contains two pre-trained models for detection and recognition.)
– Put it under ~/.insightface/models/, so there are onnx models at ~/.insightface/models/antelope/*.onnx.
This is how the models directory should look if the setup was done correctly. If you look inside the antelope directory, you’ll find the two onnx models for face detection and recognition:
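For reference, the layout should be roughly as follows. The exact onnx file names may differ between antelope releases; the two below are the detection and recognition models shipped at the time of writing:

~/.insightface/models/antelope/
├── glintr100.onnx
└── scrfd_10g_bnkps.onnx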
Note: Since the latest release of insightface 0.4.1 last week, the installation was not as straightforward as I would have hoped (at least for me). Hence, I will be using 0.2.1 for this tutorial. In the future, I’ll update the code on Github accordingly. Please see the instructions here if you’re stuck.
Dataset
We will be working with the Yale Faces dataset available on Kaggle, containing 165 grayscale images of 15 individuals (i.e. 11 images per identity). The images cover a wide variety of expressions, poses, and illumination configurations.
Once you have the dataset, go ahead and unzip it inside a newly created data directory within your project (see the project directory structure on Github).
Let’s begin…
If you’d like to follow along, the Jupyter Notebook can be found on Github.
Imports
import os
import pickle
import numpy as np
from PIL import Image
from typing import List
from tqdm import tqdm
from insightface.app import FaceAnalysis
from sklearn.neighbors import NearestNeighbors
Loading Insightface Model
Once insightface is installed, we must call app = FaceAnalysis(name="model_name") to load the models.
Since we stored our onnx models inside the antelope directory:

app = FaceAnalysis(name="antelope")
app.prepare(ctx_id=0, det_size=(640, 640))
Generate Insightface embeddings
Generating an embedding for an image is quite straightforward with the insightface model. For instance:
# Generating embeddings for an image
img_emb_results = app.get(np.asarray(img))
img_emb = img_emb_results[0].embedding
img_emb.shape
------------OUTPUT---------------
(512,)
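In the snippet above, img is assumed to be a PIL image loaded beforehand. A minimal sketch, with a hypothetical file path:

# hypothetical path, for illustration only
img = Image.open("../data/sample_face.jpg").convert("RGB")
faces = app.get(np.asarray(img))  # runs detection + embedding
print(len(faces))                 # number of faces detected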
Data preprocessing
Prior to using this dataset, we must fix the extensions of the files in the directory such that the file names end with .gif (or .jpg, .png, etc.).
For instance, the following code snippet will change the filename subject01.glasses to subject01_glasses.gif.
# Fixing the file extensions
YALE_DIR = "../data/yalefaces"
files = sorted(os.listdir(YALE_DIR))[1:]  # sort so that README.txt reliably lands at index 0
for i, img in enumerate(files):
    # print("original name: ", img)
    new_ext_name = "_".join(img.split(".")) + ".gif"
    # print("new name: ", new_ext_name)
    os.rename(os.path.join(YALE_DIR, img), os.path.join(YALE_DIR, new_ext_name))
Next, we will split the data into the evaluation and probe sets: 90% or 10 images per subject will become part of the evaluation set and the remaining 10% or 1 image per subject will be used in the probe set.
To avoid sampling bias, the probe image for each subject is randomly chosen using a helper function called create_probe_eval_set(). It takes as input a list containing the (file names of the) 11 images belonging to a particular subject and returns two lists of lengths 1 and 10. The former contains the file name to be used for the probe set while the latter contains the file names for the evaluation set.
def create_probe_eval_set(files: List):
    # pick a random index between 0 and len(files)-1
    random_idx = np.random.randint(0, len(files))
    probe_img_fpaths = [files[random_idx]]
    eval_img_fpaths = [files[idx] for idx in range(len(files)) if idx != random_idx]
    return probe_img_fpaths, eval_img_fpaths
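For instance, a hypothetical call for one subject might look like this (the tags follow the Yale Faces naming scheme after our renaming step):

subject_files = [
    "subject01_centerlight.gif", "subject01_glasses.gif", "subject01_happy.gif",
    "subject01_leftlight.gif", "subject01_noglasses.gif", "subject01_normal.gif",
    "subject01_rightlight.gif", "subject01_sad.gif", "subject01_sleepy.gif",
    "subject01_surprised.gif", "subject01_wink.gif",
]
probe_fpaths, eval_fpaths = create_probe_eval_set(subject_files)
len(probe_fpaths), len(eval_fpaths)  # -> (1, 10)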
Generate embeddings
Both of the lists returned by create_probe_eval_set() are sequentially fed to a helper function called generate_embs(). For each file name in the list, it reads the grayscale image, converts it to RGB, calculates the corresponding embedding, and finally returns the embedding along with the image label (scraped from the file name).
def generate_embs(img_fpaths: List[str]):
    embs_set = list()
    embs_label = list()
    for img_fpath in img_fpaths:
        # read the grayscale image (pixel values are already in 0-255)
        img = Image.open(os.path.join(YALE_DIR, img_fpath))
        img_arr = np.asarray(img)
        # convert grayscale to RGB (the model expects a 3-channel image)
        im = Image.fromarray(img_arr.astype(np.uint8))
        rgb_arr = np.asarray(im.convert('RGB'))
        # generate the Insightface embedding
        res = app.get(rgb_arr)
        # append the embedding result to the set
        embs_set.append(res)
        # append the label (scraped from the file name)
        embs_label.append(img_fpath.split("_")[0])
    return embs_set, embs_label
Now that we have a framework for generating embeddings, let’s go ahead and create embeddings for both the probe and evaluation sets using generate_embs().
# sorting files so that all images belonging to one subject are adjacent
files = os.listdir(YALE_DIR)
files.sort()

eval_set = list()
eval_labels = list()
probe_set = list()
probe_labels = list()
IMAGES_PER_IDENTITY = 11

# start at index 1 to skip the README.txt file at files[0]
for i in tqdm(range(1, len(files), IMAGES_PER_IDENTITY)):
    probe_fpaths, eval_fpaths = create_probe_eval_set(files[i:i + IMAGES_PER_IDENTITY])

    # store eval embs and labels
    eval_set_t, eval_labels_t = generate_embs(eval_fpaths)
    eval_set.extend(eval_set_t)
    eval_labels.extend(eval_labels_t)

    # store probe embs and labels
    probe_set_t, probe_labels_t = generate_embs(probe_fpaths)
    probe_set.extend(probe_set_t)
    probe_labels.extend(probe_labels_t)
A few things to consider:
- The files returned by os.listdir() come in arbitrary order, hence the sort near the top of the snippet is important. Why do we need sorted file names? Remember that create_probe_eval_set() requires all 11 files belonging to a particular subject in any single iteration.

Output from os.listdir() without sorting (left) and with sorting (right)
- [Optional] We could have replaced the create_probe_eval_set() function, gotten rid of the for loop, and simplified a few lines in the above code snippet by using the stratified train_test_split functionality provided by sklearn. For this tutorial, however, I prioritized clarity over code simplicity.
Oftentimes, insightface is unable to detect a face and subsequently generates an empty embedding for it. That explains why some of the entries in the probe_set or eval_set list might be empty. It is important that we filter them out and keep only non-empty values.
To do so, we create another helper function called filter_empty_embs():
def filter_empty_embs(img_set: List, img_labels: List[str]):
    # keep only those indices for which insightface could generate an embedding
    good_idx = [i for i, x in enumerate(img_set) if x]
    if len(good_idx) == len(img_set):
        clean_embs = [e[0].embedding for e in img_set]
        clean_labels = img_labels
    else:
        # filter the set and labels based on the good indices
        clean_labels = np.array(img_labels)[good_idx]
        clean_set = np.array(img_set, dtype=object)[good_idx]
        # extract the embeddings for the good indices
        clean_embs = [e[0].embedding for e in clean_set]
    return clean_embs, clean_labels
It takes as input the image set (either probe_set or eval_set) and removes those elements for which insightface could not generate an embedding (see the good_idx computation). Following this, it also filters the labels (either probe_labels or eval_labels) using the same indices, such that the sets and labels have the same length.
Finally, we can obtain the 512-d embeddings for only the good indices in both the evaluation and probe sets:
evaluation_embs, evaluation_labels = filter_empty_embs(eval_set, eval_labels)
probe_embs, probe_labels = filter_empty_embs(probe_set, probe_labels)

assert len(evaluation_embs) == len(evaluation_labels)
assert len(probe_embs) == len(probe_labels)
With both sets at our disposal, we are now ready to build our face identification system using a popular unsupervised learning method implemented in the scikit-learn library.
Creating a face recognition system
We train the nearest neighbors model using .fit() with the evaluation embeddings as X. This is a neat technique for unsupervised nearest neighbors learning.
The nearest neighbour method allows us to find a predefined number of training samples closest in distance to a new point.
Note: The distance can, in general, be any metric measure such as Euclidean, Manhattan, Cosine, Minkowski, etc.
# Nearest neighbour learning method
nn = NearestNeighbors(n_neighbors=3, metric="cosine")
nn.fit(X=evaluation_embs)

# save the model to disk
filename = 'faceID_model.pkl'
with open(filename, 'wb') as file:
    pickle.dump(nn, file)

# some time later...
# load the model from disk
# with open(filename, 'rb') as file:
#     pickle_model = pickle.load(file)
Because we are implementing an unsupervised learning method, observe that we do not pass any labels, i.e. evaluation_labels, to the fit method. All we are doing here is mapping the face embeddings in the evaluation set into a latent space.
“Why?”, you ask.
Simple answer: By storing the training set in memory ahead of time, we are able to speed up the search for its nearest neighbors during inference time.
How does it do this?
Simple answer: Storing the training set in an optimized tree structure in memory is quite useful, especially when the training set is large and searching for a new point’s neighbors would otherwise become computationally expensive.
Neighbors-based methods are known as non-generalizing machine learning methods, since they simply “remember” all of its training data (possibly transformed into a fast indexing structure such as a Ball Tree or KD Tree). [Source]
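As an illustrative aside (not part of the original pipeline): tree-based indexes apply to metrics such as Euclidean; with metric="cosine", as used above, scikit-learn falls back to a brute-force search, since ball/KD trees do not support the cosine metric. A minimal sketch of an explicit ball-tree index:

# an explicit ball-tree index requires a tree-compatible metric like Euclidean
nn_tree = NearestNeighbors(n_neighbors=3, algorithm="ball_tree", metric="euclidean")
nn_tree.fit(evaluation_embs)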
Note: See this Stackoverflow discussion if you are still not convinced!
Inference
For each new probe image, we can find whether it is present in the evaluation set by searching for its top-k neighbors using the nn.kneighbors() method. For instance:
# Example inference on a test image
dists, inds = nn.kneighbors(X=probe_img_emb.reshape(1, -1),
                            n_neighbors=3,
                            return_distance=True)
If the labels at the returned indices (inds) in the evaluation set are a perfect match for the probe image’s original/true label, then we know we have found our face in the verification system.
We have wrapped the aforementioned logic into the print_ID_results() method. It takes as input the probe image path, the evaluation set labels, and a verbose flag to specify whether detailed results should be displayed.
def print_ID_results(img_fpath: str, evaluation_labels: np.ndarray, verbose: bool = False):
    img = Image.open(img_fpath)
    img_emb = app.get(np.asarray(img))[0].embedding

    # get the nearest neighbors from the fitted model
    dists, inds = nn.kneighbors(X=img_emb.reshape(1, -1), n_neighbors=3, return_distance=True)

    # get the labels of the neighbours
    pred_labels = [evaluation_labels[i] for i in inds[0]]

    # count the neighbors whose distance falls within the 0.6 threshold
    no_of_matching_faces = np.sum([1 if d <= 0.6 else 0 for d in dists[0]])
    if no_of_matching_faces > 0:
        print("Matching face(s) found in database!")
        verbose = True
    else:
        print("No matching face(s) found in database!")

    # print the labels and corresponding distances
    if verbose:
        for label, dist in zip(pred_labels, dists[0]):
            print(f"Nearest neighbour found in the database has label {label} and is at a distance of {dist}")
A few important things to note here:
- inds contains the indices of the nearest neighbors in the evaluation_labels set. For instance, inds = [[2, 0, 11]] means the label at index=2 in evaluation_labels is found to be nearest to the probe image, followed by the label at index=0.
- Since, for any image, nn.kneighbors will return a non-empty response, we must only consider a result a face-ID match if the returned distance is less than or equal to 0.6 (see the sketch after this list). (P.S. The choice of 0.6 is completely arbitrary.)
For example, continuing with the above example where inds = [[2, 0, 11]] and, let’s say, dists = [[0.4, 0.6, 0.9]], we will only consider the labels at index=2 and index=0 (in evaluation_labels) as a true face match because the dist for the last neighbor is too large for it to be a genuine match.
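A minimal sketch of that thresholding step, assuming pred_labels and dists as in the example above:

# keep only the neighbors whose distance falls within the 0.6 threshold
matches = [(label, d) for label, d in zip(pred_labels, dists[0]) if d <= 0.6]
# with dists = [[0.4, 0.6, 0.9]], only the first two neighbors survive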
As a quick sanity check, let’s see the system’s response when we input a baby’s face as a probe image. As expected, it reveals no matching faces found! However, we set verbose to True, because of which we get to see the labels and distances for its bogus nearest neighbors in the database, all of which appear to be quite large (>0.8).
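A hypothetical call (the image path below is an assumption, not a file from the repo):

# hypothetical path for illustration
print_ID_results("../data/test_images/baby.jpg", evaluation_labels, verbose=True)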
Evaluating the face recognition system
One of the ways to test whether this system is any good is to see how many relevant results are present in the top k neighbors. A relevant result is one where the true label matches the predicted label. This metric is generally referred to as precision at k (p@k), where k is predetermined: p@k = (number of top-k neighbors whose label matches the true label) / k.
For instance, pick an image (or rather, an embedding) from the probe set with the true label ‘subject01’. If the top two pred_labels returned by nn.kneighbors for this image are [‘subject01’, ‘subject01’], it means the precision at k (p@k) with k=2 is 100%. Similarly, if only one of the values in pred_labels was equal to ‘subject01’, p@k would be 50%, and so on…
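To make the definition concrete, here is a small illustrative helper (not part of the original notebook; it relies on the imports from earlier):

def precision_at_k(true_label: str, pred_labels: List[str]) -> float:
    # fraction of the top-k predictions whose label matches the true label
    return float(np.mean([label == true_label for label in pred_labels]))

precision_at_k("subject01", ["subject01", "subject01"])  # -> 1.0
precision_at_k("subject01", ["subject01", "subject05"])  # -> 0.5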
dists, inds = nn.kneighbors(X=probe_embs_example.reshape(1, -1),
                            n_neighbors=2,
                            return_distance=True)
pred_labels = [evaluation_labels[i] for i in inds[0]]
pred_labels
----- OUTPUT ------
['002', '002']
Let’s go ahead and calculate the average p@k value across the entire probe set:
# inference on the probe set
dists, inds = nn.kneighbors(X=probe_embs, n_neighbors=2, return_distance=True)

# calculate average p@k
p_at_k = np.zeros(len(probe_embs))
for i in range(len(probe_embs)):
    true_label = probe_labels[i]
    pred_neighbr_idx = inds[i]
    pred_labels = [evaluation_labels[idx] for idx in pred_neighbr_idx]
    pred_is_labels = [1 if label == true_label else 0 for label in pred_labels]
    p_at_k[i] = np.mean(pred_is_labels)

p_at_k.mean()

------ OUTPUT --------
0.9
Awesome! 90%! Not too shabby, but it could definitely be improved (but that’s for another time)…
Kudos to you for following this through! Hopefully, this warm introduction to face recognition, an active area of research in computer vision, was enough to get you started. As always, if there’s an easier way to do some of the things I mentioned in this article, please do let me know.
Until next time 🙂
This article was originally published on Towards Data Science and re-published to TOPBOTS with permission from the author.