What’s face recognition?
Face recognition is the task of comparing an unknown individual’s face to images in a database of stored records. The mapping can be one-to-one or one-to-many, depending on whether we are running face verification (“is this person X?”) or face identification (“who is this person?”).
In this tutorial, we are interested in building a face identification system that will check whether an image, generally known as the probe image, exists within a pre-existing database of faces, generally known as the evaluation set.
Intuition
There are four main steps involved in building such a system:
1. Detect faces in an image
Available face detection models include MTCNN, RetinaFace, Dlib, etc.
2. Crop & align faces for uniformity
The OpenCV library provides all the tools we need for this step.
3. Find a vector representation for each face
Since programs can’t work with jpg or png files directly, we need some way of translating images to numbers. In this tutorial, we will use the InsightFace model to create a multi-dimensional (512-d) embedding for each face, such that it encapsulates useful semantic information pertaining to the face.
To tackle the first three steps with a single library, we will be using insightface. In particular, we will be working with InsightFace’s ArcFace model.
InsightFace is an open-source deep face analysis library for face recognition, face detection, and face alignment tasks.
4. Compare the embeddings
Once we have translated each unique face into a vector, comparing faces essentially boils down to comparing the corresponding embeddings. We will make use of these embeddings to train a scikit-learn model.
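For intuition, here is a minimal sketch of what step 4 boils down to. The embeddings and the 0.7 threshold below are placeholder assumptions for illustration, not values from the tutorial:

import numpy as np

emb_a = np.random.rand(512)  # placeholder embeddings; in practice these
emb_b = np.random.rand(512)  # come from the face-embedding model

# cosine similarity: close to 1 means the faces likely belong to the same person
cos_sim = np.dot(emb_a, emb_b) / (np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
same_person = cos_sim > 0.7  # illustrative threshold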
P.S. If you’d like to follow along, the code is available on Github.
Setup
Create a virtual environment (optional):
python3 -m venv face_search_env
Activate this environment:
source face_search_env/bin/activate
Necessary installations within this environment:
pip install mxnet==1.8.0.post0
pip install -U insightface==0.2.1
pip install onnx==1.10.1
pip install onnxruntime==1.8.1
More importantly, once you are done pip installing insightface:
– Download the antelope model release from onedrive. (It contains two pre-trained models for detection and recognition.)
– Put it under ~/.insightface/models/, so there are onnx models at ~/.insightface/models/antelope/*.onnx.
This is how the models directory should look if the setup was done correctly. If you look inside the antelope directory, you’ll find the two onnx models for face detection and recognition:
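For reference, the layout should be roughly as follows. The exact onnx file names may differ between antelope releases; the two below are the detection and recognition models shipped at the time of writing:

~/.insightface/models/antelope/
├── glintr100.onnx
└── scrfd_10g_bnkps.onnx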
Note: Since the latest release of insightface 0.4.1 last week, the installation was not as straightforward as I would have hoped (at least for me). Hence, I will be using 0.2.1 for this tutorial. In the future, I’ll update the code on Github accordingly. Please see the instructions here if you’re stuck.
Dataset
We will be working with the Yale Faces dataset available on Kaggle, containing 165 grayscale images of 15 individuals (i.e. 11 images per identity). The images cover a wide variety of expressions, poses, and illumination configurations.
Once you have the dataset, go ahead and unzip it inside a newly created data directory within your project (see the project directory structure on Github).
Let’s begin…
If you’d like to follow along, the Jupyter Notebook can be found on Github.
Imports
import os
import pickle
import numpy as np
from PIL import Image
from typing import List
from tqdm import tqdm
from insightface.app import FaceAnalysis
from sklearn.neighbors import NearestNeighbors
Loading Insightface Model
Once insightface is installed, we must call app = FaceAnalysis(name="model_name") to load the models.
Since we stored our onnx models inside the antelope directory:

app = FaceAnalysis(name="antelope")
app.prepare(ctx_id=0, det_size=(640, 640))
Generate Insightface embeddings
Generating an embedding for an image is quite straightforward with the insightface model. For instance:
# Generating embeddings for an image
img_emb_results = app.get(np.asarray(img))
img_emb = img_emb_results[0].embedding
img_emb.shape
------------OUTPUT---------------
(512,)
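In the snippet above, img is assumed to be a PIL image loaded beforehand. A minimal sketch, with a hypothetical file path:

# hypothetical path, for illustration only
img = Image.open("../data/sample_face.jpg").convert("RGB")
faces = app.get(np.asarray(img))  # runs detection + embedding
print(len(faces))                 # number of faces detected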
Data preprocessing
Prior to using this dataset, we must fix the extensions of the files in the directory such that the file names end with .gif (or .jpg, .png, etc.).
For instance, the following code snippet will change the filename subject01.glasses to subject01_glasses.gif.
# Fixing the file extensions
YALE_DIR = "../data/yalefaces"
files = sorted(os.listdir(YALE_DIR))[1:]  # sort so that README.txt reliably lands at index 0
for i, img in enumerate(files):
    # print("original name: ", img)
    new_ext_name = "_".join(img.split(".")) + ".gif"
    # print("new name: ", new_ext_name)
    os.rename(os.path.join(YALE_DIR, img), os.path.join(YALE_DIR, new_ext_name))
Next, we will split the data into the evaluation and probe sets: 90% or 10 images per subject will become part of the evaluation set and the remaining 10% or 1 image per subject will be used in the probe set.
To avoid sampling bias, the probe image for each subject is randomly chosen using a helper function called create_probe_eval_set(). It takes as input a list containing the (file names of the) 11 images belonging to a particular subject and returns two lists of lengths 1 and 10. The former contains the file name to be used for the probe set while the latter contains the file names for the evaluation set.
def create_probe_eval_set(files: List):
    # pick a random index between 0 and len(files)-1
    random_idx = np.random.randint(0, len(files))
    probe_img_fpaths = [files[random_idx]]
    eval_img_fpaths = [files[idx] for idx in range(len(files)) if idx != random_idx]
    return probe_img_fpaths, eval_img_fpaths
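For instance, a hypothetical call for one subject might look like this (the tags follow the Yale Faces naming scheme after our renaming step):

subject_files = [
    "subject01_centerlight.gif", "subject01_glasses.gif", "subject01_happy.gif",
    "subject01_leftlight.gif", "subject01_noglasses.gif", "subject01_normal.gif",
    "subject01_rightlight.gif", "subject01_sad.gif", "subject01_sleepy.gif",
    "subject01_surprised.gif", "subject01_wink.gif",
]
probe_fpaths, eval_fpaths = create_probe_eval_set(subject_files)
len(probe_fpaths), len(eval_fpaths)  # -> (1, 10)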
Generate embeddings
Both of the lists returned by create_probe_eval_set() are sequentially fed to a helper function called generate_embs(). For each file name in the list, it reads the grayscale image, converts it to RGB, calculates the corresponding embedding, and finally returns the embedding along with the image label (scraped from the file name).
def generate_embs(img_fpaths: List[str]):
    embs_set = list()
    embs_label = list()
    for img_fpath in img_fpaths:
        # read the grayscale image (pixel values are already in 0-255)
        img = Image.open(os.path.join(YALE_DIR, img_fpath))
        img_arr = np.asarray(img)
        # convert grayscale to RGB (the model expects a 3-channel image)
        im = Image.fromarray(img_arr.astype(np.uint8))
        rgb_arr = np.asarray(im.convert('RGB'))
        # generate the Insightface embedding
        res = app.get(rgb_arr)
        # append the embedding result to the set
        embs_set.append(res)
        # append the label (scraped from the file name)
        embs_label.append(img_fpath.split("_")[0])
    return embs_set, embs_label
Now that we have a framework for generating embeddings, let’s go ahead and create embeddings for both the probe and evaluation sets using generate_embs().
# sorting files so that all images belonging to one subject are adjacent
files = os.listdir(YALE_DIR)
files.sort()

eval_set = list()
eval_labels = list()
probe_set = list()
probe_labels = list()
IMAGES_PER_IDENTITY = 11

# start at index 1 to skip the README.txt file at files[0]
for i in tqdm(range(1, len(files), IMAGES_PER_IDENTITY)):
    probe_fpaths, eval_fpaths = create_probe_eval_set(files[i:i + IMAGES_PER_IDENTITY])

    # store eval embs and labels
    eval_set_t, eval_labels_t = generate_embs(eval_fpaths)
    eval_set.extend(eval_set_t)
    eval_labels.extend(eval_labels_t)

    # store probe embs and labels
    probe_set_t, probe_labels_t = generate_embs(probe_fpaths)
    probe_set.extend(probe_set_t)
    probe_labels.extend(probe_labels_t)
A few things to consider:
- The files returned by os.listdir() come in arbitrary order, hence the sort near the top of the snippet is important. Why do we need sorted file names? Remember that create_probe_eval_set() requires all 11 files belonging to a particular subject in any single iteration.

Output from os.listdir() without sorting (left) and with sorting (right)
- [Optional] We could have replaced the create_probe_eval_set() function, gotten rid of the for loop, and simplified a few lines in the above code snippet by using the stratified train_test_split functionality provided by sklearn. For this tutorial, however, I prioritized clarity over code simplicity.
Oftentimes, insightface is unable to detect a face and subsequently generates an empty embedding for it. That explains why some of the entries in the probe_set or eval_set list might be empty. It is important that we filter them out and keep only non-empty values.
To do so, we create another helper function called filter_empty_embs():
def filter_empty_embs(img_set: List, img_labels: List[str]):
    # keep only those indices for which insightface could generate an embedding
    good_idx = [i for i, x in enumerate(img_set) if x]
    if len(good_idx) == len(img_set):
        clean_embs = [e[0].embedding for e in img_set]
        clean_labels = img_labels
    else:
        # filter the set and labels based on the good indices
        clean_labels = np.array(img_labels)[good_idx]
        clean_set = np.array(img_set, dtype=object)[good_idx]
        # extract the embeddings for the good indices
        clean_embs = [e[0].embedding for e in clean_set]
    return clean_embs, clean_labels
It takes as input the image set (either probe_set or eval_set) and removes those elements for which insightface could not generate an embedding (see the good_idx computation). Following this, it also filters the labels (either probe_labels or eval_labels) using the same indices, such that the sets and labels have the same length.
Finally, we can obtain the 512-d embeddings for only the good indices in both the evaluation and probe sets:
evaluation_embs, evaluation_labels = filter_empty_embs(eval_set, eval_labels)
probe_embs, probe_labels = filter_empty_embs(probe_set, probe_labels)

assert len(evaluation_embs) == len(evaluation_labels)
assert len(probe_embs) == len(probe_labels)
With both sets at our disposal, we are now ready to build our face identification system using a popular unsupervised learning method implemented in the scikit-learn library.
Creating a face recognition system
We train the nearest neighbors model using .fit() with the evaluation embeddings as X. This is a neat technique for unsupervised nearest neighbors learning.
The nearest neighbour method allows us to find a predefined number of training samples closest in distance to a new point.
Note: The distance can, in general, be any metric measure such as Euclidean, Manhattan, Cosine, Minkowski, etc.
# Nearest neighbour learning method
nn = NearestNeighbors(n_neighbors=3, metric="cosine")
nn.fit(X=evaluation_embs)

# save the model to disk
filename = 'faceID_model.pkl'
with open(filename, 'wb') as file:
    pickle.dump(nn, file)

# some time later...
# load the model from disk
# with open(filename, 'rb') as file:
#     pickle_model = pickle.load(file)
Because we are implementing an unsupervised learning method, observe that we do not pass any labels, i.e. evaluation_labels, to the fit method. All we are doing here is mapping the face embeddings in the evaluation set into a latent space.
“Why?”, you ask.
Simple answer: By storing the training set in memory ahead of time, we are able to speed up the search for its nearest neighbors during inference time.
How does it do this?
Simple answer: Storing the training set in an optimized tree structure in memory is quite useful, especially when the training set is large and searching for a new point’s neighbors would otherwise become computationally expensive.
Neighbors-based methods are known as non-generalizing machine learning methods, since they simply “remember” all of its training data (possibly transformed into a fast indexing structure such as a Ball Tree or KD Tree). [Source]
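As an illustrative aside (not part of the original pipeline): tree-based indexes apply to metrics such as Euclidean; with metric="cosine", as used above, scikit-learn falls back to a brute-force search, since ball/KD trees do not support the cosine metric. A minimal sketch of an explicit ball-tree index:

# an explicit ball-tree index requires a tree-compatible metric like Euclidean
nn_tree = NearestNeighbors(n_neighbors=3, algorithm="ball_tree", metric="euclidean")
nn_tree.fit(evaluation_embs)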
Note: See this Stackoverflow discussion if you are still not convinced!
Inference
For each new probe image, we can find whether it is present in the evaluation set by searching for its top-k neighbors using the nn.kneighbors() method. For instance:
# Example inference on a test image
dists, inds = nn.kneighbors(X=probe_img_emb.reshape(1, -1),
                            n_neighbors=3,
                            return_distance=True)
If the labels at the returned indices (inds) in the evaluation set are a perfect match for the probe image’s original/true label, then we know we have found our face in the verification system.
We have wrapped the aforementioned logic into the print_ID_results() method. It takes as input the probe image path, the evaluation set labels, and a verbose flag to specify whether detailed results should be displayed.
def print_ID_results(img_fpath: str, evaluation_labels: np.ndarray, verbose: bool = False):
    img = Image.open(img_fpath)
    img_emb = app.get(np.asarray(img))[0].embedding

    # get the nearest neighbors from the fitted model
    dists, inds = nn.kneighbors(X=img_emb.reshape(1, -1), n_neighbors=3, return_distance=True)

    # get the labels of the neighbours
    pred_labels = [evaluation_labels[i] for i in inds[0]]

    # count the neighbors whose distance falls within the 0.6 threshold
    no_of_matching_faces = np.sum([1 if d <= 0.6 else 0 for d in dists[0]])
    if no_of_matching_faces > 0:
        print("Matching face(s) found in database!")
        verbose = True
    else:
        print("No matching face(s) found in database!")

    # print the labels and corresponding distances
    if verbose:
        for label, dist in zip(pred_labels, dists[0]):
            print(f"Nearest neighbour found in the database has label {label} and is at a distance of {dist}")
A few important things to note here:
- inds contains the indices of the nearest neighbors in the evaluation_labels set. For instance, inds = [[2, 0, 11]] means the label at index=2 in evaluation_labels is found to be nearest to the probe image, followed by the label at index=0.
- Since, for any image, nn.kneighbors will return a non-empty response, we must only consider a result a face-ID match if the returned distance is less than or equal to 0.6 (see the sketch after this list). (P.S. The choice of 0.6 is completely arbitrary.)
For example, continuing with the above example where inds = [[2, 0, 11]] and, let’s say, dists = [[0.4, 0.6, 0.9]], we will only consider the labels at index=2 and index=0 (in evaluation_labels) as a true face match because the dist for the last neighbor is too large for it to be a genuine match.
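A minimal sketch of that thresholding step, assuming pred_labels and dists as in the example above:

# keep only the neighbors whose distance falls within the 0.6 threshold
matches = [(label, d) for label, d in zip(pred_labels, dists[0]) if d <= 0.6]
# with dists = [[0.4, 0.6, 0.9]], only the first two neighbors survive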
As a quick sanity check, let’s see the system’s response when we input a baby’s face as a probe image. As expected, it reveals no matching faces found! However, we set verbose to True, because of which we get to see the labels and distances for its bogus nearest neighbors in the database, all of which appear to be quite large (>0.8).
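A hypothetical call (the image path below is an assumption, not a file from the repo):

# hypothetical path for illustration
print_ID_results("../data/test_images/baby.jpg", evaluation_labels, verbose=True)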
Evaluating the face recognition system
One of the ways to test whether this system is any good is to see how many relevant results are present in the top k neighbors. A relevant result is one where the true label matches the predicted label. This metric is generally referred to as precision at k (p@k), where k is predetermined: p@k = (number of top-k neighbors whose label matches the true label) / k.
For instance, pick an image (or rather, an embedding) from the probe set with the true label ‘subject01’. If the top two pred_labels returned by nn.kneighbors for this image are [‘subject01’, ‘subject01’], it means the precision at k (p@k) with k=2 is 100%. Similarly, if only one of the values in pred_labels was equal to ‘subject01’, p@k would be 50%, and so on…
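To make the definition concrete, here is a small illustrative helper (not part of the original notebook; it relies on the imports from earlier):

def precision_at_k(true_label: str, pred_labels: List[str]) -> float:
    # fraction of the top-k predictions whose label matches the true label
    return float(np.mean([label == true_label for label in pred_labels]))

precision_at_k("subject01", ["subject01", "subject01"])  # -> 1.0
precision_at_k("subject01", ["subject01", "subject05"])  # -> 0.5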
dists, inds = nn.kneighbors(X=probe_embs_example.reshape(1, -1),
                            n_neighbors=2,
                            return_distance=True)
pred_labels = [evaluation_labels[i] for i in inds[0]]
pred_labels
----- OUTPUT ------
['002', '002']
Let’s go ahead and calculate the average p@k value across the entire probe set:
# inference on the probe set
dists, inds = nn.kneighbors(X=probe_embs, n_neighbors=2, return_distance=True)

# calculate average p@k
p_at_k = np.zeros(len(probe_embs))
for i in range(len(probe_embs)):
    true_label = probe_labels[i]
    pred_neighbr_idx = inds[i]
    pred_labels = [evaluation_labels[idx] for idx in pred_neighbr_idx]
    pred_is_labels = [1 if label == true_label else 0 for label in pred_labels]
    p_at_k[i] = np.mean(pred_is_labels)

p_at_k.mean()

------ OUTPUT --------
0.9
Awesome! 90%! Not too shabby, but it could definitely be improved (but that’s for another time)…
Kudos to you for following this through! Hopefully, this warm introduction to face recognition, an active area of research in computer vision, was enough to get you started. As always, if there’s an easier way to do some of the things I mentioned in this article, please do let me know.
Until next time 🙂
This article was originally published on Towards Data Science and re-published to TOPBOTS with permission from the author.