Wednesday, 1 July 2015

Getting started with OpenCV and Hand Gesture Recognition

For the past few days I have been working on OpenCV and trying to build a hand gesture recognition system.
Even though it is far from complete and needs many more improvements, I wanted to write about the progress that I have made.
So, I have detailed the steps that I followed and the links that helped me.

Setting up the environment:

  1. Install and configure OpenCV: I used the first method and installed it using the pre-built libraries in C:\
  2. Install Visual Studio 2013.
  3. Open Visual Studio and start a new project. Choose Win32 Console Application and name it appropriately. (Yes, even for 64-bit applications. Apparently the 32 in Win32 does not refer to its bit-ness. For more details refer to this.)
  4. In the create-project wizard, select Empty Project in the second window of the wizard and keep everything else as it is.
  5. Right-click the project name and open Properties -> Configuration Manager -> Platform -> New -> x64 -> OK -> Close (this builds the application as a 64-bit program).
  6. Including the OpenCV headers for C/C++:
    C/C++ -> General -> Additional Include Directories -> Edit -> double-click the empty area -> click .. -> browse to C:\opencv\build\include
    similarly include:
    Click OK
  7. Including Linker Libraries:
    Linker->General->Additional Library Directories->edit->browse and select C:\opencv\build\x64\lib
    Linker -> Input -> Additional Dependencies -> Edit -> copy in the names of the .lib files that end in d (the debug builds). In my case it was:
    Click OK
  8. Add the OpenCV bin directory to the system PATH environment variable: C:\opencv\build\x64\vc12\bin
  9. Right-click Source Files under the project name -> Add -> New Item -> C++ File
Most of the steps I followed are the same as in this excellent YouTube video, so if you do not understand any of the above steps, you can refer to the linked video.

Where to start?

To get started with OpenCV I followed some online tutorials to understand how the library works, what the common functions are, and how to read images, videos and the webcam (which is also just a video, by the way).
These are some of the links that helped me with it:
The official OpenCV tutorial seems pretty good, and whenever I want to see how a function is used I look at the sample code from this tutorial. Unfortunately I found out about it pretty late, so I couldn't work through it from start to finish, but it is definitely on my to-do list.

For the actual hand gesture recognition, we first need to segment our target, i.e. separate the hand from the rest of the frame. This is called image segmentation. The three common techniques of image segmentation are:

  1. Static Segmentation using RGB or HSV values 
  2. Edge Detection
  3. Background Subtraction
Static segmentation seems to give good results, like in this tutorial, but it requires manually segmenting your hand using RGB/HSV values, which I don't think will work in the general case with different skin colors, different background colors and different lighting conditions. So, this method is out of the question.
Edge detection somehow doesn't seem right either, and I don't really know how I would separate the edges of my hand from the rest of the edges. Background subtraction seems much more intuitive and feasible right now, so I'll go with background subtraction. (If background subtraction doesn't work out, maybe I'll explore edge detection a little more.)

Alright, so background subtraction it is!

Background Subtraction

As the name suggests, in this technique we try to subtract the background of the image to get the segment of the image that we are interested in.
How do we decide what is the background and what is the target segment?
Well, the object in the frame that is moving is generally the target and the rest is the background.
The most basic way of doing this seems to be to just subtract two consecutive frames.
I got this idea of subtracting two frames from here. If you want to know more about it I would recommend watching the video; it is pretty good and it also helped me with the code.
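To make the frame-subtraction idea concrete, here is a minimal sketch of what it boils down to per pixel. It is not the code from the video or from my project: it deliberately avoids OpenCV so it compiles on its own, and the `Frame` type, the `motionMask` name and the threshold value are all made up for illustration. In OpenCV itself, the roughly equivalent calls would be cv::absdiff followed by cv::threshold.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <vector>

// Hypothetical stand-in for a grayscale frame: one byte per pixel, row-major.
using Frame = std::vector<std::uint8_t>;

// Per-pixel absolute difference of two equally sized frames, followed by a
// binary threshold: pixels that changed by more than `thresh` become 255
// (treated as "moving"), everything else becomes 0 (treated as background).
Frame motionMask(const Frame& prev, const Frame& curr, int thresh) {
    assert(prev.size() == curr.size());
    Frame mask(prev.size(), 0);
    for (std::size_t i = 0; i < prev.size(); ++i) {
        int diff = std::abs(static_cast<int>(curr[i]) - static_cast<int>(prev[i]));
        mask[i] = diff > thresh ? 255 : 0;
    }
    return mask;
}
```

Calling `motionMask(previousFrame, currentFrame, 25)` on two consecutive grayscale frames gives a black-and-white mask of whatever moved between them. The threshold decides how much brightness change counts as movement, so it is the first thing to tune when the mask is too noisy or too sparse.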
There are also other methods like cv::BackgroundSubtractorMOG2 and cv::BackgroundSubtractorKNN, but they don't seem to give good results: there is too much noise in the output. Maybe I'm missing something and need to tweak some parameters.
So, I might give these other methods a try later on.
The code for these methods is well explained here.

So, the steps I followed were:
  1. Read the video from webcam.
  2. Capture two frames, convert them to grayscale, flip them, and subtract one from the other.
  3. Filtering the image:
        Convert the image to binary using the threshold function.
        Blur the image to filter out some noise using GaussianBlur. (The bilateral filter didn't seem to work because it requires different image types. Maybe it would work if I converted the input frames to the required format, but I have left it for now.)
        Use erode/dilate as required. I used the morphologyEx library function, which combines erode and dilate, to cancel out some noise and make the contour of my hand better connected.
  4. Find the contour and convex hull of the image and display.
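The erode/dilate part of step 3 is easier to picture with a small example. The sketch below is a standalone illustration of a morphological "opening" (erode, then dilate) on a binary mask with a 3x3 square kernel. It is not my project code and not OpenCV's implementation; the `Mask` struct and the function names are hypothetical, and the kernel size is an assumption. In OpenCV the same kind of operation is available through cv::morphologyEx with cv::MORPH_OPEN.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal binary image: row-major pixels holding 0 or 255.
struct Mask {
    int rows;
    int cols;
    std::vector<std::uint8_t> px;
};

// 3x3 minimum (erode) or maximum (dilate) filter.
// Neighbours that fall outside the image are simply skipped.
Mask filter3x3(const Mask& in, bool takeMin) {
    assert(in.px.size() == static_cast<std::size_t>(in.rows) * in.cols);
    Mask out = in;
    for (int r = 0; r < in.rows; ++r) {
        for (int c = 0; c < in.cols; ++c) {
            std::uint8_t v = takeMin ? 255 : 0;
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    int rr = r + dr, cc = c + dc;
                    if (rr < 0 || rr >= in.rows || cc < 0 || cc >= in.cols) continue;
                    std::uint8_t p = in.px[rr * in.cols + cc];
                    v = takeMin ? std::min(v, p) : std::max(v, p);
                }
            }
            out.px[r * in.cols + c] = v;
        }
    }
    return out;
}

// Erode shrinks white regions, so specks smaller than the kernel vanish.
Mask erode3x3(const Mask& in) { return filter3x3(in, true); }

// Dilate grows white regions back by one pixel in every direction.
Mask dilate3x3(const Mask& in) { return filter3x3(in, false); }

// Opening = erode then dilate: removes small specks, keeps larger blobs.
Mask open3x3(const Mask& in) { return dilate3x3(erode3x3(in)); }
```

Doing it in the other order (dilate, then erode) is a "closing", which fills small holes instead of removing small specks. That is probably the more useful one for making a broken hand contour better connected.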
So, this was the final result:

Scope for improvement:
  1. The hand contour is not exact and there is some extra noise. I should try tweaking the filter parameters and try one of the denoising functions from OpenCV to remove the noise.
  2. The convex hull seems pretty good in this picture, but it is still not exact and not consistent. I think the convex hull and convexity defects both depend on how good my contour is, so improving the contour using filtering and other techniques should automatically improve these.
  3. There is a slight flicker in the output of this frame-subtraction method that is not present when I use other background subtraction methods like MOG2 or KNN.
    So, that needs to be resolved.
Once these problems are resolved, I can work on recognizing the hand gesture, maybe by counting the number of convexity defects or by some other method.
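As a rough idea of what counting convexity defects could look like, here is a standalone sketch. It is not something I have implemented or tested on real hand contours. It computes a convex hull with Andrew's monotone chain and then counts the gaps between neighbouring hull vertices where the contour dips in deeper than some minimum depth. The `Pt` type, the function names and the `minDepth` parameter are assumptions for illustration, and the contour is assumed to be a simple (non self-intersecting) closed polygon with distinct points. OpenCV provides cv::convexHull and cv::convexityDefects for the real thing.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };  // hypothetical point type

// Cross product of (b - a) and (c - a); its magnitude is twice the area of
// the triangle a, b, c, and its sign gives the turn direction.
double cross(const Pt& a, const Pt& b, const Pt& c) {
    return (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
}

// Andrew's monotone chain: returns indices (into pts) of the hull vertices.
std::vector<std::size_t> hullIndices(const std::vector<Pt>& pts) {
    std::vector<std::size_t> order(pts.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = i;
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return pts[a].x < pts[b].x || (pts[a].x == pts[b].x && pts[a].y < pts[b].y);
    });
    std::vector<std::size_t> hull;
    for (int pass = 0; pass < 2; ++pass) {  // pass 0: lower hull, pass 1: upper hull
        std::size_t start = hull.size();
        for (std::size_t idx : order) {
            while (hull.size() >= start + 2 &&
                   cross(pts[hull[hull.size() - 2]], pts[hull.back()], pts[idx]) <= 0)
                hull.pop_back();
            hull.push_back(idx);
        }
        hull.pop_back();  // the last point of this pass starts the next one
        std::reverse(order.begin(), order.end());
    }
    return hull;
}

// For each pair of neighbouring hull vertices, find the contour point between
// them that lies farthest from the hull edge; count it if deeper than minDepth.
int countDefects(const std::vector<Pt>& contour, double minDepth) {
    std::vector<std::size_t> hull = hullIndices(contour);
    if (hull.size() < 3) return 0;
    std::sort(hull.begin(), hull.end());  // visit hull vertices in contour order
    int defects = 0;
    for (std::size_t h = 0; h < hull.size(); ++h) {
        std::size_t i = hull[h], j = hull[(h + 1) % hull.size()];
        const Pt& a = contour[i];
        const Pt& b = contour[j];
        double len = std::hypot(b.x - a.x, b.y - a.y);
        double deepest = 0.0;
        for (std::size_t k = (i + 1) % contour.size(); k != j; k = (k + 1) % contour.size())
            deepest = std::max(deepest, std::abs(cross(a, b, contour[k])) / len);
        if (deepest > minDepth) ++defects;
    }
    return defects;
}
```

The hope would be that the gaps between stretched-out fingers show up as deep defects while small wiggles in the contour stay below `minDepth`, so the defect count hints at how many fingers are raised. How well that works depends entirely on how clean the contour is.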


