Bag of Visual Words – Efficient window histogram computation (MEX)

Bag of Visual Words (also known as Bag-of-Words) [LINKS] is a well-known technique for describing visual content in Pattern Recognition and Computer Vision. The idea is to represent an image or an object as a histogram of visual word occurrences. Here, visual words are quantized local descriptors such as SIFT [WWW] or SURF [PDF]. Quantization of the extracted descriptors is usually done with the k-means [WWW] algorithm.

I ran into the problem of computing such histograms efficiently, not over the whole image but over multiple sub-regions of it. This requires quickly selecting the features enclosed by each region and then computing a histogram from them. Knowing what tools were available, and after some searching on the net, I decided to implement my own Matlab MEX [WWW] version.

The histograms are raw counts: no normalization is carried out on them.

You can download the function [HERE]. The code is released under the BSD license.

Sample data and an evaluation script showing the usage of mexWindFind2s() can be downloaded [HERE].

You will need to provide the following information:

  • Feature coordinates (x,y) and the visual word ID of each feature (words);
  • Window coordinates (x1,y1,x2,y2):
    each i-th entry corresponds to a box: [left,top,right,bottom];
  • Total number of visual words.
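To make the expected result concrete, here is a minimal brute-force reference in plain Python. It is not the MEX code, only a sketch of the computation described above. The function name `window_histograms`, the inclusive window borders, and the 1-based word IDs are all assumptions made for illustration.

```python
def window_histograms(x, y, words, windows, num_words):
    """Count visual words inside each window (brute force, no normalization).

    x, y      : feature coordinates
    words     : visual word ID per feature (assumed 1-based, as usual in Matlab)
    windows   : list of (left, top, right, bottom) boxes
    num_words : total number of visual words
    """
    hists = []
    for left, top, right, bottom in windows:
        hist = [0] * num_words
        for fx, fy, w in zip(x, y, words):
            # Assumption: a feature lying on the window border counts as inside.
            if left <= fx <= right and top <= fy <= bottom:
                hist[w - 1] += 1
        hists.append(hist)
    return hists


# Four features, three visual words, three windows.
x = [1, 2, 5, 8]
y = [1, 3, 5, 8]
words = [1, 2, 2, 3]
windows = [(0, 0, 4, 4), (4, 4, 9, 9), (0, 0, 9, 9)]
hists = window_histograms(x, y, words, windows, num_words=3)
```

A reference like this is slow, since every window visits every feature, but it is handy for checking the output of a faster implementation on small inputs. If relative frequencies are needed, each histogram can be divided by its sum afterwards.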

Feature extraction, such as SIFT, can be done using the VLFeat library [WWW]. Visual word computation can be done in two steps:

  1. Compute cluster centers on a set of descriptors using the very efficient k-means from the YAEL toolbox [WWW]

    [ centers, w ] = yael_kmeans( desc1, num_words );
    % wrap the centers in a one-level tree for vl_hikmeanspush
    tree = []; tree.K = num_words; tree.depth = 1; tree.centers = int32( centers );

  2. Compute visual words for a set of descriptors using a function from the VLFeat library

    words = vl_hikmeanspush( tree, desc2 );
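With a tree of depth 1, the second step amounts to assigning each descriptor to its nearest cluster center. The plain Python sketch below shows that idea only. It does not call VLFeat, the name `assign_words` is made up for illustration, and squared Euclidean distance with 1-based word IDs is an assumption.

```python
def assign_words(descriptors, centers):
    """Return, for each descriptor, the 1-based index of its nearest center."""
    words = []
    for d in descriptors:
        # Squared Euclidean distance from this descriptor to every center.
        dists = [sum((a - b) ** 2 for a, b in zip(d, c)) for c in centers]
        words.append(dists.index(min(dists)) + 1)
    return words


# Toy 2-D "descriptors"; real SIFT descriptors have 128 dimensions.
centers = [(0, 0), (10, 10), (0, 10)]
descriptors = [(1, 1), (9, 8), (2, 9)]
toy_words = assign_words(descriptors, centers)
```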
UPDATE: A new function implements a significantly sped-up version. No more brute-force search.
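The post does not say how the new version avoids brute force. One common way to do it, shown here purely as an assumption and not as the method used in the MEX file, is to sort the features by x once and use binary search to find the features that fall inside each window's horizontal range. The name `window_histograms_sorted`, the inclusive borders, and the 1-based word IDs are illustrative choices.

```python
from bisect import bisect_left, bisect_right


def window_histograms_sorted(x, y, words, windows, num_words):
    """Window histograms using a sort by x plus binary search per window."""
    # Sort feature indices by x once; every window reuses this ordering.
    order = sorted(range(len(x)), key=lambda i: x[i])
    xs = [x[i] for i in order]
    hists = []
    for left, top, right, bottom in windows:
        lo = bisect_left(xs, left)    # first feature with x >= left
        hi = bisect_right(xs, right)  # one past the last feature with x <= right
        hist = [0] * num_words
        # Only features inside the horizontal range still need a y check.
        for i in order[lo:hi]:
            if top <= y[i] <= bottom:
                hist[words[i] - 1] += 1
        hists.append(hist)
    return hists


# Same toy data as a brute-force check would use, given in unsorted order.
x = [8, 1, 5, 2]
y = [8, 1, 5, 3]
words = [3, 1, 2, 2]
windows = [(0, 0, 4, 4), (4, 4, 9, 9), (0, 0, 9, 9)]
fast_hists = window_histograms_sorted(x, y, words, windows, num_words=3)
```

The saving comes from narrow windows: each one inspects only the features in its x-range instead of all of them. For windows that span most of the image the cost is close to the brute-force one.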

About

Vladd... call me Vladd