Bag of Visual Words (also known as Bag-of-Words) [LINKS] is a well-known technique for describing visual content in Pattern Recognition and Computer Vision. The idea is to represent an image or an object as a histogram of visual word occurrences. Here, visual words are quantized local descriptors such as SIFT [WWW] or SURF [PDF]. The extracted descriptors are usually quantized with the k-means [WWW] algorithm.
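As a minimal sketch of the histogram idea, assuming the visual word IDs are 1-based integers and using made-up toy values (the variable names and data are illustrative only), the occurrences can be counted with accumarray:

```matlab
% Hypothetical toy data: visual word IDs of six local features (1-based).
words     = [1 3 3 2 3 1];
num_words = 4;               % vocabulary size (assumed for this example)

% Count how often each visual word occurs; words that never occur get 0.
h = accumarray(words(:), 1, [num_words 1])';

disp(h);                     % counts: 2 1 3 0
```

Passing the vocabulary size explicitly keeps the histogram length fixed even when the last words of the vocabulary do not occur in the image.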
I ran into the problem of efficiently computing such histograms not over the whole image but over multiple sub-regions of an image. This requires quickly selecting the features enclosed by each region and computing a histogram from them. Having looked at the tools available and searched the web, I decided to implement my own Matlab MEX [WWW] version.
No normalization is carried out on these histograms.
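If normalized histograms are needed, they can be computed afterwards. The following is only a sketch of L1 normalization on made-up counts; the layout (one row per window, one column per visual word) is an assumption for this example, not a statement about the output format of the MEX function:

```matlab
% Hypothetical raw counts: one row per window, one column per visual word.
H = [2 1 3 0;
     0 0 0 0;    % a window that contains no features
     1 1 0 2];

% L1-normalize each row; empty windows stay all-zero instead of becoming NaN.
row_sums = sum(H, 2);
row_sums(row_sums == 0) = 1;
Hn = bsxfun(@rdivide, H, row_sums);

disp(Hn);
```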
You can download the function [HERE]. The code is released under the BSD license.
Sample data and an evaluation script showing the usage of mexWindFind2s() can be downloaded [HERE].
You will need to provide the following information:
- Feature coordinates (x,y) as well as visual word ID (words);
- Window coordinates (x1,y1,x2,y2): the i-th entry corresponds to a box [left,top,right,bottom];
- Total number of visual words.
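To make the expected result concrete, here is a slow pure-Matlab reference sketch that computes the same kind of per-window histograms from these inputs. It is not the MEX implementation and does not call mexWindFind2s(); the toy values, the one-window-per-row layout, and the choice to count features lying on a window border as inside are all assumptions made for this example:

```matlab
% Hypothetical toy inputs, following the list above.
x     = [ 5 12 20 33 40];        % feature x coordinates
y     = [ 5 14 22 30 41];        % feature y coordinates
words = [ 1  2  2  3  1];        % visual word ID of each feature (1-based)
num_words = 3;                   % total number of visual words

% One window per row: [left, top, right, bottom] = [x1, y1, x2, y2].
windows = [ 0  0 25 25;
           10 10 45 45];

H = zeros(size(windows, 1), num_words);
for i = 1:size(windows, 1)
    w = windows(i, :);
    % Select the features enclosed by the window (borders count as inside).
    inside = x >= w(1) & x <= w(3) & y >= w(2) & y <= w(4);
    idx = double(words(inside));
    % Unnormalized histogram of the selected visual words.
    H(i, :) = accumarray(idx(:), 1, [num_words 1])';
end

disp(H);                         % rows: [1 2 0] and [1 2 1]
```

A loop like this scans every feature for every window, which is the cost a dedicated MEX function is meant to avoid, but it can serve as a check on small data.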
Feature extraction, such as SIFT, can be done using the VLFeat library [WWW]. Visual word computation can be done in two steps:
- Compute cluster centers on a set of descriptors using the very efficient k-means from the YAEL toolbox [WWW]:
    [ centers, w ] = yael_kmeans( desc1, num_words );
    tree = [];
    tree.K = num_words;
    tree.depth = 1;
    tree.centers = int32( centers );
- Compute visual words for a set of descriptors using a function from the VLFeat library:
    words = vl_hikmeanspush( tree, desc2 );
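For a vocabulary without hierarchy (depth 1), assigning a visual word amounts to finding the nearest cluster center. The brute-force sketch below illustrates that idea in plain Matlab on made-up 2-D data; it is a stand-in for checking results on small inputs, not how YAEL or VLFeat implement it, and all values are hypothetical:

```matlab
% Hypothetical toy data with one descriptor per column.
% Real SIFT descriptors have 128 dimensions; 2 are used here for readability.
centers = [0 10  0;
           0  0 10];             % 3 cluster centers (visual words)
desc    = [1  9  2  8;
           1  1  9  2];          % 4 descriptors to quantize

% Squared Euclidean distance between every center and every descriptor.
d2 = bsxfun(@plus, sum(centers.^2, 1)', sum(desc.^2, 1)) - 2 * (centers' * desc);

% The visual word of a descriptor is the index of its nearest center.
[~, words] = min(d2, [], 1);

disp(words);                     % 1 2 3 2
```

The distance matrix has one row per center and one column per descriptor, so memory grows with their product; for large vocabularies the library routines are the practical choice.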