NeuralTalk set up/train/test

NeuralTalk – Deep Visual-Semantic Alignments for Generating Image Descriptions (CVPR’15) – Stanford, Li Fei-Fei

NeuralTalk is a Python+numpy project for learning Multimodal Recurrent Neural Networks that describe images with sentences. It ranked 13th on the current leaderboard of the Microsoft COCO image captioning challenge. NeuralTalk uses VGG features as the input to its Multimodal Recurrent Neural Networks.
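At a high level, the model conditions a recurrent network on the CNN image feature and then emits words one at a time. The numpy sketch below illustrates a single decoding step; all dimensions, weight names, and the random weights are purely illustrative (NeuralTalk learns its actual parameters during training):

```python
import numpy as np

np.random.seed(0)

# Illustrative sizes: 4096-D VGG feature, 256-D hidden state, 1000-word vocab
D_img, D_hid, V = 4096, 256, 1000

# Hypothetical weights -- in NeuralTalk these are learned, not random
W_img = np.random.randn(D_img, D_hid) * 0.01   # image -> hidden projection
W_hh  = np.random.randn(D_hid, D_hid) * 0.01   # recurrent weights
W_out = np.random.randn(D_hid, V) * 0.01       # hidden -> vocabulary scores

feat = np.random.randn(D_img)      # stand-in for one 4096-D VGG feature
h = np.tanh(feat @ W_img)          # condition the hidden state on the image
h = np.tanh(h @ W_hh)              # one recurrent step
scores = h @ W_out                 # unnormalized word scores
probs = np.exp(scores - scores.max())
probs /= probs.sum()               # softmax over the vocabulary
next_word = int(np.argmax(probs))  # greedy choice of the next word index
```

This is only meant to show why the 4096-D VGG feature is the quantity we must compute before the Python side can run; the sections below extract it with MatConvNet.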

The following sections are organised as follows:

  • 1. Installation
  • 2. Testing with pre-trained models
  • 3. Training our own models

 

1. Installation

NeuralTalk: clone the project from GitHub (https://github.com/karpathy/neuraltalk) and make sure the Python dependencies (numpy, scipy) are installed.

MatConvNet: NeuralTalk uses 4096-D VGG features, so we will extract these features using MatConvNet, a MATLAB toolbox implementing Convolutional Neural Networks (CNNs) for computer vision applications.

  • Download: http://www.vlfeat.org/matconvnet/download/matconvnet-1.0-beta13.tar.gz
  • Compile the toolbox:
    > cd <MatConvNet>
    > addpath matlab
    > vl_compilenn
  • Download VGG models:
    >run matlab/vl_setupnn
    
    % download a pre-trained CNN from the web
    >urlwrite('http://www.vlfeat.org/sandbox-matconvnet/models/imagenet-vgg-verydeep-19.mat', 'imagenet-vgg-verydeep-19.mat') ;
    >net = load('imagenet-vgg-verydeep-19.mat') ;

 

2. Testing with pre-trained models

The code allows you to easily predict and visualize results of running the model on COCO/Flickr8K/Flickr30K images. Running the code on an arbitrary image is a little more complicated, because we first need to pipe the image through the VGG CNN to get its 4096-D activations. In this post I will show how to compute the 4096-D VGG features using MatConvNet; later I will add how to compute them using Caffe. Say we want to test 2 images saved in the example_images folder:

1. Modify tasks.txt
Clear tasks.txt and add:
tennis.jpg
skiing.jpg
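Equivalently, from a shell (run inside the example_images folder, which is where the predictor looks for tasks.txt):

```shell
# Overwrite tasks.txt with the two image filenames, one per line
printf 'tennis.jpg\nskiing.jpg\n' > tasks.txt
cat tasks.txt
```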

2. In MATLAB: Compute the VGG features using MatConvNet, then save them to a vgg_feats.mat file:

cd /home/kien/Documents/MATLAB/matconvnet-1.0-beta13

net = load('imagenet-vgg-verydeep-19.mat') ;

images = {'/home/kien/neuraltalk/neuraltalk/example_images/tennis.jpg', ...
          '/home/kien/neuraltalk/neuraltalk/example_images/skiing.jpg'} ;
feats = zeros(4096, numel(images)) ;

for i = 1:numel(images)
    % obtain and preprocess an image
    im = imread(images{i}) ;
    im_ = single(im) ; % note: 255 range
    im_ = imresize(im_, net.normalization.imageSize(1:2)) ;
    im_ = im_ - net.normalization.averageImage ;

    % run the CNN
    res = vl_simplenn(net, im_) ;

    % show the classification result
    scores = squeeze(res(end).x) ;
    [bestScore, best] = max(scores) ;
    figure(i) ; clf ; imagesc(im) ;
    title(sprintf('%s (%d), score %.3f', ...
        net.classes.description{best}, best, bestScore)) ;

    % res(42).x holds the 4096-D activation vector that NeuralTalk uses
    feats(:, i) = squeeze(res(42).x) ;
end

save('/home/kien/neuraltalk/neuraltalk/example_images/vgg_feats.mat', 'feats') ;
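Before running the predictor, you can sanity-check the saved file from Python: NeuralTalk expects a feats matrix with one 4096-D column per image listed in tasks.txt. The relative path below is an assumption matching the MATLAB script above, and the fallback branch only fabricates a placeholder file so the check runs anywhere:

```python
import os
import numpy as np
import scipy.io

path = 'example_images/vgg_feats.mat'  # path assumed from the MATLAB script above
if not os.path.exists(path):
    # Placeholder of the expected shape, so this check runs anywhere;
    # remove this branch once MATLAB has written the real features.
    os.makedirs('example_images', exist_ok=True)
    scipy.io.savemat(path, {'feats': np.zeros((4096, 2))})

feats = scipy.io.loadmat(path)['feats']
assert feats.shape[0] == 4096, 'each column should be one 4096-D VGG feature'
print(feats.shape)
```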

3. In TERMINAL:
Go to the neuraltalk folder and run: python predict_on_images.py lstm_model.p -r example_images/

Open result.html in the example_images folder to see the results.

 

3. Training our own models

If we want to train the model, download the data into the data/ folder from:

Run the training: $ python driver.py

Relax and wait…

I will dig into the driver.py file in a later post.