VQA

Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding

https://arxiv.org/abs/1606.01847

 
