Multimodal Ferramenta dataset

The Ferramenta dataset consists of 88,010 images, split into 66,141 images for training and 21,869 images for testing, belonging to 52 classes (paint brush, hinge, tape, safe, cart, etc.). The text descriptions contain 22,045 distinct words in the training set and 20,083 in the test set. Both sets were randomly selected. The dataset was collected from different sellers listed on the price comparison website Trovaprezzi. The ground truth was created using query-based software that clusters commercial offers with a text matching system. After each query, three co-located human annotators analyzed the intra-class image similarity, using the text to resolve ambiguities. We are aware that this version of the Ferramenta dataset contains a few false-positive offers. The following figure shows some images from the Ferramenta dataset, one for each class.
Figure: Ferramenta dataset examples
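As a quick sanity check on the figures quoted in the description, the short Python sketch below recomputes the total and the train/test proportions. The constant and function names are illustrative assumptions and are not part of any official loader for the dataset. The per-class figure is only an arithmetic mean; the description does not state how images are distributed across classes.

```python
# Figures taken from the dataset description; all names here are hypothetical.
TRAIN_IMAGES = 66_141
TEST_IMAGES = 21_869
TOTAL_IMAGES = 88_010
NUM_CLASSES = 52


def split_summary(train: int, test: int) -> dict:
    """Return the total size and the train/test fractions of a split."""
    total = train + test
    return {
        "total": total,
        "train_fraction": train / total,
        "test_fraction": test / total,
    }


summary = split_summary(TRAIN_IMAGES, TEST_IMAGES)

# The two subsets should add up to the stated dataset size.
assert summary["total"] == TOTAL_IMAGES

print(f"train: {summary['train_fraction']:.1%}, test: {summary['test_fraction']:.1%}")
# Mean only; the actual class sizes may be unbalanced.
print(f"mean images per class: {TOTAL_IMAGES / NUM_CLASSES:.1f}")
```

The split works out to roughly three quarters of the images for training and one quarter for testing.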

Please cite the paper "Multimodal Classification Fusion in Real-World Scenarios" if you use this dataset.
Authors: Ignazio Gallo, Alessandro Calefati and Shah Nawaz

Applied Recognition Technology Laboratory

Department of Theoretical and Applied Science
