Sketch Me That Shoe



We investigate the problem of fine-grained sketch-based image retrieval (SBIR), where free-hand human sketches are used as queries to perform instance-level retrieval of images. This is an extremely challenging task because (i) visual comparisons not only need to be fine-grained but also executed cross-domain, (ii) free-hand (finger) sketches are highly abstract, making fine-grained matching harder, and most importantly (iii) annotated cross-domain sketch-photo datasets required for training are scarce, challenging many state-of-the-art machine learning techniques. In this paper, for the first time, we address all these challenges, providing a step towards the capabilities that would underpin a commercial sketch-based image retrieval application. We introduce a new database of 1,432 sketch-photo pairs from two categories with 32,000 fine-grained triplet ranking annotations. We then develop a deep triplet ranking model for instance-level SBIR with a novel data augmentation and staged pre-training strategy to alleviate the issue of insufficient fine-grained training data. Extensive experiments are carried out to contribute a variety of insights into the challenges of data sufficiency and overfitting avoidance when training deep networks for fine-grained cross-domain ranking tasks.
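To make the idea of triplet ranking concrete, the following is a minimal sketch of a margin-based triplet loss for a (sketch, positive photo, negative photo) triplet. It assumes a hinge formulation over squared Euclidean distances between embeddings; the function names, the margin value, and the toy embedding vectors are illustrative assumptions and are not taken from the released code.

```python
from typing import Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_ranking_loss(
    sketch: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 0.3,  # hypothetical value, chosen only for illustration
) -> float:
    """Hinge loss that is zero once the positive photo is closer to the
    sketch than the negative photo by at least `margin`."""
    d_pos = squared_euclidean(sketch, positive)
    d_neg = squared_euclidean(sketch, negative)
    return max(0.0, margin + d_pos - d_neg)


# Toy embeddings (assumed): the positive photo matches the sketch exactly.
well_ranked = triplet_ranking_loss([0.0, 0.0], [0.0, 0.0], [1.0, 0.0])
# Here the "negative" is closer than the "positive", so the loss is non-zero.
badly_ranked = triplet_ranking_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.0], margin=0.5)
```

In a deep model the embeddings would come from learned sketch and photo branches, and the loss would be averaged over many annotated triplets; this sketch shows only the per-triplet ranking objective.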


New Dataset! (ShoeV2: 2000 photos + 6648 sketches)




  			title={Sketch Me That Shoe},
  			author={Yu, Qian and Liu, Feng and Song, Yi-Zhe and Xiang, Tao and Hospedales, Timothy and Loy, Chen Change},
  			booktitle={Computer Vision and Pattern Recognition},

Results updated: on the Shoes dataset, acc.@1 is 52.17%; on the Chairs dataset, acc.@1 is 72.16%. Further details are given in Extra comment 1.
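For readers unfamiliar with the metric, here is a minimal sketch of how acc.@K could be computed, assuming it denotes the fraction of query sketches whose true-match photo appears among the K nearest photos. The function name and the toy distance matrix are hypothetical and only illustrate the calculation; they are not the evaluation script behind the figures above.

```python
from typing import Sequence


def acc_at_k(
    distances: Sequence[Sequence[float]],
    true_indices: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of queries whose true-match photo is within the top k.

    distances[i][j] is the distance from query sketch i to gallery photo j;
    true_indices[i] is the gallery index of the photo that sketch i depicts.
    """
    if len(distances) == 0 or len(distances) != len(true_indices):
        raise ValueError("need one true index per non-empty list of queries")
    hits = 0
    for row, true_idx in zip(distances, true_indices):
        # Gallery indices sorted from nearest to farthest photo.
        ranked = sorted(range(len(row)), key=lambda j: row[j])
        if true_idx in ranked[:k]:
            hits += 1
    return hits / len(distances)


# Toy example (assumed data): 3 sketches against a gallery of 3 photos.
toy_distances = [
    [0.1, 0.9, 0.5],  # true match (photo 0) is ranked first
    [0.8, 0.2, 0.3],  # true match (photo 1) is ranked first
    [0.4, 0.3, 0.9],  # true match (photo 2) is ranked last
]
toy_truth = [0, 1, 2]
top1 = acc_at_k(toy_distances, toy_truth, k=1)
top3 = acc_at_k(toy_distances, toy_truth, k=3)
```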


Try our demo!