[Submitted on 29 Nov 2018 (v1), last revised 9 Nov 2022 (this version, v3)]

Title: ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness

Abstract: Convolutional Neural Networks (CNNs) are commonly thought to recognise objects by learning increasingly complex representations of object shapes. Some recent studies suggest a more important role of image textures. We here put these conflicting hypotheses to a quantitative test by evaluating CNNs and human observers on images with a texture-shape cue conflict. We show that ImageNet-trained CNNs are strongly biased towards recognising textures rather than shapes, which is in stark contrast to human behavioural evidence and reveals fundamentally different classification strategies. We then demonstrate that the same standard architecture (ResNet-50) that learns a texture-based representation on ImageNet is able to learn a shape-based representation instead when trained on "Stylized-ImageNet", a stylized version of ImageNet. This provides a much better fit for human behavioural performance in our well-controlled psychophysical lab setting (nine experiments totalling 48,560 psychophysical trials across 97 observers) and comes with a number of unexpected emergent benefits such as improved object detection performance and previously unseen robustness towards a wide range of image distortions, highlighting advantages of a shape-based representation.

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC); Machine Learning (stat.ML)

Cite as: arXiv:1811.12231 [cs.CV] (or arXiv:1811.12231v3 [cs.CV] for this version)

Submission history

From: Robert Geirhos
[v1] Thu, 29 Nov 2018 15:04:05 UTC (7,714 KB)
[v2] Mon, 14 Jan 2019 13:59:09 UTC (6,282 KB)
[v3] Wed, 9 Nov 2022 23:15:15 UTC (4,790 KB)
View a PDF of the paper titled ImageNet-trained CNNs are biased towards texture; increasing shape bias improves accuracy and robustness, by Robert Geirhos and 5 other authors
    Robert Geirhos
    Patricia Rubisch
    Claudio Michaelis
    Matthias Bethge
    Felix A. Wichmann