Intriguing properties of generative classifiers: generative models align with human visual perception for object recognition

Priyank Jaini

Google DeepMind (Toronto)

The NLP Reading Group is delighted to host Priyank Jaini, who will give an in-person talk at Mila on “Intriguing properties of generative classifiers: generative models align with human visual perception for object recognition”.

Priyank will be available to chat in person at Mila after his talk. The sign-up form can be found in the email announcing this talk.

Talk Description

How does the human visual system recognize objects: through discriminative inference (fast but potentially unreliable) or using a generative model of the world (slow but potentially more robust)? The question of how the brain combines the best of both worlds to achieve fast and robust inference has been termed “the deep mystery of vision” (Kriegeskorte, 2015). Yet most of today’s leading computational models of human vision are simply based on discriminative inference, such as convolutional neural networks or vision transformers trained on object recognition. In this talk, I will revisit the concept of vision as generative inference. This idea dates back to the notion of vision as unconscious inference proposed by Helmholtz (1867), who hypothesized that the brain uses a generative model of the world to infer probable causes of sensory input.

By turning text-to-image generative models into zero-shot classifiers, I will compare them against a broad range of discriminative classifiers and against human psychophysical object recognition data. I will discuss four emergent properties of generative classifiers: they show a record-breaking, human-like shape bias (99% for Imagen), near human-level accuracy on challenging distorted images, and state-of-the-art alignment with human classification errors. Last but not least, generative classifiers understand certain perceptual illusions, such as the famous bistable rabbit-duck illusion or Giuseppe Arcimboldo’s portrait of a man’s face composed entirely of vegetables, speaking to their ability to handle ambiguous input and to distinguish local from global information. Taken together, these results indicate that, while the current dominant paradigm for modeling human object recognition is discriminative inference, zero-shot generative models approximate human object recognition data remarkably well.
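For readers curious how a text-to-image model can act as a zero-shot classifier, the sketch below illustrates the general idea: score each candidate class by how well the generative model can denoise the input image when conditioned on that class's text prompt, and predict the class with the lowest expected loss. This is a minimal, illustrative outline, not the exact procedure from the talk; `denoising_loss_fn` is a hypothetical hook standing in for the model's conditional diffusion (noise-prediction) loss.

```python
import torch

def classify_with_generative_model(image, class_prompts, denoising_loss_fn, n_samples=16):
    """Zero-shot classification with a text-to-image generative model (sketch).

    For each candidate class, condition the generative model on a text prompt
    (e.g. "a photo of a duck") and measure how well it reconstructs the input
    image under that conditioning. `denoising_loss_fn(image, prompt)` is a
    hypothetical hook that should return a scalar diffusion loss for one random
    timestep/noise draw; the class whose prompt yields the lowest average loss
    is predicted.
    """
    scores = []
    for prompt in class_prompts:
        # Average the stochastic denoising loss over several noise/timestep draws.
        losses = torch.stack([denoising_loss_fn(image, prompt) for _ in range(n_samples)])
        scores.append(losses.mean())
    scores = torch.stack(scores)
    return int(torch.argmin(scores))  # index of the most likely class

# Hypothetical usage (placeholder names, not the talk's released code):
# pred = classify_with_generative_model(img, ["a photo of a duck", "a photo of a rabbit"], my_loss_fn)
```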

Speaker Bio

Priyank Jaini is a research scientist at Google DeepMind in Toronto. His research interests are in all things related to generative models and their applications.

Logistics

Date: September 13th
Time: 11:30 AM
Location: Auditorium 2 or via Zoom (See email)