Meta Platforms
When content creators don't provide alt text for their images, Meta's AI-powered Automatic Alt Text (AAT) generates machine descriptions so that blind and low-vision Facebook and Instagram users get at least some image context through their screen readers, a stopgap covering the billions of photos uploaded without labels.
ENABLE Model location
Stopgap
What it is
Meta Platforms deploys Automatic Alt Text (AAT) technology across Facebook and Instagram. AAT uses object recognition technology based on a neural network with billions of parameters, trained on millions of examples, to generate text-based descriptions of images when the person who uploaded or posted the image does not include their own alt text. Screen reader users who focus on an image will hear either the description entered by the poster or "a list of objects, concepts or locations the image may contain as recognized by Facebook's AAT technology."1
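The fallback behavior described above, announce the poster's own alt text when it exists and otherwise announce the machine-generated concept list, can be sketched roughly as follows. The `ImagePost` structure, `generate_concepts` stand-in, and `screen_reader_text` function are hypothetical names for illustration, not Meta's actual internals.

```python
# Minimal sketch of the alt-text fallback described above.
# All names here (ImagePost, generate_concepts, ...) are illustrative,
# not Meta's actual internals.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImagePost:
    poster_alt_text: Optional[str]  # alt text written by the person posting, if any
    image_bytes: bytes              # raw image data passed to the recognition model


def generate_concepts(image_bytes: bytes) -> List[str]:
    """Stand-in for the object-recognition model; returns recognized concepts."""
    # A real system would run a trained vision model here.
    return ["3 people", "smiling", "outdoors"]


def screen_reader_text(post: ImagePost) -> str:
    # Prefer the human-written description when the poster provided one.
    if post.poster_alt_text:
        return post.poster_alt_text
    # Otherwise fall back to the machine-generated concept list.
    return "Image may contain: " + ", ".join(generate_concepts(post.image_bytes))


print(screen_reader_text(ImagePost(poster_alt_text=None, image_bytes=b"...")))
# Image may contain: 3 people, smiling, outdoors
```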
AAT was first launched on Facebook in April 2016 for iOS screen readers in English, developed by a team including research scientist Shaomei Wu and engineering manager Hermes Pique under the direction of Jeffrey Wieland, then Director of Accessibility. The system was trained to recognize over 100 concepts -- people, objects, scenes, and activities -- producing descriptions like "Image may contain three people, smiling, outdoors."2 It was significantly upgraded in 2021 (AAT v2), expanding the range of recognizable objects and improving description quality.
On mobile (Android, iPhone, iPad), users can also request a "detailed image description" that includes: the poster's own alt text (if any), Facebook's generated description, any text visible in the image (read top-left to bottom-right), position information for objects, size information used to determine the image's focus, and elements sorted by category (people, plants, objects, etc.).1
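As a rough illustration of how those listed components might be combined into one announcement, here is a hedged sketch. The parameter names, ordering, and phrasing are assumptions made for illustration and do not reflect Meta's implementation; only the list of components comes from the documentation cited above.

```python
# Hypothetical sketch of assembling the "detailed image description" from the
# components listed above. Names, ordering, and phrasing are assumptions.
from typing import Dict, List, Optional


def build_detailed_description(
    poster_alt_text: Optional[str],      # the poster's own alt text, if any
    generated_description: str,          # the AAT-generated description
    text_in_image: List[str],            # text found in the image, top-left to bottom-right
    positions: Dict[str, str],           # object -> position, e.g. "cake" -> "center"
    focus: Optional[str],                # largest element, treated as the image's focus
    by_category: Dict[str, List[str]],   # category -> elements, e.g. "People" -> [...]
) -> str:
    parts: List[str] = []
    if poster_alt_text:
        parts.append(f"Poster's description: {poster_alt_text}")
    parts.append(f"Generated description: {generated_description}")
    if text_in_image:
        parts.append("Text in image: " + " ".join(text_in_image))
    if focus:
        parts.append(f"Main focus: {focus}")
    for obj, position in positions.items():
        parts.append(f"{obj} appears in the {position} of the image")
    for category, items in by_category.items():
        parts.append(f"{category}: {', '.join(items)}")
    return ". ".join(parts) + "."
```

The ordering in this sketch puts the poster's own alt text first, reflecting the document's point that a human-written description should take priority over machine output.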
Meta also offers a Disability Answer Desk for accessibility feedback and supports users through the Be My Eyes app for live visual assistance.1
Why it matters
Facebook and Instagram host billions of photos. The vast majority are uploaded without alt text, making them invisible to screen reader users. AAT is a stopgap: it provides something when the content creator provides nothing. Meta itself acknowledges this: "The automatic alt text may not always be complete."1
This is a textbook stopgap because the ideal solution -- content creators writing their own accurate, contextual alt text -- is the responsibility of the user who posts the image, not the platform. But because most users don't write alt text, Meta intercepts at the platform level to generate machine-produced descriptions. The result is better than silence but worse than a human-written description that captures the emotional, social, or narrative context of the image. A birthday photo might be described as "may contain: 3 people, cake, indoor" rather than "Grandma blowing out candles at her 80th birthday party."
Real-world example
Facebook's detailed image description feature illustrates both the capability and the limitation of AAT. When a screen reader user focuses on a photo and opens the action menu, they can select "Generate detailed image description" to receive structured information: object identification, spatial positioning, text extraction, and size-based focus detection.1 This multi-layered approach gives blind users significantly more information than the original 2016 AAT, which only listed recognized objects.
The team behind AAT described their motivation in terms of inclusion: "We want to build technology that helps the blind community experience Facebook the same way others enjoy it."2 Yet the feature remains opt-in (users must actively request the detailed description) and machine-generated (it cannot capture social context, humor, irony, or relationships between people). The detailed description for a meme might list "text, person, animal" without explaining the joke. This gap is why AAT is classified as a stopgap rather than a solution: it mitigates the harm of missing alt text without addressing the root cause.
What care sounds like
- "Our platform generates automatic descriptions for every image uploaded without alt text."
- "We prompt content creators to add their own alt text before posting -- and make it easy to do."
- "We continuously improve our computer vision models to recognize more objects, scenes, and contexts."
What neglect sounds like
- "Alt text is the user's responsibility -- if they don't add it, the image is just invisible."
- "We don't have the resources to build an auto-description system."
- "Screen reader users are a small percentage of our user base."
What compensation sounds like
- "I ask sighted friends to describe photos in my feed because the auto-generated text is too vague."
- "The alt text said 'may contain: 2 people, outdoor' -- I still don't know who's in the photo or what's happening."
- "I use Be My Eyes to have a volunteer describe images that the AI can't."