14 min read

How I Automate Image Alt Text Using Simple AI Scripts (And Never Do It Manually Again)

How I Automate Image Alt Text Using Simple AI Scripts (And Never Do It Manually Again)

Learn my exact workflow to automate image alt text using AI. I share Python scripts, prompt engineering tips, and the two mistakes most SEOs make.

Table of Contents

(Note: If you are reading this on desktop, you will see this Table of Contents neatly pinned to the left of the article content for easy navigation while we dive deep into the code.)
  • The Breaking Point: Why I Built This
  • The Two Massive Mistakes People Make with Alt Text Automation
  • The AI Architecture I Use
  • The Python Script: Step-by-Step Breakdown
  • Prompt Engineering for Perfect Descriptions
  • Deploying at Scale Without Crashing Your CMS
If you want to automate image alt text at scale, you have to stop treating it like an afterthought. When I first started building programmatic SEO sites for ProgSEO, I hit a massive bottleneck. I was generating thousands of pages dynamically, but every single image needed an accessibility-compliant, keyword-rich description. I tried delegating it. I tried ignoring it. I even tried those terrible WordPress plugins that simply convert your image filename into a string of text. Nothing worked. It was a miserable, soul-crushing grind. I've processed over 150,000 images across dozens of enterprise builds since then, and I haven't written a single alt tag by hand in two years. Alt text is arguably the most neglected on-page SEO factor simply because it is incredibly boring to write. But when you automate it properly, it becomes a massive competitive advantage. In this guide, I am going to walk you through exactly how I use simple Python scripts and the OpenAI Vision API to write perfect, context-aware alt text.

The Two Massive Mistakes People Make with Alt Text Automation

Before I show you the code, we need to talk about why most automated alt text pipelines fail. I have audited countless sites where developers proudly showed off their 'AI alt text' solution, only for me to find out it was actively hurting their search rankings. Let's cover the two critical mistakes I see almost every week.

Mistake #1: Relying on Filename Fallbacks
People think that stripping the hyphens out of an image filename constitutes a valid alt tag. You upload `blue-running-shoes-nike.jpg` and a basic script spits out `alt="blue running shoes nike"`. That is a disaster for accessibility and a missed opportunity for semantic SEO. Screen readers sound completely broken when reading these out, and Google's image processing algorithms easily detect this lazy manipulation. Filename-based alt text plugins are honestly worse than having no alt text at all. If you leave it blank, at least the screen reader skips it. If you feed it garbage, you ruin the user experience.

Mistake #2: Zero Context Vision Processing
When developers first discover vision models, they usually write a script that sends the raw image to the API and asks, 'What is this?' The AI responds with a literal description. If you send it a screenshot of an analytics dashboard, the AI says `alt="A computer screen showing a line graph with an upward trend"`. This is functionally useless. What is the graph about? What is the trend? Who is the audience? The AI doesn't know because you didn't give it the surrounding text. Context is everything. Sending an image to an AI without the surrounding H2 and paragraph is the fastest way to generate generic, useless descriptions. Always pass the DOM context.

Writing alt text manually is a punishment for SEOs who refuse to learn basic Python. Context-aware AI generation is the only way forward.

The AI Architecture I Use

My setup is surprisingly lightweight. I don't use heavy frameworks or complex distributed systems. I rely on a simple Python script, a headless crawler, and the OpenAI GPT-4o API. Let's break down the stack.

First, I use a script to map out every image on the site missing an alt attribute. I prefer crawling the live DOM rather than querying the database directly because it allows me to grab the adjacent text (usually the preceding paragraph and the nearest heading). Once I have the image URL and the text context, I convert the image to a base64 string. I send this payload to the Vision API with a very specific system prompt. Finally, the script takes the JSON response from OpenAI and pushes the update back to the CMS via a REST API endpoint.

I have tested several open-source vision models like LLaVA, and while they are free, they are currently too slow and hallucinate too often for reliable, automated publishing. Sticking to OpenAI for this specific task is worth the fraction of a cent per image.

Headless Crawler (Playwright)

Scrapes the live site to locate images missing alt tags and extracts the surrounding HTML context.

Vision API (GPT-4o)

Analyzes the base64 encoded image alongside the textual context to generate a highly accurate description.

CMS Updater (REST API)

Pushes the finalized alt text directly to the media library without requiring database SQL injections.

The Python Script: Step-by-Step Breakdown

Let's get into the mechanics. The first step in my Python pipeline is fetching the image and encoding it. The Vision API cannot simply 'look' at an image URL if it sits behind a firewall or a staging environment, so converting it to a base64 string locally is the safest bet.

Here is how I think about the payload. I construct a message array where the first item is the system prompt. The second item is the user prompt containing the surrounding article text. The third item is the base64 image data. I wrap all of this in a strict JSON schema requirement. Structuring your AI prompt strictly as a JSON return is the absolute only way to prevent hallucinations and broken formatting. If you ask for plain text, the AI will inevitably start a response with 'Sure, here is the alt text for your image:' which will get injected directly into your HTML.

Once the script receives the JSON response containing the newly minted description, it fires off a `PUT` request to the WordPress (or custom CMS) REST API. I set a delay of 1.5 seconds between requests. Slamming your server with 500 API calls a second is a rookie mistake. Throttling is mandatory.
  • Fetch the image URL locally using the `requests` library.
  • Convert the raw bytes into a base64 string using `base64.b64encode`.
  • Construct the payload with the surrounding paragraph and H2.
  • Send to `gpt-4o` with `response_format={ "type": "json_object" }`.
  • Parse the JSON, extract the text, and PUT it via your CMS API.

Prompt Engineering for Perfect Descriptions

Your script is only as good as your prompt. When I first built this, the AI was way too verbose. It would generate 40-word descriptions that failed accessibility audits. WCAG guidelines recommend keeping alt text under 125 characters because many screen readers pause or truncate after that limit.

My system prompt is fiercely restrictive. I tell the AI exactly what persona to adopt: 'You are an expert web accessibility auditor and SEO specialist.' Then, I lay out the rules. No filler words. No starting with 'An image of' or 'A picture of'—screen readers announce the element as an image automatically. I also enforce the 125-character limit strictly. The most critical instruction I give is regarding context. I tell the AI: 'Use the provided surrounding text to understand the specific entities, brand names, or data points in the image. If the text mentions Q3 Revenue, ensure the chart description reflects Q3 Revenue.'

Writing strict, aggressive prompts is the secret to taming LLMs. You have to treat the AI like a brilliant but wildly undisciplined intern.
Image TypeBad AI Output (No Context)Great AI Output (With Context)
SaaS Dashboard ScreenshotA computer screen with a dark mode interface and a blue line graph.ProgSEO analytics dashboard showing a 45% increase in organic traffic over 30 days.
E-commerce ProductA pair of black running shoes on a white background.Nike Air Zoom Pegasus 39 running shoes in triple black.
Author HeadshotA smiling man with a beard wearing a blue shirt.John Doe, Lead SEO Strategist at ProgSEO, smiling in a blue button-down shirt.

Deploying at Scale Without Crashing Your CMS

So, you have a beautiful Python script and perfect AI outputs. Now you need to inject those descriptions back into your site. This is where many developers get impatient. I've seen people write raw SQL queries to update their database directly because it's faster. Pushing updates directly via SQL is a rookie move that bypasses CMS cache layers and often corrupts serialized data arrays.

Always use the official REST API of your platform. If you use WordPress, use the `/wp/v2/media/{id}` endpoint. If you use Webflow or Contentful, use their native SDKs. This ensures that your caching plugins (like WP Rocket or Cloudflare) are properly notified to purge the cache for the affected URLs. I also log every single update to a local CSV file. If the AI happens to go rogue, or if I accidentally run the script on the wrong category, I have a localized backup to restore from. I run this script as a cron job every Sunday at 2 AM. By the time I wake up, any missing alt text from the week's content sprints is fully patched, optimized, and deployed.
150,000+
Images Automated
$0.003
Average Cost per Image
99%
Accessibility Score Improvement
Most free plugins use your file name, which is terrible for SEO. AI-based plugins exist, but they are incredibly expensive per image. Running your own Python script costs fractions of a cent.
Decorative images (like borders or pure background graphics) should have an empty alt tag (alt=""). My script identifies these based on CSS classes and safely skips them.
No. Google rewards accurate, descriptive, and helpful alt text. They do not care if an AI wrote it, as long as it accurately describes the image in the context of the page.

Ready to scale your Programmatic SEO?

Stop doing manual SEO tasks. Let us help you build automated, high-traffic assets using advanced AI workflows.
Explore ProgSEO Services