Generating alt-text using open models

Using LLMs to create content is a delicate matter in the news industry. Where I work, our news team has a healthy skepticism toward AI-derived tools that might ease their workload but also degrade the quality of the larger work. I think they worry less about “AI tools” taking their jobs, because they rightfully see themselves as damn good writers and producers who would be hard to replace. Regardless, at CPR.org our AI policy rules out replacing members of the news team with tools.

Finding tools that help iron out the wrinkles in their jobs is a different matter. Alt-text is one such wrinkle. It is good for SEO and even better for accessibility. It’s also really boring to write, and for an understaffed and overworked news production team it adds real friction to getting the news out.

As an experiment, we created a quick-and-dirty alt-text generation tool that uses a lightweight vision-language model, LLaVA 1.5 7B. CPR.org uses WordPress, so the tool is designed to work with WordPress’ media library and within WordPress posts. Because this is a proof of concept, we skipped the plugin route and took the simpler approach of a bookmarklet.
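To give a feel for the WordPress side, here is a rough sketch in Python of finding images that still need alt-text. It is not the bookmarklet itself, just an illustration that leans on the standard WordPress REST API media endpoint and its alt_text field. The function names are made up for this example, and it assumes the site exposes its media list without authentication, which may not be true everywhere.

```python
import json
import urllib.request
from urllib.parse import urlencode


def media_url(site: str, page: int = 1, per_page: int = 20) -> str:
    """Build the WordPress REST API URL for one page of image attachments."""
    query = urlencode({"media_type": "image", "per_page": per_page, "page": page})
    return f"{site.rstrip('/')}/wp-json/wp/v2/media?{query}"


def missing_alt_text(items: list) -> list:
    """Keep only the media items whose alt_text is empty or whitespace."""
    return [
        {"id": item["id"], "source_url": item.get("source_url", "")}
        for item in items
        if not (item.get("alt_text") or "").strip()
    ]


def fetch_media(site: str, page: int = 1) -> list:
    """Fetch one page of image attachments (assumes no auth is required)."""
    with urllib.request.urlopen(media_url(site, page), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Hypothetical site URL; swap in your own WordPress install.
    for item in missing_alt_text(fetch_media("https://example.org")):
        print(item["id"], item["source_url"])
```

Each item that comes back is a candidate for a generated suggestion, which an editor can then accept or rewrite.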

We don’t host LLaVA ourselves, but rather rely on Cloudflare’s edge network and an edge worker to run it. Because it’s still listed as a beta model on Cloudflare, it’s free to use. A free beta model is definitely not as polished as some tools out there, but it has its advantages:

The data stays with us, which is one of my favorite things
The content it generates is editable and requires a human to accept its changes
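For the curious, the generate-then-review loop can be sketched in a few lines of Python. This is not our worker code. It calls Cloudflare's Workers AI REST endpoint directly instead of going through an edge worker, and the model ID, the request fields (image, prompt, max_tokens) and the result.description response field are my understanding of that API, so check them against Cloudflare's current docs. The prompt, the function names and the 150-character limit are all assumptions made for illustration.

```python
import json
import urllib.request

# Assumed Workers AI model ID for LLaVA 1.5 7B; verify before relying on it.
MODEL = "@cf/llava-hf/llava-1.5-7b-hf"


def build_request(account_id: str, token: str, image: bytes,
                  prompt: str = "Describe this image in one sentence for alt text."):
    """Build the POST request that asks the model to describe an image."""
    url = f"https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/{MODEL}"
    # The image is sent as a list of byte values (0-255).
    body = json.dumps({"image": list(image), "prompt": prompt, "max_tokens": 128})
    headers = {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body.encode(), headers=headers, method="POST")


def tidy(description: str, limit: int = 150) -> str:
    """Collapse whitespace and cut at a word boundary if the text is too long."""
    text = " ".join(description.split())
    if len(text) <= limit:
        return text
    return text[:limit].rsplit(" ", 1)[0].rstrip(" ,;:")


def suggest(account_id: str, token: str, image: bytes) -> str:
    """Ask the model for a description and return it as a tidied suggestion."""
    with urllib.request.urlopen(build_request(account_id, token, image), timeout=60) as resp:
        payload = json.load(resp)
    return tidy(payload["result"]["description"])


def review(suggestion: str, answer: str):
    """Apply the editor's decision: empty accepts, '-' rejects, other text replaces."""
    answer = answer.strip()
    if answer == "":
        return suggestion
    if answer == "-":
        return None
    return answer
```

The point of review is that nothing gets saved without a person looking at it: the model only ever produces a suggestion, and the editor's answer decides what, if anything, becomes the alt-text.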

All of this is under the BSD 3-clause license, so it’s free to use. It requires two things to work: a WordPress site and a Cloudflare account. Image processing like this is getting less and less resource-intensive. I expect that in a year or less this type of thing will happen on the user’s CPU. But for now, check it out.

Now I just need to get it to work for Hugo!