How AI Image Generators Can Help Enhance Digital Accessibility

As a certified professional in digital accessibility, I have dedicated my career to making digital spaces inclusive and accessible for everyone. The rapid advancements in artificial intelligence (AI) technology, particularly in AI image generation and description, offer transformative opportunities for enhancing digital accessibility. Let's explore how.

The Power of AI Image Generators

AI image generators, utilizing advanced algorithms and extensive datasets, have the potential to transform digital accessibility, particularly for individuals with visual impairments. I spoke in a previous article about the risk of using AI-generated images without alt text or a clear description of their intent, but what about the other side of that conversation? How can we, as content creators, writers, designers, and developers, leverage AI image generators and all of their capabilities to provide fully inclusive experiences?

Improved Alt Text Generation

Current Standards: WCAG 2.2 AA guidelines emphasize the importance of providing text alternatives for non-text content. Some (not all) AI image generators can automatically create detailed and accurate alt text for images, helping visually impaired users understand the content of an image through screen readers. With tools like Midjourney's describe command, you can submit an image and receive a detailed description of it.

Example: When an AI generator analyzes an image of a bustling city street, it might produce alt text such as "A busy street in a city with pedestrians walking, cars driving, and tall buildings on either side." This level of detail provides a richer understanding for the user.

Beyond the AI: It is important that you read the descriptions the AI generates and add the appropriate context yourself. When it comes to digital accessibility, context is everything, and AI tools are not integrated into your platform or layout, so they cannot decipher an image's intended purpose (yet).
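To make that workflow concrete, here is a minimal sketch: an AI-drafted description is attached as alt text, and the image is flagged for human review so the intended context can be added before publishing. The describeImage function is a hypothetical stand-in for whichever image-description service you use, not a real API.

```typescript
// Hypothetical stand-in for an image-description service call.
async function describeImage(imageUrl: string): Promise<string> {
  // A real implementation would send the image to a vision model.
  return "A busy street in a city with pedestrians walking, cars driving, and tall buildings on either side.";
}

// Draft alt text with AI, then flag the image for human review so the
// author can refine the description with the image's intended context.
async function draftAltText(img: HTMLImageElement): Promise<void> {
  img.alt = await describeImage(img.src);
  img.dataset.altReview = "pending"; // reminder: a person still edits this
}
```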

Enhanced Image Descriptions

Current Standards: WCAG 2.2 AA includes criteria for providing additional descriptions for complex images. AI can generate long descriptions for images that convey more intricate details than standard alt text.

Example: For a scientific diagram, an AI generator could produce a comprehensive description that includes each element of the diagram and its significance, making the content accessible to users who cannot see the image.

Beyond the AI: As mentioned in the previous section on alt text generation, the most important thing you can do is avoid relying fully on the AI's output to convey the context of your information. While AI does a good job of reading what is in front of it, it may miss the main point a chart is trying to convey and offer only a general, overall explanation. It is important that you not only fact-check the output but also add the intended value of the visual to accompany the description.
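As a sketch of how a reviewed long description might be wired into a page, the snippet below pairs a complex image (such as a scientific diagram) with its description using the aria-describedby attribute. The description text is assumed to be an AI draft that a person has already fact-checked and given its intended takeaway.

```typescript
// Attach a human-reviewed long description to a complex image so screen
// readers can announce it via aria-describedby.
function attachLongDescription(img: HTMLImageElement, description: string): void {
  const descId = `${img.id || "figure"}-longdesc`;
  const paragraph = document.createElement("p");
  paragraph.id = descId;
  paragraph.textContent = description;
  img.insertAdjacentElement("afterend", paragraph);
  img.setAttribute("aria-describedby", descId);
}
```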

In both of these use cases, the AI serves as a guidepost to help you get started and generate the basics; after that, it's up to you to provide the important contextual value.

Real-Time Image Recognition and Descriptions

Current Standards: While not explicitly covered in WCAG 2.2 AA, real-time image recognition and description align with the principle of providing equivalent access to information.

Example: Tools like "Be My AI" use real-time image recognition to describe scenes and objects through a smartphone camera. This allows users to receive immediate feedback about their surroundings, significantly enhancing their ability to navigate and interact with the world.

This tool could be leveraged to help visually impaired users decipher the contents of a page if it were installed on a desktop device and used in a fashion similar to a screen reader like VoiceOver on a Mac.

Beyond the AI: As with all AI, we can't rely solely on the accuracy of the data or on the way the model interpreted its surroundings. There should be a built-in way for visually impaired users to utilize these tools while also taking snapshots and getting validation from external parties to verify the accuracy of what the AI is reporting. That feedback loop would also be a great way to continue to enhance and improve the AI model's capabilities.
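For illustration, here is a rough sketch of the capture-and-describe pattern these tools follow: grab a frame from a camera feed, ask a vision model to describe it, and read the result aloud with the Web Speech API. The describeScene call is a hypothetical placeholder, and the output would still need the kind of external validation described above.

```typescript
// Hypothetical placeholder for a vision-model request.
async function describeScene(frame: Blob): Promise<string> {
  return "A crosswalk ahead with the pedestrian signal showing walk.";
}

// Capture one frame from a live video element, describe it, and speak the result.
async function captureAndDescribe(video: HTMLVideoElement): Promise<void> {
  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")?.drawImage(video, 0, 0);
  const frame = await new Promise<Blob>((resolve) =>
    canvas.toBlob((blob) => resolve(blob as Blob), "image/jpeg")
  );
  const description = await describeScene(frame);
  speechSynthesis.speak(new SpeechSynthesisUtterance(description));
}
```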

Future Directions for AI and Accessibility

What we've explored so far is all current-state capability with a few wish-list items. What if we continued to leverage the power of AI and brainstormed future advancements in AI image generators that could further enhance digital accessibility?

Allowing the AI to Adapt to User Preferences

AI could adapt content in real-time based on user preferences and needs. For example, if a user prefers more detailed descriptions, the AI could provide additional context and information automatically.
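One way to imagine this working is a stored preference that shapes the request sent to the description service. A minimal sketch, assuming a simple verbosity setting and a hypothetical service that accepts plain-language instructions:

```typescript
// User-controlled preferences that shape how much detail a description includes.
type DetailLevel = "brief" | "standard" | "detailed";

interface DescriptionPreferences {
  detail: DetailLevel;
  includeColors: boolean;
}

// Build the instruction sent to a (hypothetical) description service
// based on the user's saved preferences.
function buildDescriptionRequest(prefs: DescriptionPreferences): string {
  const length =
    prefs.detail === "brief"
      ? "one short sentence"
      : prefs.detail === "detailed"
      ? "several sentences covering every visible element"
      : "two to three sentences";
  const colors = prefs.includeColors ? "Mention dominant colors." : "Omit color details.";
  return `Describe this image in ${length}. ${colors}`;
}
```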

Interactive Image Descriptions

Users could interact with images in a more dynamic way, asking the AI specific questions about different parts of the image to get more detailed information. This would create a more personalized and informative experience.
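A sketch of what that interaction might look like, with askAboutImage standing in for a multimodal model that accepts an image and a follow-up question (both the function and its behavior are assumptions for illustration):

```typescript
// Hypothetical call to a vision-language model that answers a question about an image.
async function askAboutImage(imageUrl: string, question: string): Promise<string> {
  // A real implementation would send both the image and the question to the model.
  return `Answer about ${imageUrl}: (model response to "${question}")`;
}

// Usage: the user drills into the part of the image they care about.
askAboutImage("/images/city-street.jpg", "What does the shop sign on the left say?")
  .then((answer) => console.log(answer));
```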

Enhanced Contextual Understanding

AI could improve its contextual understanding to provide more relevant descriptions based on the user's current activity or location within a digital platform, for instance, describing images differently depending on whether the user is shopping online, reading news, or browsing social media. As mentioned, context is what makes or breaks the purpose of an image, and accurately conveying the important context an image adds will only make the experience more usable.

Integration with Augmented Reality (AR)

Current Technology: The Be My AI tool already provides real-time assistance by capturing specific images or scenes and describing them to the user. It operates on a capture-and-describe model, where users take photos and the AI provides detailed descriptions.

Future Potential with AR and Live Streams: Imagine extending this capability to a live stream scenario. In an augmented reality (AR) setup, a visually impaired user could use a smartphone or smart glasses equipped with a camera that continuously captures video. The AI, integrated with AR technology, could provide real-time audio descriptions of the environment. For instance, as the user walks down a street, the AI could narrate their surroundings, alerting them to obstacles, identifying landmarks, and describing activities around them. This could enhance navigation and situational awareness beyond the static images that tools like "Be My AI" currently use.

Make It Relevant to Digital Accessibility: If we think about how this applies to digital accessibility, we could have that same real-time live stream of a person browsing the web, using an application, or playing an online video game, where the real-time narration and closed captioning incorporate all of the additional features explored above: adapting to user preferences, interactive descriptions, and enhanced contextual understanding.
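To sketch how that live narration loop might look in a browser, the snippet below reuses the captureAndDescribe helper from the earlier example and runs it on an interval against a streaming video element. This is a simplification of a possible future state, not a description of how any current tool works.

```typescript
// A simplified illustration of continuous narration: describe the live
// stream every few seconds. Real systems would need smarter pacing,
// change detection, and interruption handling.
function startLiveNarration(video: HTMLVideoElement, intervalMs = 5000): number {
  return window.setInterval(() => {
    void captureAndDescribe(video); // helper from the earlier sketch
  }, intervalMs);
}

function stopLiveNarration(handle: number): void {
  window.clearInterval(handle);
}
```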

Clarifying AR and AI Integration

Current State (Be My AI):
  • Relays data based on specific image captures
  • Provides audio descriptions of the captured images
  • Assists users by giving contextual information based on the image capture

Future State with AR and Live Streams:
  • Utilizes continuous video capture via a smartphone, smart glasses, or an AR headset
  • Provides live, real-time audio descriptions from live-streamed video
  • Enhances navigation, object recognition, and hazard detection in a dynamic environment

By leveraging AI and AR in these innovative ways, we can continue to break down barriers and create more inclusive digital experiences. Ensuring that all users, regardless of their abilities, have equal access to information and services is a crucial step towards a more accessible future.

Conclusion

The integration of AI image generators and real-time visual assistance tools marks a significant step forward in the journey toward digital accessibility. As professionals creating content in a digital environment, it is our responsibility to leverage these technologies to build more inclusive digital experiences. By embracing AI, we can ensure that everyone, regardless of their abilities, can access and engage with digital content, and we can elevate that experience to help users digest content more efficiently and accurately, without cluttered code or additional barriers.