Have you ever written something and had your own visual in mind when you wrote it? Especially when it comes to short writings or thoughts, you are likely leaving those details up to the interpretation of the reader. What happens when that visual is brought to life, and the reader is an AI image generator, like Midjourney?
I recently took it upon myself to start experimenting with AI image generators, especially Midjourney, because I actually like the flexibility and adaptability within Discord. I know it's not for everyone, and they're working on a browser application, but it's how I got hooked. Through my experiments with the tool's capabilities I started to wonder: how can AI surprise me in a good way? Just like a conversation with a text-based LLM that writes back to you in unexpected ways, I thought, I bet this can be extraordinary with AI image generators too.
Spoiler alert, I wasn't wrong!
Experiment 1: Short Poetry
For this first experiment I took a bit of poetry I had written and pasted it verbatim into Midjourney. The first set of results was very stylized and in tune with what Midjourney would produce within its tailored "aesthetic". However, you can also ask Midjourney to render images without its aesthetic layered on top.
So I appended the --stylize 0 parameter to the end of the prompt to try to get results that were more generic "AI" and less "Midjourney AI".
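If you haven't seen Midjourney's parameter syntax before, appending it in Discord looks roughly like this (the placeholder text is just for illustration):

/imagine prompt: <your poem or thought goes here> --stylize 0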
The Prompt:
for just a moment
i let the panic in,
& every moment after
i let the panic go.
there’s no room in this heart
to keep that sensation.
there’s no room in this heart
so i must set it free.
The Image:
The Result:
In this rendering, the AI model focused on a generalization of what it thought the poetry was trying to convey. Even though the word "heart" was used a few times in the prompt, I was surprised (and delighted) to see that a heart wasn't front and center in every one of the images it rendered. It feels like Midjourney attempted to read the entire poem and create an image that aligned with it from a more expressive, less literal perspective.
Experiment 2: Short Poetry
The Prompt:
learning the greatest gift
is the journey
and the hardest loss
is the inability to find the adventure
when it’s right in front of you
The Image:
The Result:
It looks like the AI model in this case focused on the term "journey" and once again interpreted it broadly rather than latching onto keywords in the prompt like "gift", "loss", and "adventure". This shows me that it takes the entirety of the prompt into consideration instead of fixating on one piece of it and letting everything else play a supporting role.
Experiment 3: Stream of Consciousness Thoughts
The Prompt:
sometimes you’re meant to be someone else’s villain no matter how hard you try to deny it
The Image:
The Result:
While I think the AI did a good job of trying to interpret what a villain might be, it was hard to find the real meaning of the full prompt in this series of visual outputs. The image depicting a woman and a dragon actually feels more like the dragon is her guide, her pal, her sidekick, and less like either one of them is the villain to the other. If you head over to my Instagram post, you can see that this one took the term villain and reproduced a lot of Disney and movie-character personalities.
This test showed that sometimes the AI can be "too trained" to think of a particular thing in only one light. For example, in the early stages of all image generators it was very hard to find diversity, and even today there is a bias where "woman" or "man" produces a white subject unless an ethnicity is specified alongside it.
Something similar is happening with the term "villain".
Experiment 4: Stream of Consciousness Thoughts
The Prompt:
sometimes silence is the best form of relief you can give yourself
The Image:
The Result:
With this rendering, the AI again did its best job of interpreting how a person can be silent without much description. This was a really interesting test of how an emotion that is normally conveyed with sound (or the lack of it) can be visually depicted by an AI model. There were a lot of images of people covering their mouths, covering their ears, or being out in nature, but this one was the most impactful to me of all, using the absence of things and letting color do just as much of the storytelling as the subject matter.
In the end, I think the AI did a really good job.
What Should You Do With This Information?
Take what you learned here and be inspired to test out AI image generators yourself. You don't have to use Midjourney either. There are plenty of other image generators out there, like Adobe Firefly, RunwayML, DALL·E 3 (in ChatGPT!), Stability AI, and Leonardo AI (and probably so many more), that you can experiment with. Every AI model is built differently, which means you can get very different results by using the same prompt in a different image generator.
Experiment Yourself: Prompt any AI with the Captions
Take the description of every image I pasted throughout this article and put it into an image generator. This is basically a free prompt at your disposal if you like what you see! You can tweak it, modify the parameters, do whatever you want! Just for fun, I took the first image's caption and ran it as a prompt to see how it compared to the original image (remember that you can type the same prompt in twice and you will never get the same image a second time).
Experiment Yourself: Prompt the AI with Your Own Writing
Are you a writer? Are you a poet? Are you someone who journals and just puts your thoughts on paper? Have you ever not been able to show someone what you mean, but you can write it out and it makes so much more sense? Take some of the things you've written and ask AI to provide an accompanying visual, just like I did throughout this experiment.
If you don't have any writing, you can also level up your AI usage and ask a tool like ChatGPT to write you a short poem about a specific topic and see what you get. This is a double whammy AI test! Either way you try it out, the goal is to start having fun with AI and being inspired to create something every chance you get!
Notes on Using AI To Craft this Article
Note 1: The AI Model's Capabilities to Conduct the Experiment
I used Midjourney's settings to experiment with the final output without actually changing the prompt. You can use parameters like stylize and weird to change how strongly the AI model layers its own interpretation on top of the prompt. For most of the images that were rendered, I took Midjourney's base stylization down to zero. If I had rendered the same prompt a few times and wasn't getting anything different, I could also have changed the weirdness, but since I wanted this to be more of a "baseline" analysis of the AI's interpretation of words into visuals, I didn't stray too far off the path.
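As a rough illustration of how those two levers look side by side (the weird value here is just an example; I left weirdness at its default for this experiment):

/imagine prompt: sometimes silence is the best form of relief you can give yourself --stylize 0
/imagine prompt: sometimes silence is the best form of relief you can give yourself --stylize 0 --weird 500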
Note 2: Using the AI Model to Generate Descriptions
Since I was using very vague terminology, and sometimes prompts that didn't reflect the final visual at all, I needed to add alternative descriptions or captions to the images themselves to help readers who may not have full vision interpret the results of the experiment. I wanted to make sure I was accurately conveying the contents of each image to someone who may not be able to see it. The most efficient way to do this was through Midjourney's /describe command. This feature lets a user submit an image and ask the AI to describe it. The AI analyzes the image and provides four slightly varying descriptions of it. Using the information the AI model outputs from the image, I was able to quickly create very descriptive captions that provided insight into what the image was trying to convey.
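For anyone who hasn't tried it, the flow looks roughly like this (simplified):

/describe → upload the rendered image → Midjourney replies with four candidate descriptions → copy, combine, and edit them into an alt-text caption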
Why did I do this?
Because this experiment ties data from ambiguous prompts to visual outputs, being able to perceive the results is vital to understanding what the outcome of this experiment is trying to convey. If that is not conveyed through any means other than a visual image, the risk increases that there are people who cannot piece together the data and make sense of the story I am trying to tell here.
Note 3: I Didn't Use AI To Write This Blog Post
How silly of me, I know, to just sit here and actually write out word for word what I want to say about an experiment I conducted with AI. But that will be my next challenge: getting the AI to write it for me! One step at a time :)