In this experiment with AI tools for creatives, I'm going to ask Midjourney to describe an image using its own AI model, and then re-run that description to see how closely we can replicate the styles of famous periods of art.
About the AI Tool (Midjourney)
Midjourney is an AI image generator that has gained popularity in the art community. Launched just a few short years ago, the tool has quickly become a go-to in many creatives' workflows. The two features we'll be using are Midjourney's "Describe" command and Midjourney's "Imagine" command.
Midjourney /describe
The describe command serves two essential purposes: explaining the contents of a visual and determining its dimensions. According to Midjourney's website, "The /describe command allows you to upload an image and generate four possible prompts based on that image. Use the /describe command to explore new vocabulary and aesthetic movements."
Midjourney /imagine
The imagine command is where all the magic happens! Imagine is the way we ask Midjourney to create the images we are looking for. According to Midjourney's website, "The /imagine command generates a unique image from a short text description (known as a Prompt)."
Replication Test Artist and Work: Paul Cezanne Still Life with Skull
Paul Cezanne Still Life Description
The first thing we're going to do is take our sample image and ask Midjourney to give us its version of a description of this image.
Interestingly enough, Midjourney did use the name Paul Cézanne in one of its descriptions! Here is what we got:
Description 1
fruit on top of a plate with a skull, in the style of post-impressionist colorism, narrative tableaux, haunting figuratism, paul cézanne, large-scale canvases, františek dvořák, skeletal --ar 64:53
(for those unsure, the parameter --ar 64:53 sets the proportions of the image, its width relative to its height! "ar" stands for aspect ratio)
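To make the --ar value concrete, here is a minimal Python sketch that pulls the ratio out of a prompt and does the arithmetic. The function names (parse_aspect_ratio, height_for_width) are hypothetical helpers written for this article, and the 1024-pixel width is only an illustrative number, not a size Midjourney is known to output.

```python
import re
from fractions import Fraction


def parse_aspect_ratio(prompt: str) -> Fraction:
    """Extract the W:H value of an '--ar W:H' parameter as a width/height fraction."""
    match = re.search(r"--ar\s+(\d+):(\d+)", prompt)
    if match is None:
        raise ValueError("prompt has no --ar parameter")
    return Fraction(int(match.group(1)), int(match.group(2)))


def height_for_width(ratio: Fraction, width: int) -> int:
    """Height (rounded to whole pixels) that keeps the given width:height ratio."""
    return round(width / ratio)


# Shortened example prompt; only the --ar part matters here.
ratio = parse_aspect_ratio("table with fruits on it, skeletal --ar 64:53")
print(ratio)                          # 64/53
print(round(float(ratio), 3))         # 1.208
print(height_for_width(ratio, 1024))  # 848 (1024 is an illustrative width)
```

In other words, 64:53 describes a canvas that is only slightly wider than it is tall, roughly 1.21 to 1.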
Description 2
a white cloth with fruit, in the style of haunting portraiture, fauvist color scheme, vanitas paintings, paleocore, research as art, 1860–1969, oil portraitures --ar 64:53
Description 3
a painting shows a bunch of fruits on an old wooden table, in the style of haunting figuratism, skeletal, paul cézanne, skull motifs --ar 64:53
Description 4
table with fruits on it, in the style of macabre subjects, post-impressionist colorism, skull motifs, emphasizes emotion over realism, large canvas sizes, skeletal, neo-impressionist techniques --ar 64:53
Paul Cezanne Inspired Still Life Image Generations
How much value is in the artist's name?
The first thing I want to test is how the model responds to the use of the artist's name in the prompt. In this case Paul Cézanne was used in both description 1 and description 3, so we have two different prompts we can use to test Midjourney's accuracy in both describing what it sees and matching the visual it was given.
Description 1 from Midjourney:
fruit on top of a plate with a skull, in the style of post-impressionist colorism, narrative tableaux, haunting figuratism, paul cézanne, large-scale canvases, františek dvořák, skeletal --ar 64:53
Description 3 from Midjourney:
a painting shows a bunch of fruits on an old wooden table, in the style of haunting figuratism, skeletal, paul cézanne, skull motifs --ar 64:53
What we learn from this is that Midjourney does a pretty good job of interpreting the ratio of skull to fruit, and even the cropping and relative space the fruit should take up on the canvas. Using the artist's name also helps us get a closer depiction of the medium without having to describe it specifically.
Although the medium was not perfect, it was less of a distraction than the coloration of the rendered images and the softness in the application of the paint. It looks like Midjourney struggled to tell us how to match those elements in its output for these two prompts.
So which ones are "good enough"? For the purpose of this article, let's compare the artist's work side by side against a Midjourney rendering.
In the first grid, the results of Description 1, the closest depiction in my eyes would be the first one (top left in the rendered grid). I like it for the placement of items, the ratios of elements, and the amount of white space in the canvas, and it's the most accurate to the color palette and tone of the original image. What I'm missing the most is the white cloth, and it looks like we've somehow added a plate into the mix!
In the second grid, the results of Description 3, my favorite would have to be the third image (bottom left in the rendered grid). The relative sizing and placement of items and the amount of white space in this rendering most closely match my original image. Out of the whole group, this one also most closely imitates the artist's medium. We're still missing our white table cloth though! Also, I appreciate the second skeleton hiding in the background as a little easter egg! Lowkey, that's what makes this one the best of them all!
What does it render without the artist name?
Now that we've taken a look at the description renderings that included the name of the artist, what do we get when that is removed? Descriptions 2 and 4 can shed some light on the importance of the different terminology used to describe and recreate an image, even if we lack key distinguishing nouns.
Let's take a look!
Description 2 from Midjourney:
a white cloth with fruit, in the style of haunting portraiture, fauvist color scheme, vanitas paintings, paleocore, research as art, 1860–1969, oil portraitures --ar 64:53
Description 4 from Midjourney:
table with fruits on it, in the style of macabre subjects, post-impressionist colorism, skull motifs, emphasizes emotion over realism, large canvas sizes, skeletal, neo-impressionist techniques --ar 64:53
What we learn from these sets of descriptions is that you can add specific details, like "white cloth" or "table", to direct the scene a little more. What we lose is a more refined rendition of the artist's style. We see broad terminology like "neo-impressionist" used in place of the artist's name, "Paul Cézanne", and we get close, but not as close as we do with the specificity and point of reference that the artist's name adds to the prompt.
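The comparison above uses four different prompts, so the artist's name is not the only thing that changes between them. A more controlled variant, which is not what was run for this article, would take one description and remove only the name. The sketch below shows that idea in Python; remove_descriptor is a hypothetical helper, and it assumes the comma-separated layout that the /describe outputs above happen to follow.

```python
def remove_descriptor(prompt: str, term: str) -> str:
    """Drop one comma-separated descriptor from a prompt, keeping any --parameters.

    Assumes the descriptor stands alone between commas. A term that directly
    follows "in the style of" would not match and is left in place.
    """
    body, sep, params = prompt.partition(" --")
    parts = [part.strip() for part in body.split(",")]
    kept = [part for part in parts if part.lower() != term.lower()]
    return ", ".join(kept) + (sep + params if sep else "")


description_3 = (
    "a painting shows a bunch of fruits on an old wooden table, "
    "in the style of haunting figuratism, skeletal, paul cézanne, "
    "skull motifs --ar 64:53"
)

# Same prompt, minus the artist's name, for a like-for-like test.
print(remove_descriptor(description_3, "paul cézanne"))
```

Running both versions through /imagine would isolate how much the name alone contributes.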
If I were to pick images that I think stood out from each of these groups, I'd say in description 2 the second image (top right) most closely reflects the original image provided to Midjourney. While it is very clear that the technique of the painted canvas is different, the overall composition and contents of the still life are a close match. And we finally got our white table cloth!
The final grouping is the one I struggled with the most, and I went back and forth a few times trying to determine which image best imitates the original. Ultimately, I think the final image of the grid (bottom right) made the most sense. The tone of the image was the least vibrant in a lot of ways (although not as flat as Cézanne's color layering, and with much harsher lighting) and the composition was the least chaotic out of the group. We definitely have extra elements we don't want, but we did get our white table cloth in this grouping as well!
If there is one thing these last two got right, it was the table cloth and table setting. I think combining these prompts would give us a much closer representation of the original image. This is why it is important, when you have a specific style you're trying to match, to go through some testing and trial and error. It's probably not going to be perfect on the first try, but we can piece together a string of terms that the model (Midjourney) can respond to and that will most accurately reimagine the image we are seeking.
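As a rough illustration of piecing prompts together, here is a minimal Python sketch that keeps the scene from one description and merges in the style terms from another. The helpers (split_prompt, combine_prompts) are hypothetical and assume the "subject, in the style of ..., --ar W:H" layout seen in the descriptions above; this is one possible way to combine them, not a Midjourney feature.

```python
def split_prompt(prompt: str):
    """Split a /describe-style prompt into (subject, style descriptors, parameters)."""
    body, _, params = prompt.partition(" --")
    params = "--" + params if params else ""
    subject, _, styles = body.partition(", in the style of ")
    descriptors = [s.strip() for s in styles.split(",") if s.strip()]
    return subject.strip(), descriptors, params


def combine_prompts(primary: str, secondary: str) -> str:
    """Keep the primary subject and parameters; append the secondary's new style terms."""
    subject, styles_a, params = split_prompt(primary)
    _, styles_b, _ = split_prompt(secondary)
    # dict.fromkeys removes repeats while preserving the original order.
    merged = list(dict.fromkeys(styles_a + styles_b))
    return f"{subject}, in the style of {', '.join(merged)} {params}".strip()


description_4 = (
    "table with fruits on it, in the style of macabre subjects, "
    "post-impressionist colorism, skull motifs, emphasizes emotion over realism, "
    "large canvas sizes, skeletal, neo-impressionist techniques --ar 64:53"
)
description_3 = (
    "a painting shows a bunch of fruits on an old wooden table, "
    "in the style of haunting figuratism, skeletal, paul cézanne, "
    "skull motifs --ar 64:53"
)

combined = combine_prompts(description_4, description_3)
print(combined)
```

The result keeps the table setting from Description 4 and adds "haunting figuratism" and "paul cézanne" from Description 3, with repeated terms such as "skeletal" listed only once. Whether the merged prompt actually renders closer to the original is something only another round of /imagine can tell us.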
Can We Replicate History with AI?
After going through this initial test, it is easy to see how we can emulate the styles of famous artists through language. Leaning into art history combined with learning from the language of the model, we can come up with creative ways to reimagine a world through the perspective of famous artists like Paul Cézanne.
As one last curious experiment with Midjourney, I simply entered "/imagine Paul Cezanne Still Life with Skull" to see if, without any context at all, Midjourney would replicate the art in question better or worse than when it was given the image to describe first.
Interestingly enough, I think it did worse! The style is much harsher, and the ratio of skull to fruit is now all out of proportion! I do think there is a little surprise in the last image, the one with two skulls and two lemons: of all the rendered images, it feels the closest to the softness of paint application in Cézanne's work. Otherwise, our colors, lighting, and even subject matter are not even close!
Let's keep exploring!
In Part 2, we take a look at Sandro Botticelli's Birth of Venus, and see if we can get Midjourney's AI Describe and Prompting to emulate our artist.