Follow our formula for creating beautiful image sets with a consistent style. Ideal for content creators, marketers, web designers and more. This article assumes you have a basic familiarity with how to use Midjourney.
Midjourney has been setting the internet ablaze in the past few weeks. Twitter, Reddit and LinkedIn are full of near-perfect image generations from the service. Pretty much anyone can produce great images with it – if you take a look at the basic parameters, and spend 15 minutes playing around with prompts, chances are you’ll produce something that you’d be proud to publish.
The hard part, however, is getting multiple images that work together. If you’re involved in producing websites, articles, apps or web content of any kind, you’ll know that consistency is key. You want your assets to work together to present a unified aesthetic across whatever piece of content you’re working on.
Without a bit of prompting know-how, you’ll probably end up with image sets that look random and unprofessional. For example, if you were trying to produce an image set for an educational course on Enlightenment philosophy, chances are you’d write prompts like this:
They’re pretty great images, all things considered, but they aren’t consistent. We’ve got one photograph, a graphic, a black-and-white illustration, and an engraving-style print.
What we’re aiming for (which is what we’ve produced with our method) is something like this:
These now look much more consistent and create a sense of continuity across the course.
So how do we do this? The answer, as you can probably guess, lies in creating consistency in our prompts.
We’ve devised a repeatable structure for writing prompts for large image sets that allows you to control the level of consistency, as well as the elements of your images that you want to stay consistent. This structure also makes your prompts modular – you can swap in and out the elements you want, making it easier to produce prompts at scale, especially if you are writing them in a spreadsheet.
The structure works by breaking down the prompt into five elements, separated by semi-colons. These are:
Scene depicted; Location; Lighting; Art style; Parameters
The reason we break these down into five parts is that we then have five possible levels of consistency to play with.
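As a minimal sketch of this structure, the five elements can be held in a small record and joined with semi-colons. The class and field names below are our own illustrative choices, not anything Midjourney defines. The parameters are written with double hyphens, which is the form Midjourney's documentation uses.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Prompt:
    """The five prompt elements, in the order they appear in the prompt."""
    scene: str
    location: str
    lighting: str
    art_style: str
    parameters: str

    def render(self) -> str:
        # Join the five elements with semi-colons, following the structure above.
        return "; ".join(astuple(self))


bentham = Prompt(
    scene="Jeremy Bentham sitting on a bench",
    location="A light, airy terrace",
    lighting="Bright daylight",
    art_style="18th-century oil painting",
    parameters="--v 5 --q 2 --s 130",
)
print(bentham.render())
```

Keeping each element in its own field is what makes the prompt modular: any one of the five can be swapped without touching the others.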
If we want to have images that are kinda similar, but are still varied enough to avoid repetition, we can just keep two elements constant (in this instance, Art style and Parameters), and change the rest based on the contents of each image. In our system we’ll call this a two-factor similarity, because only two elements remain constant.
The prompts for a two-factor image set would look like this (the constant elements are the last two in each prompt, the art style and the parameters):
Jeremy Bentham sitting on a bench; A light, airy terrace; Bright daylight; 18th-century oil painting; --v 5 --q 2 --s 130
René Descartes at a desk, looking towards us; The Bodleian Library, Oxford; Golden hour light coming through the windows; 18th-century oil painting; --v 5 --q 2 --s 130
Galileo looks at a globe, deep in thought; An Italian villa; Midday sunlight; 18th-century oil painting; --v 5 --q 2 --s 130
An eighteenth-century telescope next to a tree; A grassy field; Bright Springtime light; 18th-century oil painting; --v 5 --q 2 --s 130
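If you keep the variable elements in a spreadsheet, a set like this can be generated rather than typed out. The sketch below assumes a CSV export with the hypothetical column names scene, location and lighting, and adds the shared art style and parameters to every row. The parameters use double hyphens, as in Midjourney's documentation.

```python
import csv
import io

# Constant elements for a two-factor set (art style and parameters).
ART_STYLE = "18th-century oil painting"
PARAMETERS = "--v 5 --q 2 --s 130"

# Stand-in for a spreadsheet exported as CSV; the column names are assumptions.
SHEET = """scene,location,lighting
Jeremy Bentham sitting on a bench,"A light, airy terrace",Bright daylight
"René Descartes at a desk, looking towards us","The Bodleian Library, Oxford",Golden hour light coming through the windows
"Galileo looks at a globe, deep in thought",An Italian villa,Midday sunlight
An eighteenth-century telescope next to a tree,A grassy field,Bright Springtime light
"""


def two_factor_prompts(csv_text: str) -> list[str]:
    """Build one prompt per row, appending the shared style and parameters."""
    rows = csv.DictReader(io.StringIO(csv_text))
    return [
        "; ".join([row["scene"], row["location"], row["lighting"], ART_STYLE, PARAMETERS])
        for row in rows
    ]


prompts = two_factor_prompts(SHEET)
for prompt in prompts:
    print(prompt)
```

To change the look of the whole set, only the two constants need editing; the rows in the sheet stay as they are.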
And if you want to see the outputs, just look at the four images above! Those are the images we produced with two factors of similarity. We find two-factor similarity, specifically the version where the art style and parameters are the shared elements, to be the ‘Goldilocks’ structure for our use case – retaining stylistic consistency without being repetitive or boring.
However, if you want to create greater consistency, or greater randomness, in your image sets, feel free to play around with one-factor, three-factor or even four-factor similarity in your prompt sets. Of course, a five-factor similarity would just be the same prompts repeated again and again!
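To check which factor of similarity a prompt set actually has, you can split each prompt on its semi-colons and count the positions that never change. This is a rough sketch with a function name of our own choosing. It assumes every prompt follows the five-element structure exactly and that no element contains a semi-colon itself.

```python
ELEMENTS = ["Scene depicted", "Location", "Lighting", "Art style", "Parameters"]


def constant_elements(prompts: list[str]) -> list[str]:
    """Return the names of the elements that are identical in every prompt."""
    # Split each prompt into its parts and trim the surrounding spaces.
    split = [[part.strip() for part in p.split(";")] for p in prompts]
    if any(len(parts) != len(ELEMENTS) for parts in split):
        raise ValueError("each prompt needs exactly five semi-colon-separated elements")
    # zip(*split) groups all first parts together, then all second parts, and so on.
    return [
        name
        for name, column in zip(ELEMENTS, zip(*split))
        if len(set(column)) == 1
    ]


prompts = [
    "Jeremy Bentham sitting on a bench; A light, airy terrace; Bright daylight; 18th-century oil painting; --v 5 --q 2 --s 130",
    "Galileo looks at a globe, deep in thought; An Italian villa; Midday sunlight; 18th-century oil painting; --v 5 --q 2 --s 130",
]
shared = constant_elements(prompts)
print(len(shared), shared)
```

The length of the returned list is the similarity factor, and the names tell you which elements are doing the work.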
The risk when you start adding higher-factor similarities is that they become repetitive. To illustrate this, here are some four-factor similarity prompts for the same scenes used above:
Jeremy Bentham sitting on a bench; The Bodleian Library, Oxford; Golden hour light coming through the windows; 18th-century oil painting; --v 5 --q 2 --s 130
René Descartes at a desk, looking towards us; The Bodleian Library, Oxford; Golden hour light coming through the windows; 18th-century oil painting; --v 5 --q 2 --s 130
Galileo looks at a globe, deep in thought; The Bodleian Library, Oxford; Golden hour light coming through the windows; 18th-century oil painting; --v 5 --q 2 --s 130
An eighteenth-century telescope; The Bodleian Library, Oxford; Golden hour light coming through the windows; 18th-century oil painting; --v 5 --q 2 --s 130
And the results:
These are pretty nice images, and for some use cases this level of similarity will be what you need, but you can probably see how these are a little on the repetitive side.
It can also be useful to play around with which elements you keep constant, and which are variable within your set. For example, you might try a two-factor similarity with just location and lighting the same. Let’s take a look at some prompts for that:
Jeremy Bentham sitting on a bench; Fitzroy Square, London; Bright, Spring light; Watercolour painting; --v 5 --q 2
René Descartes looking towards us; Fitzroy Square, London; Bright, Spring light; Realist painting; --v 5 --q 2 --s 200
Galileo looks at a globe, deep in thought; Fitzroy Square, London; Bright, Spring light; Children’s illustration; --v 5 --q 2 --s 130
An eighteenth-century telescope; Fitzroy Square, London; Bright, Spring light; 35mm, photorealistic, Canon EOS 5D Mark IV DSLR, f/5.6 aperture, 1/125 second shutter speed, ISO 100; --v 5 --q 2 --s 90
The results:
They have a similar colouring and mood, while clearly having very different styles. This might be useful for people who want to make large image sets that deal with varied content, but also would like to keep a thread of aesthetic similarity running between them.
We recommend experimenting with our prompting structure to see how you can use it to create consistent Midjourney images. By taking a modular approach, breaking the prompts into five parts, the possibilities for different combinations are massive!
PS: If you’re new to Midjourney you might be a little confused by the ‘parameters’ element of our prompts – the stuff that looks like ‘--v 5 --q 2 --s 130’.
A quick summary: ‘--v 5’ means you’ll be using Midjourney version 5, the most recent version at the time of writing. You could opt for ‘--v 4’ or ‘--v 3’ if you wish, but ‘--v 5’ tends to produce the highest quality results.
‘--q 2’ is asking for Midjourney to dial up the ‘quality’ parameter to 2. Basically that means a more resource-intensive but higher-quality rendering. If you’d like to be more economical with your fast hours, use a lower quality parameter.
Finally, ‘--s 130’ is tuning up the ‘stylize’ parameter, which is set to 100 by default. Higher stylize values will produce images that focus more on aesthetics than on accuracy to the scene requested. So if you’re producing something abstract, tune it up. For something precise, like what we are creating here, you’ll want to keep the stylize value pretty low.
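If you are generating prompts in bulk, the parameters element can be built from named values rather than typed by hand. The helper below is a small sketch with a function name of our own. It covers only the three parameters discussed here, writes them with double hyphens as in Midjourney's documentation, and does not check whether a value is accepted by a given Midjourney version.

```python
def parameters(version: int = 5, quality: float = 1, stylize: int = 100) -> str:
    """Format the three parameters used in this article as one string."""
    # :g drops a trailing ".0", so 2.0 is written as "2" and 0.5 stays "0.5".
    return f"--v {version} --q {quality:g} --s {stylize}"


print(parameters(quality=2, stylize=130))
print(parameters(quality=0.5))
```

The first call gives the parameters used for most of the prompts above; the second shows a cheaper, lower-quality setting with the default stylize value.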