

I currently freelance in the video/photo industry and take on a wide range of clients from around the world, but I usually find myself working with fashion and makeup brands. 3D and computer-generated work is something I've gotten into over the past few years as I've seen it used more and more in film, fashion, and media generally. I got really curious about exploring how creative work might change and evolve over the next decade, and I decided to start expanding my skillset to be ready for future clients who might have different needs than the ones I have today. I'm using CoffeeVectors as the avatar for my work focused purely on that space. I'm self-taught, so I've been studying YouTube channels and online courses/resources, but I bring a lot of my production experience and aesthetic sense from my day job into interpreting what I've been absorbing. I've yet to contribute to any projects using my CGI skillset exclusively, but I'm hopeful it won't be long before I can add some value to a project.

Generating the Input Face with Stable Diffusion

I used something called img2img in Stable Diffusion (other AI platforms have similar methods), which takes an initial image as the starting point for the prompt. There's a noise setting that tells the AI how far it can deviate from the reference image, and I set things up so the result would stay very close to the original. In my experience so far, when you use img2img with restrictive settings, the prompts don't have as much effect on the end product as they usually might, because the AI is drawing from multiple inputs. Macro descriptions like "oil painting", "illustration", or "sculpture" have an effect, as does describing styles like Anime, Art Nouveau, or Cubism, but the underlying image carries a lot of weight. You can also feed the resulting image back into the AI as a new starting point and cycle through a few generations, changing the prompt with each cycle to steer things in slightly different directions. It becomes less about a single initial prompt and more about understanding and modifying the larger system of settings interacting with each other, like how you might track variables in a Blueprint in Unreal.
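
If it helps to picture that loop concretely, here is a minimal sketch of a low-strength img2img cycle using Hugging Face's diffusers library. The library choice, model ID, prompt, and file paths are placeholder assumptions for illustration, not CoffeeVectors' actual setup.

```python
# Hypothetical sketch of an iterative, low-strength img2img loop (diffusers).
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "studio portrait photo of a woman, soft lighting"  # placeholder
image = Image.open("reference_face.png").convert("RGB").resize((512, 512))

# A low strength keeps each result close to its reference; feeding the output
# back in (and nudging the prompt between passes) steers it gradually.
for i in range(3):
    image = pipe(prompt=prompt, image=image, strength=0.3,
                 guidance_scale=7.5, num_inference_steps=50).images[0]
    image.save(f"img2img_pass_{i}.png")
```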

For example, the different sampler models you can pick from (DDIM, k_euler, etc.) have a huge impact on the aesthetics. I have an RTX 3090 and run Stable Diffusion locally on my machine, so I'm able to run scripts that walk through values like the CFG Scale, the number of steps, and the different sampler models.

From that, I can generate a grid and look through a set of options. Many of them will have artifacts, and sometimes completely different interpretations of the prompt, which I then have to sift through to find usable images. Coming from a photo background, I think of it like a contact sheet: searching for selections scattered among options and outtakes.
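
As a rough illustration of that kind of sweep, the sketch below walks through CFG Scale values, step counts, and a few samplers with the diffusers library and tiles the results into a contact-sheet-style grid. The tooling, model ID, prompt, and value ranges are assumptions for the example rather than the actual script.

```python
# Hypothetical parameter-sweep "contact sheet" for Stable Diffusion (diffusers).
import torch
from PIL import Image
from diffusers import (StableDiffusionPipeline, DDIMScheduler,
                       EulerDiscreteScheduler, LMSDiscreteScheduler)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "editorial fashion portrait, studio lighting"  # placeholder
samplers = {
    "DDIM": DDIMScheduler.from_config(pipe.scheduler.config),
    "Euler": EulerDiscreteScheduler.from_config(pipe.scheduler.config),
    "LMS": LMSDiscreteScheduler.from_config(pipe.scheduler.config),
}
cfg_values = [5.0, 7.5, 10.0]
step_values = [20, 35, 50]

tiles = []
for scheduler in samplers.values():
    pipe.scheduler = scheduler
    for cfg in cfg_values:
        for steps in step_values:
            # Fixed seed so only the swept settings differ between tiles.
            gen = torch.Generator("cuda").manual_seed(1234)
            tiles.append(pipe(prompt, guidance_scale=cfg,
                              num_inference_steps=steps,
                              generator=gen).images[0])

# Tile everything into one grid image: one row per sampler,
# columns walk through the CFG/step combinations.
cols, rows = len(cfg_values) * len(step_values), len(samplers)
w, h = tiles[0].size
grid = Image.new("RGB", (cols * w, rows * h))
for i, img in enumerate(tiles):
    grid.paste(img, ((i % cols) * w, (i // cols) * h))
grid.save("sweep_grid.png")
```

Keeping the seed fixed is what makes such a grid readable: every tile differs only in the settings being swept, so aesthetic differences can be attributed to the sampler, CFG Scale, or step count rather than to random noise.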

My advice for prompting is a two-part answer. One, don't get tunnel vision on the prompts. Prompts are important, but they're not everything. With platforms like DALL-E 2, where the underlying variables aren't exposed, the prompts do play a dominant role.

But with Stable Diffusion and Midjourney, there are more controls available to you that affect the output. If you're not getting what you want from prompts alone in Stable Diffusion, for instance, it could be because you need to shop around the sampler methods and CFG Scale values. Even the starting resolution affects the images you get, because it changes the initial noise pattern.

That leads to part two of my advice: it helps to have the right big-picture mindset. You're having a back-and-forth conversation with the AI and trying to find common ground to coordinate. You begin not speaking the same language, not seeing the same things, but over time you hopefully find a way to communicate and converge on an aesthetic or concept.
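
To make the earlier point about starting resolution concrete: Stable Diffusion is a latent diffusion model, and the initial noise is a tensor whose spatial size is the output resolution divided by eight, so a different resolution literally means a differently shaped starting noise pattern. A tiny sketch, assuming the standard 4-channel, 1/8-scale latent layout:

```python
# Illustrative only: how the initial latent noise depends on output resolution.
import torch

def initial_latents(width: int, height: int, seed: int = 42) -> torch.Tensor:
    gen = torch.Generator().manual_seed(seed)
    # Batch of 1, 4 latent channels, spatial dims are the pixel dims / 8.
    return torch.randn(1, 4, height // 8, width // 8, generator=gen)

print(initial_latents(512, 512).shape)  # torch.Size([1, 4, 64, 64])
print(initial_latents(768, 512).shape)  # torch.Size([1, 4, 64, 96])
```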
