
See my main account for my photography, videos, fractal images and more here: www.flickr.com/photos/josh-rokman/
This is one of a series of images I made with the Bing Image Creator, which is an AI image generator powered by DALL-E 3.
For most of these images I tried to combine multiple elements together into one, rather than creating a scene with multiple separate elements. This pushes the limits of what the AI model can do, and maximizes the amount of human control over the images. For example, I might use the text prompt ‘photo realistic snake plane made out of carbon fiber and gold’.
Here is my take on AI generated images vs. human made art: I think that the quality of AI generated images will NOT dramatically improve, even many years into the future. The ultimate goal of a creative image is to create a certain emotional state in the viewer. Emotions themselves are the main tool used to do this. A purely logic-driven machine can only create a crude, generalized model of something meant to create a certain emotional state. When someone makes a piece of art, it is always some combination of using logic and emotions to guide the process. Remember that the ultimate goal is to create a certain emotional state in the viewer. Only having access to logic, but not emotions, will always create a very generic looking work. You need to actually be able to feel emotions to fine-tune the work beyond that, since creating emotions is the ultimate goal.
The main difference between an AI model and a human is not the difference in the power of the logic that can be deployed. The difference is that a human can feel emotions, which is key to creating an image (or text) made to create a certain emotional state. The logic that the best AI image generator models currently have seems to already be at the level of what the best human can do (based on some of the results I got, which was quite a shock). The results are still crude and generic compared to what a human can do, because the AI models have no access to emotions, which are the main tool for making and refining a creative work designed to create certain emotional states.
All creative work is built with a combination of logic and emotions (emotions should always be the main tool), and by not having direct access to emotions, a machine can only create crude, generic results. When I make music I always try to use emotions rather than logic to guide the process as much as possible, since creating a certain emotional state in the listener is the ultimate goal. The best AI models have an amazing ability to use logic to mix two different styles of images together, since that is a logic-driven process. They can’t make those images from scratch, because that is an emotion-driven process: it is all about creating certain emotions in the viewer.
Imagine you were a chef trying to develop a new dish, but you were not allowed to taste the food at any point as you made it. Your ability to determine the correct amount of salt and other seasonings would be very crude and limited. There would always be the possibility of a disaster happening, since you could not add a bit of seasoning at a time and taste it, so you would have to just dump it all in at once. The same is true of an AI model that is ultimately trying to create emotional states using sophisticated logic, but without having any access to emotions to guide the process.
The results will always be very generic looking, with the occasional unexpected gruesome image being returned. The power of the logic I have seen in some of the images I have created is quite shocking, but the results are still crude and generic compared to what a human can do, since the AI model is trying to create emotional states without being able to actually feel emotions itself, which is vital to creating emotional states through an image (or text).
The AI models have an amazing ability to combine multiple types of images together into one, but they have no understanding of what the individual elements they are combining together truly are. Again, this is because the individual images are designed to create certain emotional states, and an AI model has no ability to feel emotions, meaning it has no ability to understand them.
Having said all that, there is a good chance that the ability of humans to customize AI generated images will keep improving, and this will allow the tool to produce highly creative works close to what a human could make from scratch. I don’t think that the AI models will ever be able to do this by themselves with a simple button push though, as I have argued above. Also, I would expect that the number of images that are generated with a button push will keep going up and up.
Now, does making good AI generated images take talent? I say the answer is yes, simply because these images are not all equal in quality. To create the best image in a set of 1,000 to 100,000,000 images, and to do so consistently, takes skill. You need to learn something from every single image you create about what the AI model does well and does not do well. You also need creative and artistic skills to come up with really good text prompts. The skills involved are similar to those involved in coming up with a great line of text, like in books, poems, speeches, scripts, etc. This is like abstract art. I know from experience that randomly applying different colors of paint never yields anything impressive. It takes a lot of talent to make good abstract art. When it comes to AI generated images, random combinations of words in text prompts will never create the results that highly targeted ones will.