Monkeying with DALL-E

Generative Art Storytelling

Can there be a movie or a comic book with AI-generated characters, sets & plots? That is getting closer to being possible, so let’s get a preview.

AI is all the rage. Automatically generating stuff with artificially intelligent systems is the trend. One subset of this is image creation from text input. Can we use it to create picture stories? Let us try with DALL-E.

Wikipedia: DALL-E is an artificial intelligence program that creates images from textual descriptions. It uses a 12-billion parameter version of the GPT-3 Transformer model to interpret natural language inputs and generate corresponding images.

We use this no-code version hosted on Hugging Face Spaces:

Today’s AI systems are pretty much a black box that anyone can use without coding knowledge. By just providing the right text prompts we can create images automatically. Let us pursue the idea of giving a series of text prompts that flow like a story.

Since we are monkeying around, let us prompt our AI with “two monkeys in a street”.

Image by Author

In approximately 30 seconds, the picture you see on the right emerges. You can see the whole process: you give your input, DALL-E processes it, and it outputs the image. That’s it.

Image by Author

Oh, what changed? It is the same sentence, but I added a few words: “riding cycle in a night”. Now we have a new image with cycles, and the light has changed from day to night. The AI understands and interprets the new words.

Image by Author

Let’s add more light. Adding “bustling” does that. You get the drift: by systematically changing the words in a text prompt, you can get a series of images that all relate to the core plot.

Boom!!! Here you go. Monkey night-out: A short story

Image by Author

What just happened? The process illustrated earlier is repeated with a few words changed each time, and we have this short story.

Image by Author

You can see all the prompts in the diagram above. Every time a few words are changed, the image changes, and a short comic strip is created. The concept is very similar to flipbook animation, where subtle changes across a series of images, viewed in quick succession, create the impression of motion. Here we use subtle changes in text instead of images… a text flipbook.
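For the curious, here is what the text flipbook could look like if you wanted to build the prompts in code rather than typing them by hand. Treat it as a rough Python sketch only: the helper names (build_prompt, flipbook) and the split of a prompt into subject, action, setting and time are assumptions made for illustration, and the exact prompts in the diagram may be worded differently. It only builds the prompt strings; each one would still be pasted into the DALL-E demo.

```python
# A hypothetical "text flipbook": each frame is the previous prompt
# with a few words changed. The slot names are illustrative only.

def build_prompt(subject, action="", setting="", time=""):
    """Join the non-empty parts of a prompt with single spaces."""
    parts = [subject, action, setting, time]
    return " ".join(part for part in parts if part)


def flipbook(base, changes):
    """Return one prompt per frame, applying each change on top of the last."""
    state = dict(base)                      # copy so the caller's dict is untouched
    frames = [build_prompt(**state)]        # frame 1 is the base prompt
    for change in changes:
        state.update(change)                # change only a few words
        frames.append(build_prompt(**state))
    return frames


base = {"subject": "two monkeys", "setting": "in a street"}
changes = [
    {"action": "riding cycle", "time": "in a night"},  # add cycles and night
    {"setting": "in a bustling street"},               # add more light
]

frames = flipbook(base, changes)
for number, prompt in enumerate(frames, start=1):
    print(f"Frame {number}: {prompt}")
```

Because every frame starts from the previous one, the characters and the setting stay put unless you change them on purpose, which is what keeps the strip reading like one story.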

As we move into the future there are a lot of possibilities.

Can we carry the same person or object through the whole plot? Yes, and here is an example. The same guitar image is modified by giving an image prompt in addition to a text description. Similarly, the same characters can be used across the script by feeding them in as a prompt.

Pic Courtesy:

Can we improve the quality of the images? It is improving as we speak. (Check DALL-E 2 from OpenAI)

Can we generate text prompts programmatically & autogenerate coherent narratives? Yes. Full reports are already being autogenerated using NLP today, & the technology is improving with larger language models. (Check GPT-3, Chinchilla, etc.)
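To make the idea of programmatic prompts concrete, here is a toy Python sketch. It is not GPT-3 or any real language model: it just picks words from small hand-written lists (all of them invented for this example) while keeping the character fixed and moving the time of day forward, so the prompts hang together as a sequence. The function name generate_story_prompts is likewise hypothetical.

```python
import random

# Hand-written vocab lists, invented for illustration only.
ACTIONS = ["walking", "riding cycle", "eating bananas", "dancing"]
SETTINGS = ["in a street", "in a bustling street", "in a park", "on a rooftop"]
TIMES = ["in the morning", "at noon", "in the evening", "at night"]


def generate_story_prompts(character, seed=0):
    """Return one prompt per time of day, with a fixed character.

    The character never changes and time always moves forward, which
    gives the sequence a minimal kind of coherence. Action and setting
    are picked at random from the lists above.
    """
    rng = random.Random(seed)  # seeded so the same story can be regenerated
    prompts = []
    for time in TIMES:
        action = rng.choice(ACTIONS)
        setting = rng.choice(SETTINGS)
        prompts.append(f"{character} {action} {setting} {time}")
    return prompts


for prompt in generate_story_prompts("two monkeys", seed=1):
    print(prompt)
```

A large language model would replace the random picks with text that actually follows a plot, but the shape of the output is the same: an ordered list of prompts, one per frame.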

Can we make it cartoonish? (Check Cartoon GAN)

Can we change clothes? (Check Fashion GAN)

& so on… All these technologies are evolving, maturing & converging.

AI-generated plots, movies, and comics are on the way. Buckle up!!!

