
#7 - Rethinking Creative Practices: The Impact of Generative AI on Design, Design Thinking and Content Creation

The Impact of Generative AI on Design

đź‘‹ Welcome to the seventh issue of creative currents! Get ready to be inspired and informed as we dive into the latest trends, tips, and insights from the world of generative AI. Whether you're a seasoned product designer or just starting out, this newsletter is your go-to resource for all things around creating human-centered products which people ❤️

This special issue highlights the amazing work of Prof. Dr. Sebastian Löwe, Mahmoud Fazeli and Dr. Afsaneh Asaei - Enjoy 👌

1. Intro

The pace at which the field of AI in general, and Generative AI in particular, progresses is breathtaking. Prominent researchers and founders have already called for a pause in the training of large AI models. Still, almost every month a new astonishing AI model is released, with more capabilities and more features. The world of designers, design thinkers and content creators is shaken to the core. Until recently, most creatives had no idea what AI was or why they needed to pay attention to it. This changed rapidly and drastically. The notion that AI will not come for creative tasks or creativity –at least not in the near future– has been discarded for good. With the capabilities of Generative AI, everyone can now create illustrations, games, photos, websites, and marketing campaigns, write code, or ideate future products. Some creatives are threatened by Generative AI, some are creeped out, and some are excited – or all three at once.

Intelligent image and text generation tools are here to stay. They are not marketing hype, this much is clear. To enable designers, design thinkers and content creators to participate in the ongoing tectonic shift and reclaim a bit of their ability to reflect on and cope with these new methods and tools, this article sheds light on the world of Generative AI from a creative’s perspective. To do so, it answers the pressing question: What is the impact of Generative AI and its underlying AI models on design, design thinking and content creation, and how does it transform creative work? If language becomes the dominant paradigm for creation and traditional boundaries between creative domains are being dissolved, how does this change the way we need to look at design and content creation? For that, the article is divided into three parts. First, it takes a look at the so-called foundation models, such as large language models, which are the base for almost all Generative AI magic. Second, it explains the impact of large language models in general, and the new conversation-based UX paradigm of ChatGPT specifically, on creative work. And third, it takes a look at image-generating AI and reflects on the ramifications of these techniques for design and content creation – now and in the near future.

2. What Are Foundation Models?  

Have you ever imagined a machine that effortlessly creates art, transcends language barriers, and pushes the boundaries of human creativity? Meet foundation models, the groundbreaking AI technology that is revolutionizing industries across the board. From healthcare to education, entertainment to marketing, these models are transforming the way we interact with AI.

2.1 What Are Foundation Models and What Do They Create?

Foundation models represent a paradigm shift in AI technology. These machine learning models possess the remarkable ability to learn from vast, diverse datasets without explicit supervision, enabling them to tackle a multitude of tasks with astonishing accuracy. Language translation, text generation, image synthesis, and design creation are just a glimpse of what they can achieve. Imagine breathing life into words, transforming mere text descriptions into vibrant and mesmerizing visuals, or witnessing images manifest from the depths of imagination. It is all made possible by the ingenuity of foundation models.

Utilizing self-supervised learning and transfer learning techniques, these models embark on a transformative learning journey. They create their own labels or objectives from unlabeled data, unlocking the true potential of information hidden within. Transfer learning empowers them to adapt their knowledge from one task or domain to another with minimal fine-tuning. A model trained on natural language can effortlessly navigate the complexities of medical terminology. With a broad set of skills and the ability to apply their knowledge across diverse domains, foundation models save valuable time and resources while pushing the boundaries of human ingenuity.
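
To make the transfer-learning idea a bit more concrete, here is a minimal sketch in Python using the Hugging Face transformers library: a general-purpose pretrained language model receives a fresh classification head and is fine-tuned on a handful of domain-specific examples. The model name, example texts and labels are purely illustrative assumptions, not taken from the article.

```python
# Minimal transfer-learning sketch: adapt a pretrained language model to a new
# domain-specific classification task with only a small amount of labeled data.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # fresh head on top of pretrained weights
)

# A tiny toy dataset from the new domain (e.g. medical text) -- illustrative only
texts = ["Patient reports mild headache.", "No adverse reaction observed."]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# One fine-tuning step: only minimal task-specific data is needed because the
# pretrained encoder already carries general language knowledge.
outputs = model(**batch, labels=labels)
outputs.loss.backward()
optimizer.step()
```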

Examples of these marvels of technology include the renowned large language models (LLMs) like Google's BERT and OpenAI's GPT-n series, which are trained on colossal amounts of text data, enabling them to perform various natural language processing (NLP) tasks. Equally impressive are the multimodal models like OpenAI's DALL-E, capable of generating images from text or captions from images.

2.2 Bridging All Kinds of Media: Unlocking Creative Fields

Foundation models possess the remarkable ability to transcend media forms, seamlessly bridging the gaps between text, image, and voice. These transformative processes, known as text2design, image2design, and voice2design, have revolutionized creative fields such as art, illustration, logo design, and sketching. Through the fusion of different modalities, these models exhibit unrivaled accuracy and robustness, far surpassing their single-modal counterparts. Moreover, they undertake language identification, speech transcription, and speech translation, further expanding their potential.

Real-world applications of these capabilities span diverse industries, from healthcare to entertainment, from education to marketing. By bridging the chasms between different forms of media, foundation models enable innovation, collaboration, and, most importantly, the enhancement of human creativity and expression.

2.3 The Power of Synergy: Large Language Models and Diffusion Models Combined

Foundation models reach unprecedented heights when the prowess of large language models converges with the artistry of diffusion models. Large language models, such as GPT-4, weave together coherent text, while diffusion models bring synthesized images to life. This marriage of technologies yields specialized features for Generative AI, fostering enhanced creativity beyond imagination.

Witness the wonders of this combination through examples like Imagen, a text-to-image model employing T5 and cascaded diffusion to generate realistic images from natural language descriptions. Discover the magic of Latent Diffusion for Language Generation, where diffusion techniques enable language models to sample and decode continuous representations into text. And immerse yourself in the multisensory marvel of Meta ImageBind, a model seamlessly blending text, images, video, audio, depth, thermal and motion (IMU) data. Its capabilities span image captioning, video summarization, and audio generation, opening up a realm of possibilities.

The benefits of this synergy are profound. The outputs produced by these combined models exhibit superior quality, diversity, and consistency compared to their individual counterparts. The fusion of large language models and diffusion models enables these creations to transcend mere replication, capturing the essence of creativity and generating outputs that astound with their realism. Moreover, this fusion enhances data efficiency and generalization, allowing models to learn from fewer data points while maintaining their ability to adapt to new domains and tasks. Flexibility and controllability become their defining traits, empowering users with a spectrum of options to tailor the style, content, and attributes of the generated outputs.

2.4 Unveiling the Limitations

Despite their immense capabilities, foundation models face significant limitations that demand attention and mitigation. Firstly, their hunger for computation and energy renders them costly and detrimental to the environment. Training a model like GPT-3 reportedly emitted as much carbon as five cars do over their lifetimes, and came with a hefty price tag of around $12 million. Addressing these environmental concerns is vital for the sustainable advancement of foundation models.

Secondly, foundation models heavily rely on the data they are trained on, making them susceptible to biases, noise, incompleteness, and unrepresentativeness. The outputs they generate can perpetuate inaccuracies, amplify existing biases, or lead to misleading information. Additionally, these models lack contextual understanding and reasoning abilities, relying solely on statistical patterns rather than grasping the nuances and subtleties of human language and behavior. This limitation can result in nonsensical or inappropriate responses that fail to capture the true essence of human communication.

Furthermore, the potential for misuse and abuse of foundation models by malicious actors cannot be overlooked. From the creation of fake or harmful content, including deepfakes, to the manipulation of individuals through spamming, phishing, or impersonation, the ethical implications are profound. Safeguarding these models against attacks and ensuring responsible use becomes paramount to protect individuals and uphold societal trust.

Addressing these challenges demands rigorous evaluation, regulation, and oversight from researchers, developers, users, and policymakers alike. While foundation models hold immense potential for scientific and societal advancement, their ethical implications and responsible use must be at the forefront of our considerations.

2.5 Navigating Legal and Ethical Waters

Foundation models, with their vast pre-training on diverse internet data, present both exceptional performance and legal and ethical challenges. Instances akin to the Cambridge Analytica controversy have shed light on the unintentional generation of copyrighted or privacy-violating content. AI systems trained on potentially harmful content, including hate speech, conspiracy theories, fake news, and extremist propaganda, underscore the need for robust legal and ethical frameworks.

Reliability, relevance, and respect for human rights in training data become critical factors. Preventing the generation of inappropriate or harmful content, protecting intellectual property rights, and ensuring data privacy require robust measures. Public trust in AI adoption and development is at stake. Inadequate regulation or oversight of foundation models risks undermining the credibility and legitimacy of AI applications, creating ethical dilemmas for users and developers alike. The malicious use or manipulation of these models can pose significant threats to democracy, security, and social cohesion.

To navigate these challenges, comprehensive regulations and standards are imperative. Ethical guidelines, mechanisms for data quality assurance, frameworks for data governance and ownership, methods to detect and mitigate harmful content, transparency, and accountability measures for foundation models and their outputs are necessary steps. Legal protections for copyright infringement and privacy violation must be enforced to safeguard the rights of individuals and creators.

However, implementing these measures presents its own set of hurdles. Balancing data quality and quantity, ensuring compliance across jurisdictions, fostering collaboration among stakeholders, and adapting to the rapid evolution of foundation models pose significant challenges. Overcoming these obstacles necessitates ongoing research, dialogue, and collaboration among researchers, policymakers, practitioners, and civil society.

As we move forward, we find it crucial to delve deeper into the profound effects of these models, especially within the creative field. The upcoming sections will explore the impact of large language models on design, design thinking, and content creation. We will also examine the ramifications of image AI and the increasing convergence of text and image in the world of Generative AI. By shedding light on these aspects, we aim to provide a comprehensive understanding of how AI continues to transform the creative landscape, pushing us to rethink the boundaries of design and content creation.

3. The Effect of Large Language Models on Design, Design Thinking and Content Creation

One of the most interesting developments of the last few years is certainly the rise of large language models, or LLMs. They are capable of generating text by predicting the probability of the next word or part of a word (token). This way they are able to produce meaningful sentences in any language, which makes them highly interesting for content creation, design and design thinking.
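
To illustrate the "predict the next token" idea, here is a small sketch using the openly available GPT-2 model from the Hugging Face transformers library; it prints the most probable next tokens for a prompt and then samples a short continuation. GPT-2 is only a small stand-in for the much larger models discussed in this article, and the prompt is an assumption for illustration.

```python
# Minimal sketch of how an LLM generates text: it repeatedly predicts a
# probability distribution over the next token.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "A good user interface should"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Probability distribution over the token that follows the prompt
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id):>12s}  p={prob:.3f}")

# Sampling such predictions token by token yields whole sentences
sampled = model.generate(**inputs, max_new_tokens=20, do_sample=True)
print(tokenizer.decode(sampled[0]))
```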

3.1 Design

In the field of design, LLMs can assist along the entire design process, from brainstorming ideas and conducting user research to prototyping and implementation. They help produce design concepts and work on software code for websites or games. LLMs can come up with new prompts for image generation, short film scripts for digital film production, or game plots. They can help motion designers create storyboards, subtitles, and even outline animations. LLMs assist graphic designers in writing copy, translating clients’ requirements or researching market trends. They help UX designers research and analyze vast amounts of customer feedback, finding user needs in big data through data storytelling. They help translate user stories into actionable insights, define user flows and support persona creation. Fashion designers are supported by LLMs that give them insights into future trends, or that support their communication with clients and outline presentations. And LLMs assist product designers by interpreting technical requirements and translating them into design ideas. These are just a few of the manifold applications of LLMs that are already transforming the way designers approach their work.

The language models are mostly developed and trained by big tech companies, come in different forms and sizes, and have curious names. Google calls its LLMs PaLM, LaMDA and Minerva, Meta named them Galactica and LLaMA, and OpenAI dubbed its family GPT. Other models are called Claude, Dolly or Falcon.

One specific model has stood out since its initial release in 2022: ChatGPT. It is based on an LLM, but it is a bit different from all the other models up to that point. ChatGPT changed the way users could have meaningful interactions with these text-generating machines by changing the interaction paradigm of LLMs. Users were now able to converse with the model and tell it exactly what they wanted in a very intuitive way.

The question now is: What lessons can designers learn from the ChatGPT case in terms of creating meaningful intelligent user experiences?

When ChatGPT came out in autumn 2022, the famous AI researcher and Meta employee Yann LeCun said that the technology behind it was not revolutionary. He was absolutely right, since ChatGPT was based on the LLM GPT-3, released by OpenAI researchers almost two years earlier. LeCun went on to call it not particularly innovative either. This is where he failed to grasp what ChatGPT did differently from all previous LLMs and what it meant for the intelligent user experience, or intelligence experience (IX). Contrary to his assessment, the model set off nothing short of a small revolution in Generative AI. The interaction paradigm shifted fundamentally with ChatGPT and allowed for a completely new and exciting IX. The major difference to all the other LLMs was that researchers aligned ChatGPT far better with the user’s intentions and needs. Despite still generating false information, so-called hallucinations, it delivered a dramatically better IX and laid the foundation for its huge success.

The reason why ChatGPT was so different in terms of IX was the combination of two already existing machine learning techniques. The LLM was trained in a self-supervised manner, like its peers. What revolutionized the IX was the idea to bring reinforcement learning into play. The general idea was to train an additional reward model that evaluates how good the content generated by the LLM is – from a human’s perspective and with the help of human trainers. In a second step, this reward model was then used to fine-tune the original language model via reinforcement learning. Researchers at OpenAI call this technique reinforcement learning from human feedback, or RLHF for short.
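
The following toy sketch illustrates only the first RLHF ingredient, the reward model: it learns from pairs of human-ranked responses to score the preferred answer higher, using the standard pairwise (Bradley-Terry) loss. The random embeddings are stand-ins for real response representations; in practice the reward model is itself a large transformer trained on human ratings, and a second reinforcement-learning step then fine-tunes the language model against it.

```python
# Toy sketch of the reward-modeling step in RLHF: learn to give human-preferred
# responses a higher scalar reward than rejected ones.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, response_embedding: torch.Tensor) -> torch.Tensor:
        return self.scorer(response_embedding).squeeze(-1)  # scalar reward per response

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

# Each training pair: embedding of the human-preferred response vs. the rejected one
# (random stand-ins here; real pairs come from human labelers ranking model outputs).
chosen = torch.randn(32, 64)
rejected = torch.randn(32, 64)

for _ in range(100):
    # Pairwise loss: push reward(chosen) above reward(rejected)
    loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# In the second RLHF step, this reward signal guides a reinforcement-learning
# algorithm (e.g. PPO) that fine-tunes the original language model.
```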

The RLHF-trained models then provided answers that were aligned with human intentions, needs and values, generated better responses and rejected inappropriate prompts. Users were also able to have a back and forth with ChatGPT for the first time, refining the model’s results iteratively. The interface itself allowed for an easy and intuitive interaction in natural language. Altogether, this resulted in an IX that resembled a meaningful conversation among co-workers more than the clunky interaction with GPT-3. New ChatGPT plugins that provide live access to the internet or to proprietary and secure data allow for an even better IX and new use cases.

The lesson for designers here is interesting: The UX is –still, and even more so in the age of intelligent products– a major factor for a product’s success. This is true for all machine learning applications. When it comes to great IX, machine learning models need to be closely aligned with users’ expectations, needs, and values. Teaching models to learn and then evaluate user intentions, needs, and values is key for the success of intelligent products. After all, this is where these machine learning models excel.

3.2 Design Thinking

The landscape of LLM applications and other advanced AI capabilities can also practically enhance our user-centric product innovation process. In this section, we focus on the key aspects of integrating AI into Design Thinking. We refer to our extended creative process as AI Design Thinking.

3.2.A Research

The initial stage of design thinking revolves around comprehending the user and empathizing with their pain points. Both Generative AI and traditional machine learning techniques have contributed to the advancement of social listening and monitoring of user behaviors. These innovations allow us to identify social patterns and analyze the prevalent language and interconnected incidents within social interactions. Consequently, AI-augmented social listening opens up new avenues for analyzing user-centric hypotheses with enhanced levels of analytics and data-driven insights.

For example, we now have the possibility to analyze a broad spectrum of emotions that are expressed in social interactions. The application of an emotional intelligence dashboard and sentiment processing algorithms introduces a more extensive awareness of emotional expressions, and their dependence on context, into the user journey.
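
As a minimal sketch of such sentiment processing, the snippet below classifies the emotion expressed in a few hypothetical user posts with an off-the-shelf classifier from the Hugging Face Hub. The specific model id and the example feedback are assumptions for illustration, not part of the workflow described above.

```python
# Sketch of AI-augmented social listening: classify the emotion expressed in
# user feedback snippets. Any emotion or sentiment classifier from the Hub
# would serve the same purpose; this model id is an assumption.
from transformers import pipeline

emotion_classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
)

feedback = [
    "I love how fast the new onboarding is!",
    "I'm frustrated that I can't find the export button.",
]

for post, result in zip(feedback, emotion_classifier(feedback)):
    print(f"{result['label']:>10s} ({result['score']:.2f})  {post}")
```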

Another prominent approach to advancing user research is through Generative AI prompt engineering. By instructing the LLM to adopt specific roles and tones of language, we can leverage its capabilities to design interview questionnaires and simulate user interviews. It is also beneficial for categorizing the problems encountered throughout user research. Thought processes and mental models can be extracted using Generative AI and fine-tuned by providing a few examples for one- to few-shot learning, as sketched below.
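
A hedged example of this kind of prompt engineering: the LLM is assigned the role of a UX researcher and given two labeled examples (few-shot) before categorizing a new user quote. The category scheme and quotes are made up, and the call assumes the pre-1.0 openai Python client (with the OPENAI_API_KEY environment variable set); any chat-capable LLM client would work the same way.

```python
# Few-shot prompt sketch for user-research pain-point categorization.
import openai  # pre-1.0 client; reads OPENAI_API_KEY from the environment

messages = [
    {"role": "system",
     "content": "You are a UX researcher. Categorize each user quote into one "
                "pain-point category: 'navigation', 'performance', or 'trust'."},
    # Few-shot examples establish the expected format and granularity
    {"role": "user", "content": "Quote: 'I never know which menu hides the settings.'"},
    {"role": "assistant", "content": "Category: navigation"},
    {"role": "user", "content": "Quote: 'The dashboard takes ages to load on my phone.'"},
    {"role": "assistant", "content": "Category: performance"},
    # The new quote to categorize
    {"role": "user", "content": "Quote: 'I am not sure what happens with my data.'"},
]

response = openai.ChatCompletion.create(model="gpt-3.5-turbo", messages=messages)
print(response.choices[0].message.content)  # expected: "Category: trust"
```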

Generative AI is also highly effective in extracting information from complex manuscripts. We can leverage this capability by using it for reverse engineering scientific publications on AI across different domains. Generative AI can extract, or reverse engineer, the specific pain points that the advanced methodologies aim to address. This process of reverse engineering provides valuable insights into user challenges and empowers our teams with additional insights derived from expert community publications. The pain points are then considered for validation through user interviews.

3.2.B Ideation 

The ideation phase of Design Thinking involves an intensive brainstorming process. One crucial aspect to consider is the effectiveness of the ideation session format for different team members within a product innovation team.

At Digital Product School, we have observed a distinction between technical or engineering disciplines and non-engineering disciplines. Engineering schools have a strong focus on left-brain activities, while design schools focus on nurturing right-brain skills. The different disciplines contribute complementary, often contrasting patterns of thinking and problem-solving to the team. Consequently, a multidisciplinary team consisting of individuals from both backgrounds may have varying comfort zones for their mental power and creative contributions.

In addition to conducting traditional ideation sessions, we have pioneered a unique method called AI-deation. This approach involves exposing the teams to the landscape of AI capabilities and leveraging this direct exposure to technology to stimulate the reflection process and form connections in addressing the user pains.

Consequently, we assert that multidisciplinary innovation teams working with AI require a bidirectional ideation process. In one direction, we encourage reflecting on the user pain points and brainstorming potential solutions. In the other direction, we encourage reflecting on the AI capabilities and brainstorming the pain points that AI can address. We organize specific AI-deation workshops with this format in collaboration with our experience partner DieProduktMacher.

We have observed that AI-deation workshops always lead to out-of-the-box ideas complementary to the classic design thinking brainstorming sessions. Therefore, a bidirectional ideation process empowers the innovation teams to better harness their team’s full potential for discovering user-centric products with AI.

3.2.C Prototyping

Once the team has formulated the initial concept for their product, the subsequent step is low-fidelity prototyping. There is a growing selection of tools available that expedite the process of creating low-fi prototypes for user testing. For example, the list of resources available on the website Design und KI is a good starting point. However, one crucial lesson we have learned from collaborating with numerous design thinking teams at Digital Product School is the necessity of adopting a new prototyping mindset.

The use of AI in products introduces a new dimension to user interactions that is dynamic, context-aware, and partially unpredictable. The conventional repertoire of a designer’s tools and methods does not adequately encompass the stochastic and dynamic behavior of AI solutions. As a result, the rapid prototyping phase necessitates the development and launch of coded prototypes. To further emphasize the need for design thinking teams to adopt engineering practices, we sometimes also refer to our AI Design Thinking process as AI Design Engineering.

Fortunately, there are many tools and platforms that facilitate the rapid prototyping of AI solutions. These tools contribute to the design and development of a realistic user experience. Following the lean start-up approach, the teams formulate their technical hypotheses and work in agile iterations of Build-Measure-Learn to test those hypotheses. It is an imperative task of the AI engineers to actively learn from the user feedback and adapt the AI system to reach –and maintain– the user acceptance criteria.

The Hugging Face platform is one of our favorite resources for AI rapid prototyping. We support the engineering teams in standing on the shoulders of the open-source community and contributing to responsible AI solutions.
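
As an illustration of such a coded prototype, the sketch below wraps an open-source summarization model from the Hugging Face Hub in a small Gradio web interface that can be shared with test users. The chosen model and the interface copy are assumptions for demonstration purposes, not a description of any specific Digital Product School prototype.

```python
# Minimal coded prototype: an open-source summarizer behind a shareable web UI.
import gradio as gr
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

def summarize(text: str) -> str:
    # Keep the prototype honest about its limits instead of hiding them
    if len(text.split()) < 30:
        return "Please paste a longer text (at least ~30 words)."
    result = summarizer(text, max_length=80, min_length=20, do_sample=False)
    return result[0]["summary_text"]

demo = gr.Interface(
    fn=summarize,
    inputs=gr.Textbox(lines=10, label="Paste user-research notes or an article"),
    outputs=gr.Textbox(label="AI summary"),
    title="Low-fi AI prototype",
)

if __name__ == "__main__":
    demo.launch()  # share=True would create a temporary public link for user tests
```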

3.2.D User Testing and Iteration

The user experience (UX) of AI and machine learning introduces a new chapter in multidisciplinary knowledge and best practices required for designing and developing AI systems. It is crucial to consider an extended set of design principles with a focus on ethics, forgiveness, and continuous learning based on user interactions. While this topic goes beyond the scope of this article, we want to emphasize the significance of taking ownership of the AI UX role within the design thinking process. This ownership helps a team to navigate the complexities of AI design and development. It ensures that the systems are ethically sound, inclusive, and user-friendly. At Digital Product School, we organize a series of AIxD workshops to establish a disciplined collaboration between designers and AI engineers. To get deeper insights on some of our workshops relevant to the AI Design Thinking process, we created a Miro template.

Furthermore, our internal team of AI Makerspace at Digital Product School accelerates the adoption of new technologies in AI Design Thinking. We apply the principles of lean startup for a lean approach to innovate with AI. We educate and mentor our teams for rapid prototyping, lean AI engineering, explainable AI (XAI), ethical AI, as well as the user experience of AI and machine learning (AI UX).

3.3 Content Creation

In the world of content creation, where creativity and productivity intertwine, LLMs have emerged as powerful allies. These AI systems possess the ability to generate captivating content, overcome creative blocks, and provide personalized experiences for users. In this section, we explore the remarkable capabilities of LLMs in enhancing creativity, automating content production, facilitating content localization, and delivering personalized experiences.

3.3.A LLMs Unleashing Creativity: Empowering Content Creators

LLMs act as catalysts for creativity, empowering writers and content creators to surpass their limitations. By leveraging LLMs, creators can effortlessly generate diverse and high-quality content from simple prompts. From captivating introductions to coherent narratives, LLMs lend their expertise across various domains, such as blog posts, product descriptions, stories, and more. Writers can explore different perspectives, styles, and tones, avoiding plagiarism and repetition, while benefiting from valuable feedback and suggestions. The infusion of LLMs into the creative process significantly enhances productivity and unleashes the potential of human imagination.

3.3.B Automating Content Production: The Power of LLMs

With the advent of LLMs like ChatGPT, Narrato, and Copy.ai, content production undergoes a revolution. These AI systems automate the generation of natural language content, offering unprecedented speed, creativity, and quality. Content creators can save valuable resources and meet demanding deadlines, all while generating an array of diverse ideas. However, ethical considerations arise as LLMs rely on the data they are trained on and may inadvertently produce misleading or inappropriate content. Human creativity and judgment remain indispensable in ensuring the accuracy and appropriateness of AI-generated content before it reaches the public eye.

3.3.C Breaking Language Barriers: LLMs in Content Localization

Content localization, the art of adapting content to different languages, cultures, and markets, gains newfound efficiency through the aid of LLMs. These AI systems, honed by extensive language training, effortlessly translate content into multiple languages, breaking language barriers and fostering inclusivity. For instance, GPT-3 showcases its versatility by translating conventional language into formal computer code or generating query-specific code for SQL processing. While LLMs play a pivotal role in content localization, they may struggle to capture nuanced context and tone. Human input and review remain indispensable for ensuring the utmost accuracy and cultural sensitivity in localized content.
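
A hedged illustration of the natural-language-to-SQL capability mentioned above: the prompt hands the model a made-up table schema plus a plain-language request, and a typical completion is shown as a comment. The schema, model name and client call are all assumptions, not taken from the article.

```python
# Illustrative prompt for natural-language-to-SQL generation.
prompt = """Table orders(id, customer_id, amount, created_at)
Table customers(id, name, country)

Write a SQL query that returns the total order amount per country in 2023."""

# With any text-completion LLM client, e.g. the pre-1.0 openai package:
# import openai
# sql = openai.Completion.create(model="gpt-3.5-turbo-instruct",
#                                prompt=prompt, max_tokens=120).choices[0].text
#
# A typical completion:
# SELECT c.country, SUM(o.amount) AS total_amount
# FROM orders o JOIN customers c ON o.customer_id = c.id
# WHERE o.created_at >= '2023-01-01' AND o.created_at < '2024-01-01'
# GROUP BY c.country;
print(prompt)
```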

3.3.D Tailored Experiences: The Personalization Potential of LLMs

In the realm of marketing and e-commerce, LLMs emerge as powerful engines of personalization. By analyzing user behavior, preferences, and history, these AI systems curate tailored product recommendations, news feeds, and advertisements, enhancing user engagement and satisfaction. LLMs automate content creation, saving time and resources. However, ethical, legal, and quality considerations loom large. Responsible use of LLMs entails upholding user privacy and consent, ensuring that personalization serves as a means to enhance user experiences rather than infringe upon individual rights.

In harnessing the potential of LLMs, content creators navigate a landscape of immense opportunity and responsibility. By embracing the fusion of human creativity and AI ingenuity, they can unlock new frontiers in content creation, personalization, and global accessibility. The journey towards a future where AI and human expression harmoniously coexist beckons, urging us to tread carefully and harness the transformative power of LLMs responsibly.

4. Consequences of Image Generation for Design and Content Creation

Generative AI does not only consist of text creation; it also entails image generation. The field is growing exponentially, and what seemed far out and impossible a couple of years ago is now becoming the new industry standard. Design and content creation are especially affected by this development, which is transforming the way creatives are able to create cross-media content.

4.1 Design

Just a few years ago, image generation models weren't as powerful as they are today and existed in their own little silos. They needed to be trained on very specific data, which allowed them to generate images only within the domain of their limited training data, such as faces or horses or cats. If creatives wanted to change the style, color or appearance of a certain image content, they needed to use a differently trained model.

This limitation came to an end when researchers combined LLMs with image generation models. By tying LLMs with diffusion models, text and image were bound together, and machine learning models learned that a written text represented a visual output. The diffusion models were then able to generate a new image based completely on the written text input. This way, image generation wasn’t limited to a certain content-related domain anymore. Now, virtually any content in any style could be produced with a single line of text. Hence, more creative freedom and capabilities were given to anyone who could write prompts. At the same time, a whole new paradigm for image generation was introduced by models such as DALL-E, Stable Diffusion or Midjourney. Combining image generation with LLMs meant that language became the dominant paradigm for generating visual artifacts. Interestingly, this led to a fundamental erosion of traditional boundaries between image and text creation. Now, writing text prompts is tantamount to creating visual output. This makes diffusion models intuitive tools for rapid imagination augmenting creators’ abilities in a fundamental and unforeseen way.
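
For a concrete sense of this paradigm, here is a minimal text-to-image sketch using the open-source diffusers library with one publicly available Stable Diffusion checkpoint; the prompt is illustrative, and a GPU is assumed for reasonable generation speed.

```python
# Minimal text-to-image sketch: one line of text becomes an image.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "isometric illustration of a cozy home office, soft morning light, pastel colors"
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("office.png")
```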

The latest developments in image creation erode traditional boundaries even further. Before LLM-driven diffusion models entered the scene, boundaries existed in image creation between different domains or modes of representation. This meant that content, style or color were separate entities from the point of view of an image creation model – even though designers traditionally don't see them as separate, since together they shape the appeal of an image. Designers could, for instance, change the style of an image, but not at the same time, with the same model, manipulate the content itself, be it the entire image or just specific regions or segments. Nor could they change the image’s color with a style transfer model: depending on the so-called embedded image properties, the style transfer model itself chose what it took from the original image and transferred to the new one.

With a whole new generation of diffusion models, such as Composer, these boundaries vanish almost entirely. Now all domains of representation within an image –i.e. style, fidelity, dimensionality, depth, color palette, intensity, image region, poses, semantic properties and embedded visual image properties– can be accessed and manipulated at once. For instance, a hand-drawn sketch of a person can be transformed into a high-fidelity image of that person. The gender or age of that person can subsequently be changed. After this, the person can be turned into a Vermeer painting. Parts of the painting, like the face, can be replaced, and the person can be given a cat face in the style of Vermeer. After all this, the silhouette of the person can be used to create a background completely different from Vermeer’s style. Lastly, the entire image can be transformed into a 3D rendering – all with the same text-to-image generation model.

Composer model for almost limitless image manipulation

This means that the boundaries that previously existed between the domains of representation vanish, giving creatives even more freedom. Since an image can now be generated at every state of representation, it allows creatives to walk seamlessly through the latent space of potential images. This has fundamental ramifications not only for the creative process, but also for the expertise of designers. Traditional boundaries between design disciplines, such as graphic design, illustration, web design, or even digital film design and game design make only limited sense from now on. Everyone can simply create content that is at least as professionally designed as material from a reasonably gifted designer of a certain discipline.

The power of diffusion models raises serious legal and ethical concerns. As for the legal aspect, it is still unclear how these models fit into existing law. Whether or not existing copyright is enforceable is still an open question, most likely to be decided by US and EU courts. Artists have sued Stability AI, the company behind Stable Diffusion, for creating unauthorized copies of their works. In arguing that the diffusion model is practically a copy-and-collage machine, the artists unfortunately missed a crucial point: the machine does not copy images but learns their representation by storing weights that can then produce completely new images resembling the training data. However flawed that argument may be, the threat these models pose to the livelihood of creatives is real. With models like DreamBooth, people can use transfer learning to train diffusion models on only a handful of images to produce outcomes in a certain proprietary style. The artists involved don't even have to give their consent, which makes it an ethically questionable endeavor. Currently there is pushback from creatives who want at least to be compensated for involuntarily providing training data. The outcome of these legal and ethical disputes will most likely determine the future of creative machines in the long run.

4.2 Content Creation

The realm of content creation is not limited to text alone. With the advent of AI-powered image and video generation, a new dimension of creative possibilities emerges. In this section, we delve into the capabilities of AI in generating captivating visuals, enhancing image and video quality, enabling visual storytelling, and facilitating multimedia content creation.

4.2.A Image and Video Generation: Unleashing AI's Creative Potential

Through techniques like Generative Adversarial Networks (GANs) and transformer models, AI can now generate novel and realistic visual content from text descriptions or random noise. Innovations such as DALL-E, StyleGAN, and CLIP have brought forth diverse and high-quality visual creations. From art and entertainment to education and medicine, AI-driven image and video generation pave the way for new forms of expression and immersive experiences. However, the ethical challenges surrounding technologies like deepfakes necessitate the establishment of regulations and standards to ensure responsible and ethical use.

4.2.B Enhancing Visual Quality: AI's Power to Transform Images and Videos

AI significantly elevates image and video quality through various enhancement processes. By increasing resolution, colorizing grayscale images, stylizing visuals, and stabilizing shaky videos, AI-powered tools like VanceAI Image Enhancer, Colourise.sg, DeepArt.io, and Microsoft Hyperlapse Pro deliver superior results compared to manual editing. These advancements save time, reduce costs, and unlock the true potential of visual content.

4.2.C Visual Storytelling: AI as a Co-Creator of Immersive Narratives

Generative AI plays a pivotal role in visual storytelling by seamlessly aligning images or video sequences with text narratives. Whether it's creating comics, storyboards, or animated sequences based on a script, AI-driven models offer a canvas for imagination to flourish. Examples include open-ended visual storytelling models that generate coherent image sequences based on given storylines and CodeToon, a comic authoring tool that harmonizes storytelling and coding. AI's ability to enhance emotional impact, such as identifying the perfect musical score or visual image, has far-reaching implications for advertising, entertainment, and education.

4.2.D Multimedia Content Creation: Uniting Text, Images, Sound, and Video

Generative AI transcends media boundaries, facilitating multimedia content creation that seamlessly integrates text, images, sound, and video. It empowers the generation of visual content from text, enriching articles and blogs. It enables video generation from text for marketing or educational purposes. By synthesizing audio from images or videos, AI creates immersive auditory experiences. Additionally, AI contributes to the creation of VR/AR content, revolutionizing interactive gaming and education. These AI applications elevate content quality, provide creative suggestions, and shape the future of multimedia expression.

As we explore the endless possibilities offered by AI in visual content creation, we must navigate the ethical considerations and establish responsible frameworks. By embracing AI's creative potential, content creators can unlock new dimensions of expression and captivate audiences across various industries. The marriage of human ingenuity and AI's artistic prowess heralds a future where visual storytelling knows no bounds.

5. The next level: ChatGPT and Image Creation Converge

It is undeniable that each of the two kinds of Generative AI foundation models – ChatGPT and LLM-based diffusion models – has a huge impact on its own. Recently, these two have been combined to create an even more powerful meta model. With Visual ChatGPT, creatives are now able to interact with a variety of different image creation and visual foundation models through a prompt manager. The prompt manager connects ChatGPT with the various image models, such as Stable Diffusion, ControlNet, BLIP or Pix2pix.

Visual ChatGPT brings together image creation and the chat interface

With Visual ChatGPT, image reasoning and image creation work in unison against the backdrop of the natural-conversation interaction paradigm. Creatives may, for instance, ask Visual ChatGPT what the background style or color of a given image looks like and then decide to create a prompt that changes it. They can ask the model to replace, retouch or transform images or parts of images, depending on image depth or segments. In contrast to Composer, where only one model was used, creatives can now access 22 different models in a seamless conversation. Whereas ChatGPT is limited in its ability to process visual information, Visual ChatGPT is capable of interpreting images and reasoning about them. The meta model has an algorithmic understanding of what is depicted in an image and what physical concepts it represents. For example, if an image shows a balloon tied to a branch, you can ask Visual ChatGPT what will happen if someone cuts the thread. It will answer that the balloon flies away.
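
The prompt-manager idea can be sketched as a simple tool-dispatch pattern: a registry of visual tools and a dispatcher that routes each user request to the appropriate one. The sketch below is a toy illustration with hypothetical placeholder functions and a keyword heuristic standing in for the LLM's actual tool-selection step; it is not the Visual ChatGPT implementation.

```python
# Toy sketch of a prompt manager: register visual tools, then route requests.
from typing import Callable, Dict

TOOLS: Dict[str, Callable[[str], str]] = {}

def register_tool(name: str):
    def wrapper(fn: Callable[[str], str]):
        TOOLS[name] = fn
        return fn
    return wrapper

@register_tool("generate_image")
def generate_image(prompt: str) -> str:
    return f"[image generated from prompt: {prompt!r}]"  # stand-in for a diffusion model

@register_tool("edit_image")
def edit_image(instruction: str) -> str:
    return f"[image edited: {instruction!r}]"  # stand-in for Pix2pix / ControlNet

def prompt_manager(user_request: str) -> str:
    # In Visual ChatGPT the LLM itself decides which tool to invoke;
    # here a simple keyword heuristic stands in for that decision.
    wants_edit = any(w in user_request.lower() for w in ("replace", "change", "edit"))
    tool = "edit_image" if wants_edit else "generate_image"
    return TOOLS[tool](user_request)

print(prompt_manager("a watercolor sketch of a lighthouse at dusk"))
print(prompt_manager("replace the background with a starry sky"))
```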

This makes the creative process even more flexible and grants even more freedom to creatives. Visual ChatGPT’s architecture is truly multi-modal, allowing new models for –let’s say– voice synthesis or video generation to be integrated later on. And it seems that with Visual ChatGPT the traditional design process contracts in a way. Whereas creatives previously needed different tools, skills and mindsets along the creation process, from research to exploration to generation to prototyping, they can now use one single tool to fulfill most of their creation needs – even skipping certain stages in the design process.

6. Conclusion

What does this all mean for creative labor and what will the future of creation look like? What we see with foundation models and their meta-combinations is certainly an augmentation of creativity for all kinds of creative disciplines. With the new generation of powerful tools, the traditional borders between former creative domains are being further eroded. If having an idea and being able to put it into words means you can create high-fidelity images and texts, traditional expert domains become permeable. This has profound ramifications for creative work. Expert knowledge of execution is no longer a prerequisite for working in a domain. Now, marketing managers can do part of the job that designers previously did, and designers can do part of the job copywriters were needed for. In the digital gig economy of the last decade, creatives had to compete on an international level with peers of their creative domain. Now they will need to compete with all sorts of creatives and content creators outside of their domain as well.

As this happens, strategic and managerial components of creative work become more critical. Hence creatives should adopt an extended creative process such as AI Design Thinking to leverage the full potential of the innovation teams. Furthermore, knowing your customers, their journeys and how to strategically align new content to their needs, will set creatives apart from their peers. Creatives will necessarily have to become more strategic thinkers and managers of design processes, instead of just executors.

The near future is almost certainly both multimodal and multi-model, with language as the dominant paradigm for creation. With even more powerful models such as consistency models foreshadowing even more rapid image creation, we will most likely see the rise of more customized user content. If images can be generated in an instant, design can truly be just created in and for the individual moment of use. Connecting user preferences with image creation will be the next big thing for Generative AI. This of course will create an even more powerful experience bubble for users, giving creatives also the chance to critically design new human-centered experiences.

Struggling with a design challenge? Let’s jump on a call

Share the Creative Currents Newsletter with at least five friends in your network and book your Free Consultation Session with a design expert 👉 [email protected]

Support the Creative Currents newsletter

  • Forward it to your product design friends and recommend them to

    👉 Subscribe 

  • Have product design related topics like events, newsletters, tools, jobs, articles or anything you'd like to share with our subscribers?

  • Sponsor the next edition and become part of the network