Welcome to The Long View—where we peruse the news of the week and strip it to the essentials. Let’s work out what really matters.
This Week: A Stable Diffusion Special
Unless you’ve been living under a rock for the past week, you’ll have seen something about Stable Diffusion. It’s the new open source machine learning model for creating images from text and even other images.
Analysis: Open source is the key
Like DALL-E and Midjourney, you give it a textual “prompt” and it generates amazing images (or sometimes utter garbage). Unlike those other models, it’s open source, so we’re already seeing an explosion of innovation.
Mark Hachman calls it ‘The new killer app’
“Fine-tune your algorithmic art”
AI art is fascinating. Enter a prompt, and the algorithm will generate an image to your specifications. Generally, this all takes place on the Web, with algorithms like DALL-E. [But] Stability.Ai and its Stable Diffusion model broke that mold … with a model that is publicly available and can run on consumer GPUs.
For now, Stability.Ai recommends that you have a GPU with at least 6.9GB of video RAM. Unfortunately, only Nvidia GPUs are currently supported. [But] if you own a powerful PC, you can take all the time you’d like to fine-tune your algorithmic art and come up with something truly impressive.
From the horse’s mouth, it’s Emad Mostaque: Stable Diffusion Public Release
“Use this in an ethical, moral and legal manner”
It is our pleasure to announce the public release of Stable Diffusion. … Over the last few weeks we all have been overwhelmed by the response and have been working hard to ensure a safe and ethical release, incorporating data from our beta model tests and community for the developers to act on.
As these models were trained on image-text pairs from a broad internet scrape, the model may reproduce some societal biases and produce unsafe content, so open mitigation strategies as well as an open discussion about those biases can bring everyone to this conversation. … We hope everyone will use this in an ethical, moral and legal manner and contribute both to the community and discourse around it.
Yeah, right. Have you ever been on the Internet? Kyle Wiggers sounds worried: Deepfakes for all
“90% are of women”
Stable Diffusion … is now in use by art generator services like Artbreeder, Pixelz.ai and more. But the model’s unfiltered nature means not all the use has been completely above board.
Other AI art-generating systems, like OpenAI’s DALL-E 2, have implemented strict filters for pornographic material. … Moreover, many don’t have the ability to create art of public figures. … Women, unfortunately, are most likely by far to be the victims of this. A study carried out in 2019 revealed that, of the 90% to 95% of deepfakes that are non-consensual, about 90% are of women.
Why is it such a big deal? Just ask Simon Willison:
“Science fiction is real”
Stable Diffusion is a really big deal. If you haven’t been paying attention to what’s going on … you really should be. … It’s similar to models like OpenAI’s DALL-E, but with one crucial difference: they released the whole thing.
In just a few days, there has been an explosion of innovation around it. The things people are building are absolutely astonishing. … Generating images from text is one thing, but generating images from other images is a whole new ballgame. … Imagine having an on-demand concept artist that can generate anything you can imagine, and can iterate with you towards your ideal result.
Science fiction is real now. Machine learning generative models are here, and the rate with which they are improving is unreal. It’s worth paying real attention to.
How does it compare to DALL-E? Just ask Beyondo:
Personally, Stable Diffusion is better. … OpenAI makes it sound like they created the holy grail of image generation models, but their images don’t impress anyone who has used Stable Diffusion.
@fabianstelzer did a bunch of comparative tests:
These image synths are like instruments — it’s amazing we’ll get so many of them, each with a unique “sound.” … DALL-E’s really great for facial expressions. [Midjourney] wipes the floor with the others when it comes to … prompts aiming for textural details. … DALL-E’s usually my go to for scenes involving 2 or more clear “actors.” … DALL-E and SD being better at photos … Stable Diffusion can do incredible photos … but you need to be careful to not “overload” the scene.
The moment you put “art” into a prompt, Midjourney just goes nuts. … DALL-E’s imperfections look very digital, unlike MJ’s. … When it comes to copying specific styles, SD is absolutely 🤯🤌 [but] DALL-E won’t let you do a Botticelli painting of Trump.
And what of the training data? Here’s Andy Baio:
One of the biggest frustrations of text-to-image generation AI models is that they feel like a black box. We know they were trained on images pulled from the web, but which ones? … The team behind Stable Diffusion have been very transparent about how their model is trained. Since it was released publicly last week, Stable Diffusion has exploded in popularity, in large part because of its free and permissive licensing.
Simon Willison [and I] grabbed the data for over 12 million images used to train Stable Diffusion. [It] was trained off three massive datasets collected by LAION. … All of LAION’s image datasets are built off of Common Crawl, [which] scrapes billions of webpages monthly and releases them as massive datasets. … Nearly half of the images, about 47%, were sourced from only 100 domains, with the largest number of images coming from Pinterest. … WordPress-hosted blogs on wp.com and wordpress.com represented … 6.8% of all images. Other photo, art, and blogging sites included … Smugmug … Blogspot … Flickr … DeviantArt … Wikimedia … 500px, and … Tumblr.
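The domain breakdown above comes from aggregating the image URLs in LAION’s public metadata. As a rough illustration of how that kind of tally works (not the authors’ actual scripts; the sample URLs below are made up), you can group URLs by domain with nothing but the standard library:

```python
from collections import Counter
from urllib.parse import urlparse

def domain_counts(urls):
    """Count how many image URLs come from each source domain."""
    counts = Counter()
    for url in urls:
        host = urlparse(url).netloc.lower()
        # Naive heuristic: keep the last two labels, so that
        # e.g. i.pinimg.com and files.wordpress.com each collapse
        # into one bucket per site.
        parts = host.split(".")
        domain = ".".join(parts[-2:]) if len(parts) >= 2 else host
        counts[domain] += 1
    return counts

# Toy sample standing in for rows of LAION's URL metadata
sample = [
    "https://i.pinimg.com/736x/a1/b2/c3.jpg",
    "https://i.pinimg.com/564x/d4/e5/f6.jpg",
    "https://example.files.wordpress.com/2020/01/photo.jpg",
    "https://live.staticflickr.com/65535/pic.jpg",
]
print(domain_counts(sample).most_common(2))
# → [('pinimg.com', 2), ('wordpress.com', 1)]
```

Run over 12 million real rows, the same idea produces the Pinterest-heavy distribution Baio and Willison reported.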
Meanwhile, how does it work? Letitia Parcalabescu explains (easy for her to say):
How do Latent Diffusion Models work? If you want the answer to that question, we’ve got you covered!
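For the impatient, here is the standard formulation (following the latent diffusion paper by Rombach et al., which Stable Diffusion is based on): an autoencoder compresses an image $x$ into a latent $z = \mathcal{E}(x)$; noise is added to the latent step by step; and a network learns to predict that noise, conditioned on the text prompt $y$:

```latex
% Forward (noising) process on the latent, with schedule \beta_t:
q(z_t \mid z_{t-1}) = \mathcal{N}\!\left(z_t;\ \sqrt{1-\beta_t}\, z_{t-1},\ \beta_t \mathbf{I}\right)

% Training objective: predict the added noise \epsilon, given the
% noisy latent z_t, the timestep t, and the encoded prompt \tau(y):
L = \mathbb{E}_{z,\ y,\ \epsilon \sim \mathcal{N}(0,\mathbf{I}),\ t}
    \left[\, \lVert \epsilon - \epsilon_\theta(z_t,\ t,\ \tau(y)) \rVert_2^2 \,\right]

% Generation runs the process in reverse from pure noise, then
% decodes the final latent back to pixels: \hat{x} = \mathcal{D}(z_0)
```

Working in the compressed latent space rather than pixel space is what lets the model fit on consumer GPUs.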