An announcement from Stability.ai comes with some great news for anyone caught up in the AI image generation hype. Stable Diffusion, an image generator that runs on consumer-level hardware, will soon be going public.
As you can see from the header image, the pictures being generated by the soon-to-be-released AI model are looking pretty incredible, especially considering how little GPU power it needs. Development of the image generator has been led by Robin Rombach of LMU Munich’s Machine Vision & Learning Research group, and Patrick Esser, who helped develop the video editing software Runway.
The announcement notes that the AI model runs on “under 10GB of VRAM on consumer GPUs.” Essentially you can run it on a 10GB Nvidia GeForce RTX 3080, an AMD Radeon RX 6700 or potentially something less powerful, though there’s nothing here about the minimum graphics requirements. That’s still in contrast to a lot of AI image generation models, which tend to be hosted on servers since they take several Nvidia A100 GPUs to run.
Stable Diffusion is trained on Stability AI’s Ezra-1 AI ultracluster of 4,000 A100 GPUs, and more than 10,000 beta testers are generating 1.7 million images per day in order to explore this approach.
The core dataset for Stable Diffusion comes from the upcoming CLIP-based AI model LAION-Aesthetics, which filters the images based on how “beautiful” they are. I’m not exactly sure how beauty has been defined in this instance, however. LAION-Aesthetics selects and reworks images from LAION 5B’s massive database, which was created in order to address the issue that datasets, such as the billions of image and text pairs used by Dall-E and CLIP, have not been made openly available.
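Going back to that “beautiful” filter: the definition of beauty may be murky, but the filtering idea itself is easy enough to picture. The Python below is only a rough sketch, not LAION’s actual pipeline. It assumes some model has already given each image a numeric score, and the `ScoredImage` type, the scores, the example URLs and the 5.0 cut-off are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScoredImage:
    """Hypothetical record: an image-text pair plus a predicted score."""
    url: str
    caption: str
    aesthetic_score: float  # made-up scale; higher means "more beautiful"


def filter_by_aesthetic(images, min_score):
    """Keep only the images whose predicted score meets the threshold."""
    return [img for img in images if img.aesthetic_score >= min_score]


# Invented sample data, purely for illustration.
sample = [
    ScoredImage("https://example.com/a.jpg", "a mountain lake at dawn", 6.8),
    ScoredImage("https://example.com/b.jpg", "blurry receipt photo", 3.1),
    ScoredImage("https://example.com/c.jpg", "oil painting of a fox", 5.4),
]

# The 5.0 threshold is an assumption, not a value from the announcement.
kept = filter_by_aesthetic(sample, min_score=5.0)
print([img.caption for img in kept])
```

Raise the threshold and the surviving set gets smaller and, in theory, prettier. How the scores are produced in the first place is the part the announcement doesn’t spell out.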
Apparently, the AI can generate images at 512×512 pixel resolution in just a few seconds, though I assume upscaling to larger images will take a little longer. There’s a long way to go yet, with the Stability AI team still researching the current method of image generation.
The great news is that “this will provide the template for the release of many open models we are currently training to unlock human potential.”
What a time to be alive, hey?
“We look forward to the open ecosystem that will emerge around this and further models to truly explore the boundaries of latent space,” the announcement says.
There’s also a note at the bottom from LAION’s Organizational Lead & Researcher, Christoph Schuhmann, who says: “With this project we continue to pursue our mission to make state of the art machine learning accessible for people from all over the world. 100% open. 100% free.”
A noble sentiment. What that appears to say is that Stable Diffusion may well be coming to consumer PCs completely free. If you’re looking to get involved sooner, you can sign up for the first stage of release of the Stable Diffusion AI image generator here, though that’s for research and academic purposes only, mind.