Posted by Paul Ruiz – Senior Developer Relations Engineer, and Kris Tonthat – Technical Writer
Earlier this year, we previewed on-device text-to-image generation with diffusion models for Android via MediaPipe Solutions. Today we're happy to announce that this is available as an early, experimental solution, Image Generator, for developers to try out on Android devices, allowing you to easily generate images entirely on-device in as quickly as ~15 seconds on higher-end devices. We can't wait to see what you create!
There are three primary ways that you can use the new MediaPipe Image Generator task:
- Text-to-image generation based on text prompts using standard diffusion models.
- Controllable text-to-image generation based on text prompts and conditioning images using diffusion plugins.
- Customized text-to-image generation based on text prompts using Low-Rank Adaptation (LoRA) weights that allow you to create images of specific concepts that you pre-define for your unique use cases.
Models
Before we get into all of the fun and exciting parts of this new MediaPipe task, it's important to know that our Image Generation API supports any models that exactly match the Stable Diffusion v1.5 architecture. You can use a pretrained model or your fine-tuned models by converting them to a model format supported by MediaPipe Image Generator using our conversion script.
You can also customize a foundation model via MediaPipe Diffusion LoRA fine-tuning on Vertex AI, injecting new concepts into a foundation model without having to fine-tune the whole model. You can find more information about this process in our official documentation.
If you want to try this task out today without any customization, we also provide links to a few verified working models in that same documentation.
Image Generation with Diffusion Models
The most straightforward way to try the Image Generator task is to give it a text prompt, and then receive a result image using a diffusion model.
Like MediaPipe's other tasks, you'll start by creating an options object. In this case you'll only need to define the path to your foundation model files on the device. Once you have that options object, you can create the ImageGenerator.
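A minimal sketch of that setup looks something like this, with MODEL_PATH standing in for wherever you've stored the converted model files on the device:

```kotlin
import com.google.mediapipe.tasks.vision.imagegenerator.ImageGenerator
import com.google.mediapipe.tasks.vision.imagegenerator.ImageGenerator.ImageGeneratorOptions

// Point the task at the on-device directory containing the converted
// Stable Diffusion v1.5 model files.
val options = ImageGeneratorOptions.builder()
    .setImageGeneratorModelDirectory(MODEL_PATH)
    .build()

// Create the ImageGenerator from those options.
val imageGenerator = ImageGenerator.createFromOptions(context, options)
```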
After creating your new ImageGenerator, you can create a new image by passing in the prompt, the number of iterations the generator should go through for generating, and a seed value. This runs a blocking operation to create a new image, so you will want to run it in a background thread before returning your new Bitmap result object.
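Here's a sketch of that call, assuming the result's generatedImage() is an MPImage that BitmapExtractor can convert:

```kotlin
// Blocking call: invoke this from a background thread (for example, a
// coroutine on Dispatchers.Default), never from the UI thread.
fun generate(prompt: String, iterations: Int, seed: Int): Bitmap {
    val result = imageGenerator.generate(prompt, iterations, seed)
    // Convert the returned MPImage into an Android Bitmap for display.
    return BitmapExtractor.extract(result.generatedImage())
}
```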
In addition to this simple input-in/result-out format, we also support a way for you to step through each iteration manually through the execute() function, receiving the intermediate result images back at different stages to show the generative progress. While getting intermediate results back isn't recommended for most apps due to performance and complexity, it's a nice way to demonstrate what's happening under the hood. This is a bit more of an in-depth process, but you can find this demo, as well as the other examples shown in this post, in our official example app on GitHub.
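As a rough sketch, assuming a setInputs() call to stage the prompt and an execute() method whose Boolean flag requests the intermediate image at each step, that loop looks something like this:

```kotlin
// Stage the prompt, total iteration count, and seed up front.
imageGenerator.setInputs(prompt, iterations, seed)

// Step through the diffusion iterations manually; passing true asks for
// the intermediate image at each step so you can visualize progress.
var result: ImageGeneratorResult? = null
for (step in 0 until iterations) {
    result = imageGenerator.execute(/* showResult= */ true)
    // Convert and display result?.generatedImage() here if desired.
}
```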
Image Generation with Plugins
While being able to create new images from only a prompt on a device is already a huge step, we've taken it a little further by implementing a new plugin system which enables the diffusion model to accept a condition image along with a text prompt as its inputs.
We currently support three different ways that you can provide a foundation for your generations: facial structures, edge detection, and depth awareness. The plugins give you the ability to provide an image, extract specific structures from it, and then create new images using those structures.
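As an illustration, and continuing from the setup above, wiring up the edge detection plugin might look something like the sketch below. Treat the ConditionOptions and EdgeConditionOptions names, and the generate() overload that takes a condition image, as assumptions modeled on the MediaPipe Tasks builder style:

```kotlin
// Configure the edge plugin with its own model file and Canny edge thresholds
// (names here are assumptions, not the definitive API surface).
val edgeConditionOptions = ConditionOptions.EdgeConditionOptions.builder()
    .setThreshold1(100.0f)
    .setThreshold2(200.0f)
    .setPluginModelBaseOptions(
        BaseOptions.builder().setModelAssetPath(EDGE_PLUGIN_MODEL_PATH).build())
    .build()

val conditionOptions = ConditionOptions.builder()
    .setEdgeConditionOptions(edgeConditionOptions)
    .build()

// Condition plugins are supplied when the generator is created.
val imageGenerator = ImageGenerator.createFromOptions(context, options, conditionOptions)

// The condition image (an MPImage built from your source image) rides
// along with the prompt at generation time.
val result = imageGenerator.generate(
    prompt, conditionImage, ConditionOptions.ConditionType.EDGE, iterations, seed)
```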
LoRA Weights
The third major feature we're rolling out today is the ability to customize the Image Generator task with LoRA to teach a foundation model about a new concept, such as specific objects, people, or styles presented during training. With the new LoRA weights, the Image Generator becomes a specialized generator that is able to inject specific concepts into generated images.
LoRA weights are useful for cases where you may want every image to be in the style of an oil painting, or a particular teapot to appear in any created setting. You can find more information about LoRA weights on Vertex AI in the MediaPipe Stable Diffusion LoRA model card, and create them using this notebook. Once generated, you can deploy the LoRA weights on-device using the MediaPipe Tasks Image Generator API, or for optimized server inference through Vertex AI's one-click deployment.
In the example below, we created LoRA weights using several images of a teapot from the Dreambooth teapot training image set, and then used those weights to generate a new image of the teapot in different settings.
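On-device, the weights are just one more option at setup time; a sketch, assuming a setLoraWeightsFilePath() setter on the options builder, looks like this:

```kotlin
// LORA_WEIGHTS_PATH points to the converted LoRA weights on the device
// (the setter name here is an assumption).
val options = ImageGeneratorOptions.builder()
    .setImageGeneratorModelDirectory(MODEL_PATH)
    .setLoraWeightsFilePath(LORA_WEIGHTS_PATH)
    .build()

val imageGenerator = ImageGenerator.createFromOptions(context, options)

// Prompts can now reference the concept the weights were trained on,
// e.g. the specific teapot from the training image set.
val result = imageGenerator.generate(prompt, iterations, seed)
```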
Next Steps
This is just the beginning of what we plan to support with on-device image generation. We're looking forward to seeing all of the great things the developer community builds, so be sure to post them on X (formerly Twitter) with the hashtag #MediaPipeImageGen and tag @GoogleDevs. You can check out the official sample on GitHub demonstrating everything you've just learned about, read through our official documentation for even more details, and keep an eye on the Google for Developers YouTube channel for updates and tutorials as they're released by the MediaPipe team.
Acknowledgements
We'd like to thank all team members who contributed to this work: Lu Wang, Yi-Chun Kuo, Sebastian Schmidt, Kris Tonthat, Jiuqiang Tang, Khanh LeViet, Paul Ruiz, Qifei Wang, Yang Zhao, Yuqi Li, Lawrence Chan, Tingbo Hou, Joe Zou, Raman Sarokin, Juhyun Lee, Geng Yan, Ekaterina Ignasheva, Shanthal Vasanth, Glenn Cameron, Mark Sherwood, Andrei Kulik, Chuo-Ling Chang, and Matthias Grundmann from the Core ML team, as well as Changyu Zhu, Genquan Duan, Bo Wu, Ting Yu, and Shengyang Dai from Google Cloud.