Talk Intermediate MIT License (Code) and CC BY-SA 4.0 (Presentation)

From Prompt to Pixel: Automating Workflows with Open-Source Qwen models and ComfyUI

Rejected

Session Description

In the world of LLM and AI tools, everyone is creating pipelines and workflows. We use popular diffusion models and image gen models to create consistent images. We use a node-based tools workflow to create user-friendly pipelines. What if the entire ecosystem from reasoning to rendering can be orchestrated through open-source models and workflow builders?

In this session, we are moving beyond manual prompting to build a fully automated, locally hosted generative pipeline using entirely free and open-source tools. We will answer that by exploring open-weight language models(Qwen models) and node-based GUI(ComfyUI)

We build workflows that connect ComfyUI nodes in a canvas to execute complex, node-based diffusion workflows without any manual clicks. Qwen acts as the brain, analysing high-level user intent to generate highly structured, context-rich prompts programmatically. The workflows contain nodes that orchestrate the image creation

We will demystify the node graph and show how to treat ComfyUI as a powerful, headless backend rather than just a graphical user interface. By exploring the automation logic that seamlessly passes context from the LLM to the diffusion model, this talk will provide an architectural blueprint needed to build your own end-to-end stable diffusion workflows

Note: I can give the talk as a working session

Key Takeaways

Understand the ComfyUI and its usage
Dwell in the realm of open source models like Qwen, Flux, etc.
Ability to build workflows to create images, videos, audios and 3D models using open source models

References

https://www.comfy.org/

https://qwen.ai/home

Session Categories

Tutorial about using a FOSS project

Technology architecture

Talk License: MIT License (Code) and CC BY-SA 4.0 (Presentation)

Speakers

Balaji Venkatraman S Lead Software Engineer(Flutter, AI/ML) | UST Global

I am a versatile software engineer with eight years of experience building scalable, user-centric web and cross-platform applications. Specialising in modern frameworks like React, Next.js, Node.js, and Flutter. I have delivered robust solutions across diverse domains, including FinTech, sustainability, and gaming. Backed by strong cloud expertise in GCP and AWS, I am deeply passionate about continuous innovation, and I am actively expanding my focus into AI/ML and GenAI to craft intelligent, next-generation digital experiences.

Reviews

The proposal talks about the "how" of "prompt to pixel" but there's no clarity on "why". Who is interested in solutions to such problems, and who are the intended users for such solutions? This is unfortunately a general problem with a lot of GenerativeAI work/projects at the moment.

Reviewer #1 Not Sure