A Python-based agent that generates realistic, high-quality images of characters using Stable Diffusion and custom prompts. Seamlessly integrated with Nevermined's Payments API, this agent efficiently handles task requests and billing.
This project is part of a larger workflow that explores how agents can communicate and work together. Refer to the following projects for a full view of the whole process:
- Coordinates the entire workflow, ensuring smooth task execution across agents.
- Generates movie scripts based on input ideas.
- Extracts character descriptions from movie scripts for further processing.
- Generates realistic character images based on their descriptions.
- Introduction
- Getting Started
- Project Structure
- Integration with Nevermined Payments API
- How to Create Your Own Agent
- Model Download
- License
The Image Generator Agent is an application designed to produce high-quality character images based on detailed prompts. Using Stable Diffusion, it transforms textual character descriptions into stunning visuals.
This agent works within the Nevermined ecosystem, utilizing the Payments API for:
- Task management: Process task requests and return results.
- Billing integration: Ensure tasks align with the allocated budget.
- Event-driven architecture: Automatically process events without a dedicated server.
This agent typically operates after the Character Extraction Agent, taking character descriptions as input, generating corresponding images, and uploading them to IPFS for distribution.
- Clone the repository:
    git clone https://github.com/nevermined-io/image-generator-agent.git
    cd image-generator-agent
- Set up a virtual environment (optional but recommended):
    python3 -m venv venv
    source venv/bin/activate
- Install dependencies:
    pip install -r requirements.txt
- Configure environment variables:
  - Copy the .env.example file to .env:
      cp .env.example .env
  - Populate the .env file with the following details:
      NVM_API_KEY=YOUR_NVM_API_KEY
      PINATA_API_KEY=YOUR_PINATA_API_KEY
      PINATA_API_SECRET=YOUR_PINATA_API_SECRET
      NVM_ENVIRONMENT=testing # or staging/production
      AGENT_DID=YOUR_AGENT_DID
- Download the model:
  - Download the analog-madness model from CivitAI in safetensor (fp16) format.
  - Place the file in the models directory:
      models/
      └── analogMadness_v70.safetensors
Run the agent with the following command:
python main.py
The agent will subscribe to the Nevermined task system and begin processing image generation requests.
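The environment variables from the setup steps can be checked before the agent starts, so that a missing key fails fast instead of surfacing in the middle of a task. The following is a minimal sketch, not the project's actual startup code: the `load_config` helper is hypothetical, the default of `testing` for `NVM_ENVIRONMENT` is an assumption, and it presumes the values from `.env` have already been loaded into the process environment.

```python
import os

# Variable names taken from the .env example above.
REQUIRED_VARS = ("NVM_API_KEY", "PINATA_API_KEY", "PINATA_API_SECRET", "AGENT_DID")
VALID_ENVIRONMENTS = ("testing", "staging", "production")


def load_config(env=None):
    """Return the agent settings as a dict, raising if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    # Assumption: fall back to "testing" when NVM_ENVIRONMENT is not set.
    environment = env.get("NVM_ENVIRONMENT", "testing")
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(f"Unknown NVM_ENVIRONMENT: {environment!r}")
    config = {name: env[name] for name in REQUIRED_VARS}
    config["NVM_ENVIRONMENT"] = environment
    return config
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in tests.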
image-generator-agent/
├── src/
│   ├── main.py              # Main entry point for the agent
│   ├── image_generator.py   # Image generation logic using Stable Diffusion
│   └── utils/
│       └── utils.py         # Utility functions for IPFS uploads
├── models/                  # Directory for the Stable Diffusion model
├── .env.example             # Example environment variables file
├── requirements.txt         # Python dependencies
└── .gitignore               # Files and directories to ignore
- main.py: Handles task requests, image generation, and task updates.
- image_generator.py: Contains the logic for generating images using Stable Diffusion.
- utils/utils.py: Includes helper functions, like uploading images to IPFS.
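Since the configuration asks for Pinata credentials, the IPFS upload helper presumably talks to Pinata's pinning API. The sketch below shows one way such a helper might look using only the standard library. It is not the contents of `utils/utils.py`: the function names `build_pin_request` and `upload_image_bytes` are hypothetical, and the public gateway prefix used to build the final URL is an assumption.

```python
import json
import urllib.request
import uuid

PINATA_PIN_URL = "https://api.pinata.cloud/pinning/pinFileToIPFS"
GATEWAY_PREFIX = "https://gateway.pinata.cloud/ipfs/"  # assumed gateway


def build_pin_request(image_bytes, filename, api_key, api_secret):
    """Build a multipart/form-data POST request carrying a single file."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/octet-stream\r\n\r\n",
        image_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    headers = {
        "Content-Type": f"multipart/form-data; boundary={boundary}",
        "pinata_api_key": api_key,
        "pinata_secret_api_key": api_secret,
    }
    return urllib.request.Request(PINATA_PIN_URL, data=body, headers=headers, method="POST")


def upload_image_bytes(image_bytes, filename, api_key, api_secret):
    """Send the request and return a gateway URL for the pinned file."""
    request = build_pin_request(image_bytes, filename, api_key, api_secret)
    with urllib.request.urlopen(request) as response:
        cid = json.load(response)["IpfsHash"]  # content identifier of the pinned file
    return GATEWAY_PREFIX + cid
```

Splitting request construction from the network call makes the multipart encoding testable without contacting Pinata.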
The Nevermined Payments API is central to this agent’s functionality, providing tools for task management, billing, and event subscription.
- Initialize the Payments Instance:
    from payments_py import Payments, Environment

    payment = Payments(
        app_id="image_generator_agent",
        nvm_api_key=nvm_api_key,
        version="1.0.0",
        environment=Environment.get_environment(environment),
        ai_protocol=True,
    )
- Subscribe to Task Updates:
    await payment.ai_protocol.subscribe(
        agent.run,
        join_account_room=False,
        join_agent_rooms=[agent_did],
        get_pending_events_on_subscribe=False
    )
- Task Lifecycle:
  - Fetch step details:
      step = payment.ai_protocol.get_step(step_id)
  - Update step status:
      payment.ai_protocol.update_step(
          did=step['did'],
          task_id=step['task_id'],
          step_id=step['step_id'],
          step={
              'step_status': 'Completed',
              'output_artifacts': [image_url],
          },
      )
For detailed integration steps, refer to the official documentation.
Task requests are handled by subscribing to the Nevermined Payments API. The subscribe method listens for incoming tasks and processes them using the run function:
await payment.ai_protocol.subscribe(
    agent.run,
    join_account_room=False,
    join_agent_rooms=[agent_did],
    get_pending_events_on_subscribe=False
)
The run function processes incoming tasks:
async def run(self, data):
    # Fetch the step and skip anything that is not waiting to be processed
    step = self.payment.ai_protocol.get_step(data['step_id'])
    if step['step_status'] != 'Pending':
        return

    # Generate the image from the character description and publish it
    character = step.get('input_query', '')
    image = self.image_generator.generate_image(character)
    image_url = upload_image_and_get_url(image)

    # Report the result back to Nevermined
    self.payment.ai_protocol.update_step(
        did=step['did'],
        task_id=step['task_id'],
        step_id=step['step_id'],
        step={
            'step_status': 'Completed',
            'output_artifacts': [image_url],
        },
    )
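To try the task flow without the Nevermined SDK or a GPU, the same logic can be run against in-memory stand-ins. This is a sketch under stated assumptions: `FakeProtocol` and the injected `generate` and `upload` callables are hypothetical test doubles, and the `'Failed'` status and `'output'` field used in the error branch are assumptions about what the API accepts, not values confirmed by this README.

```python
import asyncio


class FakeProtocol:
    """In-memory stand-in for payment.ai_protocol (hypothetical, for illustration)."""

    def __init__(self, steps):
        self.steps = steps

    def get_step(self, step_id):
        return self.steps[step_id]

    def update_step(self, did, task_id, step_id, step):
        self.steps[step_id].update(step)


class Agent:
    def __init__(self, protocol, generate, upload):
        self.protocol = protocol
        self.generate = generate  # callable: description -> image
        self.upload = upload      # callable: image -> URL

    async def run(self, data):
        step = self.protocol.get_step(data["step_id"])
        if step["step_status"] != "Pending":
            return  # nothing to do for steps that are not pending
        try:
            image = self.generate(step.get("input_query", ""))
            result = {"step_status": "Completed", "output_artifacts": [self.upload(image)]}
        except Exception as error:
            # Assumption: a "Failed" status with the error text in "output".
            result = {"step_status": "Failed", "output": str(error)}
        self.protocol.update_step(
            did=step["did"], task_id=step["task_id"], step_id=step["step_id"], step=result
        )


steps = {"s1": {"did": "did:nv:demo", "task_id": "t1", "step_id": "s1",
                "step_status": "Pending", "input_query": "a weary detective"}}
agent = Agent(FakeProtocol(steps),
              generate=lambda text: f"<image of {text}>",
              upload=lambda image: "ipfs://demo-cid")
asyncio.run(agent.run({"step_id": "s1"}))
print(steps["s1"]["step_status"])  # prints "Completed"
```

Wrapping generation and upload in a single `try` block means a failure in either one still updates the step, so the task does not stay pending indefinitely.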
The image_generator.py module handles image creation:
from diffusers import StableDiffusionPipeline
import torch


class ImageGenerator:
    # self.pipe is expected to hold a loaded StableDiffusionPipeline (setup not shown here)
    def generate_image(self, character):
        prompt = f"photo of {character} (cinematic lighting:1.1) ..."
        return self.pipe(prompt).images[0]
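Prompt construction can be separated from the pipeline call so it can be adjusted and tested without loading the model. The helper below is a hypothetical sketch, not part of `image_generator.py`: `build_prompt` and its `style_tags` parameter are illustrative names, and only the `cinematic lighting` tag comes from the prompt shown above. The `(tag:weight)` notation mirrors that prompt; how the weights are interpreted depends on how the pipeline processes prompts.

```python
def build_prompt(character, style_tags=None):
    """Compose a prompt from a character description and weighted style tags."""
    description = " ".join(character.split())  # collapse stray whitespace and newlines
    if not description:
        raise ValueError("character description is empty")
    # Default tag taken from the example prompt; any others are up to the caller.
    tags = style_tags if style_tags is not None else {"cinematic lighting": 1.1}
    weighted = " ".join(f"({tag}:{weight})" for tag, weight in tags.items())
    return f"photo of {description} {weighted}".strip()


print(build_prompt("  a tall woman\nwith silver hair "))
# prints: photo of a tall woman with silver hair (cinematic lighting:1.1)
```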
Logs track task progress and errors:
from payments_py.data_models import TaskLog

await self.payment.ai_protocol.log_task(TaskLog(
    task_id=step['task_id'],
    message='Image generated successfully.',
    level='info',
    task_status='Completed'
))
The analog-madness model is used for image generation. Download it from CivitAI in safetensor (fp16) format and place it in the models directory.
models/
└── analogMadness_v70.safetensors
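Because the model file is downloaded manually, a quick existence check at startup can give a clearer error than a failure deep inside the pipeline loader. A minimal sketch follows, assuming the directory layout above; the `find_model` helper is hypothetical and not part of the project.

```python
from pathlib import Path


def find_model(models_dir="models", filename="analogMadness_v70.safetensors"):
    """Return the model path, or raise with a hint if the file is absent."""
    path = Path(models_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"Model not found at {path}; download it from CivitAI and place it there."
        )
    return path
```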
Copyright 2024 Nevermined AG
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.