Running a blog involves three of the most tedious tasks: replying to comments, maintaining English translations, and consistently writing new posts. The AgentForge project automates all three with AI agents.
This post outlines the complete architecture of our blog automation service, which operates across two servers.
System Topology
┌─────────────────────┐ HTTPS ┌─────────────────────┐
│ arm1 server │ ──────────────▶ │ ec1 server │
│ (Agent Operator) │ │ (Blog Hosting) │
├─────────────────────┤ ├─────────────────────┤
│ blog-agent (:8081) │ │ Hugo (nginx) │
│ ├─ CommentHandler │ │ Blog API (:8000) │
│ ├─ TranslateHandler│ │ ├─ translator.py │
│ └─ PostGenerator │ │ ├─ blog_manager.py │
│ │ │ └─ git_handler.py │
│ NATS / PostgreSQL │ │ │
│ Prometheus / Grafana │ │ Git (yarang/blogs) │
└─────────────────────┘ └─────────────────────┘
| Server | Role | Core Services |
|---|---|---|
| arm1 | Agent Operator | blog-agent.service — Flask + Scheduler + LLM Client |
| ec1 | Blog Hosting + API | Hugo (nginx) + blog-api.service (FastAPI) |
Communication between the two servers is restricted to HTTPS API calls only. SSH access from arm1 to ec1 is blocked, so all integrations are done through the Blog API.
arm1: Unified Blog Agent
Why Unified?
Initially, comment response, translation, and post generation operated as separate processes (three systemd services). The issues were:
- Using Claude Code CLI (
--print) for calls resulted in a response time of 9.7 seconds and consumed 688MB of disk space. - Managing six systemd units was burdensome.
- No state sharing between processes was possible.
By unifying these into one process and switching to direct LLM API calls, we achieved the following:
| Metric | Before | After |
|---|---|---|
| Response Time | 9.7s | 1.7s |
| Disk Usage | 688MB | ~50MB |
| systemd Units | 6 | 1 |
| Processes | 3 | 1 |
Architecture
class BlogAgent:
"""1 Process = Flask (webhook) + Scheduler (timer) + LLM Client"""
def __init__(self):
self.config = AgentConfig.from_credentials()
self.llm = LLMClient(self.config) # ZAI glm-4.7
self.api = BlogAPIClient(self.config) # ec1 Blog API
# Handlers
self.comment = CommentHandler(self.llm, self.config)
self.translate = TranslateHandler(self.api)
self.post_gen = PostGenerator(self.llm, self.api)
# Scheduler
self.scheduler = Scheduler()
self.scheduler.every(hours=6, task=self.translate.check_and_sync)
self.scheduler.daily_at(hour=9, task=self.post_gen.generate_and_publish)
Module Operations
1. CommentHandler — AI Comment Response
Receives Webhook events from GitHub Discussions to automatically generate AI comments.
[User Comment] → GitHub Webhook → arm1 Flask → CommentHandler
→ LLM Call (ZAI glm-4.7) → Generate Reply → Post Comment via GitHub API
- Trigger: Webhook event-based (real-time)
- Filtering: Skips blog owner comments and AI-generated comments.
- Security: HMAC-SHA256 Webhook secret verification, Flask-Limiter applied.
2. TranslateHandler — Automatic Translation Trigger
Requests translation synchronization from ec1’s Blog API every 6 hours.
[Scheduler 6h] → TranslateHandler.check_and_sync()
→ POST /translate/sync → ec1 Blog API performs actual translation
arm1 does not perform the translation itself; it only sends a trigger to the ec1 API. The actual translation logic resides in translator.py on ec1.
3. PostGenerator — Automatic Post Generation
Automatically generates technical blog posts every day at 9 AM.
[Scheduler 09:00 KST] → PostGenerator.generate_and_publish()
→ Collect existing topics → Refer to RSS trends → Generate content with LLM
→ Deduplication Check → Publish via Blog API
Deduplication is key. It compares the similarity between new titles and the last 100 existing titles using difflib.SequenceMatcher:
def _is_duplicate_title(self, new_title, existing_titles):
"""Considers it a duplicate if the ratio is >= 0.6"""
new_lower = new_title.lower().strip()
for title in existing_titles[-100:]:
ex_lower = title.lower().strip()
ratio = difflib.SequenceMatcher(None, new_lower, ex_lower).ratio()
if ratio >= 0.6:
return True
return False
ec1: Blog API Translation System
Transition to Gemini
Initially, translations were performed using ZAI (glm-4.7), but a critical issue arose:
glm-4.7 is a reasoning model, which first consumes its
max_tokensbudget forreasoning_content(internal thought process). Ifmax_tokens=256, it uses all 256 tokens for reasoning, leaving the actualcontentas an empty string.
This led to an incident where nine English posts were translated with empty string titles.
Solution: Replaced with Gemini 2.5 Flash Lite.
| Item | ZAI (Previous) | Gemini (Current) |
|---|---|---|
| Model | glm-4.7 (reasoning) | gemini-2.5-flash-lite |
| Translation Time | ~30s/post | ~8s/post |
| Cost | Paid API | Free (1,500 requests/day) |
| Empty Response Issue | Occurred | None |
OpenAI-Compatible Endpoint
Gemini provides an OpenAI-compatible API. The existing code can be used without any changes by simply switching the base URL:
LLM_BASE_URLS = {
"GEMINI": "https://generativelanguage.googleapis.com/v1beta/openai",
"ZAI": "https://api.z.ai/api/coding/paas/v4",
}
Translation Matching Logic
Pairing Korean↔English posts uses date prefix matching:
ko: 2026-05-04-001-개발-생산성-17배-극대화-deepseek-v4와-...
en: 2026-05-04-001-개발-생산성-17배-극대화-deepseek-v4와-...
↑ Same prefix = Same post
Although the slugs might differ in language, if the YYYY-MM-DD-NNN part is the same, it’s recognized as the same post. The prerequisite for this method is that no two posts with the same date and number exist.
Title-in-Body Translation Technique
Translating the title via a separate API call caused issues with empty results from the reasoning model. The solution is to include the title as the first line of the body:
# When requesting translation
prompt = f"# {original_title}\n\n{original_body}"
# Extracting the title from the translation result
if translated.lstrip().startswith("# "):
lines = translated.lstrip().split("\n", 1)
extracted_title = lines[0].lstrip("# ").strip()
translated_body = lines[1].lstrip("\n")
This translates the title and body simultaneously in a single API call, preserving context and saving tokens.
LLM Strategy: Role-Based Model Separation
Not all tasks are handled by a single LLM. Models are separated based on the nature of the task.
| Task | Server | Model | Reason |
|---|---|---|---|
| AI Comment Response | arm1 | ZAI glm-4.7 | Conversational, excellent Korean quality |
| Post Generation | arm1 | ZAI glm-4.7 | Long-form content generation, creativity required |
| Translation (ko→en) | ec1 | Gemini Flash Lite | Non-reasoning, fast and free |
Core Principle: Do not use reasoning models for translation. Reasoning models consume tokens for internal thought processes, making non-reasoning models more suitable for simple conversion tasks.
Monitoring and Operations
Health Check Endpoints
# arm1 agent
curl http://arm1:8081/health
# → {"status":"healthy","agent":"blog-agent","scheduler_jobs":2,"uptime_sec":...}
curl http://arm1:8081/status
# → {"scheduler":[{"name":"auto-translate","last_run":...},{"name":"post-generator","last_run":"2026-05-04"}]}
# ec1 Blog API
curl https://blog.example.com/api/health
# → {"status":"healthy","version":"2.0.0"}
Observability Points
| Metric | Normal Range | Alert Condition |
|---|---|---|
| arm1 uptime | >0 | Service Down |
| scheduler_jobs | 2 | ≠ 2 |
| Translation Sync | ko post count = en post count | Discrepancy occurs |
| Post Generation | 1 post daily | No posts for over 24 hours |
Lessons Learned and Operational Tips
1. The Pitfall of Reasoning Models
It’s often not explicitly stated in documentation that max_tokens combines reasoning and content. If you get an empty response, check the finish_reason—if it’s "length", it indicates insufficient token budget.
2. Value of the OpenAI-Compatible Pattern
When switching translation providers from ZAI to Gemini, the code change was just one line for the base URL. Abstracting to an OpenAI-compatible interface from the start dramatically reduces LLM replacement costs.
3. Constraints of Date Prefix Matching
In the YYYY-MM-DD-NNN pattern, if two or more posts share the same date and number, translation matching will break. The PostGenerator must include logic to check the last number for that date and increment it when generating new posts.
4. Benefits of Process Consolidation
Consolidating three independent services into one resulted in:
- State Sharing (LLM clients, configurations, API clients initialized only once)
- Simplified Deployment (one systemd unit)
- Easier Debugging (logs consolidated in one place)
Future Plans
- Review the integration of arm1 agent’s LLM with Gemini.
- Comment Quality Evaluation Pipeline (monitoring the appropriateness of auto-generated comments).
- Automatic Translation Quality Verification (comparing with back-translation).
- Expanding inter-agent collaboration through the AgentForge framework.
Blog automation aims not for “complete automation,” but for “minimal human intervention.” A structure where AI generates content, humans review it, and the system alerts operators to anomalies is the key to stable operation.