
Email is a two-way channel. Most agent infrastructure handles the outbound side well — SMTP credentials, rate limits, delivery tracking. The inbound side is where things get complicated, and where most teams underestimate the engineering surface area.
Processing inbound email via webhooks is the right architecture for agents. It's event-driven, stateless from the agent's perspective, and fits naturally into the same async processing pipelines agents already use. But getting it right requires understanding what happens between an email hitting your MX records and your agent taking an action on it.
This post covers the full pipeline: DNS setup, SMTP ingestion, webhook payload structure, parsing strategies, threading, idempotency, and common failure modes.
How inbound email becomes a webhook
When someone sends an email to agent@yourdomain.com, here's what happens before your agent sees anything:
1. MX lookup: The sending MTA queries DNS for the MX record on yourdomain.com. This returns the hostname of the mail server that should accept mail for your domain.
2. SMTP delivery: The sending MTA opens an SMTP connection to your MX host and delivers the raw message.
3. Ingestion and parsing: The inbound mail server receives the raw RFC 5322 message, parses headers, extracts body parts, and decodes attachments.
4. Webhook dispatch: The platform POSTs a structured JSON payload to your configured endpoint.
Steps 1–3 are handled by email infrastructure. Your code starts at step 4.
The critical insight: you never want to run your own MX server. Spam filtering, IP reputation checks, bounce handling, raw SMTP edge cases — none of that is the problem you should be solving. Use infrastructure that handles ingestion and hands you a clean webhook.
DNS configuration for inbound
To receive email at your domain, you need an MX record pointing to the mail server that handles ingestion:
yourdomain.com. 300 IN MX 10 inbound.mailprovider.com.
The priority value (10) matters when you have multiple MX records for redundancy. Lower values are preferred. If you're running a secondary MX for failover:
yourdomain.com. 300 IN MX 10 primary.mailprovider.com.
yourdomain.com. 300 IN MX 20 secondary.mailprovider.com.
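To make the ordering concrete, here is a minimal sketch that sorts MX records the way a sending MTA would try them. `MxRecord`, `parse_mx_line`, and `delivery_order` are hypothetical names for illustration, not part of any DNS library, and the parser assumes the exact zone-file layout shown above.

```python
from typing import NamedTuple


class MxRecord(NamedTuple):
    preference: int
    host: str


def parse_mx_line(line: str) -> MxRecord:
    # Expects zone-file style: "<name> <ttl> IN MX <preference> <host>"
    fields = line.split()
    return MxRecord(preference=int(fields[4]), host=fields[5].rstrip("."))


def delivery_order(records: list[MxRecord]) -> list[str]:
    # Lower preference values are tried first.
    return [r.host for r in sorted(records, key=lambda r: r.preference)]
```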
For agent use cases, you often want to receive email on a subdomain — like agents.yourdomain.com or dynamic addresses like task-{uuid}@in.yourdomain.com. That requires an MX record on the subdomain:
in.yourdomain.com. 300 IN MX 10 inbound.mailprovider.com.
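With dynamic addresses, the recipient itself carries routing state. A minimal sketch of recovering the task ID, assuming the `task-{uuid}@in.yourdomain.com` scheme above (the `task_id_from_recipient` helper is a hypothetical name):

```python
import re
import uuid
from typing import Optional

# Assumed address scheme from the text: task-{uuid}@in.yourdomain.com
TASK_ADDRESS = re.compile(
    r"^task-(?P<task_id>[0-9a-f-]{36})@in\.yourdomain\.com$",
    re.IGNORECASE,
)


def task_id_from_recipient(address: str) -> Optional[uuid.UUID]:
    match = TASK_ADDRESS.match(address.strip())
    if match is None:
        return None
    try:
        return uuid.UUID(match.group("task_id"))
    except ValueError:
        # 36 characters of hex and dashes, but not a well-formed UUID
        return None
```

Returning `None` for anything that doesn't match lets the caller fall back to ordinary routing instead of failing the webhook.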
TTL on MX records matters for cutover. If you're migrating inbound routing, lower the TTL to 300 seconds at least one full old-TTL period before the change, so resolvers that cached the old record expire it within minutes of the cutover rather than hours.
The webhook payload
What arrives at your endpoint depends on your inbound email provider, but a well-structured payload should include:
{
  "message_id": "<abc123@mail.gmail.com>",
  "from": {
    "email": "user@example.com",
    "name": "Alice Chen"
  },
  "to": [
    { "email": "agent@yourdomain.com" }
  ],
  "subject": "Re: Order confirmation #8821",
  "date": "2025-01-15T14:23:01Z",
  "headers": {
    "In-Reply-To": "<original@yourdomain.com>",
    "References": "<original@yourdomain.com> <prev@yourdomain.com>",
    "Message-ID": "<abc123@mail.gmail.com>"
  },
  "text_body": "Thanks, I need to change the shipping address.",
  "html_body": "<p>Thanks, I need to change the shipping address.</p>",
  "attachments": [
    {
      "filename": "invoice.pdf",
      "content_type": "application/pdf",
      "size": 42891,
      "url": "https://..."
    }
  ],
  "spam_score": 0.2,
  "spf": "pass",
  "dkim": "pass"
}
Key fields your agent logic should always read:
- message_id: The Message-ID header. This is your idempotency key.
- In-Reply-To and References: Thread reconstruction. In-Reply-To contains the Message-ID of the email being replied to. References contains the full ancestry chain.
- spf / dkim: Authentication results. Don't process unauthenticated email from strangers the same way you'd process authenticated email from known senders.
- text_body: Always prefer plain text for agent processing. HTML parsing is fragile.
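One way to keep agent code away from provider-specific JSON is to lift those fields into a small typed object at the edge. This is a sketch under the assumption that your payload matches the example above. `InboundEmail` and `parse_payload` are hypothetical names, and requiring both SPF and DKIM to pass is a policy choice made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class InboundEmail:
    message_id: str
    in_reply_to: Optional[str]
    references: list[str]
    sender: str
    text_body: str
    authenticated: bool


def parse_payload(payload: dict) -> InboundEmail:
    headers = payload.get("headers", {})
    return InboundEmail(
        message_id=payload["message_id"],
        in_reply_to=headers.get("In-Reply-To"),
        # References is a whitespace-separated list of Message-IDs.
        references=headers.get("References", "").split(),
        sender=payload["from"]["email"].lower(),
        text_body=payload.get("text_body") or "",
        # Policy choice for this sketch: require both checks to pass.
        authenticated=payload.get("spf") == "pass" and payload.get("dkim") == "pass",
    )
```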
Parsing strategy: what your agent actually reads
The raw text body of a reply email contains the entire quoted history. Your agent doesn't need that. It needs the new content.
Stripping quoted reply content is harder than it looks. Gmail puts an "On Mon, Jan 15 ... wrote:" line before the quoted block. Outlook inserts a From: header block inside the body, sometimes preceded by -----Original Message-----. Other clients prefix every quoted line with >. Forwarded messages have a different structure than replies entirely.
A reasonable extraction approach:
import re

def extract_reply_body(text: str) -> str:
    # Common reply delimiter patterns, each tested against one stripped line
    delimiters = [
        r'^On .+wrote:$',                # Gmail
        r'^-----Original Message-----',  # Outlook
        r'^From: .+$',                   # Some clients
        r'^>',                           # Quoted lines
    ]
    # splitlines() also handles the \r\n line endings common in email
    lines = text.splitlines()
    cutoff = len(lines)
    for i, line in enumerate(lines):
        # Stop at the first line that looks like the start of quoted history
        if any(re.match(pattern, line.strip()) for pattern in delimiters):
            cutoff = i
            break
    return '\n'.join(lines[:cutoff]).strip()
This is a starting point, not a complete solution. Libraries like email-reply-parser (Ruby/Python) handle more edge cases. For production, test against actual reply samples from Gmail, Outlook, Apple Mail, and mobile clients — they all behave differently.
Thread reconstruction and state management
For agents handling multi-turn email conversations, threading is critical. It works through the Message-ID, In-Reply-To, and References headers.
When your agent sends an outbound email, it generates (or receives) a Message-ID. When the user replies, their client sets In-Reply-To: <that-message-id> and References: <earlier-ids> <that-message-id>. Your agent needs to map incoming In-Reply-To values back to conversation state. A simple schema:
CREATE TABLE email_threads (
    thread_id       UUID PRIMARY KEY,
    root_message_id VARCHAR(255) UNIQUE,  -- First message in thread
    agent_id        VARCHAR(255),
    created_at      TIMESTAMP,
    metadata        JSONB
);

CREATE TABLE email_messages (
    message_id  VARCHAR(255) PRIMARY KEY,  -- Full Message-ID header
    thread_id   UUID REFERENCES email_threads,
    direction   VARCHAR(10),               -- 'inbound' | 'outbound'
    from_email  VARCHAR(255),
    subject     VARCHAR(500),
    body_text   TEXT,
    received_at TIMESTAMP
);

CREATE INDEX ON email_messages (thread_id);
When a webhook arrives:
- Check in_reply_to against email_messages.message_id
- If found, load the thread_id and retrieve conversation context
- If not found, create a new thread
- Store the inbound message
- Pass thread context to the agent
This lets your agent say "I previously confirmed order #8821 and the user is now asking about shipping" rather than treating every email as a cold start.
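The lookup steps above can be sketched end to end with an in-memory SQLite database. This is an illustration, not the schema above verbatim: it keeps only the columns the lookup needs, stores IDs as text, and the helper names (`open_store`, `record_message`, `resolve_thread`) are hypothetical.

```python
import sqlite3
import uuid


def open_store() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE email_messages ("
        " message_id TEXT PRIMARY KEY,"
        " thread_id TEXT NOT NULL,"
        " direction TEXT NOT NULL,"
        " body_text TEXT)"
    )
    return conn


def record_message(conn, message_id, thread_id, direction, body_text):
    conn.execute(
        "INSERT OR IGNORE INTO email_messages VALUES (?, ?, ?, ?)",
        (message_id, thread_id, direction, body_text),
    )


def resolve_thread(conn, payload: dict) -> tuple[str, list[str]]:
    """Store the inbound message and return (thread_id, bodies oldest-first)."""
    parent_id = payload.get("headers", {}).get("In-Reply-To")
    row = conn.execute(
        "SELECT thread_id FROM email_messages WHERE message_id = ?",
        (parent_id,),
    ).fetchone()
    thread_id = row[0] if row else str(uuid.uuid4())  # unknown parent: new thread
    record_message(conn, payload["message_id"], thread_id, "inbound",
                   payload.get("text_body", ""))
    history = conn.execute(
        "SELECT body_text FROM email_messages WHERE thread_id = ? ORDER BY rowid",
        (thread_id,),
    ).fetchall()
    return thread_id, [body for (body,) in history]
```

The returned history is what you would hand to the agent as conversation context. Outbound messages need to go through `record_message` at send time, otherwise replies to them will look like new threads.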
Idempotency and delivery guarantees
Webhook delivery is at-least-once. Your endpoint will occasionally receive the same message twice — provider retries, network timeouts, infrastructure hiccups.
Your handler must be idempotent. The Message-ID header is your key:
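from fastapi import FastAPI, HTTPException, Request
from redis.asyncio import Redis  # async client, so Redis calls don't block the event loop

app = FastAPI()
redis = Redis()

@app.post("/webhooks/inbound")
async def handle_inbound(request: Request):
    payload = await request.json()
    message_id = payload["message_id"]

    # Deduplicate using Redis SET NX with TTL
    lock_key = f"email:processed:{message_id}"
    was_set = await redis.set(lock_key, "1", nx=True, ex=86400)  # 24h TTL
    if not was_set:
        # Already processed: return 200 so the provider stops retrying
        return {"status": "duplicate", "message_id": message_id}

    try:
        # Hand off to your job queue; enqueue_for_agent is defined elsewhere
        await enqueue_for_agent(payload)
    except Exception:
        # Release the key so the provider's retry is not mistaken for a duplicate
        await redis.delete(lock_key)
        raise HTTPException(status_code=503, detail="enqueue failed")
    return {"status": "accepted"}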
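Return HTTP 200 as soon as the message is safely in your own queue, and for duplicates too. A 4xx or 5xx tells most providers to retry, so reserve error responses for the case where you could not accept the message at all. After acknowledging with 200, handle downstream failures internally by requeueing from your own job queue.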
Webhook security
Before processing any payload, verify it came from your email provider. Common verification mechanisms:
HMAC signature verification (most providers):
import hmac
import hashlib

def verify_webhook_signature(
    payload: bytes,
    signature: str,
    secret: str
) -> bool:
    expected = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)
The signature is typically in a request header like X-Webhook-Signature. Use hmac.compare_digest instead of == to prevent timing attacks.
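One detail that trips people up: the HMAC has to be computed over the exact bytes the provider sent, not over JSON you parsed and re-serialized. The sketch below demonstrates why. The secret and the `sign` / `is_authentic` helpers are made up for illustration, and the hex-encoded SHA-256 scheme is an assumption, so check what your provider actually signs.

```python
import hashlib
import hmac
import json


def sign(raw_body: bytes, secret: str) -> str:
    return hmac.new(secret.encode(), raw_body, hashlib.sha256).hexdigest()


def is_authentic(raw_body: bytes, signature_header: str, secret: str) -> bool:
    return hmac.compare_digest(sign(raw_body, secret), signature_header)


secret = "whsec_example"  # hypothetical shared secret
raw_body = b'{"message_id": "<abc123@mail.gmail.com>",  "spf": "pass"}'
signature = sign(raw_body, secret)  # what the provider would send in the header

# Verifying the exact bytes received succeeds.
ok_raw = is_authentic(raw_body, signature, secret)

# Re-serializing parsed JSON changes whitespace, so the digest no longer matches.
reserialized = json.dumps(json.loads(raw_body)).encode()
ok_reserialized = is_authentic(reserialized, signature, secret)
```

In a FastAPI handler that means reading `await request.body()` and verifying it before calling `json.loads` on the same bytes.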
Also check SPF and DKIM results on the parsed email itself. An email that fails both and claims to be from @yourcompany.com should be treated with extreme skepticism — don't let it trigger privileged agent actions.
For agents that execute actions based on email commands ("deploy to production", "send invoice to customer"), this authentication layer is not optional. Prompt injection via email is a real attack vector: a malicious email body could contain instructions designed to manipulate your agent's behavior.
Routing and classification
Not every inbound email should go to the same agent or trigger the same workflow. A classification layer before your agent processes the message saves compute and routes work correctly.
Common routing dimensions:
| Signal | Use Case |
|---|---|
| To: address | Route billing@ vs support@ vs orders@ to different agents |
| Subject prefix | [URGENT] or Re: indicates reply vs new thread |
| SPF/DKIM status | Authenticated vs unauthenticated senders get different trust levels |
| Sender domain | Known customer vs cold inbound vs internal |
| Attachment presence | Triggers document processing pipeline |
| Spam score | Above threshold → quarantine, don't process |
Platforms like Mails.ai handle classification before the webhook fires, so your endpoint receives pre-labeled events rather than having to implement this logic yourself.
For teams building their own routing, a simple rule engine evaluated before enqueueing:
def route_inbound(payload: dict) -> str:
    to_address = payload["to"][0]["email"]
    routing_table = {
        "billing@": "billing-agent",
        "support@": "support-agent",
        "orders@": "order-agent",
    }
    for prefix, agent_queue in routing_table.items():
        if to_address.startswith(prefix):
            return agent_queue
    return "default-agent"
Handling attachments
Attachments in webhook payloads come as either base64-encoded content inline or as pre-signed URLs to fetch separately. Fetch-on-demand is better — it keeps webhook payloads small and lets you defer fetching until you know the attachment is actually needed.
For agents processing attachments:
- PDFs → extract text with pdfplumber or pymupdf before passing to the agent
- Images → pass to a vision model or OCR pipeline
- CSV/Excel → parse to structured data first
- Validate content type against actual file signatures, not just the MIME type header
Set a maximum attachment size your agent will process. A 50MB ZIP file arriving via email shouldn't block your processing queue.
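A minimal sketch of both checks, size limit and signature validation, run before any attachment reaches a parser. The 10 MB limit and the `attachment_problem` helper are assumptions for illustration. `head` is the first few bytes of the fetched file, and the signature table covers only a handful of types.

```python
MAX_ATTACHMENT_BYTES = 10 * 1024 * 1024  # assumed limit; tune for your workload

# Leading "magic" bytes for a few common types.
SIGNATURES = {
    "application/pdf": (b"%PDF-",),
    "image/png": (b"\x89PNG\r\n\x1a\n",),
    "image/jpeg": (b"\xff\xd8\xff",),
    "application/zip": (b"PK\x03\x04",),
}


def attachment_problem(declared_type: str, declared_size: int, head: bytes):
    """Return a reason to skip the attachment, or None if it looks acceptable."""
    if declared_size > MAX_ATTACHMENT_BYTES:
        return "too large"
    expected = SIGNATURES.get(declared_type)
    if expected is None:
        return "unsupported type"
    if not head.startswith(expected):
        return "content does not match declared type"
    return None
```

Checking the declared size first means an oversized file can be rejected from the webhook metadata alone, without fetching it.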
Failure modes and edge cases
Encoding issues: Email bodies can be UTF-8, ISO-8859-1 (Latin-1), Windows-1252, or other encodings. Always decode with error handling:
body = raw_bytes.decode('utf-8', errors='replace')
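That one-liner assumes UTF-8. If you have access to the charset the message declared (from the Content-Type header of the body part), a slightly more careful sketch tries that first and only falls back to lossy decoding as a last resort. `decode_body` and its `declared_charset` parameter are hypothetical names.

```python
from typing import Optional


def decode_body(raw_bytes: bytes, declared_charset: Optional[str] = None) -> str:
    # Try the charset the message declared first, then UTF-8.
    for charset in (declared_charset, "utf-8"):
        if not charset:
            continue
        try:
            return raw_bytes.decode(charset)
        except (UnicodeDecodeError, LookupError):
            # LookupError covers unknown or misspelled charset names.
            continue
    # Last resort: never raise, mark undecodable bytes with U+FFFD.
    return raw_bytes.decode("utf-8", errors="replace")
```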
Bounce messages: Automated bounce notifications (DSNs) have Content-Type: multipart/report and a message/delivery-status part. Don't pass these to your user-facing agent — route them to a delivery monitoring handler.
Out-of-office replies: Headers like X-Autoreply: yes or Auto-Submitted: auto-replied indicate automated responses. Check for these before triggering agent replies to avoid reply loops.
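A sketch of a single predicate covering bounces and auto-replies, assuming your payload exposes the headers as a name-to-value mapping and the top-level Content-Type as a string. `is_automated` is a hypothetical helper, and the Precedence check is a common convention rather than something every sender sets.

```python
def is_automated(headers: dict[str, str], content_type: str = "") -> bool:
    # Header names are case-insensitive, so normalize before lookup.
    h = {name.lower(): value.strip().lower() for name, value in headers.items()}

    # Per RFC 3834, any Auto-Submitted value other than "no" means automated.
    if h.get("auto-submitted", "no") != "no":
        return True
    if h.get("x-autoreply") == "yes":
        return True
    if h.get("precedence") in {"bulk", "list", "junk"}:
        return True
    # Delivery status notifications (bounces) arrive as multipart/report.
    return content_type.lower().startswith("multipart/report")
```

Run this in the worker before any agent logic, and route a `True` result to monitoring rather than to the user-facing agent.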
Email loops: If your agent sends a reply and the recipient's autoresponder fires back, you can get infinite loops. Track automated messages per thread and stop processing after a threshold.
MAX_AUTO_REPLIES_PER_THREAD = 3

def should_process(thread_id: str, is_autoreply: bool) -> bool:
    if not is_autoreply:
        return True
    count = db.count_auto_replies_in_thread(thread_id)
    return count < MAX_AUTO_REPLIES_PER_THREAD
Processing architecture
Your webhook endpoint should do minimal work synchronously — verify the signature, deduplicate, enqueue, return 200. Everything else happens asynchronously:
Inbound Email
→ MX Server
→ Inbound Email Platform (parsing, spam filtering)
→ Webhook POST to your endpoint
→ Signature verification + deduplication
→ Enqueue to job queue (Redis/SQS/RabbitMQ)
→ Worker pulls job
→ Classification + routing
→ Thread context loading
→ Agent inference
→ Action execution
→ Optional outbound reply
Keep the webhook endpoint response time under 3 seconds. Providers typically have short timeout windows (3–10 seconds) before marking delivery as failed and retrying.
For inbound email processing at scale, the throughput bottleneck is usually agent inference, not email parsing. Size your worker pool based on agent latency, not email volume.
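A back-of-the-envelope way to size that pool is Little's law: the average number of jobs in flight equals arrival rate times time per job. The sketch below applies it with some headroom. The 70% target utilization is an assumption, not a recommendation from any provider, and `workers_needed` is a hypothetical helper.

```python
import math


def workers_needed(emails_per_minute: float, agent_seconds_per_email: float,
                   target_utilization: float = 0.7) -> int:
    # Little's law: average jobs in flight = arrival rate * time per job.
    in_flight = (emails_per_minute / 60.0) * agent_seconds_per_email
    # Leave headroom so bursts queue briefly instead of piling up.
    return max(1, math.ceil(in_flight / target_utilization))
```

At 120 emails per minute and 20 seconds of agent time per email, that is 40 jobs in flight on average, so about 58 workers at 70% utilization. Halving agent latency halves the pool, while email parsing time barely registers.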
Frequently Asked Questions
What's the difference between polling IMAP and using webhooks for inbound email?
IMAP poll-based approaches require your code to open a connection on a schedule, check for new messages, and process them. Webhook-based approaches are event-driven — the platform pushes a notification to your endpoint the moment a message arrives. For agents, webhooks are strictly better: lower latency (seconds vs. minutes depending on poll interval), no connection management, and no need to track "last seen" message state. IMAP is useful if you need to access a mailbox you don't control, but for infrastructure you own, webhooks are the right choice.
How do I reconstruct email threads reliably?
Use the References header, not just In-Reply-To. In-Reply-To only contains the immediate parent. References contains the full ancestry chain as a space-separated list of Message-IDs. Store every outbound Message-ID you send, then on inbound, walk the References list from most recent to oldest and find the first one you recognize. That gives you the thread context. Fall back to subject-line matching (stripping Re: prefixes) only when headers are missing, which is rare with modern clients.
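A minimal sketch of that walk, assuming `known_ids` is the set of Message-IDs you have stored (`find_known_ancestor` is a hypothetical helper name):

```python
from typing import Iterable, Optional


def find_known_ancestor(references: str, in_reply_to: Optional[str],
                        known_ids: Iterable[str]) -> Optional[str]:
    known = set(known_ids)
    # References lists ancestors oldest-first, so reverse it to check the
    # nearest ancestor first. In-Reply-To, when present, is the immediate parent.
    candidates = ([in_reply_to] if in_reply_to else []) + references.split()[::-1]
    for message_id in candidates:
        if message_id in known:
            return message_id
    return None
```

A `None` result is the cue to fall back to subject-line matching or to start a new thread.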
How should I handle emails where DKIM passes but the content looks malicious?
DKIM passing means the email was signed by the claimed sending domain — it doesn't guarantee the content is safe. For agents that take actions based on email content, apply a content-level threat model: strip HTML before passing to the agent, use system prompts that explicitly instruct the agent to ignore any instructions embedded in email content, and scope agent permissions so that email-triggered actions can only affect resources associated with the verified sender. Never let an inbound email escalate privileges.
What Message-ID format should my agent use when sending replies?
Generate Message-IDs with the format <unique-id@yourdomain.com>. The local part (before @) should be a UUID or cryptographically random string. The domain part should match your sending domain. Store this in your email_messages table immediately when sending, before the message is delivered, so you can match against it when replies arrive. Many SMTP libraries generate Message-IDs automatically — verify they're using your domain, not the library's default.
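A sketch of building a reply with an explicit Message-ID and correct threading headers, using only the standard library. `new_message_id` and `build_reply` are hypothetical helpers, and `yourdomain.com` stands in for your sending domain.

```python
import uuid
from email.message import EmailMessage


def new_message_id(domain: str) -> str:
    # Random local part, sending domain on the right, angle brackets included.
    return f"<{uuid.uuid4()}@{domain}>"


def build_reply(parent_message_id: str, parent_references: str, body: str,
                domain: str = "yourdomain.com") -> EmailMessage:
    msg = EmailMessage()
    msg["Message-ID"] = new_message_id(domain)
    msg["In-Reply-To"] = parent_message_id
    # Extend the ancestry chain with the message being replied to.
    msg["References"] = f"{parent_references} {parent_message_id}".strip()
    msg.set_content(body)
    return msg
```

Persist `msg["Message-ID"]` before handing the message to your SMTP client. The standard library's `email.utils.make_msgid(domain=...)` is an alternative generator if you'd rather not build the string yourself.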
How do I prevent my agent from entering an infinite reply loop?
Check three things before sending an automated reply: (1) the Auto-Submitted header: if it is present with any value other than no, the incoming email is itself automated; (2) the X-Autoreply header and Precedence: bulk/list/junk; (3) your own outbound count per thread within a time window. Safe rule: if the incoming email has any auto-submission headers, don't reply at all. If it's a legitimate user email, cap automated replies at 3 per thread per 24-hour window.
Should I process the HTML body or plain text body for agent input?
Always prefer plain text. HTML email bodies contain layout markup, tracking pixels, button text, footer boilerplate, and unsubscribe links that are irrelevant to agent reasoning. Plain text is cleaner, shorter, and cheaper to process. If plain text isn't available (some senders send HTML-only), strip HTML tags using a library like bleach in Python or sanitize-html in Node — don't pass raw HTML to your agent.
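If you'd rather not add a dependency for the HTML-only case, a rough fallback can be built on the standard library's `html.parser`. This is a sketch, not a sanitizer: it drops script, style, and head content, treats a few block tags as line breaks, and makes no attempt to handle malformed markup gracefully. `html_to_text` is a hypothetical helper.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    SKIPPED = {"script", "style", "head"}
    BLOCK = {"p", "br", "div", "li", "tr", "h1", "h2", "h3"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are outside script/style/head.
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    lines = (line.strip() for line in "".join(parser.parts).splitlines())
    return "\n".join(line for line in lines if line)
```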