Webhook-Based Inbound Email Processing for AI Agents

Email is a two-way channel. Most agent infrastructure handles the outbound side well — SMTP credentials, rate limits, delivery tracking. The inbound side is where things get complicated, and where most teams underestimate the engineering surface area.

Processing inbound email via webhooks is the right architecture for agents. It's event-driven, stateless from the agent's perspective, and fits naturally into the same async processing pipelines agents already use. But getting it right requires understanding what happens between an email hitting your MX records and your agent taking an action on it.

This post covers the full pipeline: DNS setup, SMTP ingestion, webhook payload structure, parsing strategies, threading, idempotency, and common failure modes.

How inbound email becomes a webhook

When someone sends an email to agent@yourdomain.com, here's what happens before your agent sees anything:

MX lookup — The sending MTA queries DNS for the MX record on yourdomain.com. This returns the hostname of the mail server that should accept mail for your domain.
SMTP delivery — The sending MTA opens an SMTP connection to your MX host and delivers the raw message.
Ingestion and parsing — The inbound mail server receives the raw RFC 5322 message, parses headers, extracts body parts, decodes attachments.
Webhook dispatch — The platform POSTs a structured JSON payload to your configured endpoint.

Steps 1–3 are handled by email infrastructure. Your code starts at step 4.

The critical insight: you never want to run your own MX server. Spam filtering, IP reputation checks, bounce handling, raw SMTP edge cases — none of that is the problem you should be solving. Use infrastructure that handles ingestion and hands you a clean webhook.

DNS configuration for inbound

To receive email at your domain, you need an MX record pointing to the mail server that handles ingestion:

yourdomain.com.  300  IN  MX  10  inbound.mailprovider.com.

The priority value (10) matters when you have multiple MX records for redundancy. Lower values are preferred. If you're running a secondary MX for failover:

yourdomain.com.  300  IN  MX  10  primary.mailprovider.com.
yourdomain.com.  300  IN  MX  20  secondary.mailprovider.com.

For agent use cases, you often want to receive email on a subdomain — like agents.yourdomain.com or dynamic addresses like task-{uuid}@in.yourdomain.com. That requires an MX record on the subdomain:

in.yourdomain.com.  300  IN  MX  10  inbound.mailprovider.com.

TTL on MX records matters for cutover. If you're migrating inbound routing, set TTL to 300 seconds before the change so you're not waiting hours for propagation.

The webhook payload

What arrives at your endpoint depends on your inbound email provider, but a well-structured payload should include:

{
  "message_id": "<abc123@mail.gmail.com>",
  "from": {
    "email": "user@example.com",
    "name": "Alice Chen"
  },
  "to": [
    { "email": "agent@yourdomain.com" }
  ],
  "subject": "Re: Order confirmation #8821",
  "date": "2025-01-15T14:23:01Z",
  "headers": {
    "In-Reply-To": "<original@yourdomain.com>",
    "References": "<original@yourdomain.com> <prev@yourdomain.com>",
    "Message-ID": "<abc123@mail.gmail.com>"
  },
  "text_body": "Thanks, I need to change the shipping address.",
  "html_body": "<p>Thanks, I need to change the shipping address.</p>",
  "attachments": [
    {
      "filename": "invoice.pdf",
      "content_type": "application/pdf",
      "size": 42891,
      "url": "https://..."
    }
  ],
  "spam_score": 0.2,
  "spf": "pass",
  "dkim": "pass"
}

Key fields your agent logic should always read:

message_id — The Message-ID header. This is your idempotency key.
In-Reply-To and References — Thread reconstruction. In-Reply-To contains the Message-ID of the email being replied to. References contains the full ancestry chain.
spf / dkim — Authentication results. Don't process unauthenticated email from strangers the same way you'd process authenticated email from known senders.
text_body — Always prefer plain text for agent processing. HTML parsing is fragile.

Parsing strategy: what your agent actually reads

The raw text body of a reply email contains the entire quoted history. Your agent doesn't need that. It needs the new content.

Stripping quoted reply content is harder than it looks. Gmail uses On Mon, Jan 15... before the quoted block. Outlook uses a From: header line inside the body. Some clients use > per line, others use ---Original Message---. Forwarded messages have a different structure than replies entirely.

A reasonable extraction approach:

import re

def extract_reply_body(text: str) -> str:
    # Common reply delimiter patterns
    delimiters = [
        r'^On .+wrote:$',           # Gmail
        r'^-----Original Message-----',  # Outlook
        r'^From: .+$',              # Some clients
        r'^>{1,}',                  # Quoted lines
    ]
    
    lines = text.split('\n')
    cutoff = len(lines)
    
    for i, line in enumerate(lines):
        for pattern in delimiters:
            if re.match(pattern, line.strip(), re.MULTILINE):
                cutoff = i
                break
        if cutoff < len(lines):
            break
    
    return '\n'.join(lines[:cutoff]).strip()

This is a starting point, not a complete solution. Libraries like email-reply-parser (Ruby/Python) handle more edge cases. For production, test against actual reply samples from Gmail, Outlook, Apple Mail, and mobile clients — they all behave differently.

Thread reconstruction and state management

For agents handling multi-turn email conversations, threading is critical. It works through the Message-ID, In-Reply-To, and References headers.

When your agent sends an outbound email, it generates (or receives) a Message-ID. When the user replies, their client sets In-Reply-To: <that-message-id> and References: <earlier-ids> <that-message-id>. Your agent needs to map incoming In-Reply-To values back to conversation state. A simple schema:

CREATE TABLE email_threads (
  thread_id     UUID PRIMARY KEY,
  root_message_id VARCHAR(255) UNIQUE,  -- First message in thread
  agent_id      VARCHAR(255),
  created_at    TIMESTAMP,
  metadata      JSONB
);

CREATE TABLE email_messages (
  message_id    VARCHAR(255) PRIMARY KEY,  -- Full Message-ID header
  thread_id     UUID REFERENCES email_threads,
  direction     VARCHAR(10),  -- 'inbound' | 'outbound'
  from_email    VARCHAR(255),
  subject       VARCHAR(500),
  body_text     TEXT,
  received_at   TIMESTAMP
);

CREATE INDEX ON email_messages (thread_id);

When a webhook arrives:

Check in_reply_to against email_messages.message_id
If found, load the thread_id and retrieve conversation context
If not found, create a new thread
Store the inbound message
Pass thread context to the agent

This lets your agent say "I previously confirmed order #8821 and the user is now asking about shipping" rather than treating every email as a cold start.

Idempotency and delivery guarantees

Webhook delivery is at-least-once. Your endpoint will occasionally receive the same message twice — provider retries, network timeouts, infrastructure hiccups.

Your handler must be idempotent. The Message-ID header is your key:

from fastapi import FastAPI, Request, HTTPException
from redis import Redis

app = FastAPI()
redis = Redis()

@app.post("/webhooks/inbound")
async def handle_inbound(request: Request):
    payload = await request.json()
    message_id = payload["message_id"]
    
    # Deduplicate using Redis SET NX with TTL
    lock_key = f"email:processed:{message_id}"
    was_set = redis.set(lock_key, "1", nx=True, ex=86400)  # 24h TTL
    
    if not was_set:
        # Already processed — return 200 so provider stops retrying
        return {"status": "duplicate", "message_id": message_id}
    
    # Process the email
    await enqueue_for_agent(payload)
    return {"status": "accepted"}

Always return HTTP 200 to the webhook provider, even for duplicates. Return 4xx or 5xx and the provider will retry. Acknowledge receipt with 200, then handle failures internally by requeueing from your own job queue.

Webhook security

Before processing any payload, verify it came from your email provider. Common verification mechanisms:

HMAC signature verification (most providers):

import hmac
import hashlib

def verify_webhook_signature(
    payload: bytes,
    signature: str,
    secret: str
) -> bool:
    expected = hmac.new(
        secret.encode(),
        payload,
        hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(expected, signature)

The signature is typically in a request header like X-Webhook-Signature. Use hmac.compare_digest instead of == to prevent timing attacks.

Also check SPF and DKIM results on the parsed email itself. An email that fails both and claims to be from @yourcompany.com should be treated with extreme skepticism — don't let it trigger privileged agent actions.

For agents that execute actions based on email commands ("deploy to production", "send invoice to customer"), this authentication layer is not optional. Prompt injection via email is a real attack vector: a malicious email body could contain instructions designed to manipulate your agent's behavior.

Routing and classification

Not every inbound email should go to the same agent or trigger the same workflow. A classification layer before your agent processes the message saves compute and routes work correctly.

Common routing dimensions:

Signal	Use Case
`To:` address	Route `billing@` vs `support@` vs `orders@` to different agents
Subject prefix	`[URGENT]` or `Re:` indicates reply vs new thread
SPF/DKIM status	Authenticated vs unauthenticated senders get different trust levels
Sender domain	Known customer vs cold inbound vs internal
Attachment presence	Triggers document processing pipeline
Spam score	Above threshold → quarantine, don't process

Platforms like Mails.ai handle classification before the webhook fires, so your endpoint receives pre-labeled events rather than having to implement this logic yourself.

For teams building their own routing, a simple rule engine evaluated before enqueueing:

def route_inbound(payload: dict) -> str:
    to_address = payload["to"][0]["email"]
    
    routing_table = {
        "billing@": "billing-agent",
        "support@": "support-agent",
        "orders@": "order-agent",
    }
    
    for prefix, agent_queue in routing_table.items():
        if to_address.startswith(prefix):
            return agent_queue
    
    return "default-agent"

Handling attachments

Attachments in webhook payloads come as either base64-encoded content inline or as pre-signed URLs to fetch separately. Fetch-on-demand is better — it keeps webhook payloads small and lets you defer fetching until you know the attachment is actually needed.

For agents processing attachments:

PDFs → extract text with pdfplumber or pymupdf before passing to the agent
Images → pass to a vision model or OCR pipeline
CSV/Excel → parse to structured data first
Validate content type against actual file signatures, not just the MIME type header

Set a maximum attachment size your agent will process. A 50MB ZIP file arriving via email shouldn't block your processing queue.

Failure modes and edge cases

Encoding issues: Email bodies can be UTF-8, Latin-1, ISO-8859-1, or other encodings. Always decode with error handling:

body = raw_bytes.decode('utf-8', errors='replace')

Bounce messages: Automated bounce notifications (DSNs) have Content-Type: multipart/report and a message/delivery-status part. Don't pass these to your user-facing agent — route them to a delivery monitoring handler.

Out-of-office replies: Headers like X-Autoreply: yes or Auto-Submitted: auto-replied indicate automated responses. Check for these before triggering agent replies to avoid reply loops.

Email loops: If your agent sends a reply and the recipient's autoresponder fires back, you can get infinite loops. Track automated messages per thread and stop processing after a threshold.

MAX_AUTO_REPLIES_PER_THREAD = 3

def should_process(thread_id: str, is_autoreply: bool) -> bool:
    if not is_autoreply:
        return True
    count = db.count_auto_replies_in_thread(thread_id)
    return count < MAX_AUTO_REPLIES_PER_THREAD

Processing architecture

Your webhook endpoint should do minimal work synchronously — verify the signature, deduplicate, enqueue, return 200. Everything else happens asynchronously:

Inbound Email
    → MX Server
    → Inbound Email Platform (parsing, spam filtering)
    → Webhook POST to your endpoint
    → Signature verification + deduplication
    → Enqueue to job queue (Redis/SQS/RabbitMQ)
    → Worker pulls job
    → Classification + routing
    → Thread context loading
    → Agent inference
    → Action execution
    → Optional outbound reply

Keep the webhook endpoint response time under 3 seconds. Providers typically have short timeout windows (3–10 seconds) before marking delivery as failed and retrying.

For inbound email processing at scale, the throughput bottleneck is usually agent inference, not email parsing. Size your worker pool based on agent latency, not email volume.

Frequently Asked Questions

What's the difference between polling IMAP and using webhooks for inbound email?

IMAP poll-based approaches require your code to open a connection on a schedule, check for new messages, and process them. Webhook-based approaches are event-driven — the platform pushes a notification to your endpoint the moment a message arrives. For agents, webhooks are strictly better: lower latency (seconds vs. minutes depending on poll interval), no connection management, and no need to track "last seen" message state. IMAP is useful if you need to access a mailbox you don't control, but for infrastructure you own, webhooks are the right choice.

How do I reconstruct email threads reliably?

Use the References header, not just In-Reply-To. In-Reply-To only contains the immediate parent. References contains the full ancestry chain as a space-separated list of Message-IDs. Store every outbound Message-ID you send, then on inbound, walk the References list from most recent to oldest and find the first one you recognize. That gives you the thread context. Fall back to subject-line matching (stripping Re: prefixes) only when headers are missing, which is rare with modern clients.

How should I handle emails where DKIM passes but the content looks malicious?

DKIM passing means the email was signed by the claimed sending domain — it doesn't guarantee the content is safe. For agents that take actions based on email content, apply a content-level threat model: strip HTML before passing to the agent, use system prompts that explicitly instruct the agent to ignore any instructions embedded in email content, and scope agent permissions so that email-triggered actions can only affect resources associated with the verified sender. Never let an inbound email escalate privileges.

What Message-ID format should my agent use when sending replies?

Generate Message-IDs with the format <unique-id@yourdomain.com>. The local part (before @) should be a UUID or cryptographically random string. The domain part should match your sending domain. Store this in your email_messages table immediately when sending, before the message is delivered, so you can match against it when replies arrive. Many SMTP libraries generate Message-IDs automatically — verify they're using your domain, not the library's default.

How do I prevent my agent from entering an infinite reply loop?

Check three things before sending an automated reply: (1) the Auto-Submitted header — if it's anything other than no, the incoming email is itself automated; (2) the X-Autoreply or Precedence: bulk/list/junk headers; (3) your own outbound count per thread within a time window. Safe rule: if the incoming email has any auto-submission headers, don't reply at all. If it's a legitimate user email, cap automated replies at 3 per thread per 24-hour window.

Should I process the HTML body or plain text body for agent input?

Always prefer plain text. HTML email bodies contain layout markup, tracking pixels, button text, footer boilerplate, and unsubscribe links that are irrelevant to agent reasoning. Plain text is cleaner, shorter, and cheaper to process. If plain text isn't available (some senders send HTML-only), strip HTML tags using a library like bleach in Python or sanitize-html in Node — don't pass raw HTML to your agent.

Webhook-Based Inbound Email Processing for AI Agents

How inbound email becomes a webhook

DNS configuration for inbound

The webhook payload

Parsing strategy: what your agent actually reads

Thread reconstruction and state management

Idempotency and delivery guarantees

Webhook security

Routing and classification

Handling attachments

Failure modes and edge cases

Processing architecture

Frequently Asked Questions

What's the difference between polling IMAP and using webhooks for inbound email?

How do I reconstruct email threads reliably?

How should I handle emails where DKIM passes but the content looks malicious?

What Message-ID format should my agent use when sending replies?

How do I prevent my agent from entering an infinite reply loop?

Should I process the HTML body or plain text body for agent input?

Related guides

Built for agents.
Self-serve at every volume.

Webhook-Based Inbound Email Processing for AI Agents

How inbound email becomes a webhook

DNS configuration for inbound

The webhook payload

Parsing strategy: what your agent actually reads

Thread reconstruction and state management

Idempotency and delivery guarantees

Webhook security

Routing and classification

Handling attachments

Failure modes and edge cases

Processing architecture

Frequently Asked Questions

What's the difference between polling IMAP and using webhooks for inbound email?

How do I reconstruct email threads reliably?

How should I handle emails where DKIM passes but the content looks malicious?

What Message-ID format should my agent use when sending replies?

How do I prevent my agent from entering an infinite reply loop?

Should I process the HTML body or plain text body for agent input?

Related guides

Built for agents.Self-serve at every volume.

Built for agents.
Self-serve at every volume.