Architecture · 10 min read · Mails.ai Team

Designing Idempotent Email Sending for AI Agents

AI agents fail. Networks drop. Retries happen. Without idempotency built into your email sending layer, those failures produce duplicate emails — a confirmation sent three times, an invoice delivered twice, a password reset that confuses the recipient. For human-operated apps this is annoying. For automated agents firing at scale, it's a trust-destroying, deliverability-killing problem.

This post walks through the mechanisms, data structures, and failure modes you need to design an idempotent email pipeline — one where sending the same logical message twice produces exactly one delivery.

What idempotency means for email

Idempotency means that repeating the same operation produces the same outcome as doing it once. For email, the outcome you want to guarantee is: one logical message results in exactly one delivery to the recipient's inbox. The challenge is that SMTP gives you no native deduplication: a server that accepts the same message twice will deliver it twice. You have to build deduplication above the transport layer.

The key insight is that idempotency in email isn't about the SMTP session — it's about the intent to send. You need to capture that intent, assign it a stable identifier, and check that identifier before every send attempt. If it's been seen before and the send succeeded, skip. If it failed, retry safely. If it's in-flight, wait or deduplicate.
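
The three cases above map onto a small decision function. This is a minimal sketch, assuming a hypothetical SendState enum and decide helper; neither name comes from a particular library.

```python
from enum import Enum
from typing import Optional


class SendState(Enum):
    """Hypothetical states a recorded send intent can be in."""
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    IN_FLIGHT = "in_flight"


def decide(state: Optional[SendState]) -> str:
    """Map the recorded state of an intent to the next action."""
    if state is None:
        return "send"    # never seen before: first attempt
    if state is SendState.SUCCEEDED:
        return "skip"    # already delivered once
    if state is SendState.FAILED:
        return "retry"   # nothing was delivered, safe to try again
    return "wait"        # another attempt is still running
```

The rest of this post is about making the lookup behind that state reliable under retries, crashes, and concurrency.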

The idempotency key: your deduplication primitive

An idempotency key is a stable, deterministic string that uniquely identifies one logical send operation. When your agent decides "send order confirmation to user@example.com for order #4821," that decision should produce exactly one key — regardless of how many times the agent code runs.

How to construct a good key

Don't use random UUIDs generated at send time. They're different on every retry, which defeats the purpose. Instead, derive keys from the semantic content of the intent:

import hashlib
import json

def make_idempotency_key(event_type: str, entity_id: str, recipient: str, version: int = 1) -> str:
    payload = json.dumps({
        "type": event_type,
        "entity": entity_id,
        "to": recipient,
        "v": version
    }, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

# Example: order confirmation
key = make_idempotency_key(
    event_type="order.confirmation",
    entity_id="order_4821",
    recipient="user@example.com"
)
# => stable SHA-256 regardless of when or how many times this runs

The version field matters. If you intentionally want to resend — say the order was refunded and reprocessed — bump the version. That produces a new key, which is correct behavior: it's a new logical intent.
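
The derivation above hashes its inputs exactly as given, so User@Example.com and user@example.com would yield different keys for the same mailbox. A minimal sketch of normalizing the recipient before deriving the key, assuming a hypothetical normalize_recipient helper and assuming your recipients' mail systems treat addresses case-insensitively (common in practice, though the local part is formally case-sensitive):

```python
def normalize_recipient(address: str) -> str:
    """Hypothetical helper: trim whitespace and lowercase the address.

    Assumption: the receiving systems ignore case in the local part.
    """
    return address.strip().lower()


raw_inputs = ["user@example.com", " User@Example.com ", "USER@EXAMPLE.COM"]
normalized = {normalize_recipient(a) for a in raw_inputs}
```

Passing the normalized value as the recipient argument keeps cosmetic differences in upstream data from producing distinct keys.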

Key scope tradeoffs

Scope | Example key components | Risk
Too narrow | order_id + recipient | Doesn't distinguish email type; suppresses legitimate resends
Too broad | recipient + timestamp | Collisions across distinct intents; doesn't deduplicate
Right | event_type + entity_id + recipient + version | Unique per logical send intent, stable across retries

The deduplication store

The key is only useful if you check it somewhere persistent before sending. That store needs to be fast (you're checking on every send attempt, potentially thousands per minute), atomic (two concurrent retries must not both pass the check), durable (a process restart can't clear it), and TTL-aware (keys for old intents can expire).

Redis with atomic SET NX EX (set if not exists, with expiry) is the standard choice:

import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def attempt_send(idempotency_key: str, send_fn, ttl_seconds: int = 86400 * 7) -> dict:
    """
    Returns {"status": "sent", "message_id": ...},
    {"status": "duplicate", "original_message_id": ...}, or {"status": "in_flight"}.
    """
    lock_key = f"email:lock:{idempotency_key}"
    result_key = f"email:result:{idempotency_key}"

    # Atomic claim: only one caller wins this
    claimed = r.set(lock_key, "claimed", nx=True, ex=ttl_seconds)

    if not claimed:
        # Already sent or in-flight
        stored = r.get(result_key)
        if stored:
            return {"status": "duplicate", "original_message_id": stored}
        # In-flight: another process is sending right now
        return {"status": "in_flight"}

    # We won the claim, so actually send
    try:
        message_id = send_fn()
    except Exception:
        # Release the lock only when the send itself failed, so a retry can reclaim it
        r.delete(lock_key)
        raise

    # The send succeeded. Record the result outside the try block: if this write
    # fails, the lock must stay in place, or a retry would send a duplicate.
    r.set(result_key, message_id, ex=ttl_seconds)
    return {"status": "sent", "message_id": message_id}

Note the failure handling: if send_fn() throws, you delete the lock key. This allows a future retry to reclaim it. Leave the lock in place after a failure and you permanently suppress a send that never happened — the opposite of what you want.
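
To watch the claim-and-release behavior without a running Redis, here is a sketch built on a hypothetical in-memory FakeStore that mimics only the set (with nx), get, and delete calls used above. It ignores expiry, is not safe across processes, and takes the store as a parameter purely for illustration.

```python
class FakeStore:
    """Hypothetical in-memory stand-in for the Redis calls used above (no TTL)."""

    def __init__(self):
        self.data = {}

    def set(self, key, value, nx=False, ex=None):
        if nx and key in self.data:
            return None            # mirrors redis-py: None when NX fails
        self.data[key] = value
        return True

    def get(self, key):
        return self.data.get(key)

    def delete(self, key):
        self.data.pop(key, None)


def attempt_send(store, idempotency_key, send_fn, ttl_seconds=86400 * 7):
    lock_key = f"email:lock:{idempotency_key}"
    result_key = f"email:result:{idempotency_key}"
    if not store.set(lock_key, "claimed", nx=True, ex=ttl_seconds):
        stored = store.get(result_key)
        if stored:
            return {"status": "duplicate", "original_message_id": stored}
        return {"status": "in_flight"}
    try:
        message_id = send_fn()
    except Exception:
        store.delete(lock_key)     # nothing was sent: let a retry reclaim the key
        raise
    store.set(result_key, message_id, ex=ttl_seconds)
    return {"status": "sent", "message_id": message_id}


store = FakeStore()
calls = []


def flaky_send():
    """Fails on the first call, succeeds afterwards."""
    calls.append(1)
    if len(calls) == 1:
        raise ConnectionError("provider unreachable")
    return "<abc123@example.com>"


try:
    attempt_send(store, "k1", flaky_send)
except ConnectionError:
    pass                                          # first attempt fails, lock released

second = attempt_send(store, "k1", flaky_send)    # retry reclaims the key and sends
third = attempt_send(store, "k1", flaky_send)     # later retries are deduplicated
```

The send function runs twice in total: once for the failed attempt and once for the successful retry. The third call never reaches it.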

What to store as the result

Store the Message-ID returned by your email provider, not just a boolean. The Message-ID (format: <unique-string@domain>) is the header mail clients use to thread messages and the value your logs reference for debugging. Storing it means that when a retry hits the deduplication check, you can return the original Message-ID, which is useful for audit trails and for connecting downstream events like opens and clicks back to the original intent.

Failure modes and how to handle each

Idempotent systems fail in specific, predictable ways.

1. Provider accepted but returned error

Some providers accept a message into their queue, but your client still sees a failure: a 5xx response, or a network timeout after the message was enqueued. Your code treats this as a failed send and retries, but the provider already has the message.

Defense: Use provider-level idempotency keys. Some email APIs accept an Idempotency-Key header or an equivalent field; check your provider's documentation. Pass your derived key there. The provider will deduplicate on its side if you retry with the same key within its window (often 24 hours).

import httpx

def send_via_api(idempotency_key: str, payload: dict) -> str:
    response = httpx.post(
        "https://api.email-provider.com/send",
        json=payload,
        headers={
            "Authorization": "Bearer ...",
            "Idempotency-Key": idempotency_key
        },
        timeout=10.0
    )
    response.raise_for_status()
    return response.json()["message_id"]
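
The "accepted but response lost" case can be simulated locally. The sketch below assumes a hypothetical FakeProvider that deduplicates on the key and drops its first response; it does not model any real provider's API. The retry wrapper always passes the same key.

```python
class FakeProvider:
    """Hypothetical provider that deduplicates on the idempotency key."""

    def __init__(self):
        self.accepted = {}               # idempotency key -> message id
        self.deliveries = 0
        self.drop_next_response = True   # simulate one lost response

    def send(self, idempotency_key: str, payload: dict) -> str:
        if idempotency_key not in self.accepted:
            self.deliveries += 1
            self.accepted[idempotency_key] = f"msg-{self.deliveries}"
        if self.drop_next_response:
            self.drop_next_response = False
            raise TimeoutError("response lost after the message was enqueued")
        return self.accepted[idempotency_key]


def send_with_retry(provider, idempotency_key: str, payload: dict, attempts: int = 3) -> str:
    """Retry on timeout, always reusing the same key."""
    last_error = None
    for _ in range(attempts):
        try:
            return provider.send(idempotency_key, payload)
        except TimeoutError as exc:
            last_error = exc             # a real client would back off here
    raise last_error


provider = FakeProvider()
message_id = send_with_retry(provider, "key-order-4821", {"to": "user@example.com"})
```

The first call enqueues the message and times out; the retry gets the original message id back, and the provider records a single delivery.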

2. Agent restarts mid-workflow

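An agent runs a multi-step workflow: fetch context → compose email → send → update database. The process crashes after the send but before the database update. On restart, the application database has no record of the send, so an agent that trusts only its database composes and sends the email again. If the crash landed between the send and the result write, the deduplication store holds a lock with no result and reports in_flight until the lock expires; provider-level idempotency keys cover that window.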

Defense: Treat the deduplication store as the source of truth for send status — not your application database. Check the store first on any resume path, not just on first attempt.

3. Distributed agents racing

Two agent instances independently decide to send the same email (e.g., both pick up the same job from a queue that delivers at-least-once). Both derive the same key and hit the store at the same millisecond.

Defense: The SET NX atomicity in Redis handles this. Only one caller gets true from SET NX — the other gets nil and reads the stored result. No lock striping needed unless your throughput requires it (>50k keys/sec on a single Redis node).
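
The race can be reproduced in a single process. This sketch assumes a hypothetical AtomicStore whose set_nx method is made atomic with a mutex, standing in for what Redis guarantees server-side; eight threads try to claim the same key at once.

```python
import threading


class AtomicStore:
    """Hypothetical in-memory store whose set-if-absent is atomic, like SET NX."""

    def __init__(self):
        self._data = {}
        self._mutex = threading.Lock()

    def set_nx(self, key: str, value: str) -> bool:
        with self._mutex:
            if key in self._data:
                return False
            self._data[key] = value
            return True


store = AtomicStore()
winners = []
start = threading.Barrier(8)


def worker(name: str) -> None:
    start.wait()                           # release all workers at the same moment
    if store.set_nx("email:lock:abc", name):
        winners.append(name)               # only the claimant would go on to send


threads = [threading.Thread(target=worker, args=(f"agent-{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Which agent wins varies from run to run, but exactly one of them does.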

4. Legitimate resend requested

A user clicks "resend confirmation." You want to send again even though the original key was used.

Defense: This is exactly what the version field in your key construction is for. The user action should increment the version, producing a new key:

# Original send: version=1
# User-triggered resend: version=2
key = make_idempotency_key("order.confirmation", "order_4821", "user@example.com", version=2)

Request-reply flow for idempotent agents

sequenceDiagram
    participant Agent
    participant DedupeStore
    participant EmailAPI
    participant Recipient

    Agent->>DedupeStore: SET NX email_lock_KEY claimed
    alt Lock claimed
        DedupeStore-->>Agent: OK claimed
        Agent->>EmailAPI: POST send with Idempotency-Key header
        EmailAPI-->>Agent: 200 message_id abc123
        Agent->>DedupeStore: SET email_result_KEY abc123
        EmailAPI->>Recipient: Deliver message
    else Lock exists and result stored
        DedupeStore-->>Agent: nil lock exists
        Agent->>DedupeStore: GET email_result_KEY
        DedupeStore-->>Agent: abc123
        Agent-->>Agent: Skip send return cached result
    else Lock exists no result yet
        DedupeStore-->>Agent: nil lock exists
        Agent->>DedupeStore: GET email_result_KEY
        DedupeStore-->>Agent: nil no result yet
        Agent-->>Agent: Return in_flight status
    end

TTL strategy for the deduplication store

Keys can't live forever — you'll exhaust memory. But they need to live long enough to cover all realistic retry windows.

  • Transactional emails (order confirmations, password resets): 7-day TTL covers any plausible retry scenario
  • Scheduled agent emails (weekly reports, digest emails): TTL should be at least 2x your scheduling interval — if your agent runs weekly, use a 14-day TTL
  • High-frequency notifications: 24-hour TTL is usually sufficient; these are time-sensitive and a next-period re-trigger should send fresh

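Store lock keys and result keys with the same TTL. If the result key expires well before the lock key, retries see a lock with no result and report in_flight until the lock finally expires. If the lock key expires well before the result key, a retry can reclaim the key and send a duplicate even though a result is still on record.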
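
The guidance above can be captured in one helper so both keys always get the same value. A minimal sketch, assuming hypothetical category names (transactional, scheduled, high_frequency):

```python
from datetime import timedelta
from typing import Optional

# Hypothetical category names; the durations follow the guidance above.
TTL_BY_CATEGORY = {
    "transactional": timedelta(days=7),
    "high_frequency": timedelta(hours=24),
}


def ttl_seconds(category: str, schedule_interval: Optional[timedelta] = None) -> int:
    """Return the TTL to apply to both the lock key and the result key."""
    if category == "scheduled":
        if schedule_interval is None:
            raise ValueError("scheduled emails need their scheduling interval")
        return int((2 * schedule_interval).total_seconds())   # 2x the interval
    return int(TTL_BY_CATEGORY[category].total_seconds())
```

A weekly report would call ttl_seconds("scheduled", timedelta(weeks=1)) and pass the result to both writes.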

Idempotency at the DKIM/SPF layer

Deduplication keeps you from sending duplicates, but you still need the underlying emails to be trusted by receiving servers. Each message your agent sends should carry a valid DKIM signature tied to your sending domain. Duplicate detection happens before the SMTP layer, so your DKIM infrastructure isn't involved in deduplication. But here's the subtlety: a replayed copy of a signed message will still pass DKIM verification, because DKIM has no built-in replay protection beyond an optional signature expiration (the x= tag). Your deduplication store is what prevents semantic duplicates, not the authentication layer.

This is why dedicated IP infrastructure and sender reputation matter for agents specifically — high-volume automated senders that occasionally produce duplicates (before idempotency is wired up) accumulate spam complaints that stick to the IP, not just the message.

Integrating with event-driven agent architectures

Most agents aren't calling a send function directly — they're responding to events from a queue (SQS, Kafka, Pub/Sub). Those queues are almost always at-least-once delivery. That means your idempotency layer isn't optional; it's the primary defense against duplicate processing.

import logging

logger = logging.getLogger(__name__)

class RetryableError(Exception):
    """Signals the queue consumer to re-enqueue the event with backoff."""

def handle_event(event: dict):
    # Derive key from event content, not from event metadata
    key = make_idempotency_key(
        event_type=event["type"],
        entity_id=event["data"]["order_id"],
        recipient=event["data"]["email"]
    )

    # send_order_confirmation is assumed to be defined elsewhere and to return the Message-ID
    result = attempt_send(
        idempotency_key=key,
        send_fn=lambda: send_order_confirmation(event["data"])
    )

    if result["status"] == "duplicate":
        # Log but don't error: this is expected behavior
        logger.info("Deduplicated send key=%s original_id=%s", key, result["original_message_id"])
    elif result["status"] == "in_flight":
        # Re-enqueue with backoff: another worker is handling it
        raise RetryableError("Send in flight, retry later")

Deriving the key from event content (order_id, email) rather than event metadata (event_id, timestamp) ensures that even if the same logical event is published twice with different IDs, you still deduplicate correctly. Event IDs are not stable semantic identifiers.
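
A quick check of that claim, repeating the key function so the snippet runs on its own. The two event payloads are hypothetical and differ only in their id and timestamp metadata.

```python
import hashlib
import json


def make_idempotency_key(event_type, entity_id, recipient, version=1):
    payload = json.dumps(
        {"type": event_type, "entity": entity_id, "to": recipient, "v": version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def key_for_event(event: dict) -> str:
    """Derive the key from event content only; id and timestamp are ignored."""
    data = event["data"]
    return make_idempotency_key(event["type"], data["order_id"], data["email"])


# Same logical event, published twice with different metadata.
first = {
    "id": "evt_001", "timestamp": "2025-01-01T10:00:00Z", "type": "order.confirmation",
    "data": {"order_id": "order_4821", "email": "user@example.com"},
}
second = {
    "id": "evt_002", "timestamp": "2025-01-01T10:00:05Z", "type": "order.confirmation",
    "data": {"order_id": "order_4821", "email": "user@example.com"},
}

same_key = key_for_event(first) == key_for_event(second)
```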

Observability: what to instrument

Idempotency without observability is flying blind. Instrument these metrics:

  • Deduplication rate (email.send.deduplicated / email.send.attempted): should be low in steady state; spikes indicate upstream retry storms or agent loops
  • Lock contention rate: how often you hit the in_flight case. High contention suggests your queue is redelivering aggressively or your workers are processing the same events in parallel
  • Key TTL distribution: are keys expiring before you expect? Indicates your TTL is too short for your retry window
  • Provider-level deduplication: some providers (including Mails.ai) expose whether a send was deduplicated at their layer — reconcile this against your store to catch cases where your deduplication didn't fire but theirs did
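
The first two rates can be computed from simple counters. This sketch uses collections.Counter as a stand-in for a real metrics client; email.send.deduplicated and email.send.attempted come from the list above, while the other metric names are assumptions.

```python
from collections import Counter

metrics = Counter()   # stand-in for a real metrics client

# Map attempt_send statuses to metric names (names other than
# email.send.deduplicated are hypothetical).
METRIC_FOR_STATUS = {
    "sent": "email.send.sent",
    "duplicate": "email.send.deduplicated",
    "in_flight": "email.send.in_flight",
}


def record_attempt(result: dict) -> None:
    """Count every attempt and bucket it by the status it returned."""
    metrics["email.send.attempted"] += 1
    metrics[METRIC_FOR_STATUS[result["status"]]] += 1


def rate(numerator: str, denominator: str = "email.send.attempted") -> float:
    total = metrics[denominator]
    return metrics[numerator] / total if total else 0.0


for status in ["sent", "sent", "duplicate", "in_flight"]:
    record_attempt({"status": status})

dedup_rate = rate("email.send.deduplicated")
contention_rate = rate("email.send.in_flight")
```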

Frequently Asked Questions

How is an idempotency key different from a Message-ID?

A Message-ID (e.g., <abc123@yourdomain.com>) is a header assigned by the sending client or mail server at send time. Mail clients use it for threading: replies reference it in their In-Reply-To and References headers, which lets clients group conversations. An idempotency key is your application-level identifier for the intent to send, checked before the send ever happens. The Message-ID is the result; the idempotency key is the gate. You store the Message-ID as the result value in your deduplication store.

What happens if my Redis instance goes down?

You have two options: fail open (skip the deduplication check and send anyway, accepting potential duplicates) or fail closed (refuse to send until the store is available). For most agent workflows, fail open is preferable — a rare duplicate is better than missed delivery. Gate your fail-open path with a circuit breaker and alert aggressively so you restore the store quickly.
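
A minimal sketch of the fail-open path behind a circuit breaker. StoreUnavailable, CircuitBreaker, and send_fail_open are hypothetical names, and the sketch assumes the store error is raised at claim time, before anything has been sent.

```python
import time


class StoreUnavailable(Exception):
    """Hypothetical error raised when the deduplication store cannot be reached."""


class CircuitBreaker:
    """Opens after `threshold` consecutive store failures, for `cooldown` seconds."""

    def __init__(self, threshold: int = 3, cooldown: float = 30.0):
        self.threshold = threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def is_open(self) -> bool:
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at = None        # cooldown over: probe the store again
            self.failures = 0
            return False
        return True

    def record_failure(self) -> None:
        self.failures += 1
        if self.failures >= self.threshold:
            self.opened_at = time.monotonic()

    def record_success(self) -> None:
        self.failures = 0


def send_fail_open(breaker, checked_send, unchecked_send) -> dict:
    """Prefer the deduplicated path; fall back to a plain send if the store is down."""
    if not breaker.is_open():
        try:
            result = checked_send()
            breaker.record_success()
            return result
        except StoreUnavailable:
            breaker.record_failure()     # a real system would alert here
    return {"status": "sent_unchecked", "message_id": unchecked_send()}


breaker = CircuitBreaker(threshold=2, cooldown=60.0)
unchecked_calls = []


def store_is_down():
    raise StoreUnavailable("connection refused")


def plain_send():
    unchecked_calls.append(1)
    return "<fallback@example.com>"


first = send_fail_open(breaker, store_is_down, plain_send)
second = send_fail_open(breaker, store_is_down, plain_send)   # opens the breaker
```

While the breaker is open, sends skip the store entirely, so the sent_unchecked status is worth counting as its own metric.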

Can I use a relational database instead of Redis for the deduplication store?

Yes, with caveats. You need a table with a unique index on the idempotency key and an atomic insert that ignores conflicts (INSERT ... ON CONFLICT DO NOTHING in PostgreSQL); the caller whose insert actually creates the row owns the send. The performance is fine for moderate volumes (< 1k sends/sec). Above that, write contention on a single table becomes a bottleneck, and Redis's O(1) atomic SET NX is the better tool.
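
The claim step can be sketched with Python's built-in sqlite3 module as a local stand-in for PostgreSQL. SQLite accepts the same ON CONFLICT ... DO NOTHING clause (version 3.24 and later); the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE email_sends ("
    " idempotency_key TEXT PRIMARY KEY,"
    " message_id TEXT)"
)


def claim(key: str) -> bool:
    """Return True only for the caller whose INSERT actually created the row."""
    cur = conn.execute(
        "INSERT INTO email_sends (idempotency_key) VALUES (?) "
        "ON CONFLICT (idempotency_key) DO NOTHING",
        (key,),
    )
    conn.commit()
    return cur.rowcount == 1      # 0 rows affected means the key already existed


first = claim("abc")
second = claim("abc")
```

The affected-row count plays the role of the SET NX return value: the first caller wins, and every later caller sees zero rows inserted.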

How do I handle idempotency for bulk sends — e.g., an agent sending 10,000 newsletter emails?

Each recipient gets their own idempotency key: make_idempotency_key("newsletter.issue_42", "issue_42", recipient_email). The batch is the logical grouping but individual deliveries are the atomic unit. This lets you safely retry a failed batch without re-sending to addresses that already received it.
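
A sketch of a retry-safe batch, with a plain set standing in for the deduplication store and a hypothetical deliver callback that fails once for one address:

```python
import hashlib
import json


def make_idempotency_key(event_type, entity_id, recipient, version=1):
    payload = json.dumps(
        {"type": event_type, "entity": entity_id, "to": recipient, "v": version},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()


def send_batch(recipients, sent_keys: set, deliver) -> int:
    """Send to each recipient once; `sent_keys` stands in for the deduplication store."""
    delivered = 0
    for recipient in recipients:
        key = make_idempotency_key("newsletter.issue_42", "issue_42", recipient)
        if key in sent_keys:
            continue               # delivered in an earlier attempt
        deliver(recipient)         # may raise; the key is recorded only on success
        sent_keys.add(key)
        delivered += 1
    return delivered


recipients = ["a@example.com", "b@example.com", "c@example.com"]
sent_keys = set()
log = []
fail_once = {"b@example.com"}


def deliver(recipient):
    if recipient in fail_once:
        fail_once.discard(recipient)
        raise ConnectionError("temporary failure")
    log.append(recipient)


try:
    send_batch(recipients, sent_keys, deliver)     # fails partway through
except ConnectionError:
    pass
retried = send_batch(recipients, sent_keys, deliver)
```

The retry skips the address that already received the issue and delivers only to the remaining two.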

Do email providers support idempotency keys natively?

Some do. Check your provider's API documentation for Idempotency-Key header support or equivalent fields. When available, pass your derived key — this gives you a second deduplication layer at the provider level, which covers the specific failure mode where the provider accepts your message but the HTTP response is lost in transit. Your application-level store is still necessary because provider-side idempotency windows are typically short (24 hours) and don't cover semantic intent across different API sessions.

How should I version idempotency keys when email content changes?

Content changes fall into two categories. If the change is incidental (you fixed a typo in a template), the logical intent is the same — don't bump the version, the recipient doesn't need the email again. If the change represents a new business event (the order was modified, the price changed), that's a new intent — bump the version or include the content-relevant field (like order_version) in the key derivation. Tie key versioning to semantic intent, not to template revisions.
