Introduction

In a microservices architecture, we often publish events. Thousands or even hundreds of thousands of events are published to notify other services about changes: a user created a profile, an order was paid, a courier took an order for delivery, and so on. All of these are examples of events: UserProfileCreated, OrderPaid, OrderInDelivery. Events are important, and we need to ensure they are published no matter what.

This is exactly what the Outbox pattern does. It ensures that events are reliably published from a microservice, even if the service itself or the message broker crashes, or something else goes wrong.

Outbox pattern

The main idea of the Outbox pattern is to create an additional table and populate it in the same transaction as the business data manipulation. This table is usually called Outbox, but this is not a requirement—you can name it whatever suits your project. After new data is added to the table, a separate process (e.g., a background worker) processes it and publishes the events to the broker.

The diagram above outlines the process, which we can split into 3 steps:

The UserService creates a transaction in the database, where it creates a new user by populating the UserTable and inserting new data into the OutboxTable.
The background process detects that there is a new event in the OutboxTable.
The background process publishes the event to the message broker.

I will show different ways to implement this pattern, but the part of the application we are working with will remain the same. Let's imagine we have a service that creates users, and we must deliver the event that a user was created.

Relational database

Let's have a look how we can implement this approach using the relation database.

The OutboxTable might look like the following:

CREATE TABLE Outbox (
    Id BIGINT IDENTITY(1,1) PRIMARY KEY,
    Type NVARCHAR(255) NOT NULL,       -- Event type (e.g., 'UserCreated')
    Payload NVARCHAR(MAX) NOT NULL,    -- JSON-serialized event data
    Published BIT DEFAULT 0,           -- Whether the event has been published
    CreatedAt DATETIME2 DEFAULT GETDATE(),
    PublishedAt DATETIME2 NULL,
    Attempts INT DEFAULT 0             -- Number of publishing attempts
);

This table implementation allows support for different types of events.

public class UserService
{
    private readonly IUnitOfWork _unitOfWork;
    private readonly IOutboxRepository _outboxRepository;
    private readonly IUserRepository _userRepository;

    public UserService(
        IUnitOfWork unitOfWork,
        IOutboxRepository outboxRepository,
        IUserRepository userRepository)
    {
        _unitOfWork = unitOfWork;
        _outboxRepository = outboxRepository;
        _userRepository = userRepository;
    }

    public async Task CreateUserAsync(string name, string email)
    {
        _unitOfWork.BeginTransaction();

        try
        {
            // Create a user
            var user = await _userRepository.AddAsync(name, email);

            var @event = new UserCreatedEvent(user.Id, user.Name, user.Email);

            // Save event to outbox table
            var outboxMessage = new OutboxMessage(
                "UserCreated",
                JsonSerializer.Serialize(@event),
                false, // Not published
                DateTime.UtcNow,
                null,  // No date of publish
                0      // 0 attempts to publish the message
            );
            await _outboxRepository.AddAsync(outboxMessage);

            await _unitOfWork.SaveChangesAsync();

            _unitOfWork.Commit();
        }
        catch
        {
            _unitOfWork.Rollback();
            throw;
        }
    }
}

This code snippet shows how the UserService with the Outbox pattern can be implemented. As you might notice we added the message to the Outbox table right after saving the user data.

The same logic can be implemented directly in SQL:

BEGIN TRANSACTION

INSERT INTO User(..)
INSERT INTO Outbox(..)

COMMIT TRANSACTION

What seems to me like a small code smell in this implementation is that the business logic is close to the code responsible for events. This is not a major issue, and in many cases, it might be the easiest and most practical approach.

To address this, triggers can be used in databases that support them. This might be slightly harder to implement compared to adding another repository in the code, but it separates business logic from event handling.

CREATE TRIGGER trg_Users_AfterInsert
ON Users
AFTER INSERT
AS
BEGIN
    SET NOCOUNT ON;

    INSERT INTO Outbox (Type, Payload, Published, CreatedAt, PublishedAt, Attempts)
    SELECT
        'UserCreated' AS Type,
        (
            SELECT
                u.Id,
                u.Name,
                u.Email,
                GETDATE() AS EventTimestamp
            FROM inserted u
            FOR JSON PATH
        ) AS Payload,
        0 AS Published, -- Not published yet
        GETDATE() AS CreatedAt,
        NULL AS PublishedAt,
        0 AS Attempts
    FROM inserted;
END;

The database trigger and INSERT are atomic operations: if one fails, the other is rolled back. This works for most relational databases, but it's best to verify if your database supports it.

So far, we've introduced the event storage mechanism, but what about sending the event to a message broker? Common implementations include a background process or a serverless application like Azure Function or AWS Lambda.

The most important part here is how to retrieve the event data. You need to consider how often to query events and how large the batch should be. This is one of the weakest points of the Outbox pattern. For high-load applications that generate a huge number of events, working with the Outbox table might become a bottleneck.

In high-load systems, the Outbox table can grow very quickly, which may negatively impact performance. So, it's crucial to decide how long events should be stored in the database. Should it be a month or more? How often do you need to reprocess all events from the previous month?

public class OutboxWorker : BackgroundService
{
    private readonly IOutboxRepository _outboxRepository;
    private readonly IEventPublisher _eventPublisher;
    private readonly ILogger<OutboxWorker> _logger;
    private readonly TimeSpan _pollingInterval = TimeSpan.FromSeconds(5);

    public OutboxWorker(
        IOutboxRepository outboxRepository,
        IEventPublisher eventPublisher,
        ILogger<OutboxWorker> logger)
    {
        _outboxRepository = outboxRepository;
        _eventPublisher = eventPublisher;
        _logger = logger;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        _logger.LogInformation("Outbox Worker started.");

        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                var messages = await _outboxRepository.GetUnpublishedAsync();
                foreach (var message in messages)
                {
                    try
                    {
                        await _eventPublisher.PublishAsync(message.Type, message.Payload);
                        await _outboxRepository.MarkAsPublishedAsync(message.Id);
                        _logger.LogInformation("Published event {EventType} (ID: {Id})", message.Type, message.Id);
                    }
                    catch (Exception ex)
                    {
                        await _outboxRepository.IncrementAttemptsAsync(message.Id);
                        _logger.LogError(ex, "Failed to publish event {EventType} (ID: {Id}). Attempt {Attempts}", 
                            message.Type, message.Id, message.Attempts + 1);
                    }
                }
            }
            catch (Exception ex)
            {
                _logger.LogError(ex, "Error processing outbox messages.");
            }

            await Task.Delay(_pollingInterval, stoppingToken);
        }

        _logger.LogInformation("Outbox Worker stopped.");
    }
}

The code snippet above shows how to poll events from the Outbox table using background tasks in .NET. Let's look at some implementation details:

Line 26: Uses IOutboxRepository to query the list of unpublished events. This can be a simple SQL query or use an ORM.

SELECT * FROM OutboxMessages WHERE Published = 0 AND Attempts < 5 ORDER BY CreatedAt

In this example, I query all unpublished events, but you might want to limit the batch size if there are many. At the same time, if an event fails 5 times, it might be best to stop further attempts. In such cases, it's better to have a notification that an event cannot be published, so the development team can quickly investigate and fix the issue.

We must ensure all events are published in the correct order, so we use ORDER BY CreatedAt. This is important because events describe the state of the system over time. The event UserSubscriptionAssigned cannot be fired before UserCreated, because you cannot assign a subscription to a user that hasn't been created yet.

Line 31: IEventPublisher is used to publish an event to the broker.
Line 32: The event is marked as Published. After this point, we are no longer interested in this event because it has been published.

What if an exception occurs at this point? The event might have been published, and consumers may have already processed it, but the background process failed to mark the event as published. On the next polling cycle, this event will be published again. Consumers should be prepared for the possibility that the same event may be published multiple times and should be able to process it or skip it.

Thoughts and Considerations

The implementation of the Outbox pattern might seem trivial, but there are some important considerations to keep in mind. Some of these have already been touched on, but I want to summarize them here.

Polling Mechanism

It is important to ensure that the background process is idempotent and marks a message as published only after it has actually been sent.

Another important consideration is the batch size and how often to poll for events. If your system generates 100 events per second but processes only 10 per second, the outbox can become a bottleneck for the entire system because it cannot deliver events in time.

Error Handling

How many times should the system retry sending a message? How should it notify the development team about messages that cannot be delivered? If, for some reason, a message cannot be delivered (e.g., the event broker is down, the topic or queue does not exist), the notification should be received as quickly as possible. The earlier the team knows about the issue, the faster they can resolve it and eliminate serious implications.

Cleaning

The Outbox table must be cleaned; otherwise, it might grow to an unmanageable size. There are many possible approaches to handle this. Let's consider some of them:

Cleaning Strategy: First, decide whether to delete rows older than a week or a month. Alternatively, you might want to clean up old records when the number of records in the table hits a threshold (e.g., 10k, 100k). Or, should you delete the event immediately after sending it? These are questions you need to answer before implementing the cleaning mechanism.
Cleaning Execution: Who or what will clean the table? This could be a database-level cleanup using a database scheduler (if supported by your database). If the database does not support jobs, or if this approach is not suitable for your needs, you can write a cleanup service as a background process or a serverless application.

NoSQL

So far, we've discussed relational databases, which natively support transactions. However, there are many cases where relational data isn't used, and we have to use NoSQL databases like Azure Cosmos DB, MongoDB, or Amazon DynamoDB. Can we use the same approach with NoSQL? The answer is: yes and no. Let's look at a few examples.

Azure Cosmos DB

Azure Cosmos DB does not have tables in the same way as relational databases like MS SQL or PostgreSQL. Instead, it has containers, and the data in containers is divided by logical partitions. There is a special key, PartitionKey, which defines these partitions. When you insert data into the database, all documents in a container with the same PartitionKey are considered to be in the same logical partition.

If you look at the diagram above, you can see a container called data with documents inside it. Documents with the same PartitionKey are in the same logical partition.

Why is this important? Transactions in Azure Cosmos DB are called "transactional batches," and they operate on a single logical partition, guaranteeing ACID properties. This means that our business data and events must have the same PartitionKey and be in the same container.

Let's look how it works. Here's the base class for a document:

public class DocumentBase<T> where T : class
{
    private DocumentBase() { }

    public DocumentBase(string? partitionKey, string type, T data, string? id = null)
    {
        Data = data;
        Type = type;
        PartitionKey = partitionKey;
        Id = id ?? Guid.NewGuid().ToString();
    }

    public string Id { get; set; }
    public string? PartitionKey { get; init; }
    public string Type { get; init; } = null!;
    public T Data { get; init; } = null!;
    public DateTimeOffset CreatedAt { get; set; } = DateTimeOffset.UtcNow;
}

The code snippet below shows how to use a transactional batch to create a user and an event:

var userId = Guid.NewGuid().ToString();

var user = new DocumentBase<User>(
    partitionKey: userId,
    type: "user",
    data: new User
    {
        Name = "John Doe",
        Email = "john.doe@example.com",
    },
    id: userId);

var userCreatedEvent = new DocumentBase<UserCreatedEvent>(
    partitionKey: userId,
    type: "event",
    data: new UserCreatedEvent 
    { 
        Data = new UserCreatedEventData(userId, user.Data.Name, user.Data.Email)
    });


var partitionKey = new PartitionKey(userId);
var transactionalBatch = container.CreateTransactionalBatch(partitionKey);

transactionalBatch.CreateItem(user);
transactionalBatch.CreateItem(userCreatedEvent);

// Execute the transactional batch
var batchResponse = await transactionalBatch.ExecuteAsync();

if (!batchResponse.IsSuccessStatusCode)
{
    throw new InvaidOperationException("Transactional batch failed.");
}

Let's break down the code:

Lines 3–19: Create the User entity and the UserCreatedEvent. Note that we use the same PartitionKey and set the Type property to easily distinguish between user data and events.
Lines 22–29: Create a Transactional Batch using the previously defined PartitionKey. Both entities are added to the batch, which is then executed to save the data into Azure Cosmos DB.
Lines 31–34: Check the result of the batch execution. In a real-world application, you might want to add logging or other actions, but here I simply throw an exception.

After the code executes, you'll find two documents with the same PartitionKey but different IDs.

The user document looks like this:

{
    "id": "0aa14bfb-f5f7-45e3-8112-45b14bec48b3",
    "partitionKey": "0aa14bfb-f5f7-45e3-8112-45b14bec48b3",
    "type": "user",
    "data": {
        "name": "John Doe",
        "email": "john.doe@example.com"
    },
    "createdAt": "2026-06-20T12:57:06.834242+00:00",
    "_rid": "EC4xAOFJBqMNAAAAAAAAAA==",
    "_self": "dbs/EC4xAA==/colls/EC4xAOFJBqM=/docs/EC4xAOFJBqMNAAAAAAAAAA==/",
    "_etag": "\"00000000-0000-0000-00b4-4d98780501dd\"",
    "_attachments": "attachments/",
    "_ts": 1781960228
}

The event looks like this:

{
    "id": "e645943f-208c-4dba-a149-1bcb53388302",
    "partitionKey": "0aa14bfb-f5f7-45e3-8112-45b14bec48b3",
    "type": "event",
    "data": {
        "eventType": "UserCreatedEvent",
        "data": {
            "userId": "0aa14bfb-f5f7-45e3-8112-45b14bec48b3",
            "name": "John Doe",
            "email": "john.doe@example.com"
        }
    },
    "createdAt": "2026-06-20T12:57:07.45013+00:00",
    "_rid": "EC4xAOFJBqMOAAAAAAAAAA==",
    "_self": "dbs/EC4xAA==/colls/EC4xAOFJBqM=/docs/EC4xAOFJBqMOAAAAAAAAAA==/",
    "_etag": "\"00000000-0000-0000-00b4-4d98800501dd\"",
    "_attachments": "attachments/",
    "_ts": 1781960228
}

The next step in implementing a proper Outbox mechanism is reading the event data from Azure Cosmos DB. For this, the database provides a change feed, which is a persistent storage of all changes in all containers. To read the change feed, we can use a background processor or an Azure Function.

I prefer the second approach because if you're using Azure Cosmos DB, your application is likely already in Azure. Creating and deploying a function would be the easiest way.

public class OutboxFunction
{
    private const string TopicName = "user-events";
    
    private readonly ServiceBusClient _serviceBusClient;

    public OutboxFunction(
        IAzureClientFactory<ServiceBusClient> factory)
    {
        _serviceBusClient = factory.CreateClient("Default");
    }

    [Function("OutboxFunction")]
    public async Task Run([CosmosDBTrigger(
        databaseName: "%COSMOS_DATABASE_NAME%",
        containerName: "%COSMOS_CONTAINER_NAME%",
        Connection = "COSMOS_CONNECTION",
        LeaseContainerName = "leases",
        CreateLeaseContainerIfNotExists = true)] IReadOnlyList<DocumentBase<dynamic>> documents)
    {
        var sender = _serviceBusClient.CreateSender(TopicName);

        foreach (var doc in documents.Where(x => x.Type == "event"))
        {
            var message = new ServiceBusMessage(BinaryData.FromObjectAsJson(doc.Data))
            {
                ContentType = "application/json",
                MessageId = Guid.NewGuid().ToString()
            };

            await sender.SendMessageAsync(message);
        }

        await sender.DisposeAsync();
    }
}

Let's review the code:

Lines 14-19: shows how to use CosmosDBTrigger to receive the changed documents. I use DocumentBase<dynamic> because not only events will be returned and easiest way is to use dynamic.

Line 21: create the sender for a topic user-events.

Line 23: change feed detects all changes in a container, so besides the events there will be the real documents which we do not want to send. It is the reason why we filter all documents by the type.

Lines 25-31: the json-serialization and sending the message to the Azure Service Bus topic.

Line 34: disposing the sender for releasing the resources.

This function is simple and lacks exception handling, which is important in production-ready systems. Imagine a scenario where an exception occurs and DisposeAsync is not called. Resources may not be released, and after some time, a flood of exceptions could fill your logging system. Pay attention to this and use try/catch/finally for this function.

In the previous code snippet, I used Azure Service Bus, but this approach will work with any message broker you prefer.

The system diagram for the Azure Cosmos DB approach looks like this:

Thoughts and Considerations

We don't need to poll for events here because the trigger fires after changes in the container. However, we can still receive duplicate events if an exception occurs after sending the event to Azure Service Bus but before the change is processed. This is something to consider.

Another important consideration is the proper cleanup of event data. Without it, the change feed can contain many stale records, which cost money but provide no value. To address this, we can use the TTL (Time To Live) feature of Cosmos DB. TTL defines, in seconds, how long a document will live in the container. It can be set at the container or item level, or disabled entirely.

MongoDB

MongoDB is a popular choice for those who want to use a NoSQL database. Let's see how we can implement the Outbox pattern using MongoDB and Kafka as a message broker.

For cases where atomicity is required across multiple collections, we can use transactions. The only requirement is that a replica set must be created, which is standard for production systems.

The code for creating a user and saving an event is shown below:

app.MapPost("/api/users", async (UserDto dto, IMongoDatabase db) =>
{
    using var session = await db.Client.StartSessionAsync();
    User user = null!;
    try
    {
        await session.WithTransactionAsync(async (s, ct) =>
        {
            var usersColl = db.GetCollection<User>("users");
            var outboxColl = db.GetCollection<OutboxEvent>("outbox_events");

            user = new User { Id = ObjectId.GenerateNewId(), Name = dto.Name, Email = dto.Email };
            await usersColl.InsertOneAsync(s, user, null, ct);

            var evt = new OutboxEvent
            {
                Type = "UserCreated",
                Payload = new BsonDocument
                {
                    {"userId", user.Id.ToString()},
                    {"name", user.Name},
                    {"email", user.Email}
                },
                CreatedAt = DateTime.UtcNow,
                Processed = false
            };
            await outboxColl.InsertOneAsync(s, evt, null, ct);
            return true;
        });
        
        return Results.Created($"/users/{user.Id}", user);
    }
    catch (Exception)
    {
        throw;
    }
});

The general idea remains the same; the only difference is how transactions are used in MongoDB. We need to create a new session (line 3) and, within this session, create a transaction (line 7).

Lines 12–13: Create and save the user.
Lines 15–27: Create and save the event. Note that we set Processed to false, which indicates that the event has not yet been processed.

For the event processor, I chose a simple BackgroundService that starts a Change Stream to listen for changes in the outbox_events collection.

public class OutboxProcessor : BackgroundService
{
    private readonly IMongoDatabase _db;
    private readonly IProducer<string, string> _producer;
    public OutboxProcessor(IMongoDatabase db)
    {
        _db      = db;
        var cfg  = new ProducerConfig { BootstrapServers = Environment.GetEnvironmentVariable("KAFKA_BOOTSTRAP") ?? "kafka:9092" };
        _producer = new ProducerBuilder<string, string>(cfg).Build();
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        var coll = _db.GetCollection<OutboxEvent>("outbox_events");
        var meta = _db.GetCollection<BsonDocument>("outbox_metadata");

        var pipeline = new BsonDocument[]
        {
            new BsonDocument("$match", new BsonDocument("operationType", "insert"))
        };

        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                var resumeDoc = await meta.Find(Builders<BsonDocument>.Filter.Eq("_id", "resume_token"))
                    .FirstOrDefaultAsync(stoppingToken);

                var options = new ChangeStreamOptions();
                if (resumeDoc != null)
                    options.ResumeAfter = resumeDoc["token"].AsBsonDocument;

                using var cursor = await coll.WatchAsync<ChangeStreamDocument<OutboxEvent>>(
                    pipeline, options, stoppingToken);

                await cursor.ForEachAsync(async change =>
                {
                    var evt = change.FullDocument;
                    try
                    {
                        await _producer.ProduceAsync(
                            "user-events",
                            new Message<string, string>
                            {
                                Key   = evt.Id.ToString(),
                                Value = evt.Payload.ToJson()
                            },
                            stoppingToken);

                        var filter = Builders<OutboxEvent>.Filter.Eq(e => e.Id, evt.Id);
                        var update = Builders<OutboxEvent>.Update.Set(e => e.Processed, true);
                        await coll.UpdateOneAsync(filter, update, cancellationToken: stoppingToken);

                        await meta.ReplaceOneAsync(
                            Builders<BsonDocument>.Filter.Eq("_id", "resume_token"),
                            new BsonDocument
                            {
                                { "_id", "resume_token" },
                                { "token", change.ResumeToken }
                            },
                            new ReplaceOptions { IsUpsert = true },
                            cancellationToken: stoppingToken);
                    }
                    catch (Exception ex)
                    {
                        Console.WriteLine($"Failed to process event {evt.Id}: {ex.Message}");
                    }
                }, stoppingToken);
            }
            catch (OperationCanceledException)
            {
                break;
            }
            catch (Exception ex)
            {
                Console.WriteLine($"Change Stream error: {ex.Message}. Reconnecting in 5s...");
                await Task.Delay(5000, stoppingToken);
            }
        }
    }
}

Let's break down the code, as there are some differences from what we've already seen:

Lines 17–20: Create the pipeline for the change stream. In simple terms, this is the configuration that specifies when we want to receive notifications about changes in the collection. There are different stages available, and you can use more than one if needed. In our case, we use $match to check that operationType is equal to insert.
Line 26: Query the resume token from the outbox_metadata collection. The resume token tells the change feed where it stopped previously. If the background process was shut down due to planned maintenance or an unexpected issue, when it starts again, it will know from which point in time to resume. This needs to be implemented because, by default, the change feed does not look back and starts waiting for changes only from the time it begins.
Lines 29-31: Creating the ChangeStreamOptions and if we have a resume token we set ResumeAfter property with the value of token.
Line 33: Create the change feed cursor.
Line 36: Provide the function that will be called for each change in the collection.
Lines 41–48: Send the event to the Kafka topic using IProducer.ProduceAsync.
Lines 50–51: Update the Processed field to true, indicating that the event has already been sent.
Lines 54–62: Save the resume token.

Thoughts and Considerations

As we've seen, the implementation of Outbox for MongoDB follows the same pattern. However, it has a similar problem: cleaning up the collection with outbox events. Here are some approaches.

TTL Index

TTL Index is a special index that tells MongoDB after how much time a document can be removed from the collection. This might be a good idea because it is an automated approach. However, the downside is the difficulty of finding the correct time interval after which events can be deleted. Setting it too high will cause the collection to grow in size, while setting it too low might result in situations where an event needs to be reprocessed but has already been deleted.

Capped Collections

Capped Collections are a special type of collection with a fixed size. When the collection is full and new events arrive, it automatically overwrites the oldest ones. This type of outbox can be helpful when you need predictable storage usage. The downside is that we cannot predict when events will be deleted, as any spike in the number of events could delete many old ones.

Immediate cleaning

Another possibility is to delete the event immediately after it is sent. This can be done in the same background process where the change stream sends the events to the message broker. Again, it is unclear what to do if the event needs to be reprocessed.

Each of these approaches has its pros and cons. Perhaps, as developers, we should not focus on a single approach but combine them. For example, in a system where some events are more important than others, less important events could be deleted immediately after being published, while important ones could wait for automatic deletion using the TTL approach.

Summary

As we have seen, the components of the Outbox pattern remain the same regardless of the database or implementation approach. Whether we use a relational or NoSQL database, the core idea is consistent: we save the event in the database as part of the same transaction that modifies the business data. A background process then retrieves unpublished events and publishes them.

Despite using different databases or approaches for polling events, we still face the same key considerations: how to clean up events, how to avoid duplicates, and how to handle unexpected situations.

The Outbox pattern might look very simple at first glance, but appearances can be deceiving.