Building for large systems and long-running background jobs.
Credit: Ilias Chebbi on Unsplash
Months ago, I assumed the role that required building infrastructure for media(audio) streaming. But beyond serving audio as streamable chunks, there were long-running media processing jobs and an extensive RAG pipeline that catered to transcription, transcoding, embedding, and sequential media updates. Building an MVP with a production mindset had us reiterate till we achieved a seamless system. Our approach has been one where we integrated features and the underlying stack of priorities.
Of Primary concern:
Over the course of building, each iteration came as a response to immediate and often “encompassing” need. Initial concern was queuing jobs, which readily sufficed with Redis; we simply fired and forgot. Bull MQ in the NEST JS framework gave us an even better control over retries, backlogs, and the dead-letter queue. Locally and with a few payloads in production, we got the media flow right. We were soon burdened by the weight of Observability:
Logs → Record of jobs (requests, responses, errors).
Metrics → How much / how often these jobs run, fail, complete, etc.
Traces → The path a job took across services (functions/methods called within the flow path).
You can solve some of these by designing APIs and building a custom dashboard to plug them into, but the problem of scalability will suffice. And in fact, we did design the APIs.
Building for Observability
The challenge of managing complex, long-running backend workflows, where failures must be recoverable, and state must be durable, Inngest became our architectural salvation. It fundamentally reframed our approach: each long-running background job becomes a background function, triggered by a specific event.
For instance, an Transcription.request event will trigger a TranscribeAudio function. This function might contain step-runs for: fetch_audio_metadata, deepgram_transcribe, parse_save_trasncription, and notify_user.
Deconstructing the Workflow: The Inngest Function and Step-runs
The core durability primitive is the step-runs. A background function is internally broken down into these step-runs, each containing a minimal, atomic block of logic.
Atomic Logic: A function executes your business logic step by step. If a step fails, the state of the entire run is preserved, and the run can be retried. This restarts the function from the beginning. Individual steps or step-runs cannot be retried in isolation.Response Serialization: A step-run is defined by its response. This response is automatically serialized, which is essential for preserving complex or strongly-typed data structures across execution boundaries. Subsequent step-runs can reliably parse this serialized response, or logic can be merged into a single step for efficiency.Decoupling and Scheduling: Within a function, we can conditionally queue or schedule new, dependent events, enabling complex fan-out/fan-in patterns and long-term scheduling up to a year. Errors and successes at any point can be caught, branched, and handled further down the workflow.
Inngest function abstract:
import { inngest } from ‘inngest-client’;
export const createMyFunction = (dependencies) => {
return inngest.createFunction(
{
id: ‘my-function’,
name: ‘My Example Function’,
retries: 3, // retry the entire run on failure
concurrency: { limit: 5 },
onFailure: async ({ event, error, step }) => {
// handle errors here
await step.run(‘handle-error’, async () => {
console.error(‘Error processing event:’, error);
});
},
},
{ event: ‘my/event.triggered’ },
async ({ event, step }) => {
const { payload } = event.data;
// Step 1: Define first step
const step1Result = await step.run(‘step-1’, async () => {
// logic for step 1
return `Processed ${payload}`;
});
// Step 2: Define second step
const step2Result = await step.run(‘step-2’, async () => {
// logic for step 2
return step1Result + ‘ -> step 2’;
});
// Step N: Continue as needed
await step.run(‘final-step’, async () => {
// finalization logic
console.log(‘Finished processing:’, step2Result);
});
return { success: true };
},
);
};
The event-driven model of Inngest provides granular insight into every workflow execution:
Comprehensive Event Tracing: Every queued function execution is logged against its originating event. This provides a clear, high-level trail of all activities related to a single user action.Detailed Run Insights: For each function execution (both successes and failures), Inngest provides detailed logs via its ack (acknowledge) and nack (negative acknowledgment) reporting. These logs include error stack traces, full request payloads, and the serialized response payloads for every individual step-run.Operational Metrics: Beyond logs, we gained critical metrics on function health, including success rates, failure rates, and retry count, allowing us to continuously monitor the reliability and latency of our distributed workflows.
Building for Resilience
The caveat to relying on pure event processing is that while Inngest efficiently queues function executions, the events themselves are not internally queued in a traditional messaging broker sense. This absence of an explicit event queue can be problematic in high-traffic scenarios due to potential race conditions or dropped events if the ingestion endpoint is overwhelmed.
To address this and enforce strict event durability, we implemented a dedicated queuing system as a buffer.
AWS Simple Queue System (SQS) was the system of choice (though any robust queuing system is doable), given our existing infrastructure on AWS. We architected a two-queue system: a Main Queue and a Dead Letter Queue (DLQ).
We established an Elastic Beanstalk (EB) Worker Environment specifically configured to consume messages directly from the Main Queue. If a message in the Main Queue fails to be processed by the EB Worker a set number of times, the Main Queue automatically moves the failed message to the dedicated DLQ. This ensures no event is lost permanently if it fails to trigger or be picked up by Inngest. This worker environment differs from a standard EB web server environment, as its sole responsibility is message consumption and processing (in this case, forwarding the consumed message to the Inngest API endpoint).
UNDERSTANDING LIMITS AND SPECIFICATIONS
An understated and rather pertinent part of building enterprise-scale infrastructure is that it consumes resources, and they are long-running. Microservices architecture provides scalability per service. Storage, RAM, and timeouts of resources will come into play. Our specification for AWS instance type, for example, moved quickly from t3.micro to t3.small, and is now pegged at t3.medium. For long-running, CPU-intensive background jobs, horizontal scaling with tiny instances fails because the bottleneck is the time it takes to process a single job, not the volume of new jobs entering the queue.
Jobs or functions like transcoding, embedding are typically CPU-bound and Memory-bound. CPU-bound because they require sustained, intense CPU usage, and Memory-Bound because they often require substantial RAM to load large models or handle large files or payloads efficiently.
Ultimately, this augmented architecture, placing the durability of SQS and the controlled execution of an EB Worker environment directly upstream of the Inngest API, provided essential resiliency. We achieved strict event ownership, eliminated race conditions during traffic spikes, and gained a non-volatile dead letter mechanism. We leveraged Inngest for its workflow orchestration and debugging capabilities, while relying on AWS primitives for maximum message throughput and durability. The resulting system is not only scalable but highly auditable, successfully translating complex, long-running backend jobs into secure, observable, and failure-tolerant micro-steps.
Building Spotify for Sermons. was originally published in Coinmonks on Medium, where people are continuing the conversation by highlighting and responding to this story.