{"id":183736,"date":"2026-06-19T17:12:27","date_gmt":"2026-06-19T17:12:27","guid":{"rendered":"https:\/\/mycryptomania.com\/?p=183736"},"modified":"2026-06-19T17:12:27","modified_gmt":"2026-06-19T17:12:27","slug":"deploying-a-multi-service-ai-platform-on-a-budget","status":"publish","type":"post","link":"https:\/\/mycryptomania.com\/?p=183736","title":{"rendered":"Deploying a Multi-Service AI Platform on a Budget"},"content":{"rendered":"<h3>It\u2019s Not One\u00a0Thing<\/h3>\n<p>The first time I tried to deploy this platform, I treated it like a single application. One repository, one deploy, done. That lasted about 30 seconds before I realized this thing has at least four separate processes that all need to run simultaneously:<\/p>\n<ul>\n<li>A WebSocket relay that maintains a persistent connection to the data\u00a0source<\/li>\n<li>An API server that handles REST endpoints, WebSocket broadcasting, and scheduled jobs<\/li>\n<li>A vector database for semantic\u00a0search<\/li>\n<li>A frontend web application<\/li>\n<\/ul>\n<p>These aren\u2019t optional components. They all need to be running for anything to work. The relay feeds data in. The API serves it out. The vector database stores it. The frontend displays it. Kill any one of them and the whole system is degraded.<\/p>\n<p>And they can\u2019t all run in the same process. The WebSocket relay is a long-running blocking connection. The API server is an async web framework. The vector database is a separate service entirely. The frontend is a static site that gets built and served independently.<\/p>\n<h3>The Multi-Service Reality<\/h3>\n<p>Cloud platforms generally assume you\u2019re deploying one thing. A web app. A worker process. A database. 
You pick a template, push your code, and the platform figures out how to run\u00a0it.<\/p>\n<p>When you need four services from the same repository, each with different startup commands, different resource requirements, and different networking needs, the deployment story gets complicated fast.<\/p>\n<p>Each service needs its own configuration:<\/p>\n<ul>\n<li>The relay needs a startup command that runs the WebSocket listener\u00a0script<\/li>\n<li>The API needs a startup command that runs the web framework with the right host and port\u00a0bindings<\/li>\n<li>The vector database runs as a Docker container with persistent storage<\/li>\n<li>The frontend needs a build step followed by a static file\u00a0server<\/li>\n<\/ul>\n<p>They also need to talk to each other. The relay writes to the vector database. The API reads from both the vector database and the analytics database. The frontend talks to the API. These internal connections need to use private networking so they\u2019re fast and don\u2019t incur external traffic\u00a0costs.<\/p>\n<p>And only some services need public URLs. The API needs one so the frontend can reach it. The frontend needs one so users can access it. The relay and the vector database should be internal\u00a0only.<\/p>\n<h3>The GPU\u00a0Problem<\/h3>\n<p>Here\u2019s a fun one. The local embedding model that runs great on my development machine with a GPU? It doesn\u2019t work on cloud infrastructure that only has CPUs. This seems obvious in retrospect, but the first deployment just crashed with cryptic errors about missing CUDA\u00a0drivers.<\/p>\n<p>The fix required building a fallback chain for the embedding system.<\/p>\n<p>On startup, the system checks what hardware is available:<\/p>\n<ul>\n<li>GPU with CUDA? Use the configured model with half-precision acceleration.<\/li>\n<li>CPU only? Switch to a CPU-optimized model automatically.<\/li>\n<li>That model fails to load? Try the next one in the fallback\u00a0chain.<\/li>\n<li>All models fail? 
Disable embeddings gracefully and continue running everything else.<\/li>\n<\/ul>\n<p>The key word is \u201cgracefully.\u201d The platform should still work even if embeddings are broken. You lose semantic search, but the analytics, the agents, and the momentum tracking can all function without embeddings. So the system logs the failure, sets a flag, and keeps\u00a0going.<\/p>\n<p>This fallback chain was designed after the first deployment failed at 2am and I had to wake up to figure out why the whole platform was down because one model couldn\u2019t load. Now the worst case is degraded functionality, not a\u00a0crash.<\/p>\n<h3>Environment Variable\u00a0Hell<\/h3>\n<p>Four services, all reading from the same set of environment variables but each needing slightly different values. The API server needs to know the public URL of itself for CORS headers. The relay needs the internal URL of the vector database. The frontend needs the public URL of the API. The vector database needs its storage configuration.<\/p>\n<p>And then there are the shared secrets. The AI model API keys, the cryptographic wallet key, the database credentials. These need to be the same across services but configured separately in each one because they\u2019re separate deployment units.<\/p>\n<p>I ended up with a master environment template that documents every variable, which services need it, and what the default value should be. Without this, every deployment was a game of \u201cwhich service is failing because I forgot to set QDRANT_URL in that specific service\u2019s config.\u201d<\/p>\n<p>The most annoying bugs were always the environment ones. Service A works fine. Service B works fine. But they can\u2019t talk to each other because one is using the public URL and the other is using the internal URL and they\u2019re subtly different. 
Or one service has a trailing slash in an environment variable and the other doesn\u2019t, and the URL construction breaks.<\/p>\n<h3>Cost Management<\/h3>\n<p>Running four services 24\/7 adds up. The naive approach (give each service generous resources) would cost $100\u2013200\/month. For a project still in development, that\u2019s a\u00a0lot.<\/p>\n<p>The optimization strategy\u00a0was:<\/p>\n<p><strong>Right-size everything.<\/strong> The relay is mostly idle between events. It doesn\u2019t need much CPU or memory. The API handles bursty traffic but isn\u2019t under constant load. The vector database needs memory proportional to the active dataset size. The frontend is static\u00a0files.<\/p>\n<p><strong>Use free tiers where possible.<\/strong> The vector database has a free cloud tier that\u2019s sufficient for moderate-scale usage. The frontend can be hosted on a static site platform for free. That brings you from four paid services to\u00a0two.<\/p>\n<p><strong>Combine where it makes sense.<\/strong> The relay and the API can technically run as a single process with some careful async management. Less clean architecturally, but it halves the compute cost. I kept them separate in production for reliability but combined them in staging to save\u00a0money.<\/p>\n<p><strong>Monitor actual usage.<\/strong> I was paying for compute that was 80% idle. Scaling down to the minimum viable resource allocation for each service cut costs by about 40% with no performance impact.<\/p>\n<h3>The Things That Break at\u00a03am<\/h3>\n<p>Deployments are easy. Keeping things running is hard. Here are the things that actually broke in production:<\/p>\n<p><strong>WebSocket disconnections.<\/strong> The data source occasionally drops the connection without warning. The reconnection logic works, but there\u2019s a gap, usually a few seconds to a minute, where data is being missed. 
You only notice because the momentum graphs show a dip and then a catch-up\u00a0spike.<\/p>\n<p><strong>Memory leaks.<\/strong> One of the background jobs was accumulating state in a dictionary that never got cleaned up. Worked fine for days, then the process would OOM and restart. The fix was a periodic cleanup sweep, but finding the leak took longer than fixing\u00a0it.<\/p>\n<p><strong>Database connection exhaustion.<\/strong> The analytics database has a connection limit. Under heavy agent processing (multiple agents all querying at once), you can hit it. Connection pooling and query timeouts solved this, but not before a few incidents where the API became unresponsive because all connections were\u00a0stuck.<\/p>\n<p><strong>Clock drift.<\/strong> Two services disagreeing about what time it is by a few seconds. This caused the mover job (which uses timestamps to decide what data is \u201cold enough\u201d to archive) to occasionally skip batches or process the same batch twice. The fix was using database timestamps instead of local clocks for all time-sensitive operations.<\/p>\n<p><strong>Deployment order.<\/strong> If the API deploys before the database finishes its migration, the API crashes on startup because the schema doesn\u2019t match. I added health checks that wait for each service\u2019s dependencies to be ready before the application starts accepting traffic.<\/p>\n<h3>What I\u2019d Tell Someone Starting\u00a0Out<\/h3>\n<p>Don\u2019t try to build a monolith and split it later. Design for separate services from the start, even if you deploy them as one thing initially. The separation of concerns pays off immediately in clarity and pays off again when you need to scale or debug individual components.<\/p>\n<p>Invest in your environment configuration management early. 
One source of truth for all variables, clear documentation of which service needs what, and validation on startup that fails fast with clear error messages.<\/p>\n<p>Build graceful degradation into everything. The system should always prefer \u201crunning with reduced functionality\u201d over \u201ccrashed.\u201d Users can tolerate a missing feature. They can\u2019t tolerate a blank\u00a0screen.<\/p>\n<p>And monitor your costs from day one. Cloud services are designed to make spending money easy and tracking spending hard. Set up cost alerts and review usage weekly. The difference between a $300\/month deployment and a $3000\/month deployment is usually just configuration, not capability.<\/p>\n<p><em>This article is part of a 10-part series documenting the journey of building a real-time intelligence platform from scratch (<\/em><a href=\"https:\/\/naiko.io\/\"><em>https:\/\/naiko.io<\/em><\/a><em>). Start from the beginning with \u201cI Built a Real-Time Intelligence Platform and the Hardest Part Was the Plumbing.\u201d<\/em><\/p>\n<p><a href=\"https:\/\/medium.com\/coinmonks\/deploying-a-multi-service-ai-platform-on-a-budget-ebd6d03b8484\">Deploying a Multi-Service AI Platform on a Budget<\/a> was originally published in <a href=\"https:\/\/medium.com\/coinmonks\">Coinmonks<\/a> on Medium, where people are continuing the conversation by highlighting and responding to this story.<\/p>","protected":false},"excerpt":{"rendered":"<p>It\u2019s Not One\u00a0Thing The first time I tried to deploy this platform, I treated it like a single application. One repository, one deploy, done. 
That lasted about 30 seconds before I realized this thing has at least four separate processes that all need to run simultaneously: A WebSocket relay that maintains a persistent connection to [&hellip;]<\/p>\n","protected":false},"author":0,"featured_media":183737,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[2],"tags":[],"class_list":["post-183736","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-interesting"],"_links":{"self":[{"href":"https:\/\/mycryptomania.com\/index.php?rest_route=\/wp\/v2\/posts\/183736"}],"collection":[{"href":"https:\/\/mycryptomania.com\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mycryptomania.com\/index.php?rest_route=\/wp\/v2\/types\/post"}],"replies":[{"embeddable":true,"href":"https:\/\/mycryptomania.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=183736"}],"version-history":[{"count":0,"href":"https:\/\/mycryptomania.com\/index.php?rest_route=\/wp\/v2\/posts\/183736\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/mycryptomania.com\/index.php?rest_route=\/wp\/v2\/media\/183737"}],"wp:attachment":[{"href":"https:\/\/mycryptomania.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=183736"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mycryptomania.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=183736"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mycryptomania.com\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=183736"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}