Knowing Better IT

Stop Adding AI to Your Architecture

Krzysztof Bardoński — Wed, 18 Feb 2026 12:37:13 +0000

If your AI initiative looks like “We’ll call the model from this service”, you’re not integrating AI. You’re adding a very expensive, probabilistic dependency into a deterministic system.

I’ve seen multiple architectures where AI was “successfully integrated.” The pattern was always the same: a new microservice wrapping a model API, wired into an existing request/response flow. It worked in staging. In production, it amplified latency, introduced non-determinism, and quietly inflated OpEx.

The problem isn’t the model. The problem is trying to graft AI onto an architecture that was never designed for it.

AI is not a feature. It is an architectural force multiplier. And multipliers amplify weaknesses.

Quick summary (for busy readers)

AI violates core architectural assumptions: determinism, stable latency, and predictable cost.
Wrapping AI in a microservice does not isolate its risk.
Synchronous integration is the most common failure pattern.
AI requires economic circuit breakers, not just autoscaling.
Deterministic and probabilistic domains must be separated.
Redesign means accepting reduced control and engineering around it.

The Core Mistake:

Treating AI as Just Another Service

Most distributed systems are designed around three assumptions:

Deterministic outputs
Predictable latency
Stable cost per request

AI violates all three.

When you insert a large language model or ML inference endpoint into a synchronous flow, you introduce:

Variable latency (cold starts, token length variance, queue contention)
Non-deterministic outputs
Cost tied to input size and usage patterns

You don’t “add” something like that. You redesign around it.

The Three Architectural Illusions

Illusion 1: “It’s Just Another API”

A traditional API:

Returns deterministic responses
Has bounded execution time
Fails explicitly

An AI endpoint:

Produces probabilistic outputs
May degrade in quality without failing
Can hallucinate confidently

Architectural implication: You cannot treat AI responses as authoritative truth inside a transactional workflow. If your order-processing flow blocks on a generative AI call for product classification, you’ve just tied revenue to a probabilistic function.

Redesign principle: AI outputs should influence decisions, not directly execute them.

Illusion 2: “We’ll Just Scale It”

AI scaling introduces:

GPU allocation constraints
Token-based billing volatility
Throughput degradation under context growth

Scenario:

A team once embedded document summarization into a customer portal. Everything worked until users started pasting 40-page PDFs. Token usage multiplied cost per request by 6x within days.

If you do not design economic constraints at the architectural level, you are trusting user behavior to stay rational.

That’s not a strategy.

Redesign principle: AI systems require economic circuit breakers not just autoscaling.

Examples:

Hard token limits or budgets per tenant or environment.
Fast token estimation mechanisms before hitting the model.
Fallback to cheaper models under load.
Context compression pipelines.

Illusion 3: “We’ll Wrap It in a Microservice”

Wrapping AI inside a microservice does not isolate its complexity. It pushes uncertainty deeper into the system.

Common symptoms:

Retry storms due to transient inference failures
Latency propagation across synchronous chains
Observability blind spots (you log HTTP 200, but quality is degrading)

You don’t need another microservice. You need a new interaction model.

What Redesign Actually Means

Redesign does not mean rewriting your platform. It means changing architectural posture around four core areas.

A. Move From Synchronous to Asynchronous by Default

Most AI integrations fail because they assume request/response is sacred.

Instead:

Queue AI tasks
Allow partial responses
Accept eventual consistency where possible
Separate user interaction from heavy inference

Mental model:

Treat AI like a background analyst, not a blocking function.

When we redesigned a document processing system this way, perceived latency dropped even though actual processing time remained the same.

B. Separate Deterministic and Probabilistic Domains

Do not mix business critical logic with probabilistic outputs.

Create boundaries:

Deterministic core: billing, compliance, state transitions
Probabilistic layer: classification, summarization, recommendations

The deterministic layer must validate, constrain, or post-process AI outputs.

This prevents:

Hallucinated database writes
Invalid state transitions
Regulatory exposure

C. Design for Model Evolution

AI components evolve faster than traditional services. If your architecture assumes:

One model
One embedding strategy
One prompt structure

…it will eventually fail.

Redesign patterns:

Abstract model providers behind capability interfaces
Externalize prompts/configuration
Version embeddings and schemas explicitly

Models are volatile dependencies. Treat them like third-party infrastructure, not internal code.

D. Build Observability Around Quality, Not Just Uptime

Traditional monitoring answers:

Is the service up?
Is latency acceptable?

AI monitoring must answer:

Is output quality degrading?
Is distribution shifting?
Are hallucination rates increasing?

If you don’t track semantic drift, you are operating blind. This is where many “working” AI systems quietly deteriorate for months.

The Mental Model:

AI as a Volatile Core

Think of traditional architecture like steel beams. AI is not steel, it’s reinforced glass, powerful, transparent, valuable but brittle under incorrect load assumptions. If you embed reinforced glass into a structure designed only for steel, stress fractures appear.

Redesign means:

Adjusting load distribution
Adding structural buffers
Accounting for material properties

AI changes material properties of your system.

Signs You Haven’t Redesigned (You’ve Just Added AI)

AI calls sit inside synchronous business flows.
There is no fallback mode.
Costs are monitored monthly, not per request.
Outputs are trusted without validation.
Model changes require code changes.

If this describes your system, you haven’t integrated AI.

You’ve increased systemic risk.

Conclusion:

AI Demands Architectural Humility

Stop adding AI to your architecture.

Redesign around:

Probabilistic outputs
Economic volatility
Latency variability
Continuous evolution

You must recognize that AI is not a plug-in capability. It alters system dynamics at a structural level.

When you redesign intentionally, AI becomes leverage.

When you bolt it on, it becomes technical debt with a GPU bill attached.

Key Takeaways:

AI violates deterministic architectural assumptions.
Synchronous integration is the most common failure pattern.
Economic circuit breakers are architectural, not financial afterthoughts.
Deterministic and probabilistic domains must be separated.
Observability must include quality and drift metrics.

If you’re leading AI initiatives, don’t ask:

“Where do we call the model?”

Ask instead:

“What must change in our architecture because we no longer control the output?”

That question changes everything.

5 Architectural Decisions That Will Make or Break Your AI Product

Krzysztof Bardoński — Mon, 16 Feb 2026 18:13:41 +0000

The most expensive part of your AI product isn't the model - it's the architecture you built to keep it alive

Scenario:

Last Black Friday, a client’s AI powered pricing engine triggered $20,000 in cloud overage in just 48 hours. The culprit? A recursive retry loop in their realtime inference pipeline. The model itself worked exactly as designed. The problem was the architecture surrounding it.

Architects know this story too well: AI does not fail gracefully. It magnifies every latent design flaw data bottlenecks, latency spikes, and runaway costs. This article cuts past the hype to focus on the decisions that actually determine if an AI product survives production, backed by hard-earned post-mortem lessons.

Quick summary (for busy readers)

Execution is Economics: Real-time AI is a financial risk. If you don't have economic circuit breakers at the API Gateway level, a bug can bankrupt a project in a weekend.
Avoid the VDB Trap: Don't buy a specialized Vector Database until you've proven that pgvector (Postgres) or your existing search engine can't handle the OpEx and latency requirements.
Models are Code: If your model isn't versioned, tested via CI/CD, and capable of an automated rollback, it isn't a production artifact - it's an experiment.
Design for Drift: AI is probabilistic. If you aren't monitoring for silent output degradation (drift), your system will eventually lie to your users without triggering a single "Error 500."

1. Deployment Pattern: Real-Time, Batch, or Streaming

The Architect’s Dilemma: Sub-second inference is rarely free. Serverless GPU functions look attractive, but cold-start latency and provisioning delays often blow SLAs when traffic spikes.

Hidden trap: Treating serverless AI functions as “always available.” This leads to request queuing, which triggers client retries, leading to a death spiral of cost and latency.
Trade-off: Batch inference is cheap and stable but useless for interactive UX. Streaming pipelines are resilient but introduce significant orchestration overhead.

Post-mortem — “The $20,000 Weekend”:

A fraud detection model was deployed on ephemeral GPU instances. During a traffic spike, cold starts triggered a 5-second delay. The calling service retried every 1 second. The cloud bill doubled before the on-call engineer even logged in. The fix wasn’t a better model; it was implementing pre-warmed instances and aggressive request throttling.

2. Data Architecture and the Vector DB Trap

AI systems are essentially data pipelines with an expensive, probabilistic function in the middle. If your plumbing leaks, your model is irrelevant.

The Vector DB trap: Specialized vector databases are the “shiny toy” of the year. However, their OpEx is high. In 90% of use cases, an extension like PGVector on your existing Postgres instance provides 100% of the required functionality at a fraction of the cost.

Feature consistency: Without a proper feature store, the data used for training will differ from the data seen at inference time (training-serving skew), causing models to fail in ways that are nearly impossible to debug.

Post-mortem — “The RAG Bottleneck":

A knowledge retrieval system used a fully managed vector DB. Query latency hit 3 seconds at peak. By migrating to Postgres + PGVector, the team halved latency and reduced infrastructure costs by 75%, simply by keeping the embeddings closer to the primary application data.

3. Model Management and Lifecycle Complexity

Models are production artifacts, not static scripts. Treating them as “special” usually means they bypass the rigor of standard software engineering.

Versioning pitfalls: Deploying a “better” model version without a side-by-side (A/B) test or a clear rollback path is a recipe for silent failure.
CI/CD for ML: Automated testing must include “model evaluation” steps. Does the new version still pass the gold-standard test set, or did it trade general accuracy for a specific performance gain?

Post-mortem — “The Hallucinating Support Bot”:

4. Scalability, Latency, and Economic Circuit Breakers

AI architecture is unique because a logic bug can have immediate, five-figure financial consequences. Traditional auto-scaling focuses on availability; AI scaling must focus on economic survival.

Token Budgets: Implement hard limits on the number of tokens a specific user, session, or tenant can consume. This prevents “prompt injection” or recursive loops from draining your budget.

Tiered Fallbacks: When latency spikes or primary GPU clusters are full, the architecture should automatically fall back to a “smaller” model (e.g., falling back from a 70B parameter model to an 8B model). This preserves UX at the cost of a temporary dip in intelligence.

The API Gateway Guard: Economic circuit breakers should live at the Gateway level. If the cost-per-minute exceeds a threshold, the system should kill the connection before the cloud provider’s billing department does.

Takeaway:

Scalability in AI isn’t just about handling load; it’s about Financial Engineering. You need the ability to “degrade gracefully” rather than “spend infinitely.“

5. Observability, Guardrails, and Drift

Because AI is probabilistic, it will eventually produce an answer that is technically “valid” but practically wrong or dangerous.

Silent Drift: A model that worked in January might be useless in June because the underlying user behavior has changed. You need automated alerts that flag when the distribution of model outputs shifts significantly.
Input/Output Guardrails: These are deterministic “wrappers” around the AI. If the model generates a response containing restricted words or PII, the guardrail blocks the output regardless of what the model intended.

Analogy:

Observability is your system’s shock absorber. Without it, every “probabilistic surprise” becomes a catastrophic jolt to the business.

Conclusion:

The Architect as Risk Manager

Success in AI isn’t about choosing the most sophisticated model; it’s about building a ship that stays afloat even when the engine (the model) behaves unpredictably. The most successful AI products aren’t those with the highest accuracy, they are the ones with the lowest Mean Time to Contain (MTTC) when things go wrong.

Actionable next step:

This Friday, run a “Pre-Mortem” with your team. Ask: “If we woke up to a $50,000 cloud bill tomorrow morning, which specific architectural loop allowed it to happen?” Then, go build the circuit breaker that prevents it.

Why This Ransomware Attack Failed

Krzysztof Bardoński — Mon, 16 Feb 2026 13:53:25 +0000

A post-mortem of a contained ransomware incident. Learn how architectural blast-radius control, identity-first policy, and immutable recovery turned a potential catastrophe into a minor operational blip.

Quick summary (for busy readers)

Shift from Perimeter to Blast Radius: Assume the "hull" will be breached; design independent pressure vessels to ensure the ship stays afloat.
Identity is the New Control Plane: Moving from network ACLs to Zero Trust Network Access (ZTNA) and Identity-Aware Proxies (IAP).
Immutability is Non-Negotiable: Use WORM (Write Once Read Many) storage to defeat "time-bomb" ransomware.
The Goal: Optimize for Mean Time to Contain (MTTC) and minimize the Blast Radius Ratio.

The Scenario:

When the ghost admin met the malware

A sophisticated session-hijack targeted a high-level administrator. The attacker expected the keys to the kingdom, but they encountered an architecture designed for “the ghost admin.“

Even with valid credentials, the attacker found an account with zero standing permissions. To execute any meaningful change, they would have needed to trigger a Just-In-Time (JIT) elevation request, verified by a hardware token and a second human approver. While the attacker sat idle, looking for leverage, a Digital Canary a hidden honeypot share named confidential_data triggered an automated isolation playbook the moment the first file was touched.

The attack was contained before a single production byte was encrypted.

Why These Controls Work (The Submarine Strategy)

A modern submarine doesn’t survive a hull breach by hoping the hull is unbreakable; it survives through independent pressure vessels.

Microsegmentation as a Pressure Seal: Move away from "Layer 3" thinking. By using Business-Capability Mapping, ensure that a breach in the Marketing segment cannot physically reach the Production Database segment.
The 1:1000 Blast Radius Metric: In a flat network, 1 compromised device = 100% exposure. Your goal should be a 1:1000 Ratio: ensuring that any single point of failure exposes less than 0.1% of the total estate.
The Immutable Time Traveler: Attackers often "dwell" for 60+ days to rotate out clean backups. By utilizing WORM storage with a 90-day retention lock, you deprive the attackers of leverage. Instead of "restoring from yesterday", the system enables you to time-travel back to a verified, immutable state in under two hours.

Practical Architecture Patterns & Trade-offs

Pattern	The "Why"	The Architect's Trade-off
ZTNA / IAP	Decouples access from network location; high-fidelity signals.	High OpEx: Requires deep integration with a robust IAM fabric.
Microsegmentation	Hardens the internal blast radius; stops lateral movement.	Initial Friction: Requires a canonical service inventory and identity model.
JIT / Least Privilege	Eliminates standing "Admin" targets for credential harvesters.	Operational Latency: Requires automation to prevent "Security Fatigue" in staff.
Air-Gapped Recovery	Ensures the backup control plane is separate from the domain.	Complexity: Demands rigorous, automated recovery drills.

Common Mistakes

That Let Ransomware Win

The "Crunchy Shell, Soft Center" Network: Relying on firewalls while leaving SMB/RDP wide open internally.
Shared Fate Backups: Using the same Active Directory credentials for both production and backup storage.
Static MFA: Failing to account for modern session-hijacking that bypasses simple 2FA.
The "Shadow" Recovery Plan: Having backups but never testing the Recovery Time Objective (RTO) for the Active Directory itself.

Pro Tips:

The Executive Checklist

Deploy "Digital Canaries": Create honeypot files and shares that trigger immediate isolation alerts.
Enforce Device Attestation: Treat unmanaged or "unhealthy" devices as high-risk, regardless of the user's credentials.
Audit the "Blast Radius Ratio": Map your network and ask: "If this node is compromised, what is the maximum percentage of data it can touch?"
Automated Response: Don't wait for a human to read an alert; automate the isolation of the workload the moment mass-file encryption behavior is detected.

Final takeaways

Ransomware is not a binary win/lose event, it’s an engineering problem. The organizations that “win” are those that stop trying to build a perfect wall and start building a better ship. Focus on containment, make identity your primary control plane, and ensure your recovery path is immutable.

Actionable next step:

This Friday, run a two-hour “Active Directory Recovery” tabletop. Don’t focus on the data – focus on how you restore the identity provider itself after it’s been wiped. If you can’t restore AD, your data backups are useless.

When “More” Makes the System Worse

Krzysztof Bardoński — Tue, 10 Feb 2026 11:05:30 +0000

If new features are progress, why do mature systems so often feel slower, harder to change, and more fragile every year?

Quick summary (for busy readers)

New features create long-term architectural drag, not just short-term cost
Stakeholder pressure can be reframed using maintenance vs. innovation budgets
Common enterprise failures are mundane, not dramatic
Telemetry beats assumptions when deciding whether to build
Features should have a sunset plan the moment they ship

Why “adding a feature” is deceptively attractive

Features are easy to justify:

What’s harder to see is the compounding cost curve:

A feature is rarely “just a feature.” It’s a permanent expansion of the system’s problem space.

Stakeholder conflict:

How to reframe the conversation

When architects push back on features, the conversation often stalls at “engineering is blocking progress.” Saying “no” rarely works.

A more effective tactic is to reframe the discussion in budget terms.

The Maintenance vs. Innovation Budget

Every system has two invisible budgets:

Innovation Budget – capacity for new capabilities
Maintenance Budget – effort spent keeping existing behavior stable

New features withdraw from both:

Immediately from innovation (build cost)
Continuously from maintenance (bugs, upgrades, support)

Practical tip:

Instead of rejecting a feature, show stakeholders:

How much of the next 12 months’ capacity will be consumed maintaining it
What other initiatives that maintenance load will crowd out

You can make this concrete by tagging Jira tickets as “Feature” vs. “Maintenance / Tech Debt” and showing the trend line over four quarters. Once stakeholders see maintenance steadily eating into delivery capacity, the trade-off becomes visible – and the conversation changes.

A more relatable failure:

The permissions matrix nobody needed

Not all failures are dramatic. Most are painfully ordinary.

The scenario

A mid-sized B2B platform started with:

Two roles: Admin and User
Clear ownership boundaries
Simple authorization checks

Then came feature requests:

“Read-only admin”
“Billing admin”
“Support user”
“Temporary access”
“Regional admin”

The outcome

A permissions matrix with dozens of combinations
Authorization checks on nearly every API call
Slower response times and harder caching
Engineers afraid to touch auth logic
Customers confused about what roles actually meant

In hindsight, 90% of use cases could have been solved with:

Admin vs. User
One or two scoped toggles (e.g., billing access)

This is how systems rot – not from ambition, but from unquestioned accumulation.

A mental model you can show on one slide

Use this simple comparison when discussing scope:

Aspect	Feature Cost	System Complexity	Stakeholder View
Growth pattern	Linear	Exponential	“Steady progress”
Visibility	High	Low	“Looks cheap”
Reversibility	Medium	Hard	“We can undo it later”
Long-term impact	Predictable	Emergent	“Invisible cost”

Key insight:

You pay for features once. You pay for complexity forever.

This table also explains why feature debates are so difficult: architects experience the exponential curve directly, while stakeholders mostly see what ships.

Deepened practical example:

Killing a feature before it existed

Sometimes small work can bring big value.

The scenario

A team proposed building a Custom Report Builder:

Drag-and-drop fields
Filters, joins, export formats
Significant backend and UI work

Before committing, the team added lightweight telemetry to the existing CSV export:

Number of exports per user
Columns included
Filters applied
Time-to-first-export

What the data showed

90% of users exported CSVs with the same 3 columns
Fewer than 5% applied any filters
Most exports were immediately opened in Excel and modified manually

The outcome

Instead of a report builder:

Result:

User satisfaction increased, delivery took days instead of months, and the system avoided a major new UI and data-processing surface.

Telemetry turned a speculative feature into a non-event.

Feature = permanently deployed debt

Think of every feature as:

A library you must support forever
A behavior users will depend on
A contract that constrains future design

Like financial debt, it can be useful – but only if:

You understand the interest rate
You have a plan to pay it down

Tactical addition:

The sunsetting protocol

If features are debt, retirement must be intentional.

A simple sunsetting protocol

Add this to your architecture or product governance process:

Usage threshold – Define “healthy usage” (e.g., % of active users, frequency)
Owner assignment – Every feature has a named owner responsible for its health
Deprecation signal – Log usage, warn internally when it drops below threshold
Soft removal:
- Hide from UI, keep API compatibility
- Add observability hooks (e.g., `feature_used` logs or kill switches)
- If the usage signal stays at zero for 30 days, the risk of hard removal is statistically negligible – even for that “one critical customer workflow”
Hard removal – Delete code paths, schemas, tests.

Critical warning:

If you don’t practice removal, your system will only ever grow – until change becomes impossible.

Common mistakes (and how to avoid them)

Final takeaways

If you remember one thing:

The best systems aren’t defined by what they can do – but by what they deliberately choose not to.