Knowing Better IT – https://kb-it.net – Get inspired. Make the change. (Mon, 23 Feb 2026)

Stop Adding AI to Your Architecture
https://kb-it.net/stop-adding-ai-to-your-architecture/ – Wed, 18 Feb 2026
If your AI initiative looks like “We’ll call the model from this service”, you’re not integrating AI. You’re adding a very expensive, probabilistic dependency into a deterministic system.
I’ve seen multiple architectures where AI was “successfully integrated.” The pattern was always the same: a new microservice wrapping a model API, wired into an existing request/response flow. It worked in staging. In production, it amplified latency, introduced non-determinism, and quietly inflated OpEx.
The problem isn’t the model. The problem is trying to graft AI onto an architecture that was never designed for it.
AI is not a feature. It is an architectural force multiplier. And multipliers amplify weaknesses.

The Core Mistake: Treating AI as Just Another Service

Most distributed systems are designed around three assumptions:
  1. Deterministic outputs
  2. Predictable latency
  3. Stable cost per request
AI violates all three.
When you insert a large language model or ML inference endpoint into a synchronous flow, you introduce:
  • Variable latency (cold starts, token length variance, queue contention)
  • Non-deterministic outputs
  • Cost tied to input size and usage patterns
You don’t “add” something like that. You redesign around it.

The Three Architectural Illusions

Illusion 1: “It’s Just Another API”

A traditional API:
  • Returns deterministic responses
  • Has bounded execution time
  • Fails explicitly
An AI endpoint:
  • Produces probabilistic outputs
  • May degrade in quality without failing
  • Can hallucinate confidently
Architectural implication: You cannot treat AI responses as authoritative truth inside a transactional workflow. If your order-processing flow blocks on a generative AI call for product classification, you’ve just tied revenue to a probabilistic function.

Redesign principle: AI outputs should influence decisions, not directly execute them.

Illusion 2: “We’ll Just Scale It”

AI scaling introduces:
  • GPU allocation constraints
  • Token-based billing volatility
  • Throughput degradation under context growth
Scenario:
A team once embedded document summarization into a customer portal. Everything worked until users started pasting 40-page PDFs. Token usage multiplied cost per request by 6x within days.
If you do not design economic constraints at the architectural level, you are trusting user behavior to stay rational.
That’s not a strategy.
Redesign principle: AI systems require economic circuit breakers, not just autoscaling.
Examples:
  • Hard token limits or budgets per tenant or environment.
  • Fast token estimation mechanisms before hitting the model.
  • Fallback to cheaper models under load.
  • Context compression pipelines.

Illusion 3: “We’ll Wrap It in a Microservice”

Wrapping AI inside a microservice does not isolate its complexity. It pushes uncertainty deeper into the system.
Common symptoms:
  • Retry storms due to transient inference failures
  • Latency propagation across synchronous chains
  • Observability blind spots (you log HTTP 200, but quality is degrading)
You don’t need another microservice. You need a new interaction model.

What Redesign Actually Means

Redesign does not mean rewriting your platform. It means changing architectural posture around four core areas.

A. Move From Synchronous to Asynchronous by Default

Most AI integrations fail because they assume request/response is sacred.
Instead:
  • Queue AI tasks
  • Allow partial responses
  • Accept eventual consistency where possible
  • Separate user interaction from heavy inference
Mental model:
Treat AI like a background analyst, not a blocking function.
When we redesigned a document processing system this way, perceived latency dropped even though actual processing time remained the same.
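A minimal sketch of the "background analyst" pattern, using only the standard library. `run_inference` stands in for the real model call, and a production system would use a durable queue rather than this in-process one:

```python
# Sketch: decoupling user interaction from heavy inference with a work queue.
# The request path returns a job id immediately; a worker fills in the result.
import queue
import threading
import uuid

jobs: dict[str, dict] = {}             # job_id -> {"status": ..., "result": ...}
work_queue: queue.Queue = queue.Queue()

def submit(document: str) -> str:
    """Called from the request path: enqueue and return immediately."""
    job_id = uuid.uuid4().hex
    jobs[job_id] = {"status": "queued", "result": None}
    work_queue.put((job_id, document))
    return job_id                      # client polls (or is notified) later

def worker(run_inference):
    """Background analyst: drains the queue and stores results."""
    while True:
        job_id, document = work_queue.get()
        jobs[job_id]["status"] = "running"
        jobs[job_id]["result"] = run_inference(document)
        jobs[job_id]["status"] = "done"
        work_queue.task_done()
```

Perceived latency drops because `submit` returns in microseconds; the inference cost hasn't changed, it has just been moved off the interactive path.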

B. Separate Deterministic and Probabilistic Domains

Do not mix business-critical logic with probabilistic outputs.
Create boundaries:
  • Deterministic core: billing, compliance, state transitions
  • Probabilistic layer: classification, summarization, recommendations
The deterministic layer must validate, constrain, or post-process AI outputs.

This prevents:
  • Hallucinated database writes
  • Invalid state transitions
  • Regulatory exposure
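The boundary can be as simple as a validation function that sits between the model and any state transition. The category set and the shape of the model output below are illustrative assumptions:

```python
# Sketch: a deterministic boundary that validates AI output before it can
# touch business-critical state. Categories are a hypothetical example.

ALLOWED_CATEGORIES = {"electronics", "clothing", "food", "other"}

def apply_classification(order: dict, model_output: str) -> dict:
    """Only let validated values cross into the deterministic core."""
    category = model_output.strip().lower()
    if category not in ALLOWED_CATEGORIES:
        category = "other"   # constrain: never write a hallucinated value
    # The state transition happens here, with a value we fully control.
    return {**order, "category": category}
```

Whatever the model emits, the deterministic core only ever sees one of four known values.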

C. Design for Model Evolution

AI components evolve faster than traditional services. If your architecture assumes:
  • One model
  • One embedding strategy
  • One prompt structure
…it will eventually fail.
Redesign patterns:
  • Abstract model providers behind capability interfaces
  • Externalize prompts/configuration
  • Version embeddings and schemas explicitly
Models are volatile dependencies. Treat them like third-party infrastructure, not internal code.
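One way to sketch a capability interface, assuming a hypothetical summarization capability; the provider names and the registry here are placeholders, not a real SDK:

```python
# Sketch: callers depend on a capability, not on a specific model provider.
# Swapping providers becomes a configuration change, not a code change.
from typing import Protocol

class Summarizer(Protocol):
    def summarize(self, text: str) -> str: ...

class TruncatingSummarizer:
    """Trivial stand-in provider; a real one would call a model API."""
    def summarize(self, text: str) -> str:
        return text[:50]

PROVIDERS: dict[str, Summarizer] = {
    "cheap-local": TruncatingSummarizer(),
    # "provider-x/model-v2": ...   # added via configuration, not refactoring
}

def get_summarizer(name: str) -> Summarizer:
    return PROVIDERS[name]          # callers never import a vendor SDK directly
```

`Protocol` gives structural typing, so a new provider only has to implement `summarize` to slot in.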

D. Build Observability Around Quality, Not Just Uptime

Traditional monitoring answers:
  • Is the service up?
  • Is latency acceptable?
AI monitoring must answer:
  • Is output quality degrading?
  • Is distribution shifting?
  • Are hallucination rates increasing?
If you don’t track semantic drift, you are operating blind. This is where many “working” AI systems quietly deteriorate for months.
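A toy illustration of distribution monitoring, using output length as a cheap proxy for distribution shift; real systems track richer signals (embedding distributions, evaluation scores), and the 1.5x threshold is an assumption:

```python
# Sketch: compare the recent output distribution against a baseline window
# and alert on significant shift, even while every request returns HTTP 200.
from collections import deque

class DriftMonitor:
    def __init__(self, window: int = 100, max_ratio: float = 1.5):
        self.baseline: deque[int] = deque(maxlen=window)
        self.recent: deque[int] = deque(maxlen=window)
        self.max_ratio = max_ratio

    def record_baseline(self, output: str):
        self.baseline.append(len(output))

    def record(self, output: str):
        self.recent.append(len(output))

    def drifting(self) -> bool:
        """Alert if mean output length shifts by more than max_ratio."""
        if not self.baseline or not self.recent:
            return False
        base = sum(self.baseline) / len(self.baseline)
        now = sum(self.recent) / len(self.recent)
        return now > base * self.max_ratio or now < base / self.max_ratio
```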
The Mental Model: AI as a Volatile Core

Think of traditional architecture as steel beams. AI is not steel; it's reinforced glass: powerful, transparent, and valuable, but brittle under incorrect load assumptions. If you embed reinforced glass into a structure designed only for steel, stress fractures appear.
Redesign means:
  • Adjusting load distribution
  • Adding structural buffers
  • Accounting for material properties
AI changes the material properties of your system.

Signs You Haven’t Redesigned (You’ve Just Added AI)

If the illusions above describe your system, you haven't integrated AI – you've increased systemic risk.
Conclusion: AI Demands Architectural Humility

Stop adding AI to your architecture.

Redesign around:
  • Probabilistic outputs
  • Economic volatility
  • Latency variability
  • Continuous evolution
You must recognize that AI is not a plug-in capability. It alters system dynamics at a structural level.
When you redesign intentionally, AI becomes leverage.
When you bolt it on, it becomes technical debt with a GPU bill attached.
Key Takeaways:
If you’re leading AI initiatives, don’t ask:
Where do we call the model?

Ask instead:
What must change in our architecture because we no longer control the output?

That question changes everything.
5 Architectural Decisions That Will Make or Break Your AI Product
https://kb-it.net/5-architectural-decisions-that-will-make-or-break-your-ai-product/ – Mon, 16 Feb 2026
The most expensive part of your AI product isn't the model; it's the architecture you built to keep it alive.
Scenario:
Last Black Friday, a client's AI-powered pricing engine triggered $20,000 in cloud overage in just 48 hours. The culprit? A recursive retry loop in their real-time inference pipeline. The model itself worked exactly as designed. The problem was the architecture surrounding it.
Architects know this story too well: AI does not fail gracefully. It magnifies every latent design flaw: data bottlenecks, latency spikes, and runaway costs. This article cuts past the hype to focus on the decisions that actually determine if an AI product survives production, backed by hard-earned post-mortem lessons.


1. Deployment Pattern: Real-Time, Batch, or Streaming

The Architect’s Dilemma: Sub-second inference is rarely free. Serverless GPU functions look attractive, but cold-start latency and provisioning delays often blow SLAs when traffic spikes.
  • Hidden trap: Treating serverless AI functions as “always available.” This leads to request queuing, which triggers client retries, leading to a death spiral of cost and latency.
  • Trade-off: Batch inference is cheap and stable but useless for interactive UX. Streaming pipelines are resilient but introduce significant orchestration overhead.
Post-mortem — “The $20,000 Weekend”:
A fraud detection model was deployed on ephemeral GPU instances. During a traffic spike, cold starts triggered a 5-second delay. The calling service retried every 1 second. The cloud bill doubled before the on-call engineer even logged in. The fix wasn’t a better model; it was implementing pre-warmed instances and aggressive request throttling.
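The client-side half of that fix can be sketched as bounded retries with exponential backoff and jitter, instead of the fixed 1-second retry loop from the post-mortem. The injectable `sleep` exists only to keep this sketch testable:

```python
# Sketch: bounded retries with exponential backoff + jitter, so transient
# cold-start delays don't turn into a synchronized retry storm.
import random

def call_with_backoff(call, max_attempts: int = 4, base_delay: float = 0.5,
                      sleep=lambda s: None):
    """`sleep` is injectable so the sketch can be tested without waiting."""
    for attempt in range(max_attempts):
        try:
            return call()
        except TimeoutError:
            if attempt == max_attempts - 1:
                raise                    # give up; let the caller degrade
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))  # jitter desynchronizes clients
```

Backoff alone does not replace pre-warmed instances or server-side throttling, but it removes the client's contribution to the death spiral.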

2. Data Architecture and the Vector DB Trap

AI systems are essentially data pipelines with an expensive, probabilistic function in the middle. If your plumbing leaks, your model is irrelevant.
The Vector DB trap: Specialized vector databases are the “shiny toy” of the year. However, their OpEx is high. In 90% of use cases, an extension like PGVector on your existing Postgres instance provides 100% of the required functionality at a fraction of the cost.
Feature consistency: Without a proper feature store, the data used for training will differ from the data seen at inference time (training-serving skew), causing models to fail in ways that are nearly impossible to debug.
Post-mortem — “The RAG Bottleneck":
A knowledge retrieval system used a fully managed vector DB. Query latency hit 3 seconds at peak. By migrating to Postgres + PGVector, the team halved latency and reduced infrastructure costs by 75%, simply by keeping the embeddings closer to the primary application data.

3. Model Management and Lifecycle Complexity

Models are production artifacts, not static scripts. Treating them as “special” usually means they bypass the rigor of standard software engineering.
  • Versioning pitfalls: Deploying a “better” model version without a side-by-side (A/B) test or a clear rollback path is a recipe for silent failure.
  • CI/CD for ML: Automated testing must include “model evaluation” steps. Does the new version still pass the gold-standard test set, or did it trade general accuracy for a specific performance gain?

4. Scalability, Latency, and Economic Circuit Breakers

AI architecture is unique because a logic bug can have immediate, five-figure financial consequences. Traditional auto-scaling focuses on availability; AI scaling must focus on economic survival.
Token Budgets: Implement hard limits on the number of tokens a specific user, session, or tenant can consume. This prevents “prompt injection” or recursive loops from draining your budget.
Tiered Fallbacks: When latency spikes or primary GPU clusters are full, the architecture should automatically fall back to a “smaller” model (e.g., falling back from a 70B parameter model to an 8B model). This preserves UX at the cost of a temporary dip in intelligence.
The API Gateway Guard: Economic circuit breakers should live at the Gateway level. If the cost-per-minute exceeds a threshold, the system should kill the connection before the cloud provider’s billing department does.
Takeaway:
Scalability in AI isn't just about handling load; it's about Financial Engineering. You need the ability to "degrade gracefully" rather than "spend infinitely."
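A gateway-level breaker can be sketched as a rolling cost window. The 60-second window, the cost units, and the injectable clock are assumptions made for illustration:

```python
# Sketch: an economic circuit breaker at the gateway. Admit a request only
# if the rolling one-minute spend stays under budget.
import time

class CostBreaker:
    def __init__(self, max_cost_per_minute: float, now=time.monotonic):
        self.max_cost = max_cost_per_minute
        self.now = now                 # injectable clock for testing
        self.window_start = now()
        self.window_cost = 0.0

    def allow(self, request_cost: float) -> bool:
        """Kill the request before the cloud provider's bill does."""
        t = self.now()
        if t - self.window_start >= 60:
            self.window_start, self.window_cost = t, 0.0
        if self.window_cost + request_cost > self.max_cost:
            return False
        self.window_cost += request_cost
        return True
```

A real gateway would track spend per tenant and emit an alert on every rejection, but the shape is the same: cost is a first-class admission criterion.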

5. Observability, Guardrails, and Drift

Because AI is probabilistic, it will eventually produce an answer that is technically “valid” but practically wrong or dangerous.
  • Silent Drift: A model that worked in January might be useless in June because the underlying user behavior has changed. You need automated alerts that flag when the distribution of model outputs shifts significantly.
  • Input/Output Guardrails: These are deterministic “wrappers” around the AI. If the model generates a response containing restricted words or PII, the guardrail blocks the output regardless of what the model intended.
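An input/output guardrail is deliberately boring, deterministic code. The SSN regex and the denylist below are toy stand-ins for the curated detectors a real deployment would use:

```python
# Sketch: a deterministic output guardrail that runs after every model call,
# regardless of what the model "intended". Patterns are illustrative only.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # toy US-style SSN check
BLOCKED_WORDS = {"internal-codename"}                 # illustrative denylist

def guard_output(model_output: str) -> str:
    if SSN_PATTERN.search(model_output):
        return "[response withheld: possible PII]"
    if any(w in model_output.lower() for w in BLOCKED_WORDS):
        return "[response withheld: restricted content]"
    return model_output
```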
Analogy:
Observability is your system’s shock absorber. Without it, every “probabilistic surprise” becomes a catastrophic jolt to the business.
Conclusion: The Architect as Risk Manager

Success in AI isn't about choosing the most sophisticated model; it's about building a ship that stays afloat even when the engine (the model) behaves unpredictably. The most successful AI products aren't those with the highest accuracy; they are the ones with the lowest Mean Time to Contain (MTTC) when things go wrong.
Actionable next step:
This Friday, run a “Pre-Mortem” with your team. Ask: “If we woke up to a $50,000 cloud bill tomorrow morning, which specific architectural loop allowed it to happen?” Then, go build the circuit breaker that prevents it.
Why This Ransomware Attack Failed
https://kb-it.net/why_this_ransomware_attack_failed/ – Mon, 16 Feb 2026
A post-mortem of a contained ransomware incident. Learn how architectural blast-radius control, identity-first policy, and immutable recovery turned a potential catastrophe into a minor operational blip.


The Scenario: When the ghost admin met the malware

A sophisticated session-hijack targeted a high-level administrator. The attacker expected the keys to the kingdom, but they encountered an architecture designed for “the ghost admin.”

Even with valid credentials, the attacker found an account with zero standing permissions. To execute any meaningful change, they would have needed to trigger a Just-In-Time (JIT) elevation request, verified by a hardware token and a second human approver. While the attacker sat idle, looking for leverage, a “digital canary” – a hidden honeypot share named confidential_data – triggered an automated isolation playbook the moment the first file was touched.

The attack was contained before a single production byte was encrypted.

Why These Controls Work (The Submarine Strategy)

A modern submarine doesn’t survive a hull breach by hoping the hull is unbreakable; it survives through independent pressure vessels.

Practical Architecture Patterns & Trade-offs

Pattern | The "Why" | The Architect's Trade-off
ZTNA / IAP | Decouples access from network location; high-fidelity signals. | High OpEx: requires deep integration with a robust IAM fabric.
Microsegmentation | Hardens the internal blast radius; stops lateral movement. | Initial friction: requires a canonical service inventory and identity model.
JIT / Least Privilege | Eliminates standing "Admin" targets for credential harvesters. | Operational latency: requires automation to prevent "security fatigue" in staff.
Air-Gapped Recovery | Ensures the backup control plane is separate from the domain. | Complexity: demands rigorous, automated recovery drills.
Common Mistakes That Let Ransomware Win

Pro Tips: The Executive Checklist

Final takeaways

Ransomware is not a binary win/lose event; it's an engineering problem. The organizations that "win" are those that stop trying to build a perfect wall and start building a better ship. Focus on containment, make identity your primary control plane, and ensure your recovery path is immutable.
Actionable next step:
This Friday, run a two-hour “Active Directory Recovery” tabletop. Don’t focus on the data – focus on how you restore the identity provider itself after it’s been wiped. If you can’t restore AD, your data backups are useless.
When “More” Makes the System Worse
https://kb-it.net/when_more_makes_the_system_worse/ – Tue, 10 Feb 2026

If new features are progress, why do mature systems so often feel slower, harder to change, and more fragile every year?


Why “adding a feature” is deceptively attractive

Features are easy to justify.
What’s harder to see is the compounding cost curve:

A feature is rarely “just a feature.” It’s a permanent expansion of the system’s problem space.

Stakeholder conflict:

How to reframe the conversation

When architects push back on features, the conversation often stalls at “engineering is blocking progress.” Saying “no” rarely works.

A more effective tactic is to reframe the discussion in budget terms.

The Maintenance vs. Innovation Budget

Every system has two invisible budgets:
  • Innovation Budget – capacity for new capabilities
  • Maintenance Budget – effort spent keeping existing behavior stable
New features withdraw from both:
  • Immediately from innovation (build cost)
  • Continuously from maintenance (bugs, upgrades, support)
Practical tip:
Instead of rejecting a feature, show stakeholders:
  • How much of the next 12 months’ capacity will be consumed maintaining it
  • What other initiatives that maintenance load will crowd out
You can make this concrete by tagging Jira tickets as “Feature” vs. “Maintenance / Tech Debt” and showing the trend line over four quarters. Once stakeholders see maintenance steadily eating into delivery capacity, the trade-off becomes visible – and the conversation changes.
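The ticket-tagging tip reduces to a tiny aggregation. The ticket shape below (a quarter plus a label) is an assumption about whatever your tracker exports:

```python
# Sketch: make the maintenance budget visible as a per-quarter percentage.
# Works on any ticket export that carries a quarter and a "Feature" vs.
# "Maintenance" label (field names here are illustrative).

def maintenance_share_by_quarter(tickets: list[dict]) -> dict[str, float]:
    """% of delivered tickets per quarter that were maintenance/tech debt."""
    totals: dict[str, list[int]] = {}
    for t in tickets:
        done, maint = totals.setdefault(t["quarter"], [0, 0])
        totals[t["quarter"]] = [done + 1, maint + (t["label"] == "maintenance")]
    return {q: round(100 * m / d, 1) for q, (d, m) in totals.items()}
```

Plot the resulting percentages over four quarters and the "invisible" budget stops being invisible.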
A more relatable failure: the permissions matrix nobody needed

Not all failures are dramatic. Most are painfully ordinary.
The scenario
A mid-sized B2B platform started with:
  • Two roles: Admin and User
  • Clear ownership boundaries
  • Simple authorization checks
Then came feature requests:
  • “Read-only admin”
  • “Billing admin”
  • “Support user”
  • “Temporary access”
  • “Regional admin”
The outcome
In hindsight, 90% of use cases could have been solved with:
  • Admin vs. User
  • One or two scoped toggles (e.g., billing access)
This is how systems rot – not from ambition, but from unquestioned accumulation.

A mental model you can show on one slide

Use this simple comparison when discussing scope:
Aspect | Feature Cost | System Complexity | Stakeholder View
Growth pattern | Linear | Exponential | “Steady progress”
Visibility | High | Low | “Looks cheap”
Reversibility | Medium | Hard | “We can undo it later”
Long-term impact | Predictable | Emergent | “Invisible cost”
Key insight:
You pay for features once. You pay for complexity forever.
This table also explains why feature debates are so difficult: architects experience the exponential curve directly, while stakeholders mostly see what ships.
Deepened practical example: killing a feature before it existed

Sometimes small work can bring big value.
The scenario
A team proposed building a Custom Report Builder:
  • Drag-and-drop fields
  • Filters, joins, export formats
  • Significant backend and UI work
Before committing, the team added lightweight telemetry to the existing CSV export:
  • Number of exports per user
  • Columns included
  • Filters applied
  • Time-to-first-export
What the data showed
  • 90% of users exported CSVs with the same 3 columns
  • Fewer than 5% applied any filters
  • Most exports were immediately opened in Excel and modified manually
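The telemetry in question can be a single structured log line per export. The event fields below are illustrative; any structured logger would do:

```python
# Sketch: lightweight telemetry on an existing CSV export, capturing just
# enough to answer "do users actually need a report builder?"
import json
import time

def log_export_event(logger, user_id: str, columns: list[str],
                     filters: dict, started_at: float):
    event = {
        "event": "csv_export",
        "user": user_id,
        "columns": sorted(columns),          # sorted -> easy to group later
        "filter_count": len(filters),
        "time_to_export_s": round(time.time() - started_at, 2),
    }
    logger(json.dumps(event))
```

A week of these events is usually enough to see whether the speculative feature has real demand.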
The outcome
Instead of a report builder, the team made small improvements to the existing CSV export.
Result:
User satisfaction increased, delivery took days instead of months, and the system avoided a major new UI and data-processing surface.

Telemetry turned a speculative feature into a non-event.

Feature = permanently deployed debt

Think of every feature as:
  • A library you must support forever
  • A behavior users will depend on
  • A contract that constrains future design
Like financial debt, it can be useful – but only if:
  • You understand the interest rate
  • You have a plan to pay it down
Tactical addition: the sunsetting protocol

If features are debt, retirement must be intentional.
A simple sunsetting protocol
Add this to your architecture or product governance process:
  1. Usage threshold – Define “healthy usage” (e.g., % of active users, frequency)
  2. Owner assignment – Every feature has a named owner responsible for its health
  3. Deprecation signal – Log usage, warn internally when it drops below threshold
  4. Soft removal:
    • Hide from UI, keep API compatibility
    • Add observability hooks (e.g., `feature_used` logs or kill switches)
    • If the usage signal stays at zero for 30 days, the risk of hard removal is statistically negligible – even for that “one critical customer workflow.”
  5. Hard removal – Delete code paths, schemas, tests.
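Steps 3 and 4 of the protocol can be sketched as a flag plus a usage signal. The in-memory `FeatureFlag` here is a hypothetical stand-in for whatever flag store you run:

```python
# Sketch: "soft removal" mechanics - a kill switch plus an observable
# `feature_used` signal that gates the final hard removal.
import time

class FeatureFlag:
    def __init__(self, name: str, enabled: bool = True):
        self.name = name
        self.enabled = enabled                 # kill switch, no deploy needed
        self.last_used: float | None = None

    def used(self, clock=time.time):
        self.last_used = clock()               # the `feature_used` signal

def safe_to_hard_remove(flag: FeatureFlag, clock=time.time,
                        quiet_days: int = 30) -> bool:
    """True once the usage signal has been silent for `quiet_days`."""
    if flag.last_used is None:
        return True
    return clock() - flag.last_used > quiet_days * 86_400
```

The injectable `clock` keeps the sketch testable; in production the signal would come from logs or metrics rather than process memory.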
Critical warning:
If you don’t practice removal, your system will only ever grow – until change becomes impossible.


Final takeaways

If you remember one thing:
The best systems aren't defined by what they can do, but by what they deliberately choose not to do.