PRIVACY BY DESIGN β€’ SESSION-BASED ARCHITECTURE

Document Intelligence Architecture

Stateless document processing with ephemeral memory architecture. Transform documents for AI chatbots, RAG pipelines, and semantic searchβ€”with configurable cloud deployment options.

This architecture is documented as a reusable blueprint for multi-modal, privacy-first document intelligence systems.

Interactive Architecture Diagrams

The following diagrams describe the system architecture; they are rendered with Mermaid.js.

RAG Pipeline Sequence

```mermaid
sequenceDiagram
    participant U as User
    participant API as Flask API
    participant R as Smart Router
    participant E as Embedder
    participant V as Vector Store
    participant LLM as OpenRouter
    U->>API: Upload Document
    API->>API: Parse & Convert to MD
    API->>E: Chunk & Embed
    E->>V: Store Vectors
    V-->>API: Indexed βœ“
    U->>API: Ask Question
    API->>E: Embed Query
    E->>V: Similarity Search
    V-->>API: Top-K Chunks
    API->>R: Analyze Complexity
    R-->>API: Model Selection
    API->>LLM: Context + Query
    LLM-->>U: Streamed Response
```

Smart Router Decision Logic

```mermaid
flowchart TD
    A[User Query] --> B{Complexity Analysis}
    B -->|Simple Q&A| C[Flash Model]
    B -->|Complex Reasoning| D[Pro Model]
    B -->|Technical Code| E[Specialized Model]
    C --> F{Domain Profile?}
    D --> F
    E --> F
    F -->|Legal| G[Legal Prompt]
    F -->|Medical| H[Medical Prompt]
    F -->|Technical| I[Tech Prompt]
    F -->|General| J[Base Prompt]
    G --> K[OpenRouter Gateway]
    H --> K
    I --> K
    J --> K
    K --> L[Response + Metrics]
    style A fill:#eff6ff,stroke:#2563eb
    style C fill:#d1fae5,stroke:#10b981
    style D fill:#fef3c7,stroke:#f59e0b
    style E fill:#ede9fe,stroke:#8b5cf6
    style L fill:#f0fdf4,stroke:#10b981
```
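The routing logic above can be sketched in a few lines of Python. The keyword heuristics, model names, and prompt table below are illustrative assumptions, not the production rules:

```python
def classify_complexity(query: str) -> str:
    """Rough stand-in heuristic: code markers -> technical,
    long or open-ended questions -> complex, otherwise simple."""
    q = query.lower()
    if any(tok in q for tok in ("def ", "select ", "traceback", "```")):
        return "technical"
    if len(q.split()) > 30 or q.startswith(("why", "how", "compare")):
        return "complex"
    return "simple"

MODEL_BY_CLASS = {            # hypothetical routing table
    "simple": "flash-model",
    "complex": "pro-model",
    "technical": "specialized-model",
}

DOMAIN_PROMPTS = {            # domain profile -> system prompt (abridged)
    "legal": "You are a legal analyst...",
    "medical": "You are a clinical assistant...",
    "technical": "You are a senior engineer...",
    "general": "You are a helpful assistant...",
}

def route(query: str, domain: str = "general") -> dict:
    """Combine complexity class and domain profile into a routing decision."""
    return {
        "model": MODEL_BY_CLASS[classify_complexity(query)],
        "system_prompt": DOMAIN_PROMPTS.get(domain, DOMAIN_PROMPTS["general"]),
    }
```

In the real system the classifier would weigh richer signals (token count, conversation history, retrieval scores); the point here is only the two-stage shape: complexity first, domain profile second.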

GCP Infrastructure Map

```mermaid
graph TB
    subgraph Internet
        U[Users]
        GH[GitHub Actions]
    end
    subgraph GCP["Google Cloud Platform"]
        LB[Cloud Load Balancer]
        subgraph Compute["Compute Engine"]
            VM[e2-micro VM]
            GUNICORN[Gunicorn Workers]
            FLASK[Flask App]
        end
        subgraph Security["Security Layer"]
            FW[Firewall Rules]
            SA[Service Account]
        end
    end
    subgraph External["External Services"]
        OR[OpenRouter API]
        EMBED[Jina Embeddings]
    end
    U --> LB
    LB --> FW
    FW --> VM
    VM --> GUNICORN
    GUNICORN --> FLASK
    FLASK --> OR
    FLASK --> EMBED
    GH -->|SSH Deploy| SA
    SA --> VM
    style LB fill:#4285f4,stroke:#1a73e8,color:#fff
    style VM fill:#34a853,stroke:#0f9d58,color:#fff
    style OR fill:#8b5cf6,stroke:#7c3aed,color:#fff
```

Stage 1: Multimodal Ingest & Processing

System Flow: πŸ“‚ Ingest (Docs & Images) β†’ πŸ”€ Router (Text vs Vision) β†’ ⚑ Parse (OCR / Extract) β†’ πŸ“ Unified Markdown β†’ πŸ—‘οΈ Cleanup (Ephemeral Storage)

Requests are processed in-memory; for async workflows, transient cache entries are retained for a bounded TTL (30 or 10 minutes, depending on the workflow) and then wiped.
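A minimal sketch of the TTL-based wipe, assuming a lazy evict-on-read design (the production cleanup strategy may differ, e.g. a background sweeper):

```python
import time

class EphemeralCache:
    """Minimal TTL cache: entries vanish after ttl_seconds.
    The 30/10-minute windows above would map to ttl_seconds=1800 or 600."""

    def __init__(self, ttl_seconds: float):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def put(self, key, value):
        self._store[key] = (time.monotonic() + self.ttl, value)

    def get(self, key, default=None):
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # lazy wipe: expired entry removed on access
            return default
        return value
```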


This architecture demonstrates a scalable, secure, and observable system for processing documents and generating insights using Large Language Models.

Stage 2: RAG Preparation & Retrieval

Data preparation for AI chatbots, semantic search, and retrieval-augmented generation. Supports 100k+ document estates with multi-tenant collections and idempotent re-indexing.

RAG Pipeline: Document (Any format) β†’ Markdown (Clean text) β†’ Chunk (tiktoken-accurate) β†’ Embed (768-dim, configurable) β†’ Export (Vector-ready)

Export targets:
  β€’ JSONL: streaming
  β€’ ChromaDB: collection ready
  β€’ LanceDB: table ready
  β€’ Qdrant: collection ready
  β€’ PostgreSQL: pgvector ready

Supports large document estates, multi-tenant indexing, and idempotent re-runs, so enterprises can plug this stage into existing data lakes, catalogs, and governance workflows.
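The chunking step can be sketched as a token-budgeted sliding window. Here a naive whitespace tokenizer stands in for tiktoken's encoder, and the budget/overlap values are illustrative defaults, not the production settings:

```python
def chunk_text(text, max_tokens=256, overlap=32, tokenize=str.split):
    """Split text into overlapping, token-budgeted chunks.
    `tokenize` is a stand-in; the real pipeline counts tokens with
    tiktoken (e.g. tiktoken.get_encoding("cl100k_base")) for exact budgets."""
    tokens = tokenize(text)
    chunks, step = [], max_tokens - overlap
    for start in range(0, len(tokens), step):
        window = tokens[start:start + max_tokens]
        if not window:
            break
        chunks.append(" ".join(window))
        if start + max_tokens >= len(tokens):
            break  # last window already covers the tail
    return chunks
```

The overlap preserves context across chunk boundaries so a sentence split mid-chunk still appears whole in at least one chunk, which matters for retrieval quality.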


Stage 3: RAG Inference & Generation

Real-time retrieval and answer generation pipeline

Inference Flow: User Query (Natural Language) β†’ Embed (Query Vector) β†’ Vector Search (Top-K Chunks) β†’ LLM Gen (Context + Prompt) β†’ Response (Cited Answer)
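The retrieve-then-generate flow can be sketched with a plain cosine-similarity top-K search and a citation-demanding prompt. The in-memory index and prompt wording are stand-ins for the real vector-store query (ChromaDB/Qdrant) and production prompt:

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, index, k=3):
    """index: list of (chunk_id, vector, text); returns the k best matches."""
    scored = sorted(index, key=lambda row: cosine(query_vec, row[1]), reverse=True)
    return scored[:k]

def build_prompt(question, retrieved):
    """Assemble a grounded prompt that asks the LLM to cite chunk ids."""
    context = "\n\n".join(f"[{cid}] {text}" for cid, _vec, text in retrieved)
    return (
        "Answer using ONLY the context below; cite chunk ids like [c1].\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```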

SQL Intelligence Layer


Bring Your Own Database (BYOD) with natural language queries. Universal SQL Builder supports multiple ingestion strategies with read-only security enforcement.

SQL Sandbox Flow: πŸ“‚ User Upload (.db / .sql / .csv / .xlsx) β†’ πŸ” Format Detection (Universal SQL Builder) β†’ πŸ”’ SQLite Instance (Read-Only Mode) β†’ πŸ€– SQL Agent (LangChain + LLM) β†’ πŸ“Š Results + SQL (Glass Box AI)

Ingestion strategies:
  β€’ Native SQLite (.db, .sqlite, .sqlite3): direct read-only URI mode
  β€’ SQL Dump Rehydration (.sql): temp DB built via executescript()
  β€’ Spreadsheet Translation (.csv, .xlsx): Pandas normalization β†’ SQLite
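The spreadsheet-translation strategy can be sketched with the standard library alone (the real pipeline normalizes with pandas first; column typing here is simplified to TEXT):

```python
import csv
import io
import sqlite3

def csv_to_sqlite(csv_text, table="data"):
    """Load CSV rows into an in-memory SQLite table so the SQL agent
    can query spreadsheets the same way it queries uploaded .db files."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    cols = list(rows[0].keys())
    conn = sqlite3.connect(":memory:")
    col_defs = ", ".join(f'"{c}" TEXT' for c in cols)  # all-TEXT: simplified typing
    conn.execute(f'CREATE TABLE "{table}" ({col_defs})')
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(
        f'INSERT INTO "{table}" VALUES ({placeholders})',
        [tuple(r[c] for c in cols) for r in rows],
    )
    return conn

conn = csv_to_sqlite("name,qty\nwidget,3\ngadget,5")
print(conn.execute("SELECT COUNT(*) FROM data").fetchone()[0])  # 2
```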

πŸ›‘οΈ Security Boundaries

βœ“ Read-Only URI: ?mode=ro&uri=true
βœ“ 50MB upload limit
βœ“ 5s query timeout
βœ“ 1000 row result limit
βœ— No DML (INSERT/UPDATE/DELETE blocked)
βœ“ Session-scoped ephemeral storage
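The read-only boundary is enforceable at the SQLite engine level. A sketch, assuming the `?mode=ro` URI listed above; the 5s timeout (not shown) could be wired in via `sqlite3.Connection.set_progress_handler`:

```python
import os
import sqlite3
import tempfile

def open_readonly(path):
    """Open an uploaded database via SQLite's read-only URI mode."""
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)

def run_query(conn, sql, max_rows=1000):
    """Execute a query with a hard cap on returned rows."""
    return conn.execute(sql).fetchmany(max_rows)

# Demo: build a tiny DB, reopen it read-only, show writes are rejected.
fd, path = tempfile.mkstemp(suffix=".db")
os.close(fd)
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE t(x)")
rw.execute("INSERT INTO t VALUES (1)")
rw.commit()
rw.close()

ro = open_readonly(path)
print(run_query(ro, "SELECT x FROM t"))  # [(1,)]
try:
    ro.execute("INSERT INTO t VALUES (2)")
except sqlite3.OperationalError as e:
    print("blocked:", e)  # attempt to write a readonly database
```

Because the DML block lives in the engine rather than in prompt instructions, even a jailbroken SQL agent cannot mutate the data.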

Engineering Pipelines

πŸš€ CI/CD Automation Pipeline (Designed and Implemented by Me)

I designed and implemented a fully automated CI/CD pipeline that takes MegaDoc from commit to production on GCP with DevSecOps scanning (gitleaks, bandit, safety), quality gates, and zero-downtime deploys. This is the same pipeline I use for my own projects and can adapt for client or employer environments.

CI/CD Flow: Code Push (GitHub main) β†’ Code Quality (Lint + Security) β†’ CI: Test (GitHub Actions) β†’ Release (Version Tag) β†’ CD: Deploy (GCP VM + nginx)

Every push to main triggers a GitHub Actions workflow that runs security scans (gitleaks, bandit, safety), quality gates, and smoke tests before deploying to GCP VM with nginx reverse proxy and Let's Encrypt SSL.

GitHub Platform Features

  • Multi-Stage Workflow: Build β†’ Test β†’ Security β†’ Deploy.
  • Branch Protection: Required reviews, status checks passed.
  • GitHub Environments: Manual approval for Production.
  • Automated Versioning: Semantic Release based on commits.

Jira & Confluence Integration

  • Jira Sync: PRs validate issue keys and update Jira status.
  • PR Linking: All PRs must link to a Jira Ticket.
  • Build Summaries: CI status posted to Jira comments.
  • Release Logs: Confluence page updated automatically.

Zero-Downtime Deployment

  • Staging Gate: Staging must pass before Production deploy.
  • PID-Based Management: Graceful process restart with health checks.
  • HTTPS Auto-Config: Let's Encrypt with auto-renewal.

Governance & Auditability

  • Full Traceability: Commit β†’ PR β†’ Jira β†’ Build β†’ Deploy link.
  • Policy as Code: Security configs version-controlled and validated.

Zero Trust Security & Governance

Defense in Depth: in the current demo, the Zero Trust controls are simulated and validated at the design level; the reference deployment uses Cloud Armor, Istio mTLS, and GKE Autopilot.

Control points: Edge β†’ Gateway β†’ Model β†’ Curator β†’ Output

Infrastructure

Network:
  • Cloud Armor (WAF)
  • DDoS Protection
  • Private VPC
Zero Trust:
  • mTLS (Istio Mesh)
  • Service Identity
  • Rate Limiting

Application & AI

Input Guard:
  • Prompt Injection
  • PII Redaction
  • Magic Byte Check
Output Guard:
  • Hallucination Check
  • Toxicity Filter
  • Citation Verify

Compliance

Data:
  • Encryption at Rest
  • Ephemeral Storage (TTL-based)
  • Data Sovereignty
Audit:
  β€’ Immutable logs of all AI decisions and access events.

Technology Stack

The public demo uses lightweight SQLite and GCS for hosting, but the processing path is stateless and can be swapped to fully ephemeral or enterprise data stores.

Backend

  • Python 3.11+
  • Flask
  • MarkItDown
  • SQLite

AI/NLP

  • tiktoken
  • scikit-learn
  • sentence-transformers
  • langdetect

Data

  • ChromaDB / LanceDB (Demo)
  • Qdrant / PGVector (Enterprise)
  • GCS Hosting
  • Embeddings
ChromaDB and LanceDB are used in the demo; Qdrant and pgvector/PostgreSQL are supported as deployment targets for enterprise environments.

Security

  • CSRF Protection
  • Magic Byte File Validation
  • Rate Limiting
OWASP Top 10 controls are designed into the gateway layer; air-gap ready via a pluggable local-inference path.

Infrastructure

  • GCP Compute Engine
  • nginx + Let's Encrypt
  • GitHub Actions CI/CD

Target Enterprise Architecture (Reference)

This reference blueprint is what I use to discuss trade-offs and adaptation paths when aligning with new environments and constraints; it demonstrates how the system scales to enterprise workloads (10k+ QPS).

⚑ Event-Driven Ingestion (Kafka + GKE)

I designed this ingestion backbone using Kafka + GKE to handle large, bursty document flows from enterprise systems (SharePoint, S3, etc.).

Connectors (SharePoint/S3) β†’ Apache Kafka (Event Backbone) β†’ Ingestion Pods (GKE Autopilot)

πŸš€ HA Inference Cluster (Istio + vLLM)

Zero-trust service mesh with auto-scaling inference endpoints

Global LB (Multi-Region) β†’ Istio Gateway (mTLS / Rate Limit) β†’ Serving Pods (vLLM / KEDA)

Operational Excellence: Observability & FinOps

πŸ“Š Full-Stack Observability

  • βœ“
    Golden Signals: Latency (P95/P99), Error Rate, Traffic, Saturation.
  • βœ“
    AI Metrics: Time-to-First-Token (TTFT), Cache Hit Rate, RAG Retrieval Score.
  • βœ“
    Tracing: OpenTelemetry for end-to-end request tracing.
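The latency percentiles can be computed with a simple nearest-rank rule; the `golden_signals` helper is a hypothetical summary shape, not the production metrics code:

```python
def percentile(samples, p):
    """Nearest-rank percentile with integer p (e.g. p=95 for the P95 panel)."""
    ordered = sorted(samples)
    rank = max(1, (p * len(ordered) + 99) // 100)  # integer ceil(p*n/100)
    return ordered[rank - 1]

def golden_signals(latencies_ms, errors, total):
    """Tiny summary in the spirit of the golden-signals dashboard above."""
    return {
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "error_rate": errors / total if total else 0.0,
    }
```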

πŸ’° FinOps & Cost Strategy

  • βœ“
    Ingest Efficiency: Spot Instances for stateless worker nodes (60% savings).
  • βœ“
    Inference Scaling: Scale-to-Zero policies during off-peak hours.
  • βœ“
    Token Optimization: Semantic Caching reduces cost by 30%.
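Semantic caching can be sketched as a nearest-neighbour lookup over query embeddings; the 0.95 similarity threshold and flat scan are illustrative (the claimed 30% saving depends on workload):

```python
import math

def _cos(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

class SemanticCache:
    """Reuse an earlier LLM answer when a new query's embedding is close
    enough to a cached one, skipping a paid generation call."""

    def __init__(self, threshold=0.95):
        self.threshold = threshold
        self.entries = []  # (embedding, answer); a real cache would use an ANN index

    def lookup(self, embedding):
        best, best_sim = None, 0.0
        for vec, answer in self.entries:
            sim = _cos(embedding, vec)
            if sim > best_sim:
                best, best_sim = answer, sim
        return best if best_sim >= self.threshold else None

    def store(self, embedding, answer):
        self.entries.append((embedding, answer))
```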