PRIVACY BY DESIGN • SESSION-BASED ARCHITECTURE
Document Intelligence Architecture
Stateless document processing with ephemeral memory architecture. Transform documents for AI chatbots,
RAG pipelines, and semantic search, with configurable cloud deployment options.
This architecture is documented as a reusable blueprint for multi-modal, privacy-first document
intelligence systems.
Interactive Architecture Diagrams
Click and explore the system architecture. These diagrams are dynamically rendered using Mermaid.js.
RAG Pipeline Sequence
sequenceDiagram
participant U as User
participant API as Flask API
participant R as Smart Router
participant E as Embedder
participant V as Vector Store
participant LLM as OpenRouter
U->>API: Upload Document
API->>API: Parse & Convert to MD
API->>E: Chunk & Embed
E->>V: Store Vectors
V-->>API: Indexed ✓
U->>API: Ask Question
API->>E: Embed Query
E->>V: Similarity Search
V-->>API: Top-K Chunks
API->>R: Analyze Complexity
R-->>API: Model Selection
API->>LLM: Context + Query
LLM-->>U: Streamed Response
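The sequence above can be sketched end-to-end in a few lines. The following is a minimal, self-contained illustration of the flow, not the production code: `embed` is a toy stand-in for a real embedding model (e.g. Jina), and `VectorStore` stands in for ChromaDB.

```python
import math

def embed(text):
    # Toy stand-in for a real embedding model: a normalized bag-of-letters
    # vector, just enough to make similarity search runnable in this sketch.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def chunk(markdown, size=200):
    # Naive fixed-size chunking of the parsed Markdown.
    return [markdown[i:i + size] for i in range(0, len(markdown), size)]

class VectorStore:
    # Stand-in for the vector store (ChromaDB in the demo).
    def __init__(self):
        self.rows = []  # (chunk_text, vector) pairs

    def add(self, chunks):
        self.rows += [(c, embed(c)) for c in chunks]

    def top_k(self, query, k=3):
        q = embed(query)
        scored = [(sum(a * b for a, b in zip(q, v)), c) for c, v in self.rows]
        return [c for _, c in sorted(scored, reverse=True)[:k]]

# Upload -> parse to Markdown -> chunk & embed -> store
store = VectorStore()
store.add(chunk("# Refund Policy\nRefunds are issued within 14 days of purchase."))

# Question -> embed query -> similarity search -> context assembled for the LLM
context = store.top_k("How long do refunds take?", k=1)
prompt = f"Context:\n{context[0]}\n\nQuestion: How long do refunds take?"
```

The retrieved chunks are then sent, together with the query, to the selected model via OpenRouter.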
Smart Router Decision Logic
flowchart TD
A[User Query] --> B{Complexity Analysis}
B -->|Simple Q&A| C[Flash Model]
B -->|Complex Reasoning| D[Pro Model]
B -->|Technical Code| E[Specialized Model]
C --> F{Domain Profile?}
D --> F
E --> F
F -->|Legal| G[Legal Prompt]
F -->|Medical| H[Medical Prompt]
F -->|Technical| I[Tech Prompt]
F -->|General| J[Base Prompt]
G --> K[OpenRouter Gateway]
H --> K
I --> K
J --> K
K --> L[Response + Metrics]
style A fill:#eff6ff,stroke:#2563eb
style C fill:#d1fae5,stroke:#10b981
style D fill:#fef3c7,stroke:#f59e0b
style E fill:#ede9fe,stroke:#8b5cf6
style L fill:#f0fdf4,stroke:#10b981
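The routing logic in the flowchart can be sketched as two classifiers: one picks the model tier by query complexity, the other picks the system prompt by domain. The thresholds, keyword lists, and model names below are illustrative assumptions, not the production heuristics.

```python
MODELS = {"simple": "flash", "complex": "pro", "code": "specialized"}  # illustrative names

PROMPTS = {
    "legal": "Legal Prompt", "medical": "Medical Prompt",
    "technical": "Tech Prompt", "general": "Base Prompt",
}

def classify_complexity(query):
    # Heuristic stand-in for the real complexity analyzer.
    if "```" in query or "def " in query or "SELECT " in query.upper():
        return "code"
    if len(query.split()) > 25 or "why" in query.lower():
        return "complex"
    return "simple"

def detect_domain(query):
    # Keyword lists here are toy examples of a domain-profile classifier.
    q = query.lower()
    if any(w in q for w in ("contract", "clause", "liability")):
        return "legal"
    if any(w in q for w in ("diagnosis", "dosage", "symptom")):
        return "medical"
    if any(w in q for w in ("api", "stack trace", "deploy")):
        return "technical"
    return "general"

def route(query):
    # Complexity picks the model; domain profile picks the system prompt.
    return MODELS[classify_complexity(query)], PROMPTS[detect_domain(query)]

model, prompt = route("What does the liability clause mean?")
```

Both outputs then travel together to the OpenRouter gateway, which is what lets model selection stay independent of prompt selection.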
GCP Infrastructure Map
graph TB
subgraph Internet
U[Users]
GH[GitHub Actions]
end
subgraph GCP["Google Cloud Platform"]
LB[Cloud Load Balancer]
subgraph Compute["Compute Engine"]
VM[e2-micro VM]
GUNICORN[Gunicorn Workers]
FLASK[Flask App]
end
subgraph Security["Security Layer"]
FW[Firewall Rules]
SA[Service Account]
end
end
subgraph External["External Services"]
OR[OpenRouter API]
EMBED[Jina Embeddings]
end
U --> LB
LB --> FW
FW --> VM
VM --> GUNICORN
GUNICORN --> FLASK
FLASK --> OR
FLASK --> EMBED
GH -->|SSH Deploy| SA
SA --> VM
style LB fill:#4285f4,stroke:#1a73e8,color:#fff
style VM fill:#34a853,stroke:#0f9d58,color:#fff
style OR fill:#8b5cf6,stroke:#7c3aed,color:#fff
Stage 1: Multimodal Ingest & Processing
System Flow
Router (Text vs Vision) → … → Cleanup (Ephemeral Storage)
Requests are processed in-memory; for async workflows, transient cache entries persist for at most 30 or 10
minutes (depending on the workflow) before they are wiped.
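The wipe behaviour described above can be sketched as a TTL-keyed in-memory cache; the class name and TTL values here are illustrative, not the production implementation.

```python
import time

class EphemeralCache:
    """In-memory cache whose entries expire after a per-entry TTL.

    Sketch of session-scoped ephemeral storage; TTLs are configurable
    per workflow (e.g. 30 min for async jobs, 10 min otherwise).
    """

    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def put(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazy wipe on access
            return None
        return value

    def sweep(self):
        # Periodic wipe of everything past its TTL.
        now = time.monotonic()
        for key in [k for k, (_, exp) in self._data.items() if now >= exp]:
            del self._data[key]

cache = EphemeralCache()
cache.put("session-123", {"doc": "parsed.md"}, ttl_seconds=600)
```

A background sweeper calling `sweep()` guarantees expiry even for keys that are never read again.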
This architecture demonstrates a scalable, secure, and observable system for processing
documents and generating insights using Large Language Models.
Stage 2: RAG Preparation & Retrieval
Data preparation for AI chatbots, semantic search, and retrieval-augmented generation. Supports 100k+
document estates with multi-tenant collections and idempotent re-indexing.
- ChromaDB: collection ready
- PostgreSQL: pgvector ready
Supports large document estates, multi-tenant indexing, and idempotent re-runs, so enterprises can plug
this stage into existing data lakes, catalogs, and governance workflows.
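One way to make re-indexing idempotent, sketched below, is to derive each chunk's vector-store ID deterministically from tenant, document path, and content, so re-runs upsert in place instead of duplicating. The `Collection` class is a minimal stand-in for a ChromaDB or pgvector collection with upsert semantics.

```python
import hashlib

def chunk_id(tenant, doc_path, chunk_text):
    # Deterministic ID: the same tenant + document + content always maps
    # to the same key, so re-indexing becomes an idempotent upsert.
    raw = f"{tenant}:{doc_path}:{chunk_text}".encode()
    return hashlib.sha256(raw).hexdigest()[:32]

class Collection:
    # Stand-in for a multi-tenant vector collection with upsert semantics.
    def __init__(self):
        self.docs = {}

    def upsert(self, ids, texts):
        self.docs.update(zip(ids, texts))

def index(collection, tenant, doc_path, chunks):
    ids = [chunk_id(tenant, doc_path, c) for c in chunks]
    collection.upsert(ids, chunks)
    return ids

col = Collection()
chunks = ["Refunds within 14 days.", "Contact support by email."]
index(col, "acme", "policies/refund.md", chunks)
index(col, "acme", "policies/refund.md", chunks)  # re-run: no duplicates
```

Including the tenant in the ID also keeps collections isolated when multiple tenants index identical documents.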
Stage 3: RAG Inference & Generation
Real-time retrieval and answer generation pipeline
Inference Flow
… → Vector Search (Top-K Chunks) → …
SQL Intelligence Layer
Bring Your Own Database (BYOD) with natural language queries. Universal SQL Builder supports multiple
ingestion strategies with read-only security enforcement.
SQL Sandbox Flow
Format Detection (Universal SQL Builder) → SQLite Instance (Read-Only Mode) → SQL Agent (LangChain + LLM) → Results + SQL (Glass Box AI)
- Native SQLite (.db, .sqlite, .sqlite3): opened directly in read-only URI mode.
- SQL Dump Rehydration (.sql): temporary database built with executescript().
- Spreadsheet Translation (.csv, .xlsx): normalized through Pandas into SQLite.
🛡️ Security Boundaries
✓ Read-Only URI: ?mode=ro&uri=true
✓ 50 MB upload limit
✓ 5 s query timeout
✓ 1000-row result limit
✓ No DML (INSERT/UPDATE/DELETE blocked)
✓ Session-scoped ephemeral storage
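Several of these boundaries can be enforced directly through Python's sqlite3 module, as sketched below: the read-only URI, a query deadline via a progress handler, a row cap on results, and an authorizer that denies write actions. The 50 MB upload limit would live in the upload handler and is omitted here.

```python
import sqlite3
import time

# Write-type authorizer actions to deny (belt and braces on top of mode=ro).
DENIED = {sqlite3.SQLITE_INSERT, sqlite3.SQLITE_UPDATE, sqlite3.SQLITE_DELETE,
          sqlite3.SQLITE_CREATE_TABLE, sqlite3.SQLITE_DROP_TABLE}

def open_sandbox(db_path, max_rows=1000):
    # Read-only at the driver level via the SQLite URI mode.
    conn = sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)
    # Deny any write action at statement-prepare time.
    conn.set_authorizer(
        lambda action, *rest: sqlite3.SQLITE_DENY if action in DENIED
        else sqlite3.SQLITE_OK)

    def run(sql, timeout_s=5.0):
        # Abort long-running queries via a progress-handler deadline.
        deadline = time.monotonic() + timeout_s
        conn.set_progress_handler(
            lambda: 1 if time.monotonic() > deadline else 0, 10_000)
        try:
            return conn.execute(sql).fetchmany(max_rows)  # row-result cap
        finally:
            conn.set_progress_handler(None, 0)

    return run
```

The authorizer rejects DML before execution even starts, so a compromised SQL Agent cannot mutate the sandboxed database.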
Engineering Pipelines
CI/CD Automation Pipeline (Designed and Implemented by Me)
I designed and implemented a fully automated CI/CD pipeline that takes MegaDoc from commit to production
on GCP with DevSecOps (gitleaks, bandit, safety), quality gates, and zero-downtime deploys. This is the same
pipeline I use for my own projects and can adapt for client or employer environments.
CI/CD Flow
Code Quality (Lint + Security) → … → CD: Deploy (GCP VM + nginx)
Every push to main triggers a GitHub Actions workflow that runs security scans (gitleaks, bandit, safety),
quality gates, and smoke tests before deploying to GCP VM with nginx reverse proxy and Let's Encrypt SSL.
GitHub Platform Features
- Multi-Stage Workflow: Build → Test → Security → Deploy.
- Branch Protection: required reviews and passing status checks.
- GitHub Environments: manual approval for Production.
- Automated Versioning: Semantic Release based on commit messages.
Jira & Confluence Integration
- Jira Sync: PRs validate issue keys and update Jira status.
- PR Linking: every PR must link to a Jira ticket.
- Build Summaries: CI status posted to Jira as comments.
- Release Logs: Confluence page updated automatically.
Zero-Downtime Deployment
- Staging Gate: Staging must pass before the Production deploy.
- PID-Based Management: graceful process restarts with health checks.
- HTTPS Auto-Config: Let's Encrypt with auto-renewal.
Governance & Auditability
- Full Traceability: Commit → PR → Jira → Build → Deploy links.
- Policy as Code: security configs version-controlled and validated.
Zero Trust Security & Governance
Defense in Depth: in the current demo, the Zero Trust controls are simulated and
validated at the design level; the reference deployment uses Cloud Armor, Istio, and mTLS on GKE Autopilot.
Edge Gateway → Model Curator → Output
Infrastructure
Network:
- Cloud Armor (WAF)
- DDoS Protection
- Private VPC
Zero Trust:
- mTLS (Istio Mesh)
- Service Identity
- Rate Limiting
Application & AI
Input Guard:
- Prompt Injection
- PII Redaction
- Magic Byte Check
Output Guard:
- Hallucination Check
- Toxicity Filter
- Citation Verify
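Of the input guards above, the Magic Byte Check can be sketched as a content-signature sniff that ignores the client-supplied extension; the signature table below is an illustrative subset, not the full production list.

```python
# Known file signatures ("magic bytes"); an illustrative subset.
MAGIC = {
    b"%PDF-": "pdf",
    b"PK\x03\x04": "zip-based (docx/xlsx/pptx)",
    b"\x89PNG\r\n\x1a\n": "png",
    b"SQLite format 3\x00": "sqlite",
}

def sniff(payload, claimed_ext):
    # Accept an upload only when its leading bytes match a known signature,
    # regardless of the extension the client claims.
    for sig, kind in MAGIC.items():
        if payload.startswith(sig):
            return kind
    raise ValueError(f"content does not match a supported type (claimed: {claimed_ext})")
```

This blocks the classic trick of renaming an executable or script to `.pdf` to slip past extension-based filters.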
Compliance
Data:
- Encryption at Rest
- Ephemeral Storage (TTL-based)
- Data Sovereignty
Audit:
Immutable logs of all AI decisions and access events.
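One common way to make such logs tamper-evident (a sketch of the pattern, not necessarily the deployed mechanism) is a hash chain, where every entry commits to its predecessor's hash, so altering any record invalidates all records after it.

```python
import hashlib
import json

class AuditLog:
    """Append-only log where each entry hashes its predecessor."""

    def __init__(self):
        self.entries = []

    def append(self, event):
        prev = self.entries[-1]["hash"] if self.entries else "genesis"
        body = json.dumps({"event": event, "prev": prev}, sort_keys=True)
        self.entries.append({"event": event, "prev": prev,
                             "hash": hashlib.sha256(body.encode()).hexdigest()})

    def verify(self):
        # Recompute the chain; any edited entry breaks every later hash.
        prev = "genesis"
        for e in self.entries:
            body = json.dumps({"event": e["event"], "prev": prev}, sort_keys=True)
            if e["prev"] != prev or e["hash"] != hashlib.sha256(body.encode()).hexdigest():
                return False
            prev = e["hash"]
        return True

log = AuditLog()
log.append({"actor": "svc-router", "action": "model_selected", "model": "flash"})
log.append({"actor": "svc-guard", "action": "pii_redacted"})
```

Anchoring the latest hash in external storage (e.g. a separate bucket) turns tamper-evidence into tamper-detection even if the log store itself is compromised.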
Technology Stack
The public demo uses lightweight SQLite and GCS for hosting, but the processing path is stateless and can be
swapped to fully ephemeral or enterprise data stores.
Backend
- Python 3.11+
- Flask
- MarkItDown
- SQLite
AI/NLP
- tiktoken
- scikit-learn
- sentence-transformers
- langdetect
Data
- ChromaDB / LanceDB (Demo)
- Qdrant / PGVector (Enterprise)
- GCS Hosting
- Embeddings
ChromaDB and LanceDB are used in the demo; Qdrant and pgvector/PostgreSQL are supported as
deployment targets for enterprise environments.
Security
- CSRF Protection
- Magic Byte File Validation
- Rate Limiting
OWASP Top 10 controls designed into the gateway layer. Air-Gap Ready via pluggable local-inference
path.
Infrastructure
- GCP Compute Engine
- nginx + Let's Encrypt
- GitHub Actions CI/CD
Target Enterprise Architecture (Reference)
Blueprint
This reference architecture is what I use to discuss trade-offs and adaptation paths when aligning with new
environments and constraints, and it demonstrates how the system scales to enterprise workloads (10k+ QPS).
⚡ Event-Driven Ingestion (Kafka + GKE)
I designed this ingestion backbone using Kafka + GKE to handle large, bursty document flows from
enterprise systems (SharePoint, S3, etc.).
Apache Kafka (Event Backbone) → Ingestion Pods (GKE Autopilot)
HA Inference Cluster (Istio + vLLM)
Zero-trust service mesh with auto-scaling inference endpoints
Istio Gateway (mTLS / Rate Limit) → …
Operational Excellence: Observability & FinOps
Full-Stack Observability
- Golden Signals: Latency (P95/P99), Error Rate, Traffic, Saturation.
- AI Metrics: Time-to-First-Token (TTFT), Cache Hit Rate, RAG Retrieval Score.
- Tracing: OpenTelemetry for end-to-end request tracing.
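Of the metrics above, Time-to-First-Token is the one specific to streamed LLM responses; it can be measured by wrapping the token stream, as in this sketch (the `fake_stream` generator is a stand-in for a real streaming endpoint).

```python
import time

def measure_ttft(token_stream):
    """Consume a token stream, recording Time-to-First-Token (TTFT).

    `token_stream` is any iterator of tokens (e.g. chunks of a streaming
    LLM response); returns (tokens, ttft_seconds).
    """
    start = time.monotonic()
    tokens, ttft = [], None
    for tok in token_stream:
        if ttft is None:
            ttft = time.monotonic() - start  # first token observed
        tokens.append(tok)
    return tokens, ttft

def fake_stream():
    # Stand-in for a streaming LLM endpoint.
    for tok in ["Hello", ", ", "world"]:
        yield tok

tokens, ttft = measure_ttft(fake_stream())
```

In production the wrapper would emit `ttft` to the metrics pipeline rather than return it, so dashboards can track P95/P99 TTFT alongside the Golden Signals.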
💰 FinOps & Cost Strategy
- Ingest Efficiency: Spot Instances for stateless worker nodes (up to 60% savings).
- Inference Scaling: Scale-to-Zero policies during off-peak hours.
- Token Optimization: Semantic Caching reduces cost by ~30%.
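Semantic caching can be sketched as follows: embed each answered query, and on a new query return the cached answer when cosine similarity clears a threshold, skipping the LLM call entirely. The `embed` function and the 0.8 threshold below are toy assumptions for illustration.

```python
import math
import re

def embed(text):
    # Toy embedding: a bag-of-words count vector, just enough to make
    # cosine similarity runnable in this sketch.
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w: words.count(w) for w in set(words)}

def cosine(a, b):
    dot = sum(a.get(w, 0) * b.get(w, 0) for w in set(a) | set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    # Return a cached answer when a new query is semantically close
    # to a previously answered one.
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # (query_vector, answer)

    def get(self, query):
        qv = embed(query)
        for vec, answer in self.entries:
            if cosine(qv, vec) >= self.threshold:
                return answer  # cache hit: no LLM call, no token spend
        return None

    def put(self, query, answer):
        self.entries.append((embed(query), answer))

cache = SemanticCache()
cache.put("how do refunds work", "Refunds are issued within 14 days.")
hit = cache.get("how do refunds work?")
```

Because near-duplicate questions hit the cache rather than the model, the saving scales with how repetitive real user traffic is.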