Machine Learning Engineer · Bogotá, Colombia

MANUEL
ALEJANDRO
DIAZ RUBIANO

Building end-to-end AI systems — from LLMs and RAG architectures to MLOps pipelines and production deployment.

About me
Manuel Alejandro Diaz Rubiano
AI Engineer & ML Specialist
Background

How I got here

I started my journey in statistics and mathematics, but quickly fell in love with the intersection of data and artificial intelligence. During my studies at Universidad Santo Tomás, I developed my thesis on topic analysis, applying the LDA model to the Colombian case.

I've worked across finance, customer service, and enterprise technology — moving from dashboards and KPI analysis to building production AI systems with LLMs, RAG, and multi-agent orchestration. Each role gave me a different perspective on how AI can create real impact.

Today, as an AI Engineer at H&Co Latam, I design end-to-end GenAIOps pipelines: from architecture to deployment, observability, and continuous improvement. My goal is to build reliable AI systems that don't just work in demos — they work in production.

Expertise & Stacks

My expertise

Full-cycle AI engineering — from raw data and model training to production deployment, observability, and continuous improvement. I choose tools based on the problem, not the trend.

Stack
LLMs & GenAI

Working daily with frontier models — from prompt design and structured outputs to fine-tuning and guardrails for production safety.

Claude OpenAI Mistral Llama Bedrock Guardrails Prompt Engineering Structured Outputs Fine-tuning · LoRA Pydantic Function Calling
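The structured-outputs pattern above can be sketched with the standard library alone; the `TicketSummary` schema and the sample reply are hypothetical, and a production version would typically validate with Pydantic models instead of manual checks:

```python
import json
from dataclasses import dataclass

@dataclass
class TicketSummary:
    """Hypothetical target schema for a structured LLM output."""
    category: str
    priority: str
    summary: str

def parse_structured_output(raw: str) -> TicketSummary:
    """Validate a model's JSON reply against the expected fields.

    Models sometimes wrap JSON in markdown fences, so strip them first.
    """
    cleaned = raw.strip().removeprefix("```json").removesuffix("```").strip()
    data = json.loads(cleaned)
    missing = {"category", "priority", "summary"} - data.keys()
    if missing:
        raise ValueError(f"model omitted required fields: {missing}")
    return TicketSummary(**{k: data[k] for k in ("category", "priority", "summary")})

reply = '```json\n{"category": "billing", "priority": "high", "summary": "Duplicate charge"}\n```'
ticket = parse_structured_output(reply)
print(ticket.priority)  # high
```

Failing loudly on missing fields is the point: a guardrail that rejects malformed output is cheaper than downstream code silently consuming it.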
Stack
Agents & Orchestration

Building multi-agent systems with conditional routing, parallel tool execution, and stateful workflows — from prototype to production.

LangGraph LangChain LlamaIndex MCP AWS Step Functions Bedrock Agents FastAPI
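The conditional-routing idea above can be shown as a minimal pure-Python sketch; the node names (`classify`, `search_tool`, `smalltalk`) and the routing rule are hypothetical stand-ins for what LangGraph expresses with graph nodes and conditional edges:

```python
from typing import Callable

def classify(state: dict) -> dict:
    """Toy router: questions go to the search tool, everything else chats."""
    state["route"] = "search" if "?" in state["input"] else "smalltalk"
    return state

def search_tool(state: dict) -> dict:
    state["output"] = f"searching for: {state['input']}"
    return state

def smalltalk(state: dict) -> dict:
    state["output"] = "just chatting"
    return state

# Conditional edge: the next node is chosen from state produced upstream,
# mirroring the pattern LangGraph encodes with add_conditional_edges.
ROUTES: dict[str, Callable[[dict], dict]] = {"search": search_tool, "smalltalk": smalltalk}

def run(user_input: str) -> dict:
    state = classify({"input": user_input})
    return ROUTES[state["route"]](state)

print(run("what is RRF?")["output"])  # searching for: what is RRF?
```

The shared mutable state dict is the "stateful workflow" part: every node reads and extends the same state, which is what makes the execution traceable step by step.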
Stack
Vector & Retrieval

Designing hybrid RAG pipelines with semantic and lexical fusion — pgvector, dense embeddings, and reranking at scale.

Pinecone OpenSearch pgvector · HNSW Qdrant Chroma FAISS Redis BM25 · RRF Cohere Rerank sentence-transformers
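The semantic-plus-lexical fusion mentioned above is typically done with Reciprocal Rank Fusion; a self-contained sketch, with hypothetical document IDs standing in for dense (embedding) and BM25 result lists:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: merge ranked lists (e.g. dense + BM25).

    Each document scores sum(1 / (k + rank)) over the lists that
    retrieved it; k = 60 is the constant from the original RRF paper.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists from a dense retriever and a BM25 retriever.
dense = ["doc_a", "doc_b", "doc_c"]
bm25 = ["doc_b", "doc_d", "doc_a"]
print(rrf_fuse([dense, bm25]))  # doc_b first: it ranks well in both lists
```

RRF needs only ranks, not scores, so it sidesteps the problem that cosine similarities and BM25 scores live on incomparable scales.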
Stack
Observability & Eval

Tracing every agent step, evaluating output quality, and closing the feedback loop — AI systems must be measurable to be trustworthy.

LangSmith Langfuse Arize Phoenix MLflow W&B CloudWatch
Stack
Cloud

Multi-cloud fluency across AWS, Azure, and GCP — with deep hands-on experience deploying managed AI services and serverless pipelines.

AWS Azure GCP AWS Bedrock GCP Vertex AI Azure AI SageMaker Lambda CDK GitHub Actions vLLM
Stack
Containers & IaC

Infrastructure as code from day one — containerized workloads, Kubernetes orchestration, and reproducible deployments across environments.

Docker Kubernetes Terraform CloudFormation Canary Releases CI/CD
Stack
ML & Deep Learning

Strong foundations in classical ML, NLP, and computer vision — from statistical modeling to training custom neural architectures.

PyTorch HuggingFace BERT · T5 XGBoost Scikit-learn U-Net Prophet ARIMA · LSTM TorchServe
Stack
Languages & Data

Python-first, but fluent across the data layer — SQL, R for analysis, TypeScript for APIs, and modern data tools for clean pipelines.

Python SQL R TypeScript Supabase PostgreSQL dbt
Values

What drives me

  • 01
    Impact-driven AI

    Every architecture I design has production reliability and measurable business impact as its north star — not hype.

  • 02
    Model quality & alignment

    My RLHF and evaluation background gives me a quality lens that goes beyond metrics — systems must behave predictably in the real world.

  • 03
    Continuous improvement

    From Bayesian statistics to autonomous agents, I constantly evolve my knowledge and apply it where it matters most.

Now

What I'm building

  • 001
    AI Engineer at H&Co Latam

Leading end-to-end GenAIOps pipelines with AWS Bedrock, LangGraph, and the OpenAI API for production LLM apps with multi-agent orchestration.

  • 002
    Advanced RAG architectures

Designing RAGOps with vector DBs (OpenSearch, Pinecone), semantic reranking, Arize Phoenix evaluation, and Kubernetes deployments.

  • 003
    Multi-agent systems

    Exploring LangGraph orchestration patterns: conditional routing, parallel tool execution, and state management for complex AI workflows.

Selected projects

WHAT I'VE
BUILT

001
LLM Agent with RAG
AI / NLP — AWS Bedrock
Production agentic system routing multi-channel messages through a LangGraph state machine — Pinecone vector search, Redis memory, OpenAI moderation guardrails, and Blue/Green Kubernetes deployment.
LangGraph Pinecone LangChain AWS Bedrock
002
ML Transaction Classifier
ML / Fintech — AWS SageMaker
Multi-tenant ML pipeline classifying financial transactions to General Ledger accounts using XGBoost, BERT embeddings, and AWS Step Functions — with per-tenant model specialization via feedback loops.
XGBoost SageMaker Step Functions AWS Lambda
003
Travel Metadata Processor
LLM / Serverless — AWS Bedrock
Serverless LLM pipeline extracting structured metadata from raw travel content using AWS Bedrock and Claude — transforming unstructured data into enriched, queryable records at scale.
AWS Bedrock Claude AI AWS Lambda
004
Serverless Cloud AWS Chatbot
AI / Cloud — AWS Serverless
100% AWS-native conversational AI platform on Bedrock Agent — with Bedrock Knowledge Base, Bedrock Guardrails, WebSocket API Gateway, Cognito auth, and CDK nested CloudFormation stacks. Zero external dependencies.
Bedrock Agent Guardrails Lambda CDK
005
Hybrid Vector Search Engine
NLP / Search — Open-Source
Master's thesis — multimodal search over Glovo's food catalog using pgvector HNSW + BM25 tsvector fused via RRF. Search by text or image: upload a burger photo and find burger restaurants. 100% open-source.
pgvector · HNSW Jina CLIP v2 Supabase FastAPI
006
NLP Sentiment Analysis
NLP / ML — HuggingFace
Fine-tuned BERT model for multi-class sentiment classification, served via FastAPI with HuggingFace Transformers.
BERT HuggingFace FastAPI
007
Recommendation System
ML / RecSys — TensorFlow
Collaborative + content-based recommendation engine using TensorFlow embeddings and Redis for real-time serving.
TensorFlow Embeddings Redis
008
U-Net Semantic Segmentation
CV / Medical — PyTorch
Medical image segmentation using U-Net architecture in PyTorch — trained for pixel-level classification of anatomical structures.
U-Net Medical PyTorch
Contact
LET'S TALK
ABOUT YOUR
PROJECT