Blog

Thoughts on AI, data science, and the technology shaping our future.

Machine Learning

Google Gemma 4: What the Most Capable Open Models Mean for Local AI Agents

Google DeepMind's Gemma 4 ships four model sizes under Apache 2.0 with native function calling, 256K context, and MoE efficiency that puts 3.8B active parameters within striking distance of 31B dense quality. Local agentic AI just became production-viable.

AI Engineering

Gemini Embedding 2: What a Unified Multimodal Embedding Model Means for RAG

Google's Gemini Embedding 2 is the first production embedding model that natively maps text, images, video, audio, and documents into a single vector space. Early adopters report 70% latency reduction and 20% recall improvement — here's what it means for your RAG pipeline architecture.

AI Engineering

The Data Scientist's Guide to Prompt Engineering in 2026

Prompt engineering has evolved from an ad hoc craft into a disciplined practice. Here's what actually works in production, from structured outputs and DSPy optimization to agentic system prompts and retrieval-grounded patterns, along with the common mistakes data scientists make.

MLOps

Building Production ML Pipelines with MLflow 3.0: What Actually Changed

MLflow 3.0 introduces LoggedModel as a first-class entity, comprehensive tracing for GenAI, LLM-as-judge evaluation, and deployment jobs that validate models before production. Here's what matters after migrating two production pipelines — and what's still missing.

AI Engineering

Vector Databases Compared: Pinecone vs Weaviate vs Chroma vs Qdrant in 2026

The vector database market has matured from a handful of startups into a crowded, competitive landscape. Here's a practical comparison of Pinecone, Weaviate, Chroma, Qdrant, Milvus, and pgvector — with guidance on which to choose for your use case.

Machine Learning

Fine-Tuning Open Source LLMs on Custom Data: A 2026 Practical Guide

With LoRA and QLoRA, the barrier to fine-tuning has shifted from compute to judgment. Here's the practical playbook for choosing your base model, preparing data, setting hyperparameters, and deploying fine-tuned 8B models on consumer hardware.

AI Engineering

RAG Architectures in 2026: A Practical Guide to Retrieval-Augmented Generation

From naive retrieve-and-generate to agentic self-reflective pipelines and GraphRAG, the RAG landscape has evolved dramatically. Here's a practical framework for choosing between RAG, fine-tuning, and long context windows — and why most production systems in 2026 use all three.

MLOps

GPU Costs in 2026: A Data Scientist's Guide to Cloud vs. On-Prem

B200s range from $2.25 to $16/hour depending on where you rent. H100s have dropped 60% in two years. Here's how to think about cloud vs. on-prem vs. marketplace for your ML workloads — with real numbers and practical recommendations.

AI Industry

The Claude Mythos Leak: What a Misconfigured CMS Tells Us About the Next Wave of AI

Anthropic's leaked internal documents reveal Claude Mythos, a model that represents a "step change" in reasoning and cybersecurity capabilities. Markets reacted, but the real story is what this tells us about the accelerating frontier model race.

AI Regulation

The AI Accountability Act: What Data Scientists Need to Know

The US just passed its most significant AI legislation ever, requiring mandatory bias audits for AI in hiring, lending, healthcare, and criminal justice. Here's how it changes the way we build and deploy models.

AI Trends

The Rise of Agentic AI: Why 2026 Is the Year AI Stops Assisting and Starts Doing

From autonomous code generation to supply chain orchestration, agentic AI systems are moving from research demos to production deployments. Here's what that means for data scientists and why Morgan Stanley says most of the world isn't ready.
