Arepa.AI: Agentic AI Platform for Spanish-Speaking SMBs | Celestinosalim.com
Arepa.AI: Agentic AI Platform for Spanish-Speaking SMBs
Building LangGraph agents with RAG pipelines and voice interfaces for small businesses across Latin America.
Small businesses in Latin America operate in a different reality than Silicon Valley startups. They don't have engineering teams. They don't have data infrastructure. They often don't have a website. But they have the same operational problems that AI can solve: answering customer questions, scheduling appointments, managing inventory, and following up on leads.
Arepa.AI is the platform I'm building to bridge that gap. The name is a nod to my Venezuelan roots — the arepa is the most universal food in the culture, and this project aims to make AI equally accessible.
The Problem
Most AI tooling assumes English-first, enterprise-scale deployments run by technical users. That leaves out millions of small businesses across Latin America that could benefit from automation but can't afford a $200K consulting engagement or navigate English-language documentation.
The specific gaps I'm targeting:
Language: LLM output quality tends to drop in Spanish compared with English, especially with regional dialects and informal business communication
Cost: SMBs can't justify $0.50/query. The unit economics need to work at $0.01/query or less
Complexity: Business owners need to interact with AI through voice and WhatsApp, not dashboards
Architecture Decisions
LangGraph for Agent Orchestration
I chose LangGraph over raw LangChain or custom orchestration for three reasons:
State machines over chains: Business workflows (lead qualification, appointment booking, follow-up sequences) map naturally to state graphs, not linear chains
Human-in-the-loop: SMB owners need to approve actions before the agent takes them. LangGraph's interrupt/resume model handles this cleanly
Observability: LangSmith integration gives me full trace visibility, which matters when debugging Spanish-language edge cases
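To make the first two reasons concrete, here is a minimal, framework-free sketch of a workflow modeled as a state graph that pauses for owner approval and resumes afterwards. It is not the Arepa.AI code and does not use LangGraph's API; the `BookingState` fields, node names, and the `PAUSE`/`END` sentinels are all hypothetical, chosen only to illustrate the pattern.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

PAUSE = "__pause__"  # sentinel: stop and wait for the business owner
END = "__end__"      # sentinel: workflow finished


@dataclass
class BookingState:
    """Hypothetical state for an appointment-booking workflow."""
    customer: str
    requested_slot: str
    approved: Optional[bool] = None  # None until the owner decides
    log: List[str] = field(default_factory=list)


def propose(state: BookingState) -> str:
    state.log.append(f"proposed {state.requested_slot} for {state.customer}")
    return "await_approval"


def await_approval(state: BookingState) -> str:
    # Without a decision, pause here instead of acting on the owner's behalf.
    if state.approved is None:
        return PAUSE
    return "confirm" if state.approved else "decline"


def confirm(state: BookingState) -> str:
    state.log.append("confirmed")
    return END


def decline(state: BookingState) -> str:
    state.log.append("declined")
    return END


# Each node returns the name of the next node, so branching lives in the graph.
NODES: Dict[str, Callable[[BookingState], str]] = {
    "propose": propose,
    "await_approval": await_approval,
    "confirm": confirm,
    "decline": decline,
}


def run(state: BookingState, start: str = "propose") -> str:
    """Run until the graph ends or pauses; return END or the node to resume at."""
    current = start
    while True:
        nxt = NODES[current](state)
        if nxt == PAUSE:
            return current
        if nxt == END:
            return END
        current = nxt


state = BookingState(customer="Ana", requested_slot="martes 10:00")
resume_at = run(state)                # stops at "await_approval"
state.approved = True                 # the owner approves, e.g. via a chat reply
result = run(state, start=resume_at)  # continues to "confirm" and ends
```

Because the state is a plain object, it can be persisted between the pause and the resume, which is the property that makes approval flows practical when the owner may answer hours later.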
RAG Pipeline
The retrieval layer ingests business-specific content: menus, service lists, pricing, FAQs, and operating hours. Each business gets an isolated vector namespace in Supabase (pgvector).
Key design choices:
Chunking strategy: Semantic chunking tuned for Spanish sentence boundaries
Embedding model: A multilingual model rather than an English-only one, so Spanish text keeps its semantic accuracy at retrieval time
Hybrid search: Combining vector similarity with keyword matching for proper nouns (business names, product names) that embeddings handle poorly
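The hybrid search idea can be sketched in a few lines: blend a vector similarity score with a keyword-overlap score so that exact product names still surface when embeddings are ambiguous. This is a toy, in-memory illustration under my own assumptions, not the production query; the `alpha` weight, the document shape, and the two-dimensional vectors are hypothetical stand-ins for real embeddings.

```python
import math
import re
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def tokens(text: str) -> set:
    # \w is Unicode-aware for str patterns, so accented letters and ñ are kept.
    return set(re.findall(r"\w+", text.lower()))


def keyword_score(query: str, text: str) -> float:
    """Fraction of query tokens that appear verbatim in the text."""
    query_tokens = tokens(query)
    if not query_tokens:
        return 0.0
    return len(query_tokens & tokens(text)) / len(query_tokens)


def hybrid_search(
    query: str,
    query_vec: Sequence[float],
    docs: List[Dict],
    alpha: float = 0.7,  # hypothetical weight: 1.0 = vectors only, 0.0 = keywords only
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Rank docs by a weighted blend of vector and keyword scores."""
    scored = []
    for doc in docs:
        score = alpha * cosine(query_vec, doc["vec"]) + (1 - alpha) * keyword_score(
            query, doc["text"]
        )
        scored.append((score, doc["text"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Two menu items whose toy embeddings are identical; only the keywords differ.
menu = [
    {"text": "Empanada de queso", "vec": [0.9, 0.1]},
    {"text": "Arepa reina pepiada con pollo y aguacate", "vec": [0.9, 0.1]},
]
top = hybrid_search("reina pepiada", [1.0, 0.0], menu)
```

In this example the vector scores tie, so the keyword term is what puts the "reina pepiada" item first. In Postgres the same blend would more likely be expressed as a single query combining a pgvector distance with full-text search ranking, but the weighting logic is the same.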
Infrastructure
AWS: Lambda for agent execution, S3 for document storage, CloudWatch for monitoring
Terraform: All infrastructure is IaC from day one. No clicking in consoles
Cost ceiling: Hard limits on per-business monthly spend. If a business's agent costs exceed $50/month, something is wrong with the architecture
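The cost ceiling ties back to the unit economics: at the $0.01/query target, a $50 monthly ceiling corresponds to 5,000 queries per business. A minimal sketch of a per-business budget guard follows. The class and function names are hypothetical, and I'm assuming the guard runs before each agent invocation; `Decimal` is used because float division on currency values gives off-by-one results (in Python, `50.0 // 0.01` evaluates to `4999.0`).

```python
from dataclasses import dataclass
from decimal import Decimal

MONTHLY_CEILING_USD = Decimal("50.00")       # per-business ceiling stated above
TARGET_COST_PER_QUERY_USD = Decimal("0.01")  # unit-economics target


class BudgetExceeded(Exception):
    """Raised when a charge would push a business past its monthly ceiling."""


@dataclass
class BusinessBudget:
    """Hypothetical per-business spend tracker, reset at the start of each month."""
    business_id: str
    ceiling_usd: Decimal = MONTHLY_CEILING_USD
    spent_usd: Decimal = Decimal("0.00")

    def charge(self, cost_usd: Decimal) -> None:
        # Refuse the charge up front rather than discovering the overrun later.
        if self.spent_usd + cost_usd > self.ceiling_usd:
            raise BudgetExceeded(
                f"{self.business_id}: {self.spent_usd + cost_usd} USD "
                f"exceeds ceiling of {self.ceiling_usd} USD"
            )
        self.spent_usd += cost_usd


def queries_within_ceiling(ceiling_usd: Decimal, cost_per_query_usd: Decimal) -> int:
    """How many whole queries fit under the ceiling at a given per-query cost."""
    return int(ceiling_usd // cost_per_query_usd)


monthly_queries = queries_within_ceiling(MONTHLY_CEILING_USD, TARGET_COST_PER_QUERY_USD)
```

Hitting the ceiling is treated as an exception rather than a silent throttle, which matches the framing above: a business reaching $50/month is a signal to investigate the architecture, not a normal operating state.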
Current Status
This project is in active development. What's working:
Core agent framework with LangGraph state management
RAG pipeline with Spanish-optimized chunking and retrieval
Voice interface prototype, built on the same LiveKit stack as celestino.ai
Terraform modules for multi-tenant AWS deployment
What's next:
WhatsApp Business API integration for the primary customer channel
Billing and usage metering per business
Onboarding flow that lets business owners configure their agent without code
Technical Rationale
I'm building Arepa.AI because it sits at the intersection of everything I've learned: production AI engineering from Eventbrite, data pipeline design from FlowWest, and product thinking from building celestino.ai. It's also the hardest version of the problem — making AI work reliably in a language and market that most tooling ignores.
This isn't a demo. It's a business I'm building in public, with the same engineering rigor I'd bring to any production system.