Eventbrite: Distributed Indexing Pipelines That Drove ~$576K/yr Revenue
As a Senior Software Engineer at Eventbrite, I designed distributed indexing and sitemap pipelines using Airflow, dbt, and Snowflake that drove ~15% organic impression growth and ~$576K/yr in incremental revenue, measured through controlled experimentation.
As a Senior Software Engineer at Eventbrite (March 2022 - February 2025), I set technical direction for Tier-1 SEO/Growth systems powering the core homepage and discovery surfaces. This is the engineering story behind the distributed indexing and sitemap infrastructure that drove measurable organic growth.
The Context
Eventbrite is a global event management and ticketing platform hosting millions of events across hundreds of categories, each with its own page that needs to be discoverable by search engines. The challenge was not just having pages — it was ensuring that search engines could efficiently crawl, understand, and index the right pages at the right time.
I joined the SEO/Growth engineering team and quickly aligned Engineering, Product, and Growth stakeholders on a shared strategy for indexing, crawlability, and performance. The existing systems worked but had scaling limits that were becoming visible as the event catalog grew.
The Architecture: Distributed Indexing and Sitemaps
The pipeline: I designed and built indexing and sitemap generation pipelines using Airflow for orchestration, dbt for transformation logic, and Snowflake as the analytical data warehouse. The pipeline processed the full event catalog — filtering by eligibility, freshness, and quality signals — to produce optimized sitemaps and indexing directives.
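The filtering stage described above can be sketched as a single predicate. This is a minimal illustration in plain Python — the field names, thresholds, and `Event` record are hypothetical, not Eventbrite's actual schema (the real logic lived in dbt models over Snowflake):

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical event record; fields are illustrative, not Eventbrite's schema.
@dataclass
class Event:
    url: str
    ends_at: datetime        # when the event is over
    last_modified: datetime  # last content update
    quality_score: float     # e.g. content completeness, 0.0-1.0

def eligible_for_sitemap(event: Event, now: datetime,
                         min_quality: float = 0.5,
                         max_staleness_days: int = 365) -> bool:
    """Apply the eligibility, freshness, and quality filters in sequence."""
    if event.ends_at < now:                 # expired events are excluded
        return False
    if event.quality_score < min_quality:   # thin/low-quality pages are excluded
        return False
    return (now - event.last_modified) <= timedelta(days=max_staleness_days)
```

In the real pipeline this kind of predicate ran as SQL inside dbt, so the Growth analytics team could read and review the same logic that produced the sitemaps.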
Why Airflow/dbt/Snowflake: Eventbrite already had Snowflake as its warehouse and Airflow for scheduling. Building on the existing stack meant no new infrastructure to provision, no new vendor relationships, and a team that already knew how to operate these tools. dbt gave us version-controlled transformation logic that the Growth analytics team could also read and review.
Crawl budget optimization: Search engines allocate a finite "crawl budget" to each site. My pipelines prioritized high-value pages (popular events, trending categories) while deprioritizing expired events and low-quality pages. This was not a one-time configuration. It was a dynamic system that recomputed priorities on a regular cadence as the event catalog changed.
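A priority recomputation like the one described can be sketched as a scoring function. The weights and saturation points below are invented for the sketch — the actual model blended more signals — but the shape is the same: popularity and upcoming-ness push a page up, expiry pushes it to zero:

```python
import math

def crawl_priority(page_views_7d: int, days_until_event: int,
                   is_expired: bool) -> float:
    """Illustrative crawl priority in [0.0, 1.0]. Popular, soon-to-happen
    pages score high; expired events drop out of the crawl queue entirely.
    Weights are made up for this sketch."""
    if is_expired:
        return 0.0
    # log-scaled popularity, saturating around ~1M weekly views
    popularity = math.log1p(page_views_7d) / math.log1p(1_000_000)
    # urgency decays as the event date moves further out
    urgency = 1.0 / (1.0 + max(days_until_event, 0) / 30.0)
    return min(1.0, 0.7 * popularity + 0.3 * urgency)
```

Recomputing a score like this on a regular cadence is what made the system dynamic rather than a one-time configuration: as views shift and dates approach, the same event moves up and down the crawl queue.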
Sitemap architecture: Rather than a monolithic sitemap, I built a partitioned sitemap system organized by category, location, and recency. This gave search engines clear signals about which sections of the site had changed, reducing wasted crawl budget on unchanged content.
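The partitioning scheme can be illustrated with the standard sitemap-index format: one index file pointing at per-partition sitemap files, each carrying its own `lastmod`. The partition key naming (`category/location/recency`) below is illustrative:

```python
from xml.etree import ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap_index(partitions: dict, base_url: str) -> str:
    """Build a sitemap index pointing at per-partition sitemap files.
    `partitions` maps a partition key (e.g. 'music/new-york/2024-06')
    to that partition's lastmod date string."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for key, lastmod in sorted(partitions.items()):
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/sitemaps/{key}.xml.gz"
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(root, encoding="unicode")
```

Because only the partitions whose content changed get a new `lastmod`, a crawler can skip unchanged sections of the site entirely — that is where the crawl-budget savings come from.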
Controlled Experimentation
Every change to the indexing pipeline went through controlled experimentation using Statsig. This was critical because SEO changes have delayed, noisy feedback loops — organic traffic changes can take weeks to materialize and are confounded by seasonality, algorithm updates, and competitor behavior.
Experiment design: I worked with the Growth team to design experiments that isolated the effect of indexing changes from other variables. We used page-level holdout groups where possible, and time-series analysis where holdouts were not feasible.
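Page-level holdouts hinge on assignment being deterministic: the same page must land in the same group on every pipeline run, or the groups contaminate each other. A generic sketch of that mechanic (this is not Statsig's internal implementation) hashes the page URL together with the experiment name:

```python
import hashlib

def in_holdout(page_url: str, experiment: str, holdout_pct: float = 0.1) -> bool:
    """Deterministically assign a page to the holdout group by hashing its
    URL with the experiment name. Stable across runs; salting with the
    experiment name keeps assignments independent between experiments."""
    digest = hashlib.sha256(f"{experiment}:{page_url}".encode()).hexdigest()
    # Map the first 8 hex chars to a roughly uniform value in [0, 1].
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return bucket < holdout_pct
```

Pages in the holdout keep the old sitemap/indexing treatment; everything else gets the new one, and the delta between the groups isolates the change from seasonality and algorithm updates.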
The discipline: No indexing change shipped without an experiment. No experiment was called without reaching statistical significance. This slowed us down in the short term but prevented the common SEO anti-pattern of shipping changes based on "it felt like traffic went up."
Reliability at Scale
I led reliability for the high-traffic distributed services that powered these surfaces. This was not just writing code — it was defining the standards by which the entire team operated.
SLO/SLA governance: I defined and reviewed Service Level Objective (SLO) and Service Level Agreement (SLA) targets for our systems, participating in Staff/Principal governance forums and driving cross-team incident mitigation. When something broke at Eventbrite's scale, the blast radius was measured in lost ticket sales and event visibility.
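The arithmetic behind SLO governance is small but worth making concrete. An availability SLO implies an error budget — the minutes of badness you are allowed per window — and incident response is prioritized by how much of that budget is left. A minimal sketch (the targets below are examples, not Eventbrite's actual SLOs):

```python
def error_budget_remaining(slo_target: float, window_minutes: float,
                           bad_minutes: float) -> float:
    """Fraction of the error budget left in the SLO window.
    Example: a 99.9% availability SLO over a 30-day window allows
    (1 - 0.999) * 43200 = ~43.2 bad minutes before the SLO is breached."""
    budget = (1.0 - slo_target) * window_minutes
    return max(0.0, 1.0 - bad_minutes / budget)
```

When the remaining budget approaches zero, the governance conversation flips from "ship features" to "stabilize" — which is exactly the trade-off those Staff/Principal forums existed to arbitrate.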
Escalation authority: I served as an escalation authority up to the Director/VP level during incidents affecting our systems. This meant being the person who could explain what broke, why it broke, and what we were doing about it — in real time, to non-technical leadership.
The Results
The ~$576K/yr figure was the revenue directly attributable to the indexing pipeline improvements through controlled experimentation — not a broad correlation, but a measured lift from specific changes.
The +482% YoY organic impression growth was the compound result of the indexing pipelines, the caching/deduplication work (covered in my cost engineering writeup), and page performance improvements landing together. No single initiative drove the full number. That is how infrastructure compounds: each layer makes the next one more effective.
What I Learned
Experimentation is the only honest way to measure SEO impact. Before we had experiments, every SEO change was followed by "traffic went up" or "traffic went down" with no way to attribute cause. Statsig gave us the rigor to say "this specific change produced this specific lift" with confidence. The investment in experiment infrastructure paid for itself immediately by stopping us from shipping changes that felt good but measured poorly.
Reliability is a product feature in SEO. If your pages are slow or unavailable during a crawl window, the search engine moves on to a competitor. SLOs are not just for the SRE team. For SEO surfaces, uptime and latency directly affect discoverability. I made this case repeatedly in governance forums, and the data backed it up.
Cross-functional alignment is harder than the code. The engineering was complex but solvable. Getting Engineering, Product, Growth, and Analytics to agree on what to measure, how to experiment, and when to ship was the real challenge. I spent as much time in alignment conversations as I did in code. That was the right allocation — a perfectly engineered pipeline that nobody trusts or understands is useless.
Build on the existing stack. I evaluated purpose-built SEO infrastructure tools and decided against them. Airflow, dbt, and Snowflake were already understood, operated, and budgeted for at Eventbrite. Adding a new tool would have introduced operational burden that offset any marginal capability gain. The systems thinking move was to build within the constraints, not to fight them.