FlowWest: Modernizing 30 Years of dBase III into Cloud Infrastructure | Celestinosalim.com
First engineering hire at FlowWest. Re-architected a 30-year-old dBase III system into a PostgreSQL-backed cloud platform on GCP, reducing manual data entry by 80%, improving data accuracy by 60%, and cutting report processing time from weeks to hours. Built ETL and workflow systems supporting $20-40M regulatory programs.
I joined FlowWest in August 2020 as the first engineering hire. FlowWest is a water, ecosystems, science, and technology consulting firm that works with public sector agencies and tribal partners on environmental programs managing $20-40M annually. They had deep domain expertise in hydrology, fisheries, and water policy. What they did not have was modern data infrastructure.
The backbone of their data operations was a dBase III system built in the early 1990s — over 30 years of accumulated data, entry workflows, and institutional knowledge encoded in a technology that predated the web. My job was to modernize that system without losing what made it work.
The Problem
dBase III was not just a database. It was a workflow. Biologists, hydrologists, and field technicians had been entering data into this system for decades. Every workaround, every naming convention, every manual step had a reason — even if that reason was lost to institutional memory.
The concrete problems:
Manual data entry was the primary input method. Field data from monitoring stations, fish counts, and water quality measurements was transcribed by hand into dBase forms. Error rates were high and corrections were slow.
Processing time for aggregated reports — the deliverables that state agencies and tribal partners actually needed — took weeks. The system could not handle the volume of joins and aggregations required for regulatory reporting at the scale of programs managing $20-40M.
No collaboration. The dBase files lived on local machines and network shares. There was no concurrent access, no version history, and no audit trail. If someone overwrote a record, it was gone.
Regulatory compliance required data accuracy and traceability that the existing system could not guarantee.
What I Built
PostgreSQL migration on GCP. I migrated the dBase III data into a PostgreSQL database hosted on Google Cloud Platform. This was not a lift-and-shift. The dBase schema had decades of accumulated debt — denormalized tables, inconsistent naming, implicit relationships that only made sense if you had been using the system since 1995. I normalized the schema while preserving backward compatibility with existing reporting queries.
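A minimal sketch of what one normalization step can look like. The column names and mapping below are illustrative, not the actual FlowWest schema — dBase III limits field names to 10 uppercase characters, so a rename map plus a catch-all for unmapped columns keeps the migration lossless:

```python
# Hypothetical rename map from legacy dBase III field names to the
# normalized PostgreSQL schema. Names are examples only.
LEGACY_COLUMN_MAP = {
    "STN_ID": "station_id",
    "SMP_DT": "sample_date",
    "WTR_TMP": "water_temp_c",
    "FSH_CNT": "fish_count",
}

def normalize_row(legacy_row: dict) -> dict:
    """Rename mapped columns; carry unmapped ones under a legacy_ prefix
    so nothing is silently dropped during migration."""
    out = {}
    for key, value in legacy_row.items():
        out[LEGACY_COLUMN_MAP.get(key, f"legacy_{key.lower()}")] = value
    return out
```

Backward compatibility for existing reporting queries can then be handled with database views that expose the old names on top of the normalized tables.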
Cross-platform ETL and workflow systems. I built ETL pipelines in Python that could ingest data from multiple sources — field devices, spreadsheets, and the legacy dBase exports — normalize it, validate it, and load it into PostgreSQL. The workflow layer automated the aggregation and reporting steps that had previously been manual.
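The extract-transform-load shape of those pipelines can be sketched as follows. This is a simplified stand-in, assuming CSV-like exports and illustrative field names; the real pipelines handled more source formats and loaded into PostgreSQL rather than a list:

```python
import csv
import io

def extract_csv(text: str, source: str):
    """Yield rows from a CSV export (spreadsheet or legacy dBase dump),
    tagging each row with its source for later auditing."""
    for row in csv.DictReader(io.StringIO(text)):
        row["_source"] = source
        yield row

def transform(row: dict) -> dict:
    """Coerce raw string fields into typed values (field names illustrative)."""
    return {
        "station_id": row["station_id"].strip(),
        "water_temp_c": float(row["water_temp_c"]),
        "_source": row["_source"],
    }

def run_pipeline(sources: dict) -> list:
    """Extract from every source, transform, and collect rows for loading."""
    return [transform(r)
            for name, text in sources.items()
            for r in extract_csv(text, name)]
```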
Data validation layer. Every record that entered the new system went through validation rules built from domain expert knowledge. Range checks on water temperature readings. Consistency checks across related measurements. Flagging, not rejection — the system surfaced anomalies for human review rather than silently dropping data.
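The flag-don't-reject approach looks roughly like this. The thresholds below are placeholders, not the real domain rules, which came from the hydrologists and biologists:

```python
def validate(record: dict) -> dict:
    """Attach flags instead of rejecting: anomalous records are routed
    to human review, never silently dropped. Thresholds are illustrative."""
    flags = []
    temp = record.get("water_temp_c")
    if temp is not None and not (-5.0 <= temp <= 40.0):
        flags.append("water_temp_out_of_range")
    if record.get("fish_count", 0) < 0:
        flags.append("negative_fish_count")
    return {**record, "flags": flags, "needs_review": bool(flags)}
```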
Cloud-native access. Moving to GCP meant the team could access the data from anywhere. Field technicians could submit data from monitoring stations. Analysts could run queries without waiting for file locks. State agencies could receive reports faster.
Results
| Metric | Before | After |
|--------|--------|-------|
| Manual data entry effort | Baseline | -80% |
| Data accuracy | Baseline | +60% |
| Report processing time | Weeks | Hours |
| Concurrent access | None (file locks) | Full multi-user |
These numbers mattered because the programs they served were not optional. Fish population monitoring, water quality reporting, regulatory compliance — these had legal deadlines and real consequences for the ecosystems and communities that depended on them. Faster processing and higher accuracy meant better decisions about water allocation, habitat restoration, and endangered species protection.
What I Learned
Legacy systems encode institutional knowledge. The temptation with a 30-year-old system is to throw it away and start fresh. That would have been a mistake. Every quirk in the dBase schema was there because a domain expert made a decision about how to represent their data. The migration succeeded because I spent weeks interviewing hydrologists and biologists to understand why the data was structured the way it was before I changed anything.
Being the first engineering hire means building trust first. The team had operated without a dedicated engineer for decades. They were skeptical — reasonably — that a software engineer from New York understood their work. I earned trust by learning their domain vocabulary, sitting in on field data collection, and delivering small, visible improvements before proposing the big migration. The technical architecture was the easier part.
Data validation is a product feature, not a cleanup task. The validation layer was initially scoped as a post-migration cleanup. I advocated for making it a permanent part of the ingest pipeline. When you are dealing with environmental data that informs regulatory decisions, "the number looked wrong but we did not catch it" is not an acceptable outcome.
ETL for regulated domains is different. You cannot silently transform data. Every transformation needs to be auditable — what came in, what went out, what rule was applied, and who approved the exception. I built the pipeline with full lineage tracking from day one. This was not over-engineering; it was a requirement from the agencies that funded these programs.
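The lineage requirement can be made concrete with a small wrapper: every transformation passes through one function that records its input, output, rule name, and timestamp. The shape of the audit entry here is a sketch, not the production format:

```python
from datetime import datetime, timezone

def apply_rule(record: dict, rule_name: str, fn, lineage: list) -> dict:
    """Apply one named transformation and append an audit entry capturing
    what came in, what went out, which rule ran, and when."""
    before = dict(record)  # snapshot the input before mutation
    after = fn(record)
    lineage.append({
        "rule": rule_name,
        "input": before,
        "output": after,
        "applied_at": datetime.now(timezone.utc).isoformat(),
    })
    return after

# Example: one auditable type-coercion step on a raw field reading.
lineage = []
rec = {"water_temp_c": "12.5"}
rec = apply_rule(
    rec,
    "coerce_temp_to_float",
    lambda r: {**r, "water_temp_c": float(r["water_temp_c"])},
    lineage,
)
```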