DE HITL AI
It sounds German
It’s not German. It’s data engineering, human-in-the-loop, artificial intelligence. I’m building a practitioner’s suite of tools under the banner of my company, Metro Decisions. Just another consultant with hands-on skills.
This is now the third and so far least lucrative evolution of Metro Decisions, the company I first started back in 1995 when I left Pilot Software to pursue independent contracting on their product suite. That was Atlanta Metro, and it put five figures of savings in my bank account within a year. I rebooted Metro in LA after five years at Hyperion, and then again for a moment after working with the Hackett Group. Since then, I’ve moved from purely commercial enterprise software to hybrid commercial + open source in the AWS cloud. Now I can say I’m a data engineer and some people actually know what that means. Amaze.
In each of those prior instantiations of Metro, I knew much less about full-stack architecture than I know now. Cloud helped me leverage what I knew via DevOps. Now I’m leveraging even more via agentic harnesses. I’m combining all of that into a more or less industry-adaptive virtual data practice with the new Metro Decisions. The idea is relatively simple: how could I be a one-person Deloitte, you know, without actually having to be Deloitte?
I don’t take it for granted that SaaS is dead, but there’s something a little bit smaller that used to exist. If you remember Baan, JD Edwards and PeopleSoft, then good for you. There was also Solomon, Epicor, Lawson and QAD. All good-sized, viable companies that got swallowed or stomped on by Microsoft, Oracle or SAP. Kind of sad when you think about it. Anyway, I just had a long conversation with Claude, who inflated my ego as I described the near-misses in my 30-year career.
The following is an outline of what’s going on long-term in my head about data engineering. Not much of this is built, but it is the blueprint for what I will do when somebody pays me to think. For the moment, I’m liking the intent-to-ideation process Claude and I have worked out, wordy as it is.
METRO: Practice Engineering and Data Governance Tooling
Executive Summary
METRO is a practice engineering initiative that codifies data engineering methodology, AI-assisted development frameworks, and web design standards into executable tools and auditable documentation. Its most significant technical artifact is md-util, a Go CLI that transforms raw data assets from disparate sources into SHACL 1.2 ontology representations and emits dialect-specific DDL for Postgres, BigQuery, Databricks, Parquet/Arrow, and Vortex — solving the schema synchronization problem across heterogeneous data environments with a single canonical model. Together, METRO and md-util demonstrate end-to-end system design from strategic framing through specification to working software.
Technical Overview
METRO encompasses two threads of work: a practice documentation system that formalizes how data engineering and AI-assisted development should be conducted, and md-util, a production tool that puts those principles into practice.
Practice Framework: Left-of-Do
The intellectual foundation is the Left-of-Do framework, a closed-loop methodology linking business intent to behavioral proof through three stages. Hypothesis-Driven Development (HDD) defines strategic intent through testable benefit hypotheses — answering “why are we building this?” across problem reality, agent hypothesis, and assumption inventory phases. Specification-Driven Development (SDD) translates that intent into precise, structured specifications that AI agents can reliably build from. Behavior-Driven Development (BDD) proves the specification was met through executable Gherkin scenarios covering happy-path, edge-case, adversarial, and failure-mode conditions. Both HDD and BDD are implemented as interactive Go CLI tools that walk teams through structured question sets, producing auditable artifacts at each stage.
md-util: SHACL-First Data Governance
md-util is a Go CLI implementing a three-stage ingest, review, and emit pipeline with W3C SHACL 1.2 Turtle as the canonical interchange format.
Five ingestors handle CSV (with statistical type inference over 10,000-row samples and nullable column detection), JSON/JSONL, SQL DDL (parsed without external dependencies), DuckDB (via subprocess delegation, avoiding CGO), and PostgreSQL (via environment-variable configuration). Each ingestor produces a uniform internal model of NodeShape and PropertyShape structs, eliminating cross-boundary format leakage.
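A minimal sketch of what that uniform internal model could look like in Go follows; the struct and field names are illustrative assumptions, not md-util’s actual definitions:

```go
package main

import "fmt"

// PropertyShape models one sh:property constraint: a column or field
// with its XSD datatype and cardinality. Field names are illustrative.
type PropertyShape struct {
	Path     string // sh:path: the property IRI or column name
	Datatype string // sh:datatype, e.g. "xsd:integer"
	MinCount int    // 0 means the property is nullable
	MaxCount int    // 1 means the property is scalar
}

// NodeShape models one sh:NodeShape: a table or record type and its
// properties, regardless of which ingestor produced it.
type NodeShape struct {
	Name       string
	Properties []PropertyShape
}

// isNullable reads nullability off the SHACL cardinality, so every
// downstream emitter interprets it the same way.
func isNullable(p PropertyShape) bool { return p.MinCount == 0 }

func main() {
	// A CSV ingestor and a Postgres ingestor would both emit this
	// same shape for an equivalent "orders" asset.
	orders := NodeShape{
		Name: "OrderShape",
		Properties: []PropertyShape{
			{Path: "order_id", Datatype: "xsd:integer", MinCount: 1, MaxCount: 1},
			{Path: "note", Datatype: "xsd:string", MinCount: 0, MaxCount: 1},
		},
	}
	for _, p := range orders.Properties {
		fmt.Printf("%s %s nullable=%v\n", p.Path, p.Datatype, isNullable(p))
	}
}
```

Because every ingestor targets this one model, no emitter ever needs to know whether a shape began life as CSV, JSON, or a DDL dump.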
A centralized type mapping layer provides the single source of truth for XSD-to-SQL conversions across Postgres, BigQuery, and Databricks dialects. No emitter contains inline type logic — all five emitters (Postgres DDL, BigQuery DDL, Databricks DDL, Parquet/Arrow schema JSON, and Vortex DuckDB COPY commands) project SHACL shapes through this shared mapping. The result: one ontology in, multiple governed schemas out, with no synchronization drift.
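A minimal sketch of such a centralized mapping layer, with a fallback for ambiguous types; the specific type choices below are plausible placeholders, not md-util’s actual table:

```go
package main

import "fmt"

// dialectTypes is the single source of truth: one XSD datatype row,
// one SQL rendering per target dialect. The entries are plausible
// mappings for illustration, not md-util's actual table.
var dialectTypes = map[string]map[string]string{
	"xsd:string":   {"postgres": "TEXT", "bigquery": "STRING", "databricks": "STRING"},
	"xsd:integer":  {"postgres": "BIGINT", "bigquery": "INT64", "databricks": "BIGINT"},
	"xsd:decimal":  {"postgres": "NUMERIC", "bigquery": "NUMERIC", "databricks": "DECIMAL(38,18)"},
	"xsd:boolean":  {"postgres": "BOOLEAN", "bigquery": "BOOL", "databricks": "BOOLEAN"},
	"xsd:dateTime": {"postgres": "TIMESTAMPTZ", "bigquery": "TIMESTAMP", "databricks": "TIMESTAMP"},
}

// sqlType resolves an XSD datatype for one dialect, falling back to a
// text type with an explicit annotation when the datatype is unknown,
// so ambiguity is visible in the emitted DDL rather than silent.
func sqlType(xsd, dialect string) string {
	if row, ok := dialectTypes[xsd]; ok {
		if t, ok := row[dialect]; ok {
			return t
		}
	}
	return "TEXT /* ambiguous: " + xsd + " */"
}

func main() {
	fmt.Println(sqlType("xsd:integer", "bigquery"))
	fmt.Println(sqlType("xsd:duration", "postgres"))
}
```

Keeping all emitters on one table means a new dialect is a new column in this map, not a new pile of inline conversions.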
The turtle serialization package implements bidirectional SHACL Turtle parsing and emission without external RDF libraries, using a state-machine scanner that tracks bracket depth and string literals. Round-trip testing ensures parse-serialize-parse fidelity.
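The state-machine idea can be sketched as a statement splitter that ignores terminator dots inside string literals and blank-node brackets. This is a simplified illustration of the scanning technique, not md-util’s actual scanner, which would also have to handle numeric literals, prefixed-name dots, and other Turtle syntax:

```go
package main

import (
	"fmt"
	"strings"
)

// splitStatements scans Turtle text and splits it into top-level
// statements terminated by '.', ignoring dots inside string literals
// and inside [ ... ] blank-node property lists. Simplified sketch of
// the state-machine approach described above.
func splitStatements(ttl string) []string {
	var stmts []string
	var buf strings.Builder
	depth := 0        // [ ... ] nesting depth
	inString := false // inside a "..." literal
	for i := 0; i < len(ttl); i++ {
		c := ttl[i]
		if inString {
			if c == '\\' && i+1 < len(ttl) {
				buf.WriteByte(c) // keep the escape and its target
				i++
				c = ttl[i]
			} else if c == '"' {
				inString = false
			}
			buf.WriteByte(c)
			continue
		}
		switch c {
		case '"':
			inString = true
		case '[':
			depth++
		case ']':
			depth--
		case '.':
			if depth == 0 { // real statement terminator
				if s := strings.TrimSpace(buf.String()); s != "" {
					stmts = append(stmts, s)
				}
				buf.Reset()
				continue
			}
		}
		buf.WriteByte(c)
	}
	return stmts
}

func main() {
	ttl := `ex:OrderShape a sh:NodeShape ;
  sh:property [ sh:path ex:note ; sh:datatype xsd:string ] .
ex:label rdfs:comment "v1.0. draft" .`
	for _, s := range splitStatements(ttl) {
		fmt.Println("STMT:", strings.Fields(s)[0])
	}
}
```

The dot inside "v1.0. draft" and the semicolons inside the bracketed property list never terminate a statement, which is exactly the failure mode a naive split on '.' would hit.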
AI Integration as Engineering Discipline
md-util is itself a demonstration of the Left-of-Do framework in practice. The .context/ directory contains the full audit trail: an HDD document framing the problem hypothesis, an SDD specifying intent and architectural constraints (including the critical “never reverse-engineer .ttl from DDL” principle), and a 26-scenario BDD Gherkin specification defining system boundaries and edge cases — timeout handling, path traversal guards, format auto-detection, and type ambiguity fallbacks.
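A representative sketch of that scenario style; the feature name, paths, and messages below are illustrative, not taken from the actual 26-scenario spec:

```gherkin
Feature: SHACL ingest safety guards

  Scenario: Path traversal is rejected
    Given a CSV ingest request for a path containing ".."
    When md-util validates the input path
    Then ingestion fails with a path traversal error
    And no .ttl file is written

  Scenario: Ambiguous column type falls back with annotation
    Given a CSV column whose sampled values mix integers and strings
    When type inference runs over the sample
    Then the property is emitted as xsd:string
    And the shape carries an explicit ambiguity annotation
```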
Development proceeded through deliberate git-flow feature branches (feature/init-command, feature/lakes, worktree-feature+emit) with session-based status reports tracking version progression from v0.2 through v0.5.0. The current release carries 35+ passing tests spanning SHACL model validation, type mapping, Turtle round-trip serialization, and emit dialect handling.
Supporting Standards
METRO also codifies a Node-free web design standard (Caddy, HTMX, Alpine.js, Tailwind, Go templates, Hugo) and two AI prompt kit frameworks: Proper Skills for building production-ready AI agent capabilities, and Dark Code for auditing hidden complexity and risk in AI-generated code. These sit alongside formalized practice analyses covering data transformation (SQL-first with DuckDB/Polars), producer-consumer pipeline architecture, and organizational strategy positioning.
Architecture and Design Choices
Go is the implementation language throughout, chosen for deployment simplicity and runtime reliability. md-util avoids CGO entirely — DuckDB integration uses subprocess delegation, keeping the binary self-contained. Production safety guards include path traversal prevention, collision warnings on .ttl overwrites, and type ambiguity detection with explicit fallback annotations. Schema migrations, environment configuration via direnv, and secrets management through HashiCorp Vault follow the same operational patterns used across the broader project portfolio.
The through-line across METRO is that methodology and tooling are not separate concerns. The frameworks that define how work should be done are implemented as the same Go CLIs and structured artifacts that do the work — closing the loop between theory and practice.


