Agent workflows · harnesses · operations

Agent-assisted tools for messy operational work.

Most of my work sits at the intersection of production engineering, agent-assisted workflows, and diagnostic systems.

I’m interested in building, hardening, and evaluating agent harnesses that help teams understand what happened, decide what to do next, and adopt automation without losing trust or visibility.

Current focus

Turning repeated investigations into agent workflows.

Lately I’ve been focused on four related problems:

  • Agent-assisted production triage
  • Harnesses for safe, inspectable agent runs
  • Evaluation workflows for developer tools
  • Local-first infrastructure for repeatable debugging

Selected work

Agent workflows grounded in real systems work.

Production Triage Workflow

Support and engineering teams were repeatedly doing the same investigations by hand: checking logs, validating config state, correlating tickets, and reconstructing what happened across several systems.

I built an agent-assisted triage workflow that turns those recurring investigations into evidence-backed diagnostic runs inside the tools the team already used.

  • Designed agent runs for faster investigations, not blind automation
  • Kept evidence visible so engineers could trust and correct the result
  • Cut investigation time for the covered cases from roughly 20–60 minutes to 5–15 minutes

Repoper

Adopting an open-source repository can feel casual until it becomes part of your workflow. I wanted a calmer way to inspect unfamiliar code before trusting it.

Repoper is a Markdown-first evaluation harness for unfamiliar repositories and agent tooling: collect candidates, extract signals, write durable notes, run sandboxed smoke tests, then decide whether a project is worth adopting or watching.

  • Evaluation workflows for agent tools and developer infrastructure
  • Sandbox-first testing for unknown code and harness behavior
  • Reports and notes that survive after the initial search

Compass

A lot of small decisions become messy because the inputs, criteria, and final choice are scattered across notes and memory.

Compass is a local-first Go CLI for turning raw signal into an explicit decision record: collect candidates, evaluate them, choose, and leave a trail that can be reviewed later.

  • Raw notes become candidates, evaluations, and decisions
  • Each phase is visible instead of hidden in a chat transcript
  • Decision records keep provenance attached to the outcome
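
To make the idea of a decision record concrete, here is a minimal sketch of the collect, evaluate, choose flow. It is written in Python for brevity and is not Compass's Go code: every name in it (`Candidate`, `Evaluation`, `DecisionRecord`, the `source` field) is hypothetical, and picking the highest summed score is only an assumed selection rule.

```python
from dataclasses import dataclass, field, asdict
from typing import Optional
import json


@dataclass
class Candidate:
    name: str
    source: str  # provenance: the note or file the candidate came from


@dataclass
class Evaluation:
    candidate: str
    criterion: str
    score: int
    note: str = ""


@dataclass
class DecisionRecord:
    """Hypothetical record that keeps every phase next to the outcome."""
    question: str
    candidates: list = field(default_factory=list)
    evaluations: list = field(default_factory=list)
    chosen: Optional[str] = None
    rationale: str = ""

    def collect(self, name: str, source: str) -> None:
        self.candidates.append(Candidate(name, source))

    def evaluate(self, candidate: str, criterion: str, score: int, note: str = "") -> None:
        # Only candidates that were collected first can be evaluated.
        if candidate not in {c.name for c in self.candidates}:
            raise ValueError(f"unknown candidate: {candidate}")
        self.evaluations.append(Evaluation(candidate, criterion, score, note))

    def totals(self) -> dict:
        totals = {c.name: 0 for c in self.candidates}
        for ev in self.evaluations:
            totals[ev.candidate] += ev.score
        return totals

    def choose(self, rationale: str) -> str:
        # Assumed rule: highest summed score wins; ties go to the
        # candidate that was collected first.
        totals = self.totals()
        if not totals:
            raise ValueError("no candidates to choose from")
        self.chosen = max(totals, key=totals.get)
        self.rationale = rationale
        return self.chosen

    def to_json(self) -> str:
        # The whole trail is serialized, not just the final choice.
        return json.dumps(asdict(self), indent=2)
```

The point of the shape is that `to_json()` emits the candidates and evaluations along with the choice, so a later reviewer can see why an option won and where each candidate originated.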

Warden

Local development environments quietly accumulate state: temporary files, generated artifacts, old outputs, and things nobody remembers owning.

Warden is a local control-plane tool for scanning filesystem state, checking ownership and age, and demoting objects that no longer belong.

  • Reconciliation loop over local filesystem objects
  • Ownership and TTL metadata for generated state
  • Demotion instead of silent deletion
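
One pass of the reconciliation loop described above might look roughly like this. It is an illustrative Python sketch, not Warden's actual code: `TrackedObject`, `reconcile`, and the `attic` directory are hypothetical names, and it assumes each generated object carries an owner and a TTL measured against its last modification time.

```python
from dataclasses import dataclass
from pathlib import Path
import shutil
import time


@dataclass
class TrackedObject:
    """Hypothetical metadata record for one piece of generated state."""
    path: Path
    owner: str          # tool or person that created the object
    ttl_seconds: float  # how long the object may go unmodified


def is_expired(obj: TrackedObject, now: float) -> bool:
    """Expired means the last modification is older than the TTL."""
    return now - obj.path.stat().st_mtime > obj.ttl_seconds


def reconcile(objects, attic: Path, now=None):
    """Demote expired objects by moving them into `attic`.

    Nothing is deleted, so a wrong call can be undone by moving the
    file back. Returns (original path, demoted path) pairs.
    """
    now = time.time() if now is None else now
    attic.mkdir(parents=True, exist_ok=True)
    demoted = []
    for obj in objects:
        if not obj.path.exists():
            continue  # already gone; nothing to reconcile
        if is_expired(obj, now):
            # A real tool would also need to handle name collisions here.
            target = attic / obj.path.name
            shutil.move(str(obj.path), str(target))
            demoted.append((obj.path, target))
    return demoted
```

Returning the list of moves rather than only performing them keeps each pass reviewable, which is the reason to demote instead of deleting silently.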

What tends to matter

The hard part is rarely just the code.

Harnesses

Agentic systems need clear boundaries, durable state, and reviewable outputs before they can be trusted near real work.

Evaluation

Useful automation has to be tested against real workflows, not just demos, benchmarks, or optimistic prompts.

Operations

The useful version is usually the boring one: documented, repeatable, observable, and easy to change later.

About

I like making agentic work easier to inspect.

I’m Mason Dumas, a New York-based engineer working across production engineering, data systems, developer tooling, and agent-assisted workflows.

I’m drawn to operational mess: tickets that require too much context, reports that disagree, debugging paths that live only in someone’s head, and agent workflows that need better harnesses, evals, and evidence trails before they can be trusted.

Contact

Want to compare notes?

For project conversations, agent-harness ideas, evaluation workflows, or questions about private work like Repoper, email is best.

mdumas38@gmail.com