Senior Staff Technical Program Manager (AI Platform/OS)
Red Cell Partners is an incubation firm building and investing in rapidly scalable technology-led companies that are bringing revolutionary advancements to market in three distinct practice areas: healthcare, cyber, and national security. United by a shared sense of duty and deep belief in the power of innovation, Red Cell is developing powerful tools and solutions to address our Nation's most pressing problems.
Co-founded in 2023 by Joe Laws and Grant Verstandig, Trase Systems is AI, Uncomplicated. Trase empowers enterprise leaders to harness the full potential of AI without the associated complexity and risks. We are an end-to-end solution for deploying, managing, and optimizing AI in the enterprise. Our platform specializes in bridging the "last mile" of AI adoption, unlocking AI's full potential while driving efficiency and significant cost savings. Trase is at the forefront of AI Agent innovation, topping the Hugging Face GAIA Leaderboard for Generalized AI Assistants, ahead of industry giants such as Google, Meta, Microsoft, and OpenAI. We are leveraging our cutting-edge technologies to develop mission-critical agentic applications in complex industries such as Healthcare, Oil & Gas, and National Security.
About the Role
As a Senior Staff Technical Program Manager, you will own internal program execution across our operating system - ensuring that platform investments translate into shipped, reliable, and measurable outcomes.
This is not a coordination or reporting role. You are responsible for:
- Driving execution across highly coupled, multi-team platform work
- Creating the operating system for engineering execution
- Ensuring platform systems (runtime, infrastructure, AI workflows) ship predictably and safely
You will operate at the intersection of:
- Platform engineering (agent runtime, workflows, system orchestration)
- DevOps / SRE (deployment, reliability, observability)
- DevEx (developer workflows, CI/CD, release safety)
- AI/ML systems (LLM-driven workflows, evaluation, and inference pipelines)
You are expected to be deeply technical - able to:
- Read architecture diagrams and system designs fluently
- Understand and reason about code, APIs, and system behavior
- Engage engineers on tradeoffs across infrastructure, runtime, and AI systems
While this is not a hands-on coding role, the ability to read and occasionally write code to unblock or validate work is highly valuable.
Why This Role Is Needed
Our operating system is a distributed, orchestration-heavy platform with:
- Long-lived, stateful workflows
- Cross-service and cross-environment dependencies
- AI/LLM-driven execution paths requiring observability and control
- Strict reliability, security, and auditability requirements
As the platform scales, the bottleneck shifts to:
- Cross-team coordination
- Dependency sequencing
- Release readiness
- Execution predictability
This role exists to:
- Reduce coordination overhead on engineering leads
- Ensure platform work is sequenced, unblocked, and measurable
- Improve delivery predictability without slowing velocity
- Translate platform investments into real shipped outcomes
Scope of Responsibilities
- Own end-to-end execution of internal platform initiatives across the Trase operating system, translating ambiguous work across infrastructure, runtime systems, and AI/ML workflows into clear, actionable plans while ensuring alignment across Engineering, DevOps/SRE, DevEx, and Product.
- Identify and manage cross-team dependencies across services, cloud infrastructure, and AI pipelines, sequencing work to minimize blocking dependencies, reduce integration risk, and avoid rework.
- Establish and maintain a lightweight operating rhythm that drives execution, including milestone tracking, execution reviews, and release readiness checkpoints, ensuring teams have clear priorities, defined success criteria, and visibility into risks.
- Partner with DevOps and SRE to ensure releases are safe, validated, and traceable, and that platform and AI/ML changes are observable, auditable, and ready for production environments; drive go/no-go decisions based on system readiness and risk.
- Proactively identify and manage system-level risks across infrastructure, deployment systems, AI/ML pipelines, and runtime behavior, ensuring mitigation strategies are in place before issues impact delivery.
- Define and track key execution and reliability signals, including delivery predictability, release success rates, dependency resolution, and system health, acting as the source of truth for execution status and risk.
- Continuously improve engineering execution by identifying inefficiencies in CI/CD workflows, testing and integration systems, and AI workflow evaluation, partnering with DevEx and DevOps to increase developer velocity, release safety, and overall system reliability.
Senior Staff-Level Expectations
- Influence execution across multiple teams and domains
- Drive improvements in how systems are built, deployed, and operated
- Balance speed, reliability, and clarity
- Deliver outcomes that improve organization-level execution metrics
Qualifications
Required
- 12+ years of experience in technical program management, engineering, or related roles
- Experience working on distributed systems, cloud infrastructure, and CI/CD and deployment systems
- Strong understanding of DevOps / SRE workflows, system dependencies, and failure modes
- Demonstrated ability to break down ambiguous technical problems, drive execution across teams, and influence without authority
Technical Depth (Critical)
- Strong technical fluency, with the ability to read and understand production code, reason about system architecture and APIs, and engage in technical tradeoff discussions
- Experience with or exposure to AI/ML systems, LLM-based workflows, and AI infrastructure (inference, evaluation, orchestration)
- Ability to write code when needed (for debugging, validation, or prototyping), though coding is not a primary responsibility
Strongly Preferred
- Experience working closely with DevOps / SRE teams and platform engineering teams
- Familiarity with Kubernetes, Infrastructure-as-Code, and observability systems
- Experience in regulated or high-security environments
Working Style
- High ownership and accountability
- Strong bias for action and clarity
- Comfortable operating in ambiguity
- Focused on outcomes over process
Some travel is required.
If you want to be on the cutting edge of technology, building AI solutions for the future, and are up for a challenge, let's talk!
Salary Range: $200,000-$260,000. This represents the typical salary range for this position based on experience, skills, and other factors.