PubNub is a San Francisco–based product company powering real-time experiences such as chat, live updates, and interactive applications for 2,000+ companies, including Verizon, Autodesk, Zillow, and Dropbox.
Our global data network processes trillions of messages each month with sub-100 ms latency across 15+ data centers worldwide.
We're building an AI capability layer that helps developers add AI features such as classification, summarization, routing, enrichment, and automation to real-time streams, without sacrificing latency, reliability, or trust.
What you'll do:
Ship AI-powered features into production
Integrate LLM inference, build evaluation and observability tooling, and own the full lifecycle of AI services, from quality metrics and tracing to cost and reliability. You'll work with providers such as OpenAI, AWS Bedrock, and Azure, as well as open-source models, and deliver features used by real customers.
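To give a flavor of the observability side, here is a minimal, hypothetical sketch of wrapping a provider call so each inference emits latency and volume signals. It is not PubNub's actual tooling; traceInference, CallMetrics, and MetricsSink are illustrative names, and the inline model call is a stand-in.

```typescript
// Illustrative sketch only: names and shapes are hypothetical,
// not PubNub's production tooling.

interface CallMetrics {
  model: string;
  latencyMs: number;
  inputChars: number;
  outputChars: number;
  ok: boolean;
}

type MetricsSink = (m: CallMetrics) => void;

// Wrap any provider call so every inference emits a metrics record,
// whether it succeeds or fails.
async function traceInference(
  model: string,
  prompt: string,
  infer: (prompt: string) => Promise<string>,
  sink: MetricsSink
): Promise<string> {
  const start = Date.now();
  try {
    const output = await infer(prompt);
    sink({
      model,
      latencyMs: Date.now() - start,
      inputChars: prompt.length,
      outputChars: output.length,
      ok: true,
    });
    return output;
  } catch (err) {
    sink({
      model,
      latencyMs: Date.now() - start,
      inputChars: prompt.length,
      outputChars: 0,
      ok: false,
    });
    throw err;
  }
}

// Usage: any async model call can be traced this way.
void (async () => {
  const text = await traceInference(
    "demo-model",
    "Summarize this stream event",
    async (p) => `summary of: ${p}`, // stand-in for a real provider call
    (m) => console.log(JSON.stringify(m))
  );
  console.log(text);
})();
```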
Build & operate AI services at scale
Design low-latency inference pipelines for high-throughput message streams. Implement model routing, prompt and retrieval patterns (RAG), caching, batching, and fallbacks. You'll solve real-world constraints around latency, scale, and cost.
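As a rough illustration of one of these patterns, the sketch below shows primary/fallback model routing under a fixed latency budget. Everything in it is a hypothetical stand-in (callModel, the model ids, the timeout values), not PubNub internals.

```typescript
// Illustrative sketch only: callModel() simulates a provider SDK call.

type ModelId = "primary-large" | "fallback-small";

interface InferenceRequest {
  prompt: string;
  maxTokens: number;
}

interface InferenceResult {
  model: ModelId;
  text: string;
}

// Race the provider call against a timeout so a slow model can't
// stall the message stream.
async function callModel(
  model: ModelId,
  req: InferenceRequest,
  timeoutMs: number
): Promise<InferenceResult> {
  const timeout = new Promise<never>((_, reject) =>
    setTimeout(() => reject(new Error(`${model} timed out`)), timeoutMs)
  );
  const call = (async (): Promise<InferenceResult> => {
    // A real implementation would hit OpenAI, Bedrock, etc. here.
    return { model, text: `echo: ${req.prompt}` };
  })();
  return Promise.race([call, timeout]);
}

// Try the expensive model under a tight budget; on timeout or error,
// fall back to a smaller, cheaper model rather than dropping the message.
async function inferWithFallback(req: InferenceRequest): Promise<InferenceResult> {
  try {
    return await callModel("primary-large", req, 250);
  } catch {
    return await callModel("fallback-small", req, 100);
  }
}

// Usage:
void (async () => {
  const result = await inferWithFallback({ prompt: "classify: hello", maxTokens: 32 });
  console.log(result.model, result.text);
})();
```

Caching and batching would sit in front of a router like this in practice, but the shape stays the same: a tight budget on the expensive path, with a cheap fallback behind it.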
Enable other teams
Build internal frameworks, APIs/SDKs, and tooling so other teams can ship AI features safely and consistently. Partner with product and engineering on trade-offs between latency, cost, accuracy, and privacy. Clear documentation and great developer experience matter here.
Tech & communication
Work mainly with TypeScript, Python, or Rust (and be open to learning the others). Use modern AI-assisted tools (Copilot, Cursor, Claude, etc.). Communicate clearly in English.
How to apply:
If this sounds like the kind of systems challenge you enjoy, apply and include a short note about a production AI feature you've shipped. We review every application personally.