Hyperspace Pods
A pod is the team primitive of Hyperspace: identity at <slug>.hyperspace.sh, plus an optional P2P GPU pool, an optional BYOK $ treasury, and an optional Hyperspace Go credit pool. Four layers, one slug. Slack-style visibility. One auth. One bill.
Free when local. A team of five paying for cloud AI burns $500–2,000/month on API calls. The same team's existing machines can serve competitive open models for free — the hardware is already on their desks. When a query genuinely needs a frontier model, the pod falls back to cloud at wholesale rates from a shared treasury.
No middleman. Prompts travel from your IDE to your pod members' hardware and back. No server in between reading your data. Pod state is replicated across your own machines using Raft consensus.
Automatic sharding. Tell the pod which model you want. It figures out how to split it across whatever hardware is online. No configuring layer ranges or calculating VRAM budgets.
Real NAT traversal. Friend behind a home router with a dynamic IP? Works. No VPN, no Tailscale, no port forwarding.
A pod is a private cluster of 2-10 trusted devices that pool their compute, models, and credits. Create a pod, invite your devices or friends, and collectively run models that no single device could handle alone.
Local (default) — Raft consensus across member devices. Works fully offline. State in ~/.hyperspace/pod.json. No cloud dependency.
Cloud — Create with pod create <name> --cloud. Adds a web dashboard, Drive (shared filesystem + vector search), marketplace listing, and a persistent coordinator on Hyperspace infrastructure.
Distributed Inference
Split 70B+ models across GPUs. Pipeline parallelism across the mesh.
Shared Providers
OpenRouter, Groq, Together, Fireworks, DeepInfra, xAI, Google, Mistral, Cohere, Anthropic, OpenAI, Vercel.
OpenAI-Compatible API
pk_* keys authenticate against /v1/chat/completions. Works with any OpenAI-compatible client.
Always-On Agent VM
Persistent daemon on cloud VM. 9 providers supported. BYOK model.
Drive
Shared filesystem + vector search. S3-compatible backends. PDF/Word extraction.
Custom Domains
<slug>.hyperspace.sh auto-provisioned. CNAME for your own domain.
Layers 2–4 can each be enabled independently on top of the identity base. A solo developer might use only the Hyperspace Go pool (layer 4); a community runs 1 + 4; a team running its own GPU cluster runs all four. The same pods row in Postgres carries metadata for every layer, members come from one membership list, and billing flows through two parallel pool RPCs that don't conflict.
1. Identity & access (required)
Slug, member roster, admin, Slack-style visibility, multi-use invitations with email-lock, max-uses, expiry, and revocation.
2. P2P GPU compute (optional)
Member machines join via libp2p. pod-raft consensus, capability heartbeats, layer-sharding, federation across peer pods via x402.
3. Inference billing ($) (optional)
OpenAI-compat gateway, BYOK provider keys, Stripe-loaded treasury, per-call $ accounting, daily/monthly caps, per-member sub-caps.
4. Hyperspace Go pool (optional)
Flat $10/mo team subscription. Pooled credits on a 5h sliding window. 14 curated open-weight models. Per-member sub-caps. Stacks with BYOK.
The gateway picks the cheapest path automatically: P2P first (free if a member has the model), BYOK next (your provider, your $), Hyperspace Go last (pool credits). Federation lets pods burst to peer pods via x402 micropayments when no member has the model.
Visibility is set per pod. Members see the full pod regardless of the setting; what a stranger sees depends on the visibility flag. Backwards compatibility: pods that pre-date migration 047 had their visibility backfilled from the older boolean flags.
| Visibility | Access gate | Listed? | Behavior to a stranger |
|---|---|---|---|
| public | no auth required | yes | Anyone with the URL can read pod-shell content |
| unlisted | no auth required | no | Public if you have the URL — not in directories |
| members_only | sign-in + must be a member | yes | Stranger sees a sign-in / invite-redeem gate |
| invite_only | sign-in + must be a member | no | Non-members see a 404 — pod behaves like it doesn't exist |
Visibility is enforced at app/pod/[slug]/page.jsx server-side and at every pod-scoped API route. Toggle from the pod admin panel or via hyperspace pod-cloud visibility <mode>.
Pricing is per pod, not per user. Members are bundled. The Hyperspace Go pool stacks on top of layers 1–3 — you can run BYOK and Go together.
| Plan | Price | Members | Hyperspace Go pool (5h sliding window) | Curated catalog |
|---|---|---|---|---|
| Pod Starter | $0 | up to 5 | — | BYOK only — bring your own keys |
| Pod Team | $10/mo | up to 10 | 10,200 credits / 5h | 14 open-weight models, pooled across team |
| Pod Pro | $30/mo | unlimited | 30,000 credits / 5h | + frontier (Claude Opus 4.7, Sonnet 4.6, GPT 5.5, Gemini 3.1 Pro) |
Each model has a class; class determines credits-per-request. Heavy frontier models cost more per call than the cheap classes — this is what makes pooling fair.
| Class | Credits/req | Reqs/5h on Team | Examples |
|---|---|---|---|
| heavy | 11.6 | 880 | GLM-5.1, Kimi K2.6, DeepSeek V4 Pro, Claude Opus 4.7, GPT 5.5 |
| medium | 7.9 | 1,290 | MiMo V2 Pro, Claude Sonnet 4.6, Gemini 3.1 Pro |
| light | 3.1 | 3,300 | Qwen 3.6 Plus, MiMo V2.5 |
| cheap | 1.0 | 10,200 | MiniMax M2.5/2.7, DeepSeek V4 Flash |
Pod admins set per-member sub-caps so one member can't drain the team pool. The hg_consume_pod RPC checks both the pool window and the member's individual cap atomically before each call.
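As a concrete sketch (the email address is illustrative), a cap set from the CLI bounds what one member can draw from the shared window:

```bash
# Cap one member at 1,000 Hyperspace Go credits per 5h window.
# At heavy-class pricing (11.6 credits/req) that is ~86 calls;
# at cheap-class pricing (1.0 credit/req) it is 1,000 calls.
hyperspace pod-cloud cap junior@example.com 1000
```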
Two ways to start: CLI-first (create locally, link to the cloud later) or web-first (the admin creates the pod on the web and members join via link). All commands support --json for machine-readable output.
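A minimal CLI-first session, using commands from the reference tables below (the pod name is a placeholder):

```bash
# Admin: create a local pod and mint a 7-day invite.
hyperspace pod create acme-lab
hyperspace pod invite --ttl 7d       # prints an hsi_v1.* token

# Each member: redeem the token, then inspect the mesh.
hyperspace pod join <token>
hyperspace pod status --json         # online nodes, VRAM, models
```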
| Command | Description |
|---|---|
| pod create <name> | Create a new pod (you become owner) |
| pod create <name> --cloud | Create with cloud dashboard, Drive, and marketplace |
| pod join <token> | Join via invite token |
| pod leave | Leave current pod |
| pod status | Online nodes, VRAM, models |
| pod members | List members with roles and online status |
| pod invite | Generate shareable invite (hsi_v1.* token) |
| pod invite --role admin | Invite with admin privileges |
| pod invite --ttl 7d | Custom expiry (default 24h) |
| pod invite --multi-use | Reusable invite link |
| pod models | List models available across all members |
| pod resources | Per-node VRAM/RAM breakdown |
| pod gateway | Show gateway URL, port, and connection info |
| Command | Description |
|---|---|
| coord status | Raft leader, term, commit index, cluster health |
| coord members | Raft cluster membership and voter status |
| coord balance | Treasury balance for current member |
| coord ledger | Full transaction history |
| coord mint <amount> | Mint credits into treasury (owner only) |
| coord revoke <member> | Revoke member access |
| coord transfer <to> <amount> | Transfer credits between members |
| coord credit <member> <amount> | Credit member balance (admin) |
| coord invite | Generate Raft-level invite token |
| coord redeem <token> | Redeem a Raft invite token |
| coord join-cluster <addr> | Join an existing Raft cluster |
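A typical treasury sequence, assuming the owner seeds credits and then funds a member (the member handle and amounts are illustrative):

```bash
# Owner: mint credits into the treasury.
hyperspace coord mint 5000

# Credit a member's balance (admin), then verify.
hyperspace coord credit alice 1500
hyperspace coord balance             # balance for the current member
hyperspace coord ledger              # full transaction history
```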
| Command | Description |
|---|---|
| models pull hf:<repo> --quant Q8_0 | Download from HuggingFace with quantization |
| models pull --auto | Auto-select best model for your GPU |
| models register <path> --id <name> | Register a local GGUF file |
| models list | Catalog + registered + discovered models |
| models downloaded | Show models downloaded locally |
| Command | Description |
|---|---|
| pod shard <model> | Distribute model across pod nodes |
| pod shard <model> --force | Force a multi-node shard (even if the model fits on one node) |
| pod shard <model> --dry-run | Preview shard plan without activating |
| pod dissolve | Dissolve the active shard ring |
| pod resources | Per-node VRAM/RAM breakdown with shard status |
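Sharding is usually a preview-then-activate flow (the model id is a placeholder):

```bash
# Inspect the shard plan without touching the ring...
hyperspace pod shard <model> --dry-run

# ...then distribute the model across the online nodes.
hyperspace pod shard <model>

# Tear the ring down when finished.
hyperspace pod dissolve
```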
| Command | Description |
|---|---|
| pod keys create --name <n> | Generate pk_* key |
| pod keys create --name <n> --scopes <models> | Key restricted to specific models |
| pod keys list | List keys with usage stats |
| pod keys revoke <id> | Revoke a key |
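Once a key is minted, any OpenAI-compatible client can talk to the gateway. A minimal curl sketch (the gateway address matches the Vega reference deployment below; the key and model id are placeholders):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer pk_<your_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<model-id>",
    "messages": [{"role": "user", "content": "Hello from the pod."}]
  }'
```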
These commands drive the cloud-side pod row at <slug>.hyperspace.sh (visibility, Hyperspace Go subscription, per-member credit caps). They differ from coord commands, which drive Raft state on member machines.
| Command | Description |
|---|---|
| pod-cloud visibility <mode> | Set public, unlisted, members_only, or invite_only |
| pod-cloud subscribe | Attach a Hyperspace Go subscription to the pod ($10/mo, pooled) |
| pod-cloud members | List cloud-side membership with sub-caps |
| pod-cloud invite [--email] [--role] [--ttl] | Mint a multi-use invitation (logged in pod_invitations) |
| pod-cloud cap <email> <credits> | Set a per-member 5h cap (e.g. cap a junior at 1,000) |
Route requests to cloud providers when local inference is unavailable or a specific model is needed. Two modes: BYOK (member supplies their own API key) and Funded (pod treasury pays).
| Provider | BYOK | Funded |
|---|---|---|
| OpenRouter | Yes | Yes |
| Groq | Yes | Yes |
| Together | Yes | Yes |
| Fireworks | Yes | Yes |
| DeepInfra | Yes | Yes |
| xAI | Yes | Yes |
| Google | Yes | Yes |
| Mistral | Yes | Yes |
| Cohere | Yes | Yes |
| Anthropic | Yes | Yes |
| OpenAI | Yes | Yes |
| Vercel AI Gateway | Yes | No |
Control how much each member can spend on cloud inference from the treasury.
| Budget Type | Description |
|---|---|
| percent | Percentage of treasury (e.g. 10% each for 10 members) |
| fixed_daily | Fixed daily credit limit per member |
| fixed_monthly | Fixed monthly credit limit per member |
| unlimited | No limit (default for owner) |
Every cloud inference call runs check_and_reserve_budget before dispatching to the provider. Over-budget requests return 429 Too Many Requests.
The gateway routes every inference request through a priority chain (P2P → BYOK → Hyperspace Go → federation). The first tier that can serve the request wins.
Budget enforcement applies at every level. Even BYOK calls are logged and count toward usage analytics. Funded calls are blocked if the member exceeds their budget allocation.
Shared filesystem with automatic text extraction and vector search. Every file uploaded to Drive is indexed and searchable by any pod member.
S3-Compatible
Cloudflare R2, AWS S3, or GCS as backend. Local fallback for offline pods.
Text Extraction
PDF, Word (.docx), and plaintext files automatically parsed on upload.
Vector Search
Cosine similarity across all indexed documents. Embeddings computed on upload.
| Model | Dimensions | Source |
|---|---|---|
| GTE-small | 384 | Built-in (default) |
| Ollama embeddings | 768 | Local Ollama instance |
| OpenAI text-embedding-3-small | 1536 | Cloud (BYOK or funded) |
| Plan | Storage | Max Files |
|---|---|---|
| Free | 10 GB | 500 |
| Pro | 100 GB | 5,000 |
| Ultra | 1 TB | 50,000 |
An always-on agent daemon running on a cloud VM. The pod provisions and manages the lifecycle automatically — you choose the provider and model.
| Provider | Notes |
|---|---|
| Oracle Cloud | Free tier (ARM A1, 4 OCPU, 24GB RAM) |
| Scaleway | Stardust/DEV1-S instances |
| Fly.io | Shared-cpu, performance-2x |
| Vercel | Serverless functions (cron-based) |
| Vultr | Cloud compute (vc2) instances |
| AWS Lightsail | $3.50/mo nano instances |
| DigitalOcean | Basic droplets |
| Linode | Nanode 1GB instances |
| Hetzner | CX11 / CAX11 (ARM) instances |
Provisioning picks the cheapest available instance that matches your requirements, using your BYOK credentials for the cloud provider.
Deploy lightweight services that run alongside your pod. Services are accessible from the pod dashboard and via your custom domain.
| Runtime | Description |
|---|---|
| python | Python 3.11+ with pip dependencies |
| node | Node.js 20+ with npm/pnpm |
| docker | Any Docker image |
| shell | Bash script executed on cron or webhook |
| static | Static HTML/CSS/JS served from Drive |
Services are deployed from the pod web dashboard or via the CLI. Each service gets a URL at <slug>.hyperspace.sh/apps/<name>.
Sync external data sources into Pod Drive. Connectors run on a schedule and automatically index new content for vector search.
| Source | Description |
|---|---|
| Google Drive | Sync folders or shared drives |
| Notion | Sync pages and databases |
| GitHub | Sync repository files (code, docs, issues) |
| Slack | Sync channel messages and threads |
| Web Crawl | Crawl and index a URL or sitemap |
| S3 Bucket | Mirror an existing S3/R2/GCS bucket |
Every pod gets a <slug>.hyperspace.sh subdomain auto-provisioned via Cloudflare Worker. You can also bring your own domain.
| Path | Description |
|---|---|
| / | Pod dashboard (status, members, models) |
| /files | Drive file browser and search |
| /apps/<name> | Deployed services |
| /webhook/<path> | Incoming webhook endpoints |
| /agents/* | Agent API and status pages |
Reusable agent recipes. A template bundles a model configuration, system prompt, tools, and services into a shareable package.
| Visibility | Description |
|---|---|
| public | Listed in the Hyperspace marketplace, anyone can clone |
| unlisted | Accessible via direct link, not listed in marketplace |
| private | Only visible to pod members |
Pods can form bilateral trust relationships with other pods. Federation extends your pod's capabilities without merging membership.
Shared Model Lists
Allied pods advertise their available models. Requests route across the alliance when local capacity is full.
Shared Credit Pool
Optional. Contribute credits to a shared pool that any allied pod can draw from for cloud inference.
Public Offerings
Advertise model access at a per-request rate. Other pods pay from their treasury to use your hardware.
Alliances use Ed25519 federation envelopes — signed messages with a 5-minute TTL to prevent replay attacks. Trust is bilateral: both pod owners must approve. Duration is configurable (default: 30 days, renewable).
Every state change is an Ed25519-signed command replicated through Raft. No single member can forge transactions.
| Operation | Authority |
|---|---|
| member.join | Admin/owner (or genesis) |
| member.leave | Self or owner |
| key.mint | Admin/owner (leader-signed) |
| treasury.transfer | Self (from own balance) |
| key.revoke | Admin/owner or key creator |
| budget.set | Admin/owner |
| federation.propose | Owner |
| federation.accept | Owner |
Every command is wrapped in a signed envelope before Raft replication:
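The exact wire format is out of scope here; an illustrative shape, with field names assumed rather than taken from the implementation, looks like:

```jsonc
{
  "namespace": "pod/acme-lab",        // domain-separation scope
  "command": { "type": "treasury.transfer", "to": "alice", "amount": 100 },
  "nonce": "unique-per-command",      // replay protection, 24h retention
  "timestamp": 1730000000,            // checked against cluster time (5 min drift window)
  "signer": "<ed25519 public key>",
  "signature": "<ed25519 signature over the envelope>"
}
```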
Replay protection: unique nonce (24h retention) + timestamp drift check (5 min window vs cluster time). Domain separation: namespace-scoped signing prevents cross-topic replay of GossipSub messages.
Hyperspace provides a native MCP server for Claude Code. This lets Claude Code manage your pod, run inference, and use all 30+ tools without leaving the conversation.
Once registered, Claude Code can call tools like pod_status, pod_infer, models_pull, pod_shard_model, and pod_create_api_key directly.
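Registration goes through Claude Code's own MCP tooling; the server launch command below is a placeholder, so substitute whatever your install documents:

```bash
# Register the pod's MCP server with Claude Code.
# "hyperspace mcp" is assumed, not confirmed; check your install.
claude mcp add hyperspace -- hyperspace mcp
```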
You can create a Claude Code agent that uses your pod as its inference backend. This gives you a sub-agent powered by your own hardware.
If you add the agent to your Claude Code config, you can invoke it as a slash command.
| Device | Role | GPU | Status |
|---|---|---|---|
| NVIDIA H100 | Inference primary | 80GB HBM3 | online ~44 tok/s |
| Mac Studio M3 Max | Inference secondary | 48GB unified | online |
| DO Droplet | Coordinator | 8GB RAM | Raft leader |
| Setting | Value |
|---|---|
| Model | Qwen3.6-27B Q8_0 (28.6GB) |
| Engine | llama.cpp + CUDA (H100), MLX (Mac) |
| Throughput | ~44 tok/s on CUDA (single node) |
| API Key | pk_a2cf2afb69928fde5abc675d862c8da4ce01ce990e39784c |
| Gateway | localhost:8080/v1 |
| Dashboard | vega.hyperspace.sh |
The Vega pod is our reference deployment for testing three scenarios:
1. Cross-device sharding. Split Qwen3.6-27B BF16 (53.8GB) across H100 + Mac M3 Max using pipeline parallelism. The smart shard planner assigns layers in proportion to each device's usable memory.
2. Benchmark sharded vs single-node. Compare latency and throughput when the model runs entirely on the H100 (Q8_0, fits in 80GB) versus sharded BF16 across two devices. Measure activation transfer overhead via libp2p.
3. Claude Code agent integration. Run a production Claude Code agent that routes coding queries through the Vega gateway. Qwen3.6-27B with thinking mode provides reasoning-heavy code generation at zero external API cost.
Pods run any GGUF model from HuggingFace. Pull with an explicit quantization, or let the CLI pick for you:
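```bash
# Download a specific quantization from a HuggingFace GGUF repo.
hyperspace models pull hf:<repo> --quant Q8_0

# Or auto-select the best model for the local GPU.
hyperspace models pull --auto

# Confirm what is on disk.
hyperspace models downloaded
```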
| Model | Params | BF16 | Q8_0 | Min VRAM |
|---|---|---|---|---|
| Qwen 3.6 27B | 27B | 53.8 GB | 28.6 GB | 32 GB |
| Llama 3.1 70B | 70B | 140 GB | 74 GB | 80 GB |
| Qwen 2.5 72B | 72B | 144 GB | 76 GB | 80 GB |
| Gemma 3 27B | 27B | 54 GB | 28 GB | 28 GB |
Thinking models (Qwen3.6): set max_tokens: 8000+ to allow room for internal reasoning. The gateway preserves reasoning_content in responses.
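A sketch of a thinking-model call through the gateway (the key and model id are placeholders; the response carries reasoning_content alongside the usual message content):

```bash
curl http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer pk_<your_key>" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "<thinking-model-id>",
    "max_tokens": 8000,
    "messages": [{"role": "user", "content": "Plan a refactor of the auth module."}]
  }'
```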
- Four-layer architecture — identity / P2P compute / BYOK $ billing / Hyperspace Go pool, each independently optional, all on the same pod row
- Slack-style visibility — public · unlisted · members_only · invite_only; non-members of invite_only see a 404
- Hyperspace Go pool — pod-pooled $10/mo team subscription, 10,200 credits/5h sliding window, 14 curated open-weight models
- Per-member caps — go_credit_cap_5h on pod_members, atomic enforcement via hg_consume_pod RPC
- Multi-use invitations — pod_invitations ledger with email-lock, max-uses, expiry, revocation
- Pod Pro tier — 30,000 credits/5h, unlimited members, frontier models (Claude Opus 4.7, GPT 5.5, Gemini 3.1 Pro)
- Pod Drive — S3-compatible shared filesystem with vector search
- Connectors — sync GitHub, Notion, Google Drive into Drive
- Custom domains — <slug>.hyperspace.sh auto-provisioned via Cloudflare
- Fix auto-update re-exec to use installed binary path
- Fix Vercel deploy-download CI step
- Direct llama-server proxy — preserves reasoning_content for thinking models
- Default context increased to 16k for registered models
- CUDA build-from-source — auto-builds llama.cpp with CUDA when toolkit detected
- HuggingFace model pull — models pull hf:<repo> --quant BF16
- Pod identity mapping — pod-raft member IDs ↔ libp2p PeerIDs for shard planning
- Dynamic model registry — ~/.hyperspace/models.json
- Qwen3.6-27B added to catalog and shard architecture map
- Pod ring runtime initializes on restart (was only during pod join)
- --force flag for pod shard
- Pod keys create for local pods — leader-signed /api/keys/mint endpoint
- Smart shard planner (VRAM-weighted, MLX + llama-cpp-rpc)
- Pod-raft Go binary (Raft consensus, Ed25519-signed commands)
- Invite system (hsi_v1 tokens), treasury + ledger
- Sidecar architecture (network/inference/pulse/matrix isolation)
- Bootstrap migration to DigitalOcean (6 nodes)
- Initial pod-raft prototype (membership, invites)
- Pod concept introduced in CLI
- libp2p mesh integration for shard streams
Full changelog: changelog.hyper.space