Self-Hosted Cowork, Part 1 of 6: Foundation — Validating Your GPU and Deploying the Docker Stack

Validate your GPU host, deploy Docker with PostgreSQL+pgvector, and lay the foundation for your self-hosted Copilot alternative.

Objective

By the end of this post you will have a validated GPU host, a working Docker environment, and all prerequisite services running — ready for LLM inference in Part 2.

What Are Copilot Cowork and Claude Cowork?

A cowork AI assistant is not a chatbot you open in a browser tab. It sits inside your workflow, reads your inbox and calendar, understands the context of ongoing projects, and takes action on your behalf — pausing for your approval before anything irreversible happens.

Both Microsoft and Anthropic shipped products under this name in early 2026. They are distinct products, and the difference matters to how you evaluate this series.

Microsoft Copilot Cowork

Copilot Cowork launched in March 2026 through Microsoft's Frontier program. It is a cloud-hosted autonomous agent with 13 built-in skills: Word, Excel, PowerPoint, PDF, Email, Scheduling, Calendar Management, Meetings, Daily Briefing, Enterprise Search, Communications, Deep Research, and Adaptive Cards. You describe what you need; it executes multi-step workflows across your M365 environment and pauses for approval before sending emails or booking meetings.

What Microsoft does not lead with: Copilot Cowork is built on Claude. Microsoft worked with Anthropic to bring Claude's agentic reasoning into M365. The model doing the work is Claude. The infrastructure it runs on is Microsoft's. Your data is processed in Microsoft's cloud.

Pricing sits on top of your existing M365 subscription — around AU$44 per user per month at the Copilot tier. For a 20-person team: AU$10,560 per year, before base licences.

Claude Cowork (Anthropic)

Anthropic launched their own Claude Cowork product in January 2026. It is a desktop agent (macOS; Windows in progress) that runs locally and accesses your files, folders, and connected cloud apps directly. It is designed for individual power users — researchers, analysts, legal professionals, finance teams — who need to move across large volumes of documents and data.

The design philosophy emphasises local execution: your files stay on your machine until the approved action is taken. But the model reasoning still calls Anthropic's API. It is local-first, not fully air-gapped.

The constraint: Claude Cowork requires a Claude Max subscription — AU$150 to AU$240 per user per month. That is three to five times the cost of Copilot Cowork. For an individual contributor it is defensible. For a 10-person team, the per-seat cost becomes difficult to justify.

What this series builds

This series builds neither of those. It builds an agent that calls the same Microsoft Graph endpoints that Copilot Cowork calls, uses open models running on your hardware, and processes nothing on external infrastructure until the approved action is executed. No per-seat licensing. No data leaving your network during the reasoning step.

The model doing the reasoning is Gemma4-26B — a 26-billion parameter mixture-of-experts model from Google, running on an RTX 3090. In 2026, open models have closed the capability gap: Gemma4-26B scores 85.2% on MMLU Pro and ranks third on the LMArena open model leaderboard. This is not a compromise — it is a different architectural choice.

What It Actually Does for a Microsoft 365 Team

Microsoft's 2024-2025 Work Trend Index surveyed 31,000 knowledge workers across 31 countries and found that on average, 57% of working time is spent on communication — email, meetings, and chat — and only 43% on focused creation. A cowork agent targets that 57%. The specific productivity figures below come from Microsoft's own Copilot studies of cloud-deployed Copilot. The self-hosted stack in this series uses the same Graph API to produce the same outcomes.

| Scenario | What the agent does | Measured outcome |
| --- | --- | --- |
| Morning inbox triage | Summarises overnight threads, surfaces items needing a decision, drafts replies for one-click approval | 64% less time processing email; 29% faster across a standard set of daily tasks (Work Trend Index 2024) |
| Meeting catch-up | Generates a summary with decisions and action items for missed meetings | 4x faster catch-up than manual review (Work Trend Index 2024) |
| Cross-team scheduling | Reads the request, checks all attendees' calendars, proposes a slot, books on approval | Vodafone: employees saved an average of 3 hours per week, reclaiming 10% of working time (Microsoft case study 2024) |
| First draft of client communications | Generates a draft grounded in your past emails and SharePoint docs, in your team's writing style | 85% of Copilot users report faster first drafts; Newman's Own tripled monthly campaign volume and cut brief time from 3 hours to 30 minutes (Microsoft case study 2024) |
| Action item capture | Extracts tasks from meeting transcripts, creates Microsoft To Do entries with due dates and assignees | 68% of Copilot users report improved quality of work output (Work Trend Index 2024) |
| Regulated industry document handling | Processes client documents locally (no file content sent to external APIs), then acts on approved findings | A US law firm documented an AU$50,000 self-hosted deployment specifically because client case files and legal strategy could not be sent to public LLMs (documented case study, r/LocalLLaMA 2025) |

Why Copilot Adoption Is Stalling

Despite the productivity numbers, the market has not moved. Microsoft has sold Copilot licences to approximately 15 million users out of 450 million M365 subscribers — a 3.3% conversion rate over two years. Gartner found that only 5-6% of organisations that ran a Copilot pilot moved to a larger-scale deployment.

The three barriers that practitioners consistently report are not about the AI. They are about the environment the AI inherits.

1. Copilot surfaces what your permission model has been hiding

Copilot Cowork inherits every permission the user has in Microsoft 365. Research by Securiti found that on average, 16% of business-critical data is overshared across M365 tenants, with an average of 802,000 files at risk per organisation (reported in The Register, August 2024). When Copilot can read everything a user can read, that oversharing becomes visible: salary data surfaces in responses to the wrong people, HR documents appear in project searches, confidential client information appears in general queries.

Securiti CDO Jack Berkowitz, drawing on conversations with more than 20 enterprise Chief Data Officers, described the core problem plainly: "You've got to have clean data and you've got to have clean security in order to get these systems to really work the way you anticipate. It's more than just flipping the switch." (The Register, August 2024)

This is not a flaw in Copilot — it is a prerequisite that most organisations discover they do not meet until they turn Copilot on. Reported consequence: approximately half of large enterprise deployments have been restricted or turned off pending data governance remediation.

2. Australian data residency has a documented gap

Microsoft stores M365 data at rest in Australia for Australian tenants. But Copilot processing still temporarily moves data offshore for AI inference. In-country AI processing is not expected to be available until end of 2026. This is documented in Microsoft's own data residency documentation.

Under the Australian Privacy Act (APP 8), your organisation remains legally liable for how personal information is handled overseas, even when processed by a contracted third party. The Office of the Australian Information Commissioner has published explicit guidance on this risk for commercial AI products.

The additional exposure: the US CLOUD Act allows US authorities to compel any US-based company to produce data from any server it owns or leases anywhere in the world, including Australian data centres. For organisations handling sensitive client data, patient records, or information subject to professional privilege, this is a material risk — not a theoretical one.

Current Australian Privacy Act penalties for a serious breach: up to AU$50 million or 30% of adjusted annual turnover, whichever is greater.

3. Copilot only sees Microsoft Graph

Copilot Cowork operates within the Microsoft 365 boundary. It cannot natively reach Salesforce, a legacy ERP, an internal ticketing system, or any data source outside Graph. For businesses where operational data lives across multiple systems — which describes most organisations over 20 people — Copilot produces answers based on fragments. The self-hosted stack in this series can call any API you give it a tool definition for.

The Real Objections to Copilot Cowork and Claude Cowork

These are not theoretical concerns. They come from IT professionals, sysadmins, and MSPs who have evaluated or deployed both products and documented what actually happened. The sources are cited where possible.

Copilot Cowork: "It turned on and immediately surfaced files that should never be visible"

This is the most consistently reported deployment failure. Copilot Cowork inherits every M365 permission the user holds. Research by Securiti, published in The Register (August 2024) and based on interviews with more than 20 enterprise Chief Data Officers, found that on average 16% of business-critical data is overshared across M365 tenants, with an average of 802,000 files at risk per organisation. When Copilot can read everything a user can read, that oversharing becomes visible: salary information surfaces in project queries, HR documents appear in general searches, confidential client data appears in responses to the wrong people.

CDO Jack Berkowitz described the core issue: "You've got to have clean data and you've got to have clean security in order to get these systems to really work the way you anticipate. It's more than just flipping the switch."

Reported outcome: approximately half of large enterprise Copilot deployments have been restricted or turned off pending data governance remediation. Copilot does not create the oversharing problem — but it makes an existing, invisible problem suddenly very visible.

Copilot Cowork: "Our data governance team will not approve it — and they are right"

The specific concern in Australian organisations is the data residency gap. Microsoft stores M365 data at rest in Australia for Australian tenants, but Copilot AI inference still temporarily processes data on offshore infrastructure. In-country AI processing is not expected to be available until end of 2026 (documented in Microsoft's own data residency documentation).

Under the Australian Privacy Act (APP 8), organisations remain legally liable for how personal information is handled overseas even when processed by a contracted third party. The OAIC has published specific guidance on this risk for commercial AI products. Adding to the exposure: the US CLOUD Act allows US authorities to compel any US-based company to produce data from any server it operates globally, including in Australian data centres.

Current Australian Privacy Act maximum penalty for a serious or repeated breach: AU$50 million or 30% of adjusted annual turnover, whichever is greater. Governance teams blocking Copilot deployment on these grounds are not being obstructive — they are reading the legislation correctly.

Copilot Cowork: "It only works if everything is already in Microsoft"

Copilot Cowork is bounded by Microsoft Graph. It cannot natively access Salesforce, a legacy ERP, an internal ticketing system, or any data source outside the M365 ecosystem. For organisations where operational data spans multiple systems — the majority of businesses over 20 people — Copilot produces answers based on an incomplete picture. A query like "what is our current exposure to this client?" requires data from your CRM (deals), accounting system (invoices), and SharePoint (contracts) simultaneously. Copilot sees only the SharePoint fragment.

Claude Cowork: "AU$150-240 per seat per month is not a team tool — it is a premium individual tool"

Claude Cowork's per-seat pricing is three to five times higher than Copilot Cowork. At AU$200 per user per month, a 10-person team pays AU$24,000 per year before any M365 base licences. The product is designed for individual power users — researchers, analysts, legal professionals — where the personal productivity gain justifies the cost. It is not designed as a team-wide deployment. Deploying it across a 20-person business to handle shared workflows is not the use case it is built for.

Claude Cowork: "It is local-first but it is not local"

Claude Cowork runs the interface and file access on your desktop. But the model reasoning calls Anthropic's API. Your document content — or at minimum, the portions passed to the model as context — reaches Anthropic's US-based infrastructure during inference. For use cases where that is acceptable (personal productivity, non-sensitive documents), this is fine. For use cases where client confidentiality, patient data, or commercial-in-confidence material is involved, it is the same data residency question as Copilot, with the same US CLOUD Act exposure.

Both products: "The agent acts on my behalf — what if it acts wrong?"

Both Copilot Cowork and Claude Cowork include approval gates for consequential actions. In practice, the granularity and audit trail are limited. Neither product stores a local, queryable log of every reasoning step and every action proposed and taken. The self-hosted stack in this series uses PostgreSQL checkpointing via LangGraph: every decision, every proposed action, and every approval or rejection is written to a local database. If an action is disputed, you have a complete record. Under Australian Privacy Act obligations, that audit trail is not optional — it is how you demonstrate compliance.

The Numbers: Self-Hosted vs Subscribed

Self-hosted vs Copilot performance and cost comparison
| Factor | Copilot Cowork | Claude Cowork (desktop) | Self-Hosted (this series) |
| --- | --- | --- | --- |
| Cost (20 users/year) | AU$10,560 + M365 base | AU$36,000-57,600 + M365 base | Hardware amortised + ~AU$600/year electricity |
| Data residency (Australia) | At rest in AU; AI inference temporarily processed offshore until end of 2026 | Local device; model API calls reach Anthropic (US) | Fully on-premises; nothing leaves until approved Graph action executes |
| US CLOUD Act exposure | Yes: Microsoft is a US company | Yes: Anthropic is a US company | No: model inference runs on your hardware |
| Model | Claude (via Microsoft infrastructure) | Claude (via Anthropic API) | Gemma4-26B or any Ollama-compatible model |
| M365 integration depth | Native: deep Graph integration | Limited: desktop files and connected apps | Full Microsoft Graph API (same endpoints Copilot uses) |
| Non-Microsoft integrations | Not supported natively | Local apps only | Any API with a tool definition |
| HITL approval gates | Per-skill configuration | User approves on desktop | Configurable interrupt before every irreversible action |
| Offline / air-gapped | No | No (model API requires internet) | Yes, after initial model pull |
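The subscription figures in the cost row are plain per-seat arithmetic: monthly price times users times twelve. Here is a minimal sketch for plugging in your own head count. `annual_cost` is a hypothetical helper, and the prices are the approximate AU$ per-user monthly figures quoted in this post, not official price-list values.

```bash
#!/bin/sh
# annual_cost PRICE_PER_USER_PER_MONTH USERS
# Prints the yearly subscription cost in whole AU$ (hypothetical helper).
annual_cost() {
  echo $(( $1 * $2 * 12 ))
}

USERS=20  # team size used in the comparison table

# Prices are the approximate figures quoted in this post.
echo "Copilot Cowork:       AU\$$(annual_cost 44 "$USERS") per year"
echo "Claude Cowork (low):  AU\$$(annual_cost 150 "$USERS") per year"
echo "Claude Cowork (high): AU\$$(annual_cost 240 "$USERS") per year"
```

Change `USERS` to your own seat count. Base M365 licences and the self-hosted hardware cost are deliberately left out, as they are in the table.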

Open-source model capability is not a significant trade-off in 2026. Gemma4-26B, Qwen3, and DeepSeek-V3 match proprietary frontier models on standard reasoning benchmarks. The capability gap that made self-hosting a compromise two years ago has closed.

The MSP and IT professional community has reached a similar conclusion: a 2025 survey of 333 MSPs by AvePoint and Omdia found that 51% identified governance and compliance as the biggest obstacle to AI adoption, not model capability. The question most Australian IT decision-makers are asking in 2026 is not "is the model good enough?" — it is "where does my data go when the model runs?"

This series answers that question by building a stack where the answer is always: it stays here.

Prerequisites

  • Ubuntu 22.04 LTS server with NVIDIA GPU (RTX 3090 or equivalent)
  • At least 32 GB system RAM, NVMe SSD
  • sudo access, internet connection for initial setup
  • Docker Engine 24+ and Docker Compose V2

Specification

The foundation stage has three components:

  1. NVIDIA driver, confirmed working with nvidia-smi (containers ship their own CUDA libraries, so a host CUDA toolkit is not required)
  2. NVIDIA Container Toolkit — so Docker containers can access the GPU
  3. Base Docker Compose stack — PostgreSQL 16 + pgvector, Redis 7

Architecture overview:

[Figure: Self-hosted Cowork architecture]

Implementation

Step 1 — Verify NVIDIA driver

nvidia-smi
# Expected: driver version, CUDA version, GPU name (RTX 3090), 24576 MiB total VRAM

If nvidia-smi fails, install the driver:

sudo apt update
sudo apt install -y nvidia-driver-535
sudo reboot
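Once the driver is up, the VRAM check can be scripted rather than read by eye. The sketch below parses the CSV output of nvidia-smi and fails if no GPU reaches a threshold. `check_vram` is a hypothetical helper and the 24000 MiB threshold is an assumption for a 24 GB card; adjust it for your hardware.

```bash
#!/bin/sh
# Assumed minimum VRAM in MiB for a 24 GB card such as the RTX 3090.
MIN_VRAM_MIB=24000

# check_vram MIN_MIB
# Reads "name, memory_mib" lines on stdin (the csv,noheader,nounits format)
# and succeeds only if at least one GPU meets the threshold.
check_vram() {
  awk -F', *' -v min="$1" '
    $2 + 0 >= min { printf "OK: %s (%d MiB)\n", $1, $2; found = 1 }
    END { exit !found }
  '
}

if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=name,memory.total --format=csv,noheader,nounits \
    | check_vram "$MIN_VRAM_MIB" \
    || echo "No GPU with at least ${MIN_VRAM_MIB} MiB of VRAM found" >&2
else
  echo "nvidia-smi not found; install the driver first" >&2
fi
```

The `nounits` option drops the trailing "MiB" so the value can be compared numerically.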

Step 2 — Install NVIDIA Container Toolkit

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt update && sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
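Before moving on, it can help to confirm that Docker actually picked up the new runtime after the restart. The sketch below inspects the runtime list reported by `docker info`. `has_nvidia_runtime` is a hypothetical helper, and the check assumes the runtime is registered under the name "nvidia", which is what nvidia-ctk configures by default.

```bash
#!/bin/sh
# has_nvidia_runtime
# Succeeds if the JSON on stdin contains a runtime keyed "nvidia".
has_nvidia_runtime() {
  grep -q '"nvidia"[[:space:]]*:'
}

if command -v docker >/dev/null 2>&1; then
  if docker info --format '{{json .Runtimes}}' 2>/dev/null | has_nvidia_runtime; then
    echo "nvidia runtime is registered with Docker"
  else
    echo "nvidia runtime missing; re-run nvidia-ctk and restart Docker" >&2
  fi
fi
```

If the runtime is missing, /etc/docker/daemon.json is the first place to look, since that is the file `nvidia-ctk runtime configure` edits.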

Step 3 — Create project directory

sudo mkdir -p /opt/cowork
sudo chown $USER:$USER /opt/cowork
cd /opt/cowork

Step 4 — Deploy the base Docker Compose stack

cat > /opt/cowork/docker-compose.yml << 'COMPOSE'
services:
  postgres:
    image: pgvector/pgvector:pg16
    restart: unless-stopped
    environment:
      POSTGRES_USER: cowork
      POSTGRES_PASSWORD: cowork_secret
      POSTGRES_DB: cowork
    volumes:
      - pgdata:/var/lib/postgresql/data
    ports:
      - "127.0.0.1:5432:5432"

  redis:
    image: redis:7-alpine
    restart: unless-stopped
    ports:
      - "127.0.0.1:6379:6379"

volumes:
  pgdata:
COMPOSE

docker compose up -d

Step 5 — Enable pgvector extension

docker exec -it cowork-postgres-1 psql -U cowork -d cowork \
  -c "CREATE EXTENSION IF NOT EXISTS vector;"
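Right after `docker compose up -d`, PostgreSQL may still be initialising, so the CREATE EXTENSION call can fail with a connection error on the first try. A minimal sketch of a retry wrapper follows. `wait_until` and `enable_pgvector` are hypothetical helper names, and the 30-attempt limit is an arbitrary assumption.

```bash
#!/bin/sh
# wait_until ATTEMPTS DELAY_SECONDS COMMAND...
# Retries COMMAND until it succeeds. Returns 0 on the first success,
# 1 if every attempt fails.
wait_until() {
  attempts=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# enable_pgvector
# Waits for PostgreSQL to accept TCP connections, then enables the extension.
# -T disables TTY allocation so this also works from scripts and cron.
# -h 127.0.0.1 forces a TCP probe, so the socket-only server used during
# first-time initialisation is not mistaken for the final one.
enable_pgvector() {
  compose="docker compose -f /opt/cowork/docker-compose.yml"
  wait_until 30 1 $compose exec -T postgres \
    pg_isready -h 127.0.0.1 -U cowork -d cowork || return 1
  $compose exec -T postgres psql -U cowork -d cowork \
    -c "CREATE EXTENSION IF NOT EXISTS vector;"
}
```

Run `enable_pgvector` once the stack is up. It is safe to repeat because of IF NOT EXISTS.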

Validation

| Check | Command | Expected output |
| --- | --- | --- |
| GPU visible | nvidia-smi --query-gpu=name,memory.total --format=csv,noheader | NVIDIA GeForce RTX 3090, 24576 MiB |
| Docker GPU access | docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi | nvidia-smi output inside container |
| PostgreSQL up | docker compose -f /opt/cowork/docker-compose.yml ps postgres | STATUS shows running or Up |
| pgvector loaded | docker exec cowork-postgres-1 psql -U cowork -d cowork -c "\dx" | vector in extensions list |
| Redis up | docker exec cowork-redis-1 redis-cli ping | PONG |
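The checks above can also be run in one pass. The sketch below wraps each one in a PASS/FAIL reporter. `run_check` and `validate_stage1` are hypothetical names, the CUDA image tag is an assumption (substitute whichever base tag you have validated), and the Redis and PostgreSQL checks run inside the containers so no client tools are needed on the host.

```bash
#!/bin/sh
FAILED=0  # number of failed checks so far

# run_check LABEL COMMAND...
# Runs COMMAND quietly, prints PASS or FAIL, and counts failures.
run_check() {
  label=$1; shift
  if "$@" >/dev/null 2>&1; then
    printf 'PASS  %s\n' "$label"
  else
    printf 'FAIL  %s\n' "$label"
    FAILED=$((FAILED + 1))
  fi
}

# validate_stage1
# Runs every Stage 1 check; returns non-zero if any of them failed.
validate_stage1() {
  compose="docker compose -f /opt/cowork/docker-compose.yml"
  run_check "GPU visible" nvidia-smi
  run_check "Docker GPU access" \
    docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu22.04 nvidia-smi
  run_check "PostgreSQL up" \
    $compose exec -T postgres pg_isready -U cowork -d cowork
  run_check "pgvector loaded" sh -c \
    "$compose exec -T postgres psql -U cowork -d cowork -tAc \"SELECT 1 FROM pg_extension WHERE extname = 'vector'\" | grep -q 1"
  run_check "Redis up" sh -c \
    "$compose exec -T redis redis-cli ping | grep -q PONG"
  [ "$FAILED" -eq 0 ]
}
```

Call `validate_stage1` after Step 5. A non-zero exit status means at least one check failed, and the FAIL lines show which.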

Troubleshooting

  • nvidia-smi not found: Driver not installed — run Step 1 above.
  • Docker GPU access denied: NVIDIA Container Toolkit not installed or Docker not restarted after install.
  • pgvector extension missing: Use pgvector/pgvector:pg16 image, not plain postgres:16.
  • Port 5432 conflict: Stop any system PostgreSQL with sudo systemctl stop postgresql.

Stage 1 Complete

You now have a GPU-validated host with PostgreSQL+pgvector and Redis running. All containers restart automatically on host reboot.

Up Next: Part 2 — LLM Inference

In Part 2 we deploy Ollama, pull Gemma4-26B (MoE, ~17 GB Q4_K_M), configure the environment variables that unlock Flash Attention and parallel context, and validate 80+ tokens/second throughput on the RTX 3090.


Call to Action

If your team has hit the data governance wall with Copilot, or if you are evaluating alternatives before committing to per-seat licensing, follow along at momentums.com.au. Each part of this series is a stage you can implement and validate independently before moving to the next.

#SelfHosted #RTX3090 #Microsoft365 #CopilotCowork #AI #AustralianPrivacyAct #MSP