Integration Testing

Deploy → verify → teardown, end-to-end on real Cloudflare infrastructure

Overview

The integration test runs a complete deploy → smoke test → teardown cycle against a real Cloudflare deployment. It deploys the full platform (6 workers, D1, KV, dispatch namespace), creates test users, pushes a fixture project, runs verification steps covering every major feature, then tears everything down. The platform itself is not mocked, so the tests exercise the actual production code path; only third-party services such as Discord and Resend are replaced by fake services.

Deploy (6 workers + infra) → Smoke Test (many steps × N) → Teardown (delete everything)

Run It

npm run cli-infra -- integration-test 3
Deploys, runs 3 concurrent smoke tests, tears down.

Private Only

Tests refuse --prod. They only run against private deployments, whose resources are all prefixed with the PRIVATE_PLATFORM value.

Always Tears Down

Teardown runs in a finally block, so even if tests fail, resources are always cleaned up.
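The guarantee above comes down to a try/finally around the test phases. The following is a minimal sketch, assuming hypothetical deploy, runSmokeTests, and teardown callbacks; these names are illustrative and are not taken from the actual CLI source. Placing deploy inside the try block is also an assumption, made so that a partial deploy still gets cleaned up.

```typescript
// Hypothetical phase callbacks; the real CLI's function names may differ.
interface Phases {
  deploy: () => Promise<void>;
  runSmokeTests: (concurrency: number) => Promise<void>;
  teardown: () => Promise<void>;
}

// Runs deploy, then the smoke tests, then teardown. Teardown sits in
// `finally`, so it runs whether the earlier phases resolve or throw.
// The original error (if any) still propagates to the caller.
async function runIntegrationTest(phases: Phases, concurrency: number): Promise<void> {
  try {
    await phases.deploy();
    await phases.runSmokeTests(concurrency);
  } finally {
    await phases.teardown();
  }
}
```

Because the error is rethrown after teardown, the command can still exit with a failure status while leaving no resources behind.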

Deploy

The deploy step provisions the entire Fling platform on Cloudflare. All resources are prefixed with the PRIVATE_PLATFORM value to avoid colliding with production or other developers' deployments.
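As a rough illustration of the prefixing scheme and the --prod refusal, the sketch below builds resource names from a prefix and rejects production runs. Both function names are hypothetical; only the `{prefix}-{name}` pattern and the refusal of --prod come from this page.

```typescript
// Builds a resource name such as "my-fling-api" from the PRIVATE_PLATFORM prefix.
function resourceName(prefix: string, suffix: string): string {
  if (prefix.trim() === "") {
    throw new Error("PRIVATE_PLATFORM must be set for private deployments");
  }
  return `${prefix}-${suffix}`;
}

// Hypothetical guard: rejects the --prod flag before any resource is touched.
function assertPrivateTarget(args: string[]): void {
  if (args.includes("--prod")) {
    throw new Error("integration tests refuse --prod");
  }
}
```

For example, `resourceName("my-fling", "api")` yields `my-fling-api`, which cannot collide with another developer's `their-fling-api`.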

Step 1

Validate environment

provision storage

{prefix}-platform

{prefix}-usage

Namespace

{prefix}-users

run D1 migrations

Config

Generate wrangler.toml for each worker

deploy in order

{prefix}-discord

Plugin worker

{prefix}-slack

Plugin worker

{prefix}-api

Platform API

{prefix}-dispatch

Request router

{prefix}-cron

Cron scheduler

{prefix}-email-inbound

Email receiver

set secrets + fake services

Ready

Poll /health until API responds

Dependency order matters. Plugin workers are deployed first because the API worker has service bindings (DISCORD_PLUGIN, SLACK_PLUGIN) to them. If deployed in the wrong order, the API worker would fail to bind and reject deploy requests.
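One way to derive such an order is a topological sort over the service bindings. The sketch below is illustrative only: the api → discord/slack edges come from the paragraph above, while the assumption that the other workers have no service bindings is made purely for the example.

```typescript
// Worker name suffix → suffixes of the workers it binds to.
// Only the api edges are documented; the empty lists are assumptions.
const bindings: Record<string, string[]> = {
  discord: [],
  slack: [],
  api: ["discord", "slack"],
  dispatch: [],
  cron: [],
  "email-inbound": [],
};

// Depth-first topological sort: every worker appears after its bindings.
function deployOrder(graph: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (name: string): void => {
    if (state.get(name) === "done") return;
    if (state.get(name) === "visiting") {
      throw new Error(`circular service binding at ${name}`);
    }
    state.set(name, "visiting");
    for (const dep of graph[name] ?? []) visit(dep);
    state.set(name, "done");
    order.push(name);
  };

  for (const name of Object.keys(graph)) visit(name);
  return order;
}
```

With this graph, both plugin workers always precede `api` in the result, regardless of the order in which the keys are declared.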

Smoke Test Steps

Each smoke test instance creates a test user, scaffolds a project with fixtures, deploys it, and runs an ordered series of verification steps. Every major Fling feature is exercised against real Cloudflare infrastructure.

Setup

Check Prereqs

CLI build exists

API Reachable

GET /health returns ok

Cleanup Users

Remove stale test users

Create User

st-{id}@test.com + token

Setup Project

Init, login, copy fixtures

Set Secrets

fling secrets set TEST_SECRET

Local Dev

Dev Server

Shared worker tests + cron, email

Deploy & Auth

Deploy Push

fling it → live URL

Resend Setup

Fake email service tenant

Discord Setup

Fake tenant + guild

Verify Whoami

CLI auth state check

HTTP & Frontend (shared tests via WorkerTestEnv — same logic as step 07)

Health Check

Worker health + secret

Frontend Test

Playwright browser check

Static Assets

/logo.svg serves correctly

SPA Fallback

Unknown routes → index.html

Backend Features (shared tests via WorkerTestEnv — same logic as step 07)

API Tests

Todo CRUD + SQL verify

WASM Test

5 + 7 = 12 via WASM

Storage Tests

R2 upload, download, list, delete

Presigned URLs

Direct R2 upload + download

Integrations

Discord Tests

Commands, messages, reactions

Email Tests

Receive + parse + store

Email Verification

Signup → verify → confirmed

Unverified Deploy

Must fail for unverified user

Cron & Ops

Cron Tests

Logs Test

fling logs --prod

Feedback Test

fling feedback submit

Usage Test

GraphQL analytics query

Multi-Project & Lifecycle

Multi-Project

Second project, both work

Signup Flow

Signup → verify

Slug Change

Rename + redirect check

Project Takedown

Delete + verify cleanup

Fixture project. Each smoke test instance deploys a purpose-built fixture with a Hono backend (todos, health, WASM, storage, presigned URLs, Discord, email handlers), a React frontend (displays secrets, uses Tailwind), a WASM module (add.wasm), and a static asset (logo.svg). The fixture exercises every Fling runtime feature.

Stress Test

The stress test runs N smoke test instances in parallel to verify the platform handles concurrent users. Each instance gets its own user, project, and ports, but they share fake service tenants to test real multi-tenant behavior.

Prepare

Build CLI + generate N random IDs

create shared tenants

Discord

1 tenant, N guilds

Resend

From deploy

launch N in parallel

Instance 0

ports 7654 / 8765

Instance 1

ports 7664 / 8775

Instance N

ports +10 each

await all

Summary

N/N passed — teardown test users
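The launch-and-await flow above can be sketched as follows. This is a simplified illustration, not the runner's actual code: `runInstance` is a hypothetical callback standing in for one full smoke test, and each failure is captured per instance so that one failing instance does not hide the results of the others.

```typescript
interface InstanceResult {
  index: number;
  ok: boolean;
  error?: string;
}

// Starts n instances at once and waits for all of them, converting each
// rejection into a failed result instead of aborting the whole run.
async function runStress(
  n: number,
  runInstance: (index: number) => Promise<void>,
): Promise<{ passed: number; total: number; results: InstanceResult[] }> {
  const results = await Promise.all(
    Array.from({ length: n }, (_, index) =>
      runInstance(index).then(
        (): InstanceResult => ({ index, ok: true }),
        (err: unknown): InstanceResult => ({ index, ok: false, error: String(err) }),
      ),
    ),
  );
  return { passed: results.filter((r) => r.ok).length, total: n, results };
}
```

The returned `passed` and `total` values are what a summary line of the form "N/N passed" would be built from.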

Port Isolation

Each instance gets unique local ports: backend 7654 + (i × 10), frontend 8765 + (i × 10). This lets multiple dev servers run simultaneously without port conflicts.
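The port formula can be written out directly. The base ports and the stride of 10 come from the text above; the function and constant names are illustrative.

```typescript
const BASE_BACKEND_PORT = 7654;
const BASE_FRONTEND_PORT = 8765;
const PORT_STRIDE = 10;

// Returns the local ports for smoke test instance i (0-based).
function portsForInstance(i: number): { backend: number; frontend: number } {
  if (!Number.isInteger(i) || i < 0) {
    throw new RangeError("instance index must be a non-negative integer");
  }
  return {
    backend: BASE_BACKEND_PORT + i * PORT_STRIDE,
    frontend: BASE_FRONTEND_PORT + i * PORT_STRIDE,
  };
}
```

Instance 1 therefore gets backend 7664 and frontend 8775, matching the diagram above.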

Test User Patterns

Main user: st-{id}@test.com, signup: signup-{id}@test.com, email verify: ev-{id}@test.com. Cleanup matches the st-*, inv-*, ev-* prefixes.

Diagnostics

When a smoke test step fails, the runner automatically dumps diagnostic information to help identify the root cause. Failures are reported with full context — what was expected, what was received, and what the system state looks like.

On Failure

The dumpDiagnostics function runs automatically when any step throws. It collects five categories of information to aid debugging.

Worker Logs

fling logs --prod --since 5m
Last 50 log entries from the deployed worker.

Database Tables

Queries sqlite_master to list all tables, confirming the schema was applied.

Cron Jobs

fling cron list --prod
Shows registered cron jobs and their schedules.

Cron History

fling cron history for the test cron job. Shows invocation timestamps, success/error counts, and error messages.

Dispatcher Diagnostics

Direct /diagnostics endpoint on the cron worker (private only). Shows isDue, nextRun, lastScheduledFor.

Retry Logic

HTTP requests use fetchWithRetry with exponential backoff. The health check polls up to 30 times with 2–10s delays. Deployment propagation gets built-in 3–5s delays.
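The actual signature of fetchWithRetry is not shown here, so the sketch below uses a generic, hypothetical `retryWithBackoff` helper to illustrate the idea. It assumes the delay doubles from 2s and is capped at 10s, which is one plausible reading of "2–10s delays"; the real schedule may differ.

```typescript
// Delay before the retry that follows attempt `attempt` (0-based):
// doubles from baseMs and is capped at maxMs. Assumed schedule.
function backoffDelayMs(attempt: number, baseMs = 2000, maxMs = 10000): number {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Calls `op` until it resolves or maxAttempts is reached, sleeping between
// attempts. `sleep` is injectable so the logic can be tested without waiting.
async function retryWithBackoff<T>(
  op: () => Promise<T>,
  maxAttempts = 30,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      // No sleep after the final attempt; the error is rethrown below.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  throw lastError;
}
```

A health poll would pass an `op` that requests /health and throws unless the response is ok.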

Preserving Evidence

On failure: test directory and token are preserved for manual debugging. On success: test dirs are cleaned up. Use --no-cleanup to always preserve.

Teardown

Teardown deletes all Cloudflare resources created during deploy. It runs in the integration test's finally block, so it always executes even if tests fail. Every step is non-fatal — individual failures are logged as warnings, and teardown continues.

Start

Begin teardown (non-fatal)

DNS records

Custom hostname

delete workers

API

Dispatch

Cron

Discord

Slack

delete namespace

D1 databases

KV namespace

R2 buckets

Done

All resources deleted

Non-fatal by design. If deleting the D1 database fails (for example, because it is already gone), teardown logs a warning and continues to delete KV, R2, and everything else. This keeps a single failed step from stranding the remaining resources. The teardown also refuses --prod as a safety measure.
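The non-fatal pattern can be sketched as a loop that catches each step's error individually. The `TeardownStep` shape and the `runTeardown` name are hypothetical and used only for illustration.

```typescript
interface TeardownStep {
  name: string;
  run: () => Promise<void>;
}

// Runs every step in order. A failing step is logged as a warning and
// recorded, but never prevents the following steps from running.
async function runTeardown(
  steps: TeardownStep[],
  warn: (message: string) => void = (message) => console.warn(message),
): Promise<string[]> {
  const failed: string[] = [];
  for (const step of steps) {
    try {
      await step.run();
    } catch (err) {
      failed.push(step.name);
      const reason = err instanceof Error ? err.message : String(err);
      warn(`teardown: ${step.name} failed: ${reason}`);
    }
  }
  return failed;
}
```

Returning the names of the failed steps lets the caller report what may need manual cleanup without turning a warning into a hard failure.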

Configuration

Integration tests are configured via .env variables. The PRIVATE_PLATFORM prefix isolates all resources so multiple developers can test simultaneously without conflicts.

Required

Core Variables

CLOUDFLARE_ACCOUNT_ID: your Cloudflare account ID
CLOUDFLARE_API_TOKEN: API token with Workers/D1/KV/R2 access
ADMIN_KEY: admin API authentication key
PRIVATE_PLATFORM: resource prefix (e.g., my-fling)
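A "validate environment" check for these four variables might look like the sketch below. The function name is hypothetical, and the environment is passed in as a plain object so the check stays easy to test; in a Node.js script the argument would typically be `process.env`.

```typescript
const REQUIRED_VARS = [
  "CLOUDFLARE_ACCOUNT_ID",
  "CLOUDFLARE_API_TOKEN",
  "ADMIN_KEY",
  "PRIVATE_PLATFORM",
] as const;

// Returns the names of required variables that are unset or blank.
function missingRequiredVars(env: Record<string, string | undefined>): string[] {
  return REQUIRED_VARS.filter((name) => !env[name]?.trim());
}
```

An empty result means the run can proceed; otherwise the returned names can be listed in a single error message before anything is deployed.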

Optional

Fake Services & Extras

FAKE_DISCORD_URL: mock Discord API
FAKE_RESEND_URL: mock Resend API
SLACK_API_TOKEN: Slack notifications
R2_ACCESS_KEY_ID: R2 storage access
DEV_DOMAIN: custom dev domain

CLI Options

--verbose shows subprocess output. --skip-deploy skips deployment. --skip-teardown skips teardown. --no-cleanup preserves test directories.

Private vs Production

Private: PRIVATE_PLATFORM=my-fling prefixes everything, uses workers.dev URLs. Production: fixed names, custom domain. Tests always refuse --prod.

Local Test

The local-test command runs the full dev server test suite without any Cloudflare credentials. It reuses the same shared worker tests and local-only tests that the smoke test's dev-server step runs, but in a standalone flow that only needs Node.js and npm.

No cloud credentials needed. The local test scaffolds a temporary project, starts fling dev, and exercises the entire local stack: health checks, secrets, CRUD, storage, presigned URLs, WASM, static assets, frontend HTML, Vite proxy, cron (list/trigger/history/failures), and email triggers.

Build CLI (npm run build) → Scaffold (init + fixtures + install) → Dev Server (fling dev + all tests) → Cleanup (remove temp dirs)

Run It

npm run cli-infra -- local-test
Add --verbose for detailed output.

No Credentials

No CLOUDFLARE_ACCOUNT_ID, ADMIN_KEY, or PRIVATE_PLATFORM required. Just Node.js and npm.

Custom Ports

--be-port 4000 and --fe-port 5000 override the default backend (7654) and frontend (8765) ports.

Cleanup behavior. On success, both project and config temp directories are deleted. On failure, the project directory is preserved for debugging while the config directory is cleaned up. Use --no-cleanup to always preserve both directories.
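The cleanup rules above reduce to a small decision table. The sketch below encodes them; the `planCleanup` name and the `CleanupPlan` shape are illustrative, not taken from the CLI source.

```typescript
interface CleanupPlan {
  removeProjectDir: boolean;
  removeConfigDir: boolean;
}

// Decides which temp directories to delete after a local-test run.
function planCleanup(succeeded: boolean, noCleanup: boolean): CleanupPlan {
  // --no-cleanup always preserves both directories.
  if (noCleanup) return { removeProjectDir: false, removeConfigDir: false };
  // The config dir is always removed; the project dir only on success,
  // so a failed run leaves it behind for debugging.
  return { removeProjectDir: succeeded, removeConfigDir: true };
}
```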

Workflow Testing

The workflow system has a layered test suite: unit tests for the runtime integration, stress tests for concurrency, and a comprehensive engine test suite in the vendored flingflow package. All workflow tests run in-memory using MemoryEventStore for fast, isolated execution — no external dependencies required.

Three test layers. The workflow runtime unit tests (28 tests) verify the Fling-side integration. The stress tests (6 tests) verify concurrency and throughput. The flingflow engine tests (82 tests) cover the core event-sourced engine, stores, recovery, context building, and deterministic simulation. Together they ensure workflows are correct under both normal and high-load conditions.

Unit Tests

Workflow Runtime — 28 tests

Located in src/workflow/__tests__/runtime.test.ts. Uses flingflow's MemoryEventStore for fast, isolated testing.

Workflow registration and discovery
Start workflow and retrieve results
Scratchpad read/write persistence
Duplicate workflow deduplication
NonRetryableError error handling
Max attempts and retry limits
Get and list workflow queries
Engine-not-initialized guard checks
Metadata extraction from events

Stress Tests

Concurrency & Throughput — 6 tests

Located in src/workflow/__tests__/stress.test.ts. Verifies correctness under concurrent load.

High-volume concurrent workflows (60+)
Dedup correctness under concurrency
Failure and retry under load
Scratchpad integrity across concurrent workflows
Mixed workflow types running simultaneously
Throughput measurement (~9000 workflows/sec)

flingflow Engine Tests

Core Engine — 82 tests across 7 suites

The vendored packages/flingflow/ package has its own comprehensive test suite covering the event-sourced workflow engine, event stores, recovery, and context building.

Engine Tests

test/engine.test.ts
Core engine lifecycle: register, start, complete, fail, retry, dedup.

Store Conformance

test/store-conformance.ts
Shared test suite run against both MemoryEventStore and SqliteEventStore.

Recovery Tests

test/recovery.test.ts
Stuck workflow detection, heartbeat timeout, recovery re-execution.

Context Tests

test/context.test.ts
Context building from event streams, state reconstruction.

Simulation Tests

test/simulation.test.ts
Deterministic simulation for reproducible workflow testing.

Store Tests

test/store-memory.test.ts test/store-sqlite.test.ts
Store-specific edge cases beyond conformance.

No D1EventStore test file. The D1EventStore adapter (src/worker-runtime/d1-event-store.ts) is tested indirectly through the workflow runtime tests and integration tests. It follows the same EventStore interface validated by flingflow's store conformance suite.
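The idea behind a store conformance suite is that one set of expectations runs against any implementation of the same interface. The sketch below illustrates the pattern only: `SketchEventStore` is a deliberately simplified, hypothetical interface, not flingflow's real EventStore, whose methods are not shown on this page.

```typescript
// Hypothetical, simplified types for illustration only.
interface SketchEvent {
  workflowId: string;
  type: string;
}

interface SketchEventStore {
  append(event: SketchEvent): void;
  read(workflowId: string): SketchEvent[];
}

class InMemorySketchStore implements SketchEventStore {
  private events: SketchEvent[] = [];

  append(event: SketchEvent): void {
    this.events.push(event);
  }

  read(workflowId: string): SketchEvent[] {
    return this.events.filter((e) => e.workflowId === workflowId);
  }
}

// Conformance check: takes a factory so every store starts empty, and
// returns the list of violated expectations (empty means conformant).
function checkConformance(makeStore: () => SketchEventStore): string[] {
  const failures: string[] = [];
  const store = makeStore();
  store.append({ workflowId: "a", type: "started" });
  store.append({ workflowId: "b", type: "started" });
  store.append({ workflowId: "a", type: "completed" });

  const a = store.read("a");
  if (a.length !== 2) failures.push("read must return only the requested workflow's events");
  if (a.map((e) => e.type).join(",") !== "started,completed") {
    failures.push("events must come back in append order");
  }
  if (store.read("missing").length !== 0) failures.push("unknown workflow must read as empty");
  return failures;
}
```

Any adapter that implements the same interface, including one backed by a database, could be passed to the same check through its own factory.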

Running Workflow Tests

Commands

Workflow tests can be run independently or as part of the full test suite.

Runtime Tests

npx vitest run src/workflow/__tests__/ — runs all 28 unit tests + 6 stress tests

flingflow Engine

cd packages/flingflow && npm test — runs all 82 engine tests

flingflow Stress CLI

cd packages/flingflow && npm run stress — runs the flingflow stress test CLI with throughput benchmarks

Full Suite

npm run test:run — runs all tests including workflow tests as part of the pre-commit check

Integration

Smoke Test: Workflow Execution

The same testWorkflow() function runs against both the local dev server (Step 7) and the deployed worker (Step 27), verifying end-to-end workflow execution in both environments.

The smoke test workflow has three step types: start (doubles the input), sleep (sleeps 10s per invocation, 60s total across 6 iterations), and persist (writes the result to D1). The 60s total sleep exceeds the 50s queue consumer execution budget, which forces a re-enqueue on the deployed worker and proves that the queue continuation path works on real infrastructure.
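The arithmetic behind the forced re-enqueue can be checked with a small model. This sketch assumes that a consumer invocation stops and re-enqueues whenever the next sleep iteration would push it past its budget; the real engine's scheduling may differ, and the function name is illustrative.

```typescript
// Counts the consumer invocations needed to run `iterations` sleeps of
// `sleepSec` each under a per-invocation budget of `budgetSec`.
// Simplified model: only sleep time counts against the budget.
function invocationsNeeded(iterations: number, sleepSec: number, budgetSec: number): number {
  if (sleepSec > budgetSec) {
    throw new RangeError("a single sleep exceeds the execution budget");
  }
  let invocations = 1;
  let elapsed = 0;
  for (let i = 0; i < iterations; i++) {
    if (elapsed + sleepSec > budgetSec) {
      // Budget exhausted: re-enqueue and continue in a fresh invocation.
      invocations++;
      elapsed = 0;
    }
    elapsed += sleepSec;
  }
  return invocations;
}
```

Under this model, `invocationsNeeded(6, 10, 50)` is 2, so six 10s sleeps cannot finish in one invocation, whereas five would fit exactly within the 50s budget.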

The test verifies: workflow creation, step execution, scratchpad data surviving across re-enqueue, D1/SQLite persistence by run_id, deduplication, and all four CLI commands (fling workflow list, show, show -v, start).