diff --git a/Data-Model.-.md b/Data-Model.-.md new file mode 100644 index 0000000..4827e50 --- /dev/null +++ b/Data-Model.-.md @@ -0,0 +1,244 @@ +# Data Model + +| Meta | Value | +|------|-------| +| **Repo** | `xpltdco/fractafrag` | +| **Page** | `Data-Model` | +| **Audience** | developers, agents | +| **Last Updated** | 2026-04-04 | +| **Status** | current | + +## Overview + +Fractafrag uses PostgreSQL 16 with the **pgvector** extension for vector similarity search, **uuid-ossp** for UUID generation, and **pg_trgm** for trigram text search. The schema is bootstrapped from `db/init.sql` on first startup. ORM models are defined in `services/api/app/models/models.py` using SQLAlchemy 2 (async). + +## Entity Relationships + +```mermaid +erDiagram + users ||--o{ shaders : "authors" + users ||--o{ votes : "casts" + users ||--o{ engagement_events : "generates" + users ||--o{ desires : "creates" + users ||--o{ api_keys : "owns" + users ||--o{ bounty_tips : "tips" + users ||--o{ comments : "writes" + shaders ||--o{ shader_versions : "has versions" + shaders ||--o{ votes : "receives" + shaders ||--o{ engagement_events : "tracked" + shaders ||--o{ comments : "has" + shaders ||--|| shaders : "forked_from" + desires ||--o{ desire_clusters : "clustered" + desires ||--o{ bounty_tips : "receives tips" + desires }o--o| shaders : "fulfilled by" +``` + +## Tables + +### `users` + +| Column | Type | Nullable | Default | Description | +|--------|------|----------|---------|-------------| +| `id` | UUID | No | gen_random_uuid() | Primary key | +| `username` | TEXT | No | — | Unique display name | +| `email` | TEXT | No | — | Unique email address | +| `password_hash` | TEXT | No | — | bcrypt hash (12 rounds) | +| `role` | TEXT | No | `user` | `user`, `moderator`, `admin` | +| `trust_tier` | TEXT | No | `standard` | `standard`, `creator`, `trusted_api` | +| `is_system` | BOOLEAN | No | false | Platform system account flag | +| `stripe_customer_id` | TEXT | Yes | NULL | Stripe customer ID | +| `subscription_tier` | TEXT | No | `free` | `free`, `pro`, `studio` | +| `ai_credits_remaining` | INTEGER | No | 0 | AI generation credits | +| `taste_vector` | vector(512) | Yes | NULL | Learned taste embedding (pgvector) | +| `is_verified_creator` | BOOLEAN | No | false | Verified creator status | +| `verified_creator_at` | TIMESTAMPTZ | Yes | NULL | When creator was verified | +| `stripe_connect_account_id` | TEXT | Yes | NULL | Stripe Connect for payouts | +| `created_at` | TIMESTAMPTZ | No | now() | Registration time | +| `last_active_at` | TIMESTAMPTZ | Yes | NULL | Last activity timestamp | + +**System account:** UUID `00000000-0000-0000-0000-000000000001`, username `fractafrag`, role `admin` + +### `shaders` + +| Column | Type | Nullable | Default | Description | +|--------|------|----------|---------|-------------| +| `id` | UUID | No | gen_random_uuid() | Primary key | +| `author_id` | UUID FK | Yes | — | FK → users (ON DELETE SET NULL) | +| `title` | TEXT | No | — | Shader title | +| `description` | TEXT | Yes | NULL | Shader description | +| `glsl_code` | TEXT | No | — | Full GLSL fragment shader source | +| `status` | TEXT | No | `published` | `draft`, `published`, `archived` | +| `is_public` | BOOLEAN | No | true | Visibility flag | +| `is_ai_generated` | BOOLEAN | No | false | Was this generated by AI? | +| `is_system` | BOOLEAN | No | false | Platform-curated shader | +| `ai_provider` | TEXT | Yes | NULL | `anthropic`, `openai`, `ollama` | +| `system_label` | TEXT | Yes | NULL | e.g., `fractafrag-curated` | +| `thumbnail_url` | TEXT | Yes | NULL | Rendered thumbnail path | +| `preview_url` | TEXT | Yes | NULL | Rendered preview path | +| `render_status` | TEXT | No | `pending` | `pending`, `rendering`, `ready`, `failed` | +| `style_vector` | vector(512) | Yes | NULL | Visual style embedding | +| `style_metadata` | JSONB | Yes | NULL | `{chaos_level, color_temp, motion_type, ...}` | +| `tags` | TEXT[] | No | `{}` | Array of tag strings | +| `shader_type` | TEXT | No | `2d` | `2d`, `3d`, `audio-reactive` | +| `forked_from` | UUID FK | Yes | NULL | Self-FK → shaders (fork origin) | +| `current_version` | INTEGER | No | 1 | Latest version number | +| `view_count` | INTEGER | No | 0 | View counter | +| `score` | FLOAT | No | 0 | Cached hot score for feed ranking | +| `access_tier` | TEXT | No | `open` | Content access level | +| `source_unlock_price_cents` | INTEGER | Yes | NULL | Price to view source (creator economy) | +| `commercial_license_price_cents` | INTEGER | Yes | NULL | Commercial license price | +| `verified_creator_shader` | BOOLEAN | No | false | By a verified creator | +| `created_at` | TIMESTAMPTZ | No | now() | Creation time | +| `updated_at` | TIMESTAMPTZ | No | now() | Last update time | + +### `shader_versions` + +Immutable snapshots created on every shader update. + +| Column | Type | Nullable | Default | Description | +|--------|------|----------|---------|-------------| +| `id` | UUID | No | gen_random_uuid() | Primary key | +| `shader_id` | UUID FK | No | — | FK → shaders (ON DELETE CASCADE) | +| `version_number` | INTEGER | No | — | Sequential version number | +| `glsl_code` | TEXT | No | — | GLSL source at this version | +| `title` | TEXT | Yes | NULL | Title at this version | +| `description` | TEXT | Yes | NULL | Description at this version | +| `tags` | TEXT[] | Yes | NULL | Tags at this version | +| `style_metadata` | JSONB | Yes | NULL | Style data at this version | +| `change_note` | TEXT | Yes | NULL | What changed (e.g., "fixed color bleeding") | +| `thumbnail_url` | TEXT | Yes | NULL | Thumbnail at this version | +| `created_at` | TIMESTAMPTZ | No | now() | Version creation time | + +**Unique constraint:** `(shader_id, version_number)` + +### `votes` + +| Column | Type | Nullable | Default | Description | +|--------|------|----------|---------|-------------| +| `id` | UUID | No | gen_random_uuid() | Primary key | +| `user_id` | UUID FK | No | — | FK → users (CASCADE) | +| `shader_id` | UUID FK | No | — | FK → shaders (CASCADE) | +| `value` | SMALLINT | No | — | `-1` (downvote) or `+1` (upvote) | +| `created_at` | TIMESTAMPTZ | No | now() | Vote time | + +**Unique constraint:** `(user_id, shader_id)` + +### `engagement_events` + +Tracks user interactions for feed personalization. + +| Column | Type | Nullable | Default | Description | +|--------|------|----------|---------|-------------| +| `id` | UUID | No | gen_random_uuid() | Primary key | +| `user_id` | UUID FK | Yes | NULL | FK → users (SET NULL) | +| `session_id` | TEXT | Yes | NULL | Anonymous session tracker | +| `shader_id` | UUID FK | No | — | FK → shaders (CASCADE) | +| `event_type` | TEXT | No | — | `view`, `dwell`, `replay`, `share` | +| `dwell_secs` | FLOAT | Yes | NULL | Time spent viewing | +| `metadata` | JSONB | Yes | NULL | Event-specific data | +| `created_at` | TIMESTAMPTZ | No | now() | Event time | + +### `desires` + +Community bounty requests. + +| Column | Type | Nullable | Default | Description | +|--------|------|----------|---------|-------------| +| `id` | UUID | No | gen_random_uuid() | Primary key | +| `author_id` | UUID FK | Yes | NULL | FK → users (SET NULL) | +| `prompt_text` | TEXT | No | — | Description of desired shader | +| `prompt_embedding` | vector(512) | Yes | NULL | TF-IDF + SVD embedding | +| `style_hints` | JSONB | Yes | NULL | Visual style preferences | +| `tip_amount_cents` | INTEGER | No | 0 | Bounty amount in cents | +| `status` | TEXT | No | `open` | `open`, `fulfilled`, `expired` | +| `heat_score` | FLOAT | No | 1 | Demand indicator (cluster size) | +| `fulfilled_by_shader` | UUID FK | Yes | NULL | FK → shaders | +| `fulfilled_at` | TIMESTAMPTZ | Yes | NULL | Fulfillment time | +| `expires_at` | TIMESTAMPTZ | Yes | NULL | Expiration date | +| `created_at` | TIMESTAMPTZ | No | now() | Creation time | + +### `desire_clusters` + +Groups similar desires by embedding proximity. + +| Column | Type | Description | +|--------|------|-------------| +| `cluster_id` | UUID | Cluster identifier | +| `desire_id` | UUID FK | FK → desires (CASCADE) | +| `similarity` | FLOAT | Cosine similarity to cluster representative | + +**Primary key:** composite `(cluster_id, desire_id)` + +### `bounty_tips` + +| Column | Type | Nullable | Default | Description | +|--------|------|----------|---------|-------------| +| `id` | UUID | No | gen_random_uuid() | Primary key | +| `desire_id` | UUID FK | No | — | FK → desires (CASCADE) | +| `tipper_id` | UUID FK | Yes | NULL | FK → users (SET NULL) | +| `amount_cents` | INTEGER | No | — | Tip amount | +| `stripe_payment_intent_id` | TEXT | Yes | NULL | Stripe payment reference | +| `status` | TEXT | No | `held` | `held`, `paid`, `refunded` | +| `created_at` | TIMESTAMPTZ | No | now() | Tip time | + +### `api_keys` + +MCP client authentication keys. + +| Column | Type | Nullable | Default | Description | +|--------|------|----------|---------|-------------| +| `id` | UUID | No | gen_random_uuid() | Primary key | +| `user_id` | UUID FK | No | — | FK → users (CASCADE) | +| `key_hash` | TEXT | No | — | bcrypt hash (unique) | +| `key_prefix` | TEXT | No | — | First 16 chars for identification | +| `name` | TEXT | Yes | NULL | User-given name | +| `trust_tier` | TEXT | No | `probation` | `probation`, `standard`, `verified` | +| `submissions_approved` | INTEGER | No | 0 | Approved submission count | +| `rate_limit_per_hour` | INTEGER | No | 10 | Allowed requests/hour | +| `last_used_at` | TIMESTAMPTZ | Yes | NULL | Last usage time | +| `created_at` | TIMESTAMPTZ | No | now() | Creation time | +| `revoked_at` | TIMESTAMPTZ | Yes | NULL | Revocation time (soft delete) | + +### `generation_log`, `comments`, `source_unlocks`, `creator_payouts`, `creator_engagement_snapshots` + +These tables exist in the schema but are stubs for future milestones (M4-M5). See `db/init.sql` for full definitions. + +## Indexes + +### Performance Indexes +- `shaders(score DESC)` — Feed ranking +- `shaders(created_at DESC)` — New feed +- `shaders(tags)` — GIN index for tag filtering +- `shaders(render_status)` — Filter rendered shaders +- `shaders(status, is_public)` — Published shader queries +- `shaders(title)` — Trigram text search (pg_trgm) + +### Vector Indexes (HNSW) +- `shaders(style_vector)` — Visual similarity (cosine ops) +- `users(taste_vector)` — Taste-based recommendations (cosine ops) +- `desires(prompt_embedding)` — Desire clustering (cosine ops) + +### Relationship Indexes +- `votes(user_id, shader_id)` — Quick vote lookup +- `engagement_events(user_id, shader_id, session_id)` — Engagement queries +- `desires(status, heat_score)` — Bounty board ranking +- `api_keys(user_id, key_prefix)` — Key lookup + +## Migration Strategy + +The initial schema is bootstrapped from `db/init.sql` which runs as a Docker entrypoint script on first PostgreSQL startup. Subsequent migrations use **Alembic** (configured in the API service). The ORM models in `app/models/models.py` mirror the init.sql schema. + +## Hot Score Calculation + +The `score` column on shaders is recalculated on each vote change using Wilson score lower bound with time decay: + +``` +wilson_lower = (p + z²/2n - z√(p(1-p)/n + z²/4n²)) / (1 + z²/n) + where p = upvotes/total, z = 1.96 (95% CI), n = total votes + +time_decay = 1 / (1 + age_hours/48) + +score = wilson_lower * time_decay +``` + +This gives shaders a ~3-day half-life while still rewarding high vote ratios. \ No newline at end of file