# Qapha — safety model & privacy posture

> *isiZulu / isiXhosa: **qapha** — "be watchful, be alert."*
> The third CongoSky sibling. **Khuluma** speaks. **Lalela** plays. **Qapha** watches
> the road for the child — never the child for the parent.

This is the canonical design for Qapha: an AI-moderated, age-appropriate chat
environment for children where the guardian **does not read the chats** and is
alerted **only** to real safety signals — self-harm, grooming/predatory contact,
abuse disclosure, and targeted cruelty.

It exists because of a real parent's ask: *"I don't need to see their chats. I
just need to know about self-harm, or an uncle, or someone on Roblox."* That
sentence is the whole product. Everything below serves it.

---

## 0. The one rule

**The guardian sees `category + severity + recommended action`. Never the
message text.** A child's words stay with the child. The only exception is an
explicit, logged **break-glass** (§6) when a life is at risk.

If a design choice would put a child's verbatim words in front of a parent by
default, it is the wrong choice. The safest record is the one we never surface.
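
One way to make this rule hard to break by accident is to build the guardian-facing alert from an explicit allow-list of fields, rather than by deleting the text from a larger object. The sketch below is illustrative only: `toGuardianAlert`, `GUARDIAN_FIELDS`, `raisedAt`, and the signal field names are assumptions, not Qapha's actual schema.

```javascript
// Hypothetical sketch: field names are illustrative, not Qapha's real schema.
// Only allow-listed fields are copied, so message text cannot leak through
// just because someone added a new property to the internal signal.
const GUARDIAN_FIELDS = ["category", "severity", "action"];

function toGuardianAlert(signal, now = new Date()) {
  const alert = { raisedAt: now.toISOString() };
  for (const field of GUARDIAN_FIELDS) {
    if (signal[field] === undefined) {
      throw new Error(`signal is missing required field: ${field}`);
    }
    alert[field] = signal[field];
  }
  return alert;
}

// Usage: the internal signal may carry the text; the alert never does.
const internalSignal = {
  category: "self_harm",
  severity: "red",
  action: "Check in calmly and share crisis resources.",
  text: "(the child's words, never copied into the alert)",
};
const guardianAlert = toGuardianAlert(internalSignal);
console.log(Object.keys(guardianAlert));
```

An allow-list fails closed: a field that nobody listed is simply never sent.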

---

## 1. Why "don't show the parent the chat" is the safety feature

It is counter-intuitive, so state it plainly:

- A child who knows a parent reads everything **stops telling the truth** — to
  the parent and to the AI that could actually help. The disclosure that saves
  them ("my uncle…") only happens in a space the child believes is theirs.
- Surveillance a child can feel is surveillance a child **routes around** —
  straight back to the free, unmoderated apps Qapha is meant to replace.
- So Qapha keeps the child's trust on purpose. The AI is a confidant; the
  classifier is a smoke alarm wired to the parent. The child gets support in the
  moment; the parent gets a signal, not a transcript.

This is the same posture as Lalela's "listen *for* them, not *on* them" and
Khuluma's "the safest record is the one we never keep."

---

## 2. Risk tiers

Every message is classified into exactly one tier. The tier decides what (if
anything) the guardian experiences.

| Tier | Meaning | Guardian experience |
|------|---------|---------------------|
| 🟢 **green** | Ordinary, safe chat. | **Nothing.** Stays private. This is most messages. |
| 🟠 **amber** | Worth watching: low mood, mean exchange, a personal detail shared, mild risk. | No alert. Increments a **trend**. Repeated amber in a short window → a gentle "check in with your kid" nudge. |
| 🔴 **red** | A real safety signal. | **Alert now:** category, plain-language meaning, recommended next step, crisis resources. |
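
The amber trend can be pictured as a sliding-window count. This is a minimal sketch: the 24-hour window, the threshold of three, and the name `shouldNudgeGuardian` are assumptions chosen for illustration, since the table does not fix what "repeated" or "short window" mean.

```javascript
// Hypothetical values: neither the window nor the threshold comes from Qapha.
const AMBER_WINDOW_MS = 24 * 60 * 60 * 1000;
const AMBER_NUDGE_THRESHOLD = 3;

// amberTimestamps: millisecond timestamps of past amber verdicts for one child.
// Returns true when enough of them fall inside the window ending at nowMs.
function shouldNudgeGuardian(amberTimestamps, nowMs) {
  const recent = amberTimestamps.filter(
    (t) => t <= nowMs && nowMs - t <= AMBER_WINDOW_MS
  );
  return recent.length >= AMBER_NUDGE_THRESHOLD;
}

const HOUR_MS = 60 * 60 * 1000;
const currentMs = 1000 * HOUR_MS; // arbitrary "now" for the example
const spreadOut = [currentMs - 90 * HOUR_MS, currentMs - 50 * HOUR_MS, currentMs - 2 * HOUR_MS];
const clustered = [currentMs - 5 * HOUR_MS, currentMs - 3 * HOUR_MS, currentMs - 1 * HOUR_MS];

console.log(shouldNudgeGuardian(spreadOut, currentMs)); // false: one amber in the window
console.log(shouldNudgeGuardian(clustered, currentMs)); // true: three ambers in the window
```

Only timestamps are needed for the nudge, which fits the rule that the trend carries no message text.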

The overall tier of a message is the **worst** signal it contains. Tiers are
deliberately conservative on the red line: for child safety we accept a higher
false-positive rate on red than we would for an adult product.
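
The "worst signal wins" rule amounts to taking a maximum over an ordered set of tiers. A minimal sketch, assuming each signal carries its own `tier` (the names `TIER_RANK` and `overallTier` are illustrative):

```javascript
// Tier ordering from least to most severe.
const TIER_RANK = { green: 0, amber: 1, red: 2 };

// Returns the worst tier among a message's signals; no signals means green.
function overallTier(signals) {
  return signals.reduce(
    (worst, s) => (TIER_RANK[s.tier] > TIER_RANK[worst] ? s.tier : worst),
    "green"
  );
}

console.log(overallTier([
  { category: "low_mood", tier: "amber" },
  { category: "self_harm", tier: "red" },
])); // red
console.log(overallTier([])); // green
```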

---

## 3. Categories

Grouped by the tier they normally raise. `who` = whose message the rule applies
to: `child` (the child's own messages), `other` (someone messaging the child,
inbound), or `any` (either direction).

### Raises 🔴 red

| Category | who | What it catches |
|----------|-----|-----------------|
| `self_harm` | child | Suicidal ideation, self-injury, "better off without me," methods. |
| `abuse_disclosure` | child | A child disclosing that a known adult is hurting them — *the "an uncle" case.* Family member/coach/teacher + harm or secrecy or "comes into my room at night." |
| `grooming_secrecy` | other | "Don't tell your mum," "our secret," "delete these messages." |
| `grooming_move_platform` | other | "Add me on Snap/Discord/WhatsApp," "let's go private," "turn on your camera." |
| `grooming_meet` | other | "Let's meet up," "where do you live," "don't bring your parents." |
| `grooming_sexual_solicit` | other | Requests for photos, "are you home alone," sexual content aimed at a child. |
| `directed_harm` | other | Telling the child to harm themselves ("kys," "go die"). |

### Raises 🟠 amber (watch / trend)

| Category | who | What it catches |
|----------|-----|-----------------|
| `low_mood` | child | Sadness, loneliness, "nobody likes me." |
| `grooming_flattery_isolation` | other | "You're so mature," "I'm the only one who gets you," "how old are you?" — the soft, pre-predatory register. |
| `pii_disclosure` | child | Sharing address, school, phone, "I'm home alone." |
| `bullying` | any | Insults and cruelty in either direction. |
| `violence` | child | Talk of weapons or hurting others (context matters). |

### Escalation: alert on the *pattern*, not the word

Predators don't say one scary word; they run a *sequence*. Qapha models that:

- **Two or more grooming flags** in a conversation → the whole exchange is **red**,
  even if some lines alone would only be amber. ("How old are you?" + "don't tell
  your mum" + "send a pic" = predator.)
- **PII shared inside a grooming conversation** → red, not a shrug. The address
  matters more when someone's been steering toward it.
- A `needsContext` flag (e.g. a bare "how old are you?") **demotes to amber**
  unless a second grooming flag corroborates — this is how we keep an innocent
  classmate question out of the red zone.

See `demo/moderation.js` for the executable version of every rule above.
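
Independently of that file, the three escalation rules can be sketched in a few lines. This is an illustration under stated assumptions, not the code in `demo/moderation.js`: each signal is assumed to look like `{ category, tier, needsContext? }`, and a `needsContext` flag on its own is assumed not to count as a "grooming conversation" for the PII rule.

```javascript
// Hypothetical sketch of the escalation rules; the real rules live in
// demo/moderation.js and may differ.
const SEVERITY = { green: 0, amber: 1, red: 2 };
const isGrooming = (s) => s.category.startsWith("grooming_");

// signals: every flag raised so far in one conversation.
function conversationTier(signals) {
  const grooming = signals.filter(isGrooming);
  // Assumption: an uncorroborated needsContext flag does not confirm grooming.
  const confirmedGrooming = grooming.filter((s) => !s.needsContext);
  const hasPii = signals.some((s) => s.category === "pii_disclosure");

  // Rule 1: two or more grooming flags make the whole exchange red.
  if (grooming.length >= 2) return "red";
  // Rule 2: PII shared inside a grooming conversation is red.
  if (hasPii && confirmedGrooming.length >= 1) return "red";

  // Rule 3: a lone needsContext flag counts as amber; otherwise take the
  // worst individual tier.
  let worst = "green";
  for (const s of signals) {
    const tier = s.needsContext ? "amber" : s.tier;
    if (SEVERITY[tier] > SEVERITY[worst]) worst = tier;
  }
  return worst;
}

const bareQuestion = { category: "grooming_flattery_isolation", tier: "amber", needsContext: true };
const flattery = { category: "grooming_flattery_isolation", tier: "amber" };
const address = { category: "pii_disclosure", tier: "amber" };

console.log(conversationTier([bareQuestion]));           // amber
console.log(conversationTier([bareQuestion, flattery])); // red
console.log(conversationTier([flattery, address]));      // red
console.log(conversationTier([address]));                // amber
```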

---

## 4. Architecture

```
child's device                      CongoSky / Yama API                 OpenRouter
┌────────────────────┐  message   ┌───────────────────────┐  classify  ┌──────────┐
│ Qapha chat client  │ ─────────► │ /qapha/classify       │ ─────────► │ safety   │
│ • on-device pass   │            │ • merge local+model   │            │ model    │
│   (moderation.js)  │ ◄───────── │ • take HIGHER tier    │ ◄───────── │ (JSON)   │
│ • companion reply  │  verdict   │ • write alert/trend   │            └──────────┘
└────────────────────┘            └───────────┬───────────┘
        ▲                                      │ red/amber only
        │ caring reply, no flags               ▼
   the child                          ┌───────────────────┐
                                      │ guardian dashboard │  category + action,
                                      │ (push / SMS)       │  NEVER the text
                                      └───────────────────┘
```

**Two layers, on purpose:**

1. **On-device first pass** (`moderation.js`) — pattern matching that runs in the
   browser, offline, with negligible added latency. It catches the obvious instantly, works with no
   network, and is fully auditable (you can read every rule). It is also the
   fallback when the model is unreachable: the child is never unprotected.
2. **Safety model** (server-side, OpenRouter) — the nuanced read: sarcasm, coded
   language, slang, multi-turn grooming, languages the regex doesn't cover. The
   model **sees the message to classify it**; the *guardian* sees only the
   verdict. Take the **higher** (more cautious) tier of the two layers.
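
The two-layer behaviour (take the more cautious tier, fall back to the local verdict when the model is unreachable) might be sketched as below. The function names and the way the two classifiers are passed in are assumptions for illustration; the same merge could run wherever the two verdicts meet.

```javascript
const CAUTION = { green: 0, amber: 1, red: 2 };

// Merge two verdicts of shape { tier, signals } by taking the more cautious
// tier and keeping the signals from both layers.
function mergeVerdicts(local, remote) {
  const tier = CAUTION[remote.tier] > CAUTION[local.tier] ? remote.tier : local.tier;
  return { tier, signals: [...local.signals, ...remote.signals] };
}

// localClassify is synchronous; remoteClassify returns a Promise and may
// reject (network down, timeout). Both are hypothetical stand-ins supplied
// by the caller.
async function classifyWithFallback(message, localClassify, remoteClassify) {
  const local = localClassify(message);
  try {
    const remote = await remoteClassify(message);
    return mergeVerdicts(local, remote);
  } catch (err) {
    // Model unreachable: the on-device verdict still protects the child.
    return local;
  }
}

// Usage with stubs: the remote layer is offline, so the local verdict stands.
const localStub = () => ({ tier: "amber", signals: [{ category: "low_mood", tier: "amber" }] });
const offlineRemote = () => Promise.reject(new Error("offline"));
classifyWithFallback("hello", localStub, offlineRemote).then((v) => console.log(v.tier)); // amber
```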

The browser **never holds the model key.** It calls our endpoint; our endpoint
calls OpenRouter. Same boundary CongoSky uses everywhere.

### The classifier contract

`classify({ text, from }) → { tier, signals[], redacted }` — identical shape from
both the local heuristics and the remote model, so they're interchangeable and
mergeable. Server prompt sketch:

```
system: You are a child-safety classifier for a kids' chat product.
Output ONLY the JSON schema. NEVER echo or quote the child's words.
Tiers: green | amber | red. Categories: self_harm, abuse_disclosure,
grooming_secrecy, grooming_move_platform, grooming_meet,
grooming_sexual_solicit, grooming_flattery_isolation, directed_harm,
low_mood, pii_disclosure, bullying, violence.
Escalate to red when ≥2 grooming flags co-occur. Be conservative: when a
child's safety is plausibly at stake, choose the higher tier.
response_format: json_schema { tier, signals:[{category,tier,label,action}], confidence }
```

`demo/moderation.js → llmClassify()` is the working drop-in seam.
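
Because the model's JSON arrives over the network, it may be worth checking it against the contract before merging. The validator below is a sketch under assumptions: `verdictProblems` is a hypothetical helper, and it checks only `tier` and `signals[]`, using the tier and category names listed in the prompt sketch above.

```javascript
const VALID_TIERS = new Set(["green", "amber", "red"]);
const VALID_CATEGORIES = new Set([
  "self_harm", "abuse_disclosure", "grooming_secrecy", "grooming_move_platform",
  "grooming_meet", "grooming_sexual_solicit", "grooming_flattery_isolation",
  "directed_harm", "low_mood", "pii_disclosure", "bullying", "violence",
]);

// Returns a list of problems; an empty list means the verdict is usable.
function verdictProblems(verdict) {
  if (!verdict || typeof verdict !== "object") return ["verdict is not an object"];
  const problems = [];
  if (!VALID_TIERS.has(verdict.tier)) problems.push(`unknown tier: ${verdict.tier}`);
  if (!Array.isArray(verdict.signals)) {
    problems.push("signals must be an array");
    return problems;
  }
  verdict.signals.forEach((s, i) => {
    if (!s || typeof s !== "object") {
      problems.push(`signals[${i}] is not an object`);
      return;
    }
    if (!VALID_CATEGORIES.has(s.category)) problems.push(`signals[${i}]: unknown category`);
    if (!VALID_TIERS.has(s.tier)) problems.push(`signals[${i}]: unknown tier`);
  });
  return problems;
}

const goodVerdict = { tier: "red", signals: [{ category: "self_harm", tier: "red" }] };
const badVerdict = { tier: "purple", signals: [{ category: "spam", tier: "red" }] };
console.log(verdictProblems(goodVerdict).length); // 0
console.log(verdictProblems(badVerdict).length);  // 2
```

A verdict that fails the check could be treated like an unreachable model, leaving the on-device verdict in force.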

---

## 5. What we store (and refuse to)

| Stored | Never stored as a guardian-visible record |
|--------|-------------------------------|
| Alert category, severity, timestamp | Verbatim chat as a parent-readable log |
| Trend counters (e.g. amber/day) | A browsable transcript for the guardian |
| Child + guardian account links | Behavioural ad profile of the child |
| Audited break-glass events | Anything sold, ever |

Raw messages may transit the classifier and exist transiently for the child's own
session/companion context, governed by a tight retention window — but they are
**not** a guardian-facing store. CongoSky's platform AI boundary (classify and
redact PII before model context, re-check output before persistence) applies.
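
As a rough picture of the "redact PII before model context" step, the sketch below masks two obvious patterns. It is deliberately naive and entirely hypothetical: `redactPii` and both regular expressions are illustrative, real PII detection needs far more than two patterns, and nothing here describes CongoSky's actual boundary.

```javascript
// Hypothetical patterns for illustration only.
const PII_PATTERNS = [
  { label: "[email]", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "[phone]", pattern: /\+?\d[\d\s-]{7,}\d/g },
];

// Replaces each match with a placeholder so the raw value never reaches
// whatever consumes the returned string.
function redactPii(text) {
  return PII_PATTERNS.reduce(
    (out, { label, pattern }) => out.replace(pattern, label),
    text
  );
}

console.log(redactPii("mail me at sam@example.com or call 082 555 0100"));
// mail me at [email] or call [phone]
```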

---

## 6. Break-glass (the audited exception)

When a red signal indicates imminent danger (active self-harm intent, an
in-progress meet-up), a guardian may need more than a category. Break-glass
reveals limited context. It is:

- **Explicit** — a deliberate action, never the default view.
- **Logged** — every reveal is recorded (who, when, which alert).
- **Minimised** — the least context needed to act, not the whole history.
- **Escalating, by design** — for an older child's privacy, a reveal can require
  a second guardian or a safeguarding professional to co-sign (configurable per
  family/age). The demo shows the button and states the reveal would be logged.

Break-glass is a fire axe behind glass: there when a life is on the line,
embarrassing to break for no reason, and it leaves a mark.
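
The "explicit, logged, co-signed" properties can be sketched as a small gate in front of any reveal. Everything named here is an assumption for illustration: `requestReveal`, the `requiredApprovals` policy field, and the in-memory `auditLog`. Logging refused attempts as well as granted ones is a choice made in this sketch, not something the design above requires.

```javascript
// Hypothetical sketch: the policy shape and field names are assumptions.
const auditLog = [];

// policy.requiredApprovals: how many distinct approvers a reveal needs (1 or 2).
function requestReveal({ alertId, guardianId, coSignerId = null }, policy, atMs = Date.now()) {
  // A Set ignores a co-signer who is the same person as the guardian.
  const approvers = new Set([guardianId, coSignerId].filter(Boolean));
  const granted = approvers.size >= policy.requiredApprovals;
  // Record who, when, and which alert for every attempt.
  auditLog.push({ alertId, approvers: [...approvers], granted, at: atMs });
  return granted;
}

// Usage: an older child's policy requires a second approver.
const olderChildPolicy = { requiredApprovals: 2 };
console.log(requestReveal({ alertId: "a1", guardianId: "g1" }, olderChildPolicy)); // false
console.log(requestReveal({ alertId: "a1", guardianId: "g1", coSignerId: "s1" }, olderChildPolicy)); // true
console.log(auditLog.length); // 2
```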

---

## 7. Honest limits

- **This is not a safeguarding service.** Automated moderation misses things and
  over-triggers. Qapha is a smoke alarm, not a fire brigade. The caring adult is
  still the plan.
- **Heuristics are not understanding.** The on-device pass is pattern matching;
  it is fooled by novel slang and trips on innocent phrasing. The model layer
  helps; it is not perfect either.
- **Mandatory reporting & local law.** Abuse disclosures may carry legal
  reporting duties depending on jurisdiction. Qapha surfaces resources (Childline
  SA 116, SAPS 10111, SADAG 0800 567 567) and is **not** a substitute for them.
- **Consent & age.** Children should know an AI is keeping them safe and that a
  trusted adult is told about danger — Qapha is honest with the child that it
  watches for danger, even as it keeps everyday chat private. The exact balance
  is set per family and per child's age.

---

## 8. Build status

| Piece | State | Where |
|-------|-------|-------|
| Concept / marketing page | ✅ done | `qapha/index.html` |
| Working demo (offline) | ✅ done | `qapha/demo/` |
| Tiered classifier (heuristic) | ✅ done | `qapha/demo/moderation.js` |
| Kid companion (stub) | ✅ done | `qapha/demo/companion.js` |
| Server `/qapha/classify` (OpenRouter) | 🟡 reference written | `qapha/server/classify.py` — drop into platform repo (`arjuna-badger-platform`) |
| Guardian dashboard (real, push/SMS) | ⬜ next | platform repo |
| Auth / family linking | ⬜ next | Auth0 + Neon |

**Next build step:** mount the `/qapha/classify` reference on the CongoSky/Yama
API (`app.include_router(qapha_router)`). The reference lives in
`qapha/server/classify.py` (FastAPI + OpenRouter, JSON-schema structured output,
default model `anthropic/claude-haiku-4-5`), and `llmClassify()` already points
at the endpoint. The demo's local pass is the offline fallback; everything speaks
the same `{tier, signals[]}` contract. See `qapha/server/README.md`.

---

*Questions: info@congosky.cloud. Qapha is part of CongoSky — the sovereign cloud
for Africa. Umuntu ngumuntu ngabantu.*