Apna Data Export Format
Effective Date: [v1.0 launch]
Last Updated: 2026-05-21 (draft)
Foundational Principle: §13.3 acquisition-survival
Why this document exists
Apna's foundational principle §13.3 ("Acquisition-survival guarantees") commits Apna to the following: users should be able to leave with everything they came in with, intact, in formats no future owner can lock down.
This document specifies the exact format of the ZIP file that Apna produces when you tap Settings → Privacy → Export Data. We commit to keeping this format stable across versions: newer versions of Apna may add new sections to the ZIP, but never break the format of existing sections.
ZIP structure
```text
apna-export-<your-handle>-<YYYY-MM-DD>.zip
├── README.md                         # Human-readable index of contents
├── version.json                      # Export schema version + Apna app version
├── identity/
│   ├── did.txt                       # Your DID (public)
│   ├── ed25519-public.b58btc.txt     # Your ed25519 identity public key
│   ├── x25519-public.b58btc.txt      # Your x25519 static public key
│   └── private-keys.encrypted.bin    # AES-256-GCM encrypted with your device's unlock key (you decrypt locally)
├── memory/
│   ├── schema.cypher                 # LadybugDB schema as Cypher DDL
│   ├── data.cypher                   # Full data dump as Cypher CREATE/MERGE statements
│   ├── data.jsonld                   # Same data as JSON-LD (machine-readable)
│   └── data.md                       # Same data as human-readable Markdown
├── onboarding/
│   └── onboarding-state.json         # Your Phase 2 onboarding decisions (Mira personality, food prefs, etc.)
├── a2a/
│   ├── sessions/                     # Per-peer Double Ratchet session state (encrypted; useless without your keys)
│   │   └── <peer-did>.encrypted.bin
│   ├── transcripts/                  # Decrypted message transcripts (you decrypted them, so the export contains plaintext for your eyes)
│   │   └── <peer-did>/<conversation-id>.md
│   └── agent-card.json               # Your current published AgentCard
├── personal-voice/                   # Only present if Personal Voice is enabled
│   ├── embedding.encrypted.bin       # Your speaker embedding (encrypted; we cannot decrypt)
│   ├── training-metadata.json        # Date trained, duration, model version
│   └── ai-disclosure-log.json        # All cloned-voice usage events (timestamps + conversation refs)
├── subscriptions/                    # Pro tier billing history
│   └── pro-history.json
└── settings/
    ├── tier-assignments.json         # Phase 2 tier overrides
    ├── standing-rules.json           # Phase 2 standing rules (e.g., "always answer Mom")
    └── notification-preferences.json
```
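To make the layout concrete, here is a minimal Python sketch (standard library only) of how a reader might parse the archive name and group its members by top-level section. The function names are hypothetical and not part of any published Apna tool; only the naming pattern and directory layout come from this spec.

```python
import re
import zipfile
from collections import defaultdict

# Matches apna-export-<handle>-<YYYY-MM-DD>.zip. The date is anchored to the
# end of the name so that a handle containing hyphens is still captured whole.
EXPORT_NAME = re.compile(r"^apna-export-(?P<handle>.+)-(?P<date>\d{4}-\d{2}-\d{2})\.zip$")


def parse_export_name(filename: str) -> tuple[str, str]:
    """Return (handle, date) from an export file name, or raise ValueError."""
    match = EXPORT_NAME.match(filename)
    if match is None:
        raise ValueError(f"not an Apna export file name: {filename!r}")
    return match.group("handle"), match.group("date")


def entries_by_section(zf: zipfile.ZipFile) -> dict[str, list[str]]:
    """Group archive members by top-level directory; root files go under ''."""
    sections: dict[str, list[str]] = defaultdict(list)
    for name in zf.namelist():
        if name.endswith("/"):
            continue  # skip explicit directory entries
        top, sep, _ = name.partition("/")
        sections[top if sep else ""].append(name)
    return dict(sections)
```

For example, parse_export_name("apna-export-example-user-2026-05-21.zip") returns ("example-user", "2026-05-21"), where "example-user" is a made-up handle.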
Format guarantees
Each section's format is stable. Specifically:
version.json (always present)
```json
{
  "schema_version": "1.0",
  "apna_app_version": "1.0.0",
  "exported_at": "2026-05-21T22:15:00Z",
  "exported_from_platform": "ios",
  "user_did": "did:key:z6Mk..."
}
```
exported_from_platform is either "ios" or "android". schema_version is incremented whenever we add a new top-level directory. Within a major version (e.g., 1.x), every existing section keeps its format.
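A reader can turn that guarantee into a simple compatibility check. In this sketch the key names come from the version.json example, but check_version, SUPPORTED_MAJOR, and the rule "accept any 1.x, ignore unknown directories" are our assumed reading of the guarantee, not published tooling.

```python
import json

REQUIRED_KEYS = {"schema_version", "apna_app_version", "exported_at",
                 "exported_from_platform", "user_did"}
SUPPORTED_MAJOR = 1  # assumption: this hypothetical reader understands schema 1.x


def check_version(raw: str) -> dict:
    """Parse version.json and reject exports this reader cannot interpret."""
    info = json.loads(raw)
    if not isinstance(info, dict):
        raise ValueError("version.json must be a JSON object")
    missing = REQUIRED_KEYS - set(info)
    if missing:
        raise ValueError(f"version.json missing keys: {sorted(missing)}")
    if info["exported_from_platform"] not in ("ios", "android"):
        raise ValueError("exported_from_platform must be 'ios' or 'android'")
    major, _, _minor = info["schema_version"].partition(".")
    if int(major) != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported schema major version {major}")
    # Any 1.x export is accepted: newer minors only add top-level
    # directories, which this reader can simply ignore.
    return info
```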
memory/schema.cypher
The LadybugDB schema is documented in docs/superpowers/specs/2026-05-19-apna-design.md §3.2. The export contains the schema DDL verbatim, so a technically inclined user can reconstruct the schema on any compatible graph database (LadybugDB, Kuzu, or Neo4j with minor translation).
memory/data.cypher
Every Person, Episode, Fact, Rule, Decision, and Agent node, plus every edge, as Cypher CREATE/MERGE statements. The file can be loaded into a fresh LadybugDB instance via LOAD CYPHER FROM './data.cypher'.
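Because the dump is plain Cypher, an importer that cannot hand the whole file to the database at once could split it into individual statements and execute them one by one. The sketch below shows only the splitting step, assuming statements are separated by semicolons; split_cypher_statements is a hypothetical name, and the execution step depends on whichever database client you use.

```python
def split_cypher_statements(script: str) -> list[str]:
    """Split a Cypher script into statements on top-level semicolons.

    Semicolons inside '...' or "..." string literals and `...` quoted
    identifiers are ignored. Assumption: the dump has no // or /* */
    comments; a real importer would have to skip those as well.
    """
    statements: list[str] = []
    current: list[str] = []
    quote = ""        # the quote character we are currently inside, or ""
    escaped = False   # True right after a backslash inside a string literal
    for ch in script:
        if quote:
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\" and quote != "`":
                escaped = True
            elif ch == quote:
                quote = ""
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
        else:
            current.append(ch)
            if ch in ("'", '"', "`"):
                quote = ch
    tail = "".join(current).strip()  # a final statement without ';'
    if tail:
        statements.append(tail)
    return statements
```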
memory/data.jsonld
The same data as JSON-LD, using a schema we publish at https://apna.com/schema/memory-v1.jsonld. JSON-LD is a W3C standard, so any compliant tool can ingest it.
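A real consumer should run a full JSON-LD processor (for example PyLD in Python) so that the @context at the published schema URL is resolved. For a quick sanity check, though, the sketch below just counts nodes per @type. It assumes the nodes sit in a top-level "@graph" array (or that the document is a top-level array of nodes), which this spec does not state; count_node_types is a hypothetical helper.

```python
import json
from collections import Counter


def count_node_types(jsonld_text: str) -> Counter:
    """Count nodes per @type without resolving the @context.

    Assumption: nodes live in a top-level "@graph" array, the document is a
    top-level array of nodes, or the document is a single node object.
    """
    doc = json.loads(jsonld_text)
    nodes = doc.get("@graph", [doc]) if isinstance(doc, dict) else doc
    counts: Counter = Counter()
    for node in nodes:
        types = node.get("@type", [])
        if isinstance(types, str):  # @type may be a string or a list
            types = [types]
        counts.update(types)
    return counts
```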
memory/data.md
The same data as human-readable Markdown: Person nodes go in a "## People" section, Episodes in an "### Events" timeline, and Facts under "### Things I remember". This is the format you can read directly, without any tooling.
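The label-to-section mapping can be sketched in a few lines of Python. Only the three headings come from the text above; the input shape ({"label", "text", optional "at"}), the bullet layout, and render_memory_markdown itself are illustrative assumptions.

```python
# Section headings as described in the spec; everything else is assumed.
SECTIONS = [
    ("Person", "## People"),
    ("Episode", "### Events"),
    ("Fact", "### Things I remember"),
]


def render_memory_markdown(nodes: list[dict]) -> str:
    """Render nodes shaped like {"label": ..., "text": ..., "at": ISO-8601?}."""
    lines: list[str] = []
    for label, heading in SECTIONS:
        items = [n for n in nodes if n["label"] == label]
        if not items:
            continue
        if label == "Episode":
            # A timeline reads oldest-first; ISO-8601 UTC strings in one
            # shared format sort correctly as plain strings.
            items.sort(key=lambda n: n.get("at", ""))
        lines.append(heading)
        lines.append("")
        for n in items:
            prefix = f"{n['at']}: " if "at" in n else ""
            lines.append(f"- {prefix}{n['text']}")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```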
onboarding/onboarding-state.json
Follows the Phase 2 onboarding-state-v1 JSON schema, documented at docs/implementation/golden/onboarding-state-v1.json in the Apna repo.
a2a/sessions/<peer-did>.encrypted.bin
Your Double Ratchet SessionState for each peer you've messaged. On device, it is encrypted at rest with FileProtectionType.completeUnlessOpen (iOS) or EncryptedFile (Android). The export ZIP contains the encrypted blob; only your device's keys can decrypt it.
Why ship it if it's encrypted? So you can import it on another device you own (e.g., a new phone) and continue conversations with peers without re-establishing the X3DH handshake.
a2a/transcripts/<peer-did>/<conversation-id>.md
The plaintext message transcripts for each conversation. Your device has already decrypted these messages, so exporting them simply writes what you already see in the app to a file. Each transcript is Markdown, with a timestamp, sender, and content for every message.
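As an illustration of "timestamp + sender + content", the sketch below renders one conversation to Markdown. The Message type, the bold-sender line layout, and the conversation heading are assumptions for illustration, not the exact format Apna writes.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Message:
    sent_at: datetime  # must be timezone-aware
    sender: str        # display name or DID; which one Apna uses is not specified
    content: str


def render_transcript(conversation_id: str, messages: list[Message]) -> str:
    """Render messages oldest-first as Markdown, one paragraph per message."""
    lines = [f"# Conversation {conversation_id}", ""]
    for m in sorted(messages, key=lambda m: m.sent_at):
        stamp = m.sent_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
        lines.append(f"**{m.sender}** ({stamp}): {m.content}")
        lines.append("")
    return "\n".join(lines)
```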
personal-voice/embedding.encrypted.bin
Your ECAPA-TDNN speaker embedding (a 192-dimensional float32 vector), encrypted with a key backed by the Secure Enclave (iOS) or StrongBox (Android). The Apna app on a new device you own can re-import it without re-running the 15-minute training flow.
Important: This file is useless to anyone other than you. The encryption key lives in your device's secure hardware and is bound to your biometric authentication. We cannot decrypt it, and neither can an attacker; only you, with biometric unlock, can.
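The embedding itself is small: 192 float32 values take 192 × 4 = 768 bytes, before the encryption scheme adds its own overhead (nonce, tag, or similar). A minimal Python sketch of that serialization follows; it assumes a little-endian float32 layout for the decrypted payload, which this spec does not state, and pack_embedding/unpack_embedding are hypothetical names.

```python
import struct

EMBEDDING_DIM = 192  # ECAPA-TDNN embedding size stated in the spec


def pack_embedding(vector: list[float]) -> bytes:
    """Serialize the embedding as little-endian float32 (byte order assumed)."""
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(vector)}")
    return struct.pack(f"<{EMBEDDING_DIM}f", *vector)


def unpack_embedding(blob: bytes) -> list[float]:
    """Inverse of pack_embedding; raises struct.error if blob is not 768 bytes."""
    return list(struct.unpack(f"<{EMBEDDING_DIM}f", blob))
```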
personal-voice/ai-disclosure-log.json
Every cloned-voice utterance Mira has produced, with timestamp, conversation ID, and duration. The on-device log keeps only 90 days of entries; the export contains everything the log holds at export time.
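The 90-day on-device retention amounts to a time-window filter. In this sketch the field names (timestamp, conversation_id, duration_ms) and prune_log are assumptions; the spec only says each entry carries a timestamp, conversation ID, and duration.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # on-device retention stated in the spec


def prune_log(entries: list[dict], now: datetime) -> list[dict]:
    """Keep entries whose timestamp falls inside the retention window.

    Assumed entry shape: {"timestamp": ISO-8601 str, "conversation_id": str,
    "duration_ms": int}. `now` must be timezone-aware.
    """
    cutoff = now - RETENTION
    kept = []
    for e in entries:
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
        ts = datetime.fromisoformat(e["timestamp"].replace("Z", "+00:00"))
        if ts >= cutoff:
            kept.append(e)
    return kept
```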
Round-trip guarantee
We commit that, at any time, you can export, uninstall Apna, reinstall it, import the export, and end up with a functionally identical app state.
This will be verified by an automated test in CI (make demo-roundtrip-export) once Phase 8 lands.
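One way such a test could compare two exports is sketched below. It is an illustration under stated assumptions, not the actual make demo-roundtrip-export target: it ignores version.json's exported_at (which changes on every export) and skips *.encrypted.bin files, because re-encrypting the same plaintext with a fresh nonce yields different ciphertext, so those files would have to be compared after decryption. Function names are hypothetical.

```python
import io
import json
import zipfile


def normalized_contents(zip_bytes: bytes) -> dict[str, bytes]:
    """Map each comparable member path to its bytes."""
    out: dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        for name in zf.namelist():
            if name.endswith("/") or name.endswith(".encrypted.bin"):
                continue  # directories and ciphertext are not compared byte-for-byte
            data = zf.read(name)
            if name == "version.json":
                info = json.loads(data)
                info.pop("exported_at", None)  # differs on every export
                data = json.dumps(info, sort_keys=True).encode()
            out[name] = data
    return out


def roundtrip_diff(before: bytes, after: bytes) -> list[str]:
    """Return paths that differ; an empty list means the round trip held."""
    a, b = normalized_contents(before), normalized_contents(after)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```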
What's NOT in the export
For clarity:
- Apple/Google account information: managed by the platform; we never have access. To migrate your contacts or calendar, use the platform's own export tools.
- System keychain entries: these stay on your device. iCloud Keychain (iOS) or Google Backup (Android) handles them according to your platform settings.
- App Store / Play Store subscription history: managed by Apple/Google. Pro tier billing history is in subscriptions/pro-history.json for the period for which we have records.
- Cloudflare backend logs: we don't keep these long-term (30-day operational retention on the Cloudflare edge; not user-queryable).
Stable across acquisitions
Per ToS §8.2 and Privacy Policy §10, this format is stable across any acquisition or restructuring. An acquiring company may add new sections (with new keys in version.json) but may not remove or break existing ones. If an acquirer does so anyway, the existing engineering team will publish the import tooling as open source so you can read your data with any compatible parser.
Tools we publish
At v1.0 launch we will publish, under the MIT License:
- apna-export-validator: a CLI that validates the structure of an export ZIP
- apna-export-reader: a CLI that reads an export ZIP and pretty-prints it
- apna-export-to-csv: a CLI that converts the memory section to CSV for spreadsheet analysis
- A LadybugDB importer: a script that takes memory/data.cypher and loads it into a fresh LadybugDB instance
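Until those tools are published, a rough structural check is easy to approximate. The sketch below is not apna-export-validator. It assumes every file in the ZIP tree except personal-voice/ and the per-peer a2a/ entries is mandatory (the spec states this explicitly only for version.json), and it deliberately tolerates unknown extra sections, since newer schema versions add them. REQUIRED and validate_export are hypothetical names.

```python
import json
import zipfile

# Assumption: these files appear in every export; only version.json is
# explicitly documented as "always present".
REQUIRED = [
    "README.md", "version.json",
    "identity/did.txt", "identity/ed25519-public.b58btc.txt",
    "identity/x25519-public.b58btc.txt", "identity/private-keys.encrypted.bin",
    "memory/schema.cypher", "memory/data.cypher",
    "memory/data.jsonld", "memory/data.md",
    "onboarding/onboarding-state.json", "a2a/agent-card.json",
    "subscriptions/pro-history.json",
    "settings/tier-assignments.json", "settings/standing-rules.json",
    "settings/notification-preferences.json",
]


def validate_export(source) -> list[str]:
    """Return a list of problems for a ZIP path or binary file object."""
    problems: list[str] = []
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        problems += [f"missing {p}" for p in REQUIRED if p not in names]
        if "version.json" in names:
            try:
                version = json.loads(zf.read("version.json"))
            except json.JSONDecodeError as exc:
                problems.append(f"version.json is not valid JSON: {exc}")
            else:
                did = version.get("user_did", "") if isinstance(version, dict) else ""
                if not str(did).startswith("did:"):
                    problems.append("version.json user_did is not a DID")
        # Extra top-level sections are not flagged: newer versions add them.
    return problems
```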
Source repositories TBD; will be linked here.
Legal disclaimer (DRAFT): This document is a first-pass engineering-team draft. It is the canonical specification of the export format from an engineering standpoint. It supplements (but does not replace) the legal commitments in the Privacy Policy and Terms of Service.