Data Retention Policy (Iceberg, Conservation & Unfreeze Rules)
Overview
This policy defines how Nissan North America (NNA) manages data lifecycle and storage using Apache Iceberg, ensuring that datasets are retained, conserved, or deleted according to business, compliance, and operational requirements.
It introduces a conservation model with three tiers, Active, Conserved, and Frozen (unfreeze on demand), to balance data accessibility, cost, and compliance.
Purpose
- Establish retention and archiving standards using Iceberg’s table versioning and metadata management.
- Optimize data storage by migrating stale data to conservation or frozen states.
- Maintain data recoverability and auditability while controlling costs.
- Ensure deletion and unfreeze actions comply with retention policies and regulatory obligations.
Retention Model
| Tier | Description | Access Level | Typical Duration | Storage Location |
|---|---|---|---|---|
| Active | Frequently accessed operational or analytical data | Read/Write | 0–12 months | Primary Iceberg tables (production schema) |
| Conserved | Infrequently accessed but needed for compliance or reprocessing | Read-only | 1–5 years | Secondary Iceberg catalog / cold storage |
| Frozen | Rarely accessed, preserved for audit or legal hold | Archived; unfrozen on request | 5+ years | Glacier-like archive, separate retention zone |
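The tier table above can be carried into lifecycle tooling as a small lookup structure. The following is a minimal sketch, not an NNA implementation: the `Tier`, `TierPolicy`, `can_write`, and `can_read_directly` names are illustrative assumptions, and only the access levels and storage labels come from the table.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    ACTIVE = "active"
    CONSERVED = "conserved"
    FROZEN = "frozen"


@dataclass(frozen=True)
class TierPolicy:
    writable: bool          # True for Read/Write, False for read-only or archived
    online: bool            # False means the data must be unfrozen before it can be read
    storage_location: str   # descriptive label copied from the Retention Model table


# Values mirror the Retention Model table; the field names are illustrative.
TIER_POLICIES = {
    Tier.ACTIVE: TierPolicy(True, True, "Primary Iceberg tables (production schema)"),
    Tier.CONSERVED: TierPolicy(False, True, "Secondary Iceberg catalog / cold storage"),
    Tier.FROZEN: TierPolicy(False, False, "Glacier-like archive, separate retention zone"),
}


def can_write(tier: Tier) -> bool:
    """Only the Active tier accepts writes."""
    return TIER_POLICIES[tier].writable


def can_read_directly(tier: Tier) -> bool:
    """Frozen data is readable only after an approved unfreeze."""
    return TIER_POLICIES[tier].online
```

Keeping the policy in one table-like structure means an access check and a storage job read the same definition, so a change to a tier is made in one place.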
Data Lifecycle Rules
- Transition to Conservation:
    - Triggered when data has had no updates for X months (e.g., 12 months).
    - The Iceberg metadata snapshot is retained, and data files are moved to lower-cost storage.
    - The table is marked read-only and versioned.
- Transition to Frozen:
    - Triggered when data has not been accessed for Y years (e.g., 5 years).
    - Data files are moved to archival storage (e.g., S3 Glacier or Azure Archive).
    - Metadata and schema definitions remain for lineage and unfreeze reference.
- Unfreeze Rule:
    - An authorized request (Data Owner + Compliance approval) restores a frozen dataset to an Active or Conserved state.
    - The restoration process is tracked in the audit log.
    - Restored datasets are automatically reclassified under retention tier rules.
- Deletion Rule:
    - Data that exceeds retention and is not under legal hold is deleted from the Iceberg table and then physically purged by expiring every snapshot that still references the deleted files. An Iceberg delete alone only removes the data from the current table state.
    - An audit trail of the deletion (who, when, why) must be maintained.
    - Deletion is confirmed by Compliance.
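The rules above amount to a small decision procedure. The sketch below is one possible reading of it, using the example thresholds of 12 months and 5 years. The `DatasetState` fields and the `next_action` function are hypothetical names, and the ordering (deletion first, one tier step per run) is an assumption rather than something the policy states.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Example thresholds from the rules above; X and Y would be set per dataset.
CONSERVE_AFTER = timedelta(days=365)       # no updates for 12 months
FREEZE_AFTER = timedelta(days=5 * 365)     # no access for 5 years


@dataclass
class DatasetState:
    tier: str                    # "active", "conserved", or "frozen"
    last_modified: datetime
    last_accessed: datetime
    retention_expires: datetime
    legal_hold: bool = False


def next_action(state: DatasetState, now: datetime) -> str:
    """Return one of "delete", "freeze", "conserve", or "none"."""
    # Deletion is checked first, and a legal hold always blocks it.
    if now >= state.retention_expires and not state.legal_hold:
        return "delete"
    # Datasets move one tier at a time: Conserved -> Frozen, Active -> Conserved.
    if state.tier == "conserved" and now - state.last_accessed >= FREEZE_AFTER:
        return "freeze"
    if state.tier == "active" and now - state.last_modified >= CONSERVE_AFTER:
        return "conserve"
    return "none"
```

Returning an action name instead of performing it keeps the decision testable and lets the caller record the action in the audit log before executing it.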
Technical Implementation
| Function | Description | Tooling |
|---|---|---|
| Time-based Partitioning | Partition tables by date to simplify retention filtering | Iceberg partition spec (transaction_date) |
| Snapshot Expiration | Automatically delete outdated table snapshots | expire_snapshots API |
| Metadata Compaction | Optimize small file and snapshot management | Iceberg maintenance jobs |
| Tiered Storage Policies | Assign storage class based on retention tier | S3 Lifecycle, Azure Blob tiering |
| Governance Audit Logs | Record all conservation, unfreeze, and deletion actions | Supabase / Snowflake audit tables |
| Access Control | Restrict access to conserved or frozen data | Role-based Iceberg catalog permissions |
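As an illustration of the "Tiered Storage Policies" row, the sketch below builds an S3 lifecycle configuration that steps objects under a prefix down to colder storage classes. The helper name, rule ID format, and prefix are assumptions. Two caveats apply: S3 lifecycle transitions are driven by object age, not by the last-update or last-access triggers described in the lifecycle rules, and objects in the `GLACIER` class cannot be read by Iceberg queries until they are restored.

```python
def build_lifecycle_rule(prefix: str, conserve_after_days: int = 365,
                         freeze_after_days: int = 5 * 365) -> dict:
    """Build an S3 lifecycle configuration that steps objects down by age."""
    if not 0 < conserve_after_days < freeze_after_days:
        raise ValueError("conserve_after_days must be positive and less than freeze_after_days")
    return {
        "Rules": [
            {
                # Hypothetical ID scheme derived from the prefix.
                "ID": f"retention-tiering-{prefix.strip('/').replace('/', '-')}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Conserved tier: infrequent-access storage.
                    {"Days": conserve_after_days, "StorageClass": "STANDARD_IA"},
                    # Frozen tier: archival storage.
                    {"Days": freeze_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }
```

The resulting dictionary has the shape boto3 expects for `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)`. The bucket name and the actual prefix layout of the warehouse are deployment-specific and are not defined by this policy.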
Example: Automated Iceberg Lifecycle (Pseudocode)
# Example pseudocode for automated Iceberg retention management.
# The `iceberg` module, `move_to_tier`, `delete_table_data`, and the table
# attributes used here are placeholders, not a real library API.
from datetime import datetime, timedelta, timezone
from iceberg import Table, expire_snapshots

def manage_table_lifecycle(table: Table):
    now = datetime.now(timezone.utc)

    # Expire snapshots older than 12 months
    expire_snapshots(table, expire_before=now - timedelta(days=365))

    # Check the most restrictive outcome first so only one action runs per pass
    if table.retention_expired() and not table.legal_hold:
        # Securely delete expired data that is not under legal hold
        delete_table_data(table)
    elif table.last_accessed < now - timedelta(days=5 * 365):
        # Transition data to frozen tier
        move_to_tier(table, tier="frozen")
    elif table.last_modified < now - timedelta(days=365):
        # Transition data to conserved tier
        move_to_tier(table, tier="conserved")
This process would typically run as part of a governance-managed data lifecycle job, orchestrated by Airflow, Dagster, or Databricks workflows.
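On a Spark-based deployment, the snapshot-expiration step in the pseudocode above could be issued through Iceberg's `expire_snapshots` stored procedure, which lives under `<catalog>.system`. The helper below only formats the statement. The function name, the catalog and table names, and the choice of Spark are assumptions, since this policy does not name a query engine.

```python
from datetime import datetime, timedelta, timezone


def expire_snapshots_sql(catalog: str, table: str, now: datetime,
                         max_age_days: int = 365, retain_last: int = 1) -> str:
    """Format a CALL to Iceberg's Spark `expire_snapshots` procedure."""
    cutoff = (now - timedelta(days=max_age_days)).strftime("%Y-%m-%d %H:%M:%S")
    # Identifiers are interpolated directly, so pass only trusted catalog/table names.
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', "
        f"older_than => TIMESTAMP '{cutoff}', "
        f"retain_last => {retain_last})"
    )


# Hypothetical catalog and table names, for illustration only.
statement = expire_snapshots_sql(
    "prod", "sales.orders", datetime(2025, 6, 30, tzinfo=timezone.utc)
)
```

An orchestrator task would pass the statement to `spark.sql(...)` on a session that has the Iceberg SQL extensions enabled. `retain_last` keeps at least that many recent snapshots regardless of age, which guards against expiring the only remaining snapshot of a rarely updated table.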
Visual Representation
flowchart TD
A[Active Tier] -->|No updates 12m| B[Conserved Tier]
B -->|No access 5y| C[Frozen Tier]
C -->|Unfreeze Request| B
C -->|Unfreeze Request| A
C -->|Retention Expired| D[Secure Deletion]
D --> E[Audit Log Entry]
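The flow above can also be enforced in code as a table of permitted transitions, so that a lifecycle job rejects any move the policy does not allow. This is a sketch under assumptions: the state names, `ALLOWED_TRANSITIONS`, `TransitionError`, and `check_transition` are hypothetical, and the Frozen to Active path is taken from the Unfreeze Rule, which allows restoring to either an Active or a Conserved state.

```python
ALLOWED_TRANSITIONS = {
    "active": {"conserved"},
    "conserved": {"frozen"},
    "frozen": {"conserved", "active", "deleted"},
    "deleted": set(),
}


class TransitionError(ValueError):
    """Raised when a requested tier change violates the retention policy."""


def check_transition(current: str, target: str, *, approved: bool = False,
                     legal_hold: bool = False) -> None:
    """Raise TransitionError unless the move is permitted; return None otherwise."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise TransitionError(f"{current} -> {target} is not a permitted transition")
    # Unfreezing needs Data Owner and Compliance approval.
    if current == "frozen" and target in {"active", "conserved"} and not approved:
        raise TransitionError("unfreeze requires Data Owner and Compliance approval")
    # A legal hold blocks deletion even after retention has expired.
    if target == "deleted" and legal_hold:
        raise TransitionError("dataset is under legal hold")
```

A job would call `check_transition` before moving any files and write the audit log entry only after the check passes.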
Roles and Responsibilities
| Role | Responsibility |
|---|---|
| Data Owners | Approve conservation and unfreeze requests; define tier thresholds |
| Data Stewards | Implement lifecycle tagging and monitor data state transitions |
| IT / Platform Teams | Manage storage policies, automate Iceberg lifecycle jobs |
| Compliance / Legal Teams | Approve final deletion, maintain audit logs, enforce legal holds |
| Data Governance Office | Oversee adherence, review metrics, and evolve policies |
Key Takeaways
- Iceberg enables metadata-driven lifecycle management with version control and retention automation.
- Tiered conservation model ensures cost efficiency while maintaining compliance and recoverability.
- Unfreeze procedures provide controlled, auditable restoration for frozen data.
- Combining Iceberg with S3 / Azure tiering delivers a scalable, policy-driven data retention framework for NNA.