
Data Retention Policy (Iceberg, Conservation & Unfreeze Rules)

Overview

This policy defines how Nissan North America (NNA) manages data lifecycle and storage using Apache Iceberg, ensuring that datasets are retained, conserved, or deleted according to business, compliance, and operational requirements.
It introduces a tiered conservation model (Active, Conserved, and Frozen with Unfreeze on Demand) to balance data accessibility, cost, and compliance.


Purpose

  • Establish retention and archiving standards using Iceberg’s table versioning and metadata management.
  • Optimize data storage by migrating stale data to conservation or frozen states.
  • Maintain data recoverability and auditability while controlling costs.
  • Ensure deletion and unfreeze actions comply with retention policies and regulatory obligations.

Retention Model

| Tier | Description | Access Level | Typical Duration | Storage Location |
| --- | --- | --- | --- | --- |
| Active | Frequently accessed operational or analytical data | Read/Write | 0–12 months | Primary Iceberg tables (production schema) |
| Conserved | Infrequently accessed but needed for compliance or reprocessing | Read-only | 1–5 years | Secondary Iceberg catalog / cold storage |
| Frozen | Rarely accessed, preserved for audit or legal hold | Archived; unfrozen on request | 5+ years | Glacier-like archive, separate retention zone |
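
The tier table above can be captured as data so that lifecycle jobs read thresholds from one place. The following is a minimal sketch under stated assumptions: the names `Tier`, `TierPolicy`, `TIER_POLICIES`, and `tier_for_age` are hypothetical, and "age" is simplified to years since the last relevant activity, whereas the policy itself distinguishes last update from last access.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Tier(Enum):
    ACTIVE = "active"
    CONSERVED = "conserved"
    FROZEN = "frozen"


@dataclass(frozen=True)
class TierPolicy:
    access_level: str
    max_age_years: Optional[float]  # upper bound of the typical duration; None = open-ended
    storage_location: str


# Values copied from the Retention Model table; order matters (youngest tier first).
TIER_POLICIES = {
    Tier.ACTIVE: TierPolicy("read/write", 1, "Primary Iceberg tables (production schema)"),
    Tier.CONSERVED: TierPolicy("read-only", 5, "Secondary Iceberg catalog / cold storage"),
    Tier.FROZEN: TierPolicy("archived, unfrozen on request", None,
                            "Glacier-like archive, separate retention zone"),
}


def tier_for_age(age_years: float) -> Tier:
    """Return the first tier whose typical duration covers the given age."""
    if age_years < 0:
        raise ValueError("age_years must be non-negative")
    for tier, policy in TIER_POLICIES.items():
        if policy.max_age_years is None or age_years < policy.max_age_years:
            return tier
    raise AssertionError("unreachable: the FROZEN tier is open-ended")
```

For example, `tier_for_age(0.5)` yields `Tier.ACTIVE` and `tier_for_age(7)` yields `Tier.FROZEN`. Boundary ages (exactly 1 or 5 years) are assigned to the older tier here; that is a choice of this sketch, not something the policy states.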

Data Lifecycle Rules

  1. Transition to Conservation:
       • Triggered when data has no updates for X months (e.g., 12 months).
       • Iceberg metadata snapshot is retained, and data files are moved to lower-cost storage.
       • Table is marked read-only and versioned.

  2. Transition to Frozen:
       • Triggered when data has not been accessed for Y years (e.g., 5 years).
       • Data files are moved to archival storage (e.g., S3 Glacier or Azure Archive).
       • Metadata and schema definitions remain for lineage and unfreeze reference.

  3. Unfreeze Rule:
       • Authorized request (Data Owner + Compliance approval) restores a frozen dataset to an Active or Conserved state.
       • Restoration process is tracked in the audit log.
       • Restored datasets are automatically reclassified under retention tier rules.

  4. Deletion Rule:
       • Data that exceeds retention and is not under legal hold is securely purged using Iceberg delete manifests.
       • Audit trail of deletion (who, when, why) must be maintained.
       • Deletion is confirmed by Compliance.
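
The approval and audit requirements in the Unfreeze and Deletion rules can be expressed as small gate functions. This is a sketch only: `AuditLog`, `request_unfreeze`, `may_delete`, and the approver labels `data_owner` and `compliance` are hypothetical names chosen for illustration, and the in-memory list stands in for whatever audit store is actually used.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Set

# Both approvals are required by the Unfreeze Rule (labels are assumptions).
REQUIRED_UNFREEZE_APPROVERS = {"data_owner", "compliance"}
UNFREEZE_TARGETS = {"active", "conserved"}


@dataclass
class AuditLog:
    entries: List[dict] = field(default_factory=list)

    def record(self, action: str, dataset: str, actor: str, reason: str) -> None:
        # Captures who, when, and why for every governed action.
        self.entries.append({
            "action": action,
            "dataset": dataset,
            "actor": actor,
            "reason": reason,
            "at": datetime.now(timezone.utc).isoformat(),
        })


def request_unfreeze(dataset: str, requested_by: str, approvals: Set[str],
                     target_tier: str, reason: str, log: AuditLog) -> bool:
    """Approve an unfreeze only when every required approver has signed off."""
    if target_tier not in UNFREEZE_TARGETS:
        raise ValueError(f"cannot unfreeze into tier {target_tier!r}")
    missing = REQUIRED_UNFREEZE_APPROVERS - approvals
    action = "unfreeze_denied" if missing else "unfreeze_approved"
    log.record(action, dataset, requested_by, reason)
    return not missing


def may_delete(retention_expired: bool, legal_hold: bool,
               compliance_confirmed: bool) -> bool:
    """Deletion needs expired retention, no legal hold, and Compliance sign-off."""
    return retention_expired and not legal_hold and compliance_confirmed
```

Denied requests are logged as well as approved ones, so the audit trail shows attempts and not only outcomes.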

Technical Implementation

| Function | Description | Tooling |
| --- | --- | --- |
| Time-based Partitioning | Partition tables by date to simplify retention filtering | Iceberg partition spec (transaction_date) |
| Snapshot Expiration | Automatically delete outdated table snapshots | expire_snapshots API |
| Metadata Compaction | Optimize small file and snapshot management | Iceberg maintenance jobs |
| Tiered Storage Policies | Assign storage class based on retention tier | S3 Lifecycle, Azure Blob tiering |
| Governance Audit Logs | Record all conservation, unfreeze, and deletion actions | Supabase / Snowflake audit tables |
| Access Control | Restrict access to conserved or frozen data | Role-based Iceberg catalog permissions |
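
Two of the rows above, snapshot expiration and tiered storage, can be illustrated with small helpers that only build the request payloads. The sketch assumes a Spark engine with Iceberg's `system.expire_snapshots` stored procedure available and a Spark session time zone of UTC. The helper names, the catalog and table names, and the choice of the `STANDARD_IA` and `GLACIER` storage classes are illustrative assumptions. The rule dictionary follows the shape S3 accepts for lifecycle configuration rules.

```python
import re
from datetime import datetime, timezone

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def expire_snapshots_sql(catalog: str, table: str, older_than: datetime,
                         retain_last: int = 1) -> str:
    """Build a CALL statement for Iceberg's Spark expire_snapshots procedure."""
    # Reject anything that is not a plain dotted identifier to avoid SQL injection.
    if not all(_IDENT.match(part) for part in [catalog, *table.split(".")]):
        raise ValueError("catalog and table must be plain identifiers")
    if older_than.tzinfo is None:
        raise ValueError("older_than must be timezone-aware")
    if retain_last < 1:
        raise ValueError("retain_last must be at least 1")
    ts = older_than.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return (
        f"CALL {catalog}.system.expire_snapshots("
        f"table => '{table}', "
        f"older_than => TIMESTAMP '{ts}', "
        f"retain_last => {retain_last})"
    )


def s3_lifecycle_rule(prefix: str, conserve_after_days: int = 365,
                      freeze_after_days: int = 5 * 365) -> dict:
    """Build one S3 lifecycle rule that steps objects down through storage classes."""
    if not 0 < conserve_after_days < freeze_after_days:
        raise ValueError("expected 0 < conserve_after_days < freeze_after_days")
    return {
        "ID": "retention-tiering-" + prefix.strip("/").replace("/", "-"),
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [
            {"Days": conserve_after_days, "StorageClass": "STANDARD_IA"},
            {"Days": freeze_after_days, "StorageClass": "GLACIER"},
        ],
    }
```

Two caveats apply. S3 lifecycle transitions are driven by object age since creation, not by last update or last access, so a rule like this only approximates the policy's triggers. Data files moved to an archive class also cannot be read by Iceberg queries until they are restored, which is why the unfreeze step has to run before a frozen table is queried.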

Example: Automated Iceberg Lifecycle (Pseudocode)

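# Example pseudocode for automated Iceberg retention management.
# LifecycleTable is a placeholder for a platform-specific wrapper around an
# Iceberg table; its methods are illustrative, not a real Iceberg client API.
from datetime import datetime, timedelta, timezone
from typing import Protocol

class LifecycleTable(Protocol):
    last_modified: datetime  # timezone-aware, UTC
    last_accessed: datetime  # timezone-aware, UTC
    def expire_snapshots(self, older_than: datetime) -> None: ...
    def move_to_tier(self, tier: str) -> None: ...
    def retention_expired(self) -> bool: ...
    def under_legal_hold(self) -> bool: ...
    def delete_data(self) -> None: ...

def manage_table_lifecycle(table: LifecycleTable) -> None:
    now = datetime.now(timezone.utc)

    # Expire snapshots older than 12 months
    table.expire_snapshots(older_than=now - timedelta(days=365))

    # Evaluate the most restrictive rule first so each run applies one action
    if table.retention_expired() and not table.under_legal_hold():
        # Securely delete expired data that is not under legal hold
        table.delete_data()
    elif table.last_accessed < now - timedelta(days=5 * 365):
        # Transition data to frozen tier
        table.move_to_tier("frozen")
    elif table.last_modified < now - timedelta(days=365):
        # Transition data to conservation tier
        table.move_to_tier("conservation")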

This process would typically run as part of a governance-managed data lifecycle job, orchestrated by Airflow, Dagster, or Databricks workflows.

Visual Representation

flowchart TD
    A[Active Tier] -->|No updates 12m| B[Conserved Tier]
    B -->|No access 5y| C[Frozen Tier]
    C -->|Unfreeze Request| B
    C -->|Unfreeze Request| A
    C -->|Retention Expired| D[Secure Deletion]
    D --> E[Audit Log Entry]
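
The flowchart can also be read as a small state machine, which lets a lifecycle job reject transitions the policy does not define. The sketch below is one possible encoding; `ALLOWED_TRANSITIONS` and `transition` are hypothetical names, and the frozen-to-active edge follows the Unfreeze Rule, which permits restoring to either Active or Conserved.

```python
# Allowed tier transitions, keyed by current state.
ALLOWED_TRANSITIONS = {
    "active": {"conserved"},                        # no updates for 12 months
    "conserved": {"frozen"},                        # no access for 5 years
    "frozen": {"active", "conserved", "deleted"},   # unfreeze request or retention expired
    "deleted": set(),                               # terminal state
}


def transition(current: str, target: str) -> str:
    """Return the target state if the move is allowed, otherwise raise ValueError."""
    if current not in ALLOWED_TRANSITIONS:
        raise ValueError(f"unknown state: {current!r}")
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"transition {current} -> {target} is not allowed")
    return target
```

With this table, `transition("active", "frozen")` raises `ValueError` because data must pass through the Conserved tier first, while `transition("frozen", "conserved")` succeeds.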

Roles and Responsibilities

| Role | Responsibility |
| --- | --- |
| Data Owners | Approve conservation and unfreeze requests; define tier thresholds |
| Data Stewards | Implement lifecycle tagging and monitor data state transitions |
| IT / Platform Teams | Manage storage policies, automate Iceberg lifecycle jobs |
| Compliance / Legal Teams | Approve final deletion, maintain audit logs, enforce legal holds |
| Data Governance Office | Oversee adherence, review metrics, and evolve policies |

Key Takeaways

  1. Iceberg enables metadata-driven lifecycle management with version control and retention automation.
  2. The tiered conservation model controls storage cost while maintaining compliance and recoverability.
  3. Unfreeze procedures provide controlled, auditable restoration for frozen data.
  4. Combining Iceberg with S3 / Azure tiering delivers a scalable, policy-driven data retention framework for NNA.