TL;DR
- Create an AWS account, an S3 bucket in a cheap region, and a dedicated IAM user with minimal S3-related permissions
- Install & configure AWS CLI on the machine that will do the upload
- Stage 1 — Archive: Run glacier_archive_split.sh to compress source folders into ~100GB .tar.gz chunks across two transit disks (uses pigz for fast parallel compression)
- Stage 2 — Upload: Run glacier_upload.sh to upload archives to S3 Glacier Deep Archive via resumable multipart upload (100MB parts, crash-safe)
- Cost: ~$1/month per TB stored; uploads are free; full retrieval of 1TB costs ~$96 (mostly data transfer out)
- Safety: original data is never touched, every upload is verified, resume survives power outages (loses at most ~100MB)
This post ships with a CLAUDE.md file, so you can adapt the setup to your needs via Claude Code.
Stage 0: AWS Setup
1. Create an S3 Bucket
Create a bucket in one of the cheapest regions for Glacier Deep Archive:
| Region Name | Region Code | Price (1TB / Month) |
|---|---|---|
| US East (N. Virginia) | us-east-1 | $1.01 |
| US East (Ohio) | us-east-2 | $1.01 |
| US West (Oregon) | us-west-2 | $1.01 |
| Europe (Stockholm) | eu-north-1 | $1.01 |
2. Create an IAM User
Create a dedicated user in IAM for managing this specific bucket. Attach the following inline policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BucketLevelActions",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::your-bucket-name"
},
{
"Sid": "ObjectLevelActions",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts",
"s3:GetObjectAttributes"
],
"Resource": "arn:aws:s3:::your-bucket-name/*"
},
{
"Sid": "AllowKMSEncryption",
"Effect": "Allow",
"Action": [
"kms:GenerateDataKey",
"kms:Decrypt"
],
"Resource": "*"
}
]
}
s3:ListBucketMultipartUploads and s3:AbortMultipartUpload are needed for the --cleanup flag to detect and abort incomplete multipart uploads.
3. Create an Access Key
Stay logged in as root — don’t log in as the new user. Go to IAM → Users, select the new user, switch to the Security credentials tab, and create a new Access Key.
4. Install AWS CLI
macOS:
Download and install AWS CLI v2.
Linux (remote server via SSH):
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
rm -rf awscliv2.zip aws/
Verify installation:
aws --version
5. Configure AWS CLI
Run aws configure. It will prompt for:
- AWS Access Key ID — from step 3
- AWS Secret Access Key — from step 3
- Default region name — e.g. us-east-1 (see pricing table above)
- Default output format — press Enter to keep json, or type text
Note: The upload script uses the S3 multipart API directly with 100MB parts, so AWS CLI multipart settings (multipart_chunksize, multipart_threshold) are not needed.
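The 100MB part size also keeps every archive comfortably under S3's 10,000-parts-per-upload limit. A quick sanity check (a sketch — `part_count` is just ceiling division, not a function from the scripts):

```shell
# Number of multipart parts needed for a file: ceiling division.
part_count() {
  local file_size=$1 part_size=$2
  echo $(( (file_size + part_size - 1) / part_size ))
}

PART_SIZE=$((100 * 1024 * 1024))         # 100MB, as used by the upload script
ARCHIVE=$((100 * 1024 * 1024 * 1024))    # one full ~100GB archive

part_count "$ARCHIVE" "$PART_SIZE"       # 1024 parts -- far below the 10,000-part S3 limit
```

At 100MB per part, even a 1TB object would fit in ~10,240 parts, so ~100GB archives leave a wide margin.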
I have 1.2TB of files to upload, mostly photos and videos. First I will compress them into relatively big archives, grouped by folder. This reduces the number of PUT/GET requests to Amazon (each request costs money) and reduces metadata storage overhead (Glacier charges for a minimum of 40KB of metadata per object, so thousands of small files would waste storage on metadata alone).
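To put rough numbers on that metadata overhead (a back-of-the-envelope sketch — the file counts are hypothetical, the 40KB-per-object minimum is from the paragraph above):

```shell
# Billable metadata at ~40KB per stored object.
overhead_kb() { echo $(( $1 * 40 )); }

overhead_kb 100000   # 100,000 individual photos -> 4,000,000 KB (~3.8GB) of metadata
overhead_kb 12       # ~12 large archives        -> 480 KB
```

With big archives, the per-object metadata cost effectively disappears.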
The source HDD is connected to a local server without fast internet. The workflow is split into two stages:
- Stage 1: Archive data into ~100GB chunks on two transit SSDs (for physical transport)
- Stage 2: Upload archives from transit disks to AWS S3 Glacier Deep Archive
I asked Claude Code to create scripts for both stages.
Stage 1: Archive & Split to Transit Disks
Problem
The source HDD (2TB) is connected to a local server (user@192.168.1.100). Internet is too slow for direct upload. Two transit SSDs of different sizes are used to carry archives to a machine with fast internet.
Script: glacier_archive_split.sh
Scans all folders, groups them into ~100GB batches, compresses with pigz (fast parallel gzip), distributes across two transit disks respecting their available capacity. All operations are strictly sequential — one archive at a time, no parallel disk I/O.
Features:
- Auto-detects free space on each transit disk
- Large folders (e.g. GoPro) can be split into subfolders with custom naming
- Natural sort: folders are archived in alphabetical order, so archive_001 contains the earliest folders
- Full file/folder map in master manifest on each disk
- Resume support (skips completed archives on re-run)
- --dry-run generates a preview manifest with complete file listings
- --yes skips confirmation prompt (for background execution)
- --cleanup removes all archives and state from transit disks
Configuration
Edit the top of glacier_archive_split.sh:
SOURCE_DIR="/mnt/source_hdd/PhotosVideos"
TRANSIT_DISK_1="/mnt/sdd_transit"
TRANSIT_DISK_2="/mnt/sdc_transit"
MAX_ARCHIVE_SIZE=$((100 * 1024 * 1024 * 1024)) # 100GB
# Split large folders into subfolders with custom naming
# Format: "FolderName:ArchivePrefix"
SPLIT_FOLDERS=("GoPro:GoPro")
With SPLIT_FOLDERS, the GoPro folder’s subfolders are batched separately and named GoPro_1.tar.gz, GoPro_2.tar.gz, etc. Regular folders become archive_001.tar.gz, archive_002.tar.gz, etc.
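The batching itself can be sketched as a greedy pass over the sorted folder list (an illustration of the idea, not the actual script's code — folder names and sizes below are made up, and sizes are in GB for readability):

```shell
MAX=100  # batch size limit in GB (the real script works in bytes)

# Greedy batching: walk folders in sorted order, start a new batch
# whenever adding the next folder would exceed the limit.
batch_folders() {
  local batch=1 used=0 name size
  while read -r name size; do
    if (( used + size > MAX && used > 0 )); then
      batch=$((batch + 1)); used=0
    fi
    used=$((used + size))
    printf 'archive_%03d %s\n' "$batch" "$name"
  done
}

printf '%s\n' "2021_Summer 60" "2022_Winter 55" "2023_Spring 30" | batch_folders
# archive_001 2021_Summer
# archive_002 2022_Winter
# archive_002 2023_Spring
```

Because folders are consumed in natural-sort order, each archive covers a contiguous, roughly chronological slice of the collection.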
Usage
Preview the plan (no archives created):
./glacier_archive_split.sh --dry-run
This generates archive_master_manifest_PREVIEW.txt with the complete file map for every planned archive.
Run for real:
./glacier_archive_split.sh
Run in background (via screen + log file):
screen -S glacier
./glacier_archive_split.sh --yes 2>&1 | tee archive_run.log
# Ctrl+A, D to detach
# screen -r glacier to reattach
# tail -f archive_run.log from another terminal
Clean up transit disks after a failed run:
./glacier_archive_split.sh --cleanup
Check progress via glacier_archive_status.sh:
./glacier_archive_status.sh
What happens per archive
- Create tar.gz with pigz -1 (fast parallel compression)
- Verify archive integrity (tar -tzf)
- Compute md5sum
- Write per-archive manifest (archive_NNN.manifest.txt)
- Append full file/folder map to master manifest (archive_master_manifest.txt)
- Mark as completed in .archive_state/completed.txt
Master manifest
Each transit disk gets an archive_master_manifest.txt containing the complete directory structure and file listing for every archive on that disk. File paths are prefixed with the archive name:
archive_003/20230715 Summer-Trip/IMG_0001.JPG
archive_003/20230715 Summer-Trip/subfolder/video.mp4
GoPro_2/GoPro/HERO5 Session/DCIM/100GOPRO/G0012345.MP4
Search for any file:
grep "filename.jpg" /mnt/sdd_transit/archive_master_manifest.txt
Stage 2: Upload to S3 Glacier Deep Archive
Script: glacier_upload.sh
Uploads pre-built archives from transit disks to S3 Glacier Deep Archive using the S3 multipart upload API. Each 100MB part is individually tracked, so a power outage loses at most ~100MB of upload progress — not 100GB.
Features:
- Scans both transit disks for .tar.gz archives
- Resumable multipart upload: two levels of resume
  - Whole-archive: completed.txt tracks fully uploaded archives
  - Per-part: .upload state files track each 100MB part
- Verifies upload with head-object size check
- Deletes local archive after successful upload + verification
- Uploads master manifests to S3 at the end
- --dry-run shows upload plan without making API calls
- --yes skips confirmation (for background/nohup execution)
- --cleanup aborts incomplete S3 multipart uploads and cleans local state
Configuration
Edit the top of glacier_upload.sh:
S3_BUCKET="your-bucket-name" # Must be changed before use
S3_PREFIX="photo-video-archive"
STORAGE_CLASS="DEEP_ARCHIVE"
TRANSIT_DISK_1="/mnt/sdd_transit"
TRANSIT_DISK_2="/mnt/sdc_transit"
PART_SIZE=$((100 * 1024 * 1024)) # 100MB per part
Usage
Test first (recommended):
./glacier_upload_dryrun.sh
The test script creates a small ~25MB test archive and uploads it using multipart with 5MB parts (the S3 minimum part size), so you can see part-by-part progress and verify your AWS setup works.
Preview the upload plan:
./glacier_upload.sh --dry-run
Run the full upload:
./glacier_upload.sh
Run in background:
screen -S glacier
./glacier_upload.sh --yes 2>&1 | tee upload_run.log
# Ctrl+A, D to detach
# screen -r glacier to reattach
Clean up after failure:
./glacier_upload.sh --cleanup
This aborts incomplete multipart uploads on S3 (which accumulate costs) and removes local state files.
Check progress:
./glacier_status.sh # Local state only
./glacier_status.sh --s3 # Also check S3 for incomplete uploads
What happens per archive
- Check if already completed (skip if so)
- Initiate or resume S3 multipart upload
- Upload 100MB parts — each part’s ETag is written to state file immediately
- Complete multipart upload (send all ETags to S3)
- Verify via head-object — compare ContentLength to local file size
- Mark as completed, delete state file, delete local archive
Two-level resume (power outage recovery)
After a power outage, just re-run ./glacier_upload.sh:
- Whole-archive resume: Archives already in completed.txt are skipped entirely
- Per-part resume: If an upload was mid-way, the script reads the .upload state file, validates the upload ID is still alive on S3, and continues from the last completed part
- Worst case: lose ~100MB of upload progress (one part), not ~100GB
- If the upload ID expired on S3, the script detects this and starts a fresh multipart upload
State files (on each transit disk)
.upload_state/
completed.txt # Fully uploaded archive names
archive_001.tar.gz.upload # Per-part state for in-progress upload
upload_YYYYMMDD_HHMMSS.log # Upload log
Per-part state file format:
upload_id=ABC123...
s3_key=photo-video-archive/archive_001.tar.gz
file_size=107374182400
part_size=104857600
total_parts=1024
1 "etag-for-part-1"
2 "etag-for-part-2"
...
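Given that format, the resume logic reduces to "find the first part number missing from the state file". A sketch of such a helper (hypothetical — `next_part` is not the script's actual function; it assumes parts are recorded in order, which matches a sequential upload):

```shell
# Print the next part number to upload, given a per-part state file.
# Part lines look like `<part_number> "<etag>"`; header lines contain `=`.
next_part() {
  local state_file=$1 expected=1 n _etag
  while read -r n _etag; do
    case $n in *=*) continue ;; esac      # skip key=value header lines
    if [ "$n" -eq "$expected" ]; then expected=$((expected + 1)); fi
  done < "$state_file"
  echo "$expected"
}

# Demo with a tiny fake state file:
cat > /tmp/demo.upload <<'EOF'
upload_id=ABC123
total_parts=4
1 "etag-1"
2 "etag-2"
EOF
next_part /tmp/demo.upload   # -> 3
```

Because each ETag line is appended immediately after its part succeeds, a crash can only ever lose the single part that was in flight.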
Master manifests
Each transit disk’s archive_master_manifest.txt (created by Stage 1) is uploaded to S3 at the end. These list every file in every archive, so you can find any file without downloading from Glacier.
Logs
Upload operations are logged in:
.upload_state/upload_YYYYMMDD_HHMMSS.log
Progress tracking:
.upload_state/completed.txt # List of fully uploaded archives
Cost Estimation
Summary for 1TB in eu-north-1 (Stockholm):
- Upload: ~$0.20 (one-time, essentially free)
- Storage: ~$1/month per TB
- Download (disaster recovery): ~$96 (one-time, see breakdown below)
Monthly Storage Cost (Glacier Deep Archive)
- ~$0.99 per TB per month ($0.00099 per GB/month)
- For 1TB of data: ~$1/month or ~$12/year
- Minimum storage duration: 180 days (early deletion fee applies)
- This is for ARCHIVAL only, not frequent access
Upload Cost
FREE. There is no charge for uploading data to S3.
Download Cost (Disaster Recovery)
This is where it gets expensive. Downloading 1TB from Glacier Deep Archive costs roughly $96 using the cheapest options. The “Data Transfer” fee is the hidden cost that surprises most people — the actual retrieval from the archive is surprisingly cheap.
Bare minimum cost breakdown (cheapest method, 1TB):
| Item | Cost Factor | Calculation | Cost |
|---|---|---|---|
| 1. Data Transfer Out | Sending data to your PC | 1,000 GB × $0.09 | $90 |
| 2. Bulk Retrieval (48h) | Thawing the data | 1,000 GB × $0.0025 | $3 |
| 3. Temporary S3 Standard storage | Restored objects (3 days) | 1,000 GB × $0.023 ÷ 30 × 3 | $3 |
| 4. API Requests | Restore + GET calls | ~18 requests | $0 |
That’s ~$96 minimum to get 1TB of your data back. Most of it ($90) is the data transfer fee — the actual Glacier retrieval is only $3.
Every AWS account gets 100GB of data transfer out per month for free. If you have the full 100GB available, subtract ~$9. If you’ve already used it this month on other downloads, you pay the full $96.
You must choose Bulk Retrieval when restoring from Glacier. AWS offers two retrieval tiers:
- Standard (12 hours): $0.02 per GB = $20 for 1TB
- Bulk (48 hours): $0.0025 per GB = $3 for 1TB
Always choose Bulk unless you need data within 12 hours — it saves ~$17 per TB.
Temporary S3 Standard storage — when you restore files from Glacier, AWS creates a temporary copy in S3 Standard for you to download. You specify how many days to keep this copy. Set it to 3 days (or however long your internet speed needs) to minimize this cost: 1,000 GB × $0.023/GB-month ÷ 30 days × 3 days ≈ $3.
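The three line items can be checked with a bit of arithmetic (rates as quoted in the table above; note the post rounds each item up to whole dollars, which is why it quotes ~$96 rather than the raw total):

```shell
# Disaster-recovery cost for downloading N GB via the cheapest path.
restore_cost() {
  awk -v gb="$1" 'BEGIN {
    transfer = gb * 0.09              # Data Transfer Out to Internet
    bulk     = gb * 0.0025            # Bulk Retrieval (48h)
    temp     = gb * 0.023 / 30 * 3    # 3 days of temporary S3 Standard
    printf "%.2f %.2f %.2f %.2f\n", transfer, bulk, temp, transfer + bulk + temp
  }'
}

restore_cost 1000   # -> 90.00 2.50 2.30 94.80
```

Transfer out dominates at every scale, so any optimization effort belongs there (free tier, or restoring to an in-region EC2 instance as described below).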
Can you avoid the ~$90 transfer fee? No, not if you are downloading to your home or office internet. However, if you restore data to an EC2 instance inside the same region (eu-north-1), data transfer is free. If you only need to check a few files rather than download everything, launch a cheap EC2 server in Stockholm, restore there, extract what you need, and only download the small result to your home.
AWS Pricing Calculator: 1TB Full Cycle (eu-north-1)
For reference, here is the full line-item breakdown from the AWS Pricing Calculator for storing 1TB for 1 month and retrieving all of it. This estimate includes both Standard and Bulk retrieval and a full month of temporary S3 Standard storage — the real cost is lower if you use Bulk only and keep restored objects for just a few days (see above).
| Service | Usage Type | Quantity | Unit | Cost (USD) |
|---|---|---|---|---|
| Glacier Deep Archive | PUT requests | 18 | Requests | $0.001 |
| Glacier Deep Archive | S3-GDA Transition requests | 18 | Requests | $0.001 |
| Glacier Deep Archive | GET requests (Tier 3) | 18 | Requests | $0.002 |
| Glacier Deep Archive | Other requests (Tier 5) | 18 | Requests | $0.001 |
| Glacier Deep Archive | Storage (monthly) | 1,026 | GB-Mo | $1.016 |
| Glacier Deep Archive | Bulk Retrieval (48h) | 1,024 | GB | $3.072 |
| Glacier Deep Archive | Standard Retrieval (12h) | 1,024 | GB | $20.480 |
| S3 Standard | Temporary storage (restored objects, full month) | 1,024 | GB-Mo | $23.552 |
| Data Transfer | Transfer Out to Internet | 1,024 | GB | $92.160 |
The $140 total is a worst case — it includes both retrieval tiers and a full month of temporary storage. In practice, using Bulk Retrieval only and keeping restored objects for 3 days brings the real cost down to ~$96.
Safety Features
- Original data is never touched — only archives are uploaded and deleted
- Verification — each upload is verified via head-object size check before deleting the local archive
- Two-level resume — whole-archive + per-part resume survives power outages
- Crash-safe state — part ETags are appended to state file after each 100MB part
- Detailed logs — full audit trail of all operations
- Confirmation prompt — asks before starting (skip with --yes)
- Cleanup command — --cleanup aborts orphaned multipart uploads to prevent S3 cost accumulation
Example Output
[2026-02-08 14:30:15] ================================================================
[2026-02-08 14:30:15] archive_001.tar.gz from sdd_transit (95.2G, 976 parts)
[2026-02-08 14:30:15] ================================================================
[2026-02-08 14:30:15] Initiating new multipart upload...
[2026-02-08 14:30:16] Upload initiated: ABC123...
Part 500/976 (51%) 85 MB/s
Part 976/976 (100%) 82 MB/s
[2026-02-08 18:45:12] Completing multipart upload...
[2026-02-08 18:45:13] Multipart upload completed
[2026-02-08 18:45:13] Verifying upload...
[2026-02-08 18:45:14] Verified: size matches (102189432832 bytes)
[2026-02-08 18:45:14] Deleting local archive...
[2026-02-08 18:45:15] DONE: archive_001.tar.gz
Troubleshooting
AWS CLI not found
brew install awscli # macOS
# Or see AWS docs for Linux
Invalid credentials
aws configure
# Re-enter your credentials
Bucket not accessible
- Check bucket name is correct
- Verify IAM permissions (including s3:ListBucketMultipartUploads and s3:AbortMultipartUpload)
- Test with: aws s3 ls s3://your-bucket-name
Upload failed mid-way
- Just re-run ./glacier_upload.sh — it resumes from the last completed 100MB part
- Check .upload_state/*.upload files to see progress
- Review log file in .upload_state/upload_*.log
Orphaned multipart uploads (cost accumulation)
Incomplete multipart uploads store parts on S3 and cost money. Clean them up:
./glacier_upload.sh --cleanup
Manual Verification
Check what’s in your S3 bucket:
aws s3 ls s3://your-bucket-name/photo-video-archive/
Get details of a specific archive:
aws s3api head-object \
--bucket your-bucket-name \
--key photo-video-archive/archive_001.tar.gz
List incomplete multipart uploads:
aws s3api list-multipart-uploads --bucket your-bucket-name
Stopping the Script
Press Ctrl+C to stop. On next run:
- Completed archives are skipped (tracked in .upload_state/completed.txt)
- Partially uploaded archives resume from the last completed 100MB part
- Worst case: lose one part (~100MB) of upload progress
