TL;DR

  • Create an AWS account, an S3 bucket in a cheap region, and a dedicated IAM user with minimal S3-related permissions
  • Install & configure AWS CLI on the machine that will do the upload
  • Stage 1 — Archive: Run glacier_archive_split.sh to compress source folders into ~100GB .tar.gz chunks across two transit disks (uses pigz for fast parallel compression)
  • Stage 2 — Upload: Run glacier_upload.sh to upload archives to S3 Glacier Deep Archive via resumable multipart upload (100MB parts, crash-safe)
  • Cost: ~$1/month per TB stored; uploads are free; full retrieval of 1TB costs ~$96 (mostly data transfer out)
  • Safety: original data is never touched, every upload is verified, resume survives power outages (loses at most ~100MB)

This post includes a CLAUDE.md file, so you can adapt the setup to your needs via Claude Code.

Stage 0: AWS Setup

1. Create an S3 Bucket

Create a bucket in one of the cheapest regions for Glacier Deep Archive:

Region Name            Region Code   Price (1TB / Month)
US East (N. Virginia)  us-east-1     $1.01
US East (Ohio)         us-east-2     $1.01
US West (Oregon)       us-west-2     $1.01
Europe (Stockholm)     eu-north-1    $1.01

2. Create an IAM User

Create a dedicated user in IAM for managing this specific bucket. Attach the following inline policy:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "BucketLevelActions",
            "Effect": "Allow",
            "Action": [
                "s3:ListBucket",
                "s3:ListBucketMultipartUploads",
                "s3:GetBucketLocation"
            ],
            "Resource": "arn:aws:s3:::your-bucket-name"
        },
        {
            "Sid": "ObjectLevelActions",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:AbortMultipartUpload",
                "s3:ListMultipartUploadParts",
                "s3:GetObjectAttributes"
            ],
            "Resource": "arn:aws:s3:::your-bucket-name/*"
        },
        {
            "Sid": "AllowKMSEncryption",
            "Effect": "Allow",
            "Action": [
                "kms:GenerateDataKey",
                "kms:Decrypt"
            ],
            "Resource": "*"
        }
    ]
}

The s3:ListBucketMultipartUploads and s3:AbortMultipartUpload permissions are needed by the --cleanup flag to detect and abort incomplete multipart uploads.

3. Create an Access Key

Stay logged in as root — don’t log in as the new user. Go to IAM → Users, select the new user, switch to the Security credentials tab, and create a new Access Key.

4. Install AWS CLI

macOS:

Download and install AWS CLI v2.

Linux (remote server via SSH):

curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
rm -rf awscliv2.zip aws/

Verify installation:

aws --version

5. Configure AWS CLI

Run aws configure. It will prompt for:

  • AWS Access Key ID — from step 3
  • AWS Secret Access Key — from step 3
  • Default region name — e.g. us-east-1 (see pricing table above)
  • Default output format — press Enter to keep json, or type text

Note: The upload script uses the S3 multipart API directly with 100MB parts, so AWS CLI multipart settings (multipart_chunksize, multipart_threshold) are not needed.

I have 1.2TB of files to upload, mostly photos and videos. First I will compress them into relatively large archives, one per folder. This reduces the number of PUT/GET requests to Amazon (each request costs money) and reduces metadata storage overhead (Glacier charges for a minimum of 40KB of metadata per object, so thousands of small files would waste storage on metadata alone).
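The metadata overhead is easy to quantify. A back-of-the-envelope sketch (the file counts are illustrative, not my actual numbers):

```shell
# Glacier Deep Archive bills a minimum of ~40KB of metadata/overhead per object.
# Compare storing 50,000 loose photos vs. 12 large archives:
FILES=50000
ARCHIVES=12
OVERHEAD_KB=40
echo "as loose files: $(( FILES    * OVERHEAD_KB / 1024 )) MB of billable overhead"
echo "as archives:    $(( ARCHIVES * OVERHEAD_KB / 1024 )) MB of billable overhead"
```

Roughly 2GB of pure metadata for loose files versus essentially nothing for a dozen archives.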

The source HDD is connected to a local server without fast internet. The workflow is split into two stages:

  1. Stage 1: Archive data into ~100GB chunks on two transit SSDs (for physical transport)
  2. Stage 2: Upload archives from transit disks to AWS S3 Glacier Deep Archive

I asked Claude Code to create scripts for both stages.

Stage 1: Archive & Split to Transit Disks

Problem

The source HDD (2TB) is connected to a local server (user@192.168.1.100). Internet is too slow for direct upload. Two transit SSDs of different sizes are used to carry archives to a machine with fast internet.

Script: glacier_archive_split.sh

Scans all folders, groups them into ~100GB batches, compresses them with pigz (fast parallel gzip), and distributes the archives across two transit disks, respecting each disk's available capacity. All operations are strictly sequential: one archive at a time, no parallel disk I/O.
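The grouping step boils down to a greedy pass over folders in sort order. A minimal sketch (not the actual script; folder names and sizes are made up):

```shell
# Greedy batching: walk folders in natural sort order and start a new batch
# whenever adding the next folder would exceed the ~100GB limit. Sizes in GB.
MAX=100
batch=1; used=0
for entry in "folderA:40" "folderB:35" "folderC:50" "folderD:20"; do
  size=${entry#*:}; name=${entry%%:*}
  if [ $(( used + size )) -gt $MAX ] && [ $used -gt 0 ]; then
    batch=$(( batch + 1 )); used=0          # current batch is full, open the next
  fi
  used=$(( used + size ))
  echo "archive_$(printf '%03d' $batch) <- $name (${size}GB)"
done
```

With these sizes, folderA and folderB land in archive_001 (75GB) and folderC and folderD in archive_002 (70GB).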

Features:

  • Auto-detects free space on each transit disk
  • Large folders (e.g. GoPro) can be split into subfolders with custom naming
  • Natural sort: folders are archived in alphabetical order, so archive_001 contains the earliest folders
  • Full file/folder map in master manifest on each disk
  • Resume support (skips completed archives on re-run)
  • --dry-run generates a preview manifest with complete file listings
  • --yes skips confirmation prompt (for background execution)
  • --cleanup removes all archives and state from transit disks

Configuration

Edit the top of glacier_archive_split.sh:

SOURCE_DIR="/mnt/source_hdd/PhotosVideos"
TRANSIT_DISK_1="/mnt/sdd_transit"
TRANSIT_DISK_2="/mnt/sdc_transit"
MAX_ARCHIVE_SIZE=$((100 * 1024 * 1024 * 1024))  # 100GB

# Split large folders into subfolders with custom naming
# Format: "FolderName:ArchivePrefix"
SPLIT_FOLDERS=("GoPro:GoPro")

With SPLIT_FOLDERS, the GoPro folder’s subfolders are batched separately and named GoPro_1.tar.gz, GoPro_2.tar.gz, etc. Regular folders become archive_001.tar.gz, archive_002.tar.gz, etc.

Usage

Preview the plan (no archives created):

./glacier_archive_split.sh --dry-run

This generates archive_master_manifest_PREVIEW.txt with the complete file map for every planned archive.

Run for real:

./glacier_archive_split.sh

Run in background (via screen + log file):

screen -S glacier
./glacier_archive_split.sh --yes 2>&1 | tee archive_run.log
# Ctrl+A, D to detach
# screen -r glacier to reattach
# tail -f archive_run.log from another terminal

Clean up transit disks after a failed run:

./glacier_archive_split.sh --cleanup

Check progress via glacier_archive_status.sh:

./glacier_archive_status.sh

What happens per archive

  1. Create tar.gz with pigz -1 (fast parallel compression)
  2. Verify archive integrity (tar -tzf)
  3. Compute md5sum
  4. Write per-archive manifest (archive_NNN.manifest.txt)
  5. Append full file/folder map to master manifest (archive_master_manifest.txt)
  6. Mark as completed in .archive_state/completed.txt
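Steps 1–3 above can be sketched in a few lines (a minimal sketch; paths are placeholders, and it falls back to gzip if pigz is not installed):

```shell
# Compress a source folder, verify the archive, and record its checksum.
SRC=$(mktemp -d); OUT=$(mktemp -d)
echo "sample" > "$SRC/IMG_0001.JPG"                  # stand-in source data

command -v pigz >/dev/null && GZ="pigz -1" || GZ="gzip -1"   # pigz = parallel gzip
tar -C "$SRC" -cf - . | $GZ > "$OUT/archive_001.tar.gz"      # 1. create tar.gz

tar -tzf "$OUT/archive_001.tar.gz" > /dev/null && echo "archive OK"   # 2. verify
md5sum "$OUT/archive_001.tar.gz" > "$OUT/archive_001.md5"             # 3. checksum
```

pigz -1 trades a little compression ratio for speed, which is the right call for already-compressed photos and video.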

Master manifest

Each transit disk gets an archive_master_manifest.txt containing the complete directory structure and file listing for every archive on that disk. File paths are prefixed with the archive name:

archive_003/20230715 Summer-Trip/IMG_0001.JPG
archive_003/20230715 Summer-Trip/subfolder/video.mp4
GoPro_2/GoPro/HERO5 Session/DCIM/100GOPRO/G0012345.MP4

Search for any file:

grep "filename.jpg" /mnt/sdd_transit/archive_master_manifest.txt

Stage 2: Upload to S3 Glacier Deep Archive

Script: glacier_upload.sh

Uploads pre-built archives from transit disks to S3 Glacier Deep Archive using the S3 multipart upload API. Each 100MB part is individually tracked, so a power outage loses at most ~100MB of upload progress — not 100GB.

Features:

  • Scans both transit disks for .tar.gz archives
  • Resumable multipart upload: two levels of resume
    • Whole-archive: completed.txt tracks fully uploaded archives
    • Per-part: .upload state files track each 100MB part
  • Verifies upload with head-object size check
  • Deletes local archive after successful upload + verification
  • Uploads master manifests to S3 at the end
  • --dry-run shows upload plan without making API calls
  • --yes skips confirmation (for background/nohup execution)
  • --cleanup aborts incomplete S3 multipart uploads and cleans local state

Configuration

Edit the top of glacier_upload.sh:

S3_BUCKET="your-bucket-name"    # Must be changed before use
S3_PREFIX="photo-video-archive"
STORAGE_CLASS="DEEP_ARCHIVE"
TRANSIT_DISK_1="/mnt/sdd_transit"
TRANSIT_DISK_2="/mnt/sdc_transit"
PART_SIZE=$((100 * 1024 * 1024))  # 100MB per part
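With these settings, the number of parts per archive is just the file size divided by the part size, rounded up:

```shell
# Parts per archive = ceil(file_size / part_size).
PART_SIZE=$(( 100 * 1024 * 1024 ))           # 100MB per part
FILE_SIZE=$(( 100 * 1024 * 1024 * 1024 ))    # a full 100GB archive
TOTAL_PARTS=$(( (FILE_SIZE + PART_SIZE - 1) / PART_SIZE ))
echo "$TOTAL_PARTS parts"                    # 1024 parts for a 100GB archive
```

S3 caps a multipart upload at 10,000 parts, so 100MB parts comfortably cover archives up to ~1TB.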

Usage

Test your AWS setup first with the dry-run script:

./glacier_upload_dryrun.sh

This creates a small ~25MB test archive and uploads it via multipart with 5MB parts, so you can see part-by-part progress and verify your AWS setup works.

Preview the upload plan:

./glacier_upload.sh --dry-run

Run the full upload:

./glacier_upload.sh

Run in background:

screen -S glacier
./glacier_upload.sh --yes 2>&1 | tee upload_run.log
# Ctrl+A, D to detach
# screen -r glacier to reattach

Clean up after failure:

./glacier_upload.sh --cleanup

This aborts incomplete multipart uploads on S3 (which accumulate costs) and removes local state files.

Check progress:

./glacier_status.sh        # Local state only
./glacier_status.sh --s3   # Also check S3 for incomplete uploads

What happens per archive

  1. Check if already completed (skip if so)
  2. Initiate or resume S3 multipart upload
  3. Upload 100MB parts — each part’s ETag is written to state file immediately
  4. Complete multipart upload (send all ETags to S3)
  5. Verify via head-object — compare ContentLength to local file size
  6. Mark as completed, delete state file, delete local archive
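Step 5 is a plain size comparison. A minimal sketch with the S3 side stubbed in (in the real script, REMOTE_SIZE would come from aws s3api head-object --query ContentLength):

```shell
# Compare the local archive size with what S3 reports after the upload.
FILE=$(mktemp)
head -c 1048576 /dev/zero > "$FILE"    # stand-in 1 MiB "archive"
LOCAL_SIZE=$(wc -c < "$FILE" | tr -d ' ')
REMOTE_SIZE=1048576                    # stub for the head-object ContentLength
if [ "$LOCAL_SIZE" -eq "$REMOTE_SIZE" ]; then
  echo "Verified: size matches ($LOCAL_SIZE bytes)"
else
  echo "MISMATCH: keeping local archive" >&2
fi
rm -f "$FILE"
```

Only after this check passes is the local archive deleted.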

Two-level resume (power outage recovery)

After a power outage, just re-run ./glacier_upload.sh:

  • Whole-archive resume: Archives already in completed.txt are skipped entirely
  • Per-part resume: If an upload was mid-way, the script reads the .upload state file, validates the upload ID is still alive on S3, and continues from the last completed part
  • Worst case: lose ~100MB of upload progress (one part), not ~100GB
  • If the upload ID expired on S3, the script detects this and starts a fresh multipart upload

State files (on each transit disk)

.upload_state/
  completed.txt                    # Fully uploaded archive names
  archive_001.tar.gz.upload        # Per-part state for in-progress upload
  upload_YYYYMMDD_HHMMSS.log       # Upload log

Per-part state file format:

upload_id=ABC123...
s3_key=photo-video-archive/archive_001.tar.gz
file_size=107374182400
part_size=104857600
total_parts=1024
1 "etag-for-part-1"
2 "etag-for-part-2"
...
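Resuming from such a state file is just a matter of counting the recorded parts (a sketch with a made-up three-part state file):

```shell
# Parse a per-part state file and work out where to resume.
STATE=$(mktemp)
cat > "$STATE" <<'EOF'
upload_id=ABC123
s3_key=photo-video-archive/archive_001.tar.gz
file_size=314572800
part_size=104857600
total_parts=3
1 "etag-for-part-1"
2 "etag-for-part-2"
EOF
LAST_PART=$(grep -c '^[0-9]* "' "$STATE")   # lines recording a finished part
TOTAL=$(grep '^total_parts=' "$STATE" | cut -d= -f2)
echo "resume at part $(( LAST_PART + 1 )) of $TOTAL"
rm -f "$STATE"
```

Here two parts are already done, so the upload resumes at part 3.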

Master manifests

Each transit disk’s archive_master_manifest.txt (created by Stage 1) is uploaded to S3 at the end. These list every file in every archive, so you can find any file without downloading from Glacier.

Logs

Upload operations are logged in:

.upload_state/upload_YYYYMMDD_HHMMSS.log

Progress tracking:

.upload_state/completed.txt  # List of fully uploaded archives

Cost Estimation

Summary for 1TB in eu-north-1 (Stockholm):

  • Upload: ~$0.20 (one-time, essentially free)
  • Storage: ~$1/month per TB
  • Download (disaster recovery): ~$96 (one-time, see breakdown below)

Monthly Storage Cost (Glacier Deep Archive)

  • ~$0.99 per TB per month ($0.00099 per GB/month)
  • For 1TB of data: ~$1/month or ~$12/year
  • Minimum storage duration: 180 days (early deletion fee applies)
  • This is for ARCHIVAL only, not frequent access
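The headline number checks out from the per-GB rate (using 1 TB ≈ 1,024 GB):

```shell
# ~$0.00099 per GB-month, so one terabyte costs about a dollar a month.
awk 'BEGIN { printf "1 TB for 1 month: $%.2f\n", 0.00099 * 1024 }'
```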

Upload Cost

FREE. There is no charge for uploading data to S3.

Download Cost (Disaster Recovery)

This is where it gets expensive. Downloading 1TB from Glacier Deep Archive costs roughly $96 using the cheapest options. The “Data Transfer” fee is the hidden cost that surprises most people — the actual retrieval from the archive is surprisingly cheap.

Bare minimum cost breakdown (cheapest method, 1TB):

Item                              Cost Factor                Calculation                  Cost
1. Data Transfer Out              Sending data to your PC    1,000 GB × $0.09             $90
2. Bulk Retrieval (48h)           Thawing the data           1,000 GB × $0.0025           $3
3. Temporary S3 Standard storage  Restored objects (3 days)  1,000 GB × $0.023 ÷ 30 × 3   $3
4. API Requests                   Restore + GET calls        ~18 requests                 $0

That’s ~$96 minimum to get 1TB of your data back. Most of it ($90) is the data transfer fee — the actual Glacier retrieval is only $3.

Every AWS account gets 100GB of data transfer out per month for free. If you have the full 100GB available, subtract ~$9. If you’ve already used it this month on other downloads, you pay the full $96.

You must choose Bulk Retrieval when restoring from Glacier. Deep Archive offers two retrieval tiers (there is no Expedited tier for this storage class):

  • Standard (12 hours): $0.02 per GB = $20 for 1TB
  • Bulk (48 hours): $0.0025 per GB = $3 for 1TB

Always choose Bulk unless you need data within 12 hours — it saves ~$17 per TB.

Temporary S3 Standard storage — when you restore files from Glacier, AWS creates a temporary copy in S3 Standard for you to download. You specify how many days to keep this copy. Set it to 3 days (or however long your internet speed needs) to minimize this cost: 1,000 GB × $0.023/GB-month ÷ 30 days × 3 days ≈ $3.
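Both the retrieval tier and the temporary-copy lifetime are set in the restore request. A sketch of the call (bucket and key are placeholders, and the aws lines are commented out because they need valid credentials):

```shell
# Restore one archive with the Bulk tier, keeping the restored copy for 3 days.
BUCKET="your-bucket-name"
KEY="photo-video-archive/archive_001.tar.gz"
REQUEST='{"Days": 3, "GlacierJobParameters": {"Tier": "Bulk"}}'
echo "restore-request: $REQUEST"
# aws s3api restore-object --bucket "$BUCKET" --key "$KEY" --restore-request "$REQUEST"
# Poll until the restore finishes (look for ongoing-request="false"):
# aws s3api head-object --bucket "$BUCKET" --key "$KEY" --query Restore
```

Once head-object reports the restore as complete, download the object with a normal aws s3 cp before the Days window expires.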

Can you avoid the ~$90 transfer fee? No, not if you are downloading to your home or office internet. However, if you restore data to an EC2 instance inside the same region (eu-north-1), data transfer is free. If you only need to check a few files rather than download everything, launch a cheap EC2 server in Stockholm, restore there, extract what you need, and only download the small result to your home.

AWS Pricing Calculator: 1TB Full Cycle (eu-north-1)

For reference, here is the full line-item breakdown from the AWS Pricing Calculator for storing 1TB for 1 month and retrieving all of it. This estimate includes both Standard and Bulk retrieval and a full month of temporary S3 Standard storage — the real cost is lower if you use Bulk only and keep restored objects for just a few days (see above).

Service               Usage Type                                        Quantity  Unit      Cost (USD)
Glacier Deep Archive  PUT requests                                      18        Requests  $0.001
Glacier Deep Archive  S3-GDA Transition requests                        18        Requests  $0.001
Glacier Deep Archive  GET requests (Tier 3)                             18        Requests  $0.002
Glacier Deep Archive  Other requests (Tier 5)                           18        Requests  $0.001
Glacier Deep Archive  Storage (monthly)                                 1,026     GB-Mo     $1.016
Glacier Deep Archive  Bulk Retrieval (48h)                              1,024     GB        $3.072
Glacier Deep Archive  Standard Retrieval (12h)                          1,024     GB        $20.480
S3 Standard           Temporary storage (restored objects, full month)  1,024     GB-Mo     $23.552
Data Transfer         Transfer Out to Internet                          1,024     GB        $92.160

The $140 total is a worst case — it includes both retrieval tiers and a full month of temporary storage. In practice, using Bulk Retrieval only and keeping restored objects for 3 days brings the real cost down to ~$96.

Safety Features

  1. Original data is never touched — only archives are uploaded and deleted
  2. Verification — each upload is verified via head-object size check before deleting local archive
  3. Two-level resume — whole-archive + per-part resume survives power outages
  4. Crash-safe state — part ETags are appended to state file after each 100MB part
  5. Detailed logs — full audit trail of all operations
  6. Confirmation prompt — asks before starting (skip with --yes)
  7. Cleanup command — --cleanup aborts orphaned multipart uploads to prevent S3 cost accumulation

Example Output

[2026-02-08 14:30:15] ================================================================
[2026-02-08 14:30:15] archive_001.tar.gz from sdd_transit (95.2G, 976 parts)
[2026-02-08 14:30:15] ================================================================
[2026-02-08 14:30:15] Initiating new multipart upload...
[2026-02-08 14:30:16] Upload initiated: ABC123...
  Part 500/976 (51%) 85 MB/s
  Part 976/976 (100%) 82 MB/s
[2026-02-08 18:45:12] Completing multipart upload...
[2026-02-08 18:45:13] Multipart upload completed
[2026-02-08 18:45:13] Verifying upload...
[2026-02-08 18:45:14] Verified: size matches (102189432832 bytes)
[2026-02-08 18:45:14] Deleting local archive...
[2026-02-08 18:45:15] DONE: archive_001.tar.gz

Troubleshooting

AWS CLI not found

brew install awscli        # macOS
# Or see AWS docs for Linux

Invalid credentials

aws configure
# Re-enter your credentials

Bucket not accessible

  • Check bucket name is correct
  • Verify IAM permissions (including s3:ListBucketMultipartUploads and s3:AbortMultipartUpload)
  • Test with: aws s3 ls s3://your-bucket-name

Upload failed mid-way

  • Just re-run ./glacier_upload.sh — it resumes from the last completed 100MB part
  • Check .upload_state/*.upload files to see progress
  • Review log file in .upload_state/upload_*.log

Orphaned multipart uploads (cost accumulation)

Incomplete multipart uploads store parts on S3 and cost money. Clean them up:

./glacier_upload.sh --cleanup

Manual Verification

Check what’s in your S3 bucket:

aws s3 ls s3://your-bucket-name/photo-video-archive/

Get details of a specific archive:

aws s3api head-object \
  --bucket your-bucket-name \
  --key photo-video-archive/archive_001.tar.gz

List incomplete multipart uploads:

aws s3api list-multipart-uploads --bucket your-bucket-name

Stopping the Script

Press Ctrl+C to stop. On next run:

  • Completed archives are skipped (tracked in .upload_state/completed.txt)
  • Partially uploaded archives resume from the last completed 100MB part
  • Worst case: lose one part (~100MB) of upload progress