TL;DR
- Create an AWS account, an S3 bucket in a cheap region, and a dedicated IAM user with minimal S3-related permissions
- Install & configure AWS CLI on the machine that will do the upload
- Stage 1 — Archive: Run glacier_archive_split.sh to compress source folders into ~100GB .tar.gz chunks across two transit disks (uses pigz for fast parallel compression)
- Stage 2 — Upload: Run glacier_upload.sh to upload archives to S3 Glacier Deep Archive via resumable multipart upload (100MB parts, crash-safe)
- Cost: ~$1/month per TB stored; uploads are free; full retrieval of 1TB costs ~$96 (mostly data transfer out)
- Safety: original data is never touched, every upload is verified, resume survives power outages (loses at most ~100MB)
This post ships with a CLAUDE.md file, so you can adapt the setup to your needs via Claude Code.
Stage 0: AWS Setup
1. Create an S3 Bucket
Create a bucket in one of the cheapest regions for Glacier Deep Archive:
| Region Name | Region Code | Price (1TB / Month) |
|---|---|---|
| US East (N. Virginia) | us-east-1 | $1.01 |
| US East (Ohio) | us-east-2 | $1.01 |
| US West (Oregon) | us-west-2 | $1.01 |
| Europe (Stockholm) | eu-north-1 | $1.01 |
2. Create an IAM User
Create a dedicated user in IAM for managing this specific bucket. Attach the following inline policy:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BucketLevelActions",
"Effect": "Allow",
"Action": [
"s3:ListBucket",
"s3:ListBucketMultipartUploads",
"s3:GetBucketLocation"
],
"Resource": "arn:aws:s3:::your-bucket-name"
},
{
"Sid": "ObjectLevelActions",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:DeleteObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts",
"s3:GetObjectAttributes"
],
"Resource": "arn:aws:s3:::your-bucket-name/*"
},
{
"Sid": "AllowKMSEncryption",
"Effect": "Allow",
"Action": [
"kms:GenerateDataKey",
"kms:Decrypt"
],
"Resource": "*"
}
]
}
s3:ListBucketMultipartUploads and s3:AbortMultipartUpload are needed for the --cleanup flag to detect and abort incomplete multipart uploads.
3. Create an Access Key
Stay logged in as root — don’t log in as the new user. Go to IAM → Users, select the new user, switch to the Security credentials tab, and create a new Access Key.
4. Install AWS CLI
macOS:
Download and install AWS CLI v2.
Linux (remote server via SSH):
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
rm -rf awscliv2.zip aws/
Verify installation:
aws --version
5. Configure AWS CLI
Run aws configure. It will prompt for:
- AWS Access Key ID — from step 3
- AWS Secret Access Key — from step 3
- Default region name — e.g. us-east-1 (see pricing table above)
- Default output format — press Enter to keep json, or type text
Note: The upload script uses the S3 multipart API directly with 100MB parts, so AWS CLI multipart settings (multipart_chunksize, multipart_threshold) are not needed.
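The 100MB part size also keeps every archive comfortably under S3's 10,000-parts-per-upload limit. A quick sanity check (a sketch — `part_count` is just ceiling division, not a function from the scripts):

```shell
# Number of multipart parts needed for a file: ceiling division.
part_count() {
  local file_size=$1 part_size=$2
  echo $(( (file_size + part_size - 1) / part_size ))
}

PART_SIZE=$((100 * 1024 * 1024))         # 100MB, as used by the upload script
ARCHIVE=$((100 * 1024 * 1024 * 1024))    # one full ~100GB archive

part_count "$ARCHIVE" "$PART_SIZE"       # 1024 parts -- far below the 10,000-part S3 limit
```

At 100MB per part, even a 1TB object would fit in ~10,240 parts, so ~100GB archives leave a wide margin.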
I have 1.2TB of files to upload, mostly photos and videos. First I will compress them into relatively big archives, grouped by folder. This reduces the number of PUT/GET requests to Amazon (each request costs money) and reduces metadata storage overhead (Glacier charges for a minimum of 40KB of metadata per object, so thousands of small files would waste storage on metadata alone).
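To put rough numbers on that metadata overhead (a back-of-the-envelope sketch — the file counts are hypothetical, the 40KB-per-object minimum is from the paragraph above):

```shell
# Billable metadata at ~40KB per stored object.
overhead_kb() { echo $(( $1 * 40 )); }

overhead_kb 100000   # 100,000 individual photos -> 4,000,000 KB (~3.8GB) of metadata
overhead_kb 12       # ~12 large archives        -> 480 KB
```

With big archives, the per-object metadata cost effectively disappears.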
The source HDD is connected to a local server without fast internet. The workflow is split into two stages:
- Stage 1: Archive data into ~100GB chunks on two transit SSDs (for physical transport)
- Stage 2: Upload archives from transit disks to AWS S3 Glacier Deep Archive
I asked Claude Code to create scripts for both stages.
Stage 1: Archive & Split to Transit Disks
Problem
The source HDD (2TB) is connected to a local server (user@192.168.1.100). Internet is too slow for direct upload. Two transit SSDs of different sizes are used to carry archives to a machine with fast internet.
Script: glacier_archive_split.sh
Scans all folders, groups them into ~100GB batches, compresses with pigz (fast parallel gzip), distributes across two transit disks respecting their available capacity. All operations are strictly sequential — one archive at a time, no parallel disk I/O.
Features:
- Auto-detects free space on each transit disk
- Large folders (e.g. GoPro) can be split into subfolders with custom naming
- Natural sort: folders are archived in alphabetical order, so archive_001 contains the earliest folders
- Full file/folder map in master manifest on each disk
- Resume support (skips completed archives on re-run)
- --dry-run generates a preview manifest with complete file listings
- --yes skips confirmation prompt (for background execution)
- --cleanup removes all archives and state from transit disks
Configuration
Edit the top of glacier_archive_split.sh:
SOURCE_DIR="/mnt/source_hdd/PhotosVideos"
TRANSIT_DISK_1="/mnt/sdd_transit"
TRANSIT_DISK_2="/mnt/sdc_transit"
MAX_ARCHIVE_SIZE=$((100 * 1024 * 1024 * 1024)) # 100GB
# Split large folders into subfolders with custom naming
# Format: "FolderName:ArchivePrefix"
SPLIT_FOLDERS=("GoPro:GoPro")
With SPLIT_FOLDERS, the GoPro folder’s subfolders are batched separately and named GoPro_1.tar.gz, GoPro_2.tar.gz, etc. Regular folders become archive_001.tar.gz, archive_002.tar.gz, etc.
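The batching itself can be sketched as a greedy pass over the sorted folder list (an illustration of the idea, not the actual script's code — folder names and sizes below are made up, and sizes are in GB for readability):

```shell
MAX=100  # batch size limit in GB (the real script works in bytes)

# Greedy batching: walk folders in sorted order, start a new batch
# whenever adding the next folder would exceed the limit.
batch_folders() {
  local batch=1 used=0 name size
  while read -r name size; do
    if (( used + size > MAX && used > 0 )); then
      batch=$((batch + 1)); used=0
    fi
    used=$((used + size))
    printf 'archive_%03d %s\n' "$batch" "$name"
  done
}

printf '%s\n' "2021_Summer 60" "2022_Winter 55" "2023_Spring 30" | batch_folders
# archive_001 2021_Summer
# archive_002 2022_Winter
# archive_002 2023_Spring
```

Because folders are consumed in natural-sort order, each archive covers a contiguous, roughly chronological slice of the collection.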
Usage
Preview the plan (no archives created):
./glacier_archive_split.sh --dry-run
This generates archive_master_manifest_PREVIEW.txt with the complete file map for every planned archive.
Run for real:
./glacier_archive_split.sh
Run in background (via screen + log file):
screen -S glacier
./glacier_archive_split.sh --yes 2>&1 | tee archive_run.log
# Ctrl+A, D to detach
# screen -r glacier to reattach
# tail -f archive_run.log from another terminal
Clean up transit disks after a failed run:
./glacier_archive_split.sh --cleanup
Check progress via glacier_archive_status.sh:
./glacier_archive_status.sh
What happens per archive
- Create tar.gz with pigz -1 (fast parallel compression)
- Verify archive integrity (tar -tzf)
- Compute md5sum
- Write per-archive manifest (archive_NNN.manifest.txt)
- Append full file/folder map to master manifest (archive_master_manifest.txt)
- Mark as completed in .archive_state/completed.txt
Master manifest
Each transit disk gets an archive_master_manifest.txt containing the complete directory structure and file listing for every archive on that disk. File paths are prefixed with the archive name:
archive_003/20230715 Summer-Trip/IMG_0001.JPG
archive_003/20230715 Summer-Trip/subfolder/video.mp4
GoPro_2/GoPro/HERO5 Session/DCIM/100GOPRO/G0012345.MP4
Search for any file:
grep "filename.jpg" /mnt/sdd_transit/archive_master_manifest.txt
Stage 2: Upload to S3 Glacier Deep Archive
Script: glacier_upload.sh
Uploads pre-built archives from transit disks to S3 Glacier Deep Archive using the S3 multipart upload API. Each 100MB part is individually tracked, so a power outage loses at most ~100MB of upload progress — not 100GB.
Features:
- Scans both transit disks for .tar.gz archives
- Resumable multipart upload: two levels of resume
  - Whole-archive: completed.txt tracks fully uploaded archives
  - Per-part: .upload state files track each 100MB part
- Verifies upload with head-object size check
- Deletes local archive after successful upload + verification
- Uploads master manifests to S3 at the end
- --dry-run shows upload plan without making API calls
- --yes skips confirmation (for background/nohup execution)
- --cleanup aborts incomplete S3 multipart uploads and cleans local state
Configuration
Edit the top of glacier_upload.sh:
S3_BUCKET="your-bucket-name" # Must be changed before use
S3_PREFIX="photo-video-archive"
STORAGE_CLASS="DEEP_ARCHIVE"
TRANSIT_DISK_1="/mnt/sdd_transit"
TRANSIT_DISK_2="/mnt/sdc_transit"
PART_SIZE=$((100 * 1024 * 1024)) # 100MB per part
Usage
Test first (recommended):
./glacier_upload_dryrun.sh
The test script creates a small ~25MB test archive and uploads it using multipart with 5MB parts (the S3 minimum part size), so you can see part-by-part progress and verify your AWS setup works.
Preview the upload plan:
./glacier_upload.sh --dry-run
Run the full upload:
./glacier_upload.sh
Run in background:
screen -S glacier
./glacier_upload.sh --yes 2>&1 | tee upload_run.log
# Ctrl+A, D to detach
# screen -r glacier to reattach
Clean up after failure:
./glacier_upload.sh --cleanup
This aborts incomplete multipart uploads on S3 (which accumulate costs) and removes local state files.
Check progress:
./glacier_status.sh # Local state only
./glacier_status.sh --s3 # Also check S3 for incomplete uploads
What happens per archive
- Check if already completed (skip if so)
- Initiate or resume S3 multipart upload
- Upload 100MB parts — each part’s ETag is written to state file immediately
- Complete multipart upload (send all ETags to S3)
- Verify via head-object — compare ContentLength to local file size
- Mark as completed, delete state file, delete local archive
Two-level resume (power outage recovery)
After a power outage, just re-run ./glacier_upload.sh:
- Whole-archive resume: Archives already in completed.txt are skipped entirely
- Per-part resume: If an upload was mid-way, the script reads the .upload state file, validates the upload ID is still alive on S3, and continues from the last completed part
- Worst case: lose ~100MB of upload progress (one part), not ~100GB
- If the upload ID expired on S3, the script detects this and starts a fresh multipart upload
State files (on each transit disk)
.upload_state/
completed.txt # Fully uploaded archive names
archive_001.tar.gz.upload # Per-part state for in-progress upload
upload_YYYYMMDD_HHMMSS.log # Upload log
Per-part state file format:
upload_id=ABC123...
s3_key=photo-video-archive/archive_001.tar.gz
file_size=107374182400
part_size=104857600
total_parts=1024
1 "etag-for-part-1"
2 "etag-for-part-2"
...
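Given that format, the resume logic reduces to "find the first part number missing from the state file". A sketch of such a helper (hypothetical — `next_part` is not the script's actual function; it assumes parts are recorded in order, which matches a sequential upload):

```shell
# Print the next part number to upload, given a per-part state file.
# Part lines look like `<part_number> "<etag>"`; header lines contain `=`.
next_part() {
  local state_file=$1 expected=1 n _etag
  while read -r n _etag; do
    case $n in *=*) continue ;; esac      # skip key=value header lines
    if [ "$n" -eq "$expected" ]; then expected=$((expected + 1)); fi
  done < "$state_file"
  echo "$expected"
}

# Demo with a tiny fake state file:
cat > /tmp/demo.upload <<'EOF'
upload_id=ABC123
total_parts=4
1 "etag-1"
2 "etag-2"
EOF
next_part /tmp/demo.upload   # -> 3
```

Because each ETag line is appended immediately after its part succeeds, a crash can only ever lose the single part that was in flight.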
Master manifests
Each transit disk’s archive_master_manifest.txt (created by Stage 1) is uploaded to S3 at the end. These list every file in every archive, so you can find any file without downloading from Glacier.
Logs
Upload operations are logged in:
.upload_state/upload_YYYYMMDD_HHMMSS.log
Progress tracking:
.upload_state/completed.txt # List of fully uploaded archives
Cost Estimation
Summary for 1TB in eu-north-1 (Stockholm):
- Upload: ~$0.20 (one-time, essentially free)
- Storage: ~$1/month per TB
- Download (disaster recovery): ~$96 (one-time, see breakdown below)
Monthly Storage Cost (Glacier Deep Archive)
- ~$0.99 per TB per month ($0.00099 per GB/month)
- For 1TB of data: ~$1/month or ~$12/year
- Minimum storage duration: 180 days (early deletion fee applies)
- This is for ARCHIVAL only, not frequent access
Upload Cost
FREE. There is no charge for uploading data to S3.
Download Cost (Disaster Recovery)
This is where it gets expensive. Downloading 1TB from Glacier Deep Archive costs roughly $96 using the cheapest options. The “Data Transfer” fee is the hidden cost that surprises most people — the actual retrieval from the archive is surprisingly cheap.
Bare minimum cost breakdown (cheapest method, 1TB):
| Item | Cost Factor | Calculation | Cost |
|---|---|---|---|
| 1. Data Transfer Out | Sending data to your PC | 1,000 GB × $0.09 | $90 |
| 2. Bulk Retrieval (48h) | Thawing the data | 1,000 GB × $0.0025 | $3 |
| 3. Temporary S3 Standard storage | Restored objects (3 days) | 1,000 GB × $0.023 ÷ 30 × 3 | $3 |
| 4. API Requests | Restore + GET calls | ~18 requests | $0 |
That’s ~$96 minimum to get 1TB of your data back. Most of it ($90) is the data transfer fee — the actual Glacier retrieval is only $3.
Every AWS account gets 100GB of data transfer out per month for free. If you have the full 100GB available, subtract ~$9. If you’ve already used it this month on other downloads, you pay the full $96.
You must choose Bulk Retrieval when restoring from Glacier. AWS offers two retrieval tiers:
- Standard (12 hours): $0.02 per GB = $20 for 1TB
- Bulk (48 hours): $0.0025 per GB = $3 for 1TB
Always choose Bulk unless you need data within 12 hours — it saves ~$17 per TB.
Temporary S3 Standard storage — when you restore files from Glacier, AWS creates a temporary copy in S3 Standard for you to download. You specify how many days to keep this copy. Set it to 3 days (or however long your internet speed needs) to minimize this cost: 1,000 GB × $0.023/GB-month ÷ 30 days × 3 days ≈ $3.
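The three line items can be checked with a bit of arithmetic (rates as quoted in the table above; note the post rounds each item up to whole dollars, which is why it quotes ~$96 rather than the raw total):

```shell
# Disaster-recovery cost for downloading N GB via the cheapest path.
restore_cost() {
  awk -v gb="$1" 'BEGIN {
    transfer = gb * 0.09              # Data Transfer Out to Internet
    bulk     = gb * 0.0025            # Bulk Retrieval (48h)
    temp     = gb * 0.023 / 30 * 3    # 3 days of temporary S3 Standard
    printf "%.2f %.2f %.2f %.2f\n", transfer, bulk, temp, transfer + bulk + temp
  }'
}

restore_cost 1000   # -> 90.00 2.50 2.30 94.80
```

Transfer out dominates at every scale, so any optimization effort belongs there (free tier, or restoring to an in-region EC2 instance as described below).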
Can you avoid the ~$90 transfer fee? No, not if you are downloading to your home or office internet. However, if you restore data to an EC2 instance inside the same region (eu-north-1), data transfer is free. If you only need to check a few files rather than download everything, launch a cheap EC2 server in Stockholm, restore there, extract what you need, and only download the small result to your home.
AWS Pricing Calculator: 1TB Full Cycle (eu-north-1)
For reference, here is the full line-item breakdown from the AWS Pricing Calculator for storing 1TB for 1 month and retrieving all of it. This estimate includes both Standard and Bulk retrieval and a full month of temporary S3 Standard storage — the real cost is lower if you use Bulk only and keep restored objects for just a few days (see above).
| Service | Usage Type | Quantity | Unit | Cost (USD) |
|---|---|---|---|---|
| Glacier Deep Archive | PUT requests | 18 | Requests | $0.001 |
| Glacier Deep Archive | S3-GDA Transition requests | 18 | Requests | $0.001 |
| Glacier Deep Archive | GET requests (Tier 3) | 18 | Requests | $0.002 |
| Glacier Deep Archive | Other requests (Tier 5) | 18 | Requests | $0.001 |
| Glacier Deep Archive | Storage (monthly) | 1,026 | GB-Mo | $1.016 |
| Glacier Deep Archive | Bulk Retrieval (48h) | 1,024 | GB | $3.072 |
| Glacier Deep Archive | Standard Retrieval (12h) | 1,024 | GB | $20.480 |
| S3 Standard | Temporary storage (restored objects, full month) | 1,024 | GB-Mo | $23.552 |
| Data Transfer | Transfer Out to Internet | 1,024 | GB | $92.160 |
The $140 total is a worst case — it includes both retrieval tiers and a full month of temporary storage. In practice, using Bulk Retrieval only and keeping restored objects for 3 days brings the real cost down to ~$96.
Safety Features
- Original data is never touched — only archives are uploaded and deleted
- Verification — each upload is verified via head-object size check before deleting the local archive
- Two-level resume — whole-archive + per-part resume survives power outages
- Crash-safe state — part ETags are appended to state file after each 100MB part
- Detailed logs — full audit trail of all operations
- Confirmation prompt — asks before starting (skip with --yes)
- Cleanup command — --cleanup aborts orphaned multipart uploads to prevent S3 cost accumulation
Example Output
[2026-02-08 14:30:15] ================================================================
[2026-02-08 14:30:15] archive_001.tar.gz from sdd_transit (95.2G, 976 parts)
[2026-02-08 14:30:15] ================================================================
[2026-02-08 14:30:15] Initiating new multipart upload...
[2026-02-08 14:30:16] Upload initiated: ABC123...
Part 500/976 (51%) 85 MB/s
Part 976/976 (100%) 82 MB/s
[2026-02-08 18:45:12] Completing multipart upload...
[2026-02-08 18:45:13] Multipart upload completed
[2026-02-08 18:45:13] Verifying upload...
[2026-02-08 18:45:14] Verified: size matches (102189432832 bytes)
[2026-02-08 18:45:14] Deleting local archive...
[2026-02-08 18:45:15] DONE: archive_001.tar.gz
Troubleshooting
AWS CLI not found
brew install awscli # macOS
# Or see AWS docs for Linux
Invalid credentials
aws configure
# Re-enter your credentials
Bucket not accessible
- Check bucket name is correct
- Verify IAM permissions (including s3:ListBucketMultipartUploads and s3:AbortMultipartUpload)
- Test with: aws s3 ls s3://your-bucket-name
Upload failed mid-way
- Just re-run ./glacier_upload.sh — it resumes from the last completed 100MB part
- Check .upload_state/*.upload files to see progress
- Review log file in .upload_state/upload_*.log
Orphaned multipart uploads (cost accumulation)
Incomplete multipart uploads store parts on S3 and cost money. Clean them up:
./glacier_upload.sh --cleanup
Manual Verification
Check what’s in your S3 bucket:
aws s3 ls s3://your-bucket-name/photo-video-archive/
Get details of a specific archive:
aws s3api head-object \
--bucket your-bucket-name \
--key photo-video-archive/archive_001.tar.gz
List incomplete multipart uploads:
aws s3api list-multipart-uploads --bucket your-bucket-name
Stopping the Script
Press Ctrl+C to stop. On next run:
- Completed archives are skipped (tracked in .upload_state/completed.txt)
- Partially uploaded archives resume from the last completed 100MB part
- Worst case: lose one part (~100MB) of upload progress
