Enable S3 versioning to retrieve accidentally overwritten or deleted objects

2023-03-15 aws

Versioning is settings of S3 Bucket. When enabled, all PUT data will remain with the version ID, and on DELETE, a delete marker is put as the latest version without actually deleting. If you GET an object whose latest version is a delete marker, it returns Not found like when there is no object, but if you delete the delete marker version, you can get that object again.

If you delete an object with the version, the delete marker is not created.

$ export BUCKET=<bucket> PREFIX=<prefix> RESTORE_TIME="2023-03-14T15:00+00:00"

# permanently delete existing objects
$ aws s3api list-object-versions --bucket ${BUCKET} --prefix ${PREFIX} --output json --query 'Versions[?IsLatest==`true` && LastModified>`'"${RESTORE_TIME}"'].[Key, VersionId]' | jq -r '.[] | "--key '\''" + .[0] + "'\'' --version-id " + .[1]' | xargs -L1 aws s3api delete-object --bucket ${BUCKET} 

# permanently delete delete markers
$ aws s3api list-object-versions --bucket <bucket> --prefix ${PREFIX} --output json --query 'DeleteMarkers[?IsLatest==`true` && LastModified>`'"${RESTORE_TIME}"'`].[Key, VersionId]' | jq -r '.[] | "--key '\''" + .[0] + "'\'' --version-id " + .[1]' | xargs -L1 aws s3api delete-object --bucket ${BUCKET}

Past versions are also applied to billing in the same way as the latest objects. So if objects are frequently overwritten or deleted on a daily basis, enabling versioning will increase the cost, but if they are not updated so much, they can be saved from accidents without paying the cost for the backup. By creating a lifecycle rule that deletes outdated versions, you can prevent deleted objects from remaining forever.

lifecyle rule which deletes outdated versions

Reference

Retrieve an Amazon S3 object that was deleted | AWS re:Post