Automatic File Deletion in Amazon S3

A couple of weeks ago I wrote about backing up Apache and MySQL to Amazon S3. The final piece of the puzzle was to automatically purge all backups older than 7 days from S3. It took me a little while to put all the pieces in place, and I wanted to write it all down, as much for myself as anyone else, since I am sure I won’t remember all this if I need it again.

I found a nice utility named s3cleaner, a simple Python script that uses age and regex patterns to delete files from S3 buckets.
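
To make the age-plus-regex idea concrete, here is a minimal sketch of that kind of filter. It is not s3cleaner's actual code: the function name `select_expired` and the sample listing are made up for illustration, the bucket listing is faked with plain tuples so nothing touches S3, and I am assuming the pattern is searched anywhere in the key name.

```python
import re
from datetime import datetime, timedelta, timezone


def select_expired(objects, max_age_days, pattern, now=None):
    """Return names of objects older than max_age_days whose name matches pattern.

    `objects` is an iterable of (name, last_modified) pairs with
    timezone-aware datetimes. It stands in for a bucket listing.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    regex = re.compile(pattern)
    return [name for name, modified in objects
            if modified < cutoff and regex.search(name)]


# Hypothetical listing: two backups and one unrelated file.
now = datetime(2010, 1, 15, tzinfo=timezone.utc)
listing = [
    ("backups/mysql-2010-01-01.sql.gz", datetime(2010, 1, 1, tzinfo=timezone.utc)),
    ("backups/mysql-2010-01-14.sql.gz", datetime(2010, 1, 14, tzinfo=timezone.utc)),
    ("photos/old.jpg", datetime(2009, 6, 1, tzinfo=timezone.utc)),
]
expired = select_expired(listing, 7, r"\.sql\.gz$", now=now)
```

Only the backup from January 1 is both older than 7 days and a match for the pattern, so it is the only name selected. The old photo is left alone because it fails the regex.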

The first challenge was running a Python script on my server (hosted at Dreamhost): I needed to create a sandboxed Python environment. I found some instructions here. The first step is to download virtual-python.py and run it. From a command line you can simply run

wget http://peak.telecommunity.com/dist/virtual-python.py (to download the script)


Once it is downloaded, run the script.


Next you have to download ez_setup.py. From the command line run

wget http://peak.telecommunity.com/dist/ez_setup.py

Once it is downloaded, execute it. The key here is that you are now using the sandboxed install of Python: point to python in the path shown in the image above, not simply python as in the first step.

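
If you are ever unsure which interpreter is actually running a script, a quick check like the one below can help. It is only a sketch: the helper name `runs_from` and the `/home/example` directory are placeholders I made up, so substitute the directory your sandboxed Python was installed into.

```python
import os
import sys


def runs_from(sandbox_dir, executable=None):
    """Return True if the interpreter path lies inside sandbox_dir."""
    executable = os.path.abspath(executable or sys.executable)
    sandbox_dir = os.path.abspath(sandbox_dir)
    return os.path.commonpath([executable, sandbox_dir]) == sandbox_dir


# Print the interpreter running this script, then test it against a
# placeholder sandbox directory (replace with your own).
print(sys.executable)
print(runs_from("/home/example"))
```

Run it once with plain python and once with the sandboxed python, and the printed path should differ between the two.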
You now have a sandboxed installation of Python which will let you run s3cleaner. But wait, not so fast: s3cleaner relies on boto, which provides the Python interface to Amazon Web Services. Download boto here and copy it up to your server. Once it is on your server, install it; again, make sure you use the sandboxed Python, not the system-wide one.

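
A simple way to confirm that boto landed in the right place is to ask the interpreter whether it can find the module. The sketch below assumes a Python 3 interpreter, and `module_available` is just a name I picked. On an older interpreter, typing `import boto` at the prompt of the sandboxed Python tells you the same thing.

```python
import importlib.util
import sys


def module_available(name):
    """Return True if the running interpreter can locate the module `name`."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    status = "found" if module_available("boto") else "missing"
    print(f"{sys.executable}: boto {status}")
```

If this reports boto as missing under the sandboxed Python but found under the system one, the install went to the wrong interpreter.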
You are now ready to place s3cleaner.py on your server and run it. You need to specify the following flags when running

A word of caution: for the bucket name it only accepts a bucket name, not a subdirectory, so everything that matches your deletion criteria (--maxage and --regex) anywhere in the bucket will be deleted.

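
One way to limit the damage is to build the "subdirectory" into the regex itself, so that only keys under a given prefix can match. The sketch below shows the idea in isolation. It assumes the regex is applied to the full key name, which is worth checking against your copy of the script. The helper `scoped_pattern` and the key names are hypothetical.

```python
import re


def scoped_pattern(prefix, name_pattern):
    """Build a regex that only matches keys starting with `prefix`."""
    return re.compile("^" + re.escape(prefix) + name_pattern)


# Hypothetical keys in one bucket.
keys = [
    "backups/mysql/db-2010-01-01.sql.gz",
    "backups/apache/site-2010-01-01.tar.gz",
    "archive/db-2009-12-01.sql.gz",
]
pattern = scoped_pattern("backups/mysql/", r".*\.sql\.gz$")
matched = [k for k in keys if pattern.search(k)]
```

Here only the key under backups/mysql/ is matched. The .sql.gz file under archive/ is skipped because it does not start with the prefix. Before trusting a pattern with real deletions, it is worth testing it against a listing of your bucket like this.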
Download List 
virtual-python.py and ez_setup.py