unlimited space: creating an S3-based filesystem with s3backer

Published on 15/05/2020 by Morgan Bazalgette • 4 minutes

A few days ago, I wound up on the GitHub page for s3backer. The idea is fantastic: don’t rely on the way that S3 stores objects; instead, just use it to store raw data, and put a filesystem on top of it. After struggling for many hours with goofys, which was often too slow for what I needed and had no real caching system other than catfs, I decided to give s3backer a try.

I was pleased. My git hosting website, zxq.co, is now using s3backer together with an ext4 filesystem to serve git repositories. The downside is that, generally speaking, the cache needs to warm up each time a repository is loaded; but the most important aspect is that I can now leverage Scaleway’s Object Storage while still running on a DEV1-S. In other words, “if you thought that I wouldn’t seriously use your cheapest machine just because it only had 20GB, you are wrong.”

This is an organised quick-start guide I made because I know I’m going to need this another time; I put it together by arranging information from the s3backer wiki.

First of all, you will need to download and install s3backer.

# get deps for ubuntu and debian
sudo apt-get install libcurl4-openssl-dev libfuse-dev libexpat1-dev libssl-dev zlib1g-dev pkg-config

cd
git clone https://github.com/archiecobbs/s3backer.git
cd s3backer
./autogen.sh
./configure
make
sudo make install

# enable fuse for non-root users
sudo sh -c 'echo user_allow_other >> /etc/fuse.conf'

We need to give s3backer our auth data. We create ~/.s3backer_passwd and fill it with our S3 access ID and secret key, separated by a colon, like so:

0KAODKRXJM39543K343:+MkIE9MA/dkwEaldRoaPP83dfa03=
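As a minimal sketch, the file can be written from the shell and restricted to your own user. The two values below are placeholders, not real credentials, and the `chmod` is ordinary good practice for a file holding a secret rather than something s3backer is known to require.

```shell
# Placeholders: substitute the access ID and secret key from your provider.
S3_ACCESS_ID='YOUR_ACCESS_ID'
S3_SECRET_KEY='YOUR_SECRET_KEY'

# One line, in the accessID:accessKey form shown above.
printf '%s:%s\n' "$S3_ACCESS_ID" "$S3_SECRET_KEY" > ~/.s3backer_passwd

# Keep the secret readable and writable by your user only.
chmod 600 ~/.s3backer_passwd
```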

You’ll need to pick a block size for the next step. There are two resources to consider: Choosing a block size, which discusses block size especially in relation to cost, and Performance considerations. It depends on your use case; in my case, using Scaleway object storage together with a Scaleway VM has the nice side effect of very little latency, and I found 128k to be a good compromise. Consider, also, that Scaleway recommends keeping the number of objects below 500,000. In a project of mine we went over 5,000,000 without any issues, but shortly after that file uploads and downloads started getting slower, so you may want to keep the block size big enough to limit the number of objects.
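To get a feel for how block size translates into object count, here is a rough calculation for the values used in this guide. It assumes that the `k` and `g` suffixes are powers of 1024 and that a completely full volume stores one object per block; both are assumptions worth checking against the s3backer man page.

```shell
# Upper bound on the number of S3 objects for a completely full volume.
# Values mirror the example in this guide; adjust them to your own.
size_gib=200     # --size=200g
block_kib=128    # --blockSize=128k

# GiB -> KiB, then divide by the block size in KiB.
blocks=$(( size_gib * 1024 * 1024 / block_kib ))
echo "max objects: $blocks"    # prints "max objects: 1638400"
```

Under the same assumptions, 1 MiB blocks would cap the same 200g volume at 204,800 objects.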

In our example, we’ll also use compression; keep in mind that compression happens at a block level, thus with a larger block size there will be a better compression ratio.

Next, we’ll need to create a password file. This assumes you want to use s3backer’s encryption feature; and you probably do, since “it’s just there”.

# We'll assume we want to keep s3backer's files in ~/fs
mkdir ~/fs
cd ~/fs
head -c 256 /dev/urandom > passwd
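A possible variant of this step generates a printable, single-line password instead of raw bytes. This is a sketch under an assumption that has not been checked against the s3backer source: if the password file is read as a line of text, raw random bytes could be cut short at a newline or NUL byte, and a base64 string avoids that.

```shell
# Variant of the step above: 32 random bytes, base64-encoded into one line.
# Assumption: s3backer accepts any text password in the file.
cd ~/fs 2>/dev/null || true    # same working directory as above, if it exists
head -c 32 /dev/urandom | base64 > passwd

# The password protects the whole filesystem: keep it private.
chmod 600 passwd
```

Whichever form you use, keep a backup of this file: without it the encrypted blocks cannot be read back.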

Now comes the time to actually create the s3backer filesystem:

mkdir ~/fs/s3
s3backer \
    --blockCacheFile=blockcachefile \
    --baseURL=https://s3.fr-par.scw.cloud/ \
    --region=fr-par \
    --blockSize=128k \
    --size=200g \
    --listBlocks \
    --encrypt \
    --passwordFile=passwd \
    your-bucket-name ~/fs/s3/

--blockCacheFile stores the block cache in the specified file instead of keeping it in RAM. With my settings, keeping the cache in RAM led to ~300MB of resident memory, so I preferred to use a file.
--baseURL has been set to Scaleway’s S3 endpoint; change it to yours, or remove it to use Amazon’s.
You will need to decide on --blockSize and --size. I’ve opted for 200g, but you can use any size.
--listBlocks makes s3backer check at startup which blocks already exist and treat all the others as blank. This is useful when running s3backer for the first time, as it avoids useless requests for blocks that were never written; you may want to disable it once the number of blocks grows significantly, as listing them slows down startup.
--encrypt enables encryption AND compression; --passwordFile refers to the passwd file we created above.
You will then need to change the bucket and the region according to your settings.

Other useful flags:

--blockCacheSize: decide how many blocks to cache; default is 1000.
--blockCacheThreads: “the faster your Internet connection, the more threads you’ll want; the larger your block size, the fewer threads you’ll want. The default value of 20 is just a wild guess.” (ref)

You can see them all at ManPage.
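As a rough guide when tuning --blockCacheSize, the cached data can take up to the number of cached blocks times the block size, whether it lives in RAM or in the --blockCacheFile. The sketch below uses the default of 1000 blocks quoted above and the 128k block size from this guide; it counts cached block data only, not s3backer’s other memory use.

```shell
# Approximate upper bound on cached block data.
cache_blocks=1000    # --blockCacheSize default quoted above
block_kib=128        # --blockSize=128k

# KiB -> MiB
cache_mib=$(( cache_blocks * block_kib / 1024 ))
echo "block cache: ${cache_mib} MiB"    # prints "block cache: 125 MiB"
```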

We have now started s3backer. We will now create the filesystem.

mkfs.ext4 -E nodiscard -F ~/fs/s3/file

Keep in mind that even if you have used a different mount path, you can’t change the file component at the end: that is the name s3backer gives to the backing file.

You can now see the device at ~/fs/s3/file. Also, ~/fs/s3/stats has been created, which will give useful information about s3backer.

Now comes the mounting.

mkdir ~/mnt
sudo mount -o loop -o discard ~/fs/s3/file ~/mnt
# you probably want to do this to get write access from your non-sudo user
sudo chown -R $USER:$USER ~/mnt

We use -o discard because it lets blocks be removed from S3 when the filesystem no longer needs them; this works through the same kernel API calls used for SSD trimming. For more information: Unused Block Detection

You may then want to create a systemd service to start s3backer and mount the filesystem; unfortunately, s3backer’s author writes that doing this using fstab is probably no longer possible due to two bugs.
