Encrypted, Client-side, NAS-based File Syncing
I'm not that keen on any file syncing service that takes my files and puts them on someone else's computer (the cloud), for a few reasons:

- I'm simply paranoid. I know you can get services with client-side encryption and everything, but I still don't trust that.
- I have slow internet; uploading every file on my computer would take days, if not weeks.
- I don't want to pay a monthly service fee. The amount of data I have would far exceed any free or trial space I am given. I'd far rather buy my own hard drive which I own and control, even though in many cases this could actually be more expensive.
With these considerations, I haven't bothered syncing. My laptop and desktop have had diverging file systems since their installation, until now. Because of this I have mainly been using my laptop, as that is what I take into lectures with me; it is the device with the important notes and files.
However, I have a much more powerful desktop which is underutilized, so I've decided to make my own syncing system, which relies only on a NAS within the local network. I could have installed NextCloud on a Raspberry Pi or something, but decided on a more DIY approach. I already have a NAS, in the form of a backup hard drive plugged directly into my router. My router provides an ssh environment which I can use to push files to the hard drive. I didn't want to bother with any server-side software, so I made a completely client-side script.
Although my NAS is not accessible through the internet, I am still paranoid. Both of my computers' hard drives are LUKS encrypted, but the NAS is not; someone breaking in could still steal the hard drive and access all the data. It is also shared by anyone who can connect to the network. Well, that's not entirely true, as an ssh login is required, but I plan to share the NAS with my flatmates next year. Because of this, I made sure the script encrypts all files before sending them to the NAS.
The basic structure of my script is that all client devices that want to sync run the exact same script. This script contains variables for the directory to sync, the password for symmetric encryption, where to put temporary files, and the network address and user of the NAS. It expects to log in to the NAS automatically by means of its public ssh key in the authorized_keys file.
The NAS is a dumb device which sits on the network providing storage space and ssh login capability. That is all that is needed of it.
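The configuration described above might look something like the following sketch. The variable names and values here are illustrative, not taken from the actual script:

```shell
#!/bin/sh
# Hypothetical configuration block for the sync script.
# All names and values below are illustrative assumptions.
SYNC_DIR="$HOME/sync"          # directory to keep in sync
TMP_DIR="/tmp/nas-sync"        # where encrypted copies are staged before upload
NAS_USER="admin"               # ssh user on the router/NAS
NAS_HOST="192.168.1.1"         # NAS address on the local network
NAS_PATH="/mnt/usb/sync"       # storage path on the NAS
PASSPHRASE="correct horse"     # shared symmetric-encryption password
```

Every device runs with an identical copy of these values, which is what lets the otherwise "dumb" NAS stay free of server-side software.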
The script checks if a manifest exists on the server. Assuming one doesn't yet, it uploads the files in the sync directory as well as a manifest of all files, stating their name, modification date, and hash in CSV format. Everything is uploaded in a gpg-encrypted form, including the manifest.
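The manifest step could be sketched like this, assuming GNU coreutils (`stat -c`, `sha256sum`) and filenames without newlines. The helper name `build_manifest` is mine, not from the actual script:

```shell
#!/bin/sh
# Sketch of manifest generation: one CSV row per file with
# relative path, mtime (epoch seconds), and sha256 hash.
# Assumes GNU coreutils and no newlines in filenames.
build_manifest() {
    dir="$1"; out="$2"
    : > "$out"
    find "$dir" -type f | while IFS= read -r f; do
        rel="${f#"$dir"/}"
        mtime=$(stat -c %Y "$f")
        hash=$(sha256sum "$f" | cut -d ' ' -f 1)
        printf '%s,%s,%s\n' "$rel" "$mtime" "$hash" >> "$out"
    done
}

# Usage: build_manifest "$HOME/sync" /tmp/manifest.csv
# The resulting CSV would then be gpg-encrypted and pushed to the NAS.
```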
Another computer with the same script downloads the manifest and checks it against its own file system. If the hash of a file is different, it will either upload its version, if its version has the later timestamp, or download the server's version, if that has the later timestamp. This will overwrite the existing version. If a file is missing from its file system or the NAS's, it will download or upload that file respectively.
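The per-file decision rule just described can be written down compactly. This is a sketch of the logic, not the script's actual code; `decide` is a hypothetical helper that takes a file's local and NAS manifest entries (empty strings meaning "missing on that side") and prints the action:

```shell
#!/bin/sh
# Sketch of the per-file sync decision: newer timestamp wins when
# hashes differ; missing files are copied to the side lacking them.
decide() {
    local_hash="$1"; local_mtime="$2"   # empty if file missing locally
    nas_hash="$3";   nas_mtime="$4"     # empty if file missing on the NAS
    if [ -z "$local_hash" ]; then echo download; return; fi
    if [ -z "$nas_hash" ];   then echo upload;   return; fi
    if [ "$local_hash" = "$nas_hash" ]; then echo skip; return; fi
    # Hashes differ: the later modification time wins and overwrites the other.
    if [ "$local_mtime" -gt "$nas_mtime" ]; then echo upload; else echo download; fi
}
```

Note this is last-writer-wins: if both sides changed a file since the last sync, the older edit is silently overwritten.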
All uploads pass through gpg symmetric encryption before upload. All downloads pass through gpg decryption before being placed in the synced file system.
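The gpg calls for one file would look roughly like this, assuming GnuPG 2.x with a shared passphrase; the helper names are illustrative. `--batch` and `--pinentry-mode loopback` let gpg run non-interactively, which matters when the script is driven by cron:

```shell
#!/bin/sh
# Sketch of the symmetric encrypt/decrypt step, assuming GnuPG 2.x.
# Helper names are illustrative, not from the actual script.
PASSPHRASE="${PASSPHRASE:-changeme}"

encrypt_file() {  # encrypt_file <plain> <cipher>
    gpg --batch --yes --pinentry-mode loopback --passphrase "$PASSPHRASE" \
        --symmetric --output "$2" "$1"
}

decrypt_file() {  # decrypt_file <cipher> <plain>
    gpg --batch --yes --pinentry-mode loopback --passphrase "$PASSPHRASE" \
        --decrypt --output "$2" "$1"
}
```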
This happens back and forth between all computers registered to the same sync script until all files are synced between them.
The script does not work on its own; it requires a cron job to run it regularly. Also, the first run is particularly slow, since it must do an initial upload. This is somewhat inefficient, as it encrypts a file, uploads it, then stops uploading to encrypt the next. For lots of small files, that is a lot of stopping and starting. Once most files exist in all locations, and it only needs to upload or download those that have changed, it completes far more quickly.
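A crontab entry for this might look like the following (path, script name, and interval are all hypothetical; edit your own crontab with `crontab -e`):

```shell
# Run the sync script every 15 minutes, appending output to a log file.
*/15 * * * * /home/user/bin/nas-sync.sh >> /home/user/.nas-sync.log 2>&1
```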
There is also a slight problem with deleting files in this setup. If a file is deleted from a local directory but not the NAS, it will still be in the NAS's manifest and will therefore be downloaded, presumed to be a missing file. As a result, zombie files appear and are difficult to get rid of. For this I have created a check such that if a file is missing, the script will first check the local trash folder; if a file with the same hash exists there, it will not download the NAS's version and will remove the file from the manifest. Once the file is removed from the manifest, it is safe to empty the trash, but if a file is deleted directly, or the trash is emptied too quickly, it will return as a zombie file. A backed-up version of the file still exists on the NAS; however, the script loses track of it once it no longer appears in the manifest. Files must be deleted from each device, as deletions will not sync across devices.
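The trash check can be sketched as a hash lookup over the trash directory. This assumes a freedesktop-style trash location (`~/.local/share/Trash/files`) and GNU coreutils; `in_trash` is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch of the zombie-file check: succeed if a file with the given
# sha256 hash sits in the local trash, meaning it was deleted on
# purpose and should not be re-downloaded from the NAS.
# Assumes a freedesktop-style trash directory.
in_trash() {
    hash="$1"
    trash="${XDG_DATA_HOME:-$HOME/.local/share}/Trash/files"
    [ -d "$trash" ] || return 1
    find "$trash" -type f -exec sha256sum {} + | cut -d ' ' -f 1 | grep -qx "$hash"
}
```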
It also does not give any privacy to the names of the files. The files are uploaded with their current name and a .gpg extension. This means anyone who has access to the NAS could see what your files are called, just not what they contain, unless it is obvious from the title. This could be fixed by randomising the file names, then mapping them back to their real names through the manifest file. Remember, the manifest file is also encrypted on the NAS.
This script is highly experimental, and will require bug fixes and optimisations as I use it and discover its problems.
It relies on a UNIX environment, so if you're not running Linux (you're missing out) it probably won't work.
If any reader would like to use it too, you can find it on my github here, but you must use it at your own risk and take backups before deploying it, in case something goes horribly wrong. It works directly on every file in your sync directory, so be cautious. I welcome forks, pull requests, or any improvements you may have. Licensed under GPL2.