A better backup strategy

Last week, Chase Jarvis published a very popular article about his workflow/backup strategy. The article was retweeted by many and commented on by Vincent Laforet, who gave it his nod of approval.

Well, I am not the kind of guy who is impressed by cool charts, and I found two flaws in Chase's workflow.

Flaw 1: Archive the cards, not the files!

First, I would remove Aperture from the On Location process. A better way to import data from CF cards is to create disk images of the cards. This is a trick I picked up from the book From Still to Motion. It is a better way to organize files because all related data stays in the same place (the DMG). Also, if you archive the physical cards, matching the virtual archives to the physical ones becomes trivial. I never really thought about keeping CF cards until Shane Hurlbut mentioned it, and with the decreasing cost of CF cards, it makes a lot of sense.
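
For reference, a card can be turned into a disk image in one line with the hdiutil tool that ships with OS X. A minimal sketch; the card volume name and archive path below are only examples, not part of Chase's or my actual setup:

    # clone the mounted CF card into a compressed, read-only disk image
    $ hdiutil create -srcfolder /Volumes/EOS_DIGITAL -volname card_2010_05_12 \
        -format UDZO ~/Archive/card_2010_05_12.dmg

    # optional: check the image checksum before shelving the physical card
    $ hdiutil verify ~/Archive/card_2010_05_12.dmg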

Flaw 2: Live work backed up with Time Machine? WTF!

First off, don’t get me wrong. Time Machine is good, very good. It is a set-and-forget kind of thing: every x minutes, it will back up the files that have changed to a second drive. That is its strength but also its weakness: it copies based on time, not on milestones.

Let’s take a worst-case scenario: Time Machine just copied your project, then you decide to make three variations of one of your files. A few minutes later you are done, and while you are waiting for the automatic backup to happen, disaster strikes: HD failure!

Here is another scenario: in the span of an hour, you modify a file three times, then realize that you need version #2, which was not backed up by Time Machine (it saved #1 and #3). What can you do?

These are two examples of how time-based backups can (and will!) fail. While they are very good at copying stuff when you would not think about it, they don’t have any notion of what is important for you to back up. Losing an hour of work on a file does not always mean that you will be able to get it back by spending another hour on it. Creativity and inspiration are not a function of time.

The classical approach to this kind of problem is to “save as” every minute, creating a multitude of copies of a single file, and to manually back up each copy to another drive. Not very practical.

Fortunately, there is a better solution. It is free, powerful and easy to use! And you know the best part? You don’t even need a complex IT infrastructure to make it work. I would never dare to say that I am the first to think about it, but according to Google, I am the first to blog about it: using GIT as a version control/backup system for your visual assets!

What is GIT

GIT is a distributed code versioning system. It is used by programmers to keep a history of all the modifications made to each file and to distribute these files/changes. In plain English, it tracks changes to files, can go back to any milestone in its history, and can apply all the modifications made on one workstation to all the others.

GIT is the industry standard in the IT field, and while its feature set is overkill for this job, its performance and ease of use make it a mandatory tool for every paranoid virtual asset owner.
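
To make the “go back to any milestone” part concrete, here is a minimal sketch of recovering the version #2 from the scenario above; the file name and commit IDs are invented for the example:

    # list the milestones recorded for one file, newest first
    $ git log --oneline -- hero-shot.psd
    a1b2c3d third variation
    f4e5d6c second variation
    9a8b7c6 first import

    # bring the second variation back into the working folder;
    # the third one stays safe in the history
    $ git checkout f4e5d6c -- hero-shot.psd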

How to use GIT

Understanding how to use GIT is outside the scope of this article. I am planning to publish a detailed article on the subject soon. If you can’t wait and want your work to be meteor-proof, go take a look at the GIT Ready website. While the content is targeted at programmers, it will teach you the basics (tip: just read the sections about the init, add, commit, push and clone commands).
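
Until that tutorial is ready, here is a rough sketch of what the basic cycle looks like with those commands; the folder names, drive path and remote name are placeholders, not a recommendation for your actual setup:

    # one-time setup inside the project folder
    $ cd ~/Projects/spring-campaign
    $ git init
    $ git add .
    $ git commit -m "Import from card_2010_05_12"

    # keep a full copy of the history on a second drive
    $ git clone --bare . /Volumes/Backup/spring-campaign.git
    $ git remote add backup /Volumes/Backup/spring-campaign.git

    # then, at every milestone you care about (not every x minutes):
    $ git add .
    $ git commit -m "Client revisions, round 2"
    $ git push backup master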

Conclusion

If there is one thing I hope you learned from this post, it is that a full-featured backup solution is not simply about having multiple copies of your (RAW) files in various locations. That is only part of the equation (and it is fully covered in Chase's video). The other part is making sure that you will also be able to access the versions (live work) that are important to you (which I will cover in an upcoming article).

About Tommy

Photography allows me to be what I want to be, to be where I want to be, and to do what I want to do ... I'm not a professional photographer and I don't need a title; I love to take photographs and that is what I do, I love to learn and I always try to do it better ...
  • zadie (http://zadie.com)

    How is flaw 1 a flaw? I see it as a strength. In your scenario of archiving DMG files, all you need is one corrupt file to lose an entire card’s worth of data. By importing the contents of the card, the chances of data loss are minuscule compared to storing everything in a single image file.

    Think about it… maybe a CF card has some sort of glitch, maybe a bad sector (not sure about terminology here) or whatever. That card gets archived to a dmg that then has a problem opening. Cards should never be recycled in the field, but what if, by mistake, it happened?

    This is obviously a worst case scenario, but it is technology we’re talking about here. Stranger things can happen.

  • Cobbles

    I completely agree! I was surprised at how much press the article got.

    Why would you use Git over rsync?

  • admin

    Zadie: file corruption risks are minimal since the DMG is also backed up. The idea is that keeping the files archived that way allows you, at any point in time, to use tools like Magic Bullet Grinder or the Log & Transfer tool to get some metadata back from the files too.

    Maybe I was not clear, but once DMGed, you don’t recycle the card; you keep it until the data is backed up on another drive. This way, you have the files on two separate media.

  • admin

    Cobbles: First, I had never read about rsync, but I use GIT daily, so it was a natural choice for me.

    Looking at the rsync doc, it seems like it does not really keep a history of the files; it is used to sync file systems.
    Also, GIT is distributed: there is no ‘master’ data repository, so every clone can be used as a source or be used to update the rest of the network.
    This last feature is quite cool when you work with multiple computers, since you can replicate in every direction.
    Also, the ability to create multiple versions of a whole project (not just a file) is key.

  • Wayne

    Speaking as a programmer, the information provided by the article lacks a bit of detail on backup practices. The configuration required is quite a bit more involved than Time Machine. Initializing the git environment and committing to it will not be sufficient to have a reliable backup system.

    In addition to requiring you to mark when you want the changes to be remembered, you also have to deploy these changes to a different location that will serve as your backup. Since the git backups are stored on the same hard drive as the data, a disk failure would destroy all your images anyway! A secondary hard drive (either local or remote) needs to be set up to transfer the images from your current environment to the backup environment. Since commits and pushes are different operations, for every change you need to run TWO commands to save the changes reliably (a hook can be set up to only require one command, but this requires more programming knowledge; see the sketch at the end of the comments).

    Ultimately, the real problem with backup systems is that we don’t expect the unexpected. As a result, we usually don’t think about backing up until it is too late. In this way, Time Machine works great due to its transparency and hands-off approach.

    If I wanted a finer-grained backup system, I would create a daemon that commits changes and pushes them to two different environments (a local second hard drive and a remote server) automatically. The daemon can know when a file is modified using a pipe to DTrace. This way, every time I save a change, the daemon commits and pushes the changes to two different destinations. This would leave me with a reliable, transparent, and finely grained version control system.

    Notice that if your computer is irreparably damaged due to an electrical storm, flood, fire, theft, etc., your Time Machine backup might be gone alongside your main hard drive. Backing up to a remote server prevents geographically localized incidents from ruining your data.

    • admin

      You are totally right. The backup strategy I am suggesting does not rely on GIT alone. Git is only used to store the changes to the working files, while the raw data is stored using another strategy. I am moving this weekend, but once I am set, I will work on the tutorial.

  • Pingback: Two new backup workflows | SnapperTalk
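
Side note on Wayne's point about commits and pushes being two separate operations: Git can run a small script (a post-commit hook) after every commit, and that script can do the push for you. A minimal sketch, assuming a remote named "backup" has already been added as in the earlier workflow sketch:

    # create the hook; it runs automatically after every commit
    $ cat > .git/hooks/post-commit <<'EOF'
    #!/bin/sh
    # push every new commit to the backup drive without a second command
    git push backup master
    EOF
    $ chmod +x .git/hooks/post-commit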
