DupeGuru: You Own Less Data Than You Think [en]

To hunt down duplicates on Mac, Windows or Linux, I warmly recommend trying dupeGuru! (Plus, it has an interesting system for compensating its developers: Fairware.)

One of the consequences of putting an SSD into my MacBook and using CrashPlan and an Amahi home server to store my data and backups is that I have been forced to do a little digital spring-cleaning.

I had:

  • a 500GB HDD in my MacBook, which hit “full” some time back before I freed up some space by moving stuff to an external HDD
  • an external 320GB HDD, initially to store photos and videos, in practice filled with undefined junk, most of it mine, some of it others’
  • an external 250GB HDD, initially to store a mirror of my MacBook HDD when it was only 250GB, then filled with undefined junk, most of it mine, some of it others’
  • an external 110GB HDD, containing disk images of various installation DVDs, and quite a lot of undefined junk, most of it mine, some of it others’

As you can see, “undefined junk” comes back often. What is it?

  • “I don’t have quite enough space on my MacBook HDD anymore, let’s move this onto an external drive”
  • “heck, do I have a second copy of this data somewhere? let’s make one here just in case”
  • “Sally, let me just make a copy of your user directory here before I upgrade your OS/put in a bigger hard drive, just in case things go wrong”
  • “eeps, I haven’t made a backup in some time, let me put a copy of my home directory somewhere” (pre-Time Machine)

See the idea?

Enter dupeGuru. I’ve wanted a programme like this for ages, without really taking the time to find it. Thanks to a kind soul on IRC, I have finally found the de-duping love of my life. (It works on OS X, Windows, and Linux.) It’s been an invaluable help in showing me where my huge chunks of redundant data are. Plus, it’s released as Fairware, which I find a very interesting compensation model: as long as there are uncompensated hours of work on the project, you’re encouraged to contribute to it, and the whole process is visible online.

Back to data. I quickly realized (no surprise) that I had huge amounts of redundant data. This prompted me to coin the following law:

Lack of a clear backup strategy leads to massive, uncontrolled and disorganized data redundancy.

The first thing I did was create a directory on my home server and copy all my external hard drives there. Easier to clean if everything is in one place! I also used my (now clean) 500GB drive to copy some folder structures I knew were clean.

Now, one nice thing about dupeGuru is that you can specify a “reference” folder when you choose where to hunt for duplicates. That means you tell dupeGuru “stuff in here is good, don’t touch it, but I want to know if I have duplicate copies of that content lying around”. Once you’ve found duplicates, you can choose to view only the duplicates, sort them by size or folder, delete, copy or move them.
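To make the “reference” idea concrete, here is a minimal Python sketch of it. This is not how dupeGuru works internally (I have no idea what it does under the hood); it is just my own illustration, and the function names `file_digest` and `find_dupes` are made up for the example. It assumes the two folders do not overlap, treats two files as duplicates only if their contents are byte-for-byte identical, and never reports anything inside the reference folder.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_dupes(reference_dir, scan_dir):
    """List files under scan_dir whose content also exists under reference_dir.

    Returns (path, size_in_bytes) tuples, biggest first.
    Files inside reference_dir are never reported.
    Assumption: the two directories do not overlap.
    """
    # Group reference files by size, so only same-size candidates get hashed.
    ref_by_size = defaultdict(list)
    for p in Path(reference_dir).rglob("*"):
        if p.is_file():
            ref_by_size[p.stat().st_size].append(p)

    ref_digests = {}  # size -> set of digests, filled in only when needed
    dupes = []
    for p in Path(scan_dir).rglob("*"):
        if not p.is_file():
            continue
        size = p.stat().st_size
        if size not in ref_by_size:
            continue  # no reference file has this size, so it cannot match
        if size not in ref_digests:
            ref_digests[size] = {file_digest(r) for r in ref_by_size[size]}
        if file_digest(p) in ref_digests[size]:
            dupes.append((p, size))

    # Biggest duplicates first: handy when you need to free space quickly.
    dupes.sort(key=lambda item: item[1], reverse=True)
    return dupes
```

Calling something like `find_dupes("clean_music", "old_backup")` (both folder names are hypothetical) would give you the files in `old_backup` that already exist in `clean_music`. The sketch only lists them; deciding what to delete is still up to you.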

As with any duplicate-finder programme, you cannot just use it blindly, but it’s an invaluable assistant in freeing space.

I ran it on my well-organized Music folder and discovered 5GB of duplicate data in there, in less than a minute!

Now that I’ve cleaned up most of my mess, I realize that instead of having 800 or 900GB of data like I imagined, reality is closer to 300GB. Not bad, eh?

So, here are my clean-up tips, if you have a huge mess like mine, with huge folder structures duplicated at various levels of your storage devices:

  • start small, and grow: pick a folder to start with that’s reasonably under control, clean it up, then add more folders using it as reference. Actually, it is better to set a big folder as reference and check whether a smaller folder isn’t already included in it
  • scan horribly messy structures to identify redundant branches (maybe you have mymess/somenastydirectory and mymess/documents/old/documents/june/somenastydirectory), copy those similar branches to the same level (I do that because it makes it easier for my brain to follow what I’m doing), mark one of them as reference and prune the other; then copy the remaining files into the first one, if there are any
  • if you need to quickly make space, sort your dupes by size
  • if dupeGuru is suggesting you get rid of the copy of a file which is in a directory you want to keep, go back and mark that directory as reference
  • keep an eye on the bottom of the screen, which tells you how much data the dupes represent (if it’s 50MB and hundreds of small files in as many little folders, you probably don’t want to bother, unless you’re really obsessed with organizing your stuff, in which case you probably won’t have ended up in a situation requiring dupeGuru in the first place)

Happy digital spring-cleaning!

Keeping The Flat Clean: Living Space As User Interface [en]

How I applied what I have understood about designing user interfaces to organising my flat so that it too is ‘usable’ and remains clean.

One of my ongoing post-study projects is reorganising my flat from top to bottom, hopefully throwing out half my stuff in the process. I have been thinking a bit about the way I store things.

First of all, I tend to try to minimise waste of space. I will organise things into cupboards and drawers so that they occupy the least space possible. Second, I tend to organise things with taxonomy rather than function in mind. I will try to store objects of the same type together, regardless of their respective frequency of use.

The result is a perpetually messy flat, with whole areas that I never use (places I do not go, cupboards I never open).

I have therefore been rethinking my whole living environment in terms of function and process. What do I use this thing for, and when? How do I deal with common tasks like washing up or doing my mail? And most important, how does clutter arise? An environment where each thing has a place is not sufficient to prevent clutter. If clutter arises, it is not due to “laziness”. It is because the storage system is not usable enough. It was not designed with the user in mind.

I have switched to considering my living space as a user interface rather than as a library of categorised items.

If I catch myself dumping something on the table instead of putting it away, I’ll try to identify what is preventing me from putting it where it belongs. I’ll try to bring this “where it belongs” closer to where I am naturally tempted to put it. (Instead of thinking “ooh I’m a bad girl, I’m not putting things away as I should,” which we all agree does not help in the least.)

Here are a couple of examples of what I have been doing.

First, I identified the main sources of clutter in my flat: dirty kitchen things, clothes, papers and books. Then I tried to analyse how these things ended up lying about my whole flat. I know that I can clean my flat spotless, and that within a couple of weeks it will be messy again. So obviously, there are things I do mechanically which create clutter. Something which breaks the natural “keeping clean” flow.

Let’s take the dirty dishes to start with. (Not the most glamorous example, but I’m sure there are many of you out there who can relate.) Why do I leave cups, glasses, or even plates lying around in various places? A first reason for this, obviously, is that I do not only eat in my kitchen. That’s a fact we will just have to live with. But why don’t I bring things back to the kitchen? Well, more often than not, the kitchen is in such a state that there wouldn’t really be any place to put them. The sink, of course, is already full of dirty dishes. We have here a perfect example of how disorganisation in one area leads to clutter elsewhere.

One factor which helps stuff pile up in my sink (despite my “fool-proof” method for taming dirty dishes) is that I usually have to make space on the drainer before I start washing up. (I’m one of these people who don’t dry dishes but leave them on the drainer to put them away “later”.) And putting the dishes away is a pain because my cupboard is so crammed with stuff that I have to empty half of it before being able to put my plates where they belong. That is where the bottleneck is. Or the limiting factor, if you prefer.

I realised that out of my four kitchen cupboards, there are only two that I regularly open. I proceeded to empty all the junk out of the others and get rid of most of it (if I never open the cupboards, then I can’t really need what’s inside them, can I?). I then reorganised the things I use on a regular basis in all the available cupboards, focusing on “how easy will it be to put it back there?” rather than “could I use less space for this?”

One significant result concerns plates. (Don’t worry, we’ll soon be done with the kitchen things.) I have big plates and small plates, four of each. I used to keep the small plates piled up on the big ones, which meant that each time I wanted to put a big plate back in the cupboard, I had to lift up all the small plates first (see what I mean?) That didn’t help prevent things from accumulating on the drainer. Now I have the small plates on one shelf, and big ones on another. I use up more storage space, but it’s easier to put things away. I have rearranged all my kitchen cupboards along the same principle, and the kitchen is now much more usable.

This post is getting much longer than I expected. However, I don’t want to leave you without letting you know what I have come up with for dealing with my incoming mail. I have been using a tray-based system for sorting paperwork for a long time, but it has shown its limitations regularly over the past years. The new system still uses trays, but it groups papers according to what I have to do with them instead of what they are. So now, this is what my trays look like; I’ll see as I use them whether the system needs any modifications:

  • to do (bills to pay, things to investigate or have a closer look at)
  • to do, ASAP (anything urgent)
  • to file, daily business (bank papers, medical papers, salary slips)
  • to file, important (tax stuff and other important things)
  • to look at (optional) before throwing out (various newspapers, information leaflets)
  • to throw out (envelopes and anything else I don’t keep; the bin is often not close at hand)
  • to sort (anything unopened; sometimes I fetch my mail and don’t deal with it straight away)

In conclusion, here is my line of conduct:

  1. pay attention to cupboards that are never opened or shelves that are never reached
  2. keep an eye on what I do automatically and try to adapt the environment
  3. think “actions”, “process”, and “frequency” instead of “categories” and “families”
  4. accept my limitations

The last point is important: there will always be clean washing waiting to be ironed, because no matter how hard I try, I’ll never get around to ironing and putting it away as soon as it’s dry. I therefore need to take this into account and explicitly plan a space for my huge pile of Clothes Waiting To Be Ironed, even if in an ideal world, Clothes Waiting To Be Ironed should not be around.