Thinking Too Much [en]

[fr] I tend to think a bit too much and not live enough. Today, with the somewhat compulsive side of consuming information online (hello, Facebook!), I think I’ve fallen back into that trap.

At some point during my young life, in my mid-twenties, it dawned on me that I was thinking too much for the amount of life I had racked up until then. Barely post-adolescent brains will go a bit overboard, of course, but this has happened to me a couple of times since. In my mid-thirties, for example, I had spent a lot of energy trying to figure out the world, people, relationships, myself, life, death and the like. I did study philosophy and history of religions, after all.


Today, I’m wondering if I’m not thinking too much again. But it’s taking a different shape. Although I’ve long been skeptical of all the alarm bells ringing about information overload, I have come to believe that there is something to be said about our access to, and relationship with, all the information now at our fingertips. And it’s clear to me that there is something compulsive in the way I go after information.

This was already the case for me before the internet. I’ve always been an avid reader. I’ve always loved understanding things. I collected stamps. Then fonts, and even AD&D spells (don’t laugh). At university, I loved immersing myself in a topic, surrounded by piles of books and articles, going through them for hours and watching a big muddled mess of ideas start to make sense. So, imagine what happened when the internet came along. As far as my academic life goes, that was largely when I was working on my dissertation.

My compulsive search for information has served me well when I have managed to harness it for concrete projects (writing a dissertation, publishing a blog post, gaining expertise). I even wondered whether there was a way to earn money with it. But today, I feel it is mostly leading me around in circles, mainly on Facebook. There is so much interesting stuff to read out there. I still want to understand the world, people, life, love, politics, beliefs, education, relationships, society… And I will never be done. But the internet means I never have to stop.

My tendency to “think too much, live not enough” has found an ally in the compulsive consumption of online media.

Time to think less, and accept I can’t figure everything out.

Deb Roy: The Birth of a Word [en]

[fr] A fascinating video about how we learn language, and also about processing and visualising dizzying amounts of linguistic data. Worth watching.

Ah yes, another video. You see, some evenings, instead of sitting in front of the TV (not my usual evening occupation, by the way), I sit in front of my computer and watch videos I’ve queued up on Boxee — or hunted down for the occasion. No surprise, TED Talks are a favourite hang-out of mine.

Here’s one titled The Birth of a Word: researcher Deb Roy recorded the first three years of his son’s life to gather data which, once analyzed, would give insight into how we learn language.

It’s fascinating. Fascinating for the language geek in me, and also fascinating from a data visualisation and analysis point of view. In the second part of his talk, Deb moves on to the analysis of publicly available online commentary, matched to the TV shows it discusses. The visualisation is stunning (he’s showing us real data) and the implications left me feeling giddy.

Your turn.

Hat tip: thanks to Loïc for pointing out this video on Facebook.

DupeGuru: You Own Less Data Than You Think [en]

[fr] To hunt down duplicate files on Mac, Windows or Linux, I warmly recommend giving dupeGuru a try! (Plus, it has an interesting way of paying its developers: Fairware.)

One of the consequences of putting an SSD into my MacBook and using CrashPlan and an Amahi home server to store my data and backups is that I have been forced to do a little digital spring-cleaning.

I had:

  • a 500Gb HDD in my MacBook, which hit “full” some time back before I freed up some space by moving stuff to an external HDD
  • an external 320Gb HDD, initially to store photos and videos, in practice filled with undefined junk, most of it mine, some of it others’
  • an external 250Gb HDD, initially to store a mirror of my MacBook HDD when it was only 250Gb, then filled with undefined junk, most of it mine, some of it others’
  • an external 110Gb HDD, containing disk images of various installation DVDs, and quite a lot of undefined junk, most of it mine, some of it others’

As you can see, “undefined junk” comes back often. What is it?

  • “I don’t have quite enough space on my MacBook HDD anymore, let’s move this onto an external drive”
  • “heck, do I have a second copy of this data somewhere? let’s make one here just in case”
  • “Sally, let me just make a copy of your user directory here before I upgrade your OS/put in a bigger hard drive, just in case things go wrong”
  • “eeps, I haven’t made a backup in some time, let me put a copy of my home directory somewhere” (pre-Time Machine)

See the idea?

Enter dupeGuru. I’ve wanted a programme like this for ages, without really taking the time to find it. Thanks to a kind soul on IRC, I have finally found the de-duping love of my life. (It works on OSX, Windows, and Linux.) It’s been invaluable in showing me where my huge chunks of redundant data are. Plus, it’s released as Fairware, which I find a very interesting compensation model: as long as there are uncompensated hours of work on the project, you’re encouraged to contribute to it, and the whole process is visible online.

Back to data. I quickly realized (no surprise) that I had huge amounts of redundant data. This prompted me to coin the following law:

Lack of a clear backup strategy leads to massive, uncontrolled and disorganized data redundancy.

The first thing I did was create a directory on my home server and copy all my external hard drives there. Easier to clean if everything is in one place! I also used my (now clean) 500Gb drive to copy some folder structures I knew were clean.

Now, one nice thing about dupeGuru is that you can specify a “reference” folder when you choose where to hunt for duplicates. That means you tell dupeGuru “stuff in here is good, don’t touch it, but I want to know if I have duplicate copies of that content lying around”. Once you’ve found duplicates, you can choose to view only the duplicates, sort them by size or folder, delete, copy or move them.
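To make the “reference” idea concrete, here is a minimal Python sketch of reference-folder de-duplication. It is not how dupeGuru works internally (dupeGuru has several scan modes and is much smarter); it simply assumes that two files are duplicates when their SHA-256 content hashes match. The names `find_duplicates`, `file_digest` and `iter_files` are made up for illustration.

```python
import hashlib
import os


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in 1 MiB chunks to keep memory use low."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def iter_files(top):
    """Yield every file path under `top`, in a stable sorted order."""
    for root, subdirs, files in os.walk(top):
        subdirs.sort()  # sorting in place makes os.walk visit subfolders in order
        for name in sorted(files):
            yield os.path.join(root, name)


def find_duplicates(reference_dirs, other_dirs):
    """Return (path, size) for each redundant copy found under `other_dirs`.

    Files under `reference_dirs` are the "good, don't touch" copies and are
    never reported. A file under `other_dirs` is reported when its content
    already exists in a reference folder, or appeared earlier in `other_dirs`
    (the first copy found outside the references is the one kept).
    Results are sorted largest first, so the biggest savings come first.
    """
    seen = set()
    for top in reference_dirs:
        for path in iter_files(top):
            seen.add(file_digest(path))

    dupes = []
    for top in other_dirs:
        for path in iter_files(top):
            digest = file_digest(path)
            if digest in seen:
                dupes.append((path, os.path.getsize(path)))
            else:
                seen.add(digest)
    dupes.sort(key=lambda item: item[1], reverse=True)
    return dupes
```

Summing the sizes, for example `sum(size for _, size in find_duplicates([ref], [mess]))`, gives the same kind of “how much space would this free” figure as the total dupeGuru displays.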

As with any duplicate-finder programme, you cannot just use it blindly, but it’s an invaluable assistant in freeing space.

I ran it on my well-organized Music folder and discovered 5Gb of duplicate data in there — in less than a minute!

Now that I’ve cleaned up most of my mess, I realize that instead of the 800 or 900Gb of data I imagined, the reality is closer to 300Gb. Not bad, eh?

So, here are my clean-up tips, if you have a huge mess like mine, with huge folder structures duplicated at various levels of your storage devices:

  • start small, and grow: pick a folder to start with that’s reasonably under control, clean it up, then add more folders using it as reference. Actually, it’s better to set a big folder as reference and check whether a smaller folder is already included in it
  • scan horribly messy structures to identify redundant branches (maybe you have mymess/somenastydirectory and mymess/documents/old/documents/june/somenastydirectory), copy those similar branches to the same level (I do that because it makes it easier for my brain to follow what I’m doing), mark one of them as reference and prune the other; then copy the remaining files into the first one, if there are any
  • if you need to quickly make space, sort your dupes by size
  • if dupeGuru is suggesting you get rid of the copy of a file which is in a directory you want to keep, go back and mark that directory as reference
  • keep an eye on the bottom of the screen, which tells you how much data the dupes represent (if it’s 50Mb spread across hundreds of small files in as many little folders, you probably don’t want to bother, unless you’re really obsessed with organizing your stuff, in which case you probably wouldn’t have ended up in a situation requiring dupeGuru in the first place)
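The “redundant branches” tip can also be roughed out in code. The Python sketch below (`redundant_branches` and `tree_digest` are hypothetical names, not dupeGuru features) fingerprints each folder from the names and contents of everything inside it while ignoring the folder’s own name, so mymess/somenastydirectory and mymess/documents/old/documents/june/somenastydirectory get the same fingerprint when their contents are identical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digest(path, index):
    """Fingerprint a folder from the names and contents of what it contains.

    The folder's own name is deliberately left out, so two copies of the same
    folder at different places in the tree get the same digest. Every folder
    visited is recorded in `index` as digest -> [(path, total_size), ...].
    Returns (digest, total_size).
    """
    h = hashlib.sha256()
    total = 0
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False):
            digest, size = tree_digest(entry.path, index)
            kind = b"D"
        elif entry.is_file(follow_symlinks=False):
            digest, size = file_digest(entry.path), entry.stat().st_size
            kind = b"F"
        else:
            continue  # skip symlinks and special files
        h.update(kind + os.fsencode(entry.name) + b"\0" + digest.encode() + b"\n")
        total += size
    digest = h.hexdigest()
    index[digest].append((path, total))
    return digest, total


def redundant_branches(top):
    """Return groups of identical, non-empty folders under `top`, largest first."""
    index = defaultdict(list)
    tree_digest(top, index)
    groups = [entries for entries in index.values()
              if len(entries) > 1 and entries[0][1] > 0]
    groups.sort(key=lambda entries: entries[0][1], reverse=True)
    return [[p for p, _ in entries] for entries in groups]
```

When two big folders are identical, their identical subfolders are listed as well; since groups come out largest first, dealing with the top ones usually makes the rest go away.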

Happy digital spring-cleaning!

A Data Management Fantasy [en]

[fr] My dream: a system that would cache, in a set amount of space on my SSD (say 50GB), the most recently opened files from my external hard drive. That way, all my current files would be close at hand on a fast drive. Do you know of a solution that does this?

I’m now running a happy MacBook with a 120Gb SSD (too big or too small depending on how you look at it, but I was in a hurry and dependent on what the shop had in stock). I have an external 500Gb HDD to store all my junk on.

And here’s my dream. Wouldn’t it be nice if I could devote a certain amount of space on the SSD to my files, say 50Gb, and have that space occupied by cached copies of the files from the external drive that I most recently used? When I modify the files, the cached copies and those on the HDD would sync. And if I haven’t touched a file for long enough, it would be removed from the cache to free up space.
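The bookkeeping behind this fantasy is essentially a least-recently-used (LRU) cache with a size budget. A minimal Python sketch, assuming a hypothetical `RecentFilesCache` class that only decides which files would stay on the SSD; the actual copying and syncing, which is the hard part, is not shown.

```python
from collections import OrderedDict


class RecentFilesCache:
    """Decide which files would live in a fixed-size SSD cache (LRU by bytes).

    Only the bookkeeping is modelled: which paths stay cached and which are
    evicted. Copying files to the SSD and syncing them back is left out.
    """

    def __init__(self, budget_bytes):
        self.budget = budget_bytes
        self.used = 0
        self.entries = OrderedDict()  # path -> size, least recently used first

    def open_file(self, path, size):
        """Record that `path` was opened; return the paths evicted to make room."""
        if size > self.budget:
            return []  # too big to ever cache: it stays on the external drive only
        if path in self.entries:
            self.used -= self.entries.pop(path)
        self.entries[path] = size  # (re)insert as the most recently used entry
        self.used += size
        evicted = []
        while self.used > self.budget:
            old_path, old_size = self.entries.popitem(last=False)
            self.used -= old_size
            evicted.append(old_path)
        return evicted
```

The “haven’t touched a file for long enough” rule could be added by also storing a last-access time and evicting entries older than some limit; plain LRU already pushes stale files out first once the budget is full.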

That way, my “current” files would be on the super-fast SSD and close at hand when I’m on the road.

I’m sure a solution to do this already exists — heard of anything?

Going SSD and Amahi Home Server [en]

[fr] Putting an SSD into my old MacBook (performance!), which means storing my data elsewhere: an external hard drive, an Amahi home server, and CrashPlan for backups.

I’ve been drooling over the MacBook Air these past weeks, to the point that I’ve pretty much decided it’ll be my next machine. Sure, a MacBook Pro is way more powerful, but do I need all that extra power? The eternal question when changing computers.

I understood that one of the things that make the Air zippy is the SSD. But why wait for another machine to have an SSD? I’m going to put one in my MacBook directly (I’ve already changed the hard drive twice, no biggie thanks to iFixit). Actually, I would be doing this now if I hadn’t mistakenly ordered a 3.5″ SSD instead of a 2.5″ one (I have two on my hands, by the way, still in their unopened boxes, if anybody is interested in buying them off me).

The reason it took me so long to warm up to the SSD strategy is the price: horribly expensive per Gb compared to a “normal” hard drive! But what I’ve understood is that if you go the SSD way, you also stop storing all your data on those expensive Gb. You keep the expensive SSD Gb for your OS and applications, and all the data that is just “storage” goes on something slower.

For example, an external hard drive (I’m going to have a 2.5″ 500 Gb one once I swap it out) or… an Amahi Home Server, like the one I’m currently building. The server is a good solution for me to keep my data on a flexible redundant system (Greyhole).

Add to that CrashPlan, which plays nice with Amahi, and the server will also allow me to host remote backups for my friends (with the idea that they might also let me use some of their storage space for mine). VPN access, etc.

Right, I’m going back to my hardware!

A Year Ago: Backup Awareness Day [en]

A year ago today, I hit the wrong “drop” button in phpMyAdmin and completely deleted my blog. I couldn’t remember when I had last made a backup.

I’ll cut the long story of recovery short, but it took me nearly two months to get all my data back in place. I could have saved myself a lot of pain and worry and extra work if I had had an up-to-date backup of my blog.

I’ve always been sloppy with backups. Most people who are not IT professionals (and even some who are) are sloppier still. We all know we should make backups more often, but we still live in the hope that theft, hard drive failures, and dropped databases will not happen to us. Oh yes, we know we’re wrong, but we’ve been lucky so far, haven’t we? Now shoo away those guilty feelings and get on with your life.

Well, no. I decided to make the 24th of every month Backup Awareness Day. A day to

  • blog about the importance of backups
  • give practical tips to actually do them
  • help people around you do backups
  • tell horror stories of lost data
  • do your own backups!
  • put in place automated systems.

You get the idea. A day a month to think about backups, do something about them, and raise awareness in your communities.

Unfortunately, I guess I had too much going on at the time, and I didn’t really follow through (I tweeted a bit, and blogged about it in June, but honestly, these last six months haven’t been very backup-aware).

So, this year, let’s make Backup Awareness Day a real part of our lives. I need your help for that. On the 24th of each month, even if I forget (I’ll try not to, promise!), tweet about it, blog about it, do your backups, and encourage those around you to do so too. Online, and offline. Can I count you in?

I’ve just hit that “Export” button in WordPress, saved a dump of my MySQL database, and plugged in the external hard drive so that Time Machine could have a go at it. You too — do these things now if that’s how you back up your important data, or do whatever you do to make sure your words, photographs, videos, and precious files do not evaporate in the event of a disaster.

I’m now going to mark Backup Awareness Day in my calendar for the coming months. (Of course, next month, Backup Awareness Day coincides with Ada Lovelace Day, which I’ll tell you more about in a second post later today.)
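Marking the dates is easy enough to script. A small Python sketch (the function name `backup_awareness_days` is made up for illustration) that lists the upcoming 24ths of the month, starting from a given day:

```python
from datetime import date


def backup_awareness_days(start, count):
    """Return the next `count` 24ths of the month, on or after `start`."""
    year, month = start.year, start.month
    if start.day > 24:
        month += 1  # this month's 24th has already passed
    days = []
    while len(days) < count:
        if month > 12:
            year, month = year + 1, 1  # roll over into January of the next year
        days.append(date(year, month, 24))
        month += 1
    return days
```

Most calendar applications can also simply repeat an event monthly on the 24th, which needs no code at all.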

Update: Backup Awareness Day now has its own website at backupawareness.org! I’m going to need help with it, so let me know if you’d be ready to give a hand.

Prezi: Never Use Powerpoint Again [en]

[fr] Prezi is going to kill PowerPoint, mark my words. You create a giant canvas for your presentation and, by zooming and panning, move around inside it to illustrate your talk while keeping its structure clearly visible. PowerPoint? One dimension. Prezi? Three.

A quick note to point you to Prezi, which I saw in action a couple of times in Paris. For example, see this one by Kevin Marks on Buzzwords (my talk notes).

Prezi allows you to create a giant mind-map of your presentation and, using zoom and movement across the map, turns it into a presentation.

Check out the Prezi tutorials and videos for more. It’s just blowing my mind, and seems very fun and easy to use too.

I think I’m never ever going to use PowerPoint again.

Today: Backup Awareness Day! [en]

[fr] Today, like the 24th of every month, is backup day. Do yours!

I haven’t done as much as I wanted around Backup Awareness Day yet (I even skipped last month because I was in the mountains at the time), but more will come over the next few months.

Backup Awareness Day takes place on the 24th of each month and is the occasion to:

  • do your backups and set up automatic systems to keep your data safe
  • help and encourage others to do the same, for example by blogging about the importance of backups and about backup techniques.

If like me you’re having a busy week (busy but good), at least take the time today to:

  • plug in that external hard drive and make sure Time Machine does a backup
  • export your WordPress blog
  • dump your MySQL database
  • if all else fails or is too complicated for you, copy your most precious document folders onto a thumb drive or an external hard drive.
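For that last-resort option, copying into a dated folder avoids overwriting the previous copy. A minimal Python sketch; `copy_precious_folders` and the `backup-YYYY-MM-DD` naming are assumptions for illustration, and `dirs_exist_ok` needs Python 3.8 or later.

```python
import shutil
from datetime import datetime
from pathlib import Path


def copy_precious_folders(folders, drive, now=None):
    """Copy each folder to <drive>/backup-YYYY-MM-DD/<folder name>.

    A dated destination means an older copy on the drive is never overwritten.
    Returns the destination directory.
    """
    stamp = (now or datetime.now()).strftime("%Y-%m-%d")
    dest_root = Path(drive) / f"backup-{stamp}"
    for folder in folders:
        src = Path(folder)
        # dirs_exist_ok=True lets a second run on the same day merge into the copy
        # instead of failing because the destination already exists.
        shutil.copytree(src, dest_root / src.name, dirs_exist_ok=True)
    return dest_root
```

Two source folders with the same name (say, two different “Documents”) would end up merged in one destination folder, so give them distinct names first.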

More next month!

Today is Backup Awareness Day! [en]

Two months ago, on February 24th, I hit the wrong “Drop” button in phpMyAdmin, resulting in the immediate deletion of the blog you’re reading. I didn’t know when I had last backed it up.

The story ends well, though it cost me (and others) many hours (days, actually) of work to get the whole of Climb to the Stars back online again.

I’ve always been careless about backups. Like many of you, probably. We can afford to be careless because accidents don’t happen very often, and as with Black Swans, we are under the mistaken belief that having been safe in the past will keep us safe in the future. Not so. As I like to repeat, the first time a disaster happens, well, it had never happened till then.

So, I’ve decided to declare the 24th of each month “Backup Awareness Day”. Here’s what it’s about:

  • Back up your files.
  • Back up your website.
  • Blog about the importance of backing up (sharing tips, stories, advice).
  • Tell your friends to back up.
  • Help your friends back up.
  • Put in place automatic backup systems.

Bottom line: decrease the number of people who never back up, or who back up so infrequently that they’ll be in a real mess if things go wrong.

Now, perfectionism is the biggest enemy of getting things done. Backup Awareness Day does not mean that you have to do all this. Here are a few ideas to get you started (better a bad backup than no backup at all):

  • If Time Machine (or any other regular backup system you use for your computer) has been telling you it hasn’t done a backup in ages, stop what you’re doing right now and plug in your backup drive.
  • If you use WordPress, when was the last time you went to Tools > Export to make a quick backup? It’s not the best way to do it, but in my case, it saved CTTS.
  • Do you use something like Mozy to keep a remote backup of your most important files? If not, maybe it’s time to sign up.
  • Are you working on important documents that exist only on your computer, which is never backed up? At the minimum, pick up a thumb drive and copy them onto it — or send yourself an e-mail with the files as attachment, if your e-mail is stored outside your computer (Gmail, for example).
  • Do you have an automatic backup set up for your database or website? If not, set some time aside on Backup Awareness Day to figure out cron.
  • When did you last make a dump of your MySQL database? Head over to phpMyAdmin, or the command line (it’s mysqldump --opt -u user -p databasename > my-dirty-backup.sql).
  • Do you have the backup thing all figured out? Write a post for your readers with a few tips or tutorials to help them along. (Tag your posts “backupawarenessday” — I thought about “BAD” but that wasn’t really optimal ;-))
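To turn the mysqldump one-liner into something cron can run, a dated filename keeps each dump from overwriting the last. A minimal Python sketch with hypothetical helper names, building the same arguments as the command above. Note that `-p` on its own makes mysqldump prompt for a password, which an unattended cron job cannot answer; for cron, the assumption here is that you pass `prompt_password=False` and keep the credentials in a MySQL option file such as `~/.my.cnf` (under `[client]`) instead.

```python
import subprocess
from datetime import datetime


def mysqldump_command(user, database, prompt_password=True, now=None):
    """Build the argument list for a dump, plus a dated output filename.

    Mirrors `mysqldump --opt -u user -p databasename > my-dirty-backup.sql`,
    except that the file name carries a timestamp so older dumps are kept.
    """
    args = ["mysqldump", "--opt", "-u", user]
    if prompt_password:
        args.append("-p")  # no value given: mysqldump asks for the password
    args.append(database)
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M")
    return args, f"{database}-{stamp}.sql"


def run_dump(user, database, prompt_password=True):
    """Run mysqldump, sending its output to the dated file (like `>` in a shell).

    check=True raises CalledProcessError if mysqldump fails; the partial
    file is left behind for inspection.
    """
    args, outfile = mysqldump_command(user, database, prompt_password)
    with open(outfile, "wb") as out:
        subprocess.run(args, stdout=out, check=True)
    return outfile
```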

I’m hoping to develop the concept more over the coming months. If you have ideas, get in touch, and take note of Backup Awareness Day for the month of May: Sunday 24th!

(Now stop reading and go do a few backups.)

Blog Host Ugliness [en]

[fr] A Serbian friend was given an ultimatum by her blog host: 24 hours to delete another blogger’s comments and the links to his sites, or see her blog disappear.

The host in question (which runs WordPress multi-user, like WordPress.com) had also disabled the blog export function.

We got out of it as best we could (see here).

Technicalities aside, it is absolutely scandalous for a blog host to behave like this. Sure, any host is free to “kick out” its customers, but disabling the blog export function beforehand reaches new heights of pettiness. You have been warned.

Edit: the story in French, on video, on Seesmic.

Note: I’ve updated this post as I gathered information that helped me see more clearly in this whole mess. Please read the comments if you’re going to jump into the conversation or blog about this.

Wednesday night, my friend Sanja from BlogOpen (she was my very kind and competent hostess) pinged me on IM. She had less than 24 hours to export her blog before her blog host shut it down.

It was a blog hosted by WordPress multi-user [Edit: not WPMU]. Easy enough, I thought. There is an export function. Unfortunately, when I logged in (the interface was in Serbian, but I can find my way through WordPress with my eyes closed), this is what I found:

WordPress (MU?) with no Export

Even if you don’t understand Serbian, you can see there is a missing tab. I tried calling /wp-admin/export.php directly, but the file had been removed.

Well, after a bit of poking, prodding and thinking, this is what I came up with (reminder: there was no way to install plugins and no direct access to the server):

Last Hope Export of WordPress MU Blog

I went into Options > Reading. I set the feeds to “entire post”. As there were 110 posts in this blog, I set the home page to display all of them, with a little margin for error. There were more than 1400 comments, so I set the maximum number of items in a feed to 1500.

Then I did three things:

  • saved /feed (an RSS dump of the blog posts)
  • saved /comments/feed (an RSS dump of the comments)
  • scraped the blog (with single blog post pages) as an extra backup by running wget -r -l1 -w1 BLOGURL (thanks, John) from my server (also to save the images).

The blog was saved. I couldn’t import the RSS dump of blog posts into WordPress.com, where I told Sanja to open a new blog account, so I quickly set up a regular WordPress install on my server, imported it there, and exported it in WXR format. Great.

Comments, however, are another story. If you’re in a hackish mood, any help would be appreciated.
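For anyone in that hackish mood: the comments feed is plain RSS, so a first step is getting the comments out of it, grouped by post. A minimal Python sketch, assuming the usual WordPress comment-feed layout (a `<link>` ending in `#comment-ID`, the author in `<dc:creator>`, the text in `<content:encoded>` or `<description>`); `comments_by_post` is a hypothetical name, and turning the result into an importable WXR file is not shown.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from email.utils import parsedate_to_datetime

NS = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}


def comments_by_post(feed_xml):
    """Group the comments in a saved /comments/feed dump by post URL.

    Each comment becomes a dict with its id, author, date and text; every
    <item> is assumed to carry a <pubDate>. Comments are sorted oldest
    first, as they appear on the blog.
    """
    root = ET.fromstring(feed_xml)
    posts = defaultdict(list)
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        post_url, _, comment_id = link.partition("#comment-")
        body = item.findtext("content:encoded", namespaces=NS)
        if body is None:
            body = item.findtext("description", default="")
        posts[post_url].append({
            "id": comment_id,
            "author": item.findtext("dc:creator", default="", namespaces=NS),
            "date": parsedate_to_datetime(item.findtext("pubDate")),
            "text": body.strip(),
        })
    for comments in posts.values():
        comments.sort(key=lambda c: c["date"])
    return dict(posts)
```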

We’ll probably have to deal with the images too once the blog has been completely wiped off the 381.com server — for the moment it seems like it was disabled, but the images are still there (see this one for example).

There, that was for the technical part.

Now for a personal comment. I find it utterly disgusting and shocking that a blog host owner would give people an ultimatum to leave and disable the export function in the blogging software. Sanja tells me that they had the export function until a few days before the ultimatum.

Of course, a blog host can choose not to host certain people. But trying to lock people in by disabling export of their own data is simply evil. If you’re kicking people off your system, you damn well better make sure they can take their data with them.

Edit, 27.01, 12:00: I’m happy to learn that the disabling of the export function seems not to have been related to the ultimatum, and that the blog381 people were not actually trying to lock people in. However, in a conflict, it remains pretty delicate to tell people to “submit or leave” when they have no way of exporting their data on their own.

So, people, please. If you need a blog host, choose a serious one. WordPress.com, for example. Or Blogger. Or Typepad. Putting your precious blog in the hands of an individual is risky (weblogs.com, anybody? And if you remember, people on weblogs.com at least had the guarantee that they could export their data…)

How did this happen?

I got some details about the situation, but a word of warning about that, first. The source material to this Serbian blogosphere drama is all in… Serbian. I’m relying here on what my friend Sanja told me about the situation, and I do not doubt her good faith. I know, though, that stories do have multiple sides, and that there might be more to the background than what I’m telling you here — but whatever the background story, it cannot justify the behaviour of this blog host.

From what I gathered, what brought about this crisis is a quarrel between two bloggers: Tatjana aka Venus aka Lang (Update: Tatjana is not happy that I’m linking to her and has redirected visitors to this site elsewhere; to see her blog, copy-paste the link http://www.laluve.com/ in your browser), the owner of the Serbian blogging platform blog381.com (not the Tatjana who organized BlogOpen!), and another pretty popular blogger. At some point, Tatjana decided to forbid the people using her platform from linking to this other blogger or harbouring his comments.

Here is the warning she posted on the community forums:

Vlasnik blogova

http://bruh.org/ludizmaj/,

http://www.blogoye.org/pecina/,

http://www.blogoye.org/Mudrosti/,

http://www.blogoye.org/sujeta/

(ima verovatno jos ali ne mogu da trazim)

je ovom blog sistemu naneo stetu laziranjem glasova oko izbora za najblogera (na kom je on bio ‘pobednik’), ‘miniranjem’ sledeceg izbora, sirenjem neistina, traceva, vrbovanjem novih blogera sa tri osam jedan sistema, a sve u cilju da se naskodi ovom sistemu a poveca sopstveni traffic i “ugled”.

Za one koji nisu dovoljno informisani i sve ostale koji su slusali ili nisu, samo jednu stranu price od gore pomenutog, necu dodatno iznositi nikakve detalje, niti vise imam nameru da se borim sa provincijalizmima pojedinih ljudi koji su bili ili jesu na neki nacin u komunikaciji sa blogom381 i njegovim korisnicima.

Slobodna volja svakog od nas da pise kako i gde hoce, ali oni koji se odluce da i dalje pisu ovde nece moci da imaju linkove ka ovim blogovima niti komentare vlasnika istih.

Ukoliko imate zelju,nameru ili potrebu da ostanete na ovom blog sitemu, obrisite linkove i komentare gore pomenutog blogera u roku od 24h.

Translation (Sanja was a bit tired, so forgive the wobbliness):

The owner of these blogs
http://bruh.org/ludizmaj/,
http://www.blogoye.org/pecina/,
http://www.blogoye.org/Mudrosti/,
http://www.blogoye.org/sujeta/

has caused damage to this blog system by faking votes for the election of “The best blogger” (where he was “the winner”), and was undermining the next election by spreading gossip, lies, and recruiting new 381 bloggers, with only one aim: to damage this community and increase his own blog traffic and “reputation”.

For those who are not informed well enough, and all others who were listening or didn’t, only one side of the story of the person mentioned above, I will not give any additional details, nor do I have the intention to fight with provincialism of some people who were or in some way are connected to blog381 communication and their users.

It is the free will of each of us to write how and where we want to, but those who decide to keep writing here, will not be able to have links to these blogs or comments by their owner.

Those of you who have the wish, intention or need to stay on this blog system, should delete links and comments of the blogger (mentioned above) within 24 hours.

Sanja learnt about this because the owner of the blogging platform left a comment on one of her posts (not the most recent) to let her know about it. Given that the “other blogger” in question is a friend of Sanja’s, she wasn’t going to comply.

Other bloggers have also seen their blogs deleted, or at least de-activated (actually, before the 24-hour limit was up). A dozen or so, says Sanja.

If you want to chime in on the “political” side of this story (particularly if you’re involved in this story or a direct witness), you’re welcome to use my comments. However, I ask (as always) that everybody remain civil and refrain from personal attacks (commonsense blogging etiquette, y’know).

Update: It seems that since Sanja’s blog was deactivated, the whole blogging platform has been shut down, with a message that people can e-mail the administrator to get an export of their blog. This message was not there during the ultimatum period.

In a comment to this post, Tatjana aka Lang asked me to remove the link to her blog, http://www.laluve.com/ , which I had placed upon her name. As I have refused to remove it (linking to the people involved in this story is perfectly relevant, and on the web, you can link to who you want, anyway), she has set up a redirection which sends visitors from this site straight off to CNN. So, I’ve left the link in, of course, but provided you with a handy copy-paste if you want to go and visit her all the same.