Tag Archives: data

Deb Roy: The Birth of a Word

[fr]

A fascinating video about language learning, and also about processing and visualising staggering quantities of linguistic data. Worth watching.

[en]

Ah yes, another video. You see, some evenings, instead of sitting in front of the TV (not my usual evening occupation, by the way), I sit in front of my computer and watch videos I’ve queued up on Boxee — or hunted down for the occasion. No surprise, TED Talks are a favourite hang-out of mine.

Here’s one titled The Birth of a Word: researcher Deb Roy recorded the whole of the first three years of his son’s life to gather data which, once analyzed, would provide insight into how we learn language.

It’s fascinating. Fascinating for the language geek in me, and also fascinating from a data visualisation and analysis point of view. In the second part of his talk, Deb moves on to analysis of publicly available commentary (online) matched to the TV shows it is about. The visualisation is stunning (he’s showing us real data) and the implications left me feeling giddy.

Your turn.

Hat tip: thanks to Loïc for pointing out this video on Facebook.

Posted in Language Geekiness, Social Media and the Web | Tagged analysis, baby, commentary, data, data visualisation, language, Research, talking, ted talks, tv, video | Leave a comment

DupeGuru: You Own Less Data Than You Think

[fr]

To hunt down duplicates on Mac, Windows or Linux, I warmly recommend you try dupeGuru! (Plus, it has an interesting developer compensation system: Fairware.)

[en]

One of the consequences of putting an SSD into my MacBook and using CrashPlan and an Amahi home server to store my data and backups is that I have been forced to do a little digital spring-cleaning.

I had:

  • a 500Gb HDD in my MacBook, which hit “full” some time back before I freed up some space by moving stuff to an external HDD
  • an external 320Gb HDD, initially to store photos and videos, in practice filled with undefined junk, most of it mine, some of it others’
  • an external 250Gb HDD, initially to store a mirror of my MacBook HDD when it was only 250Gb, then filled with undefined junk, most of it mine, some of it others’
  • an external 110Gb HDD, containing disk images of various installation DVDs, and quite a lot of undefined junk, most of it mine, some of it others’

As you can see, “undefined junk” comes back often. What is it?

  • “I don’t have quite enough space on my MacBook HDD anymore, let’s move this onto an external drive”
  • “heck, do I have a second copy of this data somewhere? let’s make one here just in case”
  • “Sally, let me just make a copy of your user directory here before I upgrade your OS/put in a bigger hard drive, just in case things go wrong”
  • “eeps, I haven’t made a backup in some time, let me put a copy of my home directory somewhere” (pre-Time Machine)

See the idea?

Enter dupeGuru. I’ve wanted a programme like this for ages, without really taking the time to find it. Thanks to a kind soul on IRC, I have finally found the de-duping love of my life. (It works on OSX, Windows, and Linux.) It’s been of invaluable assistance in showing me where my huge chunks of redundant data are. Plus, it’s released as Fairware, which I find a very interesting compensation model: as long as there are uncompensated hours of work on the project, you’re encouraged to contribute to it, and the whole process is visible online.

Back to data. I quickly realized (no surprise) that I had huge amounts of redundant data. This prompted me to coin the following law:

Lack of a clear backup strategy leads to massive, uncontrolled and disorganized data redundancy.

The first thing I did was create a directory on my home server and copy all my external hard drives there. Easier to clean if everything is in one place! I also used my (now clean) 500Gb drive to copy some folder structures I knew were clean.

Now, one nice thing about dupeGuru is that you can specify a “reference” folder when you choose where to hunt for duplicates. That means you tell dupeGuru “stuff in here is good, don’t touch it, but I want to know if I have duplicate copies of that content lying around”. Once you’ve found duplicates, you can choose to view only the duplicates, sort them by size or folder, delete, copy or move them.
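
To make the “reference” idea concrete, here is a minimal Python sketch of that kind of search. It is not dupeGuru's actual code or algorithm: it simply assumes that two files are duplicates when their SHA-256 digests match, and the names file_digest and find_duplicates are made up for illustration. Files in the reference folder are only ever reported as the copy to keep, never as the duplicate.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(reference_dir: Path, scan_dir: Path) -> list:
    """List (duplicate, reference_copy) pairs for files under scan_dir.

    Assumption for this sketch: "duplicate" means identical content.
    """
    reference_dir = reference_dir.resolve()
    scan_dir = scan_dir.resolve()

    # Index the reference folder: digest -> first file seen with that content.
    reference = {}
    for path in sorted(reference_dir.rglob("*")):
        if path.is_file():
            reference.setdefault(file_digest(path), path)

    pairs = []
    for path in sorted(scan_dir.rglob("*")):
        if not path.is_file():
            continue
        # Never flag anything that lives inside the reference folder itself.
        if reference_dir in path.parents:
            continue
        original = reference.get(file_digest(path))
        if original is not None:
            pairs.append((path, original))
    return pairs
```

Sorting the result by size, for instance with sorted(pairs, key=lambda pair: pair[0].stat().st_size, reverse=True), puts the biggest space-wasters first.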

As with any duplicate-finder programme, you cannot just use it blindly, but it’s an invaluable assistant in freeing space.

I ran it on my well-organized Music folder and discovered 5Gb of duplicate data in there — in less than a minute!

Now that I’ve cleaned up most of my mess, I realize that instead of having 800 or 900Gb of data like I imagined, reality is closer to 300Gb. Not bad, eh?

So, here are my clean-up tips, if you have a huge mess like mine, with huge folder structures duplicated at various levels of your storage devices:

  • start small, and grow: pick a folder to start with that’s reasonably under control, clean it up, then add more folders using it as reference (actually, it’s better to set a big folder as reference and check whether a smaller folder isn’t already included in it)
  • scan horribly messy structures to identify redundant branches (maybe you have mymess/somenastydirectory and mymess/documents/old/documents/june/somenastydirectory), copy those similar branches to the same level (I do that because it makes it easier for my brain to follow what I’m doing), mark one of them as reference and prune the other; then copy the remaining files into the first one, if there are any
  • if you need to quickly make space, sort your dupes by size
  • if dupeGuru is suggesting you get rid of the copy of a file which is in a directory you want to keep, go back and mark that directory as reference
  • keep an eye on the bottom of the screen, which tells you how much data the dupes represent (if it’s 50Mb and hundreds of small files in as many little folders, you probably don’t want to bother, unless you’re really obsessed with organizing your stuff, in which case you probably won’t have ended up in a situation requiring dupeGuru in the first place)

Happy digital spring-cleaning!

Posted in OSX | Tagged data, directories, dupeguru, duplicate, files, folders, server, spring-cleaning | Leave a comment

A Data Management Fantasy

[fr]

My dream: a system that would cache, in a set amount of space on my SSD (say 50GB), the most recently opened files from my external hard drive. That way, I would have all my current files close at hand and on a fast drive. Do you know of a solution that does this?

[en]

I’m now running a happy MacBook with a 120Gb SSD (too big or too small depending on how you look at it, but I was in a hurry and dependent on what was in stock in the shop). I have an external 500Gb HDD to store all my junk on.

And here’s my dream. Wouldn’t it be nice if I could devote a certain amount of space on the SSD to my files, say 50Gb, and have that space occupied by cached copies of the files from the external drive that I most recently used? When I modify the files, the cached copies and those on the HDD would sync. And if I haven’t touched a file for long enough, it would be removed from the cache to free up space.

That way, my “current” files would be on the super-fast SSD and close at hand when I’m on the road.
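
The eviction rule described above is essentially a least-recently-used cache with a size budget. Here is a small Python sketch of just that bookkeeping, assuming a hypothetical RecentFileCache class; it only decides which paths would stay on the SSD, and leaves out the actual copying and syncing between drives.

```python
from collections import OrderedDict


class RecentFileCache:
    """Track which files would live on the SSD under a fixed byte budget."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self._entries = OrderedDict()  # path -> size, least recently used first

    @property
    def used_bytes(self):
        return sum(self._entries.values())

    def touch(self, path, size):
        """Record that `path` was just opened; return the paths evicted to make room."""
        # Forget any older record of this file (its size may have changed).
        self._entries.pop(path, None)
        if size > self.capacity_bytes:
            # Too big to cache at all: leave the other entries alone.
            return []
        self._entries[path] = size  # most recently used goes last
        evicted = []
        while self.used_bytes > self.capacity_bytes:
            oldest, _ = self._entries.popitem(last=False)
            evicted.append(oldest)
        return evicted

    def cached_paths(self):
        return list(self._entries)
```

With a 50Gb budget, calling touch() each time a file is opened would keep the most recently used files cached and push out the ones left untouched the longest.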

I’m sure a solution to do this already exists — heard of anything?

Posted in OSX | Tagged caching, data, external drive, files, lazyweb, macbook, ssd | 4 Comments

Going SSD and Amahi Home Server

[fr]

I'm in the middle of putting an SSD in my old MacBook (performance!), which means storing my data elsewhere: external hard drive, Amahi home server, and CrashPlan for backups.

[en]

I’ve been drooling over the MacBook Air for the past few weeks, to the point that I’ve pretty much decided it’ll be my next machine. Sure, a MacBook Pro is way more powerful, but do I need all that extra power? The eternal question when changing computers.

I understood that one of the things that make the Air zippy is the SSD. But why wait for another machine to have an SSD? I’m going to put one in my MacBook directly (I’ve already changed the hard drive twice, no biggie thanks to ifixit). Actually, I would be doing this now, if I hadn’t by mistake ordered a 3.5″ SSD instead of 2.5″ (I have two on my hands by the way, if anybody is interested in buying them off me, still in their unopened box).

The reason it took me so long to warm up to the SSD strategy is the price. Horribly expensive per Gb, compared to a “normal” hard drive! But what I’ve understood is that if you go the SSD way, you also stop storing all your data on those expensive Gb. You keep the expensive SSD Gb for your OS and applications, and all the data that is just “storage” goes on something slower.

For example, an external hard drive (I’m going to have a 2.5″ 500 Gb one once I swap it out) or… an Amahi Home Server, like the one I’m currently building. The server is a good solution for me to keep my data on a flexible redundant system (Greyhole).

Add to that CrashPlan, which plays nice with Amahi, and the server will also allow me to host distant backups for my friends (with the idea that they might also allow me to use some of their storage space for mine). VPN access, etc.

Right, I’m going back to my hardware!

Posted in Technology | Tagged amahi, crashplan, data, home server, macbook, performance, ssd, storage | 4 Comments

A Year Ago: Backup Awareness Day

[fr]

Today is Backup Awareness Day (does anybody have an elegant French translation to suggest? "La journée mondiale des sauvegardes"?): one day a month, the 24th, to do your backups if you're a bit lax about them, and more generally to make people aware of how important this is. A year ago, after hitting the wrong button, I deleted my whole blog. It took me two months to get it all back online, which would have been much simpler if I'd had an up-to-date backup.

So the 24th of each month is the occasion to write a post on your blog reminding your readers to do their backups, and giving them practical tips to do them well. Can I count on you?

[en]

A year ago today, I hit the wrong “drop” button in PhpMyAdmin and completely deleted my blog. I couldn’t remember when I had last made a backup.

I’ll cut the long story of recovery short, but it took me nearly two months to get all my data back in place. I could have saved myself a lot of pain and worry and extra work if I had had an up-to-date backup of my blog.

I’ve always been sloppy with backups. Most people who are not IT professionals (and even some who are) are sometimes sloppier still. We all know we should make backups more often, but we still live in the hope that theft, hard drive failures, and dropped databases will not happen to us. Oh yes, we know we’re wrong, but we’ve been lucky so far, haven’t we? Now shoo away those guilty feelings and get on with your life.

Well, no. I decided to make the 24th of every month Backup Awareness Day. A day to

  • blog about the importance of backups
  • give practical tips to actually do them
  • help people around you do backups
  • tell horror stories of lost data
  • do your own backups!
  • put in place automated systems.

You get the idea. A day a month to think about backups, do something about them, and raise awareness in your communities.

Unfortunately, I guess I had too much going on at the time, and I didn’t really follow through (I tweeted a bit, and blogged about it in June, but honestly, these last six months haven’t been very backup-aware).

So, this year, let’s make Backup Awareness Day a real part of our lives. I need your help for that. On the 24th of each month, even if I forget (I’ll try not to, promise!), tweet about it, blog about it, do your backups, and encourage those around you to do so too. Online, and offline. Can I count you in?

I’ve just hit that “Export” button in WordPress, saved a dump of my MySQL database, and plugged in the external hard drive so that Time Machine could have a go at it. You too — do these things now if that’s how you back up your important data, or do whatever you do to make sure your words, photographs, videos, and precious files do not evaporate in the event of a disaster.

I’m now going to mark Backup Awareness Day in my calendar for the coming months. (Of course, next month, Backup Awareness Day coincides with Ada Lovelace Day, which I’ll be telling you more about later today.)

Update: Backup Awareness Day now has its own website at backupawareness.org! I’m going to need help with it, so let me know if you’d be ready to give a hand.

Posted in My projects, Technology | Tagged backup, backup awareness day, BAD, data | 4 Comments

Prezi: Never Use PowerPoint Again

[fr]

Prezi is going to kill PowerPoint, mark my words. You create a giant canvas of your presentation and, by zooming and panning, navigate inside it to illustrate your talk while keeping its structure clearly visible. PowerPoint? One dimension. Prezi? Three.

[en]

A quick note to point you to Prezi, which I saw in action a couple of times in Paris. For example, see this one below by Kevin Marks on Buzzwords (my talk notes).

Prezi allows you to create a giant mind-map of your presentation, and using zoom and movement on the map, creates a presentation from it.

Check out the Prezi tutorials and videos for more. It’s just blowing my mind, and seems very fun and easy to use too.

I think I’m never ever going to use PowerPoint again.

Posted in Technology | Tagged data, mindmap, powerpoint, presentations, prezi, slides, vizualisation, zoom | 11 Comments

Today: Backup Awareness Day!

[fr]

Today, like the 24th of every month, is backup day. Do yours!

[en]

I haven’t done as much as I wanted around Backup Awareness Day yet (and even skipped last month because I was in the mountains at that time), but it will come during the next months.

Backup Awareness Day takes place on the 24th of each month and is the occasion to:

  • do your backups and set up automatic systems to keep your data safe
  • help and encourage others to do so, and blog about the importance of backups and backup techniques.

If like me you’re having a busy week (busy but good), at least take the time today to:

  • plug in that external hard drive and make sure Time Machine does a backup
  • export your WordPress blog
  • dump your MySQL database
  • if all else fails or is too complicated for you, copy your most precious document folders onto a thumb drive or an external hard drive.

More next month!

Posted in My projects, Technology | Tagged backup, backup awareness day, BAD, data, Events, security | 3 Comments

Today is Backup Awareness Day!

[fr]

Two months ago, on February 24th, I pressed the wrong button and deleted my blog. I didn't know when my last backup dated from.

All's well that ends well for me (though not without pain). So, to encourage those who, like me, are a bit "soft on backups", I have decreed that the 24th of each month would be "Backup Awareness Day", or "Journée de sensibilisation à la sauvegarde" (it's better in English, don't you think?)

So today, we stop for a moment to do those backups we should have taken care of long ago. We plug Time Machine back in. We go to WordPress > Tools > Export. We make an SQL dump of our database. We put a copy of the super-important files we've been working on for weeks, and which exist only on our computer, onto a USB key. We motivate our neighbours to do the same. We write a post on our blog to remind our readers to make backups. We set up automated systems, and we help others do so.

Making backups once a month is not enough. But it's better than nothing. With Backup Awareness Day, once a month, let's take the time to remind the world that it matters. Join in and tag your posts "backupawarenessday"!

[en]

Two months ago, on February 24th, I hit the wrong “Drop” button in PhpMyAdmin, resulting in the immediate deletion of the blog you’re reading. I didn’t know when I had last backed it up.

The story ends well, though it cost me (and others) many hours (days, actually) of work to get the whole of Climb to the Stars back online again.

I’ve always been careless about backups. Like many of you, probably. We can afford to be careless because accidents don’t happen very often, and as with Black Swans, we are under the mistaken belief that having been safe in the past will keep us safe in the future. Not so. As I like to repeat, the first time a disaster happens, well, it had never happened till then.

So, I’ve decided to declare the 24th of each month “Backup Awareness Day”. Here’s what it’s about:

  • Back up your files.
  • Back up your website.
  • Blog about the importance of backing up (sharing tips, stories, advice).
  • Tell your friends to back up.
  • Help your friends back up.
  • Put in place automatic backup systems.

Bottom line: decrease the number of people who never back up, or back up so infrequently they’ll be in a real mess if things go wrong.

Now, perfectionism is the biggest enemy of getting things done. Backup Awareness Day does not mean that you have to do all this. Here are a few ideas to get you started (better a bad backup than no backup at all):

  • If Time Machine (or any other regular backup system you use for your computer) has been telling you it hasn’t done a backup in ages, stop what you’re doing right now and plug it in.
  • If you use WordPress, when was the last time you went to Tools > Export to make a quick backup? It’s not the best way to do it, but in my case, it saved CTTS.
  • Do you use something like Mozy to have a remote backup of your most important files? Time to sign up, maybe.
  • Are you working on important documents that exist only on your computer, which is never backed up? At the minimum, pick up a thumb drive and copy them onto it, or send yourself an e-mail with the files as attachments, if your e-mail is stored outside your computer (Gmail, for example).
  • Do you have an automatic backup set up for your database or website? Set some time aside on Backup Awareness Day to figure out cron.
  • When did you make the last dump of your MySQL database? Head over to PhpMyAdmin, or the command line (it’s mysqldump --opt -u user -p databasename > my-dirty-backup.sql)
  • Do you have the backup thing all figured out? Write a post for your readers with a few tips or tutorials to help them along. (Tag your posts “backupawarenessday” — I thought about “BAD” but that wasn’t really optimal ;-) )
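
For the cron idea and the mysqldump one-liner above, a small sketch in Python of a dated dump might look like this. The helper names, the user name and the database name are placeholders, not anything from a real setup. Note that -p prompts for the password, so for an unattended run you would have to supply credentials another way (for example a MySQL option file), which is not shown here.

```python
import subprocess
from datetime import date


def is_backup_day(day):
    """Backup Awareness Day falls on the 24th of each month."""
    return day.day == 24


def mysqldump_command(user, database, day):
    """Return the mysqldump argument list and a dated output file name."""
    outfile = f"{database}-{day.isoformat()}.sql"
    # Same options as the one-liner above: --opt, -u user, -p (prompts for password).
    args = ["mysqldump", "--opt", "-u", user, "-p", database]
    return args, outfile


def run_dump(user, database, day):
    """Run mysqldump and write its output to the dated file (needs mysqldump installed)."""
    args, outfile = mysqldump_command(user, database, day)
    with open(outfile, "wb") as handle:
        subprocess.run(args, stdout=handle, check=True)
    return outfile
```

Calling run_dump("user", "databasename", date.today()) would leave a file such as databasename-YYYY-MM-DD.sql in the current directory, so older dumps are not overwritten.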

I’m hoping to develop the concept more over the coming months. If you have ideas, get in touch, and take note of Backup Awareness Day for the month of May: Sunday 24th!

(Now stop reading and go do a few backups.)

Posted in Wordpress | Tagged accident, backing up, backup, backupawarenessday, black swan, computer, ctts, data, Events, mozy, mysql, security, time machine, website, Wordpress | 12 Comments

Blog Host Ugliness

[fr]

A Serbian friend was given an ultimatum by her blog host: 24 hours to delete another blogger's comments and links to his sites, or see her blog disappear.

The host in question (which uses multi-user WordPress, like WordPress.com) had furthermore disabled the blog export function.

We got out of it as best we could (see here).

Technical side aside, it is absolutely scandalous for a blog host to behave this way. Sure, any host is free to "kick out" customers, but disabling the blog export function beforehand reaches new heights of pettiness. A word to the wise.

Edit: on Seesmic, the story in French and on video.

[en]

Note: I’ve updated this post as I gathered information allowing me to see more clearly in this whole mess. Please read the comment if you’re going to jump in the conversation or blog about this.

Wednesday night, my friend Sanja from BlogOpen (she was my very kind and competent hostess) pinged me on IM. She had less than 24 hours to export her blog before her blog host shut it down.

It was a blog hosted by WordPress multi-user [Edit: not WPMU]. Easy enough, I thought. There is an export function. Unfortunately, when I logged in (the interface was in Serbian, but I can find my way through WordPress with my eyes closed), this is what I found:

[Screenshot: WordPress (MU?) with no Export]

Even if you don’t understand Serbian, you can see there is a missing tab. I tried calling /wp-admin/export.php directly, but the file had been removed.

Well, after a bit of poking, prodding and thinking, this is what I came up with (reminder: there was no possibility to install plugins and no direct access to the server):

[Screenshot: Last Hope Export of WordPress MU Blog]

I went into Options > Reading. I set the feeds to “entire post”. As there were 110 posts in this blog, I set the home page to display all of them, with a little margin for error. There were more than 1400 comments, so I set the maximum number of items in a feed to 1500.

Then I did three things:

  • saved /feed (an RSS dump of the blog posts)
  • saved /comments/feed (an RSS dump of the comments)
  • scraped the blog (with single blog post pages) as an extra backup by running wget -r -l1 -w1 BLOGURL (thanks, John) from my server (also to save the images).
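
For reference, the three steps above boil down to two feed addresses and one wget call. The sketch below (Python, with a hypothetical helper name and an example.com address standing in for the real blog) only spells them out; it does not download anything.

```python
def last_hope_export_plan(blog_url):
    """Return the two feed URLs to save and the wget command used for scraping."""
    base = blog_url.rstrip("/")
    return {
        "posts_feed": f"{base}/feed",
        "comments_feed": f"{base}/comments/feed",
        # -r: recursive download, -l1: follow links one level deep,
        # -w1: wait one second between requests (to go easy on the server)
        "scrape_command": ["wget", "-r", "-l1", "-w1", blog_url],
    }


# example.com is a placeholder, not the blog from this post
plan = last_hope_export_plan("http://example.com/blog/")
```

Remember that the feeds only contain everything if the reading options have been raised first, as described above.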

The blog was saved. I couldn’t import the RSS dump of blog posts into WordPress.com, where I told Sanja to open a new blog account, so I quickly set up a regular WordPress install on my server, imported it there, and exported it in WXR format. Great.

Comments, however, are another story. If you’re in a hackish mood, any help would be appreciated.
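
One possible starting point for the comments: parse the saved /comments/feed file into plain records that could later be turned into an importable format. This is only a sketch. It assumes the feed is ordinary RSS 2.0 carrying the author in dc:creator and the comment body in content:encoded, which is what WordPress feeds usually do, but the feed saved from this particular host may differ. The function name parse_comment_feed is invented for the example.

```python
import xml.etree.ElementTree as ET

# Namespace URIs for the Dublin Core and RSS content modules
# (assumed to be the ones used by the saved feed).
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "content": "http://purl.org/rss/1.0/modules/content/",
}


def parse_comment_feed(rss_text):
    """Turn the text of a comments RSS feed into a list of dicts, one per comment."""
    root = ET.fromstring(rss_text)
    comments = []
    for item in root.iterfind("./channel/item"):
        comments.append({
            "author": item.findtext("dc:creator", default="", namespaces=NAMESPACES),
            "date": item.findtext("pubDate", default=""),
            # In a comments feed the link normally points at the comment on its post.
            "link": item.findtext("link", default=""),
            "body": item.findtext("content:encoded", default="", namespaces=NAMESPACES),
        })
    return comments
```

Linking each comment back to its post would still need to be worked out, for example from the post address in each comment's link.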

We’ll probably have to deal with the images too once the blog has been completely wiped off the 381.com server — for the moment it seems like it was disabled, but the images are still there (see this one for example).

There, that was for the technical part.

Now for a personal comment. I find it utterly disgusting and shocking that a blog host owner would give people an ultimatum to leave and disable the export function in the blogging software. Sanja tells me that they had the export function until a few days before the ultimatum.

Of course, a blog host can choose not to host certain people. But trying to lock people in by disabling export of their own data is simply evil. If you’re kicking people off your system, you damn well better make sure they can take their data with them.

Edit, 27.01, 12:00: I’m happy to learn that it seems the disabling of the export function was not related to the ultimatum, and that the blog381 people were not actually trying to actively lock people in. However, it remains that it’s pretty delicate in a conflictual situation to tell people to “submit or leave” when they don’t have a way to export their data on their own.

So, people, please. If you need a blog host, choose a serious one. WordPress.com for example. Or Blogger. Or Typepad. Putting your precious blog in the hands of an individual is risky (weblogs.com, anybody? and if you remember, people on weblogs.com at least had the guarantee they could export their data…)

How did this happen?

I got some details about the situation, but a word of warning about that, first. The source material to this Serbian blogosphere drama is all in… Serbian. I’m relying here on what my friend Sanja told me about the situation, and I do not doubt her good faith. I know, though, that stories do have multiple sides, and that there might be more to the background than what I’m telling you here — but whatever the background story, it cannot justify the behaviour of this blog host.

From what I gathered, what brought about this crisis is a quarrel between two bloggers: Tatjana aka Venus aka Lang (Update: Tatjana is not happy that I’m linking to her and has redirected visitors to this site elsewhere; to see her blog, copy-paste the link http://www.laluve.com/ in your browser), the owner of the Serbian blogging platform blog381.com (not the Tatjana who organized BlogOpen!), and another pretty popular blogger. At some point, Tatjana decided to forbid the people using her platform from linking to this other blogger or harbouring his comments.

Here is the warning she posted on the community forums:

Vlasnik blogova
http://bruh.org/ludizmaj/,
http://www.blogoye.org/pecina/,
http://www.blogoye.org/Mudrosti/,
http://www.blogoye.org/sujeta/
(ima verovatno jos ali ne mogu da trazim)

je ovom blog sistemu naneo stetu laziranjem glasova oko izbora za najblogera (na kom je on bio ‘pobednik’), ‘miniranjem’ sledeceg izbora, sirenjem neistina, traceva, vrbovanjem novih blogera sa tri osam jedan sistema, a sve u cilju da se naskodi ovom sistemu a poveca sopstveni traffic i “ugled”.

Za one koji nisu dovoljno informisani i sve ostale koji su slusali ili nisu, samo jednu stranu price od gore pomenutog, necu dodatno iznositi nikakve detalje, niti vise imam nameru da se borim sa provincijalizmima pojedinih ljudi koji su bili ili jesu na neki nacin u komunikaciji sa blogom381 i njegovim korisnicima.

Slobodna volja svakog od nas da pise kako i gde hoce, ali oni koji se odluce da i dalje pisu ovde nece moci da imaju linkove ka ovim blogovima niti komentare vlasnika istih.

Ukoliko imate zelju,nameru ili potrebu da ostanete na ovom blog sitemu, obrisite linkove i komentare gore pomenutog blogera u roku od 24h.

Translation (Sanja was a bit tired, so forgive the wobbliness):

The owner of these blogs http://bruh.org/ludizmaj/, http://www.blogoye.org/pecina/, http://www.blogoye.org/Mudrosti/, http://www.blogoye.org/sujeta/

has caused damage to this blog system by faking votes for the election of “The best blogger” (where he was “the winner”), and was undermining the next election by spreading gossip, lies, and recruiting new 381 bloggers, with only one aim: to damage this community and increase his own blog traffic and “reputation”.

For those who are not informed well enough, and all others who were listening or didn’t, only one side of the story of the person mentioned above, I will not give any additional details, nor do I have the intention to fight with provincialism of some people who were or in some way are connected to blog381 communication and their users.

It is the free will of each of us to write how and where we want to, but those who decide to keep writing here, will not be able to have links to these blogs or comments by their owner.

Those of you who have the wish, intention or need to stay on this blog system, should delete links and comments of the blogger (mentioned above) within 24 hours.

Sanja learnt about this because the owner of the blogging platform left a comment on one of her posts (not the most recent) to let her know about it. Given that the “other blogger” in question is a friend of Sanja’s, she wasn’t going to comply.

Other bloggers have also seen their blogs deleted, or at least de-activated (actually, before the 24-hour limit was up). A dozen or so, says Sanja.

If you want to chime in on the “political” side of this story (particularly if you’re involved in this story or a direct witness), you’re welcome to use my comments. However, I ask (as always) that everybody remain civil and refrain from personal attacks (commonsense blogging etiquette, y’know).

Update: It seems that since Sanja’s blog was deactivated, the whole blogging platform has been shut down, with a message that people can e-mail the administrator to get an export of their blog. This message was not there during the ultimatum period.

In a comment to this post, Tatjana aka Lang asked me to remove the link to her blog, http://www.laluve.com/ , which I had placed upon her name. As I have refused to remove it (linking to the people involved in this story is perfectly relevant, and on the web, you can link to who you want, anyway), she has set up a redirection which sends visitors from this site straight off to CNN. So, I’ve left the link in, of course, but provided you with a handy copy-paste if you want to go and visit her all the same.

Posted in Wordpress | Tagged blog host, blog381, blogging, blogging platform, Blogosphere Interest, blogs, code of conduct, data, dump, ethics, export, Real Live Code, rss, wget, Wordpress, wpmu | 113 Comments

Being Lifter 20: I’m the “Star” Networker!

[fr]

After LIFT last year, a questionnaire was submitted to participants in order to determine what impact the conference had had on their network. I answered it, along with 27 other people (a rather small sample, in my opinion). It turns out I'm the "super-networker" of the study. A few remarks.

[en]

Eleven months ago, I participated and encouraged you to participate in a survey which aimed to map social networking between participants of the LIFT’07 conference. As I was browsing around after submitting my workshop proposal, I saw that the report based on that survey had been published. On the LIFT site, you can see screenshots of the graphs (yes, this is what I call a “social graph”!) before and after the conference.

Go and look.

[Image: LIFT'07 Network Mapping Report]

Notice the node somewhat to the left, that seems to be connected to a whole bunch of people? Yeah, that’s me. I’m “lifter 20”. How do I know? Well, not hard to guess: I have a rather atypical profile compared to the other people who took the survey.

So, as the “star” networker in this story, I do have a few thoughts/comments on some of the conclusions drawn from the survey. Don’t get me wrong: I think it’s very interesting, and that we need this kind of research (and more of it!) but as Glenn says himself in the 1Mb PDF report, it’s important to bear in mind the limitations of this study. (All the quotes in this blog post are taken from the PDF, unless I say otherwise.)

The limitations of this study needs to be understood before considering the findings: This study maps networks from the point of view of the 28 participants. Consequently, it is only a partial map of the networks established at LIFT07.

In this study, I’m the “star” networker: the person with the most connections before and after the conference.

Before the conference, participant Lifter20 had the largest network (59 attendees) which was increased by 25 attendees after the conference.

Bearing that in mind, I would personally have removed myself from the “average” calculations (I don’t think that was done), because I’m too atypical compared to the other people in the survey. For instance, I would find it interesting to be given figures with extremes removed here:

There was a large range in the size of the individual networks before LIFT07 (from 0 to 59) and a smaller range in the number of people added to networks after the conference (from 0 to 28). However, on average, participants had seven people in their network before LIFT07 and added nine more people after the conference – leading to the conclusion that people at least doubled their network by attending LIFT07.
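
A rough way to see what removing one extreme does, using only the figures quoted above (28 participants, averages of seven and nine, and Lifter20's 59 and 25). Since the report's averages are presumably rounded, the results are approximations, not figures from the report. The function name is made up for this sketch.

```python
def mean_without(mean, count, removed_value):
    """Mean of the remaining values after dropping one known value from a group."""
    return (mean * count - removed_value) / (count - 1)


# Network size before LIFT07: average 7 over 28 people, Lifter20 had 59.
before = mean_without(7, 28, 59)   # roughly 5.1
# People added after the conference: average 9, Lifter20 added 25.
added = mean_without(9, 28, 25)    # roughly 8.4
```

Under these assumptions, the other 27 participants started with about five contacts on average and added a little over eight.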

As mentioned earlier, 28 people took the survey. I know I’m not the most networked person at LIFT. In my “network of red nodes” (people not in the survey) there are people like Robert Scoble, Stowe Boyd, or Laurent Haug, who clearly did not take the survey, or I wouldn’t be the “star networker” here. So each of them is a little red node somewhere in the graph. Which makes me take the following remark with a big grain of salt:

Before the conference, several “red” attendees (i.e. those attendees nominated as part of the network of the 28 participants) were significant relay nodes in the network receiving considerable incoming links – notably the red node to the right of Lifter 12 and the red node to the left of Lifter 16. In both cases, the number of links to these nodes increased after the conference.

What’s missing here is that these red nodes might very well be super networkers like Stowe or Robert. The fact they receive significant incoming links would then take a different meaning: only a very small part of their role in the global LIFT networking ecosystem is visible. (Yes, the study here only talks about a small part of this ecosystem, but it’s worth repeating.)

I think that most heavy networkers are not very likely to fill in such a survey. The more people you know, the more time it takes. I’m easily a bit obsessive, and I think this kind of study is really interesting, so I took the trouble to do it — but I’m sure many people with a smaller network than mine didn’t even consider doing it because it’s “too much work”. I suspect participation in such a survey is skewed towards people with smaller networks (“sure, I just know 5-10 people, I’ll quickly fill it in”).

Here’s a comment about the ratio of new contacts made during LIFT’07:

For example, the “star” networker, Lifter20 has a ratio of 1:0.4. In other words, for every third person in her existing network, she met one new person. Whereas, Lifter18 had the highest ratio of 1:7. In other words, for every person in her existing network, she met seven new people.

I think it’s important to note that, as I said in my previous post about this experiment, since I knew many people from the LIFT community beforehand, the increase in my network (proportionally) was bound to be less impressive than, say, when I came to LIFT’06 two years ago (I basically knew 3 people before going: Anne Dominique, Laurent, Marc-Olivier, and maybe Roberto… and walked out with a ton of new people). I’m sure Dunbar’s number kicks in somewhere too, and I would expect that the more people you know initially, the lower your ratio of new contacts should be.
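The ratio discussed above can be recomputed from the before/after figures quoted in this post (59 and 25 for Lifter20, 14 and 7 for Lifter13). This is a small sketch rather than the report's own method; the function name `new_contact_ratio` and the choice to return `None` for an empty starting network are assumptions.

```python
def new_contact_ratio(before, added):
    """New contacts per existing contact; None when there was no prior network."""
    if before == 0:
        return None
    return added / before


# Figures quoted in this post: (contacts before LIFT07, contacts added).
participants = {
    "Lifter20": (59, 25),
    "Lifter13": (14, 7),
}

ratios = {
    name: new_contact_ratio(before, added)
    for name, (before, added) in participants.items()
}

for name, ratio in ratios.items():
    print(f"{name}: 1:{ratio:.1f}")
```

For Lifter20 the unrounded value is about 0.42, which works out to roughly one new contact for every 2.4 existing ones.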

On page 8 of the survey there is a list of participants and the number of before/after contacts they entered in the survey. So, if you took the survey and have a rough idea of how many people you knew before LIFT, and how many you met there, you should be able to identify who you are.

This is interesting:

The “star” networker, Lifter 20 had seven links to other participants before LIFT07 which grew to ten after the conference, giving her the most central position in the network of participants.

So, basically, 10 people I know took the survey, out of 28 total. I know I blogged about the survey and actively encouraged people in my network to take it. This would skew the sample, of course, making it closer to “my network at LIFT”. If we know each other and you took the survey, can you identify which number you are? It would be interesting to put faces on the numbers to interpret the data (for me, in any case, as I know the people). For example, if you’re a person I brought to LIFT, chances are your “new connections” will overlap mine quite a bit, more than if you came to LIFT independently.

A chapter of the report is devoted to the “star” networker (in other words, little me).

Interestingly, many of the people that she connected to, both before and after LIFT07, were not part of the networks of the other 27 participants of the study, indicating a certain isolation of parts of her network.

[...]

Before the conference, a significant number of contacts (35) of Lifter20 had no connections with any of the other 27 participants of the study.

After the conference, a number of contacts (14) made by Lifter20 had no connections with any of the other 27 participants of the study.

The first remark can be turned the other way: maybe all these “unconnected” people are actually quite connected within the “global LIFT network”, and it is the sample of 28 people who answered the survey that has isolated networks. Of course, isolation is a relative notion, but the way things are phrased here makes it look like I have an isolated network… which I don’t really believe to be the case. A great part of my network is actually very interconnected, only it doesn’t show in the graph because the people in question did not take the survey. My friend wheel from Facebook (see screenshot) gives a better impression of what it looks like. (No, no, I’m not taking this personally! I’m not.)
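The point about apparent isolation can be shown with a toy example. Everything in it is hypothetical (the names, the links, and the helper `apparently_isolated`): it only sketches how contacts who are tightly connected to each other can still look “unconnected” when links are only recorded from the point of view of survey takers.

```python
# Hypothetical full contact graph: who actually knows whom (names invented).
full_network = {
    "star": {"ann", "bob", "cat"},
    "ann": {"star", "bob", "cat"},
    "bob": {"star", "ann", "cat"},
    "cat": {"star", "ann", "bob"},
    "dan": {"eve"},
    "eve": {"dan"},
}

# Only these people filled in the survey, so only their links are visible.
survey_takers = {"star", "dan"}


def apparently_isolated(network, takers, focus):
    """Contacts of `focus` that no other survey taker reported knowing."""
    seen_by_others = set()
    for taker in takers - {focus}:
        seen_by_others |= network[taker]
    return network[focus] - seen_by_others


isolated = apparently_isolated(full_network, survey_takers, "star")
print(sorted(isolated))
```

In this invented graph, all three contacts of "star" come out as isolated from the other participant, even though they all know each other.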

Lifter20 shares a number of contacts with one other participant (Lifter13 – the blue node horizontally to the right in the “after” diagram).

Who is Lifter 13? (14 before, met 7 at LIFT’07) Somebody I knew before LIFT’07. I’m curious.

I’d also love to know who Lifter 18 (the “booster” networker) and Lifter 11 (the “clique” networker) were, though the graph indicates I know neither.

In conclusion, I’d say this is a really interesting study, but the anonymized data would benefit from being interpreted in the light of who the actual people were and what their networks were like. I think that would make it possible to evaluate where this kind of analysis works well and where it works less well.

I think 28 people is a rather small sample for such a study, and it’s a pity more people didn’t participate in the survey. How could we motivate people to participate? I think one of the main issues is that people don’t get anything directly out of participating. So… maybe some goodie incentive for doing it, next time? Also, I remember the interface was a bit raw. What I did was go through the participant list and type the names. It’s almost impossible to just think back to “so, who did I meet at LIFT this year?”: either you’re going to take a stack of business cards you brought home, or you’re going to go through a list and see what names ring a bell.

Maybe the survey organisation could take that into account. Provide participants in the survey with a (searchable, ajaxy) list of attendees with checkboxes. Then you could add smart stuff to help out like Dopplr’s “travellers you may know” (based on a “contacts of your contacts” algorithm).
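A “contacts of your contacts” suggestion list could look something like the sketch below. This is not Dopplr’s actual algorithm, which isn’t described here; it is one simple way to rank attendees by how many contacts they share with you. The names, the sample data and the function `suggest_attendees` are all hypothetical.

```python
from collections import Counter


def suggest_attendees(contacts, person):
    """Rank attendees `person` has not listed by number of shared contacts."""
    known = contacts.get(person, set())
    counts = Counter()
    for friend in known:
        for candidate in contacts.get(friend, set()):
            # Skip the person themselves and anyone they already listed.
            if candidate != person and candidate not in known:
                counts[candidate] += 1
    # Most shared contacts first; ties broken alphabetically for stable output.
    return sorted(counts, key=lambda name: (-counts[name], name))


# Hypothetical answers already entered in the survey (names invented).
contacts = {
    "alice": {"bruno", "chloe"},
    "bruno": {"alice", "david", "emma"},
    "chloe": {"alice", "david"},
    "david": {"bruno", "chloe"},
    "emma": {"bruno"},
}

suggestions = suggest_attendees(contacts, "alice")
print(suggestions)
```

A survey form could show these names at the top of the attendee list with checkboxes, so the people you are most likely to know are the first ones you see.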

Posted in Stuff that doesn't fit | Tagged analysis, conference, connector, data, event, Events, lift, lift07, Psychology / Sociology, report, social graph, social network, social networking, supernode | 10 Comments