Tags and Categories are not the Same! [en]

[fr] Les tags et les catégories, ce n'est pas la même chose. En bref, les catégories forment une structure hiérarchique, prédéfinie, qui régit l'architecture de notre contenu et aide autrui à s'y retrouver. Les tags sont spontanés, ad hoc, de granularité variable, tournés vers le partage et la recherche d'information.

Update, Sept. 2007: when I saw Matt in San Francisco this winter, he told me he had finally “seen the light” (his words!) about tags and categories. Six months later, it’s a reality for WordPress users. Thanks for listening.

I got a bit heated up last night between Matt’s comment that tags and categories function the same and a discussion I was having with Kevin on IM at the same time, about the fact that Technorati parses categories as tags.

I went back to read two of my old posts: Technorati Tagified and Plugin Idea: Weighted Tags by Category which I wrote about a year ago. In both, it’s very clear that as a user, I don’t perceive tags to be the same thing as categories. Tags were something like “public keywords”. Is anybody here going to say that keywords and categories are the same thing? (There is a difference between keywords and tags, but this isn’t the topic here; keywords and tags are IMHO much closer in nature than tags and categories).

Here are, in my opinion, the main differences between tags and categories, from the “tagger” point of view.

  • categories exist before the item I’m categorizing, whereas tags are created in reaction to the item, often in an ad hoc manner: I need to fit the item in a category, but I adapt tags to the item;
  • categories should be few, tags many;
  • categories are expected to have a pretty constant granularity, whereas tags can be very general like “switzerland” or very particular like “bloggyfriday”;
  • categories are planned, tags are spontaneous, they have a brainstorm-like nature, as Kevin explains very well: You look at the picture and type in the few words it makes you think of, move on to the next, and you’re done;
  • relations between categories are tree-like, but those between tags are network-like;
  • categories are something you choose, tags are generally something you gush out;
  • categories help me classify what I’m talking about, and tags help me share or spread it.

There’s nothing wrong with Technorati treating categories as tags. I’d say categories are a kind of tag. They are special tags you plan in advance to delimit zones of content, and that you display on your blog to help your readers find their way through what you say or separate areas of interest (e.g., my Grandma will be interested in my Life and Ramblings category and subscribe to that if she has an RSS reader, but she knows she doesn’t care about anything in the Geek category). (By the way, CTTS is not a good example of this, the categories are a real mess.)

So, let’s say categories are tags. I can agree with that. But tags are not categories! Tags help people going through a “search” process. Click on a tag to see related posts/photos. See things outside the world of this particular weblog which have the same label attached. Provide a handy label to collect writings, photos, and stuff from a wide variety of people without requiring them to change the architecture of their blog content (their categories). If you want to, yeah, you can drop categories and use only tags. It works on del.icio.us (http://del.icio.us/). But have you noticed how most Flickr users have sets (http://flickr.com/photos/bunny/sets/) in addition to tagging their photos? Sets aren’t categories, but they can be close. They are a way of presenting and organizing things for human beings rather than machines, search engines, database queries.

To get back to my complaint that WordPress.com does not provide real tags, it’s mainly a question of user interface. I don’t care if from a software point of view, tags and categories are the same thing for WordPress. As a user, I need a field in which I can let my fingers gush out keyword-tags once I’ve finished writing my post. I also need someplace to define and structure category-tags. I need to be able to define how to display these two kinds of tags (if you want to call them both that) on my blog, because they are ways of classifying or labeling information which I experience very differently.

Am I a tag weirdo? Do you also perceive a difference between tags and categories? How would you express or define it? If categories and tags are the same, the new WP2.0 interface for categories should make the Bunny Tags Plugin obsolete — does it?

Tracking Keywords: PubSub and Technorati [en]

[fr] Comparaison de PubSub et Technorati pour surveiller des mots-clés dans la blogosphère. Aucun des deux vraiment satisfaisant.

One thing I came back with from LIFT’06 is that what one should monitor is keyword watchlists rather than blogs. I used to have a few hundred blogs in an aggregator, but gave up using it ages ago. Too much to sift through, considering it isn’t my day job to do so.

During his talk, Robert mentioned that he used PubSub to track keywords like “Microsoft” or his name. Of course, it makes sense. Tracking topics that are of interest to you. I created a PubSub account and set up a few subscriptions to try to track things like mentions of my hometown, Lausanne, teenagers and weblogs, and of course my name. Tracking your name makes a lot of sense if you’re looking out for conversations. Think of highlighting in IRC: if everybody tracks their name in blogs, then you can just call out to them. Hi, Robert, by the way!

Now, this name thing. I guess tracking your surname with PubSub is all right if you’re named Scoble, but if you’re named Booth it makes things much trickier. I added my first name, but that didn’t help much if I omitted the quotes. And as people are likely to refer to me as “Stephanie Booth”, “Stéphanie Booth”, “Steph Booth” or even “Stéph Booth” that’s a bunch to track, but let’s say it’s manageable. But it rules out people who refer to me as “bunny” or even “Tara” (yeah, and if I start tracking those too, it’s not going to make things less messy).

What I really liked about PubSub is that it offers me an out-of-the-box sidebar for Firefox. I can get a list of the recent posts containing my keywords in there, browse them, click, check, move on. It has highlighting too, and that’s really nice: it helps me see straight away if the Stephanie Booth on the page is me or some homonym. (For some reason it’s not working anymore, but it was nice while it lasted.)

What I didn’t like is that it didn’t seem to be returning as many results as Technorati. Also, I wasn’t always sure if it was responding or not (I guess the current conversation around my name isn’t very busy ;-)). And the “Latest Messages” option only gave me the last three posts in each subscription. It gave me the impression of being a little incomplete in the results it returned. I suspect it isn’t really incomplete, but I can’t really nail down what gives me the impression. In any case, PubSub and Technorati give different results for a search on “cocomment”.

The slight dissatisfaction with PubSub made me go back to Technorati watchlists, which I had never really used. I like the idea of tracking URLs in posts. If somebody links to me, then it doesn’t matter if the person called me “Stéph Booth” or “Tara” or “la Mère Denis”, I’ll see it. I can also track links to my Flickr account and other blogs and stuff easily. Keyword searches work too. So, neat, I now have a watchlist page on Technorati with all my monitoring material. I can subscribe to each of them by RSS.

Gripes, however. And for the sake of it, let’s assume I’m hoping my watchlists will replace my NewsReader, and not go and live in it:

  • I can only expand one watchlist at a time.
  • Expanding a watchlist shows only the three last results.
  • I don’t have a compilation page with the latest results from all/any of my watchlists.
  • I’d like a sidebar!
  • Blogroll links keep showing up in Technorati search results. It’s nice to know you’ve been blogrolled, but you don’t need to be reminded of it each time you do a search.
  • No highlighting!

What it boils down to: I’d like a Technorati Watchlist sidebar for Firefox and highlighting of search terms or URLs in the pages which are loaded from it.

Do you monitor keywords, URLs or search terms? Do you use PubSub or Technorati? Do you stick the results in your feed reader to keep track of them?

Update: of course, I’m much more familiar with Technorati, so there might be something about PubSub I’m missing completely. Feel free to educate me.

Life and Trials of a Multilingual Weblog [en]

Here is an explanation of how I set up WordPress to manage my bilingual weblog. I give all the code I used to do it, and announce some of the things I’d like to implement. A “Multilingual blogging” TopicExchange channel is now open.

[fr] J'explique ici quelles sont les modifications que j'ai faites à WordPress pour gérer le bilinguisme de mon weblog -- code php et css à l'appui. Je mentionne également quelques innovations que j'ai en tête pour rendre ce weblog plus sympathique à mes lecteurs monolingues (ce résumé en est une!) Un canal pour le weblogging multilingue a été ouvert sur TopicExchange, et vous y trouverez peut-être d'autres écrits sur le même sujet. Utilisez-le (en envoyant un trackback) si vous écrivez des billets sur le multilinguisme dans les weblogs!

My weblog is bilingual, and has been since November 2000. Already then, I knew that I wouldn’t be capable of producing a site which duplicates every entry in two languages.

I think this would defeat the whole idea of weblogging: lowering the “publication barrier”. I feel like writing something, I quickly type it out, press “Publish”, and there we are. Imposing upon myself to translate everything just pushes it back up again. I have seen people try this, but I have never seen somebody keep it up for anything nearing four years (this weblog is turning four on July 13).

This weblog is therefore happily bilingual, as I am — sometimes in English, sometimes in French. This post is about how I have adapted the blogging tools I use to my bilingualism, and more importantly, how I can accommodate my monolingual readers so that they also feel comfortable here.

First thing to note: although weblogging tools are now ready to be used by people speaking a variety of languages (thanks to a process named “localization”), they remain monolingual. Language is determined at weblog-level.

With Movable Type, I used categories to emulate post-level language awareness. This wasn’t satisfying at all: I ended up with two monstrous categories, Français and English, which didn’t help keep rebuild times down.

With WordPress, the solution is far more satisfying: I store the language information as Post Meta, or “custom field”. No more category exploitation for something they shouldn’t be used for.

Before I really got started doing the exciting stuff, I made a quick change to the WordPress admin interface. If I was going to be adding a “language” custom field to each and every post of mine, I didn’t want to be doing it with the (imho) rather clumsy “Custom Fields” form.

In edit.php, just after the categorydiv fieldset, I inserted the following:

<fieldset id="languagediv">
	<legend><?php _e('Language') ?></legend>
	<div><input type="text" name="language" size="7"
	            tabindex="2" value="en" id="language" /></div>
</fieldset>

(You’ll probably have to move around your tabindex values so that the tabbing order makes sense to you.)

I also tweaked the wp-admin.css file a bit to keep it looking reasonably pretty, adding the rule below:

#languagediv {
	height: 3.5em;
	width: 5em;
}

and adding #languagediv everywhere I could see #poststatusdiv, so that they obeyed the same rules.

In this way, I have a small text field to edit to set the language. I pre-set it to “en”, and just have to change it to “fr” if I am writing in French.

We just need to add a little piece of code in the form processing script, post.php, just after the line that says add_meta($post_ID):

// add language
if (isset($_POST['language']))
{
	$_POST['metakeyselect'] = 'language';
	$_POST['metavalue'] = $_POST['language'];
	add_meta($post_ID);
}

The first thing I do with this language information is styling posts differently depending on the language. I do this by adding a lang attribute to my post <div>:

<div class="post" lang="<?php $post_language=get_post_custom_values("language"); $the_language=$post_language['0']; print($the_language); ?>">
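
One caveat with printing the stored value straight into the attribute: a missing or mistyped custom field ends up in the markup as-is. Here is a minimal sketch of a guard that could sit in front of it. The name safe_lang_code() and the list of allowed codes are assumptions for illustration, not part of WordPress.

```php
<?php
// Hypothetical helper (not a WordPress function): reduce whatever is stored in
// the "language" custom field to one of the codes the stylesheet knows about.
function safe_lang_code($stored, array $allowed = ['en', 'fr'], string $default = 'en'): string
{
    // Tolerate stray whitespace and capitals, e.g. " FR ".
    $code = strtolower(trim((string) $stored));

    // Anything unexpected (empty, missing, mistyped) falls back to the default.
    return in_array($code, $allowed, true) ? $code : $default;
}

// In the template this might be used as:
//   $the_language = safe_lang_code($post_language['0']);
echo safe_lang_code(' FR '), "\n"; // prints "fr"
echo safe_lang_code(null), "\n";   // prints "en"
```

Falling back to “en” mirrors the default I pre-set in the admin form; pick whatever default suits your own weblog.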

In the CSS, I add these rules:

div.post:lang(fr) h2.post-title:before {
  content: " [fr] ";
  font-weight: normal;
}
div.post:lang(en) h2.post-title:before {
  content: " [en] ";
  font-weight: normal;
}
div.post:lang(fr) {
  background-color: #FAECE7;
}

I also make sure the language of the date matches the language of the post. For this, I added a new function, the_time_lg(), to my-hacks.php. I then use the following code to print the date: <?php the_time_lg($the_language); ?>.
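
The function itself isn’t shown here. To give a rough idea of what a language-aware date formatter can look like, here is a sketch under stated assumptions: it is not the_time_lg() from my-hacks.php, and the name format_date_lg(), the explicit timestamp argument and the two date layouts are all made up for illustration.

```php
<?php
// Hypothetical sketch, NOT the the_time_lg() function mentioned above.
// Formats a Unix timestamp (interpreted as UTC) in English or French.
function format_date_lg(int $timestamp, string $lang): string
{
    $months = [
        'en' => ['January', 'February', 'March', 'April', 'May', 'June', 'July',
                 'August', 'September', 'October', 'November', 'December'],
        'fr' => ['janvier', 'février', 'mars', 'avril', 'mai', 'juin', 'juillet',
                 'août', 'septembre', 'octobre', 'novembre', 'décembre'],
    ];

    // Unknown language codes fall back to English month names.
    $names = $months[$lang] ?? $months['en'];

    $day   = (int) gmdate('j', $timestamp);
    $month = $names[(int) gmdate('n', $timestamp) - 1];
    $year  = gmdate('Y', $timestamp);

    if ($lang === 'fr') {
        // French writes the first of the month as "1er".
        return ($day === 1 ? '1er' : (string) $day) . ' ' . $month . ' ' . $year;
    }
    return $month . ' ' . $day . ', ' . $year;
}

echo format_date_lg(gmmktime(12, 0, 0, 7, 13, 2004), 'fr'), "\n"; // 13 juillet 2004
echo format_date_lg(gmmktime(12, 0, 0, 7, 13, 2004), 'en'), "\n"; // July 13, 2004
```

Hard-coding the month names keeps the sketch independent of whatever locales happen to be installed on the server.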

Can more be done? Yes! I know I have readers who are not bilingual in the two languages I use. I know that at times I write a lot in one language and less in another, and my “monolingual” readers can get frustrated about this. During a between-session conversation at BlogTalk, I suddenly had an idea: I would provide an “other language” excerpt for each of my posts.

I’ve been writing excerpts for each of my posts for the last six months now, and it’s not something that raises the publishing barrier for me. Quickly writing a sentence or two about my post in the “other language” is something I can easily do, and it will at least give my readers an indication about what is said in the posts they can’t understand. This is the first post I’m trying this with.

So, as I did for language above, I added another “custom field” to my admin interface (in edit-form.php). Actually, I didn’t stop there. I also added the field for the excerpt to the “simple controls” posting page that I use (set that in Options > Writing), and another field for keywords, which I also store for each post as meta data. Use at your convenience:

<!-- BEGIN BUNNY HACK -->
<fieldset style="clear:both">
<legend><a href="http://wordpress.org/docs/reference/post/#excerpt"
title="<?php _e('Help with excerpts') ?>"><?php _e('Excerpt') ?></a></legend>
<div><textarea rows="1" cols="40" name="excerpt" tabindex="5" id="excerpt">
<?php echo $excerpt ?></textarea></div>
</fieldset>
<fieldset style="clear:both">
<legend><?php _e('Other Language Excerpt') ?></legend>
<div><textarea rows="1" cols="40" name="other-excerpt"
tabindex="6" id="other-excerpt"></textarea></div>
</fieldset>
<fieldset style="clear:both">
<legend><?php _e('Keywords') ?></legend>
<div><textarea rows="1" cols="40" name="keywords" tabindex="7" id="keywords">
<?php echo $keywords ?></textarea></div>
</fieldset>
<!-- I moved around some tabindex values too -->
<!-- END BUNNY HACK -->

I inserted these fields just below the “content” fieldset, and styled the #keywords and #other-excerpt textarea fields in exactly the same way as #excerpt. Practical translation: open wp-admin.css, search for “excerpt”, and modify the rules so that they look like this:

#excerpt, #keywords, #other-excerpt {
	height: 1.8em;
	width: 98%;
}

instead of simply this:

#excerpt {
	height: 1.8em;
	width: 98%;
}

I’m sure by now you’re curious about what my posting screen looks like!

To make sure the data in these fields is processed, we need to add the following code to post.php (as we did for the “language” field above):

// add keywords
if (isset($_POST['keywords']))
{
	$_POST['metakeyselect'] = 'keywords';
	$_POST['metavalue'] = $_POST['keywords'];
	add_meta($post_ID);
}
// add other excerpt
if (isset($_POST['other-excerpt']))
{
	$_POST['metakeyselect'] = 'other-excerpt';
	$_POST['metavalue'] = $_POST['other-excerpt'];
	add_meta($post_ID);
}
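
These blocks differ only in the field name, so they could be folded into a loop. Here is a minimal sketch of that idea, kept free of WordPress calls so it runs on its own. collect_custom_fields() is a hypothetical helper, and skipping blank fields is an assumption of the sketch (the code above saves a field even when it is empty).

```php
<?php
// Hypothetical helper, not part of WordPress: given the submitted form data and
// the names of the custom fields, return the meta key/value pairs worth saving.
function collect_custom_fields(array $form, array $field_names): array
{
    $meta = [];
    foreach ($field_names as $name) {
        // Field not submitted at all: nothing to save.
        if (!isset($form[$name])) {
            continue;
        }
        // Assumption: a field left blank is not worth storing.
        $value = trim((string) $form[$name]);
        if ($value === '') {
            continue;
        }
        $meta[$name] = $value;
    }
    return $meta;
}

// Stand-in for $_POST, so the example runs outside WordPress.
$form = ['language' => 'fr', 'keywords' => '  ', 'post_title' => 'Bonjour'];
$meta = collect_custom_fields($form, ['language', 'keywords', 'other-excerpt']);
print_r($meta);
```

Inside post.php, each pair returned could then go through the same metakeyselect / metavalue / add_meta($post_ID) steps as above.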

Displaying the “other language excerpt” is done in this simple-but-not-too-elegant way:

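<?php
// Fetch the "other language" excerpt stored as a custom field, if any.
$post_other_excerpt = get_post_custom_values("other-excerpt");
$the_other_excerpt = $post_other_excerpt['0'];
if ($the_other_excerpt != "")
{
	// The excerpt is written in whichever language the post is not.
	// Anything that is not French is treated as English, the default.
	if ($the_language == "fr")
	{
		$the_other_language = "en";
	}
	else
	{
		$the_other_language = "fr";
	}
?>
	<div class="other-excerpt" lang="<?php print($the_other_language); ?>">
	<?php print($the_other_excerpt); ?>
	</div>
<?php
}
?>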

accompanied by the following CSS:

div.other-excerpt:lang(fr) {
  background-color: #FAECE7;
}
div.other-excerpt:lang(en) {
  background-color: #FFF;
}
div.other-excerpt:before {
  content: " [" attr(lang) "] ";
  font-weight: normal;
}

Now that we’ve got the basics covered, what else can be done? Well, I’ve got some ideas. Mainly, I’d like visitors to be able to add “en” or “fr” at the end of any URL to my weblog, and that would automatically filter out all the content which is not in that language (maybe using the trick Daniel describes?). In addition to that, it would also change the language of what I call the “page furniture”: titles, footer, and even (let’s be ambitious) category names. Adding language sensitivity to trackbacks and comments could also be interesting.
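
None of this exists yet, but the URL part could start from something like the sketch below: peel a trailing language code off the requested path, then keep only the posts in that language. Both function names and the shape of the $posts array are hypothetical, and this is plain string handling, not the trick Daniel describes.

```php
<?php
// Hypothetical helper: split "/2004/07/11/some-post/fr" into the real path and
// the language code. Returns [original path, null] when there is no suffix.
function split_language_suffix(string $path, array $languages = ['en', 'fr']): array
{
    $trimmed = rtrim($path, '/');
    $pos     = strrpos($trimmed, '/');
    $last    = ($pos === false) ? $trimmed : substr($trimmed, $pos + 1);

    if (in_array($last, $languages, true)) {
        $base = ($pos === false) ? '' : substr($trimmed, 0, $pos);
        return [$base . '/', $last];
    }
    return [$path, null];
}

// Hypothetical helper: keep only the posts whose 'language' value matches.
// Each post is assumed to be an array that may carry a 'language' key.
function filter_posts_by_language(array $posts, ?string $lang): array
{
    if ($lang === null) {
        return $posts; // no suffix in the URL: show everything
    }
    return array_values(array_filter($posts, function ($post) use ($lang) {
        return ($post['language'] ?? null) === $lang;
    }));
}

list($path, $lang) = split_language_suffix('/2004/07/11/some-post/fr');
echo $path, ' ', $lang, "\n";
```

Posts without a language value are dropped as soon as a language is requested; whether they should count as “en” instead is a choice the sketch leaves open.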

A last thing I’ll mention in the multilingual department for this weblog is my styling of outgoing links if they are written in a language which is not my post language, using the hreflang attribute. It’s easy, and you should do it too!
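
The attribute itself is just markup. If links were generated by code rather than typed by hand, a helper along these lines could add hreflang only when the target is in a different language from the post. link_with_hreflang() is a made-up name and its four arguments are assumptions for the sake of the example.

```php
<?php
// Hypothetical helper: build a link, adding hreflang only when the linked page
// is in a different language from the post that contains the link.
function link_with_hreflang(string $url, string $text, string $link_lang, string $post_lang): string
{
    $hreflang = '';
    if ($link_lang !== $post_lang) {
        $hreflang = ' hreflang="' . htmlspecialchars($link_lang, ENT_QUOTES) . '"';
    }
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES) . '"' . $hreflang . '>'
        . htmlspecialchars($text, ENT_QUOTES) . '</a>';
}

// A French page linked from an English post gets the attribute.
echo link_with_hreflang('http://example.org/', 'un billet', 'fr', 'en'), "\n";
```

A CSS attribute selector such as a[hreflang] can then pick those links out for styling.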

Suw (who has just resumed blogging in Welsh) and I have just set up a “Multilingual blogging” channel on TopicExchange — please trackback it if you write about blogging in more than one language!