Blekko Update: Mechanical Turk Meets Algorithmic Web Search

Posted August 13th, 2010 by Andrew Goodman

Blekko, the stealth search engine startup co-founded by Rich Skrenta (of ODP fame), has recently launched a limited private beta. I’ve been able to give it a spin and can confirm that, while it still feels like early days for the project, it has its core metaphors and functions nailed down solidly.

As close observers now know, the service lets users apply pre-set (or custom) “slashtags” to reduce a whole web search to a narrower slice of relevant websites. “Pre-sets” include /people, /stock, /weather, /define, etc. These work somewhat like certain Google pre-sets, which can be activated either by knowing the nomenclature or automatically, based on the style of your query. Blekko’s pre-sets already go beyond some of what Google offers; as they develop, they could become a genuinely useful toolkit.

Some quirky areas they’re working on are near and dear to the founder’s heart: because he’s always aimed to pitch the product to search pros and SEOs, Blekko has spent considerable time building out SEO-candy features. For example, a slashtag called /rank (and no doubt other features in development) reveals the factors behind rankings for certain phrases, which lets you understand why some sites rank higher than others. Go ahead and game Blekko if you want: people won’t be using it to search the whole web, so they’re unlikely to include your site in a slashtag site list if it’s spammy.

More germane to the spirit of the project are “topic” slashtags (from /arts to /discgolf to /yoga) and “user” slashtags (anything users customize). The idea is basically to metasearch all the sites included in the list, but none of the rest of the web. (Or you can call it a mini index or a multi-site search, if you will.)
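To make the “list of sites, and nothing else” idea concrete, here is a minimal sketch in Python. It assumes a slashtag is simply a set of domains and that the engine post-filters an already-ranked whole-web result list, keeping the original rank order. Blekko has not described its implementation; the function names `host_matches` and `apply_slashtag` are hypothetical, and a real engine might instead restrict the search to those sites at query time.

```python
from urllib.parse import urlparse


def host_matches(host: str, site: str) -> bool:
    """Hypothetical helper: True if host is the site itself or one of its subdomains."""
    host = host.lower().rstrip(".")
    site = site.lower().rstrip(".")
    # "www.yelp.com" matches "yelp.com"; "notyelp.com" does not.
    return host == site or host.endswith("." + site)


def apply_slashtag(results: list[str], slashtag_sites: set[str]) -> list[str]:
    """Hypothetical sketch: keep only result URLs whose host belongs to a site
    in the slashtag list, preserving the engine's original rank order."""
    kept = []
    for url in results:
        host = urlparse(url).hostname or ""
        if any(host_matches(host, site) for site in slashtag_sites):
            kept.append(url)
    return kept


# A user slashtag like the /ugc list described below (sites assumed for illustration).
ugc = {"yelp.com", "chowhound.com", "tripadvisor.com"}
results = [
    "https://www.yelp.com/biz/some-restaurant",
    "https://spam.example/some-restaurant",
    "https://chowhound.com/post/1",
    "https://notyelp.com/x",
]
print(apply_slashtag(results, ugc))
```

Note what the sketch does and does not do: it shrinks the universe, but it leaves ranking entirely to whatever produced `results`. If the underlying relevance is weak, the filtered list is simply a shorter weak list.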

It’s important to note that Blekko needs to maintain its own full web index to offer this to users and partners as original technology; after all, when you enter a site to add to the list, it has to have been spidered and already included in the master list of sites. One of the key guiding principles Skrenta mentioned to me early on was that so few search startups are entering Google’s arena of “whole web search” that the field is starved of healthy competition. He’s right. To keep costs down, Blekko’s index is of course much smaller than Google’s. That makes it less comprehensive, but it also makes the project feasible and further limits the incentives for spammers.

So many other search startups have been content to set their sights lower, tinkering with mere “features.” Yet, somewhat disturbingly, in a recent TechCrunch interview Skrenta emphasized slashtags as a “feature.” It’s still unclear, then, whether Blekko is a big idea or a small one.

A few points are worth mentioning as far as this core functionality goes.

1. There is some question as to whom Blekko is competing with and where its positioning lies. Clearly, the shortcomings it has identified in mega web indexes are junk, spam, and an inadequate commitment to personalization.

2. To address that, some return to the curated web must be contemplated. Fixed directories had their day and were susceptible to claims of corruption. Moreover, as Steve Thomas of a startup called Moreover once noted, directories and other curated sources suffer from the “fixed taxonomy problem”. What if your own list or own approach would be different? Shouldn’t you be able to follow an “editor” who “slashes” the web in a way that you approve of? Shouldn’t you be able to contribute your own value as an editor to the community, without being stuck on categories you don’t actually love because someone else got there first?

3. The large number of pre-set slashtags is interesting. It suggests yet another attempt could be made to intelligently curate the web, using people who are good at putting together the lists. Someone would need to admit that it was opinionated categorization, of course. There’s something disingenuous about Google News’s helpful reminder that the whole thing is “generated by a computer program” and involves no editorial judgment. Presumably Blekko suggests we might (or must inevitably) go in a different direction. Admitting to curation might be a start!

So Blekko is a way to customize the web and shrink the universe. It’s a cousin of the first wave of “peer-to-peer” search (OpenCola) and of shared bookmarking services (Backflip, HotLinks).

Here’s the rub at this stage. When I try to use certain slashtags in an intuitive way to find content, it doesn’t work very well.

I built my own slashtag called /ugc to encompass a number of user review sites I like, such as Yelp and Chowhound (and Tripadvisor and…). When I tried to find certain restaurants, no matter what I typed, I still got a mess of irrelevant results. And this is arguably not an advanced search problem, but one of the simpler ones you’ll throw at an engine. I’d be far better off going directly to Yelp, or to its mobile apps. Then I discovered the slashtag called /reviews. It didn’t work any better. At the end of the day, I know how Yelp, Tripadvisor, and HomeStars work when I visit them directly. Either I don’t know how Blekko works, or it currently just doesn’t do the job. In many important ways, it offers a poorer experience than the individual sources, which provide rich navigation, community, and so on. It’s a search engine, of course, so it helps you find stuff, which, in this day and age, isn’t all that impressive to the average person… especially if it does a poor job of it.

Similar problems came up when I tried intuitive-seeming searches using the /vegan slashtag. With a very specific intent, that tag might work. But 90% of the time, it’s safe to say you’ll be just as frustrated as you would be using Google.

One generous reviewer noted that “Slashtags aren’t perfect.” In my opinion, they don’t really work as billed, and might never work intuitively. As a user, when I see the slashtag /liberal or /local or /review, I think of an attribute, not a simple notation that the web universe will be shrunk to a curated list of websites, however small or large. The semantic web is in shambles; there is no question about that. So it is indeed pie-in-the-sky to expect key attributes of pages (such as “this is a consumer review”) to be widely and universally available to search engines and users. But I get overexcited when I see a slashtag like /funny or /green. I expect it to work magically. Of course, it doesn’t.

Granted, the project is still in beta, and even the individual site searches on the underlying sites are often weak and cluttered until you become a power user. At HomeStars, we quibbled for a couple of years over how best to handle the fact that 40% or more of queries are for a company name, while generic words in company names often overlap with review content, producing the “wrong” rank order for that user’s intent. Those problems are solvable to an extent, so with more curation, more tweaking, and more feedback on the beta, Blekko may be able to solve many of them.

I definitely empathize with the challenge, having been involved with a site that often struggles just to make the search and navigational experience work for a narrow subset of users. Novices come and go who think they can write code that will “fix” search and make it “work.” It does work, for 0.5% of users. Then it breaks for the other 99.5%. Everyone has opinions about rank order, and about pretty much everything else to do with search. You don’t “solve” it with a couple of pointers; it’s incredibly complex and subjective.

Ruminating on all of that, it’s clear that Blekko is an enormous idea, not a small idea or a simple feature. As a result, it opens up enormous cans of worms — as it should.

Other (Googly) ways of assessing relevance and quality are going to eventually need to be built into an engine like Blekko, regardless. Google has a light-year’s worth of head start in gathering data about user behavior, click trails and paths, and other signals that help propel one page or site higher than another. Even if you cut out 95% of Google’s index, or 99%, Google’s proprietary data and ranking technologies would be immensely helpful to ranking. Heck, you can use Google site search for your own site and tap into the same technology.

Meanwhile, Google is moving forward to aggregate content and serve up types of content with certain attributes, drawing from qualifying lists of underlying data providers. One such area is user review aggregation. They’re making a botch of it now, and are heavy-handedly trying to funnel users into their own review app. But they’re iterating fast.

In other words, Google is going hard after solving this type of problem in areas where it really matters. Blekko is experimenting with how the problem itself is posed, and where to take it next. And that is bound to be strikingly different in tone and execution from Google’s method.

Rich Skrenta is right. The field of large scale search engines is in desperate need of innovation and genuine competition. Blekko can act as an incubator for better ideas about search, but perhaps just as importantly, different ideas about search.

