When Aggregating, Get a Human

April 22, 2008

I realized some time ago that it would be a long fight to have my site show up as the first result in a search for David Brooks. And that’s fine. I realize that I’m a web designer and not of any national standing as far as the general public, or anyone else, is concerned. I also realize that I have a fairly common first and last name. All of that adds up to a recipe for anonymity, which is good if you think about the amount of identity theft going around these days.

Every once in a while my friends will give me a hard time about a news article I wrote for the New York Times, a newspaper that I have nothing to do with and for which I have never written an article. (I wish…) But I’ve never really had to worry about actually being mistaken for anyone else, and nobody has ever really mistaken me for another David Brooks either.

…well, there was that one call in high school where someone I didn’t know called to tell me to stay away from his girlfriend, “or else.” The call, I figured out, was intended for another David Brooks in another school district, who was probably beaten up and never knew why.

But anyway, yesterday I found a link in my referral logs to a site that supposedly holds online resumes for “important people” on the web. Apparently this site has decided that one of the many other people who share my name actually owns LuzCannon.com, and that fact had been credited to his resume at that site. This was news to me. I guess with all the hostile takeovers these days I missed the memo to my shareholders. …At least I didn’t lose my publisher access; maybe that’s next. I had better watch my back.

But all joking aside, the problem is this: I was going to be really irritated at this other David Brooks for claiming my site as his own work… But then I noticed a microscopic chunk of text that reads something like this: “This information has not been verified for accuracy, it has been collected by our automatic aggregators. We do this to give you the most up to date information.”

Apparently the aggregating bot can’t tell me apart from the next David Brooks. It also disregards the middle initial, which is the very thing he uses to set himself apart and the very thing that I leave off. The truly laughable thing about it, however, is that the most obvious person, the one with the strongest pull on Google, isn’t the one who absorbed my information; it was one of the others with little web presence.

So what am I saying? If you’re going to create a site that tries to identify people on the web, you’re going to need a real person to assist you. Bots just don’t understand that sort of thing yet, and I would imagine that over half the information on that site suffers from similar issues.

From a user perspective, I can’t trust the validity of the data. The fact that it was misleading is one thing, and somewhat forgivable since it’s bot generated, I’d say. But not really explaining what aggregation is and then claiming to be “up to date” is ludicrous. If you’re going to auto-aggregate the information and base your site on the results, you should probably have a way to check your information, or at least allow people to modify the content appropriately like Wikipedia does.
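
A minimal sketch of that kind of check, in Python. Everything here is hypothetical: the `Profile` record, the `assign_site` function, and the middle initial "Q" are made up for illustration, and none of it reflects how the site in question actually works. The idea is only that a site gets attached to a profile automatically when exactly one person has that name and the middle initial agrees; anything else is queued for a human.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Profile:
    # Hypothetical record shape; the real site's schema is unknown.
    first: str
    last: str
    middle_initial: str = ""


def name_key(first, last):
    """Normalize a first/last name pair for comparison."""
    return (first.strip().lower(), last.strip().lower())


def assign_site(first, last, middle_initial, profiles):
    """Decide which profile, if any, a site owner's name may be attached to.

    Returns (profile, status). The profile is only returned when exactly
    one profile shares the first and last name AND its middle initial
    agrees; every other case with a name match is left for a person.
    """
    wanted = name_key(first, last)
    candidates = [p for p in profiles if name_key(p.first, p.last) == wanted]
    if not candidates:
        return None, "no_match"

    initial = middle_initial.strip().lower()
    exact = [p for p in candidates if p.middle_initial.strip().lower() == initial]
    if len(candidates) == 1 and len(exact) == 1:
        return exact[0], "auto"

    # Several people share the name, or the initial disagrees: do not guess.
    return None, "needs_human_review"


# Two made-up people with the same first and last name.
profiles = [
    Profile("David", "Brooks", "Q"),
    Profile("David", "Brooks"),
    Profile("Jane", "Doe"),
]

print(assign_site("David", "Brooks", "", profiles))
print(assign_site("Jane", "Doe", "", profiles))
```

With a shared name the function declines to pick anyone, which is exactly the case where the aggregator guessed wrong.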

Furthermore, this particular site requires me to log in to change the information on the page… But to do that I would need to actually be “the rightful owner” of the profile in question. Sending an email to the site was a maze of “you should login instead of sending this email!” and “do you want to advertise with us instead?”

It seems like the best thing to do is just stay at a distance from the website in question. I’m not going to link to it or anything like that, unless someone changes my mind. The underlying point of this article is that we need data checking on the web; otherwise the web is never going to be taken as a credible source. Period. We already have trouble as it is, and automatically aggregating and misappropriating content from the entire web doesn’t help. C’est tout.

Edit: In case you wondered if this was a case of site mix-up within the resume, probably not. His correct site was listed in the right column; mine was referenced in the event that anyone would want to contact him… Three times.

© 2005 - 2012 David Brooks, all rights reserved.