Detecting similar news

I was recently faced with an interview question that sounded like a nice puzzle to play with. If my memory's right, the question was something of »