Taking TrackBack Back (from Spam)
Wallach, Dan S.
The TrackBack protocol, conceived as a way to automatically link together web sites which reference one another, has become a new vector for spammers wishing to divert web surfers to their sites. A site which supports TrackBack allows any entity to inject arbitrary HTML code, plus the URL of the sender, into its pages; an attacker need only follow the TrackBack protocol to exploit the system and leverage such a site in a link farm. Current approaches to combating TrackBack spam are limited to content-based filters (of the sort currently used against email and weblog comment spam). In this paper, we propose a way to identify TrackBack spam by considering the relationship between the sender's URL and the site under attack. In particular, we observe that, for spam TrackBacks, the page at the given URL does not link to the page to which the TrackBack was sent. We have developed software for weblog authors that rejects TrackBacks from sources lacking this reciprocal link. Data collected from our users demonstrates that this test is 100% accurate at identifying and separating spam from legitimate TrackBacks.
Technical Report Number