January 27, 2003

Using GZIP as a Spam Filter

Really it's using the Lempel-Ziv compression algorithm that gzip (and Zip) are built on.

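The compressor finds repeated strings and replaces them with short references to earlier copies, so the size of the compressed output tells you roughly how much of the text is repeated.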

This article talks about pulling down messages from spamassassin.com, running them through gzip, and doing a byte count (wc -c) on the gzipped file.
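
For the curious, here is a rough Python sketch of that measurement: compress a message and count the bytes, which should land close to what piping it through gzip and wc -c reports. The function name gzipped_size and the sample strings are made up for illustration and are not from the article.

```python
import gzip


def gzipped_size(data: bytes) -> int:
    """Return the number of bytes in the gzip-compressed form of data."""
    return len(gzip.compress(data))


if __name__ == "__main__":
    # Hypothetical sample inputs: one very repetitive, one with no repeated bytes.
    repetitive = b"buy now! " * 200
    varied = bytes(range(256))

    for name, sample in (("repetitive", repetitive), ("varied", varied)):
        print(name, len(sample), "bytes raw,", gzipped_size(sample), "bytes gzipped")
```

The repetitive sample shrinks to a small fraction of its raw size while the varied one barely shrinks at all, which is the kind of signal the byte count is picking up.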

Interesting.

Posted by Steve at January 27, 2003 08:41 AM