A Spam Filter Based on Rough Sets Theory
With the spread of the Internet and the widespread use of e-mail, the volume of spam keeps growing, which is a persistent nuisance for e-mail users. If e-mail servers could be integrated with data mining and artificial intelligence techniques so that they learn spam rules and filter out spam automatically, everyone troubled by spam could enjoy a clean e-mail environment.
In this research, we propose an architecture called union defense to counter the spread of spam. The architecture requires a rule-based data mining and artificial intelligence algorithm, and rough sets theory is a good choice. Rough sets theory was proposed by Pawlak, a Polish logician. It is a rule-based data mining and artificial intelligence technique that is well suited to discovering latent knowledge in inexact and incomplete data.
This research developed a spam filter based on rough sets theory. It searches for the characteristic rules of spam mails and uses these rules to filter them out. The system can be attached to most existing e-mail servers. In addition, it supports Chinese, Japanese, and Korean character sets, overcoming the limitation that most spam filters can handle only English mail. In future work, a rule exchange approach between e-mail servers can be developed to realize union defense.
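The abstract does not describe the implementation, so the following is only a minimal sketch of the general rough sets idea, not the system built in this research. It assumes each mail has already been reduced to a tuple of discrete attributes; the attribute names (has_link, mentions_money, known_sender), the toy data, and all function names are hypothetical. Mails with identical attributes form an indiscernibility class; a class containing only spam belongs to the lower approximation and yields a certain rule, while a mixed class belongs only to the upper approximation.

```python
from collections import defaultdict


def approximations(rows, labels, target="spam"):
    """Return (lower, upper) approximations of the target class as index sets.

    rows   -- list of tuples of condition-attribute values, one per mail
    labels -- list of decision values ("spam" or "ham"), same length as rows
    """
    # Mails with identical condition attributes are indiscernible,
    # so group their indices into one class.
    classes = defaultdict(list)
    for i, row in enumerate(rows):
        classes[row].append(i)

    lower, upper = set(), set()
    for members in classes.values():
        hits = [i for i in members if labels[i] == target]
        if hits:
            upper.update(members)      # class is possibly spam
            if len(hits) == len(members):
                lower.update(members)  # class is certainly spam
    return lower, upper


def certain_rules(rows, labels, target="spam"):
    """Attribute patterns that imply the target class without exception."""
    lower, _ = approximations(rows, labels, target)
    return {rows[i] for i in lower}


def is_spam(row, rules):
    """Flag a mail only if it matches a certain rule."""
    return row in rules


# Hypothetical attributes: (has_link, mentions_money, known_sender)
rows = [(1, 1, 0), (1, 1, 0), (1, 0, 0), (1, 0, 0), (0, 0, 1)]
labels = ["spam", "spam", "spam", "ham", "ham"]

rules = certain_rules(rows, labels)
print(rules)                      # patterns seen only in spam
print(is_spam((1, 1, 0), rules))  # matches a certain rule
print(is_spam((1, 0, 0), rules))  # ambiguous pattern, not flagged
```

In this toy data the pattern (1, 0, 0) appears in both a spam and a non-spam mail, so it falls in the boundary region and produces no certain rule. A practical filter would also need attribute reduction and a policy for boundary cases, which this sketch leaves out.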
Advisor: Bing-chiang Jeng; Sheng-tzong Cheng; Nian-shing Chen; Chia-mei Chen
School: National Sun Yat-Sen University
School Location: China - Taiwan
Source Type: Master's Thesis
Keywords: rough sets theory; spam; data mining and artificial intelligence
Date of Publication: 07/26/2005