CA2423654A1 - Method and apparatus for identification and classification of correspondents sending electronic messages - Google Patents
- Publication number
- CA2423654A1
- Authority
- CA
- Canada
- Prior art keywords
- message
- correspondent
- classifying
- recipient
- recognized
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/12—Applying verification of the received information
- H04L63/126—Applying verification of the received information the source of the received data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L51/00—User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
- H04L51/21—Monitoring or handling of messages
- H04L51/212—Monitoring or handling of messages using filtering or selective blocking
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04W—WIRELESS COMMUNICATION NETWORKS
- H04W99/00—Subject matter not provided for in other groups of this subclass
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer Security & Cryptography (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer And Data Communications (AREA)
Abstract
The present invention allows recipients to refine their own definition of correspondents without affecting the definition of correspondents for other users of the system. The invention thus ensures the authenticity of messages by properly classifying the correspondent and subsequently ensuring that the content from an arbitrary correspondent has not been forged. Positive recognizers are used to determine that messages are genuine for a given correspondent or category of correspondent. Feedback from the recipient is used to adapt positive recognizers with regard to new recipients or new categories thereof. For wireless communications systems, a threat indicator and a message sorter are added to block mass emailing to the wireless provider's customers.
Description
METHOD AND APPARATUS FOR IDENTIFICATION AND CLASSIFICATION OF CORRESPONDENTS SENDING ELECTRONIC MESSAGES
Field of the Invention
The present invention relates to a method and apparatus for identification and classification of the senders of electronic messages, and is particularly concerned with pre-delivery screening.
Background of the Invention
Traditional methods for the identification of machine correspondents introduce a significant weakness in the implementation of many challenge-based messaging systems. In order to satisfy the requirements of subscribers wishing to receive certain machine-originated messages, messaging providers have traditionally allowed messages claiming certain origins to effectively circumvent the challenge system.
However, as challenge-based systems become more prevalent, malicious organizations will begin to recognize this implementation weakness: once they acquire a (potentially legitimate) account on a particular system, they can probe it (or merely read its documentation) to discover which source addresses are considered to be from "legitimate" machine-originated services. The weakness here is that traditional challenge-based systems have no method for ensuring that machine-originated messages are not in fact forgeries; they merely examine the SMTP envelope and accept the address and message at face value. It is then a simple matter for a malicious person to forge messages claiming to be from one machine with another, varying the payload from, for example, weather reports to an illegal chain letter attempting to defraud unwary subscribers.
Other challenge-based systems assume they exist in isolation, or at least that no other vendors of challenge-based systems exist. This means that a human protected by a non-interoperable system could not communicate with a human on another challenge-based system, as challenge messages are, by their very definition, machine-originated.
Systems such as those offered by BrightMail and Postini use signatures and/or heuristics to recognize invalid traffic. These systems assume that message traffic is valid until declared invalid, and as such are vulnerable to mass-mailing attacks when a resourceful yet malicious person causes a machine to forge large quantities of e-mail.
This malicious person need only acquire an account protected by one of these systems, probe its detection algorithms, and begin injecting mail at a furious rate; he will be successful until the centralized authority updates its protection criteria (signatures, heuristics) at the leaf nodes. Depending on the resources available to the malicious person, this could mean millions of compromised mailboxes.
To be successful in this effort, there must be adequate knowledge of the nature of messaging today. Messages take many forms, including email, SMS and instant messaging, but each of these has some common characteristics. First, they are all originated either directly by a human being or by a machine operating an automated program; second, they are all intended to be received by a human reader.
There is, of course, a vast array of both the intent and content of messages, and this invention does not seek to classify them. Rather, this invention identifies and authenticates the correspondents.
In order to be of value to an arbitrary user of messaging systems, any classification system should be capable of breaking down correspondents as recognized or unrecognized, human or machine, and desirable or undesirable.
Summary of the Invention
An object of the present invention is to provide an improved method and apparatus for identification and classification of the senders of electronic messages.
Accordingly, the invention allows recipients to refine their own definition of correspondents without affecting the definition of correspondents for other users of the system. The invention thus ensures the authenticity of messages by properly classifying the correspondent and subsequently ensuring that the content from an arbitrary correspondent has not been forged.
By virtue of the technology employed in the challenge-response system, it becomes very difficult to develop software that can "play human" and authenticate itself on an automated basis. It is furthermore very time-consuming to become recognized as a valid correspondent for sending messages to a large number of users, as the system isolates delivery permission on a recipient-by-recipient basis.
In accordance with an aspect of the present invention there is provided a method of identifying and classifying correspondents sending electronic messages, said method comprising the steps of: classifying a correspondent as one of a recognized machine, a recognized human or unrecognized; and, if classified as unrecognized, sending a challenge request for a response from the correspondent in an attempt to classify it as human.
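The three-way classification and challenge step can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: it assumes each recipient keeps sets of recognized machine and human addresses, and `send_challenge` is a hypothetical callback standing in for whatever challenge mechanism a system actually uses.

```python
from enum import Enum
from typing import Callable, Set


class Origin(Enum):
    RECOGNIZED_MACHINE = "recognized machine"
    RECOGNIZED_HUMAN = "recognized human"
    UNRECOGNIZED = "unrecognized"


def classify_correspondent(
    sender: str,
    known_machines: Set[str],
    known_humans: Set[str],
    send_challenge: Callable[[str], None],
) -> Origin:
    """Classify a sender; issue a challenge if it is not recognized (illustrative)."""
    if sender in known_machines:
        return Origin.RECOGNIZED_MACHINE
    if sender in known_humans:
        return Origin.RECOGNIZED_HUMAN
    # Unrecognized: ask the correspondent to respond, so that a correct
    # answer can later reclassify it as human.
    send_challenge(sender)
    return Origin.UNRECOGNIZED
```

Only unrecognized senders trigger the callback, so recognized machines and humans never see a challenge.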
The interoperability provisions in this method provide clear benefits to the messaging community at large; without them, most users would not be able to communicate with one another once the use of challenge-based systems reaches a critical mass. As such, non-interoperability would eventually degrade the messaging system as a whole.
In accordance with another aspect of the present invention there is provided a method of identifying and classifying correspondents sending electronic messages, said method comprising the step of: determining that the contents of a message are authentic to a category of machine correspondents.
Determining the category of a machine correspondent is the first stage in ensuring that messages from an already identified machine correspondent are not forged; however, on its own it also provides tangible benefits to subscribers.
Automated classification gives subscribers the ability to manage their message traffic more easily, for example by filtering messages to different folders based on recognition criteria, or by selecting which traffic to send to a wireless device or other messaging destination based on origin type.
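As an illustration of routing on recognition criteria, the following Python sketch picks a folder and decides whether to forward a message to a wireless device. The folder names, the `PUSH_TO_WIRELESS` set and the rule that human-originated mail is always forwarded are hypothetical subscriber preferences, not details from the patent.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Classification:
    origin: str                          # "human" or "machine"
    category: Optional[str] = None       # e.g. "mailing list", "info service"
    sub_category: Optional[str] = None   # e.g. "weather", "stocks"


# Hypothetical subscriber preferences (not defined by the patent).
FOLDER_BY_CATEGORY = {
    "mailing list": "Lists",
    "info service": "Info",
    "service message": "Service",
}
PUSH_TO_WIRELESS = {"weather", "stocks"}


def route(c: Classification) -> Tuple[str, bool]:
    """Return (folder, forward_to_wireless_device) for a classified message."""
    if c.origin == "human":
        return "Inbox", True   # assumption: person-to-person mail is always pushed
    folder = FOLDER_BY_CATEGORY.get(c.category or "", "Machine")
    return folder, c.sub_category in PUSH_TO_WIRELESS
```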
In accordance with a further aspect of the present invention there is provided a method of identifying and classifying correspondents sending electronic messages, said method comprising the steps of: requesting confirmation from a potential recipient that receiving a message from a recognized but previously unseen correspondent is desired and that an initial classification of the correspondent is correct; upon confirmation without reclassification, delivering the message to the recipient; and upon confirmation with reclassification, changing the classification of the correspondent and delivering the message to the recipient.
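A sketch of the confirmation step in Python, assuming the recipient's reply arrives as a yes/no flag plus an optional corrected classification. The `Pending` record, the string form of a classification and the choice to simply drop a declined message are assumptions; the text does not say what happens when the recipient declines.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Pending:
    sender: str
    proposed: str   # the system's initial classification, e.g. "machine/info service"
    body: str


def apply_confirmation(pending: Pending,
                       wanted: bool,
                       corrected: Optional[str],
                       classifications: Dict[str, str],
                       mailbox: List[str]) -> None:
    """Apply a recipient's reply to a confirmation request."""
    if not wanted:
        return  # assumption: a declined message is simply not delivered
    # With reclassification the recipient's answer replaces the system's guess;
    # without it, the initial classification is kept.
    classifications[pending.sender] = corrected if corrected else pending.proposed
    mailbox.append(pending.body)
```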
This method benefits both subscribers and providers of messaging systems. By soliciting input from subscribers, the system offloads maintenance and teaching requirements from the provider of the messaging system; this has a direct impact on staffing levels for mid- to large-sized organizations, yielding a very tangible reduction in overhead expenditure.
The benefit to the subscriber is less tangible; however, it is of paramount importance. Message traffic tends to ebb and flow like tides, and sometimes like tidal waves. Allowing subscribers to "teach" the system right from wrong when it has made a mistake helps the system better react to future change; more importantly, the ever-alert subscriber base will be able to curb the onslaught of an attack which might otherwise have succeeded in delivering unwanted messages to many more destinations before a system administrator could correct, or even detect, the problem.
The nature of message traffic also evolves over time, and no one knows what is a forgery and what is not better than the subscriber receiving a message; certainly, no organization could hope to establish criteria at launch that will stand the test of time. As such, subscribers interacting with the system benefit from its increased effectiveness against both false positives and false negatives, as do other subscribers who do not participate but benefit from the actions of those who do.
In accordance with a further aspect of the present invention there is provided a method of identifying and classifying correspondents sending electronic messages, said method comprising the steps of: classifying a sender of a message as one of recognized or unrecognized; and, if recognized, determining that the contents of the message are authentic to the sender.
In accordance with a further aspect of the present invention there is provided a method of identifying and classifying correspondents sending electronic messages, said method comprising the steps of: classifying a sender of a message as one of recognized or unrecognized; classifying a correspondent as one of a machine and a person; if classified as unrecognized, sending a request for a response from the correspondent in an attempt to classify it as human; if recognized, determining one of: that the contents of the message are authentic to the sender, and that the contents of the message are authentic to a category of correspondents; if unable to determine complete acceptability of the correspondent, requesting confirmation from a potential recipient that receiving a message from a specific previously unseen correspondent is desired and that an initial classification of the correspondent is correct; upon confirmation without reclassification, delivering the message to the recipient; and upon confirmation with reclassification, changing the classification of the correspondent and delivering the message to the recipient.
The most significant benefits from the system occur in a combined method or apparatus. An important concept applied in the methods and apparatuses of the present invention is that message traffic is analysed on the basis of ensuring that it is authentic and from whom it claims to be, rather than with the common approach of trying to determine whether it is undesirable on a generic, and often vague, basis.
In accordance with another aspect of the invention there is provided a method of identifying and classifying correspondents sending electronic messages, said method comprising the steps of: classifying a receiver of a message as one of subscribed and unsubscribed; if unsubscribed, preparing a signature from the contents of the message; comparing the signature to determine whether a similar message has previously been received; if so, determining whether a predetermined threshold has been reached for the similar message; and blocking any further similar messages.
In accordance with another aspect of the invention there is provided apparatus for identifying and classifying correspondents sending electronic messages, comprising: means for classifying a receiver of a message as one of subscribed and unsubscribed; means for preparing a signature from the contents of the message if unsubscribed; means for comparing the signature to determine whether a similar message has previously been received; and means for determining whether a predetermined threshold has been reached for the similar message and blocking any further similar messages.
This method and apparatus allow more stringent measures for dealing with potential SPAM to be reserved for those instances where the carrier is definitely under a "carpet bomb" SPAM attack, while the other anti-SPAM measures described hereinabove are available to handle lower volumes of normal SPAM.
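A rough Python sketch of the signature-and-threshold check. It assumes a signature is a SHA-256 hash of whitespace- and case-normalised body text, which only matches near-verbatim copies; the patent does not specify how signatures are prepared or how "similar" is measured, and `CarpetBombFilter` is a hypothetical name.

```python
import hashlib
import re
from collections import Counter
from typing import Set


class CarpetBombFilter:
    """Per-signature counter for messages sent to unsubscribed receivers (sketch).

    "Similar" is approximated by hashing case- and whitespace-normalised text,
    so only near-verbatim copies match; a fuzzy hash would catch more variants.
    """

    def __init__(self, subscribed: Set[str], threshold: int) -> None:
        self.subscribed = subscribed     # receivers classified as subscribed
        self.threshold = threshold       # the predetermined threshold
        self.seen: Counter = Counter()   # signature -> copies accepted so far

    @staticmethod
    def signature(body: str) -> str:
        normalised = re.sub(r"\s+", " ", body).strip().lower()
        return hashlib.sha256(normalised.encode("utf-8")).hexdigest()

    def accept(self, receiver: str, body: str) -> bool:
        """Return False once a similar message has reached the threshold."""
        if receiver in self.subscribed:
            return True  # only unsubscribed receivers are checked
        sig = self.signature(body)
        if self.seen[sig] >= self.threshold:
            return False  # block any further similar messages
        self.seen[sig] += 1
        return True
```

With `threshold=2`, the first two copies reaching unsubscribed receivers are accepted and every later copy is blocked, while subscribed receivers are never checked.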
Conveniently, the invention provides the following qualifiers to an arbitrary correspondent address:
- origin
- category, sub-category
- method
- subscription_level

1. Address can be either an email address, a long or short code address (SMS), or an instant message identifier.
2. Origin is either human or machine.
3. Category is usually for machines and refers to one of mailing list, info service, 3G service message, etc.
4. Sub-category is usually for the info service category and refers to weather, news, sports, stocks, etc.
5. Method describes how and where this correspondent was added to the subscription list.
6. Subscription level is either tentative, indicating it was added by the system, or subscribed, indicating it was added or endorsed by the recipient themselves.

Brief Description of the Drawings
The present invention will be further understood from the following detailed description with reference to the drawings in which:
Fig. 1 illustrates in a functional block diagram a message system in accordance with an embodiment of the present invention;
Fig. 2 illustrates in a flow chart further detail of the apparatus and method of Fig. 1;
Fig. 3 illustrates the components of segment 300 of Figs. 1 and 2;
Fig. 4 illustrates the components of segment 600 of Figs. 1 and 2;
Fig. 5 illustrates a second embodiment of the present invention for application to messaging in a wireless communications environment;
Fig. 6 illustrates a third embodiment of the present invention for application to messaging in a wireless communications environment.
Detailed Description of the Preferred Embodiment
Referring to Fig. 1, there is illustrated in a functional block diagram a message system in accordance with an embodiment of the present invention. The message system includes an initial screening segment 100, a correspondent classification segment 200, a recipient endorsement segment 300, a category specific positive recognizer segment 400, a recipient level positive recognizer 500, an auto-response segment 600, a correspondent specific recipient level positive recognizer 800, a delivery box 900 and a rejection box 888. Each major segment can deliver the message, pass it to the next segment or reject it.
The initial screening segment 100 is coupled to the delivery box 900, to the next segment (correspondent classification segment 200) and to the rejection box 888.
The purpose of correspondent classification segment 200 is to classify valid machine-originated correspondents and ensure their messages are not forged.
The correspondent classification segment 200 passes the message to the recipient endorsement segment 300, the recipient level positive recognizer 500 or the rejection box 888. Segment 300 provides an opportunity for the intended recipient to either endorse or correct the classification of segment 200, allowing the system to adapt its future classifications. The category specific positive recognizer segment 400 passes the message to the delivery box 900 or to the auto-response segment 600, which either rejects it (to rejection box 888) or passes it to the recipient level positive recognizer 500. The recipient level positive recognizer 500 either rejects it or passes it to the recipient endorsement segment 300. The correspondent specific recipient level positive recognizer 800 either rejects it or passes it to the delivery box 900.
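The hand-offs just described can be modelled as a small state machine. In this Python sketch the segment numbers follow Fig. 1, each segment is a caller-supplied function returning the next segment or box, and the driver refuses any hand-off the text does not describe. The exits of segment 300 are not stated above, so allowing it to deliver or reject is an assumption.

```python
from typing import Callable, Dict, List, Set

DELIVER, REJECT = 900, 888

# Hand-offs described for Fig. 1. Segment 300's exits are not stated in
# the text; delivering or rejecting is an assumption made for this sketch.
ALLOWED: Dict[int, Set[int]] = {
    100: {DELIVER, 200, REJECT},
    200: {300, 500, REJECT},
    300: {DELIVER, REJECT},   # assumption
    400: {DELIVER, 600},
    500: {REJECT, 300},
    600: {REJECT, 500},
    800: {REJECT, DELIVER},
}

Segment = Callable[[str], int]  # takes a message, returns the next segment or box


def run(message: str, segments: Dict[int, Segment], start: int = 100) -> List[int]:
    """Follow segment decisions until the delivery or rejection box; return the path."""
    path = [start]
    current = start
    while current not in (DELIVER, REJECT):
        nxt = segments[current](message)
        if nxt not in ALLOWED[current]:
            raise ValueError(f"segment {current} cannot hand off to {nxt}")
        path.append(nxt)
        current = nxt
    return path
```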
The following definitions will assist in the understanding of embodiments of the present invention.
1- Global System Black List (GSBL). The global system black list is a collection of specific correspondents that will not be accepted at the system level for delivery to recipients. A message received purporting to be from a correspondent on this list will be routed to the bad message corpus for later analysis. Only the System Administrator would normally administer this list.
2- Global Bad Corpus (GBC). The global bad corpus is a collection of messages that have been rejected for delivery by the system, either because the correspondent is on the GSBL or because the correspondent has been declared an impostor. They are useful for analysis and for training Bayesian filters.
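A minimal Python sketch of how the GSBL and the bad corpus might interact during initial screening. The `InitialScreen` class, the tuple layout of corpus records and the assumption that GSBL entries are stored in lower case are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class InitialScreen:
    gsbl: Set[str]   # Global System Black List, lower-case addresses (assumption)
    bad_corpus: List[Tuple[str, str, str]] = field(default_factory=list)

    def screen(self, sender: str, body: str) -> bool:
        """Return True to continue processing; archive and return False if blacklisted."""
        if sender.lower() in self.gsbl:
            self.bad_corpus.append((sender, "gsbl", body))
            return False
        return True

    def reject_impostor(self, sender: str, body: str) -> None:
        """Messages from a correspondent declared an impostor are also archived."""
        self.bad_corpus.append((sender, "impostor", body))
```

Keeping the rejection reason in each record lets the corpus serve both later analysis and filter training, the two uses named in the definition.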
3- Positive Recognizer (PR). A positive recognizer is a text classifier, which may be a Bayesian filter, a memory based reasoner or a case based reasoner, that has been trained to recognize message content either from a specific correspondent or of a very specific type. For example, messages purporting to be from a stock information service will be sent through a PR that has been trained on a significant volume of message traffic from a variety of valid stock information services.
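One way to realise the Bayesian-filter flavour of a positive recognizer is a two-class naive Bayes model trained on valid traffic for the correspondent or type and on the global bad corpus. The Python sketch below uses add-one smoothing; the tokenizer, class names and the choice of the bad corpus as the negative class are assumptions, and memory based or case based reasoners would look quite different.

```python
import math
import re
from collections import Counter
from typing import Iterable, List


def tokens(text: str) -> List[str]:
    """Lower-case word/number tokens; '$' and '%' are kept as token characters."""
    return re.findall(r"[a-z0-9$%]+", text.lower())


class PositiveRecognizer:
    """Two-class multinomial naive Bayes with add-one smoothing (illustrative)."""

    def __init__(self, genuine: Iterable[str], bad: Iterable[str]) -> None:
        self.counts = {"genuine": Counter(), "bad": Counter()}
        self.docs = {"genuine": 0, "bad": 0}
        for label, corpus in (("genuine", genuine), ("bad", bad)):
            for msg in corpus:
                self.counts[label].update(tokens(msg))
                self.docs[label] += 1
        if not all(self.docs.values()):
            raise ValueError("both corpora must contain at least one message")
        self.vocab_size = len(set(self.counts["genuine"]) | set(self.counts["bad"]))

    def _log_score(self, label: str, words: List[str]) -> float:
        total = sum(self.counts[label].values())
        prior = math.log(self.docs[label] / sum(self.docs.values()))
        return prior + sum(
            math.log((self.counts[label][w] + 1) / (total + self.vocab_size))
            for w in words
        )

    def p_genuine(self, text: str) -> float:
        """Posterior probability that `text` is genuine traffic for this PR."""
        words = tokens(text)
        g = self._log_score("genuine", words)
        b = self._log_score("bad", words)
        m = max(g, b)  # subtract the max before exp() to avoid underflow
        return math.exp(g - m) / (math.exp(g - m) + math.exp(b - m))
```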
Correspondent specific PRs are created either by advance system training on a corpus of valid messages from that specific correspondent, or when there are X identical entries of type machine in the RSLs across the recipient base that were created by the process rather than by the recipient.
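The second creation trigger can be sketched as a count across recipients' RSLs. Here `x` is the patent's unspecified threshold X, `RslRecord` is a hypothetical reduced entry type, and each recipient's list contributes at most one vote per correspondent, which is one reading of "identical entries".

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class RslRecord:
    correspondent: str
    origin: str     # "human" or "machine"
    added_by: str   # "process" (the system) or "recipient"


def candidates_for_specific_pr(rsls: Dict[str, List[RslRecord]], x: int) -> Set[str]:
    """Machine addresses added by the process in at least `x` recipients' RSLs."""
    votes: Counter = Counter()
    for entries in rsls.values():
        # A recipient's list votes at most once per correspondent.
        votes.update({e.correspondent for e in entries
                      if e.origin == "machine" and e.added_by == "process"})
    return {addr for addr, n in votes.items() if n >= x}
```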
The Correspondent specific PR table consists of entries in the form:
- Correspondent
- Origin
- Category
- Sub-Category
- Location of classifier

Each entry is created based on a specific recognized Correspondent address.
Here Correspondent is the address of the correspondent, Origin is either human or machine, Category is one of mailing list, info service or service message, and Sub-Category is one of news, weather, sports, lottery, horoscope, stocks, etc.
The Category PR table consists of entries in the format:
~ Correspondent ~ Ongin ~.5 ~ Category ~ Sub-Category ~ Location of classifier In this case Correspondent is unrecognized. These are created to represent ?0 very specific categories oi' information services. These could he categories such as weather, stoclt;~, ring tones; et:c. inhere the type of information service can be provided through a nurtlber of addresses but il3e content is very specific and easily identifiable.
The value of these categs:~ry specific PRs is that the recipient can be provided sufficient i>Gtformaiian about the legitimate content of the message to decide whether 2:i or not to receive the mes:aagt: aver if the correspondent is not recogni2ed to the o~s9zs~zc.~t 1~
system. One-very specific form of Category Positive Recognizer recognizes the form and method~> associated with valid Mailing Lists. This Positive Recognizer is a combination of a trained laayesian filter coupled Nvith heuristic text scanner and an additional code block intc~n.ded 'to identify Mailing List characteristics.
The confidence level must be very high befare this type of .correspondent identification is trusted to be passed an to the recipient.
4- Recipient-Subscri t-p ion List. The RSL contains a mix of coxrespondents:
those confirmed si:rriply to.be human by the auto-responder, those correspoxident addresses added as the result of the user themselves sending a message to this correspondent or by endorsir~~; the tentative classification by the system as an acceptable machine originated m~asage and those added to the list directly by the user through the allow-hst admin s~,ween.
Entries in the Subseriptiou-List have the form:
~ Correspondent - the address of the correspandent ~ Origin - either human or machine ~ Categpry (only i:f machine) - one of mailing list, info service, service message ~ Sub-Category (ox>ly if machine) - news, weather, sports ~ Method - were they added by the user sending a message to someone, by the auto:>-respander confirming the correspondent to be human, by the user tl:,~e~xnselves manually, by the recipient in the process of endorsing n tentative classificaxian of a PR in a confirmation message (PR).
~ Level - eithex tenl:atiue or subscribed 5- Recipient=l~~n 1_ist RbL. 'the user populates the reeipieni-deny list by one of several specifi~,c actions. An Entry rnay be added to the list through the deny-list admin oss9~s~?C~~
1?
screed, as a response to the endorsement message presented for delivery by the system, by forwarding ate arbitrary message to their RDL.
The user may adca a. correspond.eni to the deny Iist for any reason and messages that may for all other reasons be acceptable is sent to the trash if the correspondent is in the RpL.
Fig. 1 illustrates in a functional block diagram a message system in accordance with an embodiment of the present invention;
Fig. 2 illustrates in a flow chart further detail of the apparatus and method of Fig. 1;
Fig. 3 illustrates the components of segment 300 of Figs. 1 and 2;
Fig. 4 illustrates the components of segment 600 of Figs. 1 and 2;
Fig. 5 illustrates a second embodiment of the present invention for application to messaging in a wireless communications environment;
Fig. 6 illustrates a third embodiment of the present invention for application to messaging in a wireless communications environment.
Detailed Description of the Preferred Embodiment
Referring to Fig. 1, there is illustrated in a functional block diagram a message system in accordance with an embodiment of the present invention. The message system includes an initial screening segment 100, a correspondent classification segment 200, a recipient endorsement segment 300, a category specific positive recognizer segment 400, a recipient level positive recognizer 500, an auto-response segment 600, a correspondent specific recipient level positive recognizer 800, a delivery box 900 and a rejection box 888. Each major segment can deliver the message, pass it to the next segment or reject it.
The initial screening segment 100 is coupled to the delivery box 900, to the next segment (correspondent classification segment 200) and to the rejection box 888.
The purpose of correspondent classification segment 200 is to classify valid machine-originated correspondents and ensure their messages are not forged.
The correspondent classification segment 200 passes the message to the recipient endorsement segment 300, the recipient level positive recognizer 500 or the rejection box 888. Segment 300 provides an opportunity for the intended recipient to either endorse or correct the classification of segment 200, allowing the system to adapt its future classifications. The category specific positive recognizer segment 400 passes the message to the delivery box 900 or to the auto-response segment 600, which either rejects it (to rejection box 888) or passes it to the recipient level positive recognizer 500. The recipient level positive recognizer 500 either rejects it or passes it to the recipient endorsement segment 300. The correspondent specific recipient level positive recognizer 800 either rejects it or passes it to the delivery box 900.
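The routing in the paragraph above can be read as a small directed graph. The following is a minimal Python sketch, assuming only the edges stated in that paragraph; the names `ROUTES` and `terminals_reachable` are hypothetical, and the outcomes listed for segment 300 are an assumption drawn from the statement that each major segment can deliver, pass on or reject. It checks that every segment can eventually end in delivery or rejection.

```python
# Hypothetical encoding of the Fig. 1 segment routing.
# 900 = delivery box, 888 = rejection box (terminal states).
DELIVERY, REJECTION = 900, 888
TERMINALS = {DELIVERY, REJECTION}

ROUTES: dict[int, set[int]] = {
    100: {DELIVERY, 200, REJECTION},  # initial screening
    200: {300, 500, REJECTION},       # correspondent classification
    300: {DELIVERY, REJECTION},       # recipient endorsement (assumed outcomes)
    400: {DELIVERY, 600},             # category specific positive recognizer
    500: {REJECTION, 300},            # recipient level positive recognizer
    600: {REJECTION, 500},            # auto-response
    800: {REJECTION, DELIVERY},       # correspondent specific recipient level PR
}


def terminals_reachable(start: int) -> set[int]:
    """Return the terminal boxes reachable from `start` (depth-first search)."""
    seen: set[int] = set()
    found: set[int] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in TERMINALS:
            found.add(node)
            continue
        stack.extend(ROUTES.get(node, ()))
    return found


# Every segment should have at least one way out of the system.
for segment in ROUTES:
    assert terminals_reachable(segment), f"segment {segment} has no exit"
```

Encoding the routes as data makes it easy to check that no segment can trap a message.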
The following definitions will assist in the understanding of embodiments of the present invention.
08897572CA
1- Global System Black List (GSBL). The global system black list is a collection of specific correspondents that will not be accepted at the system level for delivery to recipients. A message received purporting to be from a correspondent on this list will be routed to the bad message corpus for later analysis. Only the System Administrator would normally administer this list.
2- Global Bad Corpus (GBC). The global bad corpus is a collection of messages that have been rejected for delivery by the system, either because the correspondent is on the GSBL or because the correspondent has been declared an imposter. These messages are useful for analysis and for training Bayesian filters.
3- Positive Recognizer (PR). A positive recognizer is a text classifier and may be a Bayesian filter, a memory-based reasoner or a case-based reasoner that has been trained to recognize message content either from a specific correspondent or of a very specific type. For example, messages purporting to be from a stock information service will be sent through a PR that has been trained on a significant volume of message traffic from a variety of valid stock information services.
Correspondent specific PRs are created either by advance system training on a corpus of valid messages from that specific correspondent, or when there are X identical entries of type machine, created by a system process rather than by the recipient, in the RSLs across the recipient base.
The Correspondent specific PR table consists of entries in the form:
• Correspondent
• Origin
• Category
• Sub-Category
• Location of classifier
Each entry is created based on a specific recognized Correspondent address.
Here Correspondent is the address of the correspondent, Origin is either human or machine, Category is either mailing list, info service or service message, and Sub-Category is one of news, weather, sports, lottery, horoscope, stocks, etc.
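Both PR tables share the same columns; a Category PR entry simply has no recognized correspondent. A minimal Python sketch of one table row and a correspondent lookup follows, assuming an in-memory list, case-insensitive address matching and a string path for the classifier location (all hypothetical choices, not from the disclosure).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Origin(Enum):
    HUMAN = "human"
    MACHINE = "machine"


class Category(Enum):
    MAILING_LIST = "mailing list"
    INFO_SERVICE = "info service"
    SERVICE_MESSAGE = "service message"


@dataclass(frozen=True)
class PREntry:
    """One row of a positive recognizer (PR) table."""
    correspondent: Optional[str]  # None for a Category PR (correspondent unrecognized)
    origin: Origin
    category: Category
    sub_category: str             # e.g. "news", "weather", "stocks"
    classifier_location: str      # where the trained classifier is stored (hypothetical)


def find_correspondent_pr(table: list[PREntry], address: str) -> Optional[PREntry]:
    """Return the correspondent specific PR entry for `address`, if one exists."""
    key = address.strip().lower()  # assumption: addresses compared case-insensitively
    for entry in table:
        if entry.correspondent is not None and entry.correspondent.lower() == key:
            return entry
    return None
```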
The Category PR table consists of entries in the format:
• Correspondent
• Origin
• Category
• Sub-Category
• Location of classifier
In this case the Correspondent is unrecognized. These entries are created to represent very specific categories of information services, such as weather, stocks, ring tones, etc., where the type of information service can be provided through a number of addresses but the content is very specific and easily identifiable.
The value of these category specific PRs is that the recipient can be provided sufficient information about the legitimate content of the message to decide whether or not to receive the message, even if the correspondent is not recognized by the system. One very specific form of Category Positive Recognizer recognizes the form and methods associated with valid Mailing Lists. This Positive Recognizer is a combination of a trained Bayesian filter coupled with a heuristic text scanner and an additional code block intended to identify Mailing List characteristics.
The confidence level must be very high before this type of correspondent identification is trusted to be passed on to the recipient.
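The mailing-list recognizer combines a trained filter with a heuristic scan and demands very high confidence. The sketch below assumes the filter's output is already available as a probability, and uses standard list headers (`List-Id` from RFC 2919, `List-Unsubscribe` and `List-Post` from RFC 2369) plus `Precedence: list/bulk` as the heuristic; the patent does not say which characteristics are checked, and the 0.99 threshold is hypothetical.

```python
from email.message import EmailMessage

# Illustrative list characteristics (assumption: the patent does not name them).
LIST_HEADERS = ("List-Id", "List-Unsubscribe", "List-Post")
CONFIDENCE_THRESHOLD = 0.99  # hypothetical "very high" confidence level


def looks_like_mailing_list(msg: EmailMessage) -> bool:
    """Heuristic scan of the headers for mailing-list characteristics."""
    has_list_header = any(msg.get(h) is not None for h in LIST_HEADERS)
    precedence = (msg.get("Precedence") or "").strip().lower()
    return has_list_header or precedence in {"list", "bulk"}


def mailing_list_verdict(msg: EmailMessage, bayes_score: float) -> bool:
    """Trust the identification only if the heuristics agree with the filter
    and the filter's probability clears the high threshold."""
    return looks_like_mailing_list(msg) and bayes_score >= CONFIDENCE_THRESHOLD
```

Requiring both signals trades some recall for the high precision the text asks for.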
4- Recipient Subscription List (RSL). The RSL contains a mix of correspondents: those simply confirmed to be human by the auto-responder; those correspondent addresses added as the result of the user sending a message to this correspondent, or of the user endorsing the system's tentative classification of a message as an acceptable machine-originated message; and those added to the list directly by the user through the allow-list admin screen.
Entries in the Subscription List have the form:
• Correspondent - the address of the correspondent
• Origin - either human or machine
• Category (only if machine) - one of mailing list, info service, service message
• Sub-Category (only if machine) - news, weather, sports
• Method - whether the entry was added by the user sending a message to someone, by the auto-responder confirming the correspondent to be human, by the user manually, or by the recipient in the process of endorsing a tentative classification of a PR in a confirmation message (PR)
• Level - either tentative or subscribed
5- Recipient Deny List (RDL). The user populates the recipient deny list by one of several specific actions. An entry may be added to the list through the deny-list admin screen, as a response to the endorsement message presented for delivery by the system, or by forwarding an arbitrary message to their RDL.
The user may add a correspondent to the deny list for any reason, and messages that would otherwise be acceptable are sent to the trash if the correspondent is in the RDL.
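The RSL and RDL together decide a recipient-level disposition, with the RDL taking precedence. This minimal Python sketch assumes per-recipient dictionaries keyed by address and plain strings for origin, method and level; the class and method names are hypothetical. The `deny` method follows the later description of a Block response, which removes the correspondent from the RSL and places it in the RDL.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RSLEntry:
    correspondent: str
    origin: str                     # "human" or "machine"
    category: Optional[str] = None  # only if machine
    sub_category: Optional[str] = None
    method: str = "manual"          # how the entry was added
    level: str = "tentative"        # "tentative" or "subscribed"


@dataclass
class RecipientLists:
    """Per-recipient subscription (RSL) and deny (RDL) lists, keyed by address."""
    rsl: dict[str, RSLEntry] = field(default_factory=dict)
    rdl: set[str] = field(default_factory=set)

    def deny(self, address: str) -> None:
        # Blocking removes the correspondent from the RSL and adds it to the RDL.
        self.rsl.pop(address, None)
        self.rdl.add(address)

    def disposition(self, address: str) -> str:
        # The RDL overrides everything else: such messages go to the trash.
        if address in self.rdl:
            return "trash"
        entry = self.rsl.get(address)
        return "unknown" if entry is None else entry.level
```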
6- Aggregator. The aggregator is a daemon running in the background that examines all recipient subscription lists for identical entries. The aggregator looks at both the address and the category fields, and when the number of identical entries across the aggregate recipient subscription lists exceeds the system administrator defined threshold "x", a new personal positive recognizer is created for that correspondent.
This PR is only used at the individual recipient level, not at the system-wide level. The individual entries in the subscription lists are retained, but the content of messages from this correspondent is now examined to identify imposters.
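The aggregator's counting step can be sketched as follows. This assumes each RSL is represented as a list of (address, category) pairs and that an identical entry counts at most once per recipient; both are illustrative assumptions, and training the resulting personal PR is not shown.

```python
from collections import Counter


def aggregate(rsls: dict[str, list[tuple[str, str]]], x: int) -> set[tuple[str, str]]:
    """Return the (address, category) pairs found in more than `x` recipients' RSLs.

    `rsls` maps each recipient to that recipient's subscription list, represented
    here (hypothetically) as (correspondent address, category) pairs.
    """
    counts: Counter = Counter()
    for entries in rsls.values():
        counts.update(set(entries))  # count an identical entry once per recipient
    return {pair for pair, n in counts.items() if n > x}
```

Each returned pair is a candidate for a new personal positive recognizer.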
7- Auto-Responder. The auto-responder selectively sends out a subjective challenge message to correspondents that cannot be explicitly identified by other components of the system. The challenge message can be one of a number of different subjective questions that can only be processed by a perceptive being (a human being).
The auto-responder waits for a correct response from the correspondent and, if it gets one, marks the correspondent as being human with a status of tentative, and the original message continues to be processed. If there is no reply or the answer is incorrect, then the message is sent to the GBC.
8- Endorsement Request. The endorsement request is sent to a recipient when the correspondent of a message intended for delivery to them either has been identified by the system as matching a positive recognizer, or has been classified by the auto-responder segment as human, but in either case this particular recipient has not seen messages from this correspondent before. In each case the subscription level will be tentative.
The message is structured to describe whether the origin is human or machine and, if machine, the type of message the system thinks this is (i.e. a mailing list or information service), the sub-category of the message (news, weather, sports, lottery results, etc.) and the correspondent's address. The recipient is asked to confirm or reclassify the message category and whether they wish to endorse the tentative system classification by subscribing to further correspondence. If the intended recipient (the user) responds with Endorse, then the correspondent will be added to the RSL as machine-originated subscribed or human subscribed and the message will be delivered. If the intended recipient responds with Reclassify as, then the correspondent's entry in the RSL is modified accordingly. If the intended recipient responds with Imposter, then the correspondent is removed from the subscription list and the message is sent to the GBC. If the intended recipient responds with Block, then the message is sent to the trash and the correspondent is added to the RDL.
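The four possible answers to an endorsement request map cleanly onto list updates and a destination for the held message. A minimal Python sketch follows, assuming plain dictionaries for RSL entries; the function and enum names are hypothetical. The text does not say what happens to the message after Reclassify, so the sketch returns "pending" for that case as an explicit assumption.

```python
from enum import Enum
from typing import Optional


class Response(Enum):
    ENDORSE = "endorse"
    RECLASSIFY = "reclassify"
    IMPOSTER = "imposter"
    BLOCK = "block"


def handle_endorsement(response: Response, correspondent: str, origin: str,
                       rsl: dict[str, dict], rdl: set[str],
                       new_category: Optional[str] = None) -> str:
    """Apply a recipient's answer; return where the held message goes."""
    if response is Response.ENDORSE:
        entry = rsl.setdefault(correspondent, {"origin": origin})
        entry["level"] = "subscribed"      # machine or human, now subscribed
        return "deliver"
    if response is Response.RECLASSIFY:
        entry = rsl.setdefault(correspondent, {"origin": origin, "level": "tentative"})
        entry["category"] = new_category   # RSL entry modified accordingly
        return "pending"                   # assumption: not specified in the text
    if response is Response.IMPOSTER:
        rsl.pop(correspondent, None)       # removed from the subscription list
        return "gbc"
    # Response.BLOCK: message to trash, correspondent onto the deny list.
    rsl.pop(correspondent, None)
    rdl.add(correspondent)
    return "trash"
```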
Positive Recognizers are created, in part, initially by sampling the existing inbound message corpus as well as an existing corpus of information services and mailing lists. A single PR is created for each correspondent to be recognized. These may be individuals, information services such as alerts~ahnsn.com, mailing lists such as issu~s~~idl.com, or service messages from such globally popular phenomena as eBay™ outbid notices.
In an implementation of the present invention, an individual PR is created for most popular information service types and globally popular correspondents.
For example, weather, sports, stocks and ring tones. When the system authenticates an arbitrary inbound message as matching one of these group or type PRs, it is able to pass this information on to the recipient to endorse. If sufficient individuals respond with ENDORSE to the system endorsement message for inbound messages of this type, then a new correspondent specific PR will be created automatically, but only at the user level.
In a general implementation, the PR is shown a large amount of appropriate mail from the target correspondent as well as a large amount of mail not from the correspondent, so that it can then readily identify valid content from this target correspondent. In the case of an information service, the initial group of PRs can be augmented in a number of ways. First, the system administrator can create an additional PR when a new widely used mailing list or information service is identified. Second, a new correspondent specific PR is created when the number of individual recipient entries for this correspondent exceeds a system administrator set threshold.
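Training a PR on mail from the target correspondent versus mail not from it is a two-class text classification problem. As one possible instance of the Bayesian filter the text mentions, here is a multinomial naive Bayes sketch with Laplace smoothing; the tokenizer, smoothing and class name are assumptions, not the patent's implementation.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9']+", text.lower())


class PositiveRecognizer:
    """Two-class multinomial naive Bayes: 'from target' vs 'not from target'."""

    def __init__(self) -> None:
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def train(self, text: str, is_target: bool) -> None:
        self.counts[is_target].update(tokenize(text))
        self.docs[is_target] += 1

    def probability(self, text: str) -> float:
        """Posterior probability that `text` comes from the target correspondent."""
        vocab = set(self.counts[True]) | set(self.counts[False])
        total_docs = self.docs[True] + self.docs[False]
        log_scores = {}
        for label in (True, False):
            # Laplace-smoothed class prior and per-token likelihoods.
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            denom = sum(self.counts[label].values()) + len(vocab) + 1
            for tok in tokenize(text):
                score += math.log((self.counts[label][tok] + 1) / denom)
            log_scores[label] = score
        # Numerically stable logistic of the log-odds.
        d = log_scores[True] - log_scores[False]
        if d >= 0:
            return 1.0 / (1.0 + math.exp(-d))
        e = math.exp(d)
        return e / (1.0 + e)
```

Retraining after a recipient reclassification, as described later, amounts to calling `train` again with the corrected label.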
Referring to Fig. 2, there is illustrated in a flow chart further detail of the apparatus and method of Fig. 1. Fig. 1 shows an inbound message coming from a sending SMTP server 100 and received by the system SMTP server 101.
Correspondent and recipient addresses are extracted and placed in memory.
Headers are examined for a valid challenge signature from another challenge-based MTA. If there is a valid challenge signature, the message is delivered 900. If there is no valid challenge signature, processing continues in block 102.
The correspondent address is compared against the GSDL 102 for a match.
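The two checks of initial screening (a valid challenge signature, then a comparison against the global system black list, written GSDL in this flow) can be sketched as below. The patent says only that challenges are "digitally signed"; this sketch substitutes an HMAC-SHA256 over the body with a key shared among peer MTAs and a hypothetical `X-Challenge-Signature` header. A public-key signature would be the more faithful choice in practice.

```python
import hashlib
import hmac

# Assumptions: shared key among peer challenge-based MTAs and a hypothetical
# header name; neither is specified in the patent.
SHARED_KEY = b"example-shared-secret"
SIGNATURE_HEADER = "X-Challenge-Signature"


def sign(body: bytes, key: bytes = SHARED_KEY) -> str:
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def initial_screening(headers: dict[str, str], body: bytes,
                      correspondent: str, gsbl: set[str]) -> str:
    """Segment 100: return 'deliver', 'gbc' or 'continue' (to segment 200)."""
    signature = headers.get(SIGNATURE_HEADER)
    if signature is not None and hmac.compare_digest(signature, sign(body)):
        return "deliver"   # valid challenge from another challenge-based MTA
    if correspondent.lower() in gsbl:
        return "gbc"       # correspondent is on the black list (lower-cased entries assumed)
    return "continue"
```

`hmac.compare_digest` is used so the signature comparison does not leak timing information.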
Messages from correspondents that match entries in the GSDL 102 are sent to the Global Bad Corpus (GBC) 700 and processing ends. Messages not matching entries in the GSDL 102 continue to be processed in segment 200. The purpose of segment 200 is to identify and classify valid machine-originated correspondents and ensure their messages are not forged. The correspondent address is first compared with entries in the correspondent specific PR table 200 to see if an entry exists.
If there is no matching entry the message is passed to segment 400.
If there is a matching entry, then the origin, category, sub-category and classifier location are noted in memory and the message is passed through the specific PR 201 to authenticate the correspondent. If the content does not match, then the correspondent is determined to be an imposter, the message is sent to the GBC 700 and processing ends.
If the message content matches, then the correspondent is authenticated and processing continues to query 202, where the system checks whether the correspondent is in the intended recipient's (the user's) RDL. If it is, then the message is sent to Trash 800 and processing stops.
If it is not in the RDL, then processing continues and the system looks in the RSL, as represented by query 203, for a match on the correspondent. If there is a match on the correspondent in the RSL and the subscription level is subscribed, as determined by query 205, then processing continues in segment 500. If there is a match in the RSL but the subscription level is tentative, then processing continues in segment 300. Finally, if there is no match in the RSL, then the correspondent is added to the recipient's subscription list as machine.tentative 204 and processing continues in segment 300.
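The segment 200 decision sequence above (PR table lookup, content authentication, RDL, RSL and subscription level) can be sketched as one function. This assumes each correspondent specific PR is any callable that returns True for authentic content, and that the RSL maps an address to a level string; origin is omitted for brevity. Destinations follow this paragraph as written.

```python
from typing import Callable, Optional


def classify_correspondent(correspondent: str, content: str,
                           pr_table: dict[str, Callable[[str], bool]],
                           rdl: set[str], rsl: dict[str, str]) -> str:
    """Segment 200 sketch; returns the next step as a label."""
    recognizer: Optional[Callable[[str], bool]] = pr_table.get(correspondent)
    if recognizer is None:
        return "segment 400"              # no correspondent specific PR entry
    if not recognizer(content):
        return "gbc"                      # PR 201 rejects the content: imposter
    if correspondent in rdl:
        return "trash"                    # query 202
    level = rsl.get(correspondent)        # query 203
    if level is None:
        rsl[correspondent] = "tentative"  # block 204 (machine.tentative)
        return "segment 300"
    if level == "subscribed":             # query 205
        return "segment 500"
    return "segment 300"                  # tentative entry
```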
Segment 300 provides an opportunity for the intended recipient to either endorse or correct the classification of segment 200, allowing the system to adapt its future classifications. The components of segment 300 of Figs. 1 and 2 are shown separately in Fig. 3.
Segment 300 begins with a message being passed from segment 200 or 500.
These messages have already been classified by the system as having been received either from a recognized, previously seen correspondent; from an identified but previously unseen correspondent with authentic content; or from a correspondent classified by the auto-responder as human and previously unseen by this recipient. In this case, previously seen means seen by the system, not necessarily by this recipient, and authentic content means the message matched an established correspondent category.
In each case the intended recipient (the user) has not seen previous messages from this correspondent. The system prepares an endorsement request 300 to send to the intended recipient (the user). The endorsement request contains system specific text plus the tentative system classification of this correspondent as recognized or unrecognized, human or machine and, if machine, the content category and sub-category.
The intended recipient receives the message and is provided with a means to respond 301. If the recipient responds with ENDORSE, they are indicating that they agree with the correspondent classification, accept this message and wish to receive further correspondence from this correspondent. The correspondent's entry in their RSL is modified to machine.subscribed or human.subscribed and the message is delivered. If they respond with BLOCK, then the message is placed in the trash 800, the correspondent is removed from the RSL and placed in the RDL 303, and processing stops. If they respond with IMPOSTER, they are indicating that they did not request this message and/or that it is not what it claims to be (in other words, a forgery). The message is sent to the GBC 700 and processing stops.
Throughout the system, each time the recipient reclassifies a tentative classification by the system, the system adapts to the additional input by sending the reclassified data back through the specific classifier for updated training.
Segment 400 begins with the message being received from segment 200. It has not matched any correspondent specific PR and is not in the GSDL. The message is passed through each of the category specific PRs looking for a category match. If there is a category match, then processing continues to query 202, with the system noting the category and sub-category along with the correspondent and recipient information.
Finally, if there is no category specific matching entry, processing continues to query 401, where the system looks for a match on the RSL. If no match is found, processing continues in query 402, where the system checks further to see if there is a partial match on domain or domain.subdomain. If there is, then the system checks the characteristics of the message and the recipient's message history to see if the recipient has sent messages there recently, and determines whether this could be a service message. If there is a match, then processing continues in segment 300. If there is no match, then processing continues in segment 600. If a match on the RSL is found at query 401, then processing continues in query 205, with the system checking to see if the RSL entry is subscribed.
If the correspondent entry is subscribed, then processing continues in query 206, with the system checking for the correspondent in the Correspondent Specific Personal Recognizer table. If no match is found, the message is delivered to the recipient 900. If a match for the correspondent is found in the Correspondent Specific Personal Recognizer table, then the content of the message is checked for authenticity.
If there is a match, then the message is deemed to be authentic and is delivered 900.
If there is no match, then the correspondent is deemed to be an imposter, the message is sent to the GBC 700 and processing stops.
If the correspondent is not subscribed, then processing continues in segment 300.
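Query 402 looks for a partial match on domain or domain.subdomain against the recipient's recent outbound mail, which is how a service message (for example an order confirmation from a shop's mail subdomain) can be recognized. The sketch below covers only that domain comparison; the suffix rule and the idea of passing in a list of recent outbound addresses are assumptions, and the additional check of message characteristics is not shown. A production version would likely consult the Public Suffix List so that, for instance, two unrelated `co.uk` domains are not treated as related.

```python
def domain_of(address: str) -> str:
    return address.rsplit("@", 1)[-1].strip().lower()


def partial_domain_match(correspondent: str, recent_outbound: list[str]) -> bool:
    """True if the correspondent's domain equals, or is a subdomain (or parent
    domain) of, a domain the recipient recently sent mail to."""
    cand = domain_of(correspondent)
    for addr in recent_outbound:
        known = domain_of(addr)
        if cand == known or cand.endswith("." + known) or known.endswith("." + cand):
            return True
    return False
```

Matching on a dot boundary prevents `notebay.com` from being treated as related to `ebay.com`.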
The system checks to see if there is a correspondent specific entry in the Recipient level PR table 500. If there is a matching entry, then the message is passed through the specific PR 501 to authenticate the correspondent by matching the content of this message against the content of previous messages known to have originated from this correspondent. If there is no match, then the correspondent is found to be an imposter, the message is sent to the GBC 700 and processing ends.
If the message content matches existing content from this correspondent, then processing continues to block 502, where the correspondent is added to the RSL as human.tentative, and processing continues in segment 300.
Segment 800 begins with the message being passed from segment 200 after a yes from query 205. The system checks in the Correspondent Specific Recipient Level Positive Recognizer table. If there is a matching entry, then the message is passed through the specific PR 801 to compare the content of the message to those known to be from this correspondent. If the content does not match, the correspondent is determined to be an imposter, the message is sent to the GBC 700 and processing ends. If the message content matches, then processing continues to delivery 900.
Segment 600 begins with the message being received from segment 400.
There is no matching entry in the RSL, and processing continues with the system originating a message to the purported correspondent 600. The message contains system specific information plus an image or images or an audio file, together with a subjective question in either text or audio form for the correspondent that requires a perceptive being to interpret. If there is no response or a bad response, then the message is sent to the GBC 700 and processing stops. If there is a correct response, then processing continues and the system classifies this correspondent as human. The system then checks the RDL 601. If there is a match to the correspondent in the RDL, then the message is sent to the trash 800 and processing stops. If there is no match in the RDL, then processing is passed back to segment 500.
Referring to Fig. 4, there is illustrated in a flow chart segment 600. The system extracts the correspondent, recipient and reply-to address from the original message 601. Processing continues in query 602, with the system checking to see if there is a reply-to address present. If NO, then the message is sent to the GBC 700 and processing ends. If YES, then processing continues in block 603, with the system selecting standard components of the original message and creating a unique identifier.
Processing continues in block 604, with the system randomly selecting a challenge method and creating a challenge message which includes a unique identifier from the original message. The system stores the message pointer to the original message and the pointer to the answer to the challenge in the challenge message table, and parks the original message in the holding queue 606. Processing continues in 606a with the challenge message being digitally signed, so that other challenge-based systems can recognize it and route it accordingly. The challenge message is sent to the reply-to address of the purported correspondent 607. Processing resumes in 608 with the system listening for a reply to the challenge message on designated CGI ports and the SMTP daemon.
If no reply is received 609, then the message is sent to the GBC 700 and processing ends. If a reply is received, then the system extracts the unique identifier and uses it to look up the answer to the challenge message in the challenge message table 610.
If the answer is correct, then the correspondent is added to the subscription table as human.tentative and the message is sent to 501. If the answer is incorrect, the message is sent to the GBC and no action is taken on the correspondent.
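The challenge message table stores, per unique identifier, a pointer to the parked original message and the expected answer, and a reply is resolved by looking the identifier up. A minimal in-memory Python sketch follows; the class name, the case-insensitive answer comparison and the behaviour for an unknown identifier (treated as a failure) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PendingChallenge:
    correspondent: str
    message_pointer: str  # pointer to the original message in the holding queue
    answer: str           # expected answer to the challenge


class ChallengeTable:
    """In-memory stand-in for the challenge message table of Fig. 4."""

    def __init__(self) -> None:
        self.pending: dict[str, PendingChallenge] = {}

    def park(self, uid: str, correspondent: str, message_pointer: str, answer: str) -> None:
        self.pending[uid] = PendingChallenge(correspondent, message_pointer, answer)

    def check_reply(self, uid: str, reply: str,
                    rsl: dict[str, str]) -> tuple[str, Optional[str]]:
        """Return (destination, message pointer) for a reply carrying `uid`."""
        challenge = self.pending.pop(uid, None)
        if challenge is None:
            return "gbc", None  # unknown identifier (assumed failure)
        if reply.strip().lower() == challenge.answer.strip().lower():
            rsl[challenge.correspondent] = "human.tentative"
            return "501", challenge.message_pointer
        # Incorrect answer: message to the GBC, correspondent left untouched.
        return "gbc", challenge.message_pointer
```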
The present invention provides a method of identifying whether a particular correspondent is a machine or not a machine through a challenge response model.
This alone, however, is insufficient to implement in the real world, where a typical message recipient regularly receives desirable messages from machines in the form of mailing lists, information services, receipts and service messages from the machines they are attempting to correspond with. The purpose of the auto-responder segment is to categorize correspondents into two categories: human and non-human. By the time a message has reached the auto-responder, it has already passed through processes that attempted to attribute the message to specific human or machine correspondents that the system is aware of, or to a particular category of machines that the system is aware of, and to authenticate them.
When a message has failed to be classified by the above-mentioned processes, the auto-responder proceeds to challenge the correspondent to prove he/she is human.
Embodiments of the present invention employ several different methods to accomplish this task, which are used singly or in combination.
On arrival in this segment, the inbound message is processed to derive:
1. The correspondent address;
2. The recipient address;
3. The return address;
4. The overall size of the message;
5. A unique identifier associated with the message.
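The five derived fields can be extracted with Python's standard `email` package. In this sketch the return address is taken from `Reply-To`, falling back to `Return-Path`, and the unique identifier is a truncated SHA-256 over the `Message-ID`, `From` and `Date` headers; both choices are assumptions standing in for the patent's unspecified "standard components of the original message".

```python
import hashlib
from email import message_from_string
from email.utils import parseaddr


def derive_fields(raw: str) -> dict[str, object]:
    """Derive the five auto-responder inputs from a raw RFC 5322 message."""
    msg = message_from_string(raw)
    correspondent = parseaddr(str(msg.get("From", "")))[1].lower()
    recipient = parseaddr(str(msg.get("To", "")))[1].lower()
    # Assumption: Reply-To if present, else Return-Path; empty means "no return address".
    reply = msg.get("Reply-To") or msg.get("Return-Path") or ""
    return_address = parseaddr(str(reply))[1].lower()
    # Assumption: hash a few standard header fields to form the unique identifier.
    basis = "|".join(str(msg.get(h, "")) for h in ("Message-ID", "From", "Date"))
    uid = hashlib.sha256(basis.encode("utf-8")).hexdigest()[:16]
    return {
        "correspondent": correspondent,
        "recipient": recipient,
        "return_address": return_address,
        "size": len(raw.encode("utf-8")),
        "unique_id": uid,
    }
```

Because the identifier is derived deterministically, the same message always yields the same value, which matters for challenge methods that recompute the answer instead of storing it.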
For each method, processing proceeds as follows:
1- A challenge message is created using one of the methods listed below.
2- The challenge message is digitally signed.
3- The challenge message is sent to the indicated return address of the original message.
4- The auto-responder listens for replies to the message, either as an inbound message to a specified address or as a CGI response to a link embedded within the message.
5- If there is no return address, processing stops and the message is dropped.
6- If no response or an incorrect response to the challenge is received within a system-definable period, the correspondent is declared to be a machine incapable of a response and, depending on the implementation, the message is either dropped or placed in a holding queue.
7- If the correct response to the challenge is received, the correspondent is declared to be human and is added to the recipient's subscription list as human.tentative, and the message is passed to segment 300 for delivery along with an endorsement request.
8- This embodiment of the invention assumes that its existence and methods will present a challenge to those intent on distributing SPAM to overcome its safeguards. With this in mind, each of the following methods provides a greater or lesser obstacle to overcome. It is believed that, used in combination or rotation, the obstacles will prove to be too great for a machine program to overcome.
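Step 6 implies a periodic sweep of pending challenges against a system-definable timeout. A minimal sketch follows, assuming each held message records when its challenge was sent; what happens to expired messages (dropped or kept in a holding queue) is left to the caller, as the text leaves it to the implementation.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeldMessage:
    uid: str        # unique identifier of the parked original message
    sent_at: float  # when the challenge was sent (seconds since the epoch)


def sweep_expired(held: list[HeldMessage], timeout_s: float,
                  now: Optional[float] = None) -> tuple[list[HeldMessage], list[HeldMessage]]:
    """Split held messages into (still waiting, expired).

    Correspondents of expired messages are treated as machines incapable of
    a response; the caller decides whether to drop or keep those messages.
    """
    now = time.time() if now is None else now
    waiting: list[HeldMessage] = []
    expired: list[HeldMessage] = []
    for item in held:
        (expired if now - item.sent_at > timeout_s else waiting).append(item)
    return waiting, expired
```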
Method #1
• The system draws on a database of images that contain people, animals and objects.
• One or more of the objects, people or animals in these images convey a human emotion or behaviour, or are holding something distinctive, etc., that is easy for the viewer to determine by looking at the image.
• Subtle features are introduced into the image to facilitate a richer field of questions and answers.
• The objects, people and animals in the images may or may not be labeled with a name, a number or both.
• In the case of the label being a number, the number is either random or is an extract of the unique identifier created from the original message.
• All labels are obfuscated to defeat simple OCR attempts at deciphering.
• The system draws on a database of introductory text to create the shell of the message explaining its purpose.
• The system draws on a database of instructions as to how to respond to the challenge.
• The instructions may be obfuscated text or an audio file.
• The instruction set is linked to the image, and both are linked to a table entry of correct answers.
• There is only one correct answer to the challenge question.
The following are illustrative of the method.
• What colour dress is the smiling girl wearing?
• What is the crying boy holding?
• What is the red-haired boy doing?
• Who looks cold in this picture?
• Who is walking the dog?
• What are the children looking at?
The answer is only obtainable in this method from the human cognitive skills of the viewer.
The challenge message is constructed from the above components and sent to the reply to address of the original message.
Method #2
• The system extracts six certain characters from the unique identifier of the original message.
• The system draws on a database of introductory text to create the shell of the message explaining its purpose.
• The matching six characters are then retrieved from the obfuscated integer table.
• The system then constructs a challenge message from the above components and sends it to the return address indicated in the original message.
The elegance of this method is that it does not require the system to store the answer to the question, since it is obtainable from the original message.
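The point of Method #2 is that the expected answer is recomputed from the unique identifier, so nothing needs to be stored. The sketch below assumes six fixed character positions (the patent says only "six certain characters") and leaves out rendering the characters from the obfuscated integer table, which would be an image-generation step.

```python
# Hypothetical fixed positions; the patent does not say which characters are used.
POSITIONS = (1, 3, 5, 7, 9, 11)


def challenge_characters(unique_id: str) -> str:
    """Characters to be rendered (obfuscated) in the challenge message."""
    return "".join(unique_id[i] for i in POSITIONS)


def verify(unique_id: str, reply: str) -> bool:
    # The expected answer is recomputed from the identifier, so no answer is stored.
    return reply.strip().lower() == challenge_characters(unique_id).lower()
```

The identifier must be at least twelve characters long for these positions to exist.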
Method #3
• The system draws on a database of audio files that contains audio-based challenges in English and other languages.
• The system draws on a database of introductory text to create the shell of the message explaining its purpose.
• There is only one correct answer to the audio challenge question.
The system compiles a challenge message from the introductory text plus the audio file and sends it to the indicated reply address.
Method #4
• The system draws on a database of subjective text files that contain a paragraph of text from which the reader must answer a question that requires them to understand the meaning of the paragraph itself.
• The system draws on a database of introductory text to create a shell of the message explaining its purpose.
• There is only one correct answer to the challenge question.
The system compiles a challenge message from the above components and sends it to the indicated reply address.
The four methods are based upon:
• sending an image and asking a question about the image that requires human perception;
• using an obfuscated message that requires human perception to decipher;
• using an audio file as part of the challenge message;
• using subjective text files and asking questions based upon the text.
An embodiment of the present invention provides a method of identifying desirable machine correspondents with two slightly different applications of the same mathematical model, namely Bayesian inference. Embodiments of the invention refer to Positive Recognizers, which are our implementation of Bayesian inference. In this implementation, the term Positive Recognizer describes the embodiment of the mathematical model in a piece of computer code.
The Positive Recognizer can be any type of text classifier, including a Bayesian filter, memory-based decision-maker, case-based reasoner or neural network.
The implementation described here uses a Bayesian filter, but the procedure is identical for each of the other cases.
In embodiments of the present invention, the text classifier is trained to recognize, in the first instance, a specific correspondent from the nature and content of the correspondent's messages. In the second instance, a series of text classifiers are trained to recognize a specific category of correspondents based on messages from other members of the same category. Both of these are much simpler determinations, and the accuracy rate is far greater than when the same approach is used to distinguish spam from non-spam.
A Positive Recognizer is created on installation for each identified information service, mailing list and service message in the current message corpus. This is done by sampling the current message corpus for current information services, mailing lists, etc., and creating a small definitive message corpus for each, drawn in part from a large existing corpus of previously identified information service and mailing list providers.
An entry describing each Positive Recognizer is added to the PR table:
• Correspondent
• Origin
• Category
• Sub-Category
• Location of classifier
The definitive message corpus is then passed through the Positive Recognizer to train it with authentic messages from this correspondent. To balance this, a similar volume of messages from a variety of sources is also passed through the Recognizer to train it with an imposter's messages.
Category Specific Positive Recognizers are intended to recognize a correspondent based on the theme of their content. Examples of these are weather services, stock services, lottery services, etc., where the theme of the content is very structured. One Positive Recognizer each for weather, sports, news, stocks, ring tones, horoscopes and mailing lists is created and trained with a representative sample of several correspondents in each category. For example, a corpus created from 5 different weather services ensures that the weather PR will recognize an information service providing weather information.
Personal Correspondent specific PRs differ from regular Correspondent specific PRs because they are automatically created when X recipients subscribe to and endorse the same Correspondent, regardless of whether the Correspondent is human or any category of machine. This group of PRs is intended to protect large numbers of users from SPAM attacks through an IMPOSTER. A Personal Correspondent specific PR is created automatically by the system, which creates the record and the PR and then funnels current message traffic through it to train it to recognize future traffic.
The simplified description below illustrates in the most fundamental way how Bayesian inference is used in embodiments of the present invention. How this filter is taught to recognize correspondence from a specific sender is described in general terms herein below.
In the second instance, we train a series of Positive Recognizers to recognize correspondents who are in a particular category of correspondents, namely senders of stock, weather and sports information services. The approach is largely the same, but in place of a corpus of messages from a single previously identified correspondent we use a corpus of messages from a group of several previously identified correspondents who are all sending the same kind of message. Otherwise the training is the same.
Positive Recognizers calculate the probability of a message belonging to one of two cases based on its contents. Unlike simple content-based filters, Bayesian filtering learns from training against a corpus of messages previously identified as coming from a specific correspondent and another corpus of messages identified as specifically not from that correspondent. The result is a very robust, adaptive method for identifying messages from a specific correspondent, with a very low error rate.
Positive Recognizers are a kind of adaptive content-based filter that builds its list of identifying characteristics itself, both through initial training and through updates provided by recipient input. In very basic terms, to train the filter you start with a (big) bunch of messages that you have classified as being from a specific correspondent, and another bunch of messages from a variety of other correspondents, including spam. The filter looks at both, analyzing the specific correspondent's messages as well as those from other sources, to calculate the probability of various characteristics appearing in messages from the specific correspondent or from other correspondents.
The characteristics that a Positive Recognizer can look at include the words in the body of the message, of course, and its headers (senders and message paths, for example), but also other aspects such as HTML code (like colors), or even word pairs and phrases.
If a word, "Cartesian" for example, never appears in the corpus from other correspondents but often appears in the corpus from the specific correspondent, the probability of "Cartesian" indicating that a message is from other correspondents is near zero.
"Toner", on the other hand, appears exclusively, and often, in the corpus from the specific correspondent. "Toner" has a very high probability of being found in the corpus from the specific correspondent, not much below 1 (100%). When a new message arrives, the Positive Recognizer analyzes it, and the probability of the complete message being from the specified correspondent is calculated from the individual characteristics.
Now that we have a classification, the message can be used to train the filter further. In this case, either the probability of "Cartesian" indicating messages from the specified correspondent is lowered (if the message containing both "Cartesian" and "toner" is found to belong to the corpus of other correspondents), or the probability of "toner" indicating the specific correspondent must be reconsidered. In this way Positive Recognizers can learn from both their own decisions and the recipients' decisions (when recipients reclassify tentative classifications made by the filters).
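The two-corpus training and the adaptive update just described can be illustrated with a small multinomial naive Bayes classifier. This is a sketch under assumptions, not the inventors' code: the word-only tokenizer, Laplace smoothing and the 0.9 acceptance threshold are illustrative choices, and `PositiveRecognizer` and its methods are hypothetical names.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokens(text: str) -> List[str]:
    """Lower-cased word tokens; headers, HTML features or word pairs could be added."""
    return re.findall(r"[a-z0-9']+", text.lower())


class PositiveRecognizer:
    """Two-class naive Bayes: 'authentic' (this correspondent) vs 'other'."""

    CLASSES = ("authentic", "other")

    def __init__(self) -> None:
        self.word_counts: Dict[str, Counter] = {c: Counter() for c in self.CLASSES}
        self.doc_counts: Counter = Counter()

    def train(self, text: str, label: str) -> None:
        """Add one message to a corpus; the same call serves later adaptive updates."""
        self.word_counts[label].update(tokens(text))
        self.doc_counts[label] += 1

    def probability_authentic(self, text: str) -> float:
        """Posterior probability that the message comes from the specific correspondent."""
        vocab = set(self.word_counts["authentic"]) | set(self.word_counts["other"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for c in self.CLASSES:
            # Laplace-smoothed prior and word likelihoods, summed in log space.
            score = math.log((self.doc_counts[c] + 1) / (total_docs + 2))
            denom = sum(self.word_counts[c].values()) + len(vocab)
            for w in tokens(text):
                if w in vocab:  # words never seen in training carry no evidence
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            log_score[c] = score
        # Normalize the two scores into a probability without overflow.
        top = max(log_score.values())
        a = math.exp(log_score["authentic"] - top)
        o = math.exp(log_score["other"] - top)
        return a / (a + o)

    def matches(self, text: str, threshold: float = 0.9) -> bool:
        """Treat the correspondent as authenticated above a (hypothetical) threshold."""
        return self.probability_authentic(text) >= threshold
```

Calling `train` again with a reclassified message is the adaptive step described above: once a message containing both "Cartesian" and "toner" is filed under `other`, the evidence those two words give for the specific correspondent is weakened.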
Whenever a message is delivered to a recipient because the system has tentatively classified it as acceptable and placed it in the user subscription list, an endorsement message is sent along with it. The tentative classification indicates that either the recipient has never seen correspondence from this correspondent before, or they have seen some but have not yet subscribed to further correspondence.
An example of the general form of the endorsement message follows.
The system has determined that this correspondent is not in your subscription list but appears to be a MACHINE originated WEATHER INFORMATION SERVICE message from WEATHER@PLANET.ORG
Please do one of the following:
• Endorse this classification, thereby subscribing to future correspondence
• Reclassify and Endorse the correspondent from this list of classifications *********
• Declare the correspondent an Imposter
• Block the correspondent from future correspondence with you
• Read (the recipient wishes simply to read the message without committing to any other action)
If the recipient endorses the classification, then the correspondent's entry in the recipient's subscription list is modified to subscribed.
If the recipient reclassifies and then endorses the correspondent, then the correspondent's entry in the recipient's subscription list is modified to the correct category and sub-category, and the entry is modified to subscribed. This correction is sent to the appropriate Positive Recognizer as an adaptive training element.
If the recipient declares the correspondent an imposter, then the correspondent is removed from the recipient's subscription list and the message is sent to the global bad corpus for adaptive training.
If the recipient simply blocks the correspondent, then the correspondent is moved from the recipient's subscription list to their deny list. If the correspondent is tentatively classified as a service advisor, or the recipient reclassifies the correspondent as one, then Endorsement does not add the correspondent to the RSL.
If the recipient simply reads the message, no action is taken.
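The five responses above can be sketched as a dispatch over the recipient's lists. This is a minimal Python sketch, assuming in-memory structures in place of the system tables; `RecipientState`, `Entry` and `handle_response` are hypothetical names, and the service-advisor exception is left out for brevity.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Set, Tuple


class Response(Enum):
    ENDORSE = "endorse"
    RECLASSIFY = "reclassify"
    IMPOSTER = "imposter"
    BLOCK = "block"
    READ = "read"


@dataclass
class Entry:
    category: str
    sub_category: str
    level: str = "tentative"  # "tentative" or "subscribed"


@dataclass
class RecipientState:
    subscriptions: Dict[str, Entry] = field(default_factory=dict)  # RSL
    deny_list: Set[str] = field(default_factory=set)               # RDL
    global_bad_corpus: List[str] = field(default_factory=list)     # GBC
    # (message, category, sub_category) items to send back to the PR for retraining
    retraining: List[Tuple[str, str, str]] = field(default_factory=list)


def handle_response(state: RecipientState, correspondent: str, message: str,
                    response: Response,
                    new_class: Optional[Tuple[str, str]] = None) -> None:
    """Apply one recipient response to the recipient's lists."""
    if response is Response.ENDORSE:
        state.subscriptions[correspondent].level = "subscribed"
    elif response is Response.RECLASSIFY:
        if new_class is None:
            raise ValueError("reclassification needs a (category, sub_category) pair")
        category, sub_category = new_class
        state.subscriptions[correspondent] = Entry(category, sub_category, "subscribed")
        state.retraining.append((message, category, sub_category))
    elif response is Response.IMPOSTER:
        state.subscriptions.pop(correspondent, None)
        state.global_bad_corpus.append(message)
    elif response is Response.BLOCK:
        state.subscriptions.pop(correspondent, None)
        state.deny_list.add(correspondent)
    # Response.READ: no action is taken.
```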
As a message is passed through the system, a memory block is constructed consisting of the correspondent address, recipient address, origin, tentative category of the message and tentative sub-category of the message. The Positive Recognizers or the auto-responder determine the origin (human or machine). The category is drawn from the Positive Classifiers and the specific text from lookup table #1. The sub-category is drawn from the Positive Classifiers. If a recipient has not previously received correspondence from this Correspondent, then this information is presented to the user in the form of the endorsement message described above. The message is presented as either an HTML message or a very structured text message. In either case the recipient is given the opportunity to respond. In the case of the HTML message, they can respond by simply clicking on one or more links within the message itself.
If the link chosen is Reclassify, then a drop-down list of other possible classifications is presented for the recipient to choose from to correct the system's tentative classification. If the recipient cannot read HTML messages, as is the case on a wireless device, the same message contains text that accomplishes the same purpose, but in this case the recipient is asked to reply to this message with a keyword.
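A sketch of the memory block and of the plain-text variant for recipients who cannot read HTML follows. The fields mirror the paragraph above and the keywords mirror the endorsement options; the exact wording, the `MemoryBlock` type and the two helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

KEYWORDS = ("ENDORSE", "RECLASSIFY", "IMPOSTER", "BLOCK", "READ")


@dataclass
class MemoryBlock:
    correspondent: str
    recipient: str
    origin: str        # "human" or "machine"
    category: str      # tentative category, from lookup table #1
    sub_category: str  # tentative sub-category, from lookup table #2


def text_endorsement(block: MemoryBlock) -> str:
    """Render the plain-text form of the endorsement message."""
    if block.origin == "machine":
        kind = f"MACHINE originated {block.sub_category.upper()} {block.category.upper()}"
    else:
        kind = "HUMAN originated"
    return (
        "The system has determined that this correspondent is not in your "
        f"subscription list but appears to be a {kind} message from {block.correspondent}\n"
        "Reply to this message with one keyword: " + ", ".join(KEYWORDS)
    )


def parse_keyword(reply: str) -> Optional[str]:
    """Return the keyword that starts a reply, or None if it is not recognized."""
    words = reply.split()
    if not words:
        return None
    word = words[0].upper()
    return word if word in KEYWORDS else None
```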
The system table definitions are provided in Tables A to D.
Table A - Global Deny List
• Correspondent
• Origin (value human/machine)
• Category (value from lookup table 1)
• Sub-category (value from lookup table 2)
• Status (permit or denied)
• Creation Date
Table B - Positive Recognizer
• Correspondent
• Origin (value human/machine)
• Category (value from lookup table 1)
• Sub-category (value from lookup table 2)
• Classifier location
• Creation date
Table C - Recipient Subscription List
• Correspondent
• Origin (value human/machine)
• Category (value from lookup table 1)
• Sub-category (value from lookup table 2)
• Method (recipient/autoresponder/PR/endorsement)
• Subscription level (tentative, subscribed)
Table D - Lookup Tables
Lookup Table #1: Mailing List, Information Service, System Message, Service Message
Lookup Table #2: News, Weather, Sports, Lottery results, Horoscopes, Stocks, Ring Tones, Graphics
The embodiments of the present invention are described herein above in the context of an email message implementation. However, the present invention is equally applicable to the Short Message Service (SMS). SMS is the transmission of short text messages to and from a mobile phone. Messages must be no longer than 160 alphanumeric characters and contain no images or graphics. In the SMS world, cell phone owners subscribe to a service allowing them to have an email address such as 555-1212@wirelessprovider.com. Depending on the implementation, this type of service breaks an email message down into one or more 160-character pieces and delivers them to the wireless handset.
A large cellular provider can have 1 million subscribers using such a service. In large part these services provide no control over who can send a message to a cell phone, so the likelihood of receiving large volumes of correspondence from unrecognized or unwanted correspondents is high. Currently there is no mechanism for the subscriber to provide feedback to the provider to block unwanted correspondence.
An embodiment of the present invention can provide a complete and robust solution for the cellular subscriber and provider. The embodiment described thus far remains in place; however, where messages are directed to Segment 300 in the email implementation, in the SMS implementation they are directed to a segment 50.
Referring to Fig. 5, there is illustrated a second embodiment of the present invention for application to messaging in a wireless communications environment.
This embodiment of the present invention not only extends the full functionality of the correspondent classifier to the wireless handset but also provides the carrier with an additional revenue source. Each command initiated by the recipient consumes an SMS message, which costs the subscriber some nominal fee, for example $0.10. Segment 50 is implemented as a two-way SMS application that serves as a message parser and command translator between, on one side, the SMTP Gateway 101 and the entire application that follows and, on the other, the Short Message Service Center (SMSC) in a wireless network. It includes some specific wireless features not found in or required by the main application body.
Segment 50 begins with an SMSC module called DLYNX that breaks the message down into 160-character component parts. DLYNX is inserted into the dialogue stream between the email gateway and the SMSC.
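Breaking a message into 160-character parts, as DLYNX is described as doing, can be sketched as follows. The preference for word boundaries and the hard split of over-long words are assumptions; the sketch counts Python characters and ignores SMS encoding details such as the shorter limit for non-GSM alphabets. `split_for_sms` is a hypothetical name.

```python
from typing import List

SMS_LIMIT = 160


def split_for_sms(body: str, limit: int = SMS_LIMIT) -> List[str]:
    """Break a message into pieces of at most `limit` characters, preferring word boundaries."""
    pieces: List[str] = []
    current = ""
    for word in body.split():
        # Hard-split any single word that cannot fit in one SMS.
        while len(word) > limit:
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces
```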
DLYNX recognizes the following command set. The command set is extensible, and additional commands can be added without interfering with the main body of logic:
1. Read
2. More
3. Reply
4. Forward
5. Block
6. Reclassify
7. Subscribe
8. Alias
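One way to keep the command set extensible is a handler registry: each command registers itself, so adding a command does not change the dispatch logic. This Python sketch is an assumption about structure, not DLYNX's actual design; only two of the eight commands are registered, and the handlers return placeholder strings.

```python
from typing import Callable, Dict

HANDLERS: Dict[str, Callable[[str], str]] = {}


def command(name: str) -> Callable[[Callable[[str], str]], Callable[[str], str]]:
    """Register a handler; new commands are added without touching dispatch()."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        HANDLERS[name.upper()] = func
        return func
    return register


@command("Block")
def block(argument: str) -> str:
    return f"Blocked {argument}"


@command("Reclassify")
def reclassify(argument: str) -> str:
    return f"Reclassified as {argument}"


def dispatch(sms_text: str) -> str:
    """Route an inbound SMS reply to the handler named by its first word."""
    parts = sms_text.strip().split(maxsplit=1)
    if not parts or parts[0].upper() not in HANDLERS:
        return "Unknown command. Try: " + ", ".join(sorted(HANDLERS))
    argument = parts[1] if len(parts) > 1 else ""
    return HANDLERS[parts[0].upper()](argument)
```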
autoresponder segment as human, but again this particular recipient has not seen messages from this correspondent before. In each case the subscription level will be tentative.
The message is structured to describe whether the origin is human or machine and, if machine, the type of message the system thinks this is (i.e. a mailing list or information service), the sub-category of the message (news, weather, sports, lottery results, etc.) and the correspondent's address. The recipient is asked to confirm or reclassify the message category and to state whether they wish to endorse the tentative system classification by subscribing to further correspondence. If the intended recipient (the user) responds with Endorse, then the Correspondent is added to the RSL as machine-originated subscribed or human subscribed and the message is delivered. If the intended recipient (the user) responds with Reclassify, then the correspondent's entry in the RSL is modified accordingly. If the intended recipient (the user) responds with Imposter, then the correspondent is removed from the subscription list and the message is sent to the GBC. If the intended recipient (the user) responds with Block, then the message is sent to the trash and the correspondent is added to the RDL.
Positive Recognizers are initially created in part by sampling the existing inbound message corpus as well as an existing corpus of information services and mailing lists. A single PR is created for each correspondent to be recognized. These may be individuals, information services such as alerts@msn.com, mailing lists such as issues@idl.com, or service messages from such globally popular phenomena as eBay™ outbid notices.
In an implementation of the present invention, an individual PR is created for the most popular information service types and globally popular correspondents, for example weather, sports, stocks and news. When the system authenticates an arbitrary inbound message as matching one of these group or type PRs, it is able to pass this information on to the recipient to endorse. If sufficient individuals respond to the system endorsement message for inbound messages of this type with ENDORSE, then a new correspondent specific PR is created automatically, but only at the user level.
In a general implementation, the PR is shown a large amount of appropriate mail from the target correspondent as well as a large amount of mail not from that correspondent, so that it can readily identify valid content from the target correspondent. In the case of an information service, the initial group of PRs can be augmented in a number of ways. First, the system administrator can create an additional PR when a new widely used mailing list or information service is identified. Second, a new correspondent specific PR is created when the number of individual recipient entries for this correspondent exceeds a threshold set by the system administrator.
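The administrator-set threshold can be illustrated with a small registry that counts distinct recipients per correspondent. This is a hypothetical Python sketch: `PRRegistry` and its method are illustrative names, and actually creating and training the new PR on current traffic is left out.

```python
from collections import defaultdict
from typing import DefaultDict, Set


class PRRegistry:
    def __init__(self, threshold: int) -> None:
        self.threshold = threshold  # set by the system administrator
        # correspondent -> distinct recipients holding an entry for it
        self.recipients: DefaultDict[str, Set[str]] = defaultdict(set)
        self.specific_prs: Set[str] = set()  # correspondents with their own PR

    def record_entry(self, correspondent: str, recipient: str) -> bool:
        """Record a recipient entry; return True only when this call creates a new PR."""
        self.recipients[correspondent].add(recipient)
        if (correspondent not in self.specific_prs
                and len(self.recipients[correspondent]) > self.threshold):
            self.specific_prs.add(correspondent)
            return True
        return False
```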
Referring to Fig. 2, there is illustrated in a flow chart further detail of the apparatus and method of Fig. 1. Fig. 1 shows an inbound message coming from a sending SMTP server 100 and received by the system SMTP server 101.
Correspondent and recipient addresses are extracted and placed in memory.
Headers are examined for a valid challenge signature from another challenge-based MTA. If there is a valid challenge signature, the message is delivered 900. If there is no valid challenge signature, then processing continues in block 102.
The correspondent address is compared against the GSDL 102 for a match.
Messages from correspondents that match entries in the GSDL 102 are sent to the Global Bad Corpus (GBC) 700 and processing ends. Messages not matching entries in the GSDL 102 continue to be processed in segment 200. The purpose of segment 200 is to identify and classify valid machine-originated correspondents and to ensure their messages are not forged. The correspondent address is first compared with entries in the correspondent specific PR table 200 to see if an entry exists.
If there is no matching entry the message is passed to segment 400.
If there is a matching entry, then the origin, category, sub-category and classifier location are noted in memory and the message is passed through the specific PR 201 to authenticate the correspondent. If the content does not match, then the correspondent is determined to be an imposter, the message is sent to the GBC 700 and processing ends.
If the message content matches, then the correspondent is authenticated and processing continues to query 202, where the system checks whether the correspondent is in the intended recipient's (the user's) RDL. If it is, then the message is sent to Trash 800 and processing stops.
If it is not in the RDL, then processing continues and the system looks in the RSL, as represented by query 203, for a match on the correspondent. If there is a match on the correspondent in the RSL and the subscription level is subscribed, as determined by query 205, then processing continues in segment 500. If there is a match in the RSL but the subscription level is tentative, then processing continues in segment 300. Finally, if there is no match in the RSL, then the correspondent is added to the recipient's subscription list as machine.tentative 204 and processing continues in segment 300.
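The decision sequence of Fig. 2 for segment 200 can be condensed into a single routing function. This is a simplified sketch, not the patented apparatus: the lists are plain Python sets and dicts, each correspondent specific PR is modelled as a hypothetical predicate on the message text, and the function returns the label of the next block rather than performing delivery.

```python
from typing import Callable, Dict, Set


def route_segment_200(
    correspondent: str,
    message: str,
    has_valid_challenge_signature: bool,
    gsdl: Set[str],
    specific_prs: Dict[str, Callable[[str], bool]],
    rdl: Set[str],
    rsl: Dict[str, str],
) -> str:
    """Return the label of the next step for one inbound message (Fig. 2, simplified)."""
    if has_valid_challenge_signature:
        return "deliver 900"
    if correspondent in gsdl:
        return "GBC 700"
    recognizer = specific_prs.get(correspondent)
    if recognizer is None:
        return "segment 400"           # no correspondent specific PR
    if not recognizer(message):
        return "GBC 700"               # content does not match: imposter
    if correspondent in rdl:
        return "trash 800"
    level = rsl.get(correspondent)
    if level == "subscribed":
        return "segment 500"
    if level is None:
        rsl[correspondent] = "tentative"  # added as machine.tentative (204)
    return "segment 300"
```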
Segment 300 provides an opportunity for the intended recipient either to endorse or to correct the classification made by Segment 200, allowing the system to adapt its future classifications. The components of segment 300 of Figs. 1 and 2 are shown separately in Fig. 3.
Segment 300 begins with a message being passed from segment 200 or 500.
These messages have already been classified by the system as having been received either from a recognized, previously seen correspondent, or from an identified but previously unseen correspondent with authentic content, or as classified by the autoresponder as human and previously unseen by this recipient. In this case, previously seen means seen by the system, not necessarily by this recipient, and authentic content means the message matched an established correspondent category.
In each case the intended recipient (the user) has not seen previous messages from this correspondent. The system prepares an endorsement request 300 to send to the intended recipient (the user). The endorsement request contains system-specific text plus the tentative system classification of this correspondent as recognized or unrecognized, human or machine and, if machine, the content category and sub-category.
The intended recipient receives the message and is provided with a means to respond 301. If the recipient responds with ENDORSE, they are indicating that they agree with the correspondent classification, accept this message and wish to receive further correspondence from this correspondent. The correspondent's entry in their RSL 306 is modified to machine.subscribed or human.subscribed and the message is delivered. If they respond with Block, then the message is placed in the trash 800, the correspondent is removed from the RSL and placed in the RDL 303, and processing stops. If they respond with IMPOSTER, they are indicating that they did not request this message and/or that it is not what it claims to be (in other words, a forgery). The message is sent to the GBC 700 and processing stops.
Throughout the system, each time the recipient reclassifies a tentative classification made by the system, the system adapts to the additional input by sending the reclassified data back through the specific classifier for updated training.
Segment 400 begins with the message being received from segment 200. It has not matched any correspondent specific PR and is not in the GSDL. The message is passed through each of the category specific PRs looking for a category match. If there is a category match, then processing continues to query 202, with the system noting the category and sub-category along with the correspondent and recipient information.
Finally, if there is no category specific matching entry, processing continues to query 401, where the system looks for a match on the RSL. If no match is found, processing continues in query 402, where the system checks further to see if there is a partial match on domain or domain.subdomain. If there is, then the system checks the characteristics of the message and the recipient's message history to see whether the subscriber has sent messages there recently, and determines whether this could be a service message. If there is a match, then processing continues in segment 300. If there is no match, then processing continues in segment 600. If a match on the RSL is found at query 401, then processing continues in query 205, with the system checking to see if the RSL entry is subscribed.
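The partial match on domain or domain.subdomain in query 402 might be implemented as below. Reading "partial match" as "the domains are equal, or one is a subdomain of the other" is an assumption; the follow-up check of message characteristics and the recipient's sending history is not shown. The helper names are hypothetical.

```python
from typing import Iterable


def domain_of(address: str) -> str:
    """Lower-cased part after the last '@'."""
    return address.rsplit("@", 1)[-1].lower()


def partial_domain_match(correspondent: str, rsl_addresses: Iterable[str]) -> bool:
    """True if the correspondent's domain equals, or is a subdomain of, an RSL entry's domain (or vice versa)."""
    d = domain_of(correspondent)
    for entry in rsl_addresses:
        e = domain_of(entry)
        if d == e or d.endswith("." + e) or e.endswith("." + d):
            return True
    return False
```

Matching on a leading dot avoids treating an unrelated domain such as `notexample.com` as a subdomain of `example.com`.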
If the correspondent entry is subscribed, then processing continues in query 206, with the system checking for the correspondent in the Correspondent Specific Personal Recognizer table. If no match is found, the message is delivered to the recipient 900. If a match for the correspondent is found in the Correspondent Specific Personal Recognizer table, then the content of the message is checked for authenticity.
If there is a match, then the message is deemed to be authentic and is delivered 900.
If there is no match, then the correspondent is deemed to be an imposter, the message is sent to the GBC 700 and processing stops.
If the correspondent is not subscribed, then processing continues in segment 300.
The system checks to see if there is a correspondent specific entry in the Recipient level PR table 500. If there is a matching entry, then the message is passed through the specific PR 501 to authenticate the correspondent by matching the content of this message against the content of previous messages known to have originated from this correspondent. If there is no match, then the correspondent is found to be an imposter, the message is sent to the GBC 700 and processing ends.
If the message content matches existing content from this correspondent, then processing continues to block 502, where the correspondent is added to the RSL as human.tentative, and processing continues in segment 300.
Segment 800 begins with the message being passed from segment 200 after a yes from query 205. The system checks in the Correspondent Specific Recipient Level Positive Recognizer table. If there is a matching entry, then the message is passed through the specific PR 801 to compare the content of the message to messages known to be from this correspondent. If the content does not match, the correspondent is determined to be an imposter, the message is sent to the GBC 700 and processing ends. If the message content matches, then processing continues to delivery 900.
Segment 600 begins with the message being received from Segment 400.
There is no matching entry in the RSL, and processing continues with the system originating a message to the purported correspondent 600. The message contains system-specific information plus an image or images or an audio file, together with a subjective question in either text or audio form, which requires a perceptive being to interpret. If there is no response or a bad response, then the message is sent to the GBC 700 and processing stops. If there is a correct response, then processing continues and the system classifies this correspondent as human. The system then checks the RDL 601. If there is a match to the correspondent in the RDL, then the message is sent to the trash 800 and processing stops. If there is no match in the RDL, then processing is passed back to segment 500.
Referring to Fig. 4, there is illustrated in a flow chart segment 600. The system extracts the correspondent, recipient and reply-to address from the original message 601. Processing continues in query 602, with the system checking to see if a reply-to address is present. If NO, then the message is sent to the GBC 700 and processing ends. If YES, then processing continues in block 603, with the system selecting standard components of the original message to create a unique identifier.
Processing continues in block 604, with the system randomly selecting a challenge method and creating a challenge message that includes a unique identifier from the original message. The system stores a pointer to the original message and a pointer to the answer to the challenge in the challenge message table, and parks the original message in the holding queue 606. Processing continues in 606a, with the challenge message being digitally signed so that other challenge-based systems can recognize it and route it accordingly. The challenge message is sent to the recipient 607. Processing resumes in 608, with the system listening for a reply to the challenge message on designated CGI ports and the SMTP daemon.
If no reply is received 609, then the message is sent to the GBC 700 and processing ends. If a reply is received, then the system extracts the unique identifier and uses it to look up the answer to the challenge message in the challenge message table 610.
If the answer is correct, then the correspondent is added to the subscription table as human.tentative and the message is sent to 501. If the answer is incorrect, the message is sent to the GBC and no action is taken on the correspondent.
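The challenge message table of blocks 604 to 610 can be sketched as a dictionary keyed by the unique identifier. In this hypothetical Python sketch, the parked message and the stored answer stand in for the two pointers described above, the timeout is an assumed system-definable period, and the return values are labels for the next block rather than real actions.

```python
import time
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PendingChallenge:
    original_message: str  # stands in for the pointer to the parked message
    answer: str            # stands in for the pointer to the correct answer
    issued_at: float


class ChallengeTable:
    def __init__(self, timeout_seconds: float) -> None:
        self.timeout = timeout_seconds  # system-definable waiting period
        self.pending: Dict[str, PendingChallenge] = {}

    def issue(self, identifier: str, original_message: str, answer: str,
              now: Optional[float] = None) -> None:
        """Record the challenge and park the original message in the holding queue."""
        issued = time.time() if now is None else now
        self.pending[identifier] = PendingChallenge(original_message, answer, issued)

    def check_reply(self, identifier: str, reply: str,
                    now: Optional[float] = None) -> str:
        """Look up the answer by identifier and decide where the message goes."""
        now = time.time() if now is None else now
        challenge = self.pending.pop(identifier, None)
        if challenge is None or now - challenge.issued_at > self.timeout:
            return "GBC 700"
        if reply.strip().lower() == challenge.answer.strip().lower():
            return "human.tentative, to 501"
        return "GBC 700"  # incorrect answer: no action on the correspondent

    def expire(self, now: Optional[float] = None) -> List[str]:
        """Remove unanswered challenges past the timeout; their messages go to the GBC."""
        now = time.time() if now is None else now
        stale = [i for i, c in self.pending.items() if now - c.issued_at > self.timeout]
        return [self.pending.pop(i).original_message for i in stale]
```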
The present invention provides a method of identifying whether a particular correspondent is a machine or not a machine through a challenge response model.
This alone, however, is insufficient in the real world, where a typical message recipient regularly receives desirable messages from machines in the form of mailing lists, information services, receipts and service messages from the machines they are attempting to correspond with. The purpose of the auto-responder segment is to sort correspondents into two categories: human and non-human. By the time a message has reached the auto-responder, it has already passed through processes that attempted to attribute it to specific human or machine correspondents that the system is aware of, or to a particular category of machines that the system is aware of, and to authenticate them.
Because the message has not been classified by the above-mentioned processes, the auto-responder proceeds to challenge the correspondent to prove he/she is human.
Embodiments of the present invention employ several different methods to accomplish this task, which are used singly or in combination.
On arrival in this segment, the inbound message is processed to derive:
1. The correspondent address;
2. The recipient address;
3. The return address;
4. The overall size of the message;
5. A unique identifier associated with the message.
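Deriving these five values might look like this with Python's standard `email` package. It is a sketch under assumptions: the Reply-To header is treated as the return address (matching query 602 of Fig. 4; Return-Path or From could also be considered), the size is counted in UTF-8 bytes of the raw message, and the identifier is a truncated SHA-256 over an illustrative choice of fields. `derive_fields` is a hypothetical name.

```python
import hashlib
from email import message_from_string
from email.utils import parseaddr
from typing import Dict, Union


def derive_fields(raw: str) -> Dict[str, Union[str, int]]:
    """Extract the five values the auto-responder needs from a raw RFC 5322 message."""
    msg = message_from_string(raw)
    correspondent = parseaddr(msg.get("From", ""))[1]
    recipient = parseaddr(msg.get("To", ""))[1]
    # Assumption: the Reply-To header is the return address; empty if absent.
    return_address = parseaddr(msg.get("Reply-To", ""))[1]
    size = len(raw.encode("utf-8"))
    # Assumed "standard components" hashed into the unique identifier.
    components = [correspondent, recipient, msg.get("Date", ""),
                  msg.get("Message-ID", ""), str(size)]
    identifier = hashlib.sha256("\n".join(components).encode("utf-8")).hexdigest()[:16]
    return {
        "correspondent": correspondent,
        "recipient": recipient,
        "return_address": return_address,
        "size": size,
        "identifier": identifier,
    }
```

An empty `return_address` corresponds to step 5 below, where processing stops and the message is dropped.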
For each method, processing proceeds as follows:
1- A challenge message is created using one of the methods listed below.
2- The challenge message is digitally signed.
3- The challenge message is sent to the indicated return address of the original message.
4- The auto-responder listens for replies to the message, either as an inbound message to a specified address or as a CGI response to a link embedded within the message.
5- If there is no return address, processing stops and the message is dropped.
6- If no response, or an incorrect response, to the challenge is received within a system-definable period, the correspondent is declared to be a machine incapable of a response and, depending on the implementation, the message is either dropped or placed in a holding queue.
7- If the correct response to the challenge is received, the correspondent is declared to be human and is added to the recipient's subscription list as human.tentative, and the message is passed to segment 300 for delivery along with an endorsement request.
8- This embodiment of the invention assumes that its existence and methods will present a challenge to those intent on distributing SPAM to overcome its safeguards. With this in mind, each of the following methods provides a greater or lesser obstacle to overcome. In combination or rotation, it is believed that the obstacles will prove too great for a machine program to overcome.
Method #1
• The system draws on a database of images that contain people, animals and objects.
• One or more of the objects, people or animals in these images convey either a human emotion or behaviour, or are holding something distinctive, etc., that is easy for the viewer to determine by looking at the image.
• Subtle features are introduced into the image to allow a richer field of questions and answers.
• The objects, people and animals in the images may or may not be labeled with a name, a number or both.
• Where the label is a number, the number is either random or an extract of the unique identifier created from the original message.
• All labels are obfuscated to defeat simple OCR attempts at deciphering them.
• The system draws on a database of introductory text to create the shell of the message explaining its purpose.
• The system draws on a database of instructions on how to respond to the challenge.
• The instructions may be obfuscated text or an audio file.
• The instruction set is linked to the image, and both are linked to a table entry of correct answers.
• There is only one correct answer to the challenge question.
The following are illustrative of the method.
• What colour dress is the smiling girl wearing?
• What is the crying boy holding?
• What is the red-haired boy doing?
• Who looks cold in this picture?
• Who is walking the dog?
• What are the children looking at?
In this method the answer can be obtained only through the human cognitive skills of the viewer.
The challenge message is constructed from the above components and sent to the reply to address of the original message.
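The following sketch shows how a Method #1 challenge might be assembled from the image, introduction and answer databases. The `ImageChallenge` record, the function names and the comparison that ignores case and whitespace are illustrative assumptions. The source specifies only that each image question has exactly one correct answer.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageChallenge:
    # One database row: an image, a question about it, and its single correct answer.
    image_file: str
    question: str
    answer: str


def build_challenge(intros: list, images: list, rng: random.Random):
    """Assemble a challenge body and return it together with the expected answer."""
    intro = rng.choice(intros)
    item = rng.choice(images)
    body = f"{intro}\n\n{item.question}\n[attached image: {item.image_file}]"
    return body, item.answer


def check_answer(expected: str, reply: str) -> bool:
    """Compare ignoring case and extra whitespace (an assumption, not from the source)."""
    def normalize(s: str) -> str:
        return " ".join(s.split()).casefold()
    return normalize(reply) == normalize(expected)
```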
Method #2
• The system extracts six characters from the unique identifier of the original message.
• The system draws on a database of introductory text to create the shell of the message explaining its purpose.
• The matching six characters are then retrieved from the obfuscated integer table.
• The system then constructs a challenge message from the above components and sends it to the return address indicated in the original message.
The elegance of this method is that the system does not need to store the answer to the question, since the answer can be obtained from the original message.
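Method #2 needs no stored answer, because the expected reply can be recomputed from the original message's unique identifier. The sketch below assumes the six characters are the first six alphanumeric characters of the identifier, since the source does not say which ones are used. It models the obfuscated integer table as a hypothetical dictionary that maps each character to a rendering.

```python
import re


def challenge_chars(message_id: str, count: int = 6) -> str:
    """Pick the challenge characters from the message's unique identifier.

    Assumption: the first `count` alphanumeric characters, upper-cased.
    """
    chars = re.sub(r"[^0-9A-Za-z]", "", message_id)[:count]
    if len(chars) < count:
        raise ValueError("identifier too short to build a challenge")
    return chars.upper()


def render_challenge(chars: str, glyph_table: dict) -> list:
    """Look up an obfuscated rendering for each character (hypothetical table)."""
    return [glyph_table[c] for c in chars]


def verify_reply(message_id: str, reply: str) -> bool:
    """No stored answer: recompute it from the original message identifier."""
    return reply.strip().upper() == challenge_chars(message_id)
```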
Method #3
• The system draws on a database of audio files that contains audio-based challenges in English and other languages.
• The system draws on a database of introductory text to create the shell of the message explaining its purpose.
• There is only one correct answer to the audio challenge question.
• The system compiles a challenge message from the introductory text plus the audio file and sends it to the indicated reply address.
Method #4
• The system draws on a database of subjective text files, each containing a paragraph of text. The reader must answer a question about the paragraph that requires understanding its meaning.
• The system draws on a database of introductory text to create the shell of the message explaining its purpose.
• There is only one correct answer to the challenge question.
• The system compiles a challenge message from the above components and sends it to the indicated reply address.
The four methods are based upon:
• Sending an image and asking a question about the image that requires human perception.
• Using an obfuscated message that requires human perception to decipher.
• Using an audio file as part of the challenge message.
• Using subjective text files and asking questions based upon the text.
An embodiment of the present invention provides a method of identifying desirable machine correspondents. It uses two slightly different applications of the same mathematical model, namely Bayesian inference. In embodiments of the invention we refer to Positive Recognizers, which are our implementation of Bayesian inference. In this implementation we use the term Positive Recognizer for the embodiment of the mathematical model in a piece of computer code.
The Positive Recognizer can be any type of text classifier, including a Bayesian filter, a memory-based decision-maker, a case-based reasoner or a neural network.
The implementation described here uses a Bayesian filter, but the procedure is identical for each of the other cases.
In embodiments of the present invention, the text classifier is trained, in the first instance, to recognize a specific correspondent from the nature and content of the correspondent's messages. In the second instance, a series of text classifiers are trained to recognize a specific category of correspondents based on messages from other members of the same category. Both are much simpler determinations than spam vs non-spam, and their accuracy rate is far greater than that of the same approach applied to recognizing spam.
On installation, a Positive Recognizer is created for each identified information service, mailing list and service message in the current message corpus. To do this, the system samples the current message corpus for information services, mailing lists, etc., and creates a small definitive message corpus for each. These corpora are drawn partly from a large existing corpus of previously identified information service and mailing list providers.
An entry describing each Positive Recognizer is added to the PR table:
• Correspondent
• Origin
• Category
• Sub-Category
• Location of classifier
The definitive message corpus is then passed through the Positive Recognizer to train it with authentic messages from this correspondent. To balance this, a similar volume of messages from a variety of sources is also passed through the Recognizer to train it with an imposter's messages.
Category-specific Positive Recognizers are intended to recognize a correspondent based on the theme of its content. Examples are weather services, stock services and lottery services, where the theme of the content is very structured. One Positive Recognizer each is created for weather, sports, news, stocks, ring tones, horoscopes and mailing lists, and each is trained with a representative sample from several members of its category. For example, a corpus created from 5 different weather services ensures that the weather PR will recognize an information service providing weather information.
Personal Correspondent-specific PRs differ from regular Correspondent-specific PRs because they are created automatically when X recipients subscribe to and endorse the same Correspondent. This happens regardless of whether the Correspondent is human or any category of machine. This group of PRs is intended to protect large numbers of users from SPAM attacks through an IMPOSTER. The system creates a Personal Correspondent-specific PR automatically: it creates the record and the PR, then funnels current message traffic through the PR to train it to recognize future traffic.
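The trigger for a Personal Correspondent-specific PR can be sketched as an endorsement counter. `EndorsementTracker`, the threshold name `threshold_x` and the list that stands in for the PR's training set are all hypothetical. The source leaves the value of X open.

```python
from collections import defaultdict


class EndorsementTracker:
    """Create a personal PR once X distinct recipients endorse a correspondent."""

    def __init__(self, threshold_x: int):
        self.threshold_x = threshold_x
        self.endorsers = defaultdict(set)  # correspondent -> recipients who endorsed
        self.personal_prs = {}             # correspondent -> training messages (stand-in PR)

    def endorse(self, correspondent: str, recipient: str) -> bool:
        """Record an endorsement; return True only when a new PR is created."""
        self.endorsers[correspondent].add(recipient)  # a set counts each recipient once
        if (correspondent not in self.personal_prs
                and len(self.endorsers[correspondent]) >= self.threshold_x):
            self.personal_prs[correspondent] = []  # create the record and an empty PR
            return True
        return False

    def funnel(self, correspondent: str, message: str) -> None:
        """Route current traffic from the correspondent into its PR's training set."""
        if correspondent in self.personal_prs:
            self.personal_prs[correspondent].append(message)
```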
The simplified description below illustrates, in the most fundamental way, how Bayesian inference is used in embodiments of the present invention. How this filter is taught to recognize correspondence from a specific sender, and generally how that is done, is described herein below.
In the second instance, we train a series of Positive Recognizers to recognize correspondents in a particular category, namely senders of stock, weather and sports information services. The approach is largely the same. In place of a corpus of messages from a single previously identified correspondent, however, we use a corpus of messages from a group of several previously identified correspondents who all send the same kind of message. Otherwise the training is the same.
Positive Recognizers calculate, from a message's contents, the probability that it belongs to each of two cases. Unlike simple content-based filters, Bayesian filtering learns by training against two corpora: one of messages previously identified as coming from a specific correspondent, and another of messages identified as specifically not from that correspondent. The result is a very robust, adaptive method for identifying messages from a specific correspondent, with a very low error rate.
Positive Recognizers are a kind of scoring content-based filter that builds its list of identifying characteristics itself, both through initial training and through updates provided by recipient input. In very basic terms, to train the filter you start with a (big) bunch of messages that you have classified as being from a specific correspondent. You add another bunch of messages from a variety of other correspondents, including spam. The filter analyzes both sets, the specific correspondent's messages and those from other sources. From them it calculates the probability of various characteristics appearing in messages from the specific correspondent or from other correspondents.
The characteristics that a Positive Recognizer can look at include the words in the body of the message, of course, and its headers (senders and message paths, for example). They also include other aspects such as HTML code (like colors), or even word pairs and phrases.
Suppose a word, "Cartesian" for example, never appears in the corpus from other correspondents but often appears in the corpus from the specific correspondent. The probability of "Cartesian" indicating a message from other correspondents is then near zero.
"Toner", on the other hand, appears exclusively, and often, in the corpus from the specific correspondent. "Toner" therefore has a very high probability of being found in the corpus from the specific correspondent, not much below 1 (100%). When a new message arrives, the Positive Recognizer analyzes it and uses the individual characteristics to calculate the probability that the complete message is from the specified correspondent.
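The calculation described above can be sketched as a two-class multinomial naive Bayes classifier with add-one smoothing. This is one standard Bayesian text filter, offered as an assumption and not as the exact filter of the embodiment. The class name `PositiveRecognizer`, the labels `specific`/`other` and the word-only tokenizer are illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    # Lower-cased word tokens; headers, HTML or word pairs could be added as features.
    return re.findall(r"[a-z0-9']+", text.lower())


class PositiveRecognizer:
    """Two-class multinomial naive Bayes: 'specific' correspondent vs 'other'."""

    def __init__(self):
        self.counts = {"specific": Counter(), "other": Counter()}
        self.docs = {"specific": 0, "other": 0}

    def train(self, text: str, label: str) -> None:
        self.counts[label].update(tokenize(text))
        self.docs[label] += 1

    def _log_score(self, words, label, vocab_size, total_docs):
        total = sum(self.counts[label].values())
        score = math.log(self.docs[label] / total_docs)  # class prior
        for w in words:
            # Add-one smoothing so an unseen word does not zero the product.
            score += math.log((self.counts[label][w] + 1) / (total + vocab_size))
        return score

    def prob_specific(self, text: str) -> float:
        if not (self.docs["specific"] and self.docs["other"]):
            raise ValueError("train both corpora first")
        words = tokenize(text)
        vocab_size = len(set(self.counts["specific"]) | set(self.counts["other"]))
        total_docs = self.docs["specific"] + self.docs["other"]
        s = self._log_score(words, "specific", vocab_size, total_docs)
        o = self._log_score(words, "other", vocab_size, total_docs)
        m = max(s, o)  # subtract the max before exponentiating to avoid underflow
        return math.exp(s - m) / (math.exp(s - m) + math.exp(o - m))
```

Working in log space keeps long messages from underflowing to zero. The add-one smoothing plays the role of the "near zero" and "not much below 1" limits in the example above.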
Now that we have a classification, the message can be used to train the filter further. Suppose the message containing both "Cartesian" and "toner" is found to belong in the corpus of other correspondents. Then either the probability of "Cartesian" indicating messages from the specified correspondent is lowered, or the probability of "toner" indicating the specific correspondent must be reconsidered. In this way Positive Recognizers can learn both from their own decisions and from the recipients' decisions (when recipients reclassify tentative classifications made by the filters).
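The retraining step can be illustrated with per-word document frequencies, in the style of the Cartesian/toner example. `WordStats` and its ratio p = f_specific / (f_specific + f_other) are an illustrative assumption. The sketch shows only that filing one more message containing "cartesian" in the other-correspondents corpus lowers that word's score.

```python
from collections import Counter


class WordStats:
    """Per-word evidence for 'specific' vs 'other', updated as messages are classified."""

    def __init__(self):
        self.doc_freq = {"specific": Counter(), "other": Counter()}
        self.docs = {"specific": 0, "other": 0}

    def learn(self, text: str, label: str) -> None:
        """Add one classified message (classified by the filter or by the recipient)."""
        self.doc_freq[label].update(set(text.lower().split()))
        self.docs[label] += 1

    def p_specific(self, word: str) -> float:
        """Share of the word's relative frequency that comes from the specific corpus."""
        fs = self.doc_freq["specific"][word] / max(self.docs["specific"], 1)
        fo = self.doc_freq["other"][word] / max(self.docs["other"], 1)
        return 0.5 if fs + fo == 0 else fs / (fs + fo)
```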
Whenever the system delivers a message that it has tentatively classified as acceptable and placed in the user's subscription list, it sends an endorsement message along with it. The tentative classification indicates one of two situations. Either the recipient has never seen correspondence from this correspondent before, or they have seen some but have not yet subscribed to further correspondence.
An example of a general form of the endorsement message follows.
The system has determined that this correspondent is not in your subscription list but appears to be a MACHINE originated WEATHER INFORMATION SERVICE message from WEATHERPLANET.ORG

Please do one of the following:
• Endorse this classification, thereby subscribing to future correspondence
• Reclassify and Endorse the correspondent from this list of classifications *********
• Declare the correspondent an Imposter
• Block the correspondent from future correspondence with you
• Read (you wish simply to read the message without committing to any other action)
If the recipient endorses the classification, the correspondent's entry in the recipient's subscription list is modified to subscribed.
If the recipient reclassifies and then endorses the correspondent, the correspondent's entry in the recipient's subscription list is changed to the correct category and sub-category and set to subscribed. This correction is sent to the appropriate Positive Recognizer as an adaptive training element.
If the recipient declares the correspondent an imposter, the correspondent is removed from the recipient's subscription list and the message is sent to the global bad corpus for adaptive training.
If the recipient simply blocks the correspondent, the correspondent is moved from the recipient's subscription list to their deny list. If the correspondent is classified as a service advisor, either tentatively by the system or through the recipient's reclassification, Endorsement does not add the correspondent to the recipient subscription list (RSL).
If the recipient simply reads the message, no action is taken.
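The endorsement outcomes above map onto a small dispatch over the recipient's subscription list. The names `Entry` and `handle_endorsement` and the data structures are hypothetical, and the sketch omits the service-advisor exception described above.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical row of the recipient subscription list (cf. Table C).
    origin: str
    category: str
    sub_category: str
    level: str = "tentative"


def handle_endorsement(action: str, correspondent: str, subs: dict, deny_list: set,
                       bad_corpus: list, pr_feedback: list, message: str,
                       new_class=None) -> None:
    """Apply one recipient reply to an endorsement message."""
    if action == "endorse":
        subs[correspondent].level = "subscribed"
    elif action == "reclassify":
        entry = subs[correspondent]
        entry.origin, entry.category, entry.sub_category = new_class
        entry.level = "subscribed"
        pr_feedback.append((new_class, message))  # adaptive training element for the PR
    elif action == "imposter":
        subs.pop(correspondent, None)
        bad_corpus.append(message)  # global bad corpus
    elif action == "block":
        subs.pop(correspondent, None)
        deny_list.add(correspondent)
    elif action == "read":
        pass  # simply reading commits the recipient to nothing
    else:
        raise ValueError(f"unknown action: {action!r}")
```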
As a message passes through the system, a memory block is constructed. It holds the correspondent address, recipient address, origin, tentative category of the message and tentative sub-category of the message. The Positive Recognizers or the auto-responder determine the origin (human or machine). The category is drawn from the Positive Classifiers and its specific text from lookup table #1. The sub-category is drawn from the Positive Classifiers. If a recipient has not previously received correspondence from this Correspondent, this information is presented to the user as the endorsement message described above. The message is presented either as an HTML message or as a very structured text message. In either case the recipient is given the opportunity to respond. With the HTML message, they can respond simply by clicking on one or more links within the message itself.
If the link chosen is Reclassify, a drop-down list of other possible classifications is presented. From it the recipient chooses a correction to the system's tentative classification. Some recipients cannot read HTML messages, as is the case on a wireless device. For them, the same message contains text that accomplishes the same purpose, and the recipient is asked to reply to the message with a keyword.
The system table definitions are provided in Tables A to D.
Table A - Global Deny List
• Correspondent
• Origin (value human/machine)
• Category (value from lookup table 1)
• Sub-category (value from lookup table 2)
• Status (permit or denied)
• Creation Date

Table B - Positive Recognizer
• Correspondent
• Origin (value human/machine)
• Category (value from lookup table 1)
• Sub-category (value from lookup table 2)
• Classifier Location
• Creation Date

Table C - Recipient Subscription List
• Correspondent
• Origin (value human/machine)
• Category (value from lookup table 1)
• Sub-category (value from lookup table 2)
• Method (recipient/autoresponder/PR/endorsement)
• Subscription level (tentative, subscribed)

Table D - Lookup Tables
• Lookup Table #1: Mailing List; Information Service; System Message; Service Message
• Lookup Table #2: News; Weather; Sports; Lottery results; Horoscopes; Stocks; Ring Tones; Graphics

The embodiments of the present invention are described herein above in the context of an email message implementation. However, the present invention is equally applicable to the Short Message Service (SMS). SMS is the transmission of short text messages to and from a mobile phone. Messages must be no longer than 160 alphanumeric characters and contain no images or graphics. In the SMS world, cell phone owners therefore subscribe to a service that gives them an email address such as 555-1212@wirelessprovider.com. Depending on the implementation, this type of service breaks an email message down into one or more 160-character pieces and delivers them to the wireless handset.
A large cellular provider can have 1 million subscribers using such a service. For the most part these services provide no control over who can send a message to a cell phone. Hence the possibility of receiving large volumes of correspondence from unrecognized or unwanted correspondents is high. Currently there is no mechanism for the subscriber to give the provider feedback in order to block unwanted correspondence.
An embodiment of the present invention can provide a complete and robust solution for the cellular subscriber and provider. The embodiment described so far remains in place, with one change. In the email implementation, messages are directed to Segment 300; in the SMS implementation, they are directed to a segment 50 instead.
Referring to Fig. 5, there is illustrated a second embodiment of the present invention for application to messaging in a wireless communications environment.
This embodiment of the present invention not only extends the full functionality of the correspondent classifier to the wireless handset but also provides the carrier with an additional revenue source. Each command initiated by the recipient consumes an SMS message, which costs the subscriber a nominal fee, for example $0.10. Segment 50 is implemented as a two-way SMS application. It serves as a message parser and command translator between the SMTP Gateway 101 and the rest of the application on one side, and the Short Message Service Center (SMSC) of a wireless network on the other. It includes some wireless-specific features that the main application body neither has nor needs.
Segment 50 begins with an SMSC module called DLYNX that breaks the message down into 160-character component parts. DLYNX is inserted into the dialogue stream between the email gateway and the SMSC.
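DLYNX's splitting into 160-character parts might look like the following, together with the "k/n" progress marker shown when a subscriber reads a multipart message. The function names are hypothetical. The sketch counts Python string characters, whereas a real SMSC limit depends on the encoding (160 characters in the GSM 7-bit alphabet, 70 in UCS-2).

```python
def split_message(body: str, size: int = 160) -> list:
    """Break a message into `size`-character parts (160 per the text)."""
    if size <= 0:
        raise ValueError("size must be positive")
    if not body:
        return [""]
    return [body[i:i + size] for i in range(0, len(body), size)]


def part_with_progress(parts: list, index: int) -> tuple:
    """Return the 0-based part `index` and a 1-based 'k/n' marker such as '2/10'."""
    return parts[index], f"{index + 1}/{len(parts)}"
```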
DLYNX recognizes the following command set. The command set is extensible: additional commands can be added without interfering with the main body of logic.
1. Read
2. More
3. Reply
4. Forward
5. Block
6. Reclassify
7. Subscribe
8. Alias
9. Number on/off
10. Retrieve address
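A minimal parser for the command set, assuming each reply starts with a keyword followed by an optional free-text argument. `COMMANDS` and `parse_command` are illustrative names, since the source does not define the exact syntax.

```python
COMMANDS = ("read", "more", "reply", "forward", "block", "reclassify",
            "subscribe", "alias", "number", "retrieve address")


def parse_command(sms_text: str) -> tuple:
    """Split an SMS reply into (keyword, argument or None)."""
    text = sms_text.strip()
    lowered = text.lower()
    # Try longer keywords first so a multi-word command is matched whole.
    for cmd in sorted(COMMANDS, key=len, reverse=True):
        if lowered == cmd or lowered.startswith(cmd + " "):
            return cmd, text[len(cmd):].strip() or None
    raise ValueError(f"unrecognized command: {text!r}")
```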
When DLYNX receives a message destined for a wireless handset, the message has already passed through all of the detection mechanisms. It is either from a correspondent already in the recipient's subscription list as subscribed, or from a tentative entry in that list.
DLYNX then does one of two things. It either sends the first part of the message to the subscriber and advises them that there are other parts, or it sends the subscriber an endorsement/read request. The request identifies the correspondent, says whether the correspondent is in the subscription list or tentative, and says whether it is a human, a mailing list or an information service. The recipient can reply with any of the keywords plus an argument. DLYNX receives the reply plus argument and carries out the instruction on the subscriber's behalf, supplying the message components and updating the subscription and deny lists.
• Read. This command is valid in the context of an endorsement/read message. The handset sends this as an SMS message instructing DLYNX that the subscriber wants to read the first message part. DLYNX responds by sending the first message part and advising how many more remain (e.g., 1/10).
• More. This command is used in the context of reading a multipart message. The handset sends it as an SMS message, instructing DLYNX to send the next message part. DLYNX responds by sending the next message part and advising how many more remain (e.g., 2/10).
• Reply. This command takes an argument of text. The handset sends the command and the text argument to DLYNX, indicating that the recipient is sending this directly to the originator of the message. DLYNX takes the new text, looks up the correspondent's address and sends the message back with or without the original message body.
• Forward. This command takes an argument of an email address. The handset sends the command and the email address to DLYNX, instructing it to forward the just-received message to the new email address. DLYNX parses the message, extracts the email address and executes the command.
• Block. This command is used in the context of the just-read message or on receipt of an endorsement/read request. The handset sends the command to DLYNX, instructing it to remove this correspondent from the subscription list and add him to the Deny List. DLYNX retrieves the original message, extracts the correspondent information and executes the command.
• Reclassify. This command takes an argument of any valid origin.category.subcategory. The handset sends the command and the argument to DLYNX. DLYNX parses the message and reclassifies the correspondent according to the new classification sent in the argument.
• Subscribe. With no argument, this command takes the correspondent of the last message and places him in the subscription list as subscribed. With an email address as argument, it places that correspondent in the subscription list as subscribed. DLYNX parses the message, extracts the argument or retrieves the original correspondent address, and executes the command, placing the correspondent in the subscription list as subscribed.
• Alias. Email in a wireless gateway is typically implemented as number@domain. This is often inconvenient for a subscriber to use and may lead to larger volumes of unwanted correspondence. This command takes an email alias as argument. The handset sends the command with the argument to DLYNX so that mail can be received at alias@domain as well as at number@domain. DLYNX parses the message and extracts the proposed email alias. It then checks the user table to see whether the alias already exists. If it does not, DLYNX associates that alias with the recipient, alongside their normal number@domain address.
• Number. This command takes an argument of either ON or OFF. The handset sends the command with the argument to DLYNX to turn on or off the ability to receive messages at number@domain. DLYNX parses the message, extracts the argument, and either adds or removes the facility for this recipient to receive messages at number@domain.
• Retrieve Address. Messages received on a wireless handset typically do not include the return address of the correspondent. Simply executing the Reply command sends a new message back along the path to the original correspondent. This command lets the holder of the handset extract the original email address from the body of the email message. The handset sends the command to DLYNX. DLYNX retrieves the correspondent's email address from the body of the message and sends it as an SMS message to the recipient handset.
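Several of these commands reduce to updates of per-subscriber lists. The sketch below is hypothetical throughout. `Subscriber` and the handler names are illustrative, the Reclassify argument is validated against the values of Table D, and the alias table is a plain dictionary that maps each alias to a phone number.

```python
ORIGINS = {"human", "machine"}
CATEGORIES = {"mailing list", "information service", "system message", "service message"}
SUB_CATEGORIES = {"news", "weather", "sports", "lottery results", "horoscopes",
                  "stocks", "ring tones", "graphics"}


class Subscriber:
    """Hypothetical per-handset state touched by the DLYNX commands."""

    def __init__(self, number: str):
        self.number = number
        self.subscriptions = {}  # correspondent -> {"level": ..., "class": ...}
        self.deny_list = set()
        self.number_enabled = True


def parse_classification(arg: str) -> tuple:
    """Validate an 'origin.category.subcategory' argument against Table D."""
    parts = [p.strip().lower() for p in arg.split(".")]
    if len(parts) != 3:
        raise ValueError("expected origin.category.subcategory")
    origin, category, sub = parts
    if origin not in ORIGINS or category not in CATEGORIES or sub not in SUB_CATEGORIES:
        raise ValueError(f"invalid classification: {arg!r}")
    return origin, category, sub


def block(s: Subscriber, correspondent: str) -> None:
    s.subscriptions.pop(correspondent, None)  # leave the subscription list
    s.deny_list.add(correspondent)            # and join the Deny List


def subscribe(s: Subscriber, correspondent: str) -> None:
    entry = s.subscriptions.setdefault(correspondent, {"class": None})
    entry["level"] = "subscribed"


def reclassify(s: Subscriber, correspondent: str, arg: str) -> None:
    entry = s.subscriptions.setdefault(correspondent, {"level": "tentative"})
    entry["class"] = parse_classification(arg)


def alias(s: Subscriber, name: str, alias_table: dict) -> bool:
    """Assign alias@domain only if nobody else holds it; number@domain stays valid."""
    if name in alias_table:
        return alias_table[name] == s.number
    alias_table[name] = s.number
    return True


def number_switch(s: Subscriber, arg: str) -> None:
    flag = arg.strip().upper()
    if flag not in ("ON", "OFF"):
        raise ValueError("Number takes ON or OFF")
    s.number_enabled = flag == "ON"
```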
Because this system reduces the volume of unwanted messaging traffic, it reduces disk storage requirements and server-to-client download bandwidth. Both of these resources are paid for by either the service provider or the subscriber, so the reduction provides clear benefits: faster, more useful mail with lower associated overhead costs.
Finally, messaging carriers that follow the "sender-pays" fee model stand to benefit greatly from this embodiment of the present invention. This embodiment tends to increase profitable subscriber-originated traffic, whereas delivering unsolicited, unwanted messages yields no income at all and raises operating costs. A prime example of this business paradigm exists in the North American SMS market. There, market conditions force SMS carriers to deliver inbound (mobile-terminated) traffic at no charge, yet carriers command a very lucrative fee for subscriber-originated messages. This embodiment is therefore doubly beneficial for "sender-pays" carriers. First, it reduces operating overhead by dramatically reducing the amount of unwanted traffic they must store and deliver. Second, it encourages subscribers to originate more messages, both by making the message system more useful and through the subscribers' desire to interact with the classification engine.

Embodiments of the present invention may also require a method for adding new correspondents, contained in messages initiated by subscribers, to the subscriber's subscription list. This can be provided by a modified SMTP daemon. When a subscriber uses it to send an outbound message, the daemon traps the message before sending, extracts any correspondent addresses and sends them for inclusion in the subscriber's subscription list as subscribed. This prevents a challenge message from being sent to correspondents with whom the subscriber has already initiated correspondence.
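The modified SMTP daemon's trap can be sketched with Python's standard `email` package. The function names are hypothetical, and reading the To, Cc and Bcc headers is an assumption. A real daemon would more likely use the envelope recipients, since Bcc is usually absent from the transmitted headers. Lower-casing the whole address is also a simplifying assumption.

```python
from email import message_from_string
from email.utils import getaddresses


def outbound_correspondents(raw_message: str) -> set:
    """Collect recipient addresses from an outbound message's headers."""
    msg = message_from_string(raw_message)
    fields = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    return {addr.lower() for _, addr in getaddresses(fields) if addr}


def trap_outbound(raw_message: str, subscription_list: dict) -> None:
    """Add every correspondent the subscriber writes to as 'subscribed'."""
    for addr in outbound_correspondents(raw_message):
        subscription_list[addr] = "subscribed"
```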
The above-described embodiments deal with methods of classifying correspondents. When the system is applied to wireless communications, as in the preceding embodiment, the wireless environment may be somewhat different. It may consequently require an alternative embodiment with additional logic and technology.
A wireless carrier can have millions of subscribers, each identified by an individual cell-phone number. The wireless carrier creates an email gateway that lets the outside world send email to each subscriber at the following address: phonenumber@domain.com.
The spammer sees this as an inviting target, since he obtains the addresses of all the millions of subscribers simply by looking at the carrier's number range. The result is what the industry calls a "carpet bomb", covering the carrier's entire subscriber base. This not only delivers unwanted messages to subscribers but can also clog the carrier's system and may prevent it from delivering regular traffic.
Sophisticated spammers targeting carriers use a variety of tactics to thwart existing anti-SPAM technology. For example, they change their message every few messages, or change the originating IP address across thousands of IP addresses. Consequently, by the time these technologies determine that a message is SPAM, a flood of messages has already reached the subscribers.
The challenge method described herein above is extremely effective in this circumstance; however, some carriers may prefer alternative solutions.
Referring to Fig. 6, there is illustrated a third embodiment of the present invention for application to messaging in a wireless communications environment.
The present embodiment adds a threat indicator 1200 and a message sorter 1210.
In a wireless environment a subscriber must subscribe to the email system, so we can know which of the millions of subscribers have agreed to accept email on their wireless devices. As messages flow through the system on their way to wireless handsets, the message sorter 1210 splits the traffic into two queues: messages destined for Subscribed Recipients and messages destined for Unsubscribed Recipients.
Thus, the system of Fig. 6 can discriminate between the two.
In general, new subscriptions to such a service should arrive at a fairly level rate, typically representing only a small percentage of existing wireless subscribers.
Hence, a sudden escalation in the number of messages arriving for unsubscribed recipients very likely means a spammer is sending an attack. Such a spammer cannot differentiate between subscribers who have subscribed to email and those who have not. Detecting this event early lets the system adapt and put additional mechanisms in place to deal with the attack.
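The sorter and the escalation test can be sketched as follows. `sort_message`, `SpikeDetector` and its parameters (`window`, `factor`, `min_count`) are hypothetical. The source requires only that a sudden rise in traffic for unsubscribed recipients be detected.

```python
from collections import deque


def sort_message(recipient: str, subscribed: set, subscribed_q: list,
                 unsubscribed_q: list, msg) -> None:
    """Message sorter 1210: route by whether the recipient has subscribed to email."""
    (subscribed_q if recipient in subscribed else unsubscribed_q).append(msg)


class SpikeDetector:
    """Flag an interval whose unsubscribed-recipient volume jumps far above recent history."""

    def __init__(self, window: int = 24, factor: float = 5.0, min_count: int = 100):
        self.history = deque(maxlen=window)  # message counts for recent intervals
        self.factor = factor
        self.min_count = min_count

    def observe(self, count: int) -> bool:
        baseline = sum(self.history) / len(self.history) if self.history else None
        self.history.append(count)
        if baseline is None:
            return False  # no history yet to compare against
        return count >= self.min_count and count > self.factor * max(baseline, 1.0)
```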
As messages identified as being for unsubscribed recipients enter the system, a fuzzy signature is created for each. A fuzzy signature differs from a more traditional MD5 hash in that it ignores small differences in the target text strings. A difference of a word or two between two email messages is therefore statistically irrelevant. When a spammer changes his message slightly, the changed messages are still identified by a single fuzzy signature.
As each new message for unsubscribed recipients arrives, a new signature is created and compared with those already in the database. If the signature is already there, or there is a very close match, the count associated with the original signature is incremented and the new signature is discarded. In this way, the system builds a frequency distribution of identical or very similar messages.
When the count for one or more of these signatures exceeds a predetermined threshold, the threat indicator 1200 is turned ON. The signature(s) that triggered the threat indicator are made available to a signature analyzer in segment 1201.
As long as the threat indicator is ON, all messages are routed through the signature analyzer in segment 1201. This continues until the number of messages matching the trigger signature has stayed below a predetermined level for a predetermined period, for example until it has been 0 for at least 48 hours.
Any messages already in the delivery queue for wireless subscribers that match the signature that triggered the threat indicator are removed from the queue. This reserves more stringent measures against potential SPAM for instances where the carrier is definitely under a carpet bomb SPAM attack. The other anti-SPAM measures, described herein above with regard to Fig. 5, remain available to handle lower volumes of normal SPAM.
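The ON/OFF behaviour of threat indicator 1200 and the queue purge can be sketched as follows. `ThreatIndicator`, `purge_queue` and the per-interval match count passed to `observe` are assumptions, and the 48-hour quiet period is the example the text gives.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThreatIndicator:
    trigger_count: int                 # predetermined threshold that turns the indicator ON
    quiet_level: int = 0               # matches per interval regarded as "quiet"
    quiet_period_s: float = 48 * 3600  # how long traffic must stay quiet before OFF
    on: bool = False
    quiet_since: Optional[float] = None

    def observe(self, matches: int, now: float) -> bool:
        """Feed the latest per-interval count of messages matching the trigger signature."""
        if not self.on:
            if matches > self.trigger_count:
                self.on = True  # from now on, route all messages through analyzer 1201
                self.quiet_since = None
            return self.on
        if matches <= self.quiet_level:
            if self.quiet_since is None:
                self.quiet_since = now  # start the quiet timer
            elif now - self.quiet_since >= self.quiet_period_s:
                self.on = False
                self.quiet_since = None
        else:
            self.quiet_since = None  # traffic resumed; restart the quiet timer
        return self.on


def purge_queue(queue: list, matches_trigger) -> int:
    """Remove queued messages matching the trigger signature; return how many were removed."""
    kept = [m for m in queue if not matches_trigger(m)]
    removed = len(queue) - len(kept)
    queue[:] = kept
    return removed
```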
Numerous modifications, variations and adaptations may be made to the particular embodiments of the present invention described above without departing from the scope of the invention as defined in the claims.
When DLYN'X receives a message destined for a wireless handset it has already passed through all of the detection mechanisms and is either a message already in the recipients subscription list us subscribed or a tentative entry in that list.
DLYNX
either sends the first part of the message to the subscriber and advises them there are other parts OF: DLYN3~ sends an endorsement/read request to the subscriber advising them of the correspondent., whether they are in their subscription list or tentative and whether they ;ire human, a mailing list, information service. The recipient can reply with ariy of the keywords plus an argument. DLYNX receives the reply plus argument and carries out the instruction on the subscriber's behalf supplying the message components and updating subscription and deny lists.
~ Read. 'This command is valid in the context of an endorsement/read message. 'Fhe handset sends this as an SMS message instructing DLYNX that the subscriber wants to read the first message part 2.5 DLYNX x~ponds by setlding the first message part and advising how many more r~er~nain 110.
~ More. This conunand is in the context of reading a multipart message.
The handset sends this as and SMS message and instructs DLYNX to send the ne~;t nZessage part. DLYNX respartds by sending the next message part and advising how many more remain (e.g., 2/10) ~ Reply. This command takes an argument of text and the handset sends the command arid the text argument to DLYNX indicates that the recipient i~~ sending this directly io the originator of the message.
DL'YNX talcea the new text, looks up the corespondents address and scads the ir,~e:;sage back with our without the original message body.
~ Forward. ',1 his command takes an argut~ent of an email address and the taandset sends the comrn.and and the email address to DLYNX
indicating ~io forward the just received message to the new ernail address. r7I...Y1~1X parses the message extracting the email address and l:0 executes the command.
~ Block. This command is in the context of the just read message or on receipt of a~a endorsement~read request. The handset sends the command to I)LYNX instructing DT.YNX to remove this correspondent frorta the subscription list and add him to the Deny List.
15 DL'YhTX retrieve , the oti,ginal message, extracts the correspondent information and executes the command.
Reclassify. 7.'his cornrnand takes an argument of any of the valid Argirz.cate,8ary.subcategory. The handset sends the command and the 20 argument to DLYNX. DLYNX parses the message and reclassifies the correspondetxt according to the new classification sent in the arguzneztt ~ Subscribe. ".this command with no argument takes the correspondent of the Last message and places him in tlae subscription list as 2:i subscribed. 'rrtis command with the argur;xent of an email address :places that cnrrespandent in the subscription list as subscribed.
:DLY'NX paT~~es the message extracting the argument or retrieving the ~~riginal coTr~~spondcnt address and executes the command placing the correspondent in the subscription list as subscribed 3t1 0889'757?CA.
Alias. 'Th~~ typical implementation of email in a wireless gateway is n ~r c~ox~. This is afters inconvenient for a subscriber to use and may lead to larger volumes of unwanted correspondence. This command tal~:es the argument of an email alias and the handset sends the cornmarid with the argument to DLYNX to provide for mail to be received at ~ ias andomain as well as number a~.~c~omain _ DLYNX
parses the message extracting the proposed etrlail alias. It then looks in the user table to see if it already exists. if it does not it assigns that alias to be associated with that Tecipient alolig with their normal 'l0 nu~nberCr,~.dontain address.
• Number. This command takes the argument of either ON or OFF. The handset sends the command with the argument to DLYNX to turn on or off the ability for messages to be received at number@domain. DLYNX parses the message, extracts the argument and either adds or removes the facility for this recipient to receive messages at number@domain.
• Retrieve Address. Messages received on a wireless handset typically do not include the return address of the correspondent. Simply executing the reply command sends a new message back along the path to the original correspondent. This command permits the holder of the handset to extract the original email address from the body of the email message. The handset sends this command to DLYNX, and DLYNX retrieves the correspondent's email address from the body of the message and sends it as an SMS message to the recipient handset.
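The command set above amounts to a small text protocol between the handset and DLYNX. The sketch below is a minimal Python dispatcher under stated assumptions: the `VERB [ARGUMENT]` syntax, the verb `ADDRESS` for Retrieve Address, and the `SubscriberState` record are all hypothetical, since the source does not specify a wire format or storage layout. Actual forwarding and SMS delivery are stubbed out as returned strings.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SubscriberState:
    # Hypothetical per-subscriber record; field names are illustrative only.
    last_sender: Optional[str] = None              # correspondent of the last message
    subscribed: set = field(default_factory=set)   # subscription list
    denied: set = field(default_factory=set)       # Deny List
    classes: dict = field(default_factory=dict)    # correspondent -> "category.subcategory"
    alias: Optional[str] = None                    # alias part of alias@domain, if assigned
    number_enabled: bool = True                    # accept mail at number@domain?


def handle_command(text: str, state: SubscriberState, aliases_in_use: set) -> str:
    """Apply one handset command of the assumed form 'VERB [ARGUMENT]'."""
    verb, _, arg = text.strip().partition(" ")
    verb, arg = verb.upper(), arg.strip()
    sender = state.last_sender

    if verb == "FORWARD":
        if "@" not in arg:
            return "error: FORWARD needs an email address"
        return f"forwarding last message to {arg}"
    if verb == "BLOCK":
        if sender is None:
            return "error: no message in context"
        state.subscribed.discard(sender)   # remove from the subscription list
        state.denied.add(sender)           # and add to the Deny List
        return f"blocked {sender}"
    if verb == "RECLASSIFY":
        if sender is None or arg.count(".") != 1:
            return "error: RECLASSIFY needs category.subcategory"
        state.classes[sender] = arg
        return f"{sender} reclassified as {arg}"
    if verb == "SUBSCRIBE":
        who = arg or sender                # no argument: use the last correspondent
        if not who:
            return "error: nobody to subscribe"
        state.subscribed.add(who)
        return f"subscribed {who}"
    if verb == "ALIAS":
        if not arg or arg in aliases_in_use:   # look the alias up in the user table
            return "error: alias unavailable"
        aliases_in_use.add(arg)
        state.alias = arg
        return f"alias {arg} assigned"
    if verb == "NUMBER":
        if arg.upper() not in ("ON", "OFF"):
            return "error: NUMBER takes ON or OFF"
        state.number_enabled = arg.upper() == "ON"
        return f"number@domain delivery {arg.upper()}"
    if verb == "ADDRESS":                  # Retrieve Address; result is sent as an SMS
        return sender or "error: no message in context"
    return f"error: unknown command {verb}"
```

For example, `handle_command("SUBSCRIBE", SubscriberState(last_sender="alice@example.com"), set())` places `alice@example.com` on the subscription list and returns `"subscribed alice@example.com"`.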
As this system reduces the volume of unwanted messaging traffic, it also reduces disk storage requirements and server-to-client download bandwidth. Both of these resources are paid for by either the service provider or the subscriber, so reducing them provides clear benefits: faster, more useful mail with lower associated overhead costs.
Finally, messaging carriers following the "sender-pays" fee model stand to benefit greatly from this embodiment of the present invention. This embodiment tends to increase profitable subscriber-originated traffic, whereas delivering unsolicited, unwanted messages yields no income at all and raises operating costs. A prime example of this business paradigm exists in the North American SMS market, where SMS carriers are forced by market conditions to deliver inbound (mobile-terminated) traffic at no charge, yet command a very lucrative fee for subscriber-originated messages. This embodiment, then, is doubly beneficial for "sender-pays" carriers: it reduces operating overhead by dramatically reducing the amount of unwanted traffic they must store and deliver, and it encourages subscribers to originate more messages, both by making their message system more useful and through the subscribers' desire to interact with the classification engine.

Embodiments of the present invention may also require a method by which new correspondents contained in messages initiated by subscribers are added to the subscribers' subscription lists. This can be provided by a modified SMTP daemon which, when used by a subscriber to send an outbound message, traps the message before sending, extracts any correspondent addresses and sends them for inclusion in the subscriber's subscription list as subscribed. This prevents correspondents with whom the subscriber has already initiated correspondence from being sent a challenge message.
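The outbound trap can be sketched with Python's standard `email` package. This is a hypothetical hook, not the modified SMTP daemon itself: `on_outbound` assumes the daemon hands it the raw message before relaying, and that the subscription list is a simple address-to-status mapping.

```python
from email import message_from_string
from email.utils import getaddresses


def recipients_to_subscribe(raw_message: str) -> set:
    """Return every correspondent address in an outbound message's To/Cc/Bcc headers."""
    msg = message_from_string(raw_message)
    headers = []
    for name in ("To", "Cc", "Bcc"):
        headers.extend(msg.get_all(name, []))   # empty list if the header is absent
    # getaddresses splits "Alice <a@x.com>, b@y.org" into (name, address) pairs.
    return {addr.lower() for _, addr in getaddresses(headers) if addr}


def on_outbound(raw_message: str, subscription_list: dict) -> None:
    """Hypothetical hook called before relaying: mark each correspondent as subscribed."""
    for addr in recipients_to_subscribe(raw_message):
        subscription_list[addr] = "subscribed"
```

Addresses are lower-cased here as a simplifying assumption; strictly, only the domain part of an email address is guaranteed to be case-insensitive.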
The above-described embodiments deal with methods of classifying correspondents. When the system is applied to wireless communications, as in the preceding embodiment, the wireless environment may be somewhat different and consequently requires an alternative embodiment having additional logic and technology.
A wireless carrier can have millions of subscribers, each with their individual cell-phone number as their identifier. The wireless carrier creates an email gateway and allows the outside world to send email to the subscriber at the following address: phonenumber@domain.com.
The spammer sees this as an inviting target, since he now has the addresses of all of the carrier's millions of subscribers simply by looking at the carrier's number range. The result is what the industry refers to as a "carpet bomb", covering the carrier's entire subscriber base. This not only results in unwanted messages being delivered to subscribers, but can also clog up the carrier's system and may prevent the carrier from delivering regular traffic.
Sophisticated spammers targeting carriers use a variety of tactics, for example changing their message every few messages and rotating the originating IP address across thousands of IP addresses to thwart existing anti-SPAM technology. Consequently, by the time these technologies determine that a message is a SPAM message, a flood of messages has already reached the subscribers.
The challenge method already described herein above is extremely effective in this circumstance; however, some carriers may prefer alternative solutions.
Referring to Fig. 6, there is illustrated a third embodiment of the present invention for application to messaging in a wireless communications environment.
The present embodiment adds a threat indicator 1200 and a message sorter 1210.
Since, in a wireless environment, a subscriber must subscribe to the email system, we can know which of the millions of subscribers have agreed to accept email on their wireless devices. As messages flow through the system on their way to wireless handsets, the message sorter 1210 splits the traffic into two queues: those destined for Subscribed Recipients and those destined for Unsubscribed Recipients.
Thus, the system of Fig. 6 can discriminate between the two.
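A minimal sketch of the sorter's split, assuming each message carries a single recipient address and that the set of subscribed recipients is available in memory (both are assumptions; the source does not say how message sorter 1210 looks up subscriptions):

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    recipient: str   # e.g. "5551234567@domain.com"
    body: str


def sort_messages(messages, subscribed_recipients):
    """Split inbound traffic into a Subscribed Recipients queue and an
    Unsubscribed Recipients queue, preserving arrival order in each."""
    subscribed_q, unsubscribed_q = deque(), deque()
    for m in messages:
        target = subscribed_q if m.recipient in subscribed_recipients else unsubscribed_q
        target.append(m)
    return subscribed_q, unsubscribed_q
```

Only the unsubscribed queue needs the heavier signature processing described next, which keeps normal traffic on the fast path.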
In general, the rate of new subscriptions to such a service should be fairly steady, typically representing only a small percentage of existing wireless subscribers.
Hence, if the number of messages arriving for unsubscribed recipients suddenly escalates, it is very likely that a spammer is launching an attack and cannot differentiate between subscribers who have signed up for email and those who have not. Detecting this event early allows the system to adapt and put additional mechanisms in place to deal with the attack.
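The source does not state how a sudden escalation is measured. One simple possibility, sketched below as an assumption, is to count unsubscribed-recipient messages per fixed time window and flag a window whose count exceeds a multiple of the recent average; `history`, `factor` and `min_count` are illustrative parameters, not values from the document.

```python
from collections import deque


class SurgeDetector:
    """Flags a sudden escalation in traffic for unsubscribed recipients.

    Assumed rule: a window is a surge when its count exceeds `factor` times
    the mean count of the previous `history` windows."""

    def __init__(self, history: int = 24, factor: float = 5.0, min_count: int = 100):
        self.past = deque(maxlen=history)   # counts of earlier windows
        self.factor = factor
        self.min_count = min_count          # ignore small absolute volumes

    def close_window(self, count: int) -> bool:
        """Record the message count for one finished window; return True on a surge."""
        baseline = sum(self.past) / len(self.past) if self.past else None
        self.past.append(count)
        if baseline is None or count < self.min_count:
            return False
        return count > self.factor * baseline
```

With `history=3`, windows of 20, 25 and 30 messages give a baseline of 25, so a following window of 400 messages is flagged.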
As messages identified as being for unsubscribed recipients enter the system, a fuzzy signature is created for each. A fuzzy signature differs from a more traditional MD5 hash in that small differences in the target text are ignored, making a difference of a word or two between two email messages statistically irrelevant. This means that when a spammer changes his message slightly, the changed messages are still identified by a single fuzzy signature.
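The document does not name a fuzzy-signature algorithm. As one concrete stand-in, the sketch below uses SimHash (Charikar's locality-sensitive hash): messages that differ by a word or two yield signatures differing in only a few bits, so similarity becomes a small Hamming distance rather than the exact equality an MD5 digest requires. The word tokenization and the 64-bit width are assumptions.

```python
import hashlib
import re


def simhash(text: str, bits: int = 64) -> int:
    """SimHash over the lower-cased word tokens of `text`."""
    weights = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        # Hash each token to `bits` bits (first 8 bytes of its MD5 digest).
        h = int.from_bytes(hashlib.md5(token.encode()).digest()[: bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (h >> i) & 1 else -1   # each token votes per bit
    # A bit is set in the signature when the majority of tokens set it.
    return sum(1 << i for i in range(bits) if weights[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")
```

For long messages, changing a single word typically moves the signature by only a bit or two, while unrelated messages differ in roughly half of the 64 bits.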
As each new message for unsubscribed recipients arrives, a new signature is created and compared to those already in the database. If the signature is already there, or there is a very close match, the count associated with the original signature is incremented and the current signature is discarded.
In this way, the system creates a frequency distribution based on identical or very similar messages.
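The counting step can be sketched as follows, assuming integer fuzzy signatures (such as SimHash values) and treating "a very close match" as a Hamming distance of at most `max_distance` bits; both the representation and the tolerance are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class SignatureStore:
    """Frequency distribution of fuzzy signatures for unsubscribed-recipient traffic."""
    max_distance: int = 3                        # assumed "very close match" tolerance
    counts: dict = field(default_factory=dict)   # original signature -> message count

    def add(self, sig: int) -> int:
        """Record one message; return the signature it was counted against."""
        for original in self.counts:
            if bin(original ^ sig).count("1") <= self.max_distance:
                self.counts[original] += 1       # close match: bump count, discard `sig`
                return original
        self.counts[sig] = 1                     # first message of a new kind
        return sig
```

The linear scan keeps the sketch short; at carrier volumes, an index such as locality-sensitive-hash banding would avoid comparing each new signature against the whole database.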
When the count for one or more of these signatures exceeds a predetermined threshold, the threat indicator 1200 is turned ON and the signature(s) triggering the threat indicator are made available to a signature analyzer in segment 1201.
As long as the threat indicator is ON, all messages are routed through the signature analyzer in 1201. This continues until the number of messages matching the trigger signature remains below a predetermined level for a predetermined period, for example until it has been 0 for at least 48 hours.
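The ON/OFF logic of threat indicator 1200 can be sketched as a small state machine. The threshold, quiet level and 48-hour quiet period mirror the example in the text, but the class, its method names, and the rule that the indicator stays ON for at least one full quiet period are assumptions made for illustration.

```python
from collections import deque


class ThreatIndicator:
    """Turns ON when any signature's count exceeds `threshold`; turns OFF once
    messages matching the triggering signature(s) have stayed at or below
    `quiet_level` for a full `quiet_period` seconds."""

    def __init__(self, threshold=1000, quiet_level=0, quiet_period=48 * 3600):
        self.threshold = threshold
        self.quiet_level = quiet_level
        self.quiet_period = quiet_period
        self.on = False
        self.on_since = 0.0
        self.triggers = set()   # signatures handed to the signature analyzer (1201)
        self.recent = deque()   # timestamps of messages matching a trigger

    def observe(self, sig: int, count: int, now: float) -> None:
        """Call after each message with its matched signature and updated count."""
        if count > self.threshold:
            if not self.on:
                self.on, self.on_since = True, now
            self.triggers.add(sig)
        if self.on and sig in self.triggers:
            self.recent.append(now)

    def tick(self, now: float) -> bool:
        """Re-evaluate the OFF condition and return the current state."""
        while self.recent and self.recent[0] <= now - self.quiet_period:
            self.recent.popleft()   # drop matches older than the quiet window
        quiet = len(self.recent) <= self.quiet_level
        if self.on and quiet and now - self.on_since >= self.quiet_period:
            self.on = False
            self.triggers.clear()
            self.recent.clear()
        return self.on
```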
Any messages matching the signature that triggered the threat indicator that are already in the queue for delivery to wireless subscribers are removed from the delivery queue. This allows more stringent measures for dealing with potential SPAM to be reserved for those instances where the carrier is definitely under a carpet-bomb SPAM attack. The other anti-SPAM measures, described herein above with regard to Fig. 5, remain available to handle lower volumes of normal SPAM.
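Purging the delivery queue can be sketched as a single pass that keeps non-matching messages in their original order. `signature_of` and the Hamming-distance tolerance are hypothetical parameters; the source says only that messages matching the triggering signature are removed.

```python
from collections import deque


def purge_queue(delivery_queue: deque, signature_of, triggers: set, max_distance: int = 3) -> list:
    """Remove queued messages whose fuzzy signature is within `max_distance`
    bits of any triggering signature; return the removed messages."""
    kept, removed = deque(), []
    while delivery_queue:
        msg = delivery_queue.popleft()
        sig = signature_of(msg)
        if any(bin(sig ^ t).count("1") <= max_distance for t in triggers):
            removed.append(msg)      # matches an attack signature: do not deliver
        else:
            kept.append(msg)         # unrelated traffic keeps its queue position
    delivery_queue.extend(kept)
    return removed
```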
Numerous modifications, variations and adaptations may be made to the particular embodiments of the present invention described above without departing from the scope of the invention as defined in the claims.
Claims (43)
1. A method of identifying and classifying correspondents sending electronic messages, said method comprising the steps of:
classifying a correspondent as one of a recognized machine or human and unrecognized;
and if classified as an unrecognized, sending a request for response from the correspondent in an attempt to classify as human.
2. A method as claimed in claim 1 wherein the request includes information that is only interpretable with the aid of human perception.
3. A method as claimed in claim 1 wherein the step of classifying includes the step of classifying a sender of a message as one of recognized or unrecognized.
4. A method as claimed in claim 3 wherein the step of classifying includes the steps of classifying a sender of a message as one of recognized or unrecognized and if recognized, determining that contents of the message are authentic to the sender.
5. A method of identifying and classifying correspondents sending electronic messages, said method comprising the step of:
determining that contents of a message are authentic to a category of correspondents.
6. A method as claimed in claim 5 wherein the step of determining includes the step of applying a plurality of category-based recognizers to the contents.
7. A method as claimed in claim 5 further comprising the step of classifying a sender of a message as one of recognized or unrecognized.
8. A method as claimed in claim 5 further comprising the step of classifying a sender of a message as one of recognized or unrecognized and if recognized, determining that contents of the message are authentic to the sender.
9. A method of identifying and classifying correspondents sending electronic messages, said method comprising the steps of:
requesting confirmation from a potential recipient that receiving a message from a previously unseen correspondent is desired and that an initial classification of the correspondent is correct;
upon confirmation without reclassification, delivering the message to the recipient;
and upon confirmation with reclassification, changing the classification of the correspondent and delivering the message to the recipient.
10. A method as claimed in claim 9 further comprising the step of classifying a sender of a message as one of recognized or unrecognized and if unrecognized requesting a response from the correspondent in an attempt to classify as human.
11. A method of identifying and classifying correspondents sending electronic messages, said method comprising the steps of:
classifying a sender of a message as one of recognized or unrecognized; and if recognized, determining that contents of the message are authentic to the sender.
12. A method as claimed in claim 11 further comprising the step of determining that contents of a message are authentic to a category of correspondents.
13. A method as claimed in claim 11 further comprising the step of requesting confirmation from a potential recipient that receiving a message from a previously unseen recipient is desired.
14. A method as claimed in claim 13 further comprising the step of upon confirmation, delivering the message to the recipient.
15. A method of identifying and classifying correspondents sending electronic messages, said method comprising the steps of:
classifying a sender of a message as one of recognized or unrecognized;
classifying a correspondent as one of a machine and a person;
if classified as unrecognized, sending a request for response from the correspondent in an attempt to classify as human;
if recognized, determining one of that contents of the message are authentic to the sender and that contents of a message are authentic to a category of correspondents;
if unable to determine total acceptability of the correspondent, requesting confirmation from a potential recipient that receiving a message from a previously unseen correspondent is desired and that an initial classification of the correspondent is correct;
upon confirmation without reclassification, delivering the message to the recipient;
and upon confirmation with reclassification, changing the classification of the correspondent and delivering the message to the recipient.
16. A method as claimed in any of claims 1 -15 further comprising the step of screening messages based upon a predetermined list.
17. An apparatus for identifying and classifying correspondents sending electronic messages comprising:
means for classifying a correspondent as one of recognized or unrecognized; and means for sending a request for response from the correspondent if classified as unrecognized in an attempt to classify as human.
18. Apparatus as claimed in claim 17 wherein the request includes information that is only interpretable with the aid of human perception.
19. Apparatus as claimed in claim 17 wherein the means for classifying includes means for classifying a sender of a message as one of recognized or unrecognized.
20. Apparatus as claimed in claim 19 wherein the means for classifying includes means for classifying a sender of a message as one of recognized or unrecognized and means for determining that contents of the message are authentic to the sender if recognized.
21. Apparatus as claimed in claim 20 wherein the means for classifying includes means for determining that contents of a message are authentic to a category of correspondents.
22. Apparatus as claimed in claim 21 wherein the means for determining includes a plurality of category-based recognizers applied to the contents.
23. Apparatus as claimed in claim 21 further comprising means for classifying a sender of a message as one of recognized or unrecognized.
24. Apparatus as claimed in claim 21 further comprising means for classifying a sender of a message as one of recognized or unrecognized and if recognized, determining that contents of the message are authentic to the sender.
25. Apparatus for identifying and classifying correspondents sending electronic messages comprising:
means for requesting confirmation from a potential recipient that receiving a message from a previously unseen correspondent is desired and that an initial classification of the correspondent is correct;
means for delivering the message to the recipient upon confirmation without reclassification; and means for changing the classification of the correspondent upon confirmation with reclassification and delivering the message to the recipient.
26. Apparatus as claimed in claim 25 further comprising means for classifying a sender of a message as one of recognized or unrecognized and means for obtaining confirmation of human/machine origin if unrecognized.
27. Apparatus for identifying and classifying correspondents sending electronic messages comprising:
means for classifying a sender of a message as one of recognized or unrecognized;
and means for determining that contents of the message are authentic to the sender if recognized.
28. Apparatus as claimed in claim 27 further comprising means for determining that contents of a message are authentic to a category of correspondents.
29. Apparatus as claimed in claim 27 further comprising means for requesting confirmation from a potential recipient that receiving a message from an unrecognized recipient is desired.
30. Apparatus as claimed in claim 29 further comprising means for delivering the message to the recipient upon confirmation.
31. Apparatus for identifying and classifying correspondents sending electronic messages comprising:
means for classifying a sender of a message as one of recognized or unrecognized;
means for classifying a correspondent as one of a machine and a person;
means for sending a request for response from the correspondent if classified as unrecognized;
means for determining one of that contents of the message are authentic to the sender and that contents of a message are authentic to a category of correspondents if recognized;
means for requesting confirmation from a potential recipient that receiving a message from a previously unseen correspondent is desired and that an initial classification of the correspondent is correct; and means for delivering the message to the recipient upon confirmation without reclassification and upon confirmation with reclassification, changing the classification of the correspondent and delivering the message to the recipient.
32. Apparatus as claimed in any of claims 17 -31 further comprising means for screening messages based upon a predetermined list.
33. A method of identifying and classifying correspondents sending electronic messages, said method comprising the steps of:
classifying a receiver of a message as one of subscribed and unsubscribed;
if unsubscribed, preparing a signature from contents of the message;
comparing the signature to determine whether a similar message was previously received;
and if yes, determining if a predetermined threshold has been reached for the similar message and blocking any further similar messages.
34. A method as claimed in claim 33 wherein the step of determining if a predetermined threshold has been reached includes the step of incrementing a count associated with the similar message.
35. A method as claimed in claim 34 wherein the step determining includes the step of comparing the count with the predetermined threshold.
36. A method as claimed in claim 33 wherein the step of preparing the signature prepares a fuzzy signature.
37. A method as claimed in claim 36 wherein the step of comparing the signature uses fuzzy logic.
38. Apparatus for identifying and classifying correspondents sending electronic messages comprising:
means for classifying a receiver of a message as one of subscribed and unsubscribed;
means for preparing a signature from contents of the message if unsubscribed;
means for comparing the signature to determine whether a similar message was previously received; and means for determining if a predetermined threshold has been reached for the similar message and blocking any further similar messages.
39. Apparatus as claimed in claim 38 wherein the means for determining if a predetermined threshold has been reached includes means for incrementing a count associated with the similar message.
40. Apparatus as claimed in claim 39 wherein the means for determining includes a means for comparing the count with the predetermined threshold.
41. Apparatus as claimed in claim 38 wherein the means for preparing the signature includes means for preparing a fuzzy signature.
42. Apparatus as claimed in claim 41 wherein the means for comparing the signature includes fuzzy logic.
43. Apparatus as claimed in claim 38 further comprising a threat indicator responsive to the means for determining if a predetermined threshold has been reached.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002423654A CA2423654A1 (en) | 2003-03-28 | 2003-03-28 | Method and apparatus for identification and classification of correspondents sending electronic messages |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CA002423654A CA2423654A1 (en) | 2003-03-28 | 2003-03-28 | Method and apparatus for identification and classification of correspondents sending electronic messages |
Publications (1)
Publication Number | Publication Date |
---|---|
CA2423654A1 (en) | 2004-09-28 |
Family
ID=33034938
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CA002423654A Abandoned CA2423654A1 (en) | 2003-03-28 | 2003-03-28 | Method and apparatus for identification and classification of correspondents sending electronic messages |
Country Status (1)
Country | Link |
---|---|
CA (1) | CA2423654A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110913353A (en) * | 2018-09-17 | 2020-03-24 | 阿里巴巴集团控股有限公司 | Short message classification method and device |
-
2003
- 2003-03-28 CA CA002423654A patent/CA2423654A1/en not_active Abandoned
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6199102B1 (en) | Method and system for filtering electronic messages | |
US10042919B2 (en) | Using distinguishing properties to classify messages | |
US9386046B2 (en) | Statistical message classifier | |
RU2381551C2 (en) | Spam detector giving identification requests | |
US7653606B2 (en) | Dynamic message filtering | |
RU2378692C2 (en) | Lists and features of sources/addressees for preventing spam messages | |
US20050050150A1 (en) | Filter, system and method for filtering an electronic mail message | |
RU2331913C2 (en) | Feedback loop for unauthorised mailing prevention | |
US20060047766A1 (en) | Controlling transmission of email | |
US20030236845A1 (en) | Method and system for classifying electronic documents | |
US20040162795A1 (en) | Method and system for feature extraction from outgoing messages for use in categorization of incoming messages | |
CA2423654A1 (en) | Method and apparatus for identification and classification of correspondents sending electronic messages | |
CA2420812A1 (en) | Method and apparatus for identification and classification of correspondents sending electronic messages | |
CN102598009A (en) | Method and apparatus for filtering information | |
Jamnekar et al. | Review on Effective Email Classification for Spam and Non Spam Detection on Various Machine Learning Techniques | |
Malathi | Email Spam Filter using Supervised Learning with Bayesian Neural Network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
FZDE | Discontinued |