WO2003090081A1 - A hierarchical system for analysing data streams - Google Patents
- Publication number
- WO2003090081A1 (PCT/AU2003/000460)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- analysis
- target activity
- sub
- alert
- data
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1416—Event detection, e.g. attack signature detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M3/00—Automatic or semi-automatic exchanges
- H04M3/22—Arrangements for supervision, monitoring or testing
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04M—TELEPHONIC COMMUNICATION
- H04M2203/00—Aspects of automatic or semi-automatic exchanges
- H04M2203/60—Aspects of automatic or semi-automatic exchanges related to security aspects in telephonic communication systems
- H04M2203/6027—Fraud preventions
Definitions
- the present invention relates to a hierarchical system for analysing data streams.
- the present invention relates to analysing data streams to identify target events.
- a target event may be an instance of fraud on a telephone system, however the present invention has applications in other high data volume environments to identify other target events/activities.
- Fraud is a serious problem in modern telecommunication systems, and can result in revenue loss by the telecommunications service provider, reduced operational efficiency, and an increased risk of subscribers moving to other providers that are perceived to offer better security.
- any provider that can reduce revenue loss resulting from fraud - either by its prevention or early detection - has a significant advantage over its competitors.
- Telecommunications networks support many hundreds or thousands of transactions per second, and one of the challenges in developing effective fraud detection systems is to achieve the high throughput necessary to analyse all network traffic in detail and in real time.
- fraud detection systems frequently ignore services that are considered to be low risk (e.g. low cost calls), or limit the sophistication of the fraud detection algorithms in order to achieve the required throughput.
- the present invention provides a system of hierarchical data analysis that seeks to provide high throughput and sensitivity with fewer false positive alerts of possible target activity.
- a method for analysing data streams comprising at least the steps of: receiving a data stream; conducting a first analysis of the data stream for a possible target activity, and if a possible target activity is indicated generating a first alert; if the first alert is generated, conducting a second analysis for the possible target activity to determine whether the target activity is indicated in the data stream with a high degree of certainty, if a possible target activity is indicated by the second analysis, generating a second alert; and providing the second alert to an external system for action.
- the first analysis step comprises at least: conducting a first sub-analysis of the data stream for the possible target activity to determine whether the target activity is indicated in the data stream, if the possible target activity is indicated by the first sub-analysis then a first sub-alert is generated; and conducting a second sub-analysis of the data stream for the possible target activity to determine whether the target activity is indicated in the data stream with a higher degree of certainty than in the first sub-analysis, if the possible target activity is indicated by the second sub-analysis then the first alert is generated.
- the second sub-analysis provides an indication of the target activity with a higher degree of certainty than in the first sub-analysis.
- the second analysis provides an indication of the target activity with a higher degree of certainty than in the second sub-analysis.
- the method further comprises propagating data from the data stream relevant to the second sub-analysis for conducting the second sub-analysis.
- the method further comprises the step of propagating data from the data stream relevant to the second analysis for conducting the second analysis.
- the second sub-analysis is conducted on additional data to the propagated data.
- the second analysis is conducted using additional data to the data propagated for the second analysis.
- one or more additional levels of sub-analysis are conducted between the first sub-analysis and the second sub-analysis wherein an alert is generated by one of the additional levels and passed to a next of the additional levels.
- each subsequent analysis determines whether the target activity is indicated to a higher degree of certainty than at the previous level.
- the first sub-alert triggers the first of one or more additional levels of sub-analysis and the alert generated by the final level of additional sub-analysis triggers the second sub-analysis.
- data is propagated from one additional level of sub-analysis to the next and includes data necessary in the subsequent levels of additional sub-analysis.
- each additional level of sub-analysis is conducted on additional data specific to the type of analysis conducted in addition to the propagated data.
- each level of the sub-analysis creates a third alert if a fraudulent activity is indicated with a relatively high degree of certainty, any one of the second alerts and third alerts triggering an action in the external system.
- the first analysis may conduct one or more types of analysis in parallel.
- one or more of the additional levels of sub-analysis may conduct one or more types of analysis in parallel.
- the target activity is fraudulent activity.
- a system for analysing data streams comprising at least: a first analyser arranged to analyse a data stream for possible target activity and if a possible target activity is indicated to generate a first alert; a second analyser arranged to conduct an analysis for possible target activity if the first alert is generated, and if a possible target activity is indicated with a relatively high probability by the second analysis to generate a second alert for an external system to act on.
- a system for analysing data streams comprising at least: one or more sequential analysers arranged to conduct an analysis for possible target activity, a first analyser of the sequence of analysers analysing a data stream, each subsequent analyser of the sequence of analysers only conducting its analysis if the previous analyser indicates a possible target activity, and if a possible target activity is indicated by each analysis generating a subsequent alert for the next analyser; and a final analyser arranged to conduct an analysis for possible target activity if the last analyser of the sequence of analysers generates an alert, and if a possible target activity is indicated with a relatively high probability by the analysis of the final analyser, the final analyser generates an alert for an external system to act on.
- a method of analysing data streams comprising at least: conducting one or more sequential analyses of a data stream for possible target activity, the first of the analyses being conducted directly on the data stream, subsequent analyses after the first only being conducted if the previous analysis indicated a possible target activity; conducting a final analysis for possible target activity if the last of the sequential analyses indicated a possible target activity; and if the final analysis indicates a possible target activity with a relatively high degree of certainty, generating an alert to an external system for action.
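The sequential method described in the statements above can be illustrated with a minimal Python sketch. It is not taken from the patent: the names `Analyser`, `analyse_record` and `analyse_stream`, and the representation of a record as a `dict`, are assumptions made for illustration only.

```python
from typing import Callable, Iterable, List

# Hypothetical type: an analyser inspects one record and returns True
# when it considers the target activity possible.
Analyser = Callable[[dict], bool]


def analyse_record(record: dict,
                   sequential: List[Analyser],
                   final: Analyser) -> bool:
    """Return True if an external alert should be raised for this record."""
    for analyser in sequential:
        # Each later analysis runs only if the previous one raised an alert.
        if not analyser(record):
            return False
    # The final analysis runs only if every sequential analysis alerted.
    return final(record)


def analyse_stream(stream: Iterable[dict],
                   sequential: List[Analyser],
                   final: Analyser) -> List[dict]:
    """Collect the records for which the final analysis raises an alert."""
    return [r for r in stream if analyse_record(r, sequential, final)]
```

Because the loop returns as soon as one analysis finds nothing suspicious, the later and presumably more expensive analyses see only a fraction of the records.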
- Figure 1 is a schematic representation of a preferred embodiment of a system for analysing data streams in accordance with the present invention.
- Figure 2 is a schematic representation exemplifying data analysis using the system of Figure 1.
- Referring to FIG. 1, there is shown a system 10 that receives a data stream 12 (that may include one or more sub-streams) and outputs a data stream of alerts 34 for use by an external system.
- the system 10 includes a plurality of data analysis modules, in this case three are shown 14, 16 and 18.
- Each of the analysis modules 14, 16 and 18 receives respective additional data 20, 22 and 24 used in the analysis of the data stream 12 provided to the first data module 14.
- Each data module 14, 16 and 18 propagates data to the next data module indicated by propagated data 26 and 30.
- Each data module provides internal alerts 28 and 32 to the subsequent data module.
- the system 10 is configured to identify suspicious telephone activity that may indicate fraud. Due to the high volume of telephone call data required to be processed, each data analysis module can provide a different analysis technique to progressively increase the certainty that the data indicates the presence of fraudulent telephone activity.
- the system 10 may be implemented in the form of a computer or a network of computers programmed to perform the analysis of each of the modules.
- a single computer can be programmed to run the system, or a dedicated computer may be programmed to conduct the analysis of each of the modules, with communication being provided between the computers of the whole system 10.
- Each of the data analysis modules 14, 16 and 18 cascades data initially provided by data stream 12 to the subsequent module.
- the data stream 12 could, for example, include call data records (CDRs, which contain details of the calls made on a telecommunication network). For example, a portion of a CDR produced from a real call is given in Table 1. The fields contained in the CDR are (from top to bottom) A-number (the number of the phone from which the call was made), B-number (the number to which the call was made), B-number type (whether it was local, national, international etc encoded as a number), the call's cost, its duration and the date and time at which it started.
- the data stream 12 can also include several substreams from different sources.
- one substream could be a CDR stream, while another could provide customer information such as postcodes and payment histories.
- Each of the data analysis modules 14, 16 and 18 contains one or more fraud detection engines that analyse their input data for signs of fraudulent activity, in response to which they generate alerts. Each fraud detection engine can process different subsets of the modules' input data.
- Each data analysis module after the first receives propagated data that is passed from the analysis module immediately preceding it in the hierarchy. The additional data available to each data analysis module may be specific to the type of analysis conducted by that particular data analysis module. The propagated data may contain low level data from the original data stream 12 or additional data used by data analysis modules lower in the hierarchy, depending on the configuration of the system 10.
- The distinction between propagated data and additional data is important for the efficiency of the system, because the analyses performed within particular analysis modules may require access to potentially large quantities of data that are not required elsewhere within the system. Propagating data that is not required in other analysis modules is a waste of resources and is likely to reduce the rate at which the system can process incoming data. Propagated data consists of information that is used in more than one data analysis module. For example, the A-number field, which identifies the calling party, is provided within the CDR stream that usually forms part of the system's input 12 and is usually required throughout the system; it is therefore usually propagated through the system rather than forming part of the additional data inputs.
- Each of the data analysis modules 14, 16 and 18 can generate internal and external alerts.
- External alerts 34 are combined from all of the modules 14, 16 and 18 to form the output 34 of the system. Combining the outputs may be the equivalent of providing a logical OR to each of the alerts, so that if any of the modules generates an external alert, the system as a whole generates the alert.
- External alerts are only produced by the modules when the calculated probability of a target activity (fraud) is sufficiently high to reasonably conclude that fraud has occurred. What is considered a high probability depends on the particular application, its expected throughput, and the desired degree of certainty. When individual calls are analysed for fraud within telecommunication networks, a probability as large as 0.99995 to 0.99999 may be required to keep the number of alerts to a manageable level (since large networks can experience as many as 100 million calls per day).
- Each of the data analysis modules 14, 16 and 18 can generate internal alerts if its analysis reveals something unusual, but does not provide sufficiently high probability that target activity is indicated to warrant an external alert.
- Internal alerts are important for regulating the activity of subsequent data analysis modules within the hierarchy of the system, because subsequent data analysis modules may only be activated if an internal alert is received, indicating that further analysis of the data is required to obtain the sufficient degree of certainty to generate an external alert.
- Subsequent data analysis modules 16 and 18 may only be activated if they receive an internal alert 28 or 32 from a preceding analysis module or if any of their input data is updated.
- the additional data is only provided in response to a request made by a lower module and the input additional data is not configured to activate an analysis module.
- an analysis module 14, 16 or 18 may identify a short term increase in the total cost of calls made by a particular subscriber, which may not be severe enough to conclude that fraud has occurred and hence to generate an external alert.
- a subsystem may therefore generate an internal alert that causes the next module in the system to perform its analysis.
- This cascaded activation of analysis modules within the system means that lower level subsystems are activated most frequently and that the throughput of the system can be maximised by designing the lower level subsystems to require a minimum amount of processing.
- Higher level analysis, which is activated less frequently can thus use more expensive processes (such as nonlinear or iterative functions) and can perform expensive operations (such as database reads and writes) or make use of human intervention, with minimal effect on the throughput of the entire system.
- a neural network could be trained to estimate the probability that a particular telephone call was fraudulent based on its characteristics (cost, duration, etc.) or Fourier analysis could be used to see if a short term fluctuation in the calling activity was part of a cycle of a subscriber's normal behaviour in an analysis module that becomes active only once a lower level system has generated an alert.
- the lower level subsystems may need some level of parallelism in order to achieve the required throughput and thus can be distributed across several computers. Later stages may require so few resources that several can be run simultaneously on a single computer, while others may require user interaction or database access, placing specific requirements on their geographic location.
- By building a fraud detection system from a hierarchy of subsystems of increasing sophistication it is possible to produce a superior trade off between fraud detection accuracy and throughput.
- Each of the data analysis modules should be designed to generate many more internal false positives (that is, internal alerts for events that are not actually fraudulent) than internal false negatives (where an internal alert was not generated when fraud did in fact occur). This is because the higher level subsystems that are activated by the internal alerts may be able to provide a higher degree of certainty to confirm or refute the internal alert, based on different analysis techniques and/or the inclusion of additional data in the analysis, to clarify whether, with the required degree of certainty, the data indicates that a fraud is actually present. If the system is not designed in this way, then when false negatives occur the higher level subsystems are never activated and thus are not able to correct an error made by the lower level subsystem.
- the analysis modules 14, 16 and 18 are designed to generate a small number of external false positives (external alerts generated for events that are not actually fraudulent) and a large number of external false negatives (resulting in no external alert being generated when in fact a fraud did occur). This is because, provided that an internal alert was generated, the external false negative can be corrected by higher level analysis modules generating their own external alerts. Where a false positive external alert is generated, the system as a whole will generate an alert that cannot be prevented by analysis conducted by subsequent modules, even if those modules were activated.
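The interplay of internal and external alerts described above can be sketched as a pair of thresholds per module: a low one that readily raises internal alerts and a high one that rarely raises external alerts. The Python below is only an illustration under stated assumptions; the `Module` class, its `score` function (standing in for whatever probability estimate a module computes) and the threshold values in the usage note are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Module:
    name: str
    score: Callable[[dict], float]  # hypothetical estimate of fraud probability
    internal_threshold: float       # low: tolerate internal false positives
    external_threshold: float       # high: tolerate external false negatives


def run_hierarchy(record: dict,
                  modules: List[Module]) -> Tuple[bool, List[str]]:
    """Return (external_alert, names of the modules that were activated)."""
    activated: List[str] = []
    for module in modules:
        activated.append(module.name)
        p = module.score(record)
        if p >= module.external_threshold:
            # External alerts are OR-ed: one is enough for the whole system.
            # Simplification: the cascade stops here, since later modules
            # could not retract the alert anyway.
            return True, activated
        if p < module.internal_threshold:
            # No internal alert, so higher modules are never activated.
            return False, activated
    return False, activated
```

With a first module configured as `Module("low", f, 0.1, 0.999)`, for instance, most mildly unusual records are passed upward for closer inspection, while an external alert is raised directly only in near-certain cases.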
- FIG. 2 shows an example of a real telecommunications fraud detection system based on the system 10.
- the input data stream 12 includes a CDR stream that provides details of each call made on the telecommunications network shortly after the call is terminated.
- the CDR stream is passed to the lowest level data analysis module 14 which is configured as a candidate fraud detector (CFD).
- the CFD contains two separate fraud detection algorithms: a set of rules 36 that search directly for common fraud indicators (such as more than 8 hours of calls to the Caribbean in any 24 hour period), and a change detection algorithm 38 that searches for unusual changes in the pattern of behaviour associated with individual subscribers (which can indicate that a line has been taken over by fraudsters). These two components 36 and 38 of the lowest level data analysis module 14 operate independently.
- An internal alert 28 is generated when either of its components 36 and 38 indicates that a particular telephone call is a fraud candidate.
- the rules 36 and change detector 38 are designed to be fast and simple because the CDR stream 12 can present the data analysis module with as many as 100 million CDRs per day.
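A rule of the kind quoted above (more than 8 hours of calls to the Caribbean in any 24 hour period) could be checked with a sliding window over one subscriber's calls. The sketch below is an assumption-laden illustration, not the patent's implementation: the `Call` fields, the `region` label and the function names are hypothetical, and each call's whole duration is attributed to its start time for simplicity. The change detector is represented only by a boolean input, since the source does not describe its algorithm.

```python
from datetime import datetime, timedelta
from typing import Iterable, NamedTuple


class Call(NamedTuple):
    start: datetime
    duration_s: int
    region: str  # hypothetical decoded destination label


def exceeds_rule(calls: Iterable[Call],
                 region: str = "Caribbean",
                 limit: timedelta = timedelta(hours=8),
                 window: timedelta = timedelta(hours=24)) -> bool:
    """True if calls to `region` total more than `limit` within any `window`."""
    matching = sorted((c for c in calls if c.region == region),
                      key=lambda c: c.start)
    total = 0
    left = 0
    for call in matching:
        total += call.duration_s
        # Drop calls that started 24 hours or more before the current one.
        while call.start - matching[left].start >= window:
            total -= matching[left].duration_s
            left += 1
        if total > limit.total_seconds():
            return True
    return False


def is_fraud_candidate(calls: Iterable[Call], change_detected: bool) -> bool:
    """Internal alert if either independent component flags the subscriber."""
    return exceeds_rule(calls) or change_detected
```

Sorting once and advancing a left pointer keeps the check linear in the number of matching calls after the sort, in keeping with the requirement that the lowest level be fast and simple.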
- the internal alerts 28 are passed to the next level data analysis module which operates as an intelligent alarm analyser (IAA) which is only activated when an internal alert is generated by the CFD.
- the ratio of the number of CDRs to the number of internal alerts 28 is about 1000:1 meaning that statistically the IAA is activated only once for every 1000 times the CFD is activated.
- the IAA is a rule based system that removes some of the false alerts generated by the CFD by performing complex analysis on the distributions of the alerts themselves. These complex analyses are possible due to the low level of activity demanded of the IAA compared to the CFD. The analyses also require time information (real world, date and time) which is provided to the IAA as additional data 22.
- the third level data analysis module operates as a case manager.
- the case manager may be a team of people employed by the telecommunications operator for the purpose of investigating the events that caused internal alerts to be generated by the IAA. Because the case manager is a higher level subsystem, it is activated only once every 500,000 or so CDRs and hence can use much slower and more expensive processing methods than either the CFD or IAA, such as manual investigation of potential frauds, without being overwhelmed.
- the case manager uses customer information (names, addresses, payment histories, etc.) as further additional data 24 and frequently a wide variety of additional data sources (six month history of calls made by a particular customer) to investigate internal alerts 32 generated by the IAA to determine whether they are likely to be cases of actual fraud. If it is determined that they are, the case manager subsystem generates an external alert 34 which is passed out of the system.
- the alert could be used for a variety of purposes, such as to inform billing services within the network operator to remove fraudulent calls from a customer's bill, or to inform law enforcement agencies.
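The activation figures quoted for this example can be combined into a rough daily workload estimate. The short Python sketch below uses only numbers given in the text (up to 100 million CDRs per day, about one internal alert per 1000 CDRs, and about one case manager activation per 500,000 CDRs); the function name and dictionary keys are hypothetical, and the results are order-of-magnitude estimates because the source ratios are themselves approximate.

```python
CDRS_PER_INTERNAL_ALERT = 1_000   # CFD to IAA ratio quoted as about 1000:1
CDRS_PER_CASE = 500_000           # case manager: once every 500,000 or so CDRs


def daily_activations(cdrs_per_day: int) -> dict:
    """Approximate how often each level of the hierarchy is activated per day."""
    return {
        "cfd": cdrs_per_day,                             # every CDR
        "iaa": cdrs_per_day // CDRS_PER_INTERNAL_ALERT,  # on CFD internal alerts
        "case_manager": cdrs_per_day // CDRS_PER_CASE,   # on IAA internal alerts
        # CFD alerts examined by the IAA per alert it passes upward
        "iaa_alerts_per_case": CDRS_PER_CASE // CDRS_PER_INTERNAL_ALERT,
    }


print(daily_activations(100_000_000))
```

At the upper figure of 100 million CDRs per day this gives about 100,000 IAA activations and about 200 case manager activations per day, which implies that the IAA forwards roughly one in 500 of the alerts it receives.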
- in this example, no additional data 20 is provided to the CFD.
- in this example, no data is propagated from the CFD to the IAA or from the IAA to the case manager.
- in other configurations, additional data may be provided to the CFD, or data may be propagated from the CFD to the IAA and possibly then from the IAA to the case manager.
Landscapes
- Engineering & Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Signal Processing (AREA)
- Computer Hardware Design (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
Description
Claims
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2003218899A AU2003218899A1 (en) | 2002-04-16 | 2003-04-16 | A hierarchical system for analysing data streams |
EP03714539A EP1499969A1 (en) | 2002-04-16 | 2003-04-16 | A hierarchical system for analysing data streams |
US10/965,703 US20050190905A1 (en) | 2002-04-16 | 2004-10-14 | Hierarchical system and method for analyzing data streams |
US12/340,504 US20090164761A1 (en) | 2002-04-16 | 2008-12-19 | Hierarchical system and method for analyzing data streams |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
GB0208711A GB0208711D0 (en) | 2002-04-16 | 2002-04-16 | A hierarchical system for analysing data streams |
GB0208711.2 | 2002-04-16 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/965,703 Continuation US20050190905A1 (en) | 2002-04-16 | 2004-10-14 | Hierarchical system and method for analyzing data streams |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2003090081A1 (en) | 2003-10-30 |
Family
ID=9934941
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/AU2003/000460 WO2003090081A1 (en) | 2002-04-16 | 2003-04-16 | A hierarchical system for analysing data streams |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP1499969A1 (en) |
AU (1) | AU2003218899A1 (en) |
GB (1) | GB0208711D0 (en) |
WO (1) | WO2003090081A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5680542A (en) * | 1995-06-07 | 1997-10-21 | Motorola, Inc. | Method and apparatus for synchronizing data in a host memory with data in target MCU memory |
GB2328043A (en) * | 1997-07-26 | 1999-02-10 | Ibm | Managing a distributed data processing system |
EP0985995A1 (en) * | 1998-09-09 | 2000-03-15 | International Business Machines Corporation | Method and apparatus for intrusion detection in computers and computer networks |
EP0833489B1 (en) * | 1996-09-26 | 2002-05-15 | Eyretel Limited | Signal monitoring apparatus |
-
2002
- 2002-04-16 GB GB0208711A patent/GB0208711D0/en not_active Ceased
-
2003
- 2003-04-16 AU AU2003218899A patent/AU2003218899A1/en not_active Abandoned
- 2003-04-16 EP EP03714539A patent/EP1499969A1/en not_active Withdrawn
- 2003-04-16 WO PCT/AU2003/000460 patent/WO2003090081A1/en not_active Application Discontinuation
Also Published As
Publication number | Publication date |
---|---|
AU2003218899A1 (en) | 2003-11-03 |
GB0208711D0 (en) | 2002-05-29 |
EP1499969A1 (en) | 2005-01-26 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
EP3324607B1 (en) | Fraud detection on a communication network | |
US11240372B2 (en) | System architecture for fraud detection | |
US20230136732A1 (en) | Systems and methods for phone number fraud prediction | |
US10165128B2 (en) | Toll-tree numbers metadata tagging, analysis and reporting | |
EP1889461B1 (en) | Network assurance analytic system | |
JP2001516107A (en) | System and method for detecting and managing fraud | |
US6587552B1 (en) | Fraud library | |
US6570968B1 (en) | Alert suppression in a telecommunications fraud control system | |
US5970129A (en) | Administrative monitoring system for calling card fraud prevention | |
US20090164761A1 (en) | Hierarchical system and method for analyzing data streams | |
US6188753B1 (en) | Method and apparatus for detection and prevention of calling card fraud | |
US20230344932A1 (en) | Systems and methods for use in detecting anomalous call behavior | |
EP1499969A1 (en) | A hierarchical system for analysing data streams | |
US6590967B1 (en) | Variable length called number screening | |
Rosas et al. | Telecommunications fraud: problem analysis-an agent-based KDD perspective | |
Kang et al. | Toll Fraud Detection of Voip Services via an Ensemble of Novelty Detection Algorithms. | |
CN116915904A (en) | Call service detection method, device and storage medium | |
Moreau et al. | of Deliverable Definition of Fraud Detection Concepts |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AK | Designated states |
Kind code of ref document: A1 Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PH PL PT RO RU SC SD SE SG SK SL TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW |
|
AL | Designated countries for regional patents |
Kind code of ref document: A1 Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | ||
DFPE | Request for preliminary examination filed prior to expiration of 19th month from priority date (pct application filed before 20040101) | ||
WWE | Wipo information: entry into national phase |
Ref document number: 10965703 Country of ref document: US |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2003218899 Country of ref document: AU |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2003714539 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 3582/DELNP/2004 Country of ref document: IN |
|
WWE | Wipo information: entry into national phase |
Ref document number: 3696/DELNP/2004 Country of ref document: IN |
|
WWP | Wipo information: published in national office |
Ref document number: 2003714539 Country of ref document: EP |
|
NENP | Non-entry into the national phase |
Ref country code: JP |
|
WWW | Wipo information: withdrawn in national office |
Ref document number: JP |