US20210158161A1 - Methods and Systems for Detecting Spurious Data Patterns - Google Patents
- Publication number
- US20210158161A1 (application US17/100,243)
- Authority
- US
- United States
- Prior art keywords
- neural network
- data
- network circuit
- connections
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/23—Updating
- G06F16/2365—Ensuring data consistency and integrity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/082—Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/047—Probabilistic or stochastic networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/018—Certifying business or products
- G06Q30/0185—Product, service or business identity fraud
Definitions
- Examples of such records may be transaction records (e.g., credit card records), with respect to which a learning system is configured to detect anomalous (outlying) activity or behavior. Such anomalous activity may be indicative of possible fraudulent activity.
- novel neural network architectures of the present disclosure can be combined with a unique data preprocessing methodology to reduce the dimensionality of the input data based on specific data filters that maximize the entropy of the input data.
- the implementations described herein use graph network topologies and processing to identify outliers or anomalous data.
- a method is thus provided to combine graph networks topology of the processed data with a neural network to automatically cluster data in a topological way that separates the spurious data patterns from normal data flow.
- the example implementations also include apparatus comprising the neural networks (or other types of learning machines), the topological graphs, and the neural network filters described herein.
- the example implementations additionally include non-transitory computer-readable medium having program code recorded thereon for filtering the input streaming data according to the preprocessing parameters and forwarding this data to the filtering neural network and the graph based topological calculator and neural network.
- the medium may include program code to, when executed by a processor, select at least one moment of an input of the data, along with the execution of the neural networks and graph topological transformers.
- the methods and apparatus of the present disclosure include engineered features that are created/generated from the base streaming data.
- the engineered features measure many aspects of the data instance and may or may not be interesting or germane to human-based analysis. Inputting these features to the neural network modules might or might not supply them with data relationships humans intuitively find interesting.
- the methods and apparatus include a methodology, device, and code to flag specific patterns in data for potential review by a human reviewer. For example, in the case of a financial transaction, if a feature such as the distance between the billing and shipping addresses for the transaction is above a certain threshold, the transaction will be automatically flagged for potential review.
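The threshold-based flag described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the use of latitude/longitude coordinates with a great-circle (haversine) distance, and the 500 km threshold are all assumptions, since the disclosure does not specify how the address distance is measured.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def flag_for_review(billing, shipping, threshold_km=500.0):
    """Flag a transaction when billing and shipping locations are far apart.

    billing and shipping are hypothetical (lat, lon) tuples; the threshold
    is an illustrative value, not one taken from the disclosure.
    """
    return haversine_km(*billing, *shipping) > threshold_km
```

A call such as `flag_for_review((40.7128, -74.0060), (34.0522, -118.2437))` compares a New York billing location against a Los Angeles shipping location and would flag the transaction under the assumed threshold.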
- a method for robust detection and classification of data outliers includes converting a set of data values representative of a multi-dimensional item into a graph representation of the multi-dimensional item, with the graph representation comprising nodes and edges, applying a graph convolution process to the graph representation of the multi-dimensional item to generate a transformed graph representation for the multi-dimensional items comprising a resultant transformed configuration of the nodes and edges representing the multi-dimensional item, and determining, based on the transformed configuration of the nodes and edges representing the multi-dimensional item, a probability that the multi-dimensional item is anomalous.
- Embodiments of the method may include at least some of the features described in the present disclosure, including one or more of the following features.
- Determining the probability that the multi-dimensional item is anomalous may include processing the transformed configuration of the nodes and edges representing the multi-dimensional item with a global attention module to generate a resultant vector of values, and applying a softmax module to the resultant vector of values to derive the probability that the multi-dimensional item is anomalous.
- Converting the set of data values representative of the multi-dimensional item may include transforming values comprising the multi-dimensional items into a plurality of respective multi-dimensional vectors by a plurality of trained multi-layer perceptrons applied to the respective values.
- the method may further include generating, for the plurality of respective multi-dimensional vectors, a graph representation of nodes with interconnecting edges connecting at least some of the nodes, with positions and orientations of the interconnected nodes in the graph representation relative to each other being indicative of potential anomalous relationships between the set of data values of the multi-dimensional item.
- Applying the graph convolution process may include generating, for a particular edge of the edges of the graph representation, an edge composite value based on an edge value representing the particular edge, node values representative of a respective source node and destination node of the particular edge, and a global state value associated with the graph representation, and providing the edge composite value to an edge multi-layer perceptron unit to generate a resultant transformed edge corresponding to the particular edge.
- Applying the graph convolution process may include generating, for a particular node of the nodes of the graph representation, a node composite value based on an average of intermediate values, computed using one or more node multi-layer perceptrons, based on a respective one of incoming edge values representing incoming edges directed to the particular node and a value of the particular node.
- Applying the graph convolution process may include averaging values of the nodes of the graph representation to generate an average node value, generating a global composite value based on the average node value and a global state value associated with the graph representation, and providing the global composite value to a global multi-layer perceptron unit to generate a resultant transformed global state value corresponding to the global state value associated with the graph representation.
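The edge, node, and global updates described in the three preceding items can be sketched as a single update pass. The sketch below makes several assumptions that the text does not confirm: each "MLP" is a stand-in consisting of one random linear layer with a tanh activation (a trained network would be used in practice), composite values are formed by concatenation, and the node update consumes the already-updated edge values. All names are hypothetical.

```python
import math
import random


def make_mlp(in_dim, out_dim, rng):
    """Stand-in for a trained MLP: one random linear layer followed by tanh."""
    w = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return lambda x: [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]


def mean(vectors):
    """Element-wise average of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def graph_block(nodes, edges, u, edge_mlp, node_mlp, global_mlp):
    """One update pass. edges maps (src, dst) -> edge vector; u is the global state."""
    # Edge update: composite of edge, source node, destination node, global state.
    new_edges = {(s, d): edge_mlp(e + nodes[s] + nodes[d] + u)
                 for (s, d), e in edges.items()}
    # Node update: average the intermediate values computed per incoming edge.
    new_nodes = []
    for i, v in enumerate(nodes):
        incoming = [e for (s, d), e in new_edges.items() if d == i]
        new_nodes.append(mean([node_mlp(e + v) for e in incoming]))
    # Global update: average node value combined with the global state.
    new_u = global_mlp(mean(new_nodes) + u)
    return new_nodes, new_edges, new_u


rng = random.Random(0)
D = 4  # illustrative vector width
nodes = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(3)]
edges = {(s, d): [0.1] * D for s in range(3) for d in range(3) if s != d}
u = [0.0] * D
edge_mlp = make_mlp(4 * D, D, rng)
node_mlp = make_mlp(2 * D, D, rng)
global_mlp = make_mlp(2 * D, D, rng)
new_nodes, new_edges, new_u = graph_block(nodes, edges, u, edge_mlp, node_mlp, global_mlp)
```

Repeating `graph_block` on its own output would correspond to iterative updating of the edges, nodes, and global state; the number of iterations is a design choice not fixed here.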
- Applying the graph convolution process may include applying the graph convolution process using at least one graph neural network system.
- the method may further include performing preprocessing on a received raw data record to produce the multi-dimensional item, including performing one or more of, for example, Gaussian normalization applied to the received raw data record, and/or removing one or more data elements of the received raw data record. Such removing may be based on at least one of, for example, entropy associated with the one or more data elements, sparseness associated with the one or more data elements, a p-value associated with the one or more data elements, and/or a low-effect size associated with the one or more data elements.
- Removing one or more data elements may include identifying a particular data element as a rare element in response to determining, based on training data to train a learning engine implementation for performing the preprocessing, that the particular data element is present in fewer than an adjustable threshold number of data records comprising the training data, with the adjustable threshold number being adjusted based on likelihood of occurrence of anomalous values for the particular data element, and removing from runtime data records the particular data element identified as the rare element.
- Applying the graph convolution process to the graph representation of the multi-dimensional item may include applying a learning-engine implementation of a graph-convolution process.
- a system, in some variations, includes an input stage to receive one or more input data records, and a controller, implementing one or more learning engines, in communication with a memory device to store programmable instructions.
- the controller is configured to convert a set of data values representative of a multi-dimensional item into a graph representation of the multi-dimensional item, with the graph representation comprising nodes and edges, apply a graph convolution process to the graph representation of the multi-dimensional item to generate a transformed graph representation for the multi-dimensional items comprising a resultant transformed configuration of the nodes and edges representing the multi-dimensional item, and determine, based on the resultant transformed configuration of the nodes and edges representing the multi-dimensional item, a probability that the multi-dimensional item is anomalous.
- a non-transitory computer readable media for storing a set of instructions, executable on at least one programmable device, to convert a set of data values representative of a multi-dimensional item into a graph representation of the multi-dimensional item, with the graph representation comprising nodes and edges, apply a graph convolution process to the graph representation of the multi-dimensional item to generate a transformed graph representation for the multi-dimensional items comprising a resultant transformed configuration of the nodes and edges representing the multi-dimensional item, and determine, based on the resultant transformed configuration of the nodes and edges representing the multi-dimensional item, a probability that the multi-dimensional item is anomalous.
- Embodiments of the system and the non-transitory computer readable media may include at least some of the features described in the present disclosure, including any one or more of the features described above in relation to the method.
- another method for detection and classification of data is provided.
- the method includes receiving input data at a neural network circuit comprising a plurality of node layers, with each of the plurality of node layers comprising respective one or more nodes, and with the neural network circuit further comprising adjustable weighted connections connecting at least some nodes in different layers of the plurality of node layers.
- the method also includes removing one or more of the weighted connections at one or more time instances.
- Embodiments of the other method may include at least some of the features described in the present disclosure, including one or more of the following features.
- the neural network circuit may be a feed-forward neural network circuit.
- Removing the one or more of the weighted connections may include selecting the one or more of the weighted connections randomly, and removing the randomly selected one or more of the weighted connections.
- Removing the one or more of the weighted connections may include selecting a set of multiple connections from the weighted connections based, at least in part, on output of the neural network circuit, and selecting randomly the one or more of the weighted connections from the selected set of multiple connections.
- Selecting the set of multiple connections may include selecting one or more pairs of node layers of the neural network circuit according to the output of the neural network circuit, and removing at least one weighted connection between node layers of the selected one or more pairs of node layers.
- Selecting the set of multiple connections may include selecting the set of multiple connections according to output values produced by elements of an output node layer of the neural network circuit and a plurality of output ranges defined for possible values produced by the output node layer.
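The two-stage selection described above (first narrowing to a set of connections according to the network output, then choosing randomly within that set) can be sketched as follows. This is an illustration under stated assumptions: the mapping from output ranges to layer pairs, the representation of "removal" as setting a weight to zero, and the removal fraction are all hypothetical choices, and the function names are not from the disclosure.

```python
import random


def select_layer_pair(output_value, output_ranges):
    """Return the index of the layer pair whose output range contains the value.

    output_ranges is a list of (low, high, layer_pair_index) tuples; this
    mapping from output ranges to layer pairs is a hypothetical policy.
    """
    for low, high, pair in output_ranges:
        if low <= output_value < high:
            return pair
    return None


def remove_random_connections(weights, pair, fraction, rng):
    """Zero out a random fraction of the connections in one weight matrix."""
    matrix = weights[pair]
    coords = [(r, c) for r in range(len(matrix)) for c in range(len(matrix[0]))]
    removed = rng.sample(coords, int(fraction * len(coords)))
    for r, c in removed:
        matrix[r][c] = 0.0  # a removed connection contributes nothing
    return removed


rng = random.Random(42)
# weights[k] connects node layer k to node layer k + 1 (layer sizes 4 -> 3 -> 2).
weights = [[[1.0] * 4 for _ in range(3)], [[1.0] * 3 for _ in range(2)]]
ranges = [(0.0, 0.5, 0), (0.5, 1.01, 1)]
pair = select_layer_pair(0.3, ranges)
removed = remove_random_connections(weights, pair, 0.25, rng)
```

Keeping the selection and the random removal as separate functions makes it possible to swap in a different selection policy without touching the removal step.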
- the method may further include configuring at least some of the weighted connections according to a biasing factor in response to output of the neural network resulting from an input data record, of the received input data, processed by the neural network.
- the biasing factor may be a multiplication factor applied to the at least some of the weighted connections through a back-propagation operation in response to a determination that the neural network correctly identified the input data record as being anomalous.
- the method may further include performing preprocessing on a received raw data record to produce an input data record provided to the neural network circuit, including performing one or more of, for example, Gaussian normalization applied to the raw data record, and/or removing one or more data elements of the raw data record. Such removing may be based on at least one of, for example, entropy associated with the one or more data elements, sparseness associated with the one or more data elements, a p-value associated with the one or more data elements, and/or a low-effect value associated with the one or more data elements.
- Removing one or more data elements may include identifying a particular data element as a rare element in response to determining, based on training data to train a learning engine implementation for performing the preprocessing, that the particular data element is present in fewer than an adjustable threshold number of data records comprising the training data, the adjustable threshold number being adjusted based on likelihood of occurrence of anomalous values for the particular data element, and removing from runtime data records the particular data element identified as the rare element.
- another system includes an input stage to receive one or more input data records, and a controller, implementing one or more learning engines, in communication with a memory device to store programmable instructions, to receive input data at a neural network circuit comprising a plurality of node layers, with each of the plurality of node layers comprising respective one or more nodes, and with the neural network circuit further comprising adjustable weighted connections connecting at least some nodes in different layers of the plurality of node layers, and remove one or more of the weighted connections at one or more time instances.
- another non-transitory computer readable media for storing a set of instructions, executable on at least one programmable device, to receive input data at a neural network circuit comprising a plurality of node layers, with each of the plurality of node layers comprising respective one or more nodes, and with the neural network circuit further comprising adjustable weighted connections connecting at least some nodes in different layers of the plurality of node layers, and remove one or more of the weighted connections at one or more time instances.
- Embodiments of the other system, and the other computer readable media may include at least some of the features described in the present disclosure, including at least some of the various features described above in relation to any of the different methods, systems, and media.
- FIG. 1 is a flow diagram illustrating operations/stages to perform data pre-processing for numerical data.
- FIG. 2 is a flow diagram showing preprocessing operations for categorical data.
- FIG. 3 is a flow diagram illustrating an example data preprocessing procedure for input data (e.g., post-training data).
- FIG. 4 is a flow diagram showing a procedure to identify anomalous data using graph neural networks.
- FIG. 5 is a diagram of a topology of an example detector neural network.
- FIG. 6 is a diagram of a features-to-nodes module to convert vector data into graph representation data.
- FIG. 7 is a diagram illustrating transformation of an initial graph representation into a resultant transformed representation.
- FIG. 8 includes diagrams showing iterative updating of edges, nodes and a global state in a graph representation.
- FIG. 9 is a flowchart of an example procedure to detect and classify.
- FIG. 10 is a flowchart of an example procedure to detect and classify.
- FIG. 11 is a schematic diagram of a computing system.
- a first implementation detects anomalous data based on graph representation in which data (after optionally being pre-processed to remove certain features, to normalize the data to be represented as a vector of a pre-determined dimensionality that can be input into a learning machine, etc.) is converted into a graph representation comprising a resultant configuration of nodes connected by weighted edges.
- a graph neural network trained to generate a resultant transformed graph that organizes the nodes and edges into a resultant representation (in which the nodes' clustering can be indicative of their relevance or anomality) is applied.
- the transformed graph representation is then processed by a post-transformation stage to generate an output vector, based on which an output determination (e.g., suspected transaction, or valid transaction) can be made.
- a feed-forward neural network with an adjustable configuration controlled through a dropout operation (as will be discussed in greater detail below) is provided.
- the input data may be pre-processed (similarly to the pre-processing applied for the input data provided to the first example implementation, with such pre-processing including culling unimportant, redundant, or non-impactful features and values, normalizing the data, etc.)
- the pre-processed data is then provided to a multi-layer feed forward neural network, for which the various connections interconnecting the network's nodes (elements) can be controllably removed or adjusted (in some examples, based on the determined output of the feed-forward network).
- a flow diagram 100 illustrating operations/stages to perform data pre-processing for numerical data is shown.
- the use of preprocessing helps to reduce dimensionality of the data (thus reducing the computation effort required for operating the neural networks of the system, and making the data conform to what the receiving system can handle as input), and to make the neural networks more sensitive to anomalous data (e.g., outliers).
- the input data may be optionally preprocessed to facilitate and/or optimize neural network performance.
- Neural network training data can be used to determine the preprocessing parameters.
- Numerical features are Gaussian normalized according to, for example, the distribution of training data, and dropped altogether if the entropy of that feature exceeds some threshold (as illustrated in FIG. 1). More particularly, and as depicted in FIG. 1, a training set 110 of numerical features is used to determine parameters for Gaussian normalization, which are reused when inputting test data to the neural network. First, the entropy of each numerical column is determined, and columns with an entropy above or below a defined threshold are dropped (at block 120 of FIG. 1).
- input data is Gaussian normalized (at block 130) according to the mean and standard deviation of the data column (i.e., the mean and standard deviation generated for a particular feature or field in the records of the training data).
- the means and standard deviations of each column may be saved on a computer-readable medium, and are used when more data is input to the model ( 140 ).
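The numerical preprocessing stages above (entropy filtering, then Gaussian normalization with saved parameters) can be sketched as follows. The disclosure does not say how the entropy of a numerical column is estimated, so the equal-width histogram, the bin count, the entropy bounds, and all function names here are assumptions for illustration.

```python
import math
from collections import Counter


def histogram_entropy(values, bins=10):
    """Shannon entropy (in bits) of a column after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant column carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def fit_numeric(columns, min_entropy, max_entropy):
    """Keep columns whose entropy lies within the bounds; store (mean, std) for each."""
    params = {}
    for name, values in columns.items():
        if not (min_entropy <= histogram_entropy(values) <= max_entropy):
            continue  # dropped: entropy above or below the allowed range
        mu = sum(values) / len(values)
        sigma = math.sqrt(sum((v - mu) ** 2 for v in values) / len(values))
        if sigma == 0:
            continue  # cannot normalize a zero-variance column
        params[name] = (mu, sigma)
    return params


def normalize(record, params):
    """Apply the saved training mean and standard deviation to a new record."""
    return {k: (record[k] - mu) / sigma for k, (mu, sigma) in params.items()}


train = {"amount": [10.0, 20.0, 30.0, 40.0], "constant": [1.0, 1.0, 1.0, 1.0]}
params = fit_numeric(train, min_entropy=0.5, max_entropy=4.0)
print(normalize({"amount": 25.0, "constant": 1.0}, params))
```

The `params` dictionary plays the role of the saved means and standard deviations: it is computed once from training data and reused unchanged for every later record.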
- Training data can thus be processed to identify and drop sparse columns (e.g., corresponding to data fields that might not provide meaningful training input).
- Low-frequency categories in fields of the records of the remaining data may be classified as rare if their frequency is below a threshold frequency.
- Threshold frequency for rare classification may be lowered depending on the fraud likelihood of the category, or depending on whether the feature has a set number of categories or an unrestricted number of categories. Subsequent to the rare encoding, columns with high p-values or low effect size can be removed/discarded.
- columns or fields of data records corresponding to categories (e.g., descriptive data from a finite set of values or descriptions, such as a month field, purchase type field, etc.) are replaced with, for example, one-hot columns for each column category (in one-hot encoding, a vector representation may include, for example, one element that is ‘1’ with other elements of the vector being ‘0’).
- category data represented as alpha-numerical strings may be replaced with integer indices.
- FIG. 2 is a flow diagram 200 showing preprocessing operations for categorical data (e.g., descriptive data rather than numerical data).
- the training set of categorical features ( 210 ) is used to determine which features to use, which categories to use, and which categories to classify as “rare.”
- columns below a certain sparsity level are dropped (at block 220 ).
- categories for each column are classified as “rare” if the category occurs some number of times below a “rare” threshold (as determined at block 230 ).
- An exception to this rule is if the category is below the cutoff, but is still one of the top three (3), or some other number of categories, most frequent categories.
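The "rare" rule and its top-categories exception can be sketched as below. The function name, the `RARE` label string, and the sample counts are hypothetical; `keep_top=3` mirrors the "top three" example in the text but is adjustable.

```python
from collections import Counter

RARE = "rare"  # hypothetical label used for all rare categories


def fit_rare_categories(values, rare_threshold, keep_top=3):
    """Return the set of categories to keep; every other category maps to RARE.

    A category is kept if it occurs at least rare_threshold times, or if it
    is among the keep_top most frequent categories despite being below it.
    """
    counts = Counter(values)
    top = {cat for cat, _ in counts.most_common(keep_top)}
    return {cat for cat, n in counts.items() if n >= rare_threshold or cat in top}


values = ["visa"] * 50 + ["mc"] * 30 + ["amex"] * 4 + ["diners"] * 2 + ["jcb"] * 1
kept = fit_rare_categories(values, rare_threshold=5)
encoded = [v if v in kept else RARE for v in values]
```

In this example "amex" falls below the threshold of 5 but is retained because it is the third most frequent category, while "diners" and "jcb" are replaced by the rare label.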
- the p-value and Cramer's Corrected Statistic, or “effect size,” are calculated for each categorical column. Columns with a p-value above a threshold, and columns below an effect size threshold, are dropped/discarded (at block 240 ).
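As an illustration of the effect-size filter, the sketch below computes a bias-corrected Cramér's V between a categorical column and a second categorical variable. It is an assumption that the "corrected statistic" refers to this bias-corrected form and that the column is compared against the anomaly label; the text does not state either. The p-value of the accompanying chi-square test is not computed here.

```python
import math
from collections import Counter


def corrected_cramers_v(xs, ys):
    """Bias-corrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    row, col, joint = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Pearson chi-square statistic over the full contingency table.
    chi2 = 0.0
    for x, rx in row.items():
        for y, cy in col.items():
            expected = rx * cy / n
            chi2 += (joint.get((x, y), 0) - expected) ** 2 / expected
    r, k = len(row), len(col)
    phi2 = chi2 / n
    # Bias correction applied to phi^2 and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(r_corr - 1, k_corr - 1)
    return math.sqrt(phi2_corr / denom) if denom > 0 else 0.0


# Hypothetical data: one column perfectly tracks the label, the other is unrelated.
label = ["fraud"] * 50 + ["ok"] * 50
informative = ["a"] * 50 + ["b"] * 50
uninformative = ["x", "y"] * 50
strong = corrected_cramers_v(informative, label)
weak = corrected_cramers_v(uninformative, label)
```

A column such as `uninformative`, whose effect size falls below the chosen threshold, would be a candidate for being dropped at this stage.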
- the categorical columns are encoded for input to the neural network modules. For the feed-forward neural network, categorical features are one-hot encoded, such that each categorical entry may be replaced by a number of columns equal to the number of categories for that categorical column (at blocks 250 and 260 ). For the graph neural network, the number of columns stays the same, but categorical strings in each column are replaced by an integer label (at blocks 270 and 280 ).
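The two encodings can be contrasted in a short sketch: one-hot vectors for the feed-forward network and integer labels for the graph neural network. The function names and the alphabetical index assignment are assumptions for illustration.

```python
def fit_categories(values):
    """Assign a stable integer index to each category seen in training."""
    return {cat: i for i, cat in enumerate(sorted(set(values)))}


def one_hot(value, index):
    """Feed-forward input: one column per category, exactly one element set to 1."""
    vec = [0] * len(index)
    vec[index[value]] = 1
    return vec


def integer_label(value, index):
    """Graph-network input: the column count stays the same, strings become integers."""
    return index[value]


index = fit_categories(["grocery", "travel", "rare", "grocery"])
travel_vec = one_hot("travel", index)
rare_id = integer_label("rare", index)
```

The one-hot form widens the record by the number of categories per column, whereas the integer form keeps one value per column and leaves the expansion to a later learned mapping.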
- a flow diagram 300 illustrating an example data preprocessing procedure for input data (e.g., post-training data) is shown.
- a data record 310 (depicted as a column with entries corresponding to fields or features) includes numerical features and categorical features (i.e., populated by descriptive categories from a finite dictionary or set of values).
- Input numerical features are gaussian-normalized (at block 320 ) according to the distribution determined by the training data.
- numerical values may be normalized (e.g., based on a Gaussian normalization process) according to mean and standard deviation values (μ and σ) that may have been determined during the training phase.
- categorical features are translated (at block 330 ) to “rare” if either the category was “rare” during training, or if the category was not seen in the training data.
- Categorical features which saw no “rare” categories during training (e.g., all categories for this column in the training data were present with high frequency), but which are input with a category not present in the training set, will ignore the new category input and instead use no information for this column.
- a resultant transaction record 340 is generated.
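The runtime procedure above can be sketched as a single transform over one record. The data structures are assumptions: `numeric_params` holds the training mean and standard deviation per field, and `known_categories` holds the categories retained during training, including the rare label only for columns where some category was classified as rare. Representing "no information" as `None` is also an illustrative choice.

```python
RARE = "rare"  # hypothetical label used for rare categories


def transform_record(record, numeric_params, known_categories):
    """Preprocess one runtime record using parameters fixed at training time."""
    out = {}
    for field, (mu, sigma) in numeric_params.items():
        out[field] = (record[field] - mu) / sigma  # Gaussian normalization
    for field, known in known_categories.items():
        value = record[field]
        if value in known:
            out[field] = value
        elif RARE in known:
            out[field] = RARE  # rare in training, or never seen in training
        else:
            out[field] = None  # column had no rare class: use no information
    return out


numeric_params = {"amount": (25.0, 5.0)}
known_categories = {"card": {"visa", "mc", RARE}, "month": {"jan", "feb"}}
result = transform_record(
    {"amount": 35.0, "card": "jcb", "month": "mar"},
    numeric_params,
    known_categories,
)
```

Here the unseen card type is mapped to the rare label because that column had a rare class in training, while the unseen month value yields no information because its column did not.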
- FIG. 4 is a flow diagram 400 showing a procedure to identify anomalous data using graph neural networks.
- a graph neural network module (as depicted in FIG. 4 ) of the present disclosure turns the input data 410 (which may correspond to the resultant transaction record 340 of FIG. 3 ) into a graph representation, then outputs (at the “Linear+Softmax” module 480 ) the probability that the record is anomalous (e.g., whether a transaction, represented by the record, is fraudulent or legitimate).
- Each of the individual data features of the transaction data is translated into a high-dimensional graph node representation using, for example, a features-to-nodes module 420 .
- FIG. 6 is a diagram showing an example implementation of a features-to-nodes module (such as the module 420 ) which turns/converts vector data (representative of a data record, such as a transaction record) into a graph representation for input into the graph neural network module.
- Each individual feature included in the input transaction data can be mapped from 1-dimensional space to a high dimensional space (e.g., d>16) by a multi-layer perceptron (MLP) arrangement (depicted as the structure 620 in FIG. 6 ).
- the MLP arrangement may be implemented as an artificial neural network (ANN), such as a feedforward ANN, but other types of neural networks (as discussed herein), and/or other types of learning machines, may be used to implement the MLP arrangement of FIG. 6 or the other MLP arrangements discussed herein (e.g., with respect to FIG. 8, as more particularly detailed below).
- a separate MLP is trained for each individual input feature.
- the output of an individual MLP is a resultant multi-dimensional vector (such as vector 630 in FIG. 6 ) that can be represented as a node within a graph representation of the input data record.
- the resultant output vectors, representing nodes, provide not only data representative of the feature information (that was input into the respective MLPs) but also their positional/orientational relationship, in the graph representation, to other resultant nodes in the graph representation.
- Such graphical representation of data can be used to determine if there are abnormal relationships between various nodes in a graph representation (e.g., if the orientation between a group of several (e.g., 3) particular nodes is such that the angles between the straight lines passing between them are unusually large).
- the nodes of the node-based graphical representation of the input data records are made into a fully connected graph, using a learned initial edge representation.
- the interconnected edge elements in the resultant graph representation of the node representations for the output of the MLP structures may share an initial weight vector, which is determined by the neural network training process.
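The features-to-nodes step can be sketched as follows. This is an illustration under stated assumptions rather than the disclosed implementation: the perceptrons here are untrained (randomly initialized), each has a single ReLU hidden layer, and the names `make_mlp`, `mlp_forward`, and `features_to_graph` are hypothetical. Each scalar feature gets its own perceptron, and every pair of nodes is joined by an edge that starts from the same shared vector:

```python
import random


def make_mlp(hidden, d, rng):
    """Create random (untrained) weights for a 1 -> hidden -> d perceptron."""
    w1 = [rng.uniform(-1, 1) for _ in range(hidden)]
    b1 = [0.0] * hidden
    w2 = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(d)]
    return w1, b1, w2


def mlp_forward(x, mlp):
    """Map a scalar feature x to a d-dimensional node vector."""
    w1, b1, w2 = mlp
    hidden = [max(0.0, w * x + b) for w, b in zip(w1, b1)]  # ReLU layer
    return [sum(wi * hi for wi, hi in zip(row, hidden)) for row in w2]


def features_to_graph(features, mlps, initial_edge):
    """Return node vectors plus a fully connected set of directed edges."""
    nodes = [mlp_forward(x, mlp) for x, mlp in zip(features, mlps)]
    edges = {(s, t): list(initial_edge)  # every edge starts from the shared vector
             for s in range(len(nodes)) for t in range(len(nodes)) if s != t}
    return nodes, edges


rng = random.Random(0)
d = 32                                        # node dimension, d > 16 as in the text
record = [0.7, -1.2, 0.0]                     # hypothetical preprocessed record
mlps = [make_mlp(8, d, rng) for _ in record]  # one separate MLP per feature
initial_edge = [0.1] * d                      # stands in for the learned edge vector
nodes, edges = features_to_graph(record, mlps, initial_edge)
print(len(nodes), len(edges))  # 3 6
```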
- the graph representation is provided to the GNN module 440 , which may be implemented using a neural network or some other learning machine, and which transforms the initial graph representation 430 of the input data record into a transformed graph representation resulting from the learned behavior/configuration of the GNN to identify anomalous data.
- the transformed graph (represented as a graph 450 in FIG. 4 ) may have been transformed (by updating the edge and node representations of the graph) so that important nodes are clustered into a configuration that can be indicative of the existence or lack of anomalous behavior.
- the resultant graph representation 450 is input to a global attention layer 460 , which outputs a vector representation 470 of the graph.
- the global node attention operation can thus generate a composite vector representation based on the individual nodes. For example, nodes of the graph representation 450 are input to the global attention module 460 , providing a node weight for each node. The node representations are multiplied by their weights, and averaged.
- the global node attention operation can be represented according to
- $$V_{\text{output}} = (v_1, v_2, \ldots, v_d) = \frac{w_a \cdot (a_1, a_2, \ldots, a_d) + w_b \cdot (b_1, b_2, \ldots, b_d) + \cdots + w_n \cdot (n_1, n_2, \ldots, n_d)}{n},$$
- where $V_{\text{output}}$ is the output vector 470
- each of a, b, ..., n is one of the individual nodes of the transformed graph representation 450
- $w_a, \ldots, w_n$ are the respective weights applied to the d-dimensional vector representation of the nodes.
- Other global node attention operations may be used.
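The weighted-average pooling described above can be sketched as follows. How the per-node weights are produced is not specified in this passage, so the sketch assumes a sigmoid of a dot product with a gate vector; the names `global_attention` and `gate` are hypothetical:

```python
import math


def global_attention(nodes, gate):
    """Weight each node vector by a scalar score, then average over nodes."""
    # Assumed scoring rule: sigmoid(gate . node) gives one weight per node.
    weights = [1.0 / (1.0 + math.exp(-sum(g * v for g, v in zip(gate, node))))
               for node in nodes]
    n = len(nodes)
    d = len(nodes[0])
    pooled = [sum(w * node[k] for w, node in zip(weights, nodes)) / n
              for k in range(d)]
    return pooled, weights


nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gate = [0.0, 0.0]  # a zero gate gives every node a weight of 0.5
pooled, weights = global_attention(nodes, gate)
print(weights)  # [0.5, 0.5, 0.5]
```

With equal weights of 0.5, each component of the pooled vector is 0.5 * (1 + 0 + 1) / 3, i.e. one third.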
- the final weighted, averaged node representation may next be operated on by a module 480 that applies a single linear layer to transform the representation to, for example, 2 dimensions, the result of which is then input to a softmax layer to produce class probabilities, quantifying the probability of the data as being anomalous (e.g., the transaction is erroneous/fraudulent) or as being within normal data patterns (e.g., the transaction is not suspected to be abnormal/suspicious). Other filtering or processing operations may be applied to the composite vector representation 470 .
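A minimal sketch of the linear-plus-softmax stage, assuming a 3-dimensional pooled vector and hand-picked weights (both are illustrative; in practice the weights would be learned). The ordering of the two outputs as (normal, anomalous) is also an assumption:

```python
import math


def linear(x, weight, bias):
    """Apply a single linear layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def softmax(logits):
    """Convert logits to probabilities; subtract the max for stability."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


pooled = [0.2, -0.4, 0.9]                       # stands in for vector 470
weight = [[0.5, -0.1, 0.3], [-0.5, 0.1, -0.3]]  # 3 -> 2 dimensions
bias = [0.0, 0.0]
p_normal, p_anomalous = softmax(linear(pooled, weight, bias))
print(p_normal, p_anomalous)
```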
- FIG. 7 includes a diagram 700 illustrating the transformation of an initial graph representation (e.g., generated by the array of MLP structures 620 depicted in FIG. 6 ).
- the Graph Neural Network module 720 takes as input a graph 710 (where each node and edge may be represented by a vector), and outputs a graph 730 with updated values for each node and edge.
- FIG. 8 provides diagrams depicting the various operations performed by the graph neural network modules (such as the GNN module 720 depicted in FIG. 7 ).
- the process of updating graph state is referred to as “message-passing” or “graph convolution.”
- message-passing is implemented as follows. First, edges are updated as shown in diagram 810 . For each edge, the edge representation, source node representation, destination node representation, and global representation are consolidated (e.g., concatenated) into a single vector.
- node representations are updated as shown in diagram 820 .
- a new representation is created for each of that node's incoming edges.
- the final node representation used is the average of each of these representations.
- the node representation for each incoming edge is created by, for example, concatenating the original node representation with the incoming edge representation, and using that as input to a node-MLP (such as node-MLP 822 depicted in FIG. 8 ), which outputs a new node representation.
- the graph global state is updated as shown in diagram 830 .
- the node representations for all nodes are averaged.
- the global state vector is concatenated with the average node vector, and used as input to a global-MLP (such as global-MLP 832 depicted in FIG. 8 ), which outputs a new global state representation.
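The three update stages can be sketched in one message-passing step. Several points are assumptions made for illustration: each MLP is replaced by a single linear layer with a tanh activation (the helper `dense`), the node update consumes the already-updated edges, the global update averages the already-updated nodes, and the dimension d = 4 is kept small for readability:

```python
import math
import random


def dense(x, weight):
    """Stand-in for an MLP: one linear layer followed by tanh."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weight]


def mean(vectors):
    """Element-wise average of a list of equal-length vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def message_passing_step(nodes, edges, global_state, w_edge, w_node, w_global):
    # 1. Edge update: concatenate edge, source, destination, and global state.
    new_edges = {(s, t): dense(e + nodes[s] + nodes[t] + global_state, w_edge)
                 for (s, t), e in edges.items()}
    # 2. Node update: one candidate per incoming edge, then average them.
    new_nodes = []
    for i, node in enumerate(nodes):
        candidates = [dense(node + e, w_node)
                      for (s, t), e in new_edges.items() if t == i]
        new_nodes.append(mean(candidates) if candidates else node)
    # 3. Global update: concatenate global state with the average node vector.
    new_global = dense(global_state + mean(new_nodes), w_global)
    return new_nodes, new_edges, new_global


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
d = 4
nodes = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(3)]
edges = {(s, t): [0.1] * d for s in range(3) for t in range(3) if s != t}
global_state = [0.0] * d
w_edge = random_matrix(d, 4 * d, rng)    # input: edge + source + dest + global
w_node = random_matrix(d, 2 * d, rng)    # input: node + incoming edge
w_global = random_matrix(d, 2 * d, rng)  # input: global + mean node
nodes, edges, global_state = message_passing_step(
    nodes, edges, global_state, w_edge, w_node, w_global)
print(len(nodes), len(edges), len(global_state))  # 3 6 4
```

The graph keeps its shape (same nodes and edges); only the vectors attached to each node, each edge, and the global state change.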
- the procedure 900 includes converting 910 a set of data values representative of a multi-dimensional item into a graph representation of the multi-dimensional item, with the graph representation comprising nodes and edges.
- converting the set of data values representative of the multi-dimensional item may include transforming values comprising the multi-dimensional item into a plurality of respective multi-dimensional vectors by a plurality of trained multi-layer perceptrons applied to the respective values.
- the procedures may also include generating, for the plurality of respective multi-dimensional vectors, a graph representation of nodes with interconnecting edges connecting at least some of the nodes, with positions and orientations of the interconnected nodes in the graph representation relative to each other being indicative of potential anomalous relationships between the set of data values of the multi-dimensional item.
- unusually skewed orientations can be indicative of abnormal (anomalous) relationships between different features of a multi-dimensional data item, which can indicate some oddity or inconsistency in the relationship between the features (which, in turn, can suggest an increased likelihood of unnatural or fraudulent behavior).
- the procedure 900 further includes applying 920 a graph convolution process to the graph representation of the multi-dimensional item to generate a transformed graph representation for the multi-dimensional item comprising a resultant transformed configuration of the nodes and edges representing the multi-dimensional item.
- applying the graph convolution process may include generating, for a particular edge of the edges of the graph representation, an edge composite value based on an edge value representing the particular edge, node values representative of a respective source node and destination node of the particular edge, and a global state value associated with the graph representation, and providing the edge composite value to an edge multi-layer perceptron unit to generate a resultant transformed edge corresponding to the particular edge.
- applying the graph convolution process may include generating, for a particular node of the nodes of the graph representation, a node composite value based on an average of intermediate values, computed using one or more node multi-layer perceptrons, each based on a respective one of incoming edge values representing incoming edges directed to the particular node and a value of the particular node.
- applying the graph convolution process may include averaging values of the nodes of the graph representation to generate an average node value, generating a global composite value based on the average node value and a global state value associated with the graph representation, and providing the global composite value to a global multi-layer perceptron unit to generate a resultant transformed global state value corresponding to the global state value associated with the graph representation.
- the various operations performed with respect to the edge transformation, the node transformation, and the global state value transformation may be performed together or independently of each other.
- applying the graph convolution process may include applying the graph convolution process using at least one graph neural network system.
- the procedure 900 further includes performing preprocessing on a received raw data record to produce the multi-dimensional item, including performing one or more of, for example, Gaussian normalization applied to the received raw data record, and/or removing one or more data elements of the received raw data record based on at least one of, for example, entropy associated with the one or more data elements, sparseness associated with the one or more data elements, a p-value associated with the one or more data elements, and/or a low-effect value associated with the one or more data elements.
- removing the one or more data elements may include identifying a particular data element as a rare element in response to determining, based on training data to train a learning engine implementation for performing the preprocessing, that the particular data element is present in fewer than an adjustable threshold number of data records comprising the training data, with the adjustable threshold number being adjusted based on likelihood of occurrence of anomalous values for the particular data element. For example, for a data element that is determined to include, at a higher relative frequency, anomalous values, its associated threshold may be increased so that the data element is not removed from data records, and may thus be captured by the anomalous data detection engine.
- the procedure may also include removing from runtime data records the particular data element identified as the rare element.
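The rare-element rule above can be sketched as a frequency count against a threshold. The function names, the dictionary record layout, and the choice to represent a removed element as `None` are assumptions made for illustration:

```python
from collections import Counter


def find_rare_values(training_column, threshold):
    """Return values present in fewer than `threshold` training records."""
    counts = Counter(training_column)
    return {value for value, count in counts.items() if count < threshold}


def strip_rare(record, column, rare_values):
    """Return a copy of a runtime record with a rare value blanked out."""
    cleaned = dict(record)
    if cleaned.get(column) in rare_values:
        cleaned[column] = None  # assumed representation of "removed"
    return cleaned


# Hypothetical country-of-origin column from training data.
training_countries = ["US"] * 50 + ["DE"] * 30 + ["XK"] * 2
rare = find_rare_values(training_countries, threshold=5)
print(rare)  # {'XK'}
print(strip_rare({"country": "XK", "total": 12.5}, "country", rare))
# {'country': None, 'total': 12.5}
```

Making `threshold` a per-column parameter is what allows it to be adjusted for columns whose values are more likely to be anomalous.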
- FIG. 5 is a diagram of an example implementation of a feed-forward neural network 500 , configured to increase sensitivity of the neural network to the presence of outliers in the input data.
- a feed forward neural network module converts a set of transaction information into a numerical array and outputs the probability that the data input is normal or spurious.
- the input information can include both numerical data (for example, in a financial transaction use case, the numerical data can include payment total or days since the last order) and categorical data (for example, payment method or country of origin).
- the model consists of a series of vectors (layers), where each node in a layer may be connected to some or all of the nodes in the previous and subsequent layers.
- An input layer 510 is (or receives) the array created from the numerical and categorical variables. The values in the input layer are multiplied by the weight values in the connections to create the array for a first hidden layer 520 of one or more hidden layers ( FIG. 5 shows multiple hidden layers). This process of multiplying each layer by the connection weights to the next layer is repeated until a last layer 530 is reached (in the example of FIG. 5 , the last layer 530 includes 2 nodes). The values in the nodes of the last layer represent the probability the model predicts for the specific transaction to be erroneous.
- As further depicted in FIG. 5 , a dropout module 540 is connected to the neural network, and is configured to cut or remove one or more of the connections between nodes of different layers.
- the dropout module may randomly cut connections between one or more of the layers, and may do so either at random instances, or in response to a certain event (e.g., the determination, at the output stage layer of the network, that the generated probability of the existence of an anomalous event exceeds a threshold).
- the trigger event may be a determination, at the output, of the existence of an anomalous event (e.g., according to a yes/no determination with respect to existence of an outlier or some aspect of the data rendering the data anomalous).
- the dropout module 540 may be configured to select the connection of the neural network at least partly based on deterministic criteria. For example, selection of the layers from which connections are to be (randomly) removed may be based, in part, on the output value produced by the network (e.g., selecting a connection between the first and second layer if the output is in some output range). The specific connection to be removed between the selected layers may then be picked randomly (or, at least partly, deterministically).
- the use of the dropout module 540 facilitates controlled structuring of the interconnections of the neural network in a way that increases sensitivity of the network to outlier data.
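A minimal sketch of a feed-forward pass with randomly removed connections. Several details are assumptions for illustration: removal is modeled by setting a weight to zero, hidden layers use a ReLU activation, the final layer uses a softmax so its two values can be read as probabilities, and the weights are random rather than trained. The names `forward` and `drop_connections` are hypothetical:

```python
import math
import random


def forward(x, layers):
    """Propagate x through the weight matrices; ReLU between layers, softmax last."""
    for index, weight in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
        if index < len(layers) - 1:
            x = [max(0.0, v) for v in x]
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def drop_connections(layers, count, rng):
    """Return a copy of the network with `count` random connections zeroed."""
    pruned = [[row[:] for row in weight] for weight in layers]
    positions = [(layer_index, r, c)
                 for layer_index, weight in enumerate(pruned)
                 for r, row in enumerate(weight) for c in range(len(row))]
    for layer_index, r, c in rng.sample(positions, count):
        pruned[layer_index][r][c] = 0.0
    return pruned


rng = random.Random(0)
layers = [
    [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(4)],  # 3 inputs -> 4 hidden
    [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(2)],  # 4 hidden -> 2 outputs
]
record = [0.5, -1.0, 2.0]
pruned = drop_connections(layers, count=3, rng=rng)
print(forward(record, layers))
print(forward(record, pruned))
```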
- increasing sensitivity of the neural network 500 to anomalous data may be achieved by applying a bias to weights of the neural network connections in response to, for example, a correct identification of a classification category (e.g., outlier/normal classification).
- a bias value (e.g., a multiplication factor to increase the strength of at least some of the connections' weights)
- biasing can be performed by assigning each output class a weight (a float value) that is used to weight the loss function during training according to a chosen bias factor.
- This biasing scheme allows the sensitivity of the neural network to anomalous data (upon a correct identification of an input data record as being anomalous) to increase. It has been observed that use of a biasing procedure is more effective (i.e., to increase sensitivity of the network) when used in conjunction with the dropout module 540 .
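One common way to weight a loss function per output class is a class-weighted cross-entropy, sketched below. This is an interpretation of the biasing described above, not a confirmed implementation; the function name, the class ordering (0 = normal, 1 = anomalous), and the weight values are assumptions:

```python
import math


def weighted_cross_entropy(probabilities, labels, class_weights):
    """Mean cross-entropy where each record's loss is scaled by its class weight."""
    total = 0.0
    for probs, label in zip(probabilities, labels):
        total += -class_weights[label] * math.log(probs[label])
    return total / len(labels)


# Two records: predicted [p_normal, p_anomalous] and the true class index.
probabilities = [[0.9, 0.1], [0.6, 0.4]]
labels = [0, 1]
plain = weighted_cross_entropy(probabilities, labels, [1.0, 1.0])
biased = weighted_cross_entropy(probabilities, labels, [1.0, 5.0])
print(plain, biased)
```

Raising the weight of the anomalous class makes errors on anomalous records cost more during training. In Keras, the `class_weight` argument of `Model.fit` plays a comparable role.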
- the procedure 1000 includes receiving 1010 input data at a neural network circuit comprising a plurality of node layers, with each of the plurality of node layers comprising respective one or more nodes, with the neural network circuit further comprising adjustable weighted connections connecting at least some nodes in different layers of the plurality of node layers.
- the neural network circuit may be a feed-forward neural network circuit.
- the procedure 1000 further includes removing 1020 one or more of the weighted connections at one or more time instances.
- removing the one or more of the weighted connections may include selecting the one or more of the weighted connections randomly, and removing the randomly selected one or more of the weighted connections.
- part of the connection-selection process may be deterministic. For example, the layers between which one of the connections is to be removed may be selected based on output of the neural network circuit.
- removing the one or more of the weighted connections may include selecting a set of multiple connections from the weighted connections based, at least in part, on output of the neural network circuit, and selecting randomly the one or more of the weighted connections from the selected set of multiple connections.
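The two-stage selection described above (a deterministic choice of a candidate set based on the network output, then a random pick within that set) might look like the following sketch. The rule that maps an output probability to a layer is hypothetical, as are the function names:

```python
import random


def select_candidate_layer(p_anomalous, num_layers):
    """Deterministic stage: pick a weight layer from the output probability."""
    # Hypothetical rule: split [0, 1] into equal ranges, one per weight layer.
    return min(int(p_anomalous * num_layers), num_layers - 1)


def pick_connections(layers, p_anomalous, count, rng):
    """Random stage: choose `count` connections inside the selected layer."""
    layer = select_candidate_layer(p_anomalous, len(layers))
    rows, cols = len(layers[layer]), len(layers[layer][0])
    candidates = [(layer, r, c) for r in range(rows) for c in range(cols)]
    return rng.sample(candidates, count)


# Two weight layers: 3 inputs -> 4 hidden nodes -> 2 outputs.
layers = [[[0.1] * 3 for _ in range(4)], [[0.2] * 4 for _ in range(2)]]
chosen = pick_connections(layers, p_anomalous=0.8, count=2, rng=random.Random(0))
print(chosen)  # two (layer, row, column) triples, both in layer 1
```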
- the procedure 1000 may further include configuring at least some of the weighted connections according to a biasing factor in response to output of the neural network resulting from an input data record, of the received input data, processed by the neural network.
- the biasing factor is a multiplication factor applied to the output of the feed forward neural network in response to a determination that the neural network correctly identified the input data record as being anomalous.
- the procedure 1000 may further include performing preprocessing on a received raw data record to produce an input data record provided to the neural network circuit, including performing one or more of, for example, Gaussian normalization applied to the raw data record, and/or removing one or more data elements of the raw data record based on at least one of, for example, entropy associated with the one or more data elements, sparseness associated with the one or more data elements, a p-value associated with the one or more data elements, and/or a low-effect value associated with the one or more data elements.
- removing one or more data elements may include identifying a particular data element as a rare element in response to determining, based on training data to train a learning engine implementation for performing the preprocessing, that the particular data element is present in fewer than an adjustable threshold number of data records comprising the training data, with the adjustable threshold number being adjusted based on likelihood of occurrence of anomalous values for the particular data element, and removing from runtime data records the particular data element identified as the rare element.
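The Gaussian normalization mentioned in the preprocessing steps is not defined in this passage. A common reading, assumed here, is standardization to zero mean and unit variance using statistics estimated from training data. The function names are hypothetical:

```python
import statistics


def fit_gaussian(training_values):
    """Estimate the mean and (population) standard deviation from training data."""
    return statistics.fmean(training_values), statistics.pstdev(training_values)


def normalize(value, mean, std):
    """Rescale a value to zero mean and unit variance."""
    return (value - mean) / std if std > 0 else 0.0


totals = [10.0, 20.0, 30.0, 40.0]  # hypothetical payment totals
mean, std = fit_gaussian(totals)
print([round(normalize(v, mean, std), 3) for v in totals])
# [-1.342, -0.447, 0.447, 1.342]
```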
- Neural networks are in general composed of multiple layers of linear transformations (multiplications by a “weight” matrix), each followed by a nonlinear function (e.g., a rectified linear activation function, or ReLU, etc.).
- the linear transformations are learned during training by making small changes to the weight matrices that progressively make the transformations more helpful to the final classification task.
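The idea of learning by small weight changes can be illustrated with gradient descent on a single logistic unit. This is a simplified stand-in for training a full multilayer network; the data, learning rate, and function names are illustrative assumptions:

```python
import math


def predict(weights, x):
    """Logistic unit: sigmoid of the weighted sum of inputs."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def loss(weights, data):
    """Mean binary cross-entropy over (inputs, label) pairs."""
    return -sum(y * math.log(predict(weights, x))
                + (1 - y) * math.log(1 - predict(weights, x))
                for x, y in data) / len(data)


def gradient_step(weights, data, learning_rate):
    """Move each weight a small step against the loss gradient."""
    grad = [0.0] * len(weights)
    for x, y in data:
        error = predict(weights, x) - y
        for k, xi in enumerate(x):
            grad[k] += error * xi / len(data)
    return [w - learning_rate * g for w, g in zip(weights, grad)]


# Each input is [1.0 (bias term), feature]; label 1 marks the anomalous class.
data = [([1.0, 0.2], 0), ([1.0, 0.9], 1), ([1.0, 0.1], 0), ([1.0, 0.8], 1)]
weights = [0.0, 0.0]
before = loss(weights, data)
for _ in range(100):
    weights = gradient_step(weights, data, learning_rate=0.5)
after = loss(weights, data)
print(before, after)  # the loss decreases as the weights are adjusted
```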
- a multilayer network is adapted to analyze data (such as transaction data for normal and suspicious transactions, or other types of data), taking into account the dimensionality or resolution of the data (e.g., a preprocessing stage may be applied to the data to normalize and/or cull some of the fields).
- the layered network may include convolutional processes which are followed by pooling processes along with intermediate connections between the layers to enhance the sharing of information between the layers.
- learning engine approaches/architectures include generating an auto-encoder and using a dense layer of the network to correlate with probability for a future event through a support vector machine, or constructing a regression or classification neural network model that predicts a specific output from data records (based on training reflective of correlation between similar records and the output that is to be predicted).
- examples of neural networks include convolutional neural networks (CNN), feed-forward neural networks, recurrent neural networks (RNN, implemented, for example, using long short-term memory (LSTM) structures), etc.
- Feed-forward networks include one or more layers of perceptrons (the learning nodes/elements) with connections to one or more portions of the input data.
- the connectivity of the inputs and layers of perceptrons is such that input data and intermediate data propagate in a forward direction towards the network's output. There are typically no feedback loops or cycles in the configuration/structure of the feed-forward network.
- Convolutional layers allow a network to efficiently learn features by applying the same learned transformation to subsections of the data.
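Applying the same transformation to subsections of the data can be illustrated with a one-dimensional convolution, where a single shared kernel slides over every window of the input. The kernel here is fixed by hand for illustration; in a trained network its values would be learned:

```python
def conv1d(signal, kernel):
    """Slide one shared kernel over every window of the signal."""
    width = len(kernel)
    return [sum(k * s for k, s in zip(kernel, signal[i:i + width]))
            for i in range(len(signal) - width + 1)]


# A difference kernel: each output is the change between neighboring values.
print(conv1d([1.0, 2.0, 4.0, 7.0, 11.0], [-1.0, 1.0]))  # [1.0, 2.0, 3.0, 4.0]
```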
- the various learning processes implemented through use of the learning machines may be realized using Keras (an open-source neural network library) building blocks and/or NumPy (an open-source programming library useful for realizing modules to process arrays) building blocks.
- the various learning engine implementations may include a trained learning engine (e.g., a neural network) and a corresponding coupled learning engine controller/adapter configured to determine and/or adapt the parameters (e.g., neural network weights) of the learning engine that would produce output representative of determined anomalous data (e.g., corresponding to potential fraudulent transactions).
- training data includes sets of input records (similar to the types of transaction input data that would be provided as input during runtime operations of the learning engines constituting the anomalous data detection systems described herein) along with corresponding data defining the ground truth for the input training data. After initial training of the various learning engines comprising the systems described herein, subsequent training may be intermittently performed (at regular or irregular periods).
- the learning engine adapters/controllers may perform additional training cycles to configure the learning engines to generate appropriate output consistent with the old types of data that the learning engines had previously been adapted for, and also consistent with the new types of data (e.g., corresponding to the new population groups or geographical regions).
- Upon completion of a training cycle by the adapter/controller coupled to a particular learning engine, the adapter provides data representative of updates/changes (e.g., in the form of parameter values/weights to be assigned to links of a neural-network-based learning engine) to the particular learning engine to cause the learning engine to be updated in accordance with the training cycle(s) completed.
- Performing the various operations described herein may be facilitated by a controller system (e.g., a processor-based controller system).
- at least some of the various devices/systems described herein, including any neural network systems, may be implemented, at least in part, using one or more processor-based devices.
- the computing system 1100 includes a processor-based device (also referred to as a controller device) 1110 such as a personal computer, a server, a specialized computing device, and so forth, that typically includes a central processor unit 1112 , or some other type of controller (or a plurality of such processor/controller units).
- the system includes main memory, cache memory and bus interface circuits (not shown in FIG. 11 ).
- the processor-based device 1110 may include a mass storage element 1114 , such as a hard drive (realized as magnetic discs or solid state (semiconductor) memory devices), a flash drive associated with the computer system, etc.
- the computing system 1100 may further include a keyboard 1116 , or keypad, or some other user input interface, and a monitor 1120 , e.g., an LCD (liquid crystal display) monitor, that may be placed where a user can access them.
- the computing system 1100 may also include one or more sensors 1130 (e.g., an image-capture device, inertial sensors, environmental sensors, etc.) to obtain data to be analyzed.
- the processor-based device 1110 is configured to facilitate, for example, the implementation of detection of anomalous behavior in data (e.g., detection of fraudulent activity in financial transaction data), through implementation (using the computing system 1100 ) of trained learning machines, and according to the procedures and operations described herein.
- the storage device 1114 may thus include a computer program product that when executed on the processor-based device 1110 causes the processor-based device to perform operations to facilitate the implementation of procedures and operations described herein.
- the processor-based device may further include peripheral devices to enable input/output functionality.
- Such peripheral devices may include, for example, a CD-ROM drive and/or flash drive (e.g., a removable flash drive), or a network connection (e.g., implemented using a USB port and/or a wireless transceiver(s)), for downloading related content to the connected system.
- Such peripheral devices may also be used for downloading software containing computer instructions to enable general operation of the respective system/device.
- the computing system 1100 may include one or more graphics processing units (GPU's, such as NVIDIA GPU's), and may also include special purpose logic circuitry, e.g., an FPGA (field programmable gate array), an ASIC (application-specific integrated circuit), a DSP processor, accelerated processing units (APU's), application processing units, etc., which may be used in the implementation of the system 1100 in order to implement the learning engine including the neural networks.
- Other modules that may be included with the processor-based device 1110 are speakers, a sound card, a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computing system 1100 .
- the processor-based device 1110 may include an operating system, e.g., Windows XP® Microsoft Corporation operating system, Ubuntu operating system, etc.
- Computer programs include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language.
- machine-readable medium refers to any non-transitory computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a non-transitory machine-readable medium that receives machine instructions as a machine-readable signal.
- any suitable computer readable media can be used for storing instructions for performing the processes/operations/procedures described herein.
- computer readable media can be transitory or non-transitory.
- non-transitory computer readable media can include media such as magnetic media (such as hard disks, floppy disks, etc.), optical media (such as compact discs, digital video discs, Blu-ray discs, etc.), semiconductor media (such as flash memory, electrically programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), etc.), any suitable media that is not fleeting or not devoid of any semblance of permanence during transmission, and/or any suitable tangible media.
- transitory computer readable media can include signals on networks, in wires, conductors, optical fibers, circuits, any suitable media that is fleeting and devoid of any semblance of permanence during transmission, and/or any suitable intangible media.
- “or” as used in a list of items prefaced by “at least one of” or “one or more of” indicates a disjunctive list such that, for example, a list of “at least one of A, B, or C” means A or B or C or AB or AC or BC or ABC (i.e., A and B and C), or combinations with more than one feature (e.g., AA, AAB, ABBC, etc.).
- a statement that a function or operation is “based on” an item or condition means that the function or operation is based on the stated item or condition and may be based on one or more items and/or conditions in addition to the stated item or condition.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Computer Security & Cryptography (AREA)
- Probability & Statistics with Applications (AREA)
- Image Analysis (AREA)
Abstract
Description
- This application claims priority to, and the benefit of, U.S. Provisional Application No. 62/939,236 entitled “METHODS AND SYSTEMS FOR DETECTING SPURIOUS DATA PATTERNS,” and filed Nov. 22, 2019, the content of which is incorporated herein by reference in its entirety.
- The ever-growing volume of electronic business and economic activity has been accompanied by a similar sharp increase in fraudulent and harmful electronic activity. Being able to robustly detect rare data patterns is beneficial in cases where anomalous behavior needs to be detected (e.g., through detection of data outliers) to prevent damage to devices or fraud in financial transactions.
- There is a need for robust detectors of spurious data patterns among a stream of data, whether the data source is sensors, financial transactions, or server logs. The present disclosure describes a method and an apparatus for robust, fast, real-time detection of spurious signals in data, using a graph network methodology and/or a novel neural network topology. An analytical description of the data preprocessing performed before data are fed as input to the system is also provided.
- Disclosed are systems, methods, and other implementations to identify outlier data records from a set of records processed by a learning machine. Examples of such records may be transaction records (e.g., credit card records), with respect to which a learning system is configured to detect anomalous (outlying) activity or behavior. Such anomalous activity may be indicative of possible fraudulent activity.
- In the present disclosure, methods are described for using a neural network, or other types of learning machines, with specific configurations structured to robustly detect outliers in data streams. The novel neural network architectures of the present disclosure can be combined with a unique data preprocessing methodology to reduce the dimensionality of the input data based on specific data filters that maximize the entropy of the input data.
- In some embodiments, the implementations described herein use graph network topologies and processing to identify outliers or anomalous data. A method is thus provided to combine a graph network topology of the processed data with a neural network to automatically cluster data in a topological way that separates the spurious data patterns from normal data flow. The example implementations also include apparatus comprising the neural networks (or other types of learning machines), the topological graphs, and the neural network filters described herein. The example implementations additionally include a non-transitory computer-readable medium having program code recorded thereon for filtering the input streaming data according to the preprocessing parameters and forwarding this data to the filtering neural network and the graph based topological calculator and neural network. The medium may include program code to, when executed by a processor, select at least one moment of an input of the data, along with the execution of the neural networks and graph topological transformers.
- The methods and apparatus of the present disclosure include engineered features that are created/generated from the base streaming data. The engineered features measure many aspects of the data instance and may or may not be interesting or germane to human-based analysis. Inputting these features to the neural network modules might or might not supply them with data relationships humans intuitively find interesting. The methods and apparatus include a methodology, device and code to flag specific patterns in data for potential review by a human reviewer. For example, in the case of a financial transaction, if a feature such as the distance between the billing and shipping addresses for a transaction is above a certain threshold, the transaction will be automatically flagged for potential review.
- In some variations, a method for robust detection and classification of data outliers is provided. The method includes converting a set of data values representative of a multi-dimensional item into a graph representation of the multi-dimensional item, with the graph representation comprising nodes and edges, applying a graph convolution process to the graph representation of the multi-dimensional item to generate a transformed graph representation for the multi-dimensional items comprising a resultant transformed configuration of the nodes and edges representing the multi-dimensional item, and determining, based on the transformed configuration of the nodes and edges representing the multi-dimensional item, a probability that the multi-dimensional item is anomalous.
- Embodiments of the method may include at least some of the features described in the present disclosure, including one or more of the following features.
- Determining the probability that the multi-dimensional item is anomalous may include processing the transformed configuration of the nodes and edges representing the multi-dimensional item with a global attention module to generate a resultant vector of values, and applying a softmax module to the resultant vector of values to derive the probability that the multi-dimensional item is anomalous.
- Converting the set of data values representative of the multi-dimensional item may include transforming values comprising the multi-dimensional item into a plurality of respective multi-dimensional vectors by a plurality of trained multi-layer perceptrons applied to the respective values.
- The method may further include generating, for the plurality of respective multi-dimensional vectors, a graph representation of nodes with interconnecting edges connecting at least some of the nodes, with positions and orientations of the interconnected nodes in the graph representation relative to each other being indicative of potential anomalous relationships between the set of data values of the multi-dimensional item.
- Applying the graph convolution process may include generating, for a particular edge of the edges of the graph representation, an edge composite value based on an edge value representing the particular edge, node values representative of a respective source node and destination node of the particular edge, and a global state value associated with the graph representation, and providing the edge composite value to an edge multi-layer perceptron unit to generate a resultant transformed edge corresponding to the particular edge.
- Applying the graph convolution process may include generating, for a particular node of the nodes of the graph representation, a node composite value based on an average of intermediate values, computed using one or more node multi-layer perceptrons, based on a respective one of incoming edge values representing incoming edges directed to the particular node and a value of the particular node.
- Applying the graph convolution process may include averaging values of the nodes of the graph representation to generate an average node value, generating a global composite value based on the average node value and a global state value associated with the graph representation, and providing the global composite value to a global multi-layer perceptron unit to generate a resultant transformed global state value corresponding to the global state value associated with the graph representation.
- Applying the graph convolution process may include applying the graph convolution process using at least one graph neural network system.
- The method may further include performing preprocessing on a received raw data record to produce the multi-dimensional item, including performing one or more of, for example, Gaussian normalization applied to the received raw data record, and/or removing one or more data elements of the received raw data record. Such removing may be based on at least one of, for example, entropy associated with the one or more data elements, sparseness associated with the one or more data elements, a p-value associated with the one or more data elements, and/or a low-effect size associated with the one or more data elements.
- Removing one or more data elements may include identifying a particular data element as a rare element in response to determining, based on training data to train a learning engine implementation for performing the preprocessing, that the particular data element is present in fewer than an adjustable threshold number of data records comprising the training data, with the adjustable threshold number being adjusted based on likelihood of occurrence of anomalous values for the particular data element, and removing from runtime data records the particular data element identified as the rare element.
- Applying the graph convolution process to the graph representation of the multi-dimensional item may include applying a learning-engine implementation of a graph-convolution process.
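The global attention and softmax features listed above can be illustrated with a minimal sketch in plain Python. It assumes, purely for illustration, that each node weight is a sigmoid of a dot product with a hypothetical learned `attention_vector`, and that the linear layer maps the pooled vector to two logits ordered (normal, anomalous). Neither the weighting rule nor the names `global_attention`, `anomaly_probability`, `linear_weights`, and `linear_bias` come from the disclosure.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of logits."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def global_attention(nodes, attention_vector):
    """Weight every node vector, then average the weighted vectors."""
    # Assumption: a node's weight is sigmoid(attention_vector . node).
    weights = [
        1.0 / (1.0 + math.exp(-sum(a * x for a, x in zip(attention_vector, node))))
        for node in nodes
    ]
    dim = len(nodes[0])
    return [
        sum(w * node[k] for w, node in zip(weights, nodes)) / len(nodes)
        for k in range(dim)
    ]

def anomaly_probability(nodes, attention_vector, linear_weights, linear_bias):
    """Pool the graph, apply one linear layer to 2 logits, return P(anomalous)."""
    pooled = global_attention(nodes, attention_vector)
    logits = [
        sum(w * x for w, x in zip(row, pooled)) + b
        for row, b in zip(linear_weights, linear_bias)
    ]
    p_normal, p_anomalous = softmax(logits)
    return p_anomalous
```

In a trained system the attention vector and the linear layer would be learned together with the graph neural network; here they are plain arguments so the arithmetic can be inspected.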
- In some variations, a system is provided that includes an input stage to receive one or more input data records, and a controller, implementing one or more learning engines, in communication with a memory device to store programmable instructions. The controller is configured to convert a set of data values representative of a multi-dimensional item into a graph representation of the multi-dimensional item, with the graph representation comprising nodes and edges, apply a graph convolution process to the graph representation of the multi-dimensional item to generate a transformed graph representation for the multi-dimensional item comprising a resultant transformed configuration of the nodes and edges representing the multi-dimensional item, and determine, based on the resultant transformed configuration of the nodes and edges representing the multi-dimensional item, a probability that the multi-dimensional item is anomalous.
- In some variations, a non-transitory computer readable media is provided, for storing a set of instructions, executable on at least one programmable device, to convert a set of data values representative of a multi-dimensional item into a graph representation of the multi-dimensional item, with the graph representation comprising nodes and edges, apply a graph convolution process to the graph representation of the multi-dimensional item to generate a transformed graph representation for the multi-dimensional item comprising a resultant transformed configuration of the nodes and edges representing the multi-dimensional item, and determine, based on the resultant transformed configuration of the nodes and edges representing the multi-dimensional item, a probability that the multi-dimensional item is anomalous.
- Embodiments of the system and the non-transitory computer readable media may include at least some of the features described in the present disclosure, including any one or more of the features described above in relation to the method.
- In some variations, another method is provided for detection and classification of data. The method includes receiving input data at a neural network circuit comprising a plurality of node layers, with each of the plurality of node layers comprising respective one or more nodes, and with the neural network circuit further comprising adjustable weighted connections connecting at least some nodes in different layers of the plurality of node layers. The method also includes removing one or more of the weighted connections at one or more time instances.
- Embodiments of the other method may include at least some of the features described in the present disclosure, including one or more of the following features.
- The neural network circuit may be a feed-forward neural network circuit.
- Removing the one or more of the weighted connections may include selecting the one or more of the weighted connections randomly, and removing the randomly selected one or more of the weighted connections.
- Removing the one or more of the weighted connections may include selecting a set of multiple connections from the weighted connections based, at least in part, on output of the neural network circuit, and selecting randomly the one or more of the weighted connections from the selected set of multiple connections.
- Selecting the set of multiple connections may include selecting one or more pairs of node layers of the neural network circuit according to the output of the neural network circuit, and removing at least one weighted connection between node layers of the selected one or more pairs of node layers.
- Selecting the set of multiple connections may include selecting the set of multiple connections according to output values produced by elements of an output node layer of the neural network circuit and a plurality of output ranges defined for possible values produced by the output node layer.
- The method may further include configuring at least some of the weighted connections according to a biasing factor in response to output of the neural network resulting from an input data record, of the received input data, processed by the neural network.
- The biasing factor may be a multiplication factor applied to the at least some of the weighted connections through a back-propagation operation in response to a determination that the neural network correctly identified the input data record as being anomalous.
- The method may further include performing preprocessing on a received raw data record to produce an input data record provided to the neural network circuit, including performing one or more of, for example, Gaussian normalization applied to the raw data record, and/or removing one or more data elements of the raw data record. Such removing may be based on at least one of, for example, entropy associated with the one or more data elements, sparseness associated with the one or more data elements, a p-value associated with the one or more data elements, and/or a low-effect value associated with the one or more data elements.
- Removing one or more data elements may include identifying a particular data element as a rare element in response to determining, based on training data to train a learning engine implementation for performing the preprocessing, that the particular data element is present in fewer than an adjustable threshold number of data records comprising the training data, the adjustable threshold number being adjusted based on likelihood of occurrence of anomalous values for the particular data element, and removing from runtime data records the particular data element identified as the rare element.
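The connection-removal features listed above can be sketched as follows, under stated assumptions. The network is represented by one 0/1 mask matrix per pair of adjacent node layers, and a removed connection is simply a mask entry set to 0. The rule that maps an output range to the layer pairs eligible for removal (`range_to_pairs`) is hypothetical, since the summary does not fix it, and all function names are illustrative.

```python
import random

def forward(x, weights, masks):
    """Feed-forward pass in which masked-out connections contribute nothing."""
    for w, m in zip(weights, masks):
        x = [
            max(0.0, sum(wij * mij * xi for wij, mij, xi in zip(w_row, m_row, x)))
            for w_row, m_row in zip(w, m)
        ]
    return x

def active_connections(masks, layer_pairs=None):
    """List (pair, row, col) for every connection whose mask is still 1."""
    pairs = range(len(masks)) if layer_pairs is None else layer_pairs
    return [
        (p, r, c)
        for p in pairs
        for r, row in enumerate(masks[p])
        for c, m in enumerate(row)
        if m == 1
    ]

def remove_random_connections(masks, count, rng, layer_pairs=None):
    """Randomly pick up to `count` active connections and remove them."""
    candidates = active_connections(masks, layer_pairs)
    chosen = rng.sample(candidates, min(count, len(candidates)))
    for p, r, c in chosen:
        masks[p][r][c] = 0
    return chosen

def layer_pairs_for_output(output_value, range_to_pairs):
    """Hypothetical rule: each output range names the layer pairs to prune from."""
    for (low, high), pairs in range_to_pairs:
        if low <= output_value < high:
            return pairs
    return []
```

A caller might run `forward`, pass the resulting output value to `layer_pairs_for_output`, and then call `remove_random_connections` restricted to the returned layer pairs, which corresponds to selecting a set of connections from the network output and then choosing randomly within that set.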
- In some variations, another system is provided that includes an input stage to receive one or more input data records, and a controller, implementing one or more learning engines, in communication with a memory device to store programmable instructions, to receive input data at a neural network circuit comprising a plurality of node layers, with each of the plurality of node layers comprising respective one or more nodes, and with the neural network circuit further comprising adjustable weighted connections connecting at least some nodes in different layers of the plurality of node layers, and remove one or more of the weighted connections at one or more time instances.
- In some variations, another non-transitory computer readable media is provided, for storing a set of instructions, executable on at least one programmable device, to receive input data at a neural network circuit comprising a plurality of node layers, with each of the plurality of node layers comprising respective one or more nodes, and with the neural network circuit further comprising adjustable weighted connections connecting at least some nodes in different layers of the plurality of node layers, and remove one or more of the weighted connections at one or more time instances.
- Embodiments of the other system, and the other computer readable media may include at least some of the features described in the present disclosure, including at least some of the various features described above in relation to any of the different methods, systems, and media.
- Other features and advantages of the invention are apparent from the following description, and from the claims.
- These and other aspects will now be described in detail with reference to the following drawings.
-
FIG. 1 is a flow diagram illustrating operations/stages to perform data pre-processing for numerical data. -
FIG. 2 is a flow diagram showing preprocessing operations for categorical data. -
FIG. 3 is a flow diagram illustrating an example data preprocessing procedure for input data (e.g., post-training data). -
FIG. 4 is a flow diagram showing a procedure to identify anomalous data using graph neural networks. -
FIG. 5 is a diagram of a topology of an example detector neural network. -
FIG. 6 is a diagram of a features-to-nodes module that converts vector data into graph representation data. -
FIG. 7 is a diagram illustrating transformation of an initial graph representation into a resultant transformed representation. -
FIG. 8 includes diagrams showing iterative updating of edges, nodes and a global state in a graph representation. -
FIG. 9 is a flowchart of an example procedure to detect and classify data. -
FIG. 10 is a flowchart of an example procedure to detect and classify data. -
FIG. 11 is a schematic diagram of a computing system. - Like reference symbols in the various drawings indicate like elements.
- Described herein are systems, methods, devices, media, and other implementations, including implementations based on learning machines (such as neural networks) to detect anomalous data (e.g., outliers). Two example implementations are described herein. A first implementation detects anomalous data based on a graph representation in which data (after optionally being pre-processed to remove certain features, to normalize the data to be represented as a vector of a pre-determined dimensionality that can be input into a learning machine, etc.) is converted into a graph representation comprising a resultant configuration of nodes connected by weighted edges. A graph neural network (GNN), trained to generate a resultant transformed graph that organizes the nodes and edges into a resultant representation (in which the nodes' clustering can be indicative of their relevance or anomality), is applied. The transformed graph representation is then processed by a post-transformation stage to generate an output vector, based on which an output determination (e.g., suspected transaction, or valid transaction) can be made. In a second example implementation, a feed-forward neural network, with an adjustable configuration controlled through a dropout operation (as will be discussed in greater detail below), is provided. In the second example implementation, the input data may be pre-processed (similarly to the pre-processing applied for the input data provided to the first example implementation, with such pre-processing including culling unimportant, redundant, or non-impactful features and values, normalizing the data, etc.). The pre-processed data is then provided to a multi-layer feed-forward neural network, for which the various connections interconnecting the network's nodes (elements) can be controllably removed or adjusted (in some examples, based on the determined output of the feed-forward network).
- With reference to
FIG. 1, a flow diagram 100 illustrating operations/stages to perform data pre-processing for numerical data is shown. The use of preprocessing helps to reduce dimensionality of the data (thus reducing the computation effort required for operating the neural networks of the system, and making the data conform to what the receiving system can handle as input), and to make the neural networks more sensitive to anomalous data (e.g., outliers). As noted, before being used as input data to an outlier detection system (e.g., implemented as a neural network), the input data may be optionally preprocessed to facilitate and/or optimize neural network performance. Neural network training data can be used to determine the preprocessing parameters. Numerical features (e.g., certain fields within transaction records) are Gaussian normalized according to, for example, the distribution of training data, and dropped altogether if the entropy of that feature exceeds some threshold (as illustrated in FIG. 1). More particularly, and as depicted in FIG. 1, a training set 110 of numerical features is used to determine parameters for Gaussian normalization, which are reused when inputting test data to the neural network. First, the entropy of each numerical column is determined, and columns with an entropy above or below a defined threshold are dropped (at block 120 of FIG. 1). Then, input data is Gaussian normalized (at block 130) according to the mean and standard deviation of the data column (i.e., the mean and standard deviation generated for a particular feature or field in the records of the training data). The means and standard deviations of each column may be saved on a computer-readable medium, and are used when more data is input to the model (140).
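The numerical preprocessing stages of FIG. 1 can be sketched as follows. This is a minimal illustration rather than the disclosed implementation: the entropy is estimated by equal-width binning (the disclosure does not say how entropy is computed), the entropy bounds `min_entropy` and `max_entropy` are hypothetical parameters, and the function names are illustrative.

```python
import math
from collections import Counter

def column_entropy(values, bins=10):
    """Shannon entropy (bits) of a numeric column after equal-width binning."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant column carries no information
    width = (hi - lo) / bins
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def fit_numeric_preprocessor(train_columns, min_entropy, max_entropy):
    """Learn which columns to keep, and their mean and standard deviation."""
    params = {}
    for name, values in train_columns.items():
        h = column_entropy(values)
        if h < min_entropy or h > max_entropy:
            continue  # column dropped by the entropy filter
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        std = math.sqrt(var) or 1.0  # guard against division by zero
        params[name] = (mean, std)
    return params

def transform_record(record, params):
    """Gaussian-normalize a new record with the saved training statistics."""
    return {
        name: (record[name] - mean) / std
        for name, (mean, std) in params.items()
    }
```

The dictionary returned by `fit_numeric_preprocessor` plays the role of the saved means and standard deviations: it is computed once from training data and reused unchanged for every later record.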
- To force a reduction in the dimensionality of the input data, categorical features are dropped if in the training set the feature is too sparse, the p-value is too high (the p-value is a measure of the probability that an observed difference could have occurred by random chance, with a low p-value being indicative of meaningful statistical significance of such an observed difference), or the effect size is too low. Training data can thus be processed to identify and drop sparse columns (e.g., corresponding to data fields that might not provide meaningful training input). Low frequency categories (fields in the records of the remaining data) may be classified as rare if their frequency is below a threshold frequency. The threshold frequency for rare classification may be lowered depending on the fraud likelihood of the category, or on whether the feature has a set number of categories or an unrestricted number of categories. Subsequent to the rare encoding, columns with high p-values or low effect size can be removed/discarded. In some embodiments, when the processed training data is provided to train a feed-forward network, columns or fields of the data records corresponding to categories (e.g., descriptive data from a finite set of values or descriptions, such as a month field, purchase type field, etc.) are replaced with, for example, one-hot columns for each column category (in one-hot encoding, a vector representation may include, for example, one element that is ‘1’ with other elements of the vector being ‘0’). When the resultant data is to be used to train a graph neural network, category data represented as alpha-numerical strings may be replaced with integer indices.
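The rare-category classification and the two encodings described above can be sketched as follows. The sketch assumes a single count threshold (`rare_threshold`) per column, and the handling of a category that cannot be mapped (an all-zero one-hot vector, or `None` for the integer index) is an illustrative choice; the helper names are hypothetical.

```python
from collections import Counter

RARE = "rare"

def fit_categories(train_values, rare_threshold):
    """Build a column vocabulary, folding infrequent categories into RARE."""
    counts = Counter(train_values)
    kept = {c for c, n in counts.items() if n >= rare_threshold}
    vocab = sorted(kept)
    if len(kept) < len(counts):
        vocab.append(RARE)  # at least one category fell below the threshold
    return vocab

def one_hot(value, vocab):
    """Feed-forward network input: one column per category, at most a single 1."""
    value = value if value in vocab else RARE
    return [1 if c == value else 0 for c in vocab]

def integer_index(value, vocab):
    """Graph neural network input: the category string becomes an integer label."""
    value = value if value in vocab else RARE
    # Assumption: None marks a value that cannot be mapped to any category.
    return vocab.index(value) if value in vocab else None
```

For example, a month column whose training values are mostly "jan" and "feb" would yield the vocabulary `["feb", "jan", "rare"]`, so that any other month is encoded through the "rare" slot.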
-
FIG. 2 is a flow diagram 200 showing preprocessing operations for categorical data (e.g., descriptive data rather than numerical data). The training set of categorical features (210) is used to determine which features to use, which categories to use, and which categories to classify as “rare.” First, columns below a certain sparsity level are dropped (at block 220). Next, categories for each column are classified as “rare” if the category occurs some number of times below a “rare” threshold (as determined at block 230). An exception to this rule is if the category is below the cutoff, but is still one of the top three (3), or some other number of categories, most frequent categories. Next, the p-value and Cramer's Corrected Statistic, or “effect size,” are calculated for each categorical column. Columns with a p-value above a threshold, and columns below an effect size threshold, are dropped/discarded (at block 240). Next the categorical columns are encoded for input to the neural network modules. For the feed-forward neural network, categorical features are one-hot encoded, such that each categorical entry may be replaced by a number of columns equal to the number of categories for that categorical column (at blocks 250 and 260). For the graph neural network, the number of columns stays the same, but categorical strings in each column are replaced by an integer label (at blocks 270 and 280). - With reference to
FIG. 3, a flow diagram 300 illustrating an example data preprocessing procedure for input data (e.g., post-training data) is shown. A data record 310 (depicted as a column with entries corresponding to fields or features) includes numerical features and categorical features (i.e., populated by descriptive categories from a finite dictionary or set of values). Input numerical features are Gaussian-normalized (at block 320) according to the distribution determined by the training data. Thus, numerical values may be normalized (e.g., based on a Gaussian normalization process) according to mean and standard deviation values (μ and σ) that may have been determined during the training phase. In some examples, categorical features (e.g., descriptive features populated based on a finite dictionary of values/terms) of the input data (310) are translated (at block 330) to “rare” if either the category was “rare” during training, or if the category was not seen in the training data. Categorical features which saw no “rare” categories during training (e.g., all categories for this column in training data were present with high frequency), but are input with a category not present in the training set, will ignore the new category input and instead use no information for this column. Following the pre-processing performed on the input data record 310 (according to operations 320 and 330), a resultant transaction record 340 is generated. - As noted, one example learning-based processing applied to preprocessed data is based on graph neural networks.
FIG. 4 is a flow diagram 400 showing a procedure to identify anomalous data using graph neural networks. A graph neural network module (as depicted in FIG. 4) of the present disclosure turns the input data 410 (which may correspond to the resultant transaction record 340 of FIG. 3) into a graph representation, then outputs (at the “Linear+Softmax” module 480) the probability that the record is anomalous (e.g., whether a transaction, represented by the record, is fraudulent or legitimate). Each of the individual data features of the transaction data is translated into a high-dimensional graph node representation using, for example, a features-to-nodes module 420. -
FIG. 6 is a diagram showing an example implementation of a features-to-nodes module (such as the module 420) which turns/converts vector data (representative of a data record, such as a transaction record) into a graph representation for input into the graph neural network module. Each individual feature included in the input transaction data (depicted as record 610) can be mapped from 1-dimensional space to a high dimensional space (e.g., d>16) by a multi-layer perceptron (MLP) arrangement (depicted as the structure 620 in FIG. 6). In some embodiments, the MLP arrangement may be implemented as an artificial neural network (ANN), such as a feedforward ANN, but other types of neural networks (as discussed herein), and/or other types of learning machines, may be used to implement the MLP arrangement of FIG. 6 or the other MLP arrangements discussed herein (e.g., with respect to FIG. 8, as more particularly detailed below). In some embodiments, a separate MLP is trained for each individual input feature. The output of an individual MLP is a resultant multi-dimensional vector (such as vector 630 in FIG. 6) that can be represented as a node within a graph representation of the input data record. In some examples, the resultant output vectors, representing nodes, provide not only data representative of the feature information (that was input into respective MLPs) but also their positional/orientational relationship, in the graph representation, to other resultant nodes in the graph representation. Such graphical representation of data can be used to determine if there are abnormal relationships between various nodes in a graph representation (e.g., if the orientation between, for example, a group of several (e.g., 3) particular nodes is such that the angles between straight lines passing between them are unusually large). The nodes of the node-based graphical representation of the input data records are made into a fully connected graph, using a learned initial edge representation.
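The features-to-nodes conversion can be sketched as follows. The weights here are random stand-ins for trained parameters, bias terms are omitted, self-loops are left out of the fully connected graph, and the sizes (`hidden_dim=8`, `node_dim=32`, `edge_dim=4`) are arbitrary illustrative values; none of these choices is taken from the disclosure.

```python
import math
import random

def make_mlp(in_dim, hidden_dim, out_dim, rng):
    """Two-layer perceptron with random weights (a trained MLP would be used)."""
    w1 = [[rng.gauss(0, 1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.gauss(0, 1) for _ in range(hidden_dim)] for _ in range(out_dim)]

    def forward(x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in w2]

    return forward

def features_to_graph(record, node_dim=32, edge_dim=4, seed=0):
    """Map each scalar feature to a node vector and connect all node pairs."""
    rng = random.Random(seed)
    # One separate MLP per input feature, each lifting 1 dimension to node_dim.
    mlps = [make_mlp(1, 8, node_dim, rng) for _ in record]
    nodes = [mlp([value]) for mlp, value in zip(mlps, record)]
    # Every directed edge starts from the same initial edge vector.
    initial_edge = [rng.gauss(0, 1) for _ in range(edge_dim)]
    edges = {
        (src, dst): list(initial_edge)
        for src in range(len(nodes))
        for dst in range(len(nodes))
        if src != dst
    }
    return nodes, edges
```

A record with three features therefore yields three node vectors and six directed edges, all of which start from the same edge vector.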
The interconnected edge elements in the resultant graph representation of the node representations for the output of the MLP structures may share an initial weight vector, which is determined by the neural network training process. - With continued reference to
FIG. 4, the graph representation is provided to the GNN module (440), which may be implemented using a neural network or some other learning machine, and which transforms the initial graphical representation 430 of the input data record into a transformed graph representation resulting from the learned behavior/configuration of the GNN to identify anomalous data. For example, the transformed graph (represented as a graph 450 in FIG. 4) may have been transformed (by updating the edge and node representations of the graph) so that important nodes are clustered into a configuration that can be indicative of the existence or lack of anomalous behavior. The resultant graph representation 450 is input to a global attention layer 460, which outputs a vector representation 470 of the graph. The global node attention operation can thus generate a composite vector representation based on the individual nodes. For example, nodes of the graph representation 450 are input to the global attention module 460, providing a node weight for each node. The node representations are multiplied by their weights, and averaged. The global node attention operation can be represented according to
$$V_{output} = \operatorname{mean}\left(w_a\,a,\; w_b\,b,\; \ldots,\; w_n\,n\right)$$
where V_output is the output vector 470, each of a, b, ..., n is one of the individual nodes of the transformed graph representation 450, and w_a, ..., w_n are the respective weights applied to the d-dimensional vector representation of the nodes. Other global node attention operations (to generate a composite vector from the graph representation) may be used. - The final weighted, averaged node representation may next be operated on by a
module 480 that transforms a single linear layer to, for example, 2-dimensions, which is then input to a softmax layer to produce class probabilities, quantifying the probability of the data as being anomalous (e.g., the transaction is erroneous/fraudulent) or as being within normal data patterns (e.g., the transaction is not suspected to be abnormal/suspicious). Other filtering or processing operations may be applied to thecomposite vector representation 470. - Further details of the graph neural network module are provided in
FIG. 7 , comprising a diagram 700 illustrating the transformation of an initial graph representation (e.g., generated by the array ofMLP structures 620 depicted inFIG. 6 ). The GraphNeural Network module 720 takes as input a graph 710 (where each node and edge may be represented by a vector), and outputs agraph 730 with updated values for each node and edge. - An example process for calculating output graph node and edge representations is detailed herein. The graph neural network module functions by iteratively updating representations of the edges, then nodes, then global state of the graph. This process is illustrated by
FIG. 8 providing diagrams depicting the various operations performed by the graph neural network modules (such as theGNN module 720 depicted inFIG. 7 ). The process of updating graph state is referred to as “message-passing” or “graph convolution.” In the implementations described herein, message-passing is implemented as follows. First edges are updated as shown in diagram 810. For each edge, the edge representation, source node representation, destination node representation, and global representation are consolidated (e.g., concatenated) into a single vector. This vector is used as input to an edge-MLP (such asedge MLP 812 depicted inFIG. 8 ), which outputs a new edge representation of the same length as the original edge representation. Second, node representations are updated as shown in diagram 820. For each node, a new representation is created for each of that node's incoming edges. The final node representation used is the average of each of these representations. The node representation for each incoming edge is created by, for example, concatenating the original node representation with the incoming edge representation, and using that as input to a node-MLP (such as node-MLP 822 depicted inFIG. 8 ), which outputs a new node representation. Finally, the graph global state is updated as shown in diagram 830. In an example embodiment, first, the node representations for all nodes are averaged. Then, the global state vector is concatenated with the average node vector, and used as input to a global-MLP (such as global-MLP 832 depicted inFIG. 8 ), which outputs a new global state representation. - With reference to
FIG. 9 , a flowchart of anexample procedure 900 to detect and classify data (e.g., identifying data with anomalous behavior) is shown. Theprocedure 900 includes converting 910 a set of data values representative of a multi-dimensional item into a graph representation of the multi-dimensional item, with the graph representation comprising nodes and edges. In some examples, converting the set of data values representative of the multi-dimensional item may include transforming values comprising the multi-dimensional items into a plurality of respective multi-dimensional vectors by a plurality of trained multi-layer perceptron applied to the respective values. In such examples, the procedures may also include generating, for the plurality of respective multi-dimensional vectors, a graph representation of nodes with interconnecting edges connecting at least some of the nodes, with positions and orientations of the interconnected nodes in the graph representation relative to each other being indicative of potential anomalous relationships between the set of data values of the multi-dimensional item. For example, unusually skewed orientations can be indicative of abnormal (anomalous) relationships between different features of a multi-dimensional data item, which can indicate some oddity or inconsistency in the relationship between the features (which, in turn, can suggest an increased likelihood of unnatural or fraudulent behavior). - The
procedure 900 further includes applying 920 a graph convolution process to the graph representation of the multi-dimensional item to generate a transformed graph representation for the multi-dimensional item comprising a resultant transformed configuration of the nodes and edges representing the multi-dimensional item. In some examples, applying the graph convolution process may include generating, for a particular edge of the edges of the graph representation, an edge composite value based on an edge value representing the particular edge, node values representative of a respective source node and destination node of the particular edge, and a global state value associated with the graph representation, and providing the edge composite value to an edge multi-layer perceptron unit to generate a resultant transformed edge corresponding to the particular edge. In another example, applying the graph convolution process may include generating, for a particular node of the nodes of the graph representation, a node composite value based on an average of intermediate values, computed using one or more node multi-layer perceptrons, based on a respective one of incoming edge values representing incomings edges directed to the particular node and a value of the particular node. In yet another example, applying the graph convolution process may include averaging values of the nodes of the graph representation to generate an average node value, generating a global composite value based on the average node value and a global state value associated with the graph representation, and providing the global composite value to an global multi-layer perceptron unit to generate a resultant transformed global state value corresponding to the global state value associated with the graph representation. 
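One iteration of the edge, node, and global updates described above can be sketched as follows. Each MLP is stood in for by a single randomly initialized tanh layer, and the node update reads the already updated edges; both are assumptions made for illustration, as is keeping a node unchanged when it has no incoming edges.

```python
import math
import random

def make_mlp(in_dim, out_dim, rng):
    """Single tanh layer standing in for a trained MLP (random weights)."""
    w = [[rng.gauss(0, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return lambda x: [math.tanh(sum(wi * xi for wi, xi in zip(row, x))) for row in w]

def mean_vectors(vectors):
    """Element-wise average of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def message_passing_step(nodes, edges, global_state, edge_mlp, node_mlp, global_mlp):
    """Update edges, then nodes, then the global state of the graph."""
    # 1. Edge update: concatenate edge, source node, destination node, global.
    new_edges = {
        (src, dst): edge_mlp(edge + nodes[src] + nodes[dst] + global_state)
        for (src, dst), edge in edges.items()
    }
    # 2. Node update: one candidate per incoming edge, then average them.
    new_nodes = []
    for index, node in enumerate(nodes):
        incoming = [e for (src, dst), e in new_edges.items() if dst == index]
        candidates = [node_mlp(node + edge) for edge in incoming]
        new_nodes.append(mean_vectors(candidates) if candidates else node)
    # 3. Global update: concatenate the global state with the average node.
    new_global = global_mlp(global_state + mean_vectors(new_nodes))
    return new_nodes, new_edges, new_global
```

With node, edge, and global vectors of lengths N, E, and G, the three stand-in MLPs take inputs of length E + 2N + G, N + E, and G + N respectively, and each returns a vector of the same length as the representation it replaces, so the step can be applied repeatedly.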
In some embodiments, the various operations performed with respect to the edge transformation, the node transformation, and the global state value transformation may be performed together or independently of each other. In some examples, applying the graph convolution process may include applying the graph convolution process using at least one graph neural network system. - With continued reference to
FIG. 9, the procedure 900 additionally includes determining 930, based on the transformed configuration of the nodes and edges representing the multi-dimensional item, a probability that the multi-dimensional item is anomalous. Determining the probability that the multi-dimensional item is anomalous may include processing the transformed configuration of the nodes and edges representing the multi-dimensional item with a global attention module to generate a resultant vector of values, and applying a softmax module to the resultant vector of values to derive the probability that the multi-dimensional item is anomalous. - In some embodiments, the
procedure 900 further includes performing preprocessing on a received raw data record to produce the multi-dimensional item, including performing one or more of, for example, Gaussian normalization applied to the received raw data record, and/or removing one or more data elements of the received raw data record based on at least one of, for example, entropy associated with the one or more data elements, sparseness associated with the one or more data elements, a p-value associated with the one or more data elements, and/or a low-effect value associated with the one or more data elements. In some embodiments, removing the one or more data elements may include identifying a particular data element as a rare element in response to determining, based on training data to train a learning engine implementation for performing the preprocessing, that the particular data element is present in fewer than an adjustable threshold number of data records comprising the training data, with the adjustable threshold number being adjusted based on likelihood of occurrence of anomalous values for the particular data element. For example, for a data element that is determined to include, at a higher relative frequency, anomalous values, its associated threshold may be increased so that the data element is not removed from data records, and may thus be captured by the anomalous data detection engine. The procedure may also include removing from runtime data records the particular data element identified as the rare element. - As noted, another example implementation for detecting anomalous behavior (e.g., the existence of outliers) is based on a feed-forward neural network.
FIG. 5 is a diagram of an example implementation of a feed-forward neural network 500, configured to increase sensitivity of the neural network to the presence of outliers in the input data. A feed-forward neural network module converts a set of transaction information into a numerical array and outputs the probability that the data input is normal or spurious. The input information can include both numerical data (for example, in a financial transaction use case, the numerical data can include payment total or days since the last order) and categorical data (for example, payment method or country of origin). The model consists of a series of vectors (layers), where each node in a layer may be connected to some or all of the nodes in the previous and subsequent layers. An input layer 510 is (or receives) the array created from the numerical and categorical variables. The values in the input layer are multiplied by the weight values in the connections to create the array for a first hidden layer 520 of one or more hidden layers (FIG. 5 shows multiple hidden layers). This process of multiplying each layer by the connection weights to the next layer is repeated until a last layer 530 (in the example of FIG. 5, the last layer 530 includes 2 nodes). The values in the nodes of the last layer represent the probability the model predicts for the specific transaction to be erroneous. As further depicted in FIG. 5, a dropout module 540 is connected to the neural network, and is configured to cut or remove one or more of the connections between nodes of different layers. In some embodiments, the dropout module may randomly cut connections between one or more of the layers, and may do so either at random instances, or in response to a certain event (e.g., the determination, at the output stage layer of the network, that the generated probability of the existence of an anomalous event exceeds a threshold). 
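The layer-by-layer propagation and the cutting of connections described above can be sketched in plain Python as follows. This is an illustrative sketch only, under stated assumptions: removed connections are tracked with a 0/1 mask, hidden layers use a ReLU non-linearity, a softmax turns the two last-layer values into probabilities, and index 1 of the last layer is taken to be the "anomalous" output. The layer sizes, the threshold value, and the names `forward` and `drop_connections` are hypothetical.

```python
import math
import random


def softmax(z):
    """Turn raw output values into probabilities that sum to 1."""
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, weights, masks):
    """Propagate the input array x through the layers.

    weights[k][j][i] is the weight of the connection from node i of layer k
    to node j of layer k + 1; masks has the same shape, with 0 marking a
    connection that has been removed.
    """
    a = x
    last = len(weights) - 1
    for k, (w, m) in enumerate(zip(weights, masks)):
        z = [sum(wji * mji * ai for wji, mji, ai in zip(w_row, m_row, a))
             for w_row, m_row in zip(w, m)]
        # Assumption: ReLU on hidden layers, softmax on the last layer.
        a = softmax(z) if k == last else [max(0.0, v) for v in z]
    return a


def drop_connections(masks, layer_index, count, rng):
    """Randomly cut up to `count` active connections leaving `layer_index`."""
    active = [(j, i)
              for j, row in enumerate(masks[layer_index])
              for i, v in enumerate(row) if v == 1]
    for j, i in rng.sample(active, min(count, len(active))):
        masks[layer_index][j][i] = 0


rng = random.Random(0)
sizes = [4, 3, 2]  # input layer, one hidden layer, two-node last layer
weights = [[[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
           for n_in, n_out in zip(sizes, sizes[1:])]
masks = [[[1] * n_in for _ in range(n_out)]
         for n_in, n_out in zip(sizes, sizes[1:])]

record = [0.5, 1.2, 0.0, 3.0]  # hypothetical, already numerical input array
probs = forward(record, weights, masks)

THRESHOLD = 0.5  # hypothetical trigger level
if probs[1] > THRESHOLD:  # assumption: index 1 is the "anomalous" output node
    drop_connections(masks, layer_index=0, count=2, rng=rng)
probs_after = forward(record, weights, masks)
```

Keeping the mask separate from the weights means a cut connection contributes nothing to the next layer while its weight value is retained, which is one possible design and not necessarily the one used by the dropout module 540.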
Instead of a probability exceeding a threshold, the trigger event may be a determination, at the output, of the existence of an anomalous event (e.g., according to a yes/no determination with respect to existence of an outlier or some aspect of the data rendering the data anomalous). In some examples, the dropout module 540 may be configured to select the connection of the neural network to be removed at least partly based on deterministic criteria. For example, selection of the layers from which connections are to be (randomly) removed may be based, in part, on the output value produced by the network (e.g., selecting a connection between the first and second layer if the output is in some output range). The specific connection to be removed between the selected layers may then be picked randomly (or, at least partly, deterministically). The use of the dropout module 540 facilitates controlled structuring of the interconnections of the neural network in a way that increases sensitivity of the network to outlier data. - In some embodiments, increasing sensitivity of the
neural network 500 to anomalous data (such as outlier data) may be achieved by applying a bias to weights of the neural network connections in response to, for example, a correct identification of a classification category (e.g., outlier/normal classification). Thus, for example, if during training the neural network produces a correct output in response to an input data record (e.g., correctly identifying, as defined in the ground truths for the training data, a particular record as corresponding to an outlier), a bias value (e.g., a multiplication factor to increase the strength of at least some of the connections' weights) is applied. Biasing can be applied by assigning each output class a weight (a float value), set according to the biasing factor, that is used to weight the loss function during training. This biasing scheme allows the sensitivity of the neural network to anomalous data (upon a correct identification of an input data record as being anomalous) to increase. It has been observed that use of a biasing procedure is more effective (i.e., in increasing sensitivity of the network) when used in conjunction with the dropout module 540. - Thus, with reference to
FIG. 10, a flowchart of an example procedure 1000 to detect and classify data (e.g., as anomalous or as being an outlier, etc.) is shown. The procedure 1000 may be used in conjunction with other anomalous data detection procedures (such as the procedure 900 depicted in FIG. 9). The procedure 1000 includes receiving 1010 input data at a neural network circuit comprising a plurality of node layers, with each of the plurality of node layers comprising respective one or more nodes, with the neural network circuit further comprising adjustable weighted connections connecting at least some nodes in different layers of the plurality of node layers. In some examples, the neural network circuit may be a feed-forward neural network circuit. - The
procedure 1000 further includes removing 1020 one or more of the weighted connections at one or more time instances. In some examples, removing the one or more of the weighted connections may include selecting the one or more of the weighted connections randomly, and removing the randomly selected one or more of the weighted connections. In some embodiments, part of the connection-selection process may be deterministic. For example, the layers between which one of the connections is to be removed may be selected based on output of the neural network circuit. In such examples, removing the one or more of the weighted connections may include selecting a set of multiple connections from the weighted connections based, at least in part, on output of the neural network circuit, and selecting randomly the one or more of the weighted connections from the selected set of multiple connections. In some embodiments, selecting the set of multiple connections may include selecting one or more pairs of node layers of the neural network circuit according to the output of the neural network circuit, and removing at least one weighted connection between node layers of the selected one or more pairs of node layers. Selecting the set of multiple connections may include selecting the set of multiple connections according to output values produced by elements of an output node layer of the neural network circuit and a plurality of output ranges defined for possible values produced by the output node layer. - In some embodiments, the
procedure 1000 may further include configuring at least some of the weighted connections according to a biasing factor in response to output of the neural network resulting from an input data record, of the received input data, processed by the neural network. In such embodiments, the biasing factor is a multiplication factor applied to the output of the feed forward neural network in response to a determination that the neural network correctly identified the input data record as being anomalous. - In some implementations, the
procedure 1000 may further include performing preprocessing on a received raw data record to produce an input data record provided to the neural network circuit, including performing one or more of, for example, Gaussian normalization applied to the raw data record, and/or removing one or more data elements of the raw data record based on at least one of, for example, entropy associated with the one or more data elements, sparseness associated with the one or more data elements, a p-value associated with the one or more data elements, and/or a low-effect value associated with the one or more data elements. In some examples, removing one or more data elements may include identifying a particular data element as a rare element in response to determining, based on training data to train a learning engine implementation for performing the preprocessing, that the particular data element is present in fewer than an adjustable threshold number of data records comprising the training data, with the adjustable threshold number being adjusted based on likelihood of occurrence of anomalous values for the particular data element, and removing from runtime data records the particular data element identified as the rare element. - As noted, implementation of the anomalous behavior detection systems and methods described herein may be realized using one or more learning machines such as neural networks. Neural networks are in general composed of multiple layers of linear transformations (multiplications by a “weight” matrix), each followed by a nonlinear function (e.g., a rectified linear activation function, or ReLU, etc.). The linear transformations are learned during training by making small changes to the weight matrices that progressively make the transformations more helpful to the final classification task. 
A multilayer network is adapted to analyze data (such as transaction data for normal and suspicious transactions, or other types of data), taking into account the dimensionality or resolution of the data (e.g., a preprocessing stage may be applied to the data to normalize and/or cull some of the fields). The layered network may include convolutional processes which are followed by pooling processes along with intermediate connections between the layers to enhance the sharing of information between the layers. Several examples of learning engine approaches/architectures that may be used include generating an auto-encoder and using a dense layer of the network to correlate with probability for a future event through a support vector machine, or constructing a regression or classification neural network model that predicts a specific output from data records (based on training reflective of correlation between similar records and the output that is to be predicted).
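A preprocessing stage of the kind mentioned above (normalizing fields and culling rare ones) might be sketched as follows. This is a hedged illustration, not the described implementation: a "data element" is treated here as a named field of a record, Gaussian normalization is taken to mean standardizing a numerical field with the mean and standard deviation observed in the training data, and the per-field thresholds are simply passed in, since how they are adjusted is outside the scope of the sketch. The function names and the `coupon` field are hypothetical; `payment_total` and `country` echo the transaction examples given earlier.

```python
import statistics


def fit_preprocessor(training_records, numeric_fields, default_threshold,
                     thresholds=None):
    """Learn which fields are rare and the normalization statistics.

    A field is flagged as rare when it is present in fewer than its
    threshold number of training records (per-field thresholds optional).
    """
    thresholds = thresholds or {}
    counts = {}
    for rec in training_records:
        for field, value in rec.items():
            if value is not None:
                counts[field] = counts.get(field, 0) + 1
    rare = {f for f, c in counts.items()
            if c < thresholds.get(f, default_threshold)}

    stats = {}
    for f in numeric_fields:
        values = [rec[f] for rec in training_records if rec.get(f) is not None]
        if f in rare or not values:
            continue
        stats[f] = (statistics.mean(values), statistics.pstdev(values))
    return rare, stats


def preprocess(record, rare, stats):
    """Drop rare fields and standardize numerical fields of a runtime record."""
    out = {}
    for field, value in record.items():
        if field in rare:
            continue  # remove the element identified as rare during training
        if field in stats and value is not None:
            mu, sigma = stats[field]
            value = (value - mu) / sigma if sigma > 0 else 0.0
        out[field] = value
    return out


train = [
    {"payment_total": 10.0, "country": "US"},
    {"payment_total": 20.0, "country": "US"},
    {"payment_total": 30.0, "country": "CA", "coupon": "X1"},
]
rare, stats = fit_preprocessor(train, ["payment_total"], default_threshold=2)
clean = preprocess({"payment_total": 40.0, "country": "US", "coupon": "Z9"},
                   rare, stats)
```

Here `coupon` appears in only one of three training records, so it falls below the threshold of 2 and is removed from the runtime record, while `payment_total` is rescaled using the training mean and standard deviation. Categorical fields such as `country` are passed through unchanged; encoding them numerically would be a separate step.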
- Examples of neural networks include convolutional neural networks (CNN), feed-forward neural networks, recurrent neural networks (RNN, implemented, for example, using long short-term memory (LSTM) structures), etc. Feed-forward networks include one or more layers of perceptrons (the learning nodes/elements) with connections to one or more portions of the input data. In a feed-forward network, the connectivity of the inputs and layers of perceptrons is such that input data and intermediate data propagate in a forward direction towards the network's output. There are typically no feedback loops or cycles in the configuration/structure of the feed-forward network. Convolutional layers allow a network to efficiently learn features by applying the same learned transformation to subsections of the data. In some embodiments, the various learning processes implemented through use of the learning machines may be realized using Keras (an open-source neural network library) building blocks and/or NumPy (an open-source programming library useful for realizing modules to process arrays) building blocks.
- In some embodiments, the various learning engine implementations may include a trained learning engine (e.g., a neural network) and a corresponding coupled learning engine controller/adapter configured to determine and/or adapt the parameters (e.g., neural network weights) of the learning engine that would produce output representative of determined anomalous data (e.g., corresponding to potential fraudulent transactions). In such implementations, training data includes sets of input records (similar to the types of transaction input data that would be provided as input during runtime operations of the learning engines constituting the anomalous data detection systems described herein) along with corresponding data defining the ground truth for the input training data. After initial training of the various learning engines comprising the systems described herein, subsequent training may be intermittently performed (at regular or irregular periods). For example, upon the acquisition of new data corresponding to different population groups or geographical regions that may be associated with different transaction behaviors or characteristics (e.g., for systems configured to detect anomalous transactions), the learning engine adapters/controllers may perform additional training cycles to configure the learning engines to generate appropriate output consistent with the old types of data that the learning engines had previously been adapted for, and also consistent with the new types of data (e.g., corresponding to the new population groups or geographical regions). Upon completion of a training cycle by the adapter/controller coupled to a particular learning engine, the adapter/controller provides data representative of updates/changes (e.g., in the form of parameter values/weights to be assigned to links of a neural-network-based learning engine) to the particular learning engine to cause the learning engine to be updated in accordance with the training cycle(s) completed.
- Performing the various operations described herein may be facilitated by a controller system (e.g., a processor-based controller system). Particularly, at least some of the various devices/systems described herein, including any neural network systems, may be implemented, at least in part, using one or more processor-based devices.
- Thus, with reference to
FIG. 11, a schematic diagram of a computing system 1100 is shown. The computing system 1100 includes a processor-based device (also referred to as a controller device) 1110 such as a personal computer, a server, a specialized computing device, and so forth, that typically includes a central processor unit 1112, or some other type of controller (or a plurality of such processor/controller units). In addition to the CPU 1112, the system includes main memory, cache memory and bus interface circuits (not shown in FIG. 11). The processor-based device 1110 may include a mass storage element 1114, such as a hard drive (realized as magnetic discs or solid state (semiconductor) memory devices), a flash drive associated with the computer system, etc. The computing system 1100 may further include a keyboard 1116, or keypad, or some other user input interface, and a monitor 1120, e.g., an LCD (liquid crystal display) monitor, that may be placed where a user can access them. The computing system 1100 may also include one or more sensors 1130 (e.g., an image-capture device, inertial sensors, environmental sensors, etc.) to obtain data to be analyzed. - The processor-based
device 1110 is configured to facilitate, for example, the implementation of detection of anomalous behavior in data (e.g., detection of fraudulent activity in financial transaction data), through implementation (using the computing system 1100) of trained learning machines, and according to the procedures and operations described herein. The storage device 1114 may thus include a computer program product that when executed on the processor-based device 1110 causes the processor-based device to perform operations to facilitate the implementation of procedures and operations described herein. The processor-based device may further include peripheral devices to enable input/output functionality. Such peripheral devices may include, for example, a CD-ROM drive and/or flash drive (e.g., a removable flash drive), or a network connection (e.g., implemented using a USB port and/or a wireless transceiver(s)), for downloading related content to the connected system. Such peripheral devices may also be used for downloading software containing computer instructions to enable general operation of the respective system/device. Alternatively or additionally, in some embodiments, the computing system 1100 may include one or more graphics processing units (GPUs, such as NVIDIA GPUs), and special purpose logic circuitry, e.g., an FPGA (field programmable gate array), an ASIC (application-specific integrated circuit), a DSP processor, accelerated processing units (APUs), application processing units, etc., may also be used in the implementation of the system 1100 in order to implement the learning engine including the neural networks. Other modules that may be included with the processor-based device 1110 are speakers, a sound card, and a pointing device, e.g., a mouse or a trackball, by which the user can provide input to the computing system 1100. 
The processor-based device 1110 may include an operating system, e.g., the Windows XP® operating system from Microsoft Corporation, the Ubuntu operating system, etc. - Computer programs (also known as programs, software, software applications or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term “machine-readable medium” refers to any non-transitory computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a non-transitory machine-readable medium that receives machine instructions as a machine-readable signal.
- In some embodiments, any suitable computer readable media can be used for storing instructions for performing the processes/operations/procedures described herein. For example, in some embodiments computer readable media can be transitory or non-transitory. For example, non-transitory computer readable media can include media such as magnetic media (such as hard disks, floppy disks, etc.), optical media (such as compact discs, digital video discs, Blu-ray discs, etc.), semiconductor media (such as flash memory, electrically programmable read only memory (EPROM), electrically erasable programmable read only memory (EEPROM), etc.), any suitable media that is not fleeting or not devoid of any semblance of permanence during transmission, and/or any suitable tangible media. As another example, transitory computer readable media can include signals on networks, in wires, conductors, optical fibers, circuits, any suitable media that is fleeting and devoid of any semblance of permanence during transmission, and/or any suitable intangible media.
- Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly or conventionally understood. As used herein, the articles “a” and “an” refer to one or to more than one (i.e., to at least one) of the grammatical object of the article. By way of example, “an element” means one element or more than one element. “About” and/or “approximately” as used herein when referring to a measurable value such as an amount, a temporal duration, and the like, encompasses variations of ±20% or ±10%, ±5%, or +0.1% from the specified value, as such variations are appropriate in the context of the systems, devices, circuits, methods, and other implementations described herein. “Substantially” as used herein when referring to a measurable value such as an amount, a temporal duration, a physical attribute (such as frequency), and the like, also encompasses variations of ±20% or ±10%, ±5%, or +0.1% from the specified value, as such variations are appropriate in the context of the systems, devices, circuits, methods, and other implementations described herein.
- As used herein, including in the claims, “or” as used in a list of items prefaced by “at least one of” or “one or more of” indicates a disjunctive list such that, for example, a list of “at least one of A, B, or C” means A or B or C or AB or AC or BC or ABC (i.e., A and B and C), or combinations with more than one feature (e.g., AA, AAB, ABBC, etc.). Also, as used herein, unless otherwise stated, a statement that a function or operation is “based on” an item or condition means that the function or operation is based on the stated item or condition and may be based on one or more items and/or conditions in addition to the stated item or condition.
- Although particular embodiments have been disclosed herein in detail, this has been done by way of example for purposes of illustration only, and is not intended to be limiting with respect to the scope of the appended claims, which follow. Features of the disclosed embodiments can be combined, rearranged, etc., within the scope of the invention to produce more embodiments. Some other aspects, advantages, and modifications are considered to be within the scope of the claims provided below. The claims presented are representative of at least some of the embodiments and features disclosed herein. Other unclaimed embodiments and features are also contemplated.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/100,243 US20210158161A1 (en) | 2019-11-22 | 2020-11-20 | Methods and Systems for Detecting Spurious Data Patterns |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201962939236P | 2019-11-22 | 2019-11-22 | |
US17/100,243 US20210158161A1 (en) | 2019-11-22 | 2020-11-20 | Methods and Systems for Detecting Spurious Data Patterns |
Publications (1)
Publication Number | Publication Date |
---|---|
US20210158161A1 (en) | 2021-05-27 |
Family
ID=75973986
Family Applications (3)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US17/100,195 Active 2041-06-25 US11921697B2 (en) | 2019-11-22 | 2020-11-20 | Methods and systems for detecting spurious data patterns |
US17/100,243 Pending US20210158161A1 (en) | 2019-11-22 | 2020-11-20 | Methods and Systems for Detecting Spurious Data Patterns |
US18/595,518 Pending US20240346008A1 (en) | 2019-11-22 | 2024-03-05 | Methods and Systems for Detecting Spurious Data Patterns |
Country Status (1)
Country | Link |
---|---|
US (3) | US11921697B2 (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114003791A (en) * | 2021-12-30 | 2022-02-01 | 之江实验室 | Depth map matching-based automatic classification method and system for medical data elements |
CN114154001A (en) * | 2021-11-29 | 2022-03-08 | 北京智美互联科技有限公司 | Method and system for mining and identifying false media content |
US11361449B2 (en) * | 2020-05-06 | 2022-06-14 | Luminar, Llc | Neural network for object detection and tracking |
US20220198471A1 (en) * | 2020-12-18 | 2022-06-23 | Feedzai - Consultadoria E Inovação Tecnológica, S.A. | Graph traversal for measurement of fraudulent nodes |
US11455531B2 (en) * | 2019-10-15 | 2022-09-27 | Siemens Aktiengesellschaft | Trustworthy predictions using deep neural networks based on adversarial calibration |
CN115830375A (en) * | 2022-11-25 | 2023-03-21 | 中国科学院自动化研究所 | Point cloud classification method and device |
US20230326215A1 (en) * | 2022-04-07 | 2023-10-12 | Waymo Llc | End-to-end object tracking using neural networks with attention |
US20240163187A1 (en) * | 2021-01-28 | 2024-05-16 | Wiz, Inc. | System and method for generation of unified graph models for network entities |
US20240161106A1 (en) * | 2022-11-15 | 2024-05-16 | U.S. Bank | Systems and methods for real-time identification of an anomaly of a block transactions graph of a blockchain |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102591898B1 (en) * | 2021-06-03 | 2023-10-20 | 주식회사 카카오뱅크 | Method for detecting fraud using effective transaction pattern and server performing the same |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180174025A1 (en) * | 2016-12-16 | 2018-06-21 | SK Hynix Inc. | Apparatus and method for normalizing neural network device |
US20180268289A1 (en) * | 2017-03-15 | 2018-09-20 | Nuance Communications, Inc. | Method and System for Training a Digital Computational Learning System |
US20190005377A1 (en) * | 2017-06-30 | 2019-01-03 | Advanced Micro Devices, Inc. | Artificial neural network reduction to reduce inference computation time |
US20190188567A1 (en) * | 2016-09-30 | 2019-06-20 | Intel Corporation | Dynamic neural network surgery |
US20190311220A1 (en) * | 2018-04-09 | 2019-10-10 | Diveplane Corporation | Improvements To Computer Based Reasoning and Artificial Intellignence Systems |
US20200097857A1 (en) * | 2010-03-15 | 2020-03-26 | Numenta, Inc. | Sparse Distributed Representation for Networked Processing in Predictive System |
US20200104716A1 (en) * | 2018-08-23 | 2020-04-02 | Samsung Electronics Co., Ltd. | Method and system with deep learning model generation |
US20200143203A1 (en) * | 2018-11-01 | 2020-05-07 | Stephen D. Liang | Method for Design and Optimization of Convolutional Neural Networks |
US20200364573A1 (en) * | 2019-05-15 | 2020-11-19 | Advanced Micro Devices, Inc. | Accelerating neural networks with one shot skip layer pruning |
US20210081798A1 (en) * | 2019-09-16 | 2021-03-18 | Samsung Electronics Co., Ltd. | Neural network method and apparatus |
US20210262204A1 (en) * | 2018-06-01 | 2021-08-26 | Motion Metrics International Corp. | Method, apparatus and system for monitoring a condition associated with operating heavy equipment such as a mining shovel or excavator |
US20230196753A1 (en) * | 2018-07-24 | 2023-06-22 | Samsung Electronics Co., Ltd. | Object recognition devices, electronic devices and methods of recognizing objects |
Family Cites Families (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060036372A1 (en) * | 2004-03-18 | 2006-02-16 | Bulent Yener | Method and apparatus for tissue modeling |
US20110264482A1 (en) * | 2010-04-22 | 2011-10-27 | Maher Rahmouni | Resource matching |
US10796319B2 (en) * | 2015-04-07 | 2020-10-06 | International Business Machines Corporation | Rating aggregation and propagation mechanism for hierarchical services and products |
US9407652B1 (en) * | 2015-06-26 | 2016-08-02 | Palantir Technologies Inc. | Network anomaly detection |
US20190042743A1 (en) * | 2017-12-15 | 2019-02-07 | Intel Corporation | Malware detection and classification using artificial neural network |
US20190199626A1 (en) * | 2017-12-26 | 2019-06-27 | Cisco Technology, Inc. | Routing traffic across isolation networks |
CA3041140C (en) * | 2018-04-26 | 2021-12-14 | NeuralSeg Ltd. | Systems and methods for segmenting an image |
US11341034B2 (en) * | 2018-08-06 | 2022-05-24 | International Business Machines Corporation | Analysis of verification parameters for training reduction |
EP3857431A1 (en) * | 2018-10-30 | 2021-08-04 | Google LLC | Automatic hyperlinking of documents |
Non-Patent Citations (3)
Title |
---|
NPL: Luo, Jian-Hao, et al. "ThiNet: Pruning CNN filters for a thinner net." (2018). (Year: 2018) * |
NPL: Sakti Saurav, et al. (2018). Online anomaly detection with concept drift adaptation using recurrent neural networks (Year: 2018) * |
NPL: Sheraz Naseer, et al. (2018). Enhanced Network Anomaly Detection Based on Deep Neural Networks (Year: 2018) * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11455531B2 (en) * | 2019-10-15 | 2022-09-27 | Siemens Aktiengesellschaft | Trustworthy predictions using deep neural networks based on adversarial calibration |
US11361449B2 (en) * | 2020-05-06 | 2022-06-14 | Luminar, Llc | Neural network for object detection and tracking |
US20220198471A1 (en) * | 2020-12-18 | 2022-06-23 | Feedzai - Consultadoria E Inovação Tecnológica, S.A. | Graph traversal for measurement of fraudulent nodes |
US20240163187A1 (en) * | 2021-01-28 | 2024-05-16 | Wiz, Inc. | System and method for generation of unified graph models for network entities |
CN114154001A (en) * | 2021-11-29 | 2022-03-08 | 北京智美互联科技有限公司 | Method and system for mining and identifying false media content |
CN114003791A (en) * | 2021-12-30 | 2022-02-01 | 之江实验室 | Depth map matching-based automatic classification method and system for medical data elements |
US20230326215A1 (en) * | 2022-04-07 | 2023-10-12 | Waymo Llc | End-to-end object tracking using neural networks with attention |
US20240161106A1 (en) * | 2022-11-15 | 2024-05-16 | U.S. Bank | Systems and methods for real-time identification of an anomaly of a block transactions graph of a blockchain |
CN115830375A (en) * | 2022-11-25 | 2023-03-21 | 中国科学院自动化研究所 | Point cloud classification method and device |
Also Published As
Publication number | Publication date |
---|---|
US11921697B2 (en) | 2024-03-05 |
US20240346008A1 (en) | 2024-10-17 |
US20210157786A1 (en) | 2021-05-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11921697B2 (en) | Methods and systems for detecting spurious data patterns | |
CN116194929A (en) | Training of machine learning systems for transaction data processing | |
US8543522B2 (en) | Automatic rule discovery from large-scale datasets to detect payment card fraud using classifiers | |
CN112085565A (en) | Deep learning-based information recommendation method, device, equipment and storage medium | |
US11983720B2 (en) | Mixed quantum-classical method for fraud detection with quantum feature selection | |
CN111143838B (en) | Database user abnormal behavior detection method | |
CN113011889B (en) | Account anomaly identification method, system, device, equipment and medium | |
CN114090601B (en) | Data screening method, device, equipment and storage medium | |
CN110956278A (en) | Method and system for retraining machine learning models | |
Wambura et al. | Robust anomaly detection in feature-evolving time series | |
Jang et al. | Decision fusion approach for detecting unknown wafer bin map patterns based on a deep multitask learning model | |
US11916958B2 (en) | Phishing detection and mitigation | |
US20190340514A1 (en) | System and method for generating ultimate reason codes for computer models | |
Tomar et al. | Ensemble learning based credit card fraud detection system | |
CN115080868A (en) | Product pushing method, product pushing device, computer equipment, storage medium and program product | |
CN118586706A (en) | Enterprise risk assessment system, method, device, storage medium and program product | |
CN118295842A (en) | Data processing method, device and server for transaction system abnormal event | |
Liu et al. | Ensembled mechanical fault recognition system based on deep learning algorithm | |
Hassan et al. | Web Phishing Classification Model using Artificial Neural Network and Deep Learning Neural Network | |
CN115994331A (en) | Message sorting method and device based on decision tree | |
CN115063143A (en) | Account data processing method and device, computer equipment and storage medium | |
Anand et al. | A Comparative Analysis of Artificial Neural Networks in Time Series Forecasting Using Arima Vs Prophet | |
CN117745423B (en) | Abnormal account identification method | |
US12079375B2 (en) | Automatic segmentation using hierarchical timeseries analysis | |
Para et al. | Fraud detection in digital payments using data analytics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| STPP | Information on status: patent application and granting procedure in general | Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED |
| AS | Assignment | Owner name: FRAUD.NET, INC., NEW YORK. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LOUIZOS, LOUIZOS ALEXANDROS;CHAUDHRY, AYAAN;PLUNKETT, GARY;AND OTHERS;SIGNING DATES FROM 20210223 TO 20210224;REEL/FRAME:055386/0531 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |