US20150032729A1 - Matching snippets of search results to clusters of objects - Google Patents
Matching snippets of search results to clusters of objects
- Publication number
- US20150032729A1 (US14/337,352)
- Authority
- US
- United States
- Prior art keywords
- objects
- data
- cluster
- matches
- snippet
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G06F17/30554—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/38—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
-
- G06F17/30598—
Definitions
- Some customer data providers attempt to address this challenge by using a crowd-sourced platform to build a contact database which is sourced and updated by sales and marketing professionals.
- the customer data provided by customer data providers often has a variety of problems, such as invalid email addresses or invalid phone numbers, a contact record with incorrect information from a name spelled wrong to a bad address, incomplete or inaccurate records for company names, job titles, and phone numbers, non-current data, wrong company information or wrong contact data, duplicate contacts with inconsistent information, fields that are empty due to poor data capture techniques or contain other inaccurate information, completed fields that contain nonsense data such as “TBA” or “TBD,” and outdated information, such as a contact that no longer works at the contact's former company.
- systems and methods are provided for matching and confidently adding snippets of search results to clusters of objects. Information is searched based on objects in a cluster of objects.
- a data snippet is extracted from the search results.
- the data snippet is added to the cluster of objects if the data snippet includes data that matches at least one of the objects in the cluster of objects.
- a confidence score may be calculated for adding the data snippet to the cluster of objects based on the recency, a job title, an email address, and/or a phone number associated with the data snippet.
- the data snippet may be added to the cluster of objects in a customer accessible database if the confidence score is sufficiently high, and a notice for review may be generated if the confidence score is not sufficiently high.
- a database system searches a business database for information about a business contact stored in a contact database, wherein the contact database includes objects stored in a cluster of objects that correspond to a given name “Gregory,” a family name “Jones,” a company “International Business Machines,” a title “V.P for sales,” a location “New York City,” and an email address for a specific business contact.
- the database system extracts data that includes a given name “Greg,” a family name “Jones,” a company “IBM,” and a mobile phone number from the information in one of the search results.
- the database system determines whether the data snippet extracted from the information in the search results includes data that matches any of the objects stored in the cluster of objects in the contact database corresponding to the business contact named Gregory Jones.
- the database system adds the extracted data snippet, including the mobile phone number, to the objects stored in the cluster of objects in the customer accessible database that correspond to the business contact named Gregory Jones because the calculated confidence score is sufficiently high since both the data snippet and the objects in the cluster of objects include the uncommon family name “Jones.”
- a sales person planning on contacting Greg Jones at IBM now has Jones's mobile phone number that the sales person did not have previously.
- the database system builds, manages and sustains a high-quality person data object by bringing in data from multiple sources, normalizing, enriching, matching, and merging data to provide a “golden record,” or a best version of the data, for a person and the person's various business profile attributes.
- the database system leverages free web data sources such as news feeds, blogs and search results to mine attributes such as titles, social handles, etc., to further improve the quality of contact, company, and location data objects, and uses this additional data to build, validate and enrich person profiles.
- While one or more implementations and techniques are described with reference to an embodiment in which matching and confidently adding snippets of search results to clusters of objects is implemented in a system having an application server providing a front end for an on-demand database service capable of supporting multiple tenants, the one or more implementations and techniques are not limited to multi-tenant databases nor deployment on application servers. Embodiments may be practiced using other database architectures, e.g., ORACLE®, DB2® by IBM and the like, without departing from the scope of the embodiments claimed.
- any of the above embodiments may be used alone or together with one another in any combination.
- the one or more implementations encompassed within this specification may also include embodiments that are only partially mentioned or alluded to or are not mentioned or alluded to at all in this brief summary or in the abstract.
- FIG. 1 is an operational flow diagram illustrating a high level overview of a method for matching and confidently adding snippets of search results to clusters of objects in an embodiment
- FIG. 2 is a block diagram of a system for matching and confidently adding snippets of search results to clusters of objects in an embodiment
- FIG. 3 illustrates a block diagram of an example of an environment wherein an on-demand database service might be used
- FIG. 4 illustrates a block diagram of an embodiment of elements of FIG. 3 and various possible interconnections between these elements.
- the term multi-tenant database system refers to those systems in which various elements of hardware and software of the database system may be shared by one or more customers. For example, a given application server may simultaneously process requests for a great number of customers, and a given database table may store rows for a potentially much greater number of customers.
- the term query plan refers to a set of steps used to access information in a database system.
- a lot of customer data makes up a database of contact records.
- the primary source for this customer data could be a website where users add and update business card information by adding or updating contact information one record at a time through a web form or by uploading comma separated value files that contain contact information. Users may also occasionally submit bounce email reports that contain error codes for invalid emails that they receive from their mail providers as part of their email marketing campaigns.
- a database system can receive and process millions of data records to provide new or updated data to customers in a timely manner.
- the database system cleans the data from the record, normalizes the data into a standard set of values that might be used for matching, enriches the data, and attempts to match the data with previously stored data to create a “golden record” for a person identified by the incoming data and/or previously stored data. False matches can result in the loss of good data and missed matches may reduce the value of previously stored data.
- the matching process also helps in identifying duplicates and decreases the possibility that duplicate records are created for the same person. After the matching process returns a suitable list of matching person candidates, the database system adds the incoming data to a cluster of data objects that contains data values that match data objects for the person identified by the incoming data.
- the database system creates a new cluster of data objects for the person if the database system does not already include any cluster of data objects that match data objects for the person identified by the incoming data. Then the database system determines whether to store the added data in a customer accessible database.
- a significant majority of bad and erroneous operations may be prevented, thereby resulting in much higher quality of customer data, if a database system treats every add or update contribution as a claim and takes into account the reputation of the user/partner who makes the claim.
- bad and erroneous operations may be prevented if a database system takes into account the type of the claim, the date and time of the claim, and further validates the claim with data from the free web and other sources with additional levels of data stewardship. Claims from trusted users can be treated as sources of truth and valuable enough to overwrite almost all existing information. Claims from average members and the free web may be treated as being as good as any other information. The evaluation supported by more consistent data points prevails; for example, if three people evaluate some data as good and one person evaluates the same data as bad, the evaluation as good prevails.
- the database system weighs a claim on a graded scale and calculates various scores to generate a confidence score that is then used to determine the type of actions that are needed before the claim is fully processed and applied to generate a “golden record” for a person.
- the database system determines the quality of each and every individual attribute such as names, titles, emails, phones, social handles etc. Each and every attribute of data in the claim is scored and weighed against similar attributes from other claims and golden records in case they already exist. If the data in an attribute of a new claim is of better quality than an existing attribute and the confidence score of the new claim is above a certain threshold, the database system uses the incoming attribute in a data snippet to replace the existing data for that attribute in the golden record.
- the attribute in the data snippet is linked to a person record, where data from multiple contacts is combined to create/update work profiles for the person record, allowing tracking of the lifecycle and work profile of contacts. If the attribute in the data snippet is conflicting or additional details are needed, the database system generates an additional task/alert to data stewards for additional review based on the importance of the data record and the attribute in question. If the attribute in the data snippet is of poor quality, then the database system rejects the claim and the state of the attribute and golden record remains unaffected. If there is not enough information to make a decision, there is not enough authority to change the state, or no new information is detected, then no decision is made, re-affirming the current state of the data.
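- the four outcomes described above for an incoming attribute (replace, review, reject, no decision) can be read as a small decision function. The following is a minimal sketch only; the `Decision` names, the numeric thresholds, the `conflicting` flag, and the order in which the checks are applied are all assumptions made for illustration, not details given in the description.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    REPLACE = "replace existing attribute"
    REVIEW = "send to data steward"
    REJECT = "reject claim"
    NO_DECISION = "keep current state"


def decide(
    new_value: Optional[str],
    existing_value: Optional[str],
    new_quality: float,
    existing_quality: float,
    confidence: float,
    conflicting: bool = False,
    confidence_threshold: float = 0.7,  # hypothetical threshold
    poor_quality: float = 0.3,          # hypothetical cut-off for "poor quality"
) -> Decision:
    """Pick an action for one attribute of an incoming claim."""
    # No new information: the current state is simply re-affirmed.
    if new_value is None or new_value == existing_value:
        return Decision.NO_DECISION
    # Conflicting data is escalated to a data steward for review.
    if conflicting:
        return Decision.REVIEW
    # Poor-quality data is rejected; the golden record stays unchanged.
    if new_quality < poor_quality:
        return Decision.REJECT
    # Better quality and enough confidence: replace the golden-record value.
    if new_quality > existing_quality and confidence >= confidence_threshold:
        return Decision.REPLACE
    # Otherwise there is not enough authority to change the state.
    return Decision.NO_DECISION


outcome = decide("555-0100", None, new_quality=0.9, existing_quality=0.0, confidence=0.8)
```

- in this sketch a missing existing value is modeled with an existing quality of zero, so any acceptable new value with sufficient confidence replaces it.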
- FIG. 1 is an operational flow diagram illustrating a high level overview of a method 100 for matching and confidently adding snippets of search results to clusters of objects.
- a database system may match and confidently add snippets of search results to clusters of objects.
- a database system searches information based on objects in a cluster of objects, block 102 .
- this can include the database system searching a business database for information about a business contact stored in a contact database, wherein the contact database includes objects stored in a cluster of objects that correspond to a given name “Gregory,” a family name “Jones,” a company “International Business Machines,” a title “V.P for sales,” a location “New York City,” and an email address for a specific business contact.
- the database system extracts a data snippet from the search results, block 104 .
- this can include the database system extracting data that includes a given name “Greg,” a family name “Jones,” a company “IBM,” and a mobile phone number from the information in one of the search results.
- the database system determines whether the data snippet includes data that matches at least one of the objects in the cluster of objects, block 106 .
- this can include the database system determining whether the data snippet extracted from the information in the search result includes data that matches any of the objects stored in the cluster of objects in the contact database corresponding to the business contact named Gregory Jones. Whether the data snippet includes data that matches at least one of the objects in the cluster of objects may include matching based on first name aliases and/or acronym expansion.
- “Greg” is a given name alias that matches the given name “Gregory” and “IBM” is an acronym that can be expanded to match “International Business Machines.”
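- matching based on given name aliases and acronym expansion can be illustrated as follows. This is a minimal sketch: the two lookup tables, the attribute names, and the phone number are placeholders invented for the example, and a production matcher would presumably use much larger dictionaries and fuzzier comparison.

```python
# Hypothetical lookup tables; real ones would be far larger.
GIVEN_NAME_ALIASES = {"greg": "gregory"}
ACRONYMS = {"ibm": "international business machines"}


def canonical(attribute: str, value: str) -> str:
    """Lower-case a value and expand a known alias or acronym."""
    key = value.strip().lower()
    if attribute == "given_name":
        return GIVEN_NAME_ALIASES.get(key, key)
    if attribute == "company":
        return ACRONYMS.get(key, key)
    return key


def matching_attributes(snippet: dict, cluster: dict) -> set:
    """Return the attribute names whose canonical values agree."""
    return {
        attr
        for attr, value in snippet.items()
        if attr in cluster and canonical(attr, value) == canonical(attr, cluster[attr])
    }


cluster = {
    "given_name": "Gregory",
    "family_name": "Jones",
    "company": "International Business Machines",
    "location": "New York City",
}
snippet = {
    "given_name": "Greg",
    "family_name": "Jones",
    "company": "IBM",
    "mobile_phone": "555-0100",  # placeholder value
}
matched = matching_attributes(snippet, cluster)
```

- here `matched` contains the given name, family name, and company attributes, so the snippet satisfies the "matches at least one of the objects" condition of block 106 and its mobile phone number becomes a candidate for addition to the cluster.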
- If the data snippet includes data that matches at least one of the objects in the cluster of objects, the method continues to block 108 . If the data snippet does not include data that matches at least one of the objects in the cluster of objects, the method proceeds to block 110 . If the data snippet includes data that matches at least one of the objects in the cluster of objects, the database system adds the data snippet to the cluster of objects, block 108 .
- this can include the database system adding the extracted data snippet, including the mobile phone number, to the objects stored in the cluster of objects in the contact database that correspond to the business contact named Gregory Jones because both the data snippet and the objects in the cluster of objects include the uncommon family name “Jones.”
- the method 100 then proceeds to block 112 .
- the database system optionally stores the data snippet for matching with subsequent clusters of objects, block 110 .
- this can include the database system storing the data snippet for matching with subsequent clusters of objects if the data snippet does not include data that matches at least one of the objects in the cluster of objects, as the contact database may be later supplemented with an additional contact that includes an object which matches some of the data in the extracted data snippet.
- the method 100 either terminates or begins again at block 102 .
- the database system can also determine whether the data snippet includes data that matches objects in another cluster of objects, block 112 . In embodiments, this can include the database system determining that the extracted data snippet that includes “Greg,” “Jones,” and the mobile phone number also matches an object in another cluster of objects that includes “Greg,” “Jones,” and a company “Microsoft.” If the data snippet includes data that matches at least one of the objects in another cluster of objects, the method continues to block 114 . If the data snippet does not include data that matches at least one of the objects in another cluster of objects, the method proceeds to block 116 .
- the database system may combine the cluster of objects with the other cluster of objects, block 114 .
- this can include the database system combining the clusters of objects for the two business contacts named “Jones” whose objects include “International Business Machines” and “Microsoft.”
- Such a combination of clusters of objects for business contact objects could be useful for a sales person planning on contacting Greg Jones at IBM if the sales person knows some business contacts who worked at Microsoft at the time when Greg Jones worked at Microsoft.
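- combining two clusters of objects (block 114) can be sketched as a union of attribute values. The representation of a cluster as a dictionary from attribute name to a set of values is an assumption made for this example; the description does not specify a storage layout.

```python
def merge_clusters(first: dict, second: dict) -> dict:
    """Combine two clusters by taking the union of values per attribute."""
    merged = {}
    for cluster in (first, second):
        for attribute, values in cluster.items():
            merged.setdefault(attribute, set()).update(values)
    return merged


ibm_cluster = {
    "given_name": {"Gregory"},
    "family_name": {"Jones"},
    "company": {"International Business Machines"},
}
microsoft_cluster = {
    "given_name": {"Greg"},
    "family_name": {"Jones"},
    "company": {"Microsoft"},
}
combined = merge_clusters(ibm_cluster, microsoft_cluster)
```

- the combined cluster keeps both company values, which is what lets a sales person see that the same contact is associated with both companies. The inputs are left unmodified so that the merge can still be abandoned if a later check rejects it.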
- After adding a data snippet to a cluster of objects, the database system optionally calculates a confidence score for adding the data snippet to the cluster of objects based on the recency, a job title, an email address, and/or a phone number associated with the data snippet, block 116 .
- this can include the database system calculating a confidence score based on how recently the data objects from the search result were stored in the business database, with a storage date of today equated with the highest recency score.
- the database system calculates a confidence score based on a job title from the search result, with hierarchically higher job titles equated with a higher title rank score, and with job titles known to be used by the business contact's claimed company equated with a higher title quality score.
- the database system calculates a confidence score based on an email address from the search result, with the email score based on how well the email address matches the pattern of other email addresses for business contacts for the business contact's claimed company and how well the email address matches the first name and the last name of the business contact.
- the database system calculates a confidence score based on a phone number from the search result, where the phone number score is based on the consistency between the claimed phone number and the area code associated with the claimed geographic location for the business contact.
- the confidence score may be based on any weighted combination of the recency, the job title, the email address, and the phone number from the data snippet.
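- a weighted combination of component scores might look like the following. This is a sketch under stated assumptions: the equal weights, the score range of 0 to 1, and the linear one-year decay used for recency are all invented for illustration, since the description only says that the combination is weighted and that a storage date of today receives the highest recency score.

```python
from datetime import date

# Hypothetical weights; the description does not give values.
WEIGHTS = {"recency": 0.25, "title": 0.25, "email": 0.25, "phone": 0.25}


def recency_score(stored_on: date, today: date, horizon_days: int = 365) -> float:
    """Score 1.0 for data stored today, decaying linearly to 0.0 (assumed shape)."""
    age_days = max((today - stored_on).days, 0)
    return max(0.0, 1.0 - age_days / horizon_days)


def confidence_score(component_scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted average over whichever components are present in the snippet."""
    used = {name: w for name, w in weights.items() if name in component_scores}
    total_weight = sum(used.values())
    if total_weight == 0:
        return 0.0
    return sum(component_scores[name] * w for name, w in used.items()) / total_weight


today = date(2014, 7, 22)
scores = {
    "recency": recency_score(today, today),
    "title": 0.5,
    "email": 1.0,
    "phone": 0.5,
}
confidence = confidence_score(scores)
```

- renormalizing by the weights actually used reflects the "and/or" in the description: a snippet without an email address is scored on the remaining components instead of being penalized for the missing one. Whether that is the intended behavior is not stated in the source.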
- the database system optionally determines whether a confidence score is sufficiently high for adding the data snippet to the cluster of objects stored in a customer accessible database, block 118 . In embodiments, this can include the database system determining that a confidence score is sufficiently high for a new mobile phone number to be added to the cluster of data objects for Greg Jones in a customer accessible database. If a confidence score is sufficiently high for adding the data snippet to the cluster of objects stored in a customer accessible database, the method 100 continues to block 120 . If a confidence score is not sufficiently high for adding the data snippet to the cluster of objects stored in a customer accessible database, the method 100 proceeds to block 122 . If the confidence score is sufficiently high for adding the data snippet to the cluster of objects stored in the customer accessible database, the database system optionally adds the data snippet to the cluster of objects stored in the customer accessible database, block 120 .
- this can include the database system storing the new mobile phone number in the contact database that is accessible by a sales person planning on contacting Greg Jones, who now has Jones' mobile phone number that the salesman did not have previously. Then the method 100 either terminates or begins again at block 102 .
- although this example describes the database system using a confidence score to determine whether to add a data snippet to a cluster of objects in a customer accessible database, the database system may also use a confidence score to determine whether to combine the cluster of objects with the other cluster of objects in a customer accessible database.
- If the confidence score is not sufficiently high for adding the data snippet to the cluster of objects stored in the customer accessible database, the database system optionally generates a notice for review, block 122 .
- the notice for review can include the database system generating a notice for reviewing the adding of the data snippet to the cluster of objects because the mobile phone number in the search results is not associated with New York City, the claimed office location for Jones in the search results, and the title “VP” in the search results is too generic and does not match any titles known to be used by IBM, the claimed company for Jones in the search results. Then the method 100 either terminates or begins again at block 102 . Accordingly, systems and methods are provided which enable a database system for matching and confidently adding snippets of search results to clusters of objects.
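- blocks 118 through 122 amount to routing a snippet by comparing its confidence score with a threshold. The sketch below assumes a threshold of 0.7 and models the customer accessible database and the review queue as plain lists; both choices are placeholders for illustration.

```python
THRESHOLD = 0.7  # hypothetical value for "sufficiently high"


def route_snippet(snippet: dict, confidence: float, customer_db: list,
                  review_queue: list, threshold: float = THRESHOLD) -> str:
    """Add the snippet to the customer database or queue a notice for review."""
    if confidence >= threshold:
        customer_db.append(snippet)  # block 120
        return "added"
    # Block 122: keep the score with the notice so a reviewer can see why.
    review_queue.append({"snippet": snippet, "confidence": confidence})
    return "review"


customer_db, review_queue = [], []
first = route_snippet({"mobile_phone": "555-0100"}, 0.75, customer_db, review_queue)
second = route_snippet({"title": "VP"}, 0.40, customer_db, review_queue)
```

- in this run the phone number is added and the generic title is queued for review, mirroring the two branches described above.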
- the method 100 may be repeated as desired.
- Although this disclosure describes the blocks 102 - 122 executing in a particular order, the blocks 102 - 122 may be executed in a different order. In other implementations, each of the blocks 102 - 122 may also be executed in combination with other blocks and/or some blocks may be divided into a different set of blocks.
- FIG. 2 illustrates a block diagram of an example system for matching and confidently adding snippets of search results to clusters of objects, under an embodiment.
- the system 200 may illustrate a cloud computing environment in which data, applications, services, and other resources are stored and delivered through shared data-centers and appear as a single point of access for the users.
- the system 200 may also represent any other type of distributed computer network environment in which servers control the storage and distribution of resources and services for different client users.
- Storm is a real time, open source data streaming framework that functions entirely in memory.
- Storm constructs a processing graph, called a “topology,” that feeds data from input sources through processing nodes.
- the input data sources are called “spouts,” and the processing nodes are called “bolts.”
- the data model consists of tuples, which flow from spouts to the bolts, which execute user code. Besides simply being locations where data is transformed or accumulated, bolts may also join streams of data and branch streams of data.
- Storm is designed to be run on several machines to provide parallelism. Storm processes streams of tuples.
- a stream is defined to be an unlimited ordered sequence of tuples, and each tuple is a one dimensional array of objects.
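- the spout, bolt, and tuple vocabulary can be illustrated without Storm itself. The sketch below uses plain Python generators, not the Storm API, and the function names and record fields are invented for the example; it shows only the shape of the data flow, not distribution across machines.

```python
from itertools import chain
from typing import Iterable, Iterator, Tuple


def contact_spout(records: Iterable[dict]) -> Iterator[Tuple[str, str]]:
    """Spout: read records from an external source and emit them as tuples."""
    for record in records:
        yield (record["name"], record["company"])


def normalize_bolt(stream: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Bolt: transform each tuple that flows through it."""
    for name, company in stream:
        yield (name.strip().title(), company.strip())


def join_bolt(*streams: Iterable[Tuple[str, str]]) -> Iterator[Tuple[str, str]]:
    """Bolt: join several streams into one (simple concatenation here)."""
    return chain(*streams)


directory = [{"name": " greg jones ", "company": "IBM "}]
uploads = [{"name": "ANN LEE", "company": " Acme"}]
topology = normalize_bolt(join_bolt(contact_spout(directory), contact_spout(uploads)))
output = list(topology)
```

- nothing is processed until the final `list` call pulls tuples through the chain, which loosely mirrors how tuples flow from spouts through bolts one at a time.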
- the system 200 acts as a central data processing hub, or clearing house, that brings in multiple data sources and free web data together to generate the “golden” record for core data assets around accounts and persons.
- the following describes the key components of the system 200 as part of a storm topology to implement the data processing pipeline.
- the system 200 generates a set of specialized keys for each person record and claims that enable fast lookups for the purpose of matching and retrieval. Indices are created for each company, person and location object in the cache, and these indices are used for person matching, company matching and location matching.
- a spout is a source of data streams in a Storm topology. Generally spouts will read tuples from an external source and emit them into the topology.
- the business directory spout 202 reads data from the business directory database and emits tuples, which are treated as claims, into the topology, such as contact added, contact updated, contact invalid phone, contact invalid email, and contact not at company. Each of these claim types has an associated contributor identifier that is the identity of the user who performed the action.
- the business directory spout 202 is an unbounded stream and keeps emitting data till there is no more data to be read.
- the tuples that are emitted out of the business directory spout 202 may be distributed randomly (shuffle grouping) to a normalize bolt 204 which is the first bolt in the pipeline. This data can also be sent to a search engine bolt 206 which executes free web search queries and tries to find additional data around this contact, such as titles and social handles.
- the partial records spout 208 provides partial records from disparate sources.
- the partial records spout 208 reads contact data from a partial records database where files that are uploaded by users on the website are stored in raw format before partial records processing.
- the key difference here is that unlike the business directory spout 202 , the partial records spout 208 emits tuples based on partial data based on the data in the uploaded files. Also, the tuples that come out of the partial records spout 208 will often contain very poorly normalized data.
- the tuples that are emitted out of the partial records spout 208 may be distributed randomly (shuffle grouping) to the normalize bolt 204 which is the first bolt in the pipeline. This data can also be sent to the search engine bolt 206 which executes free web search queries and tries to find additional data around this contact, such as titles and social handles. Examples of claims emitted by the partial records spout 208 include contact added and contact added for new company.
- the bounce email spout 210 reads bounce email error codes, which may be from comma separated value files that are uploaded by website administrators and website users. Examples of claims emitted by the bounce email spout include contact email and contact message.
- the bounce file message that the bounce email spout 210 receives for an email is typically unstructured text, such as records that are comma-separated with the email in the first column and the second column containing the bounce message as unstructured text.
- an automatic column mapping algorithm may initially process the first few lines of the file. The algorithm does not need to rely on the names of the column headers, but rather the algorithm can tokenize the bounce file.
- the field separator may be determined from the file by tokenizing on each kind of separator and computing how consistent the resulting number of tokens per line is across the entire file. After determining the field separator, the algorithm can determine which column contains the email and which column contains the message. The algorithm may split out the record, remove the email, and concatenate the rest of the record to create the contact message claim.
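- the separator detection and record splitting can be sketched as below. The candidate separator list, the consistency measure (share of lines that produce the most common token count), and the per-record test for an email token are assumptions chosen for this example; the sample lines are made up.

```python
from collections import Counter
from typing import List, Optional, Tuple

CANDIDATE_SEPARATORS = [",", "\t", ";", "|"]  # hypothetical candidate set


def detect_separator(lines: List[str]) -> Optional[str]:
    """Pick the separator whose token count is most consistent across lines."""
    if not lines:
        return None
    best, best_score = None, -1.0
    for sep in CANDIDATE_SEPARATORS:
        counts = [len(line.split(sep)) for line in lines]
        token_count, frequency = Counter(counts).most_common(1)[0]
        if token_count < 2:
            continue  # this separator does not split the lines at all
        score = frequency / len(counts)
        if score > best_score:
            best, best_score = sep, score
    return best


def split_bounce_record(line: str, sep: str) -> Optional[Tuple[str, str]]:
    """Return (email, message): remove the email and join the remaining tokens."""
    tokens = [token.strip() for token in line.split(sep)]
    email_index = next(
        (i for i, token in enumerate(tokens) if "@" in token and " " not in token),
        None,
    )
    if email_index is None:
        return None
    message = " ".join(t for i, t in enumerate(tokens) if i != email_index and t)
    return tokens[email_index], message


sample = [
    "a@example.com;550 5.1.1 User unknown",
    "b@example.com;552 Mailbox full",
]
separator = detect_separator(sample)
claim = split_bounce_record(sample[0], separator)
```

- one simplification to note: the description locates the email column once for the whole file, whereas this sketch looks for the email token in each record, which tolerates files whose column order varies.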
- the emitted contact message claim is typically an unstructured snippet of text.
- the social handle spout 212 reads contact data and social handles from a social handle repository and submits claims such as contact social handle.
- the crawler spout 214 emits contacts found on the web from crawling websites for their management pages.
- the crawler spout 214 may start with a number of seed companies that the system 200 currently has and use it as the starting point for crawling. Examples of claims emitted by the crawler spout 214 include contact added and contact updated.
- the normalize bolt 204 processes all the tuples that come to it through a series of data normalization routines.
- the normalize bolt 204 may standardize addresses, titles, and phone numbers, and properly classify contact records by department and level. The following are some of the key normalizations.
- an address normalizer can expand a list of abbreviations, such as E to East, W to West, and Blvd to Boulevard; allow only letters, numbers, and special characters; and remove any spaces around the special characters.
- a title normalizer may include a list of misspellings and abbreviations.
- a name normalizer can allow letters and special characters, not allow special characters at the beginning and the end of a name, capitalize the first letter and add a space after each name, capitalize the next letter if a name starts with “Mc,” and capitalize all Roman numerals.
- a city normalizer may only allow letters and special characters, and only keep the last non-space special character if there is a sequence of special characters.
- a base normalizer can return the correct country normalizer based on the country abbreviation.
- a phone normalizer may normalize phone patterns based on each country having its own phone pattern.
- a zip normalizer can normalize zip code patterns based on each country having its own zip code pattern.
- a state normalizer may normalize states based on each country having its own state requirements, if there are any.
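- two of the normalizers listed above, for names and addresses, can be sketched as follows. The exact sets of permitted special characters, the ASCII-only letter class, the list of Roman numeral suffixes, and the abbreviation table are assumptions made for the example; the description names the rules but not these details.

```python
import re

ADDRESS_ABBREVIATIONS = {"e": "East", "w": "West", "blvd": "Boulevard"}
# Hypothetical set of generational suffixes treated as Roman numerals.
ROMAN_SUFFIXES = {"ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x"}


def normalize_name(raw: str) -> str:
    """Apply the name rules: allowed characters, capitalization, Mc, numerals."""
    # Keep ASCII letters plus hyphen, apostrophe, period, and space (assumed set).
    cleaned = re.sub(r"[^A-Za-z\-'. ]", "", raw)
    parts = []
    for index, token in enumerate(cleaned.split()):
        token = token.strip("-'.")  # no special characters at either end
        if not token:
            continue
        lower = token.lower()
        if index > 0 and lower in ROMAN_SUFFIXES:
            parts.append(lower.upper())
        elif lower.startswith("mc") and len(lower) > 2:
            parts.append("Mc" + lower[2:].capitalize())
        else:
            parts.append(lower.capitalize())
    return " ".join(parts)


def normalize_address(raw: str) -> str:
    """Apply the address rules: allowed characters, spacing, abbreviations."""
    # Keep letters, digits, spaces, and hyphen, slash, hash (assumed set).
    cleaned = re.sub(r"[^A-Za-z0-9\-/# ]", "", raw)
    # Remove spaces around the permitted special characters.
    cleaned = re.sub(r"\s*([\-/#])\s*", r"\1", cleaned)
    words = [ADDRESS_ABBREVIATIONS.get(word.lower(), word) for word in cleaned.split()]
    return " ".join(words)


name = normalize_name("  greg mcdonald iii ")
address = normalize_address("100 E Main Blvd.")
```

- the Roman numeral rule is applied only to tokens after the first so that a short given name is not mistaken for a suffix; that restriction is a design choice of this sketch, not something stated in the source.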
- the enrich bolt 216 uses external data services for email verification, for phone verification and social append services for social handles, and appends a set of meta-attributes to all the new contact claims that enter the pipeline. After enrichment, the tuple may contain additional metadata around emails, phones and social handles that is useful for matching and merging purposes. The enrich bolt 216 passes this data to the match bolt 218 that tries to match the incoming contact claims with other existing claims and facts in the system 200 .
- the working data model of person object attributes may include: first name, last name, linkedin handle, twitter handle, other social handles, links to contact objects, work history, photos, education, and snippets, which are unstructured short pieces of text such as search result snippets, tweets, etc., and others containing person-identifying content.
- p denotes a person object, p.work_history.company_names denotes the names of companies p has worked at, p.work_history.cities denotes the set of all cities p has worked in, p.work_history.titles denotes the set of job titles that p has held, and similar notations exist for work emails, work phones, work states, work countries, and social handles.
- the formats of the objects of different types of social handles (linkedin, twitter, etc.) are quite different, so it may not be necessary to have a different index type for each type of social handle because there is no risk of a collision.
- a final check models the probability that a match is a chance event.
- Let M denote this match.
- the system 200 can estimate the upper bound on the expected number E(M) of objects in the universe that have the properties of the match M, under the universe probability model. If this upper bound estimate is below a certain threshold (1 may be a sensible choice) the system 200 accepts this match, otherwise the system 200 rejects the match.
- One way to estimate a suitable upper bound on E(M) is to model the probabilities of various attribute:value pairs under the universe probability model, then assume the independence of attributes in the match and multiply out these probabilities, then finally multiply this by n.
- $E(M) \le n \cdot \prod_{a:v \in M} P(a:v)$ (EUB 1). Modeling the probabilities of all attribute:value pairs in the universe is probably too complex, so the database system may begin by modeling the probabilities of certain key attributes and their values, drop all attributes other than these from M, and still use (EUB 1). The result is still an estimate of the upper bound on E(M).
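- the bound labelled (EUB 1) can be evaluated directly once some attribute:value probabilities are modeled. In the sketch below n is taken to be the number of objects in the universe, which is an interpretation of the text and not a stated definition, and the probabilities and universe size are made-up numbers. Attributes without a modeled probability are skipped, which is equivalent to a factor of 1 and therefore keeps the result an upper bound.

```python
from math import prod
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]  # (attribute, value)


def expected_match_upper_bound(
    match: Iterable[Pair], probabilities: Dict[Pair, float], universe_size: int
) -> float:
    """Estimate n times the product of P(a:v) over the modeled pairs in the match."""
    modeled = [probabilities[pair] for pair in match if pair in probabilities]
    return universe_size * prod(modeled)


def accept_match(
    match: Iterable[Pair],
    probabilities: Dict[Pair, float],
    universe_size: int,
    threshold: float = 1.0,
) -> bool:
    """Accept the match when the bound falls below the threshold."""
    return expected_match_upper_bound(match, probabilities, universe_size) < threshold


# Made-up probabilities and universe size, for illustration only.
probabilities = {
    ("family_name", "jones"): 0.005,
    ("company", "international business machines"): 0.0001,
}
full_match = [
    ("family_name", "jones"),
    ("company", "international business machines"),
    ("given_name", "gregory"),  # not modeled, so it is dropped from the product
]
bound = expected_match_upper_bound(full_match, probabilities, 1_000_000)
accepted = accept_match(full_match, probabilities, 1_000_000)
name_only_accepted = accept_match([("family_name", "jones")], probabilities, 1_000_000)
```

- with these numbers the family name alone would be expected to occur thousands of times by chance and is rejected, while the name and company together fall below the threshold of 1 and are accepted.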
- the result-set size based estimate may not generalize as well as explicit modeling.
- the P(person_name) explicit model which assumes independence of first and last names does not generalize well.
- An alternative to an explicit estimate is a result-set size based estimate.
- the system 200 runs the matcher to find all true positive matches.
- ‘true positive’ may not include ‘modeling chance matches’. If there are at least two distinct objects in the result set, the system 200 deems that the probe being matched is not matched uniquely. This approach has the benefit that the P(a:v) probabilities are not explicitly modeled.
- the result set will carry the information to judge whether a match is unique or not, even in complex cases.
- This approach has the limitation that it does not model the real world; only the current, actual universe of (golden) data objects. Another issue is that to implement this approach, the system 200 may need to do this computation after all the true positives have been generated. Furthermore, the system 200 can match within the result set to check whether there are indeed at least two different objects or not.
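The result-set size based check can be illustrated with a short sketch. It assumes the matcher has already produced the result set and that some pairwise comparison (`same_object`, a hypothetical callback) decides whether two results describe the same object, which corresponds to matching within the result set.

```python
def is_unique_match(result_set, same_object):
    """Return True when every object in the result set is the same object.

    same_object(a, b) is a hypothetical predicate that matches two results
    against each other; an empty result set is treated as "no unique match".
    """
    distinct = []
    for obj in result_set:
        if not any(same_object(obj, seen) for seen in distinct):
            distinct.append(obj)
    return len(distinct) == 1


# Illustration: two results are the same object when their emails agree.
same_email = lambda a, b: a["email"] == b["email"]
results = [{"email": "a@example.com"}, {"email": "b@example.com"}]
print(is_unique_match(results, same_email))
```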
- the search engine bolt 206 takes partial data (aka seed) and tries to find more publicly available information via a search engine 220 , such as Yahoo® Boss, because finding titles and social handles is useful.
- the data thus obtained is passed through a search results bolt 222 to extract vital information and enrich a data record to build a full person profile, such as by passing the data to a handle extractor bolt 224 .
- the search results bolt 222 uses search result snippets having attractive properties that suggest they be made first-class “objects” in a person database 226 and/or contact data model and matching engines. Snippets are consumed without running afoul of terms of use restrictions. For the most part, snippets contain information about a single entity: a person, company, or contact. Snippets might be matched to a different type of suitable object, such as person, company, or contact. Some snippets contain information about multiple companies at which a person has worked, so snippets could be used to connect together multiple contacts of the same person. Such a matching is of mostly unstructured text (the snippet) to structured data (a particular contact object). This matching does not require entity extraction from the snippet.
- This matching could be algorithmically relatively easy to do.
- certain “nuggets” might be extracted from the snippet and the matching object enriched. For example, if the snippet contains a LinkedIn handle and the snippet matches a particular contact sufficiently well, this handle is then attached to that contact.
- a snippet may tie together multiple contacts of the same person because the snippet contains the names of multiple companies at which the person has worked.
- Contact initiated snippets generation and matching may work as follows. Start with a contact J. Let C denote the cluster of the person database 226 containing J. Generate a suitable query Q to the search engine 220 from J. For each snippet S in the top search results on Q, if S matches C with a sufficiently high confidence, add S to C, otherwise add S to a collection of unmatched snippets. If the person name in J is sufficiently uncommon, set Q to person-name(J), else set Q to person-name(J)+company-name(J). Two examples are Pawan Nachnani and John Smith ibm. Note that there is no data quality risk by setting a query too broad, such as a common person name, because the resulting snippets will be deeply matched with C.
- An overly broad query does not yield good recall because none of the snippets in its result set deeply match C. Recall may be less important than precision because the system 200 can make up for low recall by issuing more queries to the search engine 220 , so long as the system 200 is not overly constrained by search volume limits. Also, if the system 200 uses a mechanism to consume unmatched snippets, this substantially mitigates the recall limitation.
- C denotes the data of a single person. A snippet may contain data of this person spread across multiple contacts, which is why the database system matches S to C and not merely to J.
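The contact-initiated flow above can be sketched as two small functions. This is an illustration under stated assumptions: `search`, `match_confidence`, and `is_uncommon_name` are hypothetical callbacks standing in for the search engine 220, the deep matcher, and a name-frequency test, and the 0.9 confidence cutoff is an arbitrary placeholder.

```python
def build_query(contact, is_uncommon_name):
    """Use the person name alone when it is uncommon, else add the company name."""
    name = contact["person_name"]
    if is_uncommon_name(name):
        return name
    return f"{name} {contact['company_name']}"


def collect_snippets(contact, cluster, search, match_confidence,
                     is_uncommon_name, min_confidence=0.9):
    """Add confidently matching snippets to the cluster; return the rest.

    Each snippet is matched against the whole cluster C, not only the seed
    contact J, because a snippet may span several contacts of one person.
    """
    unmatched = []
    query = build_query(contact, is_uncommon_name)
    for snippet in search(query):
        if match_confidence(snippet, cluster) >= min_confidence:
            cluster["snippets"].append(snippet)
        else:
            unmatched.append(snippet)
    return unmatched


# Stub callbacks, for illustration only.
contact = {"person_name": "John Smith", "company_name": "ibm"}
cluster = {"snippets": []}
leftover = collect_snippets(
    contact,
    cluster,
    search=lambda q: ["John Smith, engineer at ibm", "John Smith, baker"],
    match_confidence=lambda s, c: 1.0 if "ibm" in s else 0.0,
    is_uncommon_name=lambda name: False,
)
print(cluster["snippets"], leftover)
```

Returning the unmatched snippets rather than discarding them leaves room for the separate mechanism that consumes unmatched snippets.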
- the match bolt 218 includes bolts such as a handle bolt 228 , an email bolt 230 , a name@company bolt 232 , a name@phone bolt 234 , and a name@location bolt 236 to match snippets to clusters of objects in the person database 226 .
- a cluster bolt 238 clusters all matching claims together into a common cluster.
- a merge bolt 240 merges all claims and existing contact records (partial and/or complete) from a cluster into a single composite record (the merged record) and computes a confidence score for the merged record. If the merged record is incomplete, the merge bolt 240 enriches the record when possible with information available in the cache. If the record is complete, the merge bolt 240 marks the record as canonicalized. At this point, the record is ready to be persisted in the person database 226 , provided its confidence score is sufficiently high. The merge bolt 240 also updates the merge time of the incoming claim.
- If r.day is today, then this score may have the value 1, and the score can decay to 0 for days far in the past.
- Score(r,rank) is based on r.title. C-level titles may get a score of 1 and the rank score can monotonically decay for lower rank titles.
- Score(r,title_quality): high rank titles, e.g. Vice President, do not necessarily have high quality.
- Title_quality may score this separate dimension. A title might be deemed to have high quality if it has a known rank and has a known department and is not in an explicit list of poor titles. The quality may decrease depending on which (and how many) of the tests in the above sentence are violated.
- Score(r,domain) might only be defined when r's company has been matched to company jc.
- Score(r,d) = (number of emails in domain d) / (number of contacts in company jc).
- updates algorithmically deemed risky may be logged for review by a data steward or community. Feedback from the review can be used to assess the accuracy of this scoring/detection mechanism, and to tune it if it is deemed useful enough.
- An update is risky if a contact's last name is changed.
- a title change with more than one level increase in rank, such as software engineer to ceo, is also risky.
- a score version of this may make the risk score depend on the number of skipped levels.
- a title change which changes departments to another incompatible department, such as vp sales to vp engineering, is also risky.
- Updating or adding a C-level contact in a large company is risky, but this is easy to generalize in a scoring setting: the higher the rank of the contact and the larger the company size, the higher the risk score may be. Also, different update actions might have differing risks; for example, a title change is generally more risky than a last name change for a female. A Fortune 1000 headquarters address change is also risky, but scoring may generalize this to a company-importance score combined with an attribute-specific change score to give an overall risk score.
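The scored version of the title-change rule, where risk depends on the number of skipped levels, might look like the sketch below. The integer rank scale and the `max_levels` normalizer are assumptions made for illustration; the source does not specify either.

```python
def title_change_risk(old_rank, new_rank, max_levels=5):
    """Risk in [0, 1] that grows with the number of skipped rank levels.

    Ranks are hypothetical integers (0 = lowest, larger = more senior).
    A one-level promotion skips no levels and scores 0.
    """
    skipped = (new_rank - old_rank) - 1
    if skipped <= 0:
        return 0.0
    return min(1.0, skipped / max_levels)


print(title_change_risk(1, 2))  # one-level increase
print(title_change_risk(0, 6))  # e.g. software engineer to ceo on this scale
```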
- the join bolt 242 takes all the merged claims from the merge bolt 240 and constructs person objects.
- a person object may be a collection of major profiles, such as a person profile, a work profile, and a social profile.
- the data from each merged claim can update one or many attributes across all the three profiles of a person.
- a merge claim may end up creating new profile objects as new claims become available.
- Each attribute in a profile ends up with a confidence score that may ultimately determine the level of “gold” for that particular profile object. While most of the attributes might be permanent, some of the attributes could be transient and need to be re-computed over time due to privacy and legal reasons.
- a persist bolt 244 may save all the resultant person records and the underlying claims to the person database 226 once all the processing is completed by the join bolt 242 .
- the bounce email processing bolt is a reaper bolt 246 that aggregates multiple facts with a current claim and comes up with a score and a disposition about that score.
- the reaper bolt 246 may determine if a fact is a duplicate.
- the fact disposition can determine if the computed score warrants a graveyard or ungraveyard of the underlying contact.
- the score of the current claim could be computed as follows: Take all claims and scored facts for the same email. For each fact, get the base score determined by the response category of the email. From the description from the bounce email spout 210 , the contact message is typically unstructured data.
- the reaper bolt 246 may address this by using a trie-based approach to find tokens specified in a list of vendor dictionaries.
- Each vendor dictionary can specify the token with a classified response category.
- Response categories for email may be hard_error, heavy_error, soft_error, email_received, unknown.
- the crawler spout 214 looks at free web (sites approved by a legal department for acceptable terms of service) and finds publicly available information/claims. Since most of the open web sources of data are unstructured, the publicly available information typically requires sophisticated natural language processing techniques to extract meaningful information from it. Therefore, the crawler spout 214 feeds snippets of information to a natural language processing bolt 248 , which applies natural language processing and machine learning techniques to extract relevant data/facts to emit the following types of claims: contact added, contact updated, contact graveyarded, and social handles.
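The trie-based lookup of vendor dictionary tokens in an unstructured bounce message can be sketched as a word-level trie. The class name, the whitespace tokenization, and the two dictionary entries are assumptions for illustration; real vendor dictionaries are not given in the source.

```python
class TokenTrie:
    """Word-level trie mapping dictionary phrases to response categories."""

    def __init__(self):
        self.root = {}

    def add(self, phrase, category):
        node = self.root
        for word in phrase.lower().split():
            node = node.setdefault(word, {})
        node[None] = category  # None marks the end of a dictionary phrase

    def find_categories(self, message):
        """Return the category of every dictionary phrase found in the message."""
        words = message.lower().split()
        found = []
        for start in range(len(words)):
            node = self.root
            for word in words[start:]:
                if word not in node:
                    break
                node = node[word]
                if None in node:
                    found.append(node[None])
        return found


# Hypothetical dictionary entries, for illustration only.
trie = TokenTrie()
trie.add("user unknown", "hard_error")
trie.add("mailbox full", "soft_error")
print(trie.find_categories("550 user unknown in virtual mailbox table"))
```

A message with no dictionary phrase yields an empty list, which a caller could map to the unknown category.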
- a natural, human person may be represented as a graph of p:Person entities (nodes, or vertices) interconnected by links (edges). Each node can represent a different facet of the user (person). Each of these facets may be held in a separate (graph) container called a context.
- Each person entity node can be a set of attributes and objects. These attributes might be simple literals (such as the user's first name) or they could be other entities (called complex attributes). These latter attributes might be links to other entity nodes.
- each node in the person graph is located in its own context.
- the root node may lie in a special context (for each user) called the root context.
- the system 200 delivers this data to the person database 226 that is customer accessible. This golden data may also be propagated back to the original source systems and other partner systems and help keep the data clean in their respective source databases.
- the system 200 provides a complete 360 degrees feedback loop and reduces the chances that bad or fraudulent data may ever make it into customer's customer relationship management systems or any other system where a consolidated view of an account and person data is required.
- the core person and account repository also continues to grow over time as new pieces of data are found on the free web and other sources. Additional sources of data may also be on-boarded quickly into the system 200 by adding and configuring new spouts and corresponding bolts into the Storm topology.
- a de-duplication bolt detects duplicates and automatically merges the duplicates or floats suspected duplicates to a community for task resolution.
- a pinger bolt pings hypertext transfer protocol and simple mail transfer protocol domains for validity, automatically graveyarding when a domain is deemed invalid.
- the system 200 may create indices for each company, person, and location object for matching purposes.
- person indices include record identifier, social handle, email, direct phone number, company, city, zip, state, and country.
- location indices include record identifier, zip, city, and country.
- company indices include record identifier, domain, corporate phone, company prefix, stock ticker, company name and city, domain and city.
- the system may build an inverted index from a snippet, and use the index to map words in the snippet to their positions.
- the positions for a given word could be in increasing order.
- An inverted index is illustrated in an example below.
- the system 200 detects acronyms (if any) in the snippet, expands out these acronyms, tokenizes the expansion and incorporates these expansions into the inverted index, as illustrated in the example below.
- the inverted index contains the entry ibm → <0, i, j> where i and j denote the word positions of the 2nd and 3rd occurrence of IBM in the snippet.
- After recognizing the acronym ibm → “international business machines”, the database system adds the entries international [i,0], business [i,1], and machines [i,2] to the inverted index.
- Acronym-expansion entries in a snippet's inverted index could be useful for matching titles or company names to the snippet.
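A minimal sketch of such an inverted index is shown below. It assumes whitespace tokenization, a caller-supplied acronym table, and that expansion entries are recorded as (position, offset) pairs for every occurrence of the acronym; the source shows the pairs only for one occurrence, so that last point is an assumption.

```python
from collections import defaultdict


def build_inverted_index(snippet, acronyms=None):
    """Map each word to its positions, plus (position, offset) expansion entries.

    acronyms: hypothetical dict such as {"ibm": "international business machines"}.
    Positions for a given word come out in increasing order because the
    snippet is scanned left to right.
    """
    acronyms = acronyms or {}
    index = defaultdict(list)
    for position, word in enumerate(snippet.lower().split()):
        index[word].append(position)
        for offset, token in enumerate(acronyms.get(word, "").split()):
            index[token].append((position, offset))
    return dict(index)


index = build_inverted_index(
    "IBM hires engineer as IBM expands",
    {"ibm": "international business machines"},
)
print(index["ibm"], index["business"])
```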
- the system 200 may represent an attribute:value pair as an ordered tree.
- the order can capture the order of the words in the value, and also in acronym expansions.
- the ordered tree may capture choices, which include aliases, and acronym expansions.
- Table 1 below shows various examples. Ordered trees can be depicted as nested arrays, and constructed via attribute-specific constructors. For example, person_name objects are expanded to include first name aliases, and acronyms in company names and titles are detected and expanded, such as depicted in table 1.
- Ordered trees may have alternating levels of ordered ANDs and unordered ORs. For visual convenience, an AND-node is encapsulated in [ . . . ] and an OR-node in ( . . . ).
- [chairman, and, (ceo, [chief, executive, officer])] is read as “chairman AND (ceo OR (chief AND executive AND officer)).”
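The nested-array notation maps naturally onto Python lists and tuples. The sketch below assumes a list stands for an ordered AND-node and a tuple for an OR-node, and uses a hypothetical acronym table in place of the attribute-specific constructors; the connective word "and" is left out of the example value so that the rendered expression matches the reading given above.

```python
def build_ordered_tree(value, expansions):
    """Tokenize a value; wrap any token with a known expansion in an OR-node.

    expansions: hypothetical dict such as {"ceo": "chief executive officer"}.
    """
    tree = []
    for token in value.lower().split():
        if token in expansions:
            tree.append((token, expansions[token].split()))
        else:
            tree.append(token)
    return tree


def to_expression(node, top=True):
    """Render a tree as a boolean expression: lists are ANDs, tuples are ORs."""
    if isinstance(node, str):
        return node
    joiner = " AND " if isinstance(node, list) else " OR "
    text = joiner.join(to_expression(child, top=False) for child in node)
    return text if top else f"({text})"


tree = build_ordered_tree("chairman ceo", {"ceo": "chief executive officer"})
print(tree)
print(to_expression(tree))
```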
- Representing the snippet as an inverted index combined with representing attribute:value pairs as ordered trees may lead to a very fast matching algorithm, as described below.
- the system 200 has attribute-specific matchers to match a value of a field to a snippet, which is unstructured text.
- the attribute-specific matchers could be instances of the following generic matcher.
- Table 2:

  Row | attribute_value_ordered_tree | snippet_inverted_index | hits
  1 | [shabd, vaid] | {shabd → <0, 5>, vaid → <1, 6>, vice → <9>, president → <10>, . . . } | [<0, 5>, <1, 6>]
  2 | [vice, president] | {shabd → <0, 5>, vaid → <1, 6>, vice → <9>, president → <10>, . . . } | [<9>, <10>]
- Enumerating individual hits may be described based on the hits data structure in the last column of Table 2. Individual hits can reveal exactly what tokens in the query matched what positions in the snippet. Each hit could be individually scored. The overall score for the match of the attribute:value pair in the snippet might be defined as the aggregation of these individual scores.
- a hit could be a pair (tokens,positions), where tokens might be an array of tokens in attribute_value_ordered_tree and positions could be an array of positions in the snippet at which these tokens match, such as the examples below.
- a one-level hits tree is simply an array of post-lists.
- Table 2 hits of rows 1 and 2 form one-level trees.
- the system 200 may use a k-merge like algorithm to enumerate all the hits of such a tree to a snippet. This algorithm can “merge” k post-lists, as illustrated below. Below is an illustration on the hits [<0,5>, <1,6>].
- the underlined entries depict the locations of the pointers in the various post-lists.
- the pointers are at the start positions. Since 1 minus 0 equals 1, the system 200 generates a hit, 0 . . . 1, and advances both pointers.
- In step 2, since 6 minus 5 equals 1, the system 200 enumerates a hit, 5 . . . 6, and advances both pointers.
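The two steps above can be generalized to k post-lists with one pointer per list. The following is a sketch of one way to do this, not necessarily the method the system uses: a hit is emitted when the pointed-to positions are consecutive, and otherwise the pointer that lags furthest behind its expected slot is advanced (a standard sorted-list intersection strategy, chosen here as an assumption).

```python
def enumerate_hits(post_lists):
    """Enumerate runs of consecutive positions across k sorted post-lists.

    post_lists[i] holds the increasing snippet positions of the i-th token.
    Each hit is returned as a (first_position, last_position) pair.
    """
    if not post_lists:
        return []
    pointers = [0] * len(post_lists)
    hits = []
    while all(ptr < len(pl) for ptr, pl in zip(pointers, post_lists)):
        current = [pl[ptr] for ptr, pl in zip(pointers, post_lists)]
        # Subtracting each token's offset makes a hit show up as equal values.
        aligned = [pos - offset for offset, pos in enumerate(current)]
        if min(aligned) == max(aligned):
            hits.append((current[0], current[-1]))
            pointers = [ptr + 1 for ptr in pointers]
        else:
            pointers[aligned.index(min(aligned))] += 1
    return hits


print(enumerate_hits([[0, 5], [1, 6]]))  # → [(0, 1), (5, 6)]
print(enumerate_hits([[9], [10]]))       # → [(9, 10)]
```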
- Enumerating hits of a multi-level tree may be done by suitably generalizing the k-merge operation.
- the generalization can be a little complex, and may be well described by building up inductively from different types of multi-level tree examples.
- Example 1 is based on the hits of row 3 in Table 2: [(nil, [<9>, <10>])] and corresponds to a 3-level tree.
- the system 200 processes this example as follows.
- the system 200 goes down one level since the top level is a singleton-AND. Next, the system 200 skips the nil. Finally, the system 200 produces the hit 9 . . . 10 from [ ⁇ 9>, ⁇ 10>] and annotates it with [vice, president].
- Example 2 is based on the hits of row 4 in Table 2: [(nil, <8>), <9>]
- In step 1, the system 200 tries to 2-merge (nil, <8>) with <9>. Recognizing that the first argument is an OR, the system 200 goes down one level into the OR and effectively does the 2-merge of [<8>, <9>] in step 2.
- Example 3 is based on the hits in row 5 of Table 2: [<0>, <1>, (<8>, [<2>, <3>, <4>])]
- In step 1, the system 200 recognizes the need for a 3-merge at the top level.
- the system 200 places the pointers at the correct locations of the first two entries.
- the third entry is an OR, so the system 200 descends into the third entry and then places the pointer on the first entry in the first post-list in the OR choices. (This entry is 8.)
- the system 200 then sends the hit (0 . . . 1, 8) off to the scorer.
- In step 3, the system 200 moves over to the second choice in this OR. This is itself an AND of three choices. So the system 200 needs a 3-merge of [<2>, <3>, <4>]. This 3-merge produces the hit 2 . . . 4, which gets appended to 0 . . . 1 to yield 0 . . . 4.
- Example 4 is based on the hits in row 6 of Table 2: [(<0>, [nil, nil, nil]), nil]
- In step 1, the system 200 recognizes the need for a 2-merge at the top level.
- the system 200 notices that the first entry is an OR, so the system 200 descends into the first entry and then places the pointer on the first entry in the first post-list in the OR choices.
- the system 200 notes that the second entry of the top-level AND is nil, so the system 200 outputs [0,nil] as one hit.
- the system 200 advances the first pointer to the second choice in the OR (<0>, [nil, nil, nil]) and notices that it is [nil, nil, nil], so the system 200 stops, and no new hits are generated.
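One simplified way to view the descent into OR-nodes in Examples 1 to 3 is as an expansion of the multi-level hits tree into flat alternatives, each a plain sequence of post-lists that a one-level merge could then process. The sketch below is only that expansion step, under stated assumptions: lists are AND-nodes, tuples are OR-nodes, `Post` is a hypothetical wrapper for a post-list, and nil is `None`. Nil choices are skipped, so the partial hit [0, nil] of Example 4 is deliberately not modeled.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Post:
    """A post-list: the snippet positions at which one token matched."""
    positions: tuple


def expand(node):
    """Expand a hits tree into flat alternatives (lists of Post objects)."""
    if node is None:              # nil contributes no alternative
        return []
    if isinstance(node, Post):
        return [[node]]
    if isinstance(node, tuple):   # OR-node: union of the choices' alternatives
        alternatives = []
        for choice in node:
            alternatives.extend(expand(choice))
        return alternatives
    # AND-node (list): concatenate one alternative from each child, in order
    child_alternatives = [expand(child) for child in node]
    return [sum(combo, []) for combo in product(*child_alternatives)]


# Example 3 from the text: [<0>, <1>, (<8>, [<2>, <3>, <4>])]
p = lambda *positions: Post(positions)
example3 = [p(0), p(1), (p(8), [p(2), p(3), p(4)])]
for alternative in expand(example3):
    print([post.positions for post in alternative])
```

For Example 3 this yields the two alternatives that correspond to the hits (0 . . . 1, 8) and 0 . . . 4 described above.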
- the hit scorer may take two arguments: argument_name and hit.
- Table 3 shows a number of examples explaining the scoring. Table 3, Scoring individual hits:
- the system 200 brings together various algorithms, processes and techniques that are particularly suited for finding inaccurate data and piecing together rapidly changing pieces of data and claims to generate golden records at a massive scale.
- the system 200 provides a complete framework to efficiently evaluate data and to improve the completeness and accuracy of data.
- the system 200 provides a solid foundation for linking external data sources to core data assets in a reliable and scalable way that will enable customers to gain additional insights into their customers.
- FIG. 3 illustrates a block diagram of an environment 310 wherein an on-demand database service might be used.
- the environment 310 may include user systems 312 , a network 314 , a system 316 , a processor system 317 , an application platform 318 , a network interface 320 , a tenant data storage 322 , a system data storage 324 , program code 326 , and a process space 328 .
- the environment 310 may not have all of the components listed and/or may have other elements instead of, or in addition to, those listed above.
- the environment 310 is an environment in which an on-demand database service exists.
- a user system 312 may be any machine or system that is used by a user to access a database user system.
- any of the user systems 312 may be a handheld computing device, a mobile phone, a laptop computer, a work station, and/or a network of computing devices.
- the user systems 312 might interact via the network 314 with an on-demand database service, which is the system 316 .
- An on-demand database service such as the system 316
- Some on-demand database services may store information from one or more tenants stored into tables of a common database image to form a multi-tenant database system (MTS).
- the “on-demand database service 316 ” and the “system 316 ” will be used interchangeably herein.
- a database image may include one or more database objects.
- a relational database management system (RDMS) or the equivalent may execute storage and retrieval of information against the database object(s).
- the application platform 318 may be a framework that allows the applications of the system 316 to run, such as the hardware and/or software, e.g., the operating system.
- the on-demand database service 316 may include the application platform 318 which enables creation, managing and executing one or more applications developed by the provider of the on-demand database service, users accessing the on-demand database service via user systems 312 , or third party application developers accessing the on-demand database service via the user systems 312 .
- the users of the user systems 312 may differ in their respective capacities, and the capacity of a particular user system 312 might be entirely determined by permissions (permission levels) for the current user. For example, where a salesperson is using a particular user system 312 to interact with the system 316 , that user system 312 has the capacities allotted to that salesperson. However, while an administrator is using that user system 312 to interact with the system 316 , that user system 312 has the capacities allotted to that administrator. In systems with a hierarchical role model, users at one permission level may have access to applications, data, and database information accessible by a lower permission level user, but may not have access to certain applications, database information, and data accessible by a user at a higher permission level. Thus, different users will have different capabilities with regard to accessing and modifying application and database information, depending on a user's security or permission level.
- the network 314 is any network or combination of networks of devices that communicate with one another.
- the network 314 may be any one or any combination of a LAN (local area network), WAN (wide area network), telephone network, wireless network, point-to-point network, star network, token ring network, hub network, or other appropriate configuration.
- TCP/IP Transfer Control Protocol and Internet Protocol
- the user systems 312 might communicate with the system 316 using TCP/IP and, at a higher network level, use other common Internet protocols to communicate, such as HTTP, FTP, AFS, WAP, etc.
- HTTP HyperText Transfer Protocol
- the user systems 312 might include an HTTP client commonly referred to as a “browser” for sending and receiving HTTP messages to and from an HTTP server at the system 316 .
- HTTP server might be implemented as the sole network interface between the system 316 and the network 314 , but other techniques might be used as well or instead.
- the interface between the system 316 and the network 314 includes load sharing functionality, such as round-robin HTTP request distributors to balance loads and distribute incoming HTTP requests evenly over a plurality of servers. At least as for the users that are accessing that server, each of the plurality of servers has access to the MTS' data; however, other alternative configurations may be used instead.
- the system 316 implements a web-based customer relationship management (CRM) system.
- the system 316 includes application servers configured to implement and execute CRM software applications as well as provide related data, code, forms, webpages and other information to and from the user systems 312 and to store to, and retrieve from, a database system related data, objects, and Webpage content.
- data for multiple tenants may be stored in the same physical database object, however, tenant data typically is arranged so that data of one tenant is kept logically separate from that of other tenants so that one tenant does not have access to another tenant's data, unless such data is expressly shared.
- the system 316 implements applications other than, or in addition to, a CRM application.
- the system 316 may provide tenant access to multiple hosted (standard and custom) applications, including a CRM application.
- User (or third party developer) applications which may or may not include CRM, may be supported by the application platform 318 , which manages creation, storage of the applications into one or more database objects and executing of the applications in a virtual machine in the process space of the system 316 .
- One arrangement for elements of the system 316 is shown in FIG. 3 , including the network interface 320 , the application platform 318 , the tenant data storage 322 for tenant data 323 , the system data storage 324 for system data 325 accessible to the system 316 and possibly multiple tenants, the program code 326 for implementing various functions of the system 316 , and the process space 328 for executing MTS system processes and tenant-specific processes, such as running applications as part of an application hosting service. Additional processes that may execute on the system 316 include database indexing processes.
- each of the user systems 312 could include a desktop personal computer, workstation, laptop, PDA, cell phone, or any wireless access protocol (WAP) enabled device or any other computing device capable of interfacing directly or indirectly to the Internet or other network connection.
- Each of the user systems 312 typically runs an HTTP client, e.g., a browsing program, such as Microsoft's Internet Explorer browser, Netscape's Navigator browser, Opera's browser, or a WAP-enabled browser in the case of a cell phone, PDA or other wireless device, or the like, allowing a user (e.g., subscriber of the multi-tenant database system) of the user systems 312 to access, process and view information, pages and applications available to it from the system 316 over the network 314 .
- Each of the user systems 312 also typically includes one or more user interface devices, such as a keyboard, a mouse, trackball, touch pad, touch screen, pen or the like, for interacting with a graphical user interface (GUI) provided by the browser on a display (e.g., a monitor screen, LCD display, etc.) in conjunction with pages, forms, applications and other information provided by the system 316 or other systems or servers.
- the user interface device may be used to access data and applications hosted by the system 316 , and to perform searches on stored data, and otherwise allow a user to interact with various GUI pages that may be presented to a user.
- embodiments are suitable for use with the Internet, which refers to a specific global internetwork of networks. However, it should be understood that other networks can be used instead of the Internet, such as an intranet, an extranet, a virtual private network (VPN), a non-TCP/IP based network, any LAN or WAN or the like.
- each of the user systems 312 and all of its components are operator configurable using applications, such as a browser, including computer code run using a central processing unit such as an Intel Pentium® processor or the like.
- the system 316 (and additional instances of an MTS, where more than one is present) and all of their components might be operator configurable using application(s) including computer code to run using a central processing unit such as the processor system 317 , which may include an Intel Pentium® processor or the like, and/or multiple processor units.
- a computer program product embodiment includes a machine-readable storage medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the embodiments described herein.
- Computer code for operating and configuring the system 316 to intercommunicate and to process webpages, applications and other data and media content as described herein is preferably downloaded and stored on a hard disk, but the entire program code, or portions thereof, may also be stored in any other volatile or non-volatile memory medium or device as is well known, such as a ROM or RAM, or provided on any media capable of storing program code, such as any type of rotating media including floppy disks, optical discs, digital versatile disk (DVD), compact disk (CD), microdrive, and magneto-optical disks, and magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data.
- the entire program code, or portions thereof may be transmitted and downloaded from a software source over a transmission medium, e.g., over the Internet, or from another server, as is well known, or transmitted over any other conventional network connection as is well known (e.g., extranet, VPN, LAN, etc.) using any communication medium and protocols (e.g., TCP/IP, HTTP, HTTPS, Ethernet, etc.) as are well known.
- computer code for implementing embodiments can be written in any programming language that can be executed on a client system and/or server or server system, such as C, C++, HTML, any other markup language, Java™, JavaScript, ActiveX, any other scripting language such as VBScript, and many other programming languages as are well known.
- Java™ is a trademark of Sun Microsystems, Inc.
- the system 316 is configured to provide webpages, forms, applications, data and media content to the user (client) systems 312 to support the access by the user systems 312 as tenants of the system 316 .
- the system 316 provides security mechanisms to keep each tenant's data separate unless the data is shared.
- they may be located in close proximity to one another (e.g., in a server farm located in a single building or campus), or they may be distributed at locations remote from one another (e.g., one or more servers located in city A and one or more servers located in city B).
- each MTS could include one or more logically and/or physically connected servers distributed locally or across one or more geographic locations.
- the term “server” is meant to include a computer system, including processing hardware and process space(s), and an associated storage system and database application (e.g., OODBMS or RDBMS) as is well known in the art. It should also be understood that “server system” and “server” are often used interchangeably herein.
- database object described herein can be implemented as single databases, a distributed database, a collection of distributed databases, a database with redundant online or offline backups or other redundancies, etc., and might include a distributed database or storage network and associated processing intelligence.
- FIG. 4 also illustrates the environment 310 . However, in FIG. 4 elements of the system 316 and various interconnections in an embodiment are further illustrated.
- FIG. 4 shows that each of the user systems 312 may include a processor system 312 A, a memory system 312 B, an input system 312 C, and an output system 312 D.
- FIG. 4 shows the network 314 and the system 316 .
- system 316 may include the tenant data storage 322 , the tenant data 323 , the system data storage 324 , the system data 325 , a User Interface (UI) 430 , an Application Program Interface (API) 432 , a PL/SOQL 434 , save routines 436 , an application setup mechanism 438 , application servers 400 1 - 400 N , a system process space 402 , tenant process spaces 404 , a tenant management process space 410 , a tenant storage area 412 , a user storage 414 , and application metadata 416 .
- the environment 310 may not have the same elements as those listed above and/or may have other elements instead of, or in addition to, those listed above.
- the processor system 312 A may be any combination of one or more processors.
- the memory system 312 B may be any combination of one or more memory devices, short term, and/or long term memory.
- the input system 312 C may be any combination of input devices, such as one or more keyboards, mice, trackballs, scanners, cameras, and/or interfaces to networks.
- the output system 312 D may be any combination of output devices, such as one or more monitors, printers, and/or interfaces to networks. As shown by FIG.
- the system 316 may include the network interface 320 (of FIG. 3 ) implemented as a set of HTTP application servers 400 , the application platform 318 , the tenant data storage 322 , and the system data storage 324 . Also shown is the system process space 402 , including individual tenant process spaces 404 and the tenant management process space 410 .
- Each application server 400 may be configured to access tenant data storage 322 and the tenant data 323 therein, and the system data storage 324 and the system data 325 therein to serve requests of the user systems 312 .
- the tenant data 323 might be divided into individual tenant storage areas 412 , which can be either a physical arrangement and/or a logical arrangement of data.
- within each tenant storage area 412 , the user storage 414 and the application metadata 416 might be similarly allocated for each user. For example, a copy of a user's most recently used (MRU) items might be stored to the user storage 414 . Similarly, a copy of MRU items for an entire organization that is a tenant might be stored to the tenant storage area 412 .
- the UI 430 provides a user interface and the API 432 provides an application programmer interface to the system 316 resident processes to users and/or developers at the user systems 312 .
- the tenant data and the system data may be stored in various databases, such as one or more Oracle™ databases.
- the application platform 318 includes the application setup mechanism 438 that supports application developers' creation and management of applications, which may be saved as metadata into the tenant data storage 322 by the save routines 436 for execution by subscribers as one or more tenant process spaces 404 managed by the tenant management process space 410 , for example. Invocations to such applications may be coded using the PL/SOQL 434 that provides a programming language style interface extension to the API 432 . A detailed description of some PL/SOQL language embodiments is discussed in commonly owned U.S. Pat. No. 7,730,478 entitled, METHOD AND SYSTEM FOR ALLOWING ACCESS TO DEVELOPED APPLICATIONS VIA A MULTI-TENANT ON-DEMAND DATABASE SERVICE, by Craig Weissman, filed Sep. 21, 2007, which is incorporated in its entirety herein for all purposes. Invocations to applications may be detected by one or more system processes, which manage retrieving the application metadata 416 for the subscriber making the invocation and executing the metadata as an application in a virtual machine.
- Each application server 400 may be communicably coupled to database systems, e.g., having access to the system data 325 and the tenant data 323 , via a different network connection.
- one application server 400 1 might be coupled via the network 314 (e.g., the Internet)
- another application server 400 N-1 might be coupled via a direct network link
- another application server 400 N might be coupled by yet a different network connection.
- Transmission Control Protocol and Internet Protocol TCP/IP
- each application server 400 is configured to handle requests for any user associated with any organization that is a tenant. Because it is desirable to be able to add and remove application servers from the server pool at any time for any reason, there is preferably no server affinity for a user and/or organization to a specific application server 400 .
- an interface system implementing a load balancing function e.g., an F5 Big-IP load balancer
- the load balancer uses a least connections algorithm to route user requests to the application servers 400 .
- Other examples of load balancing algorithms such as round robin and observed response time, also can be used.
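The least connections routing mentioned above can be sketched in a few lines. This is only an illustration under stated assumptions: the class name, the server names, and the tie-breaking rule are hypothetical, and nothing here reflects how an F5 Big-IP device is actually configured.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class LeastConnectionsBalancer:
    """Routes each request to the server with the fewest open connections."""

    # Maps a (hypothetical) server name to its current number of open connections.
    active: Dict[str, int] = field(default_factory=dict)

    def add_server(self, name: str) -> None:
        self.active.setdefault(name, 0)

    def remove_server(self, name: str) -> None:
        # No server affinity is assumed, so a server may leave the pool at any time.
        self.active.pop(name, None)

    def acquire(self) -> str:
        if not self.active:
            raise RuntimeError("no application servers in the pool")
        # min() returns the first minimal entry, so ties go to the server added earliest.
        name = min(self.active, key=self.active.get)
        self.active[name] += 1
        return name

    def release(self, name: str) -> None:
        if self.active.get(name, 0) > 0:
            self.active[name] -= 1


balancer = LeastConnectionsBalancer()
for server in ("app-1", "app-2", "app-3"):
    balancer.add_server(server)

first = balancer.acquire()   # all servers idle, tie goes to app-1
second = balancer.acquire()  # app-2 now has the fewest connections
balancer.release(first)      # app-1 becomes idle again
third = balancer.acquire()   # app-1 and app-3 are both idle, app-1 was added first
```

Because the balancer keeps only connection counts and no per-user state, servers can be added or removed between requests without affecting routing correctness.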
- the system 316 is multi-tenant, wherein the system 316 handles storage of, and access to, different objects, data and applications across disparate users and organizations.
- one tenant might be a company that employs a sales force where each salesperson uses the system 316 to manage their sales process.
- a user might maintain contact data, leads data, customer follow-up data, performance data, goals and progress data, etc., all applicable to that user's personal sales process (e.g., in the tenant data storage 322 ).
- the user since all of the data and the applications to access, view, modify, report, transmit, calculate, etc., can be maintained and accessed by a user system having nothing more than network access, the user can manage his or her sales efforts and cycles from any of many different user systems. For example, if a salesperson is visiting a customer and the customer has Internet access in their lobby, the salesperson can obtain critical updates as to that customer while waiting for the customer to arrive in the lobby.
- the user systems 312 (which may be client systems) communicate with the application servers 400 to request and update system-level and tenant-level data from the system 316 that may require sending one or more queries to the tenant data storage 322 and/or the system data storage 324 .
- the system 316 (e.g., an application server 400 in the system 316 ) automatically generates one or more SQL statements (e.g., one or more SQL queries) that are designed to access the desired information.
- the system data storage 324 may generate query plans to access the requested data from the database.
- Each database can generally be viewed as a collection of objects, such as a set of logical tables, containing data fitted into predefined categories.
- a “table” is one representation of a data object, and may be used herein to simplify the conceptual description of objects and custom objects. It should be understood that “table” and “object” may be used interchangeably herein.
- Each table generally contains one or more data categories logically arranged as columns or fields in a viewable schema. Each row or record of a table contains an instance of data for each category defined by the fields.
- a CRM database may include a table that describes a customer with fields for basic contact information such as name, address, phone number, fax number, etc.
- Another table might describe a purchase order, including fields for information such as customer, product, sale price, date, etc.
- standard entity tables might be provided for use by all tenants.
- such standard entities might include tables for Account, Contact, Lead, and Opportunity data, each containing pre-defined fields. It should be understood that the word “entity” may also be used interchangeably herein with “object” and “table”.
- tenants may be allowed to create and store custom objects, or they may be allowed to customize standard entities or objects, for example by creating custom fields for standard objects, including custom index fields.
- all custom entity data rows are stored in a single multi-tenant physical table, which may contain multiple logical tables per organization. It is transparent to customers that their multiple “tables” are in fact stored in one large table or that their data may be stored in the same table as the data of other customers.
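One way to picture several logical tables per organization living in a single physical table is the sketch below, which uses an in-memory SQLite database. The table name, column layout, and sample rows are assumptions made for illustration; the description does not specify the actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE custom_entity_data (
        org_id  TEXT NOT NULL,     -- tenant that owns the row
        entity  TEXT NOT NULL,     -- logical "table" name within that tenant
        row_id  INTEGER NOT NULL,  -- row number inside the logical table
        field   TEXT NOT NULL,     -- custom field name
        value   TEXT,
        PRIMARY KEY (org_id, entity, row_id, field)
    )
    """
)
rows = [
    ("org_a", "Invoice", 1, "amount", "120.00"),
    ("org_a", "Shipment", 1, "carrier", "ACME Freight"),
    ("org_b", "Invoice", 1, "amount", "75.50"),
]
conn.executemany("INSERT INTO custom_entity_data VALUES (?, ?, ?, ?, ?)", rows)


def select_logical_table(org_id: str, entity: str) -> list:
    """Return one tenant's view of one logical table; the tenant filter is always applied."""
    cursor = conn.execute(
        "SELECT row_id, field, value FROM custom_entity_data "
        "WHERE org_id = ? AND entity = ? ORDER BY row_id, field",
        (org_id, entity),
    )
    return cursor.fetchall()


org_a_invoices = select_logical_table("org_a", "Invoice")
org_b_invoices = select_logical_table("org_b", "Invoice")
```

Both tenants have an "Invoice" table from their own point of view, yet every query is scoped by `org_id`, so neither sees the other's rows.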
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Library & Information Science (AREA)
- Computational Linguistics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
- This application claims the benefit of U.S. Provisional Patent Application No. 61/857,325 entitled, SYSTEM AND METHOD FOR MATCHING SNIPPETS OF SEARCH RESULTS TO CLUSTERS OF OBJECTS, by Nachnani, et al., filed Jul. 23, 2013, and U.S. Provisional Patent Application No. 61/862,873 entitled SYSTEM AND METHOD FOR CONFIDENTLY MERGING SNIPPETS OF SEARCH RESULTS WITH CLUSTERS OF OBJECTS, by Nachnani, et al., filed Aug. 6, 2013, the entire contents of which are incorporated herein by reference.
- A portion of the disclosure of this patent document contains material which is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure, as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
- The subject matter discussed in the background section should not be assumed to be prior art merely as a result of its mention in the background section. Similarly, a problem mentioned in the background section or associated with the subject matter of the background section should not be assumed to have been previously recognized in the prior art. The subject matter in the background section merely represents different approaches, which in and of themselves may also be inventions.
- Companies are often overwhelmed with customer data. Names, titles, billing addresses, shipping addresses, email addresses, phone numbers, household data, affiliated companies, and associated parties are examples of customer data fields. Managing customer data can become extremely complex and dynamic due to the many changes individual customers go through over time. Multiply all of these customer data fields by the millions of customer data records which a company may have in its data sources, and factor in how quickly and how often this customer data changes, and the result is that many companies have a significant data management challenge.
- Some customer data providers attempt to address this challenge by using a crowd-sourced platform to build a contact database which is sourced and updated by sales and marketing professionals. However, the customer data provided by customer data providers often has a variety of problems, such as: invalid email addresses or invalid phone numbers; contact records with incorrect information, from a misspelled name to a bad address; incomplete or inaccurate records for company names, job titles, and phone numbers; non-current data; wrong company information or wrong contact data; duplicate contacts with inconsistent information; fields that are empty due to poor data capture techniques or that contain other inaccurate information; completed fields that contain nonsense data such as “TBA” or “TBD”; and outdated information, such as a contact that no longer works at the contact's former company. Customer data providers may have these problems because community update models treat every add request or update request as an absolute fact, which can potentially lead to bad updates, such as incorrectly inactivating high-profile executives or fraudulently adding bogus contacts. While some issues may be alleviated by adding carrot-and-stick safeguards such as penalties for bad updates, rewards for good updates, and reputation-based updates, even a few ill-intentioned users can undermine the quality of customer data. Furthermore, the potential for bad data still exists when millions of records enter a customer data provider system from other sources, such that users or partners may end up adding bad data unknowingly from outdated lists and databases.
- In accordance with embodiments, there are provided systems and methods for matching and confidently adding snippets of search results to clusters of objects. Information is searched based on objects in a cluster of objects. A data snippet is extracted from the search results. The data snippet is added to the cluster of objects if the data snippet includes data that matches at least one of the objects in the cluster of objects. A confidence score may be calculated for adding the data snippet to the cluster of objects based on the recency, a job title, an email address, and/or a phone number associated with the data snippet. The data snippet may be added to the cluster of objects in a customer accessible database if the confidence score is sufficiently high, and a notice for review may be generated if the confidence score is not sufficiently high.
- For example, a database system searches a business database for information about a business contact stored in a contact database, wherein the contact database includes objects stored in a cluster of objects that correspond to a given name “Gregory,” a family name “Jones,” a company “International Business Machines,” a title “V.P for sales,” a location “New York City,” and an email address for a specific business contact. The database system extracts data that includes a given name “Greg,” a family name “Jones,” a company “IBM,” and a mobile phone number from the information in one of the search results. The database system determines whether the data snippet extracted from the information in the search results includes data that matches any of the objects stored in the cluster of objects in the contact database corresponding to the business contact named Gregory Jones. The database system adds the extracted data snippet, including the mobile phone number, to the objects stored in the cluster of objects in the customer accessible database that correspond to the business contact named Gregory Jones because the calculated confidence score is sufficiently high since both the data snippet and the objects in the cluster of objects include the uncommon family name “Jones.” In this example, a sales person planning on contacting Greg Jones at IBM now has Jones's mobile phone number that the sales person did not have previously.
- The database system builds, manages and sustains a high-quality person data object by bringing in data from multiple sources, normalizing, enriching, matching, and merging data to provide a “golden record,” or a best version of the data, for a person and the person's various business profile attributes. The database system leverages free web data sources such as news feeds, blogs and search results to mine attributes such as titles, social handles, etc., to further improve the quality of contact, company, and location data objects, and uses this additional data to build, validate and enrich person profiles.
- While one or more implementations and techniques are described with reference to an embodiment in which matching and confidently adding snippets of search results to clusters of objects is implemented in a system having an application server providing a front end for an on-demand database service capable of supporting multiple tenants, the one or more implementations and techniques are not limited to multi-tenant databases nor deployment on application servers. Embodiments may be practiced using other database architectures, e.g., ORACLE®, DB2® by IBM and the like, without departing from the scope of the embodiments claimed.
- Any of the above embodiments may be used alone or together with one another in any combination. The one or more implementations encompassed within this specification may also include embodiments that are only partially mentioned or alluded to or are not mentioned or alluded to at all in this brief summary or in the abstract. Although various embodiments may have been motivated by various deficiencies with the prior art, which may be discussed or alluded to in one or more places in the specification, the embodiments do not necessarily address any of these deficiencies. In other words, different embodiments may address different deficiencies that may be discussed in the specification. Some embodiments may only partially address some deficiencies or just one deficiency that may be discussed in the specification, and some embodiments may not address any of these deficiencies.
- In the following drawings like reference numbers are used to refer to like elements. Although the following figures depict various examples, the one or more implementations are not limited to the examples depicted in the figures.
- FIG. 1 is an operational flow diagram illustrating a high level overview of a method for matching and confidently adding snippets of search results to clusters of objects in an embodiment;
- FIG. 2 is a block diagram of a system for matching and confidently adding snippets of search results to clusters of objects in an embodiment;
- FIG. 3 illustrates a block diagram of an example of an environment wherein an on-demand database service might be used; and
- FIG. 4 illustrates a block diagram of an embodiment of elements of FIG. 3 and various possible interconnections between these elements.
- Systems and methods are provided for matching and confidently adding snippets of search results to clusters of objects. As used herein, the term multi-tenant database system refers to those systems in which various elements of hardware and software of the database system may be shared by one or more customers. For example, a given application server may simultaneously process requests for a great number of customers, and a given database table may store rows for a potentially much greater number of customers. As used herein, the term query plan refers to a set of steps used to access information in a database system. Next, mechanisms and methods for matching and confidently adding snippets of search results to clusters of objects will be described with reference to example embodiments. The following detailed description will first describe a method for matching and confidently adding snippets of search results to clusters of objects. Next, a block diagram of an example system for matching and confidently adding snippets of search results to clusters of objects is described.
- A large volume of customer data makes up a database of contact records. The primary source for this customer data could be a website where users add and update business card information by adding or updating contact information one record at a time through a web form or by uploading comma separated value files that contain contact information. Users may also occasionally submit bounce email reports that contain error codes for invalid emails that they receive from their mail providers as part of their email marketing campaigns. A database system can receive and process millions of data records to provide new or updated data to customers in a timely manner. The database system cleans the data from the record, normalizes the data into a standard set of values that might be used for matching, enriches the data, and attempts to match the data with previously stored data to create a “golden record” for a person identified by the incoming data and/or previously stored data. False matches can result in the loss of good data, and missed matches may reduce the value of previously stored data. The matching process also helps in identifying duplicates and reduces the chance that duplicate records are created for the same person. After the matching process returns a suitable list of matching person candidates, the database system adds the incoming data to a cluster of data objects that contains data values that match data objects for the person identified by the incoming data. Alternatively, the database system creates a new cluster of data objects for the person if the database system does not already include any cluster of data objects that match data objects for the person identified by the incoming data. Then the database system determines whether to store the added data in a customer accessible database.
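The cleaning and normalizing step described above might look roughly like the following. The field names, the placeholder list, and the specific normalization rules are assumptions chosen for illustration; the description does not enumerate them, and the sample contact values are made up.

```python
import re

# Hypothetical placeholder values that carry no information and should not be matched on.
PLACEHOLDERS = {"", "TBA", "TBD", "N/A"}


def normalize_contact(record: dict) -> dict:
    """Return a cleaned copy of a raw contact record with matching-friendly values."""
    cleaned = {}
    for key, value in record.items():
        if value is None:
            continue
        text = " ".join(str(value).split())  # trim and collapse whitespace
        if text.upper() in PLACEHOLDERS:
            continue  # drop nonsense values instead of storing them
        cleaned[key] = text
    if "email" in cleaned:
        cleaned["email"] = cleaned["email"].lower()
    if "phone" in cleaned:
        cleaned["phone"] = re.sub(r"\D", "", cleaned["phone"])  # keep digits only
    for key in ("given_name", "family_name"):
        if key in cleaned:
            cleaned[key] = cleaned[key].title()
    return cleaned


raw = {
    "given_name": "  gregory ",
    "family_name": "JONES",
    "email": "Greg.Jones@Example.COM",
    "phone": "(212) 555-0100",
    "title": "TBD",
}
normalized = normalize_contact(raw)
```

Normalizing before matching means that later comparisons can use simple equality rather than repeating the cleanup at every comparison site.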
- A significant majority of bad and erroneous operations may be prevented, resulting in much higher quality customer data, if a database system treats every add or update contribution as a claim and takes into account the reputation of the user/partner who makes the claim. In addition, bad and erroneous operations may be prevented if a database system takes into account the type of the claim and the date and time of the claim, and further validates the claim with data from the free web and other sources with additional levels of data stewardship. Claims from trusted users can be treated as sources of truth and as valuable enough to overwrite almost all existing information. Claims from average members and the free web may be treated as being as good as any other information. The evaluation with the most consistent support prevails: for example, if three people evaluate some data as good and one person evaluates the same data as bad, the evaluation as good prevails.
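A minimal sketch of reputation-aware claim resolution follows. The numeric weights are assumptions: the description says only that trusted users can overwrite almost all existing information and that average members and the free web are treated alike.

```python
from collections import defaultdict

# Hypothetical weights: one trusted source outweighs several average ones.
SOURCE_WEIGHT = {"trusted": 5.0, "average": 1.0, "free_web": 1.0}


def resolve_claims(claims):
    """Pick the value with the greatest total weight among competing claims.

    Each claim is a (value, source_kind) pair. Returns (winning_value, totals).
    """
    totals = defaultdict(float)
    for value, source_kind in claims:
        totals[value] += SOURCE_WEIGHT[source_kind]
    winner = max(totals, key=totals.get)
    return winner, dict(totals)


# Three average members evaluate the data as good and one evaluates it as bad.
verdict, tally = resolve_claims(
    [("good", "average"), ("good", "average"), ("good", "average"), ("bad", "average")]
)

# Under these assumed weights, a single trusted claim overrides the same majority.
trusted_verdict, _ = resolve_claims(
    [("good", "average"), ("good", "average"), ("good", "average"), ("bad", "trusted")]
)
```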
- The database system weighs a claim on a graded scale and calculates various scores to generate a confidence score that is then used to determine the type of actions that are needed before the claim is fully processed and applied to generate a “golden record” for a person. The database system determines the quality of each and every individual attribute such as names, titles, emails, phones, social handles etc. Each and every attribute of data in the claim is scored and weighed against similar attributes from other claims and golden records in case they already exist. If the data in an attribute of a new claim is of better quality than an existing attribute and the confidence score of the new claim is above a certain threshold, the database system uses the incoming attribute in a data snippet to replace the existing data for that attribute in the golden record. The attribute in the data snippet is linked to a person record, where data from multiple contacts is combined to create/update work profiles for the person record, allowing tracking of the lifecycle and work profile of contacts. If the attribute in the data snippet is conflicting or additional details are needed, the database system generates an additional task/alert to data stewards for additional review based on the importance of the data record and the attribute in question. If the attribute in the data snippet is of poor quality, then the database system rejects the claim and the state of the attribute and golden record remains unaffected. If there is not enough information to make a decision, there is not enough authority to change the state, or no new information is detected, then no decision is made, re-affirming the current state of the data.
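The four outcomes described above (replace, review, reject, no decision) can be sketched as a small decision function. The threshold values and the quality scale from 0 to 1 are assumptions for illustration only.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    REPLACE = "replace"          # incoming attribute replaces the golden-record value
    REVIEW = "review"            # task or alert is raised for a data steward
    REJECT = "reject"            # claim is rejected, golden record unaffected
    NO_DECISION = "no_decision"  # current state is re-affirmed


def decide(new_quality: Optional[float], existing_quality: Optional[float],
           confidence: Optional[float], conflicting: bool = False,
           accept_at: float = 0.8, reject_below: float = 0.3) -> Action:
    """Route one attribute of a claim; accept_at and reject_below are hypothetical thresholds."""
    if new_quality is None or confidence is None:
        return Action.NO_DECISION  # not enough information to decide
    if new_quality < reject_below:
        return Action.REJECT       # poor quality attribute
    if conflicting:
        return Action.REVIEW       # conflicting data needs a human look
    if existing_quality is not None and new_quality <= existing_quality:
        return Action.NO_DECISION  # nothing better than what is already stored
    if confidence < accept_at:
        return Action.REVIEW       # better data, but not confident enough to apply
    return Action.REPLACE


outcome = decide(new_quality=0.9, existing_quality=0.5, confidence=0.95)
```

Checking rejection and conflict before the quality comparison keeps poor or contradictory claims from silently falling into the "no decision" bucket.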
- FIG. 1 is an operational flow diagram illustrating a high level overview of a method 100 for matching and confidently adding snippets of search results to clusters of objects. As shown in FIG. 1, a database system may match and confidently add snippets of search results to clusters of objects.
- A database system searches information based on objects in a cluster of objects, block 102. For example and without limitation, this can include the database system searching a business database for information about a business contact stored in a contact database, wherein the contact database includes objects stored in a cluster of objects that correspond to a given name “Gregory,” a family name “Jones,” a company “International Business Machines,” a title “V.P for sales,” a location “New York City,” and an email address for a specific business contact. After receiving search results based on objects in a cluster of objects, the database system extracts a data snippet from the search results, block 104. By way of example and without limitation, this can include the database system extracting data that includes a given name “Greg,” a family name “Jones,” a company “IBM,” and a mobile phone number from the information in one of the search results.
- Having extracted the data snippet from the search results, the database system determines whether the data snippet includes data that matches at least one of the objects in the cluster of objects, block 106. In embodiments, this can include the database system determining whether the data snippet extracted from the information in the search result includes data that matches any of the objects stored in the cluster of objects in the contact database corresponding to the business contact named Gregory Jones. Whether the data snippet includes data that matches at least one of the objects in the cluster of objects may include matching based on first name aliases and/or acronym expansion.
- For example, “Greg” is a given name alias that matches the given name “Gregory” and “IBM” is an acronym that can be expanded to match “International Business Machines.” If the data snippet includes data that matches at least one of the objects in the cluster of objects, the method continues to block 108. If the data snippet does not include data that matches at least one of the objects in the cluster of objects, the method proceeds to block 110. If the data snippet includes data that matches at least one of the objects in the cluster of objects, the database system adds the data snippet to the cluster of objects, block 108. For example and without limitation, this can include the database system adding the extracted data snippet, including the mobile phone number, to the objects stored in the cluster of objects in the contact database that correspond to the business contact named Gregory Jones because both the data snippet and the objects in the cluster of objects include the uncommon family name “Jones.”
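The matching of blocks 106 and 108, including given name aliases and acronym expansion, can be sketched as follows. The lookup tables, the field names, and the phone number are made up for illustration; a real system would presumably draw aliases and acronyms from much larger reference data.

```python
# Hypothetical lookup tables for alias and acronym expansion.
GIVEN_NAME_ALIASES = {"greg": "gregory", "bill": "william", "liz": "elizabeth"}
ACRONYMS = {"ibm": "international business machines"}


def canonical(field: str, value: str) -> str:
    """Lower-case the value and expand it if the field has a known alias or acronym."""
    text = " ".join(value.lower().split())
    if field == "given_name":
        return GIVEN_NAME_ALIASES.get(text, text)
    if field == "company":
        return ACRONYMS.get(text, text)
    return text


def matching_fields(snippet: dict, cluster: dict) -> set:
    """Return the fields whose canonical values agree in the snippet and the cluster."""
    return {
        field
        for field in snippet.keys() & cluster.keys()
        if canonical(field, snippet[field]) == canonical(field, cluster[field])
    }


cluster = {
    "given_name": "Gregory",
    "family_name": "Jones",
    "company": "International Business Machines",
    "title": "V.P for sales",
    "location": "New York City",
}
snippet = {
    "given_name": "Greg",
    "family_name": "Jones",
    "company": "IBM",
    "mobile_phone": "212-555-0100",  # fictitious number
}

matched = matching_fields(snippet, cluster)
if matched:
    # Block 108: add only the data the cluster does not already hold.
    for field, value in snippet.items():
        cluster.setdefault(field, value)
```

Using `setdefault` keeps the cluster's existing values ("Gregory", the full company name) and contributes only the new mobile phone number.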
- After the data snippet is added in block 108, the method 100 proceeds to block 112. If the data snippet does not include data that matches at least one of the objects in the cluster of objects, the database system optionally stores the data snippet for matching with subsequent clusters of objects, block 110. By way of example and without limitation, this can include the database system storing the data snippet for matching with subsequent clusters of objects if the data snippet does not include data that matches at least one of the objects in the cluster of objects, as the contact database may be later supplemented with an additional contact that includes an object which matches some of the data in the extracted data snippet. Then the method 100 either terminates or begins again at block 102.
- Having determined that the data snippet includes data which matches objects in a cluster of objects, the database system can also determine whether the data snippet includes data that matches objects in another cluster of objects, block 112. In embodiments, this can include the database system determining that the extracted data snippet that includes “Greg,” “Jones,” and the mobile phone number also matches an object in another cluster of objects that includes “Greg,” “Jones,” and a company “Microsoft.” If the data snippet includes data that matches at least one of the objects in another cluster of objects, the method continues to block 114. If the data snippet does not include data that matches at least one of the objects in another cluster of objects, the method proceeds to block 116. If the data snippet includes data that matches at least one object in the other cluster of objects, the database system may combine the cluster of objects with the other cluster of objects, block 114. For example and without limitation, this can include the database system combining the clusters of objects for the two business contacts named “Jones” whose objects include “International Business Machines” and “Microsoft.” Such a combination of clusters of objects for business contact objects could be useful for a sales person planning on contacting Greg Jones at IBM if the sales person knows some business contacts who worked at Microsoft at the time when Greg Jones worked at Microsoft.
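Combining two clusters as in block 114 can be sketched as a union that keeps every distinct value per field. Representing a cluster as a dictionary of value lists is an assumption made for this illustration, not a structure given in the description.

```python
def merge_clusters(primary: dict, other: dict) -> dict:
    """Combine two clusters, keeping every distinct value seen for each field."""
    merged = {}
    for cluster in (primary, other):
        for field, values in cluster.items():
            bucket = merged.setdefault(field, [])
            for value in values:
                if value not in bucket:
                    bucket.append(value)  # preserve first-seen order, skip duplicates
    return merged


ibm_cluster = {
    "given_name": ["Gregory"],
    "family_name": ["Jones"],
    "company": ["International Business Machines"],
}
microsoft_cluster = {
    "given_name": ["Greg"],
    "family_name": ["Jones"],
    "company": ["Microsoft"],
}
combined = merge_clusters(ibm_cluster, microsoft_cluster)
```

Keeping both company values, rather than overwriting one with the other, is what lets the combined cluster describe a contact's work history.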
- After adding a data snippet to a cluster of objects, the database system optionally calculates a confidence score for adding the data snippet to the cluster of objects based on the recency, a job title, an email address, and/or a phone number associated with the data snippet, block 116. By way of example and without limitation, this can include the database system calculating a confidence score based on how recently the data objects from the search result were stored in the business database, with today's date of storage equated with the highest recency score.
- In another example, the database system calculates a confidence score based on a job title from the search result, with hierarchically higher job titles equated with a higher title rank score, and with job titles known to be used by the business contact's claimed company equated with a higher title quality score. In yet another example, the database system calculates a confidence score based on an email address from the search result, with the email score based on how well the email address matches the pattern of other email addresses for business contacts for the business contact's claimed company and how well the email address matches the first name and the last name of the business contact. In a further example, the database system calculates a confidence score based on a phone number from the search result, where the phone number score is based on the consistency between the claimed phone number and the area code associated with the claimed geographic location for the business contact. The confidence score may be based on any weighted combination of the recency, the job title, the email address, and the phone number from the data snippet.
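A weighted combination of the component scores from block 116 might be sketched as below. The weights, the linear recency decay, the one-year horizon, the email heuristics, and the sample values are all assumptions; the description states only that the score may be any weighted combination of recency, job title, email address, and phone number.

```python
from datetime import date

# Hypothetical weights for the four components.
WEIGHTS = {"recency": 0.4, "title": 0.2, "email": 0.2, "phone": 0.2}


def recency_score(stored_on: date, today: date, horizon_days: int = 365) -> float:
    """1.0 for data stored today, decaying linearly to 0.0 at horizon_days."""
    age = max((today - stored_on).days, 0)
    return max(0.0, 1.0 - age / horizon_days)


def email_score(email: str, given_name: str, family_name: str, company_domain: str) -> float:
    """Half credit for the company's domain, half for containing the contact's name."""
    local, _, domain = email.lower().partition("@")
    score = 0.0
    if domain == company_domain.lower():
        score += 0.5
    if given_name.lower() in local or family_name.lower() in local:
        score += 0.5
    return score


def phone_score(phone_digits: str, expected_area_codes: set) -> float:
    """1.0 if the area code is consistent with the claimed location, else 0.0."""
    return 1.0 if phone_digits[:3] in expected_area_codes else 0.0


def confidence(scores: dict) -> float:
    """Weighted average over whichever component scores are available."""
    total = sum(WEIGHTS[name] for name in scores)
    if not total:
        return 0.0
    return sum(scores[name] * WEIGHTS[name] for name in scores) / total


today = date(2014, 7, 22)
scores = {
    "recency": recency_score(date(2014, 7, 22), today),
    "title": 0.5,  # assumed title rank/quality score
    "email": email_score("greg.jones@example.com", "Greg", "Jones", "example.com"),
    "phone": phone_score("2125550100", {"212", "646"}),  # illustrative New York City area codes
}
overall = confidence(scores)  # roughly 0.9 under these assumed weights
```

Dividing by the sum of the weights actually used lets the same function score a snippet that lacks, say, an email address without penalizing it as if the email were wrong.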
- The database system optionally determines whether a confidence score is sufficiently high for adding the data snippet to the cluster of objects stored in a customer accessible database, block 118. In embodiments, this can include the database system determining that a confidence score is sufficiently high for a new mobile phone number to be added to the cluster of data objects for Greg Jones in a customer accessible database. If a confidence score is sufficiently high for adding the data snippet to the cluster of objects stored in a customer accessible database, the
method 100 continues to block 120. If a confidence score is not sufficiently high for adding the data snippet to the cluster of objects stored in a customer accessible database, themethod 100 proceeds to block 122. If the confidence score is sufficiently high for adding the data snippet to the cluster of objects stored in the customer accessible database, the database system optionally adds the data snippet to the cluster of objects stored in the customer accessible database, block 120. - For example and without limitation, this can include the database system storing the new mobile phone number in the contact database that is accessible by a sales person planning on contacting Greg Jones, who now has Jones' mobile phone number that the salesman did not have previously. Then the
method 100 either terminates or begins again at block 102. Although this example describes the database system using a confidence score to determine whether to add a data snippet to a cluster of objects in a customer accessible database, the database system may also use a confidence score to determine whether to combine the cluster of objects with the other cluster of objects in a customer accessible database. If the confidence score is not sufficiently high for adding the data snippet to the cluster of objects stored in the customer accessible database, the database system optionally generates a notice for review, block 122.
- By way of example and without limitation, the notice for review can include the database system generating a notice for reviewing the adding of the data snippet to the cluster of objects because the mobile phone number in the search results is not associated with New York City, the claimed office location for Jones in the search results, and the title “VP” in the search results is too generic and does not match any titles known to be used by IBM, the claimed company for Jones in the search results. Then the
method 100 either terminates or begins again at block 102. Accordingly, systems and methods are provided which enable a database system to match and confidently add snippets of search results to clusters of objects.
- The
method 100 may be repeated as desired. Although this disclosure describes the blocks 102-122 executing in a particular order, the blocks 102-122 may be executed in a different order. In other implementations, each of the blocks 102-122 may also be executed in combination with other blocks and/or some blocks may be divided into a different set of blocks. -
FIG. 2 illustrates a block diagram of an example system for matching and confidently adding snippets of search results to clusters of objects, under an embodiment. As shown in FIG. 2, the system 200 may illustrate a cloud computing environment in which data, applications, services, and other resources are stored and delivered through shared data-centers and appear as a single point of access for the users. The system 200 may also represent any other type of distributed computer network environment in which servers control the storage and distribution of resources and services for different client users.
- One example of a system that can implement matching and confidently adding data snippets to clusters of objects is the popular open-source framework from Twitter® called Storm, which is a real-time, open-source data streaming framework that functions entirely in memory. Storm constructs a processing graph, called a “topology,” that feeds data from input sources through processing nodes. The input data sources are called “spouts,” and the processing nodes are called “bolts.” The data model consists of tuples, which flow from spouts to the bolts, which execute user code. Besides simply being locations where data is transformed or accumulated, bolts may also join streams of data and branch streams of data. Storm is designed to be run on several machines to provide parallelism. Storm processes streams of tuples. A stream is defined to be an unlimited ordered sequence of tuples, and each tuple is a one dimensional array of objects.
- The
system 200 acts as a central data processing hub, or clearing house, that brings multiple data sources and free web data together to generate the “golden” record for core data assets around accounts and persons. The following describes the key components of the system 200 as part of a Storm topology to implement the data processing pipeline. As part of the system initialization, before the Storm topology is activated to process any incoming claims, all the existing claims, reference data, and golden person records are loaded into an in-memory key-value data store. The system 200 generates a set of specialized keys for each person record and claim that enable fast lookups for the purpose of matching and retrieval. Indices are created for each company, person, and location object in the cache, and these indices are used for person matching, company matching, and location matching. A spout is a source of data streams in a Storm topology. Generally spouts will read tuples from an external source and emit them into the topology.
- The business directory spout 202 reads data from the business directory database and emits tuples, which are treated as claims, into the topology, such as contact added, contact updated, contact invalid phone, contact invalid email, and contact not at company. Each of these claim types has an associated contributor identifier that is the identity of the user who performed the action. The business directory spout 202 is an unbounded stream and keeps emitting data until there is no more data to be read. The tuples that are emitted out of the business directory spout 202 may be distributed randomly (shuffle grouping) to a normalize bolt 204, which is the first bolt in the pipeline. This data can also be sent to a
search engine bolt 206, which executes free web search queries and tries to find additional data around this contact, such as titles and social handles.
- The partial records spout 208 provides partial records from disparate sources. The partial records spout 208 reads contact data from a partial records database where files that are uploaded by users on the website are stored in raw format before partial records processing. The key difference here is that unlike the business directory spout 202, the partial records spout 208 emits tuples based on the partial data in the uploaded files. Also, the tuples that come out of the partial records spout 208 will often contain very poorly normalized data. Similar to the business directory spout 202, the tuples that are emitted out of the partial records spout 208 may be distributed randomly (shuffle grouping) to the normalize bolt 204, which is the first bolt in the pipeline. This data can also be sent to the
search engine bolt 206 which executes free web search queries and tries to find additional data around this contact, such as titles and social handles. Examples of claims emitted by the partial records spout 208 include contact added and contact added for new company. - The
bounce email spout 210 reads bounce email error codes, which may be from comma separated value files that are uploaded by website administrators and website users. Examples of claims emitted by the bounce email spout include contact email and contact message. The bounce file message that the bounce email spout 210 receives for an email is typically unstructured text, such as records that are comma-separated with the email in the first column and the second column containing the bounce message as unstructured text. In order for the bounce email spout 210 to emit the objects properly, an automatic column mapping algorithm may initially process the first few lines of the file. The algorithm does not need to rely on the names of the column headers, but rather the algorithm can tokenize the bounce file. The field separator may be determined from the file by tokenizing on each kind of separator and computing how consistent the resulting number of tokens per record is across the entire file. After determining the field separator, the algorithm can determine which column contains the email and which column contains the message. The algorithm may split out the record, remove the email, and concatenate the rest of the record to create the contact message claim. The emitted contact message claim is typically an unstructured snippet of text.
- The
social handle spout 212 reads contact data and social handles from a social handle repository and submits claims such as contact social handle. - The
crawler spout 214 emits contacts found on the web from crawling websites for their management pages. The crawler spout 214 may start with a number of seed companies that the system 200 currently has and use them as the starting point for crawling. Examples of claims emitted by the crawler spout 214 include contact added and contact updated.
- Processing in a Storm topology is generally done in bolts. Bolts may do anything from filtering, functions, aggregations, joins, talking to databases, and more. The normalize bolt 204 processes all the tuples that come to it through a series of data normalization routines. The normalize bolt 204 may standardize addresses, titles, phone numbers, and properly classify contact records by department and level. The following are some of the key normalizations. An address normalizer can include a list of abbreviations, such as E to East, W to West, Blvd to Boulevard; allow only letters, numbers, and special characters; and remove any spaces around the special characters. A title normalizer may include a list of misspellings and abbreviations. A name normalizer can allow letters and special characters, not allow special characters at the beginning and the end of a name, capitalize the first letter and add a space after each name, capitalize the next letter if a name starts with “Mc,” and capitalize all Roman numerals. A city normalizer may only allow letters and special characters, and only keep the last non-space special character if there is a sequence of special characters. A base normalizer can return the correct country normalizer based on the country abbreviation. A phone normalizer may normalize phone patterns based on each country having its own phone pattern. A zip normalizer can normalize zip code patterns based on each country having its own zip code pattern. A state normalizer may normalize states based on each country having its own state requirements, if there are any.
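As an illustration of the name normalizer rules listed above, the following is a minimal sketch. The function name `normalize_name`, the choice of apostrophe and hyphen as the permitted special characters, and the small set of recognized Roman numerals are assumptions made for this example, not details taken from the description.

```python
import re

# Assumed set of name suffixes treated as Roman numerals ("i" is left out as ambiguous).
ROMAN_NUMERALS = {"ii", "iii", "iv", "v", "vi", "vii", "viii", "ix", "x"}


def normalize_name(raw):
    """Apply the name rules: allowed characters, trimming, capitalization, Mc, numerals."""
    # Keep only letters, whitespace, and the assumed special characters ' and -.
    cleaned = re.sub(r"[^A-Za-z'\-\s]", "", raw)
    parts = []
    for word in cleaned.split():
        word = word.strip("'-")  # no special characters at the beginning or end
        if not word:
            continue
        lower = word.lower()
        if lower in ROMAN_NUMERALS:
            parts.append(lower.upper())
        elif lower.startswith("mc") and len(lower) > 2:
            parts.append("Mc" + lower[2].upper() + lower[3:])
        else:
            parts.append(lower[0].upper() + lower[1:])
    return " ".join(parts)  # one space after each name part
```

For example, `normalize_name("john mcdonald iii")` yields `John McDonald III` under these assumptions.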
Once the data is normalized, the normalizer bolt 204 can pass the data to the next stage in the pipeline, which is an enrich
bolt 216. - The enrich
bolt 216 uses external data services for email verification, for phone verification, and social append services for social handles, and appends a set of meta-attributes to all the new contact claims that enter the pipeline. After enrichment, the tuple may contain additional metadata around emails, phones, and social handles that is useful for matching and merging purposes. The enrich bolt 216 passes this data to the match bolt 218 that tries to match the incoming contact claims with other existing claims and facts in the system 200.
- The
match bolt 218 is based on the system 200 modeling a specific data model of a person object. For example, to allow matching on the probe (title=CEO, company=Google), the system 200 creates a suitable index (title@company or title_rank@company). A probe is a (partial) person record, such as some attribute:value pairs of a person with at least the person name present. For example, (first_name=shabd, last_name=vaid, company=Responsys) should match the person Shabd Vaid because this name is uncommon and in the past he has worked at Responsys. The working data model of person object attributes may include: first name, last name, linkedin handle, twitter handle, other social handles, links to contact objects, work history, photos, education, and snippets, which are unstructured short pieces of text such as search result snippets, tweets, etc., and others containing person-identifying content.
- A person object is composed of contact objects in a one-to-many relationship. That is, a person may have many contact objects, but a contact object belongs to only one person object. So if the probe matches a contact object, the
system 200 can infer that the probe matches the associated person object. If a probe does not match any contact object, yet it does match a person object, the probe contains some person-level attributes (such as social handles) which match a person object, or the probe contains some attributes of a person which cross contact boundaries. For example, the probe may be {person_name:“shabd vaid”, company=“iStorez”, company=“Responsys”}. This probe should match the Shabd Vaid person because his name is uncommon and he worked at both companies.
- Let p denote a person object. Then p.work_history.company_names denotes the names of companies p has worked at, p.work_history.cities denotes the set of all cities p has worked in, p.work_history.titles denotes the set of job titles that the person has held, and similar notations exist for work emails, work phones, work states, work countries, and social handles. The formats of the objects of different types of social handles (linkedin, twitter, etc.) are quite different, so it may not be necessary to have a different index type for a different type of social handle because there is no risk of a collision.
- A final check models the probability that a match is a chance event. Let M denote this match. Specifically, assume a universe of objects (here, persons) that has size n, and assume a uniform probability model on this universe, that is, all objects are equally likely. The
system 200 can estimate the upper bound on the expected number E(M) of objects in the universe that have the properties of the match M, under the universe probability model. If this upper bound estimate is below a certain threshold (1 may be a sensible choice), the system 200 accepts this match; otherwise the system 200 rejects the match. One way to estimate a suitable upper bound on E(M) is to model the probabilities of various attribute:value pairs under the universe probability model, then assume the independence of attributes in the match and multiply out these probabilities, then finally multiply this by n. To formally describe this, let M={a:v|a is an attribute and v is its value}. For example, M={person_name: “john smith”, company_name: “ibm”}. This means that the person name matched in M is John Smith, and the company_name matched in M is ibm. Now
E(M) = n * product_{a:v in M} P(a:v)   (EUB 1)
Modeling the probabilities of all attribute:value pairs in the universe is probably too complex, so the database system may begin by modeling the probabilities of certain key attributes and their values, drop all attributes other than these from M, and still use (EUB 1). The result is still an estimate of the upper bound on E(M). For concreteness, suppose the system 200 has modeled the probability of person names in the universe, and of company names. For example, M={person_name: “john smith”, company_name: “ibm”}. The estimated upper bound on E(M) is
P(person_name: “john smith”) * P(company_name: “ibm”) * n ≈ P(person_name: “john smith”) * #contacts_in_company(company_name: “ibm”)
- The result-set size based estimate may not generalize as well as explicit modeling. For example, the P(person_name) explicit model which assumes independence of first and last names does not generalize well. An alternative to an explicit estimate is a result-set size based estimate. In this version, the
system 200 runs the matcher to find all true positive matches. Here, ‘true positive’ may not include ‘modeling chance matches’. If there are at least two distinct objects in the result set, the system 200 deems that the probe being matched is not matched uniquely. This approach has the benefit that the P(a:v) probabilities are not explicitly modeled. The result set will carry the information to judge whether a match is unique or not, even in complex cases. This approach has the limitation that it does not model the real world, only the current, actual universe of (golden) data objects. Another issue is that to implement this approach, the system 200 may need to do this computation after all the true positives have been generated. Furthermore, the system 200 can match within the result set to check whether there are indeed at least two different objects or not.
- The
search engine bolt 206 takes partial data (also known as a seed) and tries to find more publicly available information via a search engine 220, such as Yahoo® Boss, because finding titles and social handles is useful. The data thus obtained is passed through a search results bolt 222 to extract vital information and enrich a data record to build a full person profile, such as by passing the data to a handle extractor bolt 224.
- The search results
bolt 222 uses search result snippets having attractive properties that suggest they be made first-class “objects” in a person database 226 and/or contact data model and matching engines. Snippets are consumed without running afoul of terms of use restrictions. For the most part, snippets contain information about a single entity: a person, company, or contact. Snippets might be matched to a different type of suitable object, such as person, company, or contact. Some snippets contain information about multiple companies at which a person has worked, so snippets could be used to connect together multiple contacts of the same person. Such a matching is of mostly unstructured text (the snippet) to structured data (a particular contact object). This matching does not require entity extraction from the snippet. This matching could be algorithmically relatively easy to do. Once a snippet has been matched to a suitable object with a sufficiently high confidence score, certain “nuggets” might be extracted from the snippet and the matching object enriched. For example, if the snippet contains a LinkedIn handle and the snippet matches a particular contact sufficiently well, this handle may then be attached to that contact. A snippet may tie together multiple contacts of the same person because the snippet contains the names of multiple companies at which the person has worked.
- Contact initiated snippets generation and matching may work as follows. Start with a contact J. Let C denote the cluster of the person database 226 containing J. Generate a suitable query Q to the search engine 220 from J. For each snippet S in the top search results on Q, if S matches C with a sufficiently high confidence, add S to C, otherwise add S to a collection of unmatched snippets. If the person name in J is sufficiently uncommon, set Q to person-name(J), else set Q to person-name(J)+company-name(J). Two examples of such queries are “Pawan Nachnani” and “John Smith ibm”.
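The contact initiated procedure above can be sketched as follows. This is a minimal illustration only: the dictionary layout of a contact and a cluster, the `name_frequency` table, the `common_threshold` cutoff, and the pluggable `match_confidence` function are all assumptions introduced for the example.

```python
def build_query(contact, name_frequency, common_threshold=100):
    """Return Q: the person name alone if it is uncommon, else name plus company."""
    name = contact["person_name"]
    if name_frequency.get(name.lower(), 0) < common_threshold:
        return name
    return f'{name} {contact["company_name"]}'


def assign_snippets(cluster, snippets, match_confidence, threshold=0.8):
    """Add each snippet S to cluster C when it matches well enough; collect the rest."""
    unmatched = []
    for snippet in snippets:
        if match_confidence(snippet, cluster) >= threshold:
            cluster["snippets"].append(snippet)
        else:
            unmatched.append(snippet)
    return unmatched


# Hypothetical name counts: "john smith" is common, "pawan nachnani" is not.
name_frequency = {"john smith": 5000, "pawan nachnani": 1}
q1 = build_query({"person_name": "Pawan Nachnani", "company_name": "acme"}, name_frequency)
q2 = build_query({"person_name": "John Smith", "company_name": "ibm"}, name_frequency)
```

Here `q1` is the name alone and `q2` is the name followed by the company, mirroring the two example queries in the text.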
Note that there is no data quality risk in setting a query too broadly, such as a common person name, because the resulting snippets will be deeply matched with C. An overly broad query does not yield good recall because none of the snippets in its result set deeply match C. Recall may be less important than precision because the
system 200 can make up for low recall by issuing more queries to the search engine 220, so long as the system 200 is not constrained overly by search volume limits. Also, if the system 200 uses a mechanism to consume unmatched snippets, this mitigates the recall limitation considerably. C denotes the data of a single person. A snippet may contain data of this person spread across multiple contacts, which is why the database system matches S to C and not merely to J.
- The process described in the previous section can produce a lot of snippets that remain unmatched. Accumulating these even over a short period of time may yield millions of snippets. Many of these snippets could contain useful information about contacts or persons that are not even yet in the database. In short, these snippets collectively have a lot of value. These snippets might be matched to contact or person objects and placed in the suitable cluster, then be available for merge. One major challenge in this regard is that of indexing a snippet for efficient matching. A person name may be a good index for snippets from person queries. The person name can be found from a snippet by light-weight entity recognition. Therefore, the
match bolt 218 includes bolts such as a handle bolt 228, an email bolt 230, a name@company bolt 232, a name@phone bolt 234, and a name@location bolt 236 to match snippets to clusters of objects in the person database 226.
- A cluster bolt 238 clusters all matching claims together into a common cluster. A
merge bolt 240 merges all claims and existing contact records (partial and/or complete) from a cluster into a single composite record (the merged record) and computes a confidence score for the merged record. If the merged record is incomplete, the merge bolt 240 enriches the record when possible with information available in the cache. If the record is complete, the merge bolt 240 marks the record as canonicalized. At this point, the record is ready to be persisted in the person database 226, provided its confidence score is sufficiently high. The merge bolt 240 also updates the merge time of the incoming claim.
- If r.day is today, then this score may have the
value 1, and the score can decay to 0 for a day long in the past (many, many days ago).
Score(r,rank): based on r.title. C-level titles may get a score of 1 and the rank score can monotonically decay for lower rank titles.
Score(r,title_quality): High rank titles, e.g. Vice President, do not necessarily have high quality. Title_quality may score this separate dimension. A title might be deemed to have high quality if it has a known rank and has a known department and is not in an explicit list of poor titles. The quality may decrease depending on which (and how many) of the tests in the above sentence are violated.
Score(r,domain): might only be defined when r's company has been matched to company jc. Score(r,d)=#emails in domain d/#contacts in company jc.
Score(r,pattern_domain): How well (r.first_name,r.last_name,r.email) fits the email pattern of the domain of r.email. Let p(r) be the pattern in (r.first_name,r.last_name,r.email). For example, p=first.last for (john,doe,john.doe@xyz.com), and p=flast for (john,doe,jdoe@xyz.com). Score(r,pattern_domain)=#emails in domain of r.email having pattern p(r) divided by #emails in domain of r.email.
- The intent is that updates algorithmically deemed risky may be logged for review by a data steward or community. Feedback from the review can be used to assess the accuracy of this scoring/detection mechanism, and to tune it if it is deemed useful enough. An update is risky if a contact's last name is changed. A title change with more than one level increase in rank, such as software engineer to ceo, is also risky. A score version of this may make the risk score depend on the number of skipped levels. A title change which changes departments to another incompatible department, such as vp sales to vp engineering, is also risky.
Updating or adding a C-level contact in a large company is risky, but easy to generalize in a scoring setting: the higher the rank of the contact and the larger the company size, the higher the risk score may be. Also, different update actions might possibly have differing risks, such as a title change is generally more risky than a last name change for a female. A Fortune 1000 headquarters address change is also risky, but scoring may generalize this to (important company combined with attribute-specific change score → overall risk score).
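The Score(r,pattern_domain) definition given above can be made concrete with a short sketch. The pattern names beyond first.last and flast, the tuple layout (first_name, last_name, email), and the function names are assumptions for illustration.

```python
def email_pattern(first, last, email):
    """Name the pattern p(r) that maps (first, last) to the local part of the email."""
    local = email.split("@", 1)[0].lower()
    first, last = first.lower(), last.lower()
    candidates = {
        "first.last": f"{first}.{last}",
        "flast": f"{first[:1]}{last}",
        "firstlast": f"{first}{last}",  # assumed extra pattern
        "first": first,                 # assumed extra pattern
    }
    for name, value in candidates.items():
        if local == value:
            return name
    return "other"


def pattern_domain_score(record, domain_contacts):
    """#emails in the domain of r.email having pattern p(r) / #emails in that domain."""
    pattern = email_pattern(*record)
    domain = record[2].split("@", 1)[1].lower()
    same_domain = [c for c in domain_contacts if c[2].lower().endswith("@" + domain)]
    if not same_domain:
        return 0.0
    matching = sum(1 for c in same_domain if email_pattern(*c) == pattern)
    return matching / len(same_domain)


known = [
    ("john", "doe", "john.doe@xyz.com"),
    ("jane", "roe", "jane.roe@xyz.com"),
    ("bob", "lee", "blee@xyz.com"),
    ("amy", "fox", "amy.fox@xyz.com"),
]
score = pattern_domain_score(("sam", "poe", "sam.poe@xyz.com"), known)
```

With this toy data, three of the four known xyz.com addresses follow the first.last pattern, so the new record scores 0.75.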
- The join bolt 242 takes all the merged claims from the
merge bolt 240 and constructs person objects. A person object may be a collection of major profiles, such as a person profile, a work profile, and a social profile. The data from each merged claim can update one or many attributes across all three profiles of a person. In some cases, a merged claim may end up creating new profile objects as new claims become available. Each attribute in a profile ends up with a confidence score that may ultimately determine the level of “gold” for that particular profile object. While most of the attributes might be permanent, some of the attributes could be transient and need to be re-computed over time due to privacy and legal reasons.
- A persist bolt 244 may save all the resultant person records and the underlying claims to the
person database 226 once all the processing is completed by the join bolt 242.
- The bounce email processing bolt is a
reaper bolt 246 that aggregates multiple facts with a current claim and comes up with a score and a disposition about that score. The reaper bolt 246 may determine if a fact is a duplicate. The fact disposition can determine if the computed score warrants a graveyard or ungraveyard of the underlying contact. The score of the current claim could be computed as follows: Take all claims and scored facts for the same email. For each fact, get the base score determined by the response category of the email. As noted in the description of the bounce email spout 210, the contact message is typically unstructured data. The reaper bolt 246 may address this by using a trie-based approach to find tokens specified in a list of vendor dictionaries. Each vendor dictionary can specify the token with a classified response category. Response categories for email may be hard_error, heavy_error, soft_error, email_received, unknown. Once the score is computed, depending on the live contact and graveyard thresholds, the reaper bolt 246 may determine if the contact is to be made live or graveyarded. The reaper bolt 246 can automatically graveyard records from bounce reports and phone campaigns, or float these records to a community for task resolution.
- The
crawler spout 214 looks at the free web (sites approved by a legal department for acceptable terms of service) and finds publicly available information/claims. Since most of the open web sources of data are unstructured, the publicly available information typically requires sophisticated natural language processing techniques to extract meaningful information from it. Therefore, the crawler spout 214 feeds snippets of information to a natural language processing bolt 248, which applies natural language processing and machine learning techniques to extract relevant data/facts to emit the following types of claims: contact added, contact updated, contact graveyarded, and social handles.
- A natural, human person may be represented as a graph of p:Person entities (nodes, or vertices) interconnected by links (edges). Each node can represent a different facet of the user (person). Each of these facets may be held in a separate (graph) container called a context. Each person entity node can be a set of attributes and objects. These attributes might be simple literals (such as the user's first name) or they could be other entities (called complex attributes). These latter attributes might be links to other entity nodes. Typically each node in the person graph is located in its own context. The root node may lie in a special context (for each user) called the root context.
- Once the golden records are curated, the
system 200 delivers this data to the person database 226 that is customer accessible. This golden data may also be propagated back to the original source systems and other partner systems and help keep the data clean in their respective source databases. - The
system 200 provides a complete 360-degree feedback loop and reduces the chances that bad or fraudulent data may ever make it into a customer's customer relationship management systems or any other system where a consolidated view of account and person data is required. The core person and account repository also continues to grow over time as new pieces of data are found on the free web and other sources. Additional sources of data may also be on-boarded quickly into the system 200 by adding and configuring new spouts and corresponding bolts into the Storm topology. For example, a de-duplication bolt detects duplicates and automatically merges the duplicates or floats suspected duplicates to a community for task resolution. In another example, a pinger bolt pings hypertext transfer protocol and simple mail transfer protocol domains for validity, automatically graveyarding when a domain is deemed invalid.
- The
system 200 may create indices for each company, person, and location object for matching purposes. Examples of person indices include record identifier, social handle, email, direct phone number, company, city, zip, state, and country. Examples of location indices include record identifier, zip, city, and country. Examples of company indices include record identifier, domain, corporate phone, company prefix, stock ticker, company name and city, domain and city.
- The system may build an inverted index from a snippet, and use the index to map words in the snippet to their positions. The positions for a given word could be in increasing order. An inverted index is illustrated in an example below.
- www.linkedin.com/in/shabdvaid Cached
Shabd Vaid. Experience: Co-founder, Vice President Engineering & Operations, iStorez Inc.; Director of Engineering, Responsys; Senior Software Engineer, Newgen . . . .
Inverted Index: (only some key-value pairs shown).
shabd→<0,5>, vaid→<1,6>, vice→<9>, president→<10>, - The
system 200 detects acronyms (if any) in the snippet, expands out these acronyms, tokenizes the expansion and incorporates these expansions into the inverted index, as illustrated in the example below. - IBM News room—Virginia M. Rometty—Chairman, President and . . . .
www-03.ibm.com/press/us/en/biography/10069.wss Cached
IBM Press Room—Ginni Rometti Biography . . . Full biography. Ginni Rometty is Chairman, President and Chief Executive Officer of IBM.
- Before acronymization, the inverted index contains the entry ibm→<0,i,j> where i and j denote the word positions of the 2nd and 3rd occurrence of IBM in the snippet. After recognizing the acronym ibm→“international business machines”, the database system adds the entries international[i,0], business[i,1], and machines[i,2] to the inverted index. Acronym-expansion entries in a snippet's inverted index could be useful for matching titles or company names to the snippet.
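The inverted index and its acronym-expansion entries can be sketched as below. The tokenizer (lowercased runs of letters and digits), the `acronyms` lookup table, and the choice to add expansion entries for every occurrence of an acronym are assumptions of this sketch; expansion entries are stored as (position, offset) pairs in the spirit of the international[i,0] notation above.

```python
import re


def build_inverted_index(snippet, acronyms=None):
    """Map each word to its positions (increasing order), plus acronym expansions."""
    tokens = re.findall(r"[a-z0-9]+", snippet.lower())
    index = {}
    for position, token in enumerate(tokens):
        index.setdefault(token, []).append(position)
    # For an acronym at position i, add (i, k) for the k-th word of its expansion.
    for acronym, expansion in (acronyms or {}).items():
        for position in index.get(acronym, []):
            for offset, word in enumerate(expansion.split()):
                index.setdefault(word, []).append((position, offset))
    return index


plain = build_inverted_index("Shabd Vaid. Vice President. Shabd Vaid")
expanded = build_inverted_index(
    "IBM Press Room, IBM", {"ibm": "international business machines"}
)
```

In the first call, each word maps to a list of integer positions; in the second, the entries for international, business, and machines point back to the positions where ibm occurs.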
- The
system 200 may represent an attribute:value pair as an ordered tree. The order can capture the order of the words in the value, and also in acronym expansions. The ordered tree may capture choices, which include aliases, and acronym expansions. Table 1 below shows various examples. Ordered trees can be depicted as nested arrays, and constructed via attribute-specific constructors. For example, person_name objects are expanded to include first name aliases, and acronyms in company names and titles are detected and expanded, such as depicted in table 1. Ordered trees may have alternating levels of ordered ANDs and unordered ORs. For visual convenience, an AND-node is encapsulated in [ . . . ] and an OR-node in ( . . . ). -
attribute | value | ordered tree
person_name | (first_name = bob, last_name = smith) | [(bob, robert), smith]
title | chairman and ceo | [chairman, and, (ceo, [chief, executive, officer])]
company | ibm corp | [(ibm, [international, business, machines]), corp]
- As an example, [chairman, and, (ceo, [chief, executive, officer])] is read as “chairman AND (ceo OR (chief AND executive AND officer)).” Representing the snippet as an inverted index combined with representing attribute:value pairs as ordered trees may lead to a very fast matching algorithm, as described below. The
system 200 has attribute-specific matchers to match a value of a field to a snippet, which is unstructured text. The attribute-specific matchers could be instances of the following generic matcher. - match(attribute,value,snippet_inverted_index)
- Build ordered tree, attribute_value_ordered_tree, from attribute:value pair.
- Build hits, which populates a copy of the ordered tree with positions of words in the snippet that match (these replace the words in the original ordered tree). hits uses snippet_inverted_index and attribute_value_ordered_tree as arguments.
- Analyze hits to score the match.
- end match
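The first two steps of the generic matcher can be sketched as follows, under stated assumptions: an AND-node is a Python list, an OR-node is a Python tuple, a missing word is `None` (standing in for nil), and the `ACRONYMS` and `FIRST_NAME_ALIASES` tables are hypothetical lookup data. The scoring step is left out.

```python
# Hypothetical lookup tables used by the attribute-specific constructor.
ACRONYMS = {
    "ceo": ["chief", "executive", "officer"],
    "ibm": ["international", "business", "machines"],
    "vp": ["vice", "president"],
}
FIRST_NAME_ALIASES = {"bob": ("bob", "robert")}


def ordered_tree(value, aliases=None):
    """Build an ordered tree: list = ordered AND, tuple = unordered OR of choices."""
    tree = []
    for token in value.lower().split():
        if token in ACRONYMS:
            tree.append((token, list(ACRONYMS[token])))
        elif aliases and token in aliases:
            tree.append(tuple(aliases[token]))
        else:
            tree.append(token)
    return tree


def build_hits(tree, snippet_inverted_index):
    """Copy the tree, replacing each word by its post-list in the snippet (or None)."""
    if isinstance(tree, str):
        return snippet_inverted_index.get(tree)
    children = [build_hits(child, snippet_inverted_index) for child in tree]
    return children if isinstance(tree, list) else tuple(children)


title_tree = ordered_tree("chairman and ceo")
index = {"chairman": [0], "and": [1], "chief": [2],
         "executive": [3], "officer": [4], "ceo": [8]}
title_hits = build_hits(title_tree, index)
```

Because `build_hits` only walks the tree and looks words up in the index, it does not depend on the attribute; only the tree constructor does.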
- Building hits could be attribute-independent. Analyzing hits might be done “on-the-fly” with building hits, however the algorithm is easier to understand when the two steps are separated out. Table 2 below shows some examples. A post-list in hits is represented by < . . . >.
- Table 2, Hits from attribute_value_ordered_tree and snippet_inverted_index:
Row | attribute_value_ordered_tree | snippet_inverted_index | hits |
---|---|---|---|
1 | [shabd, vaid] | {shabd → <0, 5>, vaid → <1, 6>, vice → <9>, president → <10>, . . . } | [<0, 5>, <1, 6>] |
2 | [vice, president] | {shabd → <0, 5>, vaid → <1, 6>, vice → <9>, president → <10>, . . . } | [<9>, <10>] |
3 | [(vp, [vice, president])] | {shabd → <0, 5>, vaid → <1, 6>, vice → <9>, president → <10>, . . . } | [(nil, [<9>, <10>])] |
4 | [(bob, robert), smith] | {robert → <8>, smith → <9>, . . . } | [(nil, <8>), <9>] |
5 | [chairman, and, (ceo, [chief, executive, officer])] | {chairman → <0>, and → <1>, chief → <2>, executive → <3>, officer → <4>, . . . , ceo → <8>} | [<0>, <1>, (<8>, [<2>, <3>, <4>])] |
6 | [(ibm, [international, business, machines]), corp] | {ibm → <0>} | [(<0>, [nil, nil, nil]), nil] |
- Enumerating individual hits may be described based on the hits data structure in the last column of Table 2. Individual hits can reveal exactly what tokens in the query matched what positions in the snippet. Each hit could be individually scored. The overall score for the match of the attribute:value pair in the snippet might be defined as the aggregation of these individual scores. A hit could be a pair (tokens,positions), where tokens might be an array of tokens in attribute_value_ordered_tree and positions could be an array of positions in the snippet at which these tokens match, such as the examples below.
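One way to see which tokens pair with which positions is to walk the ordered tree and its hits tree in parallel, picking one alternative per OR-node. The sketch below is an illustration under stated assumptions and not the enumeration procedure of the system 200: lists stand for AND-nodes, tuples for OR-nodes, `None` for nil, and the helper names `all_nil` and `alternatives` are hypothetical. It skips OR choices that matched nothing and, as an assumed fallback, keeps the first choice when no choice matched.

```python
from itertools import product


def all_nil(tree, hits):
    """True when no word under this node matched the snippet."""
    if isinstance(tree, str):
        return hits is None
    return all(all_nil(t, h) for t, h in zip(tree, hits))


def alternatives(tree, hits):
    """Yield (tokens, post_lists) pairs, one per combination of OR choices."""
    if isinstance(tree, str):            # leaf: one word with its post-list or None
        yield [tree], [hits]
    elif isinstance(tree, tuple):        # OR-node: try each choice that matched
        matched = [(t, h) for t, h in zip(tree, hits) if not all_nil(t, h)]
        if not matched:
            matched = [(tree[0], hits[0])]   # assumed fallback: first choice
        for t, h in matched:
            yield from alternatives(t, h)
    else:                                # AND-node: combine children in order
        parts = [list(alternatives(t, h)) for t, h in zip(tree, hits)]
        for combo in product(*parts):
            tokens = [tok for toks, _ in combo for tok in toks]
            posts = [post for _, ps in combo for post in ps]
            yield tokens, posts


# Row 5 of Table 2.
tree = ["chairman", "and", ("ceo", ["chief", "executive", "officer"])]
hits = [[0], [1], ([8], [[2], [3], [4]])]
for tokens, post_lists in alternatives(tree, hits):
    print(tokens, post_lists)
```

Each yielded alternative is a flat array of tokens with a parallel array of post-lists, which can then be handed to a merge step or a scorer.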
- A one-level hits tree is simply an array of post-lists. In Table 2, the hits of rows 1 and 2 are one-level trees. The system 200 may use a k-merge like algorithm to enumerate all the hits of such a tree to a snippet. This algorithm can “merge” k post-lists, as illustrated below on the hits [<0,5>, <1,6>].
[<_0_,5>, <_1_,6>]→([shabd,vaid],0 . . . 1)
[<0,_5_>, <1,_6_>]→([shabd,vaid],5 . . . 6)
- The underlined entries depict the locations of the pointers in the various post-lists. In step 1, the pointers are at the start positions. Since 1 minus 0 equals 1, the system 200 generates a hit, 0 . . . 1, and advances both pointers. In step 2, since 6 minus 5 equals 1, the system 200 enumerates a hit, 5 . . . 6, and advances both pointers.
- Enumerating hits of a multi-level tree may be done by suitably generalizing the k-merge operation. The generalization can be a little complex, and may be best described by building up inductively from different types of multi-level tree examples.
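The one-level k-merge illustrated on [<0,5>, <1,6>] can be sketched as follows. This is a minimal sketch that assumes strict adjacency: it emits a hit only when the k words occur at consecutive positions, and it advances lagging pointers otherwise. The source does not spell out what happens when the positions are not adjacent, so that part, along with the function name `k_merge`, is an assumption; looser hits such as those scored in Table 3 are not enumerated here.

```python
def k_merge(post_lists):
    """Enumerate hits in which the k words occur at consecutive positions.

    post_lists[i] is the sorted list of snippet positions of the i-th word.
    Returns (start, end) position pairs, e.g. (0, 1) for the hit 0 . . . 1.
    """
    k = len(post_lists)
    if k == 0:
        return []
    pointers = [0] * k
    hits = []
    while all(ptr < len(pl) for ptr, pl in zip(pointers, post_lists)):
        # Start position implied by each pointer: position minus word offset.
        starts = [pl[ptr] - i
                  for i, (ptr, pl) in enumerate(zip(pointers, post_lists))]
        target = max(starts)
        if all(start == target for start in starts):
            hits.append((target, target + k - 1))
            pointers = [ptr + 1 for ptr in pointers]   # advance every pointer
        else:
            # Advance only the pointers that lag behind the furthest one.
            pointers = [ptr + 1 if start < target else ptr
                        for ptr, start in zip(pointers, starts)]
    return hits


print(k_merge([[0, 5], [1, 6]]))  # the illustration above: hits 0..1 and 5..6
```

Every iteration advances at least one pointer, so the loop touches each post-list entry at most once.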
- Example 1 is based on the hits of row 3 in Table 2: [(nil,[<9>, <10>])] and corresponds to a 3-level tree. The system 200 processes this example as follows.
[(nil, [<9>, <10>])]
(nil, [<9>, <10>])
[<9>, <10>]→([vice,president],9 . . . 10)
- First, the system 200 goes down one level since the top level is a singleton-AND. Next, the system 200 skips the nil. Finally, the system 200 produces the hit 9 . . . 10 from [<9>, <10>] and annotates it with [vice, president].
- Example 2 is based on the hits of row 4 in Table 2: [(nil,<8>),<9>]
[(nil,<8>),<9>]
[(nil,<8>),<9>]→([robert, smith],8 . . . 9)
- In step 1, the system 200 tries to 2-merge (nil,<8>) with <9>. Recognizing that the first argument is an OR, the system 200 goes down one level into the OR and effectively does the 2-merge of [<8>,<9>] in step 2.
- Example 3 is based on the hits in row 5 of Table 2: [<0>, <1>, (<8>, [<2>, <3>, <4>])]
[<0>, <1>, (<8>, [<2>, <3>, <4>])]
[<0>, <1>, (<8>, [<2>, <3>, <4>])]→([chairman,and,ceo],(0 . . . 1,8))
[<0>, <1>, (<8>, [<2>, <3>, <4>])]→([chairman,and,chief,executive,officer],0 . . . 4)
- In step 1, the system 200 recognizes the need for a 3-merge at the top level. The system 200 places the pointers at the correct locations of the first two entries. The third entry is an OR, so the system 200 descends into the third entry and then places the pointer on the first entry in the first post-list in the OR choices. (This entry is 8.) The system 200 then outputs the hit (0 . . . 1,8) to the scorer. Next, in step 3, the system 200 moves over to the second choice in this OR. This is itself an AND of three choices, so the system 200 needs a 3-merge of [<2>, <3>, <4>]. This 3-merge produces the hit 2 . . . 4, which gets appended to 0 . . . 1 to yield 0 . . . 4.
- Example 4 is based on the hits in row 6 of Table 2: [(<0>,[nil,nil,nil]),nil]
[(<0>, [nil,nil,nil]),nil]
[(<0>,[nil,nil,nil]),nil]→([ibm,corp],[0,nil])
- In step 1, the system 200 recognizes the need for a 2-merge at the top level. The system 200 notices that the first entry is an OR, so the system 200 descends into the first entry and then places the pointer on the first entry in the first post-list in the OR choices. The system 200 notes that the second entry of the top-level AND is nil, so the system 200 outputs [0,nil] as one hit. Next, the system 200 advances the first pointer to the second choice in the OR (<0>,[nil,nil,nil]) and notices that it is [nil,nil,nil]. So the system 200 stops, and no new hits are generated.
- The hit scorer may take two arguments: argument_name and hit. Table 3 shows a number of examples explaining the scoring.
- Table 3, Scoring individual hits:
Attribute | Hit | Scoring Explanation |
---|---|---|
person name | ([shabd, vaid], 0 . . . 1) | Very high score since 1-0 = 1 |
title | ([vice, president], 9 . . . 10) | Very high score since 10-9 = 1 |
title | ([chairman, and, ceo], (0 . . . 1, 8)) | Moderate score since 8 is far from 0 . . . 1 |
title | ([chairman, and, chief, executive, officer], 0 . . . 4) | Very high score because of 0 . . . 4 |
company name | ([ibm, corp], [0, nil]) | High score because the unmatched corp is a company stop word |
company name | ([jigsaw, data, corp], [5, nil, nil]) | Moderately high because the unmatched corp is a company stop word and the unmatched data is not the first word |
company name | ([data, corp], [nil, 3]) | Low score because the unmatched data is the first word in the company name |
person name | ([john, smith], [3, 9]) | Low score because the distance between the two matches, i.e. 9-3, is too high |
person name | ([john, smith], [3, 5]) | Moderately high score because the distance 5-3 is small (2) though not ideal (1) |
title | ([director, of, engineering], [6, nil, 5]) | Moderately high score because the distance between 5 and 6 is 1 and title matches should be looser on word order |
- The system 200 brings together various algorithms, processes and techniques that are particularly suited for finding inaccurate data and piecing together rapidly changing pieces of data and claims to generate golden records at a massive scale. The system 200 provides a complete framework to efficiently evaluate data and to improve the completeness and accuracy of data. The system 200 provides a solid foundation for linking external data sources to core data assets in a reliable and scalable way that will enable customers to gain additional insights into their customers.
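The qualitative scoring rules of Table 3 could be approximated by a scorer such as the one sketched below. The source gives only qualitative labels (very high, moderate, low) and no formula, so every numeric weight here is a hypothetical choice made to reproduce the ordering of the Table 3 examples. The names `score_hit`, `score_match`, `COMPANY_STOP_WORDS` and `GAP_PENALTY` are assumptions, positions are written as flat lists with `None` for nil, and taking the maximum as the aggregation is also an assumption.

```python
# All numeric weights below are hypothetical; the source gives only
# qualitative labels (Table 3), not a scoring formula.
COMPANY_STOP_WORDS = {"corp"}
GAP_PENALTY = {"person_name": 0.15, "title": 0.10, "company_name": 0.15}


def score_hit(attribute, tokens, positions):
    """Score one hit in [0, 1]; positions[i] is None when tokens[i] is unmatched."""
    matched = [pos for pos in positions if pos is not None]
    if not matched:
        return 0.0
    score = 1.0
    for i, (token, pos) in enumerate(zip(tokens, positions)):
        if pos is not None:
            continue
        if attribute == "company_name" and token in COMPANY_STOP_WORDS:
            score *= 0.9   # an unmatched company stop word costs little
        elif i == 0:
            score *= 0.2   # an unmatched first word costs a lot
        else:
            score *= 0.8   # any other unmatched word
    for left, right in zip(matched, matched[1:]):
        gap = right - left
        if attribute == "title":
            gap = abs(gap)  # titles are looser on word order
        if gap < 1:
            score *= 0.3    # tokens matched out of order
        else:
            penalty = GAP_PENALTY.get(attribute, 0.15)
            score *= max(0.1, 1.0 - penalty * (gap - 1))
    return score


def score_match(attribute, hits):
    """Aggregate per-hit scores; taking the maximum is an assumption."""
    return max((score_hit(attribute, t, p) for t, p in hits), default=0.0)


print(score_hit("title", ["chairman", "and", "ceo"], [0, 1, 8]))
print(score_hit("person_name", ["john", "smith"], [3, 5]))
```

With these weights an adjacent, fully matched hit scores 1.0, and the company name examples keep the ordering of Table 3: the unmatched stop word costs less than an unmatched middle word, which in turn costs less than an unmatched first word.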
FIG. 3 illustrates a block diagram of an environment 310 wherein an on-demand database service might be used. The environment 310 may include user systems 312, a network 314, a system 316, a processor system 317, an application platform 318, a network interface 320, a tenant data storage 322, a system data storage 324, program code 326, and a process space 328. In other embodiments, the environment 310 may not have all of the components listed and/or may have other elements instead of, or in addition to, those listed above.
- The environment 310 is an environment in which an on-demand database service exists. A user system 312 may be any machine or system that is used by a user to access a database user system. For example, any of the user systems 312 may be a handheld computing device, a mobile phone, a laptop computer, a work station, and/or a network of computing devices. As illustrated in FIG. 3 (and in more detail in FIG. 4) the user systems 312 might interact via the network 314 with an on-demand database service, which is the system 316.
- An on-demand database service, such as the system 316, is a database system that is made available to outside users that do not need to necessarily be concerned with building and/or maintaining the database system, but instead may be available for their use when the users need the database system (e.g., on the demand of the users). Some on-demand database services may store information from one or more tenants in tables of a common database image to form a multi-tenant database system (MTS). Accordingly, the “on-demand database service 316” and the “system 316” will be used interchangeably herein. A database image may include one or more database objects. A relational database management system (RDBMS) or the equivalent may execute storage and retrieval of information against the database object(s). The application platform 318 may be a framework that allows the applications of the system 316 to run, such as the hardware and/or software, e.g., the operating system. In an embodiment, the on-demand database service 316 may include the application platform 318 which enables creation, managing and executing one or more applications developed by the provider of the on-demand database service, users accessing the on-demand database service via user systems 312, or third party application developers accessing the on-demand database service via the user systems 312.
- The users of the
user systems 312 may differ in their respective capacities, and the capacity of a particular user system 312 might be entirely determined by permissions (permission levels) for the current user. For example, where a salesperson is using a particular user system 312 to interact with the system 316, that user system 312 has the capacities allotted to that salesperson. However, while an administrator is using that user system 312 to interact with the system 316, that user system 312 has the capacities allotted to that administrator. In systems with a hierarchical role model, users at one permission level may have access to applications, data, and database information accessible by a lower permission level user, but may not have access to certain applications, database information, and data accessible by a user at a higher permission level. Thus, different users will have different capabilities with regard to accessing and modifying application and database information, depending on a user's security or permission level.
- The network 314 is any network or combination of networks of devices that communicate with one another. For example, the network 314 may be any one or any combination of a LAN (local area network), WAN (wide area network), telephone network, wireless network, point-to-point network, star network, token ring network, hub network, or other appropriate configuration. As the most common type of computer network in current use is a TCP/IP (Transmission Control Protocol and Internet Protocol) network, such as the global internetwork of networks often referred to as the “Internet” with a capital “I,” that network will be used in many of the examples herein. However, it should be understood that the networks that the one or more implementations might use are not so limited, although TCP/IP is a frequently implemented protocol.
- The user systems 312 might communicate with the system 316 using TCP/IP and, at a higher network level, use other common Internet protocols to communicate, such as HTTP, FTP, AFS, WAP, etc. In an example where HTTP is used, the user systems 312 might include an HTTP client commonly referred to as a “browser” for sending and receiving HTTP messages to and from an HTTP server at the system 316. Such an HTTP server might be implemented as the sole network interface between the system 316 and the network 314, but other techniques might be used as well or instead. In some implementations, the interface between the system 316 and the network 314 includes load sharing functionality, such as round-robin HTTP request distributors to balance loads and distribute incoming HTTP requests evenly over a plurality of servers. At least as for the users that are accessing that server, each of the plurality of servers has access to the MTS' data; however, other alternative configurations may be used instead.
- In one embodiment, the
system 316, shown in FIG. 3, implements a web-based customer relationship management (CRM) system. For example, in one embodiment, the system 316 includes application servers configured to implement and execute CRM software applications as well as provide related data, code, forms, webpages and other information to and from the user systems 312 and to store to, and retrieve from, a database system related data, objects, and Webpage content. With a multi-tenant system, data for multiple tenants may be stored in the same physical database object; however, tenant data typically is arranged so that data of one tenant is kept logically separate from that of other tenants so that one tenant does not have access to another tenant's data, unless such data is expressly shared. In certain embodiments, the system 316 implements applications other than, or in addition to, a CRM application. For example, the system 316 may provide tenant access to multiple hosted (standard and custom) applications, including a CRM application. User (or third party developer) applications, which may or may not include CRM, may be supported by the application platform 318, which manages creation, storage of the applications into one or more database objects and executing of the applications in a virtual machine in the process space of the system 316.
- One arrangement for elements of the system 316 is shown in FIG. 3, including the network interface 320, the application platform 318, the tenant data storage 322 for tenant data 323, the system data storage 324 for system data 325 accessible to the system 316 and possibly multiple tenants, the program code 326 for implementing various functions of the system 316, and the process space 328 for executing MTS system processes and tenant-specific processes, such as running applications as part of an application hosting service. Additional processes that may execute on the system 316 include database indexing processes.
- Several elements in the system shown in FIG. 3 include conventional, well-known elements that are explained only briefly here. For example, each of the user systems 312 could include a desktop personal computer, workstation, laptop, PDA, cell phone, or any wireless application protocol (WAP) enabled device or any other computing device capable of interfacing directly or indirectly to the Internet or other network connection. Each of the user systems 312 typically runs an HTTP client, e.g., a browsing program, such as Microsoft's Internet Explorer browser, Netscape's Navigator browser, Opera's browser, or a WAP-enabled browser in the case of a cell phone, PDA or other wireless device, or the like, allowing a user (e.g., subscriber of the multi-tenant database system) of the user systems 312 to access, process and view information, pages and applications available to it from the system 316 over the network 314. Each of the user systems 312 also typically includes one or more user interface devices, such as a keyboard, a mouse, trackball, touch pad, touch screen, pen or the like, for interacting with a graphical user interface (GUI) provided by the browser on a display (e.g., a monitor screen, LCD display, etc.) in conjunction with pages, forms, applications and other information provided by the system 316 or other systems or servers. For example, the user interface device may be used to access data and applications hosted by the system 316, and to perform searches on stored data, and otherwise allow a user to interact with various GUI pages that may be presented to a user. As discussed above, embodiments are suitable for use with the Internet, which refers to a specific global internetwork of networks. However, it should be understood that other networks can be used instead of the Internet, such as an intranet, an extranet, a virtual private network (VPN), a non-TCP/IP based network, any LAN or WAN or the like.
- According to one embodiment, each of the
user systems 312 and all of its components are operator configurable using applications, such as a browser, including computer code run using a central processing unit such as an Intel Pentium® processor or the like. Similarly, the system 316 (and additional instances of an MTS, where more than one is present) and all of their components might be operator configurable using application(s) including computer code to run using a central processing unit such as the processor system 317, which may include an Intel Pentium® processor or the like, and/or multiple processor units. A computer program product embodiment includes a machine-readable storage medium (media) having instructions stored thereon/in which can be used to program a computer to perform any of the processes of the embodiments described herein. Computer code for operating and configuring the system 316 to intercommunicate and to process webpages, applications and other data and media content as described herein is preferably downloaded and stored on a hard disk, but the entire program code, or portions thereof, may also be stored in any other volatile or non-volatile memory medium or device as is well known, such as a ROM or RAM, or provided on any media capable of storing program code, such as any type of rotating media including floppy disks, optical discs, digital versatile disk (DVD), compact disk (CD), microdrive, and magneto-optical disks, and magnetic or optical cards, nanosystems (including molecular memory ICs), or any type of media or device suitable for storing instructions and/or data. Additionally, the entire program code, or portions thereof, may be transmitted and downloaded from a software source over a transmission medium, e.g., over the Internet, or from another server, as is well known, or transmitted over any other conventional network connection as is well known (e.g., extranet, VPN, LAN, etc.) using any communication medium and protocols (e.g., TCP/IP, HTTP, HTTPS, Ethernet, etc.) as are well known. It will also be appreciated that computer code for implementing embodiments can be implemented in any programming language that can be executed on a client system and/or server or server system such as, for example, C, C++, HTML, any other markup language, Java™, JavaScript, ActiveX, any other scripting language, such as VBScript, and many other programming languages as are well known. (Java™ is a trademark of Sun Microsystems, Inc.).
- According to one embodiment, the system 316 is configured to provide webpages, forms, applications, data and media content to the user (client) systems 312 to support the access by the user systems 312 as tenants of the system 316. As such, the system 316 provides security mechanisms to keep each tenant's data separate unless the data is shared. If more than one MTS is used, they may be located in close proximity to one another (e.g., in a server farm located in a single building or campus), or they may be distributed at locations remote from one another (e.g., one or more servers located in city A and one or more servers located in city B). As used herein, each MTS could include one or more logically and/or physically connected servers distributed locally or across one or more geographic locations. Additionally, the term “server” is meant to include a computer system, including processing hardware and process space(s), and an associated storage system and database application (e.g., OODBMS or RDBMS) as is well known in the art. It should also be understood that “server system” and “server” are often used interchangeably herein. Similarly, the database object described herein can be implemented as single databases, a distributed database, a collection of distributed databases, a database with redundant online or offline backups or other redundancies, etc., and might include a distributed database or storage network and associated processing intelligence.
FIG. 4 also illustrates the environment 310. However, in FIG. 4 elements of the system 316 and various interconnections in an embodiment are further illustrated. FIG. 4 shows that each of the user systems 312 may include a processor system 312A, a memory system 312B, an input system 312C, and an output system 312D. FIG. 4 shows the network 314 and the system 316. FIG. 4 also shows that the system 316 may include the tenant data storage 322, the tenant data 323, the system data storage 324, the system data 325, a User Interface (UI) 430, an Application Program Interface (API) 432, a PL/SOQL 434, save routines 436, an application setup mechanism 438, application servers 400_1 through 400_N, a system process space 402, tenant process spaces 404, a tenant management process space 410, a tenant storage area 412, a user storage 414, and application metadata 416. In other embodiments, the environment 310 may not have the same elements as those listed above and/or may have other elements instead of, or in addition to, those listed above.
- The user systems 312, the network 314, the system 316, the tenant data storage 322, and the system data storage 324 were discussed above in FIG. 3. Regarding the user systems 312, the processor system 312A may be any combination of one or more processors. The memory system 312B may be any combination of one or more memory devices, short term, and/or long term memory. The input system 312C may be any combination of input devices, such as one or more keyboards, mice, trackballs, scanners, cameras, and/or interfaces to networks. The output system 312D may be any combination of output devices, such as one or more monitors, printers, and/or interfaces to networks. As shown by FIG. 4, the system 316 may include the network interface 320 (of FIG. 3) implemented as a set of HTTP application servers 400, the application platform 318, the tenant data storage 322, and the system data storage 324. Also shown is the system process space 402, including individual tenant process spaces 404 and the tenant management process space 410. Each application server 400 may be configured to access the tenant data storage 322 and the tenant data 323 therein, and the system data storage 324 and the system data 325 therein to serve requests of the user systems 312. The tenant data 323 might be divided into individual tenant storage areas 412, which can be either a physical arrangement and/or a logical arrangement of data. Within each tenant storage area 412, the user storage 414 and the application metadata 416 might be similarly allocated for each user. For example, a copy of a user's most recently used (MRU) items might be stored to the user storage 414. Similarly, a copy of MRU items for an entire organization that is a tenant might be stored to the tenant storage area 412. The UI 430 provides a user interface and the API 432 provides an application programmer interface to the system 316 resident processes to users and/or developers at the user systems 312. The tenant data and the system data may be stored in various databases, such as one or more Oracle™ databases.
- The
application platform 318 includes the application setup mechanism 438 that supports application developers' creation and management of applications, which may be saved as metadata into the tenant data storage 322 by the save routines 436 for execution by subscribers as one or more tenant process spaces 404 managed by the tenant management process 410 for example. Invocations to such applications may be coded using the PL/SOQL 434 that provides a programming language style interface extension to the API 432. A detailed description of some PL/SOQL language embodiments is discussed in commonly owned U.S. Pat. No. 7,730,478 entitled, METHOD AND SYSTEM FOR ALLOWING ACCESS TO DEVELOPED APPLICATIONS VIA A MULTI-TENANT ON-DEMAND DATABASE SERVICE, by Craig Weissman, filed Sep. 21, 2007, which is incorporated in its entirety herein for all purposes. Invocations to applications may be detected by one or more system processes, which manage retrieving the application metadata 416 for the subscriber making the invocation and executing the metadata as an application in a virtual machine.
- Each application server 400 may be communicably coupled to database systems, e.g., having access to the system data 325 and the tenant data 323, via a different network connection. For example, one application server 400_1 might be coupled via the network 314 (e.g., the Internet), another application server 400_N-1 might be coupled via a direct network link, and another application server 400_N might be coupled by yet a different network connection. Transmission Control Protocol and Internet Protocol (TCP/IP) are typical protocols for communicating between application servers 400 and the database system. However, it will be apparent to one skilled in the art that other transport protocols may be used to optimize the system depending on the network interconnect used.
- In certain embodiments, each application server 400 is configured to handle requests for any user associated with any organization that is a tenant. Because it is desirable to be able to add and remove application servers from the server pool at any time for any reason, there is preferably no server affinity for a user and/or organization to a specific application server 400. In one embodiment, therefore, an interface system implementing a load balancing function (e.g., an F5 Big-IP load balancer) is communicably coupled between the application servers 400 and the user systems 312 to distribute requests to the application servers 400. In one embodiment, the load balancer uses a least connections algorithm to route user requests to the application servers 400. Other examples of load balancing algorithms, such as round robin and observed response time, also can be used. For example, in certain embodiments, three consecutive requests from the same user could hit three different application servers 400, and three requests from different users could hit the same application server 400. In this manner, the system 316 is multi-tenant, wherein the system 316 handles storage of, and access to, different objects, data and applications across disparate users and organizations.
- As an example of storage, one tenant might be a company that employs a sales force where each salesperson uses the
system 316 to manage their sales process. Thus, a user might maintain contact data, leads data, customer follow-up data, performance data, goals and progress data, etc., all applicable to that user's personal sales process (e.g., in the tenant data storage 322). In an example of an MTS arrangement, since all of the data and the applications to access, view, modify, report, transmit, calculate, etc., can be maintained and accessed by a user system having nothing more than network access, the user can manage his or her sales efforts and cycles from any of many different user systems. For example, if a salesperson is visiting a customer and the customer has Internet access in their lobby, the salesperson can obtain critical updates as to that customer while waiting for the customer to arrive in the lobby.
- While each user's data might be separate from other users' data regardless of the employers of each user, some data might be organization-wide data shared or accessible by a plurality of users or all of the users for a given organization that is a tenant. Thus, there might be some data structures managed by the system 316 that are allocated at the tenant level while other data structures might be managed at the user level. Because an MTS might support multiple tenants including possible competitors, the MTS should have security protocols that keep data, applications, and application use separate. Also, because many tenants may opt for access to an MTS rather than maintain their own system, redundancy, up-time, and backup are additional functions that may be implemented in the MTS. In addition to user-specific data and tenant-specific data, the system 316 might also maintain system level data usable by multiple tenants or other data. Such system level data might include industry reports, news, postings, and the like that are sharable among tenants.
- In certain embodiments, the user systems 312 (which may be client systems) communicate with the application servers 400 to request and update system-level and tenant-level data from the system 316 that may require sending one or more queries to the tenant data storage 322 and/or the system data storage 324. The system 316 (e.g., an application server 400 in the system 316) automatically generates one or more SQL statements (e.g., one or more SQL queries) that are designed to access the desired information. The system data storage 324 may generate query plans to access the requested data from the database.
- Each database can generally be viewed as a collection of objects, such as a set of logical tables, containing data fitted into predefined categories. A “table” is one representation of a data object, and may be used herein to simplify the conceptual description of objects and custom objects. It should be understood that “table” and “object” may be used interchangeably herein. Each table generally contains one or more data categories logically arranged as columns or fields in a viewable schema. Each row or record of a table contains an instance of data for each category defined by the fields. For example, a CRM database may include a table that describes a customer with fields for basic contact information such as name, address, phone number, fax number, etc. Another table might describe a purchase order, including fields for information such as customer, product, sale price, date, etc. In some multi-tenant database systems, standard entity tables might be provided for use by all tenants. For CRM database applications, such standard entities might include tables for Account, Contact, Lead, and Opportunity data, each containing pre-defined fields. It should be understood that the word “entity” may also be used interchangeably herein with “object” and “table”.
- In some multi-tenant database systems, tenants may be allowed to create and store custom objects, or they may be allowed to customize standard entities or objects, for example by creating custom fields for standard objects, including custom index fields. U.S. Pat. No. 7,779,039, filed Apr. 2, 2004, entitled “Custom Entities and Fields in a Multi-Tenant Database System”, which is hereby incorporated herein by reference, teaches systems and methods for creating custom objects as well as customizing standard objects in a multi-tenant database system. In certain embodiments, for example, all custom entity data rows are stored in a single multi-tenant physical table, which may contain multiple logical tables per organization. It is transparent to customers that their multiple “tables” are in fact stored in one large table or that their data may be stored in the same table as the data of other customers.
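The idea of keeping several logical tables per organization in one physical table can be illustrated with Python's standard `sqlite3` module. This is only a sketch under stated assumptions: the table name `custom_entity_data`, its columns, and the sample rows are hypothetical and are not taken from the source or from U.S. Pat. No. 7,779,039.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One physical table holds the rows of every organization's custom objects.
# Table and column names are hypothetical.
conn.execute(
    """CREATE TABLE custom_entity_data (
           org_id      TEXT NOT NULL,     -- tenant that owns the row
           entity_name TEXT NOT NULL,     -- logical "table" within the tenant
           row_id      INTEGER NOT NULL,
           field_name  TEXT NOT NULL,
           field_value TEXT,
           PRIMARY KEY (org_id, entity_name, row_id, field_name)
       )"""
)
rows = [
    ("org_a", "Invoice", 1, "amount", "120.00"),
    ("org_a", "Shipment", 1, "carrier", "ACME"),
    ("org_b", "Invoice", 1, "amount", "75.50"),
]
conn.executemany("INSERT INTO custom_entity_data VALUES (?, ?, ?, ?, ?)", rows)


def select_logical_table(conn, org_id, entity_name):
    """Return only the rows of one tenant's logical table."""
    cursor = conn.execute(
        "SELECT row_id, field_name, field_value FROM custom_entity_data "
        "WHERE org_id = ? AND entity_name = ? ORDER BY row_id, field_name",
        (org_id, entity_name),
    )
    return cursor.fetchall()


print(select_logical_table(conn, "org_a", "Invoice"))
```

Because every query is filtered by the tenant identifier, each organization sees what looks like its own table even though the rows of all tenants share one physical table.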
- While one or more implementations have been described by way of example and in terms of the specific embodiments, it is to be understood that one or more implementations are not limited to the disclosed embodiments. To the contrary, it is intended to cover various modifications and similar arrangements as would be apparent to those skilled in the art. Therefore, the scope of the appended claims should be accorded the broadest interpretation so as to encompass all such modifications and similar arrangements.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/337,352 US20150032729A1 (en) | 2013-07-23 | 2014-07-22 | Matching snippets of search results to clusters of objects |
Applications Claiming Priority (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361857325P | 2013-07-23 | 2013-07-23 | |
US201361862873P | 2013-08-06 | 2013-08-06 | |
US14/337,352 US20150032729A1 (en) | 2013-07-23 | 2014-07-22 | Matching snippets of search results to clusters of objects |
Publications (1)
Publication Number | Publication Date |
---|---|
US20150032729A1 (en) | 2015-01-29 |
Family ID: 52391371
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/337,352 Abandoned US20150032729A1 (en) | 2013-07-23 | 2014-07-22 | Matching snippets of search results to clusters of objects |
US14/337,505 Active 2035-10-26 US9760620B2 (en) | 2013-07-23 | 2014-07-22 | Confidently adding snippets of search results to clusters of objects |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/337,505 Active 2035-10-26 US9760620B2 (en) | 2013-07-23 | 2014-07-22 | Confidently adding snippets of search results to clusters of objects |
Country Status (1)
Country | Link |
---|---|
US (2) | US20150032729A1 (en) |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5504890A (en) * | 1994-03-17 | 1996-04-02 | Sanford; Michael D. | System for data sharing among independently-operating information-gathering entities with individualized conflict resolution rules |
US20030037051A1 (en) * | 1999-07-20 | 2003-02-20 | Gruenwald Bjorn J. | System and method for organizing data |
US6658423B1 (en) * | 2001-01-24 | 2003-12-02 | Google, Inc. | Detecting duplicate and near-duplicate files |
US6947930B2 (en) * | 2003-03-21 | 2005-09-20 | Overture Services, Inc. | Systems and methods for interactive search query refinement |
US20050234952A1 (en) * | 2004-04-15 | 2005-10-20 | Microsoft Corporation | Content propagation for enhanced document retrieval |
US20060026152A1 (en) * | 2004-07-13 | 2006-02-02 | Microsoft Corporation | Query-based snippet clustering for search result grouping |
US20060117002A1 (en) * | 2004-11-26 | 2006-06-01 | Bing Swen | Method for search result clustering |
US20070027921A1 (en) * | 2005-08-01 | 2007-02-01 | Billy Alvarado | Context based action |
US20070192293A1 (en) * | 2006-02-13 | 2007-08-16 | Bing Swen | Method for presenting search results |
US20080222140A1 (en) * | 2007-02-20 | 2008-09-11 | Wright State University | Comparative web search system and method |
US20090240672A1 (en) * | 2008-03-18 | 2009-09-24 | Cuill, Inc. | Apparatus and method for displaying search results with a variety of display paradigms |
US20100023515A1 (en) * | 2008-07-28 | 2010-01-28 | Andreas Marx | Data clustering engine |
US20100070460A1 (en) * | 2005-05-02 | 2010-03-18 | Fuerst Karl | System and method for rule-based data object matching |
US20120023107A1 (en) * | 2010-01-15 | 2012-01-26 | Salesforce.Com, Inc. | System and method of matching and merging records |
US8782016B2 (en) * | 2011-08-26 | 2014-07-15 | Qatar Foundation | Database record repair |
Family Cites Families (118)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5608872A (en) | 1993-03-19 | 1997-03-04 | Ncr Corporation | System for allowing all remote computers to perform annotation on an image and replicating the annotated image on the respective displays of other comuters |
US5649104A (en) | 1993-03-19 | 1997-07-15 | Ncr Corporation | System for allowing user of any computer to draw image over that generated by the host computer and replicating the drawn image to other computers |
US5577188A (en) | 1994-05-31 | 1996-11-19 | Future Labs, Inc. | Method to provide for virtual screen overlay |
GB2300991B (en) | 1995-05-15 | 1997-11-05 | Andrew Macgregor Ritchie | Serving signals to browsing clients |
US5715450A (en) | 1995-09-27 | 1998-02-03 | Siebel Systems, Inc. | Method of selecting and presenting data from a database using a query language to a user of a computer system |
US5821937A (en) | 1996-02-23 | 1998-10-13 | Netsuite Development, L.P. | Computer method for updating a network design |
US5831610A (en) | 1996-02-23 | 1998-11-03 | Netsuite Development L.P. | Designing networks |
US6604117B2 (en) | 1996-03-19 | 2003-08-05 | Siebel Systems, Inc. | Method of maintaining a network of partially replicated database system |
US5873096A (en) | 1997-10-08 | 1999-02-16 | Siebel Systems, Inc. | Method of maintaining a network of partially replicated database system |
EP1021775A4 (en) | 1997-02-26 | 2005-05-11 | Siebel Systems Inc | Method of determining the visibility to a remote databaseclient of a plurality of database transactions using simplified visibility rules |
AU6654798A (en) | 1997-02-26 | 1998-09-18 | Siebel Systems, Inc. | Method of determining visibility to a remote database client of a plurality of database transactions using a networked proxy server |
AU6183698A (en) | 1997-02-26 | 1998-09-18 | Siebel Systems, Inc. | Method of determining visibility to a remote database client of a plurality of database transactions having variable visibility strengths |
WO1998040804A2 (en) | 1997-02-26 | 1998-09-17 | Siebel Systems, Inc. | Distributed relational database |
AU6440398A (en) | 1997-02-26 | 1998-09-18 | Siebel Systems, Inc. | Method of using a cache to determine the visibility to a remote database client of a plurality of database transactions |
AU6336798A (en) | 1997-02-27 | 1998-09-29 | Siebel Systems, Inc. | Method of synchronizing independently distributed software and database schema |
WO1998040807A2 (en) | 1997-02-27 | 1998-09-17 | Siebel Systems, Inc. | Migrating to a successive software distribution level |
WO1998038564A2 (en) | 1997-02-28 | 1998-09-03 | Siebel Systems, Inc. | Partially replicated distributed database with multiple levels of remote clients |
US6169534B1 (en) | 1997-06-26 | 2001-01-02 | Upshot.Com | Graphical user interface for customer information management |
US5918159A (en) | 1997-08-04 | 1999-06-29 | Fomukong; Mundi | Location reporting satellite paging system with optional blocking of location reporting |
US6560461B1 (en) | 1997-08-04 | 2003-05-06 | Mundi Fomukong | Authorized location reporting paging system |
US20020059095A1 (en) | 1998-02-26 | 2002-05-16 | Cook Rachael Linette | System and method for generating, capturing, and managing customer lead information over a computer network |
US6732111B2 (en) | 1998-03-03 | 2004-05-04 | Siebel Systems, Inc. | Method, apparatus, system, and program product for attaching files and other objects to a partially replicated database |
US6772229B1 (en) | 2000-11-13 | 2004-08-03 | Groupserve, Inc. | Centrifugal communication and collaboration method |
US6161149A (en) | 1998-03-13 | 2000-12-12 | Groupserve, Inc. | Centrifugal communication and collaboration method |
US5963953A (en) | 1998-03-30 | 1999-10-05 | Siebel Systems, Inc. | Method, and system for product configuration |
AU5791899A (en) | 1998-08-27 | 2000-03-21 | Upshot Corporation | A method and apparatus for network-based sales force management |
US6393605B1 (en) | 1998-11-18 | 2002-05-21 | Siebel Systems, Inc. | Apparatus and system for efficient delivery and deployment of an application |
US6728960B1 (en) | 1998-11-18 | 2004-04-27 | Siebel Systems, Inc. | Techniques for managing multiple threads in a browser environment |
US6601087B1 (en) | 1998-11-18 | 2003-07-29 | Webex Communications, Inc. | Instant document sharing |
EP1163604A4 (en) | 1998-11-30 | 2002-01-09 | Siebel Systems Inc | Assignment manager |
JP2002531890A (en) | 1998-11-30 | 2002-09-24 | シーベル システムズ,インコーポレイティド | Development tools, methods and systems for client-server applications |
JP2002531896A (en) | 1998-11-30 | 2002-09-24 | シーベル システムズ,インコーポレイティド | Call center using smart script |
JP2002531899A (en) | 1998-11-30 | 2002-09-24 | シーベル システムズ,インコーポレイティド | State model for process monitoring |
US7356482B2 (en) | 1998-12-18 | 2008-04-08 | Alternative Systems, Inc. | Integrated change management unit |
US20020072951A1 (en) | 1999-03-03 | 2002-06-13 | Michael Lee | Marketing support database management method, system and program product |
US6574635B2 (en) | 1999-03-03 | 2003-06-03 | Siebel Systems, Inc. | Application instantiation based upon attributes and values stored in a meta data repository, including tiering of application layers objects and components |
US8095413B1 (en) | 1999-05-07 | 2012-01-10 | VirtualAgility, Inc. | Processing management information |
US7698160B2 (en) | 1999-05-07 | 2010-04-13 | Virtualagility, Inc | System for performing collaborative tasks |
US6621834B1 (en) | 1999-11-05 | 2003-09-16 | Raindance Communications, Inc. | System and method for voice transmission over network protocols |
US6535909B1 (en) | 1999-11-18 | 2003-03-18 | Contigo Software, Inc. | System and method for record and playback of collaborative Web browsing session |
US6324568B1 (en) | 1999-11-30 | 2001-11-27 | Siebel Systems, Inc. | Method and system for distributing objects over a network |
US6654032B1 (en) | 1999-12-23 | 2003-11-25 | Webex Communications, Inc. | Instant sharing of documents on a remote server |
US6577726B1 (en) | 2000-03-31 | 2003-06-10 | Siebel Systems, Inc. | Computer telephony integration hotelling method and system |
US7266502B2 (en) | 2000-03-31 | 2007-09-04 | Siebel Systems, Inc. | Feature centric release manager method and system |
US6336137B1 (en) | 2000-03-31 | 2002-01-01 | Siebel Systems, Inc. | Web client-server system and method for incompatible page markup and presentation languages |
US6732100B1 (en) | 2000-03-31 | 2004-05-04 | Siebel Systems, Inc. | Database access method and system for user role defined access |
US6665655B1 (en) | 2000-04-14 | 2003-12-16 | Rightnow Technologies, Inc. | Implicit rating of retrieved information in an information search system |
US6434550B1 (en) | 2000-04-14 | 2002-08-13 | Rightnow Technologies, Inc. | Temporal updates of relevancy rating of retrieved information in an information search system |
US7730072B2 (en) | 2000-04-14 | 2010-06-01 | Rightnow Technologies, Inc. | Automated adaptive classification system for knowledge networks |
US6842748B1 (en) | 2000-04-14 | 2005-01-11 | Rightnow Technologies, Inc. | Usage based strength between related information in an information retrieval system |
US6763501B1 (en) | 2000-06-09 | 2004-07-13 | Webex Communications, Inc. | Remote document serving |
KR100365357B1 (en) | 2000-10-11 | 2002-12-18 | 엘지전자 주식회사 | Method for data communication of mobile terminal |
US7581230B2 (en) | 2001-02-06 | 2009-08-25 | Siebel Systems, Inc. | Adaptive communication application programming interface |
USD454139S1 (en) | 2001-02-20 | 2002-03-05 | Rightnow Technologies | Display screen for a computer |
US7363388B2 (en) | 2001-03-28 | 2008-04-22 | Siebel Systems, Inc. | Method and system for direct server synchronization with a computing device |
US6829655B1 (en) | 2001-03-28 | 2004-12-07 | Siebel Systems, Inc. | Method and system for server synchronization with a computing device via a companion device |
US7174514B2 (en) | 2001-03-28 | 2007-02-06 | Siebel Systems, Inc. | Engine to present a user interface based on a logical structure, such as one for a customer relationship management system, across a web site |
US20030018705A1 (en) | 2001-03-31 | 2003-01-23 | Mingte Chen | Media-independent communication server |
US20030206192A1 (en) | 2001-03-31 | 2003-11-06 | Mingte Chen | Asynchronous message push to web browser |
US6732095B1 (en) | 2001-04-13 | 2004-05-04 | Siebel Systems, Inc. | Method and apparatus for mapping between XML and relational representations |
US7761288B2 (en) | 2001-04-30 | 2010-07-20 | Siebel Systems, Inc. | Polylingual simultaneous shipping of software |
US6782383B2 (en) | 2001-06-18 | 2004-08-24 | Siebel Systems, Inc. | System and method to implement a persistent and dismissible search center frame |
US6728702B1 (en) | 2001-06-18 | 2004-04-27 | Siebel Systems, Inc. | System and method to implement an integrated search center supporting a full-text search and query on a database |
US6763351B1 (en) | 2001-06-18 | 2004-07-13 | Siebel Systems, Inc. | Method, apparatus, and system for attaching search results |
US6711565B1 (en) | 2001-06-18 | 2004-03-23 | Siebel Systems, Inc. | Method, apparatus, and system for previewing search results |
US20030004971A1 (en) | 2001-06-29 | 2003-01-02 | Gong Wen G. | Automatic generation of data models and accompanying user interfaces |
WO2003007734A1 (en) | 2001-07-19 | 2003-01-30 | San-Ei Gen F.F.I., Inc. | Flavor-improving compositions and application thereof |
US6826582B1 (en) | 2001-09-28 | 2004-11-30 | Emc Corporation | Method and system for using file systems for content management |
US7761535B2 (en) | 2001-09-28 | 2010-07-20 | Siebel Systems, Inc. | Method and system for server synchronization with a computing device |
US6978445B2 (en) | 2001-09-28 | 2005-12-20 | Siebel Systems, Inc. | Method and system for supporting user navigation in a browser environment |
US6993712B2 (en) | 2001-09-28 | 2006-01-31 | Siebel Systems, Inc. | System and method for facilitating user interaction in a browser environment |
US6724399B1 (en) | 2001-09-28 | 2004-04-20 | Siebel Systems, Inc. | Methods and apparatus for enabling keyboard accelerators in applications implemented via a browser |
US8359335B2 (en) | 2001-09-29 | 2013-01-22 | Siebel Systems, Inc. | Computing system and method to implicitly commit unsaved data for a world wide web application |
US7146617B2 (en) | 2001-09-29 | 2006-12-05 | Siebel Systems, Inc. | Method, apparatus, and system for implementing view caching in a framework to support web-based applications |
US6901595B2 (en) | 2001-09-29 | 2005-05-31 | Siebel Systems, Inc. | Method, apparatus, and system for implementing a framework to support a web-based application |
US7962565B2 (en) | 2001-09-29 | 2011-06-14 | Siebel Systems, Inc. | Method, apparatus and system for a mobile web client |
US7289949B2 (en) | 2001-10-09 | 2007-10-30 | Right Now Technologies, Inc. | Method for routing electronic correspondence based on the level and type of emotion contained therein |
US7062502B1 (en) | 2001-12-28 | 2006-06-13 | Kesler John N | Automated generation of dynamic data entry user interface for relational database management systems |
US6804330B1 (en) | 2002-01-04 | 2004-10-12 | Siebel Systems, Inc. | Method and system for accessing CRM data via voice |
US7058890B2 (en) | 2002-02-13 | 2006-06-06 | Siebel Systems, Inc. | Method and system for enabling connectivity to a data system |
US7672853B2 (en) | 2002-03-29 | 2010-03-02 | Siebel Systems, Inc. | User interface for processing requests for approval |
US7131071B2 (en) | 2002-03-29 | 2006-10-31 | Siebel Systems, Inc. | Defining an approval process for requests for approval |
US6968348B1 (en) * | 2002-05-28 | 2005-11-22 | Providian Financial Corporation | Method and system for creating and maintaining an index for tracking files relating to people |
US6850949B2 (en) | 2002-06-03 | 2005-02-01 | Right Now Technologies, Inc. | System and method for generating a dynamic interface via a communications network |
US7437720B2 (en) | 2002-06-27 | 2008-10-14 | Siebel Systems, Inc. | Efficient high-interactivity user interface for client-server applications |
US8639542B2 (en) | 2002-06-27 | 2014-01-28 | Siebel Systems, Inc. | Method and apparatus to facilitate development of a customer-specific business process model |
US7594181B2 (en) | 2002-06-27 | 2009-09-22 | Siebel Systems, Inc. | Prototyping graphical user interfaces |
US7251787B2 (en) | 2002-08-28 | 2007-07-31 | Siebel Systems, Inc. | Method and apparatus for an integrated process modeller |
US9448860B2 (en) | 2003-03-21 | 2016-09-20 | Oracle America, Inc. | Method and architecture for providing data-change alerts to external applications via a push service |
JP2006521641A (en) | 2003-03-24 | 2006-09-21 | シーベル システムズ,インコーポレイティド | Custom common objects |
WO2004086198A2 (en) | 2003-03-24 | 2004-10-07 | Siebel Systems, Inc. | Common common object |
US7904340B2 (en) | 2003-03-24 | 2011-03-08 | Siebel Systems, Inc. | Methods and computer-readable medium for defining a product model |
US8762415B2 (en) | 2003-03-25 | 2014-06-24 | Siebel Systems, Inc. | Modeling of order data |
US7620655B2 (en) | 2003-05-07 | 2009-11-17 | Enecto Ab | Method, device and computer program product for identifying visitors of websites |
US7409336B2 (en) | 2003-06-19 | 2008-08-05 | Siebel Systems, Inc. | Method and system for searching data based on identified subset of categories and relevance-scored text representation-category combinations |
US20040260659A1 (en) | 2003-06-23 | 2004-12-23 | Len Chan | Function space reservation system |
US7237227B2 (en) | 2003-06-30 | 2007-06-26 | Siebel Systems, Inc. | Application user interface template with free-form layout |
US7694314B2 (en) | 2003-08-28 | 2010-04-06 | Siebel Systems, Inc. | Universal application network architecture |
US8209308B2 (en) | 2006-05-01 | 2012-06-26 | Rueben Steven L | Method for presentation of revisions of an electronic document |
US9135228B2 (en) | 2006-05-01 | 2015-09-15 | Domo, Inc. | Presentation of document history in a web browsing application |
US8566301B2 (en) | 2006-05-01 | 2013-10-22 | Steven L. Rueben | Document revisions in a collaborative computing environment |
US7779475B2 (en) | 2006-07-31 | 2010-08-17 | Petnote Llc | Software-based method for gaining privacy by affecting the screen of a computing device |
US8082301B2 (en) | 2006-11-10 | 2011-12-20 | Virtual Agility, Inc. | System for supporting collaborative activity |
US8954500B2 (en) | 2008-01-04 | 2015-02-10 | Yahoo! Inc. | Identifying and employing social network relationships |
US8719287B2 (en) | 2007-08-31 | 2014-05-06 | Business Objects Software Limited | Apparatus and method for dynamically selecting componentized executable instructions at run time |
US20090100342A1 (en) | 2007-10-12 | 2009-04-16 | Gabriel Jakobson | Method and system for presenting address and mapping information |
US8504945B2 (en) | 2008-02-01 | 2013-08-06 | Gabriel Jakobson | Method and system for associating content with map zoom function |
US8490025B2 (en) | 2008-02-01 | 2013-07-16 | Gabriel Jakobson | Displaying content associated with electronic mapping systems |
US8014943B2 (en) | 2008-05-08 | 2011-09-06 | Gabriel Jakobson | Method and system for displaying social networking navigation information |
US8032297B2 (en) | 2008-05-08 | 2011-10-04 | Gabriel Jakobson | Method and system for displaying navigation information on an electronic map |
US8646103B2 (en) | 2008-06-30 | 2014-02-04 | Gabriel Jakobson | Method and system for securing online identities |
US8510664B2 (en) | 2008-09-06 | 2013-08-13 | Steven L. Rueben | Method and system for displaying email thread information |
US8010663B2 (en) | 2008-11-21 | 2011-08-30 | The Invention Science Fund I, Llc | Correlating data indicating subjective user states associated with multiple users with data indicating objective occurrences |
US8495384B1 (en) * | 2009-03-10 | 2013-07-23 | James DeLuccia | Data comparison system |
US8577849B2 (en) * | 2011-05-18 | 2013-11-05 | Qatar Foundation | Guided data repair |
US8769004B2 (en) | 2012-02-17 | 2014-07-01 | Zebedo | Collaborative web browsing system integrated with social networks |
US8756275B2 (en) | 2012-02-17 | 2014-06-17 | Zebedo | Variable speed collaborative web browsing system |
US8769017B2 (en) | 2012-02-17 | 2014-07-01 | Zebedo | Collaborative web browsing system having document object model element interaction detection |
- 2014-07-22: US application US14/337,352, publication US20150032729A1, status: Abandoned
- 2014-07-22: US application US14/337,505, publication US9760620B2, status: Active
Patent Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5504890A (en) * | 1994-03-17 | 1996-04-02 | Sanford; Michael D. | System for data sharing among independently-operating information-gathering entities with individualized conflict resolution rules |
US20030037051A1 (en) * | 1999-07-20 | 2003-02-20 | Gruenwald Bjorn J. | System and method for organizing data |
US6658423B1 (en) * | 2001-01-24 | 2003-12-02 | Google, Inc. | Detecting duplicate and near-duplicate files |
US6947930B2 (en) * | 2003-03-21 | 2005-09-20 | Overture Services, Inc. | Systems and methods for interactive search query refinement |
US20050234952A1 (en) * | 2004-04-15 | 2005-10-20 | Microsoft Corporation | Content propagation for enhanced document retrieval |
US7305389B2 (en) * | 2004-04-15 | 2007-12-04 | Microsoft Corporation | Content propagation for enhanced document retrieval |
US20060026152A1 (en) * | 2004-07-13 | 2006-02-02 | Microsoft Corporation | Query-based snippet clustering for search result grouping |
US20060117002A1 (en) * | 2004-11-26 | 2006-06-01 | Bing Swen | Method for search result clustering |
US20100070460A1 (en) * | 2005-05-02 | 2010-03-18 | Fuerst Karl | System and method for rule-based data object matching |
US20070027921A1 (en) * | 2005-08-01 | 2007-02-01 | Billy Alvarado | Context based action |
US20070192293A1 (en) * | 2006-02-13 | 2007-08-16 | Bing Swen | Method for presenting search results |
US20080222140A1 (en) * | 2007-02-20 | 2008-09-11 | Wright State University | Comparative web search system and method |
US20090240672A1 (en) * | 2008-03-18 | 2009-09-24 | Cuill, Inc. | Apparatus and method for displaying search results with a variety of display paradigms |
US20100023515A1 (en) * | 2008-07-28 | 2010-01-28 | Andreas Marx | Data clustering engine |
US20120023107A1 (en) * | 2010-01-15 | 2012-01-26 | Salesforce.Com, Inc. | System and method of matching and merging records |
US8782016B2 (en) * | 2011-08-26 | 2014-07-15 | Qatar Foundation | Database record repair |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10366247B2 (en) | 2015-06-02 | 2019-07-30 | ALTR Solutions, Inc. | Replacing distinct data in a relational database with a distinct reference to that data and distinct de-referencing of database data |
WO2017016130A1 (en) * | 2015-07-30 | 2017-02-02 | 中兴通讯股份有限公司 | Message processing method and device |
US11360990B2 (en) | 2019-06-21 | 2022-06-14 | Salesforce.Com, Inc. | Method and a system for fuzzy matching of entities in a database system based on machine learning |
Also Published As
Publication number | Publication date |
---|---|
US20150032738A1 (en) | 2015-01-29 |
US9760620B2 (en) | 2017-09-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9760620B2 (en) | | Confidently adding snippets of search results to clusters of objects |
US10579691B2 (en) | | Application programming interface representation of multi-tenant non-relational platform objects |
US8521758B2 (en) | | System and method of matching and merging records |
US10733212B2 (en) | | Entity identifier clustering based on context scores |
US9465828B2 (en) | | Computer implemented methods and apparatus for identifying similar labels using collaborative filtering |
US11016959B2 (en) | | Trie-based normalization of field values for matching |
US20190114342A1 (en) | | Entity identifier clustering |
US9646246B2 (en) | | System and method for using a statistical classifier to score contact entities |
US9223852B2 (en) | | Methods and systems for analyzing search terms in a multi-tenant database system environment |
US10579692B2 (en) | | Composite keys for multi-tenant non-relational platform objects |
US11714811B2 (en) | | Run-time querying of multi-tenant non-relational platform objects |
US9268822B2 (en) | | System and method for determining organizational hierarchy from business card data |
US11216435B2 (en) | | Techniques and architectures for managing privacy information and permissions queries across disparate database tables |
US20170060919A1 (en) | | Transforming columns from source files to target files |
US10599654B2 (en) | | Method and system for determining unique events from a stream of events |
US20150106390A1 (en) | | Processing user-submitted updates based on user reliability scores |
US20160379265A1 (en) | | Account recommendations for user account sets |
US20160378759A1 (en) | | Account routing to user account sets |
US10817465B2 (en) | | Match index creation |
US10671626B2 (en) | | Identity consolidation in heterogeneous data environment |
US10628384B2 (en) | | Optimized match keys for fields with prefix structure |
US9619458B2 (en) | | System and method for phrase matching with arbitrary text |
US10852926B2 (en) | | Filter of data presentations via user-generated links |
US11436233B2 (en) | | Generating adaptive match keys |
US9659059B2 (en) | | Matching large sets of words |
Legal Events
Code | Title | Description |
---|---|---|
AS | Assignment | Owner name: SALESFORCE.COM, INC., CALIFORNIA; Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: NACHNANI, PAWAN; JAGOTA, ARUN KUMAR; SIGNING DATES FROM 20140714 TO 20140718; REEL/FRAME: 033360/0816 |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |