US7979418B1 - System, method, and computer program product for processing a prefix tree file utilizing a selected agent - Google Patents
System, method, and computer program product for processing a prefix tree file utilizing a selected agent Download PDFInfo
- Publication number
- US7979418B1 US7979418B1 US11/963,589 US96358907A US7979418B1 US 7979418 B1 US7979418 B1 US 7979418B1 US 96358907 A US96358907 A US 96358907A US 7979418 B1 US7979418 B1 US 7979418B1
- Authority
- US
- United States
- Prior art keywords
- file
- agent
- prefix
- prefix tree
- processing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Links
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/322—Trees
Definitions
- the present invention relates to prefix trees, and more particularly to storing data in prefix trees.
- relational databases For providing access to such data.
- use of relational databases for hierarchical data has exhibited various limitations.
- Particularly, relational databases have been limited with respect to storing and accessing very large amounts of data.
- queries for retrieving data for web crawling purposes, storing data, modifying data, etc. have conventionally been unable to be a part of a distributed system.
- selecting data from a traditional relational database usually results in a high number of cross-table joins and constraints to be processed in order to define crawling behavior and to impose limitations on the amount of data to be stored.
- constraints include prevention of target server flooding by keeping an amount of parallel requests low, Internet protocol (IP) address information tracking to avoid parallel visits of target servers on multiple crawler nodes through usage of mutually exclusive IP ranges, as well as focused selections of hyperlinks to follow via user-definable strategies (e.g. only follow links that could point to potentially interesting content).
- IP Internet protocol
- a system, method, and computer program product are provided for processing a prefix tree file utilizing a selected agent.
- a file including a prefix tree is identified.
- an agent is selected from a plurality of agents to process the file. Further, the file is processed utilizing the agent.
- FIG. 1 illustrates a network architecture, in accordance with one embodiment.
- FIG. 2 shows a representative hardware environment that may be associated with the servers and/or clients of FIG. 1 , in accordance with one embodiment.
- FIG. 3 shows a method for processing a prefix tree file utilizing a selected agent, in accordance with one embodiment.
- FIG. 4 shows a system for processing a prefix tree file utilizing a selected agent, in accordance with one embodiment.
- FIG. 5 shows a method for selecting an agent from a plurality of agents for processing a prefix tree file, in accordance with yet another embodiment.
- FIG. 6 shows a system for storing new data in a plurality of distributed prefix trees, in accordance with still yet another embodiment.
- FIG. 7 shows a system for querying a plurality of distributed tries, in accordance with another embodiment.
- FIG. 8 shows uniform resource locators stored in a prefix tree, in accordance with yet another embodiment.
- FIG. 1 illustrates a network architecture 100 , in accordance with one embodiment.
- a plurality of networks 102 is provided.
- the networks 102 may each take any form including, but not limited to a local area network (LAN), a wireless network, a wide area network (WAN) such as the Internet, peer-to-peer network, etc.
- LAN local area network
- WAN wide area network
- peer-to-peer network etc.
- servers 104 which are capable of communicating over the networks 102 .
- clients 106 are also coupled to the networks 102 and the servers 104 .
- Such servers 104 and/or clients 106 may each include a desktop computer, lap-top computer, hand-held computer, mobile phone, personal digital assistant (PDA), peripheral (e.g. printer, etc.), any component of a computer, and/or any other type of logic.
- PDA personal digital assistant
- peripheral e.g. printer, etc.
- any component of a computer and/or any other type of logic.
- at least one gateway 108 is optionally coupled therebetween.
- FIG. 2 shows a representative hardware environment that may be associated with the servers 104 and/or clients 106 of FIG. 1 , in accordance with one embodiment.
- Such figure illustrates a typical hardware configuration of a workstation in accordance with one embodiment having a central processing unit 210 , such as a microprocessor, and a number of other units interconnected via a system bus 212 .
- a central processing unit 210 such as a microprocessor
- the workstation shown in FIG. 2 includes a Random Access Memory (RAM) 214 , Read Only Memory (ROM) 216 , an I/O adapter 218 for connecting peripheral devices such as disk storage units 220 to the bus 212 , a user interface adapter 222 for connecting a keyboard 224 , a mouse 226 , a speaker 228 , a microphone 232 , and/or other user interface devices such as a touch screen (not shown) to the bus 212 , communication adapter 234 for connecting the workstation to a communication network 235 (e.g., a data processing network) and a display adapter 236 for connecting the bus 212 to a display device 238 .
- a communication network 235 e.g., a data processing network
- display adapter 236 for connecting the bus 212 to a display device 238 .
- the workstation may have resident thereon any desired operating system. It will be appreciated that an embodiment may also be implemented on platforms and operating systems other than those mentioned.
- One embodiment may be written using JAVA, C, and/or C++ language, or other programming languages, along with an object oriented programming methodology.
- Object oriented programming (OOP) has become increasingly used to develop complex applications.
- FIG. 3 shows a method 300 for processing a prefix tree file utilizing a selected agent, in accordance with one embodiment.
- the method 300 may be carried out in the context of the architecture and environment of FIGS. 1 and/or 2 . Of course, however, the method 300 may be carried out in any desired environment.
- the prefix tree may include any ordered hierarchical data structure which stores arrays of data with a common prefix. For example, each of a plurality of descendants of a node may include a common prefix. In one embodiment, the prefix tree may be a trie.
- the prefix tree may store a plurality of uniform resource locators (URLs).
- each of the URLs stored in the prefix tree may include a common prefix.
- the common prefix may include the first two characters (e.g. letters, etc.) of a domain name server (DNS) domain name of each of the URLs.
- DNS domain name server
- the common prefix of URLs capable of being stored in the prefix tree may be predefined (e.g. by a user, etc.).
- the characters representative of the common prefix may be predefined.
- the number of characters representative of the common prefix may also be predefined.
- the file may include any resource capable of storing the prefix tree.
- the file may be dedicated to storing the prefix tree.
- the file may only store the prefix tree, as an option.
- the file may be identified in any desired manner.
- the file may be identified based on a selection of the file, as set forth in the example explained below.
- the file may be selected from a plurality of files, each file including a different prefix tree.
- the file may be selected from a queue of files, where each file in the queue of files includes a different prefix tree.
- the file may be selected based on a time of a last access to the file.
- the queue of files may optionally store the files in order of a time each of the files were last accessed, where more recently accessed files are stored in the queue after less recently accessed files.
- an agent is selected from a plurality of agents to process the file.
- the agents may be running on the client computer and/or server computer on which the file is stored. In this way, the selected agent may access the file locally for processing the file.
- each of the agents may include software.
- each of the agents may include a command line executable.
- each of the agents may include hardware with respect to another embodiment.
- a predefined number of the agents e.g. predefined by a user, predefined based on capabilities and/or resources of the computer on which the agents are located, etc. may optionally execute in parallel.
- each of the agents may perform different functions.
- Such functions may include a crawling function (e.g. for crawling sites, such as web sites, indicated by data stored in the prefix tree, etc.), a merging function (e.g. for merging data into the prefix tree, etc.), a separating function (e.g. for separating data within the prefix tree, etc.), a reorganization function (e.g. for reorganizing data stored in the prefix tree, etc.) and/or any other function capable of being performed with respect to the prefix tree and/or any information associated with such prefix tree.
- the agent may optionally be selected based on the type of processing to be performed on the file.
- the agent may be selected based on a determination of whether the agent is in an idle state. For example, only an agent in an idle state may be selected for processing the file. As yet another option, the agent may be selected based on the type of prefix tree included in the file. Thus, an agent capable of categorizing data may be selected for processing a prefix tree containing uncategorized data, just by way of example.
- the file is processed utilizing the agent, as shown in operation 306 .
- the processing may include performing any functions in association with the file.
- the processing may include executing any functions capable of being performed by the agent.
- the processing may include downloading a web page identified using a URL stored in the prefix tree.
- the processing may include analyzing the web page for gathering information associated with the URL. Such information may include, for example, information indicating whether the web page includes unwanted content (e.g. malware, etc.) and/or a type of any unwanted content discovered in the web page, a categorization of the web page, a type of language included in the web page, etc.
- the processing may include categorizing the URL (e.g. based on a categorization of the web page, etc.).
- the prefix tree file may be processed utilizing an agent selected from a plurality of agents.
- the identification of the file (operation 302 ), the selection of the agent (operation 304 ), and the processing of the file (operation 306 ) may be controlled utilizing a controller application.
- the controller application may schedule execution of one of the agents for processing the prefix tree file.
- the controller application may be located on the computer on which the file and/or agents are located, for example.
- FIG. 4 shows a system 400 for processing a prefix tree file utilizing a selected agent, in accordance with one embodiment.
- the system 400 may be implemented in the context of the architecture and environment of FIGS. 1-3 . Of course, however, the system 400 may be implemented in any desired environment. It should also be noted that the aforementioned definitions may apply during the present description.
- a first processing node 402 A is in communication with a second processing node 402 B.
- the first processing node 402 A may communicate with the second processing node 402 B over a network (e.g. such as any of the networks described above with respect to FIG. 1 ), but of course the first processing node 402 A may also communicate with the second processing node 402 B via a local connection.
- the first processing node 402 A may communicate with the second processing node 402 B using a server message block (SMB) protocol, a node specific protocol and/or any other protocol capable of being used for communication purposes.
- SMB server message block
- the first processing node 402 A and the second processing node 402 B may optionally operate as a peer network of independent nodes.
- the first processing node 402 A and the second processing node 402 B may each include any of the devices described above with respect to FIGS. 1 and/or 2 .
- the first processing node 402 A and the second processing node 402 B may each include the Intel® CoreTM 2 Duo processor with 250 gigabytes of hard disk space.
- the first processing node 402 A and the second processing node 402 B may optionally include client computers instead of being consolidated on expensive high-end servers. While only two processing nodes 402 A and 402 B are shown, it should be noted that any desired number of processing nodes may be included in the system 400 , thus allowing scalability of the system 400 .
- the first processing node 402 A and the second processing node 402 B each include a respective queue of prefix trees 404 A and 404 B (hereinafter referred to as tries).
- the queues of tries 404 A and 404 B may each include a plurality of trie files, where each trie file includes a different trie (e.g. a different type of trie, etc.) that stores different data (e.g. URLs), for example.
- tries may each trie file includes a different trie (e.g. a different type of trie, etc.) that stores different data (e.g. URLs), for example.
- tries may each trie file includes a different trie (e.g. a different type of trie, etc.) that stores different data (e.g. URLs), for example.
- tries may be distributed across a plurality of processing nodes for providing distributed processing and load-balancing of such trie files, as described in more detail below.
- the trie files may be stored locally on a hard disk of a respective one of the first processing node 402 A and the second processing node 402 B.
- the queues of tries 404 A and 404 B may access trie files in an order based on a time in which the files were last accessed (e.g. modified, etc.). For example, the least recently accessed trie file may be stored first in the queue 404 A and 404 B, whereas the most recently accessed trie file may be stored last in the queue 404 A and 404 B. Accordingly, after a top trie file in the queue 404 A and 404 B is accessed (e.g. for processing, etc.), such trie file may be returned to the end of the queue 404 A and 404 B.
- each trie may only store data with a common prefix.
- the data may include URLs
- the common prefix may include the first two characters of a DNS domain name of such URLs.
- Table 1 illustrates exemplary trie file names that may be stored in the queues 404 A and 404 B of the first processing node 402 A and the second processing node 402 B.
- such example is set forth for illustrative purposes only, and thus should not be construed as limiting in any manner.
- Trie_1 aa.ct-stores data with the common prefix aa Trie_2: cx.ct-stores data with the common prefix cx Trie_3: de.ct-stores data with the common prefix de Processing Node_2 Trie_1: ab.ct-stores data with the common prefix ab Trie_2: mi.ct-stores data with the common prefix mi
- the URL msdn.microsoft.com would be stored in Trie — 2 of Processing Node — 2 since the first two characters of the DNS domain name (i.e. “microsoft”) of such URL share the common prefix “mi”.
- the URL www.dell.com would be stored in Trie — 3 of Processing Node — 1 since the first two characters of the DNS domain name (i.e. “dell”) of such URL share the common prefix “de”.
- each trie file may be limited to a predetermined size (e.g. 100 megabytes, etc.), such that multiple tries may be capable of being loaded into random access memory (RAM).
- RAM random access memory
- a new trie may be created and added to the queue of tries 404 A and 404 B in which the original trie is stored.
- the new trie may store data with a three character common prefix which includes the two character common prefix.
- the respective tree is split into multiple files with a three character prefix.
- the first processing node 402 A and the second processing node 402 B may also each include a crawler controller 406 A and 406 B.
- the crawler controller 406 A and 406 B may include a controller application, in one embodiment.
- the crawler controller 406 A and 406 B may select the top trie file from a respective queue of tries 404 A and 404 B and may send the selected trie file to one of a plurality of different types of agents 410 A, 410 B, 412 A, 412 B, 414 A, and 414 B located in an agent pool 408 A and 408 B on the same processing node 402 A and 402 B.
- the agent 410 A, 410 B, 412 A, 412 B, 414 A, and 414 B to which the selected trie file is sent may be selected by the crawler controller 406 A and 406 B.
- the agent pool 408 A and 408 B may include a first crawler agent 410 A and 410 B, a second crawler agent 412 A and 412 B, and a maintenance agent 414 A and 414 B.
- the first crawler agent 410 A and 410 B and/or second crawler agent 412 A and 412 B may each categorize data stored in trie files stored in the trie file queue 404 A and 404 B, download content (e.g. web pages, etc.) identified by the data stored in the trie files, analyze such content (e.g. for malware, etc.) and/or gather any other information associated with the data stored in the trie files. New data will be stored in an extra trie file (seed trie).
- the maintenance agent 414 A and 414 B may sort data stored in such trie files that do not have a common prefix (e.g. determine a common prefix of the data), separate such data from such trie files, and optionally merge the data into various other trie files based on a common prefix between the other trie files and the data.
- the crawler controller 406 A of the first processing node 402 A may select a trie file from the top of the queue of tries 404 A stored on the first processing node 402 A.
- the crawler controller 406 A may determine the type of the selected trie file for determining which processing is to be performed on the trie file.
- a trie file storing data without a common prefix may optionally require processing for separating the data and storing the data by common prefix.
- the crawler controller 406 A may select an agent 410 A, 412 A and 414 A from the agent pool 408 A of the first processing node 402 A based on the type of the selected trie file.
- the crawler controller 406 A may select the maintenance agent 414 A for sorting and merging the data into other trie files with prefixes common to the data.
- the crawler controller 406 A may select an agent 410 A, 412 A and 414 A based on which agent 410 A, 412 A and 414 A is idle.
- the crawler controller 406 A of the first processing node 402 A is in communication with the second processing node 402 B.
- the crawler controller 406 B of the second processing node 402 B may also be in communication with the first processing node 402 A.
- the crawler controller 406 A may control the copying of data (e.g. via a file share) stored in trie files of the first processing node 402 A to a trie file in the queue of tries 404 B stored on the second processing node 402 B (e.g. utilizing an SMB protocol, etc.).
- the crawler controller 406 B of the second processing node 402 B may control a merging of the received data into trie files based on an associated common prefix of the data.
- the crawler controller 406 A of the first processing node 402 A may be synchronized with the crawler controller 406 B of the second processing node 402 B via the node specific protocol.
- FIG. 5 shows a method 500 for selecting an agent from a plurality of agents for processing a prefix tree file, in accordance with yet another embodiment.
- the method 500 may be carried out in the context of the architecture and environment of FIGS. 1-4 .
- the method 500 may be carried out utilizing the crawler controller 406 A and 406 B of FIG. 4 .
- the method 500 may be carried out in any desired environment.
- the aforementioned definitions may apply during the present description.
- the idle state may include any state in which an agent is not being utilized (e.g. executed for processing purposes, etc.).
- each of a plurality of agents within a pool of agents may be monitored for determining whether any of such agents are idle.
- each of the plurality of agents may broadcast a signal indicating an associated state, such that an agent with an idle state may be determined based on a signal broadcasted therefrom.
- a prefix tree stored in a queue with an oldest access time is identified. Note operation 504 .
- the prefix tree may be included in a file stored in the queue.
- the prefix tree with the oldest access time may be identified based on an ordering of prefix trees within the queue.
- the queue may store prefix trees in an order based on a latest access time for each prefix tree.
- a prefix tree at a top of the queue may be the prefix tree with the oldest access time.
- the identified prefix tree is sent to the idle agent, as shown in operation 506 .
- the agent may be selected for processing the prefix tree.
- the prefix tree may only be sent to an agent capable of processing the type of the prefix tree.
- the type of the prefix tree may be indicated by the file extension of the file in which the prefix tree is included, in one embodiment.
- the prefix tree may only be sent to an agent capable of categorizing data within the prefix tree.
- the method 500 may wait to identify an idle agent capable of processing the prefix tree prior to sending the prefix tree.
- FIG. 6 shows a system 600 for storing new data in a plurality of distributed prefix trees, in accordance with still yet another embodiment.
- the system 600 may be implemented in the context of the architecture and environment of FIGS. 1-5 .
- the system 600 may be implemented in any desired environment.
- the aforementioned definitions may apply during the present description.
- a first computer 602 is in communication with a second computer 604 A, a third computer 604 B, and a fourth computer 604 C.
- the first computer 602 , second computer 604 A, third computer 604 B, and fourth computer 604 C may each include a processing node (e.g. such as one of the processing nodes 402 A and 402 B of FIG. 4 ), for example.
- the first computer 602 may be in communication with the second computer 604 A, third computer 604 B, and fourth computer 604 C over a network. While not shown, it should be noted that any of the second computer 604 A, third computer 604 B, and fourth computer 604 C may also be in communication with one another.
- the first computer 602 includes a seed trie 606 .
- the seed trie may be stored in a file with a unique extension (e.g. *.st).
- the seed trie may store newly discovered data (e.g. URLs discovered via a crawler crawling web pages on the Internet, etc.).
- the seed trie 606 may store data which is unsorted according to common prefixes of such data. While only a single seed trie 606 is shown, it should be noted that the first computer 602 may also include a plurality of seed tries, in another embodiment. For example, each time a module (e.g. crawler) is finished discovering new data, the crawler may produce a new seed trie containing the newly discovered data.
- a module e.g. crawler
- the first computer 602 also includes a plurality of sorted seed tries 608 A-C.
- the sorted seed tries 608 A-C may also each be included in a file with a unique extension (e.g. *.sst). Additionally, the sorted seed tries 608 A-C may store data which has been sorted from the seed trie according to common prefixes of such data. For example, each sorted seed trie 608 A-C may store data with a single common prefix.
- a maintenance agent may process the data stored in the seed trie 606 by sorting (e.g. grouping, etc.) the data according to common prefixes, separating the data based on the sorting, and storing the sorted data in the sorted seed tries 608 A-C based on common prefixes.
- the sorting may include identifying a name of a target sorted seed trie 608 A-C by extracting a registered domain name from a URL stored in the seed trie 606 and utilizing the first n letters (e.g. where n is predefined) as the name (or part of the name) of the target sorted seed trie 608 A-C.
- Each of the second computer 604 A, third computer 604 B, and fourth computer 604 C also include a sorted seed trie 610 A-C.
- the maintenance agent of the first computer 602 may copy each of the sorted seed tries 608 A-C to a remote computer as a separate file.
- each of the sorted seed tries 608 A-C of the first computer 602 may only be copied to one of the second computer 604 A, third computer 604 B, and fourth computer 604 C if the name of such sorted seed trig 608 A-C matches the names of tries already existing on the second computer 604 A, third computer 604 B, and/or fourth computer 604 C. In this way, each of the second computer 604 A, third computer 604 B, and fourth computer 604 C may store a different sorted seed tree 610 A-C.
- each of the second computer 604 A, third computer 604 B, and fourth computer 604 C include a crawl tree 612 A-C.
- the crawl trees 612 A-C may also each be included in a file with a unique extension (e.g. *.ct).
- the crawl trees 612 -C may be utilized for permanently storing data, as an option.
- the maintenance agent of each of the second computer 604 A, third computer 604 B, and fourth computer 604 C may identify data stored in a respective sorted seed tree 610 A-C and merge the data into an associated crawl tree 612 A-C.
- the merging may include copying data from the sorted seed trie 610 A-C to the associated crawl tree 612 A-C.
- the maintenance agent may also perform operations (e.g. based on user-configured constraints) such as splitting the crawl trie 612 A-C into multiple crawl tries (e.g. due to size limits of the crawl trie 612 A-C), removal of data in the crawl trie 612 A-C that is marked for deletion, etc.
- the maintenance agent may delete data stored in the crawl trie 612 A-C by reorganizing the structure of the crawl trie 612 A-C. Data marked for deletion may be deleted in bulk during other maintenance operations, such as merging, for limiting a performance impact on the crawl trie 612 A-C, as an option.
- the crawl tries 612 A-C may be utilized as input for crawlers.
- each crawler may pull a set of URLs to be visited (e.g. according to defined crawling strategies) from one of the crawl tries 612 A-C, may download an associated web page and may update such crawl trie 612 A-C with metadata.
- the metadata may include, for example, a category of the web page, a main language used by the web page, etc.
- FIG. 7 shows a system 700 for querying a plurality of distributed tries, in accordance with another embodiment.
- the system 700 may be implemented in the context of the architecture and environment of FIGS. 1-6 .
- the system 700 may be implemented in any desired environment.
- the aforementioned definitions may apply during the present description.
- tries may be queried for data stored in such tries.
- a central control facility e.g. user interface
- source code 708 for each of the queries is sent to each processing node that stores a trie file 701 .
- a query agent 706 local to each processing node compiles and binds the source code 708 as dynamic link library (DLL) files 710 and iterates over data in the trie.
- DLL dynamic link library
- Multiple queries may optionally be processed with a single in-order run of a selected trie file 701 .
- Information may be collected during the processing and returned in a normalized format (e.g. query results 712 ) to a central database server, such as the SQL server 702 shown, via a database connector 714 .
- the information may be returned to the central database server for the purpose of collation and reporting, for example.
- the central database server may communicate results of the query to the initiator of the query (e.g. the user interface).
- FIG. 8 shows uniform resource locators (URLs) 800 stored in a prefix tree, in accordance with yet another embodiment.
- the URLs 800 may be implemented in the context of the architecture and environment of FIGS. 1-7 .
- the URLs 800 may be implemented in any desired environment.
- the aforementioned definitions may apply during the present description.
- the URLs 800 include arrays which are stored in the prefix tree.
- the prefix tree may include keys represented as strings, as an option.
- Each element of an array may be a node within the prefix tree.
- descendants of a node may share a common prefix with the node for minimizing storage space used by the prefix tree.
- the URLs http://www.mcafe* and http://earth.goo* may share the common prefix “http://”.
- the URL http://earth.goo* may be a descendant of http://www.mcafe* at the first “w” node within the URL http://www.mcafe*.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Software Systems (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
TABLE 1 |
Processing Node_1 |
Trie_1: | aa.ct-stores data with the common prefix aa | |
Trie_2: | cx.ct-stores data with the common prefix cx | |
Trie_3: | de.ct-stores data with the common prefix de |
Processing Node_2 |
Trie_1: | ab.ct-stores data with the common prefix ab | ||
Trie_2: | mi.ct-stores data with the common prefix mi | ||
Claims (24)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/963,589 US7979418B1 (en) | 2007-12-21 | 2007-12-21 | System, method, and computer program product for processing a prefix tree file utilizing a selected agent |
US13/150,947 US8560521B2 (en) | 2007-12-21 | 2011-06-16 | System, method, and computer program product for processing a prefix tree file utilizing a selected agent |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US11/963,589 US7979418B1 (en) | 2007-12-21 | 2007-12-21 | System, method, and computer program product for processing a prefix tree file utilizing a selected agent |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/150,947 Continuation US8560521B2 (en) | 2007-12-21 | 2011-06-16 | System, method, and computer program product for processing a prefix tree file utilizing a selected agent |
Publications (1)
Publication Number | Publication Date |
---|---|
US7979418B1 true US7979418B1 (en) | 2011-07-12 |
Family
ID=44245636
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US11/963,589 Active 2029-07-19 US7979418B1 (en) | 2007-12-21 | 2007-12-21 | System, method, and computer program product for processing a prefix tree file utilizing a selected agent |
US13/150,947 Active US8560521B2 (en) | 2007-12-21 | 2011-06-16 | System, method, and computer program product for processing a prefix tree file utilizing a selected agent |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US13/150,947 Active US8560521B2 (en) | 2007-12-21 | 2011-06-16 | System, method, and computer program product for processing a prefix tree file utilizing a selected agent |
Country Status (1)
Country | Link |
---|---|
US (2) | US7979418B1 (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2014064564A1 (en) * | 2012-10-22 | 2014-05-01 | Telefonaktiebolaget L M Ericsson (Publ) | Method and system of packet based identifier locator network protocol (ilnp) load balancing and routing |
CN103823823B (en) * | 2013-07-08 | 2016-12-28 | 电子科技大学 | Denormalization policy selection method based on Frequent Itemsets Mining Algorithm |
US10304086B2 (en) * | 2011-06-22 | 2019-05-28 | Skyhook Wireless, Inc. | Techniques for estimating demographic information |
US10438000B1 (en) * | 2017-09-22 | 2019-10-08 | Symantec Corporation | Using recognized backup images for recovery after a ransomware attack |
US10725870B1 (en) * | 2018-01-02 | 2020-07-28 | NortonLifeLock Inc. | Content-based automatic backup of images |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11089056B2 (en) * | 2018-09-28 | 2021-08-10 | Sophos Limited | Intrusion detection with honeypot keys |
CN109508422A (en) * | 2018-12-05 | 2019-03-22 | 南京邮电大学 | The height of multithreading intelligent scheduling is hidden crawler system |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050192966A1 (en) * | 2004-03-01 | 2005-09-01 | Hilbert David M. | Remote file management |
US20080033998A1 (en) * | 2006-08-03 | 2008-02-07 | Yahoo! Inc. | Agent for identifying domains with content arranged for display by a mobile device |
US20080114765A1 (en) * | 2005-06-30 | 2008-05-15 | Fujitsu Limited | Computer program, method, and apparatus for data sorting |
Family Cites Families (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6073142A (en) | 1997-06-23 | 2000-06-06 | Park City Group | Automated post office based rule analysis of e-mail messages and other data objects for controlled distribution in network environments |
US5987610A (en) | 1998-02-12 | 1999-11-16 | Ameritech Corporation | Computer virus screening methods and systems |
US6094706A (en) * | 1998-03-02 | 2000-07-25 | International Business Machines Corporation | Caching in a data processing system using the pigeon hole principle |
US6460050B1 (en) | 1999-12-22 | 2002-10-01 | Mark Raymond Pace | Distributed content identification system |
JP3879350B2 (en) * | 2000-01-25 | 2007-02-14 | 富士ゼロックス株式会社 | Structured document processing system and structured document processing method |
US6976090B2 (en) * | 2000-04-20 | 2005-12-13 | Actona Technologies Ltd. | Differentiated content and application delivery via internet |
US7716163B2 (en) * | 2000-06-06 | 2010-05-11 | Microsoft Corporation | Method and system for defining semantic categories and actions |
US6901519B1 (en) | 2000-06-22 | 2005-05-31 | Infobahn, Inc. | E-mail virus protection system and method |
US7080073B1 (en) * | 2000-08-18 | 2006-07-18 | Firstrain, Inc. | Method and apparatus for focused crawling |
DE10237875A1 (en) * | 2002-08-19 | 2004-03-04 | Siemens Ag | Device, in particular automation device, with a file directory structure stored in a file |
US7328403B2 (en) * | 2003-10-22 | 2008-02-05 | Intel Corporation | Device for structured data transformation |
US7454429B2 (en) * | 2004-02-14 | 2008-11-18 | Alan S Rojer | Declarative Dispatch |
US20060053097A1 (en) * | 2004-04-01 | 2006-03-09 | King Martin T | Searching and accessing documents on private networks for use with captures from rendered documents |
US7877366B2 (en) * | 2004-03-12 | 2011-01-25 | Oracle International Corporation | Streaming XML data retrieval using XPath |
US7861304B1 (en) * | 2004-05-07 | 2010-12-28 | Symantec Corporation | Pattern matching using embedded functions |
US7711542B2 (en) * | 2004-08-31 | 2010-05-04 | Research In Motion Limited | System and method for multilanguage text input in a handheld electronic device |
US8037527B2 (en) * | 2004-11-08 | 2011-10-11 | Bt Web Solutions, Llc | Method and apparatus for look-ahead security scanning |
US7680791B2 (en) * | 2005-01-18 | 2010-03-16 | Oracle International Corporation | Method for sorting data using common prefix bytes |
US7979368B2 (en) * | 2005-07-01 | 2011-07-12 | Crossbeam Systems, Inc. | Systems and methods for processing data flows |
US8769127B2 (en) * | 2006-02-10 | 2014-07-01 | Northrop Grumman Systems Corporation | Cross-domain solution (CDS) collaborate-access-browse (CAB) and assured file transfer (AFT) |
US7599931B2 (en) * | 2006-03-03 | 2009-10-06 | Microsoft Corporation | Web forum crawler |
CA2675216A1 (en) * | 2007-01-10 | 2008-07-17 | Nick Koudas | Method and system for information discovery and text analysis |
US7720869B2 (en) * | 2007-05-09 | 2010-05-18 | Illinois Institute Of Technology | Hierarchical structured abstract file system |
US20080306949A1 (en) * | 2007-06-08 | 2008-12-11 | John Martin Hoernkvist | Inverted index processing |
US20090063538A1 (en) * | 2007-08-30 | 2009-03-05 | Krishna Prasad Chitrapura | Method for normalizing dynamic urls of web pages through hierarchical organization of urls from a web site |
US20090089278A1 (en) * | 2007-09-27 | 2009-04-02 | Krishna Leela Poola | Techniques for keyword extraction from urls using statistical analysis |
-
2007
- 2007-12-21 US US11/963,589 patent/US7979418B1/en active Active
-
2011
- 2011-06-16 US US13/150,947 patent/US8560521B2/en active Active
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050192966A1 (en) * | 2004-03-01 | 2005-09-01 | Hilbert David M. | Remote file management |
US20080114765A1 (en) * | 2005-06-30 | 2008-05-15 | Fujitsu Limited | Computer program, method, and apparatus for data sorting |
US20080033998A1 (en) * | 2006-08-03 | 2008-02-07 | Yahoo! Inc. | Agent for identifying domains with content arranged for display by a mobile device |
Non-Patent Citations (1)
Title |
---|
"Trie," Wikipedia, last modified Dec. 1, 2007, http://en.wikipedia.org/wiki/Trie. |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10304086B2 (en) * | 2011-06-22 | 2019-05-28 | Skyhook Wireless, Inc. | Techniques for estimating demographic information |
WO2014064564A1 (en) * | 2012-10-22 | 2014-05-01 | Telefonaktiebolaget L M Ericsson (Publ) | Method and system of packet based identifier locator network protocol (ilnp) load balancing and routing |
CN104718733A (en) * | 2012-10-22 | 2015-06-17 | 瑞典爱立信有限公司 | Method and system of packet based identifier locator network protocol (ILNP) load balancing and routing |
CN104718733B (en) * | 2012-10-22 | 2018-02-13 | 瑞典爱立信有限公司 | The method and system of packet-based identifier finger URL procotol (ILNP) load balance and Route Selection |
CN103823823B (en) * | 2013-07-08 | 2016-12-28 | 电子科技大学 | Denormalization policy selection method based on Frequent Itemsets Mining Algorithm |
US10438000B1 (en) * | 2017-09-22 | 2019-10-08 | Symantec Corporation | Using recognized backup images for recovery after a ransomware attack |
US10725870B1 (en) * | 2018-01-02 | 2020-07-28 | NortonLifeLock Inc. | Content-based automatic backup of images |
Also Published As
Publication number | Publication date |
---|---|
US8560521B2 (en) | 2013-10-15 |
US20110246531A1 (en) | 2011-10-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8560521B2 (en) | System, method, and computer program product for processing a prefix tree file utilizing a selected agent | |
KR100971863B1 (en) | System and method for batched indexing of network documents | |
US9026566B2 (en) | Generating equivalence classes and rules for associating content with document identifiers | |
US6873982B1 (en) | Ordering of database search results based on user feedback | |
US9576024B2 (en) | Hierarchy of servers for query processing of column chunks in a distributed column chunk data store | |
US9443019B2 (en) | Optimized web domains classification based on progressive crawling with clustering | |
CA2511098C (en) | Dispersing search engine results by using page category information | |
US8266134B1 (en) | Distributed crawling of hyperlinked documents | |
US7308643B1 (en) | Anchor tag indexing in a web crawler system | |
CN109885615B (en) | Index-based block chain light client-oriented range query verifiable query method | |
CN107645508A (en) | A kind of data handling system, method, client and server | |
US8027961B2 (en) | System and method for composite record keys ordered in a flat key space for a distributed database | |
US8041893B1 (en) | System and method for managing large filesystem-based caches | |
CA2475328A1 (en) | Improved systems and methods for ranking documents based upon structurally interrelated information | |
US20080133460A1 (en) | Searching descendant pages of a root page for keywords | |
CN113767377A (en) | Dynamic hash function combination for change detection in distributed storage systems | |
WO2011053377A1 (en) | System for user driven ranking of web pages | |
US8930518B2 (en) | Processing of write requests in application server clusters | |
Deka | NoSQL web crawler application | |
JP2019103039A (en) | Firewall device | |
JP5463988B2 (en) | Configuration information management apparatus, configuration information management program, and configuration information management method | |
US20240289333A1 (en) | Metadata search via n-gram index | |
AU2002351296B2 (en) | System and method for processing a request using multiple database units | |
US20210089507A1 (en) | Systems and methods for providing an adaptive attention-based bloom filter for tree-based information repositories | |
Sachdeva et al. | A novel focused crawler with anti-spamming approach & fast query retrieval |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: MCAFEE, INC., CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:SCHLEMMER, ANDREAS;STEINER, THOMAS C.H.;BLAIMSCHEIN, PETER;REEL/FRAME:020296/0277 Effective date: 20071221 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
AS | Assignment |
Owner name: MCAFEE, LLC, CALIFORNIA Free format text: CHANGE OF NAME AND ENTITY CONVERSION;ASSIGNOR:MCAFEE, INC.;REEL/FRAME:043665/0918 Effective date: 20161220 |
|
AS | Assignment |
Owner name: JPMORGAN CHASE BANK, N.A., NEW YORK Free format text: SECURITY INTEREST;ASSIGNOR:MCAFEE, LLC;REEL/FRAME:045055/0786 Effective date: 20170929 Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: SECURITY INTEREST;ASSIGNOR:MCAFEE, LLC;REEL/FRAME:045056/0676 Effective date: 20170929 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 8TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1552); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 8 |
|
AS | Assignment |
Owner name: MORGAN STANLEY SENIOR FUNDING, INC., MARYLAND Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE PATENT 6336186 PREVIOUSLY RECORDED ON REEL 045056 FRAME 0676. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY INTEREST;ASSIGNOR:MCAFEE, LLC;REEL/FRAME:054206/0593 Effective date: 20170929 Owner name: JPMORGAN CHASE BANK, N.A., NEW YORK Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE REMOVE PATENT 6336186 PREVIOUSLY RECORDED ON REEL 045055 FRAME 786. ASSIGNOR(S) HEREBY CONFIRMS THE SECURITY INTEREST;ASSIGNOR:MCAFEE, LLC;REEL/FRAME:055854/0047 Effective date: 20170929 |
|
AS | Assignment |
Owner name: MCAFEE, LLC, CALIFORNIA Free format text: RELEASE OF INTELLECTUAL PROPERTY COLLATERAL - REEL/FRAME 045055/0786;ASSIGNOR:JPMORGAN CHASE BANK, N.A., AS COLLATERAL AGENT;REEL/FRAME:054238/0001 Effective date: 20201026 |
|
AS | Assignment |
Owner name: MCAFEE, LLC, CALIFORNIA Free format text: RELEASE OF INTELLECTUAL PROPERTY COLLATERAL - REEL/FRAME 045056/0676;ASSIGNOR:MORGAN STANLEY SENIOR FUNDING, INC., AS COLLATERAL AGENT;REEL/FRAME:059354/0213 Effective date: 20220301 |
|
AS | Assignment |
Owner name: JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT AND COLLATERAL AGENT, NEW YORK Free format text: SECURITY INTEREST;ASSIGNOR:MCAFEE, LLC;REEL/FRAME:059354/0335 Effective date: 20220301 |
|
AS | Assignment |
Owner name: JPMORGAN CHASE BANK, N.A., AS ADMINISTRATIVE AGENT, NEW YORK Free format text: CORRECTIVE ASSIGNMENT TO CORRECT THE THE PATENT TITLES AND REMOVE DUPLICATES IN THE SCHEDULE PREVIOUSLY RECORDED AT REEL: 059354 FRAME: 0335. ASSIGNOR(S) HEREBY CONFIRMS THE ASSIGNMENT;ASSIGNOR:MCAFEE, LLC;REEL/FRAME:060792/0307 Effective date: 20220301 |
|
MAFP | Maintenance fee payment |
Free format text: PAYMENT OF MAINTENANCE FEE, 12TH YEAR, LARGE ENTITY (ORIGINAL EVENT CODE: M1553); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY Year of fee payment: 12 |