CN111641664B - Crawler equipment service request method, device and system and storage medium - Google Patents
- Publication number
- CN111641664B (application number CN201910153670.XA)
- Authority
- CN
- China
- Prior art keywords
- service request
- proxy
- long connection
- proxy client
- target station
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
- H04L67/63—Routing a service request depending on the request content or context
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1001—Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
- H04L67/1004—Server selection for load balancing
- H04L67/1014—Server selection for load balancing based on the content of a request
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/56—Provisioning of proxy services
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Computer And Data Communications (AREA)
Abstract
The application provides a service request method, device and system for crawler equipment. When a load balancing device receives a service request sent by crawler equipment deployed in an intranet, and the service request carries a route cookie, the load balancing device sends the service request to the proxy server corresponding to the route cookie. On receiving the service request from the load balancing device, the proxy server determines whether a mapping between the route cookie and a long connection identifier is stored locally; if so, it sends the service request to the corresponding proxy client over the corresponding long connection; otherwise, it selects a long connection according to a first preset rule and sends the service request to the corresponding proxy client. On receiving the service request from the proxy server, the proxy client sends it to the target station. The scheme can reduce cost and improve security and availability.
Description
Technical Field
The present application relates to the field of internet technologies, and in particular, to a method, an apparatus, a system, and a storage medium for requesting service of a crawler device.
Background
Many internet applications rely on crawler technology, which uses crawler equipment, such as crawler robot agents, to carry out frequently repeated operations in place of manual work.
In a typical current network deployment, the robots are deployed on an external network. For security reasons, however, most development resources of the operating clients are not open to the outside, so crawler equipment deployed on the external network cannot use those resources.
To solve this problem, a separate set of development environment resources has to be built on the public network, for example a Redis cluster, an MQ cluster, an RPC dispatching center and a monitoring system. In addition, robots deployed on the external network are themselves insecure and require additional operation and security protection work.
Disclosure of Invention
In view of the above, the present application provides a service request method, apparatus and system for crawler equipment, and a storage medium, which can reduce cost and improve security and availability.
In order to solve the technical problems, the technical scheme of the application is realized as follows:
a crawler equipment service request system, the system comprising: crawler equipment, load balancing equipment, a plurality of proxy servers and a plurality of proxy clients;
the load balancing equipment is configured to, on receiving a service request sent by crawler equipment deployed in an intranet, send the service request to the proxy server corresponding to a route cookie if the service request carries the route cookie;
the proxy server is configured to, on receiving the service request sent by the load balancing equipment, determine whether a mapping between the route cookie and a long connection identifier is stored locally, and if so, send the service request to the corresponding proxy client over the corresponding long connection; otherwise, select a long connection according to a first preset rule and send the service request to the corresponding proxy client;
and the proxy client is configured to send the service request to the target station on receiving the service request sent by the proxy server.
A service request method of a crawler device is applied to any proxy server in a system comprising the crawler device, a load balancing device, a plurality of proxy servers and a plurality of proxy clients, and comprises the following steps:
when a service request transmitted by crawler equipment deployed in an intranet and forwarded by load balancing equipment is received, determining whether a mapping relation between a route cookie carried by the service request and a long connection identifier is locally stored, and if so, transmitting the service request to a corresponding proxy client through a corresponding long connection, so that the proxy client transmits the service request to a target station; otherwise, according to a first preset rule, selecting one long connection and sending the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
A crawler service request device applied to any proxy server in a system comprising a crawler, a load balancing device, a plurality of proxy servers and a plurality of proxy clients, the device comprising: a receiving unit, a determining unit and a transmitting unit;
the receiving unit is used for receiving the service request transmitted by the crawler equipment deployed in the intranet and forwarded by the load balancing equipment;
the determining unit is used for determining, when the receiving unit receives the service request, whether a mapping between the route cookie carried by the service request and a long connection identifier is stored locally;
the sending unit is used for sending the service request to the corresponding proxy client over the corresponding long connection when the determining unit determines that the mapping between the route cookie carried by the service request and a long connection identifier is stored, so that the proxy client sends the service request to the target station; otherwise, selecting a long connection according to a first preset rule and sending the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the crawler service request method when the program is executed.
A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the steps of the crawler service request method.
According to the above technical scheme, a high-availability distributed proxy cluster consisting of a plurality of proxy servers and proxy clients is introduced, so that service requests sent by the crawler equipment are spread across multiple machines, which avoids the drawback of crawler equipment deployed in an intranet relying on a single-machine proxy egress; and requests carrying the same route cookie use the same IP egress, so that a group of requests uses one IP egress as far as possible.
Drawings
FIG. 1 is a schematic diagram of a crawler service request system in an embodiment of the present application;
FIG. 2 is a schematic diagram of a business request flow of crawler equipment according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a device to which the above technology is applied in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail below by referring to the accompanying drawings and examples.
Referring to fig. 1, fig. 1 is a schematic diagram of a service request system of a crawler device in an embodiment of the present application. The system comprises: crawler equipment, load balancing equipment, a plurality of proxy servers and a plurality of proxy clients.
The crawler equipment may be any equipment capable of performing the crawler function, such as a crawler robot.
In the embodiment of the application, the crawler equipment, the load balancing equipment and the proxy server are deployed in an intranet, and the proxy client is deployed in a public network.
Before the crawler equipment sends a service request, each proxy client establishes a long connection with a proxy server when the proxy client comes online;
when establishment of a long connection with a proxy client is completed, the proxy server stores the mapping between the long connection identifier and the proxy client identifier; one proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
That is, one proxy server, or one proxy client, may establish 1 or more long connections with the opposite end.
In fig. 1, two proxy servers and 3 proxy clients are taken as examples, and are respectively a proxy server 1, a proxy server 2, a proxy client 1, a proxy client 2 and a proxy client 3.
Assume that proxy server 1 establishes one long connection each with proxy client 1 and proxy client 2, with long connection identifiers 1 and 2 respectively. The mapping stored on proxy server 1 is then:
long connection identifier 1 -> proxy client identifier 1; long connection identifier 2 -> proxy client identifier 2.
Assume that proxy server 2 establishes two long connections with proxy client 2 and one long connection with proxy client 3, with long connection identifiers 3 and 5 (proxy client 2) and 4 (proxy client 3). The mapping stored on proxy server 2 is then:
long connection identifier 3 -> proxy client identifier 2; long connection identifier 5 -> proxy client identifier 2; long connection identifier 4 -> proxy client identifier 3.
As the second example shows, multiple long connections may also be established between the same two devices.
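The mapping tables of the FIG. 1 example can be sketched as plain dictionaries. This is only an illustrative sketch: the names `connection_tables` and `connections_by_client`, and the string identifiers, are assumptions made here and do not appear in the application.

```python
from collections import defaultdict

# Hypothetical in-memory tables mirroring the FIG. 1 example:
# proxy server -> {long connection identifier -> proxy client identifier}
connection_tables = {
    "proxy_server_1": {1: "proxy_client_1", 2: "proxy_client_2"},
    "proxy_server_2": {3: "proxy_client_2", 5: "proxy_client_2", 4: "proxy_client_3"},
}


def connections_by_client(table):
    """Group long connection identifiers by the proxy client they lead to."""
    grouped = defaultdict(list)
    for connection_id, client_id in table.items():
        grouped[client_id].append(connection_id)
    return {client: sorted(ids) for client, ids in grouped.items()}
```

Grouping the table of proxy server 2 makes the two parallel long connections to proxy client 2 (identifiers 3 and 5) visible at a glance.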
The crawler equipment sends service requests. Service requests that belong to one business operation, such as an order-filling request or a bill-of-lading request, are each treated as a group of requests. The first service request of a group carries no route data (route cookie); a later service request of the group, sent after the route cookie fed back by the load balancing device has been received, carries that route cookie.
A cookie is a piece of data that a server temporarily stores on the user's computer so that the server can identify that computer. When a user browses a website, the web server sends a small file to the user's computer, and the cookie records what the user typed or selected on the site. On the next visit to the same website, the web server checks whether the cookie it left last time is present; if so, it identifies the user from the cookie's content and returns page content specific to that user.
On receiving a service request sent by crawler equipment deployed in the intranet, the load balancing equipment determines whether the service request carries a route cookie;
if the service request carries a route cookie, the load balancing equipment sends the service request to the proxy server corresponding to that route cookie;
if the service request does not carry a route cookie, the load balancing equipment generates a route cookie for the service request and returns it to the crawler equipment;
it then selects a proxy server according to a second preset rule, sends the service request, carrying the generated route cookie, to the selected proxy server, and establishes and stores a mapping between the route cookie and the identifier of the selected proxy server.
The second preset rule may be a load balancing rule, for example selecting a proxy server by polling. The embodiment of the application does not limit this rule; a reasonable rule for selecting the proxy server can be configured according to actual needs.
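The routing decision of the load balancing equipment can be sketched as follows. This is a minimal sketch under stated assumptions, not the application's implementation: the class `LoadBalancer`, the request represented as a dictionary with a `route_cookie` key, the use of `uuid` to generate cookies, and polling as the second preset rule are all illustrative choices. The application does not say what happens when a request carries an unknown cookie; the sketch treats it like a missing one.

```python
import itertools
import uuid


class LoadBalancer:
    """Hypothetical sketch of route-cookie based proxy server selection."""

    def __init__(self, proxy_servers):
        # Polling over the proxy servers stands in for the second preset rule.
        self._polling = itertools.cycle(list(proxy_servers))
        # route cookie -> identifier of the selected proxy server
        self.cookie_to_server = {}

    def route(self, request):
        """Return (proxy_server, route_cookie) for a request dictionary."""
        cookie = request.get("route_cookie")
        if cookie is not None and cookie in self.cookie_to_server:
            # Known route cookie: keep the group on the same proxy server.
            return self.cookie_to_server[cookie], cookie
        # Missing (or, by assumption, unknown) cookie: generate one,
        # pick a server by polling, and store the mapping.
        cookie = uuid.uuid4().hex
        server = next(self._polling)
        self.cookie_to_server[cookie] = server
        # The forwarded request carries the generated cookie; the same
        # value would also be returned to the crawler equipment.
        request["route_cookie"] = cookie
        return server, cookie
```

A later request of the same group that presents the returned cookie is routed to the proxy server chosen for the first request.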
On receiving the service request sent by the load balancing equipment, the proxy server determines whether a mapping between the route cookie and a long connection identifier is stored locally; if so, it sends the service request to the corresponding proxy client over the corresponding long connection; otherwise, it selects a long connection according to a first preset rule and sends the service request to the corresponding proxy client;
after selecting a long connection according to the first preset rule and sending the service request to the corresponding proxy client, the proxy server further establishes a mapping between the route cookie carried in the service request and the identifier of the selected long connection, and locks that long connection identifier;
when selecting a long connection for a service request according to the first preset rule, the proxy server selects only among long connections whose identifiers are not locked. That is, a locked long connection is not assigned to a newly arriving group of requests.
The first preset rule may be a load balancing rule, for example selecting the proxy client by polling. The embodiment of the application does not limit this rule; a reasonable rule for selecting the proxy client can be configured according to actual needs.
If polling is used, a long connection is selected from the unlocked long connections established between the proxy server and its proxy clients. When several long connections exist between one proxy server and one proxy client, locking one of them does not affect the others:
for example, if long connection 3 and long connection 5 both connect proxy server 2 and proxy client 2 and long connection 3 is locked, long connection 5 can still be selected.
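The selection and locking behaviour of a proxy server can be sketched as follows. This is a hedged illustration only: the class `ProxyServer`, the method `pick_connection`, and polling in ascending identifier order are assumptions, and the application does not state when a lock is released, so the sketch never releases one.

```python
class ProxyServer:
    """Hypothetical sketch of long connection selection with locking."""

    def __init__(self, connections):
        # long connection identifier -> proxy client identifier
        self.connections = dict(connections)
        # route cookie -> long connection identifier
        self.cookie_to_connection = {}
        # identifiers of long connections already bound to a group
        self.locked = set()
        self._cursor = 0

    def pick_connection(self, route_cookie):
        """Return the long connection identifier to use for this cookie."""
        if route_cookie in self.cookie_to_connection:
            # Mapping stored locally: reuse the same long connection.
            return self.cookie_to_connection[route_cookie]
        ids = sorted(self.connections)
        # Polling over unlocked connections stands in for the first preset rule.
        for offset in range(len(ids)):
            index = (self._cursor + offset) % len(ids)
            connection_id = ids[index]
            if connection_id not in self.locked:
                self._cursor = (index + 1) % len(ids)
                self.cookie_to_connection[route_cookie] = connection_id
                self.locked.add(connection_id)
                return connection_id
        raise RuntimeError("no unlocked long connection available")
```

With the table of proxy server 2 from the FIG. 1 example, a first group locks long connection 3, and long connection 5 to the same proxy client remains available to a later group.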
On receiving the service request sent by the proxy server, the proxy client sends the service request to the target station.
To reduce cost and achieve high availability, the intranet uses a reverse proxy built from an HTTP + WebSocket service cluster together with a WebSocket client cluster running on ADSL hosts. ADSL hosts are inexpensive, offer plentiful IP resources, and provide dynamic IP addresses.
By introducing a high-availability distributed proxy cluster consisting of a plurality of proxy servers and proxy clients, service requests sent by the crawler equipment are spread across multiple machines, which avoids the drawback of crawler equipment deployed in an intranet relying on a single-machine proxy egress; and requests carrying the same route cookie use the same IP egress, so that a group of requests uses one IP egress as far as possible.
To avoid being blocked by the target station, the embodiment of the application provides an anti-blocking scheme in which the proxy client combines the following two cases:
First case:
in the embodiment of the application, the proxy client adds a disconnect-and-reconnect function so that its IP address is replaced at regular intervals. Specifically:
the proxy client configures a switching timer for the IP address it uses when sending service requests to the target station; when the switching timer expires, the proxy client switches to a new IP address and sends subsequent service requests with it.
A timer duration is set for each IP address, for example 1 hour or 2 hours.
Taking Windows as an example, the proxy client can call the rasdial program. A script implementing this is as follows:
@echo off
:: Initialize the connection data
set adslName=Broadband Connection
set adslUsername=05711937xxxx
set adslPassword=348124
:start
:: Dial the connection
rasdial "%adslName%" %adslUsername% %adslPassword%
echo adsl connecting
:: Record the IP after a successful connection (assumes the English-locale ipconfig label)
for /f "tokens=2 delims=:" %%i in ('ipconfig ^| findstr /i /c:"IPv4 Address"') do set ip=%%i
::echo IP address:%ip%
:: Wait before disconnecting and reconnecting; -n 900 is roughly 15 minutes, about 3600 would be needed for 60 minutes
ping 127.0.0.1 -n 900
:: Disconnect
rasdial "%adslName%" /disconnect
echo adsl disconnect
::
goto start
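Independently of the dial-up tooling, the switching timer of the first case can be sketched as follows. This is an illustrative sketch only: the names `SwitchingTimer` and `ip_for_next_request`, the injectable clock, and the `switch_ip` callback (which would perform the actual redial) are assumptions and not part of the application.

```python
import time


class SwitchingTimer:
    """Hypothetical per-IP timer that expires after a fixed interval."""

    def __init__(self, interval_seconds, clock=time.monotonic):
        self.interval = interval_seconds
        self._clock = clock
        self._started = clock()

    def expired(self):
        return self._clock() - self._started >= self.interval

    def restart(self):
        self._started = self._clock()


def ip_for_next_request(timer, current_ip, switch_ip):
    """Return the IP address to use, switching first if the timer expired."""
    if timer.expired():
        # switch_ip is an assumed callback that redials and returns the new IP.
        current_ip = switch_ip()
        timer.restart()
    return current_ip
```

Passing the clock as a parameter keeps the sketch easy to exercise without waiting for a real hour to pass.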
Second case:
before sending a received service request to the target station, the proxy client first sends a connection request to the target station;
if no response from the target station is received within a first preset time, or a rejection response is received, the proxy client switches the currently used IP address and sends the connection request again;
this is repeated until an acceptance response from the target station is received within the first preset time after a connection request is sent; the service request is then sent to the target station using the IP address that sent that connection request.
After the service request has been sent to the target station, if no response is received within a second preset time, or the received response contains an error keyword configured on the proxy client, the proxy client switches the currently used IP address;
after switching, the proxy client probes the target station; if the probe fails, it switches the IP address and probes again, repeating until a probe succeeds, and then sends the service request using the IP address with which the probe succeeded.
That is, a connection request is sent before the service request; if an acknowledgment is received, meaning the connection succeeded, the corresponding IP address is used to send the service request;
once the corresponding service response is received, the service request is considered successfully processed; otherwise the IP address is switched until the service request is processed successfully. If the switching timer of the IP address in use expires during this process, the IP address is likewise switched.
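The probe-then-send loop of the second case can be sketched as follows. This is a minimal sketch under stated assumptions: the function name and the `probe`, `send` and `switch_ip` callbacks are hypothetical, a timeout is modelled as `probe` returning `False` or `send` returning `None`, and the `max_attempts` bound is added here for safety, whereas the application repeats until success.

```python
def send_with_ip_switching(request, ip, probe, send, switch_ip,
                           error_keywords, max_attempts=5):
    """Probe the target station, send the request, switch IP on failure.

    probe(ip) -> bool          True if the target accepts the connection request
    send(ip, request) -> str   response body, or None on timeout
    switch_ip() -> str         redial and return the new IP address
    """
    for _ in range(max_attempts):
        if not probe(ip):
            # No response or a rejection: switch IP and probe again.
            ip = switch_ip()
            continue
        response = send(ip, request)
        failed = response is None or any(k in response for k in error_keywords)
        if not failed:
            return ip, response
        # Timeout or configured error keyword: switch IP, then probe again.
        ip = switch_ip()
    raise RuntimeError("request could not be processed within max_attempts")
```

In this sketch every switch is followed by a fresh probe before the service request is retried, matching the order described above.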
Based on the same inventive concept, the embodiment of the application also provides a service request method of the crawler equipment, which is applied to any one of the proxy servers in the system comprising the crawler equipment, the load balancing equipment, the proxy servers and the proxy clients.
When a proxy client comes online, it establishes a long connection with a proxy server;
when establishment of the long connection with the proxy client is completed, the proxy server stores the mapping between the long connection identifier and the proxy client identifier; one proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
Referring to fig. 2, fig. 2 is a schematic diagram of a service request flow of a crawler according to an embodiment of the present application. The method comprises the following specific steps:
Step 201, the proxy server receives a service request that was sent by crawler equipment deployed in an intranet and forwarded by the load balancing equipment.
On receiving a service request sent by crawler equipment deployed in the intranet, the load balancing equipment sends the service request to the proxy server corresponding to the route cookie if the service request carries a route cookie;
if the service request does not carry a route cookie, the load balancing equipment generates a route cookie for the service request and returns it to the crawler equipment;
selects a proxy server according to a second preset rule and sends the service request, carrying the generated route cookie, to the selected proxy server;
and establishes and stores a mapping between the route cookie and the identifier of the selected proxy server.
Step 202, the proxy server determines whether a mapping between the route cookie carried by the service request and a long connection identifier is stored locally; if so, step 203 is executed; otherwise, step 204 is executed.
Step 203, the proxy server sends the service request to the corresponding proxy client over the corresponding long connection, so that the proxy client sends the service request to the target station, and the flow ends.
Step 204, the proxy server selects a long connection according to a first preset rule and sends the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
After selecting a long connection according to the first preset rule and sending the service request to the corresponding proxy client, the proxy server further establishes a mapping between the route cookie carried in the service request and the identifier of the selected long connection, and locks that long connection identifier;
when selecting a long connection for a service request according to the first preset rule, the proxy server selects only among long connections whose identifiers are not locked.
To prevent the IP address from being blocked, the processing performed by the proxy client further includes:
the proxy server causes the proxy client to send a connection request to the target station before sending the service request to the target station; if no response from the target station is received within the first preset time, or a rejection response is received, the proxy client switches the currently used IP address and sends the connection request again; this is repeated until an acceptance response from the target station is received within the first preset time after a connection request is sent, and the service request is then sent to the target station using the IP address that sent that connection request.
After the proxy server has caused the proxy client to send the service request to the target station using the IP address that sent the connection request, if no response from the target station is received within the second preset time, or the received response contains an error keyword configured on the proxy client, the currently used IP address is switched;
after switching the currently used IP address, the proxy client probes the target station; if the probe fails, it switches the IP address and probes again, repeating until a probe succeeds, and then sends the service request using the IP address with which the probe succeeded.
In combination with the above anti-blocking processing, a scheme for periodically replacing the IP address is also provided, specifically as follows:
the proxy server causes the proxy client to configure a switching timer for the IP address used when sending service requests to the target station; when the switching timer expires, the proxy client switches to a new IP address and sends subsequent service requests with it.
Based on the same inventive concept, the embodiment of the application also provides a service request device of the crawler equipment, which is applied to any one of the proxy servers in the system comprising the crawler equipment, the load balancing equipment, the proxy servers and the proxy clients. Referring to FIG. 3, FIG. 3 is a schematic structural diagram of a device to which the above technology is applied in an embodiment of the present application. The device comprises: a receiving unit 301, a determining unit 302, and a sending unit 303;
the receiving unit 301 is configured to receive a service request sent by a crawler device deployed in an intranet and forwarded by a load balancing device;
a determining unit 302, configured to determine, when the receiving unit 301 receives a service request, whether a mapping between the route cookie carried by the service request and a long connection identifier is stored locally;
a sending unit 303, configured to send, when the determining unit 302 determines that the mapping between the route cookie carried by the service request and a long connection identifier is stored, the service request to the corresponding proxy client over the corresponding long connection, so that the proxy client sends the service request to the target station; otherwise, to select a long connection according to a first preset rule and send the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
Preferably, the apparatus further comprises: an establishing unit 304;
the establishing unit 304 is configured to store, when establishment of a long connection with a proxy client is completed, the mapping between the long connection identifier and the proxy client identifier; one proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
Preferably,
the establishing unit 304 is further configured to, after a long connection has been selected according to the first preset rule and the service request has been sent to the corresponding proxy client, establish a mapping between the route cookie carried in the service request and the identifier of the selected long connection, and lock that long connection identifier; when a long connection is selected for a service request according to the first preset rule, only long connections whose identifiers are not locked are selected.
The units of the above embodiments may be integrated or may be separately deployed; can be combined into one unit or further split into a plurality of sub-units.
In addition, the embodiment of the application also provides electronic equipment, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the steps of the service request method of the crawler equipment are realized when the processor executes the program.
In addition, the embodiment of the application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the crawler equipment service request method.
In summary, the application introduces a high-availability distributed proxy cluster consisting of a plurality of proxy servers and proxy clients, so that service requests sent by the crawler equipment are spread across multiple machines, which avoids the drawback of crawler equipment deployed in an intranet relying on a single-machine proxy egress; and requests carrying the same route cookie use the same IP egress, so that a group of requests uses one IP egress as far as possible.
In addition, the proxy client switches the IP address used for sending service requests at regular intervals, so as to avoid being blocked.
The foregoing describes only preferred embodiments of the application and is not intended to limit it; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the application shall fall within its scope of protection.
Claims (12)
1. A crawler equipment service request system, the system comprising: crawler equipment, load balancing equipment, a plurality of proxy servers and a plurality of proxy clients;
the load balancing equipment is configured to, on receiving a service request sent by crawler equipment deployed in an intranet, send the service request to the proxy server corresponding to a route cookie if the service request carries the route cookie;
the proxy server is configured to, on receiving the service request sent by the load balancing equipment, determine whether a mapping between the route cookie and a long connection identifier is stored locally, and if so, send the service request to the corresponding proxy client over the corresponding long connection; otherwise, select a long connection according to a first preset rule and send the service request to the corresponding proxy client;
and the proxy client is configured to send the service request to the target station on receiving the service request sent by the proxy server.
2. A method for requesting service of a crawler device, which is applied to any proxy server in a system including the crawler device, a load balancing device, a plurality of proxy servers and a plurality of proxy clients, the method comprising:
when a service request transmitted by crawler equipment deployed in an intranet and forwarded by load balancing equipment is received, determining whether a mapping relation between a route cookie carried by the service request and a long connection identifier is locally stored, and if so, transmitting the service request to a corresponding proxy client through a corresponding long connection, so that the proxy client transmits the service request to a target station; otherwise, according to a first preset rule, selecting one long connection and sending the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
3. The method according to claim 2, wherein the method further comprises:
when establishment of a long connection with a proxy client is completed, storing the mapping relation between the long connection identifier and the proxy client identifier; wherein one proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
4. The method according to claim 2, wherein the method further comprises:
after selecting one long connection according to the first preset rule and sending the service request to the corresponding proxy client, establishing a mapping relation between the route cookie carried in the service request and the long connection identifier of the selected long connection, and locking the long connection identifier;
and, when selecting a long connection for a service request according to the first preset rule, selecting from the long connections other than those corresponding to locked long connection identifiers.
5. The method according to claim 2, wherein the method further comprises:
before sending the service request to the target station, the proxy client sends a connection request to the target station; if no response from the target station is received within a first preset time, or a rejection response from the target station is received, the proxy client switches the currently used IP address and sends a connection request to the target station again; this is repeated until an acceptance response from the target station is received within the first preset time after a connection request is sent, whereupon the service request is sent to the target station using the IP address that was used for that connection request.
6. The method according to claim 5, wherein the method further comprises:
after the proxy client sends the service request to the target station using the IP address used for sending the connection request, if no response from the target station is received within a second preset time, or the received response carries an error keyword configured on the proxy client, the proxy client switches the currently used IP address;
after switching the currently used IP address, the proxy client detects the target station; if the detection fails, the proxy client switches the IP address again and detects again, until a detection succeeds; the service request is then sent using the IP address with which the detection succeeded.
7. The method according to any one of claims 2-6, wherein the method further comprises:
the proxy client configures a switching timer for the IP address used when sending service requests to the target station; when the switching timer expires, the proxy client switches the IP address and sends the service request using the new IP address.
8. A crawler service request apparatus, which is applied to any proxy server in a system including crawler equipment, a load balancing device, a plurality of proxy servers, and a plurality of proxy clients, the apparatus comprising: a receiving unit, a determining unit and a sending unit;
the receiving unit is used for receiving the service request sent by the crawler equipment deployed in the intranet and forwarded by the load balancing equipment;
the determining unit is used for determining, when the receiving unit receives the service request, whether a mapping relation between the route cookie carried by the service request and a long connection identifier is stored locally;
the sending unit is used for sending the service request to the corresponding proxy client through the corresponding long connection when the determining unit determines that the mapping relation between the route cookie carried by the service request and the long connection identifier is stored, so that the proxy client sends the service request to the target station; otherwise, selecting one long connection according to a first preset rule and sending the service request to the corresponding proxy client, so that the proxy client sends the service request to the target station.
9. The apparatus of claim 8, wherein the apparatus further comprises: an establishing unit;
the establishing unit is used for storing the mapping relation between the long connection identifier and the proxy client identifier when establishment of a long connection with a proxy client is completed; wherein one proxy server establishes long connections with one or more proxy clients, and one proxy client establishes long connections with one or more proxy servers.
10. The apparatus of claim 8, wherein the apparatus further comprises:
the establishing unit is used for establishing a mapping relation between the route cookie carried in the service request and the long connection identifier of the selected long connection, and locking the long connection identifier, after the sending unit selects one long connection according to the first preset rule and sends the service request to the corresponding proxy client; and, when a long connection is selected for a service request according to the first preset rule, selecting from the long connections other than those corresponding to locked long connection identifiers.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method of any of claims 2-7 when executing the program.
12. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method of any of claims 2-7.
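The IP-switching behaviour recited in claims 5 to 7 can be sketched as follows. This is an illustrative sketch only: the class `ExitIpManager`, its callback parameters `switch_ip` and `probe`, the `max_attempts` bound, and the injected `clock` are hypothetical and do not appear in the claims, which place no upper limit on the number of switches. `probe` stands for the detection of the target station (for example, a connection request answered within the first preset time).

```python
import time
from typing import Callable, Optional


class ExitIpManager:
    """Hypothetical proxy-client helper that picks and rotates the exit IP."""

    def __init__(
        self,
        switch_ip: Callable[[], str],      # assumed: obtains a new exit IP (e.g. redial)
        probe: Callable[[str], bool],      # assumed: True if the target accepts this IP
        switch_interval_s: float,          # switching timer period (claim 7)
        max_attempts: int = 5,             # assumption: bound added for the sketch
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._switch_ip = switch_ip
        self._probe = probe
        self._interval = switch_interval_s
        self._max_attempts = max_attempts
        self._clock = clock
        self.current_ip: Optional[str] = None
        self._deadline = 0.0

    def rotate(self) -> str:
        """Switch IP and detect the target; repeat until a detection succeeds."""
        for _ in range(self._max_attempts):
            candidate = self._switch_ip()
            if self._probe(candidate):
                self.current_ip = candidate
                # Restart the switching timer for the newly selected IP.
                self._deadline = self._clock() + self._interval
                return candidate
        raise RuntimeError("no exit IP passed the detection")

    def ip_for_request(self) -> str:
        """IP to use for the next service request; rotates when the timer expires."""
        if self.current_ip is None or self._clock() >= self._deadline:
            return self.rotate()
        return self.current_ip

    def report_failure(self) -> str:
        """Call on a timeout or an error-keyword response (claim 6) to force a switch."""
        return self.rotate()
```

A caller would ask `ip_for_request()` before each service request and call `report_failure()` when the target station does not respond within the second preset time or returns a response containing a configured error keyword.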
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910153670.XA CN111641664B (en) | 2019-03-01 | 2019-03-01 | Crawler equipment service request method, device and system and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910153670.XA CN111641664B (en) | 2019-03-01 | 2019-03-01 | Crawler equipment service request method, device and system and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111641664A CN111641664A (en) | 2020-09-08 |
CN111641664B CN111641664B (en) | 2023-12-05 |
Family
ID=72330426
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910153670.XA Active CN111641664B (en) | 2019-03-01 | 2019-03-01 | Crawler equipment service request method, device and system and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111641664B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114143368B (en) * | 2021-12-21 | 2022-12-30 | 苏州万店掌网络科技有限公司 | Communication method and device |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103678311A (en) * | 2012-08-31 | 2014-03-26 | 腾讯科技(深圳)有限公司 | Webpage access method and system based on transfer mode and path capturing server |
CN103914568A (en) * | 2014-04-24 | 2014-07-09 | 厦门市美亚柏科信息股份有限公司 | Method and device for dispatching HTTP proxy |
CN105740384A (en) * | 2016-01-27 | 2016-07-06 | 浪潮软件集团有限公司 | Crawler agent automatic switching method and device |
CN107948329A (en) * | 2018-01-03 | 2018-04-20 | 湖南麓山云数据科技服务有限公司 | A kind of cross-domain processing method and system |
CN108345642A (en) * | 2018-01-12 | 2018-07-31 | 深圳壹账通智能科技有限公司 | Method, storage medium and the server of website data are crawled using Agent IP |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105335511A (en) * | 2015-10-30 | 2016-02-17 | 百度在线网络技术(北京)有限公司 | Webpage access method and device |
Also Published As
Publication number | Publication date |
---|---|
CN111641664A (en) | 2020-09-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107483260B (en) | Fault processing method and device and electronic equipment | |
US7518983B2 (en) | Proxy response apparatus | |
US8099510B2 (en) | Relay device and program product, allowing continued communication via an alternative protocol | |
CN111190747A (en) | Message loss detection method and device for message queue | |
EP2939401B1 (en) | Method for guaranteeing service continuity in a telecommunication network and system thereof | |
CN112398847B (en) | Intranet penetration method and system based on TCP Socket and improved heartbeat mechanism | |
CN111447185A (en) | Processing method of push information and related equipment | |
CN108377247B (en) | Message pushing method and device | |
CN104601702B (en) | Cluster remote procedure calling (PRC) method and system | |
CN107528891B (en) | Websocket-based automatic clustering method and system | |
CN108712457A (en) | Back-end server dynamic load method of adjustment and device based on Nginx reverse proxys | |
CN107124483A (en) | Domain name analytic method and server | |
CN113347037B (en) | Data center access method and device | |
CN111427703A (en) | Industrial data real-time display method and system | |
CN111641664B (en) | Crawler equipment service request method, device and system and storage medium | |
US6807582B1 (en) | Interprocess communication system | |
CN104009961A (en) | PPPoE session ID distribution method and equipment thereof | |
CN106470249A (en) | Gateway-whois domain name registration querying method and device | |
CN103428171A (en) | Session processing method, application server and system | |
CN108632355B (en) | Routing method for household appliance network, control terminal, readable storage medium and equipment | |
CN114866596B (en) | Session processing method, session processing device, server and storage medium | |
CN113452800B (en) | Method for realizing load balance based on multiple Broker in MQTT protocol | |
CN111385324A (en) | Data communication method, device, equipment and storage medium | |
US20230146880A1 (en) | Management system and management method | |
CN110677417A (en) | Anti-crawler system and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||