[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

US20130173806A1 - Load-balancing cluster - Google Patents

Load-balancing cluster Download PDF

Info

Publication number
US20130173806A1
US20130173806A1 US13/495,085 US201213495085A US2013173806A1 US 20130173806 A1 US20130173806 A1 US 20130173806A1 US 201213495085 A US201213495085 A US 201213495085A US 2013173806 A1 US2013173806 A1 US 2013173806A1
Authority
US
United States
Prior art keywords
server
request
servers
connection
resource
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US13/495,085
Inventor
Christopher Newton
Maksim Yevmenkin
David Fullagar
Jeffrey G. Koller
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Level 3 Communications LLC
Original Assignee
Level 3 Communications LLC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Level 3 Communications LLC filed Critical Level 3 Communications LLC
Priority to US13/495,085 priority Critical patent/US20130173806A1/en
Priority to PCT/US2012/071674 priority patent/WO2013101844A1/en
Priority to CA2862339A priority patent/CA2862339C/en
Priority to EP12862575.3A priority patent/EP2798513B1/en
Publication of US20130173806A1 publication Critical patent/US20130173806A1/en
Priority to HK15103847.2A priority patent/HK1203654A1/en
Abandoned legal-status Critical Current

Links

Images

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • H04L67/1008Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • H04L67/1014Server selection for load balancing based on the content of a request
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/01Protocols
    • H04L67/10Protocols in which an application is distributed across nodes in the network
    • H04L67/1001Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004Server selection for load balancing
    • H04L67/1025Dynamic adaptation of the criteria on which the server selection is based
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00Network arrangements or protocols for supporting network services or applications
    • H04L67/14Session management
    • H04L67/148Migration or transfer of sessions

Definitions

  • This invention relates to content delivery.
  • MAC Media Access Control
  • MAC address means Media Access Control address
  • IP Internet Protocol
  • TCP Transmission Control Protocol
  • IP address means an address used in the Internet Protocol to identify electronic devices such as servers and the like;
  • ARP Address Resolution Protocol
  • HTTP Hyper Text Transfer Protocol
  • URL means Uniform Resource Locator
  • IGMP Internet Group Management Protocol
  • DNS Domain Name System
  • FIG. 1 depicts a load-aware load-balancing cluster
  • FIG. 2 depicts an exemplary TCP connection handoff
  • FIGS. 3-4 are flowcharts of a TCP connection handoff
  • FIG. 5 depicts a collection of load-balancing clusters
  • FIGS. 6A-6B depict a flowchart of processing associated with server interactions
  • FIG. 7 is a flowchart of processing associated with server interactions
  • FIGS. 8A-8B show categorization of server load
  • FIG. 9 is flowchart of processing associated with server load categorization
  • FIG. 10 is an exemplary load table
  • FIG. 11 is a flowchart of processing associated servers sharing load information
  • FIGS. 12A-12B are flowcharts of processing associated with load-aware request processing.
  • FIGS. 13-15 are flowcharts of processing associated with request processing.
  • a load-balancing cluster 10 is formed by an n-port switch 12 connected to a number (between 1 and n) of servers 14 - 1 , 14 - 2 , . . . , 14 - m , where m ⁇ n (collectively “servers 14 ”) via ports 16 - 1 , 16 - 2 , . . . , 16 - n . Not every port 16 - k of the switch 12 needs to have an actual (or operating) server 14 connected thereto.
  • the switch 12 is preferably an Ethernet switch.
  • Each server 14 - j includes a processor (or collection of processors) constructed and adapted to provide data in response to requests.
  • all servers are the same and run the same version of operating system (OS), with same kernel and software.
  • OS operating system
  • the servers may be any server running any type of server processes.
  • the servers need not all be the homogeneous, and heterogeneous servers are contemplated herein.
  • Each server 14 - j in the cluster 10 is addressable by a unique hardware address—in the case of the Ethernet, a so-called a MAC address (also known sometimes as an Ethernet address).
  • a MAC address also known sometimes as an Ethernet address
  • the MAC or actual hardware address of the j-th cluster server is denoted MACj.
  • the servers 14 in the load-balancing cluster 10 are all assigned the same virtual IP address (VIP), e.g., “10.0.0.1”.
  • VIP virtual IP address
  • Each server preferably also has at least one other unique (preferably local) IP address, denoted IPj for the j-th cluster server.
  • IPj IP address
  • a VIP address is also has MAC address (denoted MACVIP) associated with it, with the VIP's MAC address being shared by all the servers in a cluster. That is, in preferred embodiments, the (VIP, VIP's MAC address) pair, i.e., (VIP, MACVIP) is the same for all the servers in a cluster.
  • each server also preferably has its own private (IP address, IP's MAC address) pair (e.g., (IPi, MACi)).
  • the servers 14 in cluster 10 are addressable externally (e.g., from network 17 , e.g., the Internet) via the local (Ethernet) network 13 and switch 12 .
  • network 17 e.g., the Internet
  • switch 12 forwards Ethernet frames, preferably as fast and as efficiently as possible.
  • the switch 12 may perform one-to-one (unicast) forwarding or one-to-many (broadcast or multicast) forwarding. In unicast forwarding a packet enters the switch on one port and leaves the switch on another port.
  • the switch 12 may use so-called “IGMP snooping” to learn which physical ports belong to live servers. In case of an “unlearned” unicast MAC address, the switch 12 forwards incoming traffic to all ports.
  • the system is not limited by the manner in which the switch 12 provides packets to the servers 14 connected thereto. Those skilled in the art will realize and understand, upon reading this description, that different and/or other methods of achieving this result may be used.
  • an Ethernet MAC address is used to identify a particular host machine connected to the network.
  • a protocol such as, e.g., ARP, may be used to translate between a host's IP address and its Ethernet MAC address.
  • ARP e.g., ARP
  • IP router provides a gateway between two (or more) IP networks.
  • the purpose of an IP router is to forward IP packets from one IP network to another.
  • An IP router should have an interface and IP address in each network to which it is connected. So, IP router 11 has at least two interfaces and two IP addresses: one IP address to connect to the upstream network ( 17 in FIG. 1 ) and the other IP address to connect to the local Ethernet network ( 13 in FIG. 1 ).
  • a request from client 19 is made to the IP address VIP (via network 17 ) and reaches the router 11 .
  • the request comes into the router 11 via the interface connected to the upstream network 17 , and the router 11 forwards the request to the VIP (on the local Ethernet network 13 ).
  • the router 11 encapsulates the IP packet (i.e., the request) into an Ethernet packet before sending it. In order for the router 11 to know where to send the Ethernet packet, the router makes an ARP request. Once the Ethernet packet is sent, the switch 12 forwards it to the server(s) 14 .
  • a router In order to affect ARP mapping, a router (e.g., router 11 ) typically maintains a so-called ARP table 15 (mapping IP addresses to the MAC addresses of hosts connected thereto). In this manner, when an IP packet is sent to a particular host that is connected to the router 11 , the router automatically resolves to the destination host's MAC address and forwards the packet to the appropriate host. The router 11 will try to deliver the IP packet directly to destination (i.e., the VIP) because the router is connected to the same local Ethernet network.
  • ARP table 15 mapping IP addresses to the MAC addresses of hosts connected thereto
  • Certain special MAC addresses can be used to instruct a switch to broadcast (or multicast) a packet, thereby providing a packet to all hosts connected to that switch.
  • an Ethernet switch sends a packet with a broadcast or multicast MAC address in its destination field to every port (or every port with a server connected thereto), whereby every host/server connected to the Ethernet switch should get a copy of the packet.
  • the client 19 has the IP address of a server (in this case VIP), and tries to establish a connection via the network 17 and the router 11 .
  • VIP IP address of a server
  • the router 11 When the router 11 gets a request to connect to a server with the IP address VIP (shared by the cluster servers 14 - j ), the router maps the IP address VIP to a special MAC address that causes the switch 12 to forward the request to each server connected thereto.
  • the switch 12 treats the MAC address for a VIP as a multicast Ethernet address. Consequently, each member of the cluster 12 (i.e., each server 14 ) sees all incoming traffic (addressed to VIP).
  • the router's ARP table 15 thus gets a multicast Ethernet address for the VIP, and thus, at the IP layer, all incoming traffic to the VIP address is provided to all servers 14 connected to the switch 12 .
  • the switch 12 maintains a so-called “forwarding database,” (FDB 23 in FIG. 1 ) to map destination Ethernet MAC addresses to physical Ethernet ports 16 on switch 12 .
  • FDB 23 forwarding database
  • the switch queries the forwarding database (e.g., using the destination MAC address as a key) and tries determine which physical port should be used to send the Ethernet packet out.
  • This forwarding database 23 allows switch 12 to forward Ethernet packets only where they should go.
  • switch 12 when switch 12 receives an Ethernet packet and cannot find an entry in its forwarding database for a destination Ethernet MAC address (i.e., e.g., in the case of an unknown/unlearned MAC address), the switch forwards such an Ethernet packet to all the ports (except the one it came from).
  • a multicast Ethernet MAC address has entry in the switch's 12 forwarding database instructing it to forward Ethernet packet to multiple ports 16 .
  • An Ethernet switch will generally try to learn by looking at the MAC addresses of all the Ethernet packets passed through the switch and will try to update its forwarding database accordingly. However, it is preferable to ensure that the switch 12 never “learns” about MAC address for the VIP and never builds an association between VIP cluster MAC addresses and physical ports 16 . The switch 12 is thereby forced to always forward Ethernet packets destined for the cluster MAC address (and thus the cluster VIP) to multiple/all ports 16 .
  • a TCP connection must be established between the client 19 and that cluster server 14 .
  • a TCP connection is established between two machines, in part, using a well-known three-way handshake (SYN, SYN/ACK, ACK). This protocol is described, e.g., in “RFC 793—Transmission Control Protocol,” September 1991, the entire contents of which are incorporated herein by reference for all purposes.
  • each cluster member i.e., each server 14
  • each cluster member effectively decides which server 14 will handle a connection.
  • each cluster member decides for itself whether or not to handle a connection.
  • the other cluster members do not handle (and need not even see) traffic related to that connection. The manner of server selection is described below.
  • Each cluster member (server) includes a stateful firewall (FW) mechanism that is used to filter unwanted incoming traffic.
  • the firewall mechanism for the j-th server is denoted 20 - j .
  • the firewall Upon receipt of an IP packet, the firewall first determines whether the packet is for an old (i.e., already established) connection or for a new connection. For already-established connections each firewall mechanism is configured to reject incoming traffic that does not have an entry in its firewall state table 22 , and only to accept incoming traffic that does have an entry in its firewall state table.
  • the firewall table for the j-th server is denoted 22 - j .
  • the firewall must still inspect packets associated with new connections (i.e., connections in the process of being established, specifically packets with only SYN flag set). To summarize: first the firewalls make a decision as to whether an IP packet is “new” or “old”. If the packet is “old” then it is discarded unless a state entry exists. If the packet is “new” it is passed for further inspection (e.g., load balancing) and then, depending on the results, can be either discarded or accepted.
  • packets associated with new connections i.e., connections in the process of being established, specifically packets with only SYN flag set.
  • a corresponding entry is created in that member's firewall state table 22 - j .
  • the cluster member/server creates a firewall state table entry for any packet that belongs to a connection initiated from or accepted by the cluster member. If a packet indicates that a remote host wishes to open a new connection (e.g., via an IP SYN packet), then such packet gets inspected by a firewall rule that determines whether or not the cluster member should accept it. If the packet was accepted by a cluster member, the firewall state table for that cluster member is updated and all subsequent packets on the connection will be accepted by the cluster member. The firewalls of the other cluster members will block packets that they are not supposed to be processing (i.e., packets that do not belong to connections they initiated or accepted).
  • the firewall rule preferably ensures that only one cluster member will accept a particular connection, however in some cases, it is possible that more than one cluster member decide to accept the same connection. This situation would create duplicate responses from the cluster. However, as those skilled in the art will realize and understand, upon reading this description, this is not a problem for a TCP connection because the remote host will only accept one response and discard others. In this scenario only one cluster member will be able to communicate with the remote host, other cluster members will have a stuck connection that will be closed due to timeout. In the case when no servers respond to an initial SYN packet the client will retry and will send another SYN packet after a timeout. While cluster members may have inconsistent state, they should converge and achieve consistent state quickly.
  • the firewall determines which cluster member should handle a particular connection using a given mapping function, preferably a hash function.
  • a hash function preferably a hash function.
  • the hash function jhash a standard hash function supplied in the Linux kernel, may be used.
  • the hash function produces an integer value. To obtain a value in the range 1 to m, for some m, the output of the hash function is divided by in and the remainder is used (this operation may be performed using an integer remainder or modulo operation).
  • the value of in is the number of currently live servers in the cluster.
  • the function MAP(source IP, in) may be implemented as:
  • each server 14 performs the (same) mapping function (with the same inputs).
  • Each server or cluster member 14 is associated with a particular local server number (or agent identifier (ID)).
  • agent identifier ID
  • Each server compares the result of the mapping function (e.g., hash modulo m) to its local server number. If the result of the mapping function is equal to the local server number, the packet is accepted, otherwise the packet is dropped.
  • the exemplary functions shown above all operate on values related to the particular connection (e.g., source and destination address and port information).
  • the mapping function may be one which merely takes as input the number of active servers (MAP (m) ⁇ 1 . . . m ⁇ ).
  • An example of such a function is a round-robin function.
  • Another example of such a function is one which uses external (possibly random) information. Note, however, that since all servers have to use the same mapping function and have to produce the same result, such a function would need to access a global space and all invocations of such a function (from each cluster server) would need to be operating on the same values.
  • mapping function should produce a number in the range 0 to 6.
  • server S4 (which corresponds to bucket 4) handles the connection.
  • one of the servers e.g., S3 becomes inactive.
  • the status of the cluster is then as follows:
  • mapping function must produce a value in the range 0 to 5. If a new connection comes in, and if the mapping function produces a value 4, then server S6 (not S5) will handle this connection.
  • the buckets may be renumbered or reordered in different ways when a server is added to or removed from the cluster. For example, it may be desirable to give the new server the bucket number 5 and to leave the other servers as they were. It should be noted that existing connections are not affected by server/bucket renumbering because load balancing is only performed on new connections. Existing (i.e., old) connections handled entirely in the firewall.
  • Each cluster member 14 includes a so-called heartbeat processes/mechanism 18 .
  • Each heartbeat mechanism 18 (on each cluster member 14 ) is a process (or collection of processes) that performs at least the following tasks:
  • the heartbeat monitors the state of VIPs on servers.
  • Each server may have more than one VIP configured, and the heartbeat keeps track of each VIP's state separately.
  • heartbeat mechanism While described herein as a single mechanism, those skilled in the art will realize and understand, upon reading this description, that the various functions of the heartbeat mechanism can each be considered a separate function or mechanism.
  • the Heartbeat Mechanism Monitors Server Configuration on The Cluster
  • the heartbeat mechanism 18 on each cluster member/server 14 determines its own state as well as that of each VIP on other cluster members. (In order to simplify the drawing, not all of the connections between the various heartbeat mechanisms are shown in FIG. 1 .)
  • heartbeat mechanism 18 maintains information about other VIPs in the cluster 10 (preferably all other VIPs). To this end, the heartbeat mechanism 18 builds and maintains a list of VIPs connected to the switch 12 , and then, for each of those VIPs, maintains (and routinely updates) information.
  • the heartbeat mechanism 18 on each server 14 first builds a list of network interfaces in the system and obtains information about IP addresses on these interfaces.
  • the heartbeat mechanism 18 may, e.g., use, as its main input, a table containing information about the local cluster and VIPs. In general, an external process may provide VIP configuration on the local cluster to the heartbeat process, e.g., in a form of table. Those skilled in the art will know and understand, upon reading this description how such a process and table may be defined and configured.
  • the heartbeat mechanism 18 considers each VIP in the cluster 10 to be in one of three states, namely “configured”, “connecting” and “connectable”. In order to maintain these states, the heartbeat mechanism 18 obtains a list of VIPs that should be configured on the cluster 10 . Each VIP from the list is preferably cross-checked against list of IP addresses on all interfaces. If a match is found, the VIP is marked as “configured”. (A VIP is in the “configured” state—when the VIP is configured on one of the local (to host) interfaces). For every VIP marked as “configured”, the heartbeat mechanism 18 tries to initiate a TCP connection on a specified port, e.g., either 80 or 443.
  • a specified port e.g., either 80 or 443.
  • connection to a VIP As soon as connection to a VIP is initiated, the VIP is marked as “connecting”. If connection to a VIP is successful, the VIP is marked as “connectable”. A VIP's state is “connecting” when a TCP health check is currently in-progress; a VIP's state is “connectable” when the most recent TCP health check succeeded.
  • the heartbeat mechanism 18 continuously performs the actions described above, preferably at fixed, prescribed time intervals.
  • Servers are automatically configured (or removed) on (from) loopback clone interfaces as needed.
  • the heartbeat mechanism takes over the first 100 (1o:0-1o:99) loopback clone interfaces. If needed, manual loopback interfaces can be configured starting from 1o:100 and up.
  • the Heartbeat Mechanism Answers ARP Queries for the Configured VIPs
  • Each active heartbeat mechanism 18 continuously listens for ARP requests. Upon receipt of an ARP request, the heartbeat mechanism examines request to see if it relates to a VIP that should be configured on the cluster. If the ARP request does relate to a VIP, the heartbeat mechanism checks if the VIP is in “configured” state and if so, the heartbeat mechanism replies with an ARP reply for that VIP.
  • the Heartbeat Mechanism Monitors Local State and State of Other Cluster Members
  • the heartbeat mechanism 18 preferably tries to maintain full state information for all servers 14 in the cluster 10 .
  • State per cluster preferably includes one or more of: (a) number of cluster members that should serve traffic for the cluster, (b) number of cluster members that are serving traffic for the cluster; and (c) timestamp information.
  • Each heartbeat mechanism preferably announces its full state to other cluster members at a prescribed time interval. State updates are preferably sent to a multicast UDP address which is shared by all cluster members. (Note: this UDP multicast is not the same as the VIP multicast discussed above.)
  • the heartbeat mechanism can also be configured to send multiple unicast UDP messages to each member of the cluster when performing state announcing.
  • Each heartbeat mechanism updates its state upon receiving state update from other cluster members if the following conditions are met: the server is present on the receiving cluster member and the received state is “newer” (per timestamp) than the current state on receiving cluster member. Since a timestamp is used, preferably clocks on all cluster members are synchronized.
  • a heartbeat mechanism 18 analyzes its state and checks for state transitions.
  • the heartbeat mechanism checks each server's state and makes sure that it is fresh. So-called “non-fresh” servers are automatically considered as “down”. Each server's state is compared to its previous state, and, if different, a state transition is noted.
  • Changes to VIP state are made as they detected, based on the current heartbeat's view of the cluster.
  • server selection has been made within a cluster by the cluster members at the TCP level.
  • the system does not require a load balancing switch, thereby reducing the cost.
  • the system duplicates incoming (client-to-cluster) traffic to all servers in the cluster and lets each server decide if it is to deal with particular part of the incoming traffic. All servers in the cluster communicate with each other and decide on an individual server's health.
  • the server may ascertain whether it is responsible for handling/serving the resource, and, if not, the previously-selected server may notify (or provide a notification) to another cluster member that is responsible for handling the resource (e.g., another cluster member that already has a copy of the requested resource).
  • the notification may include a hand-off request to so that another cluster member responsible for the resource can server the resource itself.
  • the notification may include a request for a copy of the resource (e.g., via a peer-fill request) from another cluster member responsible for the resource (i.e., that already has a copy of the requested resource).
  • the cluster member responsible for (handling) the requested resource may process the notification from the previously or originally selected server in a number of ways. For instance, a cluster member that has previously served the requested resource (or that is “responsible” for handling the request, or already has a copy of the requested resource) may determine whether to accept or reject a hand-off request (or a peer-fill request) from the previously or originally selected server. For example, the other cluster member may decide to accept or reject the hand-off request (or peer-fill request) based on various attributes of the requested resource such as, but not limited to, the size and popularity of the requested resource.
  • the responsible server accepts a hand-off request (or rejects a peer-fill request) if the size of the request resource exceeds a threshold value.
  • This step is advantageous because copying a large resource to the previously selected server is inefficient and would not be a worthwhile expenditure of system and network resources. If, on the other hand, the size of the requested resource is small (i.e., does not exceed a size threshold), then it may be worthwhile to reject the hand-off request (or accept the peer-fill request) and provide a copy of the requested resource to the previously selected sever so that the previously selected server can handle the request.
  • the responsible server may reject the hand-off request (or accept/honor the peer-fill request) and (indirectly) force the previously selected server to obtain and serve the requested resource (or simply provide a copy of the requested resource to the previously selected server). Since the resource is popular and, thus, likely to continue to be requested frequently, it would be beneficial for other servers (i.e., the previously selected server) to have a copy of the requested resource so that the requested “popular” resource can be served more efficiently.
  • the responsible server may also provide a copy of the requested resource to the previously selected server (or the previously selected server may also obtain a copy of the requested resource from other sources, such as other peers, upstream servers, etc.).
  • a “resource” may be any kind of resource, including, without limitation static and dynamic: video content, audio content, text, image content, web pages, Hypertext Markup Language (HTML) files, XML files, files in a markup language, documents, hypertext documents, data files, and embedded resources.
  • HTTP Hypertext Markup Language
  • the server 14 - k may receive a request from the client 19 , e.g., for a resource.
  • the server 14 - k may receive an HTTP request (e.g., an HTTP GET request) from client 19 .
  • HTTP request e.g., an HTTP GET request
  • Such a request generally includes a URL along with various HTTP headers (e.g., a host header, etc.).
  • the selected server 14 - k now determines whether it is responsible to handle this request or whether the request should be passed on to a different cluster member. To make this determination, the selected server 14 - k considers the request itself and applies a second given function to at least some of the information used to make the request (e.g., to the URL and/or headers in the request).
  • This second function essentially partitions the request space (e.g., the URL space) so as to determine whether the selected server is, in fact, responsible to for this particular request. If the server determines that it is responsible for the request, it continues processing the request. If not, the server hands-off the request (as described below) on to another cluster member (e.g., server 14 - p ) that is responsible for the request. Having successfully passed off the request, the cluster member, server 14 - k , updates its firewall to reject packets associated with the connection. The responsible cluster member (server 14 - p ) correspondingly updates its firewall to accept packets associated with this connection.
  • the request space e.g., the URL space
  • partition function the function used to partition the requests is referred to as a partition function.
  • the partition function may be a hash function or the like. In some cases the partition function may take into account the nature or type of request or resource requested. For example, certain cluster members may be allocated to certain types of requests (e.g., movies, software applications, etc.).
  • the partition function applied to the URL can be used to implement a degree of policy based load mapping.
  • the Partition function may choose to use only a part of the URL (e.g., the hostname).
  • a cluster may comprise a number of non-homogenous servers. Certain requests may be directed to certain cluster servers based on server capability (e.g., speed) or based on arrangements with customers.
  • the cluster includes two servers: server A and server B.
  • Each of the servers runs a web cache, listening on a shared VIP (and port, e.g., port 80). Remote clients make incoming TCP connections to the VIP and port (as described above).
  • server A is initially selected to accept a particular TCP connection from a client (at S 30 in FIG. 3 ).
  • Server A accepts the connection from the client and waits for the HTTP request from the client.
  • server A decides to hand the request off to the server B. That is, the selected server (server A in this example) ascertains (using the partition function described above) whether it is the server responsible for the request (at S 31 ).
  • the originally-selected server is responsible for the request (at S 32 ), then it handles the request (at S 33 ), otherwise it hands off (or tries to hand off) the request to the responsible cluster member (server B in this example) (at S 34 ). If the handoff is determined to be successful (at S 35 ), then the server responsible for the request (Server B in the example) handles the request (at S 36 ), otherwise the originally selected server (Server A) handles the request (at S 37 ).
  • the hand-off process takes place as follows (with reference to FIG. 4 ) (for the purposes of this discussion, assume that server A hands off to server B):
  • the originally-selected server freezes the TCP connection from the client (at S 40 ).
  • the selected server takes a snapshot of the frozen TCP connection (at S 41 ), storing required information about the connection.
  • the originally-selected server then sends the snapshot of the frozen TCP connection to the responsible server (server B), preferably using a side communication channel to the responsible server (at S 42 ).
  • the responsible server (Server B) receives the snapshot of the frozen TCP connection from the originally-selected server (Server A) (at S 43 ). Using the snapshot of the frozen TCP connection, the responsible server (Server B) attempts to clone the TCP connection to the remote client (at S 44 ). If the connection was cloned successfully, the responsible server (server B) sends acknowledgement to the originally-selected server (Server A), preferably using the side communication channel to the server A (at S 45 ).
  • the originally-selected server (Server A) closes the frozen TCP connection to the client (at S 46 ).
  • the responsible server then thaws the frozen (clone) TCP connection to the client (at S 47 ).
  • the responsible server (Server B) continues to process incoming HTTP request from the client (at 52 in FIG. 4 ).
  • the accepting server may fail to clone connection or may refuse to satisfy handoff request. In these cases a negative acknowledgment will be sent and originating (handoff) server will continue to process original request. Should the responsible server (Server B) decline (or fail to satisfy) the handoff request from the originally-selected server (Server A), server A may thaw the TCP connection and continue to serve it locally.
  • a responsible server generally should not decline a handoff request or a request to take over a connection. However, a responsible server may have to decline a request, for example if its software is being shutdown. Note, too that two or more servers in the same cluster may be responsible for the same content, and may provide a degree of redundancy in content (to reduce fills from the origin server) and also to handle a so-called “flash crowd” when a certain piece of content becomes very popular for a relatively short period time.
  • the responsible server When a handoff is successful, the responsible server must update its firewall to accept packets relating to that connection (and the server that handed off the connection must update its firewall to no longer accept such packets).
  • the server making the handoff may provide the responsible server with information about the request (e.g., the type of request, the URL, the headers, etc.). In this way the responsible server may have sufficient information to satisfy the request.
  • information about the request e.g., the type of request, the URL, the headers, etc.
  • mapping function should produce a number in the range 0 to 6.
  • server S4 (which corresponds to bucket 4) is selected at the TCP level to handle the connection.
  • Server S4 and the client then establish their connection and the client then sends an HTTP request (e.g., a GET request with a URL (URL1) and header information).
  • HTTP request e.g., a GET request with a URL (URL1) and header information.
  • partition function can use the same bucket association as the mapping function or it may use a different association. For example, if the partition function is implementing policy-based or capacity based distribution, then the partition function may need a separate bucket association. For this example, assume that the partition function uses the same bucket association as the mapping function.
  • Server S4 freezes the TCP connection from the client (at S 40 in FIG. 4 ) and then takes a snapshot of the frozen TCP connection, storing required information about the connection (at S 41 ).
  • Server S4 sends the snapshot of the frozen TCP connection to Server S6, preferably using a side communication channel (at S 42 ).
  • Server S6 receives the snapshot of the frozen TCP connection from Server S4 (at S 43 ).
  • Server S6 uses the snapshot of the frozen TCP connection, Server S6 attempts to clone the TCP connection to the remote client (at S 44 ). If the connection is successfully cloned, then server S6 sends an acknowledgement to Server S4, preferably using the side communication channel (at S 45 ).
  • Server S4 Upon receipt of the acknowledgement, Server S4 closes the frozen TCP connection to the client (at S 46 ). Server S6 then thaws the frozen (clone) TCP connection to the client (at S 47 ). With the handoff successful, Server S6 continues to process incoming HTTP request from the client.
  • the result of this function is 6 which corresponds to server S6.
  • S6 connects with the client and the client then sends an HTTP GET request with a URL (URL1—the same as in the earlier request) and header information.
  • URL1 the same as in the earlier request
  • the number of servers connected to the switch could be greater than the number of servers responsible for the VIP.
  • a cluster may be configured with 20 servers connected to the same switch, 10 servers serving one VIP and another 10 servers serving another VIP.
  • the heartbeat assists in load balancing for two VIPs, and each VIP will be load balanced across 10 servers.
  • a collection 100 of load-balancing clusters 10 - 1 , 10 - 2 , . . . , 10 - p may be combined.
  • Each cluster 10 - j has one or more corresponding VIPs (VIP-j), so that requests for a server at the IP address VIP-k (for some value of k) will be directed (by router 110 ) to the appropriate cluster for handling by one of the cluster members.
  • the router 110 may be, e.g., a load balancing router. It should be appreciated that the collection 100 of clusters may form part of a content delivery network (CDN).
  • CDN content delivery network
  • a client 19 may request a resource and be directed by a server selector system (e.g., DNS or the like) to a cluster.
  • the server selector returns an IP address that happens to be a VIP address.
  • the client then requests the resource from the VIP and, as described above, is connected (during a TCP connection) to a particular cluster member to handle the request.
  • connection may be handed off to another cluster member.
  • FIGS. 6A and 6B are a flowchart ( 600 - 1 and 600 - 2 ) of processing steps associated with server interactions.
  • step 605 the cluster (i.e., via a switch) obtains a connection request to connect to a server associated with the virtual IP address (i.e., any server sitting behind the switch associated with a virtual IP address).
  • a server associated with the virtual IP address i.e., any server sitting behind the switch associated with a virtual IP address.
  • step 610 the cluster (i.e., via the switch) provides the connection request to each server connected to the switch.
  • step 615 at least one of the plurality of servers connected to the switch determines which of the plurality of servers should handle the connection. Such a determination can be based, for example, on a given function of information used to request the connection.
  • step 620 if the server that is determined to handle the request does not have a copy of the requested resource, that server then requests to hand-off the connection (i.e., TCP connection) to at least one other of the plurality of servers that does have a copy of the requested resource.
  • the server may request a copy of the requested resource (e.g., via a peer-fill request) from another server that has a copy of the resource instead of sending a hand-off request.
  • the server that has a copy of the requested resource determines whether to accept or reject the hand-off request (or reject or accept the peer-fill request) from the server that was originally determined to handle the connection/request. This determination can be based, for example, on the size of the requested resource, the popularity of the requested resource, as well as other attributes that are suitable for determining whether or not a TCP hand-off should occur in a server cluster in response to a request for certain resources.
  • the server that has the copy of the requested resource accepts the hand-off request (or rejects the peer-fill request) if the size of the requested resource value exceeds a threshold value.
  • the size of the requested resource is determined to be too large (i.e., exceeds a threshold value) for expending precious system and network resource (i.e., by providing intra-cluster copies of resources, for example, one server sending a copy of a resource to another server in the cluster)
  • the server with the requested resource will handle the request itself (i.e., serve the requested resources, and, for example, not honor the peer-fill request).
  • the server that has the copy of the requested resource accepts the hand-off request (or rejects the peer-fill request) if the popularity of the requested resource does not exceed a popularity threshold value.
  • a popularity threshold value i.e., the number of times the particular resource has been requested during a retrospective time period does not exceed a threshold value
  • the server with the copy of the request resource handles the connection and serves the resource (and, for example, does not honor the peer-fill request). Since the resource is not yet deemed popular, it is likely that the resource will not be requested as often and therefore is would not be efficient to transfer copies of the resource to other servers in the cluster.
  • the server that has the copy of the requested resource rejects the hand-off request (or accepts/honors the peer-fill request if a copy of the resource is available) if the popularity of the requested resource exceeds the popularity threshold value.
  • the server that has the copy of the requested resource rejects the hand-off request (or accepts/honors the peer-fill request if a copy of the resource is available) if the popularity of the requested resource exceeds the popularity threshold value.
  • the server that has the copy of the requested resource rejects the hand-off request (or accepts/honors the peer-fill request if a copy of the resource is available) if the popularity of the requested resource exceeds the popularity threshold value.
  • the server with the copy of the requested resource rejects the request, which, in one embodiment, forces the requesting server to obtain and serve the requested resource itself (and, thus, maintain a copy of the popular resource, for example, by honoring the peer-fill request and thus providing a copy of the requested resource).
  • the server that has the copy of the requested resource rejects the hand-off request (or accepts/honors the peer-fill request if a copy of the resource is available) if the popularity of the requested resource exceeds the popularity threshold value and the size of the requested resource exceeds the threshold size value.
  • This particular step elucidates the significance of popular content. Even if the size of the requested resource is deemed to large to send an intra-cluster copy from one server to another server within the same cluster (i.e., in light of the expenditure to system and network resources within the cluster), the popularity of the content may still make it more efficient in the long run to distribute a copy (or copies) of the requested resource throughout the cluster in anticipation of more requests for the popular content at the cluster. For example, one way to distribute copies of the requested resource is to reject the hand-off request and (either directly or indirectly) force the originally-selected server to handle the connection and ultimately serve the requested resource.
  • FIG. 7 is a flowchart 700 of processing steps associated with server interactions.
  • a connection request to connect to a server associated with the IP address is received (e.g., at a cluster comprising a switch and plurality of server connected thereto via one or more ports of the switch).
  • step 710 a determination is made as to which of the plurality servers is to handle the connection (e.g., via a hash function).
  • step 720 if a first server of the plurality of servers is determined to be the server to handle the connection (e.g., via the hash function), and the first server does not have a copy of the requested resource, the first server provides a notification to a second server of the plurality of servers that does have a copy of the requested resource.
  • the notification indicates that the first server does not have a copy of the requested resource.
  • the notification can include a hand-off request to hand-off the connection to another server (e.g., the second server in this step), and/or a peer-fill request that requests a copy of the requested resource from another server (e.g., the second server in this step).
  • the second sever determines whether to: i) provide a copy of the requested resource to said server (e.g., reject a hand-off request or accept a peer-fill request if a copy of the requested resource is available), or ii) request the server to handoff the connection to the second server so that the second server can serve the requested resource (e.g., accept a hand-off request or reject a peer-fill request). For example, in one embodiment this determining may be based on an attribute of the requested resource (e.g., size, popularity, etc.).
  • each server in a cluster measures its own load and maintains information about the load of the other servers in the same cluster.
  • each server 14 - j in the cluster 10 includes a load mechanism denoted 24 - j for the j-th cluster server.
  • the load mechanism is constructed and adapted to measure the load on the cluster server using known load measurement techniques.
  • the load mechanism on each particular cluster server converts or maps the load on that cluster server into one of a relatively small number of bands. These bands may be based on predetermined threshold values.
  • the load on a cluster server is mapped into one of three (3) bands or categories, corresponding to light load, medium load, and heavy load.
  • the three bands are referred to as Green (for lightly loaded servers), Yellow (for servers with medium load), and Red (for servers with a heavy load).
  • k ⁇ 1 threshold values are needed (denoted T 1 , T 2 . . . , T k-1 ). If the measured load is less than the first (lowest) threshold value T 1 , then the server's load is categorized as category 1 (i.e., for the purposes of load, that server is considered to be in Band 1). If the measured load on a server is between T i and T i+1 , then the server's load is categorized as category on band T i , and otherwise, if the server's load is greater than T k-1 then that server's load is categorized as being in band k.
  • FIG. 8A shows K bands with k ⁇ 1 thresholds.
  • three bands e.g., denoted “Green”, “Yellow”, “Red”
  • two threshold values T 1 and T 2 ) are needed in order to map or convert an load reading to a band. If the measured load on a server is less than T 1 then that server's load is considered to be Green; if the measured load on a server is equal to or greater than T 1 but less than T 2 then that server's load is considered to be Yellow; otherwise, if the measured load on a server is equal to or greater than T 2 then that server's load is considered to be Red.
  • FIG. 9 is a flowchart showing an exemplary categorization of a server's load into bands.
  • the measured load on a server is denoted L
  • there are k bands denoted Band 1, Band 2 . . . Band k
  • there are k ⁇ 1 threshold values denoteted T 1 , T 2 . . . , T k-1 ).
  • each server in a cluster should use the same categories (e.g., Band 1, Band 2 . . . Band k or “Green”, “Yellow”, “Red”).
  • categories e.g., Band 1, Band 2 . . . Band k or “Green”, “Yellow”, “Red”.
  • the load categorization thresholds (T 1 , T 2 . . . , T k-1 ) are preferably set in advance when the system is configured. However, it should be appreciated that the thresholds may be changed dynamically during operation of the system.
  • Each cluster server 14 - j in the cluster 10 also includes a load table (denoted 26 - j for the j-th cluster server) in which the server stores its own load and the loads of the other servers in the cluster 10 .
  • Each server may provide its load to each other server in the cluster, using, e.g., the heartbeat mechanism.
  • An exemplary load table 26 - j is shown in FIG. 10 .
  • Each server may provide its load as a measured value or as a value mapped to a band or category.
  • servers map their own loads to categories, since the load categories can be represented by fewer bits, and therefore fewer bits are required to provide the load information to the other servers. For example, only two (2) bits are needed when server load is categorized into three bands (“Green”, “Yellow”, “Red”).
  • the following table is an example of a load table for a cluster with eight (8) servers, where the load on each server is one of “Green”, “Yellow”, “Red”.
  • servers S1, S5, and S6 have load “Green”
  • servers S2, S7, and S8 have load “Yellow”
  • servers S3 and S4 have load “Red.”
  • FIG. 11 is a flowchart of process(s) performed by each server in a cluster to determine its own load and to maintain its load table.
  • the server determines its load (by some measure of load) (at 1102 ), categorizes its load (at 1104 ), and updates its load table with its current load category (at 1106 ).
  • the server then provides its load category to the other servers in the cluster (at 1108 ), e.g., using the heartbeat mechanism 18 .
  • the server may also obtain load categories from other servers in the cluster and update its load table if necessary (at 1110 ). These processes are repeated while the system is in operation.
  • load determination ( 1102 ), categorization ( 1104 ), and table update ( 1106 ) may be performed at certain intervals, whereas the provision of load category information to the other servers ( 1108 ) may occur on each heartbeat.
  • the cluster servers may use their own load information and load information about other servers in the cluster in order to determine which cluster server is to handle a request (i.e., which cluster member is going to handle incoming traffic on a certain connection). As previously described, the cluster servers may make an initial determination of which cluster server is to handle a connection request, e.g., using values associated with the request.
  • the router 11 when the router 11 gets a request to connect to a server with the IP address VIP (shared by the cluster servers 14 - j ), the router maps the IP address VIP to a special MAC address that causes the switch 12 to forward the request to each server connected thereto, so that each member of the cluster 12 (i.e., each server 14 ) sees all incoming traffic (addressed to VIP).
  • each cluster member when a TCP connection is first established, each cluster member (i.e., each server 14 ) effectively decides for itself whether or not to handle a connection.
  • a stateful firewall may use a mapping function (e.g., a hash function such as jhash) to select which cluster server is to handle a connection.
  • a mapping function e.g., a hash function such as jhash
  • the system using the load information operates in a similar manner, except that heavily loaded servers may be omitted from the selection process (or may decline connections for which they are initially determined to be responsible). For example, consider the system described above with eight (8) cluster servers having their loads categorized as follows:
  • each (live) server will determine whether it is responsible for a connection request (e.g., using a hash function) (at 1202 ).
  • the selection function will always select only one server.
  • the particular server that is responsible for the connection determines whether its load is too high to accept the connection (at 1204 ). This determination may be made using information in the load table on the server.
  • a server may consider its load to be too high if its load is in the “Red” band.
  • the servers S 3 and S 4 may consider their loads to be too high. If the selected server does not consider its load to be too high to accept the connection, the server accepts responsibility and makes the connection (at 1206 ). It should be appreciated that this server may subsequently hand off the connection to another server after an actual connection is established, e.g., based on other policy considerations. For example, as described above, the initial server may make a connection handoff to another server based on the actual request being made.
  • the server may try to find another server (within the cluster) to handle the connection.
  • the server may use its load table to determine if there is another server in the cluster with a load that is lighter that its load (at 1208 ). For the purposes of this comparison, a server will treat all servers in the same load category as having the same load. If there are k load categories, with category 1 being the lightest loaded and category k being the heaviest loaded, the server may try to find a server in the lightest load category, then in the next lightest category, and so on, until it reaches its own category.
  • a “Red” server will first try to find a “Green” server to handle the request, and, failing that will try to find a “Yellow” server to handle the request.
  • server S4 was initially selected as the server to handle the connection, that server would select one of the “Green” servers (S1, S5, S6) to take on the request.
  • the initially-selected server takes responsibility for the connection (at 1206 ), otherwise it passes responsibility to a server with a lighter load (at 1210 ). Note that the lighter loaded server may decline to take the connection.
  • the system may implement a policy in which only servers in the heaviest loaded category (e.g., Red servers) may switch load, and then only to a server in the lightest loaded category on that cluster.
  • a Red server may switch to a Green server, and may only switch to a Yellow server if there are no Green servers.
  • a cluster may try to maintain a certain ratio of Green (or Yellow) servers relative to the total number of (live) servers.
  • Another way to use the load information in initial server selection is to use the load information to essentially reduce the set of candidate machines For example, all machines categorized as having the heaviest load (e.g., Red) can be removed from the selection by being treated as inactive. Those servers become active again when their load drops to a lower category.
  • the heaviest load e.g., Red
  • An alternative approach reduces the set of candidate machines for initial server selection based on a configured threshold, based, e.g., on a percentage of the size of the cluster.
  • the threshold may be set to X, where X represents a percentage of the number of live servers connected to ports of the switch and having the same VIP.
  • the selection algorithm proceeds using only the Green machines, with all other machines being considered offline (for selection purposes).
  • Green machines If there are not enough Green machines (i.e., the number of Green machines is less than the threshold), then the machines with their loads in the next band (Yellow) are considered candidates (i.e., are treated as if online), such that the total of all of the machines in those bands is larger than the configured threshold. All machines make the same determination, however non-candidate machines cannot be selected or select themselves.
  • the threshold used to select candidate machines for initial server selection is not the same as the thresholds discussed above that are used to mark the load category or band boundaries.
  • the value X used for the threshold may be, for example, 5%, 10%, 15%, 20%, 30%, 40%, 50%, and so on. Those of skill in the art will realize and understand, upon reading this description, that other values of X may be used.
  • FIG. 12B is a flowchart showing selection of the set of candidate machines for initial server selection.
  • the threshold value T is 40% of eight which is rounded to 3.
  • the candidate selection algorithm is:
  • the candidate servers is set to the Green servers (at 1214 ). Otherwise, if the number of Green servers (#G) and the number of Yellow servers (#Y) is greater than or equal to the threshold (T) (at 1216 ), then the candidate servers is set to the Green and Yellow servers (at 1218 ). Otherwise the candidate servers are set to be the Green, Yellow, and Red servers (at 1220 ).
  • red a server whose load is categorized as “red”
  • red may still be a candidate for server selection if there are too few green and yellow machines (based on the threshold).
  • the candidates for selection i.e., the machines considered online
  • the machines of the lowest loaded N bands such that the total number of machines in those bands is larger than a configured threshold.
  • the remaining machines, not in the candidates, are considered offline for the purposes of initial server selection.
  • candidate servers may also be used for subsequent decisions (i.e., after the initial selection) regarding server selection and handoff.
  • a initial server is selected (at 1302 ) to handle a connection request from a client (whether or not using load information) to handle a connection request from a client (whether or not using load information)
  • that initial server then establishes a TCP connection with the client (at 1304 ), as described above.
  • the initial server will then obtain an actual request (e.g., an HTTP request) from the client (at 1306 ). Based on that request, the initial server may determine that a different server should preferably handle the request (at 1306 ), and so the initial server may attempt to handoff the connection to another cluster server (as described above) (at 1308 ).
  • the initial server determines that it should not try to handoff the request to another server (at 1308 ), then the initial server handles the request (at 1310 ). If the initial server determines that it should try to handoff the request to another server (at 1308 ), then the other cluster server may accept or reject the handoff request (at 1312 ). If the other cluster server rejects the handoff (at 1312 ), then the initial server handles the request (at 1310 ). If the other cluster server accepts the connection handoff, then that other server handles the request (at 1314 ). The selection of the initial server (at 1302 ) may use the load-aware selection described above or some other technique.
  • the other cluster server may serve the client via the initial server, essentially using the initial server as a proxy. In these cases there is no handoff of the TCP/IP connection.
  • the decision as to whether or not to accept a handoff or use the initial server as a proxy may be made by the other server based, e.g., on its load, on the popularity of the requested resource, and/or on other factors.
  • FIG. 14 is an exemplary flowchart of a system in which the initial server acts as a proxy for another cluster server.
  • the initial server acts as a proxy for another cluster server.
  • that initial server then establishes a TCP connection with the client (at 1404 ).
  • the initial server obtains an actual request (e.g., an HTTP request) from the client (at 1406 ).
  • the initial server may determine that a different server should preferably handle the request (at 1406 ), and so the initial server may attempt to handoff the connection to another cluster server (at 1408 ). If the initial server determines that it should not try to handoff the request to another server (at 1408 ), then the initial server handles the request itself (at 1410 ).
  • the other cluster server may accept or reject the handoff request (at 1412 ). However, in this case, in addition to an outright rejection of the request, the other server may decline the connection handoff and still remain involved and serve the request to the client through the initial server (as a proxy). In this case there is no connection handoff. If the other cluster server rejects the handoff (at 1412 ), then it is determined (at 1414 ) whether the initial server should act as a proxy (i.e., whether the other server should serve the client request through the initial server). Preferably the other server determines whether it will serve the request via the initial server, however it should be appreciated that either of the peer servers may chose to handle the request in a proxy manner.
  • the initial server handles the request (at 1410 ), otherwise the other server handles the request via the initial server as a proxy (at 1416 ).
  • the initial server determines whether or not to try to handoff the connection to a peer server (e.g., another server in the cluster) (e.g., at step 1308 in FIGS. 13 and 1408 in FIG. 14 ).
  • a peer server e.g., another server in the cluster
  • the decision to switch to a peer server may be based on a number of factors.
  • the decision to switch to a peer may be based, at least in part, on load information about the initial server and/or the other server.
  • the decision may be based, at least in part, on load information in the load table, including, in some cases, load information about all of the other servers in the cluster.
  • the decision (at 1414 ) as to whether or not the initial server is to act as a proxy for the second server may be based, at least in part, on load information about one or more of the servers.
  • migration to another server may occur as a result of a peering query event.
  • the initial contact machine may accept the connection from the client, read the request and if it has the resource to hand, serve the resource. However, if the initial contact server does not have the requested resource, it will query its peers to see if they have it. The normal response would then indicate from which of those peers (if any) the initial server could fill the resource.
  • the peer response includes the ‘migrate the connection to me’ or ‘proxy the resource through’ options.
  • the initial contact server will migrate the connection, e.g., as described above.
  • the initial contact server will proxy the response through the designated server. It should be appreciated that the ‘proxy the resource’ option can be used with machines in a different cluster than the one that includes the initial contact server.
  • Exemplary processing of peering query events is described with reference to the flowchart in FIG. 15 .
  • a initial server is selected (at 1502 ) to handle a connection request from a client
  • that initial server then establishes a TCP connection with the client (at 1504 ).
  • the initial server obtains an actual request (e.g., an HTTP request) from the client (at 1506 ). If the initial server has the requested resource (at 1508 ), then it serves that resource (at 1510 ), otherwise it queries its peers for the resource (at 1512 ).
  • the initial server obtains and analyzes any peer response(s) (at 1514 ).
  • the initial server may obtain the resource from that peer (at 1516 ). If a peer responds with a “migrate the connection to me” response, the initial contact server may migrate the connection (e.g., as described above) to that peer (at 1518 ). If a peer responds with a “proxy the resource through” response (identifying a second server), the initial contact server may the proxy the resource through that second server (at 1520 ). If no peer responds, the initial server may obtain the resource in some other manner, e.g., from an origin server or the like.
  • the present invention operates on any computer system and can be implemented in software, hardware or any combination thereof.
  • the invention can reside, permanently or temporarily, on any memory or storage medium, including but not limited to a RAM, a ROM, a disk, an ASIC, a PROM and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Computer And Data Communications (AREA)

Abstract

A load-balancing cluster includes a switch having ports; and servers connected to at least some of the ports. The servers are each addressable by the same virtual Internet Protocol (VIP) address. A first server of the plurality of servers establishing a Transmission Control Protocol (TCP) connection with a client computer, and, in response to a resource request received by the first server from the client computer for a particular resource, if the first server does not have a copy of the particular resource it queries one or more peers regarding the particular resource. Based at least in part on responses from the peers, the first server either: obtains the particular resource from a peer; or migrates the TCP connection to a peer; or serves the particular resource to the client request through a second server.

Description

    RELATED APPLICATIONS
  • This application is related to and claims priority under 35 U.S.C. §119(e) to co-pending and co-owned U.S. patent applications No. 61/582,298, filed Dec. 31, 2011, titled “Load-Balancing Cluster,” and No. 61/582,301, filed Dec. 31, 2011, titled “Load-Aware Load-Balancing Cluster,” the entire contents of both which are fully incorporated herein by reference for all purposes.
  • COPYRIGHT STATEMENT
  • This patent document contains material subject to copyright protection. The copyright owner has no objection to the reproduction of this patent document or any related materials in the files of the United States Patent and Trademark Office, but otherwise reserves all copyrights whatsoever.
  • FIELD OF THE DISCLOSURE
  • This invention relates to content delivery.
  • GLOSSARY
  • As used herein, unless stated otherwise, the following terms or abbreviations have the following meanings:
  • MAC means Media Access Control;
  • MAC address means Media Access Control address;
  • IP means Internet Protocol;
  • TCP means Transmission Control Protocol;
  • “IP address” means an address used in the Internet Protocol to identify electronic devices such as servers and the like;
  • ARP means Address Resolution Protocol;
  • HTTP means Hyper Text Transfer Protocol;
  • URL means Uniform Resource Locator;
  • IGMP means Internet Group Management Protocol;
  • DNS means Domain Name System.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The following description, given with respect to the attached drawings, may be better understood with reference to the non-limiting examples of the drawings, wherein:
  • FIG. 1 depicts a load-aware load-balancing cluster;
  • FIG. 2 depicts an exemplary TCP connection handoff;
  • FIGS. 3-4 are flowcharts of a TCP connection handoff;
  • FIG. 5 depicts a collection of load-balancing clusters;
  • FIGS. 6A-6B depict a flowchart of processing associated with server interactions;
  • FIG. 7 is a flowchart of processing associated with server interactions;
  • FIGS. 8A-8B show categorization of server load;
  • FIG. 9 is flowchart of processing associated with server load categorization;
  • FIG. 10 is an exemplary load table;
  • FIG. 11 is a flowchart of processing associated servers sharing load information;
  • FIGS. 12A-12B are flowcharts of processing associated with load-aware request processing; and
  • FIGS. 13-15 are flowcharts of processing associated with request processing.
  • THE PRESENTLY PREFERRED EXEMPLARY EMBODIMENTS
  • As shown in FIG. 1, a load-balancing cluster 10 is formed by an n-port switch 12 connected to a number (between 1 and n) of servers 14-1, 14-2, . . . , 14-m, where m≦n (collectively “servers 14”) via ports 16-1, 16-2, . . . , 16-n. Not every port 16-k of the switch 12 needs to have an actual (or operating) server 14 connected thereto. The switch 12 is preferably an Ethernet switch.
  • Each server 14-j includes a processor (or collection of processors) constructed and adapted to provide data in response to requests. In presently preferred implementations, all servers are the same and run the same version of operating system (OS), with same kernel and software. However, those skilled in the art will realize and understand, upon reading this description, that the servers may be any server running any type of server processes. Those skilled in the art will further realize and understand, upon reading this description, that the servers need not all be the homogeneous, and heterogeneous servers are contemplated herein.
  • Each server 14-j in the cluster 10 is addressable by a unique hardware address—in the case of the Ethernet, a so-called a MAC address (also known sometimes as an Ethernet address). For the purposes of this description, the MAC or actual hardware address of the j-th cluster server is denoted MACj.
  • The servers 14 in the load-balancing cluster 10 are all assigned the same virtual IP address (VIP), e.g., “10.0.0.1”. Each server preferably also has at least one other unique (preferably local) IP address, denoted IPj for the j-th cluster server. Preferably a VIP address is also has MAC address (denoted MACVIP) associated with it, with the VIP's MAC address being shared by all the servers in a cluster. That is, in preferred embodiments, the (VIP, VIP's MAC address) pair, i.e., (VIP, MACVIP) is the same for all the servers in a cluster. However, as noted, each server also preferably has its own private (IP address, IP's MAC address) pair (e.g., (IPi, MACi)).
  • The servers 14 in cluster 10 are addressable externally (e.g., from network 17, e.g., the Internet) via the local (Ethernet) network 13 and switch 12. For example, using router 11, an external request from client 19 via network 17 (such as the Internet) to the IP address VIP is directed via the switch 12 to each real cluster server 14-j connected thereto. The switch 12 forwards Ethernet frames, preferably as fast and as efficiently as possible. The switch 12 may perform one-to-one (unicast) forwarding or one-to-many (broadcast or multicast) forwarding. In unicast forwarding a packet enters the switch on one port and leaves the switch on another port. In the case of broadcast or multicast forwarding packet enters the switch on one port and multiple copies of the same packet leave the switch on many ports. When broadcast forwarding (using, e.g., a so-called “unlearned” unicast MAC address), the switch sends all incoming packets to every port, whereas when multicasting mode (using a multicast MAC address), the switch sends all packets to those ports that have servers connected thereto. In either case, the desired result is that all cluster members—i.e., all servers 14 connected to the switch 12—get all packets destined for the IP address VIP.
  • In case of multicast MAC address, the switch 12 may use so-called “IGMP snooping” to learn which physical ports belong to live servers. In case of an “unlearned” unicast MAC address, the switch 12 forwards incoming traffic to all ports.
  • The system is not limited by the manner in which the switch 12 provides packets to the servers 14 connected thereto. Those skilled in the art will realize and understand, upon reading this description, that different and/or other methods of achieving this result may be used.
  • In a local Ethernet network, an Ethernet MAC address is used to identify a particular host machine connected to the network. In such a network, a protocol such as, e.g., ARP, may be used to translate between a host's IP address and its Ethernet MAC address. For example, a host on an IP network wishing to obtain a physical address broadcasts an ARP request onto the IP network. A host on the network that has the IP address in the request then replies with its physical hardware address.
  • An IP router provides a gateway between two (or more) IP networks. The purpose of an IP router is to forward IP packets from one IP network to another. An IP router should have an interface and IP address in each network to which it is connected. So, IP router 11 has at least two interfaces and two IP addresses: one IP address to connect to the upstream network (17 in FIG. 1) and the other IP address to connect to the local Ethernet network (13 in FIG. 1).
  • A request from client 19 is made to the IP address VIP (via network 17) and reaches the router 11. The request comes into the router 11 via the interface connected to the upstream network 17, and the router 11 forwards the request to the VIP (on the local Ethernet network 13).
  • Because the local network 13 is an Ethernet network and because router 11 is connected directly to the local network 13, the router 11 encapsulates the IP packet (i.e., the request) into an Ethernet packet before sending it. In order for the router 11 to know where to send the Ethernet packet, the router makes an ARP request. Once the Ethernet packet is sent, the switch 12 forwards it to the server(s) 14.
  • In order to affect ARP mapping, a router (e.g., router 11) typically maintains a so-called ARP table 15 (mapping IP addresses to the MAC addresses of hosts connected thereto). In this manner, when an IP packet is sent to a particular host that is connected to the router 11, the router automatically resolves to the destination host's MAC address and forwards the packet to the appropriate host. The router 11 will try to deliver the IP packet directly to destination (i.e., the VIP) because the router is connected to the same local Ethernet network.
  • Certain special MAC addresses (e.g., broadcast or multicast) can be used to instruct a switch to broadcast (or multicast) a packet, thereby providing a packet to all hosts connected to that switch. Specifically, e.g., an Ethernet switch sends a packet with a broadcast or multicast MAC address in its destination field to every port (or every port with a server connected thereto), whereby every host/server connected to the Ethernet switch should get a copy of the packet.
  • In order for two machines (e.g., client 19 and one of the servers 14) to interact, a network connection must be established between them. The client 19 has the IP address of a server (in this case VIP), and tries to establish a connection via the network 17 and the router 11.
  • When the router 11 gets a request to connect to a server with the IP address VIP (shared by the cluster servers 14-j), the router maps the IP address VIP to a special MAC address that causes the switch 12 to forward the request to each server connected thereto. In the case of the load-balancing cluster 10, preferably the switch 12 treats the MAC address for a VIP as a multicast Ethernet address. Consequently, each member of the cluster 12 (i.e., each server 14) sees all incoming traffic (addressed to VIP). The router's ARP table 15 thus gets a multicast Ethernet address for the VIP, and thus, at the IP layer, all incoming traffic to the VIP address is provided to all servers 14 connected to the switch 12.
  • In a presently preferred implementation, the switch 12 maintains a so-called “forwarding database,” (FDB 23 in FIG. 1) to map destination Ethernet MAC addresses to physical Ethernet ports 16 on switch 12. When switch 12 receives an Ethernet packet, the switch queries the forwarding database (e.g., using the destination MAC address as a key) and tries determine which physical port should be used to send the Ethernet packet out. This forwarding database 23 allows switch 12 to forward Ethernet packets only where they should go.
  • However, when switch 12 receives an Ethernet packet and cannot find an entry in its forwarding database for a destination Ethernet MAC address (i.e., e.g., in the case of an unknown/unlearned MAC address), the switch forwards such an Ethernet packet to all the ports (except the one it came from).
  • A multicast Ethernet MAC address has entry in the switch's 12 forwarding database instructing it to forward Ethernet packet to multiple ports 16.
  • An Ethernet switch will generally try to learn by looking at the MAC addresses of all the Ethernet packets passed through the switch and will try to update its forwarding database accordingly. However, it is preferable to ensure that the switch 12 never “learns” about MAC address for the VIP and never builds an association between VIP cluster MAC addresses and physical ports 16. The switch 12 is thereby forced to always forward Ethernet packets destined for the cluster MAC address (and thus the cluster VIP) to multiple/all ports 16.
  • Those skilled in the art will realize and understand, upon reading this description, that different and/or other ways of causing the switch to provide incoming data to all cluster members may be used.
  • Having found a cluster server with the IP address VIP, a TCP connection must be established between the client 19 and that cluster server 14. A TCP connection is established between two machines, in part, using a well-known three-way handshake (SYN, SYN/ACK, ACK). This protocol is described, e.g., in “RFC 793—Transmission Control Protocol,” September 1991, the entire contents of which are incorporated herein by reference for all purposes.
  • In the cluster 10, when a TCP connection is first established, each cluster member (i.e., each server 14) effectively decides which server 14 will handle a connection. In effect, each cluster member decides for itself whether or not to handle a connection. Once a particular cluster member takes (or is given) responsibility for a particular connection, the other cluster members do not handle (and need not even see) traffic related to that connection. The manner of server selection is described below.
  • Each cluster member (server) includes a stateful firewall (FW) mechanism that is used to filter unwanted incoming traffic. In FIG. 1, for the purposes of this discussion, the firewall mechanism for the j-th server is denoted 20-j. Upon receipt of an IP packet, the firewall first determines whether the packet is for an old (i.e., already established) connection or for a new connection. For already-established connections each firewall mechanism is configured to reject incoming traffic that does not have an entry in its firewall state table 22, and only to accept incoming traffic that does have an entry in its firewall state table. In FIG. 1, the firewall table for the j-th server is denoted 22-j. The firewall must still inspect packets associated with new connections (i.e., connections in the process of being established, specifically packets with only SYN flag set). To summarize: first the firewalls make a decision as to whether an IP packet is “new” or “old”. If the packet is “old” then it is discarded unless a state entry exists. If the packet is “new” it is passed for further inspection (e.g., load balancing) and then, depending on the results, can be either discarded or accepted.
  • Once it is determined (e.g., as described below) that a particular cluster member 14-j is going to handle incoming traffic on a certain connection, a corresponding entry is created in that member's firewall state table 22-j. Specifically, the cluster member/server creates a firewall state table entry for any packet that belongs to a connection initiated from or accepted by the cluster member. If a packet indicates that a remote host wishes to open a new connection (e.g., via an IP SYN packet), then such packet gets inspected by a firewall rule that determines whether or not the cluster member should accept it. If the packet was accepted by a cluster member, the firewall state table for that cluster member is updated and all subsequent packets on the connection will be accepted by the cluster member. The firewalls of the other cluster members will block packets that they are not supposed to be processing (i.e., packets that do not belong to connections they initiated or accepted).
  • The firewall rule preferably ensures that only one cluster member will accept a particular connection, however in some cases, it is possible that more than one cluster member decide to accept the same connection. This situation would create duplicate responses from the cluster. However, as those skilled in the art will realize and understand, upon reading this description, this is not a problem for a TCP connection because the remote host will only accept one response and discard others. In this scenario only one cluster member will be able to communicate with the remote host, other cluster members will have a stuck connection that will be closed due to timeout. In the case when no servers respond to an initial SYN packet the client will retry and will send another SYN packet after a timeout. While cluster members may have inconsistent state, they should converge and achieve consistent state quickly.
  • The firewall determines which cluster member should handle a particular connection using a given mapping function, preferably a hash function. By way of example, the hash function jhash, a standard hash function supplied in the Linux kernel, may be used. Those skilled in the art know how to produce a number in a particular range from the output of a hash function such as jhash. The hash function produces an integer value. To obtain a value in the range 1 to m, for some m, the output of the hash function is divided by in and the remainder is used (this operation may be performed using an integer remainder or modulo operation). For load balancing in a cluster, the value of in is the number of currently live servers in the cluster. Those skilled in the art will realize and understand, upon reading this description, that the function's output value need not be offset by one if the buckets are numbered starting at zero.
  • Using, e.g., jhash, the function MAP(source IP, in) may be implemented as:
      • (jhash(parameters) modulo in)
  • If there are m alive servers in a cluster, each server 14 performs the (same) mapping function (with the same inputs). Each server or cluster member 14 is associated with a particular local server number (or agent identifier (ID)). E.g., if there are eight servers 14-0, . . . , 14-7, their corresponding agent IDs may be 0, 2, . . . , 7, respectively. Each server compares the result of the mapping function (e.g., hash modulo m) to its local server number. If the result of the mapping function is equal to the local server number, the packet is accepted, otherwise the packet is dropped.
  • Note that the exemplary functions shown above all operate on values related to the particular connection (e.g., source and destination address and port information). However, in a simplified case, the mapping function may be one which merely takes as input the number of active servers (MAP (m)→{1 . . . m}). An example of such a function is a round-robin function. Another example of such a function is one which uses external (possibly random) information. Note, however, that since all servers have to use the same mapping function and have to produce the same result, such a function would need to access a global space and all invocations of such a function (from each cluster server) would need to be operating on the same values.
  • Example I
  • By way of example, and without limitation, consider a cluster with 8 ports and with 7 active servers connected to those ports as shown in the following table:
  • Port #. 0 1 2 3 4 5 6 7
    Server S0 S1 S2 S3 S4 S6 S7
    Bucket 0 1 2 3 4 5 6
  • In this case, the number of active servers, m, is 7, there are seven buckets (numbered 0 to 6), and so the mapping function should produce a number in the range 0 to 6. Suppose, for the sake of this example, that the mapping function is:
      • MAP (source IP, destination IP, destination port, m)=hash (source IP, destination IP, destination port) modulo m
  • If a connection request comes in from IP address 123.156.189.123, for the VIP (1.0.0.1) on port 80. Each server runs the mapping function:
      • hash (123.222.189.123, 1.0.0.1, 80) modulo 7
  • Suppose that this mapping produces a value of 4 then server S4 (which corresponds to bucket 4) handles the connection. Suppose that at some time one of the servers, e.g., S3 becomes inactive. The status of the cluster is then as follows:
  • Port #. 0 1 2 3 4 5 6 7
    Server S0 S1 S3 S4 S5 S6
    Bucket 0 1 2 4 4 5
  • Notice that the association between servers and buckets has changed, so that server S4 is now associated with bucket 3, and server S5 is associated with bucket 4. Now, as there are only five “alive” severs, the mapping function must produce a value in the range 0 to 5. If a new connection comes in, and if the mapping function produces a value 4, then server S6 (not S5) will handle this connection.
  • If a new server S7 is connected to port 5, the number of servers becomes 7 and the status of the cluster would be:
  • Port #. 0 1 2 3 4 5 6 7
    Server S0 S1 S2 S4 S7 S5 S6
    Bucket 0 1 2 3 4 5 6
  • End of Example I
  • Those skilled in the art will realize and understand, upon reading this description, that the buckets may be renumbered or reordered in different ways when a server is added to or removed from the cluster. For example, it may be desirable to give the new server the bucket number 5 and to leave the other servers as they were. It should be noted that existing connections are not affected by server/bucket renumbering because load balancing is only performed on new connections. Existing (i.e., old) connections handled entirely in the firewall.
  • Heartbeat
  • Each cluster member 14 includes a so-called heartbeat processes/mechanism 18. Each heartbeat mechanism 18 (on each cluster member 14) is a process (or collection of processes) that performs at least the following tasks:
      • monitors server configurations on the cluster;
      • answers ARP queries for the configured VIPs;
      • monitors local state and state of other cluster members; and
      • controls local load balancing firewall configuration.
  • The heartbeat monitors the state of VIPs on servers. Each server may have more than one VIP configured, and the heartbeat keeps track of each VIP's state separately.
  • While described herein as a single mechanism, those skilled in the art will realize and understand, upon reading this description, that the various functions of the heartbeat mechanism can each be considered a separate function or mechanism.
  • The Heartbeat Mechanism Monitors Server Configuration on The Cluster
  • The heartbeat mechanism 18 on each cluster member/server 14 determines its own state as well as that of each VIP on other cluster members. (In order to simplify the drawing, not all of the connections between the various heartbeat mechanisms are shown in FIG. 1.)
  • On each cluster member/server, heartbeat mechanism 18 maintains information about other VIPs in the cluster 10 (preferably all other VIPs). To this end, the heartbeat mechanism 18 builds and maintains a list of VIPs connected to the switch 12, and then, for each of those VIPs, maintains (and routinely updates) information. The heartbeat mechanism 18 on each server 14 first builds a list of network interfaces in the system and obtains information about IP addresses on these interfaces. The heartbeat mechanism 18 may, e.g., use, as its main input, a table containing information about the local cluster and VIPs. In general, an external process may provide VIP configuration on the local cluster to the heartbeat process, e.g., in a form of table. Those skilled in the art will know and understand, upon reading this description how such a process and table may be defined and configured.
  • The heartbeat mechanism 18 considers each VIP in the cluster 10 to be in one of three states, namely “configured”, “connecting” and “connectable”. In order to maintain these states, the heartbeat mechanism 18 obtains a list of VIPs that should be configured on the cluster 10. Each VIP from the list is preferably cross-checked against list of IP addresses on all interfaces. If a match is found, the VIP is marked as “configured”. (A VIP is in the “configured” state—when the VIP is configured on one of the local (to host) interfaces). For every VIP marked as “configured”, the heartbeat mechanism 18 tries to initiate a TCP connection on a specified port, e.g., either 80 or 443. As soon as connection to a VIP is initiated, the VIP is marked as “connecting”. If connection to a VIP is successful, the VIP is marked as “connectable”. A VIP's state is “connecting” when a TCP health check is currently in-progress; a VIP's state is “connectable” when the most recent TCP health check succeeded.
  • The heartbeat mechanism 18 continuously performs the actions described above, preferably at fixed, prescribed time intervals.
  • If a VIP changes it's state or completely disappears from the list of IP addresses, a state transition in noted. Servers are automatically configured (or removed) on (from) loopback clone interfaces as needed. In a presently preferred implementation, the heartbeat mechanism takes over the first 100 (1o:0-1o:99) loopback clone interfaces. If needed, manual loopback interfaces can be configured starting from 1o:100 and up.
  • The Heartbeat Mechanism Answers ARP Queries for the Configured VIPs
  • Each active heartbeat mechanism 18 continuously listens for ARP requests. Upon receipt of an ARP request, the heartbeat mechanism examines request to see if it relates to a VIP that should be configured on the cluster. If the ARP request does relate to a VIP, the heartbeat mechanism checks if the VIP is in “configured” state and if so, the heartbeat mechanism replies with an ARP reply for that VIP.
  • Although multiple heartbeat mechanisms may reply to the same ARP request, this is not a problem, since they will each return the same MAC address (MACVIP).
  • The Heartbeat Mechanism Monitors Local State and State of Other Cluster Members
  • The heartbeat mechanism 18 preferably tries to maintain full state information for all servers 14 in the cluster 10. State per cluster preferably includes one or more of: (a) number of cluster members that should serve traffic for the cluster, (b) number of cluster members that are serving traffic for the cluster; and (c) timestamp information. Those skilled in the art will realize and understand, upon reading this description, that different and/or other state information may be maintained for the cluster and for cluster members.
  • Each heartbeat mechanism preferably announces its full state to other cluster members at a prescribed time interval. State updates are preferably sent to a multicast UDP address which is shared by all cluster members. (Note: this UDP multicast is not the same as the VIP multicast discussed above.) The heartbeat mechanism can also be configured to send multiple unicast UDP messages to each member of the cluster when performing state announcing.
  • Each heartbeat mechanism updates its state upon receiving state update from other cluster members if the following conditions are met: the server is present on the receiving cluster member and the received state is “newer” (per timestamp) than the current state on receiving cluster member. Since a timestamp is used, preferably clocks on all cluster members are synchronized.
  • At prescribed time intervals a heartbeat mechanism 18 analyzes its state and checks for state transitions. The heartbeat mechanism checks each server's state and makes sure that it is fresh. So-called “non-fresh” servers are automatically considered as “down”. Each server's state is compared to its previous state, and, if different, a state transition is noted.
  • Changes to VIP state are made as they detected, based on the current heartbeat's view of the cluster.
  • Inter-Cluster Handoff
  • As described thus far, server selection has been made within a cluster by the cluster members at the TCP level. The system does not require a load balancing switch, thereby reducing the cost. Instead, as described, the system duplicates incoming (client-to-cluster) traffic to all servers in the cluster and lets each server decide if it is to deal with particular part of the incoming traffic. All servers in the cluster communicate with each other and decide on an individual server's health.
  • Another level of server selection—within a cluster—is also provided, as a result of which an initially-selected server (selected as described above) may pass on (or attempt to pass on) responsibility for a particular connection to another cluster member. For example, if one server in a cluster has already handled a particular request for a certain resource, that server may have that resource cached. The server with the already-cached copy of the resource may then be a better choice than another server in the cluster to process a request.
  • Accordingly, in some cases, after receiving a request from a client for a certain resource (after a server has been selected and the TCP connection has been established, as described above), the server may ascertain whether it is responsible for handling/serving the resource, and, if not, the previously-selected server may notify (or provide a notification) to another cluster member that is responsible for handling the resource (e.g., another cluster member that already has a copy of the requested resource). The notification may include a hand-off request to so that another cluster member responsible for the resource can server the resource itself. Or, alternatively, the notification may include a request for a copy of the resource (e.g., via a peer-fill request) from another cluster member responsible for the resource (i.e., that already has a copy of the requested resource).
  • The cluster member responsible for (handling) the requested resource may process the notification from the previously or originally selected server in a number of ways. For instance, a cluster member that has previously served the requested resource (or that is “responsible” for handling the request, or already has a copy of the requested resource) may determine whether to accept or reject a hand-off request (or a peer-fill request) from the previously or originally selected server. For example, the other cluster member may decide to accept or reject the hand-off request (or peer-fill request) based on various attributes of the requested resource such as, but not limited to, the size and popularity of the requested resource.
  • In one embodiment, the responsible server accepts a hand-off request (or rejects a peer-fill request) if the size of the request resource exceeds a threshold value. This step is advantageous because copying a large resource to the previously selected server is inefficient and would not be a worthwhile expenditure of system and network resources. If, on the other hand, the size of the requested resource is small (i.e., does not exceed a size threshold), then it may be worthwhile to reject the hand-off request (or accept the peer-fill request) and provide a copy of the requested resource to the previously selected sever so that the previously selected server can handle the request.
  • According to another example embodiment, if it determined that the requested resource is popular (i.e., exceeds a popularity threshold), then the responsible server may reject the hand-off request (or accept/honor the peer-fill request) and (indirectly) force the previously selected server to obtain and serve the requested resource (or simply provide a copy of the requested resource to the previously selected server). Since the resource is popular and, thus, likely to continue to be requested frequently, it would be beneficial for other servers (i.e., the previously selected server) to have a copy of the requested resource so that the requested “popular” resource can be served more efficiently. For example, in addition to sending a hand-off rejection message, the responsible server may also provide a copy of the requested resource to the previously selected server (or the previously selected server may also obtain a copy of the requested resource from other sources, such as other peers, upstream servers, etc.).
  • As used herein, a “resource” may be any kind of resource, including, without limitation static and dynamic: video content, audio content, text, image content, web pages, Hypertext Markup Language (HTML) files, XML files, files in a markup language, documents, hypertext documents, data files, and embedded resources.
  • Once a TCP/IP connection is made between two machines (e.g., client 19 and a particular cluster member, server 14-k (for some value of k)), the server 14-k may receive a request from the client 19, e.g., for a resource. For example, the server 14-k may receive an HTTP request (e.g., an HTTP GET request) from client 19. Such a request generally includes a URL along with various HTTP headers (e.g., a host header, etc.). The selected server 14-k now determines whether it is responsible to handle this request or whether the request should be passed on to a different cluster member. To make this determination, the selected server 14-k considers the request itself and applies a second given function to at least some of the information used to make the request (e.g., to the URL and/or headers in the request).
  • This second function essentially partitions the request space (e.g., the URL space) so as to determine whether the selected server is, in fact, responsible to for this particular request. If the server determines that it is responsible for the request, it continues processing the request. If not, the server hands-off the request (as described below) on to another cluster member (e.g., server 14-p) that is responsible for the request. Having successfully passed off the request, the cluster member, server 14-k, updates its firewall to reject packets associated with the connection. The responsible cluster member (server 14-p) correspondingly updates its firewall to accept packets associated with this connection.
  • For the sake of this discussion, the function used to partition the requests is referred to as a partition function. The partition function may be a hash function or the like. In some cases the partition function may take into account the nature or type of request or resource requested. For example, certain cluster members may be allocated to certain types of requests (e.g., movies, software applications, etc.). The partition function applied to the URL (and/or other information) can be used to implement a degree of policy based load mapping.
  • Exemplary partition functions are:
      • Partition (URL, m)→{1 . . . m}
      • Partition (URL, host header, in)→{1 . . . m}
      • Partition (URL, HTTP headers, in)→{1 . . . m}
      • where Partition (params, m) is implemented as, e.g.,
      • hash(params) modulo m
      • where m is the number of active servers in the cluster.
  • Those skilled in the art will realize and understand, upon reading this description, that different and or other parameters may be used in the Partition function. Further, not all parts of a parameter need be used. For example, if the URL is a parameter, the function may choose to use only a part of the URL (e.g., the hostname).
  • Since accounting and other information may be included in HTTP headers and/or URLs, such information may be used by the partition function. For example, a cluster may comprise a number of non-homogenous servers. Certain requests may be directed to certain cluster servers based on server capability (e.g., speed) or based on arrangements with customers.
  • In order to hand off a request to another server within its cluster, a server must be able to completely move an individual established TCP connection from one server to another in the same cluster. The following scenario, with references to FIGS. 2-4, describes this operation of the system. As shown in the FIG. 2, the cluster includes two servers: server A and server B. Each of the servers runs a web cache, listening on a shared VIP (and port, e.g., port 80). Remote clients make incoming TCP connections to the VIP and port (as described above).
  • Using the TCP-level load balancing described above, assume that server A is initially selected to accept a particular TCP connection from a client (at S30 in FIG. 3). Server A accepts the connection from the client and waits for the HTTP request from the client. Using information from the HTTP request (e.g., the URL and one or more HTTP headers) server A decides to hand the request off to the server B. That is, the selected server (server A in this example) ascertains (using the partition function described above) whether it is the server responsible for the request (at S31). If the originally-selected server is responsible for the request (at S32), then it handles the request (at S33), otherwise it hands off (or tries to hand off) the request to the responsible cluster member (server B in this example) (at S34). If the handoff is determined to be successful (at S35), then the server responsible for the request (Server B in the example) handles the request (at S36), otherwise the originally selected server (Server A) handles the request (at S37).
  • The hand-off process (S34) takes place as follows (with reference to FIG. 4) (for the purposes of this discussion, assume that server A hands off to server B):
  • First the originally-selected server (Server A) freezes the TCP connection from the client (at S40). The selected server (Server A) then takes a snapshot of the frozen TCP connection (at S41), storing required information about the connection. The originally-selected server (Server A) then sends the snapshot of the frozen TCP connection to the responsible server (server B), preferably using a side communication channel to the responsible server (at S42).
  • The responsible server (Server B) receives the snapshot of the frozen TCP connection from the originally-selected server (Server A) (at S43). Using the snapshot of the frozen TCP connection, the responsible server (Server B) attempts to clone the TCP connection to the remote client (at S44). If the connection was cloned successfully, the responsible server (server B) sends acknowledgement to the originally-selected server (Server A), preferably using the side communication channel to the server A (at S45).
  • Upon receipt of the acknowledgement, the originally-selected server (Server A) closes the frozen TCP connection to the client (at S46).
  • The responsible server (Server B) then thaws the frozen (clone) TCP connection to the client (at S47).
  • With the handoff successful, the responsible server (Server B) continues to process incoming HTTP request from the client (at 52 in FIG. 4).
  • The accepting server may fail to clone connection or may refuse to satisfy handoff request. In these cases a negative acknowledgment will be sent and originating (handoff) server will continue to process original request. Should the responsible server (Server B) decline (or fail to satisfy) the handoff request from the originally-selected server (Server A), server A may thaw the TCP connection and continue to serve it locally.
  • A responsible server generally should not decline a handoff request or a request to take over a connection. However, a responsible server may have to decline a request, for example if its software is being shutdown. Note, too that two or more servers in the same cluster may be responsible for the same content, and may provide a degree of redundancy in content (to reduce fills from the origin server) and also to handle a so-called “flash crowd” when a certain piece of content becomes very popular for a relatively short period time.
  • When a handoff is successful, the responsible server must update its firewall to accept packets relating to that connection (and the server that handed off the connection must update its firewall to no longer accept such packets).
  • It should be apparent that only the server that is actually handling the connection will invoke the partition function. The other servers do not generally have the information required (e.g., the URL) to make the required decision.
  • The server making the handoff may provide the responsible server with information about the request (e.g., the type of request, the URL, the headers, etc.). In this way the responsible server may have sufficient information to satisfy the request.
  • Example II
  • By way of example, and without limitation, consider a cluster with 8 ports and with 7 active servers connected to those ports as shown in the following table:
  • Port #. 0 1 2 3 4 5 6 7
    Server S0 S1 S2 S3 S4 S5 S6
    Bucket 0 1 2 3 4 5 6
  • In this case, the number of active servers, m, is 7, there are seven buckets (numbered 0 to 6), and so the mapping function should produce a number in the range 0 to 6. Suppose, for the sake of this example, that the mapping function is:
      • MAP (source IP, destination IP, destination port, m)=hash (source IP, destination IP, destination port) modulo in
  • If a connection request comes in from IP address 123.156.189.123, for the VIP (1.0.0.1) on port 80. Each server runs the mapping function
      • hash (123.156.189.123, 1.0.0.1, 80) modulo 7
  • Suppose that this mapping produces a value of 4 then server S4 (which corresponds to bucket 4) is selected at the TCP level to handle the connection. Server S4 and the client then establish their connection and the client then sends an HTTP request (e.g., a GET request with a URL (URL1) and header information).
  • Server S4 invokes the partition function:
      • Partition (URL1, host header, 7)
  • Note that the partition function can use the same bucket association as the mapping function or it may use a different association. For example, if the partition function is implementing policy-based or capacity based distribution, then the partition function may need a separate bucket association. For this example, assume that the partition function uses the same bucket association as the mapping function.
  • Suppose that this invocation of the partition function returns a value of 6. This means that server S6 (associated with bucket no. 6) should handle this connection instead of the initially-selected server S4. So server S4 tries to hand off the connection to server S6.
  • Server S4 freezes the TCP connection from the client (at S40 in FIG. 4) and then takes a snapshot of the frozen TCP connection, storing required information about the connection (at S41). Server S4 sends the snapshot of the frozen TCP connection to Server S6, preferably using a side communication channel (at S42). Server S6 receives the snapshot of the frozen TCP connection from Server S4 (at S43). Using the snapshot of the frozen TCP connection, Server S6 attempts to clone the TCP connection to the remote client (at S44). If the connection is successfully cloned, then server S6 sends an acknowledgement to Server S4, preferably using the side communication channel (at S45). Upon receipt of the acknowledgement, Server S4 closes the frozen TCP connection to the client (at S46). Server S6 then thaws the frozen (clone) TCP connection to the client (at S47). With the handoff successful, Server S6 continues to process incoming HTTP request from the client.
  • Suppose now that another connection request comes in, this time from IP address 123.156.111.123, for the VIP (1.0.0.1) on port 80. Each server runs the mapping function:
      • hash (123.156.111.123, 1.0.0.1, 80) modulo 7
  • Suppose that the result of this function is 6 which corresponds to server S6. S6 connects with the client and the client then sends an HTTP GET request with a URL (URL1—the same as in the earlier request) and header information.
  • Server S6 invokes the partition function:
      • Partition (URL1, host header, 7)
  • Again the partition function returns the value 6. However, in this case the server responsible for the request is the one already handling the request, and so no handoff is needed (i.e., the check at S32 will return “YES”). Note that since server S6 has already served the resource associated with URL1, it may still have that resource cached.
  • End of Example II
  • Note that the number of servers connected to the switch could be greater than the number of servers responsible for the VIP. For example, a cluster may be configured with 20 servers connected to the same switch, 10 servers serving one VIP and another 10 servers serving another VIP. In this case the heartbeat assists in load balancing for two VIPs, and each VIP will be load balanced across 10 servers.
  • As shown in FIG. 5, a collection 100 of load-balancing clusters 10-1, 10-2, . . . , 10-p, may be combined. Each cluster 10-j has one or more corresponding VIPs (VIP-j), so that requests for a server at the IP address VIP-k (for some value of k) will be directed (by router 110) to the appropriate cluster for handling by one of the cluster members. The router 110 may be, e.g., a load balancing router. It should be appreciated that the collection 100 of clusters may form part of a content delivery network (CDN).
  • A client 19 may request a resource and be directed by a server selector system (e.g., DNS or the like) to a cluster. The server selector returns an IP address that happens to be a VIP address. The client then requests the resource from the VIP and, as described above, is connected (during a TCP connection) to a particular cluster member to handle the request.
  • If the cluster implements the partitioning function, then the connection may be handed off to another cluster member.
  • FIGS. 6A and 6B are a flowchart (600-1 and 600-2) of processing steps associated with server interactions.
  • In step 605, the cluster (i.e., via a switch) obtains a connection request to connect to a server associated with the virtual IP address (i.e., any server sitting behind the switch associated with a virtual IP address).
  • In step 610, the cluster (i.e., via the switch) provides the connection request to each server connected to the switch.
  • In step 615, at least one of the plurality of servers connected to the switch determines which of the plurality of servers should handle the connection. Such a determination can be based, for example, on a given function of information used to request the connection.
  • In step 620, if the server that is determined to handle the request does not have a copy of the requested resource, that server then requests to hand-off the connection (i.e., TCP connection) to at least one other of the plurality of servers that does have a copy of the requested resource. Note that the server may request a copy of the requested resource (e.g., via a peer-fill request) from another server that has a copy of the resource instead of sending a hand-off request.
  • In step 625, the server that has a copy of the requested resource determines whether to accept or reject the hand-off request (or reject or accept the peer-fill request) from the server that was originally determined to handle the connection/request. This determination can be based, for example, on the size of the requested resource, the popularity of the requested resource, as well as other attributes that are suitable for determining whether or not a TCP hand-off should occur in a server cluster in response to a request for certain resources.
  • In step 630, the server that has the copy of the requested resource accepts the hand-off request (or rejects the peer-fill request) if the size of the requested resource value exceeds a threshold value. In this example embodiment, if the size of the requested resource is determined to be too large (i.e., exceeds a threshold value) for expending precious system and network resource (i.e., by providing intra-cluster copies of resources, for example, one server sending a copy of a resource to another server in the cluster), then the server with the requested resource will handle the request itself (i.e., serve the requested resources, and, for example, not honor the peer-fill request).
  • In step 635, the server that has the copy of the requested resource accepts the hand-off request (or rejects the peer-fill request) if the popularity of the requested resource does not exceed a popularity threshold value. In other words, if it determined that the requested content is not popular (i.e., the number of times the particular resource has been requested during a retrospective time period does not exceed a threshold value), then the server with the copy of the request resource handles the connection and serves the resource (and, for example, does not honor the peer-fill request). Since the resource is not yet deemed popular, it is likely that the resource will not be requested as often and therefore is would not be efficient to transfer copies of the resource to other servers in the cluster.
  • In step 640, the server that has the copy of the requested resource rejects the hand-off request (or accepts/honors the peer-fill request if a copy of the resource is available) if the popularity of the requested resource exceeds the popularity threshold value. In this example circumstance, since it is determined that the requested content is popular, then it further behooves the cluster to have copies of the requested resource on other servers in the cluster to handle the possibility of more requests for the popular resource. Thus, instead of accepting the hand-off request, the server with the copy of the requested resource rejects the request, which, in one embodiment, forces the requesting server to obtain and serve the requested resource itself (and, thus, maintain a copy of the popular resource, for example, by honoring the peer-fill request and thus providing a copy of the requested resource).
  • In step 645, the server that has the copy of the requested resource rejects the hand-off request (or accepts/honors the peer-fill request if a copy of the resource is available) if the popularity of the requested resource exceeds the popularity threshold value and the size of the requested resource exceeds the threshold size value. This particular step elucidates the significance of popular content. Even if the size of the requested resource is deemed to large to send an intra-cluster copy from one server to another server within the same cluster (i.e., in light of the expenditure to system and network resources within the cluster), the popularity of the content may still make it more efficient in the long run to distribute a copy (or copies) of the requested resource throughout the cluster in anticipation of more requests for the popular content at the cluster. For example, one way to distribute copies of the requested resource is to reject the hand-off request and (either directly or indirectly) force the originally-selected server to handle the connection and ultimately serve the requested resource.
  • FIG. 7 is a flowchart 700 of processing steps associated with server interactions.
  • In step 705, a connection request to connect to a server associated with the IP address is received (e.g., at a cluster comprising a switch and plurality of server connected thereto via one or more ports of the switch).
  • In step 710, a determination is made as to which of the plurality servers is to handle the connection (e.g., via a hash function).
  • In step 720, if a first server of the plurality of servers is determined to be the server to handle the connection (e.g., via the hash function), and the first server does not have a copy of the requested resource, the first server provides a notification to a second server of the plurality of servers that does have a copy of the requested resource. In one example embodiment, the notification indicates that the first server does not have a copy of the requested resource. Alternatively, the notification can include a hand-off request to hand-off the connection to another server (e.g., the second server in this step), and/or a peer-fill request that requests a copy of the requested resource from another server (e.g., the second server in this step).
  • In step 725, in response to receiving the notification from the first server, the second sever determines whether to: i) provide a copy of the requested resource to said server (e.g., reject a hand-off request or accept a peer-fill request if a copy of the requested resource is available), or ii) request the server to handoff the connection to the second server so that the second server can serve the requested resource (e.g., accept a hand-off request or reject a peer-fill request). For example, in one embodiment this determining may be based on an attribute of the requested resource (e.g., size, popularity, etc.).
  • Load-Aware Load Balancing
  • In some embodiments each server in a cluster measures its own load and maintains information about the load of the other servers in the same cluster. For example, as shown in FIG. 1, each server 14-j in the cluster 10 includes a load mechanism denoted 24-j for the j-th cluster server. The load mechanism is constructed and adapted to measure the load on the cluster server using known load measurement techniques. In preferred embodiments the load mechanism on each particular cluster server converts or maps the load on that cluster server into one of a relatively small number of bands. These bands may be based on predetermined threshold values. For example, in one preferred implementation the load on a cluster server is mapped into one of three (3) bands or categories, corresponding to light load, medium load, and heavy load. For the sake of this description the three bands are referred to as Green (for lightly loaded servers), Yellow (for servers with medium load), and Red (for servers with a heavy load).
  • In order to categorize a server's load into k categories or bands (Band 1, Band 2 . . . Band k), k−1 threshold values are needed (denoted T1, T2 . . . , Tk-1). If the measured load is less than the first (lowest) threshold value T1, then the server's load is categorized as category 1 (i.e., for the purposes of load, that server is considered to be in Band 1). If the measured load on a server is between Ti and Ti+1, then the server's load is categorized as category on band Ti, and otherwise, if the server's load is greater than Tk-1 then that server's load is categorized as being in band k. FIG. 8A shows K bands with k−1 thresholds. In the case of three bands (e.g., denoted “Green”, “Yellow”, “Red”), as shown in FIG. 8B, two threshold values (T1 and T2) are needed in order to map or convert an load reading to a band. If the measured load on a server is less than T1 then that server's load is considered to be Green; if the measured load on a server is equal to or greater than T1 but less than T2 then that server's load is considered to be Yellow; otherwise, if the measured load on a server is equal to or greater than T2 then that server's load is considered to be Red.
  • FIG. 9 is a flowchart showing an exemplary categorization of a server's load into bands. For the flowchart in FIG. 9, the measured load on a server is denoted L, there are k bands denoted Band 1, Band 2 . . . Band k, and there are k−1 threshold values (denoted T1, T2 . . . , Tk-1). The measured load (L) is compared to the thresholds (at 902). If the measured load L is less than the first threshold value T1 then the category is set to Band 1 (at 904). If the load L is between Tj and Tj+1 for j=2 to k, then the category is set to Band j (at 906), otherwise the load is greater than Tk-1 and the category is set to Band k (at 908).
  • It should be appreciated that because the servers in a cluster are sharing load information in the form of bands or categories (as described further below), each server in a cluster should use the same categories (e.g., Band 1, Band 2 . . . Band k or “Green”, “Yellow”, “Red”). However, those of skill in the art will realize and understand, upon reading this description, that it is not necessary for each server to use the same actual measure of load in order to determine which category or band it is in. This allows servers to self-categorize their load. Preferably all cluster servers in a cluster use the same concept of load.
  • The load categorization thresholds (T1, T2 . . . , Tk-1) are preferably set in advance when the system is configured. However, it should be appreciated that the thresholds may be changed dynamically during operation of the system.
  • Those of skill in the art will realize and understand, upon reading this description, that different and/or other categorizations and names of the loads may be used. Those of skill in the art will realize and understand, upon reading this description, that the “load” on a cluster server may be measured in a number of different ways, taking into account a number of different factors.
  • Each cluster server 14-j in the cluster 10 also includes a load table (denoted 26-j for the j-th cluster server) in which the server stores its own load and the loads of the other servers in the cluster 10. Each server may provide its load to each other server in the cluster, using, e.g., the heartbeat mechanism. An exemplary load table 26-j is shown in FIG. 10. Each server may provide its load as a measured value or as a value mapped to a band or category. Preferably servers map their own loads to categories, since the load categories can be represented by fewer bits, and therefore fewer bits are required to provide the load information to the other servers. For example, only two (2) bits are needed when server load is categorized into three bands (“Green”, “Yellow”, “Red”).
  • The following table is an example of a load table for a cluster with eight (8) servers, where the load on each server is one of “Green”, “Yellow”, “Red”.
  • S1 S2 S3 S4 S5 S6 S7 S8
    Green Yellow Red Red Green Green Yellow Yellow
  • In this example the servers S1, S5, and S6 have load “Green,” servers S2, S7, and S8 have load “Yellow,” and servers S3 and S4 have load “Red.”
  • FIG. 11 is a flowchart of process(s) performed by each server in a cluster to determine its own load and to maintain its load table. The server determines its load (by some measure of load) (at 1102), categorizes its load (at 1104), and updates its load table with its current load category (at 1106). The server then provides its load category to the other servers in the cluster (at 1108), e.g., using the heartbeat mechanism 18. The server may also obtain load categories from other servers in the cluster and update its load table if necessary (at 1110). These processes are repeated while the system is in operation.
  • While the processes are shown in the flowchart in FIG. 11 as sequential, it should be appreciated that not all process steps need to be performed each time. For example, load determination (1102), categorization (1104), and table update (1106) may be performed at certain intervals, whereas the provision of load category information to the other servers (1108) may occur on each heartbeat.
  • Server Selection Using Load Information
  • The cluster servers may use their own load information and load information about other servers in the cluster in order to determine which cluster server is to handle a request (i.e., which cluster member is going to handle incoming traffic on a certain connection). As previously described, the cluster servers may make an initial determination of which cluster server is to handle a connection request, e.g., using values associated with the request.
  • For example, as described above, when the router 11 gets a request to connect to a server with the IP address VIP (shared by the cluster servers 14-j), the router maps the IP address VIP to a special MAC address that causes the switch 12 to forward the request to each server connected thereto, so that each member of the cluster 12 (i.e., each server 14) sees all incoming traffic (addressed to VIP). In the cluster 10, when a TCP connection is first established, each cluster member (i.e., each server 14) effectively decides for itself whether or not to handle a connection. As described above, a stateful firewall may use a mapping function (e.g., a hash function such as jhash) to select which cluster server is to handle a connection.
  • The system using the load information operates in a similar manner, except that heavily loaded servers may be omitted from the selection process (or may decline connections for which they are initially determined to be responsible). For example, consider the system described above with eight (8) cluster servers having their loads categorized as follows:
  • S1 S2 S3 S4 S5 S6 S7 S8
    Green Yellow Red Red Green Green Yellow Yellow
  • In this example, it may be preferable to have “Green” or even “Yellow” servers take on new requests instead of “Red” servers. One way to achieve this is to make an initial determination as before, and then to have the selected server decide whether to accept the connection request or pass it on to a server with a lighter load. For example, as shown in the flowchart of FIG. 12A, each (live) server will determine whether it is responsible for a connection request (e.g., using a hash function) (at 1202). Preferably the selection function will always select only one server. The particular server that is responsible for the connection then determines whether its load is too high to accept the connection (at 1204). This determination may be made using information in the load table on the server. For example, a server may consider its load to be too high if its load is in the “Red” band. In the example above, using this policy, the servers S3 and S4 may consider their loads to be too high. If the selected server does not consider its load to be too high to accept the connection, the server accepts responsibility and makes the connection (at 1206). It should be appreciated that this server may subsequently hand off the connection to another server after an actual connection is established, e.g., based on other policy considerations. For example, as described above, the initial server may make a connection handoff to another server based on the actual request being made.
  • If the selected server considers its load to be too high to accept the connection (at 1204), the server may try to find another server (within the cluster) to handle the connection. The server may use its load table to determine if there is another server in the cluster with a load that is lighter that its load (at 1208). For the purposes of this comparison, a server will treat all servers in the same load category as having the same load. If there are k load categories, with category 1 being the lightest loaded and category k being the heaviest loaded, the server may try to find a server in the lightest load category, then in the next lightest category, and so on, until it reaches its own category. For example, in the case of the three load categories “Green”, “Yellow”, “Red,” a “Red” server will first try to find a “Green” server to handle the request, and, failing that will try to find a “Yellow” server to handle the request. In the example above, if server S4 was initially selected as the server to handle the connection, that server would select one of the “Green” servers (S1, S5, S6) to take on the request.
  • If no server can be found with a lighter load (at 1208), then the initially-selected server takes responsibility for the connection (at 1206), otherwise it passes responsibility to a server with a lighter load (at 1210). Note that the lighter loaded server may decline to take the connection.
  • Those of skill in the art will realize and understand, upon reading this description, that different and/or other load-based server selection schemes may be used. For example, the system may implement a policy in which only servers in the heaviest loaded category (e.g., Red servers) may switch load, and then only to a server in the lightest loaded category on that cluster. In this policy, a Red server may switch to a Green server, and may only switch to a Yellow server if there are no Green servers. In another example, a cluster may try to maintain a certain ratio of Green (or Yellow) servers relative to the total number of (live) servers.
  • Another way to use the load information in initial server selection is to use the load information to essentially reduce the set of candidate machines For example, all machines categorized as having the heaviest load (e.g., Red) can be removed from the selection by being treated as inactive. Those servers become active again when their load drops to a lower category.
  • An alternative approach reduces the set of candidate machines for initial server selection based on a configured threshold, based, e.g., on a percentage of the size of the cluster. For example, the threshold may be set to X, where X represents a percentage of the number of live servers connected to ports of the switch and having the same VIP. By way of example, in the case of the three load categories “Green”, “Yellow”, “Red,” if there are sufficient machines in the Green load band (based on the threshold), then the selection algorithm proceeds using only the Green machines, with all other machines being considered offline (for selection purposes). If there are not enough Green machines (i.e., the number of Green machines is less than the threshold), then the machines with their loads in the next band (Yellow) are considered candidates (i.e., are treated as if online), such that the total of all of the machines in those bands is larger than the configured threshold. All machines make the same determination, however non-candidate machines cannot be selected or select themselves.
  • Its should be appreciated that the threshold used to select candidate machines for initial server selection is not the same as the thresholds discussed above that are used to mark the load category or band boundaries. The value X used for the threshold may be, for example, 5%, 10%, 15%, 20%, 30%, 40%, 50%, and so on. Those of skill in the art will realize and understand, upon reading this description, that other values of X may be used.
  • FIG. 12B is a flowchart showing selection of the set of candidate machines for initial server selection. In this example, there are three load categories denoted “Green”, “Yellow”, and “Red”, with the number of servers in the Green load category being denoted #G, the number of servers in the Yellow load category being denoted #Y, and the number of servers in the Red load category being denoted #R. For the sake of this example, suppose that the value X used to set the threshold is 40% and that there are eight live servers connected to the switch and having the same VIP. In this case then the threshold value T is 40% of eight which is rounded to 3. The candidate selection algorithm is:
      • Candidates=
        • if #G>=T then Green servers
        • else if #G+#Y>=T then Green and Yellow servers
        • else Green and Yellow and Red servers
  • With reference to the flowchart in FIG. 12B, if the number of Green servers (#G) is greater than or equal to the threshold (T) (at 1212), then the candidate servers is set to the Green servers (at 1214). Otherwise, if the number of Green servers (#G) and the number of Yellow servers (#Y) is greater than or equal to the threshold (T) (at 1216), then the candidate servers is set to the Green and Yellow servers (at 1218). Otherwise the candidate servers are set to be the Green, Yellow, and Red servers (at 1220).
  • In this case, e.g., being a “red” server (i.e., a server whose load is categorized as “red”) may still be a candidate for server selection if there are too few green and yellow machines (based on the threshold).
  • Those of skill in the art will realize and understand, upon reading this description, how to extend this candidate selection algorithm to more than three load categories/bands. In the general case, with M bands, the candidates for selection (i.e., the machines considered online) are the machines of the lowest loaded N bands such that the total number of machines in those bands is larger than a configured threshold. The remaining machines, not in the candidates, are considered offline for the purposes of initial server selection.
  • It should be appreciated that the candidate servers may also be used for subsequent decisions (i.e., after the initial selection) regarding server selection and handoff.
  • It should be further appreciated that the above approach will work even if the load data are not identical on all servers. Suppose, e.g., that based on locally stored load information, some servers have server S as a candidate and others have S excluded (based on their information about S′s load). The servers that exclude S will select S′ as the server. If the other servers select S, then the first of the server S′ and S to make the connection with the client will handle the request.
  • Proxy Serving
  • With reference to FIG. 13, once a initial server is selected (at 1302) to handle a connection request from a client (whether or not using load information), that initial server then establishes a TCP connection with the client (at 1304), as described above. As described, after the initial TCP connection is established between the client and the initial server, the initial server will then obtain an actual request (e.g., an HTTP request) from the client (at 1306). Based on that request, the initial server may determine that a different server should preferably handle the request (at 1306), and so the initial server may attempt to handoff the connection to another cluster server (as described above) (at 1308). If the initial server determines that it should not try to handoff the request to another server (at 1308), then the initial server handles the request (at 1310). If the initial server determines that it should try to handoff the request to another server (at 1308), then the other cluster server may accept or reject the handoff request (at 1312). If the other cluster server rejects the handoff (at 1312), then the initial server handles the request (at 1310). If the other cluster server accepts the connection handoff, then that other server handles the request (at 1314). The selection of the initial server (at 1302) may use the load-aware selection described above or some other technique.
  • In some cases, however, the other cluster server may serve the client via the initial server, essentially using the initial server as a proxy. In these cases there is no handoff of the TCP/IP connection. The decision as to whether or not to accept a handoff or use the initial server as a proxy may be made by the other server based, e.g., on its load, on the popularity of the requested resource, and/or on other factors.
  • FIG. 14 is an exemplary flowchart of a system in which the initial server acts as a proxy for another cluster server. As in the case with the system described with respect to FIG. 13, once a initial server is selected (at 1402) to handle a connection request from a client, that initial server then establishes a TCP connection with the client (at 1404). After the initial TCP connection is established between the client and the initial server, the initial server obtains an actual request (e.g., an HTTP request) from the client (at 1406). Based on that request, the initial server may determine that a different server should preferably handle the request (at 1406), and so the initial server may attempt to handoff the connection to another cluster server (at 1408). If the initial server determines that it should not try to handoff the request to another server (at 1408), then the initial server handles the request itself (at 1410).
  • If the initial server determines that it should try to handoff the request to another server (at 1408), then the other cluster server (the “2nd Server” in the flowchart) may accept or reject the handoff request (at 1412). However, in this case, in addition to an outright rejection of the request, the other server may decline the connection handoff and still remain involved and serve the request to the client through the initial server (as a proxy). In this case there is no connection handoff. If the other cluster server rejects the handoff (at 1412), then it is determined (at 1414) whether the initial server should act as a proxy (i.e., whether the other server should serve the client request through the initial server). Preferably the other server determines whether it will serve the request via the initial server, however it should be appreciated that either of the peer servers may chose to handle the request in a proxy manner.
  • If it is determined (at 1414) that the initial server is not to act as a proxy then the initial server handles the request (at 1410), otherwise the other server handles the request via the initial server as a proxy (at 1416).
  • If the other cluster server accepts the connection handoff (at 1412), then that other server handles the request (at 1418).
  • In the embodiments described above, after establishing a connection with a client, the initial server determines whether or not to try to handoff the connection to a peer server (e.g., another server in the cluster) (e.g., at step 1308 in FIGS. 13 and 1408 in FIG. 14). As explained above, the decision to switch to a peer server may be based on a number of factors. In some embodiments, the decision to switch to a peer may be based, at least in part, on load information about the initial server and/or the other server. The decision may be based, at least in part, on load information in the load table, including, in some cases, load information about all of the other servers in the cluster.
  • Similarly, in the case of proxy serving described above, the decision (at 1414) as to whether or not the initial server is to act as a proxy for the second server may be based, at least in part, on load information about one or more of the servers.
  • In some embodiments, after an initial server is selected to handle a request, migration to another server (within the same cluster or outside the cluster) may occur as a result of a peering query event. For example, the initial contact machine may accept the connection from the client, read the request and if it has the resource to hand, serve the resource. However, if the initial contact server does not have the requested resource, it will query its peers to see if they have it. The normal response would then indicate from which of those peers (if any) the initial server could fill the resource. In some embodiments the peer response includes the ‘migrate the connection to me’ or ‘proxy the resource through’ options.
  • In the case of a peer ‘migrate the connection to me’ response, the initial contact server will migrate the connection, e.g., as described above. In the case of peer “proxy the resource through” response, the initial contact server will proxy the response through the designated server. It should be appreciated that the ‘proxy the resource’ option can be used with machines in a different cluster than the one that includes the initial contact server.
  • In presently preferred implementations there is no option to migrate out of the cluster.
  • Exemplary processing of peering query events is described with reference to the flowchart in FIG. 15. As in the case with the system described with respect to FIGS. 13 and 14, once a initial server is selected (at 1502) to handle a connection request from a client, that initial server then establishes a TCP connection with the client (at 1504). After the initial TCP connection is established between the client and the initial server, the initial server obtains an actual request (e.g., an HTTP request) from the client (at 1506). If the initial server has the requested resource (at 1508), then it serves that resource (at 1510), otherwise it queries its peers for the resource (at 1512). The initial server obtains and analyzes any peer response(s) (at 1514).
  • If a peer has the resource and instructs the initial contact server accordingly, then the initial server may obtain the resource from that peer (at 1516). If a peer responds with a “migrate the connection to me” response, the initial contact server may migrate the connection (e.g., as described above) to that peer (at 1518). If a peer responds with a “proxy the resource through” response (identifying a second server), the initial contact server may the proxy the resource through that second server (at 1520). If no peer responds, the initial server may obtain the resource in some other manner, e.g., from an origin server or the like.
  • Although aspects of this invention have been described with reference to a particular system, the present invention operates on any computer system and can be implemented in software, hardware or any combination thereof. When implemented fully or partially in software, the invention can reside, permanently or temporarily, on any memory or storage medium, including but not limited to a RAM, a ROM, a disk, an ASIC, a PROM and the like.
  • While certain configurations of structures have been illustrated for the purposes of presenting the basic structures of the present invention, one of ordinary skill in the art will appreciate that other variations are possible which would still fall within the scope of the appended claims. While the invention has been described in connection with what is presently considered to be the most practical and preferred embodiment, it is to be understood that the invention is not to be limited to the disclosed embodiment, but on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.

Claims (30)

We claim:
1. A method, operable in a load-balancing cluster comprising:
(i) a switch having a plurality of ports; and
(ii) a plurality of servers connected to at least some of the plurality of ports of the switch, each of said servers being addressable by the same virtual Internet Protocol (VIP) address, the method comprising:
(A) in response to a connection request at said switch to connect a client computer to a server associated with said VIP address, a first server of said plurality of servers establishing a first connection with the client computer as a Transmission Control Protocol (TCP) connection;
(B) after establishing the TCP connection with the client computer in (A), and in response to a resource request received by said first server from said client computer for a particular resource, said first server determining whether or not to attempt to handoff the request to a second server of said plurality of servers; and
(C) based on said determining in (B), said first server attempting to handoff the TCP connection with the client computer to the second server when said first server determines that it should attempt to handoff the request to the second server; and,
(D) said second server rejecting the first server's attempted handoff of the TCP connection and said second server serving the particular resource to the client through the first server.
2. The method of claim 1, said connection request having been provided by said switch to each server connected to said switch, the method further comprising:
prior to said first server of said plurality of servers establishing a first connection with the client computer in (A), at least some of said servers determining which server of said plurality of servers is to handle the connection, wherein the determining is based, at least in part, on a first given function of information used to request the connection, whereby the first server is determined to handle the connection.
3. The method of claim 1 further comprising:
said first server updating a firewall of said first server to allow incoming traffic at said first server related to said TCP connection; and
other servers of said plurality of servers each updating a corresponding firewall thereof to reject incoming traffic related to said TCP connection between the client computer and the first server.
4. The method of claim 1 wherein the resource request for the particular resource comprises an HTTP request from said client computer for said particular resource.
5. The method of claim 1 wherein said determining whether or not to attempt to handoff the request to the second server in (B) is based, at least in part, on whether said second server had already handled a previous request from said client computer.
6. The method of claim 5 wherein the said determining whether or not to attempt to handoff the request to the second server in (B) is based, at least in part, on whether said second server had already handled a previous request from said client computer for said particular resource.
7. The method of claim 1 wherein the said determining whether or not to attempt to handoff the request to the second server in (B) is based, at least in part, on whether said second server had already handled a previous request for said particular resource.
8. The method of claim 7 wherein the said determining whether or not to attempt to handoff the request to the second server in (B) is based, at least in part, on whether said second server had previously served said particular resource
9. The method of claim 1 wherein the said determining whether or not to attempt to handoff the request to the second server in (B) is based, at least in part, on whether said second server had previously served said particular resource to any computer.
10. The method of claim 1 wherein the determining in (B) is made based at least in part on at least some information associated with the resource request for the particular resource.
11. The method of claim 10 wherein the resource request for the particular resource comprises an HTTP request from said client computer for said particular resource.
12. The method of claim 11 wherein the information associated with the resource request for the particular resource comprises at least some information from at least one HTTP header associated with the resource request.
13. The method of claim 1 wherein the said determining whether or not to attempt to handoff the request to the second server in (B) is based, at least in part, on whether said second server had previously served any resource to the client computer.
14. The method of claim 10 wherein the determining in (B) is made based at least in part on a second given function of said at least some information associated with the resource request for the particular resource.
15. The method of claim 14 wherein said second given function comprises a hash function.
16. The method of claim 1 wherein said determining in (B) is based, at least in part, on one or more of:
(b1) the type of resource request;
(b2) the type of particular resource requested; and
(b3) the particular resource requested.
17. The method of claim 1 further comprising:
said second server determining whether or not to accept the first server's attempted handoff of the TCP connection.
18. The method of claim 17 wherein the second server rejects the attempted handoff in (D) based, at least in part, on a load on said second server.
19. A method, operable in a load-balancing cluster comprising:
(i) a switch having a plurality of ports; and
(ii) a plurality of servers connected to at least some of the plurality of ports of the switch, each of said servers being addressable by the same virtual Internet Protocol (VIP) address, the method comprising:
(A) in response to a connection request at said switch to connect a client computer to a server associated with said VIP address, at least some of said servers determining which server of said plurality of servers is to handle the connection, wherein the determining is based, at least in part, on a first given function of information used to request the connection, whereby a first server is determined to handle the connection;
(B) said first server of said plurality of servers establishing a first connection with the client computer as a Transmission Control Protocol (TCP) connection;
(C) said first server updating a firewall of said first server to allow incoming traffic at said first server related to said TCP connection; and
(D) other servers of said plurality of servers each updating a corresponding firewall thereof to reject incoming traffic related to said TCP connection between the client computer and the first server.
(E) after establishing the TCP connection with the client computer in (B), and in response to a resource request received by said first server from said client computer for a particular resource, said first server determining whether or not to attempt to handoff the request to a second server of said plurality of servers; and
(F) based on said determining in (E), said first server attempting to handoff the TCP connection with the client computer to the second server when said first server determines that it should attempt to handoff the request to the second server; and,
(G) said second server rejecting the first server's attempted handoff of the TCP connection and said second server serving the particular resource to the client through the first server.
20. The method of claim 19 wherein the determining in (E) by said first server whether or not to attempt to handoff the request to a second server of said plurality of servers is based, at least in part, on one or more of the following:
(e1) whether said second server had already handled a previous request from said client computer;
(e2) whether said second server had already handled a previous request for said particular resource;
(e3) the type of resource request;
(e4) the type of particular resource requested; and
(e5) the particular resource requested.
21. The method of claim 19 wherein the resource request for the particular resource comprises an HTTP request from said client computer for said particular resource.
22. The method of claim 19 wherein the determining in (E) is made based at least in part on at least some information associated with the resource request for the particular resource.
23. The method of claim 22 wherein the resource request for the particular resource comprises an HTTP request from said client computer for said particular resource.
24. The method of claim 23 wherein the information associated with the resource request for the particular resource comprises information associated with at least one HTTP header.
25. The method of claim 24 wherein the information associated with the at least one HTTP header comprises a URL.
26. The method of claim 19 wherein the determining in (E) is made based at least in part on a second given function of said at least some information associated with the resource request for the particular resource.
27. The method of claim 26 wherein said second given function comprises a hash function.
28. The method of claim 19 further comprising:
said second server determining whether or not to accept the first server's attempted handoff of the TCP connection.
29. The method of claim 28 wherein the second server rejects the attempted handoff in (G) based, at least in part, on a load on said second server.
30. A method, operable in a load-balancing cluster comprising:
(i) a switch having a plurality of ports; and
(ii) a plurality of servers connected to at least some of the plurality of ports of the switch, each of said servers being addressable by the same virtual Internet Protocol (VIP) address, the method comprising:
(A) in response to a connection request at said switch to connect a client computer to a server associated with said VIP address, a first server of said plurality of servers establishing a first connection with the client computer as a Transmission Control Protocol (TCP) connection;
(B) after establishing the TCP connection with the client computer in (A), and in response to a resource request received by said first server from said client computer for a particular resource, said first server determining whether or not it has a copy of the particular resource;
(C) based on said determining in (B), said first server querying one or more peers regarding the particular resource;
(D) based on said querying in (C), said first server obtaining one or more responses from said one or more peers, and, based at least in part on said one or more responses, said first server performing one of:
(d1) obtaining the particular resource from a first peer of said one or more peers when said first peer indicates that said first peer has a copy of the particular resource; and
(d2) migrating the TCP connection to a second peer when the second peer indicates that the initial peer server should migrate the connection to the second peer; and
(d3) serving the particular resource to the client request through a second server when a third peer indicates that the initial server should proxy the particular resource through the second server.
US13/495,085 2011-12-31 2012-06-13 Load-balancing cluster Abandoned US20130173806A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
US13/495,085 US20130173806A1 (en) 2011-12-31 2012-06-13 Load-balancing cluster
PCT/US2012/071674 WO2013101844A1 (en) 2011-12-31 2012-12-26 Load-balancing cluster
CA2862339A CA2862339C (en) 2011-12-31 2012-12-26 Load-balancing cluster
EP12862575.3A EP2798513B1 (en) 2011-12-31 2012-12-26 Load-balancing cluster
HK15103847.2A HK1203654A1 (en) 2011-12-31 2015-04-21 Load-balancing cluster

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
US201161582298P 2011-12-31 2011-12-31
US201161582301P 2011-12-31 2011-12-31
US13/495,085 US20130173806A1 (en) 2011-12-31 2012-06-13 Load-balancing cluster

Publications (1)

Publication Number Publication Date
US20130173806A1 true US20130173806A1 (en) 2013-07-04

Family

ID=48695883

Family Applications (2)

Application Number Title Priority Date Filing Date
US13/495,085 Abandoned US20130173806A1 (en) 2011-12-31 2012-06-13 Load-balancing cluster
US13/495,114 Active 2033-05-21 US9444884B2 (en) 2011-12-31 2012-06-13 Load-aware load-balancing cluster without a central load balancer

Family Applications After (1)

Application Number Title Priority Date Filing Date
US13/495,114 Active 2033-05-21 US9444884B2 (en) 2011-12-31 2012-06-13 Load-aware load-balancing cluster without a central load balancer

Country Status (5)

Country Link
US (2) US20130173806A1 (en)
EP (1) EP2798513B1 (en)
CA (1) CA2862339C (en)
HK (1) HK1203654A1 (en)
WO (1) WO2013101844A1 (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140067986A1 (en) * 2012-08-31 2014-03-06 Matthew D. Shaver Network service system and method with off-heap caching
US20170083525A1 (en) * 2015-09-22 2017-03-23 Wal-Mart Stores, Inc. System and method for implementing a database in a heterogeneous cluster
US9609048B1 (en) * 2012-08-23 2017-03-28 TidalScale, Inc. Resource request and transfer in a multi-node distributed system
CN107135242A (en) * 2016-02-29 2017-09-05 阿里巴巴集团控股有限公司 Mongodb clusters access method, apparatus and system
US20170366477A1 (en) * 2016-06-17 2017-12-21 Intel Corporation Technologies for coordinating access to data packets in a memory
US10083201B2 (en) 2015-09-22 2018-09-25 Walmart Apollo, Llc System for maintaining consistency across a decentralized database cluster and method therefor
US10116736B2 (en) 2015-09-22 2018-10-30 Walmart Apollo, Llc System for dynamically varying traffic routing modes in a distributed cluster and method therefor
US10169138B2 (en) 2015-09-22 2019-01-01 Walmart Apollo, Llc System and method for self-healing a database server in a cluster
US10268744B2 (en) 2015-09-22 2019-04-23 Walmart Apollo, Llc System for maintaining consistency across a decentralized database cluster and method therefor
CN109684081A (en) * 2018-12-11 2019-04-26 北京数盾信息科技有限公司 A kind of allocation processing method of load balancing in cluster
US10353736B2 (en) 2016-08-29 2019-07-16 TidalScale, Inc. Associating working sets and threads
US10394817B2 (en) 2015-09-22 2019-08-27 Walmart Apollo, Llc System and method for implementing a database
US20190379730A1 (en) * 2018-06-07 2019-12-12 Level 3 Communications, Llc Load distribution across superclusters
US10999209B2 (en) 2017-06-28 2021-05-04 Intel Corporation Technologies for scalable network packet processing with lock-free rings
US11201791B2 (en) * 2020-03-24 2021-12-14 Verizon Patent And Licensing Inc. Optimum resource allocation and device assignment in a MEC cluster
US20230308475A1 (en) * 2022-03-25 2023-09-28 Roblox Corporation Online Game Network Demultiplexer with Denial-of-Service Prevention
US11803306B2 (en) 2017-06-27 2023-10-31 Hewlett Packard Enterprise Development Lp Handling frequently accessed pages
US11907768B2 (en) 2017-08-31 2024-02-20 Hewlett Packard Enterprise Development Lp Entanglement of pages and guest threads

Families Citing this family (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11323510B2 (en) 2008-02-28 2022-05-03 Level 3 Communications, Llc Load-balancing cluster
US8489750B2 (en) 2008-02-28 2013-07-16 Level 3 Communications, Llc Load-balancing cluster
US8776207B2 (en) * 2011-02-16 2014-07-08 Fortinet, Inc. Load balancing in a network with session information
US9270639B2 (en) * 2011-02-16 2016-02-23 Fortinet, Inc. Load balancing among a cluster of firewall security devices
EP2581831A1 (en) * 2011-10-14 2013-04-17 Alcatel Lucent Method and apparatus for dynamically assigning resources of a distributed server infrastructure
US9083709B2 (en) * 2012-05-11 2015-07-14 Cisco Technology, Inc. Virtual internet protocol migration and load balancing
US9729493B1 (en) 2012-06-25 2017-08-08 Vmware, Inc. Communicating messages over a social network to members of a virtualization infrastructure
US9189758B2 (en) * 2012-06-25 2015-11-17 Vmware, Inc. Administration of a network
US9929998B1 (en) 2012-08-24 2018-03-27 Vmware, Inc. Tagged messages to facilitate administration of a virtualization infrastructure
US9923859B1 (en) 2013-06-25 2018-03-20 Vmware, Inc. Creating a group of members based on monitoring a social network
CN103684900B (en) * 2012-09-19 2018-03-16 腾讯科技(深圳)有限公司 Business method for inspecting and system
US10027761B2 (en) * 2013-05-03 2018-07-17 A10 Networks, Inc. Facilitating a secure 3 party network session by a network device
US9503333B2 (en) * 2013-08-08 2016-11-22 Level 3 Communications, Llc Content delivery methods and systems
KR102169302B1 (en) * 2014-04-30 2020-10-23 삼성전자주식회사 A method, a terminal and a server for providing communication service
US9936048B2 (en) * 2014-09-10 2018-04-03 International Business Machines Corporation Client system communication with a member of a cluster of server systems
US10003641B2 (en) * 2014-09-16 2018-06-19 Telefonaktiebolaget Lm Ericsson (Publ) Method and system of session-aware load balancing
US10904111B2 (en) * 2014-10-02 2021-01-26 International Business Machines Corporation Lightweight framework with dynamic self-organizing coordination capability for clustered applications
US11533255B2 (en) * 2014-11-14 2022-12-20 Nicira, Inc. Stateful services on stateless clustered edge
US9876714B2 (en) 2014-11-14 2018-01-23 Nicira, Inc. Stateful services on stateless clustered edge
US9866473B2 (en) 2014-11-14 2018-01-09 Nicira, Inc. Stateful services on stateless clustered edge
US10044617B2 (en) 2014-11-14 2018-08-07 Nicira, Inc. Stateful services on stateless clustered edge
US10320905B2 (en) * 2015-10-02 2019-06-11 Oracle International Corporation Highly available network filer super cluster
US11570092B2 (en) 2017-07-31 2023-01-31 Nicira, Inc. Methods for active-active stateful network service cluster
US11296984B2 (en) 2017-07-31 2022-04-05 Nicira, Inc. Use of hypervisor for active-active stateful network service cluster
US10951584B2 (en) 2017-07-31 2021-03-16 Nicira, Inc. Methods for active-active stateful network service cluster
US10749945B2 (en) * 2017-10-09 2020-08-18 Level 3 Communications, Llc Cross-cluster direct server return with anycast rendezvous in a content delivery network (CDN)
US10942779B1 (en) 2017-10-27 2021-03-09 EMC IP Holding Company LLC Method and system for compliance map engine
US10834189B1 (en) * 2018-01-10 2020-11-10 EMC IP Holding Company LLC System and method for managing workload in a pooled environment
US11153122B2 (en) 2018-02-19 2021-10-19 Nicira, Inc. Providing stateful services deployed in redundant gateways connected to asymmetric network
US11799761B2 (en) 2022-01-07 2023-10-24 Vmware, Inc. Scaling edge services with minimal disruption
US11962564B2 (en) 2022-02-15 2024-04-16 VMware LLC Anycast address for network address translation at edge

Family Cites Families (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6470389B1 (en) 1997-03-14 2002-10-22 Lucent Technologies Inc. Hosting a network service on a cluster of servers using a single-address image
US6128279A (en) 1997-10-06 2000-10-03 Web Balance, Inc. System for balancing loads among network servers
US6553420B1 (en) 1998-03-13 2003-04-22 Massachusetts Institute Of Technology Method and apparatus for distributing requests among a plurality of resources
US6430618B1 (en) 1998-03-13 2002-08-06 Massachusetts Institute Of Technology Method and apparatus for distributing requests among a plurality of resources
US6108703A (en) 1998-07-14 2000-08-22 Massachusetts Institute Of Technology Global hosting system
US6578066B1 (en) * 1999-09-17 2003-06-10 Alteon Websystems Distributed load-balancing internet servers
US6888797B1 (en) 1999-05-05 2005-05-03 Lucent Technologies Inc. Hashing-based network load balancing
US6463454B1 (en) * 1999-06-17 2002-10-08 International Business Machines Corporation System and method for integrated load distribution and resource management on internet environment
US6785704B1 (en) 1999-12-20 2004-08-31 Fastforward Networks Content distribution system for operation over an internetwork including content peering arrangements
US20020010783A1 (en) 1999-12-06 2002-01-24 Leonard Primak System and method for enhancing operation of a web server cluster
US6826613B1 (en) 2000-03-15 2004-11-30 3Com Corporation Virtually addressing storage devices through a switch
US7240100B1 (en) 2000-04-14 2007-07-03 Akamai Technologies, Inc. Content delivery network (CDN) content server request handling mechanism with metadata framework support
US6976090B2 (en) 2000-04-20 2005-12-13 Actona Technologies Ltd. Differentiated content and application delivery via internet
US7912978B2 (en) 2000-07-19 2011-03-22 Akamai Technologies, Inc. Method for determining metrics of a content delivery and global traffic management network
US7725602B2 (en) 2000-07-19 2010-05-25 Akamai Technologies, Inc. Domain name resolution using a distributed DNS network
US6954784B2 (en) 2000-08-17 2005-10-11 International Business Machines Corporation Systems, method and computer program products for cluster workload distribution without preconfigured port identification by utilizing a port of multiple ports associated with a single IP address
US20030154266A1 (en) 2000-09-01 2003-08-14 Mark Bobick Server system and method for discovering digital assets in enterprise information systems
US6970939B2 (en) 2000-10-26 2005-11-29 Intel Corporation Method and apparatus for large payload distribution in a network
WO2002057917A2 (en) 2001-01-22 2002-07-25 Sun Microsystems, Inc. Peer-to-peer network computing platform
US7197536B2 (en) 2001-04-30 2007-03-27 International Business Machines Corporation Primitive communication mechanism for adjacent nodes in a clustered computer system
US6944678B2 (en) 2001-06-18 2005-09-13 Transtech Networks Usa, Inc. Content-aware application switch and methods thereof
US9167036B2 (en) 2002-02-14 2015-10-20 Level 3 Communications, Llc Managed object replication and delivery
US7284067B2 (en) * 2002-02-20 2007-10-16 Hewlett-Packard Development Company, L.P. Method for integrated load balancing among peer servers
GB0213073D0 (en) 2002-06-07 2002-07-17 Hewlett Packard Co Method of maintaining availability of requested network resources
US20030236813A1 (en) * 2002-06-24 2003-12-25 Abjanic John B. Method and apparatus for off-load processing of a message stream
US7480737B2 (en) 2002-10-25 2009-01-20 International Business Machines Corporation Technique for addressing a cluster of network servers
US7774495B2 (en) 2003-02-13 2010-08-10 Oracle America, Inc, Infrastructure for accessing a peer-to-peer network environment
US7181524B1 (en) 2003-06-13 2007-02-20 Veritas Operating Corporation Method and apparatus for balancing a load among a plurality of servers in a computer system
US20050022017A1 (en) 2003-06-24 2005-01-27 Maufer Thomas A. Data structures and state tracking for network protocol processing
US7912954B1 (en) * 2003-06-27 2011-03-22 Oesterreicher Richard T System and method for digital media server load balancing
US7613822B2 (en) * 2003-06-30 2009-11-03 Microsoft Corporation Network load balancing with session information
US7636917B2 (en) * 2003-06-30 2009-12-22 Microsoft Corporation Network load balancing with host status information
US7606929B2 (en) * 2003-06-30 2009-10-20 Microsoft Corporation Network load balancing with connection manipulation
US7590736B2 (en) 2003-06-30 2009-09-15 Microsoft Corporation Flexible network load balancing
JP2005062927A (en) 2003-08-11 2005-03-10 Hitachi Ltd Load control method and device, and processing program therefor
JP4515319B2 (en) 2005-04-27 2010-07-28 株式会社日立製作所 Computer system
US7694011B2 (en) 2006-01-17 2010-04-06 Cisco Technology, Inc. Techniques for load balancing over a cluster of subscriber-aware application servers
US20090049449A1 (en) 2007-08-15 2009-02-19 Srinidhi Varadarajan Method and apparatus for operating system independent resource allocation and control
EP2248003A1 (en) 2007-12-31 2010-11-10 Netapp, Inc. System and method for automatic storage load balancing in virtual server environments
US8489750B2 (en) * 2008-02-28 2013-07-16 Level 3 Communications, Llc Load-balancing cluster
EP2248016B1 (en) * 2008-02-28 2016-09-21 Level 3 Communications, LLC Load-balancing cluster
US9596278B2 (en) 2010-09-03 2017-03-14 Level 3 Communications, Llc Extending caching network functionality to an existing streaming media server
US9037712B2 (en) * 2010-09-08 2015-05-19 Citrix Systems, Inc. Systems and methods for self-loading balancing access gateways
US8824286B2 (en) 2010-10-29 2014-09-02 Futurewei Technologies, Inc. Network aware global load balancing system and method
US9055076B1 (en) * 2011-06-23 2015-06-09 Amazon Technologies, Inc. System and method for distributed load balancing with load balancer clients for hosts

Cited By (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10187452B2 (en) 2012-08-23 2019-01-22 TidalScale, Inc. Hierarchical dynamic scheduling
US10623479B2 (en) 2012-08-23 2020-04-14 TidalScale, Inc. Selective migration of resources or remapping of virtual processors to provide access to resources
US9609048B1 (en) * 2012-08-23 2017-03-28 TidalScale, Inc. Resource request and transfer in a multi-node distributed system
US10645150B2 (en) 2012-08-23 2020-05-05 TidalScale, Inc. Hierarchical dynamic scheduling
US11159605B2 (en) 2012-08-23 2021-10-26 TidalScale, Inc. Hierarchical dynamic scheduling
US10205772B2 (en) 2012-08-23 2019-02-12 TidalScale, Inc. Saving and resuming continuation on a physical processor after virtual processor stalls
US9021050B2 (en) * 2012-08-31 2015-04-28 Yume, Inc. Network service system and method with off-heap caching
US20140067986A1 (en) * 2012-08-31 2014-03-06 Matthew D. Shaver Network service system and method with off-heap caching
US10268744B2 (en) 2015-09-22 2019-04-23 Walmart Apollo, Llc System for maintaining consistency across a decentralized database cluster and method therefor
US10169138B2 (en) 2015-09-22 2019-01-01 Walmart Apollo, Llc System and method for self-healing a database server in a cluster
US10116736B2 (en) 2015-09-22 2018-10-30 Walmart Apollo, Llc System for dynamically varying traffic routing modes in a distributed cluster and method therefor
US10083201B2 (en) 2015-09-22 2018-09-25 Walmart Apollo, Llc System for maintaining consistency across a decentralized database cluster and method therefor
US9996591B2 (en) * 2015-09-22 2018-06-12 Walmart Apollo, Inc. System and method for implementing a database in a heterogeneous cluster
US10394817B2 (en) 2015-09-22 2019-08-27 Walmart Apollo, Llc System and method for implementing a database
US20170083525A1 (en) * 2015-09-22 2017-03-23 Wal-Mart Stores, Inc. System and method for implementing a database in a heterogeneous cluster
CN107135242A (en) * 2016-02-29 2017-09-05 阿里巴巴集团控股有限公司 Mongodb clusters access method, apparatus and system
US11671382B2 (en) * 2016-06-17 2023-06-06 Intel Corporation Technologies for coordinating access to data packets in a memory
US20170366477A1 (en) * 2016-06-17 2017-12-21 Intel Corporation Technologies for coordinating access to data packets in a memory
US10620992B2 (en) 2016-08-29 2020-04-14 TidalScale, Inc. Resource migration negotiation
US10353736B2 (en) 2016-08-29 2019-07-16 TidalScale, Inc. Associating working sets and threads
US10579421B2 (en) 2016-08-29 2020-03-03 TidalScale, Inc. Dynamic scheduling of virtual processors in a distributed system
US11513836B2 (en) 2016-08-29 2022-11-29 TidalScale, Inc. Scheduling resuming of ready to run virtual processors in a distributed system
US10783000B2 (en) 2016-08-29 2020-09-22 TidalScale, Inc. Associating working sets and threads
US11403135B2 (en) 2016-08-29 2022-08-02 TidalScale, Inc. Resource migration negotiation
US11803306B2 (en) 2017-06-27 2023-10-31 Hewlett Packard Enterprise Development Lp Handling frequently accessed pages
US10999209B2 (en) 2017-06-28 2021-05-04 Intel Corporation Technologies for scalable network packet processing with lock-free rings
US11907768B2 (en) 2017-08-31 2024-02-20 Hewlett Packard Enterprise Development Lp Entanglement of pages and guest threads
US11637893B2 (en) * 2018-06-07 2023-04-25 Level 3 Communications, Llc Load distribution across superclusters
US10594782B2 (en) * 2018-06-07 2020-03-17 Level 3 Communications, Llc Load distribution across superclusters
US12132779B2 (en) 2018-06-07 2024-10-29 Sandpiper Cdn, Llc Load distribution across superclusters
WO2019236113A1 (en) * 2018-06-07 2019-12-12 Level 3 Communications, Llc Load distribution across superclusters
AU2018427212A1 (en) * 2018-06-07 2021-01-07 Level 3 Communications, Llc Load distribution across superclusters
JP7148033B2 (en) 2018-06-07 2022-10-05 レベル スリー コミュニケーションズ,エルエルシー Load balancing across superclusters
US20200213388A1 (en) * 2018-06-07 2020-07-02 Level 3 Communications, Llc Load distribution across superclusters
JP2021526268A (en) * 2018-06-07 2021-09-30 レベル スリー コミュニケーションズ,エルエルシー Load balancing across superclusters
US20190379730A1 (en) * 2018-06-07 2019-12-12 Level 3 Communications, Llc Load distribution across superclusters
CN109684081A (en) * 2018-12-11 2019-04-26 北京数盾信息科技有限公司 A kind of allocation processing method of load balancing in cluster
US11658877B2 (en) * 2020-03-24 2023-05-23 Verizon Patent And Licensing Inc. Optimum resource allocation and device assignment in a MEC cluster
US20220086049A1 (en) * 2020-03-24 2022-03-17 Verizon Patent And Licensing Inc. Optimum resource allocation and device assignment in a mec cluster
US11201791B2 (en) * 2020-03-24 2021-12-14 Verizon Patent And Licensing Inc. Optimum resource allocation and device assignment in a MEC cluster
US20230308475A1 (en) * 2022-03-25 2023-09-28 Roblox Corporation Online Game Network Demultiplexer with Denial-of-Service Prevention
US12081585B2 (en) * 2022-03-25 2024-09-03 Roblox Corporation Online game network demultiplexer with denial-of-service prevention

Also Published As

Publication number Publication date
CA2862339A1 (en) 2013-07-04
HK1203654A1 (en) 2015-10-30
US20130174177A1 (en) 2013-07-04
EP2798513B1 (en) 2018-09-12
EP2798513A1 (en) 2014-11-05
EP2798513A4 (en) 2015-08-05
WO2013101844A1 (en) 2013-07-04
US9444884B2 (en) 2016-09-13
CA2862339C (en) 2020-08-18

Similar Documents

Publication Publication Date Title
US9444884B2 (en) Load-aware load-balancing cluster without a central load balancer
US8886814B2 (en) Load-balancing cluster
US8015298B2 (en) Load-balancing cluster
US11349912B2 (en) Cross-cluster direct server return in a content delivery network (CDN)
US10826832B2 (en) Load balanced access to distributed scaling endpoints using global network addresses
US20150146539A1 (en) Flow distribution table for packet flow load balancing
US10749945B2 (en) Cross-cluster direct server return with anycast rendezvous in a content delivery network (CDN)
US11323510B2 (en) Load-balancing cluster
Sakurauchi et al. Openweb: seamless proxy interconnection at the switching layer

Legal Events

Date Code Title Description
STCB Information on status: application discontinuation

Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION