US20090187641A1 - Optimization of network protocol options by reinforcement learning and propagation - Google Patents
Optimization of network protocol options by reinforcement learning and propagation
- Publication number
- US20090187641A1 (application US10/591,378)
- Authority
- US
- United States
- Prior art keywords
- options
- component
- option
- selection
- learning
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L69/00—Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
- H04L69/24—Negotiation of communication capabilities
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/08—Configuration management of networks or network elements
- H04L41/0803—Configuration setting
- H04L41/0813—Configuration setting characterised by the conditions triggering a change of settings
- H04L41/0816—Configuration setting characterised by the conditions triggering a change of settings the condition being an adaptation, e.g. in response to network events
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/50—Network services
- H04L67/60—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources
- H04L67/61—Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources taking into account QoS or priority requirements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/16—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
Description
- The embodiments of the invention relate generally to the field of network communication and, more specifically, to the optimization of network protocol options by reinforcement learning and propagation.
- Trivial file transfer protocol (TFTP) is a simple user datagram protocol (UDP)-based file transfer protocol that is frequently used in pre-boot environments. For example, TFTP is widely used in image provisioning to allow diskless hosts to boot over the network.
- TFTP provides extensive options, such as the block size of data packets and multicast provisioning, which may be applied to achieve better performance. For instance, a larger block size may result in better transfer performance (e.g., a session with a block size of 32 KB achieves roughly a 700% performance gain over a session with a block size of 512 B in certain 100 Mbps environments). Multicasting enables simultaneous provisioning to multiple clients.
- When a TFTP server receives requests from clients, simple negotiations are conducted in which the TFTP server may select appropriate option values as responses. After the negotiation, TFTP sessions are created and the files are transferred according to the selected options of the sessions. However, TFTP option selection presents problems in optimizing and propagating these options across different network environments for performance enhancement. The effectiveness of the TFTP options is highly dependent on the specific network environment. Factors affecting performance include, but are not limited to: network topology, switches and their configurations, network drivers, and the implementation of the TFTP clients.
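By way of illustration (this sketch is not part of the original disclosure), a server answering a request that carries RFC 2347/2348-style options might clamp the proposed block size before acknowledging it in an OACK. The 32 KB ceiling and the helper name are assumptions for the example; the 8-byte minimum follows RFC 2348.

```python
# Hypothetical sketch of TFTP option negotiation (RFC 2347/2348 style).
# The server-side ceiling and function name are illustrative assumptions.

MAX_BLKSIZE = 32768   # assumed server-side ceiling (bytes)
MIN_BLKSIZE = 8       # minimum block size allowed by RFC 2348

def negotiate_options(requested: dict[str, str]) -> dict[str, str]:
    """Choose the option values the server will acknowledge in its OACK."""
    acknowledged = {}
    if "blksize" in requested:
        proposed = int(requested["blksize"])
        # The server may answer with any value between the minimum and the
        # client's proposal; here we simply clamp to our assumed ceiling.
        acknowledged["blksize"] = str(max(MIN_BLKSIZE, min(proposed, MAX_BLKSIZE)))
    if "multicast" in requested:
        # Accepting multicast lets several clients share one session.
        acknowledged["multicast"] = requested["multicast"]
    return acknowledged

# Example: a client proposes a 64 KB block size; the server clamps it.
print(negotiate_options({"blksize": "65464"}))  # {'blksize': '32768'}
```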
- In some cases, TFTP options that could lead to high performance in some environments may be risky in other environments, possibly even causing failures. One example is that a single session with a block size of 32 KB may fail on one type of switch, while a block size of 16 KB may succeed on the same switch with acceptable performance. Another example is that a single multicast session with a block size of 32 KB on an older driver version of a certain Ethernet adapter in a 1 Gbps environment may fail, while reducing the block size or installing an updated version of the driver will succeed. These issues become more serious when the environments are complicated.
- For instance, complicated environments may include infrastructures with hubs, a mix of 1 Gbps and 100 Mbps connections, differing UDP multicast implementations across switches, multiple sessions occurring simultaneously but starting and ending at different times, TFTP clients imperfectly implemented due to pre-boot limitations, etc. There are no obvious rules or guidelines that uniformly work in these different environments. Therefore, under current TFTP implementations, it is difficult for a TFTP server to make optimal decisions during option negotiation that both achieve high performance and ensure the success of a file transfer.
- The invention will be understood more fully from the detailed description given below and from the accompanying drawings of various embodiments of the invention. The drawings, however, should not be taken to limit the invention to the specific embodiments, but are for explanation and understanding only.
- FIG. 1 is a block diagram of one embodiment of an exemplary network system to perform embodiments of the invention;
- FIG. 2 is a block diagram of one embodiment of a network environment for providing optimal option selection for trivial file transfer protocol (TFTP);
- FIG. 3 is a block diagram of one embodiment of an application of option optimization using reinforcement learning;
- FIG. 4 is a flow diagram depicting a method of one embodiment of the invention; and
- FIG. 5 illustrates a block diagram of one embodiment of an electronic system to perform various embodiments of the invention.
- An apparatus and method for optimization of network protocol options by reinforcement learning and propagation are disclosed. Reference in the specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment.
- In the following description, numerous details are set forth. It will be apparent, however, to one skilled in the art, that the embodiments of the invention may be practiced without these specific details. In other instances, well-known structures and devices are shown in block diagram form, rather than in detail, in order to avoid obscuring the invention.
- Embodiments of the present invention describe a method and respective circuit for optimization of network protocol options by reinforcement learning and propagation. More specifically, embodiments of the invention provide a novel approach to trivial file transfer protocol (TFTP) option negotiation and selection using reinforcement learning and propagation.
- FIG. 1 is a block diagram illustrating one embodiment of an exemplary network system to perform embodiments of the invention. System 100 includes a TFTP server 110, a network 120, and a client 130. TFTP server 110 may listen over network 120 for connection requests from client 130. Client 130 may make a connection to the TFTP server 110. Once connected, client 130 and TFTP server 110 may communicate via the TFTP. For instance, client 130 may perform a number of file manipulation operations, such as uploading files to the TFTP server 110, downloading files from the TFTP server 110, and so on. In other embodiments, one skilled in the art will appreciate that a server other than a TFTP server communicating via the TFTP (e.g., an FTP server) may be utilized.
- Additionally, TFTP server 110 and client 130 may further enter into option negotiations. During option negotiations, options that enhance and modify the functionality of the TFTP may be selected and enacted between the TFTP server 110 and client 130. Embodiments of the invention provide a novel approach for the optimum selection of protocol options during option negotiation by using reinforcement learning and propagation.
- FIG. 2 is a block diagram illustrating one embodiment of a system 200 for providing optimal option selection for TFTP. In one embodiment, a TFTP server 210 interacts with an environment 230 using a trial-and-error strategy by providing different options. In one embodiment, the environment 230 includes a file transfer component 240 of the TFTP server 210, along with a network environment 235 (switches, network drivers, etc.) and one or more TFTP clients 220. The option negotiation component 215 of TFTP server 210 is outside of and interacts with the environment 230.
- In one embodiment, the TFTP server 210 receives performance feedback for the different options as rewards, and improves its decision-making policy for option negotiation based on these past experiences and resulting rewards. In some embodiments, the TFTP server 210 may optionally upload the decision-making policy, along with the observed configurations of the specific environment, to a centralized place (e.g., an electronic library). Other TFTP servers 210 may then download the resources and use the policy for the most similar environment to start their own trial-and-error learning process. In some embodiments, option negotiation via a decision-making process in uncertain environments is accomplished by applying a Q-learning method.
- In one embodiment, an option negotiation component 215 of the TFTP server 210 may be utilized as an intelligent agent that interacts with the environment 230. The option negotiation component 215 provides the trial options for various environments 230 and receives the rewards as feedback. The option negotiation component 215 then utilizes reinforcement learning to arrive at the optimal option selection for any particular environment 230.
- In some embodiments, the option negotiation component 215 may be in a certain state s_t at a time t. The state is used to describe the specific status of the current system, namely the pending file transfer requests and the existing transfer sessions along with the options of those sessions. State transitions may occur whenever a new request is received, new sessions are created, or old sessions are ended.
- At state s_t, the option negotiation component 215 may choose an action a_t from the action set D(s_t) allowed in that state. For most of the states, where there are no pending file transfer requests, only a null action is allowed. For the states where there are new file transfer requests, the action set includes all of the legal options the TFTP server 210 may respond with. At each time step t, a reward r_t is received describing the utility that the option negotiation component 215 obtains. In some embodiments, a reward may refer to the data transferred at that time plus any penalties incurred, such as those caused by a timeout, session failure, etc.
- In one embodiment, the state transitions are assumed to depend on the action probabilistically according to an unknown distribution P(s_{t+1} | s_t, a_t) of the specific network environment. The rewards are assumed to depend on the state the agent resides in and the action it takes, probabilistically according to an unknown distribution P(r_{t+1} | s_t, a_t, s_{t+1}) of the specific network environment. A minimal data-structure sketch of this formulation appears after the next paragraph.
- The goal of the option negotiation component 215 is to decide appropriate actions that maximize the performance of a file transfer, i.e., to choose appropriate actions that maximize the discounted return over an infinitely long run. This may be demonstrated as (the equation is reconstructed in standard notation, since the original equation image is not reproduced):

$$\max_{\pi}\; E_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, R_{t+1}\right], \qquad 0 < \gamma < 1,$$

where γ is the discount factor.
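To make the preceding formulation concrete, the following sketch (an illustration, not part of the original disclosure) encodes the state, the action set D(s), and the reward described above; all class names, option values, and penalty magnitudes are assumptions.

```python
# Hypothetical encoding of the MDP described above; names are illustrative.
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    """Pending requests plus the options of the sessions in flight."""
    pending_requests: tuple[str, ...]                     # e.g., files awaiting negotiation
    active_session_options: tuple[tuple[str, int], ...]   # e.g., (("blksize", 16384),)

def action_set(state: State) -> list[dict]:
    """D(s): only a null action when nothing is pending; otherwise every
    legal option combination the server could respond with."""
    if not state.pending_requests:
        return [{}]  # null action
    return [{"blksize": b, "multicast": m}
            for b in (512, 8192, 16384, 32768)
            for m in (False, True)]

def reward(bytes_transferred: int, timeouts: int, session_failed: bool) -> float:
    """Data moved in the step minus assumed penalties for timeouts/failures."""
    penalty = 1_000 * timeouts + (100_000 if session_failed else 0)
    return bytes_transferred - penalty

print(len(action_set(State(("boot.img",), ()))))  # 8 legal option combinations
```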
-
- The policy π denotes the probability distribution of choosing actions at the various states. Capital letters, such as S, A, are used to denote the random variables, and lower case letters, such as s, a, are used to denote the value of the random variables.
- The Q-function of the optimal policy π* satisfies the following Bellman optimal equation:
-
- The Q-learning algorithm is a standard approach of reinforcement learning that iteratively calculates the value functions of the optimal policy. Under the Q-learning algorithm, let {circumflex over (Q)}·(s, a) denote the estimated Q function of the optimal policy. These values may then either be stored as a lookup table, or approximated by functions h(s, a, w) with w as parameters (e.g., a linear function of features implied in the states s and the actions a, or more sophisticated function approximators).
- In one embodiment, the Q-learning algorithm works as follows:
- 1. Initialize {circumflex over (Q)}·(s, a).
- 2. t←0, k←1, start from s0.
- 3. Select an action at according the distribution
-
P(A t =a t |S t =s t)∝k {circumflex over (Q)}·(st , at ), - and transit to the state st+1, and receive the immediate reward rt+1.
- 4. Update the estimated Q function with a sample backup strategy for the Bellman optimal equation
-
- 5. Increase k and t←t←1.
- 6. If the terminate condition is not met, go back to step 2.
- 7. Optionally retrieve the configurations of the environment and upload the policy (estimated Q function) to a centralized environment.
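For illustration, here is a compact, self-contained sketch of the loop in steps 1-7 above, using a lookup table and the k-exponentiated selection rule. The toy environment, reward scaling, learning rate, and termination condition are assumptions, not part of the patent.

```python
# Hypothetical tabular Q-learning loop following steps 1-7 above; the toy
# environment and every parameter value are illustrative assumptions.
import math
import random
from collections import defaultdict

ACTIONS = [512, 8192, 16384, 32768]   # candidate blksize options (bytes)
ALPHA, GAMMA = 0.1, 0.9               # assumed learning rate and discount

q = defaultdict(float)                # step 1: Q̂(s, a), zero-initialized
state, k = "idle", 1.0                # step 2: t starts at 0, k at 1

def select_action(s: str, k: float) -> int:
    """Step 3: P(A=a | S=s) ∝ k^Q̂(s,a), computed in log space for stability."""
    exponents = [q[(s, a)] * math.log(k) for a in ACTIONS]
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    return random.choices(ACTIONS, weights=weights)[0]

def toy_environment(s: str, a: int) -> tuple[str, float]:
    """Stand-in for the real network: larger blocks fail more often here."""
    ok = random.random() > a / 65536
    return "idle", (a / 1024 if ok else -10.0)   # reward in KB, minus a penalty

for t in range(10_000):               # steps 3-6: loop until termination
    action = select_action(state, k)
    next_state, r = toy_environment(state, action)
    # Step 4: sample backup toward r + γ · max_a' Q̂(s', a')
    target = r + GAMMA * max(q[(next_state, a)] for a in ACTIONS)
    q[(state, action)] += ALPHA * (target - q[(state, action)])
    state, k = next_state, k + 0.01   # step 5: increase k (t advances via the loop)

# Step 7 (optional) would upload q alongside the observed configuration.
print(max(ACTIONS, key=lambda a: q[("idle", a)]))  # greedy choice after learning
```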
- FIG. 3 is a block diagram of one embodiment of the application of option optimization using reinforcement learning, such as the Q-learning algorithm, in a system 300. The components of system 300 interact together to utilize various embodiments of the invention. The components of system 300 include an option provider 310, a file transfer component 320, and a Q-function update component 330. In one embodiment, these components are included as part of TFTP server 210, described with respect to FIG. 2.
- In one embodiment, option provider 310 receives file transfer requests. Option provider 310 may associate the environment of the file transfer requests with, for example, Q values related to a Q-learning algorithm. Option provider 310 may then select options for the environment based on the Q values. These selected options, as well as the file transfer requests, are sent to the file transfer component 320.
- File transfer component 320, in turn, transfers data associated with the file transfer requests. File transfer component 320 also sends feedback, or rewards, to Q-function update component 330. Q-function update component 330 may modify the Q values that it provides to option provider 310 based on the rewards received from file transfer component 320.
- In some embodiments, the components of system 300 utilize a Q-learning algorithm, such as that described above. In the initialization stage (e.g., step 1) of the above algorithm, the initial Q function values may be randomized if there is no further information available. However, if the server is able to download resources from the centralized environment, the server may select the policy of the most similar environment, by comparing the observed configurations, to initialize the Q function (a minimal sketch of this similarity matching appears after the next paragraph).
- When the values of the estimated Q function are stored in a lookup table, the estimated Q function converges to the values of the optimal policy when the parameters are controlled in an appropriate manner. The action selected in step 3 of the algorithm may become optimal as k grows larger after a certain number of iterations.
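Returning to the initialization step, a minimal sketch of selecting the most similar stored policy might look as follows; the configuration fields, library entries, and distance metric are assumptions for illustration.

```python
# Hypothetical similarity matching for initializing the Q function from a
# central library of learned policies; fields and entries are assumptions.
LIBRARY = [
    {"config": {"link_mbps": 100,  "multicast": 0, "hubs": 1}, "policy": "Q_100m"},
    {"config": {"link_mbps": 1000, "multicast": 1, "hubs": 0}, "policy": "Q_1g_mc"},
]

def distance(a: dict, b: dict) -> float:
    """Simple mismatch count between two observed configurations."""
    return sum(0.0 if a[key] == b[key] else 1.0 for key in a)

def most_similar_policy(observed: dict):
    """Pick the stored policy whose environment looks most like ours."""
    return min(LIBRARY, key=lambda entry: distance(observed, entry["config"]))["policy"]

# A new server in a 1 Gbps multicast environment starts from Q_1g_mc
# rather than from randomized Q values.
print(most_similar_policy({"link_mbps": 1000, "multicast": 1, "hubs": 0}))
```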
- FIG. 4 is a flow diagram illustrating a method of one embodiment of the invention. Process 400 provides a method for optimization of network protocol options with reinforcement learning and propagation. The process 400 begins at processing block 410, where a learning component of a TFTP server interacts with clients, as well as with the environment, by conducting different trials of various TFTP options in different states. Then, at processing block 420, the learning component of the TFTP server receives performance feedback for these trials as rewards.
- At processing block 430, the learning component of the TFTP server utilizes the past trials and resulting rewards to improve its decision-making policy for option negotiation. In some embodiments, a reinforcement learning algorithm is used to improve the decision-making policy. In one embodiment, the reinforcement learning algorithm may be a Q-learning algorithm.
- At processing block 440, the learned policies for various option implementation decisions are uploaded, along with the observed configurations of the environment, to a centralized place (e.g., an electronic library). Then, at processing block 450, other TFTP servers may download the resources and use the policy of the most similar environment as the initial point to start a new learning process in their own environments.
- One skilled in the art will appreciate that the embodiments of the present invention may be applied to communication protocols other than TFTP, and the present descriptions are not intended to limit the application of the various embodiments solely to TFTP.
- In some embodiments, components of the TFTP server or other clients may utilize various electronic systems to perform embodiments of the invention. The electronic system 500 illustrated in FIG. 5 is intended to represent a range of electronic systems, for example, computer systems, network access devices, etc. Alternative systems, whether electronic or non-electronic, can include more, fewer, and/or different components.
- Electronic system 500 includes bus 501 or other communication device to communicate information, and processor 502 coupled to bus 501 to process information. In one embodiment, one or more lines of bus 501 are optical fibers that carry optical signals between components of electronic system 500. One or more of the components of electronic system 500 having optical transmission and/or optical reception functionality can include an optical modulator and bias circuit as described in embodiments of the invention.
- While electronic system 500 is illustrated with a single processor, electronic system 500 can include multiple processors and/or co-processors. Electronic system 500 further includes random access memory (RAM) or other dynamic storage device 504 (referred to as memory), coupled to bus 501 to store information and instructions to be executed by processor 502. Memory 504 also can be used to store temporary variables or other intermediate information during execution of instructions by processor 502.
- Electronic system 500 also includes read only memory (ROM) and/or other static storage device 506 coupled to bus 501 to store static information and instructions for processor 502. Data storage device 507 is coupled to bus 501 to store information and instructions. Data storage device 507, such as a magnetic disk or optical disc and corresponding drive, can be coupled to electronic system 500.
- Electronic system 500 can also be coupled via bus 501 to display device 521, such as a cathode ray tube (CRT) or liquid crystal display (LCD), to display information to a computer user. Alphanumeric input device 522, including alphanumeric and other keys, is typically coupled to bus 501 to communicate information and command selections to processor 502. Another type of user input device is cursor control 523, such as a mouse, a trackball, or cursor direction keys, to communicate direction information and command selections to processor 502 and to control cursor movement on display 521. Electronic system 500 further includes network interface 530 to provide access to a network, such as a local area network.
- Instructions are provided to memory from a storage device, such as a magnetic disk, a read-only memory (ROM) integrated circuit, a CD-ROM, or a DVD, or via a remote connection (e.g., over a network via network interface 530), either wired or wireless, providing access to one or more electronically-accessible media. In alternative embodiments, hard-wired circuitry can be used in place of or in combination with software instructions. Thus, execution of sequences of instructions is not limited to any specific combination of hardware circuitry and software instructions.
- Embodiments of the invention provide numerous advantages over prior art solutions, including: (1) dynamically deciding TFTP options to optimize network performance according to the environment; (2) an adaptive, self-learning approach for option optimization; and (3) propagation of learned strategies across different environments for future reuse.
- In addition, embodiments of the invention provide a self-learning, self-adapting, and self-distributing system seamlessly integrated into standard TFTP without impacting current protocol options and capabilities. One skilled in the art will appreciate that embodiments of the invention may potentially be applied to other network transportation protocols, such as file transfer protocol (FTP).
- Whereas many alterations and modifications of the present invention will no doubt become apparent to a person of ordinary skill in the art after having read the foregoing description, it is to be understood that any particular embodiment shown and described by way of illustration is in no way intended to be considered limiting. Therefore, references to details of various embodiments are not intended to limit the scope of the claims, which in themselves recite only those features regarded as the invention.
Claims (20)
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2006/000545 WO2007109923A1 (en) | 2006-03-29 | 2006-03-29 | Optimization of network protocol options by reinforcement learning and propagation |
Publications (2)
Publication Number | Publication Date |
---|---|
US20090187641A1 true US20090187641A1 (en) | 2009-07-23 |
US8438248B2 US8438248B2 (en) | 2013-05-07 |
Family
ID=38540777
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US10/591,378 Expired - Fee Related US8438248B2 (en) | 2006-03-29 | 2006-03-29 | Optimization of network protocol options by reinforcement learning and propagation |
Country Status (6)
Country | Link |
---|---|
US (1) | US8438248B2 (en) |
JP (1) | JP4825270B2 (en) |
CN (1) | CN101416466B (en) |
DE (1) | DE112006003821B4 (en) |
GB (1) | GB2450257B (en) |
WO (1) | WO2007109923A1 (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050268146A1 (en) * | 2004-05-14 | 2005-12-01 | International Business Machines Corporation | Recovery in a distributed stateful publish-subscribe system |
US20080209440A1 (en) * | 2004-05-07 | 2008-08-28 | Roman Ginis | Distributed messaging system supporting stateful subscriptions |
US20080244025A1 (en) * | 2004-05-07 | 2008-10-02 | Roman Ginis | Continuous feedback-controlled deployment of message transforms in a distributed messaging system |
US20080239951A1 (en) * | 2006-06-27 | 2008-10-02 | Robert Evan Strom | Method for applying stochastic control optimization for messaging systems |
US20090141737A1 (en) * | 2007-11-30 | 2009-06-04 | Texas Instruments Incorporated | Systems and methods for prioritized channel access hardware assistance design |
US20120030150A1 (en) * | 2010-07-29 | 2012-02-02 | Telcordia Technologies, Inc. | Hybrid Learning Component for Link State Routing Protocols |
US20120233348A1 (en) * | 2011-03-09 | 2012-09-13 | Derek Alan Winters | Dual-mode download manager |
US20130031036A1 (en) * | 2011-07-25 | 2013-01-31 | Fujitsu Limited | Parameter setting apparatus, non-transitory medium storing computer program, and parameter setting method |
US20130122885A1 (en) * | 2011-11-14 | 2013-05-16 | Fujitsu Limited | Parameter setting apparatus and parameter setting method |
US20180164756A1 (en) * | 2016-12-14 | 2018-06-14 | Fanuc Corporation | Control system and machine learning device |
WO2018110985A1 (en) * | 2016-12-15 | 2018-06-21 | Samsung Electronics Co., Ltd. | Method and apparatus for automated decision making |
CN114356535A (en) * | 2022-03-16 | 2022-04-15 | 北京锦诚世纪咨询服务有限公司 | Resource management method and device for wireless sensor network |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101702743A (en) | 2009-11-04 | 2010-05-05 | 中兴通讯股份有限公司 | Self-adaption adjustment method of mobile terminal browser and device thereof |
US8769137B2 (en) * | 2011-06-23 | 2014-07-01 | Honeywell International Inc. | Systems and methods for negotiated accelerated block option for trivial file transfer protocol (TFTP) |
JP6898479B2 (en) * | 2016-02-05 | 2021-07-07 | ファナック株式会社 | Machine learning device, numerical control device, machine tool system, manufacturing system and machine learning method to learn the display of operation menu |
US10719777B2 (en) | 2016-07-28 | 2020-07-21 | At&T Intellectual Propery I, L.P. | Optimization of multiple services via machine learning |
US20180082210A1 (en) * | 2016-09-18 | 2018-03-22 | Newvoicemedia, Ltd. | System and method for optimizing communications using reinforcement learning |
US10536505B2 (en) * | 2017-04-30 | 2020-01-14 | Cisco Technology, Inc. | Intelligent data transmission by network device agent |
CN107367929B (en) * | 2017-07-19 | 2021-05-04 | 北京上格云技术有限公司 | Method for updating Q value matrix, storage medium and terminal equipment |
CN109587519B (en) * | 2018-12-28 | 2021-11-23 | 南京邮电大学 | Heterogeneous network multipath video transmission control system and method based on Q learning |
JP7272606B2 (en) * | 2020-02-20 | 2023-05-12 | 国立大学法人京都大学 | A control device, a base station equipped with the same, a program to be executed by a computer, and a computer-readable recording medium recording the program |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20020058532A1 (en) * | 1998-09-25 | 2002-05-16 | Snelgrove William Martin | Method and system for negotiating telecommunication resources |
US20030074338A1 (en) * | 2001-07-18 | 2003-04-17 | Young Peter M. | Control system and technique employing reinforcement learning having stability and learning phases |
US20030204615A1 (en) * | 2002-04-30 | 2003-10-30 | Yongbin Wei | Outer-loop scheduling design for communication systems with channel quality feedback mechanisms |
US20040120011A1 (en) * | 2002-12-20 | 2004-06-24 | Double Floyd C. | Alternative to pre-printed special forms |
US20040133599A1 (en) * | 2003-01-03 | 2004-07-08 | Microsoft Corporation | System and method for improved client server communications of email messages |
US20040141525A1 (en) * | 2003-01-21 | 2004-07-22 | Naga Bhushan | Power boosting in a wireless communication system |
US20050030903A1 (en) * | 2003-08-05 | 2005-02-10 | Djamal Al-Zain | Determining a transmission parameter in a transmission system |
US20050193136A1 (en) * | 2004-02-27 | 2005-09-01 | International Business Machines Corporation | Server-side protocol configuration of accessing clients |
US20050251516A1 (en) * | 1997-12-31 | 2005-11-10 | International Business Machines Corporation | Methods and apparatus for high-speed access to and sharing of storage devices on a networked digital data processing system |
US7013238B1 (en) * | 2003-02-24 | 2006-03-14 | Microsoft Corporation | System for delivering recommendations |
US20060171356A1 (en) * | 2005-02-01 | 2006-08-03 | Mehmet Gurelli | Method and apparatus for controlling a transmission data rate based on feedback relating to channel conditions |
US20060274899A1 (en) * | 2005-06-03 | 2006-12-07 | Innomedia Pte Ltd. | System and method for secure messaging with network address translation firewall traversal |
US20070058669A1 (en) * | 2003-08-01 | 2007-03-15 | Fg Microtec Gmbh | Distributed quality-of-service management system |
US20070299915A1 (en) * | 2004-05-02 | 2007-12-27 | Markmonitor, Inc. | Customer-based detection of online fraud |
US7478160B2 (en) * | 2004-04-30 | 2009-01-13 | International Business Machines Corporation | Method and apparatus for transparent negotiations |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US1659800A (en) * | 1927-01-28 | 1928-02-21 | Perle A Bailey | Shaving comport |
JPH06164672A (en) * | 1992-11-19 | 1994-06-10 | Toshiba Corp | Data communication system |
JPH1098502A (en) * | 1996-09-20 | 1998-04-14 | Fujitsu Ltd | Mobile data communication system |
JP2000250875A (en) * | 1999-02-26 | 2000-09-14 | Toshiba Corp | Boot program distributor and computer readable storage medium stored with the program |
JP2001136173A (en) * | 1999-11-02 | 2001-05-18 | Matsushita Electric Ind Co Ltd | Communication method for wireless home network and wireless home network system |
JP4523694B2 (en) * | 2000-03-21 | 2010-08-11 | アンリツ株式会社 | Information processing system |
JP2001339591A (en) * | 2000-05-25 | 2001-12-07 | Murata Mach Ltd | Communication terminal |
JP2003030067A (en) * | 2001-07-12 | 2003-01-31 | Fujitsu Ltd | Communication managing device, communication system, communication management program and communication program |
CN1169332C (en) * | 2002-09-29 | 2004-09-29 | 清华大学 | Method for selecting transmission protocol based on client terminal feedback |
JP2005352639A (en) * | 2004-06-09 | 2005-12-22 | Nec Corp | Access support server, system, method and program |
JP2006035388A (en) * | 2004-07-28 | 2006-02-09 | Riyuukoku Univ | Learning device, operating object equipped with learning device, learning method, learning program, and program-recording medium readable by computer |
-
2006
- 2006-03-29 US US10/591,378 patent/US8438248B2/en not_active Expired - Fee Related
- 2006-03-29 GB GB0812411.7A patent/GB2450257B/en not_active Expired - Fee Related
- 2006-03-29 WO PCT/CN2006/000545 patent/WO2007109923A1/en active Application Filing
- 2006-03-29 JP JP2008552663A patent/JP4825270B2/en not_active Expired - Fee Related
- 2006-03-29 DE DE200611003821 patent/DE112006003821B4/en not_active Expired - Fee Related
- 2006-03-29 CN CN200680054135.XA patent/CN101416466B/en not_active Expired - Fee Related
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050251516A1 (en) * | 1997-12-31 | 2005-11-10 | International Business Machines Corporation | Methods and apparatus for high-speed access to and sharing of storage devices on a networked digital data processing system |
US20020058532A1 (en) * | 1998-09-25 | 2002-05-16 | Snelgrove William Martin | Method and system for negotiating telecommunication resources |
US20030074338A1 (en) * | 2001-07-18 | 2003-04-17 | Young Peter M. | Control system and technique employing reinforcement learning having stability and learning phases |
US20030204615A1 (en) * | 2002-04-30 | 2003-10-30 | Yongbin Wei | Outer-loop scheduling design for communication systems with channel quality feedback mechanisms |
US20040120011A1 (en) * | 2002-12-20 | 2004-06-24 | Double Floyd C. | Alternative to pre-printed special forms |
US20040133599A1 (en) * | 2003-01-03 | 2004-07-08 | Microsoft Corporation | System and method for improved client server communications of email messages |
US20040141525A1 (en) * | 2003-01-21 | 2004-07-22 | Naga Bhushan | Power boosting in a wireless communication system |
US7013238B1 (en) * | 2003-02-24 | 2006-03-14 | Microsoft Corporation | System for delivering recommendations |
US20070058669A1 (en) * | 2003-08-01 | 2007-03-15 | Fg Microtec Gmbh | Distributed quality-of-service management system |
US20050030903A1 (en) * | 2003-08-05 | 2005-02-10 | Djamal Al-Zain | Determining a transmission parameter in a transmission system |
US20050193136A1 (en) * | 2004-02-27 | 2005-09-01 | International Business Machines Corporation | Server-side protocol configuration of accessing clients |
US7478160B2 (en) * | 2004-04-30 | 2009-01-13 | International Business Machines Corporation | Method and apparatus for transparent negotiations |
US20070299915A1 (en) * | 2004-05-02 | 2007-12-27 | Markmonitor, Inc. | Customer-based detection of online fraud |
US20060171356A1 (en) * | 2005-02-01 | 2006-08-03 | Mehmet Gurelli | Method and apparatus for controlling a transmission data rate based on feedback relating to channel conditions |
US20060274899A1 (en) * | 2005-06-03 | 2006-12-07 | Innomedia Pte Ltd. | System and method for secure messaging with network address translation firewall traversal |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8533742B2 (en) | 2004-05-07 | 2013-09-10 | International Business Machines Corporation | Distributed messaging system supporting stateful subscriptions |
US20080209440A1 (en) * | 2004-05-07 | 2008-08-28 | Roman Ginis | Distributed messaging system supporting stateful subscriptions |
US20080244025A1 (en) * | 2004-05-07 | 2008-10-02 | Roman Ginis | Continuous feedback-controlled deployment of message transforms in a distributed messaging system |
US7962646B2 (en) | 2004-05-07 | 2011-06-14 | International Business Machines Corporation | Continuous feedback-controlled deployment of message transforms in a distributed messaging system |
US7886180B2 (en) | 2004-05-14 | 2011-02-08 | International Business Machines Corporation | Recovery in a distributed stateful publish-subscribe system |
US20050268146A1 (en) * | 2004-05-14 | 2005-12-01 | International Business Machines Corporation | Recovery in a distributed stateful publish-subscribe system |
US20080239951A1 (en) * | 2006-06-27 | 2008-10-02 | Robert Evan Strom | Method for applying stochastic control optimization for messaging systems |
US7792038B2 (en) * | 2006-06-27 | 2010-09-07 | International Business Machines Corporation | Method for applying stochastic control optimization for messaging systems |
US20090141737A1 (en) * | 2007-11-30 | 2009-06-04 | Texas Instruments Incorporated | Systems and methods for prioritized channel access hardware assistance design |
US20120030150A1 (en) * | 2010-07-29 | 2012-02-02 | Telcordia Technologies, Inc. | Hybrid Learning Component for Link State Routing Protocols |
US9118637B2 (en) * | 2011-03-09 | 2015-08-25 | Arris Enterprises, Inc. | Dual-mode download manager |
US20120233348A1 (en) * | 2011-03-09 | 2012-09-13 | Derek Alan Winters | Dual-mode download manager |
US9807148B2 (en) | 2011-03-09 | 2017-10-31 | Arris Enterprises Llc | Dual-mode download manager |
US20130031036A1 (en) * | 2011-07-25 | 2013-01-31 | Fujitsu Limited | Parameter setting apparatus, non-transitory medium storing computer program, and parameter setting method |
US9002757B2 (en) * | 2011-07-25 | 2015-04-07 | Fujitsu Limited | Parameter setting apparatus, non-transitory medium storing computer program, and parameter setting method |
US20130122885A1 (en) * | 2011-11-14 | 2013-05-16 | Fujitsu Limited | Parameter setting apparatus and parameter setting method |
US8897767B2 (en) * | 2011-11-14 | 2014-11-25 | Fujitsu Limited | Parameter setting apparatus and parameter setting method |
US20180164756A1 (en) * | 2016-12-14 | 2018-06-14 | Fanuc Corporation | Control system and machine learning device |
US10564611B2 (en) * | 2016-12-14 | 2020-02-18 | Fanuc Corporation | Control system and machine learning device |
WO2018110985A1 (en) * | 2016-12-15 | 2018-06-21 | Samsung Electronics Co., Ltd. | Method and apparatus for automated decision making |
US11983647B2 (en) | 2016-12-15 | 2024-05-14 | Samsung Electronics Co., Ltd. | Method and apparatus for operating an electronic device based on a decision-making data structure using a machine learning data structure |
CN114356535A (en) * | 2022-03-16 | 2022-04-15 | 北京锦诚世纪咨询服务有限公司 | Resource management method and device for wireless sensor network |
Also Published As
Publication number | Publication date |
---|---|
WO2007109923A1 (en) | 2007-10-04 |
GB2450257B (en) | 2012-01-04 |
JP2009525643A (en) | 2009-07-09 |
US8438248B2 (en) | 2013-05-07 |
GB2450257A (en) | 2008-12-17 |
CN101416466A (en) | 2009-04-22 |
DE112006003821T5 (en) | 2009-01-15 |
JP4825270B2 (en) | 2011-11-30 |
CN101416466B (en) | 2014-05-28 |
DE112006003821B4 (en) | 2010-12-16 |
GB0812411D0 (en) | 2008-08-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8438248B2 (en) | Optimization of network protocol options by reinforcement learning and propagation | |
US8117258B2 (en) | Distributed computing by carrier-hosted agent | |
US7676582B2 (en) | Optimized desktop sharing viewer join | |
US7668903B2 (en) | Method and apparatus for dynamically delivering service profiles to clients | |
US7552213B2 (en) | Remote network node management system and method | |
US7194544B2 (en) | Method and system for dynamic protocol selection among object-handled specified protocols | |
AU2007214516A1 (en) | System and method for generating and executing a platform emulation based on a selected application | |
US20060206583A1 (en) | Framework for managing client application data in offline and online environments | |
JP2004186883A (en) | Control system and method, information processing apparatus and method, information processing terminal and method, recording medium, and program | |
JP2000232446A (en) | Data transfer method and device | |
EP1869554A2 (en) | System and method for managing software patches | |
US20030061361A1 (en) | System and methods for automatic negotiation in distributed computing | |
US8032834B2 (en) | Context-based user assistance | |
US20090077218A1 (en) | Software Method And System For Controlling And Observing Computer Networking Devices | |
US20040128114A1 (en) | Supervisory control system, supervisory control method, control program for controlled device | |
US20070094400A1 (en) | Software installation within a federation | |
EP1504339A1 (en) | Communication system and method with configurable posting points | |
US20080077704A1 (en) | Variable Electronic Communication Ping Time System and Method | |
JP4707973B2 (en) | Transaction process that provides rules for rule-based networks | |
US20070136301A1 (en) | Systems and methods for enforcing protocol in a network using natural language messaging | |
US7614058B2 (en) | System and method for virtual media command filtering | |
US11294773B2 (en) | Method, apparatus and computer program product for managing backup system | |
US20070136472A1 (en) | Systems and methods for requesting protocol in a network using natural language messaging | |
US20150120945A1 (en) | Push Channel Based Creation of Web-Based User Interface Sessions | |
JP2002342084A (en) | Software demonstration environment providing system and software demonstration environment providing method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: INTEL CORPORATION, CALIFORNIA Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LI, CONG;XU, WENBIN;REEL/FRAME:018421/0578 Effective date: 20060728 |
|
FEPP | Fee payment procedure |
Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FPAY | Fee payment |
Year of fee payment: 4 |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20210507 |