
US20100169271A1 - File sharing method, computer system, and job scheduler - Google Patents

File sharing method, computer system, and job scheduler

Info

Publication number
US20100169271A1
US20100169271A1 (application US 12/388,174)
Authority
US
United States
Prior art keywords
job
file
hosts
computing
processor
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
US12/388,174
Other versions
US8442939B2 (en)
Inventor
Takashi Yasui
Masaaki Shimizu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd filed Critical Hitachi Ltd
Assigned to HITACHI, LTD. Assignment of assignors interest (see document for details). Assignors: SHIMIZU, MASAAKI; YASUI, TAKASHI
Publication of US20100169271A1 publication Critical patent/US20100169271A1/en
Application granted granted Critical
Publication of US8442939B2 publication Critical patent/US8442939B2/en
Expired - Fee Related

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/176 Support for shared access to files; File sharing support
    • G06F16/1767 Concurrency control, e.g. optimistic or pessimistic approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2147 Locking files

Definitions

  • This invention relates to a technology of executing, while accessing a file, a job in a computer system including a plurality of computing hosts.
  • In order to realize high-speed file access in a large-scale computer system, there is disclosed a technology called "file staging" in A. Uno, "Software of the Earth Simulator", Journal of the Earth Simulator, Volume 3, September 2005, 52-59.
  • File staging is a technology in which a file to be accessed for a job is transferred between a computing host and a login host before and after execution of the job. With the file staging, a job which is in execution in each computing host can be executed by just accessing a file stored in a local storage device provided for the computing host itself. Therefore, it is possible to realize high-speed file access.
  • As a technology of sharing the same file among a plurality of computing hosts in a large-scale computer system, there is disclosed a technology called "shared file system" in R. Sandberg, "The Sun Network Filesystem: Design, Implementation and Experience", in Proceedings of the Summer 1986 USENIX Technical Conference and Exhibition.
  • The shared file system is a technology in which a file to be accessed for a job is shared between a computing host and a login host. With the shared file system, there is no need to transfer the file to be accessed for the job between the computing host and the login host.
  • On the other hand, in a case where each computing host is provided with a local file system, a file to be accessed for a job needs to be transferred to each computing host.
  • In view of the above, this invention has been made, and it is therefore an object of this invention to realize, for a computer system including a plurality of computing hosts, high-speed file sharing by sharing, at the time of execution of a job, a necessary file among the computing hosts allocated to process the job.
  • According to an aspect of this invention, there is provided a file sharing method used for a computer system which includes a plurality of computers, and which executes a requested job, the plurality of computers including a plurality of computing hosts which execute the job, each of the plurality of computing hosts comprising: a first interface coupled to another one of the plurality of computing hosts; a first processor coupled to the first interface; and a first memory coupled to the first processor, the file sharing method including the steps of, in a case where the job is executed by the plurality of computing hosts: sharing, by the first processor of each of the plurality of computing hosts which execute the job, a file necessary for executing the job; executing, by the first processor of each of the plurality of computing hosts which execute the job, the requested job through accessing the file; and canceling, by the first processor of each of the plurality of computing hosts which execute the requested job, the sharing of the file after the executing of the requested job is completed.
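  • As a concrete illustration, the claimed steps map onto a construct-execute-destruct lifecycle. The following is a minimal, hypothetical Python sketch: the function names, host names, and job structure are invented for illustration and do not appear in the disclosure.

```python
# Minimal, hypothetical sketch of the claimed lifecycle: the allocated hosts
# share the job's files, execute the job against them, then cancel the sharing.
def construct_shared_file_system(hosts, shared_dir):
    print(f"sharing {shared_dir} among {hosts}")    # placeholder for FS construction

def destruct_shared_file_system(hosts, shared_dir):
    print(f"unsharing {shared_dir} among {hosts}")  # placeholder for FS destruction

def run_shared_job(job, hosts):
    construct_shared_file_system(hosts, job["shared_dir"])
    try:
        for host in hosts:
            print(f"{host}: executing {job['name']}")  # the job accesses shared files here
    finally:
        destruct_shared_file_system(hosts, job["shared_dir"])  # canceled after completion

run_shared_job({"name": "job10", "shared_dir": "/work/job10"}, ["comp0", "comp1"])
```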
  • FIG. 1 is a diagram illustrating an example of a configuration of a computer system according to a first embodiment of this invention.
  • FIG. 2 is a diagram illustrating an example of a job according to the first embodiment of this invention.
  • FIG. 3 is a diagram illustrating an example of a computing host management table according to the first embodiment of this invention.
  • FIG. 4 is a diagram illustrating an example of a file system configuration file according to the first embodiment of this invention.
  • FIG. 5 is a flow chart illustrating a procedure in which an execution request for a job is received from a user by a job reception module according to the first embodiment of this invention.
  • FIG. 6 is a flow chart illustrating a procedure in which a job waiting for execution is allocated to the computing host by a job scheduling module according to the first embodiment of this invention.
  • FIG. 7 is a flow chart illustrating a procedure in which allocation of the job to the computing hosts is canceled by the job scheduling module according to the first embodiment of this invention.
  • FIG. 8 is a flow chart illustrating a procedure in which a request to construct a shared file system is made by a file system construction request module according to the first embodiment of this invention.
  • FIG. 9 is a flow chart illustrating a procedure in which a request to destruct a shared file system is made by a file system destruction request module according to the first embodiment of this invention.
  • FIG. 10 is a flow chart illustrating a procedure in which a shared file system is constructed by a file system construction module according to the first embodiment of this invention.
  • FIG. 11 is a flow chart illustrating a procedure in which a shared file system is destructed by a file system destruction module according to the first embodiment of this invention.
  • FIG. 12 is a flow chart illustrating a procedure of a processing performed by a master server of the shared file system according to the first embodiment of this invention.
  • FIG. 13 is a flow chart illustrating a procedure of a processing performed by a sub-server of the shared file system according to the first embodiment of this invention.
  • FIG. 14 is a flow chart illustrating a procedure of a processing performed by a client of the shared file system according to the first embodiment of this invention.
  • FIG. 15 is a diagram illustrating an example of a configuration of a computer system according to a second embodiment of this invention.
  • FIG. 16 is a diagram illustrating an example of an I/O host management table according to the second embodiment of this invention.
  • FIG. 17 is a diagram illustrating an example of a file system configuration file according to the second embodiment of this invention.
  • FIG. 18 is a flow chart illustrating a procedure in which a job waiting for execution is allocated to a computing host and the I/O host by a job scheduling module according to the second embodiment of this invention.
  • FIG. 19 is a flow chart illustrating a procedure in which allocation of a job to the computing hosts and the I/O hosts is canceled by a job scheduling module according to the second embodiment of this invention.
  • FIG. 20 is a flow chart illustrating a procedure in which a request to construct a shared file system is made by a file system construction request module according to the second embodiment of this invention.
  • FIG. 21 is a flow chart illustrating a procedure in which a request to destruct the shared file system is made by a file system destruction request module according to the second embodiment of this invention.
  • FIG. 22 is a flow chart illustrating a procedure in which a shared file system is constructed by a file system construction module according to the second embodiment of this invention.
  • FIG. 23 is a flow chart illustrating a procedure in which a shared file system is destructed by a file system destruction module according to the second embodiment of this invention.
  • FIG. 24 is a flow chart illustrating a procedure of a processing performed by a master server of the shared file system according to the second embodiment of this invention.
  • FIG. 25 is a flow chart illustrating a procedure of the processing performed by a client of the shared file system according to the second embodiment of this invention.
  • FIG. 1 is a diagram illustrating an example of a configuration of a computer system according to a first embodiment of this invention.
  • The computer system includes a login host 110 and a plurality of computing hosts 120.
  • The login host 110 is coupled to the computing hosts 120 via a network 100.
  • The computing hosts 120 are coupled to one another via the network 100 as well.
  • The login host 110 receives a request for an execution of a job from a user.
  • The login host 110 also selects a computing host 120 which is to execute the requested job, and carries out such processing (job scheduling) as causing the selected computing host 120 to execute the job.
  • The login host 110 includes a CPU 111, a memory 112, a storage device 113, and a network interface card (NIC) 114.
  • The CPU 111 executes such processing as job scheduling.
  • The memory 112 stores a program to be executed by the CPU 111, and data necessary for executing the program.
  • Specifically, the memory 112 stores a program for performing the job scheduling, and management information of the computing hosts 120.
  • The program and data stored in the memory 112 will be described later.
  • The storage device 113 stores programs and files. For example, a program stored in the storage device 113 is loaded into the memory 112, and then executed by the CPU 111.
  • The NIC 114 is an interface for establishing connection with the computing host 120 via the network 100.
  • The computing host 120 executes a job allocated by the login host 110.
  • A job is processed by a single computing host 120 or by a plurality of computing hosts 120.
  • The processing capabilities of the computing hosts 120 included in the computer system are essentially the same. In a case where the processing capabilities are different from each other at the time of executing a job with a plurality of computing hosts 120, there is a risk that a computing host 120 inferior in processing capability becomes a bottleneck.
  • The computing host 120 includes a CPU 121, a memory 122, a storage device 123, and an NIC 124.
  • By executing a program stored in the memory 122, the CPU 121 processes an allocated job.
  • The memory 122 stores a program to be executed by the CPU 121, and data necessary for executing the program.
  • Specifically, the memory 122 stores a program for executing a job.
  • The program and data stored in the memory 122 will be described later.
  • The storage device 123 stores programs and files. For example, upon allocation of a job, a transferred program is stored. The stored program is loaded into the memory 122, and then executed by the CPU 121, whereby the allocated job is processed. The storage device 123 also stores a file necessary for executing a job, and constructs a shared file system.
  • The NIC 124 is an interface for establishing connection with the login host 110 and the other computing hosts 120 via the network 100.
  • The login host 110 stores a job 200 and a job scheduler 210 in the memory 112.
  • The job 200 includes information which is necessary for executing computational processing requested from a user.
  • The job 200 includes, for example, data and a program for processing the data.
  • The job 200 is executed by a single computing host 120 or by a plurality of computing hosts 120.
  • The job scheduler 210 is a program which is processed by the CPU 111.
  • The job scheduler 210 performs management of the job 200, such as allocating the job 200 to the computing host 120, and canceling the allocation of the job 200.
  • The job scheduler 210 includes a job reception module 211, a job scheduling module 212, a file system construction request module 213, and a file system destruction request module 214.
  • The job reception module 211 receives a request for an execution of a job from a user. Detailed description of the processing performed by the job reception module 211 will be made later with reference to FIG. 5.
  • The job scheduling module 212 obtains a job stored in a job queue 215, and then allocates the job to the computing host 120 which is to execute the job. Further, upon end of the execution of the job, the job scheduling module 212 cancels the allocation of the job to the computing host 120. Detailed description of the processing performed by the job scheduling module 212 will be made later with reference to FIGS. 6 and 7.
  • The file system construction request module 213 identifies the computing hosts 120 which are to share a file, and then requests those computing hosts 120 to construct a shared file system. Detailed description of the processing performed by the file system construction request module 213 will be made later with reference to FIG. 8.
  • The file system destruction request module 214 makes a request for destruction of the constructed shared file system. Detailed description of the processing performed by the file system destruction request module 214 will be made later with reference to FIG. 9.
  • The job scheduler 210 further includes the job queue 215 and a computing host management table 216.
  • The job queue 215 temporarily stores the job 200 which the user requests to be processed until the processing starts.
  • The computing host management table 216 keeps identification information of the computing hosts 120 and allocation states of the job 200 among the computing hosts 120. Detailed description of the configuration of the computing host management table 216 will be made later with reference to FIG. 3.
  • The computing host 120 stores a program 220 and a file system program 230 in the memory 122.
  • The program 220 is a program which is included in the job 200 allocated by the job scheduler 210 of the login host 110. With the CPU 121 executing the program 220, the job requested by the user is executed.
  • The file system program 230 is executed by the CPU 121, whereby construction and destruction of a shared file system, and processing necessary for file access are performed.
  • The file system program 230 includes a file system construction module 231, a file system destruction module 232, a master server module 233, a sub-server module 234, and a client module 235.
  • The file system program 230 further includes a file system configuration file 236.
  • The file system configuration file 236 keeps information of the computing hosts 120 which constitute the shared file system.
  • The file system construction module 231 constructs a file system based on the file system configuration file 236 which has been received from the login host 110. Detailed description of the processing performed by the file system construction module 231 will be made later with reference to FIG. 10.
  • Upon notification of completion of a job from the login host 110, the file system destruction module 232 destructs the constructed file system. Detailed description of the processing performed by the file system destruction module 232 will be made later with reference to FIG. 11.
  • The shared file system is configured by a master server, a sub-server, and a client.
  • The computing host 120 executes at least one function from among the functions of the master server, the sub-server, and the client.
  • The client receives a file access request made by the job 200.
  • The sub-server stores a file to be shared in the storage device 123.
  • The master server manages a storage location of the file, and, in a case where an inquiry is made about the storage location of the file by the client, notifies which sub-server stores the file.
  • The program 220 included in the job 200 is executed by each of the computing hosts 120. Therefore, the function of the client is executed by each computing host 120.
  • The master server module 233 executes processing which, in the constructed shared file system, causes the computing host 120 to behave as a master server. Detailed description of the processing performed by the master server module 233 will be made later with reference to FIG. 12.
  • The sub-server module 234 executes processing which, in the constructed shared file system, causes the computing host 120 to behave as a sub-server. Detailed description of the processing performed by the sub-server module 234 will be made later with reference to FIG. 13.
  • The client module 235 executes processing which, in the constructed shared file system, causes the computing host 120 to behave as a client. Detailed description of the processing performed by the client module 235 will be made later with reference to FIG. 14.
  • FIG. 2 is a diagram illustrating an example of the job 200 according to the first embodiment of this invention.
  • The job 200 includes, as described above, information necessary for executing the requested computational processing.
  • FIG. 2 illustrates a script file, which is one form of execution request for the job 200.
  • In the script file, there are defined an estimated execution time 300, a number of computing hosts to be used 310, a shared directory name 320, a stage-in file list 330, and a stage-out file list 340.
  • The estimated execution time 300 represents an estimated value of a period of time required to process the job 200.
  • The number of computing hosts to be used 310 represents the number of computing hosts 120 which are to process the job 200.
  • The shared directory name 320 represents the name of a shared directory which stores a file shared among the computing hosts which process the job.
  • The stage-in file list 330 represents a list of files which are copy sources of the files stored in the shared directory. Upon construction of the shared file system, a directory which is identified by the shared directory name 320 is mounted, and then, a file specified by the stage-in file list 330 is duplicated.
  • The stage-out file list 340 represents a list of files to which a processing result of the job 200 is output. Upon completion of the execution of the job, a file specified by the stage-out file list 340 is transferred to the login host 110.
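  • For illustration only, a script of this kind might be parsed as follows. The directive names and file paths below are invented; the patent does not specify a script syntax, only the five fields listed above.

```python
# Hypothetical job script; directive names and paths are invented for illustration.
SCRIPT = """\
#EST_TIME 3600
#NUM_HOSTS 2
#SHARED_DIR /work/job10
#STAGE_IN /home/user/input.dat
#STAGE_OUT /work/job10/result.dat
"""

def parse_job_script(text):
    job = {"stage_in": [], "stage_out": []}
    for line in text.splitlines():
        key, _, value = line.lstrip("#").partition(" ")
        value = value.strip()
        if key == "EST_TIME":
            job["estimated_time"] = int(value)   # estimated execution time 300
        elif key == "NUM_HOSTS":
            job["num_hosts"] = int(value)        # number of computing hosts to be used 310
        elif key == "SHARED_DIR":
            job["shared_dir"] = value            # shared directory name 320
        elif key == "STAGE_IN":
            job["stage_in"].append(value)        # stage-in file list 330
        elif key == "STAGE_OUT":
            job["stage_out"].append(value)       # stage-out file list 340
    return job

print(parse_job_script(SCRIPT))
```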
  • FIG. 3 is a diagram illustrating an example of the computing host management table 216 according to the first embodiment of this invention.
  • The computing host management table 216 includes a computing host identifier 400, a computing host name 410, an IP address 420, and an execution job identifier 430.
  • The computing host identifier 400 represents an identifier for identifying the computing host 120 included in the computer system.
  • The computing host name 410 represents a name of the computing host 120.
  • The IP address 420 represents an IP address of the computing host 120.
  • The execution job identifier 430 represents an identifier of the job 200 executed by the computing host 120. Referring to FIG. 3, the job 200 having the execution job identifier 430 of "10" is executed by the computing hosts "comp0" and "comp1".
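  • A minimal in-memory rendering of this table might look like the following sketch. The job identifier "10" on comp0 and comp1 follows the FIG. 3 example described above; the IP addresses and the helper function are hypothetical.

```python
# Hypothetical rendering of the computing host management table 216.
HOST_TABLE = [
    {"id": 0, "name": "comp0", "ip": "192.168.0.10", "job": 10},    # executing job 10
    {"id": 1, "name": "comp1", "ip": "192.168.0.11", "job": 10},    # executing job 10
    {"id": 2, "name": "comp2", "ip": "192.168.0.12", "job": None},  # idle
]

def idle_hosts(table):
    """Hosts with no value registered as the execution job identifier 430."""
    return [row for row in table if row["job"] is None]

print([h["name"] for h in idle_hosts(HOST_TABLE)])  # ['comp2']
```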
  • FIG. 4 is a diagram illustrating an example of the file system configuration file 236 according to the first embodiment of this invention.
  • The login host 110 creates, based on the received execution request for the job 200, the file system configuration file 236 which includes definition information of the shared file system.
  • The login host 110 transmits the created file system configuration file 236 to the computing host 120, and gives an instruction to construct a shared file system.
  • The file system configuration file 236 includes a master server computing host name 500, a sub-server computing host name list 510, a client computing host name list 520, and a shared directory name 530.
  • The master server computing host name 500 represents a name of the computing host 120 which serves as the master server in the shared file system.
  • The sub-server computing host name list 510 represents a list of computing hosts 120 which serve as the sub-servers. In the sub-server computing host name list 510, at least one name of the computing host 120 is described.
  • The client computing host name list 520 represents a list of computing hosts 120 which serve as the clients. In the client computing host name list 520, at least one name of the computing host 120 is described.
  • As the shared directory name 530, the name of a directory which stores the shared file is described.
  • Specifically, the shared directory name 320 included in the execution request for the job 200 illustrated in FIG. 2 is set.
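  • Expressed as data, the configuration file therefore carries four pieces of information. A hypothetical rendering (field names and host names are illustrative, not taken from the figure):

```python
# Hypothetical contents of a file system configuration file 236 (first embodiment:
# the job's own computing hosts fill the master, sub-server, and client roles).
fs_config = {
    "master_server": "comp0",           # master server computing host name 500
    "sub_servers": ["comp0", "comp1"],  # sub-server computing host name list 510
    "clients": ["comp0", "comp1"],      # client computing host name list 520
    "shared_dir": "/work/job10",        # shared directory name 530
}
```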
  • FIG. 5 is a flow chart illustrating a procedure in which an execution request for the job 200 is received from the user by the job reception module 211 according to the first embodiment of this invention.
  • First, the CPU 111 of the login host 110 receives the execution request for the job 200 from the user by executing the job reception module 211 (Step 600).
  • The CPU 111 of the login host 110 then temporarily stores the job 200 received in the processing of Step 600 in the job queue 215 (Step 601).
  • FIG. 6 is a flow chart illustrating a procedure in which a job 200 waiting for execution is allocated to the computing host 120 by the job scheduling module 212 according to the first embodiment of this invention.
  • First, the CPU 111 of the login host 110 judges whether or not any job 200 waiting for execution is stored in the job queue 215 (Step 700). In a case where no job 200 waiting for execution is stored in the job queue 215 (the result of Step 700 is "No"), there is no job 200 to be processed, and hence the CPU 111 of the login host 110 waits until a job 200 waiting for execution is stored in the job queue 215.
  • In a case where a job 200 waiting for execution is stored in the job queue 215 (the result of Step 700 is "Yes"), the CPU 111 of the login host 110 judges whether or not it is possible to assure a sufficient number of computing hosts 120 to execute the job 200 (Step 710).
  • Specifically, Step 710 is processing in which it is judged whether or not the number of computing hosts 120 which are not executing a job is equal to or larger than the number of computing hosts 120 specified by the execution request for the job 200.
  • The number of computing hosts 120 which are not executing a job can be obtained by referring to the computing host management table 216, and counting the number of computing hosts 120 for which no value is registered as the execution job identifier 430.
  • The number of computing hosts 120 specified by the execution request for the job 200 corresponds to a value which is set as the number of computing hosts to be used 310 illustrated in FIG. 2.
  • In a case where a sufficient number of computing hosts 120 can be assured, the CPU 111 of the login host 110 obtains, from the computing host management table 216, information of the computing hosts 120 which are to be allocated the job 200, and then updates the computing host management table 216 (Step 720). Specifically, the CPU 111 of the login host 110 obtains a record associated with the computing host 120 which is to be allocated the job 200, and then registers, as the execution job identifier 430 of the record, the identifier of the job 200 to be executed.
  • Next, the CPU 111 of the login host 110 constructs a shared file system by executing the file system construction request module 213 (Step 721).
  • Upon constructing the shared file system, the CPU 111 of the login host 110 obtains the job 200 waiting for execution from the job queue 215, and requests the computing host 120 to execute the obtained job 200 (Step 722).
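  • The allocation path of FIG. 6 condenses into a few lines. This is an interpretive sketch, not the patented implementation: it reuses the hypothetical table layout shown earlier, and the construction and dispatch steps are reduced to placeholder prints.

```python
# Hypothetical sketch of the FIG. 6 allocation path (Steps 700-722).
def try_allocate(job_queue, host_table):
    if not job_queue:                                  # Step 700: no job waiting
        return None
    job = job_queue[0]
    free = [h for h in host_table if h["job"] is None]
    if len(free) < job["num_hosts"]:                   # Step 710: not enough idle hosts
        return None
    allocated = free[:job["num_hosts"]]
    for h in allocated:                                # Step 720: register the job identifier
        h["job"] = job["id"]
    print("Step 721: construct shared FS on", [h["name"] for h in allocated])
    job_queue.pop(0)
    print("Step 722: request execution of job", job["id"])  # dispatch to the hosts
    return allocated

queue = [{"id": 10, "num_hosts": 2}]
hosts = [{"name": "comp0", "job": None}, {"name": "comp1", "job": None}]
try_allocate(queue, hosts)
```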
  • FIG. 7 is a flow chart illustrating a procedure in which allocation of the job 200 to the computing hosts 120 is canceled by the job scheduling module 212 according to the first embodiment of this invention.
  • First, the CPU 111 of the login host 110 judges whether or not there is any job 200 which has been executed (Step 800). Completion of the job 200 may be judged by, for example, receiving a notification of execution completion of the job 200 from the computing host 120, or making an inquiry to the computing host 120 periodically.
  • In a case where there is no job 200 which has been executed (the result of Step 800 is "No"), the CPU 111 of the login host 110 waits until the job 200 has been executed.
  • In a case where there is a job 200 which has been executed (the result of Step 800 is "Yes"), the CPU 111 of the login host 110 obtains, from the computing host management table 216, information of the computing hosts 120 which have been allocated the executed job 200 (Step 810).
  • After the processing of Step 810, the CPU 111 of the login host 110 executes the file system destruction request module 214 to thereby destruct the shared file system (Step 811).
  • Lastly, the CPU 111 of the login host 110 updates the computing host management table 216 (Step 812). Specifically, the CPU 111 of the login host 110 clears, in the computing host management table 216, the execution job identifier 430 of a record associated with the computing host 120 which has been allocated the executed job 200.
  • FIG. 8 is a flow chart illustrating a procedure in which a request to construct a shared file system is made by the file system construction request module 213 according to the first embodiment of this invention.
  • First, the CPU 111 of the login host 110 receives information of the computing hosts 120 which are to be allocated the job 200 from the job scheduling module 212 (Step 900).
  • Next, the CPU 111 of the login host 110 creates the file system configuration file 236 (Step 901).
  • The content of the file system configuration file 236 is as described with reference to FIG. 4 as an example.
  • The processing will now be described in detail.
  • The CPU 111 of the login host 110 registers a computing host 120 which is to serve as the master server (Step 901). Specifically, the CPU 111 of the login host 110 selects one computing host 120 from among the computing hosts 120 which are allocated the job 200, and then registers the name of the selected computing host 120 as the master server computing host name 500 of the file system configuration file 236. In selecting a computing host 120 which is to serve as the master server, for example, a computing host 120 having the smallest identifier may be selected from among the computing hosts 120 which are allocated the job 200.
  • Next, the CPU 111 of the login host 110 registers computing hosts 120 which are to serve as the sub-servers (Step 902).
  • Specifically, the CPU 111 of the login host 110 registers, in the sub-server computing host name list 510 of the file system configuration file 236, all the computing hosts 120 which are allocated the job 200.
  • Next, the CPU 111 of the login host 110 registers computing hosts 120 which are to serve as the clients (Step 903). Specifically, the CPU 111 of the login host 110 registers all the computing hosts 120 which are allocated the job 200 in the client computing host name list 520 of the file system configuration file 236. According to the first embodiment of this invention, all the computing hosts 120 which are allocated the job 200 execute the job, and hence, in order to minimize an overhead at the time of executing such processing that requires file access, all the computing hosts 120 execute the client function.
  • Then, the CPU 111 of the login host 110 registers a directory in which a shared file is to be stored (Step 904). Specifically, the CPU 111 of the login host 110 registers the directory for the shared file (shared directory name 320 of FIG. 2), which is defined in the job 200, in the shared directory name 530.
  • The CPU 111 of the login host 110 then transfers the created file system configuration file 236 to the computing host 120 which is registered as the master server of the shared file system in the processing of Step 901 (Step 905).
  • At this time, an instruction to execute the file system construction module 231 may be given to the computing host 120 which is a transfer destination of the file system configuration file 236.
  • Through the processing up to Step 905, construction of the shared file system has been completed among the computing hosts 120 which process the job 200.
  • Lastly, the CPU 111 of the login host 110 transfers a file to be used for the job 200 to one of the computing hosts 120 which are registered as the clients of the shared file system in the processing of Step 903 (Step 906).
  • The file to be transferred is a file specified in the stage-in file list 330 of FIG. 2.
  • As the transfer destination, for example, a computing host 120 having the smallest identifier may be selected from among the computing hosts 120 which serve as the clients.
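  • Steps 901 to 904 amount to deriving the configuration file from the allocation. A hypothetical sketch, following the smallest-identifier rule and the all-hosts-as-sub-servers-and-clients rule described above (the record layout and function name are invented):

```python
# Hypothetical sketch of Steps 901-904: derive the file system configuration
# file 236 from the hosts allocated to the job (first embodiment).
def build_fs_config(allocated_hosts, shared_dir):
    names = [h["name"] for h in sorted(allocated_hosts, key=lambda h: h["id"])]
    return {
        "master_server": names[0],  # Step 901: e.g. the smallest identifier
        "sub_servers": names,       # Step 902: all allocated hosts
        "clients": names,           # Step 903: all allocated hosts run the client
        "shared_dir": shared_dir,   # Step 904: from the job's shared directory name 320
    }

cfg = build_fs_config(
    [{"id": 1, "name": "comp1"}, {"id": 0, "name": "comp0"}], "/work/job10")
print(cfg["master_server"])  # comp0
```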
  • FIG. 9 is a flow chart illustrating a procedure in which a request to destruct the shared file system is made by the file system destruction request module 214 according to the first embodiment of this invention.
  • First, the CPU 111 of the login host 110 receives, from the job scheduling module 212, information of the computing hosts 120 which are allocated the job 200 (Step 1000).
  • Next, the CPU 111 of the login host 110 obtains the file which has been used for the job 200 from the computing host 120 serving as the client of the shared file system (Step 1001).
  • The file which has been used for the job 200 corresponds to a file specified in the stage-out file list 340 of FIG. 2.
  • As the computing host 120 from which the file is obtained, for example, a computing host 120 having the smallest identifier may be selected.
  • Lastly, the CPU 111 of the login host 110 notifies the computing host 120 registered as the master server of the shared file system that the execution of the job 200 has been ended (Step 1002).
  • FIG. 10 is a flow chart illustrating a procedure in which a shared file system is constructed by the file system construction module 231 according to the first embodiment of this invention.
  • First, the CPU 121 of the computing host 120 receives the file system configuration file 236 transmitted by the job scheduler 210 of the login host 110 (Step 1100).
  • Next, the CPU 121 of the computing host 120 starts up the master server module 233 so as to operate as the master server of the shared file system (Step 1101). It should be noted that detailed description of the processing performed by the master server module 233 will be made later with reference to FIG. 12.
  • Then, the CPU 121 of the computing host 120 gives an instruction to start up the sub-server module 234 to the computing hosts 120 which are registered in the sub-server computing host name list 510 of the file system configuration file 236 received in the processing of Step 1100 (Step 1102). It should be noted that detailed description of the processing performed by the sub-server module 234 will be made later with reference to FIG. 13.
  • Lastly, the CPU 121 of the computing host 120 gives an instruction to start up the client module 235 to the computing hosts 120 which are registered in the client computing host name list 520 of the file system configuration file 236 received in the processing of Step 1100 (Step 1103). It should be noted that detailed description of the processing performed by the client module 235 will be made later with reference to FIG. 14.
  • After that, the job 200 is executed in each of the computing hosts 120 which are allocated the job 200 (Step 722 of FIG. 6).
  • FIG. 11 is a flow chart illustrating a procedure in which a shared file system is destructed by the file system destruction module 232 according to the first embodiment of this invention.
  • First, the CPU 121 of the computing host 120 receives a notification of end of the job from the job scheduler 210 of the login host 110 (Step 1200).
  • Next, the CPU 121 of the computing host 120 refers to the file system configuration file 236, and then instructs all the computing hosts 120 registered as the sub-servers to delete a file which is duplicated at the time of execution of the job 200, and a file which is generated during the execution of the job 200 (Step 1201).
  • Then, the CPU 121 of the computing host 120 refers to the file system configuration file 236, and then instructs all the computing hosts 120 registered as the clients of the shared file system to suspend the client module 235 (Step 1202).
  • Further, the CPU 121 of the computing host 120 refers to the file system configuration file 236, and then instructs all the computing hosts 120 registered as the sub-servers of the shared file system to suspend the sub-server module 234 (Step 1203).
  • Lastly, the CPU 121 of the computing host 120 suspends the master server module 233 (Step 1204).
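  • The teardown proceeds in a fixed order: staged files first, then clients, then sub-servers, and the master last, presumably so that no component is stopped while another still depends on it. A hypothetical sketch of that ordering:

```python
# Hypothetical sketch of the FIG. 11 teardown order (Steps 1201-1204).
def destruct_shared_fs(cfg):
    for host in cfg["sub_servers"]:
        print(f"{host}: delete staged and generated files")  # Step 1201
    for host in cfg["clients"]:
        print(f"{host}: suspend client module 235")          # Step 1202
    for host in cfg["sub_servers"]:
        print(f"{host}: suspend sub-server module 234")      # Step 1203
    print(f"{cfg['master_server']}: suspend master server module 233")  # Step 1204

destruct_shared_fs({"master_server": "comp0",
                    "sub_servers": ["comp0", "comp1"],
                    "clients": ["comp0", "comp1"]})
```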
  • FIG. 12 is a flow chart illustrating a procedure of the processing performed by the master server of the shared file system according to the first embodiment of this invention.
  • This processing is executed continuously by the master server module 233 until the shared file system is destructed.
  • The function carried out as the master server does not include processing of directly accessing a file stored in the storage device 123, which therefore makes its load smaller compared to execution of a job or file access processing.
  • Therefore, the master server is allocated to any one of the sub-servers in an overlapping manner.
  • First, the CPU 121 of the computing host 120 receives an access request for a file, which is transmitted by the computing host 120 serving as the client of the shared file system (Step 1300).
  • Then, the CPU 121 of the computing host 120 notifies the computing host 120, which has transmitted the access request in the processing of Step 1300, of information of the sub-server computing host 120 which stores the requested file (Step 1301).
  • In the master server, a correlation between the file stored in the shared directory and the computing host 120 which actually stores the file may be kept.
  • By referring to this correlation, the computing host 120 serving as the sub-server may be identified.
  • FIG. 13 is a flow chart illustrating a procedure of the processing performed by the sub-server of the shared file system according to the first embodiment of this invention.
  • This processing is executed continuously by the sub-server module 234 until the shared file system is destructed.
  • First, the CPU 121 of the computing host 120 receives an access request for a file, which is transmitted by the client module 235 of the computing host 120 serving as the client of the shared file system (Step 1400).
  • Next, the CPU 121 of the computing host 120 accesses the file stored in the storage device 123 (Step 1401). Further, the CPU 121 of the computing host 120 transfers an access result of the file to the computing host 120 which is a client of a request source (Step 1402).
  • FIG. 14 is a flow chart illustrating a procedure of the processing performed by the client of the shared file system according to the first embodiment of this invention.
  • This processing is executed continuously by the client module 235 until the shared file system is destructed.
  • First, the CPU 121 of the computing host 120 receives a file access requested through execution of the program 220 included in the job 200 (Step 1500).
  • Next, the CPU 121 of the computing host 120 issues a file access request to the computing host 120 serving as the master server of the shared file system (Step 1501).
  • Then, the CPU 121 of the computing host 120 receives, from the computing host 120 serving as the master server, the information of the sub-server computing host 120 having the storage device 123 which actually stores the file (Step 1502).
  • Based on the information of the computing host 120 serving as the sub-server, which has been received in the processing of Step 1502, the CPU 121 of the computing host 120 issues an access request for the file (Step 1503).
  • Next, the CPU 121 of the computing host 120 receives, from the computing host 120 serving as the sub-server, an access result of the file access request transmitted in the processing of Step 1503 (Step 1504).
  • Lastly, the CPU 121 of the computing host 120 returns the access result of the file, which has been received in the processing of Step 1504, to the program 220 (Step 1505).
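  • Putting FIGS. 12 to 14 together, a file access is a two-hop lookup: the client asks the master server which sub-server holds the file, then accesses that sub-server directly. The following hypothetical sketch models the three roles as in-process objects rather than networked hosts, so the class and method names are illustrative only.

```python
# Hypothetical sketch of the FIG. 12-14 access protocol.
class SubServer:
    def __init__(self, files):
        self.files = files             # path -> contents in the storage device 123
    def access(self, path):           # Steps 1400-1402: serve the file
        return self.files[path]

class MasterServer:
    def __init__(self, locations):
        self.locations = locations    # path -> sub-server holding the file
    def locate(self, path):           # Steps 1300-1301: answer location inquiries
        return self.locations[path]

class Client:
    def __init__(self, master):
        self.master = master
    def read(self, path):             # Steps 1500-1505: two-hop access
        sub = self.master.locate(path)   # ask the master where the file is
        return sub.access(path)          # then access the sub-server directly

sub0 = SubServer({"/work/job10/input.dat": b"data"})
client = Client(MasterServer({"/work/job10/input.dat": sub0}))
print(client.read("/work/job10/input.dat"))  # b'data'
```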
  • As described above, according to the first embodiment of this invention, a shared file system can be constructed dynamically at the time of execution of the job 200.
  • Specifically, first, the computing hosts 120 which are to process the job 200 are allocated for the job 200 (FIG. 6).
  • Next, a shared file system configured by the computing hosts 120 which are to process the job 200 is constructed (FIGS. 8 and 10).
  • Then, the job 200 is processed while accessing the shared file system thus constructed (FIGS. 12 to 14).
  • Lastly, upon completion of the job 200, the constructed shared file system is destructed (FIGS. 9 and 11).
  • According to the first embodiment of this invention, it is possible to dynamically construct a shared file system which is configured by the computing hosts 120 which are allocated the job 200. Therefore, a processing delay caused by file access from another job or the like can be prevented.
  • In addition, the shared file system is constructed by duplicating only a file necessary for the execution of the job 200, and hence the shared file system can be constructed quickly.
  • The computer system according to the first embodiment of this invention is configured by the login host 110 and the computing hosts 120, but a computer system according to a second embodiment of this invention further includes an I/O host, which is a computer having a higher file access capability than the computing host 120.
  • FIG. 15 is a diagram illustrating an example of a configuration of the computer system according to the second embodiment of this invention.
  • The computer system according to the second embodiment of this invention further includes a plurality of I/O hosts 1600 in addition to the login host 110 and a plurality of the computing hosts 120.
  • The login host 110, the computing hosts 120, and the I/O hosts 1600 are coupled via the network 100. Further, the plurality of I/O hosts 1600 are coupled to one another via the network 100 as well.
  • The hardware configurations of the login host 110 and the computing host 120 are the same as those of the first embodiment of this invention.
  • The I/O hosts 1600 are dynamically allocated for a job at the time of execution of the job.
  • The number of the I/O hosts 1600 to be allocated for the job does not necessarily have to be equal to the number of the computing hosts 120.
  • The I/O host 1600 includes a CPU 1601, a memory 1602, a storage device 1603, and an NIC 1604.
  • The CPU 1601 performs the file system processing by executing a program stored in the memory 1602.
  • The memory 1602 stores a program to be executed by the CPU 1601, and data necessary for executing the program.
  • Specifically, the memory 1602 stores a control program necessary for file access.
  • The storage device 1603 stores programs and files. For example, in a case where a shared file system is constructed, a file which is accessed by the job 200 is stored.
  • The NIC 1604 is an interface for establishing connection with the login host 110 and the computing host 120 via the network 100.
  • The login host 110 stores the job 200 and the job scheduler 210 in the memory 112.
  • The job 200 is the same as that of the first embodiment of this invention.
  • The job scheduler 210 includes an I/O host management table 1700 in addition to the configuration of the job scheduler 210 of the first embodiment of this invention.
  • The I/O host management table 1700 keeps identification information for the I/O hosts 1600, and allocation states of the job 200 among the I/O hosts 1600. It should be noted that detailed description of the I/O host management table 1700 will be made later with reference to FIG. 16.
  • The computing host 120 stores the file system program 230 and the program 220 in the memory 122.
  • The file system program 230 and the program 220 may be configured in the same manner as those of the first embodiment of this invention.
  • The I/O host 1600 stores the file system program 230 in the memory 1602.
  • The configuration of the file system program 230 is the same as that of the file system program 230 provided to the computing host 120 according to the first embodiment of this invention. It should be noted that a configuration which excludes the client module 235 may be employed for the I/O host 1600.
  • FIG. 16 is a diagram illustrating an example of the I/O host management table 1700 according to the second embodiment of this invention.
  • The I/O host management table 1700 includes an I/O host identifier 1800, an I/O host name 1810, an IP address 1820, and an allocated job identifier 1830.
  • The I/O host identifier 1800 represents an identifier of the I/O host 1600 included in the computer system.
  • The I/O host name 1810 represents a name of the I/O host 1600.
  • The IP address 1820 represents an IP address of the I/O host 1600.
  • The allocated job identifier 1830 represents an identifier of the job 200 which is allocated to the I/O host 1600.
  • FIG. 17 is a diagram illustrating an example of the file system configuration file 236 according to the second embodiment of this invention.
  • The login host 110 creates, based on the received execution request for the job 200, the file system configuration file 236 which includes definition information of the shared file system.
  • The login host 110 transmits the created file system configuration file 236 to the I/O host 1600, and gives an instruction to construct a shared file system.
  • The configuration of the file system configuration file 236 is the same as that of the first embodiment of this invention in terms of form.
  • However, the functions of the master server and the sub-server are provided by the I/O host 1600, and hence, instead of the master server computing host name 500 and the sub-server computing host name list 510, the file system configuration file 236 includes a master server I/O host name 1900 and a sub-server I/O host name list 1910.
  • In the example of FIG. 17, an I/O host 1600 having the I/O host identifier "io0" functions as the master server, and I/O hosts 1600 having the I/O host identifiers "io0" and "io1" function as the sub-servers. Computing hosts 120 having the computing host identifiers "comp0" and "comp1" function as the clients, and process a requested job.
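  • In the same hypothetical notation used for the first embodiment, the FIG. 17 example therefore reads as follows (the shared directory path is illustrative; the host identifiers follow the example above):

```python
# Hypothetical rendering of the FIG. 17 configuration file 236 (second embodiment:
# I/O hosts serve as master and sub-servers, computing hosts serve as clients).
fs_config = {
    "master_server": "io0",          # master server I/O host name 1900
    "sub_servers": ["io0", "io1"],   # sub-server I/O host name list 1910
    "clients": ["comp0", "comp1"],   # client computing host name list 520
    "shared_dir": "/work/job10",     # shared directory name 530
}
```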
  • FIG. 18 is a flow chart illustrating a procedure in which a job 200 waiting for execution is allocated to the computing host 120 and the I/O host 1600 by the job scheduling module 212 according to the second embodiment of this invention.
  • The processing performed by the job scheduling module 212 according to the second embodiment of this invention includes a procedure associated with the I/O host 1600 in addition to the procedure of the first embodiment of this invention illustrated in FIG. 6.
  • First, similarly to the first embodiment of this invention, the CPU 111 of the login host 110 judges whether or not any job waiting for execution is stored in the job queue 215 (Step 700). Then, the CPU 111 of the login host 110 further judges whether or not it is possible to assure a sufficient number of computing hosts 120 to execute the job 200 (Step 710).
  • Next, the CPU 111 of the login host 110 further judges whether or not it is possible to assure a sufficient number of I/O hosts 1600 to execute the job 200 (Step 2000).
  • Specifically, the CPU 111 of the login host 110 refers to the I/O host management table 1700 to obtain the number of I/O hosts 1600 which have no value registered as the allocated job identifier 1830, and then judges whether or not the obtained number is equal to or larger than the number of I/O hosts 1600 necessary for executing the job 200.
  • The I/O host 1600 is a computer having a higher file access capability, and hence the necessary number of I/O hosts 1600 may be set smaller than the number of computing hosts 120 necessary for executing the job.
  • For example, the number of I/O hosts 1600 has only to satisfy a predetermined ratio with respect to the number of computing hosts 120 necessary for executing the job.
  • The predetermined ratio may be set as a ratio between all the I/O hosts 1600 and all the computing hosts 120 which constitute the computer system. Alternatively, the predetermined ratio may be set based on the capability of the I/O host 1600 or the like.
  • In a case where a sufficient number of I/O hosts 1600 cannot be assured (the result of Step 2000 is "No"), the CPU 111 of the login host 110 waits until a job which is being executed is completed.
  • In a case where a sufficient number of I/O hosts 1600 can be assured (the result of Step 2000 is "Yes"), the CPU 111 of the login host 110 updates the computing host management table 216 (Step 720), and then executes the file system construction request module 213 (Step 721).
  • Further, the CPU 111 of the login host 110 obtains, from the I/O host management table 1700, information for the sufficient number of I/O hosts 1600 to execute the job 200, and then registers the identifier of the job 200 as the allocated job identifier 1830 of each of the I/O hosts 1600 (Step 2010).
  • Next, the CPU 111 of the login host 110 transmits the information of the I/O hosts 1600, which has been obtained in the processing of Step 2010, to the file system construction request module 213 (Step 2011).
  • Lastly, the CPU 111 of the login host 110 obtains the job 200 waiting for execution from the job queue 215, and requests the computing host 120 to execute the obtained job 200 (Step 722).
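  • The Step 2000 check thus needs a rule for how many I/O hosts a job requires. One possible reading of the ratio criterion above, with an entirely illustrative 1:4 ratio, is sketched here; the patent leaves the concrete ratio and rounding open.

```python
import math

# Hypothetical reading of the Step 2000 criterion: the number of I/O hosts 1600
# allocated to a job is a fixed fraction of its computing hosts 120.
def required_io_hosts(num_computing_hosts, io_to_compute_ratio=0.25):
    return max(1, math.ceil(num_computing_hosts * io_to_compute_ratio))

print(required_io_hosts(8))  # 2 I/O hosts for 8 computing hosts at a 1:4 ratio
```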
  • FIG. 19 is a flow chart illustrating a procedure in which allocation of the job 200 to the computing hosts 120 and the I/O hosts 1600 is canceled by the job scheduling module 212 according to the second embodiment of this invention.
  • Upon completion of the execution of the job (the result of Step 800 is "Yes"), similarly to the case of the first embodiment of this invention, the CPU 111 of the login host 110 cancels the allocation of the executed job to the computing hosts 120 (Steps 810 to 812).
  • The CPU 111 of the login host 110 further refers to the I/O host management table 1700, and then obtains the information of the I/O hosts 1600 which have the identifier of the executed job 200 registered as the allocated job identifier 1830 (Step 2100).
  • Next, the CPU 111 of the login host 110 transmits the information of the I/O hosts 1600, which has been obtained in the processing of Step 2100, to the file system destruction request module 214 (Step 2101). Lastly, the CPU 111 of the login host 110 clears the allocated job identifier 1830 of each of the obtained I/O hosts 1600 (Step 2102).
  • FIG. 20 is a flow chart illustrating a procedure in which a request to construct a shared file system is made by the file system construction request module 213 according to the second embodiment of this invention.
  • The processing performed by the file system construction request module 213 according to the second embodiment of this invention includes processing associated with the I/O host 1600 in addition to the procedure of the first embodiment of this invention illustrated in FIG. 8.
  • After reception of the information of the computing hosts 120 from the job scheduling module 212 (Step 900), the CPU 111 of the login host 110 receives information of the I/O hosts 1600 which are allocated for the job 200 (Step 2200).
  • Based on the received information of the I/O hosts 1600, the CPU 111 of the login host 110 creates the file system configuration file 236, and then registers an I/O host 1600 which is to serve as the master server (Step 2201). For example, an I/O host 1600 having the smallest identifier may be selected and registered from among the allocated I/O hosts 1600.
  • Next, the CPU 111 of the login host 110 registers I/O hosts 1600 which are to serve as the sub-servers (Step 2202). Specifically, the CPU 111 of the login host 110 registers all the I/O hosts 1600 which are allocated for the job 200 in the sub-server I/O host name list 1910 of the file system configuration file 236 illustrated in FIG. 17.
  • Then, similarly to the first embodiment of this invention, the CPU 111 of the login host 110 registers computing hosts 120 which are to serve as the clients (Step 903). Further, the CPU 111 of the login host 110 registers a directory which is to store a shared file (Step 904).
  • Next, the CPU 111 of the login host 110 transfers the created file system configuration file 236 to the I/O host 1600 which is registered as the master server of the shared file system in the processing of Step 2201 (Step 2210).
  • Lastly, the CPU 111 of the login host 110 transfers a file to be used for the job 200 to one of the computing hosts 120 which are registered as the clients of the shared file system in the processing of Step 903 (Step 906).
  • FIG. 21 is a flow chart illustrating a procedure in which a request to destruct the shared file system is made by the file system destruction request module 214 according to the second embodiment of this invention.
  • The processing performed by the file system destruction request module 214 according to the second embodiment of this invention includes processing associated with the I/O host 1600 in addition to the procedure of the first embodiment of this invention illustrated in FIG. 9.
  • First, the CPU 111 of the login host 110 receives, from the job scheduling module 212, the information of the computing hosts 120 which are allocated the job 200 (Step 1000), and obtains the file used for the job 200 from the computing host 120 serving as the client (Step 1001).
  • The CPU 111 of the login host 110 further receives, from the job scheduling module 212, the information of the I/O hosts 1600 which are allocated the job 200 (Step 2300), and notifies the I/O host 1600 which is registered as the master server of the shared file system that the execution of the job 200 has been ended (Step 2301).
  • FIG. 22 is a flow chart illustrating a procedure in which a shared file system is constructed by the file system construction module 231 according to the second embodiment of this invention.
  • The processing performed by the file system construction module 231 according to the second embodiment of this invention is obtained by partly changing the procedure of the first embodiment of this invention illustrated in FIG. 10, and includes a procedure associated with the I/O host 1600. Besides, in the first embodiment of this invention, the processing is executed by the computing host 120, but, in the second embodiment of this invention, the processing is executed by the I/O host 1600.
  • First, the CPU 1601 of the I/O host 1600 receives the file system configuration file 236 transmitted by the job scheduler 210 of the login host 110 (Step 1100), and then starts up the master server module 233 (Step 1101). It should be noted that detailed description of the master server module 233 will be made later with reference to FIG. 24.
  • Next, the CPU 1601 of the I/O host 1600 gives an instruction to start up the sub-server module 234 to all the I/O hosts 1600 which are registered in the sub-server I/O host name list 1910 of the file system configuration file 236 received in the processing of Step 1100 (Step 2400). It should be noted that the processing performed by the sub-server module 234 is the same as the processing of the first embodiment of this invention illustrated in FIG. 13 except that the operation subject has been changed from the computing host 120 to the I/O host 1600.
  • Lastly, the CPU 1601 of the I/O host 1600 gives an instruction to start up the client module 235 to the computing hosts 120 which are registered in the client computing host name list 520 of the file system configuration file 236 received in the processing of Step 1100 (Step 1103). It should be noted that detailed description of the client module 235 will be made later with reference to FIG. 25.
  • FIG. 23 is a flow chart illustrating a procedure in which a shared file system is destructed by the file system destruction module 232 according to the second embodiment of this invention.
  • The processing performed by the file system destruction module 232 according to the second embodiment of this invention is obtained by partly changing the procedure of the first embodiment of this invention illustrated in FIG. 11, and includes a procedure associated with the I/O host 1600.
  • First, the CPU 1601 of the I/O host 1600 receives a notification of end of the job from the job scheduler 210 of the login host 110 (Step 1200).
  • Next, the CPU 1601 of the I/O host 1600 refers to the file system configuration file 236, and then instructs all the I/O hosts 1600 registered as the sub-servers to delete a file which is duplicated at the time of execution of the job 200, and a file which is generated during the execution of the job 200 (Step 2500).
  • Then, the CPU 1601 of the I/O host 1600 refers to the file system configuration file 236, and then instructs all the computing hosts 120 registered as the clients to suspend the client module 235 (Step 1202).
  • Lastly, the CPU 1601 of the I/O host 1600 refers to the file system configuration file 236, and then instructs all the I/O hosts 1600 registered as the sub-servers to suspend the sub-server module 234 (Step 2510).
  • FIG. 24 is a flow chart illustrating a procedure of the processing performed by the master server of the shared file system according to the second embodiment of this invention.
  • The processing performed by the master server module 233 according to the second embodiment of this invention is obtained by partly changing the procedure of the first embodiment of this invention illustrated in FIG. 12, and includes a procedure associated with the I/O host 1600.
  • First, the CPU 1601 of the I/O host 1600 receives a file access request transmitted from the computing host 120 serving as the client of the shared file system (Step 1300).
  • Then, the CPU 1601 of the I/O host 1600 notifies the computing host 120, which has transmitted the access request in the processing of Step 1300, of the information of the sub-server I/O host 1600 which stores the requested file (Step 2600).
  • FIG. 25 is a flow chart illustrating a procedure of the processing performed by the client of the shared file system according to the second embodiment of this invention.
  • The processing performed by the client module 235 according to the second embodiment of this invention is obtained by partly changing the procedure of the first embodiment of this invention illustrated in FIG. 14, and includes a procedure associated with the I/O host 1600.
  • The CPU 121 of the computing host 120 receives a file access requested through execution of the program 220 included in the job 200 (Step 1500).
  • The CPU 121 of the computing host 120 issues a file access request to the I/O host 1600 serving as the master server of the shared file system (Step 2700).
  • The CPU 121 of the computing host 120 receives, from the I/O host 1600 serving as the master server, the information of the sub-server I/O host 1600 which stores the file to be accessed (Step 2701).
  • Based on the information of the sub-server I/O host 1600, which has been received in the processing of Step 2701, the CPU 121 of the computing host 120 issues a file access request to that I/O host 1600 (Step 2702).
  • The CPU 121 of the computing host 120 receives, from the I/O host 1600 serving as the sub-server, an access result of the file access request transmitted in the processing of Step 2702 (Step 2703).
  • Lastly, the CPU 121 of the computing host 120 returns the access result of the file, which has been received in the processing of Step 2703, to the program 220 (Step 1505).
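  • A compact sketch of this two-hop access path follows; rpc() is a hypothetical stand-in for the transport between hosts, not a mechanism specified by this invention.

      def client_access_via_io_hosts(request, master_io_host, rpc):
          # Steps 2700-2701: the master I/O host returns the sub-server I/O host
          # which stores the requested file.
          sub_io_host = rpc(master_io_host, {"op": "locate", "file": request["file"]})
          # Steps 2702-2703: the file access itself is served by that I/O host.
          result = rpc(sub_io_host, request)
          return result  # Step 1505: returned to the program 220
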
  • According to the second embodiment of this invention, similarly to the case of the first embodiment of this invention, it is possible to dynamically construct a shared file system which is configured by the computing hosts 120 which are allocated the job 200. Therefore, a processing delay caused by file access from another job or the like can be prevented.
  • In addition, the shared file system is constructed by duplicating only the files necessary for the execution of the job 200, and hence the shared file system can be constructed quickly.
  • Further, the construction of the shared file system uses the I/O hosts 1600, which have higher file access capabilities than the computing hosts 120.
  • Therefore, file access, which is likely to become a bottleneck at the time of executing a job, can be processed at higher speed.
  • This invention is applicable to file access management for a computer system, and more particularly, is applicable to shared file access management for a large-scale computer system.

Abstract

Provided is a computer system including a plurality of computing hosts, which dynamically constructs a shared file system so that job execution efficiency is improved. In this computer system, in a case where a requested job is executed by the plurality of computing hosts, each computing host which executes the job is configured to: share a file necessary for executing the job; access the shared file to execute the requested job; and cancel the sharing of the file after execution of the requested job is completed.

Description

    CLAIM OF PRIORITY
  • The present application claims priority from Japanese patent application JP 2008-333135 filed on Dec. 26, 2008, the content of which is hereby incorporated by reference into this application.
  • BACKGROUND
  • This invention relates to a technology of executing, while accessing a file, a job in a computer system including a plurality of computing hosts.
  • There has been known a technology of building a computer system which includes hundreds to thousands of computing hosts coupled to a high-speed network. Each of the computing hosts is provided with a processor, and one or a plurality of computing hosts execute various kinds of processing. Such a computer system has been implemented as a cluster computer system, a massively parallel computer system, or a supercomputer. A large-scale computer system of this kind processes a huge amount of data, and therefore requires high-speed file access.
  • In order to realize high-speed file access in a large-scale computer system, there is disclosed a technology called “file staging” in A. Uno, “Software of the Earth Simulator”, Journal of the Earth Simulator, Volume 3, September 2005, 52-59. The file staging is a technology in which a file to be accessed for a job is transferred between a computing host and a login host before and after execution of the job. With the file staging, a job which is in execution in each computing host can be executed by just accessing a file stored in a local storage device provided for the computing host itself. Therefore, it is possible to realize high-speed file access.
  • Further, as a technology of sharing the same file among a plurality of computing hosts in a large-scale computer system, there is disclosed a technology called “shared file system” in R. Sandberg, “The Sun Network Filesystem: Design, Implementation and Experience”, in Proceedings of the Summer 1986 USENIX Technical Conference and Exhibition. The shared file system is a technology in which a file to be accessed for a job is shared between a computing host and a login host. With the shared file system, there is no need to transfer the file to be accessed for the job between the computing host and the login host.
  • SUMMARY
  • In a computer system to which the file staging disclosed in the above-described "Software of the Earth Simulator" is applied, each computing host is provided with a local file system. Accordingly, a file to be accessed for a job needs to be transferred to each computing host.
  • Further, in a large-scale computer system, in order to improve the execution efficiency of the computing hosts, in a case where another job is submitted during the execution of a job, that job is allocated to other computing hosts which are not involved in the execution of the first job. In such a case, in a computer system in which the shared file system disclosed in R. Sandberg, "The Sun Network Filesystem: Design, Implementation and Experience", in Proceedings of the Summer 1986 USENIX Technical Conference and Exhibition is built, file accesses from the jobs in execution on the computing hosts concentrate on the host serving as the server which realizes file sharing. As a result, there is a risk that file access for a certain job interferes with the execution of another job.
  • In view of the above-mentioned problems, this invention has been made, and it is therefore an object of this invention to realize, for a computer system including a plurality of computing hosts, high-speed file sharing by sharing, at the time of execution of a job, a necessary file among computing hosts allocated to process the job.
  • The representative aspects of this invention are as follows. That is, there is provided a file sharing method used for a computer system which includes a plurality of computers, and which executes a job requested, the plurality of computers including a plurality of computing hosts which execute the job, each of the plurality of computing hosts comprising: a first interface coupled to another one of the plurality of computing hosts; a first processor coupled to the first interface; and a first memory coupled to the first processor, the file sharing method including the steps of, in a case where the job is executed by the plurality of computing hosts: sharing, by the first processor of the each of the plurality of computing hosts which execute the job, a file necessary for executing the job; executing, by the first processor of the each of the plurality of computing hosts which execute the job, the requested job through accessing the file; and canceling, by the first processor of the each of the plurality of computing hosts which execute the requested job, the sharing of the file after the executing of the requested job is completed.
  • According to the one aspect of this invention, by duplicating a necessary file prior to execution of a job, and sharing the file among the computing hosts, it is possible to access the file necessary for executing the job without being affected by another job.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The present invention can be appreciated by the description which follows in conjunction with the following figures, wherein:
  • FIG. 1 is a diagram illustrating an example of a configuration of a computer system according to a first embodiment of this invention;
  • FIG. 2 is a diagram illustrating an example of a job according to the first embodiment of this invention;
  • FIG. 3 is a diagram illustrating an example of a computing host management table according to the first embodiment of this invention;
  • FIG. 4 is a diagram illustrating an example of a file system configuration file according to the first embodiment of this invention;
  • FIG. 5 is a flow chart illustrating a procedure in which an execution request for a job is received from a user by a job reception module according to the first embodiment of this invention;
  • FIG. 6 is a flow chart illustrating a procedure in which a job waiting for execution is allocated to the computing host by a job scheduling module according to the first embodiment of this invention;
  • FIG. 7 is a flow chart illustrating a procedure in which allocation of the job to the computing hosts is canceled by the job scheduling module according to the first embodiment of this invention;
  • FIG. 8 is a flow chart illustrating a procedure in which a request to construct a shared file system is made by a file system construction request module according to the first embodiment of this invention;
  • FIG. 9 is a flow chart illustrating a procedure in which a request to destruct a shared file system is made by a file system destruction request module according to the first embodiment of this invention;
  • FIG. 10 is a flow chart illustrating a procedure in which a shared file system is constructed by a file system construction module according to the first embodiment of this invention;
  • FIG. 11 is a flow chart illustrating a procedure in which a shared file system is destructed by a file system destruction module according to the first embodiment of this invention;
  • FIG. 12 is a flow chart illustrating a procedure of a processing performed by a master server of the shared file system according to the first embodiment of this invention;
  • FIG. 13 is a flow chart illustrating a procedure of a processing performed by a sub-server of the shared file system according to the first embodiment of this invention;
  • FIG. 14 is a flow chart illustrating a procedure of a processing performed by a client of the shared file system according to the first embodiment of this invention;
  • FIG. 15 is a diagram illustrating an example of a configuration of a computer system according to a second embodiment of this invention;
  • FIG. 16 is a diagram illustrating an example of an I/O host management table according to the second embodiment of this invention;
  • FIG. 17 is a diagram illustrating an example of a file system configuration file according to the second embodiment of this invention;
  • FIG. 18 is a flow chart illustrating a procedure in which a job waiting for execution is allocated to a computing host and the I/O host by a job scheduling module according to the second embodiment of this invention;
  • FIG. 19 is a flow chart illustrating a procedure in which allocation of a job to the computing hosts and the I/O hosts is canceled by a job scheduling module according to the second embodiment of this invention;
  • FIG. 20 is a flow chart illustrating a procedure in which a request to construct a shared file system is made by a file system construction request module according to the second embodiment of this invention;
  • FIG. 21 is a flow chart illustrating a procedure in which a request to destruct the shared file system is made by a file system destruction request module according to the second embodiment of this invention;
  • FIG. 22 is a flow chart illustrating a procedure in which a shared file system is constructed by a file system construction module according to the second embodiment of this invention;
  • FIG. 23 is a flow chart illustrating a procedure in which a shared file system is destructed by a file system destruction module according to the second embodiment of this invention;
  • FIG. 24 is a flow chart illustrating a procedure of a processing performed by a master server of the shared file system according to the second embodiment of this invention; and
  • FIG. 25 is a flow chart illustrating a procedure of the processing performed by a client of the shared file system according to the second embodiment of this invention.
  • DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
  • Hereinbelow, referring to the accompanying drawings, embodiments of this invention will be described. It should be noted that, in all the drawings attached for describing the embodiments of this invention, the same components are, in principle, denoted by the same reference numerals, and repetitive description thereof will be omitted.
  • First Embodiment
  • FIG. 1 is a diagram illustrating an example of a configuration of a computer system according to a first embodiment of this invention.
  • The computer system according to the first embodiment of this invention includes a login host 110 and a plurality of computing hosts 120. The login host 110 is coupled to the computing hosts 120 via a network 100. The computing hosts 120 are coupled to one another via the network 100 as well.
  • The login host 110 receives a request for execution of a job from a user. The login host 110 also selects the computing hosts 120 which are to execute the requested job, and performs processing (job scheduling) such as causing the selected computing hosts 120 to execute the job.
  • The login host 110 includes a CPU 111, a memory 112, a storage device 113, and a network interface card (NIC) 114.
  • By executing a program stored in the memory 112, the CPU 111 executes such processing as job scheduling.
  • The memory 112 stores a program to be executed by the CPU 111, and data necessary for executing the program. For example, the memory 112 stores a program for performing the job scheduling, and management information of the computing hosts 120. The program and data stored in the memory 112 will be described later.
  • The storage device 113 stores programs and files. For example, a program stored in the storage device 113 is loaded into the memory 112, and then executed by the CPU 111. The NIC 114 is an interface for establishing connection with the computing host 120 via the network 100.
  • The computing host 120 executes a job allocated by the login host 110. In some cases, a job is processed by a single computing host 120, and, in other cases, a job is processed by a plurality of computing hosts 120. The processing capabilities of the computing hosts 120 included in the computer system are essentially the same. In a case where the processing capabilities differ at the time of executing a job with a plurality of computing hosts 120, there is a risk that a computing host 120 with inferior processing capability becomes a bottleneck.
  • The computing host 120 includes a CPU 121, a memory 122, a storage device 123, and an NIC 124.
  • By executing a program stored in the memory 122, the CPU 121 processes an allocated job.
  • The memory 122 stores a program to be executed by the CPU 121, and data necessary for executing the program. For example, the memory 122 stores a program for executing a job. The program and data stored in the memory 122 will be described later.
  • The storage device 123 stores programs and files. For example, a program transferred upon allocation of a job is stored there. The stored program is loaded into the memory 122, and then executed by the CPU 121, whereby the allocated job is processed. The storage device 123 also stores files necessary for executing a job, and is used to construct the shared file system.
  • The NIC 124 is an interface for establishing connection with the login host 110 and other computing hosts 120 via the network 100.
  • A hardware configuration of the computer system according to the first embodiment of this invention has been described above. Next, a software configuration thereof will be described.
  • The login host 110 according to the first embodiment of this invention stores a job 200 and a job scheduler 210 in the memory 112.
  • The job 200 includes information which is necessary for executing computational processing requested from a user. The job 200 includes, for example, data and a program for processing the data. The job 200 is executed by a single computing host 120 or by a plurality of computing hosts 120.
  • The job scheduler 210 is a program which is processed by the CPU 111. The job scheduler 210 performs management of the job 200, such as allocating the job 200 to the computing host 120, and canceling the allocation of the job 200.
  • The job scheduler 210 includes a job reception module 211, a job scheduling module 212, a file system construction request module 213, and a file system destruction request module 214.
  • The job reception module 211 receives a request for an execution of a job from a user. Detailed description of the processing performed by the job reception module 211 will be made later with reference to FIG. 5.
  • The job scheduling module 212 obtains a job stored in a job queue 215, and then allocates the job to the computing host 120 which is to execute the job. Further, upon end of the execution of the job, the job scheduling module 212 cancels the allocation of the job to the computing host 120. Detailed description of the processing performed by the job scheduling module 212 will be made later with reference to FIGS. 6 and 7.
  • The file system construction request module 213 identifies the computing hosts 120 which are to share files, and then requests those computing hosts 120 to construct a shared file system. Detailed description of the processing performed by the file system construction request module 213 will be made later with reference to FIG. 8.
  • The file system destruction request module 214 makes a request for destruction of the constructed shared file system. Detailed description of the processing performed by the file system destruction request module 214 will be made later with reference to FIG. 9.
  • The job scheduler 210 further includes the job queue 215 and a computing host management table 216. The job queue 215 temporarily stores the job 200 which the user requests to be processed until the processing starts.
  • The computing host management table 216 keeps identification information of the computing hosts 120 and allocation states of the job 200 among the computing hosts 120. Detailed description of the configuration of the computing host management table 216 will be made later with reference to FIG. 3.
  • The computing host 120 according to the first embodiment of this invention stores a program 220 and a file system program 230 in the memory 122.
  • The program 220 is a program which is included in the job 200 allocated by the job scheduler 210 of the login host 110. With the CPU 121 executing the program 220, the job requested by the user is executed.
  • The file system program 230 is executed by the CPU 121, whereby construction and destruction of a shared file system, and processing necessary for file access are performed.
  • The file system program 230 includes a file system construction module 231, a file system destruction module 232, a master server module 233, a sub-server module 234, and a client module 235.
  • The file system program 230 further includes a file system configuration file 236. The file system configuration file 236 keeps information of the computing hosts 120 which constitute the shared file system.
  • The file system construction module 231 constructs a file system based on the file system configuration file 236 which has been received from the login host 110. Detailed description of the processing performed by the file system construction module 231 will be made later with reference to FIG. 10.
  • Upon notification of completion of a job from the login host 110, the file system destruction module 232 destructs the constructed file system. Detailed description of the processing performed by the file system destruction module 232 will be made later with reference to FIG. 11.
  • The shared file system according to the first embodiment of this invention is configured by a master server, a sub-server, and a client. Upon allocation of the job 200 by the login host 110, the computing host 120 executes at least one function from among the functions of the master server, the sub-server, and the client.
  • The client receives a file access request made by the job 200. The sub-server stores files to be shared in the storage device 123. The master server manages the storage locations of the files and, in a case where the client inquires about the storage location of a file, notifies the client of which sub-server stores the file.
  • According to the first embodiment of this invention, in a case where a job is processed by a plurality of computing hosts 120, the program 220 included in the job 200 is executed by each of the computing hosts 120. Therefore, the function of the client is executed by each computing host 120.
  • The master server module 233 executes processing which, in the constructed shared file system, causes the computing host 120 to behave as a master server. Detailed description of the processing performed by the master server module 233 will be made later with reference to FIG. 12.
  • The sub-server module 234 executes processing which, in the constructed shared file system, causes the computing host 120 to behave as a sub-server. Detailed description of the processing performed by the sub-server module 234 will be made later with reference to FIG. 13.
  • The client module 235 executes processing which, in the constructed shared file system, causes the computing host 120 to behave as a client. Detailed description of the processing performed by the client module 235 will be made later with reference to FIG. 14.
  • FIG. 2 is a diagram illustrating an example of the job 200 according to the first embodiment of this invention.
  • The job 200 according to the first embodiment of this invention includes, as described above, information necessary for executing the requested computational processing. FIG. 2 illustrates a script file, which is one form of execution request for the job 200.
  • To describe more specifically, in the job 200, there are defined an estimated execution time 300, a number of computing hosts to be used 310, a shared directory name 320, a stage-in file list 330, and a stage-out file list 340.
  • The estimated execution time 300 represents an estimated value of a period of time required to process the job 200. The number of computing hosts to be used 310 represents the number of computing hosts 120 which are to process the job 200. The shared directory name 320 represents the name of a shared directory which stores a file shared among the computing hosts which process the job.
  • The stage-in file list 330 represents a list of files which are copy sources of the files stored in the shared directory. Upon construction of the shared file system, a directory which is identified by the shared directory name 320 is mounted, and then, a file specified by the stage-in file list 330 is duplicated.
  • The stage-out file list 340 represents a list of files to which a processing result of the job 200 is output. Upon completion of the execution of the job, a file specified by the stage-out file list 340 is transferred to the login host 110.
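  • For illustration only (this invention does not define a concrete script syntax), a job script of the kind shown in FIG. 2 might be parsed as follows; the "#JOB" directive names and the field names are hypothetical.

      from dataclasses import dataclass, field

      @dataclass
      class JobRequest:
          estimated_minutes: int = 0                      # estimated execution time 300
          num_hosts: int = 0                              # number of computing hosts to be used 310
          shared_dir: str = ""                            # shared directory name 320
          stage_in: list = field(default_factory=list)    # stage-in file list 330
          stage_out: list = field(default_factory=list)   # stage-out file list 340

      def parse_job_script(text):
          job = JobRequest()
          for line in text.splitlines():
              if not line.startswith("#JOB "):
                  continue  # non-directive lines are the commands to execute
              _, key, value = line.split(None, 2)
              if key == "time":
                  job.estimated_minutes = int(value)
              elif key == "hosts":
                  job.num_hosts = int(value)
              elif key == "shared_dir":
                  job.shared_dir = value
              elif key == "stage_in":
                  job.stage_in.append(value)
              elif key == "stage_out":
                  job.stage_out.append(value)
          return job

      script = ("#JOB time 60\n#JOB hosts 2\n#JOB shared_dir /work/job10\n"
                "#JOB stage_in /home/user/input.dat\n"
                "#JOB stage_out /home/user/result.dat\n./a.out\n")
      print(parse_job_script(script))
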
  • FIG. 3 is a diagram illustrating an example of the computing host management table 216 according to the first embodiment of this invention.
  • The computing host management table 216 includes a computing host identifier 400, a computing host name 410, an IP address 420, and an execution job identifier 430.
  • The computing host identifier 400 represents an identifier for identifying the computing host 120 included in the computer system. The computing host name 410 represents a name of the computing host 120. The IP address 420 represents an IP address of the computing host 120.
  • The execution job identifier 430 represents an identifier of the job 200 executed by the computing host 120. Referring to FIG. 3, the job 200 having the execution job identifier 430 of “10” is executed by the computing hosts “comp0” and “comp1”.
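  • As a minimal sketch, the table might be held in memory as a list of records like the following (the values are hypothetical); a host is idle when no execution job identifier 430 is registered for it.

      # Sketch of the computing host management table 216 of FIG. 3.
      computing_hosts = [
          {"id": 0, "name": "comp0", "ip": "192.168.0.10", "job": 10},
          {"id": 1, "name": "comp1", "ip": "192.168.0.11", "job": 10},
          {"id": 2, "name": "comp2", "ip": "192.168.0.12", "job": None},
          {"id": 3, "name": "comp3", "ip": "192.168.0.13", "job": None},
      ]

      def idle_hosts(table):
          # A host with no execution job identifier is free to take a new job.
          return [h for h in table if h["job"] is None]

      print([h["name"] for h in idle_hosts(computing_hosts)])  # ['comp2', 'comp3']
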
  • FIG. 4 is a diagram illustrating an example of the file system configuration file 236 according to the first embodiment of this invention.
  • The login host 110 creates, based on the received execution request for the job 200, the file system configuration file 236 which includes definition information of the shared file system. The login host 110 transmits the created file system configuration file 236 to the computing host 120, and gives an instruction to construct a shared file system.
  • To describe more specifically, the file system configuration file 236 includes a master server computing host name 500, a sub-server computing host name list 510, a client computing host name list 520, and a shared directory name 530.
  • The master server computing host name 500 represents a name of the computing host 120 which serves as the master server in the shared file system.
  • The sub-server computing host name list 510 represents a list of computing hosts 120 which serve as the sub-servers. In the sub-server computing host name list 510, at least one name of the computing host 120 is described.
  • The client computing host name list 520 represents a list of computing hosts 120 which serve as the clients. In the client computing host name list 520, at least one name of the computing host 120 is described.
  • In the shared directory name 530, the name of a directory which stores the shared file is described. For example, the shared directory name 320 included in the execution request for the job 200 illustrated in FIG. 2 is set.
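  • A minimal sketch of such a configuration file, assuming a JSON serialization (the on-disk format is not specified by this invention):

      import json

      # Contents corresponding to FIG. 4: the master server, the sub-server
      # list, the client list, and the shared directory name.
      fs_config = {
          "master_server": "comp0",
          "sub_servers": ["comp0", "comp1"],
          "clients": ["comp0", "comp1"],
          "shared_dir": "/work/job10",
      }
      print(json.dumps(fs_config, indent=2))
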
  • FIG. 5 is a flow chart illustrating a procedure in which an execution request for the job 200 is received from the user by the job reception module 211 according to the first embodiment of this invention.
  • The CPU 111 of the login host 110 receives the execution request for the job 200 from the user by executing the job reception module 211 (Step 600).
  • The CPU 111 of the login host 110 temporarily stores the job 200 received in the processing of Step 600 in the job queue 215 (Step 601).
  • FIG. 6 is a flow chart illustrating a procedure in which a job 200 waiting for execution is allocated to the computing host 120 by the job scheduling module 212 according to the first embodiment of this invention.
  • First, the CPU 111 of the login host 110 judges whether or not any job 200 waiting for execution is stored in the job queue 215 (Step 700). In a case where no job 200 waiting for execution is stored in the job queue 215 (the result of Step 700 is “No”), there is no job 200 to be processed, and hence the CPU 111 of the login host 110 waits until a job 200 waiting for execution is stored in the job queue 215.
  • In a case where a job 200 waiting for execution is stored in the job queue 215 (the result of Step 700 is “Yes”), the CPU 111 of the login host 110 judges whether or not it is possible to assure a sufficient number of computing hosts 120 to execute the job 200 (Step 710).
  • Specifically, the processing of Step 710 is processing in which it is judged whether or not the number of computing hosts 120 which are not executing a job is equal to or larger than the number of computing hosts 120 specified by the execution request for the job 200. The number of computing hosts 120 which are not executing a job can be obtained by referring to the computing host management table 216, and counting the number of computing hosts 120 for which no value is registered as the execution job identifier 430. The number of computing hosts 120 specified by the execution request for the job 200 corresponds to a value which is set as the number of computing hosts to be used 310 illustrated in FIG. 2.
  • In a case where it is possible to assure a sufficient number of computing hosts 120 to execute the job 200 (the result of Step 710 is “Yes”), the CPU 111 of the login host 110 obtains, from the computing host management table 216, information of the computing hosts 120 which are to be allocated the job 200, and then updates the computing host management table 216 (Step 720). Specifically, the CPU 111 of the login host 110 obtains a record associated with the computing host 120 which is to be allocated the job 200, and then registers, as the execution job identifier 430 of the record, the identifier of the job 200 to be executed.
  • Based on the information of the computing hosts 120 which are to be allocated the job 200, the CPU 111 of the login host 110 constructs a shared file system by executing the file system construction request module 213 (Step 721).
  • Upon constructing the shared file system, the CPU 111 of the login host 110 obtains the job 200 waiting for execution from the job queue 215, and requests the computing host 120 to execute the obtained job 200 (Step 722).
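  • The allocation path of FIG. 6 can be condensed into the following sketch; the queue, the table layout, and the two helper functions are hypothetical stand-ins for the job queue 215, the computing host management table 216, and the modules described above.

      import collections

      def construct_shared_fs(job, hosts):
          # Stand-in for the file system construction request module 213 (Step 721).
          print("construct FS for job", job["id"], "on", [h["name"] for h in hosts])

      def request_execution(job, hosts):
          # Stand-in for dispatching the job to its computing hosts (Step 722).
          print("execute job", job["id"], "on", [h["name"] for h in hosts])

      def schedule_once(queue, table):
          if not queue:
              return False                      # Step 700: nothing is waiting
          job = queue[0]
          free = [h for h in table if h["job"] is None]
          if len(free) < job["num_hosts"]:
              return False                      # Step 710: not enough idle hosts
          allocated = free[:job["num_hosts"]]
          for h in allocated:
              h["job"] = job["id"]              # Step 720: update the table
          construct_shared_fs(job, allocated)   # Step 721
          queue.popleft()
          request_execution(job, allocated)     # Step 722
          return True

      table = [{"id": i, "name": f"comp{i}", "job": None} for i in range(4)]
      queue = collections.deque([{"id": 10, "num_hosts": 2}])
      schedule_once(queue, table)
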
  • FIG. 7 is a flow chart illustrating a procedure in which allocation of the job 200 to the computing hosts 120 is canceled by the job scheduling module 212 according to the first embodiment of this invention.
  • The CPU 111 of the login host 110 judges whether or not there is any job 200 whose execution has been completed (Step 800). Completion of the job 200 may be judged by, for example, receiving a notification of execution completion of the job 200 from the computing host 120, or by making an inquiry to the computing host 120 periodically.
  • In a case where there is no completed job 200 (the result of Step 800 is “No”), the CPU 111 of the login host 110 waits until a job 200 is completed.
  • On the other hand, in a case where there is a completed job 200 (the result of Step 800 is “Yes”), the CPU 111 of the login host 110 obtains, from the computing host management table 216, information of the computing hosts 120 which have been allocated the completed job 200 (Step 810).
  • Further, based on the information of the computing hosts 120 obtained in the processing of Step 810, the CPU 111 of the login host 110 executes the file system destruction request module 214 to thereby destruct the shared file system (Step 811).
  • Lastly, the CPU 111 of the login host 110 updates the computing host management table 216 (Step 812). Specifically, the CPU 111 of the login host 110 clears, in the computing host management table 216, the execution job identifier 430 of a record associated with the computing host 120 which has been allocated the executed job 200.
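  • A one-function sketch of this deallocation step, using the same hypothetical table layout as the scheduling sketch above:

      def release_job(table, job_id):
          # Step 812: clear the execution job identifier 430 of every host
          # which was allocated the completed job, freeing it for reallocation.
          for h in table:
              if h["job"] == job_id:
                  h["job"] = None
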
  • FIG. 8 is a flow chart illustrating a procedure in which a request to construct a shared file system is made by the file system construction request module 213 according to the first embodiment of this invention.
  • First, the CPU 111 of the login host 110 receives information of the computing hosts 120 which are to be allocated the job 200 from the job scheduling module 212 (Step 900).
  • Based on the received information of the computing hosts 120, the CPU 111 of the login host 110 creates the file system configuration file 236 (Steps 901 to 904). The content of the file system configuration file 236 is as described with reference to FIG. 4 as an example. Hereinbelow, referring to the file system configuration file 236 illustrated in FIG. 4, the processing will be described in detail.
  • First, upon creation of the file system configuration file 236, the CPU 111 of the login host 110 registers a computing host 120 which is to serve as the master server (Step 901). Specifically, the CPU 111 of the login host 110 selects one computing host 120 from among the computing hosts 120 which are allocated the job 200, and then registers the name of the selected computing host 120 as the master server computing host name 500 of the file system configuration file 236. In selecting a computing host 120 which is to serve as the master server, for example, a computing host 120 having the smallest identifier may be selected from among the computing hosts 120 which are allocated the job 200.
  • Next, the CPU 111 of the login host 110 registers computing hosts 120 which are to serve as the sub-servers (Step 902). According to the first embodiment of this invention, in order for all the computing hosts 120 which are allocated the job 200 to function as the sub-servers, the CPU 111 of the login host 110 registers, in the sub-server computing host name list 510 of the file system configuration file 236, all the computing hosts 120 which are allocated the job 200.
  • Further, the CPU 111 of the login host 110 registers computing hosts 120 which are to serve as the clients (Step 903). Specifically, the CPU 111 of the login host 110 registers all the computing hosts 120 which are allocated the job 200 in the client computing host name list 520 of the file system configuration file 236. According to the first embodiment of this invention, all the computing hosts 120 which are allocated the job 200 execute the job, and hence, in order to minimize an overhead at the time of executing such processing that requires file access, all the computing hosts 120 execute the client function.
  • The CPU 111 of the login host 110 registers a directory in which a shared file is to be stored (Step 904). Specifically, the CPU 111 of the login host 110 registers the directory for the shared file (shared directory name 320 of FIG. 2), which is defined in the job 200, in the shared directory name 530.
  • The CPU 111 of the login host 110 transfers the created file system configuration file 236 to the computing host 120 which is registered as the master server of the shared file system in the processing of Step 901 (Step 905). At this time, an instruction to execute the file system construction module 231 may be given to the computing host 120 which is the transfer destination of the file system configuration file 236. When the processing of Step 905 has ended, the construction of the shared file system among the computing hosts 120 which process the job 200 is complete.
  • The CPU 111 of the login host 110 transfers a file to be used for the job 200 to one of the computing hosts 120 which are registered as the clients of the shared file system in the processing of Step 903 (Step 906). Specifically, the file to be transferred is a file specified in the stage-in file list 330 of FIG. 2. As the computing host 120 to which the file is to be transferred, for example, a computing host 120 having the smallest identifier may be selected from among the computing hosts 120 which serve as the clients.
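  • Steps 901 to 904 can be sketched as follows, assuming the same dictionary form as the earlier configuration-file sketch; choosing the host with the smallest identifier as the master follows the example given in the text.

      def build_fs_config(allocated_hosts, shared_dir):
          master = min(allocated_hosts, key=lambda h: h["id"])      # Step 901
          return {
              "master_server": master["name"],
              "sub_servers": [h["name"] for h in allocated_hosts],  # Step 902
              "clients": [h["name"] for h in allocated_hosts],      # Step 903
              "shared_dir": shared_dir,                             # Step 904
          }

      hosts = [{"id": 2, "name": "comp2"}, {"id": 3, "name": "comp3"}]
      print(build_fs_config(hosts, "/work/job10"))
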
  • FIG. 9 is a flow chart illustrating a procedure in which a request to destruct the shared file system is made by the file system destruction request module 214 according to the first embodiment of this invention.
  • First, the CPU 111 of the login host 110 receives, from the job scheduling module 212, information of the computing hosts 120 which are allocated the job 200 (Step 1000).
  • Based on the received information of the computing hosts 120, the CPU 111 of the login host 110 obtains the file which has been used for the job 200 from the computing host 120 serving as the client of the shared file system (Step 1001). The file which has been used for the job 200 corresponds to a file specified in the stage-out file list 340 of FIG. 2. As the client computing host 120 used for obtaining the file, for example, a computing host 120 having the smallest identifier may be selected.
  • Lastly, the CPU 111 of the login host 110 notifies the computing host 120 registered as the master server of the shared file system that the execution of the job 200 has been ended (Step 1002).
  • FIG. 10 is a flow chart illustrating a procedure in which a shared file system is constructed by the file system construction module 231 according to the first embodiment of this invention.
  • The CPU 121 of the computing host 120 receives the file system configuration file 236 transmitted by the job scheduler 210 of the login host 110 (Step 1100).
  • The CPU 121 of the computing host 120 starts up the master server module 233 so as to operate as the master server of the shared file system (Step 1101). It should be noted that detailed description of the processing performed by the master server module 233 will be made later with reference to FIG. 12.
  • Next, the CPU 121 of the computing host 120 gives an instruction to start up the sub-server module 234 to the computing hosts 120 which are registered in the sub-server computing host name list 510 of the file system configuration file 236 received in the processing of Step 1100 (Step 1102). It should be noted that detailed description of the processing performed by the sub-server module 234 will be made later with reference to FIG. 13.
  • Lastly, the CPU 121 of the computing host 120 gives an instruction to start up the client module 235 to the computing hosts 120 which are registered in the client computing host name list 520 of the file system configuration file 236 received in the processing of Step 1100 (Step 1103). It should be noted that detailed description of the processing performed by the client module 235 will be made later with reference to FIG. 14.
  • After the construction of the shared file system is completed with the processing illustrated in FIG. 10, the job 200 is executed in each of the computing hosts 120 which are allocated the job 200 (Step 722 of FIG. 6).
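  • The fan-out of FIG. 10 might look like the following sketch; send_command() is a hypothetical stand-in for whatever remote invocation the cluster provides.

      def send_command(host, command):
          # Stand-in for a remote start-up instruction sent over the network 100.
          print(f"-> {host}: {command}")

      def construct_file_system(config):
          # Step 1101: the receiving host starts its own master server module.
          send_command(config["master_server"], "start master server module")
          for h in config["sub_servers"]:                # Step 1102
              send_command(h, "start sub-server module")
          for h in config["clients"]:                    # Step 1103
              send_command(h, "start client module")
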
  • FIG. 11 is a flow chart illustrating a procedure in which a shared file system is destructed by the file system destruction module 232 according to the first embodiment of this invention.
  • The CPU 121 of the computing host 120 receives a notification of end of the job from the job scheduler 210 of the login host 110 (Step 1200).
  • The CPU 121 of the computing host 120 refers to the file system configuration file 236, and then instructs all the computing hosts 120 registered as the sub-servers to delete a file which is duplicated at the time of execution of the job 200, and a file which is generated during the execution of the job 200 (Step 1201).
  • The CPU 121 of the computing host 120 refers to the file system configuration file 236, and then instructs all the computing hosts 120 registered as the clients of the shared file system to suspend the client module 235 (Step 1202).
  • The CPU 121 of the computing host 120 refers to the file system configuration file 236, and then instructs all the computing hosts 120 registered as the sub-servers of the shared file system to suspend the sub-server module 234 (Step 1203).
  • Lastly, the CPU 121 of the computing host 120 suspends the master server module 233 (Step 1204).
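  • The teardown order of FIG. 11 mirrors the construction sketch above, again assuming a hypothetical send_command() transport:

      def send_command(host, command):
          print(f"-> {host}: {command}")  # stand-in for a network call

      def destruct_file_system(config):
          for h in config["sub_servers"]:                # Step 1201
              send_command(h, "delete duplicated and generated job files")
          for h in config["clients"]:                    # Step 1202
              send_command(h, "suspend client module")
          for h in config["sub_servers"]:                # Step 1203
              send_command(h, "suspend sub-server module")
          # Step 1204: the master server module on this host is suspended last.
          send_command(config["master_server"], "suspend master server module")
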
  • FIG. 12 is a flow chart illustrating a procedure of the processing performed by the master server of the shared file system according to the first embodiment of this invention.
  • This processing is executed continuously by the master server module 233 until the shared file system is destructed. The master server function does not include processing which directly accesses files stored in the storage device 123, and its load is therefore smaller than that of job execution or file access processing. Thus, according to the first embodiment of this invention, as illustrated in the file system configuration file 236 of FIG. 4, the master server role is assigned to one of the sub-servers in an overlapping manner.
  • The CPU 121 of the computing host 120 receives an access request for a file, which is transmitted by the computing host 120 serving as the client of the shared file system (Step 1300).
  • The CPU 121 of the computing host 120 notifies the computing host 120, which has transmitted the access request in the processing of Step 1300, of information of the sub-server computing host 120 which stores the requested file (Step 1301). In order to identify the sub-server which stores the file, for example, a mapping between each file stored in the shared directory and the computing host 120 which actually stores it may be kept. Alternatively, the computing host 120 serving as the sub-server may be identified by another commonly used file management method.
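  • As one assumed placement policy (this invention deliberately leaves the policy open), the master could locate a file's sub-server by hashing its path:

      import hashlib

      def locate_sub_server(path, sub_servers):
          # Deterministically map a shared-directory path to one sub-server.
          digest = hashlib.md5(path.encode()).digest()
          return sub_servers[digest[0] % len(sub_servers)]

      print(locate_sub_server("/work/job10/input.dat", ["comp0", "comp1"]))
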
  • FIG. 13 is a flow chart illustrating a procedure of the processing performed by the sub-server of the shared file system according to the first embodiment of this invention.
  • This processing is executed continuously by the sub-server module 234 until the shared file system is destructed.
  • The CPU 121 of the computing host 120 receives an access request for a file, which is transmitted by the client module 235 of the computing host 120 serving as the client of the shared file system (Step 1400).
  • Based on the access request received in the processing of Step 1400, the CPU 121 of the computing host 120 accesses the file stored in the storage device 123 (Step 1401). Further, the CPU 121 of the computing host 120 transfers an access result of the file to the computing host 120 which is a client of a request source (Step 1402).
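  • A sketch of the sub-server's request handling, assuming a simple read/write request dictionary and a local directory standing in for the storage device 123:

      import os

      def handle_request(request, root):
          # Step 1401: access the file held on the local storage device.
          path = os.path.join(root, request["file"])
          if request["op"] == "read":
              with open(path, "rb") as f:
                  return f.read()       # Step 1402: returned to the client
          if request["op"] == "write":
              with open(path, "wb") as f:
                  f.write(request["data"])
              return b"OK"              # Step 1402: result of the write
          raise ValueError("unknown operation")
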
  • FIG. 14 is a flow chart illustrating a procedure of the processing performed by the client of the shared file system according to the first embodiment of this invention.
  • This processing is executed continuously by the client module 235 until the shared file system is destructed.
  • The CPU 121 of the computing host 120 receives a file access requested through execution of the program 220 included in the job 200 (Step 1500).
  • First, the CPU 121 of the computing host 120 issues a file access request to the computing host 120 serving as the master server of the shared file system (Step 1501).
  • The CPU 121 of the computing host 120 receives, from the computing host 120 serving as the master server, the information of the sub-server computing host 120 having the storage device 123 which actually stores the file (Step 1502).
  • Based on the information of the computing host 120 serving as the sub-server, which has been received in the processing of Step 1502, the CPU 121 of the computing host 120 issues an access request for the file (Step 1503).
  • The CPU 121 of the computing host 120 receives, from the computing host 120 serving as the sub-server, an access result of the file access request transmitted in the processing of Step 1503 (Step 1504).
  • Lastly, the CPU 121 of the computing host 120 returns the access result of the file, which has been received in the processing of Step 1504, to the program 220 (Step 1505).
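  • The two-hop access path of FIG. 14 condenses to the following sketch; rpc() is again a hypothetical stand-in for the transport between hosts.

      def client_access(request, master, rpc):
          # Steps 1501-1502: ask the master which sub-server stores the file.
          sub_server = rpc(master, {"op": "locate", "file": request["file"]})
          # Steps 1503-1504: perform the access against that sub-server.
          result = rpc(sub_server, request)
          return result  # Step 1505: handed back to the program 220
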
  • As described above, according to the first embodiment of this invention, a shared file system can be constructed dynamically at the time of execution of the job 200. Specifically, upon reception of the job 200 (FIG. 5), the computing hosts 120 which are to process the job 200 are allocated for the job 200 (FIG. 6). Then, a shared file system configured by the computing hosts 120 which are to process the job 200 is constructed (FIGS. 8 and 10). After that, the job 200 is processed while accessing the shared file system thus constructed (FIGS. 12 to 14). Lastly, when the job is ended, the constructed shared file system is destructed (FIGS. 9 and 11).
  • According to the first embodiment of this invention, it is possible to dynamically construct a shared file system which is configured by the computing hosts 120 which are allocated the job 200. Therefore, a processing delay caused by file access from another job or the like can be prevented. In addition, the shared file system is constructed by duplicating only a file necessary for the execution of the job 200, and hence the shared file system can be constructed quickly.
  • Second Embodiment
  • The computer system according to the first embodiment of this invention is configured by the login host 110 and the computing hosts 120, but the computer system according to a second embodiment of this invention further includes I/O hosts, which are computers having a higher file access capability than the computing hosts 120. According to the second embodiment of this invention, by having the I/O hosts perform file access, the period of time required to process a job is shortened.
  • Hereinbelow, with reference to the drawings, the second embodiment of this invention will be described. It should be noted that the contents common to the first embodiment of this invention will be omitted as appropriate, and differences from the first embodiment of this invention will be mainly described.
  • FIG. 15 is a diagram illustrating an example of a configuration of the computer system according to the second embodiment of this invention.
  • As is described above, the computer system according to the second embodiment of this invention further includes a plurality of I/O hosts 1600 in addition to the login host 110 and a plurality of the computing hosts 120. The login host 110, the computing hosts 120, and the I/O hosts 1600 are coupled via the network 100. Further, the plurality of I/O hosts 1600 are coupled to one another via the network 100 as well.
  • The hardware configurations of the login host 110 and the computing host 120 are the same as those of the first embodiment of this invention.
  • Similarly to the computing hosts 120, the I/O hosts 1600 are dynamically allocated for a job at the time of execution of the job. Here, the number of the I/O hosts 1600 to be allocated for the job does not necessarily have to be equal to the number of the computing hosts 120.
  • The I/O host 1600 includes a CPU 1601, a memory 1602, a storage device 1603, and an NIC 1604. By executing a program stored in the memory 1602, the CPU 1601 performs the processing of the file system program.
  • The memory 1602 stores a program to be executed by the CPU 1601, and data necessary for executing the program. For example, the memory 1602 stores a control program necessary for file access.
  • The storage device 1603 stores programs and files. For example, in a case where a shared file system is constructed, the files accessed by the job 200 are stored there.
  • The NIC 1604 is an interface for establishing connection with the login host 110 and the computing host 120 via the network 100.
  • Next, a software configuration according to the second embodiment of this invention will be described. Similarly to the case of the first embodiment of this invention, the login host 110 according to the second embodiment of this invention stores the job 200 and a job scheduler 210 in the memory 112.
  • The job 200 is the same as that of the first embodiment of this invention. On the other hand, the job scheduler 210 includes an I/O host management table 1700 in addition to the configuration of the job scheduler 210 of the first embodiment of this invention.
  • The I/O host management table 1700 keeps identification information for the I/O hosts 1600, and allocation states of the job 200 among the I/O hosts 1600. It should be noted that detailed description of the I/O host management table 1700 will be made later with reference to FIG. 16.
  • Further, similarly to the case of the first embodiment of this invention, the computing host 120 stores the file system program 230 and the program 220 in the memory 122. The file system program 230 and the program 220 may be configured in the same manner as those of the first embodiment of this invention.
  • Here, in a case where all file accesses are carried out through the I/O hosts 1600, the file system program 230 of the computing host 120 may be configured without the master server module 233 and the sub-server module 234.
  • The I/O host 1600 stores the file system program 230 in the memory 1602. The configuration of the file system program 230 is the same as the file system program 230 provided to the computing host 120 according to the first embodiment of this invention. It should be noted that a configuration which excludes the client module 235 may be employed for the I/O host 1600.
  • FIG. 16 is a diagram illustrating an example of the I/O host management table 1700 according to the second embodiment of this invention.
  • The I/O host management table 1700 includes an I/O host identifier 1800, an I/O host name 1810, an IP address 1820, and an allocated job identifier 1830.
  • The I/O host identifier 1800 represents an identifier of the I/O host 1600 included in the computer system. The I/O host name 1810 represents a name of the I/O host 1600. The IP address 1820 represents an IP address of the I/O host 1600. The allocated job identifier 1830 represents an identifier of the job 200 which is allocated to the I/O host 1600.
  • FIG. 17 is a diagram illustrating an example of the file system configuration file 236 according to the second embodiment of this invention.
  • Similarly to the case of the first embodiment of this invention, the login host 110 creates, based on the received execution request for the job 200, the file system configuration file 236 which includes definition information of the shared file system. The login host 110 transmits the created file system configuration file 236 to the I/O host 1600, and gives an instruction to construct a shared file system.
  • The configuration of the file system configuration file 236 is the same as that of the first embodiment of this invention in terms of form. In the second embodiment of this invention, the functions of the master server and the sub-server are provided by the I/O host 1600, and hence, instead of the master server computing host name 500 and the sub-server computing host name list 510, the file system configuration file 236 includes a master server I/O host name 1900 and a sub-server I/O host name list 1910.
  • In a case where a shared file system is constructed using the file system configuration file 236 illustrated in FIG. 17, an I/O host 1600 having the I/O host identifier “io0” functions as the master server, whereas I/O hosts 1600 having the I/O host identifiers “io0” and “io1” function as the sub-servers. Further, computing hosts 120 having the computing host identifiers “comp0” and “comp1” function as the clients, and process a requested job.
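  • Rendered in the same assumed serialization as the first-embodiment sketch, the FIG. 17 configuration would read:

      fs_config = {
          "master_server": "io0",          # master server I/O host name 1900
          "sub_servers": ["io0", "io1"],   # sub-server I/O host name list 1910
          "clients": ["comp0", "comp1"],   # client computing host name list 520
          "shared_dir": "/work/job10",     # shared directory name 530
      }
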
  • FIG. 18 is a flow chart illustrating a procedure in which a job 200 waiting for execution is allocated to the computing host 120 and the I/O host 1600 by the job scheduling module 212 according to the second embodiment of this invention.
  • The processing performed by the job scheduling module 212 according to the second embodiment of this invention adds, to the procedure of the first embodiment of this invention illustrated in FIG. 6, a procedure associated with the I/O host 1600.
  • First, similarly to the case of the first embodiment of this invention, the CPU 111 of the login host 110 judges whether or not any job waiting for execution is stored in the job queue 215 (Step 700). Then, the CPU 111 of the login host 110 further judges whether or not it is possible to assure a sufficient number of computing hosts 120 to execute the job 200 (Step 710).
  • In a case where it is possible to assure a sufficient number of computing hosts 120 to execute the job 200 (the result of Step 710 is “Yes”), the CPU 111 of the login host 110 further judges whether or not it is possible to assure a sufficient number of I/O hosts 1600 to execute the job 200 (Step 2000).
  • Specifically, the CPU 111 of the login host 110 refers to the I/O host management table 1700 to obtain the number of I/O hosts 1600 which have no value registered as the allocated job identifier 1830, and then judges whether or not the obtained number is equal to or larger than the number of I/O hosts 1600 necessary for executing the job 200. The I/O host 1600 is a computer having a higher file access capability, and hence the necessary number of I/O hosts 1600 may be set smaller than the number of computing hosts 120 necessary for executing the job. For example, the number of I/O hosts 1600 has only to satisfy a predetermined ratio with respect to the number of computing hosts 120 necessary for executing the job. Here, the predetermined ratio may be set as a ratio between all the I/O hosts 1600 and all the computing hosts 120 which constitute the computer system. Alternatively, the predetermined ratio may be set based on the capability of the I/O host 1600 or the like.
  • In a case where it is impossible to assure a sufficient number of I/O hosts 1600 to execute the job 200 (the result of Step 2000 is “No”), the CPU 111 of the login host 110 waits until a job which is being executed is completed.
  • On the other hand, in a case where it is possible to assure a sufficient number of I/O hosts 1600 to execute the job 200 (the result of Step 2000 is “Yes”), similarly to the case of the first embodiment of this invention, the CPU 111 of the login host 110 updates the computing host management table 216 (Step 720), and then executes the file system construction request module 213 (Step 721).
  • Further, the CPU 111 of the login host 110 obtains, from the I/O host management table 1700, information of a sufficient number of I/O hosts 1600 to execute the job 200, and then registers the identifier of the job 200 as the allocated job identifier 1830 of each of those I/O hosts 1600 (Step 2010).
  • In order to construct a shared file system, the CPU 111 of the login host 110 transmits the information of the I/O hosts 1600, which has been obtained in the processing of Step 2010, to the file system construction request module 213 (Step 2011).
  • Lastly, in a case where the shared file system has been constructed, the CPU 111 of the login host 110 obtains the job 200 waiting for execution from the job queue 215, and requests the computing host 120 to execute the obtained job 200 (Step 722).
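  • The Step 2000 check, combined with the predetermined-ratio rule suggested above, can be sketched as follows; the ratio value itself is illustrative.

      import math

      def io_hosts_needed(num_computing_hosts, ratio=0.25):
          # E.g. one I/O host per four computing hosts, with at least one.
          return max(1, math.ceil(num_computing_hosts * ratio))

      def can_allocate_io_hosts(io_table, num_computing_hosts):
          free = [h for h in io_table if h["job"] is None]
          return len(free) >= io_hosts_needed(num_computing_hosts)
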
  • FIG. 19 is a flow chart illustrating a procedure in which allocation of the job 200 to the computing hosts 120 and the I/O hosts 1600 is canceled by the job scheduling module 212 according to the second embodiment of this invention.
  • Upon completion of the execution of the job (the result of Step 800 is “Yes”), similarly to the case of the first embodiment of this invention, the CPU 111 of the login host 110 cancels the allocation of the executed job to the computing hosts 120 (Steps 810 to 812).
  • The CPU 111 of the login host 110 further refers to the I/O host management table 1700, and then obtains the information of the I/O hosts 1600 which have the identifier of the executed job 200 registered as the allocated job identifier 1830 (Step 2100).
  • The CPU 111 of the login host 110 transmits the information of the I/O hosts 1600, which has been obtained in the processing of Step 2100, to the file system destruction request module 214 (Step 2101). Lastly, the CPU 111 of the login host 110 clears the allocated job identifier 1830 of each of the obtained I/O hosts 1600 (Step 2102).
  • FIG. 20 is a flow chart illustrating a procedure in which a request to construct a shared file system is made by the file system construction request module 213 according to the second embodiment of this invention.
  • The processing performed by the file system construction request module 213 according to the second embodiment of this invention includes processing associated with the I/O host 1600 in addition to the procedure of the first embodiment of this invention illustrated in FIG. 8.
  • After reception of the information of the computing hosts 120 from the job scheduling module 212 (Step 900), the CPU 111 of the login host 110 receives information of the I/O hosts 1600 which are allocated for the job 200 (Step 2200).
  • Based on the received information of the I/O hosts 1600, the CPU 111 of the login host 110 creates the file system configuration file 236, and then registers an I/O host 1600 which is to serve as the master server (Step 2201). For example, an I/O host 1600 having the smallest identifier may be selected and registered from among the allocated I/O hosts 1600.
  • Next, the CPU 111 of the login host 110 registers I/O hosts 1600 which are to serve as the sub-servers (Step 2202). Specifically, the CPU 111 of the login host 110 registers all the I/O hosts 1600 which are allocated for the job 200 in the sub-server I/O host name list 1910 of the file system configuration file 236 illustrated in FIG. 17.
  • Similarly to the case of the first embodiment of this invention, the CPU 111 of the login host 110 registers computing hosts 120 which are to serve as the clients (Step 903). Further, the CPU 111 of the login host 110 registers a directory which is to store a shared file (Step 904).
  • The CPU 111 of the login host 110 transfers the created file system configuration file 236 to the I/O host 1600 which is registered as the master server of the shared file system in the processing of Step 2201 (Step 2210).
  • Lastly, the CPU 111 of the login host 110 transfers a file to be used for the job 200 to one of the computing hosts 120 which are registered as the clients of the shared file system in the processing of Step 903 (Step 906).
  • FIG. 21 is a flow chart illustrating a procedure in which a request to destruct the shared file system is made by the file system destruction request module 214 according to the second embodiment of this invention.
  • The processing performed by the file system destruction request module 214 according to the second embodiment of this invention includes processing associated with the I/O host 1600 in addition to the procedure of the first embodiment of this invention illustrated in FIG. 9.
  • Similarly to the case of the first embodiment of this invention, the CPU 111 of the login host 110 receives, from the job scheduling module 212, the information of the computing hosts 120 which are allocated the job 200 (Step 1000), and obtains the file used for the job 200 from the computing host 120 serving as the client (Step 1001).
  • The CPU 111 of the login host 110 further receives, from the job scheduling module 212, the information of the I/O hosts 1600 which are allocated the job 200 (Step 2300), and notifies the I/O host 1600 which is registered as the master server of the shared file system that the execution of the job 200 has been ended (Step 2301).
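  • A corresponding sketch of the destruction request is shown below; it is again illustrative, and the master server is assumed here to be recoverable as the I/O host with the smallest identifier, matching the construction-time choice.

```python
# Hypothetical sketch of Steps 1000, 1001, 2300, and 2301.

def request_destruction(job, computing_hosts, io_hosts, transfer):
    # Step 1001: obtain the file used for the job from the computing host
    # serving as a client of the shared file system.
    result_file = transfer.fetch(computing_hosts[0].name, job.output_file)

    # Step 2301: notify the I/O host registered as the master server of the
    # shared file system that execution of the job has ended.
    master = min(io_hosts, key=lambda h: h.identifier)
    master.notify_job_end(job.job_id)
    return result_file
```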
  • FIG. 22 is a flow chart illustrating a procedure in which a shared file system is constructed by the file system construction module 231 according to the second embodiment of this invention.
  • The processing performed by the file system construction module 231 according to the second embodiment of this invention is obtained by partly changing the procedure of the first embodiment of this invention illustrated in FIG. 10, and includes a procedure associated with the I/O host 1600. Further, whereas the processing is executed by the computing host 120 in the first embodiment of this invention, it is executed by the I/O host 1600 in the second embodiment.
  • The CPU 1601 of the I/O host 1600 receives the file system configuration file 236 transmitted by the job scheduler 210 of the login host 110 (Step 1100), and then starts up the master server module 233 (Step 1101). It should be noted that detailed description of the master server module 233 will be made later with reference to FIG. 24.
  • The CPU 1601 of the I/O host 1600 gives an instruction to start up the sub-server module 234 to all the I/O hosts 1600 which are registered in the sub-server I/O host name list 1910 of the file system configuration file 236 received in the processing of Step 1100 (Step 2400). It should be noted that the processing performed by the sub-server module 234 is the same as the processing of the first embodiment of this invention illustrated in FIG. 13, except that it is performed by the I/O host 1600 instead of the computing host 120.
  • Lastly, the CPU 1601 of the I/O host 1600 gives an instruction to start up the client module 235 to the computing hosts 120 which are registered in the client computing host name list 520 of the file system configuration file 236 received in the processing of Step 1100 (Step 1103). It should be noted that detailed description of the client module 235 will be made later with reference to FIG. 25.
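  • The start-up sequence performed on the master I/O host might be sketched as follows; rpc and start_local_master are assumed helpers standing in for the inter-host messaging and for the master server module of FIG. 24.

```python
# Hypothetical sketch of Steps 1100, 1101, 2400, and 1103.

def construct_shared_file_system(config, rpc, start_local_master):
    # Steps 1100 and 1101: the configuration file has been received from the
    # job scheduler; start the master server module on this I/O host.
    start_local_master(config)

    # Step 2400: instruct every I/O host registered in the sub-server I/O
    # host name list 1910 to start its sub-server module.
    for name in config["sub_server_io_hosts"]:
        rpc.call(name, "start_sub_server", config)

    # Step 1103: instruct every computing host registered in the client
    # computing host name list 520 to start its client module.
    for name in config["client_computing_hosts"]:
        rpc.call(name, "start_client", config)
```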
  • FIG. 23 is a flow chart illustrating a procedure in which a shared file system is destructed by the file system destruction module 232 according to the second embodiment of this invention.
  • The processing performed by the file system destruction module 232 according to the second embodiment of this invention is obtained by partly changing the procedure of the first embodiment of this invention illustrated in FIG. 11, and includes a procedure associated with the I/O host 1600.
  • The CPU 1601 of the I/O host 1600 receives a notification of end of the job from the job scheduler 210 of the login host 110 (Step 1200).
  • The CPU 1601 of the I/O host 1600 refers to the file system configuration file 236, and then instructs all the I/O hosts 1600 registered as the sub-servers to delete the files duplicated for the execution of the job 200 and the files generated during the execution of the job 200 (Step 2500).
  • Similarly to the case of the first embodiment of this invention, the CPU 1601 of the I/O host 1600 refers to the file system configuration file 236, and then instructs all the computing hosts 120 registered as the clients to suspend the client module 235 (Step 1202).
  • The CPU 1601 of the I/O host 1600 refers to the file system configuration file 236, and then instructs all the I/O hosts 1600 registered as the sub-servers to suspend the sub-server module 234 (Step 2510).
  • Lastly, the CPU 1601 of the I/O host 1600 suspends the master server module 233 (Step 1204).
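  • The tear-down sequence mirrors the start-up sequence in reverse, as the following illustrative sketch shows (rpc and suspend_local_master are again assumed helpers, not names from the patent).

```python
# Hypothetical sketch of Steps 1200, 2500, 1202, 2510, and 1204.

def destruct_shared_file_system(config, rpc, suspend_local_master):
    # Step 2500: the sub-servers delete the files duplicated for, and
    # generated during, the execution of the job.
    for name in config["sub_server_io_hosts"]:
        rpc.call(name, "delete_job_files")

    # Step 1202: the clients suspend their client modules.
    for name in config["client_computing_hosts"]:
        rpc.call(name, "suspend_client")

    # Step 2510: the sub-servers suspend their sub-server modules.
    for name in config["sub_server_io_hosts"]:
        rpc.call(name, "suspend_sub_server")

    # Step 1204: lastly, suspend the local master server module.
    suspend_local_master()
```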
  • FIG. 24 is a flow chart illustrating a procedure of the processing performed by the master server of the shared file system according to the second embodiment of this invention.
  • The processing performed by the master server module 233 according to the second embodiment of this invention is obtained by partly changing the procedure of the first embodiment of this invention illustrated in FIG. 12, and includes a procedure associated with the I/O host 1600.
  • The CPU 1601 of the I/O host 1600 receives a file access request transmitted from the computing host 120 serving as the client of the shared file system (Step 1300).
  • The CPU 1601 of the I/O host 1600 notifies the computing host 120, which has transmitted the access request in the processing of Step 1300, of the information of the sub-server I/O host 1600 which stores the requested file (Step 2600).
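  • In effect, the master server acts as a metadata service: it does not serve file contents itself but answers location queries from clients. A minimal sketch follows, in which file_location_table is an assumed mapping from file paths to sub-server I/O hosts.

```python
# Hypothetical sketch of Steps 1300 and 2600.

def handle_access_request(request, file_location_table):
    # Step 1300: a file access request has arrived from a client computing host.
    # Step 2600: reply with the sub-server I/O host that stores the file.
    return {"sub_server": file_location_table[request.file_path]}
```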
  • FIG. 25 is a flow chart illustrating a procedure of the processing performed by the client of the shared file system according to the second embodiment of this invention.
  • The processing performed by the client module 235 according to the second embodiment of this invention is obtained by partly changing the procedure of the first embodiment of this invention illustrated in FIG. 14, and includes a procedure associated with the I/O host 1600.
  • Similarly to the case of the first embodiment of this invention, the CPU 121 of the computing host 120 receives a file access requested through execution of the program 220 included in the job 200 (Step 1500).
  • The CPU 121 of the computing host 120 issues a file access request to the I/O host 1600 serving as the master server of the shared file system (Step 2700).
  • The CPU 121 of the computing host 120 receives, from the I/O host 1600 serving as the master server, the information of the sub-server I/O host 1600 which stores the file to be accessed (Step 2701).
  • Further, based on the information of the I/O host 1600 serving as the sub-server, which has been received in the processing of Step 2701, the CPU 121 of the computing host 120 issues a file access request (Step 2702).
  • Then, the CPU 121 of the computing host 120 receives, from the I/O host 1600 serving as the sub-server, an access result of the file access request transmitted in the processing of Step 2702 (Step 2703).
  • Lastly, the CPU 121 of the computing host 120 returns the access result of the file, which has been received in the processing of Step 2703, to the program 220 (Step 1505).
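  • Putting the client-side steps together, the access path is a two-hop lookup-then-access exchange, sketched below with an assumed rpc helper.

```python
# Hypothetical sketch of Steps 1500, 2700 to 2703, and 1505.

def access_file(path, master, rpc):
    # Steps 1500 and 2700: the program requests a file access; ask the
    # master server of the shared file system where the file is stored.
    reply = rpc.call(master, "lookup", path)

    # Steps 2701 and 2702: issue the file access request to the sub-server
    # I/O host named in the master's reply.
    result = rpc.call(reply["sub_server"], "access", path)

    # Steps 2703 and 1505: return the access result to the program.
    return result
```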
  • According to the second embodiment of this invention, similarly to the case of the first embodiment of this invention, it is possible to dynamically construct a shared file system configured by the computing hosts 120 to which the job 200 is allocated. Therefore, a processing delay caused by file access from another job or the like can be prevented. In addition, because the shared file system is constructed by duplicating only the files necessary for the execution of the job 200, it can be constructed quickly.
  • On top of that, according to the second embodiment of this invention, the shared file system is constructed using the I/O hosts 1600, which have higher file access capabilities than the computing hosts 120. As a result, file access, which is likely to become a bottleneck at the time of execution of a job, can be processed at higher speed. In particular, performance can be enhanced when executing a job which imposes a heavy file access load, such as a job which accesses files frequently or accesses a file having a large volume.
  • This invention has been described above in detail based on the embodiments of this invention, but this invention is not limited to the above-mentioned embodiments, and various modifications may be made without departing from the gist and scope of this invention.
  • This invention is applicable to file access management for a computer system, and more particularly, is applicable to shared file access management for a large-scale computer system.

Claims (16)

1. A file sharing method used for a computer system which includes a plurality of computers, and which executes a job requested,
the plurality of computers including a plurality of computing hosts which execute the job,
each of the plurality of computing hosts comprising:
a first interface coupled to another one of the plurality of computing hosts;
a first processor coupled to the first interface; and
a first memory coupled to the first processor, the file sharing method including the steps of, in a case where the job is executed by the plurality of computing hosts:
sharing, by the first processor of the each of the plurality of computing hosts which execute the job, a file necessary for executing the job;
executing, by the first processor of the each of the plurality of computing hosts which execute the job, the requested job through accessing the file; and
canceling, by the first processor of the each of the plurality of computing hosts which execute the requested job, the sharing of the file after the executing of the requested job is completed.
2. The file sharing method according to claim 1, wherein:
the plurality of computers further include a job scheduler for receiving an execution request for the job;
the job scheduler comprises:
a second interface coupled to the plurality of computing hosts;
a second processor coupled to the second interface; and
a second memory coupled to the second processor; and
the file sharing method further includes the steps of:
receiving, by the second processor, the execution request for the job;
selecting, by the second processor, the plurality of computing hosts which execute the job based on the received execution request for the job;
generating, by the second processor, a duplication of the file necessary for executing the job; and
sharing, by the first processor of the each of the plurality of computing hosts which execute the job, the generated duplication of the file.
3. The file sharing method according to claim 2, wherein:
the each of the plurality of computing hosts further includes a storage device for storing the file; and
the file sharing method further includes the steps of:
transferring, by the second processor, the duplication of the file to one of the plurality of computing hosts which execute the job; and
storing, by the first processor of the one of the plurality of computing hosts to which the duplication of the file is transferred, the duplication of the file in the storage device.
4. The file sharing method according to claim 3, further including the steps of:
selecting, by the second processor, a master server which keeps information for identifying the one of the plurality of computing hosts which stores the file from among the plurality of computing hosts;
identifying, by the first processor of the each of the plurality of computing hosts which execute the job, the one of the plurality of computing hosts which stores the file through making an inquiry to the master server in a case where the file necessary for executing the job is accessed; and
obtaining, by the first processor of the each of the plurality of computing hosts which execute the job, the file necessary for executing the job from the identified one of the plurality of computing hosts.
5. The file sharing method according to claim 4, further including the steps of:
transmitting, by the second processor, file sharing system configuration information that includes a list of the plurality of computing hosts which execute the job to the master server before the transferring of the duplication of the file; and
identifying, by the first processor of the master server, the one of the plurality of computing hosts which stores the file necessary for executing the job based on the file sharing system configuration information.
6. The file sharing method according to claim 2, wherein:
the plurality of computers further include a plurality of I/O hosts each having a higher file access capability than the plurality of computing hosts;
each of the plurality of I/O hosts comprises:
a third interface capable of communication with the job scheduler and the plurality of computing hosts;
a third processor coupled to the third interface;
a third memory coupled to the third processor; and
a storage device for storing the file; and
the file sharing method further includes the steps of:
selecting, by the second processor, at least one of the plurality of I/O hosts which stores the file based on the received execution request for the job;
transferring, by the second processor, the generated duplication of the file to the selected at least one of the plurality of I/O hosts;
storing, by the third processor of the selected at least one of the plurality of I/O hosts, the duplication of the file in the storage device; and
obtaining, by the first processor of the each of the plurality of computing hosts which execute the job, the file necessary for executing the job from the selected at least one of the plurality of I/O hosts.
7. The file sharing method according to claim 6, further including the steps of:
selecting, by the second processor, a master server which keeps information for identifying the at least one of the plurality of I/O hosts which stores the file from among the selected at least one of the plurality of I/O hosts;
identifying, by the first processor of the each of the plurality of computing hosts which execute the job, the at least one of the plurality of I/O hosts which stores the file through making an inquiry to the master server in a case where the file necessary for executing the job is accessed; and
obtaining, by the first processor of the each of the plurality of computing hosts which execute the job, the file necessary for executing the job from the identified at least one of the plurality of I/O hosts.
8. The file sharing method according to claim 7, further including the steps of:
transmitting, by the second processor, file sharing system configuration information which includes a list of the at least one of the plurality of I/O hosts which stores the file to the master server before the transferring of the duplication of the file; and
identifying, by the third processor of the master server, the at least one of the plurality of I/O hosts which stores the file necessary for executing the job based on the file sharing system configuration information.
9. The file sharing method according to claim 6, further including the step of determining, by the second processor, the number of the at least one of the plurality of I/O hosts to be selected based on a ratio between the number of the plurality of computing hosts and the number of the plurality of I/O hosts in the plurality of computers.
10. A computer system which includes a plurality of computers, and which executes a job requested, wherein:
the plurality of computers include a plurality of computing hosts which execute the job;
each of the plurality of computing hosts comprises:
a first interface coupled to another one of the plurality of computing hosts;
a first processor coupled to the first interface; and
a first memory coupled to the first processor; and
the each of the plurality of computing hosts which execute the job is configured to, in a case where the job is executed by the plurality of computing hosts:
share a file necessary for executing the job;
execute the requested job through accessing the file; and
cancel the sharing of the file after the executing of the requested job is completed.
11. The computer system according to claim 10, wherein:
the plurality of computers further comprise a job scheduler for receiving an execution request for the job;
the job scheduler comprises:
a second interface coupled to the plurality of computing hosts;
a second processor coupled to the second interface; and
a second memory coupled to the second processor;
the job scheduler is configured to:
receive the execution request for the job;
select the plurality of computing hosts which execute the job based on the received execution request for the job; and
generate a duplication of the file necessary for executing the job; and
the each of the plurality of computing hosts which execute the job is configured to share the generated duplication of the file.
12. The computer system according to claim 11, wherein:
the each of the plurality of computing hosts further comprises a storage device for storing the file;
the job scheduler is configured to transfer the duplication of the file to one of the plurality of computing hosts which execute the job; and
the one of the plurality of computing hosts to which the duplication of the file is transferred is configured to store the duplication of the file in the storage device.
13. The computer system according to claim 12, wherein:
the job scheduler is configured to select a master server which keeps information for identifying the one of the plurality of computing hosts which stores the file from among the plurality of computing hosts; and
the each of the plurality of computing hosts which execute the job is configured to, in a case where the file necessary for executing the job is accessed:
identify the one of the plurality of computing hosts which stores the file through making an inquiry to the master server; and
obtain the file necessary for executing the job from the identified one of the plurality of computing hosts.
14. The computer system according to claim 11, wherein:
the plurality of computers further include a plurality of I/O hosts each having a higher file access capability than the plurality of computing hosts;
each of the plurality of I/O hosts comprises:
a third interface capable of communication with the job scheduler and the plurality of computing hosts;
a third processor coupled to the third interface;
a third memory coupled to the third processor; and
a storage device for storing the file;
the job scheduler is further configured to:
select, by the second processor, at least one of the plurality of I/O hosts which stores the file based on the received execution request for the job; and
transfer, by the second processor, the generated duplication of the file to the selected at least one of the plurality of I/O hosts;
the selected at least one of the plurality of I/O hosts is configured to store the duplication of the file in the storage device; and
the each of the plurality of computing hosts which execute the job is configured to obtain the file necessary for executing the job from the selected at least one of the plurality of I/O hosts.
15. The computer system according to claim 14, wherein:
the job scheduler selects a master server which keeps information for identifying the at least one of the plurality of I/O hosts which stores the file from among the selected at least one of the plurality of I/O hosts; and
the each of the plurality of computing hosts which execute the job is further configured to, in a case where the file necessary for executing the job is accessed:
identify the at least one of the plurality of I/O hosts which stores the file through making an inquiry to the master server; and
obtain the file necessary for executing the job from the identified at least one of the plurality of I/O hosts.
16. A job scheduler which receives a job to be executed in a computer system which includes a plurality of computers, and which executes the job requested,
the plurality of computers including computing hosts which execute the job,
the job scheduler comprising:
an interface coupled to the computing hosts;
a processor coupled to the interface; and
a memory coupled to the processor,
wherein the processor is configured to:
receive an execution request for the job;
select at least one of the computing hosts which execute the job based on the received execution request for the job;
duplicate a file necessary for executing the job;
instruct each of the selected plurality of the computing hosts to construct a file sharing system for sharing the duplicated file in a case where a plurality of the computing hosts are selected;
instruct the each of the selected plurality of the computing hosts to execute the received job; and
instruct the each of the selected plurality of the computing hosts to destruct the constructed file sharing system after the executing of the job is completed.
US12/388,174 2008-12-26 2009-02-18 File sharing method, computer system, and job scheduler Expired - Fee Related US8442939B2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2008-333135 2008-12-26
JP2008333135A JP5294014B2 (en) 2008-12-26 2008-12-26 File sharing method, computer system, and job scheduler

Publications (2)

Publication Number Publication Date
US20100169271A1 true US20100169271A1 (en) 2010-07-01
US8442939B2 US8442939B2 (en) 2013-05-14

Family

ID=42286100

Family Applications (1)

Application Number Title Priority Date Filing Date
US12/388,174 Expired - Fee Related US8442939B2 (en) 2008-12-26 2009-02-18 File sharing method, computer system, and job scheduler

Country Status (2)

Country Link
US (1) US8442939B2 (en)
JP (1) JP5294014B2 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5640151B2 (en) * 2011-05-31 2014-12-10 株式会社日立製作所 Computer and data management method by computer
US20190124153A1 (en) * 2016-06-16 2019-04-25 Center Of Human-Centered Interaction For Coexistence Data processing device and method for data sharing among multiple users, and computer program

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS60209858A (en) * 1984-04-04 1985-10-22 Hitachi Ltd File processing system
JPS63301352A (en) * 1987-06-02 1988-12-08 Nec Corp Exchange system for file shared data with communication control
JPH0713823A (en) * 1993-06-24 1995-01-17 Nec Corp File resource management system of virtual computer system
JPH0785020A (en) * 1993-09-20 1995-03-31 Hitachi Ltd Document managing method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5386517A (en) * 1993-01-26 1995-01-31 Unisys Corporation Dual bus communication system connecting multiple processors to multiple I/O subsystems having a plurality of I/O devices with varying transfer speeds
US20100132022A1 (en) * 2001-07-06 2010-05-27 Computer Associates Think, Inc. Systems and Methods for Information Backup
US7752171B2 (en) * 2001-08-20 2010-07-06 Datacentertechnologies N.V Efficient computer file backup system and method
US20050117587A1 (en) * 2003-12-01 2005-06-02 Nec Corporation Distributed computing system for resource reservation and user verification
US20080148272A1 (en) * 2006-12-19 2008-06-19 Fujitsu Limited Job allocation program, method and apparatus
US20080222434A1 (en) * 2007-03-09 2008-09-11 Hitachi, Ltd. Method of power-aware job management and computer system
US20090260012A1 (en) * 2008-04-15 2009-10-15 International Business Machines Corporation Workload Scheduling
US8209299B2 (en) * 2008-04-28 2012-06-26 International Business Machines Corporation Selectively generating program objects on remote node of a multi-node computer system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Atsuya Uno, "Software of the Earth Simulator," Journal of the Earth Simulator, Volume 3, September 2005, 52-59 *
Russel Sandberg, "The Sun Network Filesystem: Design, Implementation and Experience," Sun Microsystems, Inc., pages 1-15 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150341366A1 (en) * 2010-01-15 2015-11-26 Apple Inc. Specialized network fileserver
US10091203B2 (en) * 2010-01-15 2018-10-02 Apple Inc. Specialized network fileserver
US10305910B2 (en) 2010-01-15 2019-05-28 Apple Inc. Accessing specialized fileserver
WO2016069037A1 (en) * 2014-11-01 2016-05-06 Hewlett Packard Enterprise Development Lp File system configuration data storage
WO2020205598A1 (en) * 2019-03-29 2020-10-08 Micron Technology, Inc. Computational storage and networked based system
CN113678113A (en) * 2019-03-29 2021-11-19 美光科技公司 Computing storage device and networking-based system
EP3948555A4 (en) * 2019-03-29 2022-11-16 Micron Technology, Inc. Computational storage and networked based system
US11550500B2 (en) 2019-03-29 2023-01-10 Micron Technology, Inc. Computational storage and networked based system

Also Published As

Publication number Publication date
US8442939B2 (en) 2013-05-14
JP5294014B2 (en) 2013-09-18
JP2010152846A (en) 2010-07-08

Legal Events

Date Code Title Description
AS Assignment

Owner name: HITACHI, LTD., JAPAN

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:YASUI, TAKASHI;SHIMIZU, MASAAKI;SIGNING DATES FROM 20090205 TO 20090209;REEL/FRAME:022275/0714

STCF Information on status: patent grant

Free format text: PATENTED CASE

FEPP Fee payment procedure

Free format text: PAYOR NUMBER ASSIGNED (ORIGINAL EVENT CODE: ASPN); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

FPAY Fee payment

Year of fee payment: 4

FEPP Fee payment procedure

Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

LAPS Lapse for failure to pay maintenance fees

Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

STCH Information on status: patent discontinuation

Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362

FP Lapsed due to failure to pay maintenance fee

Effective date: 20210514