[go: up one dir, main page]
More Web Proxy on the site http://driver.im/

CN106371919B - It is a kind of based on mapping-reduction computation model data cache method of shuffling - Google Patents

It is a kind of based on mapping-reduction computation model data cache method of shuffling Download PDF

Info

Publication number
CN106371919B
CN106371919B CN201610712705.5A CN201610712705A CN106371919B CN 106371919 B CN106371919 B CN 106371919B CN 201610712705 A CN201610712705 A CN 201610712705A CN 106371919 B CN106371919 B CN 106371919B
Authority
CN
China
Prior art keywords
shuffling
reduction
node
data
mapping
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610712705.5A
Other languages
Chinese (zh)
Other versions
CN106371919A (en
Inventor
付周望
王一丁
戚正伟
管海兵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201610712705.5A priority Critical patent/CN106371919B/en
Publication of CN106371919A publication Critical patent/CN106371919A/en
Application granted granted Critical
Publication of CN106371919B publication Critical patent/CN106371919B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/073Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a memory management context, e.g. virtual memory or cache management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a kind of based on mapping-reduction computation model data cache method of shuffling, one mapping-reduction work is sent to shuffle by the division that task is unit by interface including mapping-reduction Computational frame and caches host, shuffle cache host receive task divide data after, in addition timestamp is stored in local memory;It shuffles and caches host the mapping that each node of reduction task therein and cluster is done using random algorithm an a pair three by data is divided to task, and be stored in the form of Hash table in the memory for caching host of shuffling and etc..The present invention is able to ascend the calculated performance based on mapping-reduction model distributed computing framework, avoids the manual Checkpointing of inefficient user, promotes the robustness of distributed computing framework.

Description

It is a kind of based on mapping-reduction computation model data cache method of shuffling
Technical field
The present invention relates to computer distributed system and distributed computing framework fields.It specifically, is mainly base Distribution memory-based is provided in mapping (Map)-reduction (Reduce) computation model to shuffle (shuffle) data buffer storage, from And promote the performance and robustness of the Computational frame.
Background technique
Mapping-reduction computation model and the distributed computing system designed based on this model are the big datas of current mainstream Distributed system, such as Spark, Hadoop.There is one between mapping and reduction stages to wash for calculating based on this model Board (Shuffle) is isolated by mapping and reduction.Current all designs are done using the data write-in disk that will shuffle Persistence processing, is then transmitted again.And the performance of disk can not show a candle to memory, therefore it is biggish to computing system to bring this Performance cost.
Simultaneously, the Computational frame of the type mainly guarantees the fault-tolerance (Hadoop) calculated by disk, or User is needed manually to increase checkpoint (Spark).These fault tolerant mechanisms are not filled not only due to overlapped with calculating logic Divide and utilize existing hardware feature, and is interspersed in the performance for leveraging in calculating process and calculating itself.
Although having some distributed file systems memory-based at present, important is be directed to data block sheet for they Body, and the volume of data block itself is often much larger than data of shuffling, it is therefore desirable to a large amount of memory is as support.Based on above Background shuffles data cache method the present invention provides a kind of distribution memory-based to eliminate to shuffle and transmit and based on disk Fault-tolerance mechanism bring performance cost, promote the performance and robustness of Computational frame.
Summary of the invention
The present invention is directed to based on mapping-reduction model distributed computing system, is transmitted data buffer storage by that will shuffle and is existed Shuffle transmission and the fault-tolerance mechanism bring performance cost based on disk are eliminated in the memory of distributed system.Of the invention Technical solution is as follows:
A kind of data cache method of shuffling of mapping-specification computation model, includes the following steps:
Step 1: a mapping-reduction work is passed through interface by the division that task is unit by mapping-reduction Computational frame It is sent to shuffle and caches host.Here task contains the ID that transmission of shuffling relies on, and mapping tasks sum and reduction task are total Number, after host receives, in addition timestamp is stored in local memory.
Step 2: shuffle and cache after host receives workload partition data, using random algorithm by reduction task therein with Each node of cluster does the mapping an of a pair three.And a reduction task corresponds to random three nodes, based on one of them Node is wanted, being left two is backup node.The mapping of reduction task and node is stored in the memory of host in the form of Hash table In.
Step 3: Computational frame dispatches one of node and executes a mapping tasks.The node has executed mapping tasks After calculating process, held by calling the interface of caching system that the data of shuffling of the mapping tasks are sent to local caching of shuffling The memory headroom of row device process.It returns immediately simultaneously, indicates that task execution is completed.
Step 4: when the actuator process of the caching system on a node receive mapping tasks shuffle data when, can be by According to the division mode (being specified by Computational frame) for data default of shuffling, data are divided into multiple reduction of shuffling according to reduction task Data block saves in memory.A usual mapping tasks can generate it is identical as reduction task number or be less than reduction task The data block of number.
Step 5: actuator caches the mapping table of host request reduction task and node to shuffling, which is entirely reflecting Penetrate-reduction work in can only execute it is primary).Mapping table ensure that the distribution rules of all actuators are consistent.Actuator according to The reduction task of host and the mapping table of node will divide the reduction data block of shuffling that finishes and be distributed to and be corresponding to it in step 4 Three reduction task remote nodes.Actuator transmission shuffle reduction data block when, can according to main and subordinate node in step 2 setting The label of master-slave back-up is added to data block respectively.
Step 6: remote node receives the label that the data block is read when shuffling reduction data block.If the label is shown as Master backup then saves it in memory, is written into hard disk if fruit is from backup.
Step 7: the process that step 3 arrives step 6 is repeated, until all mapping tasks of the work are finished, into step Rapid 10.
Step 8: the distribution feelings of all reduction tasks of interface polls of the Computational frame before scheduling through caching system of shuffling Condition.
Step 9: Computational frame dispatches reduction task according to the distribution situation of reduction task.Computational frame is chosen wherein first Master backup node, a reduction task is distributed on the node.If master backup node failure enters step 10, otherwise Enter step 11.
Step 10: Computational frame selects to send reduction task on the node from backup node.If two from backup Node fails simultaneously, then the mission failure, mistake of dishing out.Terminate all steps.
Step 11: reduction task caches actuator to local shuffling by interface and obtains data when executing on node.
Step 12: local shuffle caches after actuator receives request, whether in memory data is first checked for, as including Corresponding data is then directly returned in depositing, otherwise degaussing, which is examined and seized, takes corresponding data and return.
Step 13: reduction task starts to calculate after receiving data.
Step 14: repeating step 9 to step 13 until all reduction task executions finish, mapping-reduction work terminates.
It shuffles the data cached replacement policy of caching system:
Due to the memory resource limitation of each node, performance when in order to not influence task execution, caching system of shuffling is only Fixed memory headroom (can be arranged by configuration file) can be occupied.It, will be active and standby however as the continuous execution of task The memory cache of part node is largely shuffled reduction data.In order to save memory source, caching system of shuffling provides slow earliest Deposit the strategy that task is removed at first.The strategy follows following steps.
Step 1: it is insufficient to memory surplus that caching system of shuffling executes nodal test.
Step 2: the backup node sends to caching system host of shuffling and rejects request.
Step 3: after caching system host of shuffling receives rejecting request, being recorded according to local memory, find and be buffered earliest The mapping-corresponding transmission of shuffling of work of shuffling rely on the master backup nodes of ID and all reduction tasks of the work.
Step 4: the transmission of shuffling is relied on the master that ID is broadcast to all caching systems of shuffling in cluster by caching system of shuffling Backup node.
Step 5: executing node and receive transmission of shuffling and rely on corresponding data block after ID from the memory of oneself from interior Deposit middle deletion.
Shuffling and caching the decorum is the recovery policy that the robustness that Computational frame provides is supported.
Since mapping-reduction Computational frame contains a large amount of mapping-reduction mistake when executing entire workflow Journey.If having cooperated caching system of shuffling, Computational frame will not need to carry out manually checkpointing to calculating data.It is in office Fail in business implementation procedure, then directly can directly restore from the data of recent mapping-reduction, greatly reduce Recovery time, improve calculated performance.The strategy follows following steps.
Step 1: there is run-time error in Computational frame.
Step 2: Computational frame executes logic according to user and starts to find the data persisted recently from back to front.
Step 3: whether having data of shuffling accordingly standby to caching system inquiry of shuffling by interface when Computational frame is searched Part.
Step 4: directly restoring since the step if having found backup.
Step 5: if not finding backup, continuing to find forward, if all do not backed up, according to Computational frame Fault tolerant mechanism starts to restore.
Compared with prior art, the beneficial effects of the present invention are: being able to ascend based on mapping-reduction model distribution The calculated performance for calculating frame (such as Spark, Hadoop) avoids the manual Checkpointing of inefficient user, promotes distributed computing The robustness of frame.
Detailed description of the invention
Fig. 1 configuration diagram
Fig. 2 mapping tasks operation schematic diagram
Fig. 3 host node reduction task execution schematic diagram
Fig. 4 is from node reduction task execution schematic diagram
Fig. 5 task division information
Fig. 6, which shuffles, caches host traceback information
Specific implementation method
It elaborates below with reference to attached drawing to the embodiment of the present invention.The present embodiment is in technical solution of the present invention and calculation Implemented under the premise of method, and provide detailed embodiment and specific operation process, but is applicable in platform and be not limited to following realities Apply example.The small-sized cluster that the concrete operations platform of this example is made of two common servers is equipped on each server 64 bit of UbuntuServer 14.04.1 LTS, and it is equipped with 8GB memory.Specific exploitation of the invention is based on Apache As explanation, other mappings such as Hadoop-reduction distributed computing framework is equally applicable for the source code version of Spark 1.6.It is first It first needs to make it through the interface of this method by the source code for modifying Spark to transmit data of shuffling.
The present invention modifies few portion of distributed computing framework by disposing caching system in distributed computing cluster Divide code, realizes and the interface of this method is called, the distributed prepare more part of Lai Shixian for data of shuffling in mapping-specification calculating Memory/disk buffering.Under the support of this method, it can be promoted existing based on mapping-reduction model distributed computing framework Performance and robustness.It is designed based on the framework in Fig. 1, uses Paxos agreement in the host for caching system of shuffling to tie up The consistency and robustness of protecting system state itself.Meanwhile it deploying to shuffle on each node of distributed computing cluster and hold Row device caches to be responsible for the transmission of specific data of shuffling and provides interface for the progress of work of distributed computing framework.It is reflecting Data of the workflow in stage being penetrated as shown in Fig. 2, distributed computing framework will shuffle are transferred to local execution of shuffling by interface Then the memory of device divides and selects backup to save by shuffling actuator according to parameters such as the mapping-reduction work number of tasks Point, and Backup Data.In reduction stages, distributed computing framework then passes through interface directly to local actuator request of shuffling Data.In the case where host node work, memory of the data from actuator of shuffling, as shown in Figure 3.If host node loses It loses, task can be scheduled for from node, read data from from the hard disk of node, as shown in Figure 4.
It is please configuration diagram referring initially to Fig. 1, Fig. 1, as shown, general frame of the invention is typical master-slave mode frame Structure, host are made of a working host and two backup hosts.They guarantee the consistency of state by Paxos agreement, from And whole system caused by collapsing as working host is avoided to collapse.In addition, all deployment is shuffled from every server of node Cache actuator.It needs to dispose modified Spark Computational frame in the same cluster simultaneously.
When Spark Computational frame is started to work, once there is the task comprising transmission of shuffling to be submitted by user, caching of shuffling System will be into work step described in people's claims, to provide the acceleration and robustness branch of transmission of shuffling for Spark It holds.And whole process is fully transparent for the user of Spark.
Since to take full advantage of memory data cached for the present embodiment.At the end of mapping tasks, reduction task can be straight It connects and reads required data in the local memory for cache actuator of shuffling, to accelerate the speed of entire distributed computing Degree.
Meanwhile when step a certain in the operation of Spark generation mistake, when needing to restore, it can find and be shuffled with forward recursion The data of caching system caching, then start to restore, to accelerate resume speed, promote the robustness of whole system.
It is a kind of based on mapping-reduction computation model data cache method of shuffling, include the following steps:
Computational frame and caching system cooperation operational process of shuffling:
Step 1: a mapping-reduction work is passed through interface by the division that task is unit by mapping-reduction Computational frame It is sent to shuffle and caches host.Here task contains the ID that transmission of shuffling relies on, and mapping tasks sum and reduction task are total Number, after host receives, in addition timestamp is stored in local memory.As shown in Figure 5.
Step 2: shuffle cache host receive task divide data after, using random algorithm by reduction task therein with Each node of cluster does the mapping an of a pair three.That is a reduction task corresponds to random three nodes, based on one of them Node is wanted, being left two is backup node.The mapping of reduction task and node is stored in the memory of host in the form of Hash table In, while host can stamp timestamp to respective record.Specific preservation information is as shown in Figure 6.
Step 3: Computational frame dispatches one of node and executes a mapping tasks.The node has executed mapping tasks After calculating process, held by calling the interface of caching system that the data of shuffling of the mapping tasks are sent to local caching of shuffling The memory headroom of row device process.It returns immediately simultaneously, indicates that task execution is completed.
Step 4: when the actuator process of the caching system on a node receive mapping tasks shuffle data when, can be by According to the division mode (being specified by Computational frame) for data default of shuffling, data are divided into multiple reduction of shuffling according to reduction task Data block saves in memory.A usual mapping tasks can generate it is identical as reduction task number or be less than reduction task The data block of number.
Step 5: actuator to shuffle cache host request reduction task and node mapping table and Fig. 6 represented by letter Breath.(step can only execute primary in entire mapping-reduction work).Mapping table ensure that the distribution rules of all actuators It is consistent.
Step 6: actuator is shuffled according to the reduction task of host and the mapping table of node by what division in step 4 finished Reduction data block is distributed to corresponding three reduction task remote node.
Step 7: actuator transmission shuffle reduction data block when, according to the setting of main and subordinate node in step 2 can give respectively number The label of master-slave back-up is added according to block.
Step 8: remote node receives the label that the data block is read when shuffling reduction data block.If the label is shown as Master backup then saves it in memory, is written into hard disk if fruit is from backup.
Step 9: the process that step 3 arrives step 8 is repeated, until all mapping tasks of the work are finished, into step Rapid 10.
Step 10: the distribution feelings of all reduction tasks of interface polls of the Computational frame before scheduling through caching system of shuffling Condition.
Step 11: Computational frame dispatches reduction task according to the distribution situation of reduction task.Computational frame chooses it first In master backup node, a reduction task is distributed on the node.If master backup node failure, 12 are entered step, it is no Then enter step 13.
Step 12: Computational frame selects to send reduction task on the node from backup node.If two from backup Node fails simultaneously, then the mission failure, mistake of dishing out.Terminate all steps.
Step 13: reduction task caches actuator to local shuffling by interface and obtains data when executing on node.
Step 14: local shuffle caches after actuator receives request, whether in memory data is first checked for, as including Corresponding data is then directly returned in depositing, otherwise degaussing, which is examined and seized, takes corresponding data and return.
Step 15: reduction task starts to calculate after receiving data.
Step 16: repeating step 11 to step 15 until all reduction task executions finish, enter step 17.
Step 17: the mapping-reduction work terminates.
It shuffles the data cached replacement policy of caching system:
Due to the memory resource limitation of each node, performance when in order to not influence task execution, caching system of shuffling is only Fixed memory headroom (can be arranged by configuration file) can be occupied.It, will be active and standby however as the continuous execution of task The memory cache of part node is largely shuffled reduction data.In order to save memory source, caching system of shuffling provides slow earliest Deposit the strategy that task is removed at first.The strategy follows following steps.
Step 1: it is insufficient to memory surplus that caching system of shuffling executes nodal test.
Step 2: the backup node sends to caching system host of shuffling and rejects request.
Step 3: after caching system host of shuffling receives rejecting request, being recorded according to local memory, find and be buffered earliest The mapping-corresponding transmission of shuffling of work of shuffling rely on the master backup nodes of ID and all reduction tasks of the work.
Step 4: the transmission of shuffling is relied on the master that ID is broadcast to all caching systems of shuffling in cluster by caching system of shuffling Backup node.
Step 5: executing node and receive transmission of shuffling and rely on corresponding data block after ID from the memory of oneself from interior Deposit middle deletion.
Shuffling and caching the decorum is the recovery policy that the robustness that Computational frame provides is supported.
Since mapping-reduction Computational frame contains a large amount of mapping-reduction mistake when executing entire workflow Journey.If having cooperated caching system of shuffling, Computational frame will not need to carry out manually checkpointing to calculating data.It is in office Fail in business implementation procedure, then directly can directly restore from the data of recent mapping-reduction, greatly reduce Recovery time, improve calculated performance.The strategy follows following steps.
Step 1: there is run-time error in Computational frame.
Step 2: Computational frame executes logic according to user and starts to find the data persisted recently from back to front.
Step 3: whether having data of shuffling accordingly standby to caching system inquiry of shuffling by interface when Computational frame is searched Part.
Step 4: directly restoring since the step if having found backup.
Step 5: if not finding backup, continuing to find forward, if all do not backed up, according to Computational frame Fault tolerant mechanism starts to restore.
By the benchmark program of the correlation Spark such as Word Count on the basis of this embodiment, this is demonstrated The correctness of invention, while the present invention has not in different benchmark programs in performance compared to the Spark of master With the promotion of degree.
The preferred embodiment of the present invention has been described in detail above.It should be appreciated that the ordinary skill of this field is without wound The property made labour, which according to the present invention can conceive, makes many modifications and variations.Therefore, all technician in the art Pass through the available technology of logical analysis, reasoning, or a limited experiment on the basis of existing technology under this invention's idea Scheme, all should be within the scope of protection determined by the claims.

Claims (3)

1. a kind of based on mapping-reduction computation model data cache method of shuffling, which is characterized in that this method includes following step It is rapid:
Step 1: mapping-reduction Computational frame is sent a mapping-reduction work by the division that task is unit by interface Cache host to shuffling, shuffle cache host receive task divide data after, in addition timestamp is stored in local memory;
Step 2: shuffling caches host and divides data using random algorithm for each of reduction task therein and cluster to task Node does the mapping an of a pair three, and is stored in the memory for caching host of shuffling in the form of Hash table, and a reduction is appointed Corresponding random three nodes of business, one of them is main node, and being left two is backup node;Step 3: Computational frame dispatches it In node execute a mapping tasks, should by the interface of calling caching system after which has executed mapping tasks The data of shuffling of mapping tasks are sent to local shuffle and cache the memory headroom of actuator process, return simultaneously, expression task is held Row is completed;
Step 4: when the actuator process of the caching system on a node receive mapping tasks shuffle data when, according to shuffling Data are divided into multiple reduction data blocks of shuffling according to reduction task, saved in memory by the division mode of data default;
Step 5: local shuffle caches actuator to the mapping table for caching host request reduction task and node of shuffling, and according to washing Board caches the reduction task of host and the mapping table of node, will be divided in step 4 the reduction data block of shuffling that finishes be distributed to Corresponding three reduction task remote node, and give data respectively according to the setting of main node in step 2 and backup node Block adds master backup and the label from backup;
Step 6: remote node receives the label that the data block is read when shuffling reduction data block, if the label is shown as active and standby Part then saves it in memory, is written into hard disk if the label is shown as from backup;If master backup node at this time Memory headroom it is insufficient, then the data of shuffling that can trigger caching system of shuffling reject step;7 are entered step simultaneously;
Step 7: repeating the process that step 3 arrives step 6 until all mapping tasks of the work are finished and enter step 8;
Step 8: the distribution situation of all reduction tasks of interface polls of the Computational frame before scheduling through caching system of shuffling;
Step 9: Computational frame dispatches reduction task according to the distribution situation of reduction task: choosing master backup section therein first One reduction task is distributed on the node by point, if master backup node failure, is entered step 10, is otherwise entered step 11;
Step 10: Computational frame selects to send reduction task on the node from backup node, if two from backup node It fails simultaneously, then the mission failure, mistake of dishing out terminates all steps;
Step 11: reduction task is shuffled to local by interface when executing on node and caches actuator acquisition data;
Step 12: local shuffle caches after actuator receives request, whether in memory to first check for data, such as in memory Corresponding data directly then is returned to the task, otherwise degaussing, which is examined and seized, takes corresponding data and return;
Step 13: reduction task starts to calculate after receiving data;
Step 14: repeating step 9 to step 13 until all reduction task executions finish, mapping-reduction work terminates.
2. according to claim 1 based on mapping-reduction computation model data cache method of shuffling, which is characterized in that The data of shuffling reject step, specific as follows:
Step 1: caching system of shuffling executes node, i.e. master backup nodal test is insufficient to memory surplus;
Step 2: the backup node sends to caching system host of shuffling and rejects request;
Step 3: shuffling and cache after host receives rejecting request, recorded according to local memory, find earliest buffered mapping-and wash The corresponding transmission of shuffling of board work relies on the master backup node of ID and all reduction tasks of the work;
Step 4: the transmission of shuffling is relied on the master backup that ID is broadcast to all caching systems of shuffling in cluster by caching system of shuffling Node;
Step 5: execute node receive shuffle transmission rely on ID after from the memory of oneself by corresponding data block from memory It deletes.
3. according to claim 1 based on mapping-reduction computation model data cache method of shuffling, which is characterized in that When run-time error occurs in Computational frame, caching system of shuffling provides the recovery policy of robustness support for Computational frame, specifically such as Under:
Step 1: Computational frame executes logic according to user and starts to find the data persisted recently from back to front;
Step 2: whether having data backup of shuffling accordingly to caching system inquiry of shuffling by interface when Computational frame is searched: such as Fruit has found backup and has then directly restored since the step;
If not finding backup, continue to find forward, until confirmation is not all backed up, then according to the fault-tolerant machine of Computational frame System starts to restore.
CN201610712705.5A 2016-08-24 2016-08-24 It is a kind of based on mapping-reduction computation model data cache method of shuffling Active CN106371919B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610712705.5A CN106371919B (en) 2016-08-24 2016-08-24 It is a kind of based on mapping-reduction computation model data cache method of shuffling

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610712705.5A CN106371919B (en) 2016-08-24 2016-08-24 It is a kind of based on mapping-reduction computation model data cache method of shuffling

Publications (2)

Publication Number Publication Date
CN106371919A CN106371919A (en) 2017-02-01
CN106371919B true CN106371919B (en) 2019-07-16

Family

ID=57878112

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610712705.5A Active CN106371919B (en) 2016-08-24 2016-08-24 It is a kind of based on mapping-reduction computation model data cache method of shuffling

Country Status (1)

Country Link
CN (1) CN106371919B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11061609B2 (en) * 2018-08-02 2021-07-13 MemVerge, Inc Distributed memory object method and system enabling memory-speed data access in a distributed environment
CN110690991B (en) * 2019-09-10 2021-03-19 无锡江南计算技术研究所 Non-blocking network reduction computing device and method based on logic tree
CN115203133A (en) * 2021-04-14 2022-10-18 华为技术有限公司 Data processing method and device, reduction server and mapping server

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105718244A (en) * 2016-01-18 2016-06-29 上海交通大学 Streamline data shuffle Spark task scheduling and executing method
CN105760215A (en) * 2014-12-17 2016-07-13 南京绿云信息技术有限公司 Map-reduce model based job running method for distributed file system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140059552A1 (en) * 2012-08-24 2014-02-27 International Business Machines Corporation Transparent efficiency for in-memory execution of map reduce job sequences
US9389994B2 (en) * 2013-11-26 2016-07-12 International Business Machines Corporation Optimization of map-reduce shuffle performance through shuffler I/O pipeline actions and planning

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760215A (en) * 2014-12-17 2016-07-13 南京绿云信息技术有限公司 Map-reduce model based job running method for distributed file system
CN105718244A (en) * 2016-01-18 2016-06-29 上海交通大学 Streamline data shuffle Spark task scheduling and executing method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Characterization and Optimization of Memory-Resident MapReduce on HPC Systems;Yandong Wang 等;《2014 IEEE 28th International Parallel and Distributed Processing Symposium》;20140523;799-808

Also Published As

Publication number Publication date
CN106371919A (en) 2017-02-01

Similar Documents

Publication Publication Date Title
US11163479B2 (en) Replicated state cluster with standby node state assessment during leadership transition
Almeida et al. ChainReaction: a causal+ consistent datastore based on chain replication
US9372908B2 (en) Merging an out of synchronization indicator and a change recording indicator in response to a failure in consistency group formation
Lorch et al. The SMART way to migrate replicated stateful services
CN103647669B (en) It is a kind of to ensure the conforming system and method for distributed data processing
US7987158B2 (en) Method, system and article of manufacture for metadata replication and restoration
CN105871603B (en) A kind of the real time streaming data processing fail recovery and method of data grids based on memory
US20170351584A1 (en) Managing a Redundant Computerized Database Using a Replicated Database Cache
CN105159818A (en) Log recovery method in memory data management and log recovery simulation system in memory data management
US20050223275A1 (en) Performance data access
US20090063807A1 (en) Data redistribution in shared nothing architecture
US20230127166A1 (en) Methods and systems for power failure resistance for a distributed storage system
US20170199760A1 (en) Multi-transactional system using transactional memory logs
CN106371919B (en) It is a kind of based on mapping-reduction computation model data cache method of shuffling
CN111475480A (en) Log processing method and system
Wang et al. BeTL: MapReduce checkpoint tactics beneath the task level
CN109783578A (en) Method for reading data, device, electronic equipment and storage medium
US11188516B2 (en) Providing consistent database recovery after database failure for distributed databases with non-durable storage leveraging background synchronization point
AU2015336250C1 (en) Recovery and fault-tolerance under computational indeterminism
US10929238B2 (en) Management of changed-block bitmaps
EP3377970B1 (en) Multi-version removal manager
US20140149697A1 (en) Memory Pre-Allocation For Cleanup and Rollback Operations
WO2023274409A1 (en) Method for executing transaction in blockchain system and blockchain node
CN105988885A (en) Compensation rollback-based operation system fault self-recovery method
US12001296B1 (en) Continuous lock-minimal checkpointing and recovery with a distributed log-based datastore

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant