Chisel: A resource savvy approach for handling skew in MapReduce applications
Published in
2013
Pages: 652 - 660
Abstract
Skew mitigation has been a major concern in distributed programming frameworks such as MapReduce, and it is becoming more prominent as user requirements and the computation involved grow more complex. We present Chisel, a self-regulating skew detection and mitigation policy for MapReduce applications. The novelty of the approach is that it requires no scanning or sampling of input data to detect skew; it therefore incurs low overhead, provides better resource utilization, and maintains output order and file structure. It is also transparent to users and can be used as a plugin whenever required. We implement our skew handling policies in Hadoop. Chisel implements two skew handling policies. For map operators, it performs late skew detection, i.e., at the last wave of map execution, where skewed maps are selected on the basis of their remaining time to complete; more maps are then created dynamically over the remaining data in each block. For the reduce operator, it performs early skew detection, i.e., before the shuffle phase starts. This prevents the expensive shuffle and sort phases from delaying skew detection and job completion. Multiple reducers are created per skewed partition; each shuffles data from a subset of the maps and starts processing as soon as its portion of the maps has finished, without waiting for all maps to complete. Therefore, the barrier between the map and reduce phases no longer constrains effective resource utilization. Chisel additionally implements an online job profiler to determine the start point of reduce tasks and modifies the capacity scheduler to distribute reduce tasks evenly across the cluster. Chisel significantly decreases the overall execution time of jobs and increases resource utilization. The improvement depends directly on the availability of resources in the cluster and the skewness of the job. © 2013 IEEE.
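The two policies described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' Hadoop implementation; it only mirrors the ideas stated above. The abstract does not say how remaining time is estimated, what threshold marks a map as skewed, or how maps are grouped per reducer, so the rate-based estimate, the `threshold_s` parameter, the equal-sized byte ranges, and the contiguous grouping of maps are all assumptions, and every name below is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MapTask:
    task_id: str
    block_bytes: int      # total bytes in the task's input block
    processed_bytes: int  # bytes consumed so far
    elapsed_s: float      # seconds since the task started

    def remaining_s(self) -> float:
        """Estimate time to finish from the observed rate (an assumption)."""
        if self.processed_bytes == 0 or self.elapsed_s <= 0:
            return float("inf")
        rate = self.processed_bytes / self.elapsed_s
        return (self.block_bytes - self.processed_bytes) / rate


def select_skewed_maps(last_wave: List[MapTask], threshold_s: float) -> List[MapTask]:
    """Late detection: among last-wave maps, pick those whose estimated
    remaining time exceeds a (hypothetical) threshold."""
    return [t for t in last_wave if t.remaining_s() > threshold_s]


def split_remaining(task: MapTask, parts: int) -> List[Tuple[int, int]]:
    """Split the unprocessed part of a block into `parts` contiguous
    (offset, length) ranges, one per dynamically created map."""
    offset = task.processed_bytes
    base, extra = divmod(task.block_bytes - offset, parts)
    ranges = []
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        ranges.append((offset, length))
        offset += length
    return ranges


def assign_maps_to_reducers(map_ids: List[str], reducers: int) -> List[List[str]]:
    """Give each reducer of a skewed partition a contiguous subset of maps,
    so it can start once only its own subset has finished."""
    base, extra = divmod(len(map_ids), reducers)
    groups, start = [], 0
    for i in range(reducers):
        size = base + (1 if i < extra else 0)
        groups.append(map_ids[start:start + size])
        start += size
    return groups


if __name__ == "__main__":
    fast = MapTask("m1", block_bytes=100, processed_bytes=90, elapsed_s=9.0)
    slow = MapTask("m2", block_bytes=100, processed_bytes=20, elapsed_s=10.0)
    for task in select_skewed_maps([fast, slow], threshold_s=10.0):
        print(task.task_id, split_remaining(task, parts=3))
    print(assign_maps_to_reducers(["m0", "m1", "m2", "m3", "m4"], reducers=2))
```

In this sketch a reducer only depends on its own group of maps, which is one way to read the abstract's claim that the map/reduce barrier stops limiting resource utilization.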
About the journal
Journal: IEEE International Conference on Cloud Computing, CLOUD
ISSN: 2159-6182
Open Access: No
Concepts (12)
  •  Distributed programming
  •  FILE STRUCTURE
  •  MAP-REDUCE
  •  MITIGATION POLICIES
  •  OVERALL EXECUTION
  •  Resource utilizations
  •  SKEW HANDLING
  •  User requirements
  •  Cloud computing
  •  Resource allocation
  •  Scheduling
  •  Tools