A two-component mechanism to deal with the straggler problem in Hadoop

APICHANUKUL WORACHATE (1551201)


Hadoop, a well-known distributed computing platform, suffers from the straggler problem: some tasks of a job take unusually long to execute and delay the completion of the whole job. Such slow tasks are known as straggler tasks. In this work, we propose a two-component mechanism, consisting of a detection mechanism and a prevention mechanism, to deal with the problem. The detection mechanism builds on the speculative algorithm that Hadoop provides to detect straggler tasks. However, the default speculative algorithm performs poorly at classifying slow tasks. We therefore propose an enhanced version of the speculative algorithm, called Accuracy Improvement for Backup Task (AIBT), to identify straggler tasks accurately. Because the detection mechanism can only detect the problem after it has occurred, we also propose a prevention mechanism to protect Hadoop from the problem in the first place. We find that inefficient task distribution in existing scheduling algorithms is a cause of the problem: they distribute tasks such that some nodes are under-utilized while others are over-utilized and suffer from stragglers. The prevention mechanism is designed to distribute tasks so that they execute on suitable nodes. We evaluate both mechanisms through experiments in an actual environment under a comprehensive set of situations. Numerical results show that our prevention mechanism works well with our AIBT detection algorithm. Together, the two mechanisms not only relieve the straggler problem but also reduce task execution time and increase the amount of local execution compared with existing algorithms.