Herding behavior is the collective, unwilling movement of a group that is led by one or more individuals in a single direction toward a specified target. In this work, we focused on a specific kind of herding behavior, that of a flock of sheep being led by a dog, which we call the shepherding task. A heuristic model of the shepherding task was developed by Strombom et al., in which the interaction rules between the dog and the sheep, and among individual sheep, were identified. Although the heuristic model explained the behaviors of the dog and the sheep, it did not explain how these behaviors were learned. We therefore proposed a reinforcement learning model of the shepherding task based on the existing heuristic model. This study reconstructed the shepherding task using SARSA, an on-policy temporal-difference algorithm for learning a policy in reinforcement learning. Results showed that, with discretized state and action spaces, the dog successfully herded a flock of sheep to the target position by first learning to reach a subgoal. A reward was given when the dog reached the neighborhood of a subgoal, while a penalty was incurred each time the shepherding task was not completed. Finally, we demonstrated an example of a completed shepherding task in which the agent succeeded consistently after the 350th episode.
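
To make the learning rule concrete, the following is a minimal sketch of tabular SARSA on a toy grid, not the shepherding environment used in this study. The grid size, the subgoal cell, the reward of +1 for reaching it, the per-step penalty of -0.01, and the values of `ALPHA`, `GAMMA`, and `EPSILON` are all illustrative assumptions, as are the function names. No sheep are modeled: a single agent standing in for the dog learns only to reach a subgoal cell in a discretized space.

```python
import random

# All constants below are hypothetical and chosen only for illustration.
N = 5                                          # toy grid is N x N cells
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]   # four discrete moves
SUBGOAL = (4, 4)                               # assumed subgoal cell
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1          # step size, discount, exploration


def step(state, action):
    """Move one cell (clamped to the grid); return (next_state, reward, done)."""
    dx, dy = ACTIONS[action]
    x = min(max(state[0] + dx, 0), N - 1)
    y = min(max(state[1] + dy, 0), N - 1)
    nxt = (x, y)
    done = nxt == SUBGOAL
    # Assumed reward scheme: +1 at the subgoal, small penalty on every other step.
    return nxt, (1.0 if done else -0.01), done


def choose(q, state, rng, epsilon=EPSILON):
    """Epsilon-greedy action selection over the Q-table."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    values = [q[(state, a)] for a in range(len(ACTIONS))]
    return values.index(max(values))


def sarsa_update(q, s, a, r, s2, a2, done):
    """On-policy TD update: the target uses the action a2 actually chosen in s2."""
    target = r if done else r + GAMMA * q[(s2, a2)]
    q[(s, a)] += ALPHA * (target - q[(s, a)])


def train(episodes=500, max_steps=100, seed=0):
    """Run SARSA from a fixed start cell and return the learned Q-table."""
    rng = random.Random(seed)
    q = {((x, y), a): 0.0
         for x in range(N) for y in range(N) for a in range(len(ACTIONS))}
    for _ in range(episodes):
        s = (0, 0)
        a = choose(q, s, rng)
        for _ in range(max_steps):
            s2, r, done = step(s, a)
            a2 = choose(q, s2, rng)
            sarsa_update(q, s, a, r, s2, a2, done)
            s, a = s2, a2
            if done:
                break
    return q
```

Calling `train()` returns a dictionary keyed by `(state, action)` pairs. The distinguishing feature of SARSA is visible in `sarsa_update`: the bootstrap term uses the value of the next action that the epsilon-greedy policy actually selected, rather than the maximum over actions as in Q-learning.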