Context is an important aspect of accurate saliency detection. However, how to formally model image context within saliency detection frameworks is still an open problem. Recent saliency detection models use complex deep neural networks to extract robust features, but they often fail to select the right contextual features. These methods generally rely on the physical attributes of objects to generate the final saliency maps and ignore scene contextual information. In this paper, we overcome this limitation in two ways. (i) We propose a novel end-to-end framework with a Contextual Unit (CTU) module that models scene contextual information with the help of a Convolutional GRU (Conv-GRU) to give efficient saliency maps. This is the first work reported so far that utilizes a Conv-GRU to generate image saliency maps. (ii) In addition, we propose a novel way of using the Conv-GRU that helps to refine saliency maps based on the context of the input image. The proposed model has been evaluated on challenging benchmark saliency datasets, where it outperforms prominent state-of-the-art methods. © 2019, Springer Nature Switzerland AG.