A sub-problem of paramount importance in super-resolution is the generation of an upsampled image (or frame) that is ‘sharp’. In deblurring, the core problem itself is of removing the blur, and it is equivalent to the problem of generating a ‘sharper’ version of the given image. This sharpness in the generated image comes by accurately predicting the high-frequency details (commonly referred to as fine-details) such as object edges. Thus high-frequency prediction is a vital sub-problem in super-resolution and a core problem in deblurring. To generate a sharp upsampled or deblurred image, this paper proposes a multi-stage neural network architecture ‘HFR-Net’ that works on the principle of ‘explicit refinement and fusion of high-frequency details’. To implement this principle, HFR-Net is trained with a novel 2-phase progressive–retrogressive training method. In addition to the training method, this paper also introduces dual motion warping with attention. It is a technique that is specifically designed to handle videos that have different rates of motion. Results obtained from extensive experiments on multiple super-resolution and deblurring datasets reveal that the proposed approach gives better results than the current state-of-the-art techniques. © 2020 Elsevier Inc.