Device-to-Device (D2D) communication enables direct communication between two proximal users and helps offload network traffic. It is considered one of the most promising 3GPP Long Term Evolution (LTE)-Advanced technologies for improving the spectrum and energy efficiency of Proximity-based Services (ProSe). However, because D2D transmitters interfere with the primary cellular users, intelligent resource allocation techniques are needed to improve overall system performance. Overlay D2D communication avoids interference between cellular and D2D users by dedicating resources to D2D users. Even with this approach, efficient use of the dedicated Physical Resource Blocks (PRBs) among D2D users remains a concern, and achieving high spectral efficiency requires allocating these resources efficiently among the D2D users. We address this problem for the first time and propose a PRB allocation scheme that maximizes PRB utilization among D2D users. The proposed scheme spreads the PRB requirement of a D2D pair over multiple PRBs while reducing the transmit power over each PRB, thereby reducing the overall interference and improving the spectral efficiency of the network. Extensive simulations demonstrate that our scheme outperforms the conventional PRB allocation scheme, which assigns PRBs based on data rate demand, by increasing the overall throughput and energy efficiency of the network. To evaluate the proposed PRB allocation scheme analytically, we present a stochastic geometry based framework for analyzing the coverage and average rate of the network, and conclude that the proposed scheme improves the link SINR of the D2D pairs, which results in higher coverage and overall system rate. Finally, we propose a Q-Learning algorithm that learns the optimum number of PRBs to dedicate to overlay D2D communication based on the PRB demand of the cellular and D2D users.
Extensive simulations reveal that the proposed algorithm allocates PRBs to the D2D pairs more efficiently than the conventional static scheme of dedicating a fixed number of PRBs. © 2017 Elsevier B.V.
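The intuition behind spreading a D2D pair's demand over several PRBs at lower per-PRB power can be illustrated with a small sketch. It is not the model used in the paper: it assumes an interference-free Shannon-capacity link, an even split of the target rate across PRBs, and the standard 180 kHz LTE PRB bandwidth. The function name `total_tx_power` and the channel gain and noise values are hypothetical, chosen only for illustration.

```python
PRB_BANDWIDTH_HZ = 180e3  # one LTE PRB: 12 subcarriers x 15 kHz


def total_tx_power(rate_bps, n_prbs, channel_gain, noise_psd_w_per_hz):
    """Total transmit power (W) needed to carry rate_bps over n_prbs PRBs.

    Assumes an interference-free Shannon link and an even rate split,
    which is a simplification and not the paper's system model.
    """
    rate_per_prb = rate_bps / n_prbs
    # Invert C = B * log2(1 + SNR) to get the SNR each PRB must reach.
    snr_required = 2 ** (rate_per_prb / PRB_BANDWIDTH_HZ) - 1
    noise_power = noise_psd_w_per_hz * PRB_BANDWIDTH_HZ
    power_per_prb = snr_required * noise_power / channel_gain
    return n_prbs * power_per_prb


if __name__ == "__main__":
    # Illustrative values only: 1 Mbit/s demand, -90 dB link gain,
    # thermal noise density of roughly -174 dBm/Hz.
    for n in (1, 2, 4):
        total = total_tx_power(1e6, n, 1e-9, 3.98e-21)
        print(f"{n} PRB(s): total {total:.3e} W, per PRB {total / n:.3e} W")
```

Under these assumptions, both the per-PRB power and the total power fall as the same rate is spread over more PRBs, because the required SNR grows exponentially with the rate carried on each PRB. Lower per-PRB power in turn means less interference to other D2D pairs sharing that PRB, which is the effect the proposed scheme exploits.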