With the rapid increase in mobile data traffic (especially video traffic), the number of content access requests is also increasing. In this context, device-to-device (D2D) communication has emerged as an effective technology for increasing spectral efficiency and reducing load by offloading mobile data traffic in cellular networks. To reduce the load on eNodeBs (eNBs), a centralized area controller (CAC) has previously been proposed to take content-aware decisions for content access requests. In this paper, we propose to exploit the CAC in a distributed manner. A distributed D2D controller (DDC) is responsible for arbitrating and scheduling D2D transfers. Thus, every content access request from a user equipment (UE) is first served by a neighboring DDC, and only if the content is not available in the region is the request forwarded to the eNB. Further, we observe that including DDCs in the existing D2D architecture reference model requires the addition of new reference points, which we highlight in this paper. Owing to the spatiotemporal correlation in mobile data traffic, a learning algorithm can be employed to determine the number of resources required to fulfill the D2D data rate requirements. Hence, we propose a Q-learning based algorithm that learns the expected number of resource blocks (RBs) required to meet the data rate requirements at each DDC. In addition, we propose a cache content management policy that exploits content popularity to increase the chances of D2D communication. Extensive simulations show that the proposed Q-learning algorithm indeed learns to reserve a near-optimal number of RBs to serve the data rate requirement at each DDC. Further, we observe that the one- and two-hop modes of D2D transfer effectively reduce the load on the eNB, transferring up to 49% of the required data to the UEs.
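To make the abstract's Q-learning idea concrete, the following is a minimal sketch of how a DDC might learn an RB reservation policy. The state (previous observed demand), action space (number of RBs to reserve), reward shape, and demand trace are all illustrative assumptions for this sketch, not the paper's exact formulation.

```python
import random

N_RBS = 10          # action space: reserve 0..N_RBS resource blocks (assumption)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1   # learning rate, discount, exploration

def reward(reserved, demand):
    # Assumed reward: penalize unmet demand heavily, over-reservation lightly.
    if reserved < demand:
        return -10 * (demand - reserved)
    return -(reserved - demand)

def train(demands, episodes=2000, seed=0):
    """Tabular Q-learning: map the previously observed RB demand (state)
    to a number of RBs to reserve (action), exploiting the temporal
    correlation in the demand trace."""
    rng = random.Random(seed)
    Q = [[0.0] * (N_RBS + 1) for _ in range(N_RBS + 1)]
    for _ in range(episodes):
        state = demands[0]
        for t in range(1, len(demands)):
            # Epsilon-greedy action selection.
            if rng.random() < EPS:
                action = rng.randrange(N_RBS + 1)
            else:
                action = max(range(N_RBS + 1), key=lambda a: Q[state][a])
            next_state = demands[t]          # demand actually observed next
            r = reward(action, next_state)
            # Standard Q-learning update.
            Q[state][action] += ALPHA * (
                r + GAMMA * max(Q[next_state]) - Q[state][action]
            )
            state = next_state
    return Q

if __name__ == "__main__":
    # Toy, temporally correlated demand trace (assumption).
    trace = [3, 3, 4, 3, 3, 4, 3, 3] * 5
    Q = train(trace)
    best = max(range(N_RBS + 1), key=lambda a: Q[3][a])
    print("RBs reserved when previous demand was 3:", best)
```

With the assumed reward, the learned reservation after a demand of 3 hedges toward the occasional demand of 4, since unmet demand is penalized ten times more than an idle RB; the paper's actual reward and state design would determine this trade-off in practice.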