Image annotation is the task of assigning semantically relevant tags to an image. Many machine learning algorithms for image annotation rely on features such as color, texture, and shape, and their success depends on carefully handcrafted features. Deep learning models instead use multiple layers of processing to learn abstract, high-level representations from raw data. Deep belief networks, among the most commonly used deep learning models, are formed by pre-training individual Restricted Boltzmann Machines in a layer-wise fashion, stacking them together, and then training the stack using error back-propagation. However, training such a model is time-consuming. To reduce training time, models have been proposed that eliminate back-propagation by using convex optimization and the kernel trick to obtain a closed-form solution for the connection weights. In this paper, we explore two such models, the Tensor Deep Stacking Network and the Kernel Deep Convex Network, for the task of automatic image annotation. We use a deep convolutional network to extract high-level features from different sub-regions of the images and then use these features as inputs to these models. The performance of the proposed approach is evaluated on benchmark image datasets. © Springer International Publishing Switzerland 2015.