When an automatic speech recognition (ASR) system is deployed in real-world applications, it often receives only one utterance at a time for decoding. Depending on the ASR task, this single utterance may be short. In such cases, robustly estimating speaker normalization parameters, such as feature-space maximum likelihood linear regression (FMLLR) transforms and i-vectors, may not be feasible. In this paper, we propose two unsupervised speaker normalization techniques, one at the feature level and the other at the model level of acoustic modeling, to overcome the drawbacks of FMLLR and i-vectors in real-time scenarios. At the feature level, we propose using a deep neural network (DNN) to generate pseudo-FMLLR features, trained on time-synchronous pairs of filterbank and FMLLR features. These pseudo-FMLLR features can then be used for DNN acoustic model training and decoding. At the model level, we propose a generalized distillation framework in which a teacher DNN trained on FMLLR features guides the training and optimization of a student DNN trained on filterbank features. In both proposed methods, the ambiguity in choosing the speaker-specific FMLLR transform can be reduced by appending i-vectors to the input filterbank features. Experiments conducted on 33-h and 110-h subsets of the Switchboard corpus show that, in real-time scenarios, the proposed methods provide significant gains over DNNs trained on FMLLR, i-vector-appended FMLLR, filterbank, and i-vector-appended filterbank features. © 2014 IEEE.
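To make the model-level idea concrete, the following is a minimal sketch of a teacher-student training objective in the commonly used generalized distillation form: a weighted sum of the cross-entropy against the hard senone labels and the cross-entropy against temperature-softened teacher posteriors. The abstract does not state the exact loss, so this formulation, the function names, and the `temperature` and `imitation` parameters are assumptions for illustration only. Here the teacher logits would come from a DNN fed FMLLR features, and the student logits from a DNN fed filterbank features for the same frames.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Row-wise softmax with an optional temperature (assumed hyperparameter)."""
    scaled = logits / temperature
    scaled = scaled - scaled.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum(axis=1, keepdims=True)


def distillation_loss(student_logits, teacher_logits, hard_labels,
                      temperature=2.0, imitation=0.5):
    """Hypothetical student objective for one mini-batch of frames.

    student_logits: (frames, classes) outputs of the filterbank-input student.
    teacher_logits: (frames, classes) outputs of the FMLLR-input teacher.
    hard_labels:    (frames,) integer class targets from a forced alignment.
    imitation:      weight on the teacher's soft targets (0 = ignore teacher).
    """
    log_student = np.log(softmax(student_logits) + 1e-12)
    soft_targets = softmax(teacher_logits, temperature)
    frames = student_logits.shape[0]
    # Cross-entropy against the hard alignment labels.
    hard_ce = -log_student[np.arange(frames), hard_labels].mean()
    # Cross-entropy against the softened teacher posteriors.
    soft_ce = -(soft_targets * log_student).sum(axis=1).mean()
    return (1.0 - imitation) * hard_ce + imitation * soft_ce
```

At decoding time only the student is evaluated, so under this sketch no FMLLR transform has to be estimated from the single incoming utterance; the teacher is needed during training only.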