Header menu link for other important links
Missing Mass of Markov Chains
, Rajaraman N.
Published in Institute of Electrical and Electronics Engineers Inc.
Volume: 2020-June
Pages: 1207 - 1212
Estimation of missing mass or the total probability of unseen letters in a random sample is an important problem with several applications. The Good-Turing (GT) estimator is one of the most popular estimators for missing mass. The bias of the GT estimator is known to fall inversely with the number of samples when they are independent and identically distributed. When the samples form a Markov chain, very little is known about the convergence of the GT estimator even though it is widely used for smoothing in language models that are mostly Markovian. In this work, we initiate and make progress towards characterizing bias of the GT estimator for missing mass in Markov chains. We develop a useful 'multi-letter' characterization of the bias, which leads to sufficient conditions on the transition probability matrix for convergence of the bias of the GT estimator to zero. © 2020 IEEE.
About the journal
JournalData powered by TypesetIEEE International Symposium on Information Theory - Proceedings
PublisherData powered by TypesetInstitute of Electrical and Electronics Engineers Inc.
Open AccessNo