Improved tail bounds for missing mass and confidence intervals for Good-Turing estimator
Published in Institute of Electrical and Electronics Engineers Inc.
2019
Abstract
The missing mass of a sequence is defined as the total probability of the elements that do not appear in the sequence. The popular Good-Turing estimator for missing mass has been used extensively in language modeling and ecological studies. Exponential tail bounds have been known for missing mass, and improving them results in better confidence in estimation. In this work, we first show that missing mass is sub-Gamma on the right tail with the best-possible variance parameter under the Poisson and multinomial sampling models. This results in a right tail bound that beats the previously best known tail bound for deviations from the mean up to about 0.2785. Further, we show that the sub-Gaussian approach cannot result in any improvement in the right tail bound for Poisson sampling. We derive confidence intervals for the Good-Turing estimator with better confidence levels and narrower width when compared to existing ones. Our results are worst case over all distributions. © 2019 IEEE.
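As an illustration of the two quantities named in the abstract, the following is a minimal sketch, not taken from the paper. It assumes the standard form of the Good-Turing estimate of missing mass (the number of symbols seen exactly once, divided by the sample size). The function names, the toy distribution `probs`, and the toy `sample` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def missing_mass(probs, sample):
    """True missing mass: total probability of symbols absent from the sample.

    `probs` maps each symbol to its probability (assumed known here, which is
    only possible in a simulation; in practice it is what we try to estimate).
    """
    seen = set(sample)
    return sum(p for symbol, p in probs.items() if symbol not in seen)


def good_turing_missing_mass(sample):
    """Good-Turing estimate: (number of symbols seen exactly once) / (sample size)."""
    if not sample:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    singletons = sum(1 for c in counts.values() if c == 1)
    return singletons / len(sample)


# Hypothetical toy distribution and sample, for illustration only.
probs = {"a": 0.5, "b": 0.3, "c": 0.1, "d": 0.1}
sample = ["a", "b", "a", "c", "a", "b"]

true_value = missing_mass(probs, sample)       # only "d" is unseen
estimate = good_turing_missing_mass(sample)    # only "c" appears exactly once
print(true_value, estimate)
```

The estimator uses only the observed sample, whereas the true missing mass depends on the unknown distribution. The tail bounds and confidence intervals described in the abstract concern how far such quantities can deviate; they are not reproduced in this sketch.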
About the journal
Journal: 25th National Conference on Communications, NCC 2019
Publisher: Institute of Electrical and Electronics Engineers Inc.
Open Access: No
Concepts (9)
  •  Confidence interval
  •  Confidence levels
  •  Ecological studies
  •  Exponential tail
  •  Language model
  •  Poisson sampling
  •  Sampling model
  •  Total probabilities
  •  Modeling languages