Poisson regression


  • Extensions: regularized Poisson regression. When estimating the parameters for Poisson regression, one typically tries to find values for θ that maximize the log-likelihood,
    an expression of the form $\sum_{i=1}^{m} \log p(y_i; e^{\theta^{\top} x_i})$, where m is the number of examples in the data set and $p(y_i; e^{\theta^{\top} x_i})$ is the
    probability mass function of the Poisson distribution with the mean set to $e^{\theta^{\top} x_i}$.

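  • Maximum likelihood-based parameter estimation. Given a set of parameters θ and an input vector x, the mean of the predicted Poisson distribution is given
    by $\lambda := \operatorname{E}(Y \mid x) = e^{\theta^{\top} x}$, and thus the Poisson distribution’s probability mass function is given by
    $p(y \mid x; \theta) = \frac{\lambda^{y}}{y!} e^{-\lambda} = \frac{e^{y \theta^{\top} x} \, e^{-e^{\theta^{\top} x}}}{y!}$. Now suppose we are given a data set consisting of
    m vectors $x_1, \ldots, x_m$, along with a set of m values $y_1, \ldots, y_m$.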

  • Thus, when given a Poisson regression model θ and an input vector x, the predicted mean of the associated Poisson distribution is given by
    $\operatorname{E}(Y \mid x) = e^{\theta^{\top} x}$. If $Y_i$ are independent observations with corresponding values $x_i$ of the predictor variables, then θ can be estimated
    by maximum likelihood.

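  • Then, for a given set of parameters θ, the probability of attaining this particular set of data is given by
    $p(y_1, \ldots, y_m \mid x_1, \ldots, x_m; \theta) = \prod_{i=1}^{m} \frac{e^{y_i \theta^{\top} x_i} \, e^{-e^{\theta^{\top} x_i}}}{y_i!}$. By the method of maximum likelihood,
    we wish to find the set of parameters θ that makes this probability as large as possible.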

  • [4] Another common problem with Poisson regression is excess zeros: if there are two processes at work, one determining whether there are zero events or any events, and a
    Poisson process determining how many events there are, there will be more zeros than a Poisson regression would predict.

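  • This logged variable, log(exposure), is called the offset variable and enters on the right-hand side of the equation with a parameter estimate (for log(exposure)) constrained
    to 1: $\log(\operatorname{E}(Y \mid x)) = \log(\text{exposure}) + \theta^{\top} x$, which implies
    $\log(\operatorname{E}(Y \mid x)) - \log(\text{exposure}) = \log\left(\frac{\operatorname{E}(Y \mid x)}{\text{exposure}}\right) = \theta^{\top} x$. For a GLM in R, an offset
    can be specified using the offset() function, for example glm(y ~ offset(log(exposure)) + x, family = poisson(link = log)).

  • Overdispersion and zero inflation. A characteristic of the Poisson distribution is that its mean is equal to its variance.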

  • Other generalized linear models such as the negative binomial model or zero-inflated model may function better in these cases.

  • When both sides of the equation are then logged, the final model contains log(exposure) as a term that is added to the regression coefficients.

  • Under some circumstances, the problem of overdispersion can be solved by using quasi-likelihood estimation or a negative binomial distribution instead.

  • “Exposure” and offset. Poisson regression may also be appropriate for rate data, where the rate is a count of events divided by some measure of that unit’s exposure (a
    particular unit of observation).

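  • The likelihood in product form is typically difficult to work with; instead, one uses the log-likelihood:
    $\ell(\theta \mid X, Y) = \sum_{i=1}^{m} \left( y_i \theta^{\top} x_i - e^{\theta^{\top} x_i} - \log(y_i!) \right)$. Notice that the parameters θ only appear in the first
    two terms of each summand, so the $\log(y_i!)$ term can be dropped when searching for the best value of θ.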

  • Poisson regression in practice. Poisson regression may be appropriate when the dependent variable is a count, for instance of events such as the arrival of a telephone call
    at a call centre.

  • The maximum-likelihood estimates lack a closed-form expression and must be found by numerical methods.

  • [5] Use in survival analysis. Poisson regression creates proportional hazards models, one class of survival analysis: see proportional hazards models for descriptions
    of Cox models.

  • In certain circumstances, it will be found that the observed variance is greater than the mean; this is known as overdispersion and indicates that the model is not appropriate.

  • However, the negative log-likelihood, $-\ell(\theta \mid X, Y)$, is a convex function, and so standard convex optimization techniques such as gradient descent can be applied
    to find the optimal value of θ.
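
The estimation procedure described in the items above (no closed form, convex negative log-likelihood, gradient descent, optional log(exposure) offset) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (linear_predictor, neg_log_likelihood, fit_poisson), the toy data, the learning rate, and the step count are all hypothetical choices made for this example, and a fixed step size is only safe here because the data are small and well scaled.

```python
import math


def linear_predictor(theta, x, offset=0.0):
    """Return offset + theta . x, the log of the predicted Poisson mean."""
    return offset + sum(t * xi for t, xi in zip(theta, x))


def neg_log_likelihood(theta, xs, ys, offsets):
    """Negative log-likelihood with the theta-free log(y_i!) term dropped."""
    total = 0.0
    for x, y, off in zip(xs, ys, offsets):
        eta = linear_predictor(theta, x, off)
        total += math.exp(eta) - y * eta
    return total


def fit_poisson(xs, ys, offsets=None, learning_rate=0.1, steps=5000):
    """Minimize the mean negative log-likelihood by plain gradient descent."""
    m, n = len(xs), len(xs[0])
    if offsets is None:
        offsets = [0.0] * m  # no exposure adjustment
    theta = [0.0] * n
    for _ in range(steps):
        grad = [0.0] * n
        for x, y, off in zip(xs, ys, offsets):
            # d/d theta_j of (exp(eta) - y * eta) is (exp(eta) - y) * x_j
            residual = math.exp(linear_predictor(theta, x, off)) - y
            for j in range(n):
                grad[j] += residual * x[j]
        theta = [t - learning_rate * g / m for t, g in zip(theta, grad)]
    return theta


# Hypothetical toy data: the first column is a constant 1 (intercept),
# the second column is a 0/1 group indicator.
xs = [[1.0, 0.0], [1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
ys = [1, 3, 5, 7]
theta_hat = fit_poisson(xs, ys)
print("estimated theta:", theta_hat)
print("predicted means:", [math.exp(linear_predictor(theta_hat, x)) for x in xs])

# Rate data: counts observed over different exposures, with log(exposure)
# entering as an offset whose coefficient is fixed at 1.
counts = [2, 6, 4]
exposure = [1.0, 3.0, 2.0]
rate_theta = fit_poisson(
    [[1.0]] * 3, counts, offsets=[math.log(e) for e in exposure]
)
print("estimated log rate:", rate_theta[0])
```

In the first example the model has one parameter per group, so the fitted means should match the group averages of the counts; in the second, the intercept-only fit with an offset should recover the log of total events divided by total exposure. For real data, an established GLM routine with a more robust optimizer is the safer choice.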


Works Cited

1. Greene, William H. (2003). Econometric Analysis (Fifth ed.). Prentice-Hall. pp. 740–752. ISBN 978-0130661890.
2. Paternoster R, Brame R (1997). “Multiple routes to delinquency? A test of developmental and general theories of crime”. Criminology. 35: 45–84. doi:10.1111/j.1745-9125.1997.tb00870.x.
3. Berk R, MacDonald J (2008). “Overdispersion and Poisson regression”. Journal of Quantitative Criminology. 24 (3): 269–284. doi:10.1007/s10940-008-9048-4. S2CID 121273486.
4. Ver Hoef, Jay M.; Boveng, Peter L. (2007-01-01). “Quasi-Poisson vs. Negative Binomial Regression: How should we model overdispersed count data?”. Ecology. 88 (11): 2766–2772. doi:10.1890/07-0043.1. PMID 18051645. Retrieved 2016-09-01.
5. Schwarzenegger, Rafael; Quigley, John; Walls, Lesley (23 November 2021). “Is eliciting dependency worth the effort? A study for the multivariate Poisson-Gamma probability model”. Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability: 5. doi:10.1177/1748006X211059417.
6. Perperoglou, Aris (2011-09-08). “Fitting survival data with penalized Poisson regression”. Statistical Methods & Applications. Springer Nature. 20 (4): 451–462. doi:10.1007/s10260-011-0172-1. ISSN 1618-2510. S2CID 10883925.