Logistic regression

 

  • Then $Y_i$ can be viewed as an indicator for whether this latent variable is positive: $Y_i = 1$ if $Y_i^\ast > 0$, and $Y_i = 0$ otherwise. The choice of modeling the error variable specifically with a standard logistic distribution, rather than a general logistic distribution with the location and scale set to arbitrary values, seems restrictive, but in fact it is not.

  • For K measurements, defining $\mathbf{x}_k$ as the explanatory vector of the k-th measurement and $y_k$ as the categorical outcome of that measurement, the log-likelihood may be written in a form very similar to the simple case above: $\ell = \sum_{k=1}^{K} \bigl( y_k \ln p(\mathbf{x}_k) + (1 - y_k) \ln(1 - p(\mathbf{x}_k)) \bigr)$. As in the simple example above, finding the optimum β parameters will require numerical methods.

  • Alternatively, instead of minimizing the loss, one can maximize its negative, the log-likelihood ℓ, or equivalently maximize the likelihood function itself, which is the probability that the given data set is produced by a particular logistic function: $L = \prod_{k=1}^{K} p(\mathbf{x}_k)^{y_k} \bigl(1 - p(\mathbf{x}_k)\bigr)^{1 - y_k}$. This method is known as maximum likelihood estimation.

  • It must be kept in mind that we can choose the regression coefficients ourselves, and very often can use them to offset changes in the parameters of the error variable’s distribution.

  • The logistic regression model itself simply models the probability of output in terms of input and does not perform statistical classification (it is not a classifier). However, it can be used to make a classifier, for instance by choosing a cutoff value and classifying inputs with probability greater than the cutoff as one class and those below the cutoff as the other; this is a common way to make a binary classifier, as in the sketch below.
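
A minimal sketch of this cutoff rule (the coefficient values and the 0.5 cutoff here are illustrative assumptions, not values taken from the text):

```python
import numpy as np

def predict_proba(x, beta0, beta1):
    """Logistic model: estimated probability that y = 1 given input x."""
    return 1.0 / (1.0 + np.exp(-(beta0 + beta1 * x)))

beta0, beta1, cutoff = -4.0, 1.5, 0.5   # hypothetical values, for illustration
x = np.array([1.0, 2.0, 3.0, 4.0])
labels = (predict_proba(x, beta0, beta1) > cutoff).astype(int)
print(labels)  # inputs whose probability exceeds the cutoff are classified as 1
```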

  • (Discrete variables referring to more than two possible choices are typically coded using dummy variables (or indicator variables); that is, separate explanatory variables taking the value 0 or 1 are created for each possible value of the discrete variable, with 1 meaning “variable does have the given value” and 0 meaning “variable does not have that value”. A sketch of this coding follows below.)
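
A sketch of this dummy coding in plain Python/NumPy (the variable and its three possible values are made up for illustration):

```python
import numpy as np

# A discrete variable with three possible values; one 0/1 indicator
# column is created per possible value.
observations = ["red", "green", "blue", "green"]
possible_values = ["red", "green", "blue"]
dummies = np.array([[1 if obs == v else 0 for v in possible_values]
                    for obs in observations])
print(dummies)  # each row has exactly one 1, marking the observed value
```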

  • The values of $\beta_0$ and $\beta_1$ which maximize ℓ and L using the above data are found numerically, and they in turn yield the corresponding values of μ and s. Predictions: The $\beta_0$ and $\beta_1$ coefficients may be entered into the logistic regression equation to estimate the probability of passing the exam.

  • As a generalized linear model: The particular model used by logistic regression, which distinguishes it from standard linear regression and from other types of regression analysis used for binary-valued outcomes, is the way the probability of a particular outcome is linked to the linear predictor function: $\operatorname{logit}(\mathbb{E}[Y_i \mid \mathbf{x}_i]) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_M x_{M,i}$. Written using the more compact notation described above, this is $\operatorname{logit}(\mathbb{E}[Y_i \mid \mathbf{x}_i]) = \boldsymbol\beta \cdot \mathbf{x}_i$. This formulation expresses logistic regression as a type of generalized linear model, which predicts variables with various types of probability distributions by fitting a linear predictor function of the above form to some sort of arbitrary transformation of the expected value of the variable.

  • Two-way latent-variable model: Yet another formulation uses two separate latent variables, $Y_i^{0\ast} = \boldsymbol\beta_0 \cdot \mathbf{x}_i + \varepsilon_0$ and $Y_i^{1\ast} = \boldsymbol\beta_1 \cdot \mathbf{x}_i + \varepsilon_1$, where $\varepsilon_0, \varepsilon_1 \sim \operatorname{EV}_1(0,1)$ and $\operatorname{EV}_1(0,1)$ is a standard type-1 extreme value distribution, i.e. the distribution with density $f(x) = e^{-x} e^{-e^{-x}}$.

  • Similarly, an arbitrary scale parameter s is equivalent to setting the scale parameter to 1 and then dividing all regression coefficients by s. In the latter case, the resulting value of $Y_i^\ast$ will be smaller by a factor of s than in the former case, for all sets of explanatory variables; but critically, it will always remain on the same side of 0, and hence lead to the same $Y_i$ choice, as the identity below makes explicit.
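
Concretely, the claim follows from a one-line identity in the latent-variable notation above (with s > 0, so dividing by s does not change the sign of the latent variable):

```latex
Y_i^\ast = \boldsymbol\beta \cdot \mathbf{x}_i + s\,\varepsilon_i
\quad\Longleftrightarrow\quad
\frac{Y_i^\ast}{s} = \frac{\boldsymbol\beta}{s} \cdot \mathbf{x}_i + \varepsilon_i .
```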

  • The formula can also be written as a probability distribution (specifically, using a probability mass function): $\Pr(Y_i = y \mid \mathbf{x}_i) = p_i^{\,y} (1 - p_i)^{1 - y}$ for $y \in \{0, 1\}$. As a latent-variable model: The logistic model has an equivalent formulation as a latent-variable model.

  • The linear predictor function for a particular data point i is written as $f(i) = \beta_0 + \beta_1 x_{1,i} + \cdots + \beta_m x_{m,i}$, where $\beta_0, \ldots, \beta_m$ are regression coefficients indicating the relative effect of a particular explanatory variable on the outcome.

  • It is clear that the response variables $Y_i$ are not identically distributed: $P(Y_i = 1 \mid X)$ differs from one data point to another, though they are independent given the design matrix X and the shared parameters $\boldsymbol\beta$.

  • We can then express t as follows: $t = \beta_0 + \beta_1 x$. And the general logistic function $p(x)$ can now be written as $p(x) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}$. In the logistic model, $p(x)$ is interpreted as the probability of the dependent variable equaling a success/case rather than a failure/non-case.

  • Binary variables are widely used in statistics to model the probability of a certain class or event taking place, such as the probability of a team winning, of a patient being
    healthy, etc.

  • It also has the practical effect of converting the probability (which is bounded to be between 0 and 1) to a variable that ranges over $(-\infty, +\infty)$, thereby matching the potential range of the linear prediction function on the right side of the equation.

  • This makes it possible to write the linear predictor function as follows: $f(i) = \boldsymbol\beta \cdot \mathbf{x}_i$, using the notation for a dot product between two vectors.

  • The corresponding probability of the value labeled “1” can vary between 0 (certainly the value “0”) and 1 (certainly the value “1”), hence the labeling;[2] the function that
    converts log-odds to probability is the logistic function, hence the name.
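
For reference, the two directions of this conversion are the logistic function σ and its inverse, the logit:

```latex
\sigma(t) = \frac{1}{1 + e^{-t}}, \qquad
\operatorname{logit}(p) = \ln\frac{p}{1 - p} = \sigma^{-1}(p), \quad p \in (0, 1).
```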

  • This special value of n is termed the “pivot index”, and the log-odds $t_n$ are expressed in terms of the pivot probability and are again expressed as a linear combination of the explanatory variables: $t_n = \ln\bigl(\Pr(Y_i = n) / \Pr(Y_i = \text{pivot})\bigr) = \boldsymbol\beta_n \cdot \mathbf{x}_i$. Note also that in the simple two-category case this recovers the earlier formulation, with $p = p_1$ and $1 - p = p_0$.

  • The reason for using logistic regression for this problem is that the values of the dependent variable, pass and fail, while represented by “1” and “0”, are not cardinal numbers.

  • One useful technique is to equate the derivatives of the log-likelihood with respect to each of the β parameters to zero, yielding a set of equations which will hold at the maximum of the log-likelihood: $\frac{\partial \ell}{\partial \beta_m} = 0 = \sum_{k=1}^{K} \bigl( y_k - p(\mathbf{x}_k) \bigr)\, x_{mk}$, where $x_{mk}$ is the value of the $x_m$ explanatory variable from the k-th measurement. A numerical sketch of this maximization follows below.
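
A minimal numerical sketch of this maximization, using plain gradient ascent on the log-likelihood (the synthetic data, step size, and iteration count are arbitrary choices for illustration, not part of the original example):

```python
import numpy as np

def log_likelihood_gradient(beta, X, y):
    """Gradient of the logistic log-likelihood: sum_k (y_k - p(x_k)) * x_k."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (y - p)

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(100), rng.normal(size=100)])  # intercept + 1 feature
true_beta = np.array([-1.0, 2.0])
y = (rng.random(100) < 1.0 / (1.0 + np.exp(-(X @ true_beta)))).astype(float)

beta = np.zeros(2)
for _ in range(5000):            # drives the gradient toward zero, iteratively
    beta += 0.01 * log_likelihood_gradient(beta, X, y)
print(beta)                      # approaches the maximum-likelihood estimate
```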

  • Let us assume that t is a linear function of a single explanatory variable x (the case where t is a linear combination of multiple explanatory variables is treated similarly).

  • So we define the odds of the dependent variable equaling a case (given some linear combination $\beta_0 + \beta_1 x$ of the predictors) as follows:[2] $\text{odds} = e^{\beta_0 + \beta_1 x}$. The odds ratio: For a continuous independent variable the odds ratio can be defined as $\mathrm{OR} = \operatorname{odds}(x + 1) / \operatorname{odds}(x) = e^{\beta_1}$, as in the test score example in the “Example” section of the contents.

  • Analogous linear models for binary variables with a different sigmoid function instead of the logistic function (to convert the linear combination to a probability) can also
    be used, most notably the probit model; see § Alternatives.

  • This linear relationship may be extended to the case of M explanatory variables: $t = \beta_0 + \beta_1 x_1 + \cdots + \beta_M x_M$, where t is the log-odds and $\beta_0, \ldots, \beta_M$ are parameters of the model.

  • To begin with, we may consider a logistic model with M explanatory variables, and, as in the example above, two categorical values (y = 0 and 1).

  • Once the beta coefficients have been estimated from the data, we will be able to estimate the probability that any subsequent set of explanatory variables will result in any
    of the possible outcome categories.

  • The above formula shows that once the $\beta_m$ are fixed, we can easily compute either the log-odds that $y = 1$ for a given observation, or the probability that $y = 1$ for a given observation.

  • This is important in that it shows that the value of the linear regression expression can vary from negative to positive infinity and yet, after transformation, the resulting
    expression for the probability ranges between 0 and 1.

  • The defining characteristic of the logistic model is that increasing one of the independent variables multiplicatively scales the odds of the given outcome at a constant rate,
    with each independent variable having its own parameter; for a binary dependent variable this generalizes the odds ratio.

  • These can be combined into a single expression: $\text{loss}_k = -y_k \ln p(\mathbf{x}_k) - (1 - y_k) \ln\bigl(1 - p(\mathbf{x}_k)\bigr)$. This expression is more formally known as the cross-entropy of the predicted distribution $(p_k, 1 - p_k)$ from the actual distribution $(y_k, 1 - y_k)$, as probability distributions on the two-element space of (pass, fail); a small numeric sketch follows below.
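
A small numeric sketch of this cross-entropy (the outcome and probability values are made up for illustration):

```python
import numpy as np

def cross_entropy(y, p):
    """Cross-entropy of predicted probabilities p from actual outcomes y in {0, 1}."""
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1.0, 0.0, 1.0])   # actual outcomes (pass = 1, fail = 0)
p = np.array([0.9, 0.2, 0.6])   # hypothetical predicted probabilities
print(cross_entropy(y, p))      # small where the prediction matches the outcome
```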

  • The third line writes out the probability mass function of the Bernoulli distribution, specifying the probability of seeing each of the two possible outcomes.

  • Each point i consists of a set of m input variables $x_{1,i}, \ldots, x_{m,i}$ (also called independent variables, explanatory variables, predictor variables, features, or attributes), and a binary outcome variable $Y_i$ (also known as a dependent variable, response variable, output variable, or class), i.e. a variable that can take only the two values 0 and 1.

  • More abstractly, the logit is the natural parameter for the Bernoulli distribution, and in this sense the logistic function, its inverse, is the “simplest” way to convert a real number to a probability.

  • It turns out that this formulation is exactly equivalent to the preceding one, phrased in terms of the generalized linear model and without any latent variables.

  • The latent variable $Y_i^\ast$ can be written directly in terms of the linear predictor function and an additive random error variable: $Y_i^\ast = \boldsymbol\beta \cdot \mathbf{x}_i + \varepsilon_i$, where $\varepsilon_i$ is distributed according to a standard logistic distribution.

  • (Regularization is most commonly done using a squared regularizing function, which is equivalent to placing a zero-mean Gaussian prior distribution on the coefficients, but
    other regularizers are also possible.)
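
Under the usual convention that the intercept is left unpenalized, this squared (ridge) penalty can be written as follows, where λ ≥ 0 controls the penalty strength (the notation is an assumption consistent with the log-likelihood above):

```latex
\hat{\boldsymbol\beta} = \arg\max_{\boldsymbol\beta}
\left[ \sum_{k=1}^{K} \Bigl( y_k \ln p(\mathbf{x}_k)
+ (1 - y_k) \ln\bigl(1 - p(\mathbf{x}_k)\bigr) \Bigr)
- \lambda \sum_{m=1}^{M} \beta_m^2 \right].
```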

  • This model has a separate latent variable and a separate set of regression coefficients for each possible outcome of the dependent variable.

  • Definition of the odds: The odds of the dependent variable equaling a case (given some linear combination of the predictors) is equivalent to the exponential function of the linear regression expression: $\text{odds} = e^{\beta_0 + \beta_1 x}$.

  • Then when this is used in the equation relating the log-odds of a success to the values of the predictors, the linear regression will be a multiple regression with m explanators; the parameters $\beta_j$ for all $j = 0, 1, \ldots, m$ are all estimated.

  • The main use-case of a logistic model is to be given an observation $\mathbf{x}$, and estimate the probability $p(\mathbf{x})$ that $y = 1$.

  • Rather than the Wald method, the recommended method[20] to calculate the p-value for logistic regression is the likelihood-ratio test (LRT); the computation for these data appears in § Deviance and likelihood ratio tests below.

  • Again, the optimum beta coefficients may be found by maximizing the log-likelihood function, generally using numerical methods.[23]

  • The formula for $p(x)$ illustrates that the probability of the dependent variable equaling a case is equal to the value of the logistic function of the linear regression expression.

  • Given that the logit ranges between negative and positive infinity, it provides an adequate criterion upon which to conduct linear regression and the logit is easily converted
    back into the odds.

  • In the case of linear regression, the sum of the squared deviations of the fit from the data points (yk), the squared error loss, is taken as a measure of the goodness of
    fit, and the best fit is obtained when that function is minimized.

  • For the simple binary logistic regression model, we assumed a linear relationship between the predictor variable and the log-odds (also called logit) of the event that $y = 1$.

  • For example, a logistic error-variable distribution with a non-zero location parameter μ (which sets the mean) is equivalent to a distribution with a zero location parameter,
    where μ has been added to the intercept coefficient.

  • The fourth line is another way of writing the probability mass function, which avoids having to write separate cases and is more convenient for certain types of calculations.

  • Then $\varepsilon = \varepsilon_1 - \varepsilon_0 \sim \operatorname{Logistic}(0, 1)$: the difference of two standard type-1 extreme-value variables is a standard logistic variable. This formulation, which is standard in discrete choice models, makes clear the relationship between logistic regression (the “logit model”) and the probit model, which uses an error variable distributed according to a standard normal distribution instead of a standard logistic distribution.

  • This exponential relationship provides an interpretation for $\beta_1$: the odds multiply by $e^{\beta_1}$ for every 1-unit increase in x.
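
A worked instance (the coefficient value is an assumption for illustration): if $\beta_1 = \ln 2 \approx 0.693$, then each 1-unit increase in x doubles the odds, since

```latex
\frac{\operatorname{odds}(x + 1)}{\operatorname{odds}(x)}
= \frac{e^{\beta_0 + \beta_1 (x + 1)}}{e^{\beta_0 + \beta_1 x}}
= e^{\beta_1} = e^{\ln 2} = 2 .
```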

  • One method of maximizing ℓ is to require the derivatives of ℓ with respect to $\beta_0$ and $\beta_1$ to be zero, $\partial\ell/\partial\beta_0 = 0$ and $\partial\ell/\partial\beta_1 = 0$, and the maximization procedure can be accomplished by solving these two equations for $\beta_0$ and $\beta_1$, which, again, will generally require the use of numerical methods.

  • • For each data point i, an additional explanatory pseudo-variable $x_{0,i}$ is added, with a fixed value of 1, corresponding to the intercept coefficient $\beta_0$.

  • This formulation is common in the theory of discrete choice models and makes it easier to extend to certain more complicated models with multiple, correlated choices, as well
    as to compare logistic regression to the closely related probit model.

  • The log-likelihood that a particular set of K measurements or data points will be generated by the above probabilities can now be calculated.

  • Formally, in binary logistic regression there is a single binary dependent variable, coded by an indicator variable, where the two values are labeled “0” and “1”, while the
    independent variables can each be a binary variable (two classes, coded by an indicator variable) or a continuous variable (any real value).

  • The linear predictor is a linear combination of the explanatory variables and a set of regression coefficients that are specific to the model at hand but the same for all trials.

  • The regression coefficients are generally estimated by maximum likelihood estimation, which finds the values that best fit the observed data.

  • The first line expresses the probability distribution of each $Y_i$: conditioned on the explanatory variables, it follows a Bernoulli distribution with parameter $p_i$, the probability of the outcome of 1 for trial i.

  • Linear predictor function: The basic idea of logistic regression is to use the mechanism already developed for linear regression by modeling the probability $p_i$ using a linear predictor function, i.e. a linear combination of the explanatory variables and a set of regression coefficients.

  • Since the value of the logistic function is always strictly between zero and one, the log loss is always greater than zero and less than infinity.

  • Logistic regression by MLE plays a similarly basic role for binary or categorical responses as linear regression by ordinary least squares (OLS) plays for scalar responses:
    it is a simple, well-analyzed baseline model; see § Comparison with linear regression for discussion.

  • The model is usually put into a more compact form as follows: • The regression coefficients are grouped into a single vector β of size m + 1.

  • In other words, if we run a large number of Bernoulli trials using the same probability of success pi, then take the average of all the 1 and 0 outcomes, then the result would
    be close to pi.

  • (This predicts that the irrelevancy of the scale parameter may not carry over into more complex models where more than two choices are available.)

  • Whether or not regularization is used, it is usually not possible to find a closed-form solution; instead, an iterative numerical method must be used, such as iteratively
    reweighted least squares (IRLS) or, more commonly these days, a quasi-Newton method such as the L-BFGS method.
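
A compact sketch of IRLS for logistic regression, i.e. Newton's method where each step solves a weighted least-squares system (the synthetic data and fixed iteration count are illustrative assumptions):

```python
import numpy as np

def irls_logistic(X, y, n_iter=25):
    """Iteratively reweighted least squares for binary logistic regression."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)                           # Bernoulli variance weights
        H = X.T @ (X * w[:, None])                  # X^T W X
        beta += np.linalg.solve(H, X.T @ (y - p))   # Newton step
    return beta

rng = np.random.default_rng(1)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = (rng.random(200) < 1.0 / (1.0 + np.exp(-(X @ np.array([0.5, -1.0]))))).astype(float)
print(irls_logistic(X, y))   # close to the coefficients used to simulate y
```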

  • Multinomial logistic regression: Many explanatory variables and many categories (main article: Multinomial logistic regression). In the above cases of two categories (binomial logistic regression), the categories were indexed by “0” and “1”, and we had two probabilities: the probability that the outcome was in category 1 was given by $p(\mathbf{x})$ and the probability that the outcome was in category 0 was given by $1 - p(\mathbf{x})$. The sketch below shows how these generalize to many categories.
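
A minimal sketch of how the two probabilities generalize to N categories via the softmax function (the coefficient matrix here is hypothetical):

```python
import numpy as np

def softmax_probs(B, x):
    """Category probabilities for multinomial logistic regression.

    B: (n_categories, n_features) coefficient matrix, one row per category.
    """
    t = B @ x                   # one linear predictor (log-odds score) per category
    e = np.exp(t - t.max())     # subtract the max for numerical stability
    return e / e.sum()

B = np.array([[0.0, 0.0],       # hypothetical coefficients; a pivot row of zeros
              [1.0, -0.5],
              [0.2, 0.8]])
x = np.array([1.0, 2.0])        # intercept pseudo-variable plus one feature
print(softmax_probs(B, x))      # three probabilities summing to 1
```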

  • Generalizations: This simple model is an example of binary logistic regression, and has one explanatory variable and a binary categorical variable which can assume one of two categorical values.

  • Such estimates give the most accurate predictions for the data already observed, and estimation is usually subject to regularization conditions that seek to exclude unlikely values, e.g. extremely large coefficient values.

  • The x variable is called the “explanatory variable”, and the y variable is called the “categorical variable” consisting of two categories: “pass” or “fail” corresponding to
    the categorical values 1 and 0 respectively.

  • In statistics, the logistic model (or logit model) is a statistical model that models the log-odds of an event as a linear combination of one or more independent variables.

  • The logistic function is a sigmoid function, which takes any real input t and outputs a value between zero and one.

 

Works Cited

[1] Tolles, Juliana; Meurer, William J. (2016). “Logistic Regression Relating Patient Characteristics to Outcomes”. JAMA. 316 (5): 533–534. doi:10.1001/jama.2016.7653. ISSN 0098-7484. OCLC 6823603312. PMID 27483067.
[2] Hosmer, David W.; Lemeshow, Stanley (2000). Applied Logistic Regression (2nd ed.). Wiley. ISBN 978-0-471-35632-5.
[3] Cramer 2002, pp. 10–11.
[4] Walker, S. H.; Duncan, D. B. (1967). “Estimation of the probability of an event as a function of several independent variables”. Biometrika. 54 (1/2): 167–178. doi:10.2307/2333860. JSTOR 2333860.
[5] Cramer 2002, p. 8.
[6] Boyd, C. R.; Tolson, M. A.; Copes, W. S. (1987). “Evaluating trauma care: The TRISS method. Trauma Score and the Injury Severity Score”. The Journal of Trauma. 27 (4): 370–378. doi:10.1097/00005373-198704000-00005. PMID 3106646.
[7] Kologlu, M.; Elker, D.; Altun, H.; Sayek, I. (2001). “Validation of MPI and PIA II in two different groups of patients with secondary peritonitis”. Hepato-Gastroenterology. 48 (37): 147–151. PMID 11268952.
[8] Biondo, S.; Ramos, E.; Deiros, M.; Ragué, J. M.; De Oca, J.; Moreno, P.; Farran, L.; Jaurrieta, E. (2000). “Prognostic factors for mortality in left colonic peritonitis: A new scoring system”. Journal of the American College of Surgeons. 191 (6): 635–642. doi:10.1016/S1072-7515(00)00758-4. PMID 11129812.
[9] Marshall, J. C.; Cook, D. J.; Christou, N. V.; Bernard, G. R.; Sprung, C. L.; Sibbald, W. J. (1995). “Multiple organ dysfunction score: A reliable descriptor of a complex clinical outcome”. Critical Care Medicine. 23 (10): 1638–1652. doi:10.1097/00003246-199510000-00007. PMID 7587228.
[10] Le Gall, J. R.; Lemeshow, S.; Saulnier, F. (1993). “A new Simplified Acute Physiology Score (SAPS II) based on a European/North American multicenter study”. JAMA. 270 (24): 2957–2963. doi:10.1001/jama.1993.03510240069035. PMID 8254858.
[11] Freedman, David A. (2009). Statistical Models: Theory and Practice. Cambridge University Press. p. 128.
[12] Truett, J.; Cornfield, J.; Kannel, W. (1967). “A multivariate analysis of the risk of coronary heart disease in Framingham”. Journal of Chronic Diseases. 20 (7): 511–524. doi:10.1016/0021-9681(67)90082-3. PMID 6028270.
[13] Harrell, Frank E. (2015). Regression Modeling Strategies. Springer Series in Statistics (2nd ed.). New York: Springer. doi:10.1007/978-3-319-19425-7. ISBN 978-3-319-19424-0.
[14] Strano, M.; Colosimo, B. M. (2006). “Logistic regression analysis for experimental determination of forming limit diagrams”. International Journal of Machine Tools and Manufacture. 46 (6): 673–682. doi:10.1016/j.ijmachtools.2005.07.005.
[15] Palei, S. K.; Das, S. K. (2009). “Logistic regression model for prediction of roof fall risks in bord and pillar workings in coal mines: An approach”. Safety Science. 47: 88–96. doi:10.1016/j.ssci.2008.01.002.
[16] Berry, Michael J. A. (1997). Data Mining Techniques For Marketing, Sales and Customer Support. Wiley. p. 10.
[17] Mesa-Arango, Rodrigo; Hasan, Samiul; Ukkusuri, Satish V.; Murray-Tuite, Pamela (February 2013). “Household-Level Model for Hurricane Evacuation Destination Type Choice Using Hurricane Ivan Data”. Natural Hazards Review. 14 (1): 11–20. doi:10.1061/(ASCE)NH.1527-6996.0000083. ISSN 1527-6988.
[18] Wibbenmeyer, Matthew J.; Hand, Michael S.; Calkin, David E.; Venn, Tyron J.; Thompson, Matthew P. (June 2013). “Risk Preferences in Strategic Wildfire Decision Making: A Choice Experiment with U.S. Wildfire Managers”. Risk Analysis. 33 (6): 1021–1037. Bibcode:2013RiskA..33.1021W. doi:10.1111/j.1539-6924.2012.01894.x. ISSN 0272-4332. PMID 23078036. S2CID 45282555.
[19] Lovreglio, Ruggiero; Borri, Dino; dell’Olio, Luigi; Ibeas, Angel (2014). “A discrete choice model based on random utilities for exit choice in emergency evacuations”. Safety Science. 62: 418–426. doi:10.1016/j.ssci.2013.10.004. ISSN 0925-7535.
[20] Neyman, J.; Pearson, E. S. (1933). “On the problem of the most efficient tests of statistical hypotheses” (PDF). Philosophical Transactions of the Royal Society of London A. 231 (694–706): 289–337. Bibcode:1933RSPTA.231..289N. doi:10.1098/rsta.1933.0009. JSTOR 91247.
[21] “How to Interpret Odds Ratio in Logistic Regression?”. Institute for Digital Research and Education.
[22] Everitt, Brian (1998). The Cambridge Dictionary of Statistics. Cambridge, UK; New York: Cambridge University Press. ISBN 978-0-521-59346-5.
[23] For example, the indicator function in this case could be defined as …
[24] Malouf, Robert (2002). “A comparison of algorithms for maximum entropy parameter estimation”. Proceedings of the Sixth Conference on Natural Language Learning (CoNLL-2002). pp. 49–55. doi:10.3115/1118853.1118871.
[25] Menard, Scott W. (2002). Applied Logistic Regression (2nd ed.). SAGE. ISBN 978-0-7619-2208-7.
[26] Gourieroux, Christian; Monfort, Alain (1981). “Asymptotic Properties of the Maximum Likelihood Estimator in Dichotomous Logit Models”. Journal of Econometrics. 17 (1): 83–97. doi:10.1016/0304-4076(81)90060-9.
[27] Park, Byeong U.; Simar, Léopold; Zelenyuk, Valentin (2017). “Nonparametric estimation of dynamic discrete choice models for time series data” (PDF). Computational Statistics & Data Analysis. 108: 97–120. doi:10.1016/j.csda.2016.10.024.
[28] Murphy, Kevin P. (2012). Machine Learning – A Probabilistic Perspective. The MIT Press. p. 245. ISBN 978-0-262-01802-9.
[29] Van Smeden, M.; De Groot, J. A.; Moons, K. G.; Collins, G. S.; Altman, D. G.; Eijkemans, M. J.; Reitsma, J. B. (2016). “No rationale for 1 variable per 10 events criterion for binary logistic regression analysis”. BMC Medical Research Methodology. 16 (1): 163. doi:10.1186/s12874-016-0267-3. PMC 5122171. PMID 27881078.
[30] Peduzzi, P.; Concato, J.; Kemper, E.; Holford, T. R.; Feinstein, A. R. (December 1996). “A simulation study of the number of events per variable in logistic regression analysis”. Journal of Clinical Epidemiology. 49 (12): 1373–1379. doi:10.1016/s0895-4356(96)00236-3. PMID 8970487.
[31] Vittinghoff, E.; McCulloch, C. E. (2007). “Relaxing the Rule of Ten Events per Variable in Logistic and Cox Regression”. American Journal of Epidemiology. 165 (6): 710–718. doi:10.1093/aje/kwk052. PMID 17182981.
[32] van der Ploeg, Tjeerd; Austin, Peter C.; Steyerberg, Ewout W. (2014). “Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints”. BMC Medical Research Methodology. 14: 137. doi:10.1186/1471-2288-14-137. PMC 4289553. PMID 25532820.
[33] Greene, William N. (2003). Econometric Analysis (5th ed.). Prentice-Hall. ISBN 978-0-13-066189-0.
[34] Cohen, Jacob; Cohen, Patricia; West, Steven G.; Aiken, Leona S. (2002). Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences (3rd ed.). Routledge. ISBN 978-0-8058-2223-6.
[35] Allison, Paul D. “Measures of fit for logistic regression” (PDF). Statistical Horizons LLC and the University of Pennsylvania.
[36] Hosmer, D. W. (1997). “A comparison of goodness-of-fit tests for the logistic regression model”. Statistics in Medicine. 16 (9): 965–980. doi:10.1002/(sici)1097-0258(19970515)16:9<965::aid-sim509>3.3.co;2-f. PMID 9160492.
[37] Harrell, Frank E. (2010). Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York: Springer. ISBN 978-1-4419-2918-1.
[38] https://class.stanford.edu/c4x/HumanitiesScience/StatLearning/asset/classification.pdf, slide 16.
[39] Mount, J. (2011). “The Equivalence of Logistic Regression and Maximum Entropy models” (PDF). Retrieved 2022-02-23.
[40] Ng, Andrew (2000). “CS229 Lecture Notes” (PDF): 16–19.
[41] Rodríguez, G. (2007). Lecture Notes on Generalized Linear Models. Chapter 3, p. 45.
[42] James, Gareth; Witten, Daniela; Hastie, Trevor; Tibshirani, Robert (2013). An Introduction to Statistical Learning. Springer. p. 6.
[43] Pohar, Maja; Blas, Mateja; Turk, Sandra (2004). “Comparison of Logistic Regression and Linear Discriminant Analysis: A Simulation Study”. Metodološki Zvezki. 1 (1).
[44] Cramer 2002, pp. 3–5.
[45] Verhulst, Pierre-François (1838). “Notice sur la loi que la population poursuit dans son accroissement” (PDF). Correspondance Mathématique et Physique. 10: 113–121. Retrieved 2014-12-03.
[46] Cramer 2002, p. 4: “He did not say how he fitted the curves.”
[47] Verhulst, Pierre-François (1845). “Recherches mathématiques sur la loi d’accroissement de la population” [Mathematical Researches into the Law of Population Growth Increase]. Nouveaux Mémoires de l’Académie Royale des Sciences et Belles-Lettres de Bruxelles. 18. Retrieved 2013-02-18.
[48] Cramer 2002, p. 4.
[49] Cramer 2002, p. 7.
[50] Cramer 2002, p. 6.
[51] Cramer 2002, pp. 6–7.
[52] Cramer 2002, p. 5.
[53] Cramer 2002, pp. 7–9.
[54] Cramer 2002, p. 9.
[55] Cramer 2002, p. 8: “As far as I can see the introduction of the logistics as an alternative to the normal probability function is the work of a single person, Joseph Berkson (1899–1982), …”
[56] Cramer 2002, p. 11.
[57] Cramer 2002, p. 13.
[58] McFadden, Daniel (1973). “Conditional Logit Analysis of Qualitative Choice Behavior” (PDF). In P. Zarembka (ed.). Frontiers in Econometrics. New York: Academic Press. pp. 105–142. Archived from the original (PDF) on 2018-11-27. Retrieved 2019-04-20.
• Berkson, Joseph (1944). “Application of the Logistic Function to Bio-Assay”. Journal of the American Statistical Association. 39 (227): 357–365. doi:10.1080/01621459.1944.10500699. JSTOR 2280041.
• Berkson, Joseph (1951). “Why I Prefer Logits to Probits”. Biometrics. 7 (4): 327–339. doi:10.2307/3001655. ISSN 0006-341X. JSTOR 3001655.
• Bliss, C. I. (1934). “The Method of Probits”. Science. 79 (2037): 38–39. Bibcode:1934Sci...79...38B. doi:10.1126/science.79.2037.38. PMID 17813446. “These arbitrary probability units have been called ‘probits’.”
• Cox, David R. (1958). “The regression analysis of binary sequences (with discussion)”. Journal of the Royal Statistical Society, Series B. 20 (2): 215–242. JSTOR 2983890.
• Cox, David R. (1966). “Some procedures connected with the logistic qualitative response curve”. In F. N. David (ed.). Research Papers in Probability and Statistics (Festschrift for J. Neyman). London: Wiley. pp. 55–71.
• Cramer, J. S. (2002). The origins of logistic regression (PDF) (Technical report). Vol. 119. Tinbergen Institute. pp. 167–178. doi:10.2139/ssrn.360300. Published in: Cramer, J. S. (2004). “The early origins of the logit model”. Studies in History and Philosophy of Science Part C: Studies in History and Philosophy of Biological and Biomedical Sciences. 35 (4): 613–626. doi:10.1016/j.shpsc.2004.09.003.
• Fisher, R. A. (1935). “The Case of Zero Survivors in Probit Assays”. Annals of Applied Biology. 22: 164–165. doi:10.1111/j.1744-7348.1935.tb07713.x. Archived from the original on 2014-04-30.
• Gaddum, John H. (1933). Reports on Biological Standards: Methods of biological assay depending on a quantal response. III. H.M. Stationery Office. OCLC 808240121.
• Theil, Henri (1969). “A Multinomial Extension of the Linear Logit Model”. International Economic Review. 10 (3): 251–259. doi:10.2307/2525642. JSTOR 2525642.
• Pearl, Raymond; Reed, Lowell J. (June 1920). “On the Rate of Growth of the Population of the United States since 1790 and Its Mathematical Representation”. Proceedings of the National Academy of Sciences. 6 (6): 275–288. Bibcode:1920PNAS...6..275P. doi:10.1073/pnas.6.6.275. PMC 1084522. PMID 16576496.
• Wilson, E. B.; Worcester, J. (1943). “The Determination of L.D.50 and Its Sampling Error in Bio-Assay”. Proceedings of the National Academy of Sciences of the United States of America. 29 (2): 79–85. Bibcode:1943PNAS...29...79W. doi:10.1073/pnas.29.2.79. PMC 1078563. PMID 16588606.
• Agresti, Alan (2002). Categorical Data Analysis. New York: Wiley-Interscience. ISBN 978-0-471-36093-3.
• Amemiya, Takeshi (1985). “Qualitative Response Models”. Advanced Econometrics. Oxford: Basil Blackwell. pp. 267–359. ISBN 978-0-631-13345-2.
• Balakrishnan, N. (1991). Handbook of the Logistic Distribution. Marcel Dekker, Inc. ISBN 978-0-8247-8587-1.
• Gouriéroux, Christian (2000). “The Simple Dichotomy”. Econometrics of Qualitative Dependent Variables. New York: Cambridge University Press. pp. 6–37. ISBN 978-0-521-58985-7.
• Greene, William H. (2003). Econometric Analysis (5th ed.). Prentice Hall. ISBN 978-0-13-066189-0.
• Hilbe, Joseph M. (2009). Logistic Regression Models. Chapman & Hall/CRC Press. ISBN 978-1-4200-7575-5.
• Hosmer, David (2013). Applied Logistic Regression. Hoboken, New Jersey: Wiley. ISBN 978-0-470-58247-3.
• Howell, David C. (2010). Statistical Methods for Psychology (7th ed.). Belmont, CA: Thomson Wadsworth. ISBN 978-0-495-59786-5.
• Peduzzi, P.; Concato, J.; Kemper, E.; Holford, T. R.; Feinstein, A. R. (1996). “A simulation study of the number of events per variable in logistic regression analysis”. Journal of Clinical Epidemiology. 49 (12): 1373–1379. doi:10.1016/s0895-4356(96)00236-3. PMID 8970487.
• Berry, Michael J. A.; Linoff, Gordon (1997). Data Mining Techniques For Marketing, Sales and Customer Support. Wiley.