# Probit Analysis

Menu location: **Analysis_Regression and Correlation_Probit Analysis**.

This function provides probit analysis for fitting probit and logit sigmoid dose/stimulus response curves and for calculating confidence intervals for dose-response quantiles such as ED50.

When biological responses are plotted against their causal stimuli (or logarithms of them) they often form a sigmoid curve. Sigmoid relationships can be linearized by transformations such as logit, probit and angular. For most systems the probit (normal sigmoid) and logit (logistic sigmoid) give the most closely fitting result. Logistic methods are useful in Epidemiology because odds ratios can be determined easily from differences between fitted logits (see logistic regression). In biological assay work, however, probit analysis is preferred (Finney, 1971, 1978). Curves produced by these methods are very similar, with maximum variation occurring within 10% of the upper and lower asymptotes.

__Probit__

$$Y' = \Phi^{-1}(p)$$

where Y' is the probit transformed value (5 used to be added to avoid negative values in hand calculation), p is the proportion (p = responders/total number) and Φ⁻¹(p), the inverse of the standard normal distribution function, is the 100*p% quantile from the standard normal distribution.

__Logit__

Odds = p/(1-p)

[p = proportional response, i.e. r out of n responded so p = r/n]

Logit = log odds = log(p/(1-p))
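As a minimal sketch of the two transformations, the following Python uses only the standard library. The function names `probit` and `logit` are illustrative and are not StatsDirect functions; the natural logarithm is assumed for the logit.

```python
import math
from statistics import NormalDist


def probit(p, add_five=False):
    """Probit of a proportion p: the 100*p% quantile of the standard normal.

    p must lie strictly between 0 and 1. Setting add_five=True applies the
    historical hand-calculation convention of adding 5 to avoid negatives.
    """
    y = NormalDist().inv_cdf(p)
    return y + 5.0 if add_five else y


def logit(p):
    """Log odds of a proportion p (natural logarithm assumed)."""
    return math.log(p / (1.0 - p))


# r responders out of n subjects gives the proportion p = r / n
r, n = 29, 93
p = r / n
print(round(probit(p), 4), round(logit(p), 4))
```

Both transforms are zero at p = 0.5 and are undefined at p = 0 or p = 1, so groups with no responders or all responders cannot be transformed directly.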

__Data preparation__

Your data are entered as dose levels, number of subjects tested at each dose level and number responding at each dose level. At the time of running the analysis you may enter a control result for the number of subjects responding in the absence of dose/stimulus; this provides a global adjustment for natural mortality/responsiveness. You may also specify automatic log transformation of the dose levels at run time if appropriate (this should be supported by good evidence of a log-probit relationship for your type of study).

__Model analysis and critical quantiles__

The fitted model is assessed by statistics for heterogeneity which follow a chi-square distribution. If the heterogeneity statistics are significant then your observed values deviate from the fitted curve too much for reliable inference to be made (Finney, 1971, 1978).
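One common form of heterogeneity statistic is the Pearson chi-square of observed against fitted responders, with degrees of freedom equal to the number of dose groups minus the number of fitted parameters. The sketch below assumes that form; the exact statistic StatsDirect computes is not specified here, `pearson_heterogeneity` is an illustrative name, and the fitted probabilities in the usage lines are hypothetical.

```python
def pearson_heterogeneity(groups, fitted, n_params=2):
    """Pearson chi-square for grouped binomial (quantal) data.

    groups   : list of (subjects, responders), one pair per dose level
    fitted   : fitted response probability per dose level, each strictly
               between 0 and 1
    n_params : number of fitted model parameters (constant and slope = 2)
    returns  : (chi_square, degrees_of_freedom)
    """
    chi2 = 0.0
    for (n, r), p in zip(groups, fitted):
        expected = n * p
        variance = n * p * (1.0 - p)  # binomial variance of the responder count
        chi2 += (r - expected) ** 2 / variance
    return chi2, len(groups) - n_params


# Three dose groups as (subjects, responders)
groups = [(100, 39), (108, 51), (99, 47)]
# Hypothetical fitted probabilities for illustration only, not StatsDirect output
fitted = [0.40, 0.47, 0.53]
chi2, df = pearson_heterogeneity(groups, fitted)
print(chi2, df)
```

The statistic is then referred to a chi-square distribution with `df` degrees of freedom; a large value relative to `df` indicates that the observed responses scatter more widely around the curve than binomial sampling alone would explain.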

StatsDirect gives you the effective/lethal levels of dose/stimulus with confidence intervals at the quantiles you specify, e.g. ED50/LD50.

__Technical validation__

The curve is fitted by maximum likelihood estimation, using Newton-Raphson iteration. A dummy variable is used to factor in the background/natural response rate if you specify a response in controls.
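The following is a minimal sketch of maximum likelihood fitting by Newton-Raphson for a two-parameter logistic curve, applied to the age-at-menarche data from the Example section. It is not StatsDirect's code: it works on the plain log-odds scale, ignores any control (background) response, and adds a step-halving safeguard that is an assumption of this sketch. All function names are illustrative.

```python
import math

# (mean age, girls, girls with menses) from the Example section
DATA = [
    (9.21, 376, 0), (10.21, 200, 0), (10.58, 93, 0), (10.83, 120, 2),
    (11.08, 90, 2), (11.33, 88, 5), (11.58, 105, 10), (11.83, 111, 17),
    (12.08, 100, 16), (12.33, 93, 29), (12.58, 100, 39), (12.83, 108, 51),
    (13.08, 99, 47), (13.33, 106, 67), (13.58, 105, 81), (13.83, 117, 88),
    (14.08, 98, 79), (14.33, 97, 90), (14.58, 120, 113), (14.83, 102, 95),
    (15.08, 122, 117), (15.33, 111, 107), (15.58, 94, 92), (15.83, 114, 112),
    (17.58, 1049, 1049),
]


def log_sigmoid(eta):
    """Numerically stable log(1 / (1 + exp(-eta)))."""
    if eta >= 0:
        return -math.log1p(math.exp(-eta))
    return eta - math.log1p(math.exp(eta))


def log_likelihood(a, b, data):
    """Binomial log likelihood (up to a constant) for logit(p) = a + b * x."""
    return sum(
        r * log_sigmoid(a + b * x) + (n - r) * log_sigmoid(-(a + b * x))
        for x, n, r in data
    )


def fit_logit(data, tol=1e-9, max_iter=50):
    """Newton-Raphson maximum likelihood estimates of (intercept, slope)."""
    a, b = 0.0, 0.0
    ll = log_likelihood(a, b, data)
    for _ in range(max_iter):
        g0 = g1 = i00 = i01 = i11 = 0.0
        for x, n, r in data:
            p = math.exp(log_sigmoid(a + b * x))
            resid = r - n * p          # contribution to the score (gradient)
            w = n * p * (1.0 - p)      # contribution to the information matrix
            g0 += resid
            g1 += resid * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        # Newton step: solve the 2x2 system (information) * delta = score
        det = i00 * i11 - i01 * i01
        da = (i11 * g0 - i01 * g1) / det
        db = (i00 * g1 - i01 * g0) / det
        # Halve the step if it would lower the likelihood (safeguard)
        step = 1.0
        new_ll = log_likelihood(a + da, b + db, data)
        while new_ll < ll and step > 1e-8:
            step /= 2.0
            new_ll = log_likelihood(a + step * da, b + step * db, data)
        a, b, ll = a + step * da, b + step * db, new_ll
        if max(abs(step * da), abs(step * db)) < tol:
            break
    return a, b


intercept, slope = fit_logit(DATA)
print(intercept, slope, -intercept / slope)  # last value is the median age
```

On these data the fit should land near an intercept of -21.2 and a slope of 1.63 on the log-odds scale, roughly twice the constant and slope shown in the example output. That points to a half log-odds scaling in the reported output, which is an inference from the numbers rather than something stated on this page.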

__Further analysis and cautions__

For more complex probit analysis, such as the calculation of relative potencies from several related dose response curves, consider non-linear optimisation software or specialist dose-response analysis software such as Bliss. The latter is a FORTRAN routine written by David Finney and Ian Craigie; it is available from Edinburgh University Computing Centre. MLP, Genstat or R can be used for more general non-linear model fitting with the ability to constrain curves to "parallelism". Expert statistical guidance should be sought before attempting this sort of work.

CAUTION 1: Please do not think of probit analysis as a "cure all" for dose response curves. Many log dose - response relationships are clearly not Gaussian sigmoids. Other well described sigmoid relationships include angular, Wilson-Worcester and Cauchy-Urban. There may be no "off the shelf" regression model suited to your study. Exploratory non-linear modelling should only be carried out by an expert Statistician.

CAUTION 2: Standard probit analysis is designed to handle only quantal responses with binomial error distributions. Quantal data, such as the number of subjects responding vs. total number of subjects tested, usually have binomial error distributions. You should not use continuous data, such as percent maximal response, with probit analysis as these data are likely to require regression methods that assume a different error distribution. Most researchers should seek expert statistical help before pursuing this type of analysis (see non-linear models).

__Example__

From Finney (1971, p. 98).

Test workbook (Regression worksheet: Age, Girls, + Menses).

The following data represent a study of the age at menarche (first menstruation) of 3918 Warsaw girls. For each age group you are given mean age, total number of girls and the number of girls who had reached menarche.

| Age | Girls | + Menses |
| --- | --- | --- |
| 9.21 | 376 | 0 |
| 10.21 | 200 | 0 |
| 10.58 | 93 | 0 |
| 10.83 | 120 | 2 |
| 11.08 | 90 | 2 |
| 11.33 | 88 | 5 |
| 11.58 | 105 | 10 |
| 11.83 | 111 | 17 |
| 12.08 | 100 | 16 |
| 12.33 | 93 | 29 |
| 12.58 | 100 | 39 |
| 12.83 | 108 | 51 |
| 13.08 | 99 | 47 |
| 13.33 | 106 | 67 |
| 13.58 | 105 | 81 |
| 13.83 | 117 | 88 |
| 14.08 | 98 | 79 |
| 14.33 | 97 | 90 |
| 14.58 | 120 | 113 |
| 14.83 | 102 | 95 |
| 15.08 | 122 | 117 |
| 15.33 | 111 | 107 |
| 15.58 | 94 | 92 |
| 15.83 | 114 | 112 |
| 17.58 | 1049 | 1049 |

To analyse these data in StatsDirect you must first prepare them in three appropriately labelled workbook columns. Alternatively, open the test workbook using the file open function of the file menu. Then select Logit from the Probit analysis section of the Regression and Correlation section of the analysis menu. Select the column marked "Age" when you are prompted for dose levels, select "Girls" when you are prompted for subjects at each level and select "+ Menses" when prompted for responders at each level. Make sure that the "Calc log10" option is not checked when prompted; leaving it unchecked disables base 10 logarithmic transformation of the "dose" variable (mean ages in this example). Enter number of controls as 0 when prompted and also enter 0 when you are asked about an additional quantile.

For this example:

Probit analysis - logit sigmoid curve

constant = -10.613197

slope = 0.815984

Median * Dose = 13.006622

Confidence interval (No Heterogeneity) = 12.930535 to 13.082483

* Dose for centile 90 = 14.352986

Confidence interval (No Heterogeneity) = 14.238636 to 14.480677

Chi² (heterogeneity of deviations from model) = 21.869852, (23 df), P = .5281

t for slope = 27.682452 (23 df) P < .0001

Having looked at a plot of this model and accepted that the model is reasonable, we conclude with 95% confidence that the true population value for median age at menarche in Warsaw lay between 12.93 and 13.08 years when this study was carried out.
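The reported median and 90th centile can be cross-checked from the constant and slope in the output. The sketch below assumes that the fitted line is constant + slope × age on a half log-odds scale, i.e. 0.5 × ln(p/(1-p)); this scaling is inferred from the printed numbers and is not stated on this page. The function name `dose_for_centile` is illustrative.

```python
import math

CONSTANT = -10.613197  # "constant" from the example output
SLOPE = 0.815984       # "slope" from the example output


def dose_for_centile(centile, constant=CONSTANT, slope=SLOPE):
    """Dose (here, age) at which the fitted curve reaches a centile (0-100).

    Assumes constant + slope * dose equals 0.5 * ln(p / (1 - p)),
    an inferred half log-odds scale.
    """
    p = centile / 100.0
    half_log_odds = 0.5 * math.log(p / (1.0 - p))
    return (half_log_odds - constant) / slope


median_age = dose_for_centile(50)
centile_90 = dose_for_centile(90)
print(round(median_age, 4), round(centile_90, 4))
```

Under that assumption the two values agree with the reported 13.006622 and 14.352986 to within the rounding of the printed coefficients. The confidence intervals are not reproduced here because they also depend on the estimated variances and covariance of the constant and slope, which the output does not list.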