Contents of DISTRIB 2.0 Help System

DISTRIB is a computer program designed to aid in fitting independent data to a theoretical probability distribution. The program was written in Visual Basic(TM) for the Windows(TM) operating system and is based upon the text Frequency and Risk Analysis by G.W. Kite (Water Resources Publications, Littleton CO). The objective of this program is to provide an easy-to-use, intuitive interface for the analysis of data. The program was written by Dr. Ron Eaglin at the University of Central Florida.

What is a Distribution?

Types of Distributions

Using DISTRIB

Types of Distributions

Normal

Log Normal

3 Parameter Log Normal

Pearson

Log Pearson

Gumbel

Methods of Analysis

Maximum Likelihood

Method of Moments

Using DISTRIB

Using Files

DISTRIB Files

Importing Data

Editing Data and Options

Pasting Data from Spreadsheets

Editing Actual Data

Level of Confidence

Plot Text

Plotting Analysis

Comparison Plots

Histograms

What is a Distribution?

Distributions are best understood by example. So...

In the study of hydrology it is very advantageous to predict a flood. It is, however, almost impossible to predict whether a flood will occur in, say, the next year. Instead we try to predict the probability of a flood. If a flood has occurred 4 times in the last 100 years, then we can very simply state that there is probably a 1 in 25 chance that one will occur next year. However, are the last 100 years representative of the probability? What if one of the floods occurred last year? Had we made the same prediction the year before that flood, we would have counted 3 floods in 99 years, concluded that a flood occurs about every 33 years, and stated a 1 in 33 chance of a flood. To alleviate these inaccuracies, why don't we analyze **all** storms that occurred in the last 100 years? Only the largest storms will cause a flood, but now our analysis is based on a lot more data. In fact, we can probably use only data from the last 10 years to predict a flood, even if a flood has not occurred in the last 10 years. Why? Because the data will probably fit a distribution. Rainfall data typically fit any number of distributions, and fit them well.
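The arithmetic in this example can be checked with a few lines of Python. This is only an illustrative sketch; the function name `empirical_chance` is made up for the example and is not part of DISTRIB.

```python
from fractions import Fraction


def empirical_chance(floods: int, years: int) -> Fraction:
    """Observed floods per year of record, kept as an exact fraction."""
    return Fraction(floods, years)


# Record that includes last year's flood: 4 floods in 100 years.
with_last_flood = empirical_chance(4, 100)

# The same record one year earlier: 3 floods in 99 years.
before_last_flood = empirical_chance(3, 99)

print(with_last_flood)     # 1/25
print(before_last_flood)   # 1/33
```

A single extra event moves the estimate from 1 in 33 to 1 in 25, which is why counting rare events alone gives an unstable answer.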

We run DISTRIB and find that our data fit a 2 parameter log normal distribution very nicely. Odds are that any storms in the future will also fit nicely. We can predict the storm that occurs once in every 100 storms, once in every 1000 storms, etc. We have fit our data to a distribution and we can make predictions based on that distribution. You may be familiar with the most common distribution, the normal distribution. You know, the one that is shaped like a typical bell curve. Well, surprise! There are many distributions which fit many different types of data. This program will help you fit your data to the distributions available in the program.

Normal Distribution

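The probability density function for the normal distribution is:

$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$

where:

μ = mean of the population of x

σ = standard deviation of the population of x (σ² is the variance)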

The Assumptions of the Normal Distribution are:

1. The variable is continuous

2. Consecutive Values are Independent

3. Probabilities are Stable

The **Prediction** for any value of a normal distribution can be found from:

$$x_T = \mu + K\sigma$$

where μ is the mean, σ is the standard deviation, and K (frequency factor) is the standard normal deviate.
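As a minimal sketch of this prediction, the frequency factor K can be taken from the inverse of the standard normal cumulative distribution. The function name `normal_prediction` is hypothetical, the probability is assumed to be a non-exceedance probability, and the standard deviation is assumed to use the n divisor. DISTRIB itself may differ on these points. The sample values are the nine numbers from the Importing Data example.

```python
from statistics import NormalDist, mean, pstdev


def normal_prediction(data, probability):
    """Return x_T = mean + K * sd for a non-exceedance probability."""
    mu = mean(data)
    sigma = pstdev(data)                    # assumption: n divisor
    k = NormalDist().inv_cdf(probability)   # frequency factor K
    return mu + k * sigma


sample = [11.23, 12.54, 14.65, 16.87, 9.54, 11.76, 14.32, 19.54, 12.87]

print(normal_prediction(sample, 0.50))   # K = 0, so this is the sample mean
print(normal_prediction(sample, 0.99))   # value not exceeded 99% of the time
```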

The **Standard Error** of a normal distribution can be found from:

Various methods for the estimation of the delta parameter exist. The results for the Prediction and Standard Error using the maximum likelihood and method of moments for the Normal distribution are identical.
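A minimal sketch of the standard error of a normal prediction follows. It assumes the commonly quoted form S_T = delta * sigma / sqrt(n), with delta = sqrt(1 + K²/2) for the normal distribution. That form is an assumption and is not confirmed by this help text. The function name `normal_standard_error` is hypothetical.

```python
import math
from statistics import NormalDist, pstdev


def normal_standard_error(data, probability):
    """Assumed form: S_T = delta * sigma / sqrt(n), delta = sqrt(1 + K^2 / 2)."""
    n = len(data)
    sigma = pstdev(data)                     # assumption: n divisor
    k = NormalDist().inv_cdf(probability)    # standard normal deviate K
    delta = math.sqrt(1.0 + k * k / 2.0)     # assumed delta for the normal case
    return delta * sigma / math.sqrt(n)


sample = [11.23, 12.54, 14.65, 16.87, 9.54, 11.76, 14.32, 19.54, 12.87]

print(normal_standard_error(sample, 0.50))   # smallest at the mean (K = 0)
print(normal_standard_error(sample, 0.99))   # grows toward the tails
```

Under this assumption the error is smallest at the mean and increases as the prediction moves into the tail of the distribution.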

Log Normal Distribution (2 Parameter Log Normal)

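The probability density function for the Log Normal Distribution is:

$$p(x) = \frac{1}{x\,\sigma_y\sqrt{2\pi}} \exp\left[-\frac{(y-\mu_y)^2}{2\sigma_y^2}\right]$$

where:

y = ln(x) - natural log of x

μy = mean of the population of y

σy = standard deviation of the population of y (σy² is the variance)

The Assumptions of the Log Normal Distribution are: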

1. The variable is continuous

2. Consecutive Values are Independent

3. Probabilities are Stable

4. All values are greater than zero

A number of methods have been used to handle values of 0.0, for which the logarithm is undefined. DISTRIB does not perform any of these conversions for you and may crash if you attempt to fit data containing zeros to a log distribution. To alleviate this problem you may:

1. Add 1.0 to all data

2. Add a small positive value to all data.

3. Substitute 1.0 in place of all 0 data.

4. Substitute a small positive number in the place of all zero readings.

5. Ignore all zero observations.

6. Consider the probability distribution as the sum of the probability mass at 0.0 and a probability distribution over the remainder of the range. This method is described in Jennings and Benson.
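Two of the options above (2 and 5) can be sketched as a preprocessing step applied before the data reach DISTRIB. The function names, the offset of 0.01 and the rainfall readings are all made up for illustration.

```python
import math
from statistics import mean, pstdev


def drop_zeros(data):
    """Option 5: ignore all zero observations."""
    return [x for x in data if x != 0.0]


def add_offset(data, offset=0.01):
    """Option 2: add a small positive value to all data (offset is hypothetical)."""
    return [x + offset for x in data]


def log_moments(data):
    """Mean and standard deviation of y = ln(x); every x must be positive."""
    if any(x <= 0.0 for x in data):
        raise ValueError("log distributions need strictly positive data")
    logs = [math.log(x) for x in data]
    return mean(logs), pstdev(logs)


rainfall = [0.0, 1.2, 3.4, 0.0, 2.2, 5.1]   # made-up readings for illustration

print(log_moments(drop_zeros(rainfall)))
print(log_moments(add_offset(rainfall)))
```

The two options give different parameters for the same readings, so the choice should be stated alongside any result.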

3 Parameter Log Normal

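The probability density function for the 3 parameter log normal distribution is:

$$p(x) = \frac{1}{(x-a)\,\sigma_y\sqrt{2\pi}} \exp\left[-\frac{(y-\mu_y)^2}{2\sigma_y^2}\right]$$

where:

y = ln(x-a) - natural log of (x-a)

a = lower bound of x (the third parameter)

μy = mean of the population of y

σy = standard deviation of the population of y (σy² is the variance)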

Pearson Distribution

The Pearson distribution is represented by the following probability density function. The mode of this function is at x = 0. This equation is a special case of the three parameter gamma distribution.

$$p(x) = p_0 \left(1 + \frac{x}{\alpha}\right)^{\alpha/\delta} e^{-x/\delta}$$

where:

δ = difference between mean and mode (δ = μ - Xm)

Xm = mode of population x

α = Scale parameter of distribution

po = value of p(x) at mode

Log Pearson Distribution

Substituting y = ln(x) for x in the Pearson distribution gives the Log Pearson type III distribution:

$$p(y) = p_{y0} \left(1 + \frac{y}{\alpha}\right)^{\alpha/\delta_y} e^{-y/\delta_y}$$

where:

δy = difference between mean and mode (δy = μy - Ym)

Ym = mode of population y

α = Scale parameter of distribution

pyo = value of p(y) at mode

Gumbel Distribution

The Gumbel distribution (also referred to as the Fisher-Tippett Type I, Double Exponential, Gumbel Type I, and Gumbel Extremal distribution) is characterized by the probability density function:

where:

α = Scale parameter of the distribution

β = Location parameter of the distribution
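A minimal sketch of working with a Gumbel distribution follows. It assumes the common parameterization in which the scale parameter divides the deviation from the location parameter, so that F(x) = exp(-exp(-(x - location) / scale)). Some texts use the reciprocal of the scale parameter instead, so the convention should be checked against DISTRIB output before relying on these numbers. The function names are hypothetical, and the parameters are estimated by the method of moments.

```python
import math
from statistics import mean, pstdev

EULER_GAMMA = 0.5772156649015329


def gumbel_fit_moments(data):
    """Method-of-moments estimates: (scale, location)."""
    scale = pstdev(data) * math.sqrt(6.0) / math.pi
    location = mean(data) - EULER_GAMMA * scale
    return scale, location


def gumbel_cdf(x, scale, location):
    """F(x) = exp(-exp(-(x - location) / scale))."""
    return math.exp(-math.exp(-(x - location) / scale))


def gumbel_quantile(probability, scale, location):
    """Inverse of the CDF: x = location - scale * ln(-ln(F))."""
    return location - scale * math.log(-math.log(probability))


sample = [11.23, 12.54, 14.65, 16.87, 9.54, 11.76, 14.32, 19.54, 12.87]
scale, location = gumbel_fit_moments(sample)

print(scale, location)
print(gumbel_quantile(0.99, scale, location))   # value not exceeded 99% of the time
```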

Standard Error of Distribution

The standard error can be calculated by the equation:

where:

SSE = the sum of the squared differences between the actual and predicted data.

n = Number of Points.

Standard Error of Prediction

The Standard Error of the Prediction is calculated specific to a prediction at a certain probability. It is important to the calculation of a confidence interval. This statistic is output in the predictive analysis and the comparison analysis. This statistic should not be confused with the Standard Error of Distribution, which is for the entire distribution.

Confidence Interval

A confidence interval for any prediction can be obtained by using the equation:

$$X_t \pm t\,S_t$$

where:

Xt = Prediction Value of Event

St = Standard Error of Prediction for the Event

t = Standard Normal Deviate Corresponding to Confidence Level

The t statistic is found from the level of confidence. The level of confidence can be changed, but **it must be between 0.50 and 0.999**. To change it, use: Edit, Level of Confidence.
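As a minimal sketch, the interval can be computed as below. It assumes a two-sided interval, so that t is the standard normal deviate at probability 0.5 + confidence / 2. Whether DISTRIB uses a one-sided or two-sided deviate is not stated here. The function name and the example numbers are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(prediction, standard_error, confidence=0.95):
    """Return (lower, upper) = prediction -/+ t * standard_error."""
    if not 0.50 <= confidence <= 0.999:
        raise ValueError("level of confidence must be between 0.50 and 0.999")
    # Assumption: two-sided interval, half of the excluded area in each tail.
    t = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return prediction - t * standard_error, prediction + t * standard_error


# Made-up prediction and standard error, for illustration only.
low, high = confidence_interval(20.0, 1.5, confidence=0.95)
print(low, high)
```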

Determination of Plotting Position

The most common method of determining plotting position is the Weibull method. Other methods exist and can be used in DISTRIB. To change the method, click on the name of the plotting position on the DISTRIB main screen. A plotting position is calculated for every data point in an array of data. The data must be sorted first (DISTRIB sorts the data automatically).

The available plotting position formulae are:

Weibull Probability

$$P = \frac{m}{n+1}$$

where

m = rank of the data point in the sorted data (1, 2, ..., n)

n = number of data points

California

Foster

Exceedence
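The Weibull plotting position, P = m / (n + 1), can be sketched as follows. The sketch assumes the data are ranked in ascending order, so that P is the probability of not exceeding each value. Ranking in descending order would give exceedance probabilities instead. The function name is hypothetical, and the sample values are the nine numbers from the Importing Data example.

```python
def weibull_plotting_positions(data):
    """Pair each value with P = m / (n + 1), where m is its ascending rank."""
    ordered = sorted(data)   # assumption: ascending order
    n = len(ordered)
    return [(x, m / (n + 1)) for m, x in enumerate(ordered, start=1)]


sample = [11.23, 12.54, 14.65, 16.87, 9.54, 11.76, 14.32, 19.54, 12.87]

for value, probability in weibull_plotting_positions(sample):
    print(f"{value:6.2f}  {probability:.3f}")
```

With n + 1 in the denominator, no point is assigned a probability of exactly 0 or 1, so every point can be drawn on a probability scale.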

Prediction Array

The prediction array in DISTRIB 2.0 uses only one type of distribution, and forms predictions based on a number of probabilities contained in the prediction spreadsheet. The results are placed in the same spreadsheet.

Predictive Analysis

Predictive analysis based on a return period is automatically given in the prediction spreadsheet. If you wish to change the return periods used in prediction, enter the corresponding probability in the first column and then click the type of distribution desired. The calculations are performed automatically.

Distribution Analysis

A distribution analysis is automatically performed on the data in the main spreadsheet simply by clicking the button corresponding to the distribution type desired. The probabilities for the individual points are calculated by the Weibull equation.

Maximum Likelihood Estimation

The maximum likelihood method estimates the distribution parameters such that the product of the likelihoods of the individual events (L) is maximized. In terms of an equation, this becomes an estimation of a, b, ... such that

$$L = \prod_{i=1}^{n} p(x_i;\, a, b, \ldots)$$

is maximized.
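A minimal sketch of the idea for the normal distribution follows. It works with ln L rather than L, since the logarithm turns the product into a sum and has its maximum at the same parameters. For the normal distribution the maximum is known to fall at the sample mean and the standard deviation computed with the n divisor. The function name is hypothetical, and this is not a description of how DISTRIB searches for the maximum.

```python
import math
from statistics import mean, pstdev


def normal_log_likelihood(data, mu, sigma):
    """ln L = sum over the data of ln p(x_i; mu, sigma) for the normal density."""
    return sum(
        -math.log(sigma * math.sqrt(2.0 * math.pi))
        - (x - mu) ** 2 / (2.0 * sigma ** 2)
        for x in data
    )


sample = [11.23, 12.54, 14.65, 16.87, 9.54, 11.76, 14.32, 19.54, 12.87]
mu_hat, sigma_hat = mean(sample), pstdev(sample)
best = normal_log_likelihood(sample, mu_hat, sigma_hat)

# Moving either parameter away from the estimate lowers the log-likelihood.
for d_mu, d_sigma in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    nearby = normal_log_likelihood(sample, mu_hat + d_mu, sigma_hat + d_sigma)
    print(d_mu, d_sigma, nearby < best)
```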

Method of Moments Estimation

The method of moments uses the calculation of the rth moment about the origin of a distribution:

$$\mu'_r = \int_{-\infty}^{\infty} x^r\, p(x)\, dx$$

The probability function p(x) is then directly substituted into the equation and the distribution parameters are solved for directly. For example, in the case of a Normal Distribution it can be shown that the distribution parameters reduce to the mean and standard deviation of the data.
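For the normal distribution the method can be sketched with the first two sample moments about the origin: the mean is the first moment, and the variance is the second moment minus the square of the first. The function names are hypothetical, and the n divisor is an assumption about the estimator.

```python
import math


def raw_moment(data, r):
    """Sample estimate of the rth moment about the origin."""
    return sum(x ** r for x in data) / len(data)


def normal_by_moments(data):
    """Solve the first two moment equations for the mean and standard deviation."""
    m1 = raw_moment(data, 1)
    m2 = raw_moment(data, 2)
    mu = m1
    sigma = math.sqrt(m2 - m1 ** 2)   # variance = second moment - mean squared
    return mu, sigma


sample = [11.23, 12.54, 14.65, 16.87, 9.54, 11.76, 14.32, 19.54, 12.87]
mu, sigma = normal_by_moments(sample)
print(mu, sigma)
```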

Opening and Saving DISTRIB (*.DST, *.XD2) Files

DISTRIB 2.0 opens and saves all files with an XD2 extension. These files are saved in ASCII format, as a list of numbers, one for each of the data points in the analysis. Data can also be imported.

Importing Data

Data can be imported in both DISTRIB 1.0 and DISTRIB 2.0. The data must be in ASCII format with one number per line. There should be no extra spaces. For example, if you wish to import a set of 9 data values, the file would look like:

***** Beginning of File - Do Not Put this Line in Your File ******

11.23

12.54

14.65

16.87

9.54

11.76

14.32

19.54

12.87

***** End of File Do Not Put this Line in Your File ***********

You can create this file using Windows NotePad or any ASCII text editor. Be sure there are no extra spaces at the end of your file.
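An import file in this layout can also be produced by a short script. The sketch below writes one number per line with no surrounding spaces and no trailing blank line, then reads the file back as a check. The function names and the file name `storms.txt` are hypothetical. Leaving out the final line break is an assumption based on the warning about extra spaces at the end of the file.

```python
import tempfile
from pathlib import Path


def write_import_file(path, values):
    """Write one number per line, with no extra spaces or trailing blank line."""
    text = "\n".join(str(v) for v in values)
    Path(path).write_text(text, encoding="ascii")


def read_import_file(path):
    """Read the numbers back, one per line."""
    lines = Path(path).read_text(encoding="ascii").splitlines()
    return [float(line) for line in lines if line.strip()]


values = [11.23, 12.54, 14.65, 16.87, 9.54, 11.76, 14.32, 19.54, 12.87]

with tempfile.TemporaryDirectory() as folder:
    target = Path(folder) / "storms.txt"   # hypothetical file name
    write_import_file(target, values)
    print(read_import_file(target) == values)
```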

Editing Data

Data can be edited directly by changing the data on the spreadsheet in the distribution window. When the distribution is selected the data will be sorted. This data can also be saved to disk.

Entering Plot Text

Plot text cannot be modified in DISTRIB 2.0.

Plotting the Distribution Analysis

The distribution analysis can be plotted. The plot will contain the actual data and the prediction. The plot can be sent to the Windows clipboard using the Copy button. To print a plot, use the Print button.

Plotting Histogram

DISTRIB 2.0 cannot plot Histograms.

Pasting Data from Spreadsheets

Data can be pasted from a spreadsheet directly into DISTRIB. To do this you must have the data in columnar format in the spreadsheet. In your spreadsheet, highlight your data and copy it to the clipboard. Switch to DISTRIB [Alt-Tab or Ctrl-Esc]. Select Edit, Paste from the DISTRIB menu. The data will be sorted and displayed in the spreadsheet control in the distribution window. You may edit the data within this display. Do not worry about blanks in the data, as DISTRIB will remove these automatically. Click OK in the Paste data dialog. You may now analyze the data.