Frequency Programs

This page documents two related Perl utilities for quick frequency analysis of fixed-column data. freq.pl provides basic counts and cumulative counts, while freq2.pl adds percentages and cumulative percentages.

freq.pl — Basic Frequency Counts


freq.pl is a Perl program that generates a quick frequency table from a fixed column range of data values. It is useful when you need a fast check on the distribution of values without running a full statistical package such as SAS or SPSS. freq.pl answers the question "how many times does each unique value in columns x through y occur in this file?"

The input file is assumed to hold the data of interest in the same column range on every line. The data may be numeric, character, or a mix of both. The output resembles that of SAS proc freq: the individual values in increasing order, each with a frequency count and a cumulative frequency count.
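The counting described above can be sketched in a few lines of Perl. This is only an illustration, not the source of freq.pl: the helper names count_column_values and frequency_rows are hypothetical, columns are assumed to be 1-based and inclusive, lines too short to reach the starting column are assumed to be skipped, and values are ordered with a plain string sort.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count the values found in a 1-based, inclusive column range.
# Hypothetical helper, not taken from freq.pl.
sub count_column_values {
    my ($lines, $start, $end) = @_;
    my %count;
    for my $line (@$lines) {
        chomp(my $copy = $line);
        next if length($copy) < $start;    # assumption: skip lines too short to hold the field
        my $value = substr($copy, $start - 1, $end - $start + 1);
        $count{$value}++;
    }
    return \%count;
}

# Build rows of [value, frequency, cumulative frequency] in sorted value order.
sub frequency_rows {
    my ($count) = @_;
    my @rows;
    my $running = 0;
    for my $value (sort keys %$count) {    # assumption: plain string sort
        $running += $count->{$value};
        push @rows, [$value, $count->{$value}, $running];
    }
    return @rows;
}

# Five made-up records with the value of interest in column 3.
my @sample = ("AB0XY\n", "AB1XY\n", "AB0XY\n", "AB9XY\n", "AB0XY\n");
my @rows = frequency_rows(count_column_values(\@sample, 3, 3));
printf "%7s  %9d  %10d\n", @$_ for @rows;
```

For the made-up sample, the sketch yields three rows: value 0 with frequency 3, value 1 with frequency 1, and value 9 with frequency 1, giving cumulative frequencies of 3, 4, and 5.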

Usage

The command line is: freq.pl [-h] [-c[#-#][#] filename] where -c#-# gives the starting and ending column numbers of the variable and -c# gives a single-column variable at column #.
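One way the two forms of the -c option could be turned into a column range is shown below. This is a sketch under stated assumptions rather than the parsing code in freq.pl: the function name parse_column_option is hypothetical, and the rule that columns start at 1 and that the end must not precede the start is assumed.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn a column option such as "-c210" or "-c5-8" into (start, end).
# Hypothetical helper; returns an empty list when the option is not valid.
sub parse_column_option {
    my ($opt) = @_;
    if ($opt =~ /\A-c(\d+)(?:-(\d+))?\z/) {
        my $start = $1;
        my $end   = defined $2 ? $2 : $1;    # -c# means a single column
        return if $start < 1 || $end < $start;    # assumption: 1-based, ordered range
        return ($start, $end);
    }
    return;
}

for my $opt ('-c210', '-c5-8') {
    my ($start, $end) = parse_column_option($opt);
    print "$opt selects columns $start-$end\n";
}
```

Under these assumptions, -c210 maps to the range 210-210 and -c5-8 maps to the range 5-8.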

As an example, the command freq.pl -c210 data.file might produce output like:


(pts/1):~> freq.pl -c210 data.file

   Page  1
   Frequencies for the values in columns 210-210
   in the file "data.file"

                                              Cumulative
                           Value   Frequency   Frequency
                          -------  ---------  ----------
                               0       2214        2214
                               1       1009        3223
                               2        533        3756
                               9      15721       19477

If there had been many values to print, the program would have produced a paged listing, with each page carrying an incremented page number and the same header information as in the table above. In testing, this program was usually faster than a shell script that uses sort, cut, uniq, and awk to produce output without the cumulative frequencies.

An example of SAS output on the same data shows the same counts as above.

The next program shows the addition of percentages to the report.

freq2.pl — Frequency Counts and Percentages


freq2.pl is a simple extension of freq.pl. In addition to the frequency count and cumulative frequency for each value, its report shows the frequency as a percentage of the total count and the cumulative percentage.
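The extra columns follow directly from the counts. The sketch below is not the source of freq2.pl, and the helper name percent_rows is hypothetical. It takes the four value and frequency pairs from the freq.pl example above and derives the percent, cumulative frequency, and cumulative percent for each, rounding to two decimal places.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Add percent and cumulative percent to [value, frequency] pairs.
# Hypothetical helper, not taken from freq2.pl.
sub percent_rows {
    my (@pairs) = @_;
    my $total = 0;
    $total += $_->[1] for @pairs;
    my @rows;
    my $running = 0;
    for my $pair (@pairs) {
        my ($value, $freq) = @$pair;
        $running += $freq;
        push @rows, [
            $value,
            $freq,
            sprintf('%.2f', $total ? 100 * $freq / $total : 0),
            $running,
            sprintf('%.2f', $total ? 100 * $running / $total : 0),
        ];
    }
    return @rows;
}

# Counts copied from the freq.pl example for column 210.
my @rows = percent_rows([0, 2214], [1, 1009], [2, 533], [9, 15721]);
printf "%7s  %9d  %7s  %10d  %10s\n", @$_ for @rows;
```

With a total of 19477 records, the first row works out to 100 * 2214 / 19477, or 11.37 percent, and the last cumulative percent is 100.00.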

Limitations:

Usage

The command line is: freq2.pl [-h] [-c[#-#][#] filename] where -c#-# gives the starting and ending column numbers of the variable and -c# gives a single-column variable at column #.

As an example, the command freq2.pl -c210 data.file might produce output like:

(pts/1):~> freq2.pl -c210 data.file

   Page  1
   Frequencies for the values in columns 210-210
   in the file "data.file"

                                            Value  Cumulative  Cumulative
                       Value   Frequency  Percent   Frequency     Percent
                      -------  ---------  -------  ----------  ----------
                           0       2214    11.37        2214       11.37
                           1       1009     5.18        3223       16.55
                           2        533     2.74        3756       19.28
                           9      15721    80.72       19477      100.00

An example of SAS output on the same data shows the same results as above.

Last Modified: Mon Jun 15 14:29:18 EDT 2026