Frequency Programs
This page documents two related Perl utilities for quick frequency analysis of fixed-column data. freq.pl provides basic counts and cumulative counts, while freq2.pl adds percentages and cumulative percentages.
freq.pl — Basic Frequency Counts
freq.pl is a Perl program for generating a quick frequency table from a fixed column of data values. It is useful when you need a fast check on the distribution of values without running a full statistical package like SAS or SPSS. freq.pl answers the question "how many times does each unique value in columns x through y occur in this file?"
The input file is assumed to have the data of interest in the same column range on every line. The data may be numeric, character, or both. The output is reminiscent of that produced by SAS PROC FREQ: the individual values in increasing order, with a frequency count and cumulative frequency count for each value.
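The counting step described above can be sketched in a few lines of Perl. This is only an illustration of the general technique (extract a fixed column range with substr, tally values in a hash, then accumulate a running total); it is not the actual freq.pl source, and the helper names count_columns and frequency_rows are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from freq.pl): tally the values found in
# 1-based columns $start..$end of each line.
sub count_columns {
    my ($lines, $start, $end) = @_;
    my %count;
    for my $line (@$lines) {
        my $copy = $line;
        chomp $copy;
        # Assumption: a line shorter than $start contributes an empty value
        my $value = length($copy) >= $start
            ? substr($copy, $start - 1, $end - $start + 1)
            : '';
        $count{$value}++;
    }
    return \%count;
}

# Build rows of [value, frequency, cumulative frequency], with the values
# in character (string) sort order.
sub frequency_rows {
    my ($count) = @_;
    my $running = 0;
    my @rows;
    for my $value (sort keys %$count) {
        $running += $count->{$value};
        push @rows, [$value, $count->{$value}, $running];
    }
    return \@rows;
}

# Made-up sample data: the variable of interest is in column 2
my @sample = ("a0x\n", "b1x\n", "c0x\n", "d9x\n");
my $rows = frequency_rows(count_columns(\@sample, 2, 2));
printf "%-7s %9d %10d\n", @$_ for @$rows;
```

A hash keyed on the extracted string treats numeric and character data the same way, which matches the description of the input above.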
- Download: freq.pl
Usage
The command line is: freq.pl [-h] [-c[#-#][#] filename] where -c#-# gives the starting and ending column numbers of the variable and -c# indicates a single-column variable at column #.
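As a rough illustration of the two -c forms, the sketch below parses either one into a start and end column. The function name parse_column_option is hypothetical, and the real program may handle its arguments differently.

```perl
use strict;
use warnings;

# Hypothetical parser for the -c argument described above (an assumption,
# not freq.pl's own code). Accepts "-c#-#" (a range) or "-c#" (a single
# column) and returns (start, end), or an empty list if nothing matches.
sub parse_column_option {
    my ($arg) = @_;
    if (my ($first, $last) = $arg =~ /^-c(\d+)-(\d+)$/) {
        return ($first, $last);
    }
    if (my ($col) = $arg =~ /^-c(\d+)$/) {
        return ($col, $col);    # single column: start and end coincide
    }
    return;
}

my ($start, $end) = parse_column_option('-c210');
print "columns $start-$end\n";    # prints "columns 210-210"
```

Treating a single column as a range whose start and end are equal is consistent with the "columns 210-210" header in the sample output.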
As an example, the command freq.pl -c210 data.file might produce output like:
    (pts/1):~> freq.pl -c210 data.file

    Page 1
    Frequencies for the values in columns 210-210 in the file "data.file"

                        Cumulative
    Value    Frequency   Frequency
    -------  ---------  ----------
    0             2214        2214
    1             1009        3223
    2              533        3756
    9            15721       19477

If there had been many values to print out, the program would have output a paged listing, each page with an increasing page number and header information as in the above table. In testing, this program was usually faster than a shell script that uses sort, cut, uniq, and awk to produce output without the cumulative frequencies.
An example of SAS output on the same data shows the same counts as above.
The next program shows the addition of percentages to the report.
freq2.pl — Frequency Counts and Percentages
freq2.pl is a simple extension of freq.pl. In addition to the frequency count and cumulative frequency for each value, it reports the frequency as a percentage of the total count and the cumulative percentage.
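The extra columns can be derived from the counts alone. The sketch below is one plausible way to do it, not the freq2.pl source; add_percentages is a hypothetical helper. It assumes the cumulative percentage is computed from the cumulative frequency rather than by summing rounded percentages, which agrees with the sample output in the Usage section below (the third row there shows 19.28, whereas 11.37 + 5.18 + 2.74 would give 19.29).

```perl
use strict;
use warnings;

# Hypothetical helper: given [value, count] pairs in output order, return
# rows of [value, count, percent, cumulative count, cumulative percent].
sub add_percentages {
    my (@pairs) = @_;
    my $total = 0;
    $total += $_->[1] for @pairs;

    my $running = 0;
    my @rows;
    for my $pair (@pairs) {
        my ($value, $count) = @$pair;
        $running += $count;
        push @rows, [
            $value,
            $count,
            sprintf('%.2f', 100 * $count / $total),
            $running,
            # Assumption: derived from the running count, not from the
            # rounded per-value percentages
            sprintf('%.2f', 100 * $running / $total),
        ];
    }
    return @rows;
}

# Counts taken from the sample freq2.pl output on this page
my @rows = add_percentages([0, 2214], [1, 1009], [2, 533], [9, 15721]);
printf "%-7s %9d %7s %10d %10s\n", @$_ for @rows;
```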
- Download: freq2.pl
Limitations:
- It reads columns of raw data and is intended for quick checks.
- It prints only the first 20 characters of a value, although it computes frequencies for longer values.
- The order of values in the output is set by a character sort rather than a numeric sort (the program knows nothing of variable types like numeric or string).
- If you give a column range outside the logical record length of the data, you will still get output, but all values will be null.
- Since the program summarizes values as they appear, dots in the data (SAS missing values) will be counted. So will values like NA and 999, but none of these will be labeled as missing.
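The sort-order limitation is easiest to see with multi-digit values. The following sketch uses made-up values, not data from this page, to contrast Perl's default character comparison with a numeric comparison:

```perl
use strict;
use warnings;

# Made-up values for illustration
my @values = ('10', '9', '2', '100');

# Character sort: strings are compared left to right, so "10" and "100"
# come before "2"
my @by_string = sort { $a cmp $b } @values;

# Numeric sort: only meaningful when every value is a number
my @by_number = sort { $a <=> $b } @values;

print "character: @by_string\n";    # character: 10 100 2 9
print "numeric:   @by_number\n";    # numeric:   2 9 10 100
```

For single-character codes such as 0, 1, 2, and 9 the two orders coincide, so the difference only shows up with wider variables.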
Usage
The command line is: freq2.pl [-h] [-c[#-#][#] filename] where -c#-# gives the starting and ending column numbers of the variable and -c# indicates a single-column variable at column #.
As an example, the command freq2.pl -c210 data.file might produce output like:
    (pts/1):~> freq2.pl -c210 data.file

    Page 1
    Frequencies for the values in columns 210-210 in the file "data.file"

                          Value  Cumulative  Cumulative
    Value    Frequency  Percent   Frequency     Percent
    -------  ---------  -------  ----------  ----------
    0             2214    11.37        2214       11.37
    1             1009     5.18        3223       16.55
    2              533     2.74        3756       19.28
    9            15721    80.72       19477      100.00

An example of SAS output on the same data shows the same results as above.
Back to Kent's Perl Page
Last Modified: Mon Jun 15 14:29:18 EDT 2026