Multi-Record Hierarchical Data
What Are We Talking About Here?
Multi-record data is fixed-width raw data in which each case spans MULTIPLE LINES in a fixed sequence. The lines may differ in length, but they always have DIFFERENT CONTENTS. The format still appears wherever legacy reporting systems, long-running surveys, or fixed-width exports produce data, or where compact storage is important. It dates back to the days of punched paper cards.
A common misconception is that PSPP lacks support for multi-record data files. In reality, PSPP has long supported reading multi-record data using `DATA LIST FIXED` with `RECORDS=` and slash notation. Because `DATA LIST FIXED` does not appear in the psppire graphical interface, users need to rely on the PSPP syntax engine. The code can be run in psppire in the Syntax Editor.

In this context, “multi-record” refers to classic SPSS/PSPP hierarchical files (sometimes called multi-record case files or multi-type record files) where a single case spans MULTIPLE PHYSICAL RECORDS, EACH WITH A DIFFERENT LAYOUT. This is not the modern usage of “multi-record” to mean “multiple rows per ID” in a CSV or database table, where every row has the same structure or schema. It also has nothing to do with multiple response sets (MRSETS), which are a completely different SPSS/PSPP feature.
Here are two screenshots of parts of the 1980 PUMS codebook showing two different record types (Person and Housing):
1980 PUMS Codebook example (P Records):
![]()
1980 PUMS Codebook example (H Records):
![]()
Person records differ from Housing records because they collect different types of data (age versus monthly rent, for example). Both record types can easily be combined in a single multi-record data file.
PSPP can read this kind of multi-record data directly. This page shows how in two different examples. There is no need to preprocess this data to get it into a dataset.
You may hear these referred to as multi-record case files, multi-record data files, stacked data, or (historically) hierarchical data files. This format originated when data were stored on punched paper cards, which limited the width of the data. Instead of making each record wider, this format made it taller by spreading one case across multiple fixed-width cards in a defined sequence.
Comparing Rectangular and Multi-line Data Used Today
Here is what modern data looks like most of the time:

    id,age,income,married
    101,34,55000,1
    102,29,48000,0
    103,41,72000,1

Or, if the data have multiple rows per ID, it might look like this:

    id,visit,weight
    101,1,180
    101,2,178
    101,3,176

Multi-record hierarchical data (in the SPSS/PSPP meaning) looks like this:

    1John Smith 1985M
    2 55000 1
    3 2 1 0 3
    1Mary Johnson 1990F
    2 48000 0
    3 1 0 0 2

Record 1 = demographics
Record 2 = income + marital status
Record 3 = responses to 4 items
- This is not long format.
- This is not multiple rows per ID.
- This is hierarchical fixed-width data.
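
As a rough illustration, the three-record example above could be read with a command along these lines. The column positions, the variable names (NAME, BIRTHYR, SEX, INCOME, MARRIED, ITEM1 to ITEM4), and the padding of the name field to a fixed width are all assumptions made for this sketch; with a real file, the codebook would supply them.

```pspp
* Hypothetical layout, assumed for illustration only.
* Column 1 of every record holds the record number and is not read here.
DATA LIST FIXED RECORDS=3 NOTABLE
  /1 NAME 2-14 (A) BIRTHYR 15-18 SEX 19 (A)
  /2 INCOME 3-7 MARRIED 9
  /3 ITEM1 3 ITEM2 5 ITEM3 7 ITEM4 9.
BEGIN DATA
1John Smith   1985M
2 55000 1
3 2 1 0 3
1Mary Johnson 1990F
2 48000 0
3 1 0 0 2
END DATA.

* Six physical records become two cases of nine variables each.
LIST.
```

Because no `FILE=` is given, the data are read inline from the lines between `BEGIN DATA` and `END DATA`, which makes the sketch easy to paste into the Syntax Editor and run.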
How to Read Multi-Record Raw Data with PSPP
This format stores a large number of variables and values in a compact form, spreading each case across multiple physical records. However, the format is not self-documenting. External documentation is required to interpret the file, usually in the form of a codebook defining how the data were collected, what the records represent, the variable names, special handling needed, and the column locations. If you have existing PSPP code that reads the data correctly, use that. Sometimes multi-record files come with a program to read the data (e.g. SPSS, PSPP, SAS).
Workflow Tip: Always ask for the codebook when working with multi-record files. These files cannot be interpreted reliably without the accompanying documentation, and guessing the layout almost always leads to errors.
Here is an example of multi-record data with 4 records of survey data for two subjects:

    044104101712288855439855238255785555221010000010000000005080
    805060909080808090500000708000005092543214424122111432310100
    000200000000080908080810090909100800000007070000051015441344
    2412311243221
    087107002688878693388888577338897238210020001020001010308090
    810071008070506060906030607020505092444443225332444554400200
    000100010001081009090810100708090707060304070302050845444421
    1522124444441

Each block in the example above consists of four records or "cards." Single-digit survey responses make it nearly impossible to tell which value belongs to which variable. Different variables appear on different records (lines in the file), so this file cannot be read vertically as a simple rectangular dataset or CSV file. Avoid the common mistake of treating a multi-record file as a rectangular file in which every line is read the same way.
PSPP handles this type of data by reading each record in the case with its own set of variable definitions. It then combines the values from all the records into a single (wide) case, which can be exported to a format other programs can read. Exporting to CSV, for example, makes the data readable by many programs.
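
The export step can be sketched with `SAVE TRANSLATE`, which writes the active dataset as CSV. The two-record layout, the inline data, and the output file name `wide_cases.csv` below are hypothetical and only serve to make the sketch self-contained.

```pspp
* Hypothetical two-record layout and made-up data, for illustration only.
DATA LIST FIXED RECORDS=2 NOTABLE
  /1 ID 1-3 GROUP 4
  /2 Q1 1 Q2 2 Q3 3.
BEGIN DATA
0011
423
0022
315
END DATA.

* Write one row per assembled case, with the variable names as a header row.
SAVE TRANSLATE
  /OUTFILE='wide_cases.csv'
  /TYPE=CSV
  /FIELDNAMES
  /REPLACE.
```

`/FIELDNAMES` adds the header row and `/REPLACE` allows an existing file of the same name to be overwritten. Drop `/REPLACE` if overwriting would be unwelcome.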
Tip: Every Record/Card Must be Read or Explicitly Skipped
A multi-record file is like a stack of playing cards. PSPP reads them in order, one record at a time, and each card is different. There are two safe ways to handle records you do not need:
- Read every record and delete variables later if they are not needed. (Yes, this really is reading every record).
- Skip records correctly using a blank record specification (for example, `/ / /`: slashes in `DATA LIST` skip records).

Every record must be accounted for. Removing a record is like removing variables from the middle of a `DATA LIST FREE` statement: everything after the gap shifts left, and the data become misaligned. In multi-record files, this misalignment appears vertically. Experience has shown that such misalignments can be very difficult to debug, so be careful in your setups.

Multi-record files follow a vertical, sequential pattern that is not obvious at first glance. Without understanding the data file structure, it is easy to make assumptions that don’t match how these files must be read.
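
A small sketch of skipping an unwanted record, using made-up inline data with three records per case where the middle record is not needed. It is a variation on the blank-slash technique. It assumes, following the PSPP manual's description of `DATA LIST FIXED`, that a record number may follow the slash, so going from `/1` straight to `/3` leaves record 2 unread while `RECORDS=3` still tells PSPP that every case occupies three lines. The layout and variable names are hypothetical.

```pspp
* Hypothetical layout: record 2 of each case is filler that is not wanted.
DATA LIST FIXED RECORDS=3 NOTABLE
  /1 ID 1-3 GROUP 4
  /3 Q1 1 Q2 2 Q3 3.
BEGIN DATA
0011
99999
423
0022
88888
315
END DATA.

* Q1 to Q3 should come from the third line of each case.
LIST.
```

If the read is in sync, `LIST` shows two cases and the filler values 99999 and 88888 appear nowhere in the output. Seeing them in Q1 to Q3 would mean the records were not accounted for correctly.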
This page shows how to read a multi‑record file with PSPP and export the assembled dataset to CSV or SAV. Once in a modern format, the data can be used in modern analysis tools.
Before looking at the PSPP commands, it helps to see the basic pattern of a multi-record file. Each record has its own layout, and PSPP reads them in order:
- Record 1 → /1 variables (Subject IDs and Treatments)
- Record 2 → /2 variables (Attitudes, Part 1)
- Record 3 → /3 variables (Attitudes, Part 2)
- etc.

This dataset has 4 records per case, so `RECORDS=4` is used. Two of the records are not needed for analysis, so PSPP skips them using the blank slash lines in the command below. This technique works well for reading multi-record files. Later I'll show how `GET DATA` reads the same data. Thus there are two ways to read multi-record data in PSPP.

    * Read multi-record data and skip two records.
    DATA LIST FILE=studydat RECORDS=4 NOTABLE
      /1 SUBJECT 1-3 STUDY 4 TRTMENT 5-6 FEEL1 10 FEEL2 11 FEEL3 12
      /2 LIKE1 1 LIKE2 2 LIKE3 3 LIKE4 4 LIKE5 5 LIKE6 6 LIKE7 7 LIKE8 8
         LIKE9 9 LIKE10 10 ACTU11 11 ACTU12 12 ACTU13 13
      /
      /
      .
    SAVE OUTFILE='studydat_wide.sav'.

To skip a record, the user instructs PSPP to advance one record for each slash (/). To skip two records, enter two slashes and end the command with a period.
This code illustrates how `DATA LIST FIXED` reads multi-record data. First, `RECORDS=4` tells PSPP that each case consists of four records, even though two of them are not used for analysis here. The two records that are not read still need to be accounted for to keep the read in sync with the data.

Each record is introduced by `/N`, followed by variable names and their column locations on that record. Note that the column locations differ across the records because each record has its own data layout and contents. In this example, `/1` and `/2` define the two records being read. The two bare slashes account for the other two records and advance the `DATA LIST FIXED` record pointer as the data are read.

Output:

    Data List
    +-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
    |SUBJECT|STUDY|TRTMENT|FEEL1|FEEL2|FEEL3|LIKE1|LIKE2|LIKE3|LIKE4|LIKE5|LIKE6|
    +-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
    |     44|    1|      4|    7|    1|    2|    8|    0|    5|    0|    6|    0|
    |     87|    1|      7|    6|    8|    8|    8|    1|    0|    0|    7|    1|
    +-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+

    +-----+-----+-----+------+------+------+------+
    |LIKE7|LIKE8|LIKE9|LIKE10|ACTU11|ACTU12|ACTU13|
    +-----+-----+-----+------+------+------+------+
    |    9|    0|    9|     0|     8|     0|     8|
    |    0|    0|    8|     0|     7|     0|     5|
    +-----+-----+-----+------+------+------+------+

    Variables
    +-------+--------+---------------+-----+-----+---------+-----------+----------+
    |       |        |  Measurement  |     |     |         |   Print   |   Write  |
    |Name   |Position|     Level     | Role|Width|Alignment|   Format  |  Format  |
    +-------+--------+---------------+-----+-----+---------+-----------+----------+
    |SUBJECT|       1|Scale          |Input|    8|Right    |F3.0       |F3.0      |
    |STUDY  |       2|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |TRTMENT|       3|Nominal        |Input|    8|Right    |F2.0       |F2.0      |
    |FEEL1  |       4|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |FEEL2  |       5|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |FEEL3  |       6|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE1  |       7|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE2  |       8|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE3  |       9|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE4  |      10|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE5  |      11|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE6  |      12|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE7  |      13|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE8  |      14|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE9  |      15|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE10 |      16|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |ACTU11 |      17|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |ACTU12 |      18|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |ACTU13 |      19|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    +-------+--------+---------------+-----+-----+---------+-----------+----------+

The two cases, which originally occupied four records each, are now one row each. The print formats show how each variable was read: most are F1.0, meaning a single column per variable.
Moral of the story: Yes, PSPP can read multi-record data as it is defined here. PSPP has been able to do this via `DATA LIST FIXED` for many years, at least as far back as v0.78 of PSPP. Even psppire can run the code above, but the command does not appear in any GUI menu: copy or enter the code in the Syntax Editor and run it from the Run menu.

Reading Multi-record Data with GET DATA
The PSPP user manual describes a `/rec#` form of `GET DATA` for multi-record fixed-column files, and PSPP accepts this form of the command. There are several important points to understand about the `GET DATA` method of reading multi-record data:
- The first record is defined as /1; subsequent records follow as /2, /3, etc.
- All records must be read. `GET DATA` does not support skipping records.
- `FIXCASE` defines the number of records per case, similar to the `RECORDS` subcommand of `DATA LIST FIXED`.
- Column locations in `GET DATA` are defined using 0-based indexing (e.g. var1 0-5). The first column in the data is zero. This follows the PSPP user manual and differs from the 1-based column locations used by `DATA LIST FIXED`, so be careful when moving code between the two commands.
- SPSS uses 0-based column indexing for fixed-format data in `GET DATA`; PSPP follows this behavior for compatibility.

NOTE: This example was run with pspp 2.1.1 on FreeBSD 14.3-STABLE, which works, as does psppire on Windows. The main thing to note here is that PSPP's
`GET DATA` requires 0-based column locations for fixed-column data, as used here. It does not use the 1-based column locations expected by other PSPP commands that read data.

    On FreeBSD 14.3-STABLE:
    pspp --version
    pspp (GNU PSPP) 2.1.1
    Copyright (C) 2022 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Written by Ben Pfaff, John Darrington, and Jason Stover.

    * Note Zero-Based column locations here.
    GET DATA /TYPE=TXT /FILE='studydat' /FIXCASE=4 /ARRANGEMENT=FIXED
      /VARIABLES
      /1 SUBJECT 0-2 F STUDY 3 F TRTMENT 4-5 F FEEL1 9 F FEEL2 10 F FEEL3 11 F
      /2 LIKE1 0 F LIKE2 1 F LIKE3 2 F LIKE4 3 F LIKE5 4 F LIKE6 5 F LIKE7 6 F
         LIKE8 7 F LIKE9 8 F LIKE10 9 F ACTU11 10 F ACTU12 11 F ACTU13 12 F
      /3 LINE3 0-39 F LINE3B 40-59 F
      /4 LINE4 0-12 F
      .
    DELETE VARIABLES LINE3 LINE3B LINE4.
    LIST.
    DISPLAY DICTIONARY.

Output:

    Data List
    +-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
    |SUBJECT|STUDY|TRTMENT|FEEL1|FEEL2|FEEL3|LIKE1|LIKE2|LIKE3|LIKE4|LIKE5|LIKE6|
    +-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
    |     44|    1|      4|    7|    1|    2|    8|    0|    5|    0|    6|    0|
    |     87|    1|      7|    6|    8|    8|    8|    1|    0|    0|    7|    1|
    +-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+

    +-----+-----+-----+------+------+------+------+
    |LIKE7|LIKE8|LIKE9|LIKE10|ACTU11|ACTU12|ACTU13|
    +-----+-----+-----+------+------+------+------+
    |    9|    0|    9|     0|     8|     0|     8|
    |    0|    0|    8|     0|     7|     0|     5|
    +-----+-----+-----+------+------+------+------+

    Variables
    +-------+--------+---------------+-----+-----+---------+-----------+----------+
    |       |        |  Measurement  |     |     |         |   Print   |   Write  |
    |Name   |Position|     Level     | Role|Width|Alignment|   Format  |  Format  |
    +-------+--------+---------------+-----+-----+---------+-----------+----------+
    |SUBJECT|       1|Scale          |Input|    8|Right    |F3.0       |F3.0      |
    |STUDY  |       2|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |TRTMENT|       3|Nominal        |Input|    8|Right    |F2.0       |F2.0      |
    |FEEL1  |       4|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |FEEL2  |       5|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |FEEL3  |       6|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE1  |       7|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE2  |       8|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE3  |       9|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE4  |      10|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE5  |      11|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE6  |      12|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE7  |      13|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE8  |      14|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE9  |      15|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |LIKE10 |      16|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |ACTU11 |      17|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |ACTU12 |      18|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    |ACTU13 |      19|Nominal        |Input|    8|Right    |F1.0       |F1.0      |
    +-------+--------+---------------+-----+-----+---------+-----------+----------+

Prevent Errors When the Data Don’t Match the Expected Pattern
For anyone unfamiliar with the vertical, sequential reading pattern that multi-record files depend upon, the format may seem unusual at first. The structure itself is not the problem. It is an efficient, compact way to store data--and it still works well today. But because it relies on reading multiple records of data in the correct sequence, even a small slip can create problems that are hard to detect.
Just because it looks okay doesn’t always mean it is.
If a record is missing, out of order, or removed from the read pattern, the software doesn't complain. It simply reads the next physical record as the next logical one, and everything after that is wrong. The first record's data may appear correct, with the misalignment becoming apparent only on later records. Because PSPP cannot infer what was intended, it is essential to run `FREQUENCIES` on a few key variables drawn from several records to confirm that the data were read correctly.

When all variables are single digits, a one-off shift can still produce values that look plausible. In those cases, `FREQUENCIES` may not reveal the problem because the range of values is unchanged; only the counts shift. This is where having the original, documented frequencies or an original filled-out survey form is essential. When the file is too complex for `FREQUENCIES` to be trusted to expose a misread, the only dependable method is to check the values by hand to see where the read went wrong.
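
One check that can help applies when the codebook shows a record-type or card-number column on every record, which is an assumption that will not hold for every file. The idea is to read that column from each record as its own variable and run `FREQUENCIES` on it. The layout, the variable names (CARD1, CARD2), and the inline data below are hypothetical.

```pspp
* Hypothetical layout: column 1 of every record carries the card number.
DATA LIST FIXED RECORDS=2 NOTABLE
  /1 CARD1 1 ID 2-4 GROUP 5
  /2 CARD2 1 Q1 2 Q2 3 Q3 4.
BEGIN DATA
10011
2423
10022
2315
10031
2142
END DATA.

* In a correctly aligned read, CARD1 is always 1 and CARD2 is always 2.
FREQUENCIES /VARIABLES=CARD1 CARD2.

* Then compare a few substantive variables with the documented frequencies.
FREQUENCIES /VARIABLES=GROUP Q1 Q3.
```

Any other value in CARD1 or CARD2 points to a missing, extra, or out-of-order record somewhere in the file. Unlike single-digit survey items, a card number that lands in the wrong place does not look plausible, so this kind of shift is harder to miss.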