Multi-Record Hierarchical Data

 


What Are We Talking About Here?

Multi-record data is fixed-width raw data in which each case spans MULTIPLE LINES in a fixed sequence. The lines may differ in length, but they always hold DIFFERENT CONTENTS. The format dates back to the days of punched paper cards, yet it still turns up wherever legacy reporting systems, long-running surveys, or fixed-width exports are in use, or where compact storage matters.

A common misconception is that PSPP lacks support for multi-record data files. In reality, PSPP has long supported reading multi-record data using DATA LIST FIXED with RECORDS= and slash notation. Because DATA LIST FIXED does not appear in the psppire graphical interface, users have to write it as PSPP syntax. That syntax can still be run in psppire from the Syntax Editor.

In this context, “multi‑record” refers to classic SPSS/PSPP hierarchical files -- sometimes called multi-record case files or multi-type record files -- where a single case spans MULTIPLE PHYSICAL RECORDS, EACH WITH A DIFFERENT LAYOUT. This is not the modern usage of “multi‑record” to mean “multiple rows per ID” in a CSV or database table, where every row has the same structure or schema. It also has nothing to do with multiple response sets (MRSETS) which are a completely different SPSS/PSPP feature.

Here are two screenshots of parts of the 1980 PUMS codebook showing two different record types (Person and Housing):

1980 PUMS Codebook example (P Records):

PUMS P Record screenshot

1980 PUMS Codebook example (H Records):

PUMS H Record screenshot

Person records differ from Housing records because they collect different types of data (age versus monthly rent, for example). Both record types can be combined in a single multi-record data file.

PSPP can read this kind of multi-record data directly, with no need to preprocess it into a dataset first. This page shows how, with two different examples.

You may hear these referred to as multi-record case files, multi-record data files, stacked data, or (historically) hierarchical data files. The format originated when data were stored on punched paper cards, which limited how wide a record could be. Instead of making each record wider, the format made each case taller by spreading it across multiple fixed-width cards in a defined sequence.

Comparing Today's Rectangular Data with Multi-Record Data

Here is what modern data looks like most of the time:

id,age,income,married
101,34,55000,1
102,29,48000,0
103,41,72000,1

Or, if the data have multiple rows per ID (long format), it might look like this:

id,visit,weight
101,1,180
101,2,178
101,3,176

Multi-record hierarchical data (in the SPSS/PSPP meaning) looks like this:

1John Smith      1985M
2  55000  1
3  2  1  0  3
1Mary Johnson    1990F
2  48000  0
3  1  0  0  2

Record 1 = demographics

Record 2 = income + marital status

Record 3 = responses to 4 items
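To make the record-to-case mapping concrete, here is a minimal Python sketch (not PSPP) that groups the six lines above into two cases of three records each and slices out each field. There is no codebook for this toy example, so the column positions are assumptions read off the sample lines, written as 1-based inclusive ranges the way DATA LIST FIXED takes them. The sequence check relies on the record number sitting in column 1, which is true of this toy file but not of every real one; the names LAYOUT and read_cases are illustrative.

```python
# Minimal sketch, not PSPP: assemble one wide case from each group of records.
# The layout is HYPOTHETICAL, inferred from the toy lines above. Each entry is
# (variable, record number, first column, last column), using 1-based
# inclusive columns like DATA LIST FIXED.
LAYOUT = [
    ("name", 1, 2, 17),
    ("birth_year", 1, 18, 21),
    ("sex", 1, 22, 22),
    ("income", 2, 4, 8),
    ("married", 2, 11, 11),
    ("item1", 3, 4, 4),
    ("item2", 3, 7, 7),
    ("item3", 3, 10, 10),
    ("item4", 3, 13, 13),
]
RECORDS = 3  # physical records (lines) per case


def read_cases(lines, layout=LAYOUT, records=RECORDS):
    """Group lines into cases of `records` lines and slice out each field."""
    if len(lines) % records != 0:
        raise ValueError("line count is not a multiple of the records per case")
    cases = []
    for start in range(0, len(lines), records):
        block = lines[start:start + records]
        # This toy file stores the record number in column 1, so we can
        # confirm the records arrived in the expected order.
        for expected, line in enumerate(block, start=1):
            if line[:1] != str(expected):
                raise ValueError(f"expected record {expected}, got {line!r}")
        case = {}
        for name, rec, first, last in layout:
            # Convert 1-based inclusive columns to a Python slice.
            case[name] = block[rec - 1][first - 1:last].strip()
        cases.append(case)
    return cases


DATA = [
    "1John Smith      1985M",
    "2  55000  1",
    "3  2  1  0  3",
    "1Mary Johnson    1990F",
    "2  48000  0",
    "3  1  0  0  2",
]

if __name__ == "__main__":
    for case in read_cases(DATA):
        print(case)
```

If any one of the six lines is deleted, read_cases raises an error instead of silently pairing the wrong records, because the line count is no longer a multiple of three.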

How to Read Multi-Record Raw Data with PSPP

This format stores a large number of variables and values in a compact form, spreading each case across multiple physical records. However, the format is not self-documenting. Interpreting the file requires external documentation, usually a codebook that describes how the data were collected, what each record represents, the variable names, any special handling needed, and the column locations. Sometimes a multi-record file comes with a program to read it (for example, SPSS, PSPP, or SAS syntax). If you have existing PSPP code that reads the data correctly, use it.

Workflow Tip: Always ask for the codebook when working with multi-record files. These files cannot be interpreted reliably without the accompanying documentation, and guessing the layout almost always leads to errors.

Here is an example of multi-record survey data for two subjects, with four records per subject:

044104101712288855439855238255785555221010000010000000005080
805060909080808090500000708000005092543214424122111432310100
000200000000080908080810090909100800000007070000051015441344
2412311243221
087107002688878693388888577338897238210020001020001010308090
810071008070506060906030607020505092444443225332444554400200
000100010001081009090810100708090707060304070302050845444421
1522124444441

Each block of four lines in the example above is one case, made up of four records or "cards." Because the survey responses are single digits, it is nearly impossible to tell by eye which value belongs to which variable. Different variables appear on different records (lines in the file), so the file cannot be read as a simple rectangular dataset or CSV file with one case per line. Avoid the common mistake of treating each line of a multi-record file as a separate case.

PSPP handles this type of data by reading each record of a case with its own set of variable specifications. It then combines the values from all the records into a single (wide) case, which can be exported to a format other programs can read. Exporting to CSV, for example, makes the data readable by many programs.

Tip: Every Record/Card Must be Read or Explicitly Skipped

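A multi-record file is like a stack of playing cards. PSPP reads them in order, one record at a time, and each card is different. If you don't need every record, there are two safe options: skip the record explicitly (with a bare slash in DATA LIST FIXED), or read it into placeholder variables and delete them afterward (as the GET DATA example later on this page does).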

Every record must be accounted for. Leaving a record out is like removing variables from the middle of a DATA LIST FREE statement: everything after the gap shifts left, and the data become misaligned. In a multi-record file, the misalignment runs vertically, from one record to the next. Experience has shown that such misalignments can be very difficult to debug, so check your setups carefully.

Multi‑record files follow a vertical, sequential pattern that is not obvious at first glance. Without understanding the data file structure, it is easy to make assumptions that don’t match how these files must be read.

This page shows how to read a multi‑record file with PSPP and export the assembled dataset to CSV or SAV. Once in a modern format, the data can be used in modern analysis tools.

Before looking at the PSPP commands, it helps to see the basic pattern of a multi-record file. Each record has its own layout, and PSPP reads them in order:

- Record 1 → /1 variables (Subject IDs and Treatments)
- Record 2 → /2 variables (Attitudes, Part 1)
- Record 3 → /3 variables (Attitudes, Part 2)
- etc.

This dataset has 4 records per case, so RECORDS=4 is used. Two of the records are not needed for analysis, so PSPP skips them using the bare slash lines below. Later I'll show how GET DATA reads the same data, so there are two ways to read multi-record data in PSPP.

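* Read multi-record data and skip two records.
DATA LIST FILE='studydat' FIXED RECORDS=4 NOTABLE
 /1 SUBJECT 1-3  STUDY 4  TRTMENT 5-6
    FEEL1 10  FEEL2 11  FEEL3 12
 /2 LIKE1 1  LIKE2 2  LIKE3 3
    LIKE4 4  LIKE5 5  LIKE6 6  LIKE7 7  LIKE8 8
    LIKE9 9  LIKE10 10  ACTU11 11  ACTU12 12  ACTU13 13
 /
 /
 .
* Show the assembled cases and the dictionary (the output below).
LIST.
DISPLAY DICTIONARY.

SAVE OUTFILE='studydat_wide.sav'.
* Optionally export the same cases to CSV, with variable names in row 1.
SAVE TRANSLATE /OUTFILE='studydat_wide.csv' /TYPE=CSV /REPLACE /FIELDNAMES.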

To skip a record, add a bare slash (/): each one tells PSPP to advance one record. To skip two records, enter two slashes, then end the command with a period.

This code illustrates how DATA LIST FIXED reads multi-record data. First, RECORDS=4 tells PSPP that each case consists of four records, even though only two of them are used for analysis here. The two records that are not read still need to be accounted for to keep the read in sync with the data.

Each record is introduced by /N, followed by the variable names and their column locations on that record. Note that the column locations differ across records because each record has its own layout and contents. In this example, /1 and /2 define the two records being read, and the two bare slashes account for the other two records, moving the DATA LIST FIXED record pointer past them.

Output:

                                  Data List                                                                             
+-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|SUBJECT|STUDY|TRTMENT|FEEL1|FEEL2|FEEL3|LIKE1|LIKE2|LIKE3|LIKE4|LIKE5|LIKE6|
+-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|     44|    1|      4|    7|    1|    2|    8|    0|    5|    0|    6|    0|  
|     87|    1|      7|    6|    8|    8|    8|    1|    0|    0|    7|    1|  
+-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+

+-----+-----+-----+------+------+------+------+
|LIKE7|LIKE8|LIKE9|LIKE10|ACTU11|ACTU12|ACTU13|
+-----+-----+-----+------+------+------+------+
|    9|    0|    9|     0|     8|     0|     8|  
|    0|    0|    8|     0|     7|     0|     5|  
+-----+-----+-----+------+------+------+------+

                                   Variables
+-------+--------+---------------+-----+-----+---------+-----------+----------+
|       |        |  Measurement  |     |     |         |   Print   |   Write  |
|Name   |Position|     Level     | Role|Width|Alignment|   Format  |  Format  |
+-------+--------+---------------+-----+-----+---------+-----------+----------+
|SUBJECT|       1|Scale          |Input|    8|Right    |F3.0       |F3.0      |   
|STUDY  |       2|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|TRTMENT|       3|Nominal        |Input|    8|Right    |F2.0       |F2.0      |   
|FEEL1  |       4|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|FEEL2  |       5|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|FEEL3  |       6|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE1  |       7|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE2  |       8|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE3  |       9|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE4  |      10|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE5  |      11|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE6  |      12|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE7  |      13|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE8  |      14|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE9  |      15|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE10 |      16|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|ACTU11 |      17|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|ACTU12 |      18|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|ACTU13 |      19|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
+-------+--------+---------------+-----+-----+---------+-----------+----------+

The two cases, each originally stored as four records, have become two cases of one row each. The Variables table also shows the F1.0 format used to read most of them: many variables occupy a single column each.
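As a cross-check on the tables above, the Python sketch below (not PSPP) mimics what the DATA LIST command does with the sample file: it takes four lines per case, reads records 1 and 2 using the same 1-based columns, and passes over records 3 and 4, which is the job of the two bare slashes. The data lines are copied from the sample; the names LAYOUT, STUDYDAT, and read_fixed are illustrative.

```python
# Sketch, not PSPP: mimic DATA LIST FIXED RECORDS=4 for the sample file.
# Columns are 1-based and inclusive, copied from the DATA LIST command.
LAYOUT = {
    1: [("SUBJECT", 1, 3), ("STUDY", 4, 4), ("TRTMENT", 5, 6),
        ("FEEL1", 10, 10), ("FEEL2", 11, 11), ("FEEL3", 12, 12)],
    2: [(f"LIKE{i}", i, i) for i in range(1, 11)]
       + [("ACTU11", 11, 11), ("ACTU12", 12, 12), ("ACTU13", 13, 13)],
    # Records 3 and 4 have no entries: they are consumed but not read,
    # like the two bare slashes in the DATA LIST command.
}
RECORDS = 4

STUDYDAT = [
    "044104101712288855439855238255785555221010000010000000005080",
    "805060909080808090500000708000005092543214424122111432310100",
    "000200000000080908080810090909100800000007070000051015441344",
    "2412311243221",
    "087107002688878693388888577338897238210020001020001010308090",
    "810071008070506060906030607020505092444443225332444554400200",
    "000100010001081009090810100708090707060304070302050845444421",
    "1522124444441",
]


def read_fixed(lines, layout=LAYOUT, records=RECORDS):
    """Return one dict (a wide case) per complete group of `records` lines.

    A short trailing group is ignored in this sketch.
    """
    cases = []
    for start in range(0, len(lines) - records + 1, records):
        block = lines[start:start + records]
        case = {}
        for rec, fields in layout.items():
            for name, first, last in fields:
                case[name] = int(block[rec - 1][first - 1:last])
        cases.append(case)
    return cases


if __name__ == "__main__":
    for case in read_fixed(STUDYDAT):
        print(case)
```

Its values match the LIST output: SUBJECT 44 and 87, LIKE1 8 for both subjects, and ACTU13 8 and 5.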

Moral of the story: yes, PSPP can read multi-record data as defined here, and it has done so through DATA LIST FIXED for many years, at least as far back as v0.78 of PSPP. psppire can run the code above too, even though the command doesn't appear in any GUI menu: enter or paste the code in the Syntax Editor and run it from the Run menu.

Reading Multi-record Data with GET DATA

The PSPP user manual describes a /rec# form of GET DATA for multi-record fixed-column files, and PSPP accepts this form of the command. There are several important points to understand about the GET DATA method of reading multi-record data:

NOTE: This example was run with pspp 2.1.1 on FreeBSD 14.3-STABLE, where it works; it also works in psppire on Windows. The main thing to note is that GET DATA requires 0-based column locations for fixed-column data, as used here. It does not use the 1-based column locations expected by other PSPP commands that read data, such as DATA LIST.

 On FreeBSD 14.3-STABLE:
pspp --version
pspp (GNU PSPP) 2.1.1
Copyright (C) 2022 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Written by Ben Pfaff, John Darrington, and Jason Stover.

* Note zero-based column locations here.
* Records 3 and 4 are read into placeholder variables, which are deleted below.
GET DATA
  /TYPE=TXT
  /FILE='studydat'
  /FIXCASE=4
  /ARRANGEMENT=FIXED
  /VARIABLES /1 SUBJECT 0-2 F STUDY 3 F TRTMENT 4-5 F FEEL1 9 F FEEL2 10 F FEEL3 11 F
    /2 LIKE1 0 F LIKE2 1 F LIKE3 2 F LIKE4 3 F LIKE5 4 F LIKE6 5 F LIKE7 6 F
       LIKE8 7 F LIKE9 8 F LIKE10 9 F ACTU11 10 F ACTU12 11 F ACTU13 12 F
    /3 LINE3 0-39 F LINE3B 40-59 F
    /4 LINE4 0-12 F
  .
DELETE VARIABLES LINE3 LINE3B LINE4.
LIST.
DISPLAY DICTIONARY.

Output (identical to the DATA LIST version):
                                  Data List                                                                             
+-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|SUBJECT|STUDY|TRTMENT|FEEL1|FEEL2|FEEL3|LIKE1|LIKE2|LIKE3|LIKE4|LIKE5|LIKE6|
+-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+
|     44|    1|      4|    7|    1|    2|    8|    0|    5|    0|    6|    0|  
|     87|    1|      7|    6|    8|    8|    8|    1|    0|    0|    7|    1|  
+-------+-----+-------+-----+-----+-----+-----+-----+-----+-----+-----+-----+

+-----+-----+-----+------+------+------+------+
|LIKE7|LIKE8|LIKE9|LIKE10|ACTU11|ACTU12|ACTU13|
+-----+-----+-----+------+------+------+------+
|    9|    0|    9|     0|     8|     0|     8|  
|    0|    0|    8|     0|     7|     0|     5|  
+-----+-----+-----+------+------+------+------+

                                   Variables
+-------+--------+---------------+-----+-----+---------+-----------+----------+
|       |        |  Measurement  |     |     |         |   Print   |   Write  |
|Name   |Position|     Level     | Role|Width|Alignment|   Format  |  Format  |
+-------+--------+---------------+-----+-----+---------+-----------+----------+
|SUBJECT|       1|Scale          |Input|    8|Right    |F3.0       |F3.0      |   
|STUDY  |       2|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|TRTMENT|       3|Nominal        |Input|    8|Right    |F2.0       |F2.0      |   
|FEEL1  |       4|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|FEEL2  |       5|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|FEEL3  |       6|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE1  |       7|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE2  |       8|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE3  |       9|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE4  |      10|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE5  |      11|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE6  |      12|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE7  |      13|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE8  |      14|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE9  |      15|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|LIKE10 |      16|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|ACTU11 |      17|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|ACTU12 |      18|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
|ACTU13 |      19|Nominal        |Input|    8|Right    |F1.0       |F1.0      |   
+-------+--------+---------------+-----+-----+---------+-----------+----------+
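Because GET DATA counts columns from 0 while DATA LIST counts from 1, translating a layout by hand invites off-by-one mistakes. One way around that is to shift the columns mechanically. The Python helper below is a hypothetical convenience, not part of PSPP: it takes (name, first, last) triples in DATA LIST's 1-based form and produces the 0-based "name start-end F" fields used in the GET DATA example above.

```python
# Hypothetical helper, not part of PSPP: rewrite 1-based DATA LIST column
# ranges as the 0-based "name start-end F" fields used by GET DATA.
def to_get_data_fields(fields, fmt="F"):
    """fields: iterable of (name, first, last) with 1-based inclusive columns."""
    parts = []
    for name, first, last in fields:
        if first < 1 or last < first:
            raise ValueError(f"bad column range for {name}: {first}-{last}")
        start, end = first - 1, last - 1  # shift to 0-based columns
        cols = str(start) if start == end else f"{start}-{end}"
        parts.append(f"{name} {cols} {fmt}")
    return " ".join(parts)


# Record 1 of the DATA LIST command, in its original 1-based form.
RECORD1 = [("SUBJECT", 1, 3), ("STUDY", 4, 4), ("TRTMENT", 5, 6),
           ("FEEL1", 10, 10), ("FEEL2", 11, 11), ("FEEL3", 12, 12)]

if __name__ == "__main__":
    print("/1 " + to_get_data_fields(RECORD1))
```

For record 1 of the DATA LIST command it prints /1 SUBJECT 0-2 F STUDY 3 F TRTMENT 4-5 F FEEL1 9 F FEEL2 10 F FEEL3 11 F, the same positions used in the GET DATA command.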

Prevent Errors When the Data Don’t Match the Expected Pattern

For anyone unfamiliar with the vertical, sequential reading pattern that multi-record files depend upon, the format may seem unusual at first. The structure itself is not the problem. It is an efficient, compact way to store data--and it still works well today. But because it relies on reading multiple records of data in the correct sequence, even a small slip can create problems that are hard to detect.

Just because it looks okay doesn’t always mean it is.

If a record is missing, out of order, or left out of the read pattern, the software doesn't complain. It simply reads the next physical record as the next logical one, and everything after that is wrong. The first cases may look correct; the misalignment often becomes apparent only in later records. Because PSPP cannot infer what was intended, it is essential to run FREQUENCIES on a few key variables drawn from several records to confirm that the data were read correctly.
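Here is a small Python illustration (not PSPP) using the sample file from earlier. It deletes one physical line, the third record of subject 44, and then reads the rest four lines at a time without any validation, taking each physical record as the next logical one. Only SUBJECT, STUDY, and TRTMENT are read, to keep it short; read_ids is an illustrative name.

```python
# Sketch, not PSPP: what happens when one physical record goes missing.
STUDYDAT = [
    "044104101712288855439855238255785555221010000010000000005080",
    "805060909080808090500000708000005092543214424122111432310100",
    "000200000000080908080810090909100800000007070000051015441344",
    "2412311243221",
    "087107002688878693388888577338897238210020001020001010308090",
    "810071008070506060906030607020505092444443225332444554400200",
    "000100010001081009090810100708090707060304070302050845444421",
    "1522124444441",
]
RECORDS = 4


def read_ids(lines, records=RECORDS):
    """Read (SUBJECT, STUDY, TRTMENT) from record 1 of each group of lines.

    No validation: every group is read, even a short one at the end.
    """
    cases = []
    for start in range(0, len(lines), records):
        rec1 = lines[start]
        # 1-based columns 1-3, 4, and 5-6 as Python slices.
        cases.append((int(rec1[0:3]), int(rec1[3:4]), int(rec1[4:6])))
    return cases


good = read_ids(STUDYDAT)
damaged = STUDYDAT[:2] + STUDYDAT[3:]  # drop subject 44's third record
bad = read_ids(damaged)

print(good)  # [(44, 1, 4), (87, 1, 7)]
print(bad)   # [(44, 1, 4), (810, 0, 71)]
```

Subject 44 still reads correctly, but the second case comes out as SUBJECT 810, STUDY 0, TRTMENT 71, and nothing raises an error. A frequency table for STUDY would flag this particular slip, because 0 does not appear in the correctly read data.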

When all variables are single digits, a one-off shift can still produce values that look plausible. In those cases, FREQUENCIES may not reveal the problem because the range of values is unchanged; only the counts shift. This is where having the original, documented frequencies or an original filled-out survey form is essential. When FREQUENCIES cannot be trusted to expose a misread, the only dependable method is to check the values by hand against those sources to see where the read went wrong.
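A column shift inside a record is the harder case. The Python sketch below (not PSPP) reads LIKE1 through LIKE10 from the two record-2 lines of the sample file twice: once at the correct columns (1-10) and once shifted one column to the right (2-11), as a typo in the column list might do. Every value in both reads is a digit from 0 to 9, so a range check passes, yet the per-variable frequencies differ. read_likes is an illustrative name.

```python
# Sketch, not PSPP: a one-column shift keeps values in range but changes counts.
from collections import Counter

RECORD2 = [  # record 2 of each subject in the sample file
    "805060909080808090500000708000005092543214424122111432310100",
    "810071008070506060906030607020505092444443225332444554400200",
]


def read_likes(lines, offset=0):
    """Read LIKE1..LIKE10 from 1-based columns 1..10, shifted right by `offset`."""
    return [
        {f"LIKE{i}": int(line[i - 1 + offset]) for i in range(1, 11)}
        for line in lines
    ]


correct = read_likes(RECORD2)
shifted = read_likes(RECORD2, offset=1)

# Range check: every value is a digit 0-9 in both reads, so it passes.
in_range = all(0 <= v <= 9 for case in correct + shifted for v in case.values())
print("range check passes:", in_range)

# Per-variable frequencies tell a different story.
for var in ("LIKE1", "LIKE2"):
    print(var, Counter(c[var] for c in correct), Counter(c[var] for c in shifted))
```

In the correct read, LIKE1 is 8 for both subjects; in the shifted read, it is 0 and 1. Nothing in the shifted values looks out of place, which is why documented frequencies or the original forms are needed to catch this kind of error.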