Tutorial

We now illustrate the basic capabilities of the respy package. We start with the model specification and then turn to some example use cases.

Model Specification

The model is specified in an initialization file. For an example, check out the first parameterization analyzed in Keane and Wolpin (1994) here. Let us discuss each of its elements in more detail.

BASICS

Key Value Interpretation
periods int number of periods
delta float discount factor

Warning

There are two small differences compared to Keane and Wolpin (1994). First, here all coefficients enter the return function with a positive sign, while in the original paper the squared experience terms enter with a negative sign. Second, the order of covariates is fixed across the two occupations; in the original paper, own experience always comes before the other occupation's experience.

OCCUPATION A

Key Value Interpretation
coeff float intercept
coeff float return to schooling
coeff float experience Occupation A, linear
coeff float experience Occupation A, squared
coeff float experience Occupation B, linear
coeff float experience Occupation B, squared

OCCUPATION B

Key Value Interpretation
coeff float intercept
coeff float return to schooling
coeff float experience Occupation A, linear
coeff float experience Occupation A, squared
coeff float experience Occupation B, linear
coeff float experience Occupation B, squared

EDUCATION

Key Value Interpretation
coeff float consumption value
coeff float tuition cost
coeff float adjustment cost
max int maximum level of schooling
start int initial level of schooling

Warning

Again, there is a small difference between this setup and Keane and Wolpin (1994). There is no automatic change in sign for the tuition and adjustment costs. Thus, a $1,000 tuition cost must be specified as -1000.

HOME

Key Value Interpretation
coeff float mean value of non-market alternative

SHOCKS

Key Value Interpretation
coeff float \(\sigma_{1}\)
coeff float \(\sigma_{12}\)
coeff float \(\sigma_{13}\)
coeff float \(\sigma_{14}\)
coeff float \(\sigma_{2}\)
coeff float \(\sigma_{23}\)
coeff float \(\sigma_{24}\)
coeff float \(\sigma_{3}\)
coeff float \(\sigma_{34}\)
coeff float \(\sigma_{4}\)
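The ten coefficients fill the symmetric 4 x 4 shock matrix row by row along its upper triangle, with the diagonal element of each row coming first. As a minimal sketch (the values below are placeholders, not taken from any paper), the matrix can be assembled like this:

```python
# Placeholder values for the ten SHOCKS coefficients, in the order listed
# above: sigma_1, sigma_12, sigma_13, sigma_14, sigma_2, sigma_23,
# sigma_24, sigma_3, sigma_34, sigma_4.
coeffs = [0.4, 0.1, 0.0, 0.0,
          0.5, 0.0, 0.0,
          0.6, 0.2,
          0.7]

# Fill the upper triangle row by row and mirror it to the lower triangle.
shocks = [[0.0] * 4 for _ in range(4)]
k = 0
for i in range(4):
    for j in range(i, 4):
        shocks[i][j] = shocks[j][i] = coeffs[k]
        k += 1
```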

SOLUTION

Key Value Interpretation
draws int number of draws for \(E\max\)
store bool persistent storage of results
seed int random seed for \(E\max\)

SIMULATION

Key Value Interpretation
file str file to print simulated sample
agents int number of simulated agents
seed int random seed for agent experience

ESTIMATION

Key Value Interpretation
file str file to read observed sample
tau float scale parameter for function smoothing
agents int number of agents to read from sample
draws int number of draws for choice probabilities
maxfun int maximum number of function evaluations
seed int random seed for choice probabilities
optimizer str optimizer to use
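The tau parameter controls how strongly the simulated choice probabilities are smoothed. As a rough illustration of logit-type smoothing (a sketch, not respy's actual implementation), smaller values of tau push the smoothed probabilities toward a hard argmax over the alternatives' values:

```python
import math

def smoothed_probabilities(values, tau):
    # Logit smoothing: rescale by tau and normalize. Subtracting the
    # maximum first keeps the exponentials numerically stable.
    rescaled = [(v - max(values)) / tau for v in values]
    weights = [math.exp(r) for r in rescaled]
    total = sum(weights)
    return [w / total for w in weights]
```

With a large tau the four alternatives receive nearly equal weight; with a very small tau almost all mass sits on the alternative with the highest value.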

PROGRAM

Key Value Interpretation
debug bool debug mode
version str program version

PARALLELISM

Key Value Interpretation
flag bool parallel executable
procs int number of processors

INTERPOLATION

Key Value Interpretation
points int number of interpolation points
flag bool flag to use interpolation

DERIVATIVES

Key Value Interpretation
version str approximation scheme
eps float step size
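For example, a one-sided (forward-difference) approximation scheme with step size eps computes each derivative as (f(x + eps) - f(x)) / eps. A minimal sketch (not respy's internal code):

```python
def forward_difference(f, x, eps=1e-6):
    # Approximate f'(x) with a one-sided difference of step size eps.
    # Smaller eps reduces truncation error but amplifies rounding error.
    return (f(x + eps) - f(x)) / eps
```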

SCALING

Key Value Interpretation
flag bool apply scaling to parameters
minimum float minimum value for gradient approximation

The implemented optimization algorithms vary with the program’s version. If you request the Python version of the program, you can choose between the scipy implementations of the BFGS (Nocedal and Wright, 2006), POWELL (Powell, 1964), and L-BFGS-B algorithms. Their implementation details are available here. For Fortran, we implemented the BFGS, NEWUOA (Powell, 2004), and BOBYQA algorithms.
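The optimizer is selected via the optimizer key of the ESTIMATION block, and its tuning parameters are supplied in a block carrying the optimizer's name. A hypothetical fragment (the values are placeholders; see the example initialization file for the exact layout) could read:

ESTIMATION
optimizer         SCIPY-BFGS

SCIPY-BFGS
gtol              0.0001
maxiter           100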

SCIPY-BFGS

Key Value Interpretation
gtol float gradient norm must be less than gtol before successful termination
maxiter int maximum number of iterations

SCIPY-POWELL

Key Value Interpretation
maxfun int maximum number of function evaluations to make
ftol float relative error in func(xopt) acceptable for convergence
xtol float line-search error tolerance

SCIPY-LBFGSB

Key Value Interpretation
eps float step size used to numerically approximate the gradient
factr float multiple of the machine precision that determines the relative error in func(xopt) acceptable for convergence
m int maximum number of variable metric corrections used to define the limited-memory matrix
maxiter int maximum number of iterations
maxls int maximum number of line search steps per iteration, default is 20
pgtol float projected gradient norm must be less than pgtol before successful termination

FORT-BFGS

Key Value Interpretation
gtol float gradient norm must be less than gtol before successful termination
maxiter int maximum number of iterations

FORT-NEWUOA

Key Value Interpretation
maxfun int maximum number of function evaluations
npt int number of points for approximation model
rhobeg float starting value for size of trust region
rhoend float minimum value of size for trust region

FORT-BOBYQA

Key Value Interpretation
maxfun int maximum number of function evaluations
npt int number of points for approximation model
rhobeg float starting value for size of trust region
rhoend float minimum value of size for trust region

Constraints for the Optimizer

If you want to keep any parameter fixed at the value you specified (i.e. not estimate this parameter), simply add an exclamation mark after the value. If you want to provide bounds for a constrained optimizer, specify a lower and an upper bound in round brackets. A section of such an .ini file looks as follows:

coeff             -0.049538516229344
coeff              0.020000000000000     !
coeff             -0.037283956168153       (-0.5807488086366478,None)
coeff              0.036340835226155     ! (None,0.661243603948984)

In this example, the first coefficient is free. The second one is fixed at 0.02. The third one will be estimated but has a lower bound. In the fourth case, the parameter is fixed and the bounds are ignored.

If you specify bounds for any free parameter, you have to choose a constrained optimizer such as SCIPY-LBFGSB or FORT-BOBYQA.
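The notation is easy to process programmatically. As a rough illustration (parse_coeff_line is a hypothetical helper, not part of respy), a single coeff line can be decomposed into its value, its fixed flag, and its bounds:

```python
def parse_coeff_line(line):
    # Decompose one 'coeff' line into (value, is_fixed, (lower, upper)).
    tokens = line.split()
    assert tokens[0] == 'coeff'
    value = float(tokens[1])
    is_fixed = '!' in tokens
    bounds = (None, None)
    for token in tokens:
        if token.startswith('('):
            lower, upper = token.strip('()').split(',')
            bounds = (None if lower == 'None' else float(lower),
                      None if upper == 'None' else float(upper))
    return value, is_fixed, bounds
```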

Dataset

To use respy, you need a dataset with the following columns:

  • Identifier: identifies the different individuals in the sample
  • Period: identifies the different rounds of observation for each individual
  • Choice: an integer variable that indicates the labor market choice
    • 1 = Occupation A
    • 2 = Occupation B
    • 3 = Education
    • 4 = Home
  • Earnings: a float variable that indicates how much people are earning. This variable is missing (indicated by a dot) if individuals don’t work.
  • Experience_A: labor market experience in sector A
  • Experience_B: labor market experience in sector B
  • Years_Schooling: years of schooling
  • Lagged_Choice: choice in the period before the model starts. Codes are the same as in Choice.

Datasets for respy are stored in simple text files with space-separated columns. The easiest way to write such a file in Python is to collect all relevant columns in a pandas DataFrame and then store it as follows:

with open('my_data.respy.dat', 'w') as file:
    df.to_string(file, index=False, header=True, na_rep='.')
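Reading such a file back into pandas, whether it is the simulated sample or an observed dataset in the same layout, only requires treating whitespace as the separator and the dot as a missing value. A self-contained round trip, using an in-memory buffer and a toy two-row dataset for illustration:

```python
import io

import pandas as pd

# Toy dataset following the column layout described above.
df = pd.DataFrame({
    'Identifier': [0, 0],
    'Period': [0, 1],
    'Choice': [3, 1],
    'Earnings': [float('nan'), 12000.0],  # missing while in education
})

# Write in the space-separated format, with '.' marking missing values ...
buffer = io.StringIO()
df.to_string(buffer, index=False, header=True, na_rep='.')

# ... and read it back, telling pandas that '.' means missing.
buffer.seek(0)
df_read = pd.read_csv(buffer, sep=r'\s+', na_values='.')
```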

Examples

Let us explore the basic capabilities of the respy package with a couple of examples. All the material is available online.

Simulation and Estimation

We always first initialize an instance of the RespyCls by passing in the path to the initialization file.

from respy import RespyCls

respy_obj = RespyCls('example.ini')

Now we can simulate a sample from the specified model.

respy_obj.simulate()

During the simulation, several files appear in the current working directory. sol.respy.log lets you monitor the progress of the solution algorithm, while sim.respy.log records the progress of the simulation. The simulated dataset with the agents’ choices and state experiences is stored in data.respy.dat, and data.respy.info provides some basic descriptives about the simulated dataset. See our section on Additional Details for more information regarding the output files.

Now that we have simulated some data, we can start an estimation. Here we use the simulated data for the estimation, but you can of course also use other data sources, as long as they follow the layout of the simulated sample. The coefficient values in the initialization file serve as the starting values.

x, crit_val = respy_obj.fit()

This directly returns the values of the coefficients at the final step of the optimizer as well as the value of the criterion function. Some additional files appear in the meantime: monitoring the estimation is best done using est.respy.info, and more details about each evaluation of the criterion function are available in est.respy.log.

We can now simulate a sample using the estimated parameters by updating the instance of the RespyCls.

respy_obj.update_model_paras(x)

respy_obj.simulate()

Recomputing Keane and Wolpin (1994)

Just using the capabilities outlined so far, it is straightforward to recompute some of the key results in the original paper with a simple script.

#!/usr/bin/env python
""" This module recomputes some of the key results of Keane and Wolpin (1994).
"""

from respy import RespyCls

# We can simply iterate over the different model specifications outlined in
# Table 1 of their paper.
for spec in ['kw_data_one.ini', 'kw_data_two.ini', 'kw_data_three.ini']:

    # Process relevant model initialization file
    respy_obj = RespyCls(spec)

    # Let us simulate the datasets discussed on page 658.
    respy_obj.simulate()

    # Start the estimation for the Monte Carlo exercise. For now, we just
    # evaluate the model at the starting values, i.e. maxfun is set to zero
    # in the initialization file.
    respy_obj.unlock()
    respy_obj.set_attr('maxfun', 0)
    respy_obj.lock()

    respy_obj.fit()

In an earlier working paper, Keane and Wolpin (1994b) provide a full account of the choice distributions for all three specifications. The results from the recomputation line up well with their reports.