Tutorial

We now illustrate the basic capabilities of the respy package. We start with the model specification and then turn to some example use cases.

Model Specification

The model is specified in an initialization file. For an example, check out the first parameterization analyzed in Keane and Wolpin (1994) here. Let us discuss each of its elements in more detail.

BASICS

Key Value Interpretation
periods int number of periods
delta float discount factor

Warning

There are two small differences compared to Keane and Wolpin (1994). First, here all coefficients enter the return function with a positive sign, while in the original paper the squared experience terms enter with a negative sign. Second, the order of covariates is fixed across the two occupations; in the original paper, own experience always comes before the other occupation's experience.

OCCUPATION A

Key Value Interpretation
coeff float intercept
coeff float return to schooling
coeff float experience Occupation A, linear
coeff float experience Occupation A, squared
coeff float experience Occupation B, linear
coeff float experience Occupation B, squared

OCCUPATION B

Key Value Interpretation
coeff float intercept
coeff float return to schooling
coeff float experience Occupation A, linear
coeff float experience Occupation A, squared
coeff float experience Occupation B, linear
coeff float experience Occupation B, squared

EDUCATION

Key Value Interpretation
coeff float consumption value
coeff float tuition cost
coeff float adjustment cost
max int maximum level of schooling
start int initial level of schooling

Warning

Again, there is a small difference between this setup and Keane and Wolpin (1994). There is no automatic change in sign for the tuition and adjustment costs. Thus, a $1,000 tuition cost must be specified as -1000.

HOME

Key Value Interpretation
coeff float mean value of non-market alternative

SHOCKS

Key Value Interpretation
coeff float \(\sigma_{1}\)
coeff float \(\sigma_{12}\)
coeff float \(\sigma_{13}\)
coeff float \(\sigma_{14}\)
coeff float \(\sigma_{2}\)
coeff float \(\sigma_{23}\)
coeff float \(\sigma_{24}\)
coeff float \(\sigma_{3}\)
coeff float \(\sigma_{34}\)
coeff float \(\sigma_{4}\)
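The ten coefficients fill the symmetric 4 x 4 shock matrix row by row along its upper triangle, with the diagonal element of each row coming first. As a minimal sketch (the values below are placeholders, not taken from any paper), the matrix can be assembled like this:

```python
# Placeholder values for the ten SHOCKS coefficients, in the order listed
# above: sigma_1, sigma_12, sigma_13, sigma_14, sigma_2, sigma_23,
# sigma_24, sigma_3, sigma_34, sigma_4.
coeffs = [0.4, 0.1, 0.0, 0.0,
          0.5, 0.0, 0.0,
          0.6, 0.2,
          0.7]

# Fill the upper triangle row by row and mirror it to the lower triangle.
shocks = [[0.0] * 4 for _ in range(4)]
k = 0
for i in range(4):
    for j in range(i, 4):
        shocks[i][j] = shocks[j][i] = coeffs[k]
        k += 1
```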

SOLUTION

Key Value Interpretation
draws int number of draws for \(E\max\)
store bool persistent storage of results
seed int random seed for \(E\max\)

SIMULATION

Key Value Interpretation
file str file to print simulated sample
agents int number of simulated agents
seed int random seed for agent experience

ESTIMATION

Key Value Interpretation
file str file to read observed sample
tau float scale parameter for function smoothing
agents int number of agents to read from sample
draws int number of draws for choice probabilities
maxfun int maximum number of function evaluations
seed int random seed for choice probabilities
optimizer str optimizer to use
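The tau parameter controls how strongly the simulated choice probabilities are smoothed. As a rough illustration of logit-type smoothing (a sketch, not respy's actual implementation), smaller values of tau push the smoothed probabilities toward a hard argmax over the alternatives' values:

```python
import math

def smoothed_probabilities(values, tau):
    # Logit smoothing: rescale by tau and normalize. Subtracting the
    # maximum first keeps the exponentials numerically stable.
    rescaled = [(v - max(values)) / tau for v in values]
    weights = [math.exp(r) for r in rescaled]
    total = sum(weights)
    return [w / total for w in weights]
```

With a large tau the four alternatives receive nearly equal weight; with a very small tau almost all mass sits on the alternative with the highest value.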

PROGRAM

Key Value Interpretation
debug bool debug mode
version str program version

PARALLELISM

Key Value Interpretation
flag bool parallel executable
procs int number of processors

INTERPOLATION

Key Value Interpretation
points int number of interpolation points
flag bool flag to use interpolation

DERIVATIVES

Key Value Interpretation
version str approximation scheme
eps float step size
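For example, a one-sided (forward-difference) approximation scheme with step size eps computes each derivative as (f(x + eps) - f(x)) / eps. A minimal sketch (not respy's internal code):

```python
def forward_difference(f, x, eps=1e-6):
    # Approximate f'(x) with a one-sided difference of step size eps.
    # Smaller eps reduces truncation error but amplifies rounding error.
    return (f(x + eps) - f(x)) / eps
```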

SCALING

Key Value Interpretation
flag bool apply scaling to parameters
minimum float minimum value for gradient approximation

The implemented optimization algorithms vary with the program’s version. If you request the Python version of the program, you can choose between the scipy implementations of the BFGS (Nocedal and Wright, 2006), POWELL (Powell, 1964), and L-BFGS-B algorithms. Their implementation details are available here. For Fortran, we implemented the BFGS, NEWUOA (Powell, 2004), and BOBYQA algorithms.
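The optimizer is selected via the optimizer key of the ESTIMATION block, and its tuning parameters are supplied in a block carrying the optimizer's name. A hypothetical fragment (the values are placeholders; see the example initialization file for the exact layout) could read:

ESTIMATION
optimizer         SCIPY-BFGS

SCIPY-BFGS
gtol              0.0001
maxiter           100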

SCIPY-BFGS

Key Value Interpretation
gtol float gradient norm must be less than gtol before successful termination
maxiter int maximum number of iterations

SCIPY-POWELL

Key Value Interpretation
maxfun int maximum number of function evaluations to make
ftol float relative error in func(xopt) acceptable for convergence
xtol float line-search error tolerance

SCIPY-LBFGSB

Key Value Interpretation
eps float step size used to numerically approximate the gradient
factr float multiple of the machine precision that determines the relative error in func(xopt) acceptable for convergence
m int maximum number of variable metric corrections used to define the limited-memory matrix
maxiter int maximum number of iterations
maxls int maximum number of line search steps per iteration, default is 20
pgtol float projected gradient norm must be less than pgtol before successful termination

FORT-BFGS

Key Value Interpretation
gtol float gradient norm must be less than gtol before successful termination
maxiter int maximum number of iterations

FORT-NEWUOA

Key Value Interpretation
maxfun int maximum number of function evaluations
npt int number of points for approximation model
rhobeg float starting value for size of trust region
rhoend float minimum value of size for trust region

FORT-BOBYQA

Key Value Interpretation
maxfun int maximum number of function evaluations
npt int number of points for approximation model
rhobeg float starting value for size of trust region
rhoend float minimum value of size for trust region

Constraints for the Optimizer

If you want to keep any parameter fixed at the value you specified (i.e. not estimate this parameter), simply add an exclamation mark after the value. If you want to provide bounds for a constrained optimizer, specify a lower and an upper bound in round brackets. A section of such an .ini file looks as follows:

coeff             -0.049538516229344
coeff              0.020000000000000     !
coeff             -0.037283956168153       (-0.5807488086366478,None)
coeff              0.036340835226155     ! (None,0.661243603948984)

In this example, the first coefficient is free. The second one is fixed at 0.02. The third one will be estimated but has a lower bound. In the fourth case, the parameter is fixed and the bounds are ignored.

If you specify bounds for any free parameter, you have to choose a constrained optimizer such as SCIPY-LBFGSB or FORT-BOBYQA.
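The notation is easy to process programmatically. As a rough illustration (parse_coeff_line is a hypothetical helper, not part of respy), a single coeff line can be decomposed into its value, its fixed flag, and its bounds:

```python
def parse_coeff_line(line):
    # Decompose one 'coeff' line into (value, is_fixed, (lower, upper)).
    tokens = line.split()
    assert tokens[0] == 'coeff'
    value = float(tokens[1])
    is_fixed = '!' in tokens
    bounds = (None, None)
    for token in tokens:
        if token.startswith('('):
            lower, upper = token.strip('()').split(',')
            bounds = (None if lower == 'None' else float(lower),
                      None if upper == 'None' else float(upper))
    return value, is_fixed, bounds
```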

Dataset

To use respy, you need a dataset with the following columns:

  • Identifier: identifies the different individuals in the sample
  • Period: identifies the different rounds of observation for each individual
  • Choice: an integer variable that indicates the labor market choice
    • 1 = Occupation A
    • 2 = Occupation B
    • 3 = Education
    • 4 = Home
  • Earnings: a float variable that indicates how much people are earning. This variable is missing (indicated by a dot) if individuals don’t work.
  • Experience_A: labor market experience in sector A
  • Experience_B: labor market experience in sector B
  • Years_Schooling: years of schooling
  • Lagged_Choice: choice in the period before the model starts. Codes are the same as in Choice.

Datasets for respy are stored in simple text files with space-separated columns. The easiest way to write such a file in Python is to collect all relevant columns in a pandas DataFrame and then store it as follows:

with open('my_data.respy.dat', 'w') as file:
    df.to_string(file, index=False, header=True, na_rep='.')
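Reading such a file back into pandas, whether it is the simulated sample or an observed dataset in the same layout, only requires treating whitespace as the separator and the dot as a missing value. A self-contained round trip, using an in-memory buffer and a toy two-row dataset for illustration:

```python
import io

import pandas as pd

# Toy dataset following the column layout described above.
df = pd.DataFrame({
    'Identifier': [0, 0],
    'Period': [0, 1],
    'Choice': [3, 1],
    'Earnings': [float('nan'), 12000.0],  # missing while in education
})

# Write in the space-separated format, with '.' marking missing values ...
buffer = io.StringIO()
df.to_string(buffer, index=False, header=True, na_rep='.')

# ... and read it back, telling pandas that '.' means missing.
buffer.seek(0)
df_read = pd.read_csv(buffer, sep=r'\s+', na_values='.')
```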

Examples

Let us explore the basic capabilities of the respy package with a couple of examples. All the material is available online.

Simulation and Estimation

We always first initialize an instance of the RespyCls by passing in the path to the initialization file.

from respy import RespyCls

respy_obj = RespyCls('example.ini')

Now we can simulate a sample from the specified model.

respy_obj.simulate()

During the simulation, several files appear in the current working directory. sol.respy.log lets you monitor the progress of the solution algorithm, while sim.respy.log records the progress of the simulation. The simulated dataset with the agents’ choices and state experiences is stored in data.respy.dat, and data.respy.info provides some basic descriptives about the simulated dataset. See our section on Additional Details for more information regarding the output files.

Now that we have simulated some data, we can start an estimation. Here we use the simulated data for the estimation, but you can of course also use other data sources, as long as they follow the layout of the simulated sample. The coefficient values in the initialization file serve as the starting values.

x, crit_val = respy_obj.fit()

This directly returns the values of the coefficients at the final step of the optimizer as well as the value of the criterion function. Some additional files appear in the meantime: monitoring the estimation is best done using est.respy.info, and more details about each evaluation of the criterion function are available in est.respy.log.

We can now simulate a sample using the estimated parameters by updating the instance of the RespyCls.

respy_obj.update_model_paras(x)

respy_obj.simulate()

Recomputing Keane and Wolpin (1994)

Just using the capabilities outlined so far, it is straightforward to recompute some of the key results in the original paper with a simple script.

#!/usr/bin/env python
""" This module recomputes some of the key results of Keane and Wolpin (1994).
"""

from respy import RespyCls

# We can simply iterate over the different model specifications outlined in
# Table 1 of their paper.
for spec in ['kw_data_one.ini', 'kw_data_two.ini', 'kw_data_three.ini']:

    # Process relevant model initialization file
    respy_obj = RespyCls(spec)

    # Let us simulate the datasets discussed on page 658.
    respy_obj.simulate()

    # Start the estimation for the Monte Carlo exercise. For now, we just
    # evaluate the model at the starting values, i.e. maxfun is set to zero
    # in the initialization file.
    respy_obj.unlock()
    respy_obj.set_attr('maxfun', 0)
    respy_obj.lock()

    respy_obj.fit()

In an earlier working paper, Keane and Wolpin (1994b) provide a full account of the choice distributions for all three specifications. The results from the recomputation line up well with their reports.