Tutorial
========

We now illustrate the basic capabilities of the ``respy`` package. We start
with the model specification and then turn to some example use cases.

Model Specification
-------------------

The model is specified in an initialization file. For an example, check out
the first parameterization analyzed in Keane and Wolpin (1994) `here `__. Let
us discuss each of its elements in more detail.

**BASICS**

=======  ======  ==================
Key      Value   Interpretation
=======  ======  ==================
periods  int     number of periods
delta    float   discount factor
=======  ======  ==================

.. Warning::

    There are two small differences compared to Keane and Wolpin (1994).
    First, all coefficients enter the return function with a positive sign,
    while the squared terms enter with a minus in the original paper. Second,
    the order of covariates is fixed across the two occupations. In the
    original paper, own experience always comes before other experience.

**OCCUPATION A**

=======  ======  =================================
Key      Value   Interpretation
=======  ======  =================================
coeff    float   intercept
coeff    float   return to schooling
coeff    float   experience Occupation A, linear
coeff    float   experience Occupation A, squared
coeff    float   experience Occupation B, linear
coeff    float   experience Occupation B, squared
=======  ======  =================================

**OCCUPATION B**

=======  ======  =================================
Key      Value   Interpretation
=======  ======  =================================
coeff    float   intercept
coeff    float   return to schooling
coeff    float   experience Occupation A, linear
coeff    float   experience Occupation A, squared
coeff    float   experience Occupation B, linear
coeff    float   experience Occupation B, squared
=======  ======  =================================

**EDUCATION**

=======  ======  ===========================
Key      Value   Interpretation
=======  ======  ===========================
coeff    float   consumption value
coeff    float   tuition cost
coeff    float   adjustment cost
max      int     maximum level of schooling
start    int     initial level of schooling
=======  ======  ===========================

.. Warning::

    Again, there is a small difference between this setup and Keane and
    Wolpin (1994). There is no automatic change in sign for the tuition and
    adjustment costs. Thus, a \$1,000 tuition cost must be specified as -1000.
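To see how the six OCCUPATION A coefficients listed above enter the model,
here is a sketch of the occupational return (log-wage) equation in the spirit
of Keane and Wolpin (1994), written with the positive-sign convention from
the first warning. The symbols :math:`s_t`, :math:`x_{A,t}` and
:math:`x_{B,t}` (years of schooling and accumulated experience in Occupations
A and B) are our own notation, not keywords from the initialization file:

.. math::

    \ln w_{A,t} = \beta_{A,0} + \beta_{A,1} s_t + \beta_{A,2} x_{A,t} + \beta_{A,3} x^2_{A,t} + \beta_{A,4} x_{B,t} + \beta_{A,5} x^2_{B,t}

The six OCCUPATION B coefficients play the analogous role, with the
covariates appearing in the same fixed order.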
**HOME**

=======  ======  =====================================
Key      Value   Interpretation
=======  ======  =====================================
coeff    float   mean value of non-market alternative
=======  ======  =====================================

**SHOCKS**

=======  ======  ====================
Key      Value   Interpretation
=======  ======  ====================
coeff    float   :math:`\sigma_{1}`
coeff    float   :math:`\sigma_{12}`
coeff    float   :math:`\sigma_{13}`
coeff    float   :math:`\sigma_{14}`
coeff    float   :math:`\sigma_{2}`
coeff    float   :math:`\sigma_{23}`
coeff    float   :math:`\sigma_{24}`
coeff    float   :math:`\sigma_{3}`
coeff    float   :math:`\sigma_{34}`
coeff    float   :math:`\sigma_{4}`
=======  ======  ====================

**SOLUTION**

=======  ======  ==================================
Key      Value   Interpretation
=======  ======  ==================================
draws    int     number of draws for :math:`E\max`
store    bool    persistent storage of results
seed     int     random seed for :math:`E\max`
=======  ======  ==================================

**SIMULATION**

=======  ======  =================================
Key      Value   Interpretation
=======  ======  =================================
file     str     file to print simulated sample
agents   int     number of simulated agents
seed     int     random seed for agent experience
=======  ======  =================================

**ESTIMATION**

=========  ======  ==========================================
Key        Value   Interpretation
=========  ======  ==========================================
file       str     file to read observed sample
tau        float   scale parameter for function smoothing
agents     int     number of agents to read from sample
draws      int     number of draws for choice probabilities
maxfun     int     maximum number of function evaluations
seed       int     random seed for choice probabilities
optimizer  str     optimizer to use
=========  ======  ==========================================

**PROGRAM**

=======  ======  ================
Key      Value   Interpretation
=======  ======  ================
debug    bool    debug mode
version  str     program version
=======  ======  ================

**PARALLELISM**

=======  ======  =====================
Key      Value   Interpretation
=======  ======  =====================
flag     bool    parallel executable
procs    int     number of processors
=======  ======  =====================

**INTERPOLATION**

=======  ======  ===============================
Key      Value   Interpretation
=======  ======  ===============================
points   int     number of interpolation points
flag     bool    flag to use interpolation
=======  ======  ===============================

**DERIVATIVES**

=======  ======  =====================
Key      Value   Interpretation
=======  ======  =====================
version  str     approximation scheme
eps      float   step size
=======  ======  =====================

**SCALING**

=======  ======  ==========================================
Key      Value   Interpretation
=======  ======  ==========================================
flag     bool    apply scaling to parameters
minimum  float   minimum value for gradient approximation
=======  ======  ==========================================

The implemented optimization algorithms vary with the program's version. If
you request the Python version of the program, you can choose from the
``scipy`` implementations of the BFGS (Nocedal and Wright, 2006) and POWELL
(Powell, 1964) algorithms, as well as L-BFGS-B for bound-constrained
problems. Their implementation details are available `here `__. For Fortran,
we implemented the BFGS and NEWUOA (Powell, 2004) algorithms, as well as
BOBYQA for bound-constrained problems.
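As a hedged illustration of how an optimizer is selected (the values below
are made up, only the relevant keys are shown, and the exact layout should be
taken from the example initialization file linked above), the ``optimizer``
key in the ESTIMATION block presumably names one of the option blocks
documented next, for instance:

.. code::

    ESTIMATION

    optimizer SCIPY-POWELL
    maxfun 1000

    SCIPY-POWELL

    maxfun 1000
    ftol 0.000001
    xtol 0.0001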
**SCIPY-BFGS**

=======  ======  ====================================================================
Key      Value   Interpretation
=======  ======  ====================================================================
gtol     float   gradient norm must be less than gtol before successful termination
maxiter  int     maximum number of iterations
=======  ======  ====================================================================

**SCIPY-POWELL**

=======  ======  =========================================================
Key      Value   Interpretation
=======  ======  =========================================================
maxfun   int     maximum number of function evaluations to make
ftol     float   relative error in func(xopt) acceptable for convergence
xtol     float   line-search error tolerance
=======  ======  =========================================================

**SCIPY-LBFGSB**

=======  ======  =========================================================================================
Key      Value   Interpretation
=======  ======  =========================================================================================
eps      float   step size used when approx_grad is True, for numerically calculating the gradient
factr    float   multiple of the default machine precision used to determine the relative error in func(xopt) acceptable for convergence
m        int     maximum number of variable metric corrections used to define the limited memory matrix
maxiter  int     maximum number of iterations
maxls    int     maximum number of line search steps per iteration (default is 20)
pgtol    float   gradient norm must be less than pgtol before successful termination
=======  ======  =========================================================================================

**FORT-BFGS**

=======  ======  ====================================================================
Key      Value   Interpretation
=======  ======  ====================================================================
gtol     float   gradient norm must be less than gtol before successful termination
maxiter  int     maximum number of iterations
=======  ======  ====================================================================

**FORT-NEWUOA**

=======  ======  ==========================================
Key      Value   Interpretation
=======  ======  ==========================================
maxfun   float   maximum number of function evaluations
npt      int     number of points for approximation model
rhobeg   float   starting value for size of trust region
rhoend   float   minimum value of size for trust region
=======  ======  ==========================================

**FORT-BOBYQA**

=======  ======  ==========================================
Key      Value   Interpretation
=======  ======  ==========================================
maxfun   float   maximum number of function evaluations
npt      int     number of points for approximation model
rhobeg   float   starting value for size of trust region
rhoend   float   minimum value of size for trust region
=======  ======  ==========================================

Constraints for the Optimizer
-----------------------------

If you want to keep any parameter fixed at the value you specified (i.e. not
estimate this parameter), you can simply add an exclamation mark after the
value. If you want to provide bounds for a constrained optimizer, you can
specify a lower and upper bound in round brackets. A section of such an
initialization file would look as follows:

.. code::

    coeff -0.049538516229344
    coeff 0.020000000000000 !
    coeff -0.037283956168153 (-0.5807488086366478,None)
    coeff 0.036340835226155 ! (None,0.661243603948984)

In this example, the first coefficient is free. The second one is fixed at
0.02. The third one will be estimated but has a lower bound. In the fourth
case, the parameter is fixed and the bounds are ignored. If you specify
bounds for any free parameter, you have to choose a constrained optimizer
such as SCIPY-LBFGSB or FORT-BOBYQA.
Dataset
-------

To use ``respy``, you need a dataset with the following columns:

- Identifier: identifies the different individuals in the sample
- Period: identifies the different rounds of observation for each individual
- Choice: an integer variable that indicates the labor market choice

  - 1 = Occupation A
  - 2 = Occupation B
  - 3 = Education
  - 4 = Home

- Earnings: a float variable that indicates how much people are earning. This
  variable is missing (indicated by a dot) if individuals don't work.
- Experience_A: labor market experience in sector A
- Experience_B: labor market experience in sector B
- Years_Schooling: years of schooling
- Lagged_Choice: choice in the period before the model starts. Codes are the
  same as in Choice.

Datasets for ``respy`` are stored in simple text files, where columns are
separated by spaces. The easiest way to write such a text file in Python is
to create a pandas DataFrame with all relevant columns and then store it in
the following way:

.. code::

    with open('my_data.respy.dat', 'w') as file:
        df.to_string(file, index=False, header=True, na_rep='.')

A fuller, illustrative sketch of assembling such a file is shown at the end
of this tutorial.

Examples
--------

Let us explore the basic capabilities of the ``respy`` package with a couple
of examples. All the material is available `online `__.

**Simulation and Estimation**

We always first initialize an instance of the ``RespyCls`` by passing in the
path to the initialization file.

::

    from respy import RespyCls

    respy_obj = RespyCls('example.ini')

Now we can simulate a sample from the specified model.

::

    respy_obj.simulate()

During the simulation, several files will appear in the current working
directory. ``sol.respy.log`` allows you to monitor the progress of the
solution algorithm, while ``sim.respy.log`` records the progress of the
simulation. The simulated dataset with the agents' choices and state
experiences is stored in ``data.respy.dat``, while ``data.respy.info``
provides some basic descriptives about the simulated dataset. See our section
on :ref:`Additional Details ` for more information regarding the output
files.

Now that we have simulated some data, we can start an estimation. Here we are
using the simulated data for the estimation. However, you can of course also
use other data sources. Just make sure they follow the layout of the
simulated sample. The coefficient values in the initialization file serve as
the starting values.

::

    x, crit_val = respy_obj.fit()

This directly returns the value of the coefficients at the final step of the
optimizer as well as the value of the criterion function. However, some
additional files appear in the meantime. Monitoring the estimation is best
done using ``est.respy.info``, and more details about each evaluation of the
criterion function are available in ``est.respy.log``.

We can now simulate a sample using the estimated parameters by updating the
instance of the ``RespyCls``.

::

    respy_obj.update_model_paras(x)

    respy_obj.simulate()

**Recomputing Keane and Wolpin (1994)**

Just using the capabilities outlined so far, it is straightforward to
recompute some of the key results in the original paper with a simple script.

::

    #!/usr/bin/env python
    """ This module recomputes some of the key results of Keane and Wolpin (1994)."""

    from respy import RespyCls

    # We can simply iterate over the different model specifications outlined
    # in Table 1 of their paper.
    for spec in ['kw_data_one.ini', 'kw_data_two.ini', 'kw_data_three.ini']:

        # Process the relevant model initialization file.
        respy_obj = RespyCls(spec)

        # Let us simulate the datasets discussed on page 658.
        respy_obj.simulate()

        # We would now start the estimations for the Monte Carlo exercises.
        # For now, we just evaluate the model at the starting values, i.e.
        # we set maxfun to zero.
        respy_obj.unlock()
        respy_obj.set_attr('maxfun', 0)
        respy_obj.lock()

        respy_obj.fit()

In an earlier `working paper `_, Keane and Wolpin (1994b) provide a full
account of the choice distributions for all three specifications. The results
from the recomputation line up well with their reports.
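Finally, to make the dataset layout from the Dataset section concrete, here
is a minimal sketch of assembling and storing such a file with pandas. The
single individual and all values are purely illustrative, and the columns are
simply listed in the order given above:

::

    import pandas as pd

    # Two periods of one hypothetical individual: a year of schooling
    # followed by work in Occupation A. All values are made up.
    df = pd.DataFrame({
        'Identifier': [0, 0],
        'Period': [0, 1],
        'Choice': [3, 1],               # 3 = Education, 1 = Occupation A
        'Earnings': [None, 15000.0],    # missing ('.') while not working
        'Experience_A': [0, 0],
        'Experience_B': [0, 0],
        'Years_Schooling': [10, 11],
        'Lagged_Choice': [3, 3],
    })

    # Write the space-separated text file in the layout described above.
    with open('my_data.respy.dat', 'w') as file:
        df.to_string(file, index=False, header=True, na_rep='.')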