Tutorial
========

We now illustrate the basic capabilities of the ``respy`` package. We start
with the model specification and then turn to some example use cases.

Model Specification
-------------------

The model is specified in an initialization file. For an example, check out
the first parameterization analyzed in Keane and Wolpin (1994) `here `__. Let
us discuss each of its elements in more detail.

**BASICS**

=======  ======  ==================
Key      Value   Interpretation
=======  ======  ==================
periods  int     number of periods
delta    float   discount factor
=======  ======  ==================

.. Warning::

    There are two small differences compared to Keane and Wolpin (1994).
    First, all coefficients enter the return function with a positive sign,
    while the squared terms enter with a minus in the original paper. Second,
    the order of covariates is fixed across the two occupations. In the
    original paper, own experience always comes before other experience.

**OCCUPATION A**

=======  ======  =================================
Key      Value   Interpretation
=======  ======  =================================
coeff    float   intercept
coeff    float   return to schooling
coeff    float   experience Occupation A, linear
coeff    float   experience Occupation A, squared
coeff    float   experience Occupation B, linear
coeff    float   experience Occupation B, squared
=======  ======  =================================

**OCCUPATION B**

=======  ======  =================================
Key      Value   Interpretation
=======  ======  =================================
coeff    float   intercept
coeff    float   return to schooling
coeff    float   experience Occupation A, linear
coeff    float   experience Occupation A, squared
coeff    float   experience Occupation B, linear
coeff    float   experience Occupation B, squared
=======  ======  =================================

**EDUCATION**

=======  ======  ===========================
Key      Value   Interpretation
=======  ======  ===========================
coeff    float   consumption value
coeff    float   tuition cost
coeff    float   adjustment cost
max      int     maximum level of schooling
start    int     initial level of schooling
=======  ======  ===========================

.. Warning::

    Again, there is a small difference between this setup and Keane and
    Wolpin (1994). There is no automatic change in sign for the tuition and
    adjustment costs. Thus, a \$1,000 tuition cost must be specified as -1000.
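To see how the six OCCUPATION A coefficients listed above enter the model,
here is a sketch of the occupational return (log-wage) equation in the spirit
of Keane and Wolpin (1994), written with the positive-sign convention from
the first warning. The symbols :math:`s_t`, :math:`x_{A,t}` and
:math:`x_{B,t}` (years of schooling and accumulated experience in Occupations
A and B) are our own notation, not keywords from the initialization file:

.. math::

    \ln w_{A,t} = \beta_{A,0} + \beta_{A,1} s_t + \beta_{A,2} x_{A,t} + \beta_{A,3} x^2_{A,t} + \beta_{A,4} x_{B,t} + \beta_{A,5} x^2_{B,t}

The six OCCUPATION B coefficients play the analogous role, with the
covariates appearing in the same fixed order.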
**HOME**

=======  ======  =====================================
Key      Value   Interpretation
=======  ======  =====================================
coeff    float   mean value of non-market alternative
=======  ======  =====================================

**SHOCKS**

=======  ======  ====================
Key      Value   Interpretation
=======  ======  ====================
coeff    float   :math:`\sigma_{1}`
coeff    float   :math:`\sigma_{12}`
coeff    float   :math:`\sigma_{13}`
coeff    float   :math:`\sigma_{14}`
coeff    float   :math:`\sigma_{2}`
coeff    float   :math:`\sigma_{23}`
coeff    float   :math:`\sigma_{24}`
coeff    float   :math:`\sigma_{3}`
coeff    float   :math:`\sigma_{34}`
coeff    float   :math:`\sigma_{4}`
=======  ======  ====================

**SOLUTION**

=======  ======  ==================================
Key      Value   Interpretation
=======  ======  ==================================
draws    int     number of draws for :math:`E\max`
store    bool    persistent storage of results
seed     int     random seed for :math:`E\max`
=======  ======  ==================================

**SIMULATION**

=======  ======  =================================
Key      Value   Interpretation
=======  ======  =================================
file     str     file to print simulated sample
agents   int     number of simulated agents
seed     int     random seed for agent experience
=======  ======  =================================

**ESTIMATION**

=========  ======  ==========================================
Key        Value   Interpretation
=========  ======  ==========================================
file       str     file to read observed sample
tau        float   scale parameter for function smoothing
agents     int     number of agents to read from sample
draws      int     number of draws for choice probabilities
maxfun     int     maximum number of function evaluations
seed       int     random seed for choice probabilities
optimizer  str     optimizer to use
=========  ======  ==========================================

**PROGRAM**

=======  ======  ================
Key      Value   Interpretation
=======  ======  ================
debug    bool    debug mode
version  str     program version
=======  ======  ================

**PARALLELISM**

=======  ======  =====================
Key      Value   Interpretation
=======  ======  =====================
flag     bool    parallel executable
procs    int     number of processors
=======  ======  =====================

**INTERPOLATION**

=======  ======  ===============================
Key      Value   Interpretation
=======  ======  ===============================
points   int     number of interpolation points
flag     bool    flag to use interpolation
=======  ======  ===============================

**DERIVATIVES**

=======  ======  =====================
Key      Value   Interpretation
=======  ======  =====================
version  str     approximation scheme
eps      float   step size
=======  ======  =====================

**SCALING**

=======  ======  ==========================================
Key      Value   Interpretation
=======  ======  ==========================================
flag     bool    apply scaling to parameters
minimum  float   minimum value for gradient approximation
=======  ======  ==========================================

The implemented optimization algorithms vary with the program's version. If
you request the Python version of the program, you can choose from the
``scipy`` implementations of the BFGS (Nocedal and Wright, 2006) and POWELL
(Powell, 1964) algorithms, as well as L-BFGS-B for bound-constrained
problems. Their implementation details are available `here `__. For Fortran,
we implemented the BFGS and NEWUOA (Powell, 2004) algorithms, as well as
BOBYQA for bound-constrained problems.
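As a hedged illustration of how an optimizer is selected (the values below
are made up, only the relevant keys are shown, and the exact layout should be
taken from the example initialization file linked above), the ``optimizer``
key in the ESTIMATION block presumably names one of the option blocks
documented next, for instance:

.. code::

    ESTIMATION

    optimizer SCIPY-POWELL
    maxfun 1000

    SCIPY-POWELL

    maxfun 1000
    ftol 0.000001
    xtol 0.0001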
**SCIPY-BFGS**

=======  ======  ====================================================================
Key      Value   Interpretation
=======  ======  ====================================================================
gtol     float   gradient norm must be less than gtol before successful termination
maxiter  int     maximum number of iterations
=======  ======  ====================================================================

**SCIPY-POWELL**

=======  ======  =========================================================
Key      Value   Interpretation
=======  ======  =========================================================
maxfun   int     maximum number of function evaluations to make
ftol     float   relative error in func(xopt) acceptable for convergence
xtol     float   line-search error tolerance
=======  ======  =========================================================

**SCIPY-LBFGSB**

=======  ======  =========================================================================================
Key      Value   Interpretation
=======  ======  =========================================================================================
eps      float   step size used when approx_grad is True, for numerically calculating the gradient
factr    float   multiple of the default machine precision used to determine the relative error in func(xopt) acceptable for convergence
m        int     maximum number of variable metric corrections used to define the limited memory matrix
maxiter  int     maximum number of iterations
maxls    int     maximum number of line search steps per iteration (default is 20)
pgtol    float   gradient norm must be less than pgtol before successful termination
=======  ======  =========================================================================================

**FORT-BFGS**

=======  ======  ====================================================================
Key      Value   Interpretation
=======  ======  ====================================================================
gtol     float   gradient norm must be less than gtol before successful termination
maxiter  int     maximum number of iterations
=======  ======  ====================================================================

**FORT-NEWUOA**

=======  ======  ==========================================
Key      Value   Interpretation
=======  ======  ==========================================
maxfun   float   maximum number of function evaluations
npt      int     number of points for approximation model
rhobeg   float   starting value for size of trust region
rhoend   float   minimum value of size for trust region
=======  ======  ==========================================

**FORT-BOBYQA**

=======  ======  ==========================================
Key      Value   Interpretation
=======  ======  ==========================================
maxfun   float   maximum number of function evaluations
npt      int     number of points for approximation model
rhobeg   float   starting value for size of trust region
rhoend   float   minimum value of size for trust region
=======  ======  ==========================================

Constraints for the Optimizer
-----------------------------

If you want to keep any parameter fixed at the value you specified (i.e. not
estimate this parameter), you can simply add an exclamation mark after the
value. If you want to provide bounds for a constrained optimizer, you can
specify a lower and upper bound in round brackets. A section of such an
initialization file would look as follows:

.. code::

    coeff -0.049538516229344
    coeff 0.020000000000000 !
    coeff -0.037283956168153 (-0.5807488086366478,None)
    coeff 0.036340835226155 ! (None,0.661243603948984)

In this example, the first coefficient is free. The second one is fixed at
0.02. The third one will be estimated but has a lower bound. In the fourth
case, the parameter is fixed and the bounds are ignored. If you specify
bounds for any free parameter, you have to choose a constrained optimizer
such as SCIPY-LBFGSB or FORT-BOBYQA.
Dataset
-------

To use ``respy``, you need a dataset with the following columns:

- Identifier: identifies the different individuals in the sample
- Period: identifies the different rounds of observation for each individual
- Choice: an integer variable that indicates the labor market choice

  - 1 = Occupation A
  - 2 = Occupation B
  - 3 = Education
  - 4 = Home

- Earnings: a float variable that indicates how much people are earning. This
  variable is missing (indicated by a dot) if individuals don't work.
- Experience_A: labor market experience in sector A
- Experience_B: labor market experience in sector B
- Years_Schooling: years of schooling
- Lagged_Choice: choice in the period before the model starts. Codes are the
  same as in Choice.

Datasets for ``respy`` are stored in simple text files, where columns are
separated by spaces. The easiest way to write such a text file in Python is
to create a pandas DataFrame with all relevant columns and then store it in
the following way:

.. code::

    with open('my_data.respy.dat', 'w') as file:
        df.to_string(file, index=False, header=True, na_rep='.')

A fuller, illustrative sketch of assembling such a file is shown at the end
of this tutorial.

Examples
--------

Let us explore the basic capabilities of the ``respy`` package with a couple
of examples. All the material is available `online `__.

**Simulation and Estimation**

We always first initialize an instance of the ``RespyCls`` by passing in the
path to the initialization file.

::

    from respy import RespyCls

    respy_obj = RespyCls('example.ini')

Now we can simulate a sample from the specified model.

::

    respy_obj.simulate()

During the simulation, several files will appear in the current working
directory. ``sol.respy.log`` allows you to monitor the progress of the
solution algorithm, while ``sim.respy.log`` records the progress of the
simulation. The simulated dataset with the agents' choices and state
experiences is stored in ``data.respy.dat``, while ``data.respy.info``
provides some basic descriptives about the simulated dataset. See our section
on :ref:`Additional Details ` for more information regarding the output
files.

Now that we have simulated some data, we can start an estimation. Here we are
using the simulated data for the estimation. However, you can of course also
use other data sources. Just make sure they follow the layout of the
simulated sample. The coefficient values in the initialization file serve as
the starting values.

::

    x, crit_val = respy_obj.fit()

This directly returns the value of the coefficients at the final step of the
optimizer as well as the value of the criterion function. However, some
additional files appear in the meantime. Monitoring the estimation is best
done using ``est.respy.info``, and more details about each evaluation of the
criterion function are available in ``est.respy.log``.

We can now simulate a sample using the estimated parameters by updating the
instance of the ``RespyCls``.

::

    respy_obj.update_model_paras(x)

    respy_obj.simulate()

**Recomputing Keane and Wolpin (1994)**

Just using the capabilities outlined so far, it is straightforward to
recompute some of the key results in the original paper with a simple script.

::

    #!/usr/bin/env python
    """ This module recomputes some of the key results of Keane and Wolpin (1994)."""

    from respy import RespyCls

    # We can simply iterate over the different model specifications outlined
    # in Table 1 of their paper.
    for spec in ['kw_data_one.ini', 'kw_data_two.ini', 'kw_data_three.ini']:

        # Process the relevant model initialization file.
        respy_obj = RespyCls(spec)

        # Let us simulate the datasets discussed on page 658.
        respy_obj.simulate()

        # We would now start the estimations for the Monte Carlo exercises.
        # For now, we just evaluate the model at the starting values, i.e.
        # we set maxfun to zero.
        respy_obj.unlock()
        respy_obj.set_attr('maxfun', 0)
        respy_obj.lock()

        respy_obj.fit()

In an earlier `working paper `_, Keane and Wolpin (1994b) provide a full
account of the choice distributions for all three specifications. The results
from the recomputation line up well with their reports.
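Finally, to make the dataset layout from the Dataset section concrete, here
is a minimal sketch of assembling and storing such a file with pandas. The
single individual and all values are purely illustrative, and the columns are
simply listed in the order given above:

::

    import pandas as pd

    # Two periods of one hypothetical individual: a year of schooling
    # followed by work in Occupation A. All values are made up.
    df = pd.DataFrame({
        'Identifier': [0, 0],
        'Period': [0, 1],
        'Choice': [3, 1],               # 3 = Education, 1 = Occupation A
        'Earnings': [None, 15000.0],    # missing ('.') while not working
        'Experience_A': [0, 0],
        'Experience_B': [0, 0],
        'Years_Schooling': [10, 11],
        'Lagged_Choice': [3, 3],
    })

    # Write the space-separated text file in the layout described above.
    with open('my_data.respy.dat', 'w') as file:
        df.to_string(file, index=False, header=True, na_rep='.')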