41. Machine Learning a Ramsey Plan#

41.1. Introduction#

This lecture uses what we call a machine learning approach to compute a Ramsey plan for a version of a model of Calvo [Calvo, 1978].

We use another approach to compute a Ramsey plan for Calvo’s model in another quantecon lecture Time Inconsistency of Ramsey Plans.

The Time Inconsistency of Ramsey Plans lecture uses an analytic approach based on dynamic programming squared to guide computations.

Dynamic programming squared provides information about the structure of mathematical objects in terms of which a Ramsey plan can be represented recursively.

Using that information paves the way to computing a Ramsey plan efficiently.

Included in the structural information that dynamic programming squared provides in quantecon lecture Time Inconsistency of Ramsey Plans are

  • a state variable that confronts a continuation Ramsey planner, and

  • two Bellman equations

    • one that describes the behavior of the representative agent

    • another that describes decision problems of a Ramsey planner and of a continuation Ramsey planner

In this lecture, we approach the Ramsey planner in a less sophisticated way that proceeds without knowing the mathematical structure imparted by dynamic programming squared.

We simply choose a pair of infinite sequences of real numbers that maximizes a Ramsey planner’s objective function.

The pair consists of

  • a sequence \(\vec \theta\) of inflation rates

  • a sequence \(\vec \mu\) of money growth rates

Because it fails to take advantage of the structure recognized by dynamic programming squared and, relative to the dynamic programming squared approach, proliferates parameters, we take the liberty of calling this a machine learning approach.

In this respect, our approach resembles what other machine learning algorithms do.

Comparing the calculations in this lecture with those in our sister lecture Time Inconsistency of Ramsey Plans provides us with a laboratory that can help us appreciate promises and limits of machine learning approaches more generally.

In this lecture, we’ll actually deploy two machine learning approaches.

  • the first is really lazy

    • it writes a Python function that computes the Ramsey planner’s objective as a function of a money growth rate sequence and hands it over to a gradient descent optimizer

  • the second is less lazy

    • it exerts the mental effort required to express the Ramsey planner’s objective as an affine quadratic form in \(\vec \mu\), computes first-order conditions for an optimum, arranges them into a system of simultaneous linear equations for \(\vec \mu\) and then \(\vec \theta\), then solves them.

Each of these machine learning (ML) approaches recovers the same Ramsey plan that we compute in quantecon lecture Time Inconsistency of Ramsey Plans by using dynamic programming squared.

However, they conceal the recursive structure of the Ramsey plan.

That recursive structure lies hidden within some of the objects calculated by our ML approaches.

Nevertheless, we can ferret out some of that structure by asking the right questions.

We pose those questions at the end of this lecture and answer them by running some linear regressions on components of \(\vec \mu, \vec \theta\).

Human intelligence, not the artificial intelligence deployed in our machine learning approach, is a key input into choosing which regressions to run.

41.2. The Model#

We study a linear-quadratic version of a model that Guillermo Calvo [Calvo, 1978] used to illustrate the time inconsistency of optimal government plans.

The model focuses on intertemporal tradeoffs between

  • utility that anticipations of future deflation deliver to a representative agent by lowering the agent’s cost of holding real money balances and thereby increasing the agent’s liquidity, as ultimately measured by the agent’s holdings of real money balances, and

  • social costs associated with the distorting taxes that a government levies to acquire the paper money that it destroys in order to generate prospective deflation

The model features

  • rational expectations

  • costly government actions at all dates \(t \geq 1\) that increase household utilities at dates before \(t\)

The model combines ideas from papers by Cagan [Cagan, 1956], Sargent and Wallace [Sargent and Wallace, 1973], and Calvo [Calvo, 1978].

41.3. Model Components#

There is no uncertainty.

Let:

  • \(p_t\) be the log of the price level

  • \(m_t\) be the log of nominal money balances

  • \(\theta_t = p_{t+1} - p_t\) be the net rate of inflation between \(t\) and \(t+1\)

  • \(\mu_t = m_{t+1} - m_t\) be the net rate of growth of nominal balances

The demand for real balances is governed by a perfect foresight version of a Cagan [Cagan, 1956] demand function for real balances:

(41.1)#\[m_t - p_t = -\alpha(p_{t+1} - p_t) \: , \: \alpha > 0\]

for \(t \geq 0\).

Equation (41.1) asserts that the representative agent’s demand for real balances is inversely related to the representative agent’s expected rate of inflation, which equals the actual rate of inflation because there is no uncertainty here.

(When there is no uncertainty, an assumption of rational expectations becomes equivalent to perfect foresight).

Subtracting the demand function (41.1) at time \(t\) from the demand function at \(t+1\) gives:

\[ \mu_t - \theta_t = -\alpha \theta_{t+1} + \alpha \theta_t \]

or

(41.2)#\[\theta_t = \frac{\alpha}{1+\alpha} \theta_{t+1} + \frac{1}{1+\alpha} \mu_t\]

Because \(\alpha > 0\), \(0 < \frac{\alpha}{1+\alpha} < 1\).

Definition 41.1

For scalar \(b_t\), let \(L^2\) be the space of sequences \(\{b_t\}_{t=0}^\infty\) that satisfy

\[ \sum_{t=0}^\infty b_t^2 < +\infty \]

We say that a sequence that belongs to \(L^2\) is square summable.

When we assume that the sequence \(\vec \mu = \{\mu_t\}_{t=0}^\infty\) is square summable and we require that the sequence \(\vec \theta = \{\theta_t\}_{t=0}^\infty\) is square summable, the linear difference equation (41.2) can be solved forward to get:

(41.3)#\[\theta_t = \frac{1}{1+\alpha} \sum_{j=0}^\infty \left(\frac{\alpha}{1+\alpha}\right)^j \mu_{t+j}, \quad t \geq 0\]
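To see where this formula comes from, substitute (41.2) into itself repeatedly to get, for any \(k \geq 1\),

\[ \theta_t = \frac{1}{1+\alpha} \sum_{j=0}^{k-1} \left(\frac{\alpha}{1+\alpha}\right)^j \mu_{t+j} + \left(\frac{\alpha}{1+\alpha}\right)^k \theta_{t+k} . \]

Square summability of \(\vec \theta\) implies that \(\theta_{t+k} \rightarrow 0\) as \(k \rightarrow +\infty\), so the last term vanishes and we are left with formula (41.3).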

The government values a representative household’s utility of real balances at time \(t\) according to the utility function

(41.4)#\[U(m_t - p_t) = u_0 + u_1 (m_t - p_t) - \frac{u_2}{2} (m_t - p_t)^2, \quad u_0 > 0, u_1 > 0, u_2 > 0\]

The money demand function (41.1) and the utility function (41.4) imply that

(41.5)#\[ U(-\alpha \theta_t) = u_0 + u_1 (-\alpha \theta_t) -\frac{u_2}{2}(-\alpha \theta_t)^2 . \]

Note

The “bliss level” of real balances is \(\frac{u_1}{u_2}\); the inflation rate that attains it is \(-\frac{u_1}{u_2 \alpha}\).
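To see why, note that \(U'(m_t - p_t) = u_1 - u_2 (m_t - p_t)\) vanishes at \(m_t - p_t = \frac{u_1}{u_2}\), and the demand function (41.1) then gives \(-\alpha \theta_t = \frac{u_1}{u_2}\), or \(\theta_t = -\frac{u_1}{u_2 \alpha}\).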

Via equation (41.3), a government plan \(\vec \mu = \{\mu_t \}_{t=0}^\infty\) leads to a sequence of inflation rates \(\vec \theta = \{ \theta_t \}_{t=0}^\infty\).

We assume that the government incurs social costs \(\frac{c}{2} \mu_t^2\) at \(t\) when it changes the stock of nominal money balances at rate \(\mu_t\).

Therefore, the one-period welfare function of a benevolent government is

\[ s(\theta_t,\mu_t) = U(-\alpha \theta_t) - \frac{c}{2} \mu_t^2 . \]

The Ramsey planner’s criterion is

(41.6)#\[ V = \sum_{t=0}^\infty \beta^t s(\theta_t, \mu_t) \]

where \(\beta \in (0,1)\) is a discount factor.

The Ramsey planner chooses a vector of money growth rates \(\vec \mu\) to maximize criterion (41.6) subject to equations (41.3) and a restriction requiring that

(41.7)#\[ \vec \theta \in L^2 \]

Notice equations (41.3) and (41.7) imply that \(\vec \theta\) is a function of \(\vec \mu\).

In particular, the inflation rate \(\theta_t\) satisfies

(41.8)#\[ \theta_t = (1-\lambda) \sum_{j=0}^\infty \lambda^j \mu_{t+j}, \quad t \geq 0 \]

where

\[ \lambda = \frac{\alpha}{1+\alpha} . \]

41.4. Parameters and Variables#

Parameters:

  • Demand for money parameter is \(\alpha > 0\); we set its default value \(\alpha = 1\)

    • Induced demand function for money parameter is \(\lambda = \frac{\alpha}{1+\alpha}\)

  • Utility function parameters are \(u_0, u_1, u_2 \) and \(\beta \in (0,1)\)

  • Cost parameter of tax distortions associated with setting \(\mu_t \neq 0\) is \(c\)

  • Truncation parameter: a positive integer \(T >0\)

Variables:

  • \(\theta_t = p_{t+1} - p_t\) where \(p_t\) is log of price level

  • \(\mu_t = m_{t+1} - m_t \) where \(m_t\) is log of money supply

41.4.1. Basic Objects#

To prepare the way for our calculations, we’ll remind ourselves of the mathematical objects in play.

  • sequences of inflation rates and money creation rates:

\[ (\vec \theta, \vec \mu) = \{\theta_t, \mu_t\}_{t=0}^\infty \]
  • A planner’s value function

(41.9)#\[ V = \sum_{t=0}^\infty \beta^t (h_0 + h_1 \theta_t + h_2 \theta_t^2 - \frac{c}{2} \mu_t^2 ) \]

where we set \(h_0, h_1, h_2\) to match

\[ u_0 + u_1(-\alpha \theta_t) - \frac{u_2}{2} (-\alpha \theta_t)^2 \]

with

\[ h_0 + h_1 \theta_t + h_2 \theta_t^2 \]

To make them match, we set

\[ \begin{aligned} h_0 & = u_0 \cr h_1 & = -\alpha u_1 \cr h_2 & = - \frac{u_2 \alpha^2}{2} \end{aligned} \]
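As a quick numerical sanity check on this matching, here is a small sketch of our own (the helper name check_h_match is not part of the lecture’s code):

import numpy as np

def check_h_match(θ, α=1, u0=1, u1=0.5, u2=3):
    # Left side: U evaluated at real balances m - p = -α θ
    U_val = u0 + u1 * (-α * θ) - 0.5 * u2 * (-α * θ)**2
    # Right side: the matched quadratic in θ
    h0, h1, h2 = u0, -α * u1, -0.5 * u2 * α**2
    return np.allclose(U_val, h0 + h1 * θ + h2 * θ**2)

print(check_h_match(np.linspace(-0.2, 0.2, 9)))   # True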

A Ramsey planner chooses \(\vec \mu\) to maximize the government’s value function (41.9) subject to equations (41.8).

A solution \(\vec \mu\) of this problem is called a Ramsey plan.

41.5. Approximations#

We anticipate that under a Ramsey plan the sequences \(\{\theta_t\}\) and \(\{\mu_t\}\) both converge to stationary values.

Thus, we guess that under the optimal policy \( \lim_{t \rightarrow + \infty} \mu_t = \bar \mu\).

Convergence of \(\mu_t\) to \(\bar \mu\) together with formula (41.8) for the inflation rate then implies that \( \lim_{t \rightarrow + \infty} \theta_t = \bar \mu\) as well.

Consequently, we’ll guess a time \(T\) large enough that \(\mu_t\) has gotten very close to the limit \(\bar \mu\).

Then we’ll approximate \(\vec \mu\) by a truncated vector with the property that

\[ \mu_t = \bar \mu \quad \forall t \geq T \]

We’ll approximate \(\vec \theta\) with a truncated vector with the property that

\[ \theta_t = \bar \theta \quad \forall t \geq T \]

Formula for truncated \(\vec \theta\)

In light of our approximation that \(\mu_t = \bar \mu\) for all \(t \geq T\), we now seek a function that takes

\[ \tilde \mu = \begin{bmatrix}\mu_0 & \mu_1 & \cdots & \mu_{T-1} & \bar \mu \end{bmatrix} \]

as an input and as an output gives

\[ \tilde \theta = \begin{bmatrix}\theta_0 & \theta_1 & \cdots & \theta_{T-1} & \bar \theta \end{bmatrix} \]

where \(\theta_t\) satisfies

(41.10)#\[ \theta_t = (1-\lambda) \sum_{j=0}^{T-1-t} \lambda^j \mu_{t+j} + \lambda^{T-t} \bar \mu \]

for \(t=0, 1, \ldots, T-1\) and \(\bar \theta = \bar \mu\).

Formula for \(V\)

Having specified a truncated vector \(\tilde \mu\) and having computed \(\tilde \theta\) by using formula (41.10), we want to write a Python function that computes

(41.11)#\[ \tilde V = \sum_{t=0}^\infty \beta^t ( h_0 + h_1 \tilde\theta_t + h_2 \tilde\theta_t^2 - \frac{c}{2} \mu_t^2 ) \]

or more precisely

\[ \tilde V = \sum_{t=0}^{T-1} \beta^t (h_0 + h_1 \tilde\theta_t + h_2 \tilde\theta_t^2 - \frac{c}{2} \mu_t^2 ) + \frac{\beta^T}{1-\beta} (h_0 + h_1 \bar \mu + h_2 \bar \mu^2 - \frac{c}{2} \bar \mu^2 ) \]

where \(\tilde \theta_t, \ t = 0, 1, \ldots , T-1\) satisfies formula (41.10).
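Before handing the problem to JAX, here is a minimal pure-NumPy sketch of formula (41.10) and of the truncated criterion above; the function names theta_trunc and V_trunc are ours, and the JAX functions used below implement the same logic.

import numpy as np

def theta_trunc(μ, α=1):
    # Formula (41.10): θ_t = (1-λ) Σ_{j=0}^{T-1-t} λ^j μ_{t+j} + λ^{T-t} μ_bar
    μ = np.asarray(μ)
    λ = α / (1 + α)
    T = len(μ) - 1
    μ_bar = μ[-1]
    θ = np.empty(T + 1)
    for t in range(T):
        j = np.arange(T - t)
        θ[t] = (1 - λ) * np.sum(λ**j * μ[t:T]) + λ**(T - t) * μ_bar
    θ[T] = μ_bar                      # θ_bar = μ_bar
    return θ

def V_trunc(μ, β, c, α=1, u0=1, u1=0.5, u2=3):
    # The truncated criterion displayed above
    μ = np.asarray(μ)
    θ = theta_trunc(μ, α)
    h0, h1, h2 = u0, -α * u1, -0.5 * u2 * α**2
    T = len(μ) - 1
    t = np.arange(T)
    head = np.sum(β**t * (h0 + h1 * θ[:T] + h2 * θ[:T]**2 - 0.5 * c * μ[:T]**2))
    tail = β**T / (1 - β) * (h0 + h1 * μ[-1] + h2 * μ[-1]**2 - 0.5 * c * μ[-1]**2)
    return head + tail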

41.6. A Gradient Descent Algorithm#

We now describe code that maximizes the criterion function (41.9) subject to equations (41.8) by choice of the truncated vector \(\tilde \mu\).

We use a brute force or machine learning approach that just hands our problem off to code that maximizes \(V\) with respect to the components of \(\tilde \mu\) by applying gradient descent to \(-V\).

We hope that the answers agree with those obtained by the more structured methods used in this quantecon lecture Time Inconsistency of Ramsey Plans.

41.6.1. Implementation#

We will implement the above in Python using the JAX and Optax libraries.

We use the following imports in this lecture

!pip install --upgrade quantecon
!pip install --upgrade optax
!pip install --upgrade statsmodels
from quantecon import LQ
import numpy as np
import jax.numpy as jnp
from jax import jit, grad
import optax
import statsmodels.api as sm
import matplotlib.pyplot as plt

We’ll eventually want to compare the results we obtain here to those obtained in this quantecon lecture Time Inconsistency of Ramsey Plans.

To enable us to do that, we copy the class ChangLQ used in that lecture.

We hide the cell that copies the class, but readers can find details of the class in this quantecon lecture Time Inconsistency of Ramsey Plans.

Hide code cell source
class ChangLQ:
    """
    Class to solve LQ Chang model
    """
    def __init__(self, β, c, α=1, u0=1, u1=0.5, u2=3, T=1000, θ_n=200):
        # Record parameters
        self.α, self.u0, self.u1, self.u2 = α, u0, u1, u2
        self.β, self.c, self.T, self.θ_n = β, c, T, θ_n

        self.setup_LQ_matrices()
        self.solve_LQ_problem()
        self.compute_policy_functions()
        self.simulate_ramsey_plan()
        self.compute_θ_range()
        self.compute_value_and_policy()

    def setup_LQ_matrices(self):
        # LQ Matrices
        self.R = -np.array([[self.u0, -self.u1 * self.α / 2],
                            [-self.u1 * self.α / 2, 
                             -self.u2 * self.α**2 / 2]])
        self.Q = -np.array([[-self.c / 2]])
        self.A = np.array([[1, 0], [0, (1 + self.α) / self.α]])
        self.B = np.array([[0], [-1 / self.α]])

    def solve_LQ_problem(self):
        # Solve LQ Problem (Subproblem 1)
        lq = LQ(self.Q, self.R, self.A, self.B, beta=self.β)
        self.P, self.F, self.d = lq.stationary_values()

        # Compute g0, g1, and g2 (41.16)
        self.g0, self.g1, self.g2 = [-self.P[0, 0], 
                                     -2 * self.P[1, 0], -self.P[1, 1]]
        
        # Compute b0 and b1 (41.17)
        [[self.b0, self.b1]] = self.F

        # Compute d0 and d1 (41.18)
        self.cl_mat = (self.A - self.B @ self.F)  # Closed loop matrix
        [[self.d0, self.d1]] = self.cl_mat[1:]

        # Solve Subproblem 2
        self.θ_R = -self.P[0, 1] / self.P[1, 1]
        
        # Find the bliss level of θ
        self.θ_B = -self.u1 / (self.u2 * self.α)

    def compute_policy_functions(self):
        # Solve the Markov Perfect Equilibrium
        self.μ_MPE = -self.u1 / ((1 + self.α) / self.α * self.c 
                                 + self.α / (1 + self.α)
                                 * self.u2 + self.α**2 
                                 / (1 + self.α) * self.u2)
        self.θ_MPE = self.μ_MPE
        self.μ_CR = -self.α * self.u1 / (self.u2 * self.α**2 + self.c)
        self.θ_CR = self.μ_CR

        # Calculate value under MPE and CR economy
        self.J_θ = lambda θ_array: - np.array([1, θ_array]) \
                                   @ self.P @ np.array([1, θ_array]).T
        self.V_θ = lambda θ: (self.u0 + self.u1 * (-self.α * θ)
                              - self.u2 / 2 * (-self.α * θ)**2 
                              - self.c / 2 * θ**2) / (1 - self.β)
        
        self.J_MPE = self.V_θ(self.μ_MPE)
        self.J_CR = self.V_θ(self.μ_CR)

    def simulate_ramsey_plan(self):
        # Simulate Ramsey plan for large number of periods
        θ_series = np.vstack((np.ones((1, self.T)), np.zeros((1, self.T))))
        μ_series = np.zeros(self.T)
        J_series = np.zeros(self.T)
        θ_series[1, 0] = self.θ_R
        [μ_series[0]] = -self.F.dot(θ_series[:, 0])
        J_series[0] = self.J_θ(θ_series[1, 0])

        for i in range(1, self.T):
            θ_series[:, i] = self.cl_mat @ θ_series[:, i-1]
            [μ_series[i]] = -self.F @ θ_series[:, i]
            J_series[i] = self.J_θ(θ_series[1, i])

        self.J_series = J_series
        self.μ_series = μ_series
        self.θ_series = θ_series

    def compute_θ_range(self):
        # Find the range of θ in Ramsey plan
        θ_LB = min(min(self.θ_series[1, :]), self.θ_B)
        θ_UB = max(max(self.θ_series[1, :]), self.θ_MPE)
        θ_range = θ_UB - θ_LB
        self.θ_LB = θ_LB - 0.05 * θ_range
        self.θ_UB = θ_UB + 0.05 * θ_range
        self.θ_range = θ_range

    def compute_value_and_policy(self):        
        # Create the θ_space
        self.θ_space = np.linspace(self.θ_LB, self.θ_UB, 200)
        
        # Find value function and policy functions over range of θ
        self.J_space = np.array([self.J_θ(θ) for θ in self.θ_space])
        self.μ_space = -self.F @ np.vstack((np.ones(200), self.θ_space))
        x_prime = self.cl_mat @ np.vstack((np.ones(200), self.θ_space))
        self.θ_prime = x_prime[1, :]
        self.CR_space = np.array([self.V_θ(θ) for θ in self.θ_space])
        
        self.μ_space = self.μ_space[0, :]
        
        # Calculate J_range, J_LB, and J_UB
        self.J_range = np.ptp(self.J_space)
        self.J_LB = np.min(self.J_space) - 0.05 * self.J_range
        self.J_UB = np.max(self.J_space) + 0.05 * self.J_range

Now we compute the value of \(V\) under this setup and compare it against the values obtained in the section Outcomes under Three Timing Protocols of the sister quantecon lecture Time Inconsistency of Ramsey Plans.

# Assume β=0.85, c=2, T=40.
T = 40
clq = ChangLQ(β=0.85, c=2, T=T)
@jit
def compute_θ(μ, α=1):
    λ = α / (1 + α)
    T = len(μ) - 1
    μbar = μ[-1]
    
    # Create an array of powers for λ
    λ_powers = λ ** jnp.arange(T + 1)
    
    # Compute the weighted sums for all t
    weighted_sums = jnp.array(
        [jnp.sum(λ_powers[:T-t] * μ[t:T]) for t in range(T)])
    
    # Compute θ values except for the last element
    θ = (1 - λ) * weighted_sums + λ**(T - jnp.arange(T)) * μbar
    
    # Set the last element
    θ = jnp.append(θ, μbar)
    
    return θ

@jit
def compute_hs(u0, u1, u2, α):
    h0 = u0
    h1 = -u1 * α
    h2 = -0.5 * u2 * α**2
    
    return h0, h1, h2
    
@jit
def compute_V(μ, β, c, α=1, u0=1, u1=0.5, u2=3):
    θ = compute_θ(μ, α)
    
    h0, h1, h2 = compute_hs(u0, u1, u2, α)
    
    T = len(μ) - 1
    t = np.arange(T)
    
    # Compute sum except for the last element
    V_sum = np.sum(β**t * (h0 + h1 * θ[:T] + h2 * θ[:T]**2 - 0.5 * c * μ[:T]**2))
    
    # Compute the final term
    V_final = (β**T / (1 - β)) * (h0 + h1 * μ[-1] + h2 * μ[-1]**2 - 0.5 * c * μ[-1]**2)
    
    V = V_sum + V_final
    
    return V
V_val = compute_V(clq.μ_series, β=0.85, c=2)

# Check the result with the ChangLQ class in previous lecture
print(f'deviation = {np.abs(V_val - clq.J_series[0])}') # good!
deviation = 1.430511474609375e-06

Now we want to maximize the function \(V\) by choice of \(\vec \mu\).

We will use optax.adam from the optax library.

def adam_optimizer(grad_func, init_params, 
                   lr=0.1, 
                   max_iter=10_000, 
                   error_tol=1e-7):

    # Set initial parameters and optimizer
    params = init_params
    optimizer = optax.adam(learning_rate=lr)
    opt_state = optimizer.init(params)

    # Update parameters and gradients
    @jit
    def update(params, opt_state):
        grads = grad_func(params)
        updates, opt_state = optimizer.update(grads, opt_state)
        params = optax.apply_updates(params, updates)
        return params, opt_state, grads

    # Gradient descent loop
    for i in range(max_iter):
        params, opt_state, grads = update(params, opt_state)
        
        if jnp.linalg.norm(grads) < error_tol:
            print(f"Converged after {i} iterations.")
            break

        if i % 100 == 0: 
            print(f"Iteration {i}, grad norm: {jnp.linalg.norm(grads)}")
    
    return params

Here we use automatic differentiation functionality in JAX with grad.

# Initial guess for μ
μ_init = jnp.zeros(T)

# Maximization instead of minimization
grad_V = jit(grad(
    lambda μ: -compute_V(μ, β=0.85, c=2)))
%%time

# Optimize μ
optimized_μ = adam_optimizer(grad_V, μ_init)

print(f"optimized μ = \n{optimized_μ}")
Iteration 0, grad norm: 0.8627105951309204
Iteration 100, grad norm: 0.0033030672930181026
Iteration 200, grad norm: 1.701549808785785e-05
Converged after 277 iterations.
optimized μ = 
[-0.06450703 -0.09033976 -0.10068483 -0.10482766 -0.10648672 -0.10715112
 -0.10741719 -0.10752375 -0.10756642 -0.10758351 -0.10759035 -0.10759308
 -0.10759415 -0.10759459 -0.10759478 -0.10759482 -0.10759486 -0.10759486
 -0.10759486 -0.10759488 -0.10759487 -0.10759485 -0.10759488 -0.10759489
 -0.1075949  -0.10759487 -0.10759488 -0.10759488 -0.10759488 -0.10759489
 -0.10759491 -0.10759493 -0.10759494 -0.10759497 -0.10759497 -0.10759496
 -0.10759496 -0.10759495 -0.10759496 -0.10759499]
CPU times: user 685 ms, sys: 155 ms, total: 840 ms
Wall time: 634 ms
print(f"original μ = \n{clq.μ_series}")
original μ = 
[-0.06450708 -0.09033982 -0.10068489 -0.10482772 -0.10648677 -0.10715115
 -0.10741722 -0.10752377 -0.10756644 -0.10758352 -0.10759037 -0.10759311
 -0.1075942  -0.10759464 -0.10759482 -0.10759489 -0.10759492 -0.10759493
 -0.10759493 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494
 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494
 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494
 -0.10759494 -0.10759494 -0.10759494 -0.10759494]
print(f'deviation = {np.linalg.norm(optimized_μ - clq.μ_series)}')
deviation = 2.9737057616330276e-07
compute_V(optimized_μ, β=0.85, c=2)
Array(6.8357825, dtype=float32)
compute_V(clq.μ_series, β=0.85, c=2)
Array(6.835783, dtype=float32)

41.6.2. Restricting \(\mu_t = \bar \mu\) for all \(t\)#

We take a brief detour to solve a restricted version of the Ramsey problem defined above.

First, recall that a Ramsey planner chooses \(\vec \mu\) to maximize the government’s value function (41.9) subject to equations (41.8).

We now define a distinct problem in which the planner chooses \(\vec \mu\) to maximize the government’s value function (41.9) subject to equation (41.8) and the additional restriction that \(\mu_t = \bar \mu\) for all \(t\).

The solution of this problem is a time-invariant \(\mu_t\) that this quantecon lecture Time Inconsistency of Ramsey Plans calls \(\mu^{CR}\).

# Initial guess for single μ
μ_init = jnp.zeros(1)

# Maximization instead of minimization
grad_V = jit(grad(
    lambda μ: -compute_V(μ, β=0.85, c=2)))

# Optimize μ
optimized_μ_CR = adam_optimizer(grad_V, μ_init)

print(f"optimized μ = \n{optimized_μ_CR}")
Iteration 0, grad norm: 3.333333969116211
Iteration 100, grad norm: 0.0049784183502197266
Iteration 200, grad norm: 6.771087646484375e-05
Converged after 282 iterations.
optimized μ = 
[-0.10000004]

Comparing it to \(\mu^{CR}\) in Time Inconsistency of Ramsey Plans, we again obtain a very close answer.

np.linalg.norm(clq.μ_CR - optimized_μ_CR)
3.7252903e-08
V_CR = compute_V(optimized_μ_CR, β=0.85, c=2)
V_CR
Array(6.8333354, dtype=float32)
compute_V(jnp.array([clq.μ_CR]), β=0.85, c=2)
Array(6.8333344, dtype=float32)

41.7. A More Structured ML Algorithm#

By thinking about the mathematical structure of the Ramsey problem and using some linear algebra, we can simplify the problem that we hand over to a machine learning algorithm.

We start by recalling the Ramsey problem: choose \(\vec \mu\) to maximize the government’s value function (41.9) subject to equation (41.8).

This is actually an optimization problem with a quadratic objective function and linear constraints.

First-order conditions for this problem are a set of simultaneous linear equations in \(\vec \mu\).

If we trust that the second-order conditions for a maximum are also satisfied (they are in our problem), we can compute the Ramsey plan by solving these equations for \(\vec \mu\).

We’ll apply this approach here and compare answers with what we obtained above with the gradient descent approach.

To remind us of the setting, remember that we have assumed that

\[ \mu_t = \mu_T \; \forall t \geq T \]

and that

\[ \theta_t = \theta_T = \mu_T \ \forall t \geq T \]

Again, define

\[ \vec \theta = \begin{bmatrix} \theta_0 \cr \theta_1 \cr \vdots \cr \theta_{T-1} \cr \theta_T \end{bmatrix} , \quad \vec \mu = \begin{bmatrix} \mu_0 \cr \mu_1 \cr \vdots \cr \mu_{T-1} \cr \mu_T \end{bmatrix} \]

Write the system of \(T+1\) equations (41.10) that relate \(\vec \theta\) to a choice of \(\vec \mu\) as the single matrix equation

\[ \frac{1}{(1 - \lambda)} \begin{bmatrix} 1 & -\lambda & 0 & 0 & \cdots & 0 & 0 \cr 0 & 1 & -\lambda & 0 & \cdots & 0 & 0 \cr 0 & 0 & 1 & -\lambda & \cdots & 0 & 0 \cr \vdots & \vdots & \vdots & \vdots & \vdots & -\lambda & 0 \cr 0 & 0 & 0 & 0 & \cdots & 1 & -\lambda \cr 0 & 0 & 0 & 0 & \cdots & 0 & 1-\lambda \end{bmatrix} \begin{bmatrix} \theta_0 \cr \theta_1 \cr \theta_2 \cr \vdots \cr \theta_{T-1} \cr \theta_T \end{bmatrix} = \begin{bmatrix} \mu_0 \cr \mu_1 \cr \mu_2 \cr \vdots \cr \mu_{T-1} \cr \mu_T \end{bmatrix} \]

or

\[ A \vec \theta = \vec \mu \]

or

\[ \vec \theta = B \vec \mu \]

where

\[ B = A^{-1} \]
def construct_B(α, T):
    λ = α / (1 + α)
    
    A = (jnp.eye(T, T) - λ*jnp.eye(T, T, k=1))/(1-λ)
    A = A.at[-1, -1].set(A[-1, -1]*(1-λ))

    B = jnp.linalg.inv(A)
    return A, B
A, B = construct_B(α=clq.α, T=T)

print(f'A = \n {A}')
A = 
 [[ 2. -1.  0. ...  0.  0.  0.]
 [ 0.  2. -1. ...  0.  0.  0.]
 [ 0.  0.  2. ...  0.  0.  0.]
 ...
 [ 0.  0.  0. ...  2. -1.  0.]
 [ 0.  0.  0. ...  0.  2. -1.]
 [ 0.  0.  0. ...  0.  0.  1.]]
# Compute θ using optimized_μ
θs = np.array(compute_θ(optimized_μ))
μs = np.array(optimized_μ)
np.allclose(θs, B @ clq.μ_series)
True

As before, the Ramsey planner’s criterion is

\[ V = \sum_{t=0}^\infty \beta^t (h_0 + h_1 \theta_t + h_2 \theta_t^2 - \frac{c}{2} \mu_t^2 ) \]

With our assumption above, criterion \(V\) can be rewritten as

\[ \begin{aligned} V & = \sum_{t=0}^{T-1} \beta^t (h_0 + h_1 \theta_t + h_2 \theta_t^2 - \frac{c}{2} \mu_t^2 ) \cr & + \frac{\beta^T}{1-\beta} (h_0 + h_1 \theta_T + h_2 \theta_T^2 - \frac{c}{2} \mu_T^2 ) \end{aligned} \]

To help us write \(V\) as a quadratic plus affine form, define

\[\begin{split} \vec{\beta} = \begin{bmatrix} 1 \\ \beta \\ \vdots \\ \beta^{T-1} \\ \frac{\beta^T}{1-\beta} \end{bmatrix} \end{split}\]

Then we have:

\[ h_1 \sum_{t=0}^\infty \beta^t \theta_t = h_1 \cdot \vec{\beta}^T \vec{\theta} = (h_1 \cdot B^T \vec{\beta})^T \vec{\mu} = g^T \vec{\mu} \]

where \(g = h_1 \cdot B^T \vec{\beta}\) is a \((T+1) \times 1\) vector,

\[ h_2 \sum_{t=0}^\infty \beta^t \theta_t^2 = \vec{\mu}^T (B^T (h_2 \cdot \vec{\beta} \cdot \mathbf{I}) B) \vec{\mu} = \vec{\mu}^T M \vec{\mu} \]

where \(M = B^T (h_2 \cdot \vec{\beta} \cdot \mathbf{I}) B\) is a \((T+1) \times (T+1)\) matrix,

\[ \frac{c}{2} \sum_{t=0}^\infty \beta^t \mu_t^2 = \vec{\mu}^T (\frac{c}{2} \cdot \vec{\beta} \cdot \mathbf{I}) \vec{\mu} = \vec{\mu}^T F \vec{\mu} \]

where \(F = \frac{c}{2} \cdot \vec{\beta} \cdot \mathbf{I}\) is a \((T+1) \times (T+1)\) matrix

It follows that

\[\begin{split} \begin{aligned} J = V - \frac{h_0}{1-\beta} &= \sum_{t=0}^\infty \beta^t (h_1 \theta_t + h_2 \theta_t^2 - \frac{c}{2} \mu_t^2) \\ &= g^T \vec{\mu} + \vec{\mu}^T M \vec{\mu} - \vec{\mu}^T F \vec{\mu} \\ &= g^T \vec{\mu} + \vec{\mu}^T (M - F) \vec{\mu} \\ &= g^T \vec{\mu} + \vec{\mu}^T G \vec{\mu} \end{aligned} \end{split}\]

where \(G = M - F\).

To compute the optimal government plan we want to maximize \(J\) with respect to \(\vec \mu\).

We use linear algebra formulas for differentiating linear and quadratic forms to compute the gradient of \(J\) with respect to \(\vec \mu\)

\[ \frac{\partial}{\partial \vec{\mu}} J = g + 2 G \vec{\mu}. \]

Setting \(\frac{\partial}{\partial \vec{\mu}} J = 0\), the maximizing \(\mu\) is

\[ \vec \mu^R = -\frac{1}{2}G^{-1} g \]

The associated optimal inflation sequence is

\[ \vec \theta^{R} = B \vec \mu^R \]

41.7.1. Two implementations#

With the more structured approach, we can update our gradient descent exercise with compute_J

def compute_J(μ, β, c, α=1, u0=1, u1=0.5, u2=3):
    T = len(μ) - 1
    
    h0, h1, h2 = compute_hs(u0, u1, u2, α)
    λ = α / (1 + α)
    
    _, B = construct_B(α, T+1)
    
    β_vec = jnp.hstack([β**jnp.arange(T),
                       (β**T/(1-β))])
    
    θ = B @ μ
    βθ_sum = jnp.sum((β_vec * h1) * θ)
    βθ_square_sum = β_vec * h2 * θ.T @ θ
    βμ_square_sum = 0.5 * c * β_vec * μ.T @ μ
    
    return βθ_sum + βθ_square_sum - βμ_square_sum
# Initial guess for μ
μ_init = jnp.zeros(T)

# Maximization instead of minimization
grad_J = jit(grad(
    lambda μ: -compute_J(μ, β=0.85, c=2)))
%%time

# Optimize μ
optimized_μ = adam_optimizer(grad_J, μ_init)

print(f"optimized μ = \n{optimized_μ}")
Iteration 0, grad norm: 0.8627105951309204
Iteration 100, grad norm: 0.003303033299744129
Iteration 200, grad norm: 1.6926183889154345e-05
Converged after 280 iterations.
optimized μ = 
[-0.06450712 -0.09033988 -0.10068493 -0.10482774 -0.1064868  -0.1071512
 -0.10741723 -0.10752378 -0.10756644 -0.10758355 -0.10759039 -0.10759313
 -0.10759424 -0.10759471 -0.10759488 -0.10759497 -0.10759498 -0.10759498
 -0.10759498 -0.10759497 -0.10759494 -0.10759497 -0.10759496 -0.10759497
 -0.10759497 -0.10759498 -0.107595   -0.10759494 -0.10759496 -0.10759496
 -0.10759494 -0.10759493 -0.10759493 -0.10759494 -0.10759494 -0.10759495
 -0.10759497 -0.107595   -0.10759497 -0.10759495]
CPU times: user 359 ms, sys: 184 ms, total: 543 ms
Wall time: 210 ms
print(f"original μ = \n{clq.μ_series}")
original μ = 
[-0.06450708 -0.09033982 -0.10068489 -0.10482772 -0.10648677 -0.10715115
 -0.10741722 -0.10752377 -0.10756644 -0.10758352 -0.10759037 -0.10759311
 -0.1075942  -0.10759464 -0.10759482 -0.10759489 -0.10759492 -0.10759493
 -0.10759493 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494
 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494
 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494 -0.10759494
 -0.10759494 -0.10759494 -0.10759494 -0.10759494]
print(f'deviation = {np.linalg.norm(optimized_μ - clq.μ_series)}')
deviation = 2.3748542332668876e-07
V_R = compute_V(optimized_μ, β=0.85, c=2)
V_R
Array(6.8357825, dtype=float32)

We find that by exploiting more knowledge about the structure of the problem, we can significantly speed up our computation.

We can also derive a closed-form solution for \(\vec \mu\)

def compute_μ(β, c, T, α=1, u0=1, u1=0.5, u2=3):    
    h0, h1, h2 = compute_hs(u0, u1, u2, α)
    
    _, B = construct_B(α, T+1)
    
    β_vec = jnp.hstack([β**jnp.arange(T),
                       (β**T/(1-β))])
    
    g = h1 * B.T @ β_vec
    M = B.T @ (h2 * jnp.diag(β_vec)) @ B
    F = c/2 * jnp.diag(β_vec)
    G = M - F
    return jnp.linalg.solve(2*G, -g)

μ_closed = compute_μ(β=0.85, c=2, T=T-1)
print(f'closed-form μ = \n{μ_closed}')
closed-form μ = 
[-0.0645071  -0.09033982 -0.1006849  -0.1048277  -0.10648677 -0.10715113
 -0.10741723 -0.10752378 -0.10756643 -0.10758351 -0.10759034 -0.10759313
 -0.10759421 -0.10759464 -0.10759482 -0.1075949  -0.10759489 -0.10759492
 -0.10759492 -0.10759491 -0.10759495 -0.10759494 -0.10759495 -0.10759493
 -0.10759491 -0.10759491 -0.10759494 -0.10759491 -0.10759491 -0.10759495
 -0.10759498 -0.10759492 -0.10759494 -0.10759485 -0.10759497 -0.10759495
 -0.10759493 -0.10759494 -0.10759498 -0.10759494]
print(f'deviation = {np.linalg.norm(μ_closed - clq.μ_series)}')
deviation = 1.47137171779832e-07
compute_V(μ_closed, β=0.85, c=2)
Array(6.835783, dtype=float32)
print(f'deviation = {np.linalg.norm(B @ μ_closed - θs)}')
deviation = 2.6541223974163586e-07

We can check our analytical formula for the gradient against the version computed by JAX automatic differentiation

def compute_grad(μ, β, c, α=1, u0=1, u1=0.5, u2=3):    
    T = len(μ) - 1
    
    h0, h1, h2 = compute_hs(u0, u1, u2, α)
    
    _, B = construct_B(α, T+1)
    
    β_vec = jnp.hstack([β**jnp.arange(T),
                       (β**T/(1-β))])
    
    g = h1 * B.T @ β_vec
    M = (h2 * B.T @ jnp.diag(β_vec) @ B)
    F = c/2 * jnp.diag(β_vec)
    G = M - F
    return g + (2*G @ μ)

closed_grad = compute_grad(jnp.ones(T), β=0.85, c=2)
closed_grad
Array([-3.75      , -4.0625    , -3.8906252 , -3.5257816 , -3.1062894 ,
       -2.6950336 , -2.3181221 , -1.9840758 , -1.6933005 , -1.4427234 ,
       -1.2280239 , -1.0446749 , -0.8884009 , -0.7553544 , -0.64215815,
       -0.5458878 , -0.46403134, -0.39444   , -0.33528066, -0.28499195,
       -0.24224481, -0.20590894, -0.17502302, -0.14876978, -0.12645441,
       -0.10748631, -0.09136339, -0.0776589 , -0.06601007, -0.05610856,
       -0.04769228, -0.04053844, -0.03445768, -0.02928903, -0.02489567,
       -0.02116132, -0.01798713, -0.01528906, -0.0129957 , -0.07364222],      dtype=float32)
- grad_J(jnp.ones(T))
Array([-3.75      , -4.0625    , -3.890625  , -3.5257816 , -3.1062894 ,
       -2.6950336 , -2.3181224 , -1.9840759 , -1.6933005 , -1.4427235 ,
       -1.228024  , -1.0446749 , -0.8884009 , -0.7553544 , -0.6421581 ,
       -0.54588777, -0.46403137, -0.39444   , -0.33528066, -0.28499192,
       -0.24224481, -0.20590894, -0.175023  , -0.14876977, -0.12645441,
       -0.10748631, -0.0913634 , -0.0776589 , -0.06601007, -0.05610857,
       -0.04769228, -0.04053844, -0.03445768, -0.02928903, -0.02489568,
       -0.02116132, -0.01798712, -0.01528906, -0.0129957 , -0.07364222],      dtype=float32)
print(f'deviation = {np.linalg.norm(closed_grad - (- grad_J(jnp.ones(T))))}')
deviation = 4.074267394571507e-07

41.8. Some Exploratory Regressions#

We compute some least squares linear regressions of some components of \(\vec \theta\) and \(\vec \mu\) on others.

We hope that these regressions will reveal structure hidden within the \(\vec \mu^R, \vec \theta^R\) sequences associated with a Ramsey plan.

It is worth pausing to think about roles being played here by human intelligence and artificial intelligence.

Artificial intelligence in the form of some Python code and a computer is running the regressions for us.

But we are free to regress anything on anything else.

Human intelligence tells us what regressions to run.

Additional inputs of human intelligence will be required fully to appreciate what those regressions reveal about the structure of a Ramsey plan.

Note

When we eventually get around to trying to understand the regressions below, it will be worthwhile to study the reasoning that led Chang [Chang, 1998] to choose \(\theta_t\) as his key state variable.

We’ll begin by simply plotting the Ramsey plan’s \(\mu_t\) and \(\theta_t\) for \(t =0, \ldots, T\) against \(t\).

These are the data on which we’ll be running some linear least squares regressions.

# Compute θ using optimized_μ
θs = np.array(compute_θ(optimized_μ))
μs = np.array(optimized_μ)

# Plot the two sequences
Ts = np.arange(T)

plt.scatter(Ts, μs, label=r'$\mu_t$', alpha=0.7)
plt.scatter(Ts, θs, label=r'$\theta_t$', alpha=0.7)
plt.xlabel(r'$t$')
plt.legend()
plt.show()
_images/fc4aeb1a86f5778056e138a0ef81fa81a7fe97c2b5cb2e624d9e0a5f521c3c38.png

Note that while \(\theta_t\) is less than \(\mu_t\) for low \(t\)’s, it eventually converges to the limit \(\bar \mu\) of \(\mu_t\) as \(t \rightarrow +\infty\).

This pattern reflects how formula (41.3) makes \(\theta_t\) be a weighted average of future \(\mu_t\)’s.
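As a quick check of this weighted-average interpretation, we can apply the truncated formula (41.10) at \(t=0\) to the optimized \(\vec \mu\) and compare the result with \(\theta_0\); this check is our own addition and reuses clq, μs, and θs from the cells above.

# Apply formula (41.10) at t = 0 to the optimized μ path
λ = clq.α / (1 + clq.α)
j = np.arange(T - 1)
θ0_check = (1 - λ) * np.sum(λ**j * μs[:T-1]) + λ**(T - 1) * μs[-1]
print(f'deviation = {np.abs(θ0_check - θs[0])}')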

We begin by regressing \(\mu_t\) on a constant and \(\theta_t\).

This might seem strange because, after all, equation (41.3) asserts that inflation at time \(t\) is determined by \(\{\mu_s\}_{s=t}^\infty\).

Nevertheless, we’ll run this regression anyway.

# First regression: μ_t on a constant and θ_t
X1_θ = sm.add_constant(θs)
model1 = sm.OLS(μs, X1_θ)
results1 = model1.fit()

# Print regression summary
print("Regression of μ_t on a constant and θ_t:")
print(results1.summary(slim=True))
Regression of μ_t on a constant and θ_t:
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
No. Observations:                  40   F-statistic:                 1.489e+13
Covariance Type:            nonrobust   Prob (F-statistic):          6.90e-222
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          0.0645   4.42e-08   1.46e+06      0.000       0.065       0.065
x1             1.5995   4.14e-07   3.86e+06      0.000       1.600       1.600
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Our regression tells us that the affine function

\[ \mu_t = .0645 + 1.5995 \theta_t \]

fits perfectly along the Ramsey outcome \(\vec \mu, \vec \theta\).

Note

Of course, this means that a regression of \(\theta_t\) on \(\mu_t\) and a constant would also fit perfectly.

Let’s plot the regression line \(\mu_t = .0645 + 1.5995 \theta_t\) and the points \((\theta_t, \mu_t)\) that lie on it for \(t=0, \ldots, T\).

plt.scatter(θs, μs, label=r'$\mu_t$')
plt.plot(θs, results1.predict(X1_θ), 'grey', label='$\hat \mu_t$', linestyle='--')
plt.xlabel(r'$\theta_t$')
plt.ylabel(r'$\mu_t$')
plt.legend()
plt.show()
_images/f4f198b6f25ec767699ac37acce57083e3db0016eeb611843d7302e0181ad8fb.png

The time \(0\) pair \((\theta_0, \mu_0)\) appears as the point on the upper right.

Points \((\theta_t, \mu_t)\) for succeeding times appear further and further to the lower left and eventually converge to \((\bar \mu, \bar \mu)\).

Next, we’ll run a linear regression of \(\theta_{t+1}\) against \(\theta_t\) and a constant.

# Second regression: θ_{t+1} on a constant and θ_t
θ_t = np.array(θs[:-1])  # θ_t
θ_t1 = np.array(θs[1:])  # θ_{t+1}
X2_θ = sm.add_constant(θ_t)  # Add a constant term for the intercept
model2 = sm.OLS(θ_t1, X2_θ)
results2 = model2.fit()

# Print regression summary
print("\nRegression of θ_{t+1} on a constant and θ_t:")
print(results2.summary(slim=True))
Regression of θ_{t+1} on a constant and θ_t:
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
No. Observations:                  39   F-statistic:                 7.775e+11
Covariance Type:            nonrobust   Prob (F-statistic):          1.41e-192
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const         -0.0645   4.84e-08  -1.33e+06      0.000      -0.065      -0.065
x1             0.4005   4.54e-07   8.82e+05      0.000       0.400       0.400
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

We find that the regression line fits perfectly and thus discover the affine relationship

\[ \theta_{t+1} = - .0645 + .4005 \theta_t \]

that prevails along the Ramsey outcome for inflation.

Let’s plot the pairs \((\theta_t, \theta_{t+1})\) for \(t = 0, 1, \ldots, T-1\) along with the regression line.

plt.scatter(θ_t, θ_t1, label=r'$\theta_{t+1}$')
plt.plot(θ_t, results2.predict(X2_θ), color='grey', label='$\hat θ_{t+1}$', linestyle='--')
plt.xlabel(r'$\theta_t$')
plt.ylabel(r'$\theta_{t+1}$')
plt.legend()

plt.tight_layout()
plt.show()
_images/b21fe4916b9ec3dc5a1db347dd1140525f867b4d3317633f006b8eac4be5c05e.png

Points for succeeding times appear further and further to the lower left and eventually converge to \((\bar \mu, \bar \mu)\).

41.8.1. Continuation Values#

Next, we’ll compute a sequence \(\{v_t\}_{t=0}^T\) of what we’ll call continuation values along a Ramsey plan.

To do so, we’ll start at date \(T\) and compute

\[ v_T = \frac{1}{1-\beta} s(\bar \mu, \bar \mu). \]

Then starting from \(t=T-1\), we’ll iterate backwards on the recursion

\[ v_t = s(\theta_t, \mu_t) + \beta v_{t+1} \]

for \(t= T-1, T-2, \ldots, 0.\)

# Define function for s and U in section 41.3
def s(θ, μ, u0, u1, u2, α, c):
    U = lambda x: u0 + u1 * x - (u2 / 2) * x**2
    return U(-α*θ) - (c / 2) * μ**2

# Calculate v_t sequence backward
def compute_vt(μ, β, c, u0=1, u1=0.5, u2=3, α=1):
    T = len(μ)
    θ = compute_θ(μ, α)
    
    v_t = np.zeros(T)
    μ_bar = μ[-1]
    
    # Reduce parameters
    s_p = lambda θ, μ: s(θ, μ, 
                       u0=u0, u1=u1, u2=u2, α=α, c=c)
    
    # Define v_T
    v_t[T-1] = (1 / (1 - β)) * s_p(μ_bar, μ_bar)
    
    # Backward iteration
    for t in reversed(range(T-1)):
        v_t[t] = s_p(θ[t], μ[t]) + β * v_t[t+1]
        
    return v_t

v_t = compute_vt(μs, β=0.85, c=2)

The initial continuation value \(v_0\) should equal the optimized value of the Ramsey planner’s criterion \(V\) defined in equation (41.6).

Indeed, we find that the deviation is very small:

print(f'deviation = {np.linalg.norm(v_t[0] - V_R)}')
deviation = 4.76837158203125e-07

We can also verify approximate equality by inspecting a graph of \(v_t\) against \(t\) for \(t=0, \ldots, T\) along with the value attained by a restricted Ramsey planner \(V^{CR}\) and the optimized value of the ordinary Ramsey planner \(V^R\)

# Plot the scatter plot
plt.scatter(Ts, v_t, label='$v_t$')

# Plot horizontal lines
plt.axhline(V_CR, color='C1', alpha=0.5)
plt.axhline(V_R, color='C2', alpha=0.5)

# Add labels
plt.text(max(Ts) + max(Ts)*0.07, V_CR, '$V^{CR}$', color='C1', 
         va='center', clip_on=False, fontsize=15)
plt.text(max(Ts) + max(Ts)*0.07, V_R, '$V^R$', color='C2', 
         va='center', clip_on=False, fontsize=15)
plt.xlabel(r'$t$')
plt.ylabel(r'$v_t$')

plt.tight_layout()
plt.show()
_images/7ee8b82f426f312462ac4609dd9b0d1e9916c46f295dbd34f0facbeaf979d331.png

Fig. 41.1 Continuation values#

Figure Fig. 41.1 shows interesting patterns:

  • The sequence of continuation values \(\{v_t\}_{t=0}^T\) is monotonically decreasing

  • Evidently, \(v_0 > V^{CR} > v_T\) so that

    • the value \(v_0\) of the ordinary Ramsey plan exceeds the value \(V^{CR}\) of the special Ramsey plan in which the planner is constrained to set \(\mu_t = \mu^{CR}\) for all \(t\).

    • the continuation value \(v_T\) of the ordinary Ramsey plan for \(t \geq T\) is constant and is less than the value \(V^{CR}\) of the special Ramsey plan in which the planner is constrained to set \(\mu_t = \mu^{CR}\) for all \(t\)

Note

The continuation value \(v_T\) is what some researchers call the “value of a Ramsey plan under a time-less perspective.” A more descriptive phrase is “the value of the worst continuation Ramsey plan.”

Next we ask Python to regress \(v_t\) against a constant, \(\theta_t\), and \(\theta_t^2\).

\[ v_t = g_0 + g_1 \theta_t + g_2 \theta_t^2 . \]
# Third regression: v_t on a constant, θ_t and θ^2_t
X3_θ = np.column_stack((np.ones(T), θs, θs**2))
model3 = sm.OLS(v_t, X3_θ)
results3 = model3.fit()


# Print regression summary
print("\nRegression of v_t on a constant, θ_t and θ^2_t:")
print(results3.summary(slim=True))
Regression of v_t on a constant, θ_t and θ^2_t:
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
No. Observations:                  40   F-statistic:                 5.474e+08
Covariance Type:            nonrobust   Prob (F-statistic):          6.09e-139
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const          6.8052   5.91e-06   1.15e+06      0.000       6.805       6.805
x1            -0.7581      0.000  -6028.976      0.000      -0.758      -0.758
x2            -4.6996      0.001  -7131.888      0.000      -4.701      -4.698
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 3.5e+04. This might indicate that there are
strong multicollinearity or other numerical problems.

The regression has an \(R^2\) equal to \(1\) and so fits perfectly.

However, notice the warning about the high condition number.

As indicated in the printout, this is a consequence of \(\theta_t\) and \(\theta_t^2\) being highly correlated along the Ramsey plan.

np.corrcoef(θs, θs**2)
array([[ 1.        , -0.99942156],
       [-0.99942156,  1.        ]])

Let’s plot \(v_t\) against \(\theta_t\) along with the nonlinear regression line.

θ_grid = np.linspace(min(θs), max(θs), 100)
X3_grid = np.column_stack((np.ones(len(θ_grid)), θ_grid, θ_grid**2))

plt.scatter(θs, v_t)
plt.plot(θ_grid, results3.predict(X3_grid), color='grey', 
         label='$\hat v_t$', linestyle='--')
plt.axhline(V_CR, color='C1', alpha=0.5)

plt.text(max(θ_grid) - max(θ_grid)*0.025, V_CR, '$V^{CR}$', color='C1', 
         va='center', clip_on=False, fontsize=15)

plt.xlabel(r'$\theta_{t}$')
plt.ylabel(r'$v_t$')
plt.legend()

plt.tight_layout()
plt.show()
_images/416ee286592c3f58bacde0a62d2ff3bfacf9574cb62328009f31c41f7c8a2301.png

The highest continuation value \(v_0\) at \(t=0\) appears at the peak of the quadratic function \(g_0 + g_1 \theta_t + g_2 \theta_t^2\).

Subsequent values of \(v_t\) for \(t \geq 1\) appear to the lower left of the pair \((\theta_0, v_0)\) and converge monotonically from above to \(v_T\) at time \(T\).

The value \(V^{CR}\) attained by the Ramsey plan that is restricted to be a constant \(\mu_t = \mu^{CR}\) sequence appears as a horizontal line.

Evidently, continuation values \(v_t > V^{CR}\) for \(t=0, 1, 2\) while \(v_t < V^{CR}\) for \(t \geq 3\).

41.9. What has Machine Learning Taught Us?#

Our regressions tell us that along the Ramsey outcome \(\vec \mu^R, \vec \theta^R\), the affine function

\[ \mu_t = .0645 + 1.5995 \theta_t \]

fits perfectly and that so do the regression lines

\[ \theta_{t+1} = - .0645 + .4005 \theta_t \]
\[ v_t = 6.8052 - .7581 \theta_t - 4.6996 \theta_t^2. \]

Assembling these regressions, we have discovered that for our single Ramsey outcome path \(\vec \mu^R, \vec \theta^R\), the following relationships prevail along a Ramsey plan:

(41.12)#\[\begin{split}\begin{aligned} \theta_0 & = \theta_0^R \\ \mu_t & = b_0 + b_1 \theta_t \\ \theta_{t+1} & = d_0 + d_1 \theta_t \\ \end{aligned}\end{split}\]

where the initial value \(\theta_0^R\) was computed along with other components of \(\vec \mu^R, \vec \theta^R\) when we computed the Ramsey plan, and where \(b_0, b_1, d_0, d_1\) are parameters whose values we estimated with our regressions.

In addition, we learned that continuation values are described by the quadratic function

\[ v_t = g_0 + g_1 \theta_t + g_2 \theta_t^2 \]
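As a check that these estimated relationships indeed reproduce the Ramsey outcome, here is a short simulation of representation (41.12); it is our own addition and reuses results1, results2, results3, θs, μs, and v_t from the cells above.

# Simulate representation (41.12) with the estimated coefficients
b0, b1 = results1.params          # μ_t = b_0 + b_1 θ_t
d0, d1 = results2.params          # θ_{t+1} = d_0 + d_1 θ_t
g0, g1, g2 = results3.params      # v_t = g_0 + g_1 θ_t + g_2 θ_t^2

θ_sim = np.empty(T)
μ_sim = np.empty(T)
θ_sim[0] = θs[0]                  # initial condition θ_0^R
for t in range(T - 1):
    μ_sim[t] = b0 + b1 * θ_sim[t]
    θ_sim[t + 1] = d0 + d1 * θ_sim[t]
μ_sim[T - 1] = b0 + b1 * θ_sim[T - 1]
v_sim = g0 + g1 * θ_sim + g2 * θ_sim**2

print(f'max |μ_sim - μs| = {np.max(np.abs(μ_sim - μs))}')
print(f'max |θ_sim - θs| = {np.max(np.abs(θ_sim - θs))}')
print(f'max |v_sim - v_t| = {np.max(np.abs(v_sim - v_t))}')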

We discovered these relationships by running some carefully chosen regressions and staring at the results, noticing that the \(R^2\)’s of unity tell us that the fits are perfect.

We have learned much about the structure of the Ramsey problem.

However, by using the methods and ideas that we have deployed in this lecture, it is challenging to say more.

There are many other linear regressions among components of \(\vec \mu^R, \vec \theta^R\) that would also have given us perfect fits.

For example, we could have regressed \(\theta_t\) on \(\mu_t\) and obtained the same \(R^2\) value.

Actually, wouldn’t that direction of fit have made more sense?

After all, the Ramsey planner chooses \(\vec \mu\), while \(\vec \theta\) is an outcome that reflects the representative agent’s response to the Ramsey planner’s choice of \(\vec \mu\).

Isn’t it more natural then to expect that we’d learn more about the structure of the Ramsey problem from a regression of components of \(\vec \theta\) on components of \(\vec \mu\)?

To answer these questions, we’ll have to deploy more economic theory.

We do that in this quantecon lecture Time Inconsistency of Ramsey Plans.

There, we’ll discover that system (41.12) is actually a very good way to represent a Ramsey plan because it reveals many things about its structure.

Indeed, in that lecture, we show how to compute the Ramsey plan using dynamic programming squared and provide a Python class ChangLQ that performs the calculations.

We have deployed ChangLQ earlier in this lecture to compute a baseline Ramsey plan to which we have compared outcomes from our application of the cruder machine learning approaches studied here.

Let’s use the code to compute the parameters \(b_0, b_1\) of the decision rule for \(\mu_t\) and the parameters \(d_0, d_1\) of the updating rule for \(\theta_{t+1}\) in representation (41.12).

First, we’ll again use ChangLQ to compute these objects (along with a number of others).

clq = ChangLQ(β=0.85, c=2, T=T)

Now let’s print out the decision rule for \(\mu_t\) uncovered by applying dynamic programming squared.

print("decision rule for μ")
print(f'-(b_0, b_1) = ({-clq.b0:.6f}, {-clq.b1:.6f})')
decision rule for μ
-(b_0, b_1) = (0.064507, 1.599536)

Now let’s print out the decision rule for \(\theta_{t+1} \) uncovered by applying dynamic programming squared.

print("decision rule for θ(t+1) as function of θ(t)")
print(f'(d_0, d_1) =  ({clq.d0:.6f}, {clq.d1:.6f})')
decision rule for θ(t+1) as function of θ(t)
(d_0, d_1) =  (-0.064507, 0.400464)

Evidently, these agree with the relationships that we discovered by running regressions on the Ramsey outcomes \(\vec \mu^R, \vec \theta^R\) that we constructed with either of our machine learning algorithms.

We have set the stage for this quantecon lecture Time Inconsistency of Ramsey Plans.

We close this lecture by giving a hint about an insight of Chang [Chang, 1998] that underlies much of quantecon lecture Time Inconsistency of Ramsey Plans.

Chang noticed how equation (41.3) shows that a whole equivalence class of continuation money growth sequences \(\{\mu_{t+j}\}_{j=0}^\infty\) delivers the same \(\theta_t\).

Consequently, equations (41.1) and (41.3) indicate that \(\theta_t\) intermediates how the government’s choices of \(\mu_{t+j}, \ j=0, 1, \ldots\) impinge on time \(t\) real balances \(m_t - p_t = -\alpha \theta_t\).
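To illustrate, here is a small example of our own that uses the compute_θ function defined above: two continuation money growth paths that differ at dates \(0\) and \(1\) but have the same \(\lambda\)-weighted sum, and that therefore deliver the same \(\theta_0\).

# Two money growth paths whose weighted sums in (41.3) agree at t = 0
λ = clq.α / (1 + clq.α)
μ_a = jnp.full(T, -0.1)
μ_b = μ_a.at[0].add(0.01).at[1].add(-0.01 / λ)   # offsetting perturbation

# θ_0 is the same for both paths, even though θ_1 is not
print(compute_θ(μ_a)[0], compute_θ(μ_b)[0])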

In lecture Time Inconsistency of Ramsey Plans, we’ll see how Chang [Chang, 1998] put this insight to work.