Phase Space Reconstruction
Begin
As I said in the last note, financial markets produce many chaotic series, such as stock prices and other asset prices. Most of these series lack the features that conventional time series analysis relies on. Prices are usually influenced by many factors, so the series show irregular trends, and some, such as stock prices, look like a random walk. In my view, SDEs are also a good tool for analyzing stock prices; I'll cover that method in later notes.
Because of these irregular features, phase space reconstruction becomes useful: it lets a chaotic series be mapped into a higher-dimensional space. The typical example is the Lorenz attractor, which is regarded as a great model in weather prediction and the dynamics of weather systems, but I think it will also be a good tool in the financial field.
What’s this
Phase space reconstruction is a mapping method for time series: it reconstructs the series in a higher-dimensional space by means of delays and embedding. I'll go through the method step by step.
Takens' embedding theorem was published in 1981. It says that a chaotic series can be reconstructed in a space of delay vectors of the following form. $$ y(i) = (x(i),\dots,x(i+(d-1)\tau)), \quad \text{where } 1 \leq i \leq n-(d-1)\tau $$
In this formula, $d$ is the dimension of the vector $y(i)$, usually called the embedding dimension, and $\tau$ is the delay time.
As we can see, the newly generated data depend on the embedding dimension $d$ and the delay time $\tau$. But I think the most important point for a beginner is that the new data come entirely from the original data: every coordinate of every new vector is one of the old observations!
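The construction above can be sketched in a few lines of Python. This is only a minimal illustration, not the implementation used for these notes: the function name `delay_embed` and the toy series are assumptions made for the example, and the index runs from 0 rather than 1 because of Python's indexing.

```python
def delay_embed(x, d, tau):
    """Return the delay vectors y(i) = (x(i), x(i+tau), ..., x(i+(d-1)*tau))."""
    n = len(x)
    m = n - (d - 1) * tau  # number of reconstructed vectors
    if d < 1 or tau < 1 or m < 1:
        raise ValueError("series too short for this d and tau")
    # Python indexes from 0, so i runs over 0..m-1 instead of 1..m.
    return [[x[i + k * tau] for k in range(d)] for i in range(m)]


# Toy series (hypothetical data): every coordinate below is an old observation.
series = [1, 2, 3, 4, 5, 6, 7]
for vector in delay_embed(series, d=3, tau=2):
    print(vector)
```

With $n = 7$, $d = 3$ and $\tau = 2$ there are $n-(d-1)\tau = 3$ vectors, which matches the index range in the formula.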
So the key to this method is how to choose the embedding dimension and the delay time.
The first method for choosing the delay time is the autocorrelation function (ACF). We look for the smallest lag at which the normalized autocorrelation $\rho(\tau) = R(\tau)/R(0)$ drops to about $1-e^{-1}$, and take that lag as the delay time. $$ R(\tau) = \frac{1}{n}\sum_{i=1}^{n-\tau}x(i) \times x(i+\tau) $$
But the problem with this method is that the ACF only captures linear dependence across the delay, whereas we have assumed the system is nonlinear, so we need another method to choose the lag.
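Even with that limitation, the ACF rule is cheap to try as a first guess. The sketch below is one possible reading of it, under two assumptions that are not in the text: the mean is removed from the series before computing $R(\tau)$, and the autocorrelation is normalized by $R(0)$. The function names and the `level` parameter are made up for the example; the default level is $1-e^{-1}$ as quoted above, and it can be changed if another cutoff is preferred.

```python
import math


def autocorrelation(x, tau):
    """R(tau) = (1/n) * sum of z(i) * z(i+tau), where z is x with its mean removed."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]  # assumption: work on the mean-removed series
    return sum(z[i] * z[i + tau] for i in range(n - tau)) / n


def delay_from_acf(x, max_lag, level=1 - math.exp(-1)):
    """Smallest lag whose normalized autocorrelation is at or below `level`."""
    r0 = autocorrelation(x, 0)
    if r0 == 0:
        return None  # constant series: the ratio is undefined
    for tau in range(1, max_lag + 1):
        if autocorrelation(x, tau) / r0 <= level:
            return tau
    return None  # the level was not reached within max_lag


# Hypothetical test signal: a sine wave with a period of 40 samples.
wave = [math.sin(2 * math.pi * i / 40) for i in range(400)]
print(delay_from_acf(wave, max_lag=20))
```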
The other method is average mutual information (AMI), which measures how much information one series carries about another. Suppose we have a series $X$ and a delayed series $Y$ with $N-\tau$ observations, where $\tau$ indicates the delay. The average mutual information $I_{Y;X}$ between the two series can be expressed in probabilistic terms. $$ \begin{aligned} I_{Y;X} = I_\tau &= \sum_{i=1}^{N_c}\sum_{j=1}^{N_c} P(x(i), y(j)) \log\frac{P(x(i),y(j))}{P(x(i))P(y(j))} \\ &= \sum_{i=1}^{N_r}P(s_i,s_{i+\tau})\log\frac{P(s_i,s_{i+\tau})}{P(s_i)P(s_{i+\tau})} \end{aligned} $$
where $N_c$ is the number of cells containing points, with non-zero probability, and $N_r$ is the number of routes in the state space.
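Equivalently, in terms of the entropies $H_X$ and $H_Y$ of the two series and their joint entropy $H_{X,Y}$: $$ I_{Y;X} = H_Y + H_X - H_{X,Y} $$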
This method is easy to program in MATLAB, and the result can be plotted as below.
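For readers without MATLAB, a rough Python equivalent might look like the sketch below. It is not the code behind the plot: it estimates the probabilities with a simple equal-width histogram, and the number of bins (16) is an arbitrary assumption. The helper `first_minimum` reflects a commonly used rule, taking the delay at the first local minimum of $I_\tau$, which is an addition here rather than something stated in the text. The result is in nats because the natural logarithm is used.

```python
import math
from collections import Counter


def average_mutual_information(x, tau, bins=16):
    """Histogram estimate of I_tau between x(i) and x(i+tau), for tau >= 1."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / bins or 1.0  # avoid a zero width for a constant series
    # Map each value to a cell index in 0..bins-1.
    cells = [min(int((v - lo) / width), bins - 1) for v in x]
    pairs = list(zip(cells[:-tau], cells[tau:]))
    m = len(pairs)
    joint = Counter(pairs)
    px = Counter(a for a, _ in pairs)
    py = Counter(b for _, b in pairs)
    ami = 0.0
    for (a, b), c in joint.items():
        # P(a,b) * log(P(a,b) / (P(a) * P(b))) with every P = count / m
        ami += (c / m) * math.log(c * m / (px[a] * py[b]))
    return ami


def first_minimum(values):
    """Index of the first local minimum of a list, or None if there is none."""
    for k in range(1, len(values) - 1):
        if values[k] < values[k - 1] and values[k] <= values[k + 1]:
            return k
    return None


# Hypothetical signal; curve[k] holds the AMI at lag k + 1.
signal = [math.sin(0.3 * i) for i in range(500)]
curve = [average_mutual_information(signal, tau) for tau in range(1, 31)]
k = first_minimum(curve)
print(None if k is None else k + 1)
```

Only occupied cells enter the sum, which corresponds to the "cells containing points" condition on $N_c$.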
Next we need to determine the embedding dimension. For this step we can use the False Nearest Neighbors (FNN) method to find the most suitable value.
In fact, reconstructing the data means mapping it into another dimension. In other words, we unfold the trajectory of the single series and expand it into a different space.
For every point in the series we define the Euclidean distance to its nearest neighbor $y_{n(i,d)}(d)$ in dimension $d$. $$ R_i(d) = \|y_i(d) - y_{n(i,d)}(d)\|_2 $$
If the dimension increases from $d$ to $d+1$, the distance becomes $R_i(d+1)$, like this. $$ R_i(d+1)^2 = R_i(d)^2 + \|x(i+d\tau)-x(n(i,d)+d\tau)\|_2^2 $$
If $R_i(d+1)$ is much larger than $R_i(d)$, we can conclude that this nearest neighbor is false. $$ a_1(i,d) = \frac{\|x(i+d\tau)-x(n(i,d)+d\tau)\|_2}{R_i(d)} $$
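A short worked step may make the ratio easier to read. Assuming the distance update is read in squared form, with $R_i(d)^2$ on the right-hand side, dividing by $R_i(d)^2$ gives:

$$ \begin{aligned} \frac{R_i(d+1)^2 - R_i(d)^2}{R_i(d)^2} &= \frac{\|x(i+d\tau)-x(n(i,d)+d\tau)\|_2^2}{R_i(d)^2} = a_1(i,d)^2 \\ R_i(d+1) &= R_i(d)\sqrt{1 + a_1(i,d)^2} \end{aligned} $$

So $a_1(i,d)$ measures how much the extra coordinate stretches the distance. For example, $a_1(i,d) = 10$ means the distance grows by a factor of $\sqrt{101} \approx 10.05$ when the dimension goes from $d$ to $d+1$.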
When $a_1(i,d)$ exceeds a threshold, usually chosen in the range $[10,50]$, we treat the neighbor as false.
Then we can use MATLAB again, and the result can be plotted as below.
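As a Python counterpart, the FNN test could be sketched as follows. This is a simplified illustration rather than the code behind the plot: it uses a brute-force neighbor search, applies only the ratio test $a_1(i,d)$ with an assumed threshold of 10, and skips points whose nearest neighbor is at distance zero. The function name and the test signal are hypothetical.

```python
import math


def false_neighbor_fraction(x, d, tau, threshold=10.0):
    """Fraction of points whose nearest neighbor in dimension d is false."""
    m = len(x) - d * tau  # points that still have a coordinate in dimension d+1
    if m < 2:
        raise ValueError("series too short for this d and tau")
    vectors = [[x[i + k * tau] for k in range(d)] for i in range(m)]
    false_count = 0
    counted = 0
    for i in range(m):
        # Brute-force search for the nearest neighbor n(i, d).
        best_j, best_r = None, math.inf
        for j in range(m):
            if j != i:
                r = math.dist(vectors[i], vectors[j])
                if r < best_r:
                    best_j, best_r = j, r
        if best_r == 0.0:
            continue  # identical points: the ratio is undefined, so skip them
        counted += 1
        # Distance added by the new coordinate x(i + d*tau).
        extra = abs(x[i + d * tau] - x[best_j + d * tau])
        if extra / best_r > threshold:
            false_count += 1
    return false_count / counted if counted else 0.0


# Hypothetical signal: a sine wave, with tau close to a quarter of its period.
signal = [math.sin(0.2 * i) for i in range(300)]
for dim in (1, 2, 3):
    print(dim, false_neighbor_fraction(signal, dim, tau=8))
```

The embedding dimension is then usually taken as the smallest $d$ at which this fraction falls close to zero.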
This gives us the new space, built entirely from the old data.
Predict
Once this process is finished, we can predict the nonlinear system.
The data we work with are the $n-(d-1)\tau$ vectors below.
$$ \begin{aligned} \vec{y}_1 = y_1(d) &= (x(1),\dots, x(1+(d-1)\tau)) \\ \vec{y}_2 = y_2(d) &= (x(2),\dots, x(2+(d-1)\tau)) \\ &\;\;\vdots \\ \vec{y}_i = y_i(d) &= (x(i),\dots, x(i+(d-1)\tau)) \\ &\;\;\vdots \\ \vec{y}_{n-(d-1)\tau} = y_{n-(d-1)\tau}(d) &= (x(n-(d-1)\tau),\dots, x(n)) \end{aligned} $$
Then we can set up a dynamical system in $d$-dimensional Euclidean space and predict it by many methods, such as machine learning, deep learning, or least squares. $$ \vec{y}_{i+1} = F(\vec{y}_i) $$ The connecting function $F(\vec{y}_i)$ can then be estimated from the known pairs $(\vec{y}_i, \vec{y}_{i+1})$.
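To make the idea of estimating $F$ concrete, here is one very simple choice, sketched under my own assumptions: instead of fitting a model, it approximates $F$ locally by averaging the successors of the $k$ nearest delay vectors. This nearest-neighbor scheme is swapped in for illustration only and is not one of the methods named above; the function name `predict_next` and the parameter `k` are hypothetical.

```python
import math


def predict_next(x, d, tau, k=3):
    """One-step forecast of x from the successors of the k nearest delay vectors."""
    span = (d - 1) * tau
    m = len(x) - span  # number of delay vectors
    if m < 2:
        raise ValueError("series too short for this d and tau")
    vectors = [[x[i + j * tau] for j in range(d)] for i in range(m)]
    query = vectors[-1]  # the most recent state, which ends at x[-1]
    # Only vectors with index below m-1 have a known successor x[i + span + 1].
    order = sorted(range(m - 1), key=lambda i: math.dist(vectors[i], query))
    nearest = order[:k]
    # The last coordinate of y(i+1) is x[i + span + 1]; average those values.
    return sum(x[i + span + 1] for i in nearest) / len(nearest)


# Hypothetical periodic series: the value after ..., 2, 3 should be 0.
series = [0, 1, 2, 3] * 10
print(predict_next(series, d=2, tau=1, k=3))
```

A least-squares or machine-learning model would replace the averaging step with a fitted map from $\vec{y}_i$ to $\vec{y}_{i+1}$.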
Finally, I think this way of treating the data keeps the complete information, and the new space is much better suited to analysis.