A reinforcement learning approach to improve the performance of the Avellaneda-Stoikov market-making algorithm PMC

As stated in Section 4.1.7, these values for w and k are taken as the fixed parameter values for the Alpha-AS models. They are not recalibrated periodically for the Gen-AS model, so that their values do not differ from those used throughout the experiment in the Alpha-AS models. If w and k were different for Gen-AS and Alpha-AS, it would be hard to discern whether observed differences in the performance of the models were due to the action modifications learnt by the RL algorithm or simply the result of differing parameter optimisation values. Alternatively, w and k could be recalibrated periodically for the Gen-AS model and the new values introduced into the Alpha-AS models as well.

On the P&L-to-MAP ratio, Alpha-AS-1 was the best-performing model for 11 test days, with Alpha-AS-2 coming second on 9 of them, whereas Alpha-AS-2 was the best-performing model on P&L-to-MAP for 16 of the test days, with Alpha-AS-1 coming second on 14 of these. Here the single best-performing model was Alpha-AS-2, winning for 16 days and coming second on 10 (on 9 of which losing to Alpha-AS-1). Alpha-AS-1 had 11 victories and placed second 16 times (losing to Alpha-AS-2 on 14 of these).

4 Comparison with existing models

Currencies, though this can vary depending on market volatility and client flows. “Under standard assumptions of risk tolerance and daily turnover, the model indeed confirms that this level of internalisation is optimal on average,” says Barzykin. The finding correlates with current industry practices, while the optimal risk neutralisation time derived from the model was also in line with market norms.

Market-making by a foreign exchange dealer – Risk.net

Posted: Wed, 10 Aug 2022 07:00:00 GMT [source]

HFT (high-frequency trading) has emerged as a powerful force in modern financial markets. Only 20 years ago, most of the trading volume occurred in exchanges such as the New York Stock Exchange, where humans dressed in brightly colored outfits would gesticulate and scream their trading intentions. Nowadays, trading occurs mostly in electronic servers in data centers, where computers communicate their trading intentions through network messages. This transition from physical exchanges to electronic platforms has been particularly profitable for HFT firms, which invested heavily in the infrastructure of this new environment.

In this section, we compare the existing optimal market making models based on stock price impacts with the models introduced in the previous sections. Numerical experiments are carried out on two different types of utility functions, i.e., quadratic and exponential utility functions.

Institutional Investors and Stock Market Volatility

AlphaGo learned by playing against itself many times, registering the moves that were more likely to lead to victory in any given situation, thus gradually improving its overall strategies. The same concept has been applied to train a machine to play Atari video games competently, feeding a convolutional neural network with the pixel values of successive screen stills from the games. High-frequency trading is a popular form of algorithmic trading that leverages electronic trading tools and high-frequency financial data.

Throughout a full day of trading, it is more likely than within shorter time frames that there will be intervals at which the market is indeed closely matched by the AS formula parameters. The greater inventory risk taken by the Alpha-AS models during such intervals can be punished with greater losses. Conversely, the gains may also be greater, a benefit which is indeed reflected unequivocally in the results obtained for the P&L-to-MAP performance indicator. The usual approach in algorithmic trading research is to use machine learning algorithms to determine the buy and sell orders directly.

“And then we show how to incorporate those tiers into the model,” says Barzykin. In the paper, clients are divided into two tiers based on their sensitivity to price changes. Some clients need to take certain positions, and their activity is less likely to be influenced by changes in price, while others are more likely to trade when they see an attractive price.

Trading strategy with stochastic volatility in a limit order book market. Consequently, she will sell the assets at a lower price at positive inventory levels to reduce both the price risk and the liquidation risk. On the other hand, she does not face liquidation risk at negative inventory levels but wants to receive a higher amount for selling the assets.

Furthermore, as already mentioned, the agent’s risk aversion (γ) is modelled as constant in the AS formulas. Finally, as noted above, implementations of the AS procedure typically use the reservation price as an approximation for both the bid and ask indifference prices. The main contribution of this paper is a new integral deep LOB trading system that embraces model training, prediction, and optimization. Inspired by the model architecture in Zhang et al., 2018, Zhang et al., 2019, we adopt the deep convolutional neural network model, which has a structure of convolutional layers and includes an inception module and LSTM module.

Then, robust sparse-norm and graph regularization constraints are imposed on the objective function to ensure the consistency of the spatial information. For the optimization of the parameters involved in the model, a distributed adaptive proximal Newton gradient descent learning strategy is proposed to accelerate the convergence. Furthermore, considering the dynamic time-series and potentially non-stationary structure of industrial data, we propose extended incremental versions to alleviate the complexity of the overall model computation. Extensive data recovery experiments are conducted on two real industrial processes to evaluate the proposed method in comparison with existing state-of-the-art restorers.

Mean decrease impurity, a feature-specific measure of the mean reduction of weighted impurity over all the nodes in the tree ensemble that partition the data samples according to the values of that feature. Where the 0 subscript denotes the best orderbook price level on the ask and on the bid side, i.e., the price levels of the lowest ask and of the highest bid, respectively. Market indicators, consisting of features describing the state of the environment. Thus, the DQN approximates a Q-learning function by outputting for each input state, s, a vector of Q-values, which is equivalent to checking the row for s in a Q(s, a) matrix to obtain the Q-value for each action from that state.
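The equivalence between a Q-table row lookup and the Q-value vector produced by a network can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the state names, the action set, the Q-values, and the fixed linear map that stands in for a trained DQN forward pass.

```python
# Toy illustration: a Q-table row lookup and a function approximator
# both map a state to one Q-value per action. All numbers are made up.

ACTIONS = ["widen_spread", "keep", "skew_quotes"]  # hypothetical action set

# Tabular Q-learning: one row per discrete state, one column per action.
q_table = {
    "low_inventory": [0.1, 0.5, 0.2],
    "high_inventory": [0.3, 0.1, 0.7],
}


def q_values_from_table(state):
    """Return the row of Q-values stored for `state`."""
    return q_table[state]


def q_values_from_network(features):
    """Stand-in for a DQN forward pass: a fixed linear map from a feature
    vector to one Q-value per action (the weights are arbitrary)."""
    weights = [[0.1, 0.3], [0.5, 0.1], [0.2, 0.7]]  # one row per action
    return [sum(w * x for w, x in zip(row, features)) for row in weights]


def greedy_action(q_values):
    """Pick the action with the highest Q-value."""
    best = max(range(len(q_values)), key=lambda i: q_values[i])
    return ACTIONS[best]
```

With one-hot features, `q_values_from_network([1, 0])` reproduces the `"low_inventory"` row of the table. The point of the network is that it also returns a Q-value vector for feature vectors that have no row in any table.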

In contrast, the total P&L accrued so far in the day is what has been added to the agent’s state space, since it is reasonable for this value to affect the agent’s assessment of risk, and hence also how it manipulates its risk aversion as part of its ongoing actions. In a paper published on Risk.net earlier this month, they define the choice between internalisation and externalisation as an optimisation problem in which the state variable is the inventory of the market-maker. Their model uses market parameters such as volatility and client trading activity in response to pricing to determine the optimal choice. The market-maker has full control over the prices quoted to clients and its trading activity on external venues.

Figure 3 depicts one simulation of the profit and loss function of the market maker at any time t during the trading session in the left panel. The profit and loss performance of the trading is displayed by the cash level histogram in the left panel. It can be seen from Figure 3 that the strategy is profitable even when there are adverse selection effects in the model due to the expectations of the jumps. Now, as another extension of the stock price impact on the optimal market making problem, we consider the case in which the stochastic volatility of the asset is affected by the arrival of market orders, and examine its effect on the optimal trading prices. The models in the literature mostly assume that the stock price follows a Brownian motion with constant volatility, and in the cases of stochastic volatility, they consider only a diffusive volatility. Modelling the volatility as diffusive makes it highly persistent, but such dynamics allow the volatility to change only through a sequence of normally distributed increments.

  • The stochastic control problem of optimal market making is among the central problems in quantitative finance.
  • A typical HFT algorithm is based on limit order book data (Baldauf and Mollner, 2020, Brogaard et al., 2014, Kirilenko et al., 2017).
  • Figures in bold are the best values among the five models for the corresponding test days.

They have considered a constant price impact using the same counting processes for both arrival and filled limit orders. More recently, Baldacci et al. have studied the optimal control problem for an option market maker with the Heston model for the underlying asset, using the vega approximation for the portfolio. For more developments in the optimal market making literature, we refer the reader to Guéant, Ahuja et al., Cartea et al., Guéant and Lehalle, Nyström, and Guéant et al. The latter is an important feature for market maker algorithms.

Combining reservation price and optimal spread

We show that, over short time intervals, price changes are mainly driven by the order flow imbalance, defined as the imbalance between supply and demand at the best bid and ask prices. Our study reveals a linear relation between order flow imbalance and price changes, with a slope inversely proportional to the market depth. These results are shown to be robust to intraday seasonality effects, and stable across time scales and across stocks. This linear price impact model, together with a scaling argument, implies the empirically observed “square-root” relation between the magnitude of price moves and trading volume. However, the latter relation is found to be noisy and less robust than the one based on order flow imbalance. We discuss a potential application of order flow imbalance as a measure of adverse selection in limit order executions, and demonstrate how it can be used to analyze intraday volatility dynamics.
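A minimal sketch of how order flow imbalance could be computed from successive best-quote snapshots is given below. The text does not spell out the formula, so the event-based definition used here (sizes added at or above the previous best bid count as buying pressure, sizes added at or below the previous best ask count as selling pressure) is an assumption, as are the function names and the tuple layout. The through-origin least-squares slope is likewise only an illustration of fitting the linear relation.

```python
def order_flow_imbalance(prev, curr):
    """Contribution of one best-quote update to order flow imbalance.

    `prev` and `curr` are (bid_price, bid_size, ask_price, ask_size)
    tuples for two consecutive snapshots (hypothetical layout).
    """
    pb0, qb0, pa0, qa0 = prev
    pb1, qb1, pa1, qa1 = curr
    # Demand side: the new bid size counts if the bid did not fall,
    # the old bid size is removed if the bid did not rise.
    bid_side = (qb1 if pb1 >= pb0 else 0) - (qb0 if pb1 <= pb0 else 0)
    # Supply side: the new ask size counts if the ask did not rise,
    # the old ask size is removed if the ask did not fall.
    ask_side = (qa1 if pa1 <= pa0 else 0) - (qa0 if pa1 >= pa0 else 0)
    return bid_side - ask_side


def total_ofi(quotes):
    """Sum the contributions over a sequence of snapshots."""
    return sum(order_flow_imbalance(a, b) for a, b in zip(quotes, quotes[1:]))


def slope_through_origin(ofi, price_changes):
    """Least-squares slope for price_change ~ slope * ofi."""
    num = sum(x * y for x, y in zip(ofi, price_changes))
    den = sum(x * x for x in ofi)
    return num / den
```

For example, if the best bid size grows from 10 to 15 while prices and the ask size stay unchanged, the contribution is +5. A shrinking ask size at an unchanged ask price also gives a positive contribution.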

Should you hedge or should you wait? – Risk.net

Posted: Wed, 24 Aug 2022 07:00:00 GMT [source]

Finally, we demonstrate the significance of this novel system in multiple experiments. The AS model generates bid and ask quotes that aim to maximize the market maker’s P&L profile for a given level of inventory risk the agent is willing to take, relying on certain assumptions regarding the microstructure and stochastic dynamics of the market. Extensions to the AS model have been proposed, most notably the Guéant-Lehalle-Fernandez-Tapia approximation and a recent variation of it by Bergault et al., which are currently used by major market making agents. Nevertheless, in practice, deviations from the model scenarios are to be expected. Under real trading conditions, therefore, there is room for improvement upon the orders generated by the closed-form AS model and its variants.
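To make the quoting step concrete, the sketch below combines the reservation price and the optimal spread using the closed-form expressions commonly quoted from Avellaneda and Stoikov's original paper. The function and argument names are hypothetical, the parameter values in any call are illustrative only, and placing the quotes symmetrically around the reservation price is the usual approximation rather than something the text above prescribes.

```python
import math


def as_quotes(mid, inventory, gamma, sigma, k, time_left):
    """Bid and ask quotes from the closed-form AS approximation.

    mid        current mid-price
    inventory  signed inventory q (positive = long)
    gamma      risk aversion
    sigma      volatility of the mid-price
    k          order book liquidity parameter
    time_left  remaining time T - t in the session
    """
    # Reservation price: the mid-price shifted against the inventory.
    reservation = mid - inventory * gamma * sigma ** 2 * time_left
    # Total optimal spread around the reservation price.
    spread = (gamma * sigma ** 2 * time_left
              + (2.0 / gamma) * math.log(1.0 + gamma / k))
    return reservation - spread / 2.0, reservation + spread / 2.0
```

With zero inventory the quotes are centred on the mid-price. A long position lowers both quotes by the same amount, which makes a sale more likely than a purchase, while the spread itself does not depend on the inventory.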

Also, deploying monitors provides a virtual backbone for multi-hop transmission. However, adding secure points to a WANET can be costly in terms of price and time, so minimizing the number of secure points is of utmost importance. Graph theory provides a great foundation to tackle the emerging problems in WANETs. A vertex cover (VC) is a set of vertices such that every edge is incident to at least one vertex in the set. The minimum weighted connected VC problem can be defined as finding the VC of connected nodes having the minimum total weight.
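The definitions above can be checked mechanically. The following sketch only verifies whether a candidate set is a connected vertex cover and computes its weight. It does not solve the minimisation problem, and the function names and the small example graph are assumptions made for illustration.

```python
def is_vertex_cover(edges, cover):
    """True if every edge has at least one endpoint in `cover`."""
    cover = set(cover)
    return all(u in cover or v in cover for u, v in edges)


def is_connected(edges, nodes):
    """True if `nodes` induce a connected subgraph of the given edges."""
    nodes = set(nodes)
    if not nodes:
        return True
    # Adjacency restricted to the candidate nodes.
    adjacency = {n: set() for n in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            adjacency[u].add(v)
            adjacency[v].add(u)
    # Depth-first search from an arbitrary candidate node.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == nodes


def cover_weight(weights, cover):
    """Total weight of the vertices in `cover`."""
    return sum(weights[v] for v in cover)
```

On the path a-b-c-d, the set {b, c} is a connected vertex cover, whereas {a, c} covers every edge but is not connected, and {a, d} leaves the edge b-c uncovered.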

  • In the first generation, 45 individuals were created by assigning to each of the four genes random values within the defined ranges.
  • Alternatively, experimenting with further layers to learn such policies autonomously may ultimately yield greater benefits, as indeed may simply altering the number of layers and neurons, or the loss functions, in the current architecture.
  • By trimming the values to the [−1, 1] interval we limit the influence of this minority of values.
  • Another extended market making model with inventory constraints has been provided by Fodra and Labadie who consider a general case of midprice by linear and exponential utility criteria and find closed-form solutions for the optimal spreads.
  • Figures for Alpha-AS-1 and Alpha-AS-2 are given in green if their value is higher than that for the Gen-AS model for the same day.
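Two of the points in the list above, the random first generation of 45 individuals with four genes and the trimming of values to the [−1, 1] interval, can be sketched as follows. The gene names and ranges are placeholders, since the list does not state them, and the uniform sampling is likewise an assumption.

```python
import random

# Hypothetical gene names and ranges; the actual ones are not given above.
GENE_RANGES = {
    "gene_0": (0.0, 1.0),
    "gene_1": (-1.0, 1.0),
    "gene_2": (0.1, 10.0),
    "gene_3": (1.0, 100.0),
}


def initial_population(size=45, ranges=GENE_RANGES, seed=None):
    """Create `size` individuals, each gene drawn uniformly from its range."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(size)
    ]


def clip(value, low=-1.0, high=1.0):
    """Trim `value` to the [low, high] interval."""
    return max(low, min(high, value))
```

Calling `initial_population(seed=0)` returns a reproducible list of 45 dictionaries, and `clip(3.2)` returns `1.0`, so a small number of extreme values cannot dominate the rest.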

Table 11, which is obtained from all simulations, depicts the results of these two strategies. We can see that when jumps occur in volatility, they cause not only larger profits but also a larger standard deviation of the profit and loss function. To refer to the models more easily, we call the model studied in Case 1 in Sect. 3 with stock price dynamics “Model 1” and the model with the dynamics “Model 2”. Moreover, the spread can also be considered to be normally distributed due to its skewness and kurtosis values. Hence, the optimal spreads which maximize the supremums in the verification Eq.

However, on 13 of those days Alpha-AS-1 achieved a better P&L-to-MAP score than Gen-AS, substantially so in many instances. Only on one day was the trend reversed, with Gen-AS performing slightly worse than Alpha-AS-1 on Max DD, but then performing better than Alpha-AS-1 on P&L-to-MAP. The procedure, therefore, has two steps, which are applied at each time increment as follows.