5th NUS-USPC Workshop

29 – 30 Nov 2017, Paris

PROGRAMME

Wednesday, 29 November 2017 Venue: AmphiTuring
Time	Activity
09:00 – 09:30	Registration
09:30 – 09:40	Opening Address
09:40 – 10:25	Ying CHEN (National University of Singapore, Singapore): Sentiment Analysis for Online Reviews with Regularized Text Logistic Regression
10:25 – 10:55	Coffee Break
10:55 – 11:40	Stephan CLÉMENÇON (Telecom Paris, France): Weak Signals: Machine-Learning Meets Extreme Value Theory
11:40 – 12:25	Stéphane GAIFFAS (University Paris Diderot, France): Statistical Learning with Hawkes Processes
12:25 – 14:00	Group Photo & Lunch
14:00 – 14:45	Jean-Yves AUDIBERT (Capital Fund Management, France): Aggregating Weak Predictions and Rupture Detections in Financial Time Series
14:45 – 15:30	Benjamin BRUDER (Lyxor Asset Management, France): Data Science and Asset Management
15:30 – 16:00	Tea Break
16:00 – 16:45	Gah-Yi BAN (London Business School, England): Machine Learning and Portfolio Optimization
16:45 – 17:30	Mathilde MOUGEOT (LPMA and ENSIIE, France): Statistical and Machine Learning Methods to Model and Forecast Energy

Thursday, 30 November 2017 Venue: AmphiBuffon
Time	Activity
09:30 – 10:15	Arnulf JENTZEN (ETH Zurich, Switzerland): On Deep Learning based Approximation Algorithms for Partial Differential Equations
10:15 – 10:45	Coffee Break
10:45 – 11:30	Johann LUSSANGE (École Normale Supérieure, France): Latest Advances in Reinforcement Learning
11:30 – 12:15	Chao ZHOU (National University of Singapore, Singapore): Investment Decisions and Falling Cost of Data Analytics
12:15 – 14:00	Lunch
14:00 – 14:45	Steven KOU (National University of Singapore, Singapore): A Theory of Fintech
14:45 – 15:30	Steven KOU (National University of Singapore, Singapore): The Economics of Bitcoin
15:30 – 16:00	Tea Break
16:00 – 16:45	Cyril GRUNSPAN (ESILV, France): Security and Stability of Blockchains

Abstracts

Aggregating Weak Predictions and Rupture Detections in Financial Time Series
Jean-Yves AUDIBERT, Capital Fund Management, France

Financial market data offers various challenging tasks for Machine Learning (ML) methods. We present here some results on semi-supervised and supervised learning tasks related to forecasting stock returns. We show how ML methods do predict stock returns, even when using standard indicators from the literature. We discuss the limits of these approaches at low or medium frequency timescales. ML methods strongly rely on i.i.d. or at least strong stationarity assumptions. In reality, there are clear breakpoints in the history of financial markets, and in particular in the performance of strategies. Detecting these breakpoints may help when investing dynamically in strategies. We will thus present rupture detection methods and emphasize the impact of the non-Gaussianity of P&L time series.

Machine Learning and Portfolio Optimization
Gah-Yi BAN, London Business School, England

The portfolio optimization model has limited impact in practice because of estimation issues when applied to real data. To address this, we adapt two machine learning methods, regularization and cross-validation, for portfolio optimization. First, we introduce performance-based regularization (PBR), where the idea is to constrain the sample variances of the estimated portfolio risk and return, which steers the solution toward one associated with less estimation error in the performance. We consider PBR for both mean-variance and mean-conditional value-at-risk (CVaR) problems. For the mean-variance problem, PBR introduces a quartic polynomial constraint, for which we make two convex approximations: one based on rank-1 approximation and another based on a convex quadratic approximation. The rank-1 approximation PBR adds a bias to the optimal allocation, and the convex quadratic approximation PBR shrinks the sample covariance matrix. For the mean-CVaR problem, the PBR model is a combinatorial optimization problem, but we prove its convex relaxation, a quadratically constrained quadratic program, is essentially tight. We show that the PBR models can be cast as robust optimization problems with novel uncertainty sets and establish asymptotic optimality of both sample average approximation (SAA) and PBR solutions and the corresponding efficient frontiers. To calibrate the right-hand sides of the PBR constraints, we develop new, performance-based k-fold cross-validation algorithms. Using these algorithms, we carry out an extensive empirical investigation of PBR against SAA, as well as L1 and L2 regularizations and the equally weighted portfolio. We find that PBR dominates all other benchmarks for two out of three Fama–French data sets.

Data Science and Asset Management
Benjamin BRUDER, Lyxor Asset Management, France

This presentation gives an overview of the potential applications of machine learning to asset management. After a brief reminder of the major steps of a portfolio construction process, we will consider which related problems can benefit from machine learning techniques, and which ones can suffer from data mining and strong overfitting bias. In particular, we will show that various problems involving covariances are in general good candidates for a machine learning oriented framework. By contrast, we consider that long-term trend estimation should be based on very parsimonious techniques, and rely on long-term experience rather than off-the-shelf complex data mining solutions.

Sentiment Analysis for Online Reviews with Regularized Text Logistic Regression
Ying CHEN, National University of Singapore, Singapore

With the increasing volume of user-generated reviews and feedback posted on online review platforms, it becomes essential for executives and managers to build an efficient classifier that captures the general sentiment of reviews from unstructured text. We propose a regularized text logistic regression method that, besides providing good classification accuracy, can identify a set of essential features so as to provide rapid and valuable suggestions for sentiment analysis and operational improvement. We demonstrate the performance of the proposed method on two real text data sets on restaurants and hotels and compare its classification performance with several alternatives. This is based on joint work with Peng LIU (National University of Singapore, Singapore) and Chung Piaw TEO (National University of Singapore, Singapore).
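As a rough illustration of how a sparsity-inducing penalty can single out a few informative words, the sketch below fits a logistic regression with an L1 penalty by proximal gradient descent on a toy bag-of-words matrix. The L1 penalty, the optimizer, the toy vocabulary and every function name are assumptions made for illustration; the abstract does not say which regularizer or algorithm the authors use.

```python
import math

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (proximal map of the L1 penalty)."""
    return math.copysign(max(abs(value) - threshold, 0.0), value)

def fit_l1_logistic(X, y, lam=0.05, step=0.5, iters=500):
    """Proximal gradient descent for L1-penalized logistic regression.

    X: list of feature rows (word counts), y: list of 0/1 labels.
    The intercept b is left unpenalized.
    Hypothetical helper for illustration, not the authors' method.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted probability minus label
            grad_b += err
            for j, xj in enumerate(row):
                grad_w[j] += err * xj
        # Gradient step on the smooth loss, then soft-thresholding for the L1 term.
        w = [soft_threshold(wj - step * gj / n, step * lam)
             for wj, gj in zip(w, grad_w)]
        b -= step * grad_b / n
    return w, b

# Toy vocabulary (assumed): columns count the words "great", "terrible", "the".
X = [[1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1]]
y = [1, 1, 0, 0]  # 1 = positive review, 0 = negative review
weights, intercept = fit_l1_logistic(X, y)
```

In this toy setting the word "the" appears in every review and carries no sentiment, so its weight is shrunk to zero, while "great" and "terrible" keep weights of opposite sign.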

Weak Signals: Machine-Learning Meets Extreme Value Theory
Stephan CLÉMENÇON, Telecom Paris, France

“From pattern recognition to stochastic bandits, most machine-learning algorithms only involve the computation of basic sample mean statistics and the performance of empirical risk/regret minimizers produced by the latter can be investigated by means of concentration results for empirical processes. In many applications however (e.g. classification with unbalanced classes, novelty detection, dimensionality reduction), the useful information can be located in the ‘tails’ of the data distribution, far from the mean behaviour, and risk/regret cannot be appropriately described by such statistics any more. In the Big Data era, the observation of rare/extreme events is now possible, which paves the way for designing novel algorithms relying on extreme value statistics. It is the goal of this talk to illustrate this belief through the presentation of recent works, where machine-learning interfaces with extreme value theory and leads to efficient methods supported by a sound validity framework.”

References:
Anomaly Detection in Extreme Regions via Empirical MV-sets on the Sphere. A. Thomas, S. Clémençon, A. Sabourin & A. Gramfort. In the Proceedings of AISTATS 2017, Fort Lauderdale, USA.
Sparse Representation of Multivariate Extremes with Applications to Anomaly Detection. N. Goix, A. Sabourin & S. Clémençon. In Journal of Multivariate Analysis, 2017.
Learning the dependence structure of rare events: a nonasymptotic study. N. Goix, A. Sabourin & S. Clémençon. In the Proceedings of the 2015 COLT conference, Paris, France.

Statistical Learning with Hawkes Processes
Stéphane GAIFFAS, University Paris Diderot, France

We consider the problem of unveiling the implicit network structure of interactions between nodes (users in a social network, for instance, or moves of high-frequency financial signals), based on the timestamps of their actions. We will describe several approaches to achieve this: a parametric modeling of the Hawkes process with sparsity-inducing penalization (sparsity and low rank of the adjacency matrix), and a more recent and direct approach based on cumulant matching. Our theoretical analysis required a new tool: matrix concentration inequalities for continuous-time martingales, which are of independent interest and will be briefly described during this talk. Our methods are illustrated on the MemeTracker dataset (network of blogs) and on financial data (order book modeling).
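To make the role of the adjacency matrix concrete, here is a minimal sketch of the conditional intensity of a multivariate Hawkes process. It assumes exponential kernels with a single decay rate `beta`, a choice the abstract does not specify; the function name and all parameter values are hypothetical.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Intensity of each node at time t for a Hawkes process with exponential kernels.

    events: list of (timestamp, node) pairs observed so far.
    mu[i]: baseline rate of node i.
    alpha[i][j]: adjacency weight, i.e. how strongly an event on node j
                 excites node i (zero means no direct influence).
    beta: decay rate of the kernel alpha[i][j] * beta * exp(-beta * s).
    All names are illustrative assumptions.
    """
    intensity = list(mu)
    for timestamp, source in events:
        if timestamp < t:  # only past events contribute
            decay = beta * math.exp(-beta * (t - timestamp))
            for target in range(len(mu)):
                intensity[target] += alpha[target][source] * decay
    return intensity

# Two nodes: events on node 1 excite node 0, and nothing else interacts.
mu = [0.1, 0.2]
alpha = [[0.0, 0.5],
         [0.0, 0.0]]
events = [(0.0, 1)]
rates = hawkes_intensity(1.0, events, mu, alpha, beta=1.0)
```

Recovering the network structure then amounts to estimating `alpha` from the event timestamps; a sparse or low-rank `alpha` corresponds to the penalized setting mentioned in the abstract.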

Security and Stability of Blockchains
Cyril GRUNSPAN, ESILV, France

The invention of bitcoin in 2008 marks a new stage in the history of money. For the first time, a currency relies solely on trust in simple cryptographic algorithms rather than on a state or a central bank. The underlying technology, known as blockchain, establishes an original bridge between distributed systems and probability theory. We propose to explain this fact as well as the convergence between private interests and public interest.

On Deep Learning based Approximation Algorithms for Partial Differential Equations
Arnulf JENTZEN, ETH Zurich, Switzerland

Partial differential equations (PDEs) are among the most universal tools used in modelling problems in nature and man-made complex systems. In particular, PDEs are a fundamental tool in portfolio optimization problems and in the state-of-the-art pricing and hedging of financial derivatives. The PDEs appearing in such financial engineering applications are often high dimensional, as the dimensionality of the PDE corresponds to the number of financial assets in the involved hedging portfolio. Such PDEs typically cannot be solved explicitly, and developing efficient numerical algorithms for high dimensional PDEs is one of the most challenging tasks in applied mathematics. As is well known, the difficulty lies in the so-called “curse of dimensionality”: the computational effort of standard approximation algorithms grows exponentially in the dimension of the considered PDE, and there is only a very limited number of cases where a practical PDE approximation algorithm with a computational effort which grows at most polynomially in the PDE dimension has been developed. In the case of linear parabolic PDEs the curse of dimensionality can be overcome by means of stochastic approximation algorithms and the Feynman-Kac formula. We first review some results for stochastic approximation algorithms for linear PDEs and, thereafter, we present a stochastic approximation algorithm for high dimensional nonlinear PDEs whose key ingredients are deep artificial neural networks, which are widely used in data science applications. Numerical results illustrate the efficiency and the accuracy of the proposed stochastic approximation algorithm in the cases of several high dimensional nonlinear PDEs from finance and physics.
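The linear case mentioned above can be sketched in a few lines. For the heat equation u_t = ½ Δu with initial condition u(0, x) = g(x), the Feynman-Kac formula gives u(t, x) = E[g(x + W_t)], where W is a standard d-dimensional Brownian motion, so a plain Monte Carlo average approximates the solution at a point with a cost that grows only linearly in d. The test function g(x) = |x|², the sample size and the function names below are illustrative assumptions, and this covers only the linear case, not the deep learning algorithm for nonlinear PDEs presented in the talk.

```python
import random

def feynman_kac_heat(g, x, t, n_samples=5000, seed=0):
    """Monte Carlo estimate of u(t, x) = E[g(x + W_t)] for u_t = 0.5 * Laplacian(u).

    g: initial condition, a function of a list of d coordinates.
    x: evaluation point (list of d floats), t: time horizon.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    std_dev = t ** 0.5  # each coordinate of W_t is N(0, t)
    total = 0.0
    for _ in range(n_samples):
        endpoint = [xi + std_dev * rng.gauss(0.0, 1.0) for xi in x]
        total += g(endpoint)
    return total / n_samples

def squared_norm(point):
    return sum(v * v for v in point)

# In dimension d = 100 a grid-based solver is out of reach, but sampling is cheap.
d = 100
estimate = feynman_kac_heat(squared_norm, [0.0] * d, t=1.0)
exact = d * 1.0  # for g(x) = |x|^2 the exact solution is |x|^2 + d * t
```

The statistical error decays like one over the square root of the number of samples regardless of d, which is the sense in which such methods avoid the exponential cost of grid-based schemes.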

The Economics of Bitcoin
Steven KOU, National University of Singapore, Singapore

We attempt to build an equilibrium model of Bitcoin to address the following research questions simultaneously: Why has the Bitcoin price increased more than 60,000 times from 2009 to now? Why will the miners’ proportion of Bitcoin holdings decline over time, despite the price increase (Athey et al., 2016, WP)? The model features (1) two control variables, inventory level and imposed transaction fee, and (2) an “S”-shaped demand level (Bass, 1967; Bass, 2004). The assumption of a given demand level or curve is popular in monopolistic pricing, as in Industrial Organization, Revenue Management, and Marketing. Our model yields an interesting price dynamic: in the short run the Bitcoin price is driven by the “S”-shaped demand level, while in the long run the price becomes flat. The model predicts that the miners’ proportion of Bitcoin holdings will decline over time, consistent with the empirical finding in Athey et al. (2016, WP). This is joint work with Min DAI (National University of Singapore, Singapore), Wei JIANG (National University of Singapore, Singapore) and Cong QIN (National University of Singapore, Singapore).

A Theory of Fintech
Steven KOU, National University of Singapore, Singapore

In this talk, I will give a brief overview of current academic research on Fintech. The topics to be discussed include: (1) P2P equity financing: how to design contracts suitable for a P2P equity financing platform with information asymmetry. (2) Robotic financial advising: how to elicit investors’ risk-aversion parameters automatically by asking simple questions, and how to get consistent answers to meet investors’ goals, such as retirement planning. (3) Economics of Bitcoin: how to build a general equilibrium model for bitcoin. (4) Data privacy preservation: how to do econometrics on encrypted data while still preserving privacy. All four topics are based on my recent working papers.

Latest Advances in Reinforcement Learning
Johann LUSSANGE, École Normale Supérieure, France

Derived from the early biological studies on Pavlovian conditioning, reinforcement learning is one of the most promising approaches to machine learning. Distinct from the supervised and unsupervised learning approaches, the whole reinforcement learning framework is subject to three specific challenges: (i) the exploration versus exploitation dilemma, (ii) the curse of dimensionality, and (iii) the reward estimation problem. These challenges have recently led to much research activity, such as deep Q-learning, hierarchical reinforcement learning, shaping rewards, inverse and transfer learning, self-play and multi-agent learning. Keeping in mind the potential fintech applications, we present, for technology-intelligence purposes, an overview of the latest advances in the field.

Statistical and Machine Learning Methods to Model and Forecast Energy
Mathilde MOUGEOT, LPMA and ENSIIE, France

Since electricity can hardly be stored, forecasting tools are essential to appropriately balance consumption and generation of energy, including renewable energies. Forecasting is also essential for adapting energy market prices. Historical data show that the time series of wind production and of national electricity consumption differ radically in, for example, volatility and periodicity. Consequently, dedicated modeling and forecasting approaches should be introduced and adapted for energy production and consumption. In order to forecast energy consumption, we are currently developing a “prediction box” based on a sparse learning process for functional regression. This model allows us to forecast, in a high dimensional framework, the intraday load curves of French national consumption [2,3]. In a second study, machine learning techniques are challenged first to model wind energy and then to forecast wind production using limited wind measurements as provided by meteorological companies [1].

[1] Fischer A., Montuelle L., Mougeot M., Picard D. (2017) Statistical learning for wind power: a modeling and stability study towards forecasting. Wind Energy.
[2] Mougeot M., Picard D., Lefieux V., Maillard-Teyssier L. Forecasting intra day load curves using sparse functional regression. Springer Lecture Notes in Statistics, p. 161-182.
[3] Mougeot M., Picard D., Tribouley K. (2013) Sparse approximation and fit of intraday load curves in a high dimensional framework. Advances in Adaptive Data Analysis, p. 1-23.

Investment Decisions and Falling Cost of Data Analytics
Chao ZHOU, National University of Singapore, Singapore

We study how the cost of data analytics and the characteristics of investors and investment opportunities affect investment decisions and the use of data analytics. We show that the falling cost of data analytics raises investors’ leverage; that financially constrained or highly risk-averse investors use less data analytics; and that the value of data analytics is highest with average investment opportunities and low with high or low expected-return opportunities. Due to the increased leverage, the falling cost of data analytics may lead to higher losses during crises. This is joint work with Jussi Keppo (National University of Singapore, Singapore) and Hong Ming Tan (National University of Singapore, Singapore).