EXPLORING THE INTERPRETABILITY OF LSTM NEURAL NETWORKS OVER MULTI-VARIABLE DATA

Anonymous authors
Paper under double-blind review

ABSTRACT

In learning a predictive model over multivariate time series consisting of target and exogenous variables, the forecasting performance and interpretability of the model are both essential for deployment and for uncovering the knowledge behind the data. To this end, we propose the interpretable multi-variable LSTM recurrent neural network (IMV-LSTM), capable of providing accurate forecasting as well as both temporal and variable-level importance interpretation. In particular, IMV-LSTM is equipped with tensorized hidden states and an associated update process, so as to learn variable-wise hidden states. On top of this, we develop a mixture attention mechanism and associated summarization methods to quantify the temporal and variable importance in the data. Extensive experiments using real datasets demonstrate the prediction performance and interpretability of IMV-LSTM in comparison to a variety of baselines. IMV-LSTM also shows promise as an end-to-end framework for both forecasting and knowledge extraction over multi-variable data.

1 INTRODUCTION

Our daily life is now surrounded by various types of sensors, ranging from smart phones, video cameras, and the Internet of Things to robots. The observations yielded by such devices over time are naturally organized as time series data (Qin et al., 2017; Yang et al., 2015). In this paper, we focus on multi-variable time series consisting of target and exogenous variables, where each variable corresponds to a quantity monitored in the physical world. A predictive model over such multi-variable data aims to predict the future values of the target series using the historical values of the target and exogenous series.

In addition to forecasting, the interpretability of prediction models is essential for deployment and knowledge extraction (Hu et al., 2018; Foerster et al., 2017; Lipton, 2016). For the multi-variable time series studied in this paper, we focus on two types of importance interpretation. (1) Variable-wise temporal importance: exogenous variables exert different temporal influence on the target one (Kirchgässner et al., 2012). For instance, for an exogenous variable with an instant effect on the target, its historical data at short time lags is expected to receive high importance values. (2) Overall variable importance: exogenous variables and the auto-regressive part of the target variable differ in predictive power, which is reflected in their different importance w.r.t. the prediction of the target (Feng et al., 2018; Riemer et al., 2016). The ability to unveil such knowledge through predictive models makes it possible to fundamentally understand the effect of exogenous variables on the target one.

Recently, recurrent neural networks (RNNs), especially long short-term memory (LSTM) (Hochreiter & Schmidhuber, 1997) and the gated recurrent unit (GRU) (Cho et al., 2014), have proven to be powerful sequence modeling tools in a variety of tasks such as language modelling, machine translation, health informatics, time series, and speech (Ke et al., 2018; Lin et al., 2017; Lipton et al., 2015; Sutskever et al., 2014; Bahdanau et al., 2014). However, current RNNs fall short of the aforementioned interpretability for multi-variable data due to their opaque internal states.
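To fix notation for the forecasting task defined above, the following minimal sketch shows how the target and exogenous histories form the multi-variable model input. It is only an illustration of the setup; the variable names (T, n_exo, w) and window construction are our assumptions, not taken from the paper.

import numpy as np

# Hypothetical shapes; names are illustrative, not from the paper.
T, n_exo, w = 100, 2, 24                 # series length, exogenous count, window size
X = np.random.randn(T, n_exo)            # exogenous series
y = np.random.randn(T)                   # target series

# Each training input is a w-step window over all variables
# (exogenous series plus the autoregressive target); the label is
# the target value one step after the window.
series = np.concatenate([X, y[:, None]], axis=1)             # (T, n_exo + 1)
windows = np.stack([series[t:t + w] for t in range(T - w)])  # (T - w, w, n_exo + 1)
labels = y[w:]                                               # (T - w,)
print(windows.shape, labels.shape)       # (76, 24, 3) (76,)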
Specifically, when fed with the multi-variable observations of the target and exogenous variables, RNNs blindly blend the information of all variables into the memory cells and hidden states that are used for prediction. Hence, it is intractable to distinguish the contribution of individual variables to the prediction from the hidden states (Zhang et al., 2017).

Recently, attention-based neural networks have been proposed to enhance the ability of RNNs to selectively use long-term memory, as well as their interpretability (Vaswani et al., 2017; Qin et al., 2017; Choi et al., 2016; Vinyals et al., 2015; Chorowski et al., 2015; Bahdanau et al., 2014). Nevertheless, current
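To make the idea of variable-wise hidden states concrete, the sketch below keeps a separate hidden-state chunk per input variable via per-variable recurrent weight blocks, and then summarizes the chunks with a mixture-attention-style weighting over time and variables. This is a minimal NumPy illustration in the spirit of the tensorized states and mixture attention described in the abstract, not the authors' implementation; all names (n_vars, d, the scoring vector v) and the exact update form are assumptions.

import numpy as np

rng = np.random.default_rng(0)
n_vars, d, T = 3, 4, 10                     # variables, state size per variable, steps

# Per-variable recurrence: variable j's state chunk is updated only
# from its own previous chunk and its own scalar input, so each chunk
# remains attributable to one variable.
W = 0.1 * rng.standard_normal((n_vars, d, d))
U = 0.1 * rng.standard_normal((n_vars, d))
x = rng.standard_normal((T, n_vars))        # multi-variable input series

h = np.zeros((n_vars, d))
hs = []
for t in range(T):
    # einsum applies each variable's own recurrent block to its own chunk
    h = np.tanh(np.einsum('vij,vj->vi', W, h) + U * x[t][:, None])
    hs.append(h.copy())
hs = np.stack(hs)                           # (T, n_vars, d)

# Mixture-attention-style summary: score each variable-wise state,
# normalize over time per variable (temporal importance), then
# normalize the pooled scores over variables (variable importance).
v = rng.standard_normal((n_vars, d))
scores = np.einsum('tvi,vi->tv', hs, v)     # (T, n_vars)
temporal = np.exp(scores) / np.exp(scores).sum(axis=0)
variable = np.exp((temporal * scores).sum(axis=0))
variable /= variable.sum()
print(temporal.shape, variable)             # (10, 3) and 3 importance weights

Because the recurrent weights never mix chunks across variables, the temporal weights and variable weights computed above can be read as per-variable importance, which is exactly what an ordinary LSTM's fully mixed hidden state cannot offer.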