Journal of Industrial Engineering and Management Science

Vol: 2018    Issue: 1

Published In:   January 2019

Modeling Operational Risk Using Linear Algebra and Monte Carlo Simulation: Enabling Innovative Service Concepts with Simulated Throughput Computations

Article No: 2    Page: 15-38    doi:    

Read other article:
1 2 3 4 5 6 7 8

Modeling Operational Risk Using Linear Algebra and Monte Carlo Simulation: Enabling Innovative Service Concepts with Simulated Throughput Computations*

Larissa Laumann1, Daniel Jaroszewski1, Benedikt Sturm1, Kathrin Rose1, Wolfgang Mergenthaler1 and Gunnar Markert2

1FCE Frankfurt Consulting Engineers GmbH, Bessie Coleman-Strasse 7, 60549 Frankfurt am Main, Germany

2Thyssenkrupp Industrial Solutions, Graf-Galen-Strasse 17, 59269 Beckum, Germany


Received 04 December 2018; Accepted 05 December 2018;
Publication 18 December 2018


Industrial maintenance as a service provided by the plant manufacturer is receiving increasing attention throughout the community of plant operators, notably in the cement and mining industry. The main reason is the fact that manufacturers have the deepest knowledge of their own plants. Furthermore, plant operators do not want to worry about maintenance issues and are rather willing to outsource this task to a service provider. The question then is, at what price the operator and the manufacturer are willing to close a service contract. The service provider must make sure that the price covers his expected cost including an eventual insurance fee against extreme damage and that the risk of outliers can be managed. The operator, in turn, must make sure that the price is covered by his income leaving an appropriate profit. Furthermore, the maintenance service provider can ask an insurance company for protection against very large damages. And finally, the investor, granting a loan to the operator, is interested in his own risk.


  • Reliability Engineering
  • Engineering Applications of Graph Theory
  • Monte Carlo Simulation
  • Matrix Inversion

1 The Model

In a favourable market situation, where every unit of production of the commodity under consideration can be sold, operators of large industrial plants, such as in the cement or the mining industry, crucially depend on high availability characteristics of their systems. Every unplanned system malfunction reduces revenue. The current paper describes an approach to estimate the output distribution of a multicommodity plant based on parallel, alternating renewal processes, representing component defects and subsequent repairs. Estimation makes use of the Monte-Carlo Simulation technique. For each time interval in each of a set of simulations, the plant must be modeled by an appropriate technique.

Reliability engineering, in the past, has largely concentrated on systems with stochastically independent binary components, see [4], for instance. Systems with a continuum of states call for a different approach. Kuei-Lin [1] deals with the reliability of manufacturing networks. So does this paper by modeling the manufacturing network as a graph with nodes, which linearly transform real-valued input vectors into real-valued output vectors. The transformation or yield matrix implements this transformation and is subject to failure and repair. This way, computing the output vector calls for an efficient matrix inversion algorithm, as this operation must be performed for each time interval and each simulation. Such algorithms are vital to the purpose of this paper, as also discussed in [2], for instance or [6]. Juan-Li [3] studies stochastic network problems, when malfunctions arise from physical failures and emphasizes insurance and reinsurance aspects. Stochastic networks can also be viewed from an entirely different angle. Thus, continuous and constant in-flow into the system may be replaced by discrete arrivals of customers to be routed through the individual stations in the graph and to join queues in front of the stations. This type of problem is investigated, for example by [5].

The maintenance task in plant operation is increasingly being outsourced to professional maintenance service providers, such as the plant manufacturer himself. The reason is the deep knowledge, the plant manufacturer has on his own systems. Providing maintenance, however, against a fixed fee, usually calls for protection against rare, but very large damages. The idea presented here is to find insurance coverage beyond a certain deductible. Finally, the investor who grants a loan to the operator in order to build the plant, has a vital interest in the probability that the operator will fail to pay back his loan. Computing the insurance premium and every kind of risk incurring to the operator, the manufacturer and the investor, requires knowledge of the output distribution.

This paper is organized as follows: In the current chapter we outline the model and deal with the master equations relating output to input in a network of nodes. We illustrate this with an example and describe how the stochastic processes consisting of failure and repair act on the yield matrices. Next we look at output and loss and define both in terms of the stochastic properties of the yield matrix. In Chapter 2 the necessity to invert certain matrices is the main topic. Various ways to accelerate or avoid matrix inversion are investigated. For instance, if the graph of the system under consideration contains no cycles, the precomputation of certain graph related coefficients will be possible. Also, if the yield matrix has a certain modular structure, matrix inversion collapses into a series of smaller inversions. This, together with a special disturbance model avoids repeated matrix inversion altogether by making use of the eigenvalues and the eigenvectors of the undisturbed matrix. It is shown how Monte-Carlo simulation, in connection with matrix inversion, can be used to estimate the output distribution and the damage distribution. Chapter 3 uses these distributions to calculate certain risks on the various players in the maintenance cycle. Chapter 4, finally illustrates the computational process by means of a small example.

1.1 Definitions and Assumptions

Assume the plant under consideration can be modeled as a graph with n nodes and an appropriate number of edges. Assume node i ∈{1,…,n} has M(i) inputs and K(i) outputs. Let


be the vector of input and output flows of node i, respectively and let Ai be a matrix mapping IRM onto IRK such that


Below we will refer to the individal inputs and outputs of a node also as the input and output ports. Concatenating vectors xi und yi into one single vector x and y each and defining a matrix


whereby Ψ is a matrix valued function of the matrix valued arguments A1,…,An one can write


We will call A the yield matrix in this paper. The inputs to each of the nodes are composed of certain outputs of predecessor nodes and eventual external inputs, whereby cycles may occur. x, therefore can be written as


for some appropriate connection matrix B and a vector α, representing the net inflow to the system. Equation (2) makes sure that inputs to the source nodes can be represented by certain outputs of other nodes and components of the vector α.

It is assumed, without loss of generality, that a unit of production costs a unit price of 1$.

1.2 The Main Equation

Plugging Equation (2) into Equation (1) yields

=AB*y+Aα or, equivalenlyy=(IAB)1Aα(3)

provided matrix I-AB is nonsingular, which will henceforth be assumed and can be tested as needed.

1.3 Example

Figure 1 shows the flow of material in a plant having all the graph related properties as described above. Defining the vectors


then, with the matrices


we see that Equation (3) is satisfied.


Figure 1 Example layout.

1.4 The Random Process

Let us now follow the system over a time period [0,T] ⊆ ℝ. Throughout this time period we will see a sequence of events. The k-th event tk ∈ [0,T] occurs in node i ∈{1,…,n}, say. If the event is a failure, then the affected node will experience a modified yield matrix Atki, which differs by a small matrix Δtki from Ai such that Atki = Ai + Δtki. The up times of each node are supposed to be – without loss of generality – exponentially distributed with an event rate κi. The repair times are supposed – again without loss of generality – to be deterministic with a constant repair time τi. Starting with an entirely intact system, the first event must necessarily be a failure of a node. Each subsequent event produces a new matrix Atk whereby submatrix Atki is equal to submatrix Ai, if the event at time tk was a repair and differs from Ai, if the event was a failure. Each event is represented by

  • the time
  • the node affected
  • and the mode (Up/Down)

and creates its own successor until time T has been exhausted.

1.5 Output and Loss Distributions

The vectorial output y in the system under consideration becomes time dependent through the randomly changing yield matrices At of the nodes, i.e.


The entire output over the whole period can be expressed as a sum of output rates over the list of events multiplied by the time expired since the last event, i.e.


Below we single out one particular component of y – henceforth called ys denoting a sink – and we are interested in its distribution and its density


Let ymaxs be the maximum achievable output of the plant, when the system is intact, i.e. when


with the symbol s on the right hand side again denoting the sink and define the defect as the missing yield


As a consequence the distribution function of Z is given by


assuming F is continuous. The distribution density of the defect is given by


The main objective in this paper is the numerical estimation of G(z) and g(z) via a Monte-Carlo simulation of F(η) and f(η).

2 The Numerical Challenge

The purpose of this paper is to use Equations (4) and (5) in a loop over many simulations and time steps within each simulation to generate possible lifetime trajectories. Below we will simulate the global output y over the time period [0,T].

In the pursuit of finding the damage distribution we are seeking ways to accelerate the algorithm which consists in many consecutive matrix inversions. We process the sequence of events in the following manner: In the beginning ys is given by Equation (7). As time proceeds yt is given by Equation (4).

2.1 A Matrix Series Expansion



for a certain matrix Δt, one obtains from (4) through a series expansion of the matrix (I - At-1B)-1ΔtB the following expression:


assuming the norm of (I - At-1B)-1ΔtB is small. This creates hope that – by tolerating a small error of order o(∥(I - At-1B)-1ΔtB∥) – one could do the matrix inversion only in every other step and thereby save on computational work.

2.2 An Iterative Approximation

Another method to save on the effort of inversion arises by considering


2.3 An Iterative Solutions

Equation (12) suggests to look for an iterative solution for yt+1 in the following form:


The question now is, whether iteration (13) converges. Applying (13) for indices k and k + 1 therefore yields


If the operator Ut := (I - AtB)-1Δt+1B represents a contracting map, then iteration (13) converges.

Lemma 1 Let the matrix Ut have n different eigenvalues χ1,…,χn. If


then (13) is a contracting map.

Proof. With


Then, from (14) one obtains Δyt+1k+1 := Ut ∗ Δytk. Now express Δytk in terms of the eigenvectors vi,i ∈{1,…,n} of Ut, i.e. Δytk := ∑ i∈1,…,n wivi for some coefficients wi,i ∈{1,…,n}. Then the following holds


and therefore limk->∞∥Δytk∥ = 0 which proves the claim.

Equivalently, starting from Equation (3) one can set


Lemma 2 It is straightforward to show that


Also, if


then iteration (16) converges.

Proof. The proof of (17) follows by induction. (17) certainly holds for k = 0 using (16). Now, assume it holds for any k > 0. Then


and the lemma is proven.

2.4 A Special Case

Lemma 3 If




Proof. Straightforward.

Dropping the time index in the matrix elements and the input and output vectors for now one can show, that the following holds:

Lemma 4 Let, for each node i ∈ {1,…,n} and for each input port k ∈ {1,…,M(i)} V(i,k) and P(i,k) denote the predecessor attached to input port k of node i and the corresponding output port on node V(i,k), respectively. Also, let Atki = ξtki ∗ Ai for some random variable ξtki,0 ≤ ξtki ≤ 1. Finally set V(i,k1,k2) := V(V(i,k1),k2) and P(i,k1,k2) := P(V(i,k1),k2) and so forth. If the network contains no cycles, then each output of each node can be written as


Proof. By definition the following holds


Observing that

xki={ yP(i,k)V(i,k),if M(V(i,k))>0αL(i,k),else - for some function L(i,k) }

and plugging this result into Equation (21), one obtains


as long as M(V (i,k)) > 0. Continuing this process once shows


The process stops, when M(V(i,k1,…,kN)) = 0, in which case xNV(i,k1,…,kN) = α(L(i,k1,…,kN)) for some appropriate function α(L(i,k1,…,kN)).

Corollary 1 Formula (20) has an important consequence. By letting i represent a sink node, i.e. a node which has no successors and by continuing the recursion on the right hand side until V (i,k1,…,ks),P(i,k1,…,ks) point to an external input α(L(i,k1,…,kN)), yil can be represented as a linear function of the inputs, whereby the coefficients are composed of a product of matrix elements.

yli=kNNψ(l,i,k)α(L(l,i,k))ρ{1,,n},ρ defectiveξtρ(24)

where the summation index satisfies




Proof. Starting from Equation (20) one verifies that, upon using ks-1 := (k1,…,ks-1)


with the above mentioned condition on the summation index. Now, upon observing that ar,k,ti = ξtiar,ki,1 ≤ k ≤ M(i),1 ≤ k ≤ K(i), one proves the corollary.

Formula (24) allows for a precomputation of all the factors ψki and thereby avoids the necessity to invert the yield matrix for each epoch throughout the simulation. Instead, only the scalars ξtl must be simulated.

2.5 Network Modularization

Dropping the time index for the purpose of the follwing result, one can show

Lemma 5 If the graph representing the system can be partitioned into a set of L subgraphs such that subgraph i ∈{1,…,L}

  • receives a net external inflow represented by α(i) from subgraphs with lower indices
  • has its own yield matrix A(i)
  • has a set of connection matrices B(i,j)
  • has an output vector given by y(i)

and there are no cycles between subgraphs, while each subgraph may possess internal cyles, then the following holds:


Proof. With the above definitions the following equations hold:


Therefore, plugging the second equation into the first one yields


Concatenating the output vectors y(i),i ∈{1,…,L} into one large output vector y one obtains


This can also be written as


Equation (30) is equivalent to Equation (26), which proves the lemma.

This result may eventually be used to simplify matters and accelerate the simulations, as smaller matrices must be inverted in comparison to Equation (3). Also, it can be extended to more complex graphs with isolated subgraphs.

In order to save matrix inversion effort we can observe another interesting special case upon reinsertion of the time index t:

Corollary 2 If the conditions of lemma (5) hold for each subgraph, and in addition matrices A(i) for all i ∈ {1,…,M(i)} have only different eigenvalues and the following definitions are used


where λk(i) and vk(i),k ∈ {1,…,M(i)} are the eigenvalues and the eigenvectors of A(i)B(i,i) and rk,t(i),vk(i) is the representation of A(i)α˜t(i) in the basis spanned by the eigenvectors, then yt(i) can be expressed as


The following can be concluded from (26)


using At(i) = ξt(i)A(i). With the definitions of λk(i),vk(i),k ∈{1,…,n} one obtains


Inserting (34) into (33) one obtains


which proves the corollary.

2.6 Monte Carlo Simulation

In the sequel we will discuss several details of simulating the output ys in order to generate the distribution P{ys ≤ η}. Assume we are performing a number NSIM of simulations. According to (6) the probability P{ys ≤ η} can be approximated by

F^(η)=P^{ysη}=Number of simulations with ytsηNSIM(36)

From this distribution we derive an approximate distribution density


such that


Also, with the definitions given in (8) and (9) the following relationships can be used.


3 The Players and Their Risk Levels

Four players are involved in the current concept, i.e. the plant operator, the plant manufacturer – here identical to the maintenance provider – eventually an insurance company and an investor. It is assumed that all of the following agreements between the players prevail:

  • The plant operator and the investor negotiate a loan on the investment to build the plant. Repayment plus interest amounts to a sum RI per year.
  • The plant operator and the maintenance contractor agree that the contractor will take care of the entire maintenance at a yearly fee FM.
  • The maintenance contractor and the insurance company close an insurance contract such that, whenever the defect per year exceeds a deductible D, the limit D is covered by the maintenance contractor and the exceeding amount is carried by the insurance. Let the insurance fee be FI.
  • Let LC be the labor cost
  • Let MC be the pure mechanical maintenance cost
  • Let E denote other expenditures and, finally
  • Let P be the profit before tax

All cost numbers referenced above and below will be yearly cost.

3.1 The Operator’s Risk

If the operator and the contractor close a service contract, the operator bears no risk, as he receives a fixed income generated either by his sales or by a reimbursement from the contractor, should the output remain below ymaxs. All he has to do is to pay the fee FM, which includes the price for having no risk.

3.2 The Contractor’s Risk

The contractor’s risk – defined as the probability that the defect is greater than the fee paid to him by the operator – is given by


The risk is mitigated by the insurance contract. The average defect and the variance the contractor sees can be written as


3.3 The Insurer’s Risk

Usually the insurance fee the contractor must pay to the insurance company is equal to the expected damage to the insurer plus a certain safety factor SF times the standard deviation. SF typically takes values between 1.0 and 3.0. The expected cost and the variance the insurer sees is therefore


The insurance fee is normally set to


The insurer’s risk is the probability that the damage exceeds the insurance fee paid to him by the maintenance contractor, i.e.


3.4 The Investor’s Risk

3.4.1 No service contract

Figure 2 below shows, how the yearly revenue to the operator is split between the various types of expenditure, if the operator does not close a maintenance contract with the contractor.


Figure 2 No service contract.

No Market Risk: Let PCS1 be the probability that the yearly capital service amounting to RI cannot be fully served, under the assumptions that

  • The yearly revenue Y is fixed and no market risk exists, i.e. every unit of production can be sold
  • Labor cost is fixed
  • Payments must be made in the order LCMCRIE

In that case we have


Verbally this means that labor and maintenance cost do not leave any resources to pay the loan back to the investor. Figure 3 illustrates this situation. The integral is the shaded area in the limits between 0 und ymaxs - (LC + MC + RI). If this integral grows due to the fact that the density function assumes larger values even for smaller maintenance cost (red area), i.e. if the maintenance cost become “stochastically smaller”, then the probability to serve the capital cost is larger than for smaller values of the integral (blue area).


Figure 3 Probability for capital service with service contract.


  • Capital service probability grows with falling maintenance cost and
  • Investor’s risk falls with falling maintenance cost as well

With Market Risk: If, however, h(x) represents the probability density that x units of product can be sold only, then we have


3.4.2 With service contract

In contrast, Figure 4 below shows the distribution of the yearly revenue between the various types of expenditure, if the operator does close a maintenance contract with the contractor.


Figure 4 With service contract.

Assume, that in this case payments must be made in the order LCFMFIRIE

No Market Risk: In this case we have

PCS3={ 1if Zymaxs(LC+FM+FI+RI)Ys(LC+FM+FI+RI)0else

With Market Risk:


3.4.3 Conclusion

Corollary 3 If there is no market risk, then the following holds:



Figure 5 Default probability without market risk.

Therefore, if there is no market risk, and if MCFM + FI (Case 1 in figure 5 below, then a service contract positively influences the investor’s risk, otherweise not.

On the other hand, if there is a market risk, and if

LC+MC+RIymaxsh(x)(F(ymaxs(x(LC+MC+RI)))F(ymaxsx))dx  0LC+FM+FI+RIh(x)dx(49)

then PCS4PCS2 and – again – a service contract has a positive impact.

Proof. Straightforward.

4 Numerical Example

Figure 6 represents the simplified layout of an industrial cement plant. Each of the nodes in this layout is given by the following four items.

  • An error rate κi
  • A downtime τi
  • A yield matrix A(i) and a
  • A reduction factor ξi


Figure 6 Layout of an industrial cement plant.

Table 1 below summarizes the reliability related data on κii and ξi as averages over the different failures. There were two failures that lead to a complete breakdown of the cement plant.

Table 1 Average reliability data

average error rate κ 1/5551,7h
average downtime τ 66,85h
average reduction factor ξ 0,29

Figure 7 shows the defect distribution as estimated by a Monte-Carlo simulation.


Figure 7 Defect histogram for 1000 simulation steps.


[1] Kuei-Lin, Y. and Chang, P. (2012). Evaluate the system reliability for a manufacturing network with reworking actions. Reliability Engineering and System Safety, 106, 127–137.

[2] Rashmi, P. and Dinesh Kumar, V. (2014). Design of newton iterative method for matrix inversion. International Journal of Research in Advent Technology, 2, 124–126.

[3] Juan, L. (2015). Stochastic Networks: Modeling, Simulation Design and Risk Control, Dissertation, Columbia University, New York.

[4] Meyna, A. and Pauli, B. (2010). Zuverlaessigkeitstechnik, Hanser, Muenchen.

[5] Dai, J. and Vande Vate, J. (1996). Global stability of two station queueing networks, in: Glasserman, P., Sigman, K. and Yao, D. (Eds.) Stochastic Networks, Lecture Notes in Statistics 117, Springer, New York.

[6] van der Veen, A. and Dewilde, P. (1993). Large Matrix Inversion using State Space Techniques, IEEE Workshop on VLSI Signal Processing, Veldhoven, The Netherlands.



Larissa Laumann: M.Sc. (Mathematics) graduated with a Master’s degree from Johannes Gutenberg-Universität Mainz. She holds a position as a software developer and a data analyst for FCE Frankfurt Consulting Engineers GmbH. Her focus is on reliability engineering, predictive maintenance, sequencing and scheduling as well as the simulation of plant performance. Another current topic of work is the development, implementation and automation within the .NET environment.


Daniel Jaroszewski (male): Dipl.-Math. (Mathematics) Johann Wolfgang Goethe University Frankfurt. He works as a consultant and software developer for Frankfurt Consulting Engineers GmbH. The consultant expertise is focused on machine learning, probability theory, combinatorial optimization and its applicability in the industry. Moreover he is Software Project Manager for a tool in the area of Predictive Maintenance. The main tasks are the development, implementation and automation of machine learning/prognosis algorithms, as well as the deployment of a monitoring platform in the .NET environment.


Benedikt Sturm received his B.Sc. and M.Sc. degrees in Mathematics from Goethe-University in Frankfurt, Germany. He works for Frankfurt Consulting Engineers GmbH since 2015. His main topics are predictive maintenance and combinatorial optimization such as routing problems. For this, he develops mathematical algorithms within a front-end interface for different costumers.


Kathrin Rose is a student at Johann Wolfgang Goethe University in Frankfurt on her way to her master’s degree in mathematics. Kathrin started to work as an intern and a research assistant at FCE Frankfurt Consulting Engineers GmbH. Her specialties are graph theory, numerical and computational mathematics.


Wolfgang Mergenthaler is managing director and owner of FCE Frankfurt Consulting Engineers GmbH in Frankfurt, Germany. He received his diploma in physics from Technische Universität München (TUM) in 1973 and his doctorate in applied mathematics in 1978, also from TUM. In his current position Dr. Mergenthaler leads a team of mathematicians and computer scientists working in the fields of Pattern Recognition, Mathematical Statistics, Probability Calculus and Combinatorial Optimization with Industrial Applications. Plant simulation is one of his strong points of interest, both with regard to operational as well as planning issues. Other important working areas are the domains of sequencing and scheduling in production planning and the generation of response surfaces from process data.


Gunnar Markert graduated from the Technical University of Kaiserslautern with a diploma in industrial engineering (Diplom-Wirtschaftsingenieur). He continued his academic career at the University of Basel in the economics department with a doctoral dissertation in an industrial marketing subject. After assuming functions in business consulting and corporate strategy, since 2016 holds a position as a senior manager for business development and sales with thyssenkrupp Industrial Solutions in the cement and mining sector. Gunnar specializes on the estimation of industrial risks due to plant malfunctions and on innovative ways to mitigate financial damage to the plant owner, in order to improve the overall risk profile and facilitate project financing. He is also interested in the computation of defect distributions and analyzes their impact on the break even point of the plant operator and – thereby – on an investor’s risk to finance a project.

FCE Frankfurt Consulting Engineers GmbH, ThyssenKrupp Industrial Solutions.



1 The Model

1.1 Definitions and Assumptions

1.2 The Main Equation

1.3 Example


1.4 The Random Process

1.5 Output and Loss Distributions

2 The Numerical Challenge

2.1 A Matrix Series Expansion

2.2 An Iterative Approximation

2.3 An Iterative Solutions

2.4 A Special Case

2.5 Network Modularization

2.6 Monte Carlo Simulation

3 The Players and Their Risk Levels

3.1 The Operator’s Risk

3.2 The Contractor’s Risk

3.3 The Insurer’s Risk

3.4 The Investor’s Risk





4 Numerical Example