and assessed for their performance, thus distilling the
optimal small turbine output model as a component
of the spatially constrained remote island micro-grid
model. The research results presented in this manuscript
may also serve the purpose of modelling an
alternative power source on vessels, either as a single
energy source or as a component of the vessel’s
micro-grid.
2 METHODOLOGY
The approach taken utilised experimental data, and
followed the essential statistical research principles of
experimental model development [3, 4].
2.1 Overall concept
The observed experimental data were analysed
statistically, and a machine learning approach [3, 4]
was utilised in the development of three candidate
predictive models of small wind turbine output
based on two wind predictors: wind speed and wind
direction. Model performance assessment was
conducted for all three models using a common set of
descriptors, including the Predicted-vs-Observed (P-O)
diagram and the adjusted R-squared coefficient.
2.2 Data
Observations were taken at the experimental research
facility Sotavento in Santiago de Compostela, Galicia,
Spain (Figure 1), and provided in tabular format on
the internet [5]. The small wind turbine used was a
Bornay 1500 Inclin, with a 1.5 kW power rating and
fibre-glass and carbon-fibre blades. In this research,
we were concerned with the spring-time period, and
the experiment duration selected was 1 May 2019 – 31
May 2019. The experimental data set was split into
input and output data using variable selection, as
follows: (i) inputs (predictors): wind speed [m/s] and
wind direction [°]; (ii) output (target): cumulative
generated energy [kWh].
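The input/output split described above can be sketched as follows. This is a minimal illustration only: the column names (`timestamp`, `wind_speed_ms`, `wind_direction_deg`, `energy_kwh`), the helper `split_inputs_output`, and the sample rows are hypothetical and do not reproduce the actual layout or values of the data set in [5].

```python
import csv
import io

# Hypothetical excerpt; column names and values are illustrative, not taken from [5]
raw = """timestamp,wind_speed_ms,wind_direction_deg,energy_kwh
2019-05-01 00:00,4.2,210,0.00
2019-05-01 00:10,5.1,215,0.03
2019-06-01 00:00,6.0,180,9.99
"""

def split_inputs_output(text, start="2019-05-01", end="2019-05-31"):
    """Keep rows inside the study period and split them into predictors and target."""
    X, y = [], []
    for row in csv.DictReader(io.StringIO(text)):
        day = row["timestamp"][:10]
        if start <= day <= end:  # ISO dates compare correctly as strings
            X.append((float(row["wind_speed_ms"]), float(row["wind_direction_deg"])))
            y.append(float(row["energy_kwh"]))
    return X, y

X, y = split_inputs_output(raw)
```

Here `X` holds the (wind speed, wind direction) pairs and `y` the cumulative generated energy; the June row falls outside the selected period and is dropped.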
Figure 1. Location of a small wind turbine experimental site
2.3 Model development methodology
A machine learning-based approach was utilised in the
predictive model development procedure [3, 4]. Three
machine learning-based candidate models were
developed: (i) decision tree, (ii) random forest, and
(iii) artificial neural network with a single hidden
layer. The candidate modelling approaches were
selected based on the statistical properties of the
data.
The decision tree [3] is an optimisation-based model
development approach that returns a tree-like
structured model, comprising the root (upper),
decision (intermediate), and leaf nodes (model
decisions). The model develops in two essential steps:
(i) the feature vector space (X1, X2, …, Xp) is divided
into non-overlapping regions Ri, and (ii) every new
observation of the feature vector that falls into region
Ri is assigned the mean value of the previous (training)
observations in the same region Ri. The decision tree is
a simple and clear model, easily deployed both for
human assessment and as a computer algorithm. Its
shortcomings include potential over-fitting
(modelling noise rather than the signal) and poor
performance with continuous data.
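The two steps above can be illustrated with the smallest possible regression tree, a single split (a "stump"). The sketch below is an assumption-laden illustration, not the model developed in this research: it uses only one predictor (wind speed), the helper names `best_split` and `predict` are hypothetical, and the data are synthetic values invented for the example.

```python
from statistics import mean

def best_split(x, y):
    """Step (i): find the threshold that splits x into two regions with the
    smallest total squared error. Assumes at least two distinct x values.
    Returns (threshold, left_region_mean, right_region_mean)."""
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2  # candidate threshold midway between neighbours
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        ml, mr = mean(left), mean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    return best[1:]

def predict(stump, xi):
    """Step (ii): a new observation receives the training mean of its region."""
    t, ml, mr = stump
    return ml if xi <= t else mr

# Synthetic, illustrative values only (not the Sotavento observations)
wind_speed = [2.0, 3.0, 4.0, 8.0, 9.0, 10.0]
energy = [0.1, 0.2, 0.3, 1.0, 1.1, 1.2]
stump = best_split(wind_speed, energy)
```

A full decision tree repeats the same split search recursively inside each region, over all predictors, until a stopping rule is met.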
Random forest utilises the decision-tree concept to
form a forest of decision trees that jointly yield the
random forest decision. The random forest
development approach requires the original data set
to be split into a number of sub-sets of randomly
selected data. A decision tree model is then
developed on every sub-set. A decision, or estimate,
related to a new set of observations is made by all
the decision trees, and the results are then integrated
using either the democratic procedure (majority or
average of the votes of the separate decision trees) or
a weighted approach favouring influential decision
trees. The random forest model captures the variance
in data successfully and tackles over-fitting
efficiently, but is computationally intensive, and not
suitable for real-time predictions.
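The procedure above can be sketched as follows, under several stated assumptions: the random sub-sets are drawn with replacement (bootstrap sampling), each tree is a single-split stump on one predictor, and the trees are combined by the democratic procedure (plain averaging). The function names and the synthetic data are hypothetical, and the sketch omits the per-split random feature selection that common random forest implementations also apply.

```python
import random
from statistics import mean

def fit_stump(x, y):
    """One-split regression tree; falls back to a constant if no split exists."""
    xs = sorted(set(x))
    best = (float("inf"), None, mean(y), mean(y))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        ml, mr = mean(left), mean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if sse < best[0]:
            best = (sse, t, ml, mr)
    return best[1:]  # (threshold or None, left mean, right mean)

def predict_stump(stump, xi):
    t, ml, mr = stump
    return ml if t is None or xi <= t else mr

def fit_forest(x, y, n_trees=25, seed=0):
    """Develop one tree per randomly selected sub-set of the training data."""
    rng = random.Random(seed)
    n = len(x)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return forest

def predict_forest(forest, xi):
    """Democratic procedure: every tree votes, and the votes are averaged."""
    return mean(predict_stump(s, xi) for s in forest)

# Synthetic, illustrative values only (not the Sotavento observations)
wind_speed = [2.0, 3.0, 4.0, 8.0, 9.0, 10.0]
energy = [0.1, 0.2, 0.3, 1.0, 1.1, 1.2]
forest = fit_forest(wind_speed, energy)
low = predict_forest(forest, 3.0)
high = predict_forest(forest, 9.0)
```

A weighted approach would replace the plain average in `predict_forest` with a weighted mean, with larger weights for the more influential trees.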
An artificial neural network mimics a human or
animal one, with artificial neurons being activated
by the appropriate input level and passing their
outputs to the other neurons they are connected with.
The artificial neural network (ANN) consists of
neuron layers that receive the inputs (input layer),
those that reside internally within the network
(hidden layers), and the one that provides
decision/estimation results (output layer). While
theoretically an ANN may consist of many internal
layers, a one- or two-hidden-layer architecture may
produce optimal results. ANN is suitable for
modelling complex systems, where prediction of
behaviour is required without explaining the system.
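The layered structure described above can be sketched as a forward pass through a network with a single hidden layer. The example is illustrative only: the tanh activation, the two hidden neurons, the function name `forward`, and all weight values are assumptions made for the sketch, and the weights are untrained rather than fitted to any data.

```python
import math

def forward(inputs, w_hidden, b_hidden, w_out, b_out):
    """Single-hidden-layer network: tanh hidden neurons, linear output neuron."""
    # Hidden layer: each neuron weighs the inputs, adds a bias, and activates
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + b)
        for weights, b in zip(w_hidden, b_hidden)
    ]
    # Output layer: weighted sum of the hidden-neuron outputs plus a bias
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out

# Hypothetical, untrained weights chosen only to make the example concrete
w_hidden = [[0.5, 0.0], [0.0, 0.01]]  # one row of input weights per hidden neuron
b_hidden = [0.0, 0.0]
w_out = [1.0, 0.5]
b_out = 0.1

# Inputs are (wind speed [m/s], wind direction [°])
estimate = forward([2.0, 0.0], w_hidden, b_hidden, w_out, b_out)
```

Training such a network means adjusting the weights and biases so that the estimates match the observed outputs; that step is not shown here.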
Model performance assessment was conducted
using two essential model performance indicators: (i)
the Predicted-Observed diagram, and (ii) the adjusted
R-squared coefficient. The P-O diagram is a simple
graphical indicator of a model’s performance, designed
as a plot of observed-predicted pairs.
The adjusted R-squared indicator is defined as
follows. Let denote observations as