\pdfminorversion=6
\documentclass[10pt]{report}
\usepackage[a4paper,landscape,left=16cm,right=1cm,top=1cm,bottom=2cm]{geometry}
\usepackage{lmodern}
\usepackage{microtype}
\frenchspacing
\usepackage{fmtcount}
\usepackage{titlesec}
\titleclass{\part}{top}
\titleformat{\part}[hang]{\sffamily\Large\bfseries}{Part~\numberstring{part}.}{.5em}{}[{\titlerule[1pt]}]
\titlespacing{\part}{0pt}{*0}{*4}
\titleclass{\chapter}{straight}
\titleformat{\chapter}[display]{\sffamily\large\bfseries}{Chapter~\thechapter.}{1ex}{\mdseries\uppercase}
\titlespacing{\chapter}{0pt}{*2}{*2}
\titleformat{\section}[hang]{\sffamily\bfseries}{\thesection.}{.5em}{\uppercase}
\titlespacing{\section}{0pt}{*3}{*3}
\titleformat{\subsection}[runin]{\sffamily\bfseries}{\thesubsection}{.5em}{}[.]
\titlespacing{\subsection}{\parindent}{*1}{*2}
\usepackage{titletoc} % TODO: finish styling of ToC; the commented part below does not work as needed
% \titlecontents{part}[0pc]{}
% {Part~\contentslabel{3em}.}{\hspace*{-3em}}
% {\titlerule*[1pc]{.}\contentspage}
% \titlecontents{chapter}[0pc]{\bfseries}
% {Chapter~\contentslabel{1em}.}{\hspace*{-1em}}
% {\titlerule*[1pc]{.}\contentspage}
% \titlecontents{section}[0pc]{}
% {\contentslabel{2em}.}{\hspace*{-2em}}
% {\titlerule*[1pc]{.}\contentspage}
\titlecontents*{subsection}[4pc]{}
{\thecontentslabel. }{}{~(\thecontentspage)}[. ][]
\usepackage[english]{babel}
\usepackage{booktabs}
\usepackage[inline]{enumitem}
\usepackage[bottom]{footmisc} % avoid floats below footnotes
\usepackage{float,caption}%%%figures side to side text
\usepackage{csquotes}
\usepackage[style=numeric,sorting=none]{biblatex}
\renewcommand*{\bibfont}{\normalfont\small}
\addbibresource{Kuznetsov-1991-en.bib}
\nocite{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25}
\usepackage{mathtools,amssymb}
\numberwithin{figure}{chapter}
\DeclareMathOperator{\atan}{atan}
\usepackage{amsthm,thmtools}
\usepackage{accents}
\usepackage{siunitx}
\usepackage{graphicx}
\usepackage{eso-pic}
\AddToShipoutPictureBG{
\AtPageLowerLeft{
\includegraphics[page=\value{page},width=15cm]{Kuznetsov-1991-scan.pdf}
}
}
\usepackage{tikz}
\usetikzlibrary{decorations.text,patterns}
\usetikzlibrary{positioning,arrows.meta,shapes,decorations.pathreplacing}
\tikzstyle{underbrace style}=[decorate,decoration={brace,raise=2mm,amplitude=3pt,mirror}]
\tikzstyle{overbrace style}=[decorate,decoration={brace,raise=-2mm,amplitude=3pt,mirror}]
\usepackage{hyperref}
\newcommand\gauss[2]{1/(#2*sqrt(2*pi))*exp(-((x-#1)^2)/(2*#2^2))}
\declaretheoremstyle[
bodyfont=\small,
headindent=\parindent,
headpunct=,
postheadspace=0em,
spaceabove=1ex,
spacebelow=1ex
]{smallpar} % TODO: find semantic name; identify instances in text
\declaretheorem[style=smallpar,title=,numbered=no]{smallpar}
\declaretheoremstyle[
spaceabove=1ex,
headindent=\parindent,
headfont=\small\scshape,
notefont=\small\scshape,
notebraces={}{},
headformat=\NAME~\NUMBER. \NOTE, % TODO: special-case without-note, as now there is a spurious period
headpunct=.,
bodyfont=\small,
spacebelow=1ex
]{example}
\declaretheorem[style=example,within=chapter]{example}
\declaretheoremstyle[
headfont=\scshape,
bodyfont=\itshape,
headindent=\parindent,
headpunct=.,
spaceabove=1ex,
spacebelow=1ex
]{theorem}
\declaretheorem[style=theorem,within=chapter]{theorem}
\declaretheorem[style=theorem,within=chapter]{corollary} % TODO: corollaries seem to be numbered per theorem… so this is not correct
\declaretheorem[style=theorem,within=chapter,numbered=no,name=Corollary]{corollary*}
% TODO: proofs in \small font
\newenvironment{addendum}{
\small
\subsection{Addendum}
\begin{enumerate}
}{
\end{enumerate}
\normalsize
}
\newenvironment{conclusions}[1]{
\section{Conclusions}\label{#1}
\small
}{
\normalsize
}
% TODO: define \set, \cset, \abs, \cl?,…
\newcommand{\reals}{\mathbb{R}} % real numbers
\newcommand{\pspX}{\mathcal{X}} % possibility space X
\newcommand{\pspY}{\mathcal{Y}} % possibility space Y
\newcommand{\events}{\mathcal{A}} % set of events
\newcommand{\mifs}{\mathcal{LA}} % measurable and integrable functions
\newcommand{\slhull}{\mathcal{L}^+} % semi-linear hull
\newcommand{\lhull}{\mathcal{L}} % linear hull/span
\newcommand{\pchars}{\mathcal{G}} % primary characteristics
\newcommand{\bchars}{\mathcal{F}} % (upper-mean-)bounded characteristics
\newcommand{\traits}{\mathcal{Q}} % traits (different translation of characteristics?)
\newcommand{\probs}{\mathcal{J}}%%set of probability measures
% TODO: define \bchars variants with _^?
\newcommand{\pr}{P} % probability
\newcommand{\lpr}{\underline{\pr}} % lower probability
\newcommand{\upr}{\overline{\pr}} % upper probability
\newcommand{\mn}{M} % mean (expectation/prevision)
\newcommand{\lmn}{\underline{\mn}} % lower mean
\newcommand{\umn}{\overline{\mn}} % upper mean
\newcommand{\plmn}{\undertilde{\mn}} % primary lower mean
\newcommand{\pumn}{\widetilde{\mn}} % primary upper mean
\newcommand{\plpr}{\undertilde{\pr}} % primary lower probability
\newcommand{\pupr}{\widetilde{\pr}} % primary upper probability
\newcommand{\df}{F}%distribution function
\newcommand{\pldf}{\undertilde{\df}} % primary lower distribution function
\newcommand{\udf}{\overline{\df}} % upper distribution function
\newcommand{\ldf}{\underline{\df}} % lower distribution function
\newcommand{\pudf}{\widetilde{\df}} % primary upper distribution function
\newcommand{\IM}{\mathcal{M}} % Interval Model
\newcommand{\IMfunc}[2]{\langle#1#2\rangle} % Interval Model expressed as a function of its defining objects
\newcommand{\IPT}[1]{\langle#1\rangle} % Interval Probability distribution
\newcommand{\vacuous}{\mathcal{I}} % vacuous Interval Model
\newcommand{\hyperplane}{\mathbf{P}}
\newcommand{\rings}{\mathcal{K}}
\newcommand{\distr}{\mathcal{P}}
\begin{document}
\noindent%
\textbf{\sffamily\itshape\huge%
\uppercase{V\textperiodcentered P\textperiodcentered Kuznetsov}%
}
\vspace{5\baselineskip}
\noindent%
\textbf{\sffamily\Huge%
\uppercase{Interval\\[.7ex]Statistical\\[.7ex]Models}%
}%
\vfill
\Large
\noindent
***publisher(?) logo***
\noindent
Moscow
\noindent
“Radio and Communication”
\noindent
\textsf{1991}
\normalsize
\pagebreak % 2
\noindent
UDC 621.391
\vspace{3\baselineskip}
\small
\textbf{Kuznetsov V. P.} Interval statistical models — ***: Radio and Communication, 1991 — 352 p.: ill. \textbf{ISBN 5-256-00726-2}.
\vspace{1ex}
On the basis of the new axiomatics, the apparatus of fuzzy mathematical models of random phenomena is being developed.
These models cover multiple, interval, fuzzy, and in general any incomplete, and fragmentary statistical descriptions of the characteristics of the phenomenon, approaching probability distributions as the limit of the abundance of data.
The scope of the models ranges from unstable, unique phenomena to statistically resistant to repetitions.
Within these broad limits, the concepts of interval probability and mean are illuminated and interpreted, causal relationships are analyzed, random transformations, relations of dependence and independence, limit laws are investigated, random processes are described and so on.
With regard to new models, criteria are introduced and universal methods are developed for the synthesis of optimal decision rules for estimates, for distinguishing between hypotheses.
The devices that implement them are simple in structure and are able to work effectively in changing environmental conditions, the basis (reason) for which is the choice of reliable models.
The credibility of models is gained by involving a small number of initial and average probabilities in them, presented in interval form, reflecting the instability of real phenomena and the lack of initial data about it.
A joint synthesis of reliable models and decision rules is considered.
For scientists in the field of communications and management; *** can be useful to everyone who is interested in mathematical methods of describing random phenomena and decision-making problems under uncertainty.
Tabl.~1. Fig.~37. Bibliogr. 25 titles.
\vspace{3\baselineskip}
\noindent
\textsc{Reviewer}: prof., dr. techn. sciences PH. P. TARASENKO
\vspace{3\baselineskip}
\bfseries
\noindent
Edition of literature on radio engineering and telecommunications
\vfill
\noindent
K\lower1ex\hbox{\shortstack{2303020000-040\\046(01)-91}}-96-90
\noindent
ISBN 5-256-00726-2 \hfill \copyright\ Kuznetsov V. P., 1991
\mdseries
\normalsize
\clearpage % 3
\setcounter{secnumdepth}{0}
\hfill
\begin{minipage}{13em}\small\itshape
Dedicated\\
to the memory of my mother\\
Kuznetsova Ekaterina Ivanovna
\end{minipage}
\vspace{2ex}
\section{Introduction}
Probability theory is nothing else but a mathematical language for description of random phenomena.
A habit to this language and our skill to use it precludes considering if there are other more convenient forms and descriptions, outside of the boundaries of this extensively used language, which may render simple the situations which within a contemporary language can be hardly “pronounced”.
The development of a new mathematical language, which is broader and more inclusive than the language widely accepted at present, and the use of this new language is the main thrust of this book.
A symbolic language is a means of description as well as communication.
At the same time, it is an instrument which makes it possible to investigate, create, process and construct.
For random statistical methods, it is an instrument for the development of defining rules, algorithms and estimations.
The latter are realised in working applications and devices.
The criterion of viability and acceptability of a new symbolic apparatus is its reliability, fitness-for-purpose, and its ability to perform tasks which were impossible to perform before, to process data which were impossible to process before, simplify tasks which were complex before.
This was the purpose for the introduction of the interval models and the careful development of their methodology.Interval probabilistic and statistic categories make the foundation of this book, which offer a universal method of description both existing knowledge as well as the absence of knowledge, i.e. our ignorance.
These categories are supported by the axiomatic system.
Consequently, a new theory emerges which might appear unfamiliar at the first sight, but which encompasses a great multitude of phenomena of various types such as stable processes defined by probabilities, unstable processes, non-probabilistic phenomena with a small number of only partly researched laws and, finally, phenomena with entirely unknown properties.
It will be shown that the seed of the interval approach lies hidden within the depths of the existing probabilistic paradigm and, therefore, the goal is to grow the seed into a harvest (the first part of the book) and then to collect the harvest (the second part of the book).
\begin{example}
Let a model of a random variable with outcomes on a real (numerical) line be described with a probability density.
Integration over this line gives us probabilities \(\pr(A)\) for intervals \(A\in\pspX\) and their unions (sums) up to countable, which makes up a set \(\events\) of measurable events. % TODO: shouldn't that be \(A\subseteq\pspX\) to be in line with \pspX as possibility space?
\(\events\) cannot contain all events, even if we try to make them progressively smaller by partitioning into very small sets up to Borel’s and further to Lebesgue’s sets [1].
There will always be so-called unmeasurable events \(B\not\in\events\), whose probabilities will be interval \(\lpr(B)\), \(\upr(B)\), defined as internal and external measures by the following formulas:
\(\lpr(B) = \sup_{A:B\supset A\in\events}\pr(A)\),
\(\upr(A) = \inf_{A:B\subset A\in\events}\pr(A)\).
The same is true regarding the mathematical expectations (or the means) of random variables--- functions \(g(x)\), \(x\in\pspX\).
Those will be precise \(\mn{g}\) for the class \(\mifs\) of measurable and integrable functions.
For a large number of other, non-measurable functions \(f\), the expectations will become interval described by the formulas
\(\lmn{f} = \sup_{g:f\geq g\in\mifs}\mn{g}\),
\(\umn{f} = \inf_{g:f\leq g\in\mifs}\mn{g}\)
***
\end{example}
This example allows us to uncover the following structure of the existing probabilistic framework.
Over the nucleus \(\events\), which is a set of events in the sample space \(\pspX\), in the beginning, precise probabilities will be defined, forming a probability distribution.
An effort is made to fill \(\events\) with as many events as possible, so that in the continuation of probabilities, which is achieved by integration over a probability distribution, precise mathematical expectations (means) \(\mn{g}\) could then be obtained for all observable random phenomena.
Those expectations are defined as measurable functions \(g(x)\), \(x\in\pspX\) over outcomes of a random event (functions \(g(x)\) are usually interval-continuous with possible jumps of the first type).
However, despite the efforts, it is impossible to make the probabilities and their means precise for absolutely all events and functions \(f(x)\) (except the processes with a finite and countable number of outcomes), because there will always remain unmeasurable events and functions.
Their number is large and because of them the interval probabilities and interval means \(\lmn{f}\) and \(\umn{f}\) arise when probabilities are continued/extended (??).
This is the first point.
The second point is that intervals \(\lmn{f}\) and \(\umn{f}\) are defined formally over all \(\forall f\), whereas the precise values \(mn{g}\) are only a particular case of the interval means when \(\lmn{g}=\umn{g}\) and therefore they are only their part.
The question may arise why consider practically meaningless unmeasurable functions?
The answer is that the reduction of the nucleus \(\events\), forced by the physical impossibility to measure and even know a great number of probabilities, considerably expands a class of unmeasurable functions which makes the means imprecise.
In addition to this, if we assume that the nucleus \(\events\) might consist only of a small number of events and also that the primary probabilities over \(\events\) are imprecise, i.e. interval, then everywhere probabilities as well as their means will become interval.
Let’s illustrate the interval means for the family of probability distributions with the following example.
\begin{example}
Let’s assume there is a family of parametric probability distributions \(\pr_\theta\), \(\theta\in\Theta\).
Then the means of all measurable functions will be interval, defined as the lower and upper boundaries
\[
\lmn{g} = \inf_{\theta\in\Theta}M_\theta g,
\quad
\umn{g} = \sup_{\theta\in\Theta}M_\theta g.
\]
Substitution of \(g(x)\) with set indicator functions of the event \(A\) (which equals \(1\), if \(x\) belongs to \(A\) and \(0\) if \(x\) does not belong to \(A\)) will lead to the interval probabilities \(\lpr(A)\), \(\upr(A)\) as a component part of the mean.
\end{example}
\pagebreak % 5
In relation to the second example, the second reasonable question arises: isn’t it true that different families of probability distributions are the universal form reflecting any imprecise knowledge about a phenomenon?
It turns out that the interval statistical approach offers a more convenient framework for this purpose, which includes as particular case the probability distributions, their families with all the richness of possibilities and much, much more.
An interval model over the sample space \(\pspX\) is defined formally as a set of the interval means \(\lmn{f}\), \(\umn{f}\), \(\forall f\), linked by an axiomatic system (or set of axioms?).
Functions \(f(x)\), \(x\in\pspX\) are called characteristics.
Any model can be determined by a standard process (Figure~\ref{fig:B.1}).
The nucleus of the model will be formed by a set \(\pchars\) of characteristics \(g\in\pchars\), called primary characteristics, and by setting the boundaries of their means.
The probabilities are considered as a constituent part of the means.
From \(\pchars\), the means are continued/extended to \(\forall f\) and hence they form a statistical model.
\begin{figure}[b]
\centering
\begin{tikzpicture}
\node[draw,label=above:Core model,align=center] (left) {Set \(\pchars\) of functions \\primary averages};
\node[right=10em of left,draw,label=above:Interval model,align=center] (right) {All functions \(\forall f\)\\ interval averages};
\draw[double distance=1ex,-Implies] (left) -- (right)
node[pos=.5,above=.3em] {Continuation}
node[pos=.5,below=.3em,align=center] {Duality\\formula};
\end{tikzpicture}
\caption{Interval models structure.}
\label{fig:B.1}
\end{figure}
There are four key points that distinguish the new theory:\\
%\begin{enumerate}
%\item
\indent 1. The privilege of probabilities to determine a statistical model is taken away.
Instead the probabilities are equated in their importance with a more inclusive concept of the statistical mean of a numerical characteristic.\footnote{An attempt of this kind was made in [2], but was reduced only to a different arrangement of accents in the previous probabilistic language.}
This implies that we do not require a probability distribution as a necessary part of a model and neither do we need algebra of events as a necessary attribute of the nucleus.
This liberates the nucleus \(\pchars\) and the model itself.\\
%\item
\indent 2. Why should necessarily probabilities and their means be precise?
It is always an ideal.
However, a realistic approach is to view them as interval and then it gives us instantaneous freedom.
Indeed, an interval \([0,1]\) taken as a probability of some events means a total absence of knowledge of this probability.
Then it is not required, and this is magnificent, to even think about the existence of a precise probability with an absolute statistical stability.
Let’s allow instability, characteristic to many applications, take place!
If instability is not total, but partial, then it brings us to an interval probability inside \([0,1]\).
Finally, if the lower and the upper boundaries of the intervals coincide, we obtain a precise probability. All of this can be applied to the means as well; only their ranges will be intervals along the entire real line.
\pagebreak
%\item
\indent 3. Another point is the extraordinary flexibility of \(\pchars\) regarding its form and the number of elements as well as a choice of different options to define the means as a precise value, an interval value or even one boundary only.
This ”unfreezes” the structure of the model, allowing to put into the model only the existing information about a phenomenon---on one hand, universally and, on the other hand, extremely parsimoniously (by manipulating the content of \(\pchars\)), with adjustments taking into account the precision of this information and with simplicity as the goal.\\
% \item
\indent 4. The means from \(\pchars\) are extended to all the traits \(\forall f\) by means of the algorithmic formulas of duality. If we visualise a model as a body then the duality formulas move its representation as a family of points (probability distributions, requiring cumbersome formulas for integration), utilising a dualistic approach for solving statistical problems.
%%\end{enumerate}
The universality of the interval statistical models is ensured by the possibility to encompass any amount of knowledge about a phenomenon. For example, in case of a process: if there are no data about the process, then the nucleus is empty and the model is ”raw”.
Then if the mean of the process becomes known, it will correspond to a particular model trait, which will begin to build the borders around its body.
If additionally we are given the mean of intensity, it will supplement the nucleus with another defining trait and therefore will build one more border.
If knowledge about the probability of the mean exceeding the limits is also added, then the content of the nucleus becomes more complex, the model is further refined and therefore becomes more accurate. If in addition this knowledge is enriched with information about correlations, it leads to even further reduction and precision of the model and so on. The ultimate limit of our knowledge will be a single point, i.e. a statistical probability model with a number of borders equal to a number of elementary outcomes.
Therefore, a natural hierarchy is emerging from the simple to the complex, which is similar to the processes we observe in nature in the course of creation and evolution.
At the beginning there is nothing, no knowledge whatsoever and this corresponds to the simplest model. Any information, regardless of how vague it can be, creates some kind of a ready, more-or-less working model. As more information is accumulated, the model improves and becomes more complex. Finally, if precise information about probabilities of events becomes available, then we will have a probability model.
The links among these models are illustrated in Fig.~\ref{fig:B.2}, where arrows indicate a growing complexity of the models.
The black arrows correspond to the classic models, where applications have been going in the direction from probability distributions to their families of progressively more complex forms.
The white arrows lead to the interval models, the hierarchical system of which is along the continuum from zero knowledge to the infinity of knowledge ... towards the “classic” movement.
\begin{figure}[t]
\centering
\begin{tikzpicture}[scale=0.85]
\draw (0,0) -- (0,7.5);
\draw (-1.5,7) node(nonverbal1) {\tiny \begin{tabular}{c}Non-verbal\\ approaches \end{tabular}};
\draw (-0.3,3) node[rectangle,draw,rounded corners,opacity=1,fill=white,inner sep=0pt] (families) {\tiny \begin{tabular}{c} Families \\ of finite\\ additive\\ probability\\distributions\end{tabular}};
\draw (0,5.5) node[rectangle,draw,rounded corners,fill=white,inner sep=0pt](robust) {\tiny \begin{tabular}{c}Robust\\ approach \end{tabular}};
\draw (-3.3,3) node[rectangle,draw,rounded corners,fill=white,inner sep=0pt](divisions) {\tiny \begin{tabular}{c} Divisions\\ given by\\ primary\\ means \end{tabular}};
\draw (-5.9,3) node[rectangle,draw,rounded corners,fill=white,inner sep=0pt](ignorance) {\tiny \begin{tabular}{c} Complete \\ ignorance \end{tabular}};
\draw (-5.6,5.2) node[rectangle,draw,rounded corners,fill=white,inner sep=0pt](fuzzy) {\tiny \begin{tabular}{c} Fuzzy\\ set\\ theory \end{tabular}};
\draw (-4,4.8) node[rectangle,draw,rounded corners,fill=white,inner sep=0pt](moment) {\tiny \begin{tabular}{c} Moment \\ approach \end{tabular}};
\draw (-2,4.6) node[rectangle,draw,rounded corners,fill=white,inner sep=0pt](nonverbal) {\tiny \begin{tabular}{c} Non-verbal \\ probability\\
distributions \end{tabular}};
\draw (-1.5,1.1) node[rectangle,draw,rounded corners,fill=white,inner sep=0pt](subjective) {\tiny \begin{tabular}{c} Subjective\\ character \end{tabular}};
\draw (-4.7,1.1) node[rectangle,draw,rounded corners,fill=white,inner sep=0pt](interval) {\tiny \begin{tabular}{c} Interval\\ analysis \end{tabular}};
%%%
\draw (2.3,3) node[rectangle,draw,rounded corners,fill=white,inner sep=0pt] (families2) {\tiny \begin{tabular}{c} Families of\\ countably\\ additive\\ probability\\ distributions \end{tabular}};
\draw (1.5,7) node(classicModels) {\tiny \begin{tabular}{c} Classic \\ models \end{tabular}};
\draw (4.6,3) node[rectangle,draw,rounded corners,fill=white,inner sep=0pt] (countable) {\tiny \begin{tabular}{c} Countable\\ additive\\ probability \end{tabular}};
\draw (6.9,3) node[rectangle,draw,rounded corners,fill=white,inner sep=0pt] (uncount) {\tiny \begin{tabular}{c} Uncountable\\ additive\\ probability \end{tabular}};
\draw (2.3,4.8) node[rectangle,draw,rounded corners,fill=white,inner sep=0.pt] (nonp) {\tiny \begin{tabular}{c} Non-parametric\\ statstical\\ methods \end{tabular}};
\draw (2.3,1.1) node[rectangle,draw,rounded corners,fill=white,inner sep=0.pt] (finitely) {\tiny \begin{tabular}{c} Finitely\\ additive\\ probabilities \end{tabular}};
%%%%
\draw[->,line width=3pt,-{Triangle[width=8pt,length=8pt]},shorten <= -8pt] (classicModels) -- (0.05,7);
\draw[->,double,double distance=3pt,-{Triangle[open,width=8pt,length=8pt]},shorten <= -8pt] (nonverbal1) -- (-0.05,7);
\draw[->,double,double distance=3pt,-{Triangle[open,width=8pt,length=8pt]},shorten <= 0pt] (ignorance) -- (divisions);
\draw[->,double,dashed,double distance=3pt,-{Triangle[open,width=8pt,length=8pt]},shorten <= 0pt] (divisions.north west) -- (fuzzy.south);
\draw[->,double,double distance=3pt,-{Triangle[open,width=8pt,length=8pt]},shorten <= 0.1pt] ([xshift=-9]divisions) -- (moment);
\draw[->,double,double distance=3pt,-{Triangle[open,width=8pt,length=8pt]},shorten <= -1.0pt] ([xshift=2]divisions.south west) -- (interval);
\draw[->,double,double distance=3pt,-{Triangle[open,width=8pt,length=8pt]},shorten <= -1.1pt] ([xshift=-0.7]divisions.south east) -- (subjective);
\draw[->,double,double distance=3pt,-{Triangle[open,width=8pt,length=8pt]},shorten <= 0pt] (divisions) -- (families);
\draw[->,double,double distance=3pt,-{Triangle[open,width=8pt,length=8pt]},shorten <= -0.3pt] ([xshift=-2]divisions.north east) -- (nonverbal);
\draw[->,double,double distance=3pt,-{Triangle[open,width=8pt,length=8pt]},shorten <= 0pt] (families) -- ([xshift=13]robust.south west);
\draw[->,double,double distance=3pt,-{Triangle[open,width=8pt,length=8pt]},shorten <= 0pt] (families) -- (families2);
\draw[->,double,double distance=2pt,-{Triangle[open,width=6pt,length=4pt]},shorten <= 0pt] ([yshift=-4]families2.east) -- ([yshift=-4]countable.west);
\draw[->,double,double distance=3pt,-{Triangle[open,width=8pt,length=6pt]},shorten <= -0.8pt, shorten >= -1.0pt] ([xshift=-0.8]families.south east) -- (finitely.north west);
\draw[->,double,double distance=3pt,-{Triangle[open,width=6pt,length=6pt]},shorten <= -1.9pt, shorten >=0.2pt] ([xshift=0.2]finitely.north east) -- ([xshift=-7]countable.south);
\draw[->,double,double distance=2pt,-{Triangle[open,width=6pt,length=4pt]}] (countable) -- (uncount);
\draw[->,line width=3pt,-{Triangle[width=8pt,length=8pt]},shorten <= 0pt, shorten >=0pt] (families2) -- (nonp);
\draw[->,line width=3pt,-{Triangle[width=8pt,length=8pt]},shorten <= -1.5pt, shorten >=0pt] (families2.north west) -- (robust);
\draw[->,line width=2pt,-{Triangle[width=6pt,length=4pt]},shorten <= 0pt, shorten >=0pt] ([yshift=4]countable.west) -- ([yshift=4]families2.east);
\end{tikzpicture}
\caption{Model hierarchy.}
\label{fig:B.2}
\end{figure}
These two epistemologically opposite approaches tend to have different problems.
In the classic approach, the main problem is a priori ignorance and uncertainty, which compels to add complexity to the model through shifting from a probability to a family of probabilities with the purpose of alleviating the burden of the immense number of probabilities loaded into the modern/contemporary model. In our interval approach on the other hand, the model is ”loaded” gradually: the more knowledge we have, the more complex and accurate the model becomes.
The problem is then the parsimonious representation of knowledge, enclosing information into the nucleus, and throwing away anything inessential or secondary if simplifications are required.
Thus in contrast to the probability theory, which develops a “singular” (???) structure of the models and excludes the hierarchy of knowledge, we will be conceptualising the model by means of its external (???) traits or its cover.
Information is put into the borders, formed by the primary means, and depending on their number and content the models will grow more complex creating the hierarchy.
The hierarchy of interval models can be transferred to the derived from them optimal defining rules.
In case of a small number of given data, the rules will have bad qualitative characteristics, but on the positive side, they will be stable in response to external/endogenous (???) anomalies and instabilities.
As knowledge about a given phenomenon accumulates, the quality will improve and the defining rules, on average, will become more complex and more selective to different conditions because they will become attuned to one or more of these conditions.
In this situation, it appears that the classic probability models clearly have a qualitative advantage and therefore only the classic models should be used.
Sadly, however, it is only self-delusion, since information necessary for the development of these models is almost never available and also the choice of these models is quite arbitrary (this choice tends to gravitate towards a normal distribution due to its simplicity) and in reality is more declarative rather than fact-based.
Adequate models should not depend on anything but existing proved knowledge, always finite, and therefore represented in a confidence interval (fuzzy) form for reliability.
This is that part of the interval models which in Figure~\ref{fig:B.2} are situated on the left from a vertical line.
From them the rules will inherit reliability and simplicity.
Ordering the interval models according to the number and content of data allows us to conceive a portfolio of defining rules, in which the researcher, taking into account the available data, will be able to find the model and the corresponding optimal rule and then, depending on whether he/she is satisfied with the quality, to decide if existing data have to be clarified or additional data have to be collected (through experimentation, more detailed analysis, etc.) thus developing the initial model into a more complex one.
However, at present considerations about such a portfolio are restricted to the results of this monograph only. Further development would require efforts of all researchers who would be interested in this approach.
The groundwork is done and the theoretical methods allow the use of computers due to their algorithmic nature.
The history of this theory is as follows. The author gradually developed dissatisfaction with the existing paradigm after first attempting to use invariant and non-parametric methods for research tasks in the field of engineering research
[3-10] and then robust methods [11-13] ([12] is an overview, including some of the author’s works). This dissatisfaction led to search for something new. The method of moments held some promise, yet it didn’t develop to the fullest of its capacity. The reason for that is its closeness to the non-classical interval statistical methods.
The theory of subjective probabilities didn’t become widespread also because it was at odds with the classical methods.
The theory of fuzzy sets by Zadeh moved entirely away from probabilities.
The interval analysis has been developing somewhere on the side [16].
The theory of Chebyshev's generalised inequalities did not receive a well-earned autonomy.
These research developments prepared the grounds for the emergence of the idea which is the backbone of this monograph: to choose such a theoretical foundation that will enable us from a single platform to incorporate all the approaches described above with the purpose not only to see them in a new light, but also to significantly expand the possibilities for their applications (the application of the new ideology to the integral theory can be found in [19]).
Reading this book would perhaps require from the reader a considerable patience since the classical analogies are not possible when describing the new theory.
Conclusions in each chapter can be of help since they outline the content of each chapter and link them together under a single theoretical framework.
The book was written at the Department of Computational Mathematics at (...).
The target audience for this book are researchers, engineers, Masters and PhD students with an appropriate theoretical background. It is the author’s pleasure to express the deep gratitude to all those who contributed to the formation of this theory and the publication of this book.
Due to the novelty of the material, there might be some imperfections for which the author takes responsibility.
\setcounter{secnumdepth}{1}
\part{Interval models}\label{part:1}
\chapter{Description of stochastic events}\label{cha:1}
\section{Interval probabilities and means}\label{sec:1.1}
\subsection{The sample space}
We live in a world of randomness, surrounded by unexpected actions and events that are not fully predictable.
The roots of this randomness are diverse; it may originate from physical effects such as shot noise, but also from the inability to predict in its entirety the course of a process or the behavior a living organism (in particular, an individual), from our own ignorance (or unwillingness to know) the result of the upcoming (past) experiment, and so on.
As a result, we obtain statements such as it may or may not happen, to be or not to be.
This randomness covers all matters of chance, and our goal is to describe it.
Formally, a \emph{random phenomenon} is defined as a set of mutually exclusive events, called \emph{elementary}, one of which must occur, although it is unknown.
\begin{smallpar}
For example, if a coin is thrown once, the result will be either heads or tails; if it is thrown twice, then there will be either three elementary outcomes (two heads, two tails or two different outcomes) or four (if the order in which the outcomes are obtained is taken into account).
If we choose a point in the real line \(\reals\) the result is a number (a random variable), and for a random process \(x(t)\) this is a function of time \(t\).
\end{smallpar}
The set of all elementary outcomes is formally denoted as an abstract set \(\pspX\), and is called as \emph{space
elementary outcomes}; any elementary outcome is thus a point \(x\) of this space, i.e. \(x\in\pspX\).
Within the set \(\pspX\), we shall consider subsets \(A\subset \pspX\) and call them \emph{random events}.
They occur when the outcome of the experiment is one of the elementary outcomes \(x\) that are included in them.
\textsc{Comments.} % TODO: environment?
\begin{enumerate}
\item
This definition coincides with the classical one in terms of introducing the space of elementary outcomes \(\pspX\), but it does not connect randomness with probabilities: it is understood in a broader sense.
\item
It may seem that the introduction of space \(\pspX\) and by the link above we have somehow narrowed the type of randomness we cover.
In fact \(\pspX\) is an element of the mathematical model we are creating, that is at our disposal and can be made as general as we want, including conceivable and even unthinkable outcomes, if that is convenient.
For example (as students often suggest), when throwing
coins we can enable the event that the coin will stand on the edge or that the win in the game will be equal to infinity: \(\infty \in\pspX\).
You can generally talk about the space of all that it may or may not be, although it is unlikely that this will produce an economic description.
\item
Elementary outcomes are not necessarily physical reality, since there may be, for example, fuzzy events (which will be discussed later).
Nevertheless, the phenomenon in our definition will be random, if it is possible to at least link the outcomes to some abstract space \(\pspX\), which is usually the case.
\item
Deterministic phenomena are a special subclass of random ones, when the space of elementary outcomes contains only one element.
\end{enumerate}
\subsection{Features of the phenomenon}
Any measurement in its essence is a a translation of qualitative indicators into quantitative ones with a representation in numeric form.
Thus, it is convenient to characterize the results of a random phenomenon as a number or set of numbers, disregarding the specific content of elementary outcomes and the events (whose description may be very complex).
It suffices to have a number \(f(x)\) instead of every elementary outcome \(x\), i.e., to assign \(x\rightarrow f(x)\) for every \(x\in\pspX\).
The numeric function \(f\) on the space of elementary outcomes are called quantitative characteristics, or simply characteristics of a random phenomenon; it will be denoted in the abbreviated form \(f\); strictly speaking, this is a mapping from \(\pspX\) to the real numbers \(\reals\): \(\pspX\xrightarrow{f}\reals\).
Examples of features: if \(f\) is the number of heads in two tosses of a coin, then \(f(TT)=0, f(TH)=f(HT)=1, f(HH)=2\).
As another example, the random winnings in a gambling game, whose results can depend on rather complex combinations of conditions, can be modelled by the amount of money \(f\).
There are as many different characteristics as you can think of: as many different functions \(f(x)\) as possible.
A characteristic \(f\) together with the space \(\pspX\) plays a fundamental role in the subsequent developments, because they can be used to describe many things, if not all.
In particular, any events, predicates, or statements related to a phenomenon.
For instance, the elementary outcome \(x_1\) is described by the \emph{delta function} \(\delta_{x_1}(x)\), that takes the value \(1\) if \(x=x_1\) and \(0\) if \(x\neq x_1\); a subset \(A\subset \pspX\) (event A) is represented by the characteristic function \(A(x)\), that takes the value \(1\) if \(x\in A\) and \(0\) if \(x\notin A\).
We shall call \(A(x)\) the \emph{indicator} of an event \(A\).
The two values \(1\) and \(0\) taken by this characteristic indicate whether the element \(x\) belongs to \(A\) or not, and only that (Fig.~\ref{fig:1.1}).
More generally, we may consider the notion of a fuzzy event, with a not so categorical definition of belonging.
It is a function \(q(x)\) taking values between \(0\) and \(1\): \(0\leq q(x)\leq 1\) for every \(x\in\pspX\), and we will refer to it as \emph{characteristic function of the fuzzy event}.
Since a fuzzy event may not have its own description in the language of a random phenomenon,we shall simply refer to any characteristic \(q(x)\) taking values between \(0\) and \(1\) as a fuzzy event.
The value \(q(x)=1\) indicates that \(x\) belongs to the event modelled by \(q\), while \(q(x)=0\) indicates that it does not; \(q(x)=0.5\) indicates a balance between confidence and doubt: it means that \(x\) may be included in the event or not.
Other intermediate values between \(0\) and \(1\) give preference to either the confidence that \(x\) is included in the event (if \(q(x)>0.5\)) or the opposite.
\begin{figure}[t]
\centering
\begin{tikzpicture}%[scale=0.3]
\draw[scale=0.5, domain=-3:3, smooth, variable=\x] plot ({\x}, {10/(sqrt(2*pi*.8))*exp(-\x*\x/(2*.8))});
\draw (0,2.5) node {\scriptsize \(q(x)\)};
\draw[->] (-4,0) -- (4,0);
\draw (4,-0.2) node {\scriptsize \(x\)};
\draw[dashed] (-5,2.26) -- (5,2.26);
\draw (1.5,0) -- (1.5,2.26) -- (3,2.26) -- (3,0) -- (1.5,0);
\draw (2.25,2.5) node {\scriptsize \(A(x)\)};
\draw [underbrace style] (1.5,0) -- (3,0) node[pos=0.5,below,yshift=-3mm] {\scriptsize \(A\)};
\draw (-3.5,0) -- (-3.5,2.26);
\draw (-3.5,-0.2) node {\scriptsize \(x_1\)};
\draw (-3.5,2.5) node {\scriptsize \(\delta_{x_1}(x)\)};
\draw (-3.5,-0.9) .. controls (-2.1,-.5) and (-2.5,2) .. (-1.5,2.65);
\draw (-1.1,2.6) node {\scriptsize \(f(x)\)};
%%%
\draw (-3.7,3.05) node[scale=0.9] {\scriptsize \begin{tabular}{c}Delta\\ function\end{tabular}};
\draw (-1.4,3.0) node[scale=0.9] {\scriptsize Function};
\draw (0,3.0) node[scale=0.9] {\scriptsize \begin{tabular}{c} Fuzzy\\ event\end{tabular}};
\draw (2.25,3.0) node[scale=0.9] {\scriptsize \begin{tabular}{c} Indicator\\ function\end{tabular}};
\end{tikzpicture}
\caption{Functions, events.}
\label{fig:1.1}
\end{figure}
Therefore, each event can be identified with an equivalent characteristic, and all events correspond to a subclass of characteristics - those functions taking values from 0 to 1.
It is interesting and important to note that the logic of events, that is, the rules according to which some events logically form others, corresponds to well-defined arithmetic actions between characteristics: “Not \(q\Leftrightarrow 1-q\)”, “\(q_1\) and \(q_2 \Leftrightarrow q_1(x)q_2(x)\)”, “\(q_1\) or \(q_2 \Leftrightarrow \min\{1, q_1(x)+q_2(x)\}\)”.
characteristics describe not only events, but also many other features of a random phenomenon, such as its output, productive value, etc.
Any operation between characteristics, such as addition, multiplication, etc., leads to a new characteristic.
In this manner, the set of all characteristics together with the usual operations on them as real functions of a variable x gives a universal device for describing everything that is connected one way or another with the outcomes, the random phenomenon or anything that may be obtained from them using logical, arithmetical or analytic actions.
If \(g(x)\geq f(x)\), \(\forall x \in \pspX\), then we say that the characteristic \(g\) \emph{dominates} the characteristic \(f\), and we denote it \(g\geq f\).
This means that, whatever the outcome, characteristic \(g\) will take a value that is at least as large as that taken by characteristic \(f\).
In general, no two characteristics can be subject to such a judgement, since this corresponds to a partial ordering.
\subsection{Expected value of a characteristic}
Consider a characteristic \(f\) and whatever information may be available about it a priori (assuming that the phenomenon has not yet occurred or the result is not known).
The first is the range of possible values determined by \(f(x)\).
This set of values may be discrete, as is the case for instance for indicator characteristics \(A(x)\), that take the values \(0\) and \(1\).
It may also be the interval \([\min f, \max f]\) that goes from the minimum to the maximum values taken by \(f\).
For generality, given that the minimum and maximum are not always achieved, we replace them with the infimum and supremum \([\inf f, \sup f]\).
The second is the information about the average properties of \(f\).
Namely, the expected value \(\mn{f}\) (the notation \(\mn\) comes from the word \emph{mean}) will be called the \emph{exact mean of a characteristic} \(f\).
\begin{smallpar}
How does this work?
The simplest approach is based on the symmetry of the experiment.
For example, players who play heads-and-tails are well aware that when on equal bets, the chances of winning and losing are the same, i.e. on average they will have zero.
In ancient times, they threw Astragalus - bones of animal limbs - and to ensure “equal chances” each time the partners changed places.
\end{smallpar}
Accurate information can be found by studying the internal mechanisms of a phenomenon, its nature, as it is done in statistical physics.
It is possible to estimate the average based on the results of a preliminary survey, observations, artificially created training experiments, or tests.
If tests independent and conducted under the same conditions, the average will be the limit of the mean; it is arithmetic of the observed values of the characteristic for an unlimited increase in the number of tests.
Finally, just by experience, which is understood as a set of direct and indirect information about the phenomenon, one can approximately know what is expected on average.
In fact, each of us guesses how much time he will spend on the road on average
to the place of work or to another place, what is the average expected income from the planned business or expense from a tourist trip etc.
As a special case of such an average, when \(f(x)=A(x)\)- the indicator of an event \(A\)- we obtain the probability of the event, denoted as \(\pr(A)=\mn{A(x)}\) (the notation \(\pr\) coming from the word probability).
Probability is the average expected number of occurrences of A in independent repeated trials, divided by the number of trials.
The key in all the previous arguments is that that in the strict sense, exact averages (and probabilities) are parameters of a statistically stable phenomenon, and they are achieved by averaging the same phenomenon with unlimited repetition under independent and stable conditions.
Since it is sometimes difficult to organize a stable repetition, and an unlimited number of times is simply impossible, it often implies a conceivable or speculative repetition.
In order to “play” the phenomenon an unlimited number of times, either in your mind or in a computer, it is necessary to know the physical model of the phenomenon, its nature.
Thus, in the case of a symmetrical coin, its, not necessarily to make the toss, because it is clear that the average number of heads must be equal to the number of tails.
This is a classic example for definitions of the exact average.
However, practice does not always echo theory, and reality does not always echo what is desired.
Real phenomena are often such that their internal mechanisms do not fully lend themselves to research, experiments are unique, their repetitions are unstable, and preliminary observations limited.
As a result, the exact average remains as perfect concept reached in the limit, whose use is accompanied by many “if only” or “let\textellipsis”.
\pagebreak % 13
\subsection{Interval averages and probabilities}
Our idea is that not only the instability of phenomena, but also any non-absolute statistical knowledge (insufficiency, inaccuracy,
limitations) inherent in almost all real tasks, naturally forces the transition to interval concepts.
Let's expand the concept of the average, abandoning its definition as numbers.
The \emph{interval mean} of characteristic \(f\) is an interval \([\lmn{f}, \umn{f}]\), with boundaries \(\lmn{f}\) (lower mean) and \(\umn{f}\) (upper mean).
In the particular case of equality \(\lmn{f}=\umn{f}=\mn{f}\), the interval reduces to a point and is denoted without lines.
Another special case is that when the boundaries of the interval mean coincide with
with the minimum and maximum values of the function: \(\lmn{f}=\min f, \umn{f}=\max f\).
This means that nothing is known about the average expected value of the f attribute: it is any value in the range of f values.
Here it is no longer important whether the phenomenon is stable or unstable, or whether you can whether to organize repetitions or not: nothing is nothing.
Thus, the interval average \([\lmn{f},\umn{f}]\) gives a wide coverage of the description of the average properties of characteristics, from complete ignorance to exact knowledge.
It may be interpreted in several different ways.
As a range of possible values of an existing (but unknown) exact value \(\mn{f}\) in a statistically stable experiment, and as a protective degree of tolerance on \(\mn{f}\) to account for the instability of the phenomenon; and also as a more general concept than the exact average, when the latter is not determined due to the circumstances previously mentioned.
What is important is that in contrast from the exact average, interval average always exists at least because that it is always possible to go to the extreme case where we take the full range of values as the interval average.
When \(f(x)=A(x)\), the interval average becomes an interval of probabilities \([\lpr(A),\upr(A)]\); \(\lpr(A)=\lmn A,\upr(A)=\umn A\).
Let us give some simple examples to illustrate some of the interpretations.
\begin{example}
Consider an astragalus (a bone of the limb of the animal) that has four possible outcomes when throwing it.
It is clear that the exact probabilities of all these outcomes exist.
However, it takes an infinitely long time to find them experimentally.
If we consider an experiment that is limited in time, it is possible to estimate these probabilities only approximately by means of some confidence intervals of values.
This will be instances of probability intervals.
\end{example}
\begin{example}
Consider two identical-looking dices with sides numbered 1 to 6.
For a symmetrical dice, the probability of falling out one particular face, say a six, is equal to 1/6.
Assume the other dice has center of gravity shifted to the opposite side of the six (this was done for instance in the middle ages using dices filled with pieces of metal), so that the probability of getting six points becomes greater than \(\frac{1}{6}: p_6>\frac{1}{6}\).
\pagebreak
It is not known which of the dices and in what order is used in the game.
Then the probability of six will be an interval \([\frac{1}{6},p_6]\).
As the repetitions of the increase, the relative the frequency of sixes in different series can converge to any number from this interval.
Here the instability is caused by human intervention factor in the form of a completely unknown substitution strategy.
Also in some strategies, the interval is narrowed down to exact values (for instance when we only consider one of the dices).
\end{example}
\begin{example}
Assume a coin is not thrown, but one of its sides is shown to some subjects.
This will illustrate the statistical instability that is inherent to the human factor, even if we want to consider independent and identically distributed repetitions.
The average result is only an interval probability, that depends on the subjective features of each subject.
\end{example}
\subsection{Mathematical modelling of the phenomenon}
A mathematical model is a symbolic form of describing the outcomes of the phenomenon together with their inherent regularities, as we have understood from our previous arguments.
Consider a space \(\pspX\) of elementary outcomes, that defines an uncountable set of characteristics, some of which are dominated by others, some are defined as linear combinations, and so on.
For those characteristics that are uniformly bounded, that we shall denote \(\bchars_{00}=\{f: \sup_x |f(x)|<\infty\}\), there are interval averages \(\lmn f, \umn f\) within the range of values of \(f\).
But that may not be enough.
Our models will become more universal if we selectively allow the existence of averages on some unbounded characteristics.
In some cases, this is natural: for instance, if the attribute f is not bounded at the bottom, but it is bounded at the top \(\sup f=H<\infty\) for some number \(H\), then the upper average \(\umn f\) must be smaller than or equal to \(H\), and therefore it exists.
As a consequence, we have that \(\umn f<\infty\) on the class \(\bchars_0=\{f: \sup f<\infty\}\) of bounded above functions.
Similarly, the lower average \(\lmn f\) exists on the class \(-\bchars_0\) (obtained from \(\bchars_0\) by changing the sign of the functions) of all bounded below functions.
More work is needed in order to determine conditions for the existence of the averages for a broader classes of unbounded characteristics.
Let us denote \(\bchars\) the class of characteristics such that \(\umn f<\infty\), the \emph{area of existence of the upper averages}.
We will see further on that changing the sign of a characteristic results in a class \(-\bchars\) for which the lower average exists: we can compute \(\lmn{f}\) for every \(f\in\bchars\).
At the intersection we can compute both the lower and upper averages, and so it exists the interval average.
We do not exclude that \(\bchars=\bchars_0\), but this will correspond to the narrowest possible class.
The class \(\bchars\) must satisfy the following three properties:
\begin{enumerate}[label=C\arabic*., ref=C\arabic*]
\item\label{item:C1}
\(g\in\bchars, f\leq g \Rightarrow f\in\bchars\).
\item\label{item:C2}
\(f\in\bchars, c, b^+ \in\reals, b^+\geq 0 \Rightarrow b^+ f+ c \in\bchars\).
\item\label{item:C3}
\(f,g\in\bchars \Rightarrow f+g\in\bchars\).
\end{enumerate}
(Obviously, these hold for \(\bchars_0\) too).
\pagebreak
From these properties, it follows that: 1) \(c\in\bchars\), where \(c\) is used to denote the characteristic that is constant on the value \(c\in\reals\) (this follows from \ref{item:C2} with \(b^+=0\)); 2) \(\bchars \supset \bchars_0\) (use \ref{item:C1} and 1); 3) \(f_i\in\bchars, i=1,\dots,k \Rightarrow c+\sum_{i=1}^{k} b_i^+ f_i\in \bchars\), where the plus sign is used to indicate the non-negativity of the numbers, that is, \(b_i^+\geq 0\).
This last property is called the \emph{half linearity}, and a class \(\bchars\), with properties \ref{item:C1}, \ref{item:C2}, \ref{item:C3} is called \emph{semilinear}.
The mathematical model of the phenomenon includes: (a) a space \(\pspX\) of elementary outcomes; (b) a semilinear class \(\bchars\) of characteristics (that should include \(\bchars_0\)) and their averages; all together \(\{\pspX,\bchars,\umn,\lmn\}\) or \(\{\pspX,\umn,\lmn\}\) if \(\bchars=\bchars_0\).
Let us focus now on the main part of the model: the requirements on \(\umn\) and \(\lmn\).
\subsection{Axiomatics}
The averages \(\lmn f,\umn f\) of a certain characteristic must be related, as we see for instance in examples 1 and 2 in the introduction.
Even if we consider a model where the lower and upper averages are assessed independently, they must follow some relationships.
Among these, we need to highlight the fundamental ones, called \emph{axioms}, while the others will be derived from them.
It is important how many axioms you set.
Since a very tight relationship between the averages will narrow the classes of possible models, we think it is preferable to consider as general a design as possible, keeping in mind that it can always be made tighter by implementing additional properties inside a specific model or class of models.
With this in mind, the list of axioms should be as small as possible, while remaining within the limits of physical interpretability of the models (otherwise we should consider a different theory).
\textsc{Axioms on the averages.} % TODO: environment?
For every \(f,g\in\bchars\):
\begin{enumerate}[label=A\arabic*., ref=A\arabic*]
\item\label{item:A1}
\(g\geq f \Rightarrow \umn g\geq \umn f\);
\item\label{item:A2}
\(\umn(b^+ f+c)=b^+ \umn(f)+c\); \(\forall b^+\), \(c\in\reals, b^+\geq 0\);
\item\label{item:A3}
\(\umn(f+g)\leq \umn(f)+\umn(g)\);
\item\label{item:A4}
\(\lmn(-f)=-\umn(f).\)
\end{enumerate}
Here the arrow \(\Rightarrow\) should be read as “implies”, and the upper sign plus indicates that a number is non-negative.
Let is discuss the axioms.
\begin{itemize}
\item[\ref{item:A1}]
\textsc{Monotonicity axiom}: if the characteristic \(g\) dominates \(f\), then its upper average should be at least that of \(f\).
It cannot be otherwise, since the values of \(g\) will always be at least as large as those of \(f\).
\item[\ref{item:A2}]
\textsc{Transfer axiom}: multiplying a characteristic by a non-negative number and adding a constant to it leads to the same operations on \(\umn f\).
This is logical, because it is how each characteristic value is transformed.
\item[\ref{item:A3}]
\textsc{Subadditivity axiom}: the upper average of the sum of two characteristics cannot exceed the sum of their upper averages.
In fact, for the sum of two identical features \(f+f\), it follows from \ref{item:A2} with \(c=0, b^+=2\) that \(\umn (f+f)=\umn f+ \umn f\).
This is a just a case of “comonotone” addition, where the positive part is added with the positive part and the negative one is added with the negative one.
In general, the addition of two characteristics will not be in phase, which leads to \(\umn(f+g)\) being smaller in comparison to \(\umn f + \umn g\).
\item[\ref{item:A4}]
\textsc{Conversion axiom}: it connects the lower and the upper averages.
It follows from the fact that for any interval \([\underline{m},\overline{m}]\), if we change the sign then the boundaries are swapped: \([-\overline{m},-\underline{m}]\).
According to this axiom, if \(\umn\) is defined on \(\bchars\), then \(\lmn\) will be defined on \(-\bchars\).
\end{itemize}
Of course, it is a matter of “taste” which of the equivalent properties are taken as axioms, so this choice does not need to be discussed.
\subsection{Basic properties of the interval model of averages}
The interval model of averages (abbreviated from now on as IM) if the set of upper averages \(\umn f\) for those characteristics \(f\in\bchars\) (satisfying properties \ref{item:C1}, \ref{item:C2}, \ref{item:C3}) and lower averages \(\lmn f\) for \(f\in-\bchars\), where these upper and lower averages are assumed to satisfy axioms \ref{item:A1}--\ref{item:A4}.
It shall be denoted as \(\IM\) or \(\IMfunc{\umn}{\bchars}\).
The following properties are derived directly from the axioms (assuming all characteristics belong to the corresponding domain):
\begin{enumerate} % TODO: some \textsc needed and more faithful following the original
\item\label{item:prop1}
\(\lmn(b^+ f+c)=b^+\lmn(f)+c\) (follows from \ref{item:A2} and \ref{item:A4});
\item\label{item:prop2}
For any constant \(c\), \(\umn c=c\) (use \ref{item:A2} and \ref{item:prop1} with \(b^+=0\));
\item\label{item:prop3}
\(\lmn(f+g)\geq \lmn(f)+\lmn(g)\) (use \ref{item:A3} and \ref{item:A4});
\item\label{item:prop4}
\(\umn(\sum_{i=1}^k f_i)\leq \sum_{i=1}^{k} \umn(f_i)\) (upper subadditivity- follows from \ref{item:A3} by induction);
\item\label{item:prop5}
\(\lmn(\sum_{i=1}^k f_i)\geq \sum_{i=1}^{k} \lmn(f_i)\) (lower superadditivity- follows from \ref{item:prop3} by induction);
\item\label{item:prop6}
\(\lmn f\leq \umn f\) (\ref{item:A3} and \ref{item:A4} imply that \(0\leq \umn(0)=\umn(f-f)\leq \umn(f)+\umn(-f)=\umn(f)-\lmn(f)\));
\item\label{item:prop7}
\(f\geq f \Rightarrow \lmn f \geq \lmn g\) (use \ref{item:A1} and \ref{item:A4});
\item\label{item:prop8}
\(\inf f \leq \lmn f, \umn f \leq \sup f\) (use \ref{item:prop7}, \ref{item:A1} and \ref{item:prop2}, and \(\inf f \leq f \leq \sup f\));
\item\label{item:prop9}
\(|\umn f|=\max\{|\umn f|, |\lmn f|\}\leq \umn |f|\) (use that \(\pm f\leq|f|\Rightarrow \umn f\leq \umn |f|,\) \(-\lmn f \leq \umn |f|\)); this is called the maximum modulo average;
\item\label{item:prop10}
\(\lmn(f+g)\leq \lmn f+ \umn g \leq \umn (f+g)\) (pseudo-additivity): use \ref{item:prop3} and \ref{item:A4} to conclude that \(\lmn f=\lmn(f+g-g)\geq \lmn(f+g)-\umn(g)\), and \ref{item:A3} and \ref{item:A4} to conclude that \(\umn(g)=\umn(g+f-f)\leq \umn(g-f)-\lmn(f)\);
\item\label{item:prop11}
\(\mn{g}\) precise \(\Rightarrow \umn(f+g)=\umn{f}+mn{g}, \lmn(f+g)=\lmn{f}+\mn{g}\) (apply \ref{item:A3}, \ref{item:prop3} and \ref{item:prop10});
\item\label{item:prop12}
\(\mn(\sum_{i=1}^k f_i)=\sum_{i=1}^{k} \mn(f_i)\) (finite additivity of exact averages);
\item\label{item:prop13}
Continuity with respect to uniform convergence as \(n\rightarrow \infty\):
\[
\sup_x |f_n(x)-f(x)| \rightarrow 0
\Rightarrow
\lmn(f_n)\rightarrow \lmn(f), \umn(f_n)\rightarrow \umn(f)
\]
(use that \(|\lmn(f_n)-\lmn(f)|\) and \(\umn(f_n)-\umn(f)|\)) are dominated by \(\sup_x|f_n(x)-f(x)|\)).
\end{enumerate}
Therefore, we obtain natural properties such as that the lower average cannot exceed the upper \ref{item:prop6} and that it cannot be smaller than the infimum of the function \ref{item:prop8}.
We also obtain that the transfer property is also valid for \(\lmn\).
Properties \ref{item:prop3}, \ref{item:prop4}, \ref{item:prop5} and \ref{item:prop10} are also an extension of property \ref{item:prop12} from exact averages.
Formally, defining IM we have not yet specified in detail the region \(\bchars\) of existence of the upper average, other than it can be wider than \(\bchars_0\).
Instead, we have only limited its definition in that it must satisfy properties \ref{item:C1}, \ref{item:C2} and \ref{item:C3}.
The specification of \(\bchars\) will be made once we get acquainted with a general procedure for specifying IM, as follows.
\section{Extension of primary means}\label{sec:1.2}
\subsection{Introduction}
The utilitarian advantages of a particular theory, its models and methods are defined in three main areas:
\begin{enumerate}[label=\arabic*)]
\item
Versatility: ability to work with a wide range of a class of objects or phenomena;
\item
Convenience and flexibility of the device, in particular to work with simplifications and rough estimates;
\item
Simplicity and easiness of translating the “language of phenomena” into the language of adequate models and the physical interpretability of model parameters.
\end{enumerate}
In many ways, these three areas overlap, there are three similar hares running in different directions.
Let's look at the last one.
The model should have priority, primary parameters that link it to the phenomenon, by varying the number and values of which, it is possible to achieve the adequacy of the model, like a device which by twisting the handles we can achieve its settings.
We show that any set of features with their specified averages (exact or blurred, in the form of an interval or a single border) can play the role of primary parameters for IM.
In this manner we kill all three hares: universality is achieved due to the variety of primary characteristics, and the flexibility and interpretability by the choice of their number and the variability of the boundaries of interval averages, whose meaning we already know.
\subsection{Primary and secondary characteristics}
Let \(\pchars^*\) be the set of primary characteristics.
Whether it is finite or infinite, whether it consists of limited features or not, is a matter of application.
Each \(g\in\pchars^*\) is a function \(g(x)\) defined on the set of elementary outcomes \(\pspX\) and is also called a primary function.
For each such \(g\), interval averages \(\pumn{g}\), \(\plmn{g}\), or just one of these, is assessed.
The wavy line emphasizes not only that these are primary averages, but also that they may not be consistent with each other in the sense of fulfilling the axiomatic properties \ref{item:A1}, \ref{item:A2}, \ref{item:A3} and \ref{item:A4}.
This optional consistency control gives a certain freedom, which simplifies the procedure for setting primary averages, and therefore the models themselves.
For a given \(g\) it can be known \(\plmn{g}\) and (or) \(\pumn{g}\), so we can divide the set \(\pchars^*\) in two subsets: the set \(\pchars_U\), where \(\pumn{g}\) is defined, and the set \(\pchars_L\), where \(\plmn\) is defined.
At their intersection \(\pchars_U \cap \pchars_L\) both the upper and lower averages \(\pumn{g}\), \(\plmn{g}\) are set.
The lower primary subset can be automatically converted into the upper.
To do this, using \ref{item:A4}, we define \(\pumn(-g)=-\plmn{g}\).
Thus, instead of a characteristic \(g\) with a lower average \(\plmn{g}\) we consider the function \(g_1=-g\) with the specified upper average \(\pumn{g_1}=-\plmn{g}\).
By doing this to all \(g\in\pchars_L\), our initial set can be expressed equivalently as \(\pchars=\pchars_U\cup -\pchars_L\), and we have the upper average \(\pumn{g}\) for every \(g\in\pchars\).
This transformation, which is sometimes inconvenient from the point of view of the natural interpretability of the model parameters, is nevertheless very convenient for unification and simplification.
Despite the lack of-consistency requirements, primary upper averages \(\pumn{g}\), \(g \in \pchars\) cannot be set completely freely, since this may lead to contradiction (for instance, if \(g\geq 0\) and \(\pumn{g}<0\)).
Primary averages are called \emph{consistent} if, for any \(g_i\in\pchars\), any \(c_i\geq 0\) and any \(c\in\reals\), if \(c+\sum_i c_i g_i \geq 0\) then \(c+\sum_i c_i \pumn{g_i} \geq 0\).
The characteristics on the left-hand side of the first inequality are called \emph{secondary}.
Denote \(\slhull \pchars=\{g(x)=c+\sum_i c_i^+ g_i(x), g_i \in \pchars\}\) --- the class of all possibly secondary characteristics, that is, all finite linear combinations of primary functions \(g_i\) with non-negative coefficients \(c_i^+\) and an arbitrary free real number \(c\).
We call \(\slhull \pchars\) the \emph{semi-linear hull of} \(\pchars\).
Using \ref{item:A2}, we can transfer the primary averages to it:
\begin{equation}\label{eq:1.1}
\pumn{g} = \pumn(c+\sum_i c_i g_i)= c+\sum_i c_i \pumn{g_i},
g\in\slhull\pchars.
\end{equation} % TODO: This equation seems to have been modified somewhat by Enrique Mirands. Was this intentional?
\pagebreak % 19
Now the consistency requirement is formulated as follows:
\[
0\leq g\in\slhull\pchars
\Rightarrow
\pumn{g} \geq 0;
\]
that is, for secondary characteristics of g that only take non-negative values, the upper average (assigned, converted from the lower, or transferred from \(\pchars\) to \(\slhull \pchars\)) must be non-negative.
The need for this consistency requirement is self-explanatory.
It will be illustrated in the examples below.
Let us give the main result.
Denote \(\bchars_{\pchars}=\{f: f\leq g\in \slhull \pchars\}\) a class of characteristics such that each of them is dominated by a secondary one.
Thus, given \(f\in \bchars_{\pchars}\) there is at least one \(g\in\slhull\pchars\) such that \(f\leq g\).
Obviously, \(\bchars_{\pchars}\) includes all the characteristics that are bounded above (given that \(c\in\slhull\pchars\)), so \(\bchars_0\subset \bchars_{\pchars}\), and \(\bchars_0=\bchars_{\pchars}\) if all \(g\in\pchars\) are bounded.
Let us call \(\bchars_{\pchars}\) the class of \emph{dominatable characteristics}.
\subsection{Theorem of extension of averages}
\begin{theorem}\label{th:1.1}
If the primary averages do not contradict each other, then let
\begin{align}\label{eq:1.2}
\umn{f} = \inf_{g: f(x)\leq g(x) \in \slhull\pchars} \pumn{g},\quad
f \in \bchars_{\pchars};
&&
\lmn{f} = -\umn(-f),\quad
f \in -\bchars_{\pchars},
\end{align}
are consistent upper and lower averages on the class \(\bchars_{\pchars}\) of \emph{dominatable characteristics}. % TODO: translation deviates from original in math symbols present in sentence; fix
They form an IM \(\IM\) with domain of existence of the upper average \(\bchars=\bchars_{\pchars}\).
Else, write Eq.~\eqref{eq:1.2} as % TODO: this material is not italicised in the original...
\begin{equation*}
\umn f=\inf \{\pumn{g}: f\leq g \in \slhull\pchars\}.
\end{equation*}
\end{theorem}
\begin{proof}
Axioms \ref{item:A1} and \ref{item:A2} follow from~\eqref{eq:1.2}, while \ref{item:A4} holds by definition.
It remains to check axiom \ref{item:A3}.
Since \(f_1\leq g^*\in\slhull\pchars, f_2\leq g^{**}\in\slhull\pchars\), then \(f_1+f_2\leq g^*+g^{**}=g\in\slhull\pchars\) and \(\pumn{g^*})+\pumn{g^{**}}=\pumn{g}\), whence
\begin{align*}
\umn(f_1+f_2)
&= \inf\{\pumn{g}: f_1+f_2\leq g \in \slhull\pchars\} \\
&\leq \inf\{[\pumn{g^*}+\pumn{g^{**}}: f_1\leq g^* \in \slhull\pchars, f_2\leq g^{**} \in \slhull\pchars\}\\
&=\umn{f_1}+ \umn{f_2}
\end{align*}
and the proof is complete.
\end{proof}
According to~\eqref{eq:1.2}, in order to find \(\umn f\) the secondary characteristics are derived from the primary ones by \(g=c+\sum_i c_i^+ g_i\) so that they dominate \(f\), that is, \(f\leq g\).
Each of these \(g\) will have its own average \(\pumn{g}\), using~\eqref{eq:1.1}.
Then we take the “best” among them, which corresponds to the one that is minimal.
The following principle of constructive mathematics is laid down in\eqref{eq:1.2}: consider only those results to which we can get arbitrarily close in a finite number of operations.
This is the reason why secondary features are defined as sums of primary ones.
The effect of this principle is revealed by the following consequence of \eqref{eq:1.2}.
\begin{corollary*}
For each \(f\) with finite upper average \(\umn f\) and given \(\epsilon>0\), it is always possible to specify a dominating finite linear combination of primary characteristics \(g=c+\sum_i c_i^+ g_i \geq f\) (that is, a secondary characteristic) such that
\[
\umn f+\epsilon\geq c+\sum c_i^+ \pumn{g_i}
= \pumn{g}.
\]
\end{corollary*}
Since \(g\geq f \Rightarrow \pumn{g}\geq \umn f\), it follows that \(|\pumn{g}-\umn f|\leq \epsilon\).
As \(\epsilon\) decreases, we need to involve a larger number of primary characteristics in order to approximate \(\umn f\), and so the number of operations increases.
The relationships between the sets of features that we define are illustrated in Fig~\ref{fig:1.2}.
Here, each of the upper hemispheres includes all the lower ones.
According to Theorem~\ref{th:1.1}, the averages from the primary set of characteristics \(\pchars\) are preserved by linear combinations \(\slhull\pchars\) (the secondary signs) and after them they are defined on the class \(\bchars_{\pchars}\) of characteristics that are dominated by the secondary ones.
This will be the domain of existence of its upper averages; at its core we have the class \(\bchars_0\) of those characteristics that are bounded above: \(\bchars_0 \subset \bchars_{\pchars}\).
The class \(\bchars_{\pchars}\) will also included unbounded characteristics if those are included in the primary ones; otherwise, it will be \(\bchars_0=\bchars_{\pchars}\).
\begin{figure}[b]
\centering
\begin{tikzpicture}
\draw (-4,0) -- (4,0);
\draw[dashed] (1.1,0) arc (0:180:1.1);
\draw (1.9,0) arc (0:180:1.9);
\draw (2.7,0) arc (0:180:2.7);
\draw (3.5,0) arc (0:180:3.5);
%%
\path[decorate,decoration={text along path,
text={|\scriptsize|Consistent},text align=center}]
(-0.85,0) arc [start angle=180,end angle=0,radius=0.85];
%%
\path[decorate,decoration={text along path,
text={|\scriptsize|{\(\mathcal{Y}'\)}},text align=center}]
(-0.45,0) arc [start angle=180,end angle=0,radius=0.45];
%%
\path[decorate,decoration={text along path,
text={|\scriptsize|Primary},text align=center}]
(-1.65,0) arc [start angle=180,end angle=0,radius=1.65];
%%
\path[decorate,decoration={text along path,
text={|\scriptsize|{\(\mathcal{Y}\)}},text align=center}]
(-1.3,0) arc [start angle=180,end angle=0,radius=1.3];
%%
\path[decorate,decoration={text along path,
text={|\scriptsize|Secondary},text align=center}]
(-2.45,0) arc [start angle=180,end angle=0,radius=2.45];
%%
\path[decorate,decoration={text along path,
text={|\scriptsize|{\(\mathcal{L}+\mathcal{Y}\)}},text align=center}]
(-2.1,0) arc [start angle=180,end angle=0,radius=2.1];
%%
\path[decorate,decoration={text along path,
text={|\scriptsize|Majorized {\(\mathcal{F}_{\mathcal{Y}}\qquad\qquad\quad\)} Restricted {\(\mathcal{F}_0\qquad\qquad\quad\)} Limit {\(\mathcal{F}_{\infty}\)}},text align=center}]
(-3.15,0) arc [start angle=180,end angle=0,radius=3.15];
\end{tikzpicture}
\caption{Stages in the definition process.}
\label{fig:1.2}
\end{figure}
\subsection{Consistent primary averages}
Remarkably, \eqref{eq:1.2} does not only provide the upper average \(\umn f\) for \(\forall f\in \bchars_{\pchars}\) (and in particular \(\umn f, \lmn f\) for \(\forall f\in \bchars_{00}\)), but also, considering \(g_i\in\pchars\) instead of \(f\), updated consistent assessments \(\umn g_i\).
Since \(g_i\) dominates itself as a primary characteristic, \eqref{eq:1.2} implies that \(\pumn{g_i} \geq \umn g_i\) for every \(g_i\in\pchars\).
If \(\pumn{g_i}=\umn g_i\), the primary value is consistent with the other averages and is then denoted \(\umn g_i\).
If \(\pumn{g_i}>\umn g_i\), then there will be secondary characteristics dominating \(g_i\) and the value \(\pumn{g_i}\) will be inconsistent with the rest, so it can be removed from the primary set without any damage.
Thus, only the consistent primary averages matter for the definition of the IM, and as will be seen later, not necessarily all of them.
\pagebreak %21
Their minimum number is called the \emph{dimension} of the IM.
Given an interval model with primary averages \(\pumn{g}, g \in \pchars\), denoted \(\IMfunc{\pumn}{\pchars}\), if we exclude all inconsistent primary characteristics from the set, it becomes \(\IMfunc{\umn}{\pchars'}\), \(\pchars'\subset \pchars.\)
The expansion to the initial set to \(\umn f\) by means of~\eqref{eq:1.2} does not change the IM, so we obtain the same model if we take as primary set \(\pchars, \pchars'\), or all of \(\bchars=\bchars_{\pchars}\): \(\IM=\IMfunc{\pumn}{\pchars}=\IMfunc{\pumn}{\pchars'}=\IMfunc{\pumn}{\bchars}\).
We will note at the same time one peculiarity, that inconsistent characteristics \(g_i\), for which \(\pumn{g_i}>\umn g_i\) must be dominatable by one of the secondary characteristics \(g\in\slhull\pchars\), which will then be used in~\eqref{eq:1.2} to update the value \(\umn g_i\).
Thus, \(\slhull\pchars'=\slhull\pchars\) and \(\bchars_{\pchars}'=\bchars_{\pchars}\).
It also follows from above that the primary averages \(\pumn{g}_i, g_i \in\pchars\), are consistent if and only if \(g_i\leq g\in \slhull\pchars\Rightarrow \pumn{g_i} \leq \pumn{g}\) for all \(g_i \in\pchars\) and \(g\in\slhull\pchars\).
Then \(\pumn{g_i}=\umn g_i \ \forall g_i \in\pchars\).
\subsection{Characteristics of random variables}
As an example, consider the case when \(\pspX=\reals\)-the real line.
Such a phenomenon has numerical outcomes and is called \emph{random variable} (abbreviated as r.v.).
characteristics \(f\) of r.v.
are any numeric functions \(f(X)\) taking values on \(\reals\) (from now on, the numerical outcomes \(X\subset\reals\) will be denoted by capital letters).
A random variable is called discrete if its set of possible values is finite or countable.
It is convenient to consider the entire real line as an outcome space for such r.v.
and the fact that \(\Omega\subset\reals\) with the addition of the condition \(\lpr(\Omega)=1\).
Some relevant characteristics of random variables are represented in Table~\ref{tab:1.1}.
Let us give a couple of examples and compute their averages by means of Theorem~\ref{th:1.1}.
\begin{table}
\caption{ }
\label{tab:1.1}
\centering
\begin{tabular}{l|c|c}
\hline
\begin{tabular}{c}Characteristic of a\\ random variable\end{tabular} & Formal definition & Function \\ \hline
\begin{tabular}{l}\(\Omega\) - Set of values \\ or the r.v.\end{tabular}&
\(\lpr(\Omega)=1\) &
\parbox{0.25\columnwidth}{\begin{tikzpicture}
\draw[white] (-1.7,-.5) rectangle (1.8,1);
\draw[->] (-1.5,0) -- (1.5,0);
\draw (1.6,-.2) node {\scriptsize \(X\)};
\draw (-1.05,0) rectangle (1.05,0.75);
\draw (0,0.375) node {\scriptsize \(\Omega\)};
\draw (-1.25,0.75) node {\small 1};
\end{tikzpicture}}
\\ \hline
\begin{tabular}{l}The exact average \(m\)\\ of the r.v. lies in\\ the interval \([\underline{m},\overline{m}]\)\end{tabular}& \begin{tabular}{c} \(MX=m\) \\ \(\underline{M}X=\underline{m}\), \(\overline{M}X=\overline{m}\)\end{tabular}
&
\parbox{0.25\columnwidth}{\begin{tikzpicture}
%\node[draw,minimum height=10ex] {***};
\draw[white] (-1.7,-.75) rectangle (1.8,1.0);
\draw[->] (-1.5,0) -- (1.5,0);
\draw (-0.35,-0.5) -- (0.35,0.5);
\draw (0.1,-0.2) node {\scriptsize 0};
\draw (-0.1,0.3) node {\scriptsize \(X\)};
\end{tikzpicture}}
\\ \hline
\begin{tabular}{l}
Mean square of the\\ r.v. is greater than \(\underline{b}\)\\ and lower than \(\overline{b}\)
\end{tabular} &
\(\underline{M}X^2=\underline{b}\), \( \overline{M}X^2=\overline{b} \) &
\parbox{0.25\columnwidth}{\begin{tikzpicture}
%\node[draw,minimum height=10ex] {***};
\draw[white] (-1.7,-.4) rectangle (1.8,1.20);
\draw[->] (-1.5,0) -- (1.5,0);
\draw[scale=0.5, domain=-1.5:1.5, smooth, variable=\x] plot ({\x},{0.8*\x*\x});
\draw (0,0.6) node {\scriptsize \(X^2\)};
\end{tikzpicture}}
\\ \hline
\begin{tabular}{l}The probability of hitting\\ segment \(A\) lies within the\\ specified limits \end{tabular} &
\(\lpr(A), \ \upr(A)\) &
\parbox{0.25\columnwidth}{\begin{tikzpicture}
%\node[draw,minimum height=10ex] {***};
\draw[white] (-1.7,-.35) rectangle (1.8,1.15);
\draw[->] (-1.5,0) -- (1.5,0);
\draw (-.75,0) rectangle (.75,0.75);
\draw (0,0.375) node {\scriptsize \(A\)};
\draw (-0.9,0.75) node {\small 1};
\end{tikzpicture}}
\\ \hline
\begin{tabular}{l}Probability of ``blowout'' \\ level \(h\) not more than \(p_h\)\end{tabular} &
\(\upr(X>h)=p_h\) &
\parbox{0.25\columnwidth}{\begin{tikzpicture}
%\node[draw,minimum height=10ex] {***};
\draw[white] (-1.7,-.35) rectangle (1.8,1.15);
\draw[->] (-1.5,0) -- (1.5,0);
\draw (0,0) -- (0,0.75) -- (1.45,0.75);
\draw (-0.2,0.75) node {\small 1};
\draw (0,-0.15) node {\scriptsize \(h\)};
\end{tikzpicture}}
\\ \hline
\begin{tabular}{l}
Mean of the module\\ of the r.v. lies\\ within the specified limits
\end{tabular} &
\(\underline{M}|X|, \ \overline{M}|X|\) &
\parbox{0.25\columnwidth}{\begin{tikzpicture}
%\node[draw,minimum height=10ex] {***};
\draw[white] (-1.7,-.35) rectangle (1.8,1.15);
\draw[->] (-1.5,0) -- (1.5,0);
\draw (-0.7,0.7) -- (0,0) -- (0.7,0.7);
\draw (0,0.5) node {\scriptsize \(|X|\)};
\draw (0,-0.15) node {\scriptsize 0};
\end{tikzpicture}}
\\ \hline
\begin{tabular}{l}
Average of harmonic\\ functions
\end{tabular} &
\(\begin{array}{l l} \underline{M}\cos uX, & \overline{M}\cos uX\\ \underline{M}\sin uX, & \overline{M} \sin uX \end{array}\) &
\parbox{0.25\columnwidth}{\begin{tikzpicture}
%\node[draw,minimum height=10ex] {***};
\draw[white] (-1.7,-.75) rectangle (1.8,0.75);
\draw[->] (-1.5,0) -- (1.5,0);
\draw[dashed,scale=0.5, domain=-1.8:1.8, smooth, variable=\x] plot ({\x},{0.8*sin(\x*pi/1.8 r)});
\draw[scale=0.5, domain=-1.8:1.8, smooth, variable=\x] plot ({\x},{0.8*sin((\x+.7)*pi/1.8 r)});
\end{tikzpicture}}
\\ \hline
\begin{tabular}{l}
The probability of hitting\\
segment \(A\) is greater than\\
in segment \(B\)
\end{tabular} &
\( \overline{M}[B(x)-A(x)]\leq 0 \) &
\parbox{0.25\columnwidth}{\begin{tikzpicture}
%\node[draw,minimum height=10ex] {***};
\draw[white] (-1.7,-.85) rectangle (1.8,0.85);
\draw[->] (-1.5,0) -- (1.5,0);
\draw (0.5,0) rectangle (1.2,0.5);
\draw (-0.5,0) rectangle (-1.2,-0.5);
\draw (1.3,0.5) node {\small 1};
\draw (-1.3,-0.5) node {\small 1};
\draw [underbrace style] (-0.45,.1) -- (1.2,0.1) node[below,pos=0.5,yshift=-3mm] {\scriptsize B};
\draw [overbrace style] (0.45,.35) -- (-1.2,0.35) node[above,pos=0.5,yshift=-1mm] {\scriptsize A};
\end{tikzpicture}}
\\ \hline
\end{tabular}
\end{table}
\begin{example}\label{ex:1.4}
Let it be known that the mean of the random variable is equal to \(m\); write \(MX=m\) (i.e., \(\umn(\pm X)=\pm m\)).
Then using property \ref{item:prop12} of the secondary features \(f(X)=c+c_1X\), \(M(c+c_1 X)=c+c_1 M(X)\).
The domain of existence \(\bchars\) consists of functions that are dominatable by straight lines \(c+c_1 X\).
Since \(c^2-2cX\geq -X^2\), then \(-X^2\in\bchars\) and \(\umn(-X^2)=\min_c (-2cm+c^2)=-m^2\), i.e., \(\lmn(X^2)=m^2\).
The function \(X^2\) is not dominated by any straight lines, so \(\umn(X^2)=\infty\) and \(X^2\notin \bchars\).
For any indicator characteristics \(A(x)\) it is impossible to find dominating straight lines \(c+c_1 x\) except for \(c=0\) and \(c_1=1\), whence \(\lpr(A)=0\) and \(\upr(A)=1\).
This leads to the conclusion that the primary average does not carry non-trivial information about events.
\end{example}
\begin{example}\label{ex:1.5}
Let the upper mean square of the r.v. be given, that is, \(\umn(X^2)=\overline{b}\), giving rise to an IM of dimension 1.
The area of existence of the upper average, \(\bchars_{\pchars}\), consists of all the characteristics \(f(X)\) dominated by all the parabolas of the form \(c+c_2^+ X^2\), i.e., those \(f\) for which \(\overline{\lim}_{|X|\rightarrow \infty} \frac{f(X)}{X^2}<+\infty\).
When \(a_1>0\), as can be seen from Figure~\ref{fig:1.3}, corresponding to the dominant dominating parabola \(c=0, c_2^+=a_1^{-2}, \{a_1\leq X \leq a_2\}=x^2/a_1^2\), we obtain \(\upr(a_1 \leq X \leq a_2)\leq \umn{X^2}/a_1^2=\overline{b}/a_1^2\) (when \(a_2=\infty\) we have an analog of Chebychev's inequality).
The equality will occur when the right hand side is smaller than 1, i.e, when \(a_1>\sqrt{\overline{b}}\) (when \(a_2<0\) then \(a_1\) replaces \(a_2\)); otherwise \(\upr\) is trivially equal to \(1\).
Thus, knowing \(\overline{b}\) makes nontrivial the upper probabilities of segments that are at least a distance greater than \(\sqrt{\overline{b}}\) from the origin.
\begin{figure}
\centering
\begin{tikzpicture}[scale=1.5]
\draw[->] (-0.5,0) -- (3,0);
\draw (3,-0.15) node {\small \(x\)};
\draw[scale=1, domain=0:1.1, smooth, variable=\x] plot ({\x},{\x*\x*\x});
\draw (1,0) rectangle (2.5,1);
\draw[dashed,scale=1, domain=0.85:2.6, smooth, variable=\x] plot ({\x},{1-(\x-3.5/2)*(\x-3.5/2)*(2/1.5)^2});
\draw (1.1,-0.15) node {\small \(a_1\)};
\draw (2.4,-0.15) node {\small \(a_2\)};
\draw (1.95,1.15) node {\footnotesize \(\{a_1 c_2^+ m^2/(c_2^+-1)\), and minimizing the average of the parabola with this restriction, we get the coefficients \(c_2^+=1+|m|\sqrt{\overline{b}}\), \(c=(\sqrt{\overline{b}}+|m|)|m|\), whence \(\umn(X\pm m)^2=(|m|+\sqrt{\overline{b}})^2\).
Let us now add a zero mean as a primary average \(MX=0\) (or, equivalently \(\umn(\pm X)=0\)).
Then the IM is narrowed and its dimension is equal to \(3\).
The secondary averages will be then parabolas that are pushed by the averages, \(\umn(c+c_2^+(X-c_1)^2)=c+c_2^+\overline{b}+c_2^+ c_1^2\), using property \ref{item:prop11} in \S\ref{sec:1.1}.
In particular, we deduce that \(\umn(X\pm m)^2=\umn{X^2}+m^2=\overline{b}+m^2\).
Let us consider the probabilities of intervals.
A parabola dominating an interval, \(\{a_1 \leq X \leq a_2\}\leq c+c_2^+(X-c_1)^2\) must satisfy \(c\geq 0, c_1 \leq a_1, c+c_2^+(a_1-c_1)^2\geq 1\), and we need to find such \(c,c_2^+,c_1\) with the minimum upper average.
By making calculations, we obtain \(c=0, c_1=-\overline{b}/a_1, c_2^+=(a_1+\overline{b}/a_1)^{-2}\) when \(a_1>0\) (if \(a_2<0\) then we replace \(a_1\) with \(a_2\)), and as a result the probability is the minimum average of the parabola: \(\upr(a_1\leq X\leq a_2)=(1+a_1^2/\overline{b})^{-1}\).
The probability is not trivial for any \(a_1>0\) (or \(a_2<0\)).
The lower probability of the interval is computed by considering the parabola in the dashed line in Figure~\ref{fig:1.3}, that we can use to transfer the probability of the event:
\begin{equation*}
\lpr(a_1\leq X \leq a_2)
\geq 1 - \umn(X-\frac{a_1+a_2}{2})^2(\frac{2}{a_2-a_1})^2
= 1 - \frac{4\overline{b}+(a_1+a_2)^2}{(a_2-a_1)^2}.
\end{equation*}
The right hand side is larger than \(0\) when \(-a_1 a_2 > \overline{b}\) (whence \(a_1<0\) and \(a_2>0\)), from which we obtain the lower probability (otherwise, it is equal to \(0\)).
Thus, knowledge of the zero average, and specifying the upper probabilities of intervals distant \(|a_1|\) from the origin makes the lower probabilities of intervals that include the origin non-trivial.
Note that the additional knowledge \(\lmn(X^2)=\underline{b}\) does not change the probabilities.
\end{example}
\pagebreak % 24
\subsection{Characteristics of random processes}
A random process \(X_t\) numbered by the index \(t\) (called time) is a sequence of r.v..
The values of \(t\) may be an interval \([0,T]\) from the time axis \(\reals\), the entire axis, some discrete reference points \(t_1,\dots,t_n\) on this axis (and then the random process becomes a vector).
This is not important for the time being.
We shall denote \(T\) the set of this values.
The output space \(\pspX\) will be the set of all possible implementations \(x_t, t \in T\), as a function of time \(t\).
To describe the process, you need to describe each r.v. \(X_t\) separately with its own features, in the manner discussed earlier, as well as the relationship between \(X_t\) and \(X_{t'}\) at different \(t, t' \in T\).
Features of this relationship, in particular, are the products \(X_t X_{t'}\) and their averages \(\lmn{X_t X_{t'}}=\underline{b}_{t t'}\), \(\umn{X_t X_{t'}}=\overline{b}_{t t'}\), called upper and lower \emph{correlation functions}.
More generally, these are correlations after transformations of each r.v. by the same function \(F\) (called inertia-free transformations); then the primary parameters of the process will be \(\lmn{F(X_t) F(X_{t'})}\), \(\umn{F(X_t) F(X_{t'}})\).
Thus, if \(F(X)=(X>h)\)-the indicator function of exceeding level \(h\), this will be called the excess correlation (and for \(h=0\) it is the polarity correlation).
Other than these, there are characteristics in the forms of integrals, \(\int_T F(X_t) dt\).
In particular, \(\frac{1}{T}\umn{\int_0^T X_t^2 dt}\) is the upper average integral square of the process on the interval \([0,T]\).
Thus, the characteristics and their averages constitute a universal form of describing any phenomena.
The complexity of the description is defined not so much by the space \(\pspX\) of outcomes, but by the dimension of the interval model, that is determined by the number of primary averages.
\subsection{Vacuous models}
Assume that a single fact is our primary information: that event \(B\) is true, and nothing else.
This corresponds to the primary probability \(\lpr(B)=1\) (or, equivalently, to \(\pr(B)=1\)).
This model is called the \emph{\(B\)-indicator}, and is denoted \(\vacuous_B\).
It is described by the average \(\umn{f(x)}=\sup_{x\in B} f(x)\) and the region of existence are all those characteristics that are bounded above for \(x\in B\). % TODO: \umn{f(x)} instead of \umn{f} !?
It \(B=\pspX\), then the model is called \emph{vacuous} and is denoted by \(\vacuous\).
It is described by the average \(\umn{f}=\sup f\), and it corresponds to the full absence of data about the phenomenon.
Its area of existence is the class \(\bchars_0\) of all restricted characteristics.
\subsection{Modified extension formula}
Replacing the primary characteristics \(g_i\in\pchars\) and their averages \(\pumn{g_i}\) (given above) by \(g_i'(x)=c_i^+ g_i(x)+c\) and \(\pumn{g_i'}=c_i^+ \pumn{g_i} +c\) leads obviously to an equivalent model.
If we center the primary characteristics by replacing \(g_i\) with \(\mathring{g}_i(x)=g_i(x)-\pumn{g_i}\), we obtain \(\pumn{\mathring{g}_i}=0\).
The \emph{centered} set of characteristics is denoted \(\mathring{\pchars}\) and \(\pumn{\mathring{\pchars}}=0\) is the set of all null primary values.
For this set, ~\eqref{eq:1.2} becomes:
\begin{align}\label{eq:1.3}
\pumn{f}
&= \inf \{ [c+ c_i^+ \pumn{\mathring{g}_i}]: c+ c_i^+ \mathring{g}_i(x)\geq f(x)\}\notag\\
&= \inf \{ c: c\geq f(x)-\sum c_i^+ \mathring{g}_i(x) \}\notag\\
&= \inf_{c_i^+} \sup_x [f(x)-\sum c_i^+ \mathring{g}_i(x)].
\end{align}
Obviously, any secondary characteristic of the type \(\mathring{g}(x)=\sum c_i^+ \mathring{g}_i(x)\) is centered, i.e., its upper average is \(\pumn{\mathring{g}}=0\), and conversely any centered characteristic can be represented in this manner.
Therefore, the purpose of formula~\eqref{eq:1.3} is to find the secondary characteristic that best approximates the characteristic \(f(x)\) from above.
The calculations of~\eqref{eq:1.3} are illustrated in the following example.
\begin{example}\label{ex:1.6}
Consider a random variable (i.e., \(\pspX=\reals\)) with a single primary value \(\pumn{e^{-|X|}}=\mu\).
Let us determine \(\upr(0\leq X \leq d)\).
Solving Equation~\eqref{eq:1.3} comes down to finding the value
\[
\upr(0\leq X \leq d)
= \min_{c^+} \max_X [\{0\leq X \leq d\}-c^+ e^{-|X|}-\mu]
= \min_{c^+} \max \{1-c^+ (e^{-d} - \mu, c^+ \mu\}
\]
(for clarity, we recommend to draw the graph of the function).
If \(e^{-d}<\mu\), then the minimum of the function will be reached at \(c^+=e^{-d}\), giving \(\upr(0\leq X \leq d)=e^{d}\mu\).
If \(e^{-d} \geq \mu\), then the minimum of the function will be reached at \(c^+=0\), and then the desired probability is equal to \(1\).
\end{example}
\begin{addendum}
\item
The domain of existence \(\bchars\) was assumed to include the characteristics \(f\) such that \(\umn{f}=-\infty\).
The interval model can be assumed to be the set of all those characteristics \(f\), by assigning to those \(f\) for which \(\umn{f}\) does not exist, average equal to infinity: \(\umn{f}=\infty\); similarly we can make \(\lmn{f}=-\infty\) for all those \(f\) whose lower average does not exist.
The axioms of IM are satisfied in this case if we assume that \(0\cdot \infty=0\) and consider that the class of characteristics is not closed under the sum operation, since if \(f(x_0)=\infty,g(x_0)=-\infty\) for some \(x_0\), it is not clear the value we should give to \(f(x_0)+g(x_0)=\infty-\infty\).
It is then advisable to consider that axiom \ref{item:A3} should be fulfilled only on those characteristics for which the addition is defined.
\item
Given the lower primary averages \(\plmn{g}\), \(g\in\pchars_H\) and the upper primary averages \(\pumn{g}', g' \in\pchars_B\), then the secondary characteristics will be the linear combinations of the form
\[
g(x) = c+\sum c_i^+ g_i'(x)-\sum d_i^+ g_i(x),
g_i'\in\pchars_B, g_i\in\pchars_H
\]
with \(c\) arbitrary and non-negative coefficients \(c_i^+,d_i^+\).
Equation~\eqref{eq:1.2} then becomes
\[
\umn(f) = \inf\{
[c+\sum c_i^+ \pumn{g}_i'-\sum d_i^+ \plmn{g_i}]: c+\sum c_i^+ g_i'(x)-\sum d_i^+ g_i(x)\geq f(x)
\}
\]
and~\eqref{eq:1.3} becomes
\[
\umn(f) = \inf_{c_i^+,d_i^+} \sup_x [
f(x) - \sum c_i^+ (g_i'(x)-\pumn{g_i'}))
+ \sum d_i^+ (g_i-\plmn{g_i})
]
\]
with \(g_i'\in\pchars_B, g_i\in\pchars_H\).
\item
Assuming \(\pchars=\pchars_H\cup\pchars_B\), it is always possible to make it so that in \(\pchars\) the average intervals \(\plmn{g}\), \(\pumn{g}\), \(g \in\pchars\) are set.
To do this, unassigned averages are replaced by the corresponding extreme values of \(g\).
Then the secondary characteristics will be those linear combinations of elements of \(\pchars\), denoted \(\lhull\pchars=\{g(x)=c+\sum c_i g_i(x)\}\), where \(g_i\in \pchars\) and \(c_i,c\) are arbitrary.
The formulae for the extended averages are
\begin{align*}
\umn(f) &= \inf\{
[c+\sum \pumn{c_i g_i}]: c+\sum c_i g_i(x)\geq f(x)
\}\\
\umn(f) &= \inf_{c_i} \sup_x [
f(x)-\sum (c_i g_i(x)-\pumn{c_i g_i})
\end{align*}
where \(\pumn{c_i g_i}=c_i\pumn{g_i}\) if \(c_i>0\) and \(\pumn{c_i g_i}=c_i\plmn{g_i}\) if \(c_i<0\).
\item
In the extension formula, the class \(\slhull\pchars\) in property \ref{item:prop13} may be replaced by its closure \([\slhull\pchars]\) with respect to uniform convergence.
This class includes, in addition to the semilinear combinations of elements from \(\pchars\), their uniform limits.
This is of course only non-trivial when the class \(\pchars\) is infinite: otherwise \(\slhull\pchars\) and its closure \([\slhull\pchars]\) coincide.
\item
Our extension formulas are similar to the duality formulas used in the theory of generalized Chebyshev inequalities [17].
The difference is that here we are considering a different axiom.
\item
That the secondary characteristics are finite sums of the primary ones, and not infinite, is a fundamental “reluctance” in our theory to add “extra” axioms, and is due to the fact that it is not provable that the sum of an infinite (countable) number of zeros considered as a whole, and not as a limit, is zero.
\item
If each characteristic \(f(x)\) is a r.v., then the IM \(\IMfunc{\umn}{\bchars}\) can be regarded a a set of agreed averages on the random variables defined on \(\pspX\).
A set of characteristics \(g_t\), \(t\in T\) put together produces a vector if \(T\) is discrete, or a process if it is continuous.
\item
The fact that the IM is determined by the average \(\umn f\) of an infinite set of characteristics \(\bchars\), is not a hindrance for applications.
The ability to compute \(\umn f\) by means of~\eqref{eq:1.2} does not mean that this computation should be made for each \(f\).
Quite the contrary: as we will see later, there is no need for applied tasks that go beyond secondary attributes.
\item
Any regularity properties of the averages that do not follow directly from the axioms requires the selection of some subclass from the general ensemble, as well as the introduction of some additional axioms.
\item \textsc{Initial moments}.
Assume that the primary set of characteristics allows to compute the averages of the powers \(\plmn{X^j}\), \(\pumn{X_j}\) for \(j\in J\), where \(J\) is a set of consecutive non-negative indices.
The average \(\plmn{X^j}\) is called the lower moment of the \(j\)-th order, while \(\pumn{X_j}\) is called the upper moment.
The first moments \(\plmn{X}\), \(\pumn{X}\) are the lower and upper averages and the second moments \(\plmn{X^2}\), \(\pumn{X_2}\) are the lower and upper powers.
The absolute initial moments are the moments of the module of the r.v., \(\plmn{|X|^j}\), \(\pumn{|X|^j}\), \(j>0\).
Obviously, for even non-negative \(j\), these coincide with the initial moments.
In general \(j\) may not be chained.
In the case of absolute moments, the following inequalities must be satisfied: for \(r\geq s\geq 0\),
\[
(\umn |X|^r)^{1/r} \geq (\umn |X|^s)^{1/s},
\quad
(\lmn |X|^r)^{1/r} \geq (\lmn |X|^s)^{1/s}
\]
(this is proved similarly to [1, p.~169]).
\end{addendum}
\section{Relations between interval models}\label{sec:1.3}
\begin{smallpar}
Here interval models are geometrically represented as some convex “bodies” with a characteristic external contour description, inclusion relations, and union and intersection operations.
Moreover, since an IM is completely determined by its averages, we will through them introduce formal relations and operations between IM below.
\end{smallpar}
\subsection{Geometric illustration of IM}
Consider a finite set of elementary outcomes: \(\pspX=\{x_1,\dots,x_r\}\), and a probability vector \(P=\{p_1,\dots,p_r\}\).
Since \(\sum p_i=1\), the dimension of \(P\) is one less than the number of outcomes \(r\).
The set of all \(P\) is denoted
\[
\probs = \{P: p_i\geq 0, \sum_{i=1}^{r} p_i=1\}.
\]
This is a subset of the \(r\)-dimensional Euclidean space \(\reals^r\).
In Fig.~\ref{fig:1.4}, for \(r=3\) this family is a triangle.
\begin{figure}[b]
\centering
\begin{tikzpicture}[scale=2.5]
\draw[->] (0,0,0) -- (1.2,0,0);
\draw[->] (0,0,0) -- (0,1.2,0);
\draw[->] (0,0,0) -- (0,0,1.2);
\draw (1,0,0) -- (0,1,0) -- (0,0,1) -- (1,0,0);
\draw (0,0.15,1.4) node {\scriptsize \(p_3\)};
\draw (0,1.0,-0.3) node {\scriptsize \(p_2\)};
\draw (1.1,0,-0.2) node {\scriptsize \(p_1\)};
%%%
\draw[dashed] (0.3,0.7,0) -- (0,0.55,0.45) node[sloped,above=-0.25em,pos=0.25] {\tiny \(\pumn g_5\)};
\draw[dashed] (0.3,0,0.7) -- (0,0.3,0.7) node[inner sep=-0.7pt,left=.27em,pos=0.27,fill=white] {\tiny \rotatebox{-45}{\(\pumn g_6\)}};
\draw (0.3,0,0.7) -- (0,0.3,0.7);
\draw (0,0,0) -- (0,0,0.4);
\draw (0,0,1) -- (0,0,0.7);
%%%
\draw (0.5,0.5,0) -- (0.67,0.33,0) -- (0.45,0.1,0.45) node[above=-0.2em,pos=0.35,sloped] {\tiny \(\pumn g_1\)} -- (.1,.2,0.7) -- (0.1,0.5,0.4) -- (0.5,0.5,0) node[sloped,above=-.22em,pos=0.6] {\tiny \(\pumn g_4\)};
%%
\draw (-0.05,1,0) -- (0,1,0);
\draw (-.1,1,0) node {\scriptsize 1};
\draw (1,0.05,0) -- (1,0,0);
\draw (1,0.1,0) node {\scriptsize 1};
\draw (-0.06,0,0.94) -- (0,0,1);
\draw (-0.1,0.05,0.9) node {\scriptsize 1};
%%
\draw (0.33,0.33,0.33) node {\tiny \(\mathcal{M}\)};
\end{tikzpicture}
\caption{Geometry of an IM.}
\label{fig:1.4}
\end{figure}
For each fixed \(P\in\probs\), the average value of the trait \(f(x)\), that is a vector \(f=(f(x_1),\dots,f(x_r))\) is equal to the scalar product of \(f\) and \(P\):
\[\mn_{P}=\sum_{i=1}^{r} f(x_i) p_i.\]
Consider a family \(\IM_0\subset\probs\).
Then the average will not be exact, and for each \(f\) it will be determined by the boundaries
\begin{align*}
\lmn f=\inf_{P\in\IM_0} \mn_{P} f,
&&
\umn f=\sup_{P\in\IM_0} \mn_{P} f.
\end{align*}
Let us verify that the thus defined lower and upper averages satisfy the axioms of IM:
\begin{itemize}
\item[\ref{item:A1}:] \(g(x_i)\geq f(x_i)\Rightarrow \mn_P g\geq \mn_P f, \ \forall P \Rightarrow \umn g \geq \umn f\).
\item[\ref{item:A2}:] \(\umn(b^* f+c)=\sup_{P\in\IM_0} b^+ \mn_P f +c=b^+ \umn f+c.\)
\item[\ref{item:A3}:] \(\umn(f+g)=\sup_{P\in\IM_0} (\mn_P f+\mn_P g) \leq \sup_{P\in\IM_0} \mn_P f+ \sup_{P\in\IM_0} \mn_P g=\umn f+\umn g\).
\item[\ref{item:A4}:] \(-\lmn f=-\inf_{P\in\IM_0} \mn_P f=\sup_{P\in\IM_0} \mn_P (-f)=\umn(-f)\).
\end{itemize}
The average values do not change if the family \(\IM_0\) is replaced by its convex hull- “body” of vectors \(\hyperplane\).
This will be an IM \(\IM\).
Thus, any convex “body” of probability vectors \(\hyperplane\) determines an IM on a discrete space \(\pspX\).
And conversely, any IM corresponds to a convex “body” on \(\pspX\).
How does this work?
With each average \(\umn f\) we associated a half-space of vectors \(\hyperplane\) by means of the condition \(\mn_P f \leq \umn_P f\), i.e., bordered with one side of the hyperplane \(\hyperplane\): \(\mn_P f=\umn f\).
The trait \(f\) gives the direction of the hyperplane, and the value \(\umn f\) its position.
If we model the set \(\umn f\), \(f \in \bchars\) as the intersection of the half-spaces corresponding to the different \(\umn f\), we obtain the convex body \(\IM\).
The values of the averages \(\umn f\) give the positions of the hyperplanes.
The primary traits \(\pumn g_i\), \(i=1,\dots,k\) determine the initial primary hyperplanes \(\hyperplane\): \(\mn_P g_i=\pumn g_i\), setting the form of \(\IM\).
If their number is finite, then the IM will be a polyhedron \(\IM_k\) (as illustrated in fig.~\ref{fig:1.4}), whose faces are the consistent values \(\pumn g_i=\umn g_i\).
The inconsistent ones, such as \(\mn g_5\), will be outside \(\IM\) and have no effect.
There are hyperplanes \(\hyperplane\): \(\mn_Pg=\umn g\) that go through the vertices of \(\IM\) but that do not coincide with any of the facets, such as \(\umn g_6\).
Although they are consistent, their exclusion does not affect \(\IM\), and in this sense they are redundant.
Removing all the redundant primary elements is only clear when there is a finite number of primary traits, and can be unfeasible in the infinite case.
This does not only apply to discrete \(\pspX\), but also to arbitrary ones.
The number of elements of a non-redundant primary set is the \emph{dimension} of an IM.
Interval models may be of finite or infinite dimension.
In the first case, it coincides with the number of faces of the polyhedron \(\IM\), except for the vacuous case that coincides with \(\probs\).
The dimension of an IM characterises the minimum amount of data that is needed for this task.
Of course, for practical purposes it is appropriate to use sets determined by a finite number of data.
Interval models of infinite dimension are convex bodies whose contours are defined by an infinite number of tangent hyperplanes.
\pagebreak % 29
\subsection{Discussion}
Let us pause to discuss the meaning of the adopted approach.
What is preferable, to describe an IM “body” \(\IM\) as a set of vectors \(\hyperplane\) of probabilities - a kind of “atomic” models, or simply as the external contours \(\umn g\), \(g \in \pchars\)?
If \(\pspX\) has a small number of elements, then there is no fundamental difference.
But if the number of elements of \(\pspX\) grows, so does the dimension of \(\hyperplane\), and the atomic models are smaller.
The description of \(\hyperplane\) becomes disproportionally complex when we consider an infinite amount of outcomes, and in particular when we move to a continuous space \(\pspX\), for instance the real line (and even more in a process \(X_t\)).
Then the “atoms” disappear altogether: they become inaccessible.
The assessment of \(\hyperplane\) requires then of all sorts of tricks, as we discuss in the next paragraph.
At the same time, the contour description, i.e., the description by primary parameters, does not depend on the “atoms” of \(\hyperplane\) in any manner, and is determined bypassing \(\hyperplane\).
Everything depends on the number of faces (the dimension) and their positions (for the traits \(g_i\in\pchars\), but not on the space \(\pspX\).
This is almost like geometry: it is not important what a body is filled with, but only its external structure and proportions.
\subsection{Hierarchy of models}
For any two IM \(\IM_1\) and \(\IM_2\) on \(\pspX\) such that \(\umn_1 f\leq \umn_2 f\), we say that \(\IM_1\) is \emph{included} in \(\IM_2\) and write it \(\IM_1 \subset \IM_2\).
In the inequality above, if \(f\) is not bounded above and there is no \(\umn f\) for it, we formally consider it unthinkable.
A necessary condition for the inclusion \(\IM_1 \subset \IM_2\) is that the set \(\bchars_1\) for which \(\umn_1 f\) exists cannot be narrower than the region \(\bchars_2\) for which \(\umn_2 f\) exists: \(\bchars_1 \supset \bchars_2\).
The inclusion is illustrated in fig.~\ref{fig:1.5} as the inclusion of \(\IM_1\) in \(\IM_2\).
We will also say that \(\IM_1\) is narrower than \(\IM_2\)- and that \(\IM_2\) is wider.
Adding primary traits cuts off new faces, “shapes” \(\IM\) and leads to its narrowing and clarification: for example, for \(\umn f\) is a decrease --- and for \(\lmn f\) an increase, so the inclusion \(\IM_1\subset \IM_2\) corresponds to having more data in \(\IM_1\), or it being more accurate. % TODO: interjection unclearly translated (closing --- missing) and should \lmn f be \mn{f} a in the Russian original?
As a result, \(\IM_1\) is more precise than \(\IM_2\).
The widest model of all is the vacuous model (in Fig.~\ref{fig:1.4} it is the set \(\probs\) of all probability vectors), defined by the averages \(\umn f=\sup f, f \in \bchars_0\) on all bounded above traits and denoted by \(\probs\).
It is if there are no data- “clothes” that distinguish one IM from another, or they are missing: \(\pchars=0\), i.e., nothing is known about the phenomenon, a kind of “black box” with a completely cryptical structure, the output of which is not observed for any \(x\).
For \(\probs\) the interval of averages \(\lmn f\), \(\umn f\) of each restricted trait \(f\in\bchars_{00}\) are the same as the range \(\inf f,\sup f\) of its possible values.
\pagebreak %30
Thus, the most inaccurate of all, including all the others, and the most primitive in the structural sense is the vacuous IM: \(\probs \supset \IM, \forall \IM\).
We say that some \(x\) in \(\pspX\) will happen, without knowing any regularities.
As data accumulate, the IM narrows.
Is there a narrowest IM?
For a discrete \(\pspX\) there is: a vector P of probabilities.
In general the answer is no, from what was written in the discussion above.
The need for a hierarchical closure in the class of all models leads to the need to consider an \emph{empty model} \(\emptyset\).
This is how we refer to an incorrect IM model, when the primary averages \(\pumn g\) are inconsistent, and as a result the bounds of the average by means of the extension formula~\eqref{eq:1.2} are “mixed-up”: the lower one is greater than the upper one, and they also run to infinity: \(\lmn f=\infty, \umn f=-\infty\).
These averages values are obtained \(\forall f\), and we denote by \(\emptyset\) the only IM that produces this inconsistency.
Since for any IM \(\lmn f\geq -\infty, \umn f\leq \infty \ \), it follows that \(\emptyset \subset \IM\), \(\forall \IM\).
Let \(\traits\) be a set of traits.
We shall call a \emph{\(\traits\)-extension \(\IM\)} a model \(\IMfunc{\umn}{\traits}\) in which the primary averages \(\IM\) are the corresponding \(\umn q\), \(q \in \traits\).
This is illustrated in figure~\ref{fig:1.5}.
This is a manner of simplifying \(\IM\) by including it in a polyhedron with facets \(\umn q\), \(q \in \traits\)- our new primary data- and discard the remaining knowledge about \(\IM\).
Of course, in order not to lose too much by this expansion we need to select carefully the facets (the traits \(q\)), which leads to the problem of which elements should we consider, which ones should we leave, and how many of them.
The set of traits \(\traits\) is called the \emph{defining} one for model \(\IM\) if its \(\traits\)-extension matches the model itself: \(\IMfunc{\umn}{\traits}>=\IM\).
The defining features are those that, put together, completely determine the IM.
It is clear that a set of primary traits is always crucial for an IM and even more so any class of traits \(\pchars\) that includes for instance secondary ones. % TODO: is the Russian completely translated? A parenthetical is missing here, for instance.
\begin{minipage}{0.35\textwidth}
%\begin{figure}[h]
\centering
\begin{tikzpicture}[scale=1.2]
\draw (0,0.1) -- (1,1.2) node[sloped, above=-.2em,pos=0.5] {\scriptsize \(\umn g_1\)} -- (4,0) node[sloped, above=-.2em,pos=0.4] {\scriptsize \(\umn g_2\)} -- (1,-1.2) node[sloped, below=-.2em,pos=0.5] {\scriptsize \(\umn g_3\)} -- (0,-0.1) node[sloped, below=-.2em,pos=0.5] {\scriptsize \(\umn g_4\)};
\draw (1.4,0) ellipse (1.25 and .9);
\draw (0.7,0.1) node {\scriptsize \(\mathcal{M}\)};
\draw (2.6,0) node[fill=white] {\scriptsize \(\langle \umn Q\rangle\)};
\end{tikzpicture}
\captionof{figure}{An infinite dimensional IM extension of an IM with four primary values.}
\label{fig:1.5}
\end{minipage}
%\end{figure}
\hfill
\begin{minipage}{0.55\textwidth}\raggedleft
%\begin{figure}[h]
\centering
\begin{tikzpicture}
\draw (-0.2,0) -- (1.65,1.25) -- (-1.6,2.65) -- (-2.1,0.4) -- (-0.2,0);
\draw (0.9,-0.12) -- (3.3,0.1) -- (4.0,1.15) -- (3.6,2.3) -- (1.1,2.55) -- (-0.6,1.2) -- (0.9,-0.12);
%%%%
\draw (-0.2,-0.2) node {\scriptsize \({\bf P}_1\)};
\draw (-2.3,0.4) node {\scriptsize \({\bf P}_2\)};
\draw (-1.39,2.40) node {\scriptsize \({\bf P}_3\)};
\draw (1.85,1.2) node {\scriptsize \({\bf P}_4\)};
\draw (0.9,-0.32) node {\scriptsize \({\bf P}_5\)};
\draw (-0.8,1.2) node {\scriptsize \({\bf P}_6\)};
\draw (1.1,2.75) node {\scriptsize \({\bf P}_7\)};
\draw (3.88,2.3) node {\scriptsize \({\bf P}_8\)};
\draw (4.25,1.15) node {\scriptsize \({\bf P}_9\)};
\draw (3.45,-0.1) node {\scriptsize \({\bf P}_{10}\)};
%%%%
\draw (0.5,1.2) node {\scriptsize \(\mathcal{M}_1\wedge\mathcal{M}_2\)};
\draw (-1.35,1.6) node {\scriptsize \(\mathcal{M}_1\)};
\draw (2.9,1.4) node {\scriptsize \(\mathcal{M}_2\)};
%%%%
\draw[dashed] (-0.2,0) -- (0.9,-0.12);
\draw[dashed] (-1.6,2.65) -- (1.1,2.55);
%%%%
\draw[dashdotted,shorten >=-0.3cm,shorten <=-0.3cm] (-2.5,2.43) -- (-0.7,2.87) node[above,pos=0.95,sloped] {\tiny \(\umn_1 f\)};
\draw[dashdotted,shorten >= -0.5cm, shorten <= .5cm] (-1.5,1.98) -- (2.4,2.87) node[above,pos=1.01,sloped] {\tiny \(\umn_2 f\)};
\draw[dashdotted,shorten >= -1.9cm,shorten <= 1.25cm] (-1.8,1.38) -- (2.6,2.41) node[above,pos=1.35,sloped] {\tiny \(\umn f\)};
\end{tikzpicture}
\captionof{figure}{Intersection and union of IM.}
\label{fig:1.6}
%\end{figure}
\end{minipage}
\pagebreak %31{}
\subsection{Intersection of IM} Let \(\IM_1\) and \(\IM_2\) be to IM defined on the same space \(\pspX\), each of them determined by its set of averages \(\umn_1 f\), \(f \in \bchars_1\) and \(\umn_2 f\), \(f \in \bchars_2\), with respective domains \(\bchars_1,\bchars_2\).
The \emph{intersection} of \(\IM_1\) and \(\IM_2\) is the IM \(\IM=\IM_1 \wedge \IM_2\) defined by the averages \(\pumn f=\min\{\umn_1 f, \umn_2 f\}\), \(\forall f \in \bchars_1 \cup \bchars_2\).
Here, the wavy line means that \(\pumn f\) may not be at first consistent with each other, since the axiom \ref{item:A3} of semi-additivity may not hold, and then it must be reconciled.
This can be seen in figure~\ref{fig:1.6}, where the dashed line indicates the tangent corresponding to the average of \(\umn_1 f, \umn_2 f\) and \(\umn f\), \(\umn f< \pumn f=\min\{\umn_1 f, \umn_2 f\}\).
Secondly, \(\pumn f\) is extended by means of formula~\eqref{eq:1.3} to the class \(\slhull(\bchars_1\cup\bchars_2)\) of traits made up by combinations of elements from \(\bchars_1\) and \(\bchars_2\).
The intersection means that the information included in \(\bchars_1\) and \(\bchars_2\) is correct, and the most accurate one is extracted from them.
As a result, the interval values and the averages are narrowed: \(\IM_1 \wedge \IM_2\subset \IM_1, \IM_1 \wedge \IM_2\subset \IM_2\) (see Fig.~\eqref{fig:1.6}).
The primary averages (faces) of the intersection will be the primary averages of \(\IM_1\) and \(\IM_2\) and no others, meaning that \(\IMfunc{\pumn_1}{\pchars_1} \wedge \IMfunc{\pumn_2}{\pchars_2}=\IMfunc{\pumn_1}{(\pchars_1\cup \pchars_2)}\) is equivalent to adding two primary sets in a single \(\bchars_1 \cup \bchars_2\) preserving the primary averages, some of which may be inconsistent or not affect the intersection.
The domain of existence of the intersection will be \(\slhull(\bchars_{\pchars_1}\cup \bchars_{\pchars_2})\).
The intersection operator extends to an arbitrary number \(\IM_{\theta}=\IMfunc{\umn_{\theta}}{\pchars_{\theta}}\), \(\theta \in \Theta\):
\[
\IM = \bigwedge_{\theta \in\Theta} \IM_{\theta}
\Leftrightarrow
\pumn f = \inf_{\theta \in \Theta} \umn_{\theta} f,
\quad \forall f \in \slhull(\cup_{\theta} \bchars_\theta)
\]
Let us use this fact.
\begin{example}[Representation of IM as intersections]
Any IM \(\IMfunc{\pumn}{\pchars}\) can be represented as an intersection
\[
\IMfunc{\pumn}{\pchars}
= \bigwedge_{g\in\pchars}\IMfunc{\pumn}{g}
\]
where the models \(\IMfunc{\pumn}{g}\) have dimension \(1\) and are defined by a single primary average \(\pumn g\) of the trait \(g\), that here replaces the index \(\theta\).
Then \(\bchars_g=\bchars_\theta\) make up the features dominated by \(g\), and \(\slhull(\cup_g \bchars_g=\bchars_{\pchars}\) their semi-linear combinations.
\end{example}
\subsection{Combination of IM}
Let \(\IMfunc{\umn_{\theta}}{\pchars_{\theta}}\), \(\theta \in \Theta\) be a family of IM on \(\pspX\), indexed by the parameters \(\theta\) running through the set \(\Theta\).
Define their union (denoted as \(\vee\) as follows):
\[\IM=\bigvee_{\theta\in\Theta}\IM_{\theta}\Leftrightarrow \umn f=\sup_{\theta \in \Theta} \umn_{\theta} f, \quad \forall f \in \bigcap_{\theta} \bchars_\theta\]
Here, the average \(\umn f\), obtained by maximizing consistent \(\umn_{\theta} f\) on \(\theta\), is easy to determine, as illustrated in fig.~\ref{fig:1.6}, where the union is hatched and corresponds to the convex hull of \(\IM_1\) and \(\IM_2\).
As can be see from figure~\ref{fig:1.6}, in the combination new facets (primary traits) appear: the marked dashed lines do not match the facets of \(\IM_1,\IM_2\).
In this case, the facets of the union do not go beyond the linear hull of the facets of the components, i.e., speaking strictly, when \(\IM_1=\IMfunc{\pumn_1}{\pchars_1}\) and \(\IM_2=\IMfunc{\pumn_2}{\pchars_2}\), the primary traits of the combination will belong to \(\slhull(\pchars_1\cup\pchars_2)\).
The combination \(\IM_1 \vee \IM_2\) is a symbolic representation of the phrase “The true (correct representation of the phenomenon) model is in \(\IM_1\) or \(\IM_2\)”.
This is a matter in which the uncertainty leads to the expansion of the IM.
\begin{example}[Representation of the union of vertices]\label{ex:1.8}
Let the output space \(\pspX\) be discrete.
Each IM with finite dimension can be determined by specifying its vertices \(\pr_i\)-probability vectors.
In Fig.~\ref{fig:1.6} \(\IM_1\) is the convex hull of \(\pr_1\), \(\pr_2\), \(\pr_3\), \(\pr_4\).
When combining two IM, their vertices are selected from the sets of vertices of \(\IM_1\) and \(\IM_2\).
No other information can appear.
For an IM of any dimension, when \(\IM_1=\bigvee_{\theta}\pr_{1\theta}\) there is a family of vectors whose hull determines \(\IM_1\) (or its boundary), and similarly \(\IM_2=\bigvee_{\vartheta}\pr_{2\vartheta}\).
Then \(\IM_1 \vee \IM_2=\bigvee_{\theta}\pr_{1\theta}\bigvee_{\vartheta}\pr_{2\vartheta}\).
In other words, the combination is performed in these families.
And it suffices to limit yourself to the extreme points of those families (those that are not included in the convex hull of the rest).
\end{example}
\subsection{Properties of the operations}
\begin{enumerate} % TODO: property names in \scshape?
\item
Idempotency: \(\IM \wedge \IM=\IM\), \(\IM \vee \IM=\IM\).
\item
Commutativity: \(\IM_1 \wedge \IM_2=\IM_2 \wedge \IM_1\), \(\IM_1 \vee \IM_2=\IM_2 \vee \IM_1\).
\item
Associativity:
\begin{gather*}
(\IM_1 \wedge \IM_2)\wedge \IM_3=\IM_1\wedge(\IM_2 \wedge \IM_3),
\\
(\IM_1 \vee \IM_2)\vee \IM_3=\IM_1\vee(\IM_2 \vee \IM_3).
\end{gather*}
\item
\(\IM \wedge \probs=\IM\), \(\IM \vee \probs=\probs\).
\item
\(\IM \wedge \emptyset=\emptyset\), \(\IM \vee \emptyset=\IM\).
\end{enumerate}
These properties are proved elementary and generalised to any number of operations.
It is also obvious that
\[
\IM_1 \subset \IM_2
\Rightarrow
\IM_1 \wedge \IM_2=\IM_1, \IM_1 \vee \IM_2=\IM_2
\]
From this we immediately obtain the well-known property of being an algebra.
\begin{enumerate}[resume*]
\item Law of absorption: % TODO: property name in \scshape?
\begin{align*}
(\IM_1 \wedge \IM_2)\vee \IM_1=\IM_1,
&&
(\IM_1 \vee \IM_2)\wedge \IM_1=\IM_1
\end{align*}
(use that \(\IM_1 \wedge \IM_2 \subset \IM_1, \IM_1 \vee \IM_2 \supset \IM_1\)).
This is a replacement for the usual distributivity property for Boolean algebras, that is not satisfied by our operations.
The reason is that the union \(\vee\) of IM is not the usual well-known combination, since the result should be a IM again, which is convex by nature.
It results the inability to determine the complement of IM (or the opposite of IM), since, as a convex body, its complement will not be convex.
\end{enumerate}
\begin{addendum}
\item
For discrete spaces \(\pspX\), formula~\eqref{eq:1.2} follows from the usual duality formula in linear programming, with constraints
\[
\sum_{i=1}^{r} p_i=1, \sum_{i=1}^{r} g_{ji} p_i \leq \pumn g_j, j=1,\dots,k
\]
the maximum of the linear function being found in \(\umn f=\max_{\pr} \sum_{i=1}^r f_i p_i\).
\item
Operations on models can also be defined even if they are set on different outcome spaces, \(\IM_1\) on \(\pspX_1\), \(\IM_2\) on \(\pspX_2\).
Then they are combined on \(\pspX=\pspX_1 \cup \pspX_2\), where \(\IM_1\) is supplemented with the primary value \(\pr_1(\pspX_1)=1\) and \(\IM_2\) with \(\pr_2(\pspX_2)=1\).
As a result, \(\IM_1\) and \(\IM_2\) are reduced to one set of outcomes.
\item
It is said that a phenomenon with set of outcomes \(\pspX\) is described by a set of models \(\IM_\theta, \theta\in\Theta\) when the model is a combination \(\bigvee_\theta\), \(\IM_\theta\).
\item
The representation of IM as a union of vertices given in example~\ref{ex:1.8} is not universal because it is impossible for arbitrary spaces of outcomes \(\pspX\) to separate the “atoms” \(\hyperplane\) of the model.
\end{addendum}
\section{Interval probability distributions}\label{sec:1.4}
\subsection{Properties of interval probabilities}
Historically, the description of random phenomena was based on probabilities, which we owe to the clarity of game examples (coin, cards, dice), and that gave probability theory an initial push and fueled it throughout its development.
This very same clarity, and the subsequent refinement and harmony of the theory led to the ignoring of other approaches \footnote{
In [2], the theory is based on exact averages, but these are endowed with such rigid properties that it completely reduces the resulting models to probabilistic case. % TODO: cite ref 2
}
Our interval models in their definition are based on interval averages, and probabilities -- lower \(\lpr(A)\) and upper \(\upr(A)\)-- are a special case where the traits are indicator functions \(A(x)\) of events: \(\lpr(A)=\lmn A(x)\); \(\upr(A)=\umn A(x)\), \(A \subset \pspX\).
For any IM the probabilities \(\lpr(A)\), \(\upr(A)\) are defined \(\forall A\) (because \(A(x)\in \bchars_{00}\)).
The properties of probabilities follow directly from the consistency of the averages (see section 1.1).
Using the plus sign \(A+B\) and the symbol \(\sum_i A_i\) for the union of disjoint events (in contrast with the general notation foe the association of events \(\cup\)), we have:
\begin{enumerate}[series=Pprop]
\item\label{item:Pprop1}
\(\lpr(\pspX)=\upr(\pspX)=\pr(\pspX)=1\) — the space \(\pspX\) is always a valid event \(\umn 1=\pr \pspX=1\).
\item\label{item:Pprop2}
Conjugacy: \(\lpr(A)=1-\upr(A^c)\), where \(A^c\) is the complementary of event \(A\), because \(\lmn A (x)=1-\umn A^c(x)\).
\item\label{item:Pprop3}
Upper semiadditivity: \(AB=\emptyset\Rightarrow \upr(A+B)\leq \upr(A)+\upr(B)\) (because \(\umn(A(x)+B(x))\leq \umn A(x)+\umn B(x)\)).
\item\label{item:Pprop4}
Lower semiadditivity: \(AB=\emptyset\Rightarrow \lpr(A+B)\geq \lpr(A)+\lpr(B)\) (because \(\lmn(A(x)+B(x))\leq \lmn A(x)+\lmn B(x)\)).
\end{enumerate}
Probabilities satisfying these properties are called \emph{consistent}.
The meaning of the term “consistency” is that if one of these properties is not fulfilled, then one of these bounds can be refined at the expense of others, i.e., \(\lpr(A)\) can be increased or \(\upr(A)\) can be decreased.
The following properties follow directly from consistency:
\begin{enumerate}[resume*=Pprop]
\item\label{item:Pprop5}
\(\pr(\emptyset)=0\) — the probability of an empty event is zero.
\item\label{item:Pprop6}
\(\lpr(A)\leq\upr(A)\) — the lower probability cannot exceed the upper probability.
\item\label{item:Pprop7}
\(AB=\emptyset\Rightarrow \lpr(A+B)\leq \lpr(A)+\upr(B)\leq \upr(A+B)\) (a particular case of the corresponding property for averages, that can be proved using properties 1--4).
\item\label{item:Pprop8}
For a finite number of pairwise disjoint events, \(\upr(\sum_{i=1}^{k} A_i)\leq \sum_{i=1}^{k} \upr(A_i)\) (this follows applying induction on property 3).
\item\label{item:Pprop9}
For a finite or countable number of pairwise disjoint events, \(\lpr(\sum_i A_i)\geq \sum_i \lpr(A_i)\).
For a finite number of \(A_i\) this follows directly from 4, and in the countable case from
\begin{align*}
\lpr\left(\sum_{1}^{\infty} A_i\right)
&= \lpr\left(\sum_1^k A_i + \sum_{k+1}^{\infty} A_i\right)\\
&\geq \lpr\left(\sum_1^k A_i\right)+\lpr\left(\sum_{k+1}^{\infty} A_i\right)
\geq \sum_1^k \lpr(A_i),
\end{align*}
taking the limit in the right-hand side as \(k\rightarrow +\infty\).
\end{enumerate}
Property~\ref{item:Pprop8} extends~\ref{item:Pprop3} and is called \emph{finite upper semi-additivity} of probabilities.
Property~\ref{item:Pprop9} generalises~\ref{item:Pprop4} not only to finite, but to countable sums, and is called \emph{lower countable semi-additivity}.
This is a stronger property, so by nature lower probabilities are “more continuous” than upper probabilities.
For upper probabilities, countable upper semi-additivity holds only under the additional condition that is on the left-hand side of the following statement:
\begin{enumerate}[resume*=Pprop]
\item\label{item:Pprop10}
\(
\lim_{k\rightarrow \infty} \upr(\sum_{k+1}^{\infty} A_i) = 0
\Rightarrow
\upr(\sum_1^{\infty}A_i)\leq\sum_1^{\infty} \upr(A_i).
\)
To prove it, we need to take the limit as \(k\rightarrow\infty\) in the inequality
\begin{equation*}
\upr\left(\sum_1^k A_i + \sum_{k+1}^{\infty} A_i\right)\geq \upr(A_i)+\sum_1^k\upr\left(\sum_{k+1}^{\infty} A_i\right).
\end{equation*}
\end{enumerate}
Let \(B_n\) be a monotone increasing sequence converging to \(B\).
This is written as \(B_n\uparrow B\) and means that \(B_n\subset B_{n+1} \forall n\), and that for each point \(x\in B\) there is some \(n\) such that \(B_n\) includes \(x\).
Nevertheless, even such strict convergence requirements are not enough to guarantee the convergence of their probabilities, which is proclaimed by the following thesis:
\begin{enumerate}[resume*=Pprop]
\item\label{item:Pprop11}
Probabilities are not in general continuous with respect to the monotone convergence of events.
This negative property means that given \(\lpr(B_n)\) and \(\upr(B_n\)) and knowing that \(B_n\uparrow B\), \(\lpr(B)\) and \(\upr(B)\) can be different from the limits \(\lim_{n\rightarrow \infty}\lpr(B_n)\) and \(\lim_{n\rightarrow \infty}\upr(B_n)\) without violating the properties of consistency of probabilities.
Let us show this with an example.
\end{enumerate}
\begin{example}\label{ex:1.9}
Let \(B_n\uparrow B\), \(B_n\neq B\) and make the preliminary assessments \(\upr(B_n)=p_1\), \(\upr(B)=p_2>p_1\).
Then \(\lim_{n\rightarrow \infty} \upr(B_n)=p_10\).
\end{example}
\subsection{Extension of primary probabilities} The \emph{interval probability distribution} (abbreviated IPT) of an IM are the events for which there are primary (exact or interval) probabilities.
The primary traits of an IPT are the indicators functions \(A(x)\) of events \(A\) in the primary set \(\events\), and they are given probabilities in the form of exact values \(\pr(A)\), intervals \(\plpr(A)\), \(\pupr(A)\) or one of the borders, usually the upper one. % TODO: Quique thinks there is a typo in the book in the interval specification (lower bound)
If there is no lower one, you can always put it equal to 0, and if the upper one is not specified you can put \(\pupr(A)\) equal to 1.
This does not change anything except that all events from \(\events\) will have primary interval probabilities, allowing us to use the notation \(\IPT{\plpr(\events),\pupr(\events)}\).
If all the primary probabilities are just the upper ones, then the IPT becomes \(\IPT{\pupr(\events)}\).
Let us consider how to extend the primary probabilities so as to compute the average of any trait.
Since the primary traits of an IPT are events from \(\events\), the secondary ones will the their linear combinations: \(\mifs=\{g(x)=c+\sum c_i A_i(x), A_i\in\events\}\) (the secondary traits are also \(\events\)-measurable functions), where \(c\) and \(c_i\) are arbitrary coefficients and the primary values are transferred to the secondary values by means of~\eqref{eq:1.1}:
\begin{equation*}
\pumn g=c+\sum \pumn c_i A_i(x),
\text{ where }
\pumn c_i A_i(x)=
\begin{cases}
c_i \pupr(A_i) &\text{ if } c_i>0,\\
c_i \plpr(A_i) &\text{ if } c_i<0.
\end{cases}
\end{equation*}
This is the first step, that is not unambiguous, because the same \(g\) could be written in different ways using the \(A_i\) (then we take the minimum \(\pumn g\) over all possible expressions).
Next step is the extension of these averages according to~\eqref{eq:1.2} to all bounded functions \(f(x)\).
But this will only be possible if the primary ones are not contradictory: \(g\geq 0 \Rightarrow \pumn g\geq 0\), \(\forall g\in\mifs\).
Then the formula
\begin{equation*}
\umn f=\inf_{f(x)\leq g(x)\in\mifs} \pumn g, \quad \forall f\in\bchars_0
\end{equation*}
gives consistent values of averages for any bounded above traits.
The class \(\bchars_0\) includes the \(\events\)-dominated traits and therefore constitutes the natural area of extension of the IPT averages: \(\bchars_\events=\bchars_0\). % TODO: Quique not sure about last part of this sentence
In particular, the upper probabilities \(\upr(B)\) will be determined (and through them the lower, by property~\ref{item:Pprop2}) for all events \(B\subset\pspX\); another thing is that they may be trivial, i.e., equal to 1, for many events.
The interest to extend the averages to unbounded traits (such as \(x,x^k, \tan x\), etc) is analogous to the mathematical expectations and justifies the third step, that we proceed to.
\subsection{Limits of the extension of the average}
We act strictly by analogy with integration, keeping in mind that integration by a probability measure leads to mathematical expectations.
In the theory of integration:
\begin{enumerate*}[label=\alph*)]
\item
primary measures are given for events;
\item
their values are assigned to the integrals of indicators of events;
\item
these integrals are extended by additivity to integrals of simple functions;
\item
then we extend these to integrals of simple measurable functions;
\item
the latter, finally, are extended to unbounded functions.
\end{enumerate*}
The last step is applied to IM and is the subject of our next focus.
Let us truncate the unbounded function \(f\) below by \(H_1\) and above by \(H_2\), determining
\begin{equation*}
f^{(-H_1,H_2)}(x)=
\begin{cases}
-H_1 &\text{ if } f(x)<-H_1,\\
f(x) &\text{ if } -H_1\leq f(x)\leq H_2,\\
H_2 &\text{ if } f(x)>H_2.
\end{cases}
\end{equation*}
The functions \(f^{(-H_1,H_2)}(x)\) are always bounded and therefore their average is always defined.
Let us define for an unbounded function
\begin{equation}\label{eq:1.4}
\umn f
= \lim_{H_1\rightarrow \infty}
\lim_{H_2\rightarrow \infty} \umn f^{(-H_1,H_2)}.
\end{equation}
In fact, this is how an unbounded function is always intuitively understood as the limit of its truncated version when the truncation levels converge to infinity; and this is how the average should be understood naturally.
It is important that \(H_2\) converges to \(\infty\) first, since this allows us to obtain the highest value on the right-hand side using monotonicity, and then to take the limit on \(H_1\).
For the limits in~\eqref{eq:1.4}, all the axioms are satisfied.
Out of these, \ref{item:A1}--\ref{item:A3} are obtained using the inequalities:
\begin{enumerate}[label=\arabic*)]
\item
\(f\geq g\Rightarrow f^{(-H_1,H_2)}\geq g^{(-H_1,H_2)}\).
\item
\(b^{+}f^{(-H_1,H_2)}+c=(b^{+}f+c)^{(-b^+ H_1+c; b^+ H_2+c)}\);
\item
\((f+g)^{(-H_1,H_2)}\leq f^{(-H_1/2,H_2/2)}+ g^{(-H_1/2,H_2/2)}\),
\end{enumerate}
in which you need to take \(\umn\) and go to the limits, first \(H_2\rightarrow \infty\), and then \(H_1\rightarrow \infty\).
Axiom~\ref{item:A4} will be true by definition.
And, since \((-f)^{(-H_1,H_2)}=-f^{(-H_2,H_1)}, \lmn f=\lim_{H_2\rightarrow \infty} \lim_{H_1\rightarrow \infty} \lmn f^{(-H_1,H_2)}\).
In the future, we will mainly deal with the upper averages, leaving the lower ones “behind the scenes” of the conversion formula (axiom~\ref{item:A4}).
For a given \(\IM\) denote \(\bchars_{\infty}\) the class of all traits for which the limit in~\eqref{eq:1.4} is not equal to \(\infty\): \(\bchars_{\infty}=\{f:\umn f <\infty\}\).
Let's call this class the \emph{limiting region of existence of the upper averages} IPT, corresponding to the set of averages \(\IMfunc{\umn}{\bchars_{\infty}}\) -- the limit \(\IM_{\infty}\).
The meaning of the extension~\eqref{eq:1.4} is very natural: unlimited traits are understood as having a ceiling that is immeasurably high (similarly as our understanding of cosmic vastness).
The average is computed in the foreseeable range \(-H_1,H_2\), with \(H_i\) increasing more and more and approximating the limiting value.
\begin{smallpar}
The same point of view could be held for any IM, considering all unbounded traits included in \(\bchars\) as having some common ceiling that is so unattainably high that is simply convenient to ignore it in our actions, operating with the “lower” parts of the traits.
With this interpretation, the limit of the IM \(\IM_{\infty}\) is as “sensible” as the one based on primary information and the extension formula.
\end{smallpar}
\textsc{Comments.} % TODO: environment?
\begin{enumerate}
\item
Taking the limit in~\eqref{eq:1.4} mathematically implies the continuity of the right-hand side at \(H_1,H_2 \rightarrow \infty\), and this is an additional property of the averages which does not follow from the ones we proved earlier.
If you do not impose this property, you would take \(\umn f\) different from the right-hand side value in~\eqref{eq:1.4}, so from the formal point of view the limit model \(\IM_{\infty}\) would be a narrowing from \(\IM \supset \IM_{\infty}\) by means of rule corresponding to~\eqref{eq:1.4}.
\item
The idea of taking the limit in the extension of the domain of existence also applies to the general class of IM \(\IMfunc{\pumn}{\pchars}\).
First of all, the averages \(\pchars\) are determined according to the extension formula~\eqref{eq:1.2} to the region \(\bchars_\pchars\) of traits majored by \(\pchars\), and in particular, to the bounded traits \(\bchars_{00}\subset\bchars_{\pchars}\) (if all \(g\in\pchars\) are bounded, then \(\pchars_\bchars=\pchars_0\)).
Then, taking the limit, ~\eqref{eq:1.4} allows us to go from \(\bchars_{00}\) to \(\bchars_{\infty}\).
The linear span \(\lhull^+(\bchars_{\pchars}\cup\bchars_{\infty})\) will become then the extended domain of existence for the averages, the transition to which is equivalent to narrowing the model \(\IMfunc{\pumn}{\pchars}\) to the limit \(\IMfunc{\pumn}{\pchars}_{\infty}\).
\end{enumerate}
\subsection{Illustration of IPT}
Clearly, for discrete spaces \(\pspX\) IPT are represented as polyhedra of probability vectors \(P\), with faces that are parallel to the axes \(P(x_i)=0\), as can be seen from Figure~\ref{fig:1.7}, where the triangle \(\probs\) corresponds to the vacuous IM from Fig~\ref{fig:1.4}.
Here we can see a trapezoid \(\IM_2\) determined by two primary probabilities: \(\plpr_2(x_1)\) on the left side and \(\pupr_2(x_2)\) on the right side, i.e., with dimension 2, whereas \(\IM_1\) has six primary faces and is therefore of dimension six.
The intersection \(\IM_1\vee \IM_2\) has primary probabilities that are obtained by putting together the primary probabilities of \(\IM_1\) and \(\IM_2\).
Some of them will be redundant, because they will lie outside the edges of \(\IM_1 \vee \IM_2\), such as \(\pupr_2(x_2)\) and \(\plpr_1(x_1)\) in Fig.~\ref{fig:1.7}.
\begin{figure}[t]
\centering
\begin{tikzpicture}[scale=3.2,rotate around x=30,rotate around y=-45,rotate around z=0]
\draw (0,0,1) -- node[above=0em,sloped] {\tiny \(P(x_1)=0\)} (0,1,0) -- node[above=0em,sloped] {\tiny \(P(x_3)=0\)} (1,0,0) -- node[below=0em] {\tiny \(P(x_2)=0\)} (0,0,1);
\draw (.15,0.10,1.1) node {\tiny \(P(x_3)=1\)};
\draw (0.5,1.3,-0.1) node {\tiny \(P(x_2)=1\)};
\draw (1.1,-0.05,-0.05) node {\tiny \(P(x_1)=1\)};
%%%
\draw (0.4,0.60,0) -- (0.30,0.60,.1) -- (0.3,0.43,0.27);
\draw (0.3,0.37,0.33) -- (0.30,0,0.70) -- (1,0,0) -- (0.35,0.65,0);
%%%
\draw (0.30,0.12,0.58) -- (0.58,0.12,0.30) -- node[below=-0.15em,sloped,scale=0.9] {\tiny \(\tilde{P}_1(x_1)\)} (0.58,0.27,0.15) -- (0.35,0.50,0.15) -- (0.15,0.50,0.35) -- node[above=-0.15em,sloped,scale=0.9] {\tiny \(\undertilde{P}_1(x_1)\)} (0.15,0.27,0.58) -- (0.30,0.12,0.58);
%%%
\draw (0.43,0.27,0.3) node[scale=0.9] {\tiny \(\mathcal{M}_1\wedge \mathcal{M}_2\)};
\draw[fill=white] (0.3,0.4,0.3) node[scale=0.9] {\tiny \(\mathcal{M}_1\)};
\draw (0.8,0.1,0.1) node[scale=0.9] {\tiny \(\mathcal{M}_2\)};
%%%
\draw[dashed] (0.15,0.5,0.35) -- (0.3,0.60,0.1);
\draw[dashed] (0.15,0.27,0.58) -- (0.30,0,0.70);
%%%
\draw (0.14,0.720,.14) node[scale=0.8](p2) {\tiny \(\undertilde{P}_2(x_1)\)};
\draw[line width=0.5] (p2) -- (0.3,0.55,0.15);
\end{tikzpicture}
\caption{Operations on the IPT.}
\label{fig:1.7}
\end{figure}
The combination of the two IPTs, outlined in Fig.~\ref{fig:1.7}, forms new contour faces that are not parallel to the faces of \(\probs\); in other words, they no longer correspond to probabilities, which leads us to more general IM.
Thus, the class of all IPT is not closed with respect to the join operation.
The vacuous IPT corresponds to the abscence of non-trivial probabilities, and is the same as the vacuous IM.
Let us consider next some special cases of IPT.
\subsection{Finitely additive IPT}
An interval model for which the primary system is made of non-overlapping events is called an \emph{finitely additive interval probability distribution} (in short, \(\Sigma\)-IPT).
Denote \(\events_{\Sigma}=\{A_1,A_2,\dots\}\), \(A_iA_j=\emptyset\), \(i\neq j\) a set of pairwise disjoint events, and let \(\plpr(A_j),\pupr(A_j), j=1,2,\dots\).
Although it is not required that the union of \(A_j\) gives the entire space \(\pspX\), it is convenient to consider this, adding a residual event \(A_0=(\sum_j A_j)^c\) to the original set if necessary (provided it is not empty) and assigning to it a primary trivial probability interval \(\plpr(A_0)=0\), \(\pupr(A_0)=1\).
Then \(\events_{\Sigma}\) becomes a partition of the space \(\pspX\), which will be needed later.
The consistency of primary probabilities is equivalent to the fulfillment of the inequalities \(0\leq\plpr(A_j)\leq\pupr(A_j)\), \(\plpr(\pspX)=\sum_j \plpr(A_j)\leq 1\), \(\pupr(\pspX)=\sum_j \pupr(A_j)\geq 1\).
The last condition is only required if the partition \(\events_{\Sigma}\) is finite.
In the case of countable partitions this is not necessary, because despite the increase of \(\sum_1^k A_j\), as \(k\) increases, we will always have room for a residual event \(A_0\), for which we set \(\pupr(A_0)=1\).
Hence, \(\pupr(\pspX)\geq 1\), where the tilde is used to denote the probability transferred on \(\pspX\).
The secondary features will be all possible finite linear combinations \(g(x)=c+\sum_j c_j A_j(x)\).
They form a linear class of functions \(\mifs_{\Sigma}\).
In particular, it includes the so-called \emph{secondary events} \(A_{J_k}\): these are the ones expressed as unions of events \(A_j\): \(A_{J_k}=\sum{j\in J_k} A_j\), where \(J_k\) is the final index set.
For them \(\plpr(A_{J_k})=\sum_{j\in J_k} \plpr(A_j)\), \(\pupr(A_{J_k})=\sum_{j\in J_k} \pupr(A_j)\) are the primary probabilities carried over by additivity (hence the name of additive IPT).
We have already mentioned the difference between properties~\ref{item:Pprop8} — upper and~\ref{item:Pprop9} — lower probabilities.
This difference is also reflected in the extended probabilities, the one with the “best” properties being the lower probability.
This is manifested in the fact that \(\plpr(A_J)\) can be extended by additivity to sums of countable sets of indices \(J\) and this does not affect the IPT.
Indeed, for \(A_J\) such that finite subsets \(J_k\subset J\) form the “basis” of \(A_J\), so that \(A_{J_k}\subset A_J\) and \(\plpr(A_{J_k})\leq \plpr(A_J)\) \(\forall k\), we can take the maximum value of the left-hand side in this last inequality taking the limit in \(k, J_k \uparrow J\), leading to the formula of countable additivity: \(\plpr(A_J)=\sum_{j\in J}\plpr(A_j)\) for countable \(J\).
In particular, if the number of events in \(\events_{\Sigma}\) is countable, then the complementary \(J_k^c\) of any finite set of indices \(J_k\) will be countable, and as a consequence
\begin{equation*}
\plpr(A_{J_k}^c) = \sum_{j\in J_k^c} \plpr(A_j).
\end{equation*}
The formula of countable additivity does not hold for upper probabilities, because for an event \(A_J\) with countable \(J\) it is impossible to build an upper bound with a finite number of primary events.
\begin{theorem}\label{thm:1.2}
If the primary interval probabilities are set on a partition \(\events_{\Sigma}\) of \(\pspX\), then by
\begin{equation}\label{eq:1.5}
\upr(A_{J_k})=\min\{\pupr(A_{J_k}), 1-\plpr(A_{J_k}^c)\}
\end{equation}
we obtain consistent assessments on secondary events.
The extension to all secondary traits is performed by the formulas
\begin{equation}\label{eq:1.6}
\umn({\textstyle\sum} c_j A_j)=\min_c \left[c+\sum_{c_j>c} (c_j-c)\pupr(A_j)-\sum_{c_jc^*} \pupr(A_j)+\sum_{c_jc} c_j \pupr(A_j)+c^* P^*+ \sum_{c_jc^*} \pupr(A_j)-\sum_{c_jc^*\), where \(c^*\) is chosen so that \(\pr^*\) is a probability vector, i.e., \(\sum_j \pr^*(A_j)=1\), and also taking into account the components \(\pr_i\) of the vector corresponding to the indices \(i\) for which we have the equality \(c_i=c^*\).
Our next extension of the averages is to all traits that are bounded above, which together will form the natural area of existence of the averages for the \(\Sigma-\)IPT, and that is obtained by the well-known formula
\begin{equation*}
\umn f=\inf_{g: f(x)\leq g(x)\in\lhull\events_{\Sigma}} \umn g, \quad f\in\bchars_0.
\end{equation*}
\subsection{Countably additive IPT}
We deliberately refrained for a long time from extending \(\pupr(A_J)\), \(A_J=\sum_j A_j\) to countable \(J\) by the property of summability of series, because this step would immediately lead us out of the accepted axiomatics.
This step is competence of the primary set, to which we turn our attention now.
Let \(\events_{\Sigma}=\{A_1,A_2,\dots\}\) be a countable partition of \(\pspX\).
We have seen that if the primary values are \(\plpr(A_j)\), \(\pupr(A_j)\) then we can extend the lower ones by additivity to \(\plpr(A_J\), including for countable \(J\), while the upper \(\pupr(A_{J_k})\) are only given for finite~\(J_k\).
These are the basis for the probability calculations for \(\Sigma\)-IPT, by means of~\eqref{eq:1.5}.
We will now assume that the primary events are not only \(A_j\) (and hence \(A_{J_k}\)) but also their countable unions \(A_J\).
This forms a system \(\events_\sigma\) of primary events, whose primary probabilities are immediately additive in the sense that \(\pupr(A_J)=\sum_j \pupr(A_j), \plpr(A_J)=\sum_j \plpr(A_j)\) for any countable \(J\) (addendum~\ref{add:1.4.3} shows that this requirement is not mandatory, in the sense that there may be probabilities that are not countably additive on \(\events_{\sigma}\)).
\pagebreak %41
An interval probability distribution, whose primary system \(\events_\sigma\) is made by countable unions of disjoint events and whose primary (lower or upper) probabilities are countably additive, is called \emph{countably additive} (in short, \(\sigma\)-IRV) and is denoted by \(\IPT{\plpr(\events_\sigma),\pupr(\events_\sigma)}\).
For the intervals \(\plpr(A_j)\),\(\pupr(A_j)\), \(j=1,2,\dots\), due to the expansion of the primary set, the countably additive IPT is narrower than the \(\Sigma\)-IPT:
\begin{equation*}
\IPT{\plpr(\events_{\sigma}),\pupr(\events_{\sigma})}
\subset
\IPT{\plpr(\events_{\Sigma}),\pupr(\events_{\Sigma})}.
\end{equation*}
Moreover, in general \(\sigma\)-IPT do not belong to the class \(\sum\)-IPT, because \(\events_{\sigma}\) contains countable unions of events, whereas \(\Sigma\)-IPT are defined by separate instances of events and their probabilities.
Thus, the property of countable additivity is equivalent to the actual expansion of the primary set of events and the imposition of additional requirements on the primary probabilities.
For countably additive IPT, equation~\eqref{eq:1.5} matching probabilities on \(\events_\sigma\) (these are all primary now) holds for every \(A_J\in\events_\sigma\).
Similarly, formula~\eqref{eq:1.6} gives the value for secondary traits, that are any countable sums \(\sum_1^{\infty} c_j A_j\), with \(A_j\in\events_\sigma\).
Next example compares finite and countably additive IPT.
\begin{example}\label{ex:1.10}
Let \(\pspX=\{0,1,2,\dots\}\) be the set of natural numbers and consider the primary probabilities \(\plpr_0,\pupr_0,\plpr_1,\pupr_1\) such that \(\sum_0^{\infty} \plpr_i\leq 1\).
We obtain a \(\sum\)-IPT (actually for arbitrary spaces \(\pspX\) with \(\events_{\Sigma}=\{A_1,A_2,\dots\}\) we obtain the same IPT by making the correspondence \(A_j\rightarrow j, j=0,1,\dots\)).
For a finite set \(J\), we obtain the probabilities by means of~\eqref{eq:1.5}.
Given the countable event \(A=\{0,2,4,\dots\}\) made by the even numbers, and its complementary \(A^c\) by the odd numbers, for a \(\Sigma\)-IPT
\begin{align*}
\plpr(A) = \sum_{j=0}^{\infty} \plpr_{2j},
&&
\plpr(A^c)=\sum_{j=0}^{\infty} \plpr_{2j+1},
\end{align*}
and consistent values are obtained from \(\lpr(A)=\plpr(A),\upr(A)=1-\plpr(A^c)\).
Note that \(\pupr_j\) is not used for determining the upper probability of countable events \(A\) due to the absence of finite covering systems of primary events.
For a countably additive IPT, an additional consistency condition is \(\sum_1^\infty \pupr_j\geq 1\).
For finite \(J_k\) the values are the same.
For the one considered above, we have
\(
\pupr(A) = \sum_{j=0}^{\infty}\pupr_{2j},
\pupr(A^c) = \sum_{j=0}^{\infty}\pupr_{2j+1},
\)
and other values will be agreed upon
\begin{align*}
\lpr(A)=\max\{\plpr(A),1-\pupr(A^c)\};
&&
\upr(A)=\min\{\pupr(A),1-\plpr(A^c)\},
\end{align*}
which corresponds to more accurate probabilities.
This increase in accuracy is achieved due to the fact that in the primary set we have not only the natural numbers but also countable unions of them.
\pagebreak %42
If we give initial probabilities \(\plpr_j=\pupr_j=\pr_j\), \(j=0,1,\dots\) such that \(\sum_0^\infty \pr_j=1\) (consistency), it is easy to see that the probability of any event \(A\) will be precise: \(\lpr(A)=\upr(A)=\pr(A)\) both for finite and for countably additive IPT.
I.e., both types of IPT coincide and for any of them the probability of any countable set is precise and equal to the sum of the probabilities of its elements.
This is the law of countable additivity for exact probabilities, which is true for discrete (finite or countable) partitions \(\events_{\Sigma}\) of an arbitrary space \(\pspX\).
\end{example}
\subsection{Generalizations}
Here we will consider the case when \(\events_{\Sigma}\) is not discrete, i.e., finite or countable.
It is convenient for clarity to consider \(\pspX\) as the set of real numbers~\(\reals\).
A class of events \(\events\) is called a \emph{ring} and denoted \(\rings\) when it is closed with respect to the intersection operation and the symmetric difference (\(\Delta\)):
\(
A,B\in\rings
\Rightarrow
AB\in\rings,
A\Delta B=AB^c \cup A^cB \in\Delta\in\rings.
\)
If it is also closed with respect to countable unions, then it is called a \emph{\(\sigma\)-ring} and denoted \(\rings_{\sigma}\).
On the real numbers \(\reals\) the ring of sets formed by all possible finite disjoint unions (sums) of disjoint intervals is called the \emph{ring of intervals} \(\rings_{\Sigma}\) (for more information, see [20]). % TODO: cite ref 20
A \emph{measure} on the ring \(\rings\) is a non-negative finitely additive function, and on \(\rings_{\sigma}\) it is a countably additive function.
Note that the unions \(A_{J_k}=\sum_{j\in J_k} A_j\) of disjoint \(A_j\) form a ring (and if we consider countable unions, a \(\sigma\)-ring), and \(\plpr(A_J),\pupr(A_J)\) determine a measure (that we can extend without loss of generality to a countably additive measure).
Let us return to the general case and assume that the primary values are given by \(\plpr(A)\), \(\pupr(A)\), \(A\in\rings\) on some ring \(\rings\).
A finitely additive measure determines a \(\Sigma\)-IPT, a countably additive measure on a \(\sigma\)-ring determines a \(\sigma\)-IPT.
For instance, on \(\reals\) the lower and upper probabilities \(\plpr[a,b)\), \(\pupr[a,b)\) are defined for any interval and extended by additivity to finite unions of intervals.
In this case, the upper probability \(\pupr\) may be greater than 1, which is not a problem since it is later corrected by consistency.
In general, the values \(\upr(A)\) defined as primary probabilities on small sets are no longer probabilities, but measures on wide sets.
The consistency requirement on primary probabilities should be as follows:
\pagebreak %43
\begin{enumerate}[label=\alph*)]
\item \(0\leq\plpr(A)\leq\pupr(A) \ \forall A\in\rings\);
\item \(\plpr(A)\leq 1 \ \forall A\in\rings\);
\item \(\pupr(\pspX)\geq 1\).
\end{enumerate}
The last condition makes sense only when \(\pspX\) belongs to the primary events.
Otherwise it is not necessary, since by axiom~\ref{item:A1}, it will be \(\pr(\pspX)=1\).
The formula for the extension of the probabilities will be identical to~\eqref{eq:1.5}.
For arbitrary events, it gives
\begin{equation*}
\upr(B)
= \min\left\{\inf_{B\subset A \in\rings} \pupr(A), 1-\sup_{B^c \supset A \in \rings}\plpr(A)\right\}.
\end{equation*}
Note that if the upper probabilities are not specified, then we can assume \(\pupr(A)=1\) \(\forall A\in\rings\) and \(\lpr(A)=\plpr(A)\), \(\upr(A)=1-\plpr(A^c)\), \(A\in\rings\).
The lower probability is additive on unions: \(\lpr(\sum A_i)=\sum \lpr(A_i)\), but the upper one is not.
A trait \(f(x)\) is called \emph{\(\rings\)-measurable} when it is a uniformly convergent limit of finite linear combinations of events from \(rings\).
Strictly speaking, the class of all \(\rings\)-measurable traits are formed by closing the class \(\lhull\rings\) with respect to uniform convergence.
The term “measurable \(f\)” means that, being able to measure the probabilities of the events \(A\in\rings\), you can be as precise as you like when computing the integral of \(f\) in a finite number of steps.
If \(f\in\lhull\rings\), that is, if \(f\) is a finite linear combination of events from \(\rings\), then \(\umn f\) is given by ~\eqref{eq:1.6}.
If \(\pspX=\reals\), for functions \(f\) that are measurable with respect to the ring of intervals (these are the functions that are continuous on \(x\) or with discontinuities of the first kind), the sum in~\eqref{eq:1.6} is transformed into an integral:
\begin{equation}\label{eq:1.7}
\umn f=\min_c \left[c+\int (f(x)-c)^+ d\pupr-\int (c-f(x))^+ d\plpr\right],
\end{equation}
where the plus symbol means that the non-negative part of the function is taken.
For \(\Sigma\)-IPT (the primary class is the ring of intervals \(\rings_{\Sigma}\)), the integral is understood in the Riemann-Stieltjes sense; for \(\sigma\)-IPT it is understood in the Lebesgue-Stieltjes sense, and the class is extended to the class of Lebesgue-measurable traits.
\subsection{Precise probability distributions}
Assume on a primary system of events \(\events\) we are given precise probabilities: \(\plpr(A),\pupr(A)=\pr(A), A\in\events\).
Their consistency is equivalent to: 1) \(A,B\in\events, A\subset B \Rightarrow \pr(A)\leq\pr(B)\) 2) if \(A_j\in\events\) are pairwise disjoint events whose union is also included in \(\events\), then the law of additive probabilities holds:
\begin{equation*}
\sum A_j \in\events
\Rightarrow
\pr\left(\sum A_j\right)=\sum \pr(A_j).
\end{equation*}
The probability of the union of disjoint \(A_j\in\events\) will always be exact if the union is finite (this is a consequence of properties \ref{item:Pprop8}, \ref{item:Pprop9}), since the requirement of finite additivity of precise probabilities is equivalent to their consistency.
This is not the case for the requirement of countable additivity, that should be nonetheless imposed if we know that \(\sum_1^\infty A_j \in\events\), i.e., that the union has precise probability.
The general properties of exact probabilities follow from directly from the properties of interval probabilities (see the beginning of this section).
Let us continue with the numbering of the properties stated there, and take \(A_i\in\events\).
\begin{enumerate}[resume*=Pprop]
\item\label{item:Pprop12}
\(\pr(\pspX)=1,\pr(\emptyset)=0\).
\item\label{item:Pprop13}
\(\pr(A)=1-\pr(A^c)\).
\item\label{item:Pprop14}
\(\pr(\sum_{1}^{k} A_i)=\sum_{1}^{k} \pr(A_i)\).
\item\label{item:Pprop15}
\(\lim_{k\rightarrow \infty}\upr(\sum_{k+1}^{\infty} A_i)=0\Rightarrow \pr(\sum_1^{\infty} A_i)=\sum_1^{\infty} \pr(A_i)\).
\end{enumerate}
Denote \(\events_*\) the class of events where probabilities remain accurate when making the extension from \(\events\).
Obviously, \(\events_* \supset \events\).
Properties~\ref{item:Pprop12}, \ref{item:Pprop13} imply that \(\pspX,\emptyset \in \events_*\) and that \(A\in\events_* \Rightarrow A^c \in\events_*\).
Property~\ref{item:Pprop14} means that \(\events_*\) is closed with respect to finite unions of disjoint events.
The wider \(\events\) is in terms of the number of events and the closeness of the operations, the richer the set \(\events_*\) where probabilities are exact will be.
Let us move to the case where \(\events\) is closed with respect to intersections and differences, i.e., when it forms a ring of sets \(\events=\rings\) (you can also consider a semi-ring, such as the intervals of a numeric line).
By properties~\ref{item:Pprop12}, \ref{item:Pprop13}, \ref{item:Pprop14}, the set \(\events_*\) where probabilities are exact forms an algebra of events (an algebra is a ring that includes \(\pspX\), i.e., closed with respect to unions).
\emph{Finitely additive probability distributions} (denoted \(\distr_{\Sigma}\)) are given by primary exact probabilities on the algebra (ring, semi-ring) of events.
They are a particular case of \(\Sigma\)-IPT, where the primary probability intervals are exact values.
Replacing \(\lpr(A_i)=\pr(A_i), A_i \in\events_*\), we have by property~\ref{item:Pprop9} \(\lpr\left(\sum_1^{\infty} A_i\right)=\sum_1^{\infty} \pr(A_i)\).
In other words, the exact initial probabilities determine by additivity the lower probability of countable unions.
But the upper probabilities do not, except for the case when the remainders \(\sum_{k+1}^{\infty} A_i\) can be covered by events \(B_k\) whose probabilities as \(k\rightarrow \infty\) are arbitrarily small.
Then property~\ref{item:Pprop15} applies.
For illustration purposes, example~\ref{ex:1.11} is provided below.
The property of countable additivity, mechanically extended to countable unions, is equivalent to extending the primary set to a \emph{sigma-algebra} \(\events_{\sigma}\) that includes the total event and unions of events, and the model for a countable additive probability distribution \(\distr_{\sigma}\) on \(\events_{\sigma}\), which is a particular case of \(\sigma\)-IPT:
\begin{equation*}
A_i\in\events_{\sigma}
\Rightarrow
\sum_1^{\infty} A_i \in \events_{\sigma},
\pr_{\sigma}\left(\sum_1^{\infty} A_i\right)=\sum_1^{\infty} \pr_{\sigma}(A_i)
\end{equation*}
(the primary set can be \(\events_{\sigma}\) and the probabilities on it may be accurate but not countably additive, as we show in addendum~\ref{add:1.4.2}).
For \(\pr_{\sigma}\), the property of monotone convergence is satisfied: \(B_n \uparrow B \Rightarrow \pr_{\sigma}(B_n)\rightarrow \pr_{\sigma}(B)\), which is somewhat equivalent to the countable additivity of precise probabilities [19]. % TODO: cite ref 19
The average of \(\events^*\)-measurable traits is exact, and can be obtained using~\eqref{eq:1.7} as \(\mn f=\int f d\pr\), where for \(\distr_{\Sigma}\) these are Riemann-Stieltjes integrals and for \(\distr_{\sigma}\) Lebesgue-Stieltjes integrals (then the class of traits \(f\) with exact averages expands to those that are Lebesgue-measurable).
The extension of the average to unbounded traits is made by means of~\eqref{eq:1.4}:
\begin{align*}
\lmn f
= \lim_{H_2\rightarrow\infty} \lim_{H_1\rightarrow\infty} \mn f^{(-H_2,H_1)},
&&
\umn f
= \lim_{H_1\rightarrow\infty} \lim_{H_2\rightarrow\infty} \mn f^{(-H_1,H_2)}.
\end{align*}
% NOTE: Quique observed that the subindices in his original differ from this scanned version!
In this case, all truncated traits \(f^{(-H_1,H_2)}\) must be measurable.
Obviously, the average will be accurate \(\lmn f=\umn f\) if the limits on the right-hand side do not depend on the order in which \(H_1,H_2\) converge to infinity- this will be the integral of an unbounded function.
For exact averages, it holds that \(\mn \sum f_i=\sum \mn f_i\), so the symbol \(\mn\) can be carried for finite sums of traits.
This is a general property.
In which sense does \(\distr_{\Sigma}\) differ from \(\distr_{\sigma}\)?
The answer to this question is given in the following example:
\begin{example}[Uniform distribution (measure length)]\label{ex:1.11}
Let \(\pspX=[0,1)\) and consider the primary probabilities \(\pr[a,b)=b-a\), \(0\leq a**\sum_{x\in D_{\infty}} \pr(x)=0\).
This illustrates the lack of need for countable additivity.
\end{example}
\subsection{Cumulative distribution functions}
Let \(\pspX=\reals\), and let the primary events be nested left-unbounded half intervals \((-\infty,y], y \in\pspY\) where \(\pspY\) is an arbitrary subset of \(\reals\).
Primary probabilities
\begin{equation*}
\plpr(-\infty,y)=\pldf(y),
\quad
\pupr(-\infty,y)=\pudf(y),
\quad
y\in\pspY
\end{equation*}
as a function of variable \(y\) are called lower and upper primary distribution functions, and the extended IPT they determine is called the \emph{interval distribution function}.
The condition of consistency of primary probabilities results in the requirements
\begin{equation*}
0\leq \pldf(x)\leq\pudf(y)\leq 1
\quad \forall x\leq y, x,y\in\pspY,
\end{equation*}
meaning that the lower distribution function on \(y\) should never be above the upper \(\pudf(y)\) (this will hold if, as usual, you set \(\pldf(y)\) and \(\pudf(y)\) and require that the lower is not larger than the upper: \(\pldf(y)\leq \pudf(y)\), \(\forall y \in\pspY\).)
\pagebreak %47
The consistency of the initial probabilities and their extension to any interval \((-\infty,x), x\in\reals\) is carried out according to the formula
\begin{align}\label{eq:1.8}
\ldf(x)=\sup_{x\geq y\in\pspY} \pldf(y),
&&
\udf(x)=\inf_{x\leq y\in\pspY} \pudf(y).
\end{align}
The probabilities of half-intervals are extended to those of single segments:
\begin{align*}
\lpr(y,z)=[\ldf(z)-\udf(y)]^+;
&&
\upr(y,z)=\udf(z)-\ldf(y),
\end{align*}
where the plus sign indicates the non-negative part of the function.
Extending these formulas to finite sums of segments is made according to the following expressions:
\begin{align*}
\lpr\left(\sum_1^k [y_i,z_i)\right)
&= \sum_1^k \lpr[y_i,z_i);\\
\upr\left(\sum_1^k [y_i,z_i)\right)
&= \udf(z_k)-\ldf(y_1)-\sum_1^{k-1} [\udf(y_{i+1})-\ldf(z_i)]^+,
\end{align*}
in which it is assumed that \(y_1\leq z_1\leq y_2\leq z_2\leq\dots\leq y_k\leq z_k\).
For exact distribution functions we must have \(\df(x)=\ldf(x)=\udf(x)\ \forall x\) and the probabilities of the intervals are given by the exact increments: \(\pr[y,z)=\df(z)-\df(y)\).
The same applies to their finite sums, while the further extension on traits is made by means of Riemann-Stieltjes integration: \(\mn g=\int g(x) dF(x)\).
\subsection{Similar IPT} % TODO: check this title
Let us begin with special cases.
Consider an increasing set of events on a space \(\pspX\) by \(\events_{\uparrow}=\{A_y, y\in\pspY\subset \reals \}\), \(A_y\subset A_{y'}\) for \(yx_m\).
\pagebreak %49
\subsection{Relative probabilities and averages}
Truth is understood in comparison.
This philosophical saying also applies to random phenomena.
Sometimes the source data is comparative information about probabilities and averages.
These are judgements of the type “event \(A\) is more likely than event \(B\)”, expressed briefly “\(\pr A\geq \pr B\)” or “on average, trait \(f\) takes a greater value than trait \(g\)”, that can also be expressed as “\(\mn f\geq \mn g\)”.
This kind of information is called \emph{relative} (or subjective [14]) probabilities or averages, and is typical of expert systems. % TODO: cite ref 14
Let us show how they can be represented as families of probability distributions and how they can be transformed into average values of the corresponding traits.
Imagine for a moment that the probabilities of \(A\) and \(B\) are exact (which implies the statistical stability of the phenomenon) and correspond to any exact probability distribution \(\distr\) such that \(\pr(A)\geq\pr(B)\).
From the properties of the average, it then satisfies \(\mn_{\distr}(A-B)=\mn_{\distr}A-\mn_{\distr}B=\pr(A)-\pr(B)\geq 0\). % NOTE: Quique says: there is a correction here with respect to the printed version!
The phrase “\(\pr A\geq\pr B\)” corresponds to the family of all such distributions: \(\IM=\vee\distr\), and according to the rules for computing the averages in those cases, \(\umn(A-B)=\inf_{\distr} \mn_{\distr}(A-B)=0\).
The result may be written in terms of the upper average as \(\umn(B-A)=0\), and would be translation of the statement “\(\pr A\geq\pr B\)” as a primary average (see table~\eqref{tab:1.1}).
From “\(\pr A\geq\pr B\)” the extension of primary averages gives \(\lpr(B)\leq\lpr(A)\leq\upr(B), \ \ \upr(B)\leq\upr(A)\) (because \(\umn B-\umn A\leq 0=\umn(B-A)\leq\umn B-\lmn A\) and \(0=\umn(B_A)\geq\umn B-\umn A\)).
This may be interpreted as saying that the interval probability of event \(A\) overlaps in general with that of event \(B\), and is shifted towards larger values.
Different statements about the superiority of probabilities correspond to primary averages of the form \(\pumn(B(x)-A(x))=0\), for different \(A\) and \(B\) that are then extended to any traits according to the axiomatic of averages.
Interestingly, the rules of logical inference will also be correct here: “\(\pr A\geq\pr B\)” and “\(\pr B\geq\pr C\)” implies “\(\pr A\geq\pr C\)” (because \(\umn(C-B+B-A)\leq \umn(C-B)+\umn(B-A)\leq 0\)).
Let us move now to the average.
Exactly the same is shown, proposition “\(\mn f\geq \mn g\)” in stable conditions is equivalent to \(\lmn(f-g)=0\).
The conditions agreeing with this statement in term of the averages of the traits \(f\) and \(g\) are \(\lmn g\leq \lmn f \leq \umn g,\ \ \umn g\leq \umn f\).
\textsc{Comment}. % TODO: needs separate environment
The relation \(\lmn f \geq \umn g\), in which the intervals do not overlap, represents a a stronger statement “\(f\) is on average greater than \(g\) under any (unstable) condition”, corresponding to the case where the exact averages \(\mn f, \mn g\) do not exist and are considered only in the interval sense.
\begin{addendum}
\item\label{add:1.4.1} \textsc{Axiomatization of IPT.}
Interval probability distributions can be defined as sets of interval probabilities \(\lpr(A),\upr(A), \forall A\subset\pspX\), related through the following axioms [21]: % TODO: cite ref 21
\begin{enumerate*}[label=\arabic*)]
\item \(\lpr(\pspX)=\upr(\pspX)=1\);
\item \(\upr(A+B)\leq\upr(A)+\upr(B), AB=\emptyset\);
\item \(\lpr(A+B)\geq \lpr(A)+\lpr(B), AB=\emptyset\);
\item \(\lpr(A)=1-\upr(A^c)\) (other equivalent variants of the set of axioms are also possible).
\end{enumerate*}
Consistent probabilities in the sense of these axioms can be the result of the extension of primary probabilities, and a further extension towards the averages of traits using the consistency rules of IM leads to the interval averages \(\lmn f,\umn f\).
This path traditionally echoes the modern probabilistic approach, where the averages for measurable \(f\) are usually called mathematical expectations as a result of mathematical calculation \(\mn f\) by a exact probability distribution \(\distr_{\sigma}\).
The axiomatization of the theory based on probabilities leads to an open construction, since it outlines the class of models only by interval probability distributions, which together form only a very narrow class of IM.
\item\label{add:1.4.2} \textsc{Proof of theorem~\ref{thm:1.2}.}
Let us establish formula~\eqref{eq:1.5}.
Let \(A_J\) be majorized either by itself, i.e., by the sum \(g_1(x)=\sum_{j\in J} A_j(x)\), whence \(\pumn g_1=\pupr(A_J)\) or by the trait \(g_2(x)=1-\sum_{j\notin J} A_j(x)\), whence \(\pumn g_2=1-\plpr(A^c_J)\).
The minimum (most accurate) of \(\pumn g_1, \pumn g_2\) determines the value of \(\upr(A_J)\) according to~\eqref{eq:1.5}.
It is easy to verify that no other secondary trait that dominates \(A_J(x)\) results in a more precise value for this probability.
To prove~\eqref{eq:1.6}, we need to represent \(\IPT{\plpr(\events_{\Sigma}),\pupr(\events_{\Sigma})}\) as a family of \(\events_{\Sigma}\)-exact distributions \(\distr\) such that \(\plpr(A_j)\leq\pr(A_j)\leq\pupr(A_j)\) and make sure that the maximum in \(\distr\) under these restrictions gives
\begin{equation*}
\umn \sum c_j A_j(x)=\max_{\distr} \sum c_j \pr(A_j)=\sum c_j \pr^*(A_j)
\end{equation*}
which is attained by the probability distribution
\begin{equation*}
\pr^*(A_j) =
\begin{cases}
\pupr(A_j) &\text{ if } c_j>c^* \\
x\pupr(A_j)+(1-x) \plpr(A_j) &\text{ if } c_j=c^*,\\
\plpr(A_j) &\text{ if } c_j c^*\), and in order to comply with the normalization condition, the minimum values \(\plpr(A_j)\) are forced to be left at the \(A_j\) where \(g(x)\) is small, i.e., \(c_j \umn g\). Geometrically, the secant IM \(\IMfunc{\mn_*}{g}\) (for a finite possibility space \(\pspX\)) can be thought as a hyperplane of vectors of probabilities. In the case of \(\pspX=\{x_1,x_2,x_3\}\) the cross-section \(\IM_{\mn_*\pchars}\) is shown in Figure~\ref{fig:1.8} with a straight line.
\begin{figure}
\centering
\begin{tikzpicture}[scale=3.2,rotate around x=30,rotate around y=-45,rotate around z=0]
%Ideal scale=3
\draw (0,0,1) -- (0,1,0) -- (1,0,0) -- (0,0,1);
\draw[dashed,shorten >=-1cm,shorten <=-1.2cm] (0.6,0.4,0) -- (0,0.3,0.7) node[above=0em,sloped,pos=-0.25] {\scriptsize \(\langle Mg\rangle\)};
\draw (0.21,0.15,0.64) -- (0.7,0.15,0.15) -- (0.3,0.6,0.1) -- (0.15,0.5,0.35) -- (0.21,0.15,0.64);
\draw[line width=1.4pt,shorten <=.24cm] (0.54,0.39,0.07) -- (.21-0.06*11.1/21.6,0.15+0.35*11.1/21.6,.64-.29*11.1/21.6);
%%%
\draw (0.20,0.10,0.7) node {\scriptsize \({\bf P}_1^{*}\)};
\draw (0.77,0.10,0.13) node {\scriptsize \({\bf P}_2^{*}\)};
\draw (0.25,0.65,0.1) node {\scriptsize \({\bf P}_3^{*}\)};
\draw (0.10,0.55,0.35) node {\scriptsize \({\bf P}_4^{*}\)};
%%%
\draw (0.07,0.86,0.07) node {\scriptsize \(\mathcal{J}\)};
%%%
\draw (0.55,0.22,0.23) node {\scriptsize \(\mathcal{M}\)};
\draw (0.29,0.47,0.24) node(mg) {\scriptsize \(\mathcal{M}_{Mg}\)};
\draw (mg) -- (0.32,0.36,0.32);
\end{tikzpicture}
\caption{Cross-sections of the model.}
\label{fig:1.8}
\end{figure}
Let us consider how to determine the averages corresponding to the sections. Let \(\IM_{\mn_*\pchars}=\IM \wedge \IMfunc{\mn_*}{\pchars}\). The primary averages for \(\IM_{\mn_*\pchars}\) will be \(\umn h, h\in\bchars\) and \(\mn_*g, g \in \pchars\), so
\begin{equation*}
\umn_{\mn_*g}(f)=\inf_{h+cg \geq f, h \in \bchars} (\umn h+c\mn_* g),
\end{equation*}
where the infimum is searched for \(c\in\mathbb{R}\) and \(h\in\bchars\). At a given \(c\), the infimum is attained for \(h\) when \(h=f-cg\), and as a result
\begin{equation}\label{eq:1.9}
\umn_{\mn_*g}(f)=\min_c (\umn (f-cg) +c\mn_* g).
\end{equation}
\emph{Note.} If \(g\) does not belong to the domain of existence of \(\IM\), i.e., \(g\notin \bchars \cap (-\bchars)\), then for any trait \(f\in\bchars \cap (-\bchars)\) it holds that \(\umn_{\mn_*g}(f)=\umn f\) (because \(\umn (f-cg)=\infty\) for \(c\neq 0\)).
Similarly, if \(\pchars_k=\{g_1,\dots,g_k\}\) is a finite set of traits, then
\begin{equation}\label{eq:1.10}
\umn_{\mn_* \pchars_k}(f)=\min_c (\umn (f-\sum_1^k c_i g_i) +\sum_1^k c_i \mn_* g_i).
\end{equation}
For an arbitrary set \(\pchars\), using the equations \(\IM_{\mn_*\pchars}=\IM \wedge \IMfunc{\mn_*}{\pchars}=
\IM \wedge (\bigwedge_{\pchars_k \subset \pchars} \IMfunc{\mn_*}{\pchars_k})=\bigwedge_{\pchars_k \subset \pchars} (\IM \wedge \IMfunc{\mn_*}{\pchars_k})=\bigwedge_{\pchars_k \subset \pchars} \IM_{\mn_*\pchars_k}\), we obtain
\begin{equation*}
\umn_{\mn_* \pchars}(f)=\inf_{\pchars_k \subset \pchars} \umn_{\mn_* \pchars_k}(f)
\end{equation*}
where the infimum is taken for all finite subsets \(\pchars_k\) of \(\pchars\).
\subsection{Properties of sections} The cross-section has the usual properties of IM. In addition, the following properties are true:
\begin{enumerate}
\item \(\IM_{\mn_* \pchars}\subset \IM\).
\item \(\IM_{\mn_* \pspX}=\IM\) if \(\mn_*\pspX=1\), otherwise \(\IM_{\mn_* \pspX}=\emptyset\).
\end{enumerate}
\pagebreak %
\begin{enumerate}
\item[3.] \((\IM_{\mn_* \pchars_1})_{\mn_* \pchars_2}=\IM_{\mn_*(\pchars_1\cup \pchars_2)}\), i.e., the \(\mn_*\pchars_2\)-section of \(\IM_{\mn_* \pchars_1}\) is equal to the \(\mn_*(\pchars_1\cup \pchars_2)\)-section of \(\IM\), that is, \(\IM \wedge \IMfunc{\mn_*}{\pchars_1} \wedge \IMfunc{\mn_*}{\pchars_2}\).
\item[4.] If \(\pumn h, h \in \mathcal{H}\) define a model \(\IMfunc{\pumn} {\mathcal{H}}\), the section \(\mn_* \pchars\) will be measured in terms of the averages \(\pumn \mathcal{H}\cup \mn_*\pchars\), i.e., \(\IMfunc{\pumn} {\mathcal{H}}_{\mn_*\pchars}=\IPT{\pumn \mathcal{H}\cup \mn_*\pchars}\). % TODO: the use of \IPT here seems strange; I'd expected something like \IMfunc, but that does not square with the arguments; it may well be that \IPT and \IMfunc should be joined into one operator based on internpretation
\item[5.] \(\IM=\bigwedge_{\theta} \IM_{\theta} \Rightarrow \IM_{\mn_*\pchars}=\bigwedge_{\theta}(\IM_{\theta})_{\mn_*\pchars}\): intersections commute with cross-sections.
\item[6.] The boundaries \(\umn_{\mn_*\pchars} f\) are additive with respect to the addition of finite linear combinations of features from \(\pchars\): \(\umn_{\mn_*\pchars} (f+\sum c_i g_i)=\umn_{\mn_*\pchars} f+\sum_i c_i \mn_* g_i\).
\end{enumerate}
The proof of these properties is elementary.
It follows from property 6 that the cross section will be the same for all IMs that are obtained by ``shifting'' all primary traits by \(\sum c_i g_i, g_i\in\pchars\), and correspondingly shifting their average by \(\sum c_i \mn_* g_i\).
\subsection{Representation theorem of IM}
\begin{theorem}\label{thm:1.3}
Any IM \(\IM\) is represented as the union of its \(\mn_*\pchars\)-sections:
\begin{equation}\label{eq:1.11}
\IM=\bigvee_{\mn_* \pchars} \IM_{\mn_* \pchars},
\end{equation}
where \(\pchars\) is any set of traits and the aggregation is made on the values \(\mn_* g\) in the intervals \(\lmn g\leq \mn_* g\leq \umn g, g \in \pchars\).
\end{theorem}
The proof of the theorem is given in Appendix 1 at the end of the paragraph.
{\em Note}. In the union~\eqref{eq:1.11} we may assume that \(\mn_* g\) takes values between \(-\infty\) and \(\infty\), since at \(\mn_*g<\lmn g\) and \(\mn_*g>\umn g\) the cross-section is empty: \(\IM_{\mn_* \pchars}=\emptyset\).
If \(\pchars\) is an arbitrary set of events \({\mathcal B}\), it follows from the theorem that each IM can be represented as a union of IMs with exact probabilities on the elements of \({\mathcal B}\):
\begin{equation*}
\IM=\bigvee_{\lpr \mathcal{B} \leq P_* \mathcal{B} \leq \upr \mathcal{B}} \IM_{P_* \mathcal{B}},
\end{equation*}
where \(P_* \mathcal{B}=\{P_* B: B\in\mathcal{B}\}\) is the set of probabilities between the lower probability \(\lpr(B)\) and the upper probability \(\upr(B)\).
The geometric representation of the theorem can be seen in Figures~\ref{fig:1.8} and~\ref{fig:1.9}: \(\IM\) is described by a sequence of parallel segments obtained by intersecting the lines \(\IMfunc{\mn_*}{g}\) with the model body \(\IM\).
\begin{figure}
\centering
\begin{tikzpicture}[scale=0.8]
% Ideal scale=1
\draw[rotate around={-35:(2.6,1.3)}] (2.6,1.3) ellipse (1.2 and 0.8);
\draw (0,0) -- (5,0) -- (2.5,4.33) -- (0,0);
\draw[dashed] (-.3,-1.2) -- (1.63,2.02) node[above=0em,pos=0.15,sloped,scale=0.8] {\tiny \(P(x_1)=\upr(x_1)\)};
\draw[dashed] (.79,-0.84) -- (2.3,1.67) node[above=0em,pos=0.1,sloped] {\tiny \(P'\)};
\draw[dashed] (1.6,-0.92) -- (3.15,1.67) node[above=0em,pos=0.1,sloped] {\tiny \(P''\)};
\draw[dashed] (2.52,-1.20) -- (3.64,0.67) node[above=0em,pos=0.25,sloped,scale=0.8] {\tiny \(P(x_1)=\lpr(x_1)\)};
\draw (2.5,1.3) node {\tiny \(\mathcal{M}_0\)};
\clip[rotate around={-35:(2.6,1.3)}] (2.6,1.3) ellipse (1.2 and 0.8);
\draw (1.3,0) -- (3,2.84);
\draw (1.2,0) -- (2.9,2.84);
\draw (1.1,0) -- (2.8,2.84);
\draw (1.0,0) -- (2.7,2.84);
\draw (0.9,0) -- (2.6,2.84);
\draw (0.8,0) -- (2.5,2.84);
\draw (0.7,0) -- (2.4,2.84);
\draw (0.6,0) -- (2.3,2.84);
\draw (0.5,0) -- (2.2,2.84);
\draw (2.15,0) -- (3.85,2.84);
\draw (2.25,0) -- (3.95,2.84);
\draw (2.35,0) -- (4.05,2.84);
\draw (2.45,0) -- (4.15,2.84);
\draw (2.55,0) -- (4.25,2.84);
\draw (2.65,0) -- (4.35,2.84);
\draw (2.75,0) -- (4.45,2.84);
\draw (2.85,0) -- (4.55,2.84);
\draw (2.95,0) -- (4.65,2.84);
\draw (3.05,0) -- (4.75,2.84);
\draw (3.15,0) -- (4.85,2.84);
\draw[draw=white,fill=white,rotate around={20:(1.9,1.7)}] (1.7,1.5) rectangle (2.1,1.9) node[pos=.5] {\tiny \(\mathcal{M}_1\)};
\draw[draw=white,fill=white,rotate around={20:(3.3,0.9)}] (3.1,0.7) rectangle (3.5,1.1) node[pos=.5] {\tiny \(\mathcal{M}_2\)};
\end{tikzpicture}
\caption{Sectional representation of the model}
\label{fig:1.9}
\end{figure}
This is with one \(g\). If there is more than one, the cross-sections will give increasingly smaller elements (down to atoms of the model -for discrete spaces probability vectors- and then \(\IM\) is represented as a family of atoms of the model -probability vectors). This is formulated as a corollary.
\begin{corollary}
\(\IMfunc{\pumn}{\pchars}=\bigvee_{\IMfunc{\mn_*}{\pchars} \subset \IMfunc{\pumn}{\pchars}} \IMfunc{\mn_*}{\pchars} \)
\end{corollary}
Here \(\IMfunc{\mn_*}{\pchars}\) are simple models defined by exact values \(\mn_*g, g\in\pchars\). Thus, models with primary averages \(\pumn g, g\in\pchars\) are represented as families of models with exact values \(\mn_* g, g\in\pchars\) such that \(\mn_* g\leq \pumn g\).
Corollary 1 remains valid if instead of \(\pchars\) we take any set of traits that includes \(\pchars\); and also any set of traits and events from which each element of \(\pchars\) can be obtained by taking linear combinations (and their closures). In particular, if it is a system of events, we have the following statement:
\begin{corollary}
If all attributes of a set \(\pchars\) are measurable with respect to a system of events \({\mathcal A}\) (i.e., if they are representable as linear combinations of indicators of events in \({\mathcal A}\) or their closures under uniform convergence), then
\begin{equation*}
\IMfunc{\pumn}{\pchars}=\bigvee_{\IMfunc{P_*}{\mathcal{A}}\subset \IMfunc{\pumn}{\pchars}} \IMfunc{P_*}{\mathcal{A}}.
\end{equation*}
\end{corollary}
According to this corollary, the IM is represented as a family of probability distributions that is exact on \(\mathcal{A}\). If \(\mathcal{A}\) is an algebra or a ring of events, these will be finitely additive probability distributions. But what about if \(\mathcal{A}\) is a sigma algebra, i.e. an algebra closed with respect to countable unions? Then again the IM is representable in terms of \(\mathcal{A}\)-exact probability distributions, but again these will be mainly finitely additive distributions (!).
It can be concluded that countably additive distributions are themselves too rare an exception in the "family" of probability distributions to be able to describe many IM (in particular, of finite dimension). Moreover, the extension of the \({\mathcal A}\) system to break down the space \(\pspX\) into smaller and smaller pieces and hence the logical transition to Borel sigma-algebras and smaller Lebesgue algebras, although somewhat increasing the descriptive power of count-additive distributions, does not change the principle basis of the conclusion.
The hidden twist in consequences 1 and 2 is that statistically unstable phenomena are described in terms of families of exact models corresponding to stable phenomena, in particular by means of families of exact probability distributions. Statistical instability is mutually ``pumped'' into informational instability, into our ignorance of exact probabilities and unknown choices. At first sight it seems paradoxical, but in fact it is quite natural, because in both cases at independent repetitions in the limit we will receive different arithmetic mean, and these are also averages in their interval sense.
\subsection{Definition of an IM by setting sections} It has just been said that one can think of an IM as a union or a family of its smaller parts- models. But this is also a way of defining \(\IM\) when it is initially defined not through its \(\umn f\) environments but as a union of simpler description models \(\IM^*_{\theta}: \IM=\bigvee_{\theta} \IM^*_{\theta}\). The main reason for this is that it is easy to find an average \(\umn_{\theta} f\). Then \(\umn f=\sup_{\theta} \umn_{\theta}^* f\). This is one way of indirectly setting the IM, various aspects of which are discussed here.
Let us first consider the case where \(\IM^*_{\theta}\) is \(\mn^* \pchars\)-precise and the role of the onedimensional parameter \(\theta\) is played by the set of averages \(\mn^* g, g\in\pchars\) itself. Let us write
\begin{equation}\label{eq:1.12}
\IM=\bigvee_{\mn^*\pchars} \IM^*_{(\mn^*\pchars)}.
\end{equation}
The symbol \(\mn^*\pchars\) on the right is deliberately enclosed in parentheses, to indicate the fact that for \(\IM^*_{(\mn^*\pchars)}\) the average of the traits \(g\in\pchars\) is exact, and equal to the value of the corresponding parameter: \(\IM^*_{(\mn^*\pchars)}(g)=\mn^* g\) and that, in contrast with~\eqref{eq:1.11}, the ``parameters'' do not have to run all values in \([\lmn g, \umn g], g \in\pchars\) and \(\IM^*_{(\mn^*\pchars)}\) in turn does not have to be the \(\mn\pchars\) sections of the model \(\IM\). As can be clearly seen in Figure~\ref{fig:1.9}, where \(\IM=\IM_0 \vee \IM_1 \vee \IM_2\) can be given by the cross-sections of \(\IM_1\) and \(\IM_2\) and here, even if the sections of \(\IM_0\) are empty, it is still \(\IM=\IM_1\vee \IM_2\), i.e., the convex hull of \(\IM_1\) and \(\IM_2\) (or their sections) defines \(\IM\).
In formula~\eqref{eq:1.12} we will call \(\IM^*_{(\mn^*\pchars)}\) the \emph{model defining sections}. Let us consider an example.
\begin{example}\label{ex:1.12}
Let \(\pspX=\{x_1,x_2,x_3\}\) be made by three elementary outcomes and let \(\IM\) be interpreted as a convex family of probability vectors \(\pr\). As we see from Figure~\ref{fig:1.9}, the description of \(\IM\) by primary averages, equivalent to the description of the contour of \(\IM\) by tangent lines (of which there are infinitely many), is not convenient. At the same time, each of the \(\pr^*(x_1)\)-sections \(\IM^*_{\pr^*(x_1)}\) has a relatively simple structure, given by the exact probability \(\pr^*(x_1)\) and the limits for \(\pr^*(x_2)\): \(\lpr_{P^*(x_1)}(x_2)\leq \pr^*(x_2) \leq \upr_{P^*(x_1)}(x_2)\), that depend, in general, on \(\pr^*(x_1)\). Since the values \(\IM^*_{\pr^*(x_1)}\) determine \(\IM=\bigvee_{\pr^*(x_1)} \IM^*_{\pr^*(x_1)}\), it is necessary to specify the limits of variation of the parameters \(\pr^*(x_1)\) either in the range of values from \(\lpr(x_1)\) to \(\upr(x_1)\) corresponding to the \(\IM\), or in the narrower of the two ranges \([\lpr(x_1),p']\) and \([p'',\upr(x_1)]\) corresponding separately to \(\IM_1\) and \(\IM_2\), as indicated in Figure~\ref{fig:1.9}.
\end{example}
\pagebreak %
Let us return to formula~\eqref{eq:1.12}. Let the models \(\IM^*_{(\mn^*\pchars)}\) be described as the union of the values \(\mn^*(g), g \in\pchars\) (different for different models) together with the same primary average \(\pumn h, h \in \mathcal{H}\):
\begin{equation*}
\IM^*_{(\mn^*\pchars)}=\IMfunc{\pumn}{\mathcal{H}} \wedge \IMfunc{\mn^*}{\pchars}.
\end{equation*}
The combining them by \(\mn^* g, g \in \pchars\), that are bounded above by \(\pumn g, g \in\pchars\), defines an IM \(\IM\), with the same primary averages \(\pumn \mathcal{H}\) together with \(\pumn \pchars\). This is formally written down as:
\begin{equation*}
\IM=\bigvee_{\mn^* g \leq \pumn g, g \in \pchars} (\IMfunc{\pumn}{\mathcal{H}} \wedge \IMfunc{\mn^*}{\pchars})=\IMfunc{\pumn}{\mathcal{H}} \wedge \IMfunc{\pumn}{\pchars}.
\end{equation*}
Let us explain the above. Let \(\pchars=g\) and let us depict in Figure~\ref{fig:1.10} the \(\mn^* g\)-section as a planar polyhedron \(\IM^*_{(\mn^* g)}\) on the hyperplane of the exact value \(\mn^* g\). If its facets \(\pumn h_j\) do not change with the \(\mn^* g\)-“shifts”, then they remain the same also for the figure \(\IM\) resulting from the translation of this polyhedron when \(\mn^* g\) changes from \(\lmn g\) to \(\umn g\). Moreover, the two faces will correspond to the “extreme” values \(\mn^* g=\lmn g\) and \(\mn^* g=\umn g\). The polyhedra located on them completely describes \(\IM: \IM=\IM^*_{(\lmn g)} \vee \IM^*_{(\umn g)}\). A similar reduction is possible with any set \(\pchars\).
A slightly more general case than the previous one will take place if (the other conditions being equal), \(\pumn_{(\mn^* \pchars)}(h)\) depends on \(\mn^* g_i, g_i\in\pchars\), and vary linearly with their limits:
\begin{equation*}
\pumn_{(\mn^* \pchars)}(h)=\tilde{m}_h+\sum c_i(h) \mn^* g_i, h \in\mathcal{H},
\end{equation*}
where the coefficients \(c_i(h)\) depend on \(h\).
In Figure~\ref{fig:1.10} this will appear as a change in the direction of motion when displacing the planar polyhedron, which will cause a change in the final position of the faces of \(\IM\) corresponding to the traits of \(h\) on the sections, and hence of h itself, which will be transformed into \(h-\sum c_i(h) g_i\) with average \(\mn[h-\sum c_i(h) g_i]=\tilde{m}_h, h \in\mathcal{H}\). These primary averages, together with \(\pumn g, \plmn g\), will determine \(\IM\). Let us look at an example of this representation.
\begin{figure}[b]
\centering
\begin{tikzpicture}[scale=0.9]
\draw (0,0) rectangle (4,1);
\draw (0,0) -- node[above=-0.2em,pos=0.5,sloped,scale=0.9] {\tiny \(\widetilde{M}h_3\)} (-0.75,-0.5) -- (-1.6,.5) -- (-0.75,1.5) -- node[above=-0.2em,pos=0.75,sloped,scale=0.8] {\tiny \(\widetilde{M}h_1\)} (0,1) -- node[right=-0.08em,pos=0.5] {\tiny \(\widetilde{M}h_2\)} (0,0);
\draw (-0.75,1.5) -- (3.25,1.5) -- node[above=-0.2em,pos=0.55,sloped,scale=0.8] {\tiny \(\widetilde{M}h_1\)} (4,1) -- (0,1) -- (-0.75,1.5);
\draw (-0.8,0.5) node {\scriptsize \(\mathcal{M}^{*}_{( \undertilde{M}g )}\)};
\draw (-0.75,-0.5) -- (3.25,-0.5) -- node[below=-0.1em,pos=0.5,sloped] {\tiny \(\widetilde{M}h_3\)} (4,0);
\draw (4,0) -- node[right=-0.08em,scale=0.8] {\tiny \(\widetilde{M}h_7\)} (4,1);
\draw[dashed] (3.25,-0.5) -- (2.4,.5) -- (3.25,1.5);
\draw (-0.75,1.6) node {\tiny \(\undertilde{M}g\)};
\draw (3.25,1.75) node {\tiny \(\widetilde{M}g\)};
\draw (3.2,0.5) node[gray] {\scriptsize \(\mathcal{M}^{*}_{( \undertilde{M}g )}\)};
\end{tikzpicture}
\caption{Shaping the faces of the model.}
\label{fig:1.10}
\end{figure}
\begin{example}
Let the setting sections \(\IM^*_{(\mn^* X)}\) at each fixed \(\mn^* X\) on \(\mathcal{R}\) be given by the averages
\(\pumn_{(\mn^* X)} X^2=\widetilde{m}_2+\mn^* X\). This corresponds to the case considered above with \(g(x)=x, h(x)=x^2\). Then the union of the cross sections by the parameter \(\mn^* X\) varying within \(\plmn X \leq \mn^* X \leq \pumn X\), form an IM with primary averages \(\widetilde(X^2-X)=m_2, \plmn X, \pumn X\).
\end{example}
We have considered the case where the primary averages of \(\mn^* \pchars\)-sections depend on \(\mn^* \pchars\). Another way is to make the primary traits of the \(h_{\mn^* \pchars}\)-sections themselves the dependent on \(\mn^* \pchars\).
\begin{example}
Setting a random variable to mean and variance. The \emph{variance} is the second moment centered to a zero mean, that is \(\underline{\overline{\sigma}}=\underline{\overline{M}}(X-\mn X)^2\). In the general case, it is not possible to determine the variance. It is only possible under the assumption of a precise \(\mn X\), i.e., that the \(\mn^* X\)-section \(\IM^*_{(\mn^* X)}\) determines the model \(\pumn^*_{(\mn^* X)}(X-\mn^* X)^2=\widetilde{\sigma}^2_{(\mn^* X)}\). Combining the sections by \(\mn^* X\) will give a model of the random variable, given by the limits of \(\plmn X, \pumn X\), the change in the mean and (at each \(\mn^* X\)) the upper dispersion \(\widetilde{\sigma}^2_{(\mn^* X)}\).
Converting in each section the expression for the variance, taking into account that \(\mn^* X\) is precise, we have \(\widetilde{\sigma}^2_{(\mn^* X)}=\pumn^*_{(\mn^* X)}(X-\mn^* X)^2=\pumn^*_{(\mn^* X)} X^2-(\mn^* X)^2\), or \(\pumn^*_{(\mn^* X)} X^2=\widetilde{\sigma}^2_{(\mn^* X)}-(\mn^* X)^2\) % TODO: is this a typo?
It follows that the primary trait \(h(x)=(x-\mn^* X)^2\) for the \(\mn^* X\)-sections may be replaced by \(x^2\) with the value \(\widetilde{\sigma}^2_{(\mn^* X)}-(\mn^* X)^2\) that depends non-linearly on \(\mn^* X\).
This technique can be extended to the setting of IM with interval central moments and cumulants.
\end{example}
\subsection{Representation via standard IM} Let us return to the general idea: \(\IM=\bigvee_{\theta} \IM_{\theta}\), particular cases of which are both the notation of the IM as a convex hull of simple distributions (vertices) and as a union of sections. It is desirable for all defining \(\IM_{\theta}\) to have the same, sufficiently simple structure.
We will focus on the case where all \(\IM_{\theta}\) are derived in an uncomplicated way from a single \(\IM_0\), called the standard one. This is done by introducing a \(\theta\)-dependent correspondence between the features \(f\leftrightarrow f_\theta\), so that their averages for the standard and the traslated IM are identical: \(\pumn_0 f_0=\pumn_\theta f_\theta\), without disturbing the consistency. Then to find \(\pumn_{\theta} \phi\) it is enough to find the trait \(f\) corresponding to \(\phi\) (at a given \(\theta\)) and to take from it the mean \(\pumn_0 f\).
Let us consider more strictly the question, what should be the correspondence between the traits in this scheme, how do they relate to each other? Let a standard \(\IM_0\) be defined by its mean \(\IM_0 f, f \in \bchars\), and give \(\IM_\theta\) values
\begin{equation}\label{eq:1.13}
\umn_{\theta} f=\umn_0 (L_\theta f),
\end{equation}
where \(L_\theta\) is the operator that maps the area \(\bchars_\theta\) of existence of \(\IM_\theta\) into the area \(\bchars_0\) of existence of \(\IM_0: \bchars_\theta \rightarrow^{L_{\theta}} \bchars_0, \theta \in \Theta\).
\pagebreak %
%%%%Kuznetsov calls this ''assertion'', create an environment for that?
\begin{theorem}\label{thm:1.4}
The averages in formula~\eqref{eq:1.13} are consistent if \(L_{\theta}\) satisfies the following two properties: (a) linearity: \(L_\theta(c_1 f_1+c_2+c)=c_1 L_\theta(f_1)+c_2 L_\theta(f_2)+c\); b) order preservation: \(f_1\geq f_2 \Rightarrow L_\theta f_1 \geq L_\theta f_2\).
\end{theorem}
\begin{proof}
It must be proven that for the bounds determined by~\eqref{eq:1.13} the IM axioms are satisfied (see pp.15). A1 and A4 are obvious. Next, \(b\geq 0 \Rightarrow \umn_{\theta}(bf+c)=\umn_0 L_\theta(bf+c)=\umn_0 (b L_\theta f+c)=b \umn_0(L_\theta f)+c)=b \umn_\theta f+c\). This establishes A2. Finally, A3 follows from the relations \(\umn_\theta(f+g)=\umn_0 L_\theta(f+g)\leq \umn_0 L_\theta(f)+\umn_0 L_\theta(g)=\umn_\theta(f)+\umn_\theta(g)\). This completes the proof.
\end{proof}
From the properties of linearity and order preservation of the operator \(L_\theta\) it follows that if the primary set of the standard IM \(\IM_0=\IMfunc{\pumn_0}{\pchars}\) is \(\pchars\), then the primary sets for \(\IM_\theta\) are \(\pchars_\theta=\{g_\theta: L_\theta g_\theta=g, g \in\pchars\}\) with mean \(\pumn_\theta g_\theta=\pumn_0 g\) if \(L_\theta g_\theta=g\).
\subsection{Functional representations}
Let us consider one particular case of the previous representation. To do this, we turn to the notation of \(x\) in the form of a mapping \(x=V_{\theta}\xi\), where \(V_{\theta}\) is a known operator that depends on an unknown parameter \(\theta\) taking values on \(\Theta\). These are common in signal extraction tasks where noise \(\xi\) (vector or process) acts in the communication channel described by the operator \(V_{\theta}\), where \(\theta\) is an unknown one-dimensional or multidimensional channel parameter (or signal and noise parameters), and \(x\) is the resulting vector or process. The ``noise'' IM \(\IM_0^{\xi}\) is defined, playing the role of an standard IM, and the IM \(\IM^x\) of the observation is required.
If the operator \(V_\theta\) at each \(\theta\) maps ``realisations'' \(\xi\) to ``realisations'' of x in a mutually-ambiguous way, then \(L_\theta\), defined as \(L_\theta f(x)=f(V_\theta \xi)\), will satisfy the terms of statement~\ref{thm:1.4}. The equalities \(\umn_\theta^x f(x)=\umn^\xi_0 f(V_\theta \xi)\) in~\eqref{eq:1.13} will generate a series of IM \(\IM^\theta_x\), whose union in \(\theta\) will give the model \(\IM^x\) of the observations, determined by the mean
\[
\umn f(x)=\sup_\theta \umn^x_\theta f(x)=\sup_\theta \umn^\xi_0 f(V_\theta \xi)
\]
\begin{example}
In radiolocation and communication problems, a mixture of signal \(\omega_t\), where \(t\) is time, with noise \(\xi_t\) is often written in the form \(X_t=\theta \omega_t+\xi_t\), where \(\theta\geq 0\) is the unknown amplitude of the signal. Let the noise \(\xi_t\) be described by the IM \(\IM_0\). Then the model \(\IM_\theta\) and the observations \(X_t\) is defined at each \(\theta\) by \(\umn_\theta f(X_t)=\sup_{\theta\geq 0} \umn_0 f(X_t-\theta \omega_t)\). These determine the model \(\IM\). Here, if \(\IM_0=<\pumn_0 \pchars\) has a primary set of traits \(\pchars\), then the primary traits of \(\IM_\theta\) are \(g_\theta=g(X_t-\theta \omega_t\), with the same primary averages as those set in \(\pchars\): \(\pumn_\theta g(X_t-\theta \omega_t)=\pumn_0 g\).
\end{example}
\subsection{Density}
Let us consider a procedure for expressing a standard IM in terms of another, using a function \(p(x)\) of variable \(x\).
Let us first consider the simplest case where \(\pspX=\mathbb{R}\), and where on the primary ring \(\mathcal{K}_{\Sigma}\) of all segments there is a finitely additive distribution \(\mathcal{P}_1\) with exact probabilities \(P_1[x,y), xp_2\}\).
Combining \(N\) and \(N'\) we get \(\upr(N\cup N')=0\), which proves the statement.
\end{smallpar}
An event \(N_0\) that is null for \(\IM_0\) must also be null for \(\IM_1\): \(\upr_0(N_0)=0\Rightarrow \upr_1(N_0)=\umn_0 N_0 p=0\). However, \(p(x)\) can in principle take any value on \(N_0\), even \(+\infty\) (since \(0\infty=0\)). Moreover, the event on which \(p(x)=\infty\) must be null.
\textit{If \(p\) is the formal density of \(\IM_1\) over \(\IM_0\) and \(p>0\), then \(\frac{1}{p}\) is the formal density of \(\IM_0\) over \(\IM_1\): \[p=\IM_1/\IM_0>0 \Rightarrow \frac{1}{p}=\IM_0/\IM_1.\]}
The proof follows from the equality \(\umn_1(f/p)=\umn_0(fp/p)=\umn_0(f)\), and it \(p=\infty\) at \(x\in N\), then the \(p/p\) ratio in this region can be given any value between 0 and \(\infty\).
\begin{theorem}
Let the primary traits for \(\IMfunc{\pumn_0}{\pchars_0}\) be \(\pchars_0\) and \(\mn_0 p(x)=1\). Then the primary traits for a model \(\IMfunc{\pumn_1}{\pchars_1}\) for which the formal density with respect to \(\IMfunc{\pumn_0}{\pchars_0}\) exists and is equal to \(p(x)\), is given by \[\bchars_1=\{g_1(x): g_1(x)=g(x)/p(x), g(x) \in \bchars_0\}\cup\{\pm 1/p(x)\},\] with averages \[\pumn_1(g/p)=\pumn_0(g), g\in\bchars_0, \pumn_1(1/p)=-\pumn(-1/p)=1.\]
\end{theorem}
\begin{smallpar} % TODO: perhaps this should be a proof environment
Indeed, since without loss of generality we can assume that \(p(x)\) is a primary trait for \(\IMfunc{\pumn_0}{\pchars_0}>\) with average \(\mn_0 p=1\), we get
\begin{align*}
\umn_1 f&=\umn_0 fp=\inf\{[c+c_0\umn_0 p+\sum c_i^+ \pumn_0 g_i]: c+c_0 p+\sum c_i^+ g_i \geq fp \}\\&=\inf\{[c \umn_1 1/p+c_0+ \sum c_i^+ \pumn_0 g_i]: c/p+c_0+\sum c_i^+ g_i/p \geq p \},
\end{align*}
from which the statement of the theorem follows.
\end{smallpar}
Thus, the meaning of the formal density of \(\IM_1\) with respect to \(\IM_0\) is to recalculate the primary traits \(g_i\rightarrow gi/p, p\rightarrow 1/p\) and give them identical primary averages, which can be regarded as a one-to-one correspondence between the models. It is enough to consider those models whose primary averages are consistent.
Let us consider now the implications of the theorem.
\pagebreak %
\begin{corollary}
The region of existence \(\bchars_1\) of the upper average of the IM \(\IM_1\) whose formal density with respect to \(\IMfunc{\pumn_0}{\pchars_0}\) is \(p(x)\) is the set of traits that are majorizable by linear combinations of the form \(c+\sum c_i^+ g_i/p, g_i \in\pchars_0\). The traits from \(\bchars_1\) are representable as \(f=c+\sum c_i^+ g_i/p+f_0\) where \(g_i\in\pchars_0\) and \(f_0\) is bounded above.
\end{corollary}
\begin{corollary}
\(p=\IM_1/\IM_0=\IM'_1/\IM'_0 \Rightarrow p=(\IM_1 \wedge \IM'_1)/(\IM_0 \wedge \IM'_0) \).
\end{corollary}
It can be concluded that for the existence of a formal density \(p=\IM_1/\IM_0\), it is necessary that \(\IM_0\) has a richer primary set than \(\IM_1\). The ratio \(\pchars/p\) results in a loss of information, at least in the region where \(p(x)=0\). Hence, the dimensionality of \(\IM_1\) should not be higher than that of \(\IM_0\). If \(p(x)>0\ \forall x\), then the dimensionality of the two models should be the same and the primary sets are in a way equivalent.
Let us consider an example of the definition of a formal density for an IM that is not an exact probability distribution.
% TODO: the numbering in Kuznetsov is wrong! for now, we make the same mistake
\addtocounter{example}{-1}
\begin{example}
Let \(\IM_0\) be the IM defined by the initial moments \(\pumn_0 X^i, i=1,\dots,k\), that are upper and therefore imprecise except for \(\plmn_0 X^2=\pumn_0 X^2=\mn_0 X^2\). The function \(p(x)=x^2\) will be the formal density \(x^2=\IM_1 /\IM_0\) for a \(\IM_1\) with primary averages \(\mn_1(1/X^2)=1, \umn_1(1/X)=\umn_0 X, \umn_1 X=\umn_0 X^3, \dots, \umn_1 X^{k-2}=\umn_0 X^k.\) The range of existence of \(\bchars_1\) will be the set of traits \(f\) of the form \(c+\sum_1^k c_i X^{i-2}+ f_0 \in\bchars_0 \). In this example, since \(p(x)>0\) at \(x\neq 0\) the function \(\frac{1}{p(x)}=\frac{1}{x^2}\) will be a formal density of \(\IM_0\) with respect to \(\IM_1\), if we exclude the point 0 from the numerical line on which they are set.
\end{example}
Note that the density of one IRV with respect to another can only exist for exact distributions, since the requirement \(\mn_0 p=\mn_1(1/p)=1\) excludes any other cases.
\begin{addendum}
\item \textsc{Proof of the representation theorem} Let us prove the theorem for the section \(\IM\) of a one dimensional parameter \(\mn g\). Taking into account the definition of the union, it suffices to prove the equality \[\max_{\lmn g \leq \mn g \leq \umn g} \umn_{\mn g} f=\umn f.\]
Let us denote the left-hand side by \(\overline{\umn}f\).
Using formula~\ref{eq:1.9}, we obtain
\[\overline{\umn}f=\max_{\lmn g \leq \mn g \leq \umn g} \min_c [\umn (f-cg)+ c \mn g]=\max_{\mn g} \min_c W(c,\mn g).\]
The function \(W(c,\mn g)\) is linear (and hence concave) in \(\mn g\) and convex in the parameter \(c\), since for \(0\leq \gamma \leq 1\):
\begin{align*}
&W(\gamma c_1+(1-\gamma) c_2,\mn g)=\umn[\gamma f -\gamma c_1 g +(1-\gamma) f+ (1-\gamma)c_2 g]\\
&+ (\gamma c_1+(1-\gamma) c_2) \umn g \leq \gamma \umn[f -c_1 g] \gamma c_1 \mn g\\
&+ (1-\gamma) \umn[f-c_2 g]+ (1-\gamma) c_2 \umn g =\gamma W(c_1,\mn g)+ (1-\gamma) W(c_2, \mn g).
\end{align*}
Therefore, using the well-known minimax theorem \cite{22}, the maximum and minimum can be swapped. We obtain
\[\overline{\umn}f=\min_c \max_{\mn g} W(c,\mn g)=\min_c [\umn(f-cg)+\max\{c\lmn g, c \umn g\}]\]
which equals \(\umn f\), since the minimum is reached at \(c=0\) (indeed at \(c\geq 0, \umn(f-cg)+\max\{c\lmn g, c \umn g\}=\umn(f-cg)+c \umn g \geq \umn f-c \umn g+ c \umn g=\mn f\), and the same inequality is true for \(c\leq 0\)). The theorem is then proved for the case where \(\pchars\) consists of a single trait.
By induction the theorem extends to the case where \(\pchars=\pchars_k\) is a finite set. Finally, for an arbitrary set \(\pchars\), if we denote
\[\overline{\umn}f=\sup_{\mn \pchars} \umn_{\mn \pchars} f=\sup_{\mn \pchars} \inf_{\pchars_k \subset \pchars} \umn_{\mn \pchars_k} f\] it is necessary to prove the equality \(\overline{\umn}f=\umn f\). Given \(f\) such that \(|\umn f|<\infty\) (whence \(|\umn_{\mn\pchars} f|<\infty\)) and a fixed \(\epsilon>0\), there exists some \(k'\) and some \(\pchars_{k'}\) such that \[\inf_{\pchars_k \subset \pchars} \umn_{\mn \pchars_k} f\geq \umn_{\mn \pchars_{k'}} f-\epsilon.\]
As a consequence, \[\overline{\umn}f \geq \sup_{\pchars_{k'}\subset \pchars} \umn_{\mn \pchars_{k'}} f-\epsilon \geq \umn f-\epsilon.\]
At the same time \(\IM_{\mn \pchars}\subset \IM\), whence \(\bigvee_{\mn \pchars} \IM_{\mn \pchars}\subset \IM\) and \(\overline{\umn}f \leq \umn f\). Consequently, for any trait \(f\) in the domain of existence \(\bchars\) of the mean, we have \(|\overline{\umn}f - \umn f|\leq \epsilon\), and the validity of the theorem follows now from the regularity of \(\epsilon\). This completes the proof.
\item \textsc{An example of recalculation using formula~\eqref{eq:1.13}}. Let \(y=V_{\theta} x\) be a one-to-one mapping from \(\pspX\) to \(\pspX\). Then the operator \(L_\theta f(x)=f V_\theta x\) satisfies immediately the properties a) and b) in statement~\ref{thm:1.4}. In linear spaces \(\pspX\) (e.g., if \(\pspX=\mathbb{R}^n\)) examples of thys type of operator are linear transformations \(L_\theta f(x)=f(x-b_{\theta} x_0)\) where the element \(x_0\) characterises the direction of the shift. By~\eqref{eq:1.13}, we obtain \(\umn_{\theta} f(x)=\umn_0 f(x-b_{\theta} x_0)\). Here \(\IM_{\theta}\) is derived from \(\IM\) by shifting all \(x\in\pspX\) by a value of \(b_\theta x_0\).
\end{addendum}
\section{Conditional interval models}\label{sec:1.6}
\subsection{Problem statement}. Interval models provide descriptions of phenomena that have not yet occurred. Suppose that such an unconditional description of \(\IM\) is composed as a set of consistent means \(\umn f, f \in \bchars\). And then it is additionally known that event \(B\) has occurred. This represents partially, but not completely, the result of the phenomenon, unless it is an elementary event. Then after \(B\) the uncertainty remains and the phenomenon is described by a new model \(\IM_B\), called the \emph{conditional} at \(B\). The conditional model has its own averages \(\umn_B f\).
\pagebreak %
The presence of a credible event \(B\) for the conditional model corresponds to the probability \(\lmn_B B(x)=\lpr_B B=1\). But that's not all there is to it. All averages are converted from \(\umn f\) to \(\umn_B f\), and as part of them also the probabilities \(\upr(A)\) to \(\upr_B(A)\). But, how?
If event \(B\) has exact probabilities \(\pr(A B)\) and \(\pr(B)\), then the computation of the conditional probability is done simply by \[\pr_B(A)=\pr(AB)/\pr(B).\]
What if it is not exact? If only the probability in the numerator is imprecise, then to calculate \(\upr_B(A)\), we use \(\upr(AB)\) instead, which corresponds to the supremum \(\pr(AB)\). The same principle is used to compute the average of any trait.
But how do you define conditional averages and probabilities when the denominator is the probability \(\pr(B)\) in the denominator is determined by the interval \(\lpr(B),\upr(B)\)? The main difficulty here is that of you take the supremum on the right-hand side, then \(\pr(AB)\) and \(\pr(B)\) are bound: it is impossible to put at the same time \(\pr(B)=\lpr(B)\) and \(\pr(AB)=\upr(AB)\) (just as it is absurd to wish, filling the “probability mass” to the edges of the “bay” \(AB\), ensure at the same time the minimal total filling of both bays \(AB\) and \(AB^c\) of components \(B\)).
\subsection{Definition of a conditional interval model} We start with the reasoning that will lead us to the definition. Let \(\IM_{P_*(B)}\) be the \(\pr_*(B)\)-section of \(\IM\). The probability of event \(B\) for \(\IM_{P_*(B)}\) is exact and equal to \(\pr_*(B)\), and \(\pr_*(B)>0\). The IM \(\IM_{P_*(B),B}\) corresponding to the \(\IM_{P_*(B)}\)cross-sections are then obtained by narrowing the domain and existence to the traits \(f(x) B(x), f\in\bchars,\) where \(\bchars\) is the domain of \(\umn f\) and normalizing all values by \(\frac{1}{\pr_*(B)}\), which gives \[\umn_{\pr_*(B),B} f=\frac{\umn fB}{\pr_*(B)},f\in\bchars.\]
Obviously, the averages defined in this manner satisfy the IM axioms.
According to the represenation Theorem~\ref{thm:1.3}, any IM can be obtained as the union of its \(\pr_*(B)\)-sections. For each of these sections we can apply the formula above to determine a conditional IM; their union will give the required conditional IM.
Conditional on the occurrence of event \(B\) is the IM \(\IM_B\) on \(\pspX\), whose averages are determined from \(\IM\) by:
\begin{equation}\label{eq:1.15}
\umn_B f=\max_{\lpr(B)\leq \pr_* B \leq \upr^*(B)}[\umn_{\pr(B)} fB/\pr_*(B)], \quad f \in\bchars,
\end{equation}
where \(\umn_{\pr(B)} fB\) is the average for the \(\pr(B)\)-section \(\IM_{\pr_*(B)}=\IM \wedge \IPT{\pr_*(B)}\) from the traits \(f(x) B(x)\).
\pagebreak %
The averages according to formula~\eqref{eq:1.15} are called conditional averages. It is easy to check that they satisfy the axioms of IM. The area of existence of the upper averages of conditional IM will be the class of traits that coincide with functions from \(\bchars\) on \(B\) are are arbitrary (possibly taking values from \(\pm\infty\)) outside this set. This class is denoted \(\bchars B\).
The conditional lower average is determined by the usual formula \(\lmn_B f=-\umn_B(-f)\).
If \(\lpr(B)=0\), the right-hand side of~\eqref{eq:1.15} attains its maximum with \(\pr_*(B)=0\), understood as the limit \(\pr_*(B)\rightarrow 0\). This is also the way in which the formula is understood if \(\upr(B)=0\). Then \[\umn_B f=\lim_{\pr_*(B)\rightarrow 0}\frac{\umn_{\pr_*(B)} fB}{\pr_*(B)}=\lim_{\pr_*(B)\rightarrow 0} \frac{\pr_*(B) \max_x(fB)}{\pr_*(B)}=\max_x fB,\] where for any \(f\) (possibly unrestricted)the function \(fB\) is taken to be equal to zero on \(B^c\). It follows that if an event \(B\) is impossible (that is, \(B=\emptyset\)), then the conditional IM \(\IM_B\) is \(\IM_B=\mathcal{I}_B\) and is defined on \(\pspX\) by a single primary mean \(\upr_B(B^c)=0\).
Substituting the formulas~\eqref{eq:1.15} and~\eqref{eq:1.9} for the average cross sections gives the expression for the conditional averages:
\begin{equation}\label{eq:1.16}
\umn_B f=\max_{\lpr(B)\leq \pr_*(B)\leq \upr(B)}\frac{\min_c[\umn (f-c)B+c\pr_*(B)]}{\pr_*(B)}.
\end{equation}
Let us give some examples of calculations using this formula.
\begin{example}\label{ex:1.16}
Computation of conditional IPT. Let \(f\) in~\eqref{eq:1.16} be \(A(x)\). Then the minimum on c in the square brackets in~\eqref{eq:1.16} will be attained at either \(c=0\) or \(c=1\), resulting in \(\umn_{P_*(B)}(AB)=\min\{\upr(AB), \pr_*(B)-\lpr(A^cB)\}\). Replacing this in~\eqref{eq:1.15} gives the upper conditional probabilities
\begin{multline*}
\upr_B(A)=\upr_B(AB)=\umn_B(AB)\\=\max_{\lpr(B)\leq \pr_*(B)\leq \upr(B)} \min \{\frac{\upr(AB)}{\pr_*(B)},1-\frac{\lpr(A^c B)}{\pr_*(B)}\}=\frac{\upr(AB)}{\upr(AB)+\lpr(A^cB)},
\end{multline*}
taking into account that the maximum is reached when the terms under the minimum sign are equal.
\end{example}
\begin{example}
Let \(\IM\) on \(\mathbb{R}\) be defined by two primary averages \(\lmn |X|, \umn|X|\), and let \(B=[a,b], 0a\) we get \(\umn_B |X|=\frac{a \lmn|X|}{\umn |X|}\).
For another event \(B_1=[-a,a]\) with the same primary averages we get \(\lpr(B_1=[1-\umn|X|/a]^+, \upr(B_1)=1\), and if this event has occurred, the conditional probabilities are no longer, as above, trivial, since, for example, for an event \(D_1=[-d,d], d0} \mathbf{P}_{\theta}^B/P_{\theta}(B),\]
where \(\mathbf{P}_{\theta}^B\) is a probability vector with components \(\mathbf{P}_{\theta}^B(x_i)=\begin{cases}
P_\theta(x_i) & x_i \in B \\
0 & x_i \notin B,
\end{cases}
\) and the union is performed on those vertices with respect to which the probability of B is nonzero.
Geometrically, obtaining a conditional IM is as follows. First, each vertex \(\mathbf{P}_\theta\) is reduced to a vector
\pagebreak %
\(\mathbf{P}_{\theta}^B\) by zeroing out all components \(P_\theta(x_i)\) for which \(x_i\notin B\). This reduction may be interpreted as a projection of the vectors \(\mathbf{P}_\theta\) into the corresponding subspace \(\mathcal{R}_B^r\) of the space \(\mathcal{R}^r\). Then from the origin in the direction of the vectors \(\mathbf{P}_\theta\) we draw the rays \(\lambda P_\theta^B, \lambda\geq 0\), up to the intersection with the hyperplane \(\sum_{x_i\in B} P(x_i)=1\) of the subspace \(\mathcal{R}_B^r\). The intersection is attained by \(\lambda=\frac{1}{P_\theta(B)}\), and the intersection points \(\mathbf{P}_{\theta}^B/P_{\theta}(B)\) give the conditional probability vectors.
As an illustration for \(\pspX=\{x_1,x_2,x_3\}\) and \(B=\{x_1,x_2\}\), consider Figure~\ref{fig:1.11}. The conditional IM \(\IM_B\) here corresponds to the section \([P_{1B},P_{3B}]\). Note that the number of vertices in \(\IM_B\) is smaller than in \(\IM\): \(\mathbf{P}_2\) is a vertex in \(\IM\) but \(\mathbf{P}_{2B}\) is not a vertex in \(\IM_B\).
\begin{figure}
\centering
\begin{tikzpicture}[scale=0.8]
\draw (0,0) -- (5,0) -- (2.5,4.33) -- (0,0);
\draw (3.5,4.30) node {\scriptsize \(P(x_3)=1\)};
\draw (5.8,0.1) node {\scriptsize \(P(x_1)=1\)};
\draw (-0.8,0.1) node {\scriptsize \(P(x_2)=1\)};
\draw[dashed] (2.5,4.33) -- (2.5,0.58);
\draw[dashed] (2.5,4.33) -- (1.25,0);
\draw[dashed] (2.5,4.33) -- (3.75,0);
\draw (2.5,-0.2) node {\scriptsize \({\bf P}_{2B}\)};
\draw (1.2,-0.2) node {\scriptsize \({\bf P}_{1B}\)};
\draw (3.8,-0.2) node {\scriptsize \({\bf P}_{3B}\)};
\draw[fill=white] (2.5,2.2) -- (1.74,1.7) -- (3.32,1.49) -- (2.5,2.2);
\draw (2.5,1.85) node {\scriptsize \(\mathcal{M}\)};
\draw (2.75,2.3) node {\scriptsize \({\bf P}_2\)};
\draw (1.55,1.75) node {\scriptsize \({\bf P}_1\)};
\draw (3.6,1.49) node {\scriptsize \({\bf P}_3\)};
\draw [overbrace style,rotate around={180:(2.5,0)}] (1.25,-0.25) -- (3.75,-0.25) node[pos=0.5,below,yshift=3mm] {\scriptsize \(\mathcal{M}_B\)};
\end{tikzpicture}
\caption{Geometric illustration of conditional models.}
\label{fig:1.11}
\end{figure}
We have shown that conditional IMs are defined through vertices if unconditional ones are given. If an \(\IM\) is defined by its primary means \(\pumn g, g\in \pchars\), then theoretically it is possible to obtain the conditional \(\IM_B\) by determining the vertices of \(\IM\). To do this, proceed as follows: primary averages \(\pumn \pchars\rightarrow\) agreeing facets of \(\IM\rightarrow\) vertices of \(\IM\rightarrow\) vertices of the conditional IM \(\rightarrow\) primary averages of \(\IM_B\). This process is quite long, as each of the operations is quite time-consuming. But the main point, it is the fact that for infinite spaces \(\pspX\) the notion of a vertex as a vector of probabilities makes no sense, which significantly limits the universality of such a procedure.
\subsection{Some properties of conditional interval models}. For conditional IM \(\IM_B\) the event B is credible \(\lpr_B(B)=1\) while \(B^c\) is null \(\upr_B(B^c)=0\). With this in mind, conditional IMs have all the properties of the IMs on a possibility space \(\pspX\).
Let us consider how the conditional IM varies when we narrow or expand the conditioning event \(B\). Obviously, if event \(B\) is credible, i.e., \(\lpr(B)=1\), the conditional IM agrees with the initial one \(\IM_B=\IM\), and if event \(B\) is impossible, i.e., \(\upr(B)=0\), then the conditional IM is the \(B\)-indicator: \(\IM_B=\mathcal{I}_B\). It turns out that this last equality does not only hold for impossible events.
\emph{Property 1. If each of the primary functions \(g\in\pchars\) of the unconditional model \(\IM=\IMfunc{\umn}{\pchars}\) is constant on \(B\), then \(\IM_B=\mathcal{I}_B\).}
Indeed, in this case \(\umn_{\pr_*(B)} fB=\pr_*(B)\max fB\) and then the property follows directly from the definition of conditional IM.
\emph{Property 2. The conditional IM corresponding to the indicator model \(\mathcal{I}\) will be the B-indicator \(\mathcal{I}_B\), whatever the conditioning event B is.}
\emph{Property 3. The conditonal model of a union \(\IM=\IM_1 \vee \IM_2\) is equal to the union of the conditional models \(\IM_B=\IM_{1B}\vee \IM_{2B}\).}
\emph{Property 4. \(\IM_1 \subset \IM_2 \Rightarrow \IM_{1B}\subset \IM_{2B}\): inclusion is preserved by conditioning.}
Property 3 generalizes to an arbitrary union of models. In fact, this property was at the basis for the definition of conditional models when moving from unions of sections to unions of conditional sections.
Property 4 is a direct consequence of 3, since \(\IM_1\subset \IM_2 \Rightarrow \IM_1\vee \IM_2=\IM_2\).
As an illustration, let us consider the picture in Figure~\ref{fig:1.11}. Since \(\pspX\) is discrete, we can regard \(\IM\) as a convex body of probability vectors, and we can place a point source at the origin. When illuminating \(\IM\) from this source, a shadow is cast on the hyperplane \(\sum_{x\in B} \pr(x)=\pr(B)=1\). This shadow will give the conditional IM. Based on this representation we obtain a graphical interpretation of properties 3 and 4. The shadow of the convex union of bodies will be equal to the union of their shadows, which illustrates property 3. However, the shadow of the intersection of bodies will not be equal to the intersection of their shadows. Two bodies may not intersect each other but when illuminated by a point source they may cast the same shadow on the hyperplane. This leads to the following two propositions.
{\em Property 5. The intersection of two IMs does not correspond in general to the intersection of their conditional IMs}.
{\em Property 6. The same conditional IM can correspond to two different unconditional IMs}.
For instance, let \(B(x)\) be the only primary trait for both \(\IM_1\) and \(\IM_2\) with exact probabilities \(\pr_1(B)\neq \pr_2(B)\). Then \(\IM_1\) and \(\IM_2\) do not overlap, but both determine the same conditional model \(\mathcal{I}_B\).
\subsection{On the reconstruction of the unconditional model from the conditional}. Let us discuss the possibility of reconstructing \(\IM\) from the set of its corresponding conditional \(\IM_{B_i}\), where \(\mathcal{B}_{\Sigma}=\{B_1,\dots,B_k\}\) is a partition of the possibility space \(\pspX\).
Assume the probabilities of \(B_i\) are exact in \(\IM: \pr(B_i)=\lpr(B_i)=\upr(B_i), i=1,\dots,k\). Then \(\umn_{P(B_i)} fB_i=\umn f B_i\) from which we derive the inequality \[\umn f\leq \sum \umn fB_i=\sum \pr(B_i) \umn_{B_i} f.\]
The right-hand side corresponds to the averaging of the conditional \(\umn_{B_i} f\) over the exact probabilities \(\pr(B_i)\) of the events \(B_i\). The equality of the left and the right-hand sides (corresponding to the well-known formula of the total likelihood) is achieved only when \(\umn\) is additive on the events \(B_i: \umn \sum fB_i=\sum \umn f B_i\).
\pagebreak %
Let us turn now to the more general case where the probabilities \(\pr(B_i)\) are not exact. Let us write \(\sum \pr(B_i) \umn_{B_i} f=\mn [\sum B_i(x) \umn_{B_i} f]\) and replace the exact mean symbol by \(\IM\). This gives
\[\overline{\umn} f=\umn [\sum B_i(x) \umn_{B_i} f], \quad f\in\bchars\], defining a new IM that we denote \(\overline{\umn}\). Thus, \emph{we have the inclusion \(\overline{\overline{\IM}} \supset \IM\). And if \(\IM=\IMfunc{\umn}{\pchars}\) and all functions \(\pchars\) are \(\mathcal{B}_{\Sigma}\)-measurable} (i.e., finite linear combinations of \(B_i(x)\) or their closures), \emph{we get \(\overline{\overline{\IM}}=\IM\).}
The thesis becomes clearer if we note that in order to compute \(\overline{\umn} f\) we need to know \(\umn_{B_i} f\), i.e., the conditional model \(\IM_{B_i}\) as well as the averages over all possible linear combinations of \(B_i: \umn \sum c_i B_i\), which, if we take the class \(\lhull \mathcal{B}_{\Sigma}\) as primary set) would be the \(\lhull \mathcal{B}_{\Sigma}\)-extension of \(\IM\). If, by the second sentence of the thesis, all the \(g\in\pchars\) are \(\mathcal{B}_{\Sigma}\)-measurable, then \(\pchars \subset \lhull \mathcal{B}_{\Sigma}\) and the extension coincides with \(\IM\), which gives the degenerate conditional models \(\IM_{B_i}=\mathcal{I}_{B_i}\).
Thus, the scope of the recovery boils down to the degenerate conditional models \(\IM_{B_i}=\mathcal{I}_{B_i}\).
In the general case, however, the transition to conditional models results in a loss of information from \(\IM\), so that it will not be possible to reconstruct it, but only the broader (and therefore less accurate) \(\overline{\overline{\IM}}\). This limits the application of conditional models mostly to exact probability distributions.
\subsection{Abstract conditional models}. In the previous developments we assumed that some event \(B\) occurred and investigated the conditonal model \(\IM_B\). Let us now replace the indicator \(B(x)\) by an arbitrary trait \(q(x)\) and using the same formulas we will determine the conditional IM \(\IM_q\). This can be done by means of formula~\eqref{eq:1.15}. As discussed at the beginning of Section~\ref{sec:1.1}, if \(q(x)\) satisfies \(0\leq q(x)\leq 1\) then \(q\) may be interpeted as the trait of a fuzzy event, and then \(\IM_q\) would be the conditional IM is such an event occurred.
Let \(q(x)\) be a function on \(\pspX\) such that \(\mn q>0\). The abstract conditional model on the fact that \(q\) has happened is denoted \(\IM_q\) and defined as
\begin{equation}\label{eq.1.17}
\umn_q f=\max_{\lmn q \leq \mn_* q\leq \umn q}\frac{\umn_{\mn_* q}(fq)}{\mn_* q}.
\end{equation}
In the numerator of the right-hand side of~\eqref{eq.1.17} we have the average of the \(\umn_* q\)-sections of \(\IM\).
We obviously have that \(\IM_{c^+ q}=\IM_q\): multiplication of \(q(x)\) by a non-negative coefficient \(c^+\) does not change the abstract conditional model.
\pagebreak %
\emph{Let \(p(x)\) be the density of \(\IM\) with respect to \(\IM^0: p=\IM/\IM^0\). Then the corresponding abstract conditional model on \(\IM\) on the occurrence of \(q(x), \IM_q\), is equal to the abstract conditional model \(\IM^0_{qp}\) on the occurrence of \(q(x)p(x)\)}: \[p=\IM/\IM^0, \umn q>0 \Rightarrow \IM_q=\IM^0_{qp}.\]
Indeed, according to the definition of density \(\umn f=\umn^0 fp\). Applying~\eqref{eq.1.17}, the formula for the \(\mn_* q\)-section and after some manipulations, we obtain
\begin{align*}
\umn_q f&=\max_{\lmn q \leq \mn_* q \leq \umn q} \frac{\min_c [\umn(f-c)q+c\mn_* q]}{\mn_*q}\\
&=\max_{\lmn q \leq \mn_* q \leq \umn q} \frac{\min_c [\umn^0(f-c)qp+c\mn_* qp]}{\mn_*qp}=\umn^0_{qp} f,
\end{align*}
which completes the proof.
From this it follows that if \(p(x)\) is the density of \(\IM\) with respect to \(\IM^0\) then the conditional model \(\IM_B\) corresponding to the observation of the event \(B\subset\pspX\) is equal to the abstract conditional model \(\IM^0_{Bp}\) on the occurrence of \(B(x) p(x)=q(x)\): \[p=\IM/\IM^0 \Rightarrow \IM_B=\IM^0_{Bp}, \quad B\subset\pspX.\]
With \(B=\pspX\) we get \(\IM=\Im^0_p.\)
Both these points are interesting from a mathematical point of view.
Thus, the notion of a conditional model introduced in this paragraph is easily extended to fuzzy events. Let us illustrate this.
Conditional probability formula (Bayes formula) for fuzzy events. Suppose a IPT requires the computation of the probability of \(A\) if \(B\) has happened, not with total reliability but with some doubt about whether \(B\) happened at all. Instead of \(B\) we have the trait \(q(x)=\gamma B+(1-\gamma) B^c \) (a fuzzy event), where the coefficient \(\gamma\) is interpreted as the probability that \(B\) has happened. Using~\eqref{eq.1.17} and after some manipulations, we obtain the following expression, which is applicable for \(\frac{1}{2}\leq \gamma \leq 1\), for the probability of \(A\) given the occurrence of \(q\):
\[\upr_q(A)=\frac{(1-\gamma)\upr(A)+(2\gamma-1)\upr(AB)}{(1-\gamma)+(2\gamma-1[\upr(AB)+\lpr(A^c B)])}.\]
We see that if \(\gamma=1\) the result is the same as in Example~\ref{ex:1.16} and if \(\gamma=\frac{1}{2}\) the conditional probability becomes the unconditional (a priori) probability \(\upr_q(A)=\upr(A)\) for any event \(A\). This is intuitive, since \(\gamma=\frac{1}{2}\) is equivalent to \(q\equiv\frac{1}{2}\), which in turn is equivalent to \(q\equiv 1\) (since multiplying \(q\) by a constant does not change
abstract-conditional certainty) and leads to a credible event \(B=\pspX\).
The formula for the conditional probability at \(0\leq\gamma\leq\frac{1}{2}\) (corresponding to the prevailing belief that \(B^c\) rather than \(B\) occurred) can be obtained from the above replacing \(B\) by \(B^c\) and \(\gamma\) by \(1-\gamma\), and exchanging also lower and upper conditional probabilities.
\begin{conclusions}{sec:1.7}
We consider random phenomena, in the description of which, directly or indirectly, a set of mutually exclusive elementary outcomes that form the space of elementary events can be indicated.
Functions on this space are called features.
The average statistical values of features are the limits of the arithmetic mean results of independent repetitions of a phenomenon under the same conditions and can be exact (for stable phenomena) and interval (for unstable, uncertain).
Averages in the interval sense exist in a very wide area of features, which necessarily includes all limited ones.
The interval model is a set of lower and upper averages in the area of their existence, linked by the axioms of~\S\ref{sec:1.1}.
The axioms agree on the averages.
Any IM (\S\ref{sec:1.2}) is formed by a primary set of features \(\pchars\), the elements of which will predetermine the type of model, and an inconsistent (correct) assignment for primary means \(\pchars\), specifying its type.
The key is Theorem~\ref{th:1.1}, according to which averages from primary features are unambiguously continued with agreement on all features dominated by primary ones, forming the domain of existence of IM.
If among the primary signs there are unlimited, then they will give rise to unlimited signs in the field of existence.
An additional extension of the domain of existence can be performed by passing to the limit \eqref{eq:1.4} from functions truncated above and below, in the same way as the integral of unbounded functions is understood.
Interval models differ from each other in different sets of primary characteristics and different primary averages, and, as a result, in different values of averages in the area of their existence.
By the inclusions of these values, one can judge which of the IM is wider and which is less (\S\ref{sec:1.3}).
The wider the MI, the less useful data on the phenomenon it contains.
The broadest of all is the naked MI, which corresponds to the absolute absence of data (or the entire instability of the phenomenon).
The expansion of IM serves as a working tool for its simplifications.
Through the means in \S\ref{sec:1.3}, the operations of intersection of IM are defined as adding data to the already used ones and combining as scattering of data, an increase in uncertainty.
Geometrically, an IM is a polyhedron in which the primary features and their means determine the direction of the faces and their positions, respectively.
The intersection of polyhedra is their common part, therefore, it will again be a polyhedron, and the union is supposed to be extended to its convex hull to get an IM.
A special case of MI, when the probabilities of events are taken as primary, are the interval probability distributions described in \S\ref{sec:1.4}.
The primary set of RVI events is arbitrary in terms of both quantity and topology.
If this is a semiring (for example, segments of a number line) and exact probabilities are given on it, then they unambiguously continue, remaining exact, to the algebra of events (sums of segments), and their consistency is equivalent to additivity, which leads to finitely additive probability distributions.
Extending the primary set to countable algebra and setting the probabilities deliberately countably additive leads to a narrowing of the probability distributions to countably additive ones.
\pagebreak % 72
The generalization of the indicated types of probability distributions are finite and countably additive IRVs, for which, for the same primary system of events, the probabilities are set by interval.
Another type of IRI is an interval probability distribution function, the primary for which is a set of nested events; can be represented as a family of exact distribution functions, and the continuous ones of them are finitely additive probability distributions, and the discontinuous ones correspond to their groups.
Any family of probability distributions (finitely additive or countable) is some kind of IM.
On the other hand, any IM with interval means can be represented as a union of its simple components with exact means (Theorem~\ref{thm:1.3} \S\ref{sec:1.5}).
In particular, it can be a family of finitely additive probability distributions.
But not countably additive as too special to become a universal “building material” for all IM (excluding discrete spaces of elementary events).
Another IM can acquire clarity, and sometimes even physical meaningfulness by the correct selection of its presentation through some standard models.
This can be done by a good choice of the form of the corresponding functional transformation.
Another special way is to transfer the means of one IM through another (standard) using a formal density, which is understood broader than the classical probability density (\S\ref{sec:1.5}).
In the presence of a reliably accomplished event, IM is transformed into a conditional recalculation of its averages (\S\ref{sec:1.6}).
The recalculation formula~\ref{eq:1.15} is very complicated (except for the case of exact probabilities).
It is applicable to fuzzy events.
The transition to conditional models is usually accompanied by expansion.
It is possible to return from the conditional to the original model without losing anything only in rare exceptions, so conditional models do not occupy any significant place in interval methods.
\end{conclusions}
\vfill
\chapter{Combined analysis}\label{cha:2}
\section{Deterministic transformations of outcomes}\label{sec:2.1}
\subsection{Mappings}
Having thus defined interval models and their properties, let's proceed and analyse how they change, in the sense of how their features vary by means of transformations of the space \(\pspX\).
To do this, let us first classify the transformations themselves.
Consider a map between \(\pspX\) and \(\pspY\)
\[
\pspX \xrightarrow{\mathrm{s}} \pspY
\]
that records the relationship between the outcomes in \(\pspX\) and in \(\pspY\).
If the value \(y=\mathrm{s}x\) is uniquely determined by converting each outcome of \(x\in\pspX\) into the outcome \(y\in\pspY\), then the transformation is called \emph{deterministic}.
\pagebreak % 73
Once we have specified how a deterministic \(\mathrm{s}\) transforms the outcomes, we can determine how features are transformed; for the time being, we will not look at all of them, but only at a particular class - events.
There are no difficulties here: \(A \subset \pspX\) is converted into \(B=\{\mathrm{s}x: x \in A\}\).
This is a set of values that includes \(y=\mathrm{s}x\) when the value \(x\) runs through \(A\).
Written formally, \(B=\mathrm{s}A\) and we say that B is the event transformation of A.
Each event A in \(\pspX\) has its own transformation \(B\) in \(\pspY\).
% TODO: deal with as environment or section?
\textsc{Properties}. The transformation of a combination of events in \(\pspX\) is equal to the combination of the transformations in \(\pspY\): \(\mathrm{s}(A_1 \cup A_2)=\mathrm{s}(A_1) \cup \mathrm{s}(A_2)\).
The image of the empty set is the empty set: \(\mathrm{s}(\emptyset)=\emptyset\).
When \(\pspX\) is a subset of \(\pspY\), then we say that s is an embedding of \(\pspX\) into \(\pspY\).
Similarly, when \(\pspX\) coincides with \(\pspY\) (which is actually easy to accomplish by excluding from \(\pspY\) those elements that are not in the range \(\mathrm{s}\pspX\)), then the mapping from \(\pspX\) to \(\pspY\) is surjective.
The inclusion between events is preserved by the transformation: \(A_1 \subset A_2\) implies that \(\mathrm{s}(A_1) \subset \mathrm{s}(A_2)\).
Similarly, if \(\mathrm{s}\) is surjective, then \(\mathrm{s}(A^c)=(\mathrm{s}A)^c\).
With respect to intersections, the situation is displayed in Fig.~\ref{fig:2.1}.
\begin{figure}[b]
\centering
***
\caption{***}
\label{fig:2.1}
\end{figure}
Denote all elements \(x\) that are mapped into the point \(y\) by \(A_y=\mathrm{s}^{-1}y=\{x: \mathrm{s}x=y\}\).
We call this the anti-image of point \(y\).
The figure shows that the anti-image of an event \(B\) of \(\pspY\) is obtained by putting together the anti-images of its elements: \(A_B=\cup_{y \in B} A_y=\{x: \mathrm{s}x \in B\}\).
Then disjoint \(B_1, B_2\) have disjoint anti-images \(\mathrm{s}^{-1} B_1\) and \(\mathrm{s}^{-1} B_2\) and any algebraic operations performed on them are similarly performed in their anti-images.
As a consequence, the sets \(\mathrm{s}^{-1}y \subseteq \pspX\) are disjoint and \(\mathrm{s}\) can thus be regarded as a one-to-one correspondence between the elements \(y\) of \(\pspY\) and the subsets \(A_y\) of \(\pspX\), that produces the correspondence \(\cup_{y \in B} A_y \leftrightarrow B\), leading to the isomorphism between two algebras of events: the algebra of \(\events\) on \(\pspX\) generated by the events \(A_y\), and the algebra of all events on \(\pspY\) denoted by \(2^\pspY\). These algebras \(\events \leftrightarrow 2^\pspY\) are isomorphic in the sense that the algebraic relations and actions on the sets of one are mirrored in the relations and actions on the other.
\pagebreak % 74
We will generalize the concept of transformations, carrying it over to the isomorphism of two arbitrary algebras of events. For this, we will understand \(\mathrm{S}\) (denoted in contrast to the transformation by a capital letter) to be a mapping under which every point \(x\) is mapped, in general, to a subset \(\mathrm{S} x_1 = B_1\) of the space \(\pspY\), and moreover: a) for distinct \(x_1 \neq x_2\), the sets \(\mathrm{S} x_1 = B_1\) and \(\mathrm{S} x_2 = B_2\) into which these points are mapped either coincide with each other or do not intersect; b) \(\mathrm{S} \pspX = \cup \mathrm{S} x = \pspY\). The equality \(\mathrm{S} x = B_x\) is understood as a fuzziness or uncertainty about where \(x\) lies, and in this sense \(\mathrm{S}\) is a vague mapping.
The map \(\mathrm{S}\) can take any event in \(\pspX\) to an event in \(\pspY\): \(\mathrm{S} A = {\cup}_{x \in A} \mathrm{S} x\), and back: \(\mathrm{S}^{-1} B = \{x : \mathrm{S} x \subset B\}\). In our broad sense, the inverse \(\mathrm{S}^{-1}\) for any \(\mathrm{S}\) is always defined. When the map and inverse map are applied repeatedly \(S^{-1} S A\) events, in general, blur and become larger than \(A\) with the exception of certain events, which are of the most interest as these events are precisely the ones that fully characterize \(\mathrm{S}\). This is the class \(\mathcal{A}_\mathrm{S}\) of all the events in \(\pspX\) (and its image \(\mathcal{B}_\mathrm{S}\) in \(\pspY\)), which under application of the map followed by its inverse ``stays neatly in place":
\[
\mathcal{A}_\mathrm{S} = \{A : \mathrm{S}^{-1} \mathrm{S} A = A \} \overset{\mathrm{S}}{\leftrightarrow} \mathcal{B}_\mathrm{S} = \{B : B = \mathrm{S} \mathrm{S}^{-1} B\}.
\]
It is characterized by the one-to-one relationship: \(A \leftrightarrow \mathrm{S} A = B\), \(\mathrm{S}^{-1} B = A \leftrightarrow B\), which induces two isomorphic algebras \(\mathcal{A}_\mathrm{S} \leftrightarrow \mathcal{B}_\mathrm{S}\), where \(\mathcal{B}_\mathrm{S}\) is the image of \(\mathcal{A}_\mathrm{S}\). The atoms of the algebras are \(B_x = \mathrm{S} x,~ x\in \pspX\) for \(\mathcal{B}_\mathrm{S}\) and \(A_x = \mathrm{S}^{-1} \mathrm{S} x\) for \(\mathcal{A}_\mathrm{S}\). If we relabel these as \(B_z, A_z\) by joining intersecting \(B_x\) (and \(A_x\)) together with a single index \(z\), then \(\mathrm{S}\) can be interpreted as a one-to-one and onto correspondence \(A_z \leftrightarrow B_z\).
And so, any mapping \(\mathrm{S}\) induces isomorphic algebras \(\mathcal{A}_\mathrm{S}\) and \(\mathcal{B}_\mathrm{S}\) and fully defines them. At the same time, for any isomorphism, one can find a corresponding map \(S\), understanding it to be a map of events in \(\pspX\) (atoms \(A_z\)) to events in \(\pspY\) (atoms \(B_z\)).
The algebras \(\mathcal{A}_\mathrm{S}\) and \(\mathcal{B}_\mathrm{S}\) comprise those events that remain precise after transformations. The images and (anti-images) of all remaining events can be defined through them by the formulas:
\[
S A = \cap_{\substack{A' \in \mathcal{A}_\mathrm{S} \\ A' \supset A}} \mathrm{S} A';~~ S^{-1} B = \cap_{\substack{B' \in \mathcal{B}_\mathrm{S} \\ B' \supset B}} \mathrm{S}^{-1} B'.
\]
\subsection{Transformations of characteristics}
We have considered how events can be transformed with mappings. Now, we will turn to transformations of characteristics; we will establish how one characteristic changes into another under transformation.
\pagebreak % 75
A numeric function \(f(x)\), which is measured with respect to the algebra of events \(\mathcal{A}_\mathrm{S}\) induced by the mapping \(\mathrm{S}\), is called an \(\mathrm{S}\)-representable characteristic. If we understand the function \(f\) broadly as a mapping of events \(A\) to the subset of the real line \(f(A) = \{f(x) : x\in A\}\), then for \(\mathrm{S}\)-representable characteristics \(f(A) \equiv f(\mathrm{S}^{-1} \mathrm{S} A),~ \forall A\), which can equivalently be taken as their definition; these characteristics have constant value on the ``atoms" \(A_z\) of the algebra \(\mathcal{A}_\mathrm{S}\). The class of all \(\mathrm{S}\)-representable characteristics will be denoted as \(\mathcal{F}_\mathrm{S}\). Clearly, all indicator functions of the events in \(\mathcal{A}_\mathrm{S}\) are in this class.
The class \(\mathcal{F}_\mathrm{S}\) is linear and closed with respect to any arithmetic operations, i.e. a transformation \(F(f_1, f_2, \dots)\) of characteristics from this class goes to a characteristic within the class itself (infinite-valued characteristics are allowed).
Every \(\mathrm{S}\)-representable characteristic has its own \textit{image} \(\phi(y) = f(\mathrm{S}^{-1} y)\) (or \(\phi(B) = f(\mathrm{S}^{-1} B)\)), which is itself a \(\mathcal{B}_\mathrm{S}\)-measurable function, or, equivalently, an \(\mathrm{S}^{-1}\)-representable characteristic \(\phi(B) \equiv \phi(\mathrm{S} \mathrm{S}^{-1} B)\). The class of all such characteristics will be denoted as \(\Phi_{\mathrm{S}^{-1}}\).
The class \(\mathcal{F}_\mathrm{S}\) and its image \(\Phi_{\mathrm{S}^{-1}}\) are related by a bijection: \(\mathcal{F}_\mathrm{S} \leftrightarrow \Phi_{\mathrm{S}^{-1}}\), which preserves the order of functions: \(f_1 \geq f_2 \iff \phi_1 \geq \phi_2\), where \(f_1 \leftrightarrow \phi_1\), \(f_2 \leftrightarrow \phi_2\); and also preserves the identity of arithmetic operations: \(F(f_1, f_2, \dots ) \leftrightarrow F(\phi_1, \phi_2, \dots)\). For example, the image of the linear combination \(c + \sum_i c_i f_i(x),~f_i \in \mathcal{F}_\mathrm{S}\) will be the linear combination \(c + \sum_i c_i \phi_i(y),~\phi_i \in \Phi_{\mathrm{S}^{-1}},~f_i \leftrightarrow \phi_i \).
We note that if \(y = s x\) is a deterministic transformation, then \(\Phi_{\mathrm{S}^{-1}}\) is in general the set of any functions of the variable \(y\), while \(\mathcal{F}_\mathrm{S}\) is the set of functions of the form \(f(x) = \phi(s x)\).
All \(\mathrm{S}\)-representable characteristics and their images give the ``directions," roughly speaking, according to which the calculations of means will be made in IM under the mapping \(\mathrm{S}\).
\subsection{Calculation of means}
Suppose that on \(\pspX\) an IM \(\IM^x\) is given with domain \(\mathcal{F}\) of existence of upper means \(\umn f,~f \in \mathcal{F}\), and that \(\mathrm{S}\) is a mapping from \(\pspX\) to \(\pspY\). It is required to calculate the corresponding IM \(\IM^y\) on \(\pspY\), which we will write as \(\IM^y = \mathrm{S} \IM^x\). What kind of data about the means \(\umn \phi\) on \(\pspY\) will there be? What will be lost in the process of mapping?
For \(\mathrm{S}\)-representable characteristics \(f\) precisely nothing is lost:
\begin{equation}\label{eq.2.1}
\umn \phi = \umn f,~~\phi \leftrightarrow f \in \mathcal{F}_\mathrm{S} \cap \mathcal{F}
\end{equation}
and means simply carry over to their images. This is clear, since the values of \(\phi(y)\) are fully repeated in those of \(f(x) = \phi(\mathrm{S} x)\).
\textit{The means will be primary for the transformed IM \(\IM^y = \mathrm{S} \IM^x \)} \eqref{eq.2.1}. They are coherent on \(\mathcal{B}_\mathrm{S}\) by way of their coherency on \(\mathcal{A}_\mathrm{S}\) and the isomorphism between these algebras.
The sequence of steps for defining \(\IM^y\) are as follows:
\begin{enumerate}
\item at first, there is only information about the means in the subclass of \(\mathrm{S}\)-representable characteristics \(\mathcal{F}_\mathrm{S}\), which corresponds to the \(\mathcal{F}_\mathrm{S}\)-extension of \(\IM^x\);
\item the means from \(\mathcal{F}_\mathrm{S}\) carry over to their images using formula \eqref{eq.2.1}, forming the primary values for \(\IM^y\);
\item finally, the primary values are continued to arbitrary characteristics.
\end{enumerate}
It follows from what has been said that whether \(\IM^x\) or its \(\mathcal{F}_\mathrm{S}\)-extension \(\langle \umn^x \mathcal{F}_\mathrm{S} \rangle \) is given on \(\pspX\), the result \(\IM^y = \mathrm{S} \IM^x\) will be the same. Only the means of \(\mathrm{S}\)-representable characteristics take part in the calculation of \(\IM^y\); the others means ``blur", they lose their own means and become ``subordinate" to the set \(\umn \mathcal{F}_\mathrm{S} = \{\umn f_\mathrm{S} : f_\mathrm{S} \in \mathcal{F}_\mathrm{S}\}\).
Clearly, there will be no such loss in the case that \(\umn \mathcal{F}_\mathrm{S}\) precisely defines a model \(\IM^x\), i.e. its primary characteristics \(g \in \mathcal{G}\) are all \(\mathrm{S}\)-representable: \(\mathcal{G} \subset \mathcal{F}_\mathrm{S}\). Then the \(\mathcal{F}_\mathrm{S}\)-extension of \(\mathcal{M}^x\) coincides with \(\mathcal{M}^x\) itself, and the primary characteristics of \(\IM^y\) will be the images of \(g \in \mathcal{G}\) with the same means as in \(\IM^x\), so that the primary characteristics and means of \(\IM^x\) and \(\IM^y\) are in identity correspondence. The calculation of means then takes places not in three steps, as shown above, but in two, since here \(\mathcal{F}_\mathrm{S}\) may be replaced by \(\mathcal{G}\) and the first step becomes degenerate.
\begin{example}
Let \(\pspX = \reals\), and assume that \(y = x^2\) is the transformation. Then \(\pspY = \reals^+\), the half-line, and the means \(\umn \phi(Y)\) (for convenience random variables are denoted with capital letters) that define \(\IM^y\) are expressed via the means of \(\IM^x\) according to the formula \eqref{eq.2.1}: \(\umn \phi(Y) = \umn \phi(X^2)\). For example, \(\umn Y = \umn X^2\), \(\umn Y^2 = \umn X^4\), \(\umn \cos Y = \umn \cos X^2\), etc. The right hand sides are calculated on the basis of the primary data about \(\IM^x\). There will be no loss upon transformation only if the primary characteristics of \(\IM^x\) are themselves of the form \(x^2\). Let's say that \(\IM^x\) is defined by the values \(\umn X^2, \umn X^4\). Then \(\IM^y\) will be defined by the primary values \(\umn Y = \umn X^2\) and \(\umn Y^2 = \umn X^4\), and the remaining \(\umn \phi(Y)\), for example \(\umn \cos Y\), are calculated from them or carry over directly as means of the images \(\umn \phi(X^2)\), for example \(\umn \cos X^2\).
\end{example}
\begin{example}[Deterministic transformations of a random process]
Let \(X_t\) be a process defined by its means \(\umn f \{X_t\}\), where \(f\{X_t\}\) are functionals on \(X_t\), and let \(\mathrm{S}\) be the transformation \(Y_t = \mathrm{S} X_t\). This may be a nonlinear transformation \(Y_t = v(X_t)\), for example raising to the second power \(Y_t = X_t^2\), a restriction, a linear transformation, or a combination of these, for example \(Y_t = \int v(X_\tau) h_{t, \tau} d\tau\)---a non-linear \(v\) followed by a filter. Strictly speaking, the form of the transformation is not important for calculating \(\umn \phi \{Y_t\}\), which is quite simple: these means coincide precisely with the means \(\umn \phi \{\mathrm{S} X_t\}\) by way of the \(\mathrm{S}\)-representable functionals \(f\{X_t\} = \phi\{\mathrm{S} X_t\}\): that is, \(\umn \phi \{Y_t\} = \umn \phi \{\mathrm{S} X_t\}\). So under the transformation of the ``non-linearity-then-filter" type introduced just above, we have:
\begin{align*}
\umn Y_t &= \umn \int v(X_\tau) h_{t, \tau} d\tau,~~\umn Y_t^2 = \umn [\int v(X_\tau) h_{t, \tau} d\tau]^2; \\
\umn \int Y_t dt &= \umn \int \int v(X_\tau) h_{t, \tau} d\tau dt,
\end{align*}
etc. From this follows the calculation of the corresponding means of the remaining characteristics of the process \(Y_t\), as with the primary characteristics according to the continuation theorem.
\end{example}
\subsection{Similarity of models}
We will discuss mappings, which do not lead to a loss of data relative to the model. It is absolutely clear that there should be no loss if \(\mathrm{S}\) is a one-to-one map from \(\pspX\) to \(\pspY\), since it is always possible to return to the original space and original model with the inverse map \(\mathrm{S}^{-1}\) as if there had been no mapping. The question becomes not so trivial if \(\mathrm{S}\) reduces the space, taking \(\pspX\) to a space \(\pspY\) of lesser dimension.
A map \(\mathrm{S}\) is called a \textit{similarity transformation} (or simply similarity) for \(\IM^x\), if \(\IM^x = \mathrm{S}^{-1} \mathrm{S} \IM^x\), that is, if upon mapping the space \(\pspX\) with \(\mathrm{S}\) to \(\pspY\) and then back with \(\mathrm{S}^{-1}\), we return to the same model. In this scenario \(\mathrm{S}^{-1}\) will also be a similarity transform for \(\IM^y = \mathrm{S} \IM^x\), since \(\mathrm{S} \mathrm{S}^{-1} \IM^y = \IM^y\).
Two models \(\IM^x\) and \(\IM^y\), related to one another by a similarity, are called \textit{similar} and are denoted as \(\IM^x \sim \IM^y\).
\textit{The mapping \(\mathrm{S}\) will be similar for \(\IM^x = \langle \umn^x \mathcal{G} \rangle\) if all primary characteristics are \(S\)-representable: \(g(x) = \psi_g(\mathrm{S} x),~\forall g \in \mathcal{G}\). In this case, \(\IM^x\) maps to its similar model \(\IM^y = \langle \umn^x \psi\rangle\), defined by the primary means: \(\umn^y \psi_g = \umn^x g\) and \(\Psi = \{\psi_g : g \in \mathcal{G}\}\)}.
This statement is a repetition of the discussion at the end of the previous section.
On the other hand, if \(\mathrm{S}\) is similar for \(\IM^x = \langle \umn^x \mathcal{G}\rangle\), then, replacing \(\mathcal{G}\) with the subset \(\mathcal{G}_\mathrm{S}\) of all \(S\)-representable primary characteristics \(\mathcal{G}_\mathrm{S} \subset \mathcal{G}\), we do not change the model: \(\IM^x = \langle \umn^x \mathcal{G}_\mathrm{S}\rangle\). Then the class \(\slhull \mathcal{G}_\mathrm{S}\) of secondary characteristics will be \(\mathrm{S}\)-representable and will map one-to-one, while preserving order relationships and the values of means, to the class \(\slhull \Psi_\mathrm{S}\), \(\Psi_\mathrm{S} = \{\psi_g : g \in \mathcal{G}_\mathrm{S}\}\), of secondary characteristics of the similar \(\IM^y = \mathrm{S} \IM^x\), while \(\slhull \mathcal{G}_\mathrm{S}\) and \(\slhull \Psi_\mathrm{S}\), in turn, will be subclasses of \(\mathcal{F}_\mathrm{S}\) and \(\Phi_{\mathrm{S}^{-1}}\), respectively, the classes of conceivable functions sufficient for calculating the models \(\IM^y\) via \(\IM^x\) under the transformation \(\mathrm{S}\). From here, if \(\IM^x\) and \(\IM^y\) are similar, then the classes \(\mathcal{F}_\mathrm{S}\) and \(\Phi_{\mathrm{S}^{-1}}\) (as with \(\slhull \mathcal{G}_\mathrm{S}\), \(\slhull \Psi_\mathrm{S}\)) map to one another one-to-one with mirroring arithmetic operations, while preserving order and means: \(f \leftrightarrow \phi\), \(\umn^x f = \umn^y \phi\), \(f \in \mathcal{F}_\mathrm{S}\), \(\phi \in \Phi_{\mathrm{S}^{-1}}\).
The relationship of similarity of models is reflexive: \(\IM^x \sim \IM^x\), symmetric: \(\IM^x \sim \IM^y \implies \IM^y \sim \IM^x\), and transitive \(\IM^x \sim \IM^y,~ \IM^y \sim \IM^z \implies \IM^x \sim \IM^z\). Similar models have identical sizes and strictly corresponding boundaries, and they can be depicted with identical geometric figures (though on different spaces).
\begin{example}
Let \(\mathcal{X}\) be partitioned into disjoint events: \(\mathcal{X} = A_1 + A_2 + \dots + A_k\) given an interval probability distribution IPD with lower and upper probabilities \(\undertilde{p}_i = \undertilde{P}(A_i)\), \(\tilde{p}_i = \tilde{P}(A_i)\). This model is similar to the IPD on the \(k\)-element space \(\mathcal{Y}_k = \{y_1, \dots, y_k\}\) with the same probabilities \(\undertilde{P}(y_i) = \undertilde{p}_i\), \(\tilde{P}(y_i) = \tilde{p}_i\). The map that induces the similarity will be \(A_i \overset{\mathrm{S}}{\rightarrow} y_i\) and its inverse will be the isomorphism \(y_i \overset{\mathrm{S}^{-1}}{\rightarrow} A_i\). The class \(\mathcal{F}_\mathrm{S}\) is made up of functions of the form \(c + \sum c_i A_i(x)\), and \(\Phi_{\mathrm{S}^{-1}}\) of the form \(c + \sum c_i \delta_{y_i}(y)\).
\end{example}
\pagebreak % 78
Similarity is preserved by algebraic operations on models:
\[
\IM_\theta^x \sim \IM_\theta^y,~~\forall \theta \implies \bigwedge_\theta \IM_\theta^x \sim \bigwedge_\theta \IM_\theta^y,~~\bigvee_\theta \IM_\theta^x \sim \bigvee_\theta \IM_\theta^y.
\]
\section{Random transformations}\label{sec:2.2}
\subsection{Transitional models} In the preceding paragraph, the maps under consideration from \(\pspX\) to \(\pspY\) and their more general forms were isomorphisms, under which every point \(x\) with certainty \(1\) is taken to the set \(B_x \subset \pspY\), and \(B_x\) corresponding to different \(x\)'s either coincide or do not intersect.
Consider the general case, when \(x\) in principle can go to any point \(y \in \pspY\), and knowledge about it is average in nature. Such transformations are called random: \(\pspX \overset{\mathrm{Q}}{\rightarrow} \pspY\). For random transformations \(\mathrm{Q}\), some IM \(\IM_x^y\) on \(\pspY\) is specified by each point by each point \(x\), which is called \textit{transitional} from \(\pspX\) to \(\pspY\) and which is defined by its mean values \(\umn_x^y \phi(y)\). The way the model is specified is exactly as with any IM, specifically it is specified by the primary means \(\umn_x^y \psi,~\psi \in \Psi\), where the \(\psi\) are the primary characteristics of \(\pspY\), the means of which and the form of which will in general depend on \(x\).
In the special case that \(\IM_x^y\) is an interval probability distribution, the events on \(\mathcal{Y}\) will be the primary characteristics and the transitional IM will be fully defined by the upper transitional probabilities
\[
\overline{q}(x, B) = \upr_x^y(B),~B \subset \mathcal{Y},
\]
which indicate the greatest probability with which the point \(x\) will go to the event \(B\) under the transformation \(\mathrm{Q}\). Clearly, \(\underline{q}(x, B) = 1 - \upr_x^y(B^c)\), i.e. the smallest transitional probabilities can be immediately obtained from the upper ones.
\subsection{Transformations of models} Now assume that there is an IM \(\IM^x\) on \(\pspX\) and a transitional model \(\IM_x^y\) on \(\pspY\), which describes the random transformation \(\mathrm{Q}\) from \(\pspX\) to \(\pspY\). Then the corresponding \(\IM^y\) on \(\mathcal{Y}\) will be defined by the means
\begin{equation} \label{eq.2.2}
\umn^y \phi (y) = \umn^x (\umn_x^y \phi (y)).
\end{equation}
Here, \(\umn_x^y \phi (y)\) is a mean by way of the model \(\IM_x^y\), and as functions of \(x\) these means are in turn characteristics on \(\pspX\), the means of which are obtained from \(\IM^x\), denoted by \(\umn^x\), and they yield the same values as \(\umn^y \phi\), defined by \(\IM^y\). The result of the transformation is written as: \(\IM^y = \mathrm{Q} \IM^x\).
\pagebreak %
If \(\mathcal{F}^x\) and \(\mathcal{F}_x^y\) are the spaces of existing means for \(\IM^x\) and \(\IM_x^y\), then for \(\IM^y\) this space will consist of those functions \(\phi(y) \in \mathcal{F}_x^y\) that satisfy \(\umn_x^y \phi(y) \in \mathcal{F}^x\). Here, of course, any bounded functions may be included.
Consider the example of the consistency of calculations in formula \eqref{eq.2.2}.
\stepcounter{example} % NOTE: in the original, an example number seems to be skipped; let's do the same
\begin{example}
Assume that \(\IM^x = \langle \umn^x g \rangle\) has only a single primary mean \(\umn^x g\), and that, likewise, the transitional model \(\IM_x^y = \langle \umn_x^y \psi \rangle\) has only one mean \(\umn_x^y \psi(y) = \overline{h}(x)\). Then according to formula \eqref{eq.2.2}, for a characteristic \(\phi(y)\) we have
\[
\umn^y \phi = \umn^x \umn_x^y \phi = \min_{c + c_1^+ g(x) \geq \umn_x^y \phi}[c + c_1^+ \umn^x g],
\]
where \(\umn_x^y \phi = \min_{d + d_1^+ \psi(y) \geq \phi(y)}[d + d_1^+ \overline{h}(x)]\).
The mean \(\umn^y \phi\) will be defined if \(\umn_x^y \phi\) is defined and dominated by the linear combinations \(c + c_1^+ g(x)\). So for any \(\phi\), the transitional mean \(\umn_x^y \phi\) as a function of the variable \(x\) will be proportional to \(\overline{h}(x)\). Thus, the only thing needed from \(\IM^x\) will be knowledge of \(\umn^x \overline{h}(x)\), i.e. \(\overline{h}(x)\) is its extension. This fact does not carry over to the case of several primary values \(\umn_x^y \psi_j = \overline{h}_j(x)\), as \(\umn_x^y \phi\) will not in general be linear combinations of the \(\overline{h}_j(x)\).
We will explain why this is the case. Assume that there is not one, but rather there are, say, two primary characteristics \(\psi_1\) and \(\psi_2\) of the transitional model with corresponding upper means \(\overline{h}_1(x)\) and \(\overline{h}_2(x)\). Then
\[
\umn_x^y \phi = \min_{d+d_1^+ \psi_1(y) + d_2^+ \psi_2(y) \geq \phi(y)}[d + d_1^+ \overline{h}_1(x) + d_2^+ \overline{h}_2(x)]
\]
and the right hand side will not be an expression of the form of a linear combination of \(\overline{h}_1(x)\) and \(\overline{h}_2(x)\) because the values of the coefficients \(d\) and \(d_j^+\) at which the minimum achieved may depend on \(x\).
\end{example}
In this example, we introduced the case when \(\umn_x^y \phi\), under arbitrary \(\phi\), forms the entire bounded class \(\mathcal{F}_*^x\) of characteristics, and further it is always the closure \([\slhull \mathcal{H}]\) of the semilinear hull \(\slhull \mathcal{H}\) of some set \(\mathcal{H}\). Thus, for the calculation of \(\IM^y = \mathrm{Q} \IM^x\) from \(\IM^x\) one only needs knowledge of \(\umn^x f,~f \in \mathcal{F}_*^x\), and the \(\mathcal{F}_*^x\)-extension of \(\IM^x\) will not change \(\IM^y : \mathrm{Q} \IM^x = \mathrm{Q} \langle \umn^x \mathcal{F}_*^x\rangle\).
Let us pause here and look at the most extreme case first. Assume the transitional IM, which does not depend on \(x : \IM_x^y = \IM_0\). Then \(\umn_x^y \phi\) will be constants, which do not depend on \(x\), and the class \(\mathcal{F}_*^x\) is degenerate at some constant \(c\). It so happens that one does not need to know anything about \(\IM^x\) in order to calculate \(\IM^y: \IM^y = \IM_0\). This is the case when data about the statistical properties on \(\pspY\) do not change if the ``input" \(x\) is known.
Let us now consider a different case. Assume the transitional model \(\IM_x^y\) on disjoint events \(A_j\) of the partition \(\mathcal{A}_\Sigma = \{A_1, \dots, A_k\}\) of the space \(\pspX\). It is sufficient that the primary characteristics, which define the transitional IM, were constant on the \(A_j\), and thus \(\umn_x^y \phi\) will be constant for \(x \in A_j\), i.e. \(\mathcal{A}_\Sigma\)-measurable, so that the class \(\mathcal{F}_*^x\) can be obtained. The model \(\IM^x\), without prejudice to \(\IM^y = \mathrm{Q} \IM^x\) may be extended to the IM, whose primary elements are the means \(\umn \sum c_i A_i(x)\) of all \(\mathcal{A}_\Sigma\)-measurable functions (but not to the IRB with primary events from \(A_\Sigma\) or their unions).
Deterministic transformations are a special case of random transformations, when transitional models \(\IM_x^y = \langle P(B_x) = 1\rangle\) become naked on the mutually disjoint \(B_x\), i.e. \(x \rightarrow B_x\) deterministically. These transformations relate to the previous type if, for some \(B_x\), we denote \(A_z = \{x : x\rightarrow B_z\}\), and thus \(A_z \leftrightarrow B_z\).
\subsection{Properties of model transformations} Let us consider the properties possessed by the operation of transformation of this kind of statistical recalculation of models: \(\IM^y = \mathrm{Q} \IM^x\). Assume that there are no data on \(\pspX\), meaning the IM \(\mathcal{J}^x\) is bare. Then \(\IM^y = \mathrm{Q} \mathcal{J}^x\) will be defined by the means:
\[
\umn^y \phi(y) = \sup_x \umn_x^y \phi(y)
\]
and the result is written as
\[
\mathrm{Q} \mathcal{J}^x = \bigvee_{x \in \pspX} \IM_x^y.
\]
This means that the transformation of the bare IM is equivalent to the union of the transitional IMs with input \(x\), which is regarded as a parameter.
We will fix this result by giving it an extended description: \textit{the union of the IM \(\IM_\theta\) over the parameter \(\theta \in \Theta\) can equivalently be regarded as the result of a random transformation of the set \(\Theta\), for which there is no prior data, to the outcome space, while \(\IM_\theta\) may be regarded as a model of this transformation.}
The lack of prior data about \(x\) will yield the broadest model on the output, so in general \(\mathrm{Q} \IM^x \subset \mathrm{Q}\mathcal{J}^x\).
But what if the transitional model \(\IM_x^y\) is bare for all \(x\), i.e. there are no data about the transformation? Then the transformation of any \(\IM^x\) yields a bare model, which does not carry any statistical data concerning the outcomes on \(\pspY\):
\[
\IM_x^y = \mathcal{J}^y \implies \mathrm{Q} \IM^x \equiv \mathcal{J}^y,~\forall \IM^x.
\]
This fact follows from the formula \eqref{eq.2.2}: \(\umn^y \phi(y) = \umn^x \sup \phi(y) = \sup \phi(y)\).
We will not consider how the inclusion relations and union/intersection operations of models on \(\pspX\) transform under a random transformation \(\mathrm{Q}\).
\begin{enumerate}
\item The inclusion relation is preserved under transformations:
\[
\IM_1^x \subset \IM_2^x \implies \mathrm{Q} \IM_1^x \subset \mathrm{Q} \IM_2^x.
\]
\item The transformation of unions of models on \(\pspX\) is equal to the union of their transformations:
\[
\mathrm{Q}(\bigvee_v \IM_v^x) = \bigvee_v \mathrm{Q} \IM_v^x.
\]
\pagebreak %
In particular, if \(\IM^x = \bigvee_v \mathcal{P}_v\) appears as a union of vertices, then the transformation of \(\IM^x\) will appear as a union of transformed vertices:
\[
\mathrm{Q} \IM^x = \bigvee_v \mathrm{Q} \mathcal{P}_v.
\]
\item The transformation of the intersections of models is contained in the intersection of their transformations:
\[
\mathrm{Q}(\bigwedge_v \IM_v^x) \subset \bigwedge_v \mathrm{Q} \IM_v^x
\]
The fact that this is an inclusion rather than an equality is due to the weakening of the order between characteristics under random transformations. In particular (when $v$ is an index for primary characteristics), we obtain
\[
\mathrm{Q} \langle \umn_\pchars^x \rangle = \mathrm{Q} \bigwedge_{g \in \pchars} \langle \umn^x g\rangle \subset \bigwedge_{g \in \pchars} \mathrm{Q} \langle \umn^x g\rangle.
\]
\item The following inclusion is valid:
\[
(\bigvee_\theta \mathrm{Q}_\theta) \IM^x \supset \bigvee_\theta \mathrm{Q}_\theta \IM^x,
\]
where \(\mathrm{Q} = \bigvee_\theta \mathrm{Q}_\theta\), the transformation corresponding to the union of transitional IMs: \(\IM_x^y = \bigvee_\theta \IM_{x, \theta}^y\).
In fact, \(\umn^y \phi = \umn^x \sup_\theta~ \umn_{x, \theta}^y \phi \geq \sup_\theta~ \umn^x \umn_{x, \theta}^y = \sup_\theta~ \umn_\theta^y \phi\).
\end{enumerate}
As an example, we will give a geometric illustration of what has been said.
\begin{example}
Assume that the three-point set \(\pspX = \{x_1, x_2, x_3\}\) is mapped to itself by the random transformation \(\mathrm{Q}\). For each value \(x_i\), the transitional model \(\IM_x^y\) is a complete set \(\mathrm{Q}_i\) of probability vectors \(\mathbf{q}^\top = (q(x_i, y_1), q(x_i, y_2), q(x_i, y_3))\), so \(\umn_{x_i}^y \phi = \max_{q \in \mathrm{Q}_i}\mathbf{q}^\top \phi\), where the function \(\phi\) is a vector. The set \(\mathrm{Q}_i\), as shown in \ref{fig:2.2} (a), is a result of transforming models \(\langle P(x_i) = 1\rangle,~i=1, 2, 3\), which are the vertices of \(\probs\). Transforming the probabilities \(\mathbf{P} = (P(x_1), P(x_2), P(x_3))\) leads to the family of vectors:
\[
\mathrm{Q} \mathbf{P} = \left\{\mathbf{P}^y ~ : ~ P^y(y_j) = \sum_{i=1}^3 P(x_i)q(x_i, y_j),~q \in Q_i \right\}.
\]
As seen in Figure \ref{fig:2.2} (b), \(\mathrm{Q} \IM^x\) is formed from the transformed vectors of probability, which lie on the vertices of \(\IM^x\) and which are ``driven" into the interior of the transformation \(\mathrm{Q} \probs^x\) of the bare IM. The essence of property 3 above is revealed in the inclusion: \((\mathrm{Q} \mathbf{P}_2 \vee \mathrm{Q} \mathbf{P}_3) \wedge (\mathrm{Q} \mathbf{P}_3 \vee \mathrm{Q} \mathbf{P}_1) \supset \mathrm{Q} \mathbf{P}_3\).
\end{example}
\begin{figure}
\centering
***
\caption{Images of models under random transformations.}
\label{fig:2.2}
\end{figure}
\subsection{Indicator transformations, interval arithmetic} Consider the special class \(\IM^x\) and \(\IM_x^y\). Assume that \(\IM^x\) is given with a single sentence: ``the event \(A\) from \(\pspX\) is reliable." This will yield the indicator model; \(\IM^x = \langle P_x(A) = 1\rangle\), which is defined by the coinciding means \(\umn^x f(x) = \sup_{x \in A} f(x) = \overline{f}(A),~\forall f \in \bchars_0\). Assume a random operator reliably takes each point \(x\) to the event \(B_x\) in \(\pspY\), which is described by \textit{indicator transitional models} \(\IM_x^y = \langle P^y (B_x) = 1\rangle\) defined by the means: \(\umn_x^y \phi(y) = \overline{\phi}(B_x),~ \forall \phi \in \bchars_0\) (here it is not required, as with deterministic isomorphic mappings, that the \(B_x\) are mutually disjoint). Then according to \eqref{eq.2.2}
\[
\umn^y \phi(y) = \sup{x \in A}(\sup_{y \in B_x}\phi(y)) = \overline{\phi}(\bigcup{x \in A} B_x),
\]
i.e. the resulting \(\IM^y\) will also be \(B\)-indicator with events \(B = \cup_{x \in A} B_x ~:~ \IM^y = \langle P^y(B) = 1\rangle\).
In this way, indicator transformations take indicator models to themselves, remaining within this very simple class. But in fact, direct transformations are produced from some events to others. In the case that $\pspX = \pspY = \reals$, $A$ and $B_x$ are intervals on these spaces, and the transitional models reflect the simplest arithmetic operations, we obtain the rules of transforming intervals---interval arithmetic.
\textsc{Addition}. Let \(\IM^x\) be an indicator model on the segment \(A = [a, b]\) and let the transitional operator add to the number \(x\) the segment \([c, d]\), i.e. \(\IM_x^y\) are the indicator models on the segments \(B_x = x + [c, d] = [x+c, x+d]\). Then the indicator for \(\IM^y\) will be the segment \(B = [a, b] + [c, d] = [a+c, b+d]\), which gives us a rule for interval addition.
\textsc{Subtraction}. Analogous to the previous \([a, b] - [c, d] = [a - d, b - c]\). Under addition and subtraction, the width of the resulting segment is equal to the sum of the constituent segments.
\textsc{Multiplication}. With the same model \(\IM^x\), the transitional operator multiplies each number \(x\) to a segment, which produces the segment \(B_x = x[c, d]= [xc, xd]\) when \(x \geq 0\) and \(B_x = [xd, xc]\) when \(x < 0\). Taking the union of the \(B_x\) for \(x \in [a, b]\), we obtain the result of multiplying an interval by another interval \(B = [a,b] \times [c, d] = [\min \{ac, ad, bc, bd\}, \max\{ac, ad, bc,bd\}]\), given on the indicator model \(\IM^y\).
\pagebreak % 83
\textsc{Division}. With the same model \(\IM^x\), the operator carries out division such that when \(x \geq 0\) we have
\[
B_x = x / [c, d] = \left\{\begin{array}{cc}
\left[\frac{x}{d}, \frac{x}{c}\right] & \text{if } c \leq d < 0, \\
& \\
\left( -\infty, \frac{x}{c}\right] \cup \left[\frac{x}{d}, \infty\right) & \text{if } c \leq 0 \leq d, \\
& \\
\left[ \frac{x}{d}, \frac{x}{c} \right] & \text{if } 0 < c \leq d,
\end{array} \right.
\]
and if \(x < 0\), \(d\) is replaced with \(c\) in the inequalities on the right hand side above.
Taking the union over \(x \in [a, b]\) yields the result of interval division:
\[
[a, b] / [c, d] = \left\{ \begin{array}{cc}
[a, b] \times [1/d, 1/c], & 0 \notin [c, d], \\
(-\infty, \max\{a/c, b/d\}] \cup [\min\{a/d, b/c\}, \infty), & c \leq 0 \leq d.
\end{array} \right.
\]
In the first case when \(0 \notin [c, d]\), the result is expressed via interval multiplication, while in the second case it is expressed as the union of segments, and here division results not in one interval but in two half-lines.
Under both addition and subtraction, an interval becomes wider. Clearly, since fuzzy transformations can only add more ambiguity, the operations of interval addition and subtraction are not mutual inverses; for instance \([a,b] + [c, d] - [c, d] = [a+c -d, b+d-c]\) (except when \(c=d\)). This is the same with the operations of multiplication and division.
Repeated operations lead to further expansion of the resulting intervals. When splitting intervals with the operation of division, subsequent operations must be performed separately on each part, and their union must be taken later.
\subsection{Simple transformations} In the discussion of deterministic mappings \(\mathrm{S}\) in the previous paragraph, the one-to-one connection between the classes \(\mathrm{S}\) and \(\mathrm{S}^{-1}\) of conceivable characteristics \(\mathcal{F}\mathrm{S} \leftrightarrow \Phi_{\mathrm{S}^{-1}}\)---one class on \(\pspX\), the other on \(\pspY\)---served as the basis for finding \(\IM = \mathrm{S} \IM^x\), and means were carried over from \(\mathcal{F}_\mathrm{S}\) to \(\Phi_{\mathrm{S}^{-1}}\), defining \(\IM^y\). We will show that an analogous connection exists for several types of random transformations (which generalize isomorphisms). Let us define them.
Let \(\Phi\) be the set of characteristics on \(\pspY\). We call random transformations from \(\pspX\) to \(\pspY\) \(\Phi\)-\textit{simple} when they are given precisely by the means \(\lmn_x^y \phi = \umn_x^y \phi = \mn_x^y \phi,~\forall \phi \in \Phi\); these are primary, so \(\IM_x^y = \langle \mn_x^y \Phi \rangle\).
In light of the fact that precise means cover the linear hull, remaining precise on it, \(\Phi\)-simple and \(\lhull \Phi\)-simple transformations are essentially the same; thus it is convenient to consider \(\Phi\) a linear subclass of characteristics on \(\pspY\).
\pagebreak %
Under \(\Phi\)-simple transformations, a characteristic \(\phi \in \Phi\) on \(\pspY\) is assigned to a corresponding characteristic \(f_\phi\) on \(\pspX\) via
\[
f_\phi(x) = \mn_x^y \phi (y),
\]
which is called the \textit{pre-image} of \(\phi\), so \(\phi \rightarrow f_\phi\). Means of characteristics \(\phi \in \Phi\) and of their images \(f_\phi\) are the same:
\begin{equation}\label{eq.2.3}
\umn^y \phi(y) = \umn^x \mn_x^y \phi(y) = \umn^x f_\phi(x).
\end{equation}
We will denote the linear class by
\[
\mathcal{F}_\Phi = \{f_\phi(x),~\phi \in \Phi\}
\]
and we will call it the \textit{pre-image} of \(\Phi\). In this way, a \(\Phi\)-simple transformation \(\mathrm{Q}\) induces the mapping \(\Phi \rightarrow \mathcal{F}_\Phi\) of characteristics on \(\pspY\) to their images on \(\pspX\). Conversely, each characteristic \(f \in \mathcal{F}_\Phi\) will correspond to a subset \(\Phi_f\) of characteristics \(\phi \in \Phi\) for which \(f\) is their image:
\[
f \rightarrow \Phi_f = \{\phi : \phi \in \Phi,~\mn_x^y \phi(y) = f(x)\}.
\]
The set \(\Phi_f\) is called the \textit{fuzzy image of the characteristic \(f\)} with respect to \(\mathrm{Q}\). Every characteristic \(f\) from \(\mathcal{F}_\Phi\) has a corresponding image, and taking the union of these images yields \(\Phi\). This is illustrated in Figure \ref{fig:2.3}.
\begin{figure}
\centering
***
\caption{***}
\label{fig:2.3}
\end{figure}
If \(\phi_1 \geq \phi_2\), then \(f_{\phi_1} = \mn_x^y \phi_1 \geq \mn_x^y \phi_2 = f_{\phi_2}\). It follows that the order correspondence between characteristics \(\phi \in \Phi\) entails the same relationship between pre-images. This is not always the case for images: it may be that \(f_1 \geq f_2,~f_1,f_2 \in \mathcal{F}_\Phi\), while \(\phi_1 \geq \phi_2\) does not hold for \(\phi_1 \in \Phi_{f_1},~\phi_2 \in \Phi_{f_2}\). The conclusion is that order relations within \(\Phi\) are weaker than within \(\mathcal{F}_\Phi\), and, correspondingly, \textit{random transformations, even simple ones, violate order between characteristics}.
Let's consider what happens with \(\IM^y = \mathrm{Q} \IM^x\) under simple transformations. First of all, if \(\IM^x\) is a \(\pchars\)-precise model (\(\lmn^x g = \umn^x g,~g \in \mathcal{G}\)) and \(\pchars \in \mathcal{F}_\Phi\), then \(\IM^y\) will be \(\Phi_\pchars = \bigcup_{f \in \pchars} \Phi_f\)-precise.
Now assume that \(\IM^x = \langle \tilde{\mn}^x \pchars \rangle\) is not precise and is given on the primary set of characteristics \(\pchars\). Also assume that \(\pchars \subset \mathcal{F}_\Phi\). Since \(\mathcal{F}_\Phi\) is a linear class, we have \(\lhull \pchars \subset \mathcal{F}_\Phi\). What will \(\IM^y = \mathrm{Q}\IM^x\) be here? According to equation \eqref{eq.2.3}, \(\umn^y \phi\) on characteristics from \(\Phi\) are transferences of means from the pre-images of these characteristics, i.e. from characteristics from the class \(\mathcal{F}_\Phi\). Extending further to arbitrary characteristics on \(\pspY\) follow from equation \eqref{eq.2.2}
\[
\umn^y h(y) = \umn^x [ \inf_{h \leq \phi \in \Phi} \mn_x^y \phi ] = \umn^x [ \inf_{h \leq \phi \in \Phi} f_\phi (x) ],
\]
where the infimum is taken over \(\phi \in \Phi\). It is evident that in the general case
\[
\umn^y h(y) \leq \inf_{h \leq \phi \in \Phi} \umn^x f_\phi = \inf_{h \leq \phi \in \Phi} \umn^y \phi.
\]
When the inequality is strict, the class \(\Phi\) will not be primary for \(\IM^y\), but rather, a proper subset \(\IM^y \subset \langle \umn^y \Phi \rangle\), where \(\langle \umn^y \Phi \rangle\) is the \(\Phi\)-extension of \(\IM^y\). The same equality will occur when the infimum is taken inside the symbol \(\mn^x\). This is the very case when \(\umn_x^y h \in \mathcal{F}_\Phi\) for all \(h\) for which the left hand side is defined. From this, it follows that:
\textit{If \(\mathrm{Q}\) is a random transformation; a) the transitional models \(\IM_x^y = \langle \mn_x^y \Phi\) are given precisely on the class \(\Phi\), b) if \(\umn_x^y h, \forall h \in \mathcal{F}^y\) are preimages of \(\phi \in \Phi\), then \(\IM^y = \mathrm{Q} \IM^x\) will be completely defined by the means \eqref{eq.2.3} on the class of characteristics \(\Phi\).}
Let's consider an example for which the conditions of this assertion are met.
\begin{example}
Let \(\pspX = \reals^n,~\pspY = \reals^m\), vector spaces, and let the transformation \(\reals^n \overset{\mathrm{Q}}{\rightarrow} \reals^m\) be a precise transitional probability density \(p_x(y)\) on \(\reals^m\) for each vector \(x \in \reals^n\). If this density corresponds to a length measure, then the class \(\Phi\) is the set of all integrable functions \(\phi(y)\) whose pre-images are
\[
f_\phi(x) = \int_{y \in \reals^m} \phi(y) P_x(y) dy,~~\phi \in \Phi.
\]
\end{example}
What will this class of pre-images be? To answer this question, we will ``christen" \(y\) as a parameter taking values in \(\pspY\) and we will view the transitional density as a set of transformation functions on \(x\) indexed by the parameter \(y\), for which we will rewrite it as \(q(x,y) = p_x(y)\). If we interpret the integral as a linear combination of functions \(q(x, y)\) with weights \(\phi(y)\), then it is clear that the class \(\mathcal{F}_\Phi\) is the closure of the linear hull \(\sum_i c_i q(x, y_i)\). This class is linear. Its dimension is defined by the number of linearly independent functions \(q(x, y_i)\), i.e. the dimension of its basis. This may be small if \(q(x, y)\) only takes a few specific forms as a function of \(x\) for all possible \(y \in \pspY\).
When calculating \(\IM^y\) one can discard all knowledge about \(\IM^x\), leaving only \(\umn^x f_\phi (x),~f_\phi \in \mathcal{F}_\Phi\) since these yield the means \(\umn^y \phi(y)\) for any integrable \(\phi\).
\begin{addendum} \item \textsc{Connections between transformations} The class of indicator maps is wider than the class of isomorphic maps \S\ref{sec:2.1}, in which \(x \rightarrow B_x\), but the \(B_x\) either must coincide or be mutually disjoint.
Although \(\IM^x\) are defined for indicator transformations \(x \rightarrow B_x\) by precise primary means \(\mn_x^y B_x = 1\), indicator transformations are not included in the class of simple transformations.
Actually, they are given by the precise values \(\mn_x^y \phi(y),~\phi \in \Phi\), in which primary characteristics \(\phi(y)\) of transitional models do not depend on \(x\) (they only depend on the primary values themselves), and \(B_x\) corresponds to the indicator characteristic \(B_x(y)\), which depends on \(x\).
\pagebreak
Isomorphisms \S\ref{sec:2.1} are a special case of simple transformations, when indicators \(\phi(y) = B_j(y)\) are primary for a transitional model, where \(B_j\) are elements of a partition of \(\pspY\) and their probabilities \(\mn_x^y \phi(y) = P_x^y (B_j)\) are equal to 1 if \(x \rightarrow B_j\) and equal to 0 otherwise.
\item \textsc{Random similarity} We will give a generalization of the understanding of similarity, showing that models connected by simple random transformations and by endowed properties of similar mappings \S\ref{sec:2.1} may be similar.
Models \(\IM^x\) and \(\IM^y\) are called \textit{similar}, written as \(\IM^x \sim \IM^y\), if, first, \(\IM^y = \mathrm{Q} \IM^x\) for some transformation \(\mathrm{Q}\) (in general, a random one) from \(\pspX\) to \(\pspY\), and second, there exists a ``return" (not always an inverse) transformation \(\mathrm{Q}^-\) from \(\pspY\) to \(\pspX\) (also, in general random) such that \(\IM^x = \mathrm{Q}^- \IM^y\), i.e. it returns to the original model on \(\pspX\). One can prove that a random transformation \(\mathrm{Q}\) \textit{will be a similarity transformation if three conditions are met: a) \(\mathrm{Q}\) is given precisely on the class \(\Phi\) with transitional models \(\IM_x^y = \langle \mn_x^y \Phi\rangle\); b) characteristics from \(\Phi\) connect one-to-one and with preservation of order \(\phi \leftrightarrow f_\phi\) to their pre-images \(f_\phi = \mn_x^y \phi,~f_\phi \in \Phi\) (suggesting an isomorphism between \(\Phi\) and \(\mathcal{F}_\Phi\)); c) the class \(\mathcal{F}_\Phi\) is defined for \(\IM^x\) (and \(\Phi\) is defined for \(\IM^y\)).}
Let's consider an example when these conditions are met.
Let each primary characteristic \(g\) of the model \(\IM^x\) admit the decomposition \(g(x) = \sum_{i=1}^{k+1} g_i q_i(x)\) in the basis \(q_1, \dots, q_{k+1}\) such that: a) \(q_i(x) \geq 0\); b) \(\sum_{i=1}^{k+1} q_i(x) = 1\); c) \(g(x) \geq 0 \iff g_i \geq 0,~i=1, \dots, k+1\). Then the transformation \(\mathrm{Q}\) from \(\pspX\) to \(\pspY = \{y_1, \dots, y_{k+1}\}\), given transitional probabilities \(P_x(y_i) = q_i(x)\), is similar for \(\IM^x\) and leads to the similar model \(\IM^y\) defined by the means
\[
\umn^y \phi(y) = \umn^y \sum_{i=1}^{k+1} \phi_i \delta_{y_i} (y) = \umn^x \sum_{i=1}^{k+1} \phi_i q_i(x),~\phi_i = \phi(y_i),~\forall \phi_i.
\]
This is illustrated in Fig. \ref{fig:2.4}.
\end{addendum}
\begin{figure}[b]
\centering
***
\caption{Random similarity: a) the basis; b) a view of the primary characteristics of \(\IM^x\)}
\label{fig:2.4}
\end{figure}
\pagebreak %
\section{Fuzzy events and fuzzy probabilities}\label{sec:2.3}
\subsection{Observations and their pre-images} Observations are the results of a phenomenon, fixed through a channel, measurement devices, sensory organs, etc. It is convenient to imagine two spaces: a space \(\pspX\) of outcomes of a phenomenon called a \textit{subjective} (sometimes, universal [18]) space, and a space \(\pspY\) for describing results of \textit{observations} (in the form of numbers, judgments, verbal statements, etc.) as events \(B\) on \(\pspY\). The spaces \(\pspX\) and \(\pspY\) are connected by a random operator, which is described by the transitional model \(\IM_x^y\). Through the prism of this operator, in fact, we are watching from \(\pspY\)'s side what is happening in \(\pspX\). Every observation \(B \subset \pspY\) will have its interval pre-image in the subjective space \(\pspX\):
\[
[\underline{q}(x, B), \overline{q}(x, B)] = [\lpr_x^y (B), \upr_x^y (B)],~B \subset\pspY
\]
These are the probability bounds of the event \(B\) with the outcome \(x \in \pspX\), which are computed according to the transitional model \(\IM_x^y\). The blurring of the curved pre-image as a function of the variable \(x\) characterizes the fuzziness of the observation \(B\).
Pre-images of different events \(B_i \subset \pspY\) are components of means of a transitional model, and therefore must be consistent with one another. From here, the logical connection between events in \(\pspY\) give rise to corresponding relations between pre-images in \(\pspX\):
\begin{enumerate}
\item $B_1 \subset B_2 \iff \left\{ \begin{array}{c}
\underline{q}(x, B_1) \leq \underline{q}(x, B_2),\\
\overline{q}(x, B_1) \leq \overline{q}(x, B_2);
\end{array}\right.$
\item $B_1 B_2 = \emptyset \iff \left\{ \begin{array}{c}
\underline{q}(x, B_1+B_2) \geq \underline{q}(x, B_1) + \underline{q}(x, B_2),\\
\overline{q}(x, B_1+B_2) \leq \overline{q}(x, B_1) + \overline{q}(x, B_2);
\end{array}\right.$
\item $B_1 = B_2^c \iff \left\{ \begin{array}{c}
\underline{q}(x, B_1) = 1 - \overline{q}(x, B_2),\\
\overline{q}(x, B_1) = 1 - \underline{q}(x, B_2);
\end{array} \right.$
\item The whole space \(\pspY\) and the empty set \(\emptyset\) are observations with trivial (equal to 1 and 0) pre-images:
\[
q(x, \pspY) \equiv 1,~q(x, \emptyset) \equiv 0.
\]
\end{enumerate}
To define the pre-images \(\lpr_x(B), \upr_x(B)\) of all \(B \subset \pspY\) it is sufficient to assign transitional probabilities to the primary events \(B_i \in \mathcal{B}\) of the form \(\plpr_x(B_i), \pupr_x(B_i),~B_i \in \mathcal{B}\), and then carry these over to any event according to the known formula of continuation and consistency:
\begin{equation*}
\begin{aligned}
\upr_x(B) &= \inf_{c + \sum_i c_i B_i(y) \geq B(y)}[c + \sum_i (c_i^+ \pupr_x(B_i) - (-c_i)^+ \plpr_x(B_i))],\\
\lpr_x(B) &= 1 - \upr_x(B^c),
\end{aligned}
\end{equation*}
where \(c_i^+ = c_i\) when \(c_i \geq 0\) and \(c_i^+ = 0\) when \(c_i < 0\). This will lead to the concordance of the pre-image of any \(B \subset \pspY\).
Observations, as events on \(\pspY\), are divided into the \textit{primary} \(B_i\), for which, at the outset, the pre-images \(\underline{q}(x, B_i) = \lpr_x(B_i),~\overline{q}(x, B_i) = \upr_x(B_i)\) are known, and the remaining \(B \subset \pspY\), which logically follow (the logic of relations of judgments is used, essentially, in the inequality under the infimum when finding \(\upr_x(B)\)).
From observations \(B\) by using ``conjectural statements" \textit{fuzzy judgments} are formulated, according to which a random choice of one of several \(B_i \subset \pspY\) is produced with probabilities \(\gamma_i,~i=1,2, \dots\). In other words, it is not clear which of the observations \(B_i\) take place and \(\gamma_i\) is the degree of confidence (the probability) that \(B_i\), in fact, occurred. The pre-image of such a judgment on the subjective space will be \([\sum_i \gamma \underline{q}(x, B_i), \sum_i \gamma_i \overline{q}(x, B_i)]\). So, if \(B\) is correct with probability \(p\) and false with probability \(1-p\), then the pre-image on \(\pspX\) will be
\[
[p \underline{q}(x, B) + (1-p)(1 - \overline{q}(x, B)), p \overline{q}(x, b) + (1-p)(1 - \underline{q}(x, B))].
\]
Every judgment, either precise in the form of an event \(B\) or fuzzy, is objectively defined by its pre-images. Two judgments having the same pre-images will be identical, since they are described by the same situation, only differently. In general, any pair of bounds \([\underline{q}(x), \overline{q}(x)],~0 \leq \underline{q}(x) \leq \overline{q}(x) \leq 1\) on the subjective space \(\pspX\) is a \textit{fuzzy event}. Of course, it is not necessarily the pre-image of an observation. Pre-images of primary observations, all remaining observations, and of judgments are only a special class of fuzzy events. As such, there generally exist many more fuzzy events than observations. The logic of fuzzy events and operations between them are defined through relations and operations between bounds by analogy with the relationships 1--4 between pre-images.
\subsection{Fuzzy probabilities and means} The very statement that a mean \(\mn f\) takes some concrete value is a form of representing our knowledge about the mean, that is to say, an observation of it. If the mean is precise, we have a precise observation of it. An interval mean \([\lmn f, \umn f]\) is one of the forms of fuzzy observation of means or fuzzy knowledge of it. Put more correctly, it is the pre-image of our knowledge, which has data about the mean, on the subjective space of its values, the real line. When a mean is interval the pre-image is indicator, as demonstrated in Figure \ref{fig:2.5}. But there are other, more general forms of pre-images that we have studied, in the form of outlines. Here, our goal is to apply these to means.
\begin{figure}[b]
\centering
***
\caption{***}
\label{fig:2.5}
\end{figure}
\pagebreak % 89
One caveat must be made first. In their essence, means, just like probabilities, are convex in the sense that the ranges of their values are convex sets. One cannot have an interval mean in the form of a union of two non-intersecting segments \([\lmn f, \umn f], [\underline{\lmn} f, \overline{\umn} f], \umn f < \underline{\lmn} f\); the segments must merge to one segment \([\lmn f, \overline{\umn} f]\). This imposes quite a concrete imprint on the general form of curves of images of means, introduced by the following definition.
A function \(q_{\mn f}(r)\) on real values \(r \in \reals\) is called the pre-image of a mean \(\mn f\), or the \textit{fuzzy mean of the characteristic \(f\)} if:
\begin{enumerate}[noitemsep]
\item It is non-negative and less than \(1\): \(0 \leq q_{\mn f} \leq 1\);
\item It is equal to \(0\) outside the range \([\inf f, \sup f]\) of possible values of \(\mn f\);
\item It is unimodal (it does not have local maxima);
\item it is lower semi-continuous;
\item It takes value \(1\) for at least one \(r\).
\end{enumerate}
The forms of fuzzy means are shown in Figure \ref{fig:2.5}. When \(f = A\), we have interval probabilities.
We will need the concept of a horizontal \textit{slice} of a fuzzy mean at height \(\gamma,~0 \leq \gamma \leq 1\); this is an interval, within which the pre-image of the mean is greater than or equal to \(\gamma\):
\[
[\lmn_{(\gamma)} f, \umn_{(\gamma)} f] = \{r~:~q_{\mn f}(r) \geq \gamma\}.
\]
The conditions already specified, which limit the form of fuzzy means, essentially require that the slice at any \(0 \leq \gamma \leq 1\) is interval and only interval. This interval creates the foundation, which connects fuzzy means with intervals and further with interval models. The magnitude of \(\gamma\) is a degree of confidence or a kind of preference allotted to the interval.
Any fuzzy mean is equivalently defined through its dependence on \(\gamma\) as it ranges in value from \(0\) to \(1\), corresponding to the slice intervals \([\lmn_{(\gamma)}, \umn_{(\gamma)} f]\), which contract as \(\gamma\) increases. Infinite values for the bounds of these intervals are allowed.
Let us turn to the concept of the fuzzy model of means. We will imagine for the moment, that fuzzy means \(q_{\mn f}(r)\) are given for all characteristics \(\forall f\), bounded and unbounded. The slices \([\lmn_{(\gamma)} f, \umn_{(\gamma)} f]\) for a single height \(\gamma\) are interval means. If they concur \(\forall f\), then they define an interval model \(\IM_{(\gamma)}\) with range of existence \(\mathcal{F}_{(\gamma)} = \{f~:~\umn_{(\gamma)} f < \infty\}\). As \(\gamma\) ranges in value from \(0\) to \(1\), it forms a contracting sequence of models: \(\IM_{(\gamma)} \subset \IM_{(\gamma ')},~\gamma \geq \gamma'\), which, as though lying upon one another on different levels \(\gamma\), give the image of a pyramid, i.e. a fuzzy model.
\pagebreak % 90
A sequence \(\IM_{(\gamma)}\) of interval models that contracts as \(\gamma\) grows from \(0\) to \(1\) and that does not degenerate to the empty set (when \(\gamma=1\)) is called a \textit{fuzzy model}. It is defined by the totality of fuzzy means \(q_{\mn f}(r),~\forall f\), which are consistent with each other in each slice.
From this interpretation of a fuzzy model follows the main way of specifying one: at first the fuzzy means \(\tilde{q}_{\mn g}(r)\) are given on the set \(g \subset \pchars\) of primary characteristics. Then, \(\gamma\)-slices \(\plmn_{(\gamma)} g,~\pumn_{(\gamma)} g\) are taken, which like primary means yield contracting models \(\IM_{(\gamma)} = \langle \plmn_{(\gamma)} \pchars,~\pumn_{(\gamma)} \pchars \rangle\) as \(\gamma\) grows. When \(\gamma = 1\), the primary means must not contradict one another, so that the highest slice is non-empty: \(\IM_{(\gamma)} \neq \emptyset\) (what is more they will not be contradictory for all \(0 \leq \gamma \leq 1\)). Via \(\IM_{(\gamma)}\) the intervals \(\lmn_{(\gamma)} f, \umn_{(\gamma)} f,~\forall f\) are found, which give the \(\gamma\)-slices defining the fuzzy means \(q_{\mn f} (r),~ \forall f\). It is not difficult to verify that these satisfy conditions 1--5 included in the definition of fuzzy means. As a special case, we have fuzzy probabilities \(q_{P(B)}(r),~\forall B\), whose distinguishing characteristic is that they are located, as seen in Figure \ref{fig:2.5}, in the interval \([0, 1]\) of values \(r\).
In the process of continuing means to all characteristics, the consistency of primary values \(\tilde{q}_{\mn g}(r)\) happens in parallel; the new curves \(q_{\mn g}(r)\) obtained as a result are, in general, narrower: \(q_{\mn g}(r) \leq \tilde{q}_{\mn g}(r)\), and accordingly, more precise.
As a small digression, let's delineate levels of descriptions. The first level concerns the gradation of events on those that are elementary \(x \in \pspX\), complex \(A \subset \pspX\), those defined by precise pre-images \(q(x, B)\), and finally interval \([\underline{q}(x, B), \overline{q}(x, B)]\) events. The next level consists of statistical descriptions: a precise mean \(\mn f\), an interval mean \([\lmn f, \umn f]\), or a fuzzy mean \(q_{\mn f}(r)\).
Lastly, one could speak about a higher level, namely, the fuzziness of these descriptions, introducing instead of \(q_{\mn f}(r)\) the interval \(\underline{q}_{\mn f}(r), \overline{q}_{\mn f}(r)\), and further, generalizing it to some membership curve. The question is only whether this kind of complication can be justified. As an answer, let's consider the extreme case when there is entirely no knowledge of a mean, which is equivalent to the bare model \([\lmn f, \umn f] = [\inf f, \sup f]\), which is also equivalent to the trivial pre-image of the form \(\underline{q}_{\mn f}(r) \equiv 0,~ \overline{q}_{\mn f}(r) \equiv 1\). And since the preceding is certainly a less convenient form, switching to meta-descriptions (descriptions of descriptions) and further hardly makes any sense to do.
\pagebreak % 91
\subsection{Fuzzy operations} Interval arithmetic \S\ref{sec:2.2} is suited to calculating mistakes in computation that are caused by rounding numbers. But in some problems there is the need and further there is the possibility to indicate the location of the unknown not in the form of intervals, i.e. categorically, where ``yes" indicates belonging to the interval and ``no" indicates non-belonging, but more smoothly in the form of preference curves, which are assigned to certain numerical values (these are also called membership curves [15]). It is with these that we must create arithmetic and analytical operations.
We will denote by \(a(r)\) the fuzzy pre-image of a number; this is a function on \(\reals\) that satisfies the properties of fuzzy means (illustrated in Figure \ref{fig:2.5}). The pre-image of \(a(r)\) should be interpreted as a set of intervals \(A(\gamma) = \{r~:~a(r) \geq \gamma\}\), which are obtained from the horizonal \(\gamma\)-slices \(a(r)\), and every slice defines an interval number in the form of its indicator model \(\IM_{(\gamma)} = \langle P(A(\gamma)) = 1\rangle\). When taken together over \(0 \leq \gamma \leq 1\) the slices define a fuzzy number model in the sense that was discussed in the previous section.
Any operation on numbers is calculated through the rules of interval arithmetic (i.e. by the rules of transformations of indicator models) for each \(\gamma\)-slice separately and is later combined through union over \(\gamma\) into the pre-image of the result. More specifically, if \(a_j(r)\) are pre-images of numbers, then \(A_j(\gamma)\) will be mirror images of \(a_j(r)\) with respect to the main diagonal, i.e. they are the same as \(a_j(r)\), but with the \(x\)- and \(y\) axes exchanged. For each \(\gamma\) the values of \(A_j(\gamma)\) will be intervals; thus, the transformation \(f(a_1(r), \dots, a_J(r))\) is calculated according to the rules of operations on interval numbers, the total of which will become the intervals \(F(\gamma),~0 \leq \gamma \leq 1\), for which it remains to change axes from \(y\) to \(x\), thereby obtaining the fuzzy result \(f(r)\).
We thus have a model interpretation of fuzzy numbers and arithmetic operations, the details of which we defer to the review book [18]. Our next step leads to the theory laid out in that work. The theory consists of the definition of fuzzy functions as mappings \(z \rightarrow a_z(r)\), and the intervals obtained from them as intervals from the bounds of interval functions \(A_z(\gamma)\) (of \(\gamma\)-slices), which give an interval result after translation to its pre-image. Although they arrived at the well known result, the argumentation we present, by all appearances, is useful in developing the conception of fuzzy descriptions and operations because it involves a general mechanism for the goals contained in this text and it creates a mathematically rigorous conception of fuzziness from the point of view of this mechanism.
\section{Combined interval models}\label{sec:2.4}
\subsection{Combined and partial interval models} We consider combined interval models, which describe the results of two arbitrary random phenomena with outcome spaces \(\pspX\) and \(\pspY\).
\pagebreak % 92
Assume \(\pspX \times \pspY\) is the direct product of two outcome spaces, each element of which is a pair \((x, y),~x \in \pspX,~y \in \pspY\). The model \(\IM^{xy}\) on this product is called \textit{combined}. It is defined by the concurring means \(\umn^{xy} f(x, y),~\forall f \in \mathcal{F}^{xy}\), where \(\mathcal{F}^{xy}\) is the space of upper means (including, at least, all functions of two variables that are bounded from above). It is given by the primary means \(\pumn g(x,y),~g \in \pchars\), and thus \(\IM^{xy} = \langle \pumn \pchars \rangle\).
The characteristics \(f(x, y)\) of two variables are called \textit{combined}, while those of each variable \(f(x),~\phi(y)\) are called \textit{partial}; the partial characteristics form the subclass \(\mathcal{F}^x\) and, correspondingly, \(\mathcal{F}^y\), of the combined \(\mathcal{F}^{xy}\).
Means in the subclass of partial characteristics \(\umn f(x),~f(x) \in \mathcal{F}^x,~\umn \phi(y),~\phi \in \mathcal{F}^y\), clearly, are consistent with one another and define the \textit{partial models \(\IM^x\) and \(\IM^y\)}. So, \textit{partial models are obtained as a case of the means of a combined model}.
The next theorem allows us to find partial primary means from combined.
\begin{theorem}[On primary characteristics of partial models]\label{thm:2.1}
Let \(\IM^{xy} = \langle \pumn \pchars \rangle\) be a combined model and \(\pumn g(x,y),~g \in \pchars\) be the core of its primary means. Then the corresponding partial model \(\IM^x\) on \(\pspX\) will be defined by its characteristics of the form \(\inf_{y}~\sum_i c_i^+ g_i(x,y)\) with primary means equal to:
\[
\pumn^x[\inf_{y} ~ \sum_i c_i^+ g_i(x,y)] = \sum_i c_i^+ \pumn g_i(x, y),~g_i \in \pchars,
\]
for all possible choices of non-negative coefficients \(c_i^+,~i=1, \dots, k < \infty\).
\end{theorem}
\begin{smallpar} % TODO: perhaps this should be a proof environment
In fact, based on the general continuation formulae, centering characteristics, we have:
\begin{equation*}
\begin{aligned}
\umn^{xy} f(x) &= \inf_{c + \sum_i c_i^+ g_i(x, y) \geq f(x)} [c + \sum_i c_i^+ \pumn g_i] \\
&= \inf_{c,c_i^+} \{c~:~c + \sum_i c_i^+ [g_i(x,y) - \pumn g_i] \geq f(x)\},
\end{aligned}
\end{equation*}
and in the conditions on \(c, c_i^+\) the inequality must be observed for all \(y \in \pspY\), which is equivalent to substituting
\[
c+\inf_{y} [\sum_i c_i^+ (g_i(x, y) - \pumn g_i)] = c + h_c(x),~c = (c_1, \dots, c_k).
\]
into the left part of the function.
The function \(h_c(x)\) can be considered as a primary characteristic for \(\IM^x\) with null primary mean. For a linear combination of such characteristics \(\sum_i d_i^+ h_{c_i} (x)\) there is a majorizing characteristic \(h_{c^*}(x),~c^* = \sum_i d_i^+ c_i\) with null mean, so
\begin{equation*}
\begin{aligned}
\umn^x f(x) &= \inf_{c, c_i} \{c~:~c + \sum_i d_i^+ h_{c_i}(x) \geq f(x)\} \\
&= \underset{c, h_{c^*}}{\inf} \{c~:~c + h_{c^*} (x) \geq f(x)\},
\end{aligned}
\end{equation*}
from which follows the proof of the theorem.
\end{smallpar}
\pagebreak %
In this way, the primary characteristics for the partial IM will be the lower bounds \(\inf_{y}~g(x,y),~ g \in \slhull \pchars\) (the minimum is taken over the excluded variable) of the secondary characteristics of the combined model with preservation of means.
As an example, if \(\IM^{xy} = \langle \pumn g \rangle\) is defined by only one primary characteristic \(\pumn g(x,y)\), then \(\pumn[\inf_{y}~g(x,y)] = \pumn g\) will be primary for \(\IM^x\). if there are two primary means \(\IM^{xy} = \langle \pumn g_1 \rangle \wedge \langle \pumn g_2 \rangle\), then the primary means for the partial IM will look like:
\[
\pumn h_c(x) = \pumn \inf_{y}~[g_1(x,y) + c^+ g_2(x,y)] = \pumn g_1 + c^+ \pumn g_2.
\]
There are no longer two, but many because of the arbitrary choice of \(c^+\). In general, even with a finite set of primary means \(\pumn g_i(x,y),~i=1, \dots, k\) that specify \(\IM^{xy}\), there is no guarantee that the partial IM \(\IM^x\) will be defined by a finite number of primary values, except in a few cases that are discussed in the following example.
\begin{example}
Let \(\pspX = \{x_1, \dots, x_k\},~\pspY = \{y_1, \dots, y_l\}\) and let the primary probabilities, called combined, be given on the product of these spaces:
\[
0 \leq \upr(x_i, y_j) \leq 1,~\sum_i \sum_j \upr (x_i, y_j) \geq 1.
\]
These probabilities are consistent with one another and specify \(\IM^{xy}\). The probabilities \(\upr(x_i) = \min \{1, \sum_j \upr(x_i, y_j)\}\) will be primary for the partial IM \(\IM^x\). Any other characteristic having form according to Theorem \ref{thm:2.1}
\[
h_c(x) = \inf_{y}~\sum_i \sum_j c_{ij}^+ \delta_{x_i}(x) \delta_{y_j}(y) = \sum_i \underline{c}_i^+ \delta_{x_i}(x),
\]
where \(\underline{c}_i^+ = \min ~c_{ij}^+\) will be absorbed by the probabilities \(\upr(x_i)\) due to the inequality \(\pumn h_c(x) = \sum_i \sum_j c_{ij}^+ \upr(x_i, y_j) \geq \sum_i \sum_j \underline{c}_i^+ \upr(x_i, y_j) = \sum_i c_i^+ \sum_j \upr (x_i, y_j) \geq \sum_i \underline{c}_i^+ \upr (x_i)\). In the case of precise probabilities \(P(x_i, y_j)\) that represent the combined distribution \(\sum_i \sum_j P(x_i, y_j) = 1\), the partial distribution will also be precise, equal to the sum \(P(x_i) = \sum_j P(x_i, y_j)\), which is easily shown.
\end{example}
\begin{corollary*}
Partial characteristics \(g_i(x)\) having concurring means and being primary for the combined IM \(\IM^{xy}\) will remain exactly the same with the same means for the partial IM \(\IM^x\).
\end{corollary*}
In fact, when finding primary characteristics for \(\IM^x\) as in Theorem \ref{thm:2.1}, those characteristics \(g_i \in \pchars\) of the combined model \(\IM^{xy}\) that depend only on the variable \(x\) can be taken out of the infimum sign
\[
\inf_{y}[\sum_i d_i^+ g_i(x) + \sum_j c_j^+ g_j(x, y)] = \sum_i d_i^+ g_i(x) + \inf_{y} \sum_j c_j^+ g_j (x, y),
\]
from which it is clear that exactly these \(g_i(x)\) (not their linear combinations) will be primary for \(\IM^x\). Obviously, consistency of means from \(\IM^{xy}\) carries over to \(\IM^x\).
\begin{example}[Specification of a combined model with partial primary characteristics]
Here we will consider the case when the primary characteristics \(g \in \pchars\) of the combined model \(\IM^{xy}\) are split into dependence only on the variable \(x\) or only on the variable \(y\):
\[
g(x,y) = \left\{\begin{array}{c}
h(x),~~h \in \mathcal{H},\\
\psi(y),~~\psi \in \Psi,
\end{array}\right.
\]
so \(\pchars = \mathcal{H} \cup \Psi\). Also, we assume that their means \(\umn h(x), ~ \umn \psi(y)\) are consistent. These will be primary for the partial models (according to Theorem \ref{thm:2.1}), and, as can be shown without difficulty, the additive property of means extends to them:
\[
\umn^{xy}[f(x) + \phi(y)] = \umn f(x) + \umn \phi(y).
\]
Dividing primary characteristics into functions only on \(x\) and only on \(y\) is equivalent to separately setting \(\IM^x = \langle \umn \mathcal{H}\rangle\) and \(\IM^y = \langle \umn \Psi \rangle\) in the absence of data about the causal relationship (dependence) between outcomes of phenomena \(x \in \pspX\) and \(y \in \pspY\). The combined model is equal to the intersection \(\langle \pumn \mathcal{H}\rangle \wedge \langle \pumn \Psi \rangle\) of the partial models specified on \(\pspX \times \pspY\) by the sets of characteristics of only one variable.
\end{example}
\textsc{Relationships between combined and partial models}.
\begin{enumerate}
\item For a bare combined IM the partial model will be bare:
\[
\IM^{xy} = \probs^{xy} \implies \IM^x = \probs^x .
\]
It is clear that insofar as there is nothing known about \(\pspX \times \pspY\) there will also be no kind of data on \(\pspX\).
\item The hierarchy of inclusion relationships is preserved when going from combined models to partial models:
\[
\IM_1^{xy} \subset \IM_2^{xy} \implies \IM_1^x \subset \IM_2^x,
\]
so \(\umn_1^x f(x) = \umn_1^{xy} f(x) \leq \umn_2^{xy} f(x) = \umn_2^x f(x) .\)
The question arises: are the algebraic operations of union and intersection of models preserved? For the operation of union, the answer is yes:
\item \(\IM^{xy} = \bigvee \IM_\theta^{xy} \implies \IM^x = \bigvee \IM_\theta^x\), so
\[\umn^x f(x) = \umn^{xy} f(x) = \sup_{\theta}~\umn_\theta^{xy} f(x) = \sup_{\theta}~f(x).
\]
The operation of intersection, in general, is not preserved:
\item \(\IM^{xy} = \bigwedge \IM_\theta^{xy} \nRightarrow \IM^x = \bigwedge \IM_\theta^x\).\\
In fact, when \(\IM_\theta^{xy} = \langle \pumn \pchars_\theta \rangle\) the set \(\pchars = \bigvee \pchars_\theta \) with means \(\pumn g (x, y) = \inf_{\theta}~\pumn_\theta g(x,y),~g\in \pchars\) will be primary for their intersection. Now, according to Theorem \ref{thm:2.1} the primary characteristics for \(\IM^x\) will be
\begin{align*}
\umn^x [\inf_{y} \sum_i c_i^+ g_i(x,y)]
&= \sum_i c_i^+ \umn^{xy} g_i (x, y)\\
&= \sum_i c_i^+ \inf_{\theta} \pumn_\theta g_i(x,y)
\leq \inf_{\theta}[\sum_i c_i^+ \pumn_\theta g_i(x,y)],
\end{align*}
where the square brackets contain the primary means of the models \(\IM_\theta^x\), while the infimum corresponds to their intersection.
\end{enumerate}
\subsection{Specification of combined models with random transformations} At the beginning of this chapter we studied transformations \(\pspX \rightarrow \pspY\). These are certain operators that transform an ``input" \(x\) into an ``output" \(y\). We were interested, firstly, in ways of describing these transformations: deterministic \(y = \mathrm{s} x\) (Section \S \ref{sec:2.1}) and random (Section \S \ref{sec:2.2}), which are specified transitional models \(\IM_x^y\). Secondly, we were interested in calculating the ``output" model \(\IM^y\) by way of the transformation and the ``input" model \(\IM^x\). Here, we are interested in a different question: how can we specify a combined interval model \(\IM^{xy}\) on \(\pspX \times \pspY\) using random (and as a special case, deterministic) transformations?
Assume there is an input model \(\IM^x\), i.e. consistent \(\umn^x f(x),~f\in \mathcal{F}\) defined in a known way. And let there be a random transformation from \(\pspX\) to \(\pspY\) specified by the transitional models \(\IM_x^y,~\forall x \in \pspX\), i.e. for each \(x\), the transitional means \(\umn_x^y \phi(y),~\phi(y) \in \mathcal{F}_x^y\) are defined, where the class \(\mathcal{F}_x^y\) may in general be different for each \(x\).
We call the combined model on \(\IM^{xy}\) the \textit{product} of \(\IM^x\) on \(\IM_x^y\), which is denoted by:
\[
\IM^{xy} = \IM^x \IM_x^y,
\]
and defined by the means:
\begin{equation}\label{eq.2.4}
\umn^{xy} f(x,y) = \umn^x \umn_x^y f(x,y).
\end{equation}
The right hand side of \eqref{eq.2.4} is a sequential calculation for each \(x\) of transitional means \(\umn_x^y f(x,y)\), using \(\IM_x^y\) on \(f(x,y)\) as a function of the variable \(y\), and because the transitional means will be functions of \(x\), i.e. characteristics on \(\pspX\), the mean \(\umn^x\) can be obtained from them in the second step. The space \(\mathcal{F}^{xy}\) where model products exist consists of the characteristics \(f(x,y)\), which belong to \(\mathcal{F}_x^y\) for each \(x\) and for which \(\umn_x^y f \in \mathcal{F}^x\). This includes at least all bounded functions of two variables.
Formula \eqref{eq.2.4} is written as \(\lmn^{xy} f = \lmn^x \lmn_x^y f\) for lower means (one needs to substitute \(-f\) for \(f\)).
We will comment on the concept of products. As a matter of fact, if we interpret both \(\IM_x^y\) and \(\IM^x\) as families of precise models (with precise means), then in both the first step, consisting of the calculation of transitional means \(\lmn_x^y f, \umn_x^y f\), and the second step, wherein \(\lmn f, \umn f\) are finally found, each time we consider the worst possible options within the families, and we do this separately for \(\IM_x^y\) and \(\IM^x\) and separately for lower and upper means. These are the data we would have about the mean \(\mn f\) in the least favorable case for which any data are available (in the interval view or in the view of families) about an input model and about a random transformation.
Below we will need the following rather obvious distributive properties of a function of a variable \(x\) for the sign of a mean \(\umn_x^y\) of a transitional model:
\begin{equation*}
\begin{aligned}
&\umn^{xy} c^+(x) f(x, y) = \umn^x [c^+(x) \umn_x^y f(x,y)],\\
&\umn^{xy}[d(x) + f(x,y)] = \umn^x[d(x) + \umn_x^y f(x, y)],
\end{aligned}
\end{equation*}
where \(c^+(x)\) is an arbitrary non-negative function, and \(d(x)\) is any function of \(x\) (not taken from the class \(\mathcal{F}^{xy}\)).
\subsection{Recovering the factors of a decomposable model} A combined model \(\IM^{xy}\) that can be written in product form as \(\IM^x \IM_x^y\) is called \textit{decomposable}. We will show how to recover the model factors from a combined decomposable model. Concerning the first of these factors, \(\IM^x\), no problem arises: it is a partial model whose means \(\umn^x f(x)\) constitute a particular case of the means \(\umn^{xy} f(x,y)\) of the combined model.
The problem of recovering the second factor \(\IM_x^y\) is a bit more complex. For this, we need to pick out classes of characteristics \(f(x,y)\) that are ``specific" for the transitional models \(\IM_x^y\) as well as the means \(\umn^{xy} f\) on which the model will be defined. The specificity of these classes should manifest in the fact that for different \(x\) they are completely different, non-intersecting classes that are ``sharply responsive" to changes in \(x\). Hence, we might suspect that these are delta-shaped functions of \(x\).
As before, we will denote by \(\delta_{x_1}(x)\) the indicator function of an elementary event \(x_1 \in \pspX\). For brevity, we will denote \(\umn^{xy}\) by \(\umn\). We will introduce the function \(f(x,y) = \delta_{x_1}(x) \cdot \phi(y)\), where \(\phi \in \mathcal{F}^{xy}\). According to \eqref{eq.2.4}
\begin{equation*}
\begin{aligned}
\umn \delta_{x_1}(x) \phi(y) = \umn^x \umn_x^y \delta_{x_1}(x) \phi(y) &= \umn^x \delta_{x_1}(x) \umn_{x_1}^y \phi(y) \\
&= \left\{ \begin{array}{c}
\upr(x_1) \umn_{x_1}^y \phi(y),~\umn_{x_1}^y \phi(y) \geq 0,\\
\lpr(x_1) \umn_{x_1}^y \phi(y),~\umn_{x_1}^y \phi(y) < 0,
\end{array} \right.
\end{aligned}
\end{equation*}
where \(\lpr(x_1), \upr(x_1)\) are the probability bounds of the elementary event \(x_1\). Assume \(\upr(x_1) > 0\) and \(\umn \delta_{x_1}(x) \phi(y) \geq 0\). From the second inequality it follows that \(\umn_{x_1}^y \phi(y) \geq 0\); as a result
\begin{equation}\label{eq.2.5}
\umn_{x_1}^y \phi(y) = \frac{\umn \delta_{x_1}(x) \phi(y)}{\upr (x_1)},~\text{when } \umn \delta_{x_1}(x) \phi(y) \geq 0.
\end{equation}
This formula defines the means of the transitional model \(\IM_{x_1}^y\) for those \(x_1 \in \pspX\) such that \(\upr(x_1) > 0\).
Here the requirement that \(\umn \delta_{x_1}(x) \phi(y) \geq 0\) is not unnecessarily burdensome. Actually, if this inequality is not satisfied, and if \(\phi(y)\) is bounded, then it will be satisfied for the function \(\phi_c(y) = \phi(y) + c\) when \(c \geq - \inf~ \phi(y)\) because of the fact that \(\phi_c(y) \geq 0\). Defining \(\umn_{x_1}^y \phi_c(y)\) by \eqref{eq.2.5}, we therefore find that
\[
\umn_{x_1}^y \phi(y) = \umn_{x_1}^y [\phi(y) + c]-c,~ c \geq -\inf~ \phi.
\]
In this way, it is enough that \eqref{eq.2.5} holds for non-negative bounded functions \(\phi(y)\), since for unbounded functions it can be obtained by passing to the limit of their truncations. This is a kind of boundary continuation formula. And so, we have:
\textit{If \(\IM^{xy}\) is decomposable and the partial model \(\IM^x\) is such that \(\upr(x_1) > 0\) for all \(x_1 \in \pspX\), then formula \eqref{eq.2.5} allows for the recovery of the transitional IM from the combined model. In this case, the transitional and conditional IM (when \(x\) occurs) coincide with one another.}
The last part of the above follows from Addendum paragraph 1 and the point therein that a transitional model can be recovered while defining the conditional according to \S \ref{sec:1.6} given past elementary events \(x_1 \in \pspX\).
\subsection{Decomposability of a combined model} Can any combined model \(\IM^{xy}\) be decomposed into partial and transitional (conditional) products, i.e. can one interpret the relationship between outcomes \(x \in \pspX\) and \(y \in \pspY\) through the actions of a random operator? Alas, this is far from true! The substitution of conditional models in \eqref{eq.2.4} will lead to an extended model compared to \(\IM^{xy}\): \(\umn^x \umn_x^y f(x,y) \geq \umn f(x,y)\).
Let \(\IM^{xy}\) be a combined model. What properties do its consistent means need to satisfy so that it is decomposable? To resolve this question, we will proceed as though \(\IM^{xy}\) were decomposable, i.e. we will calculate the transitional means according to \eqref{eq.2.5}, substituting \(\phi(y)\) for \(f(x,y)\):
\[
\umn_{x_1}^y f(x,y) = \frac{\umn \delta_{x_1}(x) f(x,y)}{\upr(x_1)}~\text{when}~\umn \delta_{x_1} f \geq 0,~~\upr(x_1) > 0.
\]
Further, we will see whether it works to substitute the values \(\umn f(x,y)\) into the expression in the right hand side of \eqref{eq.2.4}, and whether it works for all \(f \in \mathcal{F}^{xy}\), as this will give us a decidedly good reason to believe that \(\IM^{xy}\) is decomposable.
\begin{theorem}[On the decomposability of combined models]\label{thm:2.2}
If \(\upr(x) > 0,~\forall x \in \pspX\), then for a model \(\IM^{xy}\) to be decomposable into products \(\IM^x \IM_x^y\) it is necessary and sufficient that for all \(x_1 \in \pspX\) and any non-negative \(f^+(x,y)\) from \(\mathcal{F}^{xy}\), the identity
\[
\umn^{x_1} \left[\begin{array}{c}
\umn \delta_{x_1}(x) f^+(x,y) \\
\upr (x_1)
\end{array} \right] \equiv \umn f^+(x,y)
\]
holds, where \(\umn^{x_1}\) is a mean of the partial model \(\IM^x\). When this holds, the transitional model coincides with the conditional.
\end{theorem}
The proof is in the Addendum paragraph 2.
Let's comment on the requirement that the upper probability is non-zero \(\upr(x) > 0\) in the theorem. In the majority of realistic problems the number of data about an event is finite, which corresponds to models \(\IM^{xy}\) of finite size. For these, (if impossible outcomes are excluded) the upper probabilities of individual events are necessarily non-zero. A zero probability \(\upr(x) \equiv 0\) is a limiting case when a model becomes infinitely precise. Theorem \ref{thm:2.2} and its basic identity should be interpreted on this basis.
The assumptions of Theorem \eqref{eq.2.2} are quite strong and difficult to verify. We will note one simple property, necessary for the decomposability of a combined model. It consists of the fact that for all \(\phi(y) \in \mathcal{F}^{xy}\) and \(c\) there must be an equality
\[
\umn \delta_{x_1}(x) [\phi(y) - c] = \left\{ \begin{array}{c}
\umn \delta_{x_1}(x) \phi(y) - c \upr (x_1)~~\text{when} \geq 0,\\
\umn \delta_{x_1}(x) \phi(y) - c \lpr(x_1)~~\text{when} <0,
\end{array} \right.
\]
where the conditions \(\geq 0\) and \(<0\) depend on the value of the mean on the left hand side. The property of decomposable models under consideration is precisely the same as that in the derivation of \eqref{eq.2.5} when \(c=0\). It is illustrated in Figure \ref{fig:2.6}, in which the values of a mean are graphed as a function of the translation parameter \(c\). In the region of positive mean values, or, more concretely, for those \(c\) such that \(\umn_{x_1}^y \phi(y) \geq c\), the function is linear. For the negative mean values, corresponding to \(\umn_{x_1}^y \phi(y) < c\), it is also linear. Between the functions there is a kink.
\begin{figure}[b]
\centering
***
\caption{The property of model decomposability.}
\label{fig:2.6}
\end{figure}
This course of reasoning remains valid if \(\phi(y)\) is exchanged for \(f(x,y)\). Then \(\delta_{x_1}(x)[f(x,y) - c]\) is, in fact, a vertical delta-slice of the function \(f(x,y) - c\) at the coordinate \(x = x_1\), and the property under consideration is the linearity of the operator \(\umn\) with respect to translation parameter \(c\), which is refracted according to Figure \ref{fig:2.6} upon intersection with the axis.
\subsection{Primary means of decomposable interval models} Let \(\IM^{xy} = \IM^x \IM^y\) be a decomposable combined model and let \(\IM^x = \langle \pumn \mathcal{H}\rangle,~\IM_x^y = \langle \pumn_x^y \Psi \rangle\) be given with upper primary values \(\umn^x h(x),~h \in \mathcal{H};~\umn_x^y \psi(x,y),~\psi \in \Psi\), where \(h(x)\) are partial characteristics, and \(\psi(x,y)\) are primary characteristics on \(\pspY\) for given \(x\) values, which are called transitional.
Here, we study the connection between primary means of a combined model and model-factors.
\begin{theorem}\label{thm:2.3}
An interval model of the product form \(\IM^{xy} = \langle \pumn^x \mathcal{H} \rangle \langle \pumn_x^y \Psi\rangle\) is defined by the centered characteristics
\[
\mathring{\pchars} = \{h(x) - \pumn h,~h \in \mathcal{H}\} \cup \{c^+(x) [\psi(x,y) - \pumn_x^y \psi],~\psi \in \Psi,~\forall c^+(x)\},
\]
all having zero primary means: \(\pumn \mathring{g} = 0,~\forall \mathring{g} \in \mathring{\pchars}\), and the consistent means \(\pumn_x^y \psi = \umn_x^y \psi\) correspond to the consistent values \(\pumn \mathring{g} = \umn \mathring{g},~\mathring{g} = c^+(x)[\psi - \umn_x^y \psi]\).
\end{theorem}
Before we prove the theorem, we will give its interpretation. We will denote by \(\mathring{h}(x) = h(x) - \pumn h,~\mathring{\psi}(x,y) = \psi(x,y) - \pumn_x^y \psi\) the centered, i.e. having zero upper mean \(\pumn \mathring{h} = \pumn_x^y \mathring{\psi} = 0\), primary characteristics of the partial and transitional models, respectively. Clearly, \(\langle \pumn^x \mathcal{H}\rangle = \langle \pumn^x \mathring{\mathcal{H}}\rangle\), \(\langle \pumn_x^y \Psi\rangle = \langle \pumn_x^y \mathring{\Psi}\rangle\). Hence, the theorem asserts that the centered (with zero mean) primary characteristics of the product will be the partial characteristics \(\mathring{h}(x) \in \mathring{\mathcal{H}}\) in addition to combined characteristics of the form \(\mathring{\psi}(x,y) c^+(x),~\mathring{\psi} \in \Psi\), which are equal to the centered transitional characteristics multiplied by arbitrary non-negative functions \(c^+(x)\) of the variable~\(x\).
\begin{smallpar}
For the proof, we will write a general expression of a mean \(\umn f(x,y)\) of a decomposable model via centered primary values of the factors, realized according to the two-step calculation of \eqref{eq.2.4};
\[
\umn f(x,y) = \umn^x \umn_x^y f(x,y) = \umn^x C(x) = \inf \{d~:~d + \sum_j d_j^+ \mathring{h}_j(x) \geq C(x)\},
\]
where \(C(x) = \inf \{c(x)~:~c(x) + \sum_j c_j^+(x) \psi_j(x,y) \geq f(x,y)\}\).
Taking together two bounds: one in the choice of \(d\) and the other in the choice of \(c(x)\), we can substitute \(c(x)\) for \(C(x)\), which reduces calculating the mean to finding \(\umn f(x,y) = \inf \{d~:~d_j^+ \mathring{h}_j(x) \geq c(x),~c(x) + \sum_j c_j^+ \times \psi_j(x,y) \geq f(x,y)\}\).
Because \(c(x)\) is arbitrary, the first bound can be changed to an equality.
Substituting this \(c(x)\) into the second bound, we obtain \(d + \sum_j d_j^+ \mathring{h}_j(x) + \sum_j c_j^+ (x) \mathring{\psi}(x,y) \geq f(x,y)\), which corresponds to the assertion of the theorem concerning the characteristics that define the model.
\end{smallpar}
\textsc{Comment.} In Theorem \ref{thm:2.3}, the centered characteristics \(c^+(x) \mathring{\psi}(x,y)\) will not necessarily be consistent for all \(c^+(x)\) and not all consistent characteristics will necessarily be primary. For example, any function \(c_1^+ (x)\) is a primary characteristic of a combined model, while the form \(b^+ c_1^+(x) + c\) yields a characteristic that is outwardly different, but is in fact the same. This also applies to \(c_j^+(x) \mathring{\psi}_j(x,y),~b^+ c_j^+(x) \mathring{\psi}_j(x,y)\); therefore the coefficients \(c^+(x)\) can be normalized in some way, assuming they take values between \(0\) and \(1\).
It follows from Theorem \ref{thm:2.3} that due to the fact that the coefficients \(c^+(x)\) are arbitrary, the products of the models will be defined by a much larger number of primary characteristics than those of the components of the model taken together. In particular, the size of the product IM is incomparably larger than the sizes of the factors, and may in principle be infinite.
\pagebreak %
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\begin{addendum}
\item \textsc{***}
***
\pagebreak
\item \textsc{***}
***
\item \textsc{***}
***
\end{addendum}
\section{Independence}\label{sec:2.5}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\begin{figure}[b]
\centering
***
\caption{***}
\label{fig:2.7}
\end{figure}
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\begin{addendum}
\item \textsc{***}
***
\pagebreak
\item \textsc{***}
***
\item \textsc{***}
***
\end{addendum}
\pagebreak %
\begin{conclusions}{sec:2.6}
Section~\ref{sec:2.1} examines the effect of transforming one outcome space into another on the appearance of an interval model.
Everyone knows the formulas for density transformations and how complex they become for nonlinear and inertial transformations (just remember the calculations of the filter-limiter-filter system).
For interval models, the calculation principles are much simpler and more universal.
Among the huge variety of features, there will always be those consistent with the transformation, called representable features, which, firstly, are completely elementary transform(?) themselves, and secondly, for them, a direct transfer of the means from the input to the output is carried out, thus determining the primary data of the output model.
It is only necessary to calculate the mean of representable features at the input.
The calculation procedure will be simplified if the primary characteristics at the input are all representable and consistent, then the actions will be a direct transfer of the means from the input to the output, and the models turn out to be similar.
Similarity means the similarity of structures and the absence of irreparable damage during the transformation.
In particular, all probability densities are similar to each other, since they transform one into another when transforming random variables.
Random transformations \S\ref{sec:2.2} as a mathematical record of vague, indefinite actions are given by interval models of the output phenomenon for each value of the input and are called transitional models.
The randomness of transformations adds uncertainty at the output, leading to feature stratification and mean broadening.
Their special case leads to the interval arithmetic operations inherent in interval analysis.
The random transformation is visually compared to the agitated aquarium of a translucent inhomogeneous liquid through which we look from the room to the street.
Objects become distorted, indistinct, difficult to distinguish.
We no longer see a tree and a car, but their vague random outlines, along which averaged images are drawn.
These will be the fuzzy observations of \S\ref{sec:2.3} as a result of vague random\footnote{ % TODO: footnote probably not attached to right word
Precisely random ones with inherent statistical laws, in contrast to the theory of fuzzy sets by Zadeh [15].
}
transformations, where the input is a certain subject area (street) inaccessible to us, and the output is observations that are tied into judgments about what is happening.
By the way, in human language, words and phrases also have subject meanings and refer to the subject area according to the principle: we are used to understanding this way, that is, statistically on average.
In this perspective, a person is a kind of random transformation of an input (which we are talking about) into an output (which we are talking about).
Interval mean can be viewed as a transformation of numbers into intervals, that is, as a kind of result of “seeing” the exact means (if any) through the prism of a limited experiment, forcing one to resort to careful estimates in the form of intervals.
If you look a little closer, then these will no longer be intervals, but their vague analogs, giving blurry images of averages.
Defined for all features, they constitute a fuzzy model (as an example of fuzzy transformations, fuzzy arithmetic is constructed at the end of \S\ref{sec:2.3}).
This made the fourth step on the way from determinism to randomness in their following chain: 1) deterministic phenomena, 2) random phenomena given by probability distributions, 3) given by interval models, 4) given by fuzzy models.
Further the road, it seems, is not visible.
Transformations are not the only ones capable of reflecting the joint behavior of two random phenomena.
This is done more widely and fully by the joint models of \S\ref{sec:2.4}, which are given by the primary means of the joint features of both phenomena and continue to all the others.
The means of particular features together form particular models with their own structure of primary data (Theorem~\ref{thm:2.1}).
Not all models, alas, can be considered as the results of some transformations, but only a subclass of models called decomposable.
Decomposable models are written as products of particular models and transitional ones.
The special case, when the transitional models do not depend on the input, leads to the concept of the freedom of the output from the input, as if someone arbitrarily disposes of the output within the framework of the model, knowing the input, which to take into account (to consider which or not) is his business.
Independence is ignorance of anything additional about one phenomenon if the result of another has become known, and vice versa.
The property is symmetrical.
Independence as an objective and subjective reality embraces the entire phenomenon with all their signs and through the average signs is defined in \S\ref{sec:2.5}.
The breadth of this concept turns out to be dependent on the data on the phenomena.
If exact values of probabilities (means) are given, then independence is reduced to the property of multiplicativity, and for interval values - to an interval analogue of this property.
And if they are not known at all, then ... independence always takes place!
Doesn't this follow from the meaning of the concept?
And the following conclusion is interesting from this that independence can be achieved by expanding the joint model (in fact, by forgetting connections).
So that the conclusion does not seem too strange, we recall that the model is just a mirror of the phenomenon, a reflector of its sides-properties in its images and in its own language, therefore, independence is manifested in a certain overorganization of the joint model, of the features of its structure that can be achieved by expansion of the model. % TODO: check translation; translator unsure
The foregoing refers to the concepts of uncorrelatedness and uncovariance in their interval definitions.
These are weaker properties in comparison with independence, that is, they correspond to a broader joint model.
The connection between them is discussed in the last section of the chapter.
It must be said that the classical definition of independence as the multiplicativity of exact probabilities on algebras of events is, in our terms, just non-covariance of algebras, and this is somewhat weaker than true independence.
\end{conclusions}
\clearpage
\chapter{Random variables, sequences, sums}\label{cha:3}
\section{Random variables, sequences}\label{sec:3.1}
\pagebreak %
***
\pagebreak %
***
\begin{equation}\label{eq:3.1}
***
\end{equation}
***
\begin{equation}\label{eq:3.2}
***
\end{equation}
***
\pagebreak %
***
\begin{equation}\label{eq:3.3}
***
\end{equation}
***
\begin{equation}\label{eq:3.4}
***
\end{equation}
\begin{theorem}[***]\label{thm:3.1}
***
\end{theorem}
***
\pagebreak %
***
\begin{equation}\label{eq:3.5}
***
\end{equation}
***
\pagebreak %
***
\begin{equation}\label{eq:3.6}
***
\end{equation}
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\section{Convergences}\label{sec:3.2}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\begin{equation}\label{eq:3.7}
***
\end{equation}
***
\vfill
\subsection{Convergence of the arithmetic mean, the law of large numbers}
In the theory of probability and the interval theory of models of random phenomena developed by us, this law is of key character.
In terms of its internal content, the average, according to the definition of \S\ref{sec:1.1}, is a physical quantity, attainable as the limit of the arithmetic mean of the results of observations of the attribute \(f\) in a series of independent identical repetitions.
For stable phenomena the limit is: the number \(\mn{f}\) will be, and no matter how many times we return to a new series of tests, it is the same thing.
And for unstable ones, these will be different numbers in each series, but located on some one and the same interval \([\lmn{f},\umn{f}]\), the wider the deeper the internal laws of the phenomenon generation are “struck” by instability.
\pagebreak
Now the task is to check whether the theory we have built itself confirms the original meaning that was invested in its construction? % TODO: why question mark?
This will be the main criterion for the consistency of the theory (if we relate to the following criteria the availability of the theory, the interpretability of the parameters and the ease of application).
All the data for this verification are already available: independence has been defined and, as a form of its manifestation, non-covariance, the concepts of convergence have been introduced.
Let's start exploring the arithmetic mean.
***
\begin{equation}\label{eq:3.8}
***
\end{equation}
***
\pagebreak %
***
\begin{equation}\label{eq:3.9}
***
\end{equation}
***
\pagebreak %
***
\pagebreak %
***
\begin{addendum}
\item \textsc{***}
***
\pagebreak
\item \textsc{***}
***
\end{addendum}
\section{Prelimit and limit problems}\label{sec:3.3}
\pagebreak %
***
\vfill
The estimates in the form of the right-hand sides of the inequalities for the mean values of the power functions (hence the polynomials) form a set that approximates the sum model from above, that is, allows one to form an extended sum model using only the knowledge of the moments of the terms.
One of the advantages of just such an extension is the direct physical interpretability of the characteristics on which it is based: the first moment is the mean ***, the second is the average statistical power, etc.
And another important advantage is due to the fact that, according to the first Weierstrass theorem [23, p. 39], any continuous (and piecewise-continuous) function on a bounded segment can be approximated by polynomials arbitrarily accurately in the uniform metric, which makes the class of power-law tests very widespread.
Power-law features \(x_k\), \(k=1,2,\dots\), form \emph{the first universal class of features}.
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\begin{equation}\label{eq:3.10}
***
\end{equation}
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\begin{equation}\label{eq:3.11}
***
\end{equation}
***
\begin{equation}\label{eq:3.12}
***
\end{equation}
***
Together with \eqref{eq:3.12}, the right-hand sides of \eqref{eq:3.11} are the prelimit estimates for the moments of sums of symmetric ***.
The most remarkable thing about \eqref{eq:3.11} is that only the last term becomes significant as \(n\) increases.
This fact occupies a key position in the limiting problem, to a preliminary coverage of which for homogeneous sequences we pass, leaving in \S\ref{sec:3.4} the development of the prelimit and limiting problems to general sums.
\subsection{An introduction to the ultimate problem}
Its essence consists of those limiting simplifications for \(n\to+\infty\), which are obtained with exact values or with approximate ones in the form of the right-hand sides of inequalities for the average values of the key features of normalized sums.
Limit theorems, firstly, give an answer to the question for which specific features and under what conditions the means admit nontrivial approximations.
And secondly, they point to the limiting values of the approximations at \(n\to+\infty\) that form the extended limiting model of sums.
Our ultimate goal is to trace how, as data on the terms are accumulated and refined (their models are narrowed), the limiting model of the normalized sum narrows and, ultimately, converges to the normal one.
The convergence to the normal law can be traced in two universal directions: on the class of power criteria and on the class of harmonic, according to Theorem~\ref{thm:3.1} of the characterization of normal ***.
For power-law tests, convergence means convergence of moments to normal values ***
\pagebreak %
***
\vfill
Under the conditions of the investigation, it is quite reasonable to believe that harmonic averages should also converge to their normal values.
Indeed, the vanishing of sinusoidal means follows from the symmetry of the terms.
And the convergence to normal values of cosine means will be a consequence of limit theorems for sums of general form, which will be discussed in the subsequent sections.
\pagebreak %
Thus, there are two directions for proving the limit laws of normal convergence: power and harmonic.
In anticipation of the normal law, when either \(n\) is finite (prelimit case), or the conditions on the terms are insufficient for normal convergence, both directions do not replace, but complement each other; each of them gives its own facets of the prelimit model, each in its own way characterizes the approximation to the normal ***.
We have formulated the simplest version of the limiting law, demonstrating the main idea, the general idea.
At the same time, the requirements for the terms were very strict: all have zero mean and zero odd moments, as well as exact identical variances, which by itself implies the stationarity of these parameters and the necessary statistical stability of the sample, and, ultimately, absolute knowledge.
Let's assume for a moment that the sequence is “slightly” statistically unstable.
***
***
\vfill
***
\begin{addendum}
\item[] \textsc{***}
***
\pagebreak %
***
\end{addendum}
\section{Limit models of general type sums}\label{sec:3.4}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\begin{addendum}
\item \textsc{***}
***
\pagebreak
\item \textsc{***}
***
\pagebreak
\item \textsc{***}
***
\end{addendum}
\begin{conclusions}{sec:3.5}
Whereas the previous two chapters dealt with general outcome spaces, then here and below --- the number spaces.
The results of random phenomena will be random variables (r.v.), sequences, then processes.
Outcomes will now be linked not only by multiple relationships, but also numerical: they can be ordered, added to each other, multiplied by a scalar, transformed according to the rules of action with numbers, vectors, functions.
Such possibilities are realized as in the new ways of working with r.v. and sequences (see \S\ref{sec:3.1}), so also new forms of representations.
Random variables (and sequences) in the general structure are set by the mean of the features, and the features are all sorts of transformations on the number line - functions of one (many) variables.
Including those unlimited, which are dominated by primaries(?).
That is why it does not always exist as an average of the r.v. itself, because the features in the form of an identical transformation of a straight line are not limited, and also moments, in particular, the root-mean-square value (with an exact mean, the variance).
The most common representative of r.v. is normal, in our construction given in three equivalent ways, three different sets of means: 1) using the density, therefore, the probabilities segments; 2) moments; 3) harmonic means in the form of a characteristic function.
% TODO: enumeration
In the direction of these sets it is also convenient to judge the degree of approximation to the normal r.v..
The need for limiting results, which make up the most significant part of the chapter, required the definition of the concepts of convergence of models (IM convergence), which consists in approximating the means of one model to another.
The main reason for introducing IM-convergence is its focus on limiting results for sums of independent r.v.
In particular, it is used to formulate the law of large numbers as applied to statistically unstable sequences (Theorem 3.7). % TODO: theorem \ref
All have long been accustomed, that when adding independent r.v. have the right to expect a normal limit law, assuming the well-known Lindeberg--Feller conditions to be satisfied.
And if these conditions are not met, which is quite natural given their initial rigidity, which consists in the exact knowledge of the mean terms and their variances, as well as unlimited growth of their number?
Imagine for a moment that the averages of the terms are not quite accurate, that is, at least a little, but interval, and you will immediately find yourself in a dead-end situation, since the average of the normalized sums will become an interval running up in width, and the limit is simply lost.
It will be lost in the framework of the classical approach, but not the interval one, which can cover any intermediate cases, and even for finite sums, that is, the prelimiting case.
\pagebreak
In general, the prelimiting and limiting results are important because the operation of summation spontaneously participates in the practice of giving birth to many r.v..
So, the error in the manufacture of a part is the result of overlapping various factors.
The addition lies in the roots of the filtering procedure, etc.
And it is useful to know that not only information about the features of the terms (in particular, the probabilities, the mean of the r.v.) are transferred according to the corresponding formulas to the sums, but the summation itself assumes within itself the refinement of the averages along the characteristic directions, given by power and harmonic signs.
Such directions are universal due to purely arithmetic properties, which are manifested when substituting sums in place of the argument.
Their averages will be the boundaries of the moments, harmonic averages, combined into a characteristic function (not necessary for us) with exact probability distributions.
Depending on the number of terms and the data on them for the sums, sometimes more or less wide interval models are obtained, determined by their averages along the universal directions, both staid and harmonic.
It is interesting to trace how the nature of the data on the terms affects the width of the prelimiting and limiting models.
And as in the extreme case of exact data satisfying the classical conditions, the normal model will become the limiting one
(see Theorem 3.17). % TODO: theorem \ref
The presentation of the results is structured in such a way that first, in \S\ref{sec:3.3}, the case of homogeneous terms is considered, which allows us to understand the essence, and then in \S\ref{sec:3.4} it is carried over to inhomogeneous sums of a general form, where the laws are more general, but also more complex.
New prelimit and limiting statements make it possible, in full in terms of IM, to reveal the data arising from the summation long before the sums of the stages are extremely normal, and even if they cannot become extremely normal.
\end{conclusions}
\chapter{Stochastic process}\label{cha:4}
\section{Descriptions of stochastic processes}\label{sec:4.1}
\subsection{Description principle}
Time is an inexorable mover, it runs and runs tirelessly.
In this non-stop running, events occur that are called random on the basis that the fact of their appearance or non-appearance is not predicted absolutely accurately.
But time gives us another manifestation of randomness: you can reliably know that an event will occur, but not know the moment of occurrence, and the event becomes random in time, i.e., a random process.
In general, any random or non-random events, if we take into account their position in time, form a process.
Is such an absolutization of random processes convenient?
Our goal - the construction of mathematical models - obliges not to complicate, but to simplify.
The introduction of time as an independent parameter is justified under the following circumstances.
Firstly, if the moment of appearance of co-existence seems important, for example, in radar, where the delay of the reflected pulse carries information about the distance to the target.
Secondly, when describing physical phenomena, the coherence and naturalness of the course of development of which is unthinkable without time, such as the growth of agriculture, technological processes, signals of a dynamic system, noise, interference, and so on.
Processes in nature can be very diverse: shot and atmospheric noise, transport and industrial, impulse and harmonic interference, all kinds of flows in queuing systems, etc.
The task of the researcher is to develop as economical and simple descriptions as possible, achieved by identifying the most significant, important aspects, a kind of “personal data” of processes, followed by dressing these data in the “toga” of primary averages.
Actually, in the model itself, due to the choice of primary features, there is a potential for simplifications, directed reductions, and the more economical the model, the smaller the amount of data it is given, the easier the process in our representation, that is, in the form that is convenient for us with ***.
This is the fundamental position of the theory.
And the transition to discrete time is then natural as one of the types of reduction.
\subsection{Realizations and signs}
Formally, \emph{a random process} (or simply a process) \emph{is a system of random variables ***}
\vfill
***
\vfill
***
\pagebreak %
In our opinion, for real, physically, so to speak, tangible models, each possible implementation should have a nonzero upper probability \(***>0\), \(***\in \pspX\).
This will be the case if there is a finite number of primary data on the process and they are not absolutely accurate, that is, in a sense, blurred.
\begin{smallpar}
For the needs of the theory, in spite of what has been said, one cannot exclude the extreme option when the probabilities of all individual realizations are zero, understanding it as limiting or ideal, corresponding to an unlimited set of data.
In this variant, the reliable set of realizations \(\pspX\) may not be uniquely determined (equivalent to the ambiguity of the zero set).
Indeed, if \(\pspX_k\), \(k = 1, 2, \dots\) are different variants of \(\pspX\), then their final intersections will necessarily be reliable, but in no way countable, since the intersection of all \(\pspX_k\) does not lead to a reliable set, which means that the smallest of them does not exist (equivalent to the fact that the union of a countable number of zero sets does not lead to a zero set).
\end{smallpar}
Any set of implementations including at least one \(\pspX\) variant will be valid.
It is not always necessary to strive to make \(\pspX\) as narrow as possible, but it is desirable that it be as simple as possible (even at the expense of some expansion).
Before approaching the description of the processes, let us recall the general construction of interval models.
It remains uniform for any random objects, be it random variables, sequences, processes, and finally, fields, and consists of three steps:
\begin{enumerate*}[label=\arabic*)]
\item
the structure of features with their mutual semi-ordering is analyzed;
\item
primary ones are allocated, on which primary means are set;
\item
the primary averages continue to all other signs, forming a pattern.
\end{enumerate*}
The complexity of the model will be determined by the number of primary features and, of course, their structure, and the “pitfalls” will be the dimension of the space \(\pspX\) and the associated difficulties in controlling the persistence of features, which will be discussed below.
***
\vfill
***
\vfill
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\vfill
***
Here, this transition, as a matter of fact, does not change the data about the model.
And since any models can be reduced to the specified form of primary functionals by extension, discretization, in general, is nothing else in the sense of the theory as a simplifying extension of the process model.
The same can be said for \emph{quantization}, which consists in bringing the process values to preselected levels.
\pagebreak %
So, any transformations of a process entail extensions of its model, except for similarity transformations, since they retain all the known properties of the process.
\subsection{Characteristic features of the processes}
The first and direct way to define processes is to isolate characteristic statistical properties and dress them in the form of primary means.
For this, in general, any number of them can be allocated.
Moreover, our choice is considered successful only when, without a serious expansion of the model body, the number of primary values turns out to be smaller, since as a result the model will be simpler.
Capturing the most important and distinctive features of a process and embodying it in a model is the art of mastering a mathematical language based on engineering intuition.
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\begin{addendum}
\item \textsc{***}
***
\pagebreak
\item \textsc{***}
***
\end{addendum}
\section{Correlation properties}\label{sec:4.2}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\section{Homogeneous and stationary processes}\label{sec:4.3}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
***
\subsection{***}
***
\vfill
***
\vfill
A process is called \emph{\(***\)-stationary (partially stationary)} if all features of the set \(***\) are stationary, and \emph{stationary} if any features of \(***\) are.
The latter definition requires comment.
Is there a stationary process in nature at all, that is, such that whatever feature \(***\) is taken, it turns out to be stationary?
If yes, then this implies, firstly, the possibility of exact knowledge of the average of any feature, and secondly, the complete identity of the work of the internal statistically stable mechanism of the process in time.
But even if there is no such process, all the same, the absolutization of stationarity as a mathematical abstraction turns out to be very convenient as the certainty that no matter what set of \(***\) features is taken, with a shift in time, their averages, if they suddenly become precisely known, do not change.
Bearing in mind that the practical choice will always be limited by our capabilities, so in fact we will deal with partial stationarity.
So, stationarity is the statistical stability of a process in two directions.
First, according to the ensemble, when for a process, if it were possible to repeatedly “scroll” it under the same conditions, the existence of exact means of the statistical \(***\) of its signs \(***\) is established (otherwise declared).
And secondly, in terms of time, when the invariability of these statistical averages is confirmed over the course of the process over time.
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\section{Linear transformations of a process}\label{sec:4.4}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\begin{conclusions}{sec:4.5}
A process is a phenomenon whose outcomes are realizations of time.
Modern theory knows different approaches to describing random processes.
The most general, consisting in setting a process with consistent multidimensional probability distributions, turns out to be complex, ineffective, and difficult to apply.
The exception, perhaps, are normal processes, and that is because of their proximity to another approach to the definition of the process by its correlation properties.
The correlation approach is easy to understand, relies on the physical nature of the inertia of processes, is interpreted across the spectrum and is widely used in all areas.
Another approach is to use functional transformations of standard processes, usually white noise (this is how diffusion processes are defined), and occupies an intermediate position between these two.
All classical approaches are characterized by constructions that are absolutely complete in terms of detail: if the probabilities are known, then the entire set, and the same with correlations.
Nevertheless, in the very existence of the correlation approach and its successes, there is a tendency of a forced departure from the absolute towards simplifications, because correlations are only an integral part, a fraction of the entire boundless arsenal of probabilistic properties.
The requirement for further simplifications makes it necessary to open up space for any partial, abbreviated descriptions, tasks of the process with its individual properties, and not necessarily in an exact, but possibly in a blurred, interval form.
Incomplete for the classical theory, such constructions turn out to be quite complete and rigorous, even natural for interval models, where any probabilities, correlations, moments (exact or interval) as fragments of averages, if taken as primary, already somehow determine the process, and the smaller the number, the easier it is.
The preservation of the obligatory, most visible, characteristic features and the “forgetting” of all the secondary ones brings the complexity of the descriptions of the process to a level that does not differ much from the models of phenomena with simple outcomes (such as discrete and continuous random variables, sequences), and it comes often down to them.
And this requires a directed selection of features that functionals serve on the implementation space, and the specification of their means.
Such for impulse noise can be the probabilities of exceeding one or a grid of levels.
The process generated by an inertial device is characterized by partially known correlations, a correlation interval, properties of continuity of realizations, etc.
Any desired traits with the appropriate skill are translated into the primary average language.
The new approach requires a revision of some provisions of modern theory.
So, not every process is written as the sum of its mean and remainder, but more general additive representations are true (end of \S\ref{sec:4.1}).
Unusually, insidiousness is determined if the average is interval.
Well-known techniques are also used.
Thus, a process defined by correlation properties (of the second order) is represented by families of exact own means and covariance functions.
Simplification of the structure of descriptions can be achieved due to the invariability of the properties of the process in time, which allows, setting the mean at one origin, transfer them by shift to all the others, thereby increasing the number of primary data.
What has been said is covered by the concept of homogeneity of the process as invariance in time of the external appearance in the form of primary means, and hence all the others.
The concept of stationarity as the preservation in time of the internal, sometimes invisible microstructure of the process model turns out to be much more subtle and complex.
Stationarity makes it possible to transfer the process to the spectral region, and not so much to transfer it (as it can be done for other processes as well), but to single out simple properties of the spectrum, especially useful in problems of stationary filtration and in describing processes through spectral twins.
Spectral descriptions are so autonomous that they allow one to specify degenerate processes like white noise, which are not twins of anyone, but are extremely convenient for representing the properties of other, quite real processes.
\end{conclusions}
\part{Statistical synthesis}\label{part:2}
\chapter{Decision-making theory}\label{cha:5}
\section{Statistical models}\label{sec:5.1}
\subsection{What is Mathematical Statistics?} % TODO: deal with question mark/period
In any activity, be it production, social sphere or everyday life, you have to make decisions.
For the latter concept, a scientific meaning is put in, and much more voluminous than the everyday one.
It is considered a solution, firstly, the answer to the question, what is the value of the parameter of interest to us, which determines the position of the object or the state of the system, then we have the problem of estimating the parameters.
For example, determining the range to the target.
The problem of estimation covers solutions presented in the form of an interval (interval estimates) and generally in the form of a fuzzy event (vague estimates).
Such an assessment is called confidential, when vagueness serves as a guarantor of reliability.
Secondly, the solution is a direct selection of the values of a certain physical quantity in its course in time, then we have a filtering problem.
Thirdly, the decision will be the choice of one of the two hypotheses such as is the goal or not, the parameter is zero or not zero, to be or not to be, etc., etc.
Fourth, this is a test of one of many hypotheses, for example, in which of several working (frequency) channels there is a signal, which of those presented for identification is a criminal (culprit?), what mark the student passing the exam knows, etc.
If this is a choice of one of a discrete set of values of a numerical parameter, then as the discretes approach and, accordingly, an increase in their number, the problem of testing hypotheses approaches the estimation, since the solution will increasingly be reduced to the choice of a specific value of the parameter.
Thus, the general problem of decision making breaks down into subject areas that are closely related to each other.
What is needed to make decisions?
Nourishment for solutions is provided by observations that are somehow associated with the parameters of the state of the system of interest to us.
If the desired parameters can be observed or measured directly, then there is no problem and we get an absolutely accurate result.
It couldn't be better.
It is more difficult if the observations are indirect, distorted by errors, interference, measurements are inaccurate and the result is veiled by noise.
This is where the need for statistical methods arises.
\pagebreak
The adjective “statistical” itself means that data averaged over many observations are used, a kind of aggregate average statistical experience.
And the novelty of our approach in comparison with the classical one is that this experience is formalized in the form of interval models of averages, which allows us to cover practically the most diverse statistical material in its poverty and wealth, taking into account the form, volume, uncertainty and degree of confidence in it.
The decisions themselves in statistical methods have a statistical coloration: they do not have to be completely accurate at all times, since it is impossible to do it once, but on average they should lead to the best result.
This is the main task of statistical methods - optimal synthesis, to which the entire second part of the book is devoted.
Mathematical statistics - the science of analyzing decision rules and their synthesis - has a long tradition rooted in the history of probability theory.
These two sciences grew together, pulling each other up.
Mathematical statistics gave food to the theory of probability by the requirement to master new models, developing methods for them.
And as a result of the rapid joint growth that owes the 20th century, we are faced with an amazing variety of models and methods.
It got to the point of absurdity, when researchers first came up with common-sense rules, and then “made them scientific”, looking for models for which these rules are optimal (if this is a way of justifying, then only their existence).
And consumers had no choice but to believe or pretend to believe, following the famous fairy tale about the naked king.
The roots of such absurdities lie in that unquestioning obedience, invisible fatalism, with which the choice of a model determines the method of synthesis and, as a result, the type of optimal solution.
And if you use an arsenal of accurate models, then their apparent diversity, on the one hand, and unreliability, on the other, give rise to the same doubt in the choice options, make completely different models equivalent, thereby depersonalizing the optimal decision procedures.
The obvious way out is to expand the range of models, supplementing it with simple, crude, reliable models to meet demand with the kind that can and should be trusted.
Not to collect models every time as accurate families, because it is a long way, but to have ready-made samples for all occasions - that is our goal!
In principle, the fact is that in real problems there is always a finite number of data and they cannot be absolutely accurate.
These are the main models we develop, gaining potential reliability at the expense of a loss of accuracy.
And this is precisely the meaning of the algorithmic methods we have prepared, focused on a finite number of data, and with an unlimited increase in modern analytical methods crumbling into “fireworks” (one way or another finding their reasonable justification within the framework of the proposed general approach).
\pagebreak
The new models are suitable for any “climatic” conditions: they “tolerate” both abundance and deficiency of the initial statistical data, operate under conditions of statistical instability (reflected in interval means), as well as in the partial and complete absence of statistical data.
At the same time, within the framework of indicator models, interval averages can be replaced by indications of intervals, tolerances for observations, bringing us closer to interval analysis, and in this regard, the theory is still waiting for its development.
\subsection{***}
***
\pagebreak %
\begin{example}
***
\end{example}
***
\pagebreak %
***
\pagebreak %
\section{Optimal rules}\label{sec:5.2}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\begin{theorem}\label{thm:5.1}
***
\end{theorem}
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\section{Sufficient reduction of observations}\label{sec:5.3}
\pagebreak %
***
\begin{theorem}\label{thm:5.2}
***
\end{theorem}
***
\pagebreak %
***
\begin{theorem}\label{thm:5.3}
***
\end{theorem}
***
\pagebreak %
***
\pagebreak %
\begin{theorem}\label{thm:5.4}
***
\end{theorem}
***
\pagebreak %
\section{Reduction of observations and invariance}\label{sec:5.4}
\pagebreak %
***
\begin{theorem}\label{thm:5.5} % TODO: name differs from Theorem
***
\end{theorem}
***
\begin{theorem}\label{thm:5.6}
***
\end{theorem}
***
\pagebreak %
***
\begin{theorem}\label{thm:5.7}
***
\end{theorem}
***
\pagebreak %
\section{Determinate solutions and filtration}\label{sec:5.5}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\begin{figure}[t]
\centering
***
\caption{***}
\label{fig:5.1}
\end{figure}
\begin{conclusions}{sec:5.6}
The theory of statistical synthesis proposed here is similar in construction to the classical one.
One can reasonably speak of optimality if this concept is defined.
And for this purpose, the apparatus of analysis must be worked out, criteria for comparing algorithms are put forward: which of any selected is better, which is worse.
The apparatus will be effective if the analysis of each algorithm is fundamentally possible and not so laborious that it threatens to stop somewhere.
Then the thought arises that since the algorithms are ordered in a column one after another according to their quality indicators, then among them there is the best one, which can be approached by moving into the column head.
Moreover, for this, it is not necessary to enumerate all the algorithms, there are other effective techniques and their development is our goal.
Synthesis cannot do without analysis, and analysis cannot do without preparatory work of the following kind.
First, taking into account the random nature of the environment, it is necessary to determine the nature of the randomness, i.e., the mathematical model in the form.
The latter is a joint description of the behavior of observations, i.e., the input of the algorithm, and not observable, but of interest to us, internal states of the environment.
Outputs of algorithms are associated with states, i.e., decisions made by them.
The view (kind?) of the set (array?) of solutions is the face of the problem.
If there are only two solutions, then it will be a test of two hypotheses.
A multi-alternative problem corresponds to several solutions.
If you need to estimate a parameter, then the set of solutions will be the points of the number axis.
Finally, in filtering tasks, the solutions will be implementations of the signal processed according to the algorithm.
Each algorithm is a rule of action, an instruction prescribing which output to assign to each observation, that is, the input.
The rationale of our approach is that it does not have to be a completely strict instruction (deterministic rule), but it may be a set of recommendations in the form of a list of comparative preferences that are endowed with different solutions.
Algorithms are inanimate objects, and let the final decision remain with the person.
This brings us to the notion of fuzzy solutions and fuzzy algorithms.
The practice of applications and the kind of mathematical apparatus of synthesis may require algorithms of fixed structure (say, linear).
Then, from the very beginning, a restriction of the rules is introduced in the form of a class of admissible algorithms, within which the choice of the best will then be made.
The preparatory work is not over yet.
You need to be able to compare the true values of the states with the decisions made.
Of course, complete coincidence is very good; but there would be no statistical problem if the algorithm had no “margin for error”.
You need to assign a cost for errors in the form of a loss function.
Only now, after the introduction of all the attributes of the statistical problem, can one proceed to the synthesis, that is, to find the optimal decision rule.
The criterion for comparing and choosing the best one will be the risk equal to the average losses.
Two features arise here.
One - constructive, consists in calculating the risk by continuing the primary means of the SIM on the loss function (a special case of such a continuation will be integration over the probability distribution).
The other is the interval nature of risk as the average of one of the signs of SIM, with the lower risk being the smallest, most optimistic value, and the upper one - pessimistic.
The idea of taking some intermediate value leads to a pessimism coefficient that weights risks.
We want to have a reliable guaranteed result; we take the pessimism coefficient equal to 1, focusing entirely on the worst upper risk.
We take zero - we are counting on the lower risk, over-optimistically betting on total luck (as in the maximum likelihood method §\ref{sec:5.5}).
Optimal rules with excessive optimism are feverish in their properties.
They are done a little better with semi-optimism, when the lower and upper risks are summed up with the same weights.
The semi-optimistic regime relaxes the requirements for the reliability of the model (which determines the “health” of the algorithm) and favors the deliberate transition to ideal “weather conditions” in the form of accurate models, thereby justifying the applicability of probability distributions.
Optimal rules synthesized in the regime of pessimism and super-pessimism acquire complete immunity to “weather”.
By their properties, they become stable and robust.
Algorithms are products of scientific laboratories.
It is convenient to make their “production” in two stages: first - preliminary preparation, and then - final optimization.
The first step is the meaning of sufficient reduction.
Its purpose is to narrow down the class of decision rules by indicating on what material it should be based, i.e. other than preliminary reductions of observations (that is how if preliminary reductions in observations) will not lead to a loss of properties.
It is interesting and important (Theorem~\ref{thm:5.3}) that this reduction is entirely determined by the structure of the primary features that initially define the SIM.
The simpler the primary set, the deeper the reduction is possible.
Sufficiency is exclusively the prerogative of the pessimistic regime.
In our presentation, it is connected with the classical concept by Theorem~\ref{thm:5.4} on factorization (extended in relation to interval models, and hence to families of probability distributions).
The invariance of interval models and their symmetry to transformations of the observation space gives rise to the same features of the optimal rules, and therefore, to some extent predetermines the form of sufficient transformations (§\ref{sec:5.4}).
The last section of §\ref{sec:5.5} of the chapter is devoted to the illustrated application of the main statements of the theory to problems of deterministic (pointwise) estimation and filtration.
The influence of the pessimism coefficient on the type of optimal filtering algorithms is clarified.
\end{conclusions}
\pagebreak
***
\vfill
\chapter{Fuzzy evaluation}\label{cha:6}
\section{General questions}\label{sec:6.1}
\subsection{Rule errors}
***
This can be done only when the true \(x\) is not “pricked” by a point, but covered by something tangible, for example, an interval, which brings us to vague estimates.
The need for vague estimates arises primarily where, in addition to the value of a state, the accuracy with which it is estimated is needed.
For example, the task is set - to accompany the indication of the distance to the target with the magnitude of the error, the range of spread with which it is measured.
This is how the confidence interval is generated.
The wider it is, the less likely an error will occur when specifying.
This error can be included as a component of the risk, but it can be part of the initial technical requirements for assessment, equipment, then the error must be fixed and maintained at a certain level.
In this case, we are talking about estimating a fixed error level~\(\alpha\), or simply the level~\(\alpha\).
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\begin{theorem}\label{thm:6.1}
***
\end{theorem}
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\section{Confidence estimation given probability distributions of fluctuations}\label{sec:6.2}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\begin{figure}[b]
\centering
***
\caption{***}
\label{fig:6.1}
\end{figure}
\pagebreak %
***
\pagebreak %
***
\vfill
\section{Estimate of regression parameters given energy and correlation data of fluctuations}\label{sec:6.3}
\subsection{Rationale}
The most common, available information about the process, in addition to the average, is its energy characteristics: average power, current power, as well as correlation properties, spectrum.
The reason for their spread is that the upper limit of the average power is simply the limit of the energy capabilities of the radiation source.
And the correlation properties are usually due to the filter, whether it is an object of natural nature as the inertia of the environment or artificial in the form of the initial stages of the receiver, which overcomes the process before getting to the processing and decision-making device.
By the way, the energy data is a part of the correlation data, as well as the uncorrelatedness of the sample, which is usually achieved if the samples of the process are widely spaced from each other.
Here, the optimal confidence estimates are constructed for regression parameters, in particular, the shift parameter, when certain data of the specified type on fluctuations are known.
The estimates are vague, but in no way indicative, that is, not in the form of confidence intervals.
It is interesting in the course of the presentation to trace how the vague form of these estimates is associated with the type of primary signs that are invisibly present in the energy, correlation and other initial (primary) data, and also how as the quantity and quality (accuracy) of these data increase, the estimates improve, becoming more accurate, more reliable.
\pagebreak %
***
\pagebreak %
***
\begin{figure}[b]
\centering
***
\caption{***}
\label{fig:6.2}
\end{figure}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\begin{figure}[t]
\centering
***
\caption{***}
\label{fig:6.3}
\end{figure}
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\section{Estimate of shift parameters by moments and harmonic means}\label{sec:6.4}
\pagebreak %
\begin{figure}[t]
\centering
***
\caption{***}
\label{fig:6.4}
\end{figure}
***
\pagebreak %
***
\pagebreak %
***
\begin{center}
\begin{tabular}{l*{2}{S[table-format=1.3]}*{2}{S[table-format=1.2]}S[table-format=1.2e-1]S[table-format=1.1e-1]}
\toprule
*** & {1} & {2} & {3} & {4} & {5} & {6}\\
\midrule
*** & 0.267 & 0.086 & 0.03 & 0.01 & 3.84e-3 & 1.4e-3\\
*** & 2.62 & 3.9 & 4.84 & 5.62 & 6.3 & 6.9\\
*** & 2.22 & 3.44 & 4.36 & 5.08 & 5.76 & 6.4\\
\bottomrule
\end{tabular}
\end{center}
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\vfill
\section{Confidence estimation of scale parameter}\label{sec:6.5}
\subsection{General considerations}
The scale parameter determines the energy properties of observations: power, effective value.
The problem of its estimation arises, firstly, when the level of intensity of the process carries useful information about the position of the object of interest to us.
For example, the magnitude of hydroacoustic noise can indicate the distance to a school of fish or a ship.
In this case, noise itself acquires the rank of utility in a certain sense.
And secondly, the need to measure the noise level arises when determining and monitoring the performance of operating equipment, for example, to record the probability of false alarms in radar stations.
And in this case, the estimation of the noise level can not be considered “second-class”, a secondary matter, as components of evaluation are included in a single synthesized complex functioning of the equipment.
\pagebreak %
From the standpoint of modern probability theory, the estimation of scale parameters occupies a special position.
It is due to continuous time and the assumption about the exact knowledge of the correlation properties, since then, expanding the process of finite time length from \(0\) to \(T\) in a series in terms of the own functions of the correlation kernel, we obtain an infinite number of uncorrelated expansion coefficients, which, normalized (by dividing by the roots of their variances), are all reduced to the same variances.
As a result, the observations are transformed into an endless sequence of transformed, new values, uncorrelated with each other and responding in the same way to a change in scale, and thus, carrying an unlimited supply of information about this parameter; a margin that allows you to select it as accurately as you like (moreover, along the segment \((0, T)\) of observations of arbitrarily small length!).
This paradox - the effect of exact knowledge - is called the singularity\footnote{***}.
The phenomenon of singularity turns out to be a serious obstacle in the real estimation of the scale parameter, since all assessments reflexively stretch in its direction as promising “heavenly bushes” (paradise) of unlimited improvement of their quality indicators, blocking the search for other ways and completely forcing to forget that the reduction of the interval length \((0, T)\) of observations imposes increasingly stringent requirements on the accuracy of the correlation function in order to somehow guarantee the uncorrelatedness of even a small number of expansion components.
Our goal is to construct estimates of scale parameters based on real, finite data, which by itself will exclude any possibility of such purely mathematical “tricks” as a singularity, and put the theory of estimation on the basis of reality.
Moreover, as always, we will take the weakest initial data (about the phenomenon) of the energy sense as the beginning, and then we will gradually strengthen them in the right direction, not forgetting to follow the changes in the optimal assessment that are taking place at the same time.
In the end, it is useful to know what real information should be available in order to estimate the scale parameter with the required accuracy, in particular, its kind - variance.
\pagebreak %
***
\pagebreak %
\begin{figure}[t]
\centering
***
\caption{***}
\label{fig:6.5}
\end{figure}
***
\pagebreak %
***
\begin{figure}[b]
\centering
***
\caption{***}
\label{fig:6.6}
\end{figure}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\begin{conclusions}{sec:6.6}
The logic is that you do not expect an intelligible answer to an unclear question.
The natural response to the fuzziness of models caused by a priori data poverty is fuzzy estimates.
Estimates of the required parameters must obey two diametrically opposite requirements.
On the one hand, to be reliable and, therefore, vague to the required extent, and on the other, consumer interest forces them to be specific, that is, to the greatest possible accuracy.
The resolution of this contradiction leads to the optimal confidence estimates of the fixed reliability.
The introduction of a composite risk as the sum of the penalty for vagueness (in the form of the area under the decision curve) plus the value (probability) of the weighted error achieves its goal - a significant simplification of methods for synthesizing confidence estimates.
Even with classical probability distributions, minimization of the composite risk leads to - new in content - general formulas for confidence intervals (\S\ref{sec:6.2}) (for normal distributions turning into known ones), to pose and solve new problems, for example, to find a confidence estimate of the variance that has minimal vagueness; to obtain the optimal confidence estimates of the regression parameters.
The most important thing is that the composite risk, organically combined with the construction of interval models in the form of primary means, goes to simple structures of optimal estimates, which will be linear combinations of primary features truncated from below, as stated in Theorem~\ref{thm:6.1}.
Depending on the primary data, a variety of assessments are obtained, giving a picture of preferences for the parameter values on a 0-1 scale.
In particular, in case of probabilistic models, these will be confidence intervals (preference for a complete 1 inside and 0 outside the interval) due to the fact that the primary signs here are indicator.
\pagebreak
Less data - easier to find the best estimate.
Is this not the correct yet unfamiliar concept?
No data - no search needed, estimates are trivial (all values will be assigned the same preference).
The homogeneity of the initial data, their symmetry responds in the assessment structures by a decrease in the number of variable coefficients that are subject to optimization.
The coefficients are connected by a linear equation fixing the reliability, and the rest are found from the minimum of the integral vagueness of the estimates.
Their search is embedded in a typical parametric optimization problem with constraints.
We solve it analytically, not excluding the usefulness of numerical methods in the form of standard programs.
Optimal confidence estimates for the shift parameter are found in \S\ref{sec:6.3}, where only the average power of fluctuations is considered known.
They become more accurate for uncorrelated fluctuations, and then generalize to the case of specified correlations and to the estimation of the (one-dimensional) regression parameter.
It is noteworthy that the parabolic form all these estimates inherit or quadratic features that make up the properties of the second order of observations.
And it is characteristic of our theory that the complexity estimates are the same both for the sequence of observations and for processes if the composition of the primary data is the same.
As the number of data grows, the estimates become somewhat more complicated, but they are refined and become narrower in form with the same reliability.
The consumer question is anticipated: Are the obtained optimal estimates better than the usual confidence intervals for the normal distribution?
To a naive question, the answer is disappointing: of course, worse!
And there can be many, since different estimates are designed for different conditions.
Some conditions are a regime of complete prior well-being in the form of a normal probability distribution, according to which a confidence interval is calculated.
The estimates we obtained are optimal in the conditions of complete trouble, inherent in poor knowledge of momentary problems, therefore, they are more vague.
And yet, their trouble is relative.
With independent readings, as the number of observations increases, without any additional assumptions, our estimates approach the normal in properties (\S\ref{sec:6.4}).
One can only suspect that we owe this to the internal laws of normal convergence, which are not formally involved in the synthesis.
This result demonstrates the excellent “digestion” of the developed apparatus, which allows assimilating any nutrients with an extremely poor a priori ration.
The introduction of pre-marginal and marginal results into the assessment on a systematic basis is set out at the end of \S\ref{sec:6.4}.
The end of the chapter is devoted to the problem of estimating the scale parameter (in particular, the power) from the data on the properties of the second order of observations.
It turns out to be unexpected here that the absence of restrictions on the moments of the fourth order does not allow obtaining any good estimates even with an unlimited increase in the length of observations.
But it turns out to be very simple to fix them with the necessary restrictions.
And it was quite rightly expected that the singularity problem (consisting in the absolute accuracy of estimating the variance, and in general, the shift) becomes impossible by itself if we abandon ideal probabilistic models and go over to real interval models.
Singularity is a by-product of theorized abundance and an unattainable dream for a priori poverty habitual for practical tasks.
\end{conclusions}
\pagebreak
***
\vfill
\chapter{Test of hypotheses}\label{cha:7}
\section{General statements}\label{sec:7.1}
\subsection{Introduction}
Let us abstractly imagine a certain box, which does not matter what it is filled with, and what happens in it and how, but what is important is that the processes inside can be in some two states.
One state is usually working, conventionally called zero, does not cause any reactions in us and does not require intervention, while the other, on the contrary, is critical, and called alternative.
It is necessary to decide from observations of the output of this "box" which of the two states is present at the moment, and under conditions when the output is so clogged with noise or fluctuations that it is impossible to make a decision unambiguously and with complete confidence, since the appearance of the same realizations is characteristic to both zero and alternative states, but with different probabilities.
The latter is very important and implies the presence of average statistical data on the output, on observations.
This is what you can and should seize upon to solve the problem.
The source of the data is not important for us now: either it is the result of studies of the internal physical nature of the process, or the consequence of preliminary observations of the output, when it is said to which state it corresponds to (learning with a teacher), or there may be learning without a teacher.
The form of data presentation is important, which should correspond to our general concept of the formation of an interval model of means: to have the form of vague average statistical properties of the features of the output observations.
Thanks to the language of primary data, this is a very accessible and universal form, which we have seen more than once.
Moreover, the data must, in one way or another, be associated with states, whether it is zero or alternative (otherwise the solution will become trivial).
It is important here to analyze and obtain data not on any available signs for the purpose of using all of them, but only on the most characteristic ones, highlighting those directions that turn out to be most sensitive to a change in states (less is better).
Then there will be less primary data, and the task will be easier.
It is this case that we are most interested in also because when learning or preliminary analysis is impossible, many real problems turn out to be “sitting on a starvation diet” of a priori deficit.
\pagebreak
Since the actual states are unknown and this has yet to be solved, they are called hypotheses: respectively, the null hypothesis and the alternative (easier, alternative).
They can have different weight and priority.
After all, rejection of the null hypothesis, that is, acceptance of an alternative, if it is incorrect, can ultimately lead to dire consequences.
For example, a wrong decision is made about the movement of a group of unidentified objects in the direction of someone's territory, which can cause instant response aggression that cannot be justified.
Or a decision is made about the harmlessness of a new medicine, when in fact it is not.
Or an innocent is convicted.
The foregoing leads to the principle, so to speak, of the "presumption of the hypothesis", according to which it is better to reject the alternative several times, even if it is wrong, when it is actually true, than to reject the hypothesis at least once incorrectly.
As a consequence of this principle, a hypothesis will often be accepted, and an alternative will be accepted only when there are quite good reasons for this, which do not cause much doubt.
Thus, the hypothesis is given priority, weight, and thus a conservative beginning is laid in it, and in the alternative - a progressive (sometimes aggressive) one.
This is important for understanding the meaning of the rules for checking the hypothesis with a fixed level of errors, which will be discussed below ***.
The problem of distinguishing between hypotheses is also not left aside, corresponding to non-priority hypotheses, for example, the case when i. the communication channel sends messages \(0\) (corresponding to the null hypothesis) or \(1\) (alternative).
In fact, if the weight is replaced by the hypothesis probability (the average expected frequency of alternating zero messages), then the choice of the probability can always take into account the desired priority, as well as vice versa.
This convinces us (and will be mathematically supported) that the problems of testing and distinguishing hypotheses, i.e., with priority and non-priority hypotheses, are adjacent, substituting for each other.
\subsection{***}
***
\vfill
***
\pagebreak %
***
\pagebreak %
***
\begin{theorem}\label{thm:7.1}
***
\end{theorem}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\begin{figure}[b]
\centering
***
\caption{***}
\label{fig:7.1}
\end{figure}
\pagebreak %
***
\begin{theorem}\label{thm:7.2}
***
\end{theorem}
***
\begin{theorem}\label{thm:7.3}
***
\end{theorem}
***
\pagebreak %
***
\pagebreak %
\section{Correlation theory of hypotheses' test}\label{sec:7.2}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\begin{figure}[t]
\centering
***
\caption{***}
\label{fig:7.2}
\end{figure}
\begin{figure}
\centering
***
\caption{***}
\label{fig:7.3}
\end{figure}
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\section{Using confidence estimates for hypotheses' test}\label{sec:7.3}
***
\begin{theorem}\label{thm:7.4}
***
\end{theorem}
***
\pagebreak %
\begin{figure}[t]
\centering
***
\caption{***}
\label{fig:7.4}
\end{figure}
***
\pagebreak %
***
\pagebreak %
\section{Special methods for rule synthesis}\label{sec:7.4}
\pagebreak %
***
\begin{theorem}\label{thm:7.5}
***
\end{theorem}
***
\begin{theorem}\label{thm:7.6}
***
\end{theorem}
***
\pagebreak %
***
\begin{theorem}\label{thm:7.7} % TODO: theorem variant within same numbering sequence?
***
\end{theorem}
***
\pagebreak %
***
\begin{theorem}\label{thm:7.8}
***
\end{theorem}
\begin{figure}
\centering
***
\caption{***}
\label{fig:7.5}
\end{figure}
\begin{figure}[b]
\centering
***
\caption{***}
\label{fig:7.6}
\end{figure}
\pagebreak %
***
\begin{figure}[b]
\centering
***
\caption{***}
\label{fig:7.7}
\end{figure}
\begin{theorem}\label{thm:7.9} % TODO: theorem variant within same numbering sequence?
***
\end{theorem}
\pagebreak %
***
\pagebreak %
\section{Test of hypotheses about the given parameter value}\label{sec:7.5}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\section{Discrimination of several hypotheses}\label{sec:7.6}
\pagebreak %
***
\begin{theorem}\label{thm:7.10}
***
\end{theorem}
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
\begin{figure}[t]
\centering
***
\caption{***}
\label{fig:7.8}
\end{figure}
***
\pagebreak %
***
\pagebreak %
***
\begin{figure}[b]
\centering
***
\caption{***}
\label{fig:7.9}
\end{figure}
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\vfill
\begin{conclusions}{sec:7.7}
Two hypotheses, null and alternative, are considered, with different statistical descriptions of observations, in the form of interval mean models (IM).
Behind the hypotheses there are specific practical tasks such as whether there is a signal or not when it is detected, or checking a device malfunction, compliance with its technical requirements, etc.
The purpose of the decision rule is, based on the results of observations, to make a choice in favor of one of the hypotheses.
And a specific choice is not required, but rather a vague one is allowed, in which solutions are presented ambiguous in the form of preferences (realized by randomization).
The rules for testing hypotheses are characterized by the probabilities of erroneous acceptance of one when the other is true.
There are two errors: the first kind, with a wrong rejection of the null hypothesis, and the second.
The probabilities of errors are found by the continuation of the primary means of IM to decision rules (considered as features).
The result will be the interval values of each probability: its lower boundary, which gives the most optimistic forecasts for error, and the upper one, pessimistic.
They display intermediate values depending on the degree of pessimism.
\pagebreak
The optimal rule for testing a null hypothesis is one that minimizes the magnitude of the second type error for a given first (rule level).
Although, by definition, priority is given in favor of the null hypothesis, synthesis is equivalent to solving the related problem of minimizing the weighted sum of errors with the subsequent selection of weights (replacing the prior probabilities of the hypotheses).
In this plan, sufficiency is determined.
The central result is formulated by Theorem~\ref{thm:7.1} and consists in the fact that classes of rules that are sufficient under pessimism, and hence the structure of the optimal rule, are determined exclusively by linear combinations of the primary signs of hypotheses.
It remains to find the coefficients, which will be the smaller, the simpler the hypothesis is in terms of their equilibrium composition.
In this case, the type of the rule can target only one of the hypotheses, either the null or the alternative, the first case being preferable because of the simplicity of fixing the level.
The symmetry and homogeneity of hypotheses further simplifies the task.
The features of the optimal rules are, first, in their vagueness (randomization), the nature of which is entirely determined by the form of the primary features of the hypothesis (or alternative).
And the second feature, which is a side factor of the arbitrariness of tuning to a hypothesis, is the ambiguity of the species.
Both features are exacerbated with poor initial material constituting the hypotheses and disappear when passing to “rich” models in the form of probability distributions, where randomization can remain only on the border according to the well-known Neyman-Pearson lemma.
The provisions of the theory are revealed by finding in \S\ref{sec:7.2} the optimal rules for observational shifts and the known correlation properties accompanying hypotheses (oddly enough, such a necessary problem in the classical apparatus has no solution).
The rules are vague, their nature, as follows from the quadratic form of the primary features, is parabolic.
Difficulties in identifying errors in rules are pushing to search for other methods of synthesis that use the paths traversed.
One of them (\S\ref{sec:7.3}), beaten by the confidence estimates of the previous chapter, requires hypotheses to be written in terms of two different values of the same parameter.
The rule is to assign to the null hypothesis the degree of preference which the confidence score gives the parameter value corresponding to this hypothesis.
The vagueness of the assessment will give rise to the vagueness of the rule, and the magnitude of the estimation error goes to the level of the rule and opens the way for the approximate calculation of the error of the second kind.
Thus, a large number of rules are formed at once, which may not be quite optimal, but nevertheless good in terms of their “kinetic” proximity to estimates, if they are taken the best.
Another way (\S\ref{sec:7.4}) is already traditional and consists in interpreting the models as families of exact probability distributions, searching within families of the least favorable ones and comparing their ratio with the threshold according to the prescription of the well-known Neyman--Pearson lemma.
The path draws on game theory methods and embraces a robust approach.
The author presents his results on interval densities to demonstrate the comparative capabilities of the robust approach within general interval constructions.
\pagebreak
Hypotheses acquire a qualitatively different character when it is necessary to make a decision in the form of agreement with the put forward (hypothetical) position and disagreement.
For example, the device is working properly or not; the hypothesis is concretized, and the alternative will be everything else, however, it is not clear what.
Figuratively speaking, if a hypothesis and a specific alternative constitute a directed dipole, then it turns out to be non-oriented, but fixed on the part of the hypothesis.
The statement and methods of approach to this problem are contained in \S\ref{sec:7.5}, where it is proposed as possible ways to use the previously found hypothesis testing rules and confidence estimates.
The problem of distinguishing between hypotheses \S\ref{sec:7.6} arises when it is necessary to separate or distinguish between several states of an object.
Hypotheses are formulated in terms of IM, their primary features determine the structure of the optimal rule (Theorem~\ref{thm:7.10}) that minimizes the total error.
Its ambiguity is taken into account by introducing a neutral solution.
Specific rules are obtained for hypotheses specified by the correlation properties of observations.
The problem of distinguishing many states brings us closer to their estimation and can even be solved by methods of estimation theory, the whole question rests on the criteria of quality analysis and methods of risk formation.
\end{conclusions}
\chapter{Reliability synthesis}\label{cha:8}
\section{General questions of the model synthesis}\label{sec:8.1}
\subsection{Model synthesis methodology}
Our life can be compared to the movement in the compartment of a fast train, from the window of which landscapes quickly replace each other.
And there is not the slightest time to think that each of its fragments “breathes” its own very complex and meaningful life, a detailed study of the laws of which is the lot of many, many generations (if not all).
We, looking out the window, restrict ourselves to the most superficial representation of all this, an external model that connects what we saw with our internal views, experience.
This is natural, because common experience in its claims and manifestations is the essence of an economical representation of the environment: highlighting the main thing, ignoring everything that is secondary (although children, beautiful girls and some scientists sometimes have the opposite tendency).
The figurative example is not accidental, but explains the nature of the enduring epistemological connection:
\begin{center}
\begin{tikzpicture}
\node (p) {\uppercase{phenomenon}};
\node (m) {\uppercase{model}};
\draw[->,yshift=1ex] (p) -- (m) node[above,pos=.5] {adequacy};
\draw[->,yshift=-1ex] (m) -- (p) node[below,pos=.5] {act(ion)};
\end{tikzpicture}
\end{center}
A person builds models as reflections of realities (and on a very different basis) and uses them to cognize nature, as well as to influence it (therefore, the relation is indicated by arrows in either direction).
Like nature, the model lives its own independent life.
Both lives are related in the sense that they must be coordinated, as they say, the model must be adequate to the phenomenon (upper arrow), which will allow predicting the states of the model and then controlling the states of the phenomenon (lower arrow).
An excellent illustration of what has been said is the random process of Brownian motion - the classical model of the movement of a particle during its chaotic collisions with molecules during thermal motion.
A vivid physical picture here gave rise to a mathematical image, a probabilistic model, in which there are no particles as such, they were forgotten, and only the squeezed out result is the process of movement in the form of a mathematical construction.
This abstraction made it possible to find the probabilities of deviations, to establish the ideal laws of Brownian motion.
In general, you need to be careful, because in the language of models, far from everything that exists in nature is convenient and reflective.
Idealization leads to models that are reliable only at first glance, reflecting the desired experimental picture for the model.
In the future, it is convenient, due to the specifics of this presentation, to speak only about mathematical-probabilistic models - a symbolic language for describing random phenomena.
The depth and boundless complexity of the phenomena of the real world forces, it would seem, the same qualities in models.
So it sometimes seems that the more complex the model, the richer its own life, the more powerful in its reflective potential it is.
One can also be saved if this did not destroy the direct connection between the model and the phenomenon.
The complication of the model requires an urgent check of each of its new fragments for compliance with reality.
With the words “let” (so practical in modern mathematical-probabilistic expositions, let the process be Markov, let the density exist and be differentiable twice, let the probability be known, etc., etc.) one dictates the isolation of the life of the model, turns into its goal in itself its teaching to mathematicians who consider themselves completely ***.
It can be argued that history knows examples, such as group theory or non-Euclidean geometrics, when this or that purely mathematical construction ultimately met with practical applications.
But calculating every time for a case is the same as betting on a meteorite hitting it in the elimination of a cosmic object.
Isn't it safer to aim something controlled in time in three-dimensional space at an object?
We are convinced here once again in our main idea, which gave rise to and permeates the entire content of the book, that torn off supercomplex models are not needed, the nuances of which can be taken seriously by some conscientious researcher as known laws of nature.
It is much more economical and reliable to have an arsenal of simple models that connect with the phenomenon in a small number of sides, a kind of channels, and reflect the phenomenon not all at once, but in parts, illuminating each time only the side that is of immediate interest.
That is the power and art of cognition: to divide the entire sphere of activity into sciences, each of which is given its own part of a physical phenomenon, then science into subject areas, etc.
\pagebreak
Let us now concretize the connection: \uppercase{mathematical theory \(\longrightarrow\) model}.
Mathematical theory develops group laws of models and is organized within itself by a system of axioms - this formal construction that connects each model.
On symbolic material, axiomatic connections must repeat real ones, which is the adequacy of the axioms and the guarantor of the theory's validity.
And the vitality of a theory will depend on what aspects of its representatives (models) are associated with phenomena, how easily these connections are induced, accessible, physically visual (interpretable), habitual, and finally, reliable.
This will help the research engineer to choose the specific model that is needed to solve the problem, in order to further attract the full power of the theory, which only benefits from simplifying the models.
\subsection{Formulation of the problem}
Our goal is to consider the problem of synthesis of interval statistical models of means.
***
\vfill
Evaluation of \(\theta\) should be made from a clear awareness of the ultimate goals, by which we mean that subsequent, hereditary task for the solution of which the model is called upon.
This can be either an analysis task or a decision rule building task.
For example, a noise model is used to find an algorithm for optimal signal detection or estimation parameters in additive notation.
Another option is possible, when the hypothesis and alternative models are not interconnected through noise and are constructed separately from one another.
Let us illustrate our idea that the solvers should influence the synthesis of the model itself in the form of requirements for estimating the setting parameters through themselves with an example.
\pagebreak
\begin{example}
***
***\footnote{***}.
\end{example}
\vfill
***
\vfill
***
\vfill
Two problems arise here: the choice of the defining features and the estimation of their means.
The first constitutes its own sphere of activity, and not always scientific, but taking into account the overwhelming variety of the signs themselves and the possibilities of their choice, sometimes turning into art (it begins where science ends).
This is the engineering art of choosing connecting signs, using a priori information about the phenomenon, what is the knowledge of its mechanism, observing the phenomenon, relying on practice, experience and common sense, in accordance with the complexity of the procedure for synthesizing a model, without "lowering the sight" from the subsequent application of the model.
The qualities of the model that permeate the mood of the entire book are guiding here: simplicity, accessibility, reliability.
\pagebreak
The second problem is the estimation of the driving parameters.
It is consistent with the choice of defining attributes and therefore does not exclude the use of the most diverse physical laws and facts.
But it can also be formally solved on the basis of a preliminary experiment, training implementations, which interests us the most.
Here, the requirement for the adequacy of the relationship between the model and the phenomenon imposes on the training implementations the obligation to be “authorized representatives” of the phenomenon of interest to us, that is, to be carriers of the same parameter values (mean), then they can be evaluated.
This is the condition for the stationarity of the setting parameters, which we will now discuss.
\subsection{Stationarization of statistical parameters}
From the standpoint of mathematical models, averages, like probabilities, are fixed numbers, while interval averages and probabilities are intervals, the boundaries of which satisfy the IM axioms.
These concepts contain the meaning of the arithmetic mean or frequency.
\vfill
***
\vfill
***
\vfill
Interval averages and probabilities arise when their exact values are not available, either because of the limited number of tests (lack of experience), or because of the possible nonstationarity of tests, i.e. instability of the means and frequencies.
The latter is an unpleasant "step", but fortunately, not for the entire statistical approach, but for those formal methods of synthesizing the model, which we strive for and which willy-nilly require the parameters to be stationary.
Let us indicate a method of stationarization, typical for statistical applications, based on a random choice.
\pagebreak
***
\vfill
\begin{example}
***
\end{example}
\vfill
***
\vfill
The conclusion is this: \emph{a random rare choice from a set is a means of stationary sequence and a reason for the independence of its elements}.
Stationarization facilitates the construction of a mathematical model, since it reduces its synthesis to evaluating the setting parameters that are identical in the course of tests, i.e., to its new statistical problem, which can be formalized and solved by the methods considered in the previous chapters.
To do this, you need to know what content of the estimates that define the model you want to get, what basic criteria they should satisfy.
\subsection{Trust model concept} % TODO: trust = confidential?
Let's analyze the contents of Fig.~\ref{fig:8.1}.
The construction of a mathematical model is a separate problem of parameter estimation, or, more precisely, a super-problem, which feeds on tests, and like any statistical problem, it needs an initial construction.
In the construction of the superproblem, first of all, there is an initial mathematical model of the very sequence of tests, so to speak, a supermodel, which differs from the desired one by its width and extremely unpretentious queries: for us these will only be assumptions about independence and stationarity.
It will be interesting to observe further its narrowing into the required model, which is enough to do in the direction of \(***\), the estimated (setting) parameters.
\vfill
***
\pagebreak %
\begin{figure}[t]
\centering
***
\caption{***}
\label{fig:8.1}
\end{figure}
***
\pagebreak %
\section{Building a confidence model on a given set of events}\label{sec:8.2}
\subsection{Starting positions}
Here we consider the case when the defining parameters are the probabilities of a set of events that are initially non-intersecting, which is applicable to those problems in which the characteristic features of the phenomenon of interest to us are added from the probabilities.
This choice can be guided by the simplicity and convenience of estimating probabilities.
At the same time, it is clearly necessary to realize that all the primary signs of the final confidence model will be necessarily constant at the specifying events or their intersections, and the same property will move on to the decision rules (of a sufficient class) as the ultimate goal of building the model.
\vfill
***
\vfill
***
\vfill
***
\vfill
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\pagebreak %
***
\vfill
\section{Consistent synthesis of models and rules}\label{sec:8.3}
\subsection{Robustness of Models and True Rule Errors}
Whether we like it or not, the process of manufacturing industrial products is “overgrown” with a mass of preparatory work and support services.
You need to get raw materials, make headers (models), assemble them, deliver them to the place, and for this you need to have a room, prepare machines, equipment, finally find workers and interest them in wages, organize economic services (although this is still, of course, not all).
And only then you can start making it.
Reliability, rhythm of work must be looked at in combination with the coverage of the entire well-oiled mechanism of enterprises as a whole.
***
At the same time, it is impossible to bring this requirement to complete absurdity, forgetting about the intended purpose of the model - to lead directly to the product - the decisive rule.
Excessive tooling costs will increase both production time and total cost.
A compromise is needed here.
*** by “products” we mean the rules of confidence assessment (although this can be a test of hypotheses).
The model and the rule have their own attributes: the model has reliability, the rule has an error (level of significance).
The latter is calculated by the type of model.
Obviously, having an unreliable model, one can no longer trust the error (the calculated significance level \(\alpha\)) calculated (according to the model) for the rule, because the true probability of error is expected to be higher than the calculated one.
At the same time, a too broad and hence overly reliable model will lead to an unjustified increase in the calculated errors of the rules.
The resulting contradiction gives rise to the need to reveal the strict connection between the reliability of the model and the errors of the rules, to which we will proceed.
\pagebreak %
***
\pagebreak %
***
\vfill
***
\vfill
***
\begin{equation}\label{eq:8.9}
***
\end{equation}
***
\vfill
It should be noted that the structure of optimal rules is generally protected from risk recalculations, since it is entirely determined by the first term \eqref{eq:8.9}, and, ultimately, by the primary features of the confidence model.
Some parameters of these rules will remain flexible, variations of which help to control fuzziness and error.
So, it has been established that the unreliability of the model leads to the need to introduce corrections to the errors of the rules: the true errors will, in general, be greater than the calculated ones.
If you need to provide a fixed true error (level), you need to prudently take a deliberately lower calculated value with a discount for unreliability, which corresponds and will lead to a drop in the real qualities of the rules, both optimal and non-optimal.
\subsection{Fuzzy trust models and decisions}
Vague estimates of the setting parameters lead to different models, and indicator estimates lead to MI, and non-indicator ones - to more general blurred models §\ref{sec:2.3}.
And here the question arises of how to organize the synthesis of optimal rules for fuzzy (non-interval) models.
This important point remained outside the scope of consideration, since during the synthesis we were limited to strictly interval models.
Let's move on to its coverage.
\pagebreak %
***
\pagebreak %
\subsection{Adaptation, reliable estimation of the mean with unknown variance}
The adaptation principle consists in narrowing the confidence model in the process of receiving observations \(y_1, \dots, y_n\).
The peculiarity is that “learning” is carried out on the basis of the same observations, which are used to make decisions about states (estimation, testing hypotheses); this is the first.
And secondly, the refinement of the model is carried out: sequentially according to \(n\).
To describe the adaptation scheme, let us consider one case when it is required to estimate the shift parameter of an independent sample with an unknown variance of fluctuations.
The adaptation consists in estimating the variances and substituting the shift parameter into the estimate.
\vfill
***
\vfill
***
\vfill
***
\vfill
***
\pagebreak %
***
\vfill
\begin{conclusions}{sec:8.4}
What are the benefits of using interval models?
Qualitatively, they lose to exact (probability distributions), and this seems to put an end to them.
Imagine for a moment that there is a hitherto unseen object and you need to describe its shape.
Moreover, the object is poorly distinguishable (located in fog or far from us).
It is clear that one will have to stipulate the conventionality of the description by insertions of the type “seemingly”, “it seems”, or be satisfied with a crude way, subduing the imagination in relation to what is not distinguishable (science is not fantasy).
The same is true for models, which are seen in most real problems very vaguely due to the finiteness of time and allocated funds for the study of the phenomenon, finally, the shift, instability of the real phenomena themselves.
Realizing perfectly well that the erroneousness of the model will immediately render its further exploitation useless, we will try to be more careful, that is, to describe the edges of the model in a vague form, while selecting only the most “visible” part of them.
And we come to “purebred” interval models in their constructive assignment with a set of primary means.
\pagebreak
Building a model in real conditions is the sphere of many-sided (sometimes long-suffering) research: physical, experimental statistical.
We are interested in formal methods, when training implementations are provided at the disposal of complete initial ambiguity.
Then the construction of the model is a joint estimation of the selected part of the parameters that define the model (\S\ref{sec:8.1}).
Point estimates lead to accurate models.
Confidence scores form a confidence model of a given reliability.
The question boils down to the choice of parameters defining the model.
These can be probabilities, the joint estimation of which is discussed in \S\ref{sec:8.2}.
Another way of choosing a model, inherent in the classical approach, is to check the agreement that the model has an advanced specific form (for example, normal).
This path, owing in many respects to the extreme narrowness of the working arsenal of accurate models, makes us “content” from the very beginning with a hypothetical, moderately simple option chosen in advance.
Acceptance of the latter, if there is agreement, nevertheless will not lead to a near-reliable model, but will only establish the relevance of the hypothetical variant to our confidence IM model as an integral part.
Ultimately, this will be “snatching” from the confidence model of its pre-formed piece - an approach characteristic of the regime of optimism.
We are much more careful, using the entire confidence model (the immense arsenal of the IM allows you to make any choice), adjusting the width with the help of reliability.
The more we put forward reliability, the wider the confidence model, from which the concreteness and quality of conclusions during the operation of the model will suffer.
On the contrary, the lower reliability, it would seem, will make the conclusions more specific and of high quality, but the credibility will decrease due to the loss of the model's fidelity.
The way out of this vicious circle is to jointly consider the model-inference tandem (\S\ref{sec:8.3}), taking into account both types of errors: due to the unreliability of the model and during operation due to random observations.
Joint consideration forces us to introduce corrections to the estimated operational errors, increasing them according to the unreliability of the model, which leads to true errors, and more broadly, to true risk.
This is exactly what is objectively needed for the purposes of analysis and synthesis of decision rules.
The difficulties are that the confidence models inherited from the confidence estimates that gave rise to them, in general, vague, non-indicator ones, become blurred in the form of averages, that is, with smeared intervals of averages.
The definition of the concept of an optimal rule for fuzzy statistical models made it possible in \S\ref{sec:8.3} to consider the joint picture of the synthesis of a model in terms of its only setting parameter—variance, with the subsequent finding
All together, this looks like a confidence estimation of the variance (the level of confidence of which will become the reliability of the model) with its subsequent use in the confidence estimation of the shift with its already confidence (calculated error).
Moreover, everything is done according to the same sample of observations, as it lengthens, the variance estimate is refined, that is, the adaptive narrowing of the model.
\pagebreak
Let us emphasize once again that the main result of considering the tandem model rule was the true risk (true errors) of the synthesized rule.
And the reliability of the model is just an auxiliary synthesis attribute that acquires a specific meaning by minimizing the true risk.
Such an optimal reliability exists and is found in the considered adaptation problem in \S\ref{sec:8.4}, where its tendency is established with an increase in the sample length to tend to unity at the rate of the cube root.
Now we can give a substantiated answer to the question posed at the beginning: the benefit of interval models consists in obtaining objectively reliable decision rules that are substantiated in all respects with an assessment of their true qualities.
This is the main achievement of the reliability synthesis.
\end{conclusions}
\setcounter{secnumdepth}{0}
\pagebreak
\printbibliography
\pagebreak
***
\pagebreak
***
\pagebreak
*** % TODO: page should be empty
\pagebreak
*** % TODO: page should be empty
\clearpage
\tableofcontents
\end{document}
\endinput
**