1 Introduction

Within the wide spectrum of theoretical, methodological and technological aspects covered by the Granular Computing framework, let us focus on a relatively new one rooted in the two interwoven concepts of Interactive Granular Computing [IGC (Pedrycz and Chen 2015)] and Complex Information Granule [c-granule, Skowron and Jankowski (2015)]. Both arise from the need to relate an internal setting made of information granules to the physical world, which represents the external environmental counterpart that those granules affect. In the age of user experience (Amershi et al. 2014) these relations have great relevance in many operational situations where we want to optimize a person’s perception of how pleasant a particular device is to use. One way of dealing with the two contexts passes through the definition of complex information granules (c-granules), which are usually related to interplaying agents that face the physical environment through abstractly defined windowed hunks (Heller 1990). This definition is given in terms of three components: (1) the soft_suit, recording perceived properties of hunks and their interactions; (2) the hard_suit, responsible for the response of the environment to the agents’ interactions in terms of physical variables; and (3) the link_suit, piping interactions between agent and environment, allowing the former to perform sensory measurements and/or actions on the physical objects. Granule interactions in the physical and mental worlds are generally studied among agents, mainly with a view to the social evolution of huge communities of such agents, addressing the fundamental issues in Wisdom Technology (Jankowski and Skowron 2013); they are implemented in terms of ontology updating (Clark et al. 2013; Skowron et al. 2012), usually under the framework of Interactive Rough Granular Computing [IRGC (Skowron and Szczuka 2010; Kacprzyk and Pedrycz 2015)].
In this paper we focus instead on the interactions inside a single c-granule, and govern them through a specific Fuzzy Inference System [FIS (Provotar et al. 2013)]. The common ground between the two perspectives lies in the adaptive judgment about interactive computations on c-granules, which allows an inductive and rational evaluation of user-system interactions (Jankowski et al. 2013).

Since their introduction by Zadeh (1965), fuzzy sets have been intended as a rigorous way of dealing with non-formalized knowledge falling in the sphere of the experience and intuition of the designers. Rather than explaining why, membership functions offer a clear way of describing the granularity of the information the designers hold. Within the Granular Computing framework, fuzzy sets constitute the key ingredients of both the theory of approximate reasoning (Zadeh 1979), a powerful framework for reasoning in the face of imprecise and uncertain information, and its operational counterpart known as fuzzy rule systems (Pedrycz 2013).

In this paper, we use fuzzy rules to manage the intra-granule interactions. Our two main contributions are, respectively, stretching the Granular Information features to cope with the specific soft_suit of our c-granule, and devising the Granular Constructs that deal with them within the link_suit. Namely, we consider fuzzy sets inside a universe of discourse that is hidden from the user, and then offer a method for both identifying and controlling fuzzy rule systems based on this kind of fuzzy attribute. We call this method learning from nowhere, not to diminish its value but to stress the extremely poor a priori information a Granular Construct may exploit during a training session; to overcome this drawback we extend the distal learning construct from the neural network to the fuzzy set framework. We note that hidden universes of discourse and our c-granule constitute a realistic scenario that we may meet very frequently in everyday life. In fact, we faced and resolved it in our own European project SandS, which deals with the operation of common household appliances. We devote a great part of the paper to the setup and execution of numerical experiments, owing to the relevance of this challenging problem and to the litmus test it represents for the concrete suitability of some Granular Constructs.

Fuzzy rule systems represent the favorite tool for solving system control problems when fuzziness affects the description of their dynamics (Miu and Hayashi 2000). Even though different interpretations of logical connectives, fuzzy implication and compositional operators lead to a variety of well-known inference mechanisms ruling the semantics of a fuzzy rule-based system (consider for instance the Mamdani, Tsukamoto, Sugeno and Larsen reasoning models), its general expression may be summarized as follows (Fullèr 1998):

$$\begin{aligned} \begin{array}{ccc} {\mathbf {if}}\; x_1\;\text { is }\;A_{11}\;{\textsc {and}} &{} \ldots &{} {\textsc {and}}\;x_n\;\text { is }\;A_{1n}\;{\mathbf {then}}\; o\;\text { is }\;B_1\\ {\mathbf {if}}\; x_1\;\text { is }\;A_{21}\;{\textsc {and}} &{} \ldots &{}\;{\textsc {and}}\;x_n\;\text { is }\;A_{2n}\;{\mathbf {then}}\; o\;\text { is }\;B_2\\ \vdots &{} \vdots &{} \vdots \\ {\mathbf {if}}\; x_1\;\text { is }\;A_{k1}\;{\textsc {and}} &{} \ldots &{} {\textsc {and}}\;x_n \;\text { is }\;A_{kn} \;{\mathbf {then}}\; o\;\text { is }\;B_k \end{array} \end{aligned}$$
(1)

where \(A_{ij}\) and \(B_i\), for all \(i=1,\ldots ,k\) (k being the number of rules) and \(j=1,\ldots ,n\) (n being the number of conditions), are fuzzy sets defined in the corresponding input and output spaces; \(x_j\) and o are the input and output variables (possibly linguistic) corresponding, respectively, to the j-th condition and to the conclusion. In the above system, the premise of the i-th rule (prefixed by if) maps into its firing strength \(w_i({\varvec{x}})=\bigwedge _{j=1}^n \mu _{A_{ij}}(x_j)\), where \(\mu _{A}(x)\) denotes the membership grade of x to the fuzzy set A. In turn, together with the consequent \(B_i\) (prefixed by then), the firing strength \(w_i({\varvec{x}})\) contributes to the individual rule output \(B'_i = w_i({\varvec{x}})\rightarrow \mu _{B_i}(o)\) and, through a proper aggregation operator, to the overall system output, computed as \(\bigvee _{i=1}^k B'_i\). The various instances of fuzzy inference schemes then arise from the different interpretations given to the above connectives and operators, as well as from the adopted shapes of membership functions (Fullèr 1998).
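
As an illustration of the scheme in (1), the following minimal sketch fixes one possible interpretation of the connectives: the minimum for the conjunction, clipping (again the minimum) for the implication, and the maximum for the aggregation. These choices, the triangular membership functions, the helper names (`tri`, `firing_strength`, `system_output`) and all numeric values are assumptions made for the example only, not the options adopted in this paper.

```python
def tri(a, b, c):
    """Return a triangular membership function with support [a, c] and mode b."""
    def mu(x):
        if a <= x <= b:
            return (x - a) / (b - a) if b > a else 1.0
        if b < x <= c:
            return (c - x) / (c - b)
        return 0.0
    return mu


def firing_strength(antecedents, x):
    """w_i(x): conjunction (here the minimum) of the membership grades of one rule."""
    return min(mu(xj) for mu, xj in zip(antecedents, x))


def system_output(rules, x, o):
    """Grade of the output value o: max over rules of min(w_i(x), mu_Bi(o))."""
    return max(min(firing_strength(ants, x), mu_b(o)) for ants, mu_b in rules)


# Two hypothetical rules over two inputs in [0, 1]; all numbers are illustrative.
low, high = tri(0.0, 0.0, 1.0), tri(0.0, 1.0, 1.0)
rules = [
    ([low, high], tri(0.5, 1.0, 1.0)),  # if x1 is low and x2 is high then o is large
    ([high, low], tri(0.0, 0.0, 0.5)),  # if x1 is high and x2 is low then o is small
]
x = (0.2, 0.9)
w = [firing_strength(ants, x) for ants, _ in rules]
grade_at_one = system_output(rules, x, 1.0)
```

Swapping `min` and `max` for other t-norms, implications or aggregation operators yields the other reasoning models mentioned above.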

Within the aforementioned SandS project, where we aimed at optimally and adaptively controlling the microcontrollers of household appliances remotely, we faced peculiar fuzzy rules which, for a bread maker for example, read as follows:

  • if the loaf is slightly soft and soggy then increase baking time

  • if the loaf is very crusty and crunchy then decrease baking time

While at first glance a Mamdani reasoning model (Mamdani 1977) might seem the most appropriate schema to frame the above rules, mainly because of its widespread acceptance and its intuitiveness, which make it well suited to human input, a more in-depth investigation led us to prefer the Sugeno reasoning model (Takagi and Sugeno 1985). Here, the consequents are crisp variables to be computed through a weighted mixture of functions depending directly on the input variables; in formula:

$$\begin{aligned} o=\sum _{i=1}^k \overline{w}_i f_i({\varvec{x}},{\varvec{s}})=\frac{\sum _{i=1}^k w_i f_i({\varvec{x}},{\varvec{s}})}{\sum _{i=1}^k w_i} \end{aligned}$$
(2)

where \(f_i\) defines the activation function (Sugeno function for short) of the i-th rule, whose shape, arguments and free parameters \({\varvec{s}}\) depend on the chosen model, and \(w_i\) denotes the satisfaction degree of the premise of the rule, i.e. its firing strength. Finally, \({\varvec{x}}\) and o refer, respectively, to the crisp input variables and the overall output of the system.
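
A minimal sketch of the weighted mixture in (2) follows. The firing strengths and the two linear Sugeno functions are hypothetical values chosen only to make the arithmetic easy to follow; the function name `sugeno_output` is likewise an assumption of this example.

```python
def sugeno_output(weights, rule_functions, x):
    """Eq. (2): average of the rule outputs f_i(x), weighted by the strengths w_i."""
    total = sum(weights)
    if total == 0:
        raise ValueError("no rule fires for this input")
    return sum(w * f(x) for w, f in zip(weights, rule_functions)) / total


# Hypothetical firing strengths and Sugeno functions (free parameters already fixed).
weights = [0.8, 0.2]
rule_functions = [
    lambda x: 5.0 + 10.0 * x[0],    # a rule suggesting an increase
    lambda x: -5.0 - 10.0 * x[1],   # a rule suggesting a decrease
]
o = sugeno_output(weights, rule_functions, (0.5, 0.5))
```

With these numbers the two rules propose 10 and -10, and the output is (0.8 · 10 + 0.2 · (-10)) / 1.0 = 6.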

Our preference for this model was mainly motivated by the possibility of injecting prior knowledge into the ruleset, specifically in terms of rule and variable identification and of Sugeno function definition, thanks to the specific competences and expertise held by household appliance manufacturers and expert practitioners in the field. To clarify, returning to the two aforementioned rules, let’s consider the pictorial system schema shown in Fig. 1. Recent studies of the bread-making process (Mondal and Datta 2008) have taught us that to have, for instance, a crunchy crust, we have to prolong the baking time and lower the baking temperature, but not intervene directly in the leavening process. This information allows us not only to select a useful subset of rules suited to meeting user needs, but even to identify the shape and possibly some parameters of the Sugeno functions involved.

Fig. 1 A learning from c-granules instance. In the picture, \(x_1\) and \(x_2\) refer, respectively, to the fuzzy variables crustiness and moistness; the fuzzy sets \(A_{11}\) and \(A_{12}\) refer to the linguistic variables “slightly soft” and “soggy” (i.e. “poorly crusty” and “very moist”), while \(A_{21}\) and \(A_{22}\) to “very crusty” and “crunchy” (i.e. “not much moist”), respectively; the output \(o_j\) of each Sugeno function describes the numeric contribution of the j-th rule to the baking time. All contributions are gathered in the output of the system through (2).

Of course, this choice is not free from complications. The first drawback, which in turn shows up in the distinguishing features of our fuzzy sets (the first main contribution of our paper), is linked to the fact that some input variables (such as the above crustiness and moistness) live in a non-metric space, in the sense that the user is not able to attribute a specific value to them. In fact, when the user says that “the loaf is very crusty”, even if s/he distinguishes different kinds of crustiness falling within the same quantifier “very crusty”, s/he has no reference metric to identify its specific value. Bread crustiness is a subjective attribute not simply because it may vary on a per-user basis; rather, the same user may attribute to the same crustiness level the term “very crusty” in certain contexts/recipes, and “low crusty” in others. Vice versa, the same quantifier “very crusty” may incorporate a wide spectrum of crustiness levels, whose relation is hidden from the user her/himself and dynamically changes according to external non-measurable factors. In other words, the crustiness level \(x_j\) exists, but its value cannot be objectively set by the user, who in turn limits her/himself to assigning it a quantifier s/he deems suitable during a particular period of her/his life. Nevertheless, to identify a suitable value of the output o in (2) (both to compute the satisfaction degree of the premise and to instantiate the Sugeno function in the consequent) we must estimate the values of such hidden variables, a problem that per se is not far from that of identifying the state transition matrix in Hidden Markov Models (Rabiner and Juang 1986); it does, however, run into special difficulties given the control framework in which it is embedded. The second main contribution of this paper consists in overcoming these difficulties.

Actually, the typical taxonomy for Granular Constructs  (Apolloni et al. 2008) shown in Fig. 2 considers the two alternatives where the antecedent is either a crisp variable to be framed into a fuzzy set by the interface (first row in the picture), or the fuzzy set as a whole (second row). Our case is an intermediate one, where we need the crisp variables both to compute the satisfaction degree of the premise and to instantiate the Sugeno function in the consequent. But we do not know their values. Typical approaches for further stressing the fuzziness of some variables are represented by type 2 fuzzy sets (Mendel and John 2002), where membership degrees are in turn fuzzy sets, and by higher order fuzzy sets (Pedrycz 2013), where the universe of discourse of a fuzzy set is in turn made up of fuzzy sets. Our approach is, rather, reminiscent of the fuzzy set calibration introduced in Pedrycz et al. (1997). In that case a proper non-linear mapping is learned from an original universe of discourse to another one that may prove more suitable for supporting the fuzziness of some attributes. In our case this mapping remains implicit, since we do not know the original universe and aim directly to infer the single elements of it which support the questioned attributes.

Fig. 2 Four fundamental modes of the use of fuzzy models (Apolloni et al. 2008)

The identification of the crisp variables (the core of the hard_suit component of our c-granule) is further complicated by the special features of the Sugeno functions \(f_i\)s in two respects. On the one hand, in the absence of expert knowledge, we resort to simple functional forms, such as linear or quadratic ones, to interpret consequents like “increase/decrease” of the baking time. Although this is a common weakness of the Sugeno approach (Cococcioni et al. 2007), it also reflects the robustness and flexibility of this model, which is capable of fitting highly non-linear trends in spite of the simple (possibly linear) regressors each rule produces in output. On the other hand, we do not know the relation between the operational parameter (the baking time) and its effect on the crustiness and moistness appreciation provided by the user. This entails a non-trivial identification problem, to be faced in a way reminiscent of distal learning in neurocontrol (Jordan and Rumelhart 1992) with a two-phase (identification and control) algorithm. In particular, the operational parameters (i.e. baking time and temperature, leavening time, etc.), being directly controlled by the learner, play the role of proximal variables in the distal learning framework. Actually, these variables are the only levers the learner can move to modify the bread consistency and quality. Conversely, user judgments (regarding bread crustiness, for instance) are distal variables that the learner controls indirectly through the intermediary of the proximal variables. The only assumption the whole system relies on is that the target values of user judgments are always available; actually, this requirement is always straightforwardly met: independently of the true (unknown) level of crustiness, if the user is satisfied with the crust consistency, s/he will not request any further adjustment aimed at increasing/decreasing its texture.

To give some hints on the information flow characterizing the proposed FIS, together with the interrelated roles played by proximal and distal variables, let’s consider a very simplistic system composed of the following single rule:

if :

baking time is somewhat high and baking temperature is very high and the loaf is slightly soft and soggy

then :

increase baking time very much and lower baking temperature a bit

Firstly, both the operational crisp parameters (baking time and temperature) and the user fuzzy judgments (bread softness and sogginess) are provided as input to the rule, where the former characterize the current status of the system to be controlled (i.e. the bread machine), while the latter represent the user appreciation of the bread produced by the device in its current configuration. Then the rule controls the two operational parameters by invoking two Sugeno functions aimed at estimating how the parameters should be modified to meet user requirements. Proceeding with the lead example, the output of this rule might be “increase the baking time by 12 seconds and lower the temperature by 10 degrees”. The person determining whether or not the proposed settings of the system are optimal, or at least preferable w.r.t. previous ones, is the user who, after having tasted the home-baked bread, is called upon to evaluate its softness and sogginess through two fuzzy judgments. The latter will have a twofold role in the learning process: they represent both the target to be achieved by the system and, at the same time, its fuzzy input for the next trial. As for the former role, the system will use these values to tune the way operational parameters are modified so as to achieve a configuration satisfying the user tastes, corresponding to the barycentric position w.r.t. the two evaluations. As for the latter, the FIS per se may be considered as a dynamic system where learner, user, and indirectly the bread machine, continually interact with one another until an optimal configuration is reached. So, the input of the system at the next time step will consist of both the operational parameters obtained in output at the previous iteration and the same user fuzzy judgments that played the role of target values at the previous time step.

At this point, some remarks are noteworthy.

  1. We are free to select the structure of the rules, with the aim of achieving a suitable balance between informativeness of the system and its trainability, where the former calls for a high number of rich rules; vice versa for the latter. On the one hand, we may subdivide the overall FIS in various parts, called clusters henceforth, each one deputed to tune a single operational parameter. On the other hand, while letting the target be constituted by all the available judgments, to reduce the number of rules we may consider as input only the operational parameter to be tuned in the cluster the rule belongs to.

  2. The extra complexity w.r.t. a standard FIS is constituted by learning from nowhere the value of those variables related to user evaluations. The true values of bread softness and crustiness, while hidden from both FIS and user her/himself, are needed to compute both the rule strength and the output of the Sugeno function. The aim of the whole process is then to learn these values along with the parameters the FIS is composed of, so as to meet user requirements. Obviously, we may bargain with respect to the shape of the membership functions and the non-linearities of the scales of these variables. However, as to the two extremal alternatives: \(\langle \)linear scale (no matter the meaning of the variables) and highly complex membership functions\(\rangle \), \(\langle \)unknown scale (we just infer the coordinates of the sample points), elementary membership functions (such as triangular or Gaussian)\(\rangle \), implicitly in line with (Pedrycz et al. 1997), we prefer the latter, as a sort of kernel trick (Hofmann et al. 2008) applied to fuzzy rules.

  3. To tune the system according to the user judgments, our extension to the fuzzy rules of the two-phase algorithm proposed in the distal learning framework (Jordan and Rumelhart 1992) consists of an initial task where the learner identifies the system by discovering, through a classical supervised learning task, the mapping between operational parameters and user judgments (hence the term identification phase). After having identified the system, the learner switches to a control phase, aimed at regulating the system so as to satisfy user needs.

  4. No assumption is made on the expertise level and competencies of the user. In the lead example, s/he plays the role of bread tester: taking a loaf out of the bread maker, s/he relies only on her/his palate and sense of taste to evaluate the bread’s consistency. In turn, the inference system should be able to adapt to the user needs even in case of out-of-the-ordinary comments and judgments. Users should not be confused with experts, such as appliance manufacturers. Contrary to the former, the latter may directly inject some prior knowledge into the system, by suggesting, for instance, the most adequate rules or the shape of Sugeno functions, as explained in the next section.

In our paper we implement our c-granule in two scenarios: (1) a case study where, using a graphical tool, we are challenged to beautify a face whose morphological parameters are computed, on the basis of user evaluations based on four criteria, by a FIS trained on a relatively small and noisy dataset; and (2) the actual assessment of the SandS system on a bread maker.

The paper is organized as follows. In Sect. 2 we formalize the training problem and specify the adopted options. In Sect. 3 we describe the two experimental scenarios, framing the latter within the SandS project, while in Sect. 4 we discuss the numerical results. Conclusions are drawn in Sect. 5.

2 The formal framework

Having framed our inference instance in the IGC paradigm in Sect. 1, we focus in this section on its solution in terms of the training of a FIS that we characterize by:

  • \(n_c\) crisp variables \(y_i\)s, associated to the operational parameters;

  • \(n_f\) fuzzy variables \(g_i\)s, each described by r linguistic quantifiers, associated to the task execution evaluation;

  • k rules, grouped in \(n_c\) clusters each containing \(k_c=r^{n_f+1}\) rules, where each cluster is responsible for the update of the single operational parameter, and each rule consists in principle of \(n_f+1\) antecedents (the \(n_f\) fuzzy variables plus the operational parameter the cluster refers to).

Framing the lead example in the above context, Fig. 1 shows two out of the \(k_c\) rules constituting the cluster responsible for the baking time, where we have \(n_f=2\) fuzzy variables \(g_1\) and \(g_2\) (loaf crustiness and moistness) but no crisp variable. Actually, according to the system architecture described in the above bullet list, a realistic model, such as the one adopted in our experiments, would also include as input the crisp variable baking time \(y_1\), after a suitable fuzzification. In the experiments we fixed \(r=3\) as a suitable compromise between richness of the model and manageability in operational terms. For instance, the three fuzzy sets associated to loaf crustiness have been chosen as: poorly, regular, and very crusty.
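
To make the rule count concrete, the sketch below enumerates the antecedent combinations of a single cluster for the lead example, assuming \(r=3\) quantifiers for each of the \(n_f=2\) judgments plus the fuzzified operational parameter. The generic labels `low`, `regular`, `high` and the variable names are placeholders introduced here, not the quantifiers used in the experiments.

```python
from itertools import product

# r = 3 generic quantifier labels (placeholders for, e.g., poorly/regular/very crusty)
quantifiers = ("low", "regular", "high")

# n_f = 2 fuzzy judgments plus the operational parameter the cluster refers to
antecedent_vars = ("crustiness", "moistness", "baking_time")

# One rule per combination of quantifiers over the n_f + 1 antecedents
cluster_rules = list(product(quantifiers, repeat=len(antecedent_vars)))
k_c = len(cluster_rules)  # equals r ** (n_f + 1)
```

Here `k_c` evaluates to 27, which shows how quickly the rule base grows with \(r\) and \(n_f\), and why reducing the inputs of each cluster helps trainability.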

For the sake of clarity, we structure our method according to the standard ANFIS architecture (Jang 1993), which proves well suited to implementing the Sugeno model. It is composed of 6 layers (see Fig. 3), which from left to right complete the following tasks: (1) input coding, (2) computing the membership degrees related to the rules’ antecedents, (3) synthesizing them into overall satisfactions of the single rule antecedents, (4) normalizing the above satisfactions on the rule set, (5) computing the effect of the single rules on the consequent according to the Sugeno functions, and (6) combining the single effects into the rule system consequents.

Fig. 3 The proposed neuro-fuzzy system architecture

In greater detail, the computations in the various layers are as follows:

  • layer 0, where we enter the \(n=n_c+n_f\) input variables \({\varvec{x}}=({\varvec{y}},{\varvec{g}})\), is complicated by the hidden nature of the values underlying the fuzzy sets (namely, the judgments \({\varvec{g}}\)). We resolve this by adding learned shift values \({\varvec{\delta }}\) to initial ones, where the latter may be either the nominal values of the fuzzy numbers used by the tester or completely random numbers (in a reasonable range) generated by the system. The output of this layer is a vector \({\widetilde{\varvec{x}}}\), where:

    $${\widetilde{x}}_i= {\left\{ \begin{array}{ll} x_i &{}\quad \mathrm {if\ } x_i \mathrm {\ is\ a\ crisp\ variable\ } y_i\\ x_i+\delta _i &{}\quad \mathrm {if\ } x_i \mathrm {\ is\ a\ fuzzy\ variable\ } g_i \end{array}\right. }$$
    (3)

    Returning to the lead example and using as initial values for the fuzzy sets their nominal values, with reference to the second rule we note that the fuzzy variable \(g_1\equiv \) very crusty translates into the crisp value obtained by adding to the mode of the fuzzy set \(A_{22}\) the learned shift parameter \(\delta _2\). In turn the mode is learned as well by the neuro-fuzzy procedure;

  • in the fuzzification layer 1, where the membership functions (m.f.s) are introduced, we focus only on triangular functions \(\mu _{{A_{\{a,b,c\}}}}({\widetilde{x}})\) and asymmetric Gaussian-like functions \(\mu _{{A_{\{\nu ,\sigma _l,\sigma _r\}}}}({\widetilde{x}})\) which require only three parameters to be specified (Dubois and Prade 1988); namely:

    $$\begin{aligned} \mu _{{A_{\{a,b,c\}}}}({\widetilde{x}})= {\left\{ \begin{array}{ll} \frac{{\widetilde{x}}-a}{(b-a)} &{} \text {if }\;{\widetilde{x}}\in [a,b] \\ \frac{c-{\widetilde{x}}}{(c-b)} &{} \text {if }\;{\widetilde{x}}\in (b,c] \\ 0 &{} \text {otherwise} \end{array}\right. }\quad \mu _{{A_{\{\nu ,\sigma _l,\sigma _r\}}}}({\widetilde{x}})= {\left\{ \begin{array}{ll} \mathrm e^{-\frac{({\widetilde{x}}-\nu )^2}{\sigma _l ^2}} &{}\; \text {if }\;{\widetilde{x}}\le \nu \\ {\mathrm{e}}^{-\frac{({\widetilde{x}}-\nu )^2}{\sigma _r ^2}} &{}\; \text {if }\;{\widetilde{x}}>\nu \end{array}\right. } \end{aligned}$$
    (4)

    Each membership value reflects how much each input variable matches with the corresponding fuzzy set;

  • in layer 2, we compute the satisfaction degree \(w_j\) of the premise of the j-th rule (the conjunction of the antecedents) as the product of the membership degrees of the metric variables, i.e. using the product T-norm. Recalling that \(A_{ji}\) refers to the m.f. fuzzifying the i-th variable in the j-th rule, we have:

    $$\begin{aligned} w_j =\prod _{i=1}^n \mu _{A_{ji}}({\widetilde{x}}_i) \end{aligned}$$
    (5)

    The firing strength of the j-th rule reflects how much the variables in input to that rule satisfy its premises;

  • in layer 3, the normalized satisfaction degree \(\overline{w}_j\) is computed as usual by dividing a single degree by the sum of degrees in a same cluster c consisting of \(k_c\) rules:

    $$\begin{aligned} \overline{w}_j =\frac{w_j}{\sum _{z=1}^{k_c} w_z} \end{aligned}$$
    (6)

    The in-cluster-based normalization is a powerful tool contributing considerably to injecting non-linearities in the overall system, all while preserving the comparable role held by each operational parameter in the task execution;

  • in layer 4, Sugeno functions \(f_i\) are computed as a function of the crisp variables \({\widetilde{\varvec{x}}}\) computed in layer 0 (both \({\varvec{y}}\) and \({\varvec{g}}\), after having learned the shift values modulating the latter). In particular, \(f_i\) is composed of a linear combination of suitable (possibly different) scalar functions \(\varsigma \) of each input variable (for instance the identity function, power laws, \(\log \), \(\exp \), etc.), where the weights \({\varvec{s}}\) of the linear combination constitute the free parameters to be suitably learned. In the lead example, after having decided (either randomly or with some expert advice) on the shape of the scalar functions \(\varsigma \)s, the resulting Sugeno function, which reads \(f_2({\widetilde{\varvec{x}}},{\varvec{s}}_2)=s_{21}{\widetilde{x}}_1+s_{22}{\widetilde{x}}_2^2\) (in case a linear and a quadratic scalar function \(\varsigma \) were chosen for, respectively, \({\widetilde{x}}_1\) and \({\widetilde{x}}_2\)), provides the numerical contribution of the second rule to the baking time, reflecting the action to be taken according to the satisfaction degree of its premises. Finally,

  • in layer 5, the outputs of the Sugeno functions are summed in each cluster c with a weight equal to the normalized satisfaction degree computed at layer 3:

    $$\begin{aligned} o_c=\sum _{j=1}^{k_c} \overline{w}_j f_j({\widetilde{\varvec{x}}},{\varvec{s}}_j) \end{aligned}$$
    (7)

    In other words, the more a rule is compatible with the input instance, the higher will be its contribution to the overall output—which reads as the value assigned to the baking time during the next task execution in the lead example.
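
The layer-by-layer computation above can be sketched, for a single cluster, as the forward pass below. It follows Eqs. (3) to (7) under some simplifying assumptions: only triangular m.f.s are used inside the rules (the asymmetric Gaussian-like m.f. is shown separately), each rule is stored as a pair of m.f. parameters and a ready-made Sugeno function, and all names and numbers are hypothetical.

```python
import math


def tri_mf(x, a, b, c):
    """Triangular m.f. of Eq. (4), assuming a < b < c."""
    if a <= x <= b:
        return (x - a) / (b - a)
    if b < x <= c:
        return (c - x) / (c - b)
    return 0.0


def gauss_mf(x, nu, sigma_l, sigma_r):
    """Asymmetric Gaussian-like m.f. of Eq. (4)."""
    sigma = sigma_l if x <= nu else sigma_r
    return math.exp(-((x - nu) ** 2) / sigma ** 2)


def cluster_forward(x, is_fuzzy, delta, rules):
    """Layers 0-5 for one cluster.

    x        : initial inputs (crisp y's and nominal values of the judgments g)
    is_fuzzy : flags marking which entries of x are fuzzy judgments
    delta    : shift values, applied only where is_fuzzy is True
    rules    : list of (mf_params, sugeno) pairs; mf_params holds one (a, b, c)
               triple per input, sugeno maps the shifted inputs to a number
    """
    # Layer 0, Eq. (3): shift only the inputs underlying fuzzy judgments
    x_t = [xi + di if fz else xi for xi, di, fz in zip(x, delta, is_fuzzy)]
    # Layers 1-2, Eqs. (4)-(5): membership degrees combined by the product T-norm
    w = [math.prod(tri_mf(xi, *p) for xi, p in zip(x_t, params)) for params, _ in rules]
    # Layer 3, Eq. (6): in-cluster normalization
    total = sum(w)
    if total == 0:
        raise ValueError("no rule fires for this input")
    w_bar = [wj / total for wj in w]
    # Layers 4-5, Eq. (7): weighted sum of the Sugeno functions
    return sum(wb * f(x_t) for wb, (_, f) in zip(w_bar, rules))


# Hypothetical cluster: one crisp input, one fuzzy judgment, two rules
rules = [
    ([(0.0, 0.5, 1.0), (0.0, 0.5, 1.0)], lambda v: 1.0 * v[0] + 2.0 * v[1] ** 2),
    ([(0.0, 1.0, 2.0), (0.0, 1.0, 2.0)], lambda v: 10.0),
]
o_c = cluster_forward([0.4, 0.5], [False, True], [0.0, 0.1], rules)
```

In this toy instance the shifted input is (0.4, 0.6), the two firing strengths are 0.64 and 0.24, and the cluster output is (0.64 · 1.12 + 0.24 · 10) / 0.88, roughly 3.54.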

To sum up, each cluster c in the system, taking as input the values of both the operational parameters \({\varvec{y}}^{(t)}\) and the task execution evaluation \({\varvec{g}}^{(t)}\) (or suitable subsets of them, depending on the operational task to be faced), computes as output the operational parameter \(o_c^{(t+1)}\equiv y_c^{(t+1)}\) at the next time step, which in turn receives the user evaluation in terms of the fuzzy variables \({\varvec{g}}^{(t+1)}\). For the sake of conciseness, we will drop the temporal index in the notation whenever no ambiguities arise. Moreover, due to the twofold role played by the operational variables \(y_c\) in terms of both input and output of the system, we still refer to the latter with \(o_c\).

We address the entire training task in terms of a back-propagation algorithm (Werbos 1994), hence through a long derivative chain such as the following:

$$\begin{aligned} \frac{\partial E}{\partial \theta }=\frac{\partial {E}}{\partial {g}}\cdot \frac{\partial {g}}{\partial {o}}\cdot \frac{\partial {o}}{\partial {\theta }} \end{aligned}$$
(8)

where \(\theta \) represents the generic parameter to be learned in the premises and consequents of the FIS, and where scalar symbols were used to lighten the notation. In particular, the error E takes a twofold expression depending on the learning phase (see Table 1). It is the canonical mean square error between original signal \(\tau \) (the target) and reconstructed signal o in the identification phase, where the former is the value of operational parameters (i.e., baking time in the lead example) which we expect for a given input to the FIS, while the latter is the output proposed by the system. Although in principle the original signal \(\tau \) may be suggested by household appliance manufacturers, some remarks on its effective identification in real-world scenarios will be given in the experimental Sect. 3.

On the contrary, in the control phase—which is devoted to the tuning of the FIS according to user needs after suitable identification of the same—error E is assumed to be the square of the task execution evaluation (judgment g).

Table 1 The analytics of the two-phase learning procedure

This choice is motivated by the fact that the set point of g, i.e. the judgment corresponding to complete satisfaction of the user, is 0, while positive/negative deviations from it encompass direction and magnitude of user dissatisfaction. In the lead example, while a consistency of the bread crust satisfying user tastes will be described by the fuzzy judgment “properly crusty”, even though its true value remains hidden from both user and system, we assume it to be 0 only in case of full satisfaction (i.e. perfect crust consistency). Each evaluation subtending a consistency crustier than the optimal will be associated to positive shifts from the set point; vice versa in the case of an excessively soft crust. Being the only criteria ruling the learning from nowhere inference framework, these order relations introduced on the fuzzy judgments prove to be essential requisites for the success of the inference task.

Note that in the identification phase g is a dummy variable, so that \(\frac{\partial {E}}{\partial {g}}\cdot \frac{\partial {g}}{\partial {o}}\) in (8) contracts to \(\frac{\partial {E}}{\partial {o}}\), with o, the cluster output, computed as in (7), whereas in the control phase \(\frac{\partial {E}}{\partial {g}}\) is g itself, up to a constant. The last derivatives, those of o w.r.t. all the underlying parameters identifying both the membership functions in the premises and the regression coefficients in the Sugeno functions, are computed as usual [see Masulli et al. (1996) and Casalino et al. (1998) for a revisiting of Nauck et al. (1997), Ch. 8.3]. It is worth noting that in addition to the above parameters, since the true values underlying the fuzzy sets responsible for the user evaluation are hidden, we added another parameter, namely the shift \(\delta \), to the nominal value of the fuzzy number g [i.e. the central vertex b of the triangular m.f.s or the mean \(\nu \) of the asymmetric Gaussian-like m.f.s in (4)]. While its derivative computation is analogous to the one performed on the nominal value of the fuzzy set, we remind the reader that, contrary to the latter, which is associated to a single membership function, each shift \(\delta \) refers to a single user judgment. So, if the user interacts with the system m times through a total of \(n_f\) evaluation variables, then the total number of \(\delta \) parameters will be \(m \times n_f\). This fact, which confirms the intrinsic complexity of the learning task w.r.t. standard neuro-fuzzy architectures like ANFIS, highlights how each user judgment is totally decoupled from previous individual evaluations. In this sense, the system cannot rely on the existence of presumed relations between two consecutive “very crusty” comments, as this type of linguistic quantifier may refer to two completely different ideas of bread consistency.
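
The two error expressions and their leading derivatives in (8) can be sketched as follows. The exact normalization constants are assumptions (the text only states that the control-phase derivative is g up to a constant), the factor \(\frac{\partial g}{\partial o}\), which requires the identified model, is deliberately left out, and all function names are hypothetical.

```python
def identification_error(targets, outputs):
    """Mean square error between the targets tau and the FIS outputs o."""
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def identification_grad(target, output):
    """dE/do for a single example, with E = (tau - o)^2."""
    return -2.0 * (target - output)


def control_error(g):
    """Control phase: squared judgment, whose set point is g = 0."""
    return g ** 2


def control_grad(g):
    """dE/dg in the control phase: g itself, up to the constant 2."""
    return 2.0 * g


def n_shift_parameters(m, n_f):
    """One shift delta per user judgment: m interactions times n_f fuzzy variables."""
    return m * n_f


E_id = identification_error([1.0, 2.0], [1.0, 4.0])
n_delta = n_shift_parameters(10, 2)
```

For instance, ten interactions on the two judgments of the lead example already require 20 shift parameters, on top of the m.f. and Sugeno parameters.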

As for the training modality, we adopt either batch mode or cyclic mode (Heskes and Wiegerinck 1996) in the identification phase, thanks to their efficiency and learning stability, and online mode in the control phase, which matches the real-time mechanism adopted to obtain new instances \(\langle \)operational parameters, user evaluations\(\rangle \).

Moreover, according to the principle of justifiable information granularity (Pedrycz 2011), which states that a fuzzy set should reflect (or match) the available experimental data to the highest extent, all while being specific enough to come with a well-defined semantic, we further rearranged the inputs \({\widetilde{x}}_i\)s through a proper transformation so as to make them well framed in the corresponding fuzzy sets by filling their support.

Fig. 4

a Triangular and b asymmetric Gaussian-like fuzzy sets corresponding to the linguistic quantifiers “low crisp”, “properly crisp”, and “very crisp”. Gray and black bullets represent the history of user interactions with the system, respectively, before and after applying the principle of justifiable information granularity

Namely, consider the fuzzy sets shown in Fig. 4a, where the three triangular m.f.s are related to the linguistic quantifiers “low crisp”, “properly crisp”, and “very crisp”, while the bullets refer to inputs \({\widetilde{x}}_i\)s processed by the system during a history of user interactions. A first strategy consists in rescaling the inputs \({\widetilde{x}}_i\)s in one shot (at each learning cycle) so that, for a proper choice of \(\gamma \), their \(\gamma \) and \(1-\gamma \) empirical quantiles coincide, respectively, with the fuzzy maxmin and minmax of the three fuzzy numbers (Dubois and Prade 1988), where maxmin is the maximum of the left extremes delimiting the supports of the fuzzy sets, and minmax the minimum of the analogous right extremes. In particular, having introduced the affine transformation \(\rho ({\widetilde{x}})=\alpha {\widetilde{x}}+\beta \), at each learning cycle we obtained the scaling \(\alpha \) by computing \(\frac{\textsf {minmax}-\textsf {maxmin} }{q_{1-\gamma }-q_{\gamma }}\), and the translation \(\beta \) through \(\frac{q_{1-\gamma } \textsf {maxmin} -q_{\gamma } \textsf {minmax}}{q_{1-\gamma }-q_{\gamma }}\), where \(q_\gamma \) is the \(\gamma \) empirical quantile of the inputs \({\widetilde{x}}\)s, so that \(\rho (q_{\gamma })=\textsf {maxmin}\) and \(\rho (q_{1-\gamma })=\textsf {minmax}\). Black bullets in Fig. 4a show the results of the rescaling applied to the gray bullets when \(\gamma \) was set equal to 0.3.
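
As a minimal sketch of the one-shot strategy, the following Python fragment computes an affine map sending the \(\gamma \) and \(1-\gamma \) empirical quantiles of the inputs onto maxmin and minmax. The function names, the interpolated quantile estimator and the example supports are our own illustrative assumptions, not part of the original implementation.

```python
def empirical_quantile(values, gamma):
    """Empirical quantile, linearly interpolated between order statistics
    (one of several common estimators; chosen here for simplicity)."""
    ordered = sorted(values)
    pos = gamma * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (pos - lower) * (ordered[upper] - ordered[lower])


def one_shot_rescaling(values, supports, gamma=0.3):
    """Return (alpha, beta) of rho(x) = alpha * x + beta.

    supports: (left, right) support extremes, one pair per fuzzy set.
    Assumes the two quantiles differ, otherwise the map is undefined.
    """
    maxmin = max(left for left, _ in supports)    # largest left extreme
    minmax = min(right for _, right in supports)  # smallest right extreme
    q_low = empirical_quantile(values, gamma)
    q_high = empirical_quantile(values, 1.0 - gamma)
    alpha = (minmax - maxmin) / (q_high - q_low)
    beta = maxmin - alpha * q_low
    return alpha, beta


# Hypothetical history of inputs and supports of three overlapping fuzzy sets.
history = [0.5, 1.0, 2.0, 3.5, 4.0, 6.0, 7.5, 8.0, 9.0, 9.5, 10.0]
supports = [(-5.0, 1.0), (-3.0, 3.0), (-1.0, 5.0)]
alpha, beta = one_shot_rescaling(history, supports, gamma=0.3)
rescaled = [alpha * x + beta for x in history]
```

With this choice the central share of the rescaled inputs falls in the region where all three supports overlap, which is one way of reading the requirement that the data fill the supports.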

The second strategy, which took the form of an incremental procedure, was the default choice in our experiments, having the obvious advantage of avoiding abrupt oscillations in the training process. It is depicted in Fig. 4b in the case of asymmetric Gaussian-like m.f.s. Namely, having introduced for each linguistic variable two translation parameters \(\beta ',\gamma '\) and one scaling parameter \(\alpha '\), we applied the affine transformation:

$$\begin{aligned} \rho '({\widetilde{x}})=\alpha '({\widetilde{x}}-\beta ')+\gamma ' \end{aligned}$$
(9)

to the shifted input value \({\widetilde{x}}\) of each linguistic variable \(g_i\) so as to match both position and spread of all the currently learned m.f.s. In particular, we considered a mixture \(\mathcal M\) of asymmetric Gaussian random variables (John 1982), which differs from the companion set of m.f.s only in the normalization factor, and computed its mean \(\nu _{\mathcal M}\) and standard deviation \(\sigma _{\mathcal M}\) (Robertson and Fryer 1969). At each system evaluation, through a gradient-descent procedure, we gradually modified translation and scaling parameters in (9) to minimize the sum of the mean square errors between the two aforementioned statistics \(\nu _{\mathcal M}\) and \(\sigma _{\mathcal M}\) and their sample realizations computed on the observed data. In Fig. 4b the position of black bullets is the output of several steps of the proposed incremental procedure.
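
A minimal sketch of one incremental update of the parameters in (9) could read as follows. It assumes that the mixture statistics \(\nu _{\mathcal M}\) and \(\sigma _{\mathcal M}\) are already available as plain numbers, and it uses a fixed sample rather than a growing history; the function name, learning rate and data are hypothetical.

```python
import statistics


def incremental_step(xs, params, target_mean, target_std, lr=0.05):
    """One gradient-descent step on (alpha', beta', gamma') of
    rho'(x) = alpha' * (x - beta') + gamma'.

    The loss is the sum of the squared errors between the sample mean and
    standard deviation of the transformed inputs and the two targets.
    """
    alpha, beta, gamma = params
    x_bar = statistics.fmean(xs)
    s_x = statistics.pstdev(xs)
    mean_t = alpha * (x_bar - beta) + gamma   # mean of transformed sample
    std_t = abs(alpha) * s_x                  # std of transformed sample
    err_mean = mean_t - target_mean
    err_std = std_t - target_std
    sign = 1.0 if alpha >= 0 else -1.0
    grad_alpha = 2.0 * err_mean * (x_bar - beta) + 2.0 * err_std * s_x * sign
    grad_beta = -2.0 * err_mean * alpha
    grad_gamma = 2.0 * err_mean
    return (alpha - lr * grad_alpha,
            beta - lr * grad_beta,
            gamma - lr * grad_gamma)


# Hypothetical inputs and targets; one step per system evaluation.
xs = [-1.0, 0.5, 1.0, 2.0, 2.5]
params = (1.0, 0.0, 0.0)
for _ in range(2000):
    params = incremental_step(xs, params, target_mean=0.0, target_std=1.0)
```

Since each call moves the parameters by a small amount only, the inputs drift gradually toward the region covered by the m.f.s instead of jumping there in one shot.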

At this point two remarks are noteworthy. Firstly, as regards the sole user evaluation variables gs, the output (3) of layer 0 must be further processed so as to include the chosen transformation (\(\rho \) or \(\rho '\)) as the last step. Secondly, we avoided a local rescaling on the single fuzzy sets (e.g. “poorly crisp”), since this would oblige us to make each point \({\widetilde{x}}_i\) belong to a single fuzzy set. While in principle this operation can be straightforwardly performed by attributing a point to the fuzzy set where its membership grade is maximal, this assignment may introduce biases in the learning process, owing to the continuous adjustment of m.f. parameters and input positions during learning.

In conclusion, focusing on asymmetric Gaussian-like m.f.s, our inference concerns the parameter vector \(\varvec{\theta }\), with:

$$\begin{aligned} \varvec{\theta }=\left( \nu _{ji},\sigma _{lji},\sigma _{rji}, \delta _{pi},s_{cji} \right) \end{aligned}$$
(10)

where \(\nu , \sigma _l\) and \(\sigma _r\) denote, respectively, mean, left and right standard deviation of the Gaussian m.f. as in (4), \(\delta \) is the aforementioned shift of each fuzzy input variable, s reads as the generic regression parameter of the Sugeno function, while c, j and i are indexes coupled, respectively, with the output units, the rules, and the input variables. In the identification phase their increments are ruled as shown in Table 2. In the control phase, instead, the error function reads:

$$\begin{aligned} E = \sum _i^{n_f}\left( g_i^{(t+1)}\right) ^2 \end{aligned}$$
(11)

where \(g_i^{(t+1)}\) is the i-th evaluation provided by the user at time \(t+1\).

Table 2 Back-propagation-like equations ruling the identification phase of the proposed FIS

Here, the derivative \(\frac{\partial {g^{(t+1)}}}{\partial {o^{(t+1)}}}\) is the most critical part of the chain rule (8), since we do not know the mapping linking the FIS output o to the user judgment g. Actually, this mapping is user-dependent, as s/he is the sole actor responsible for this association—consider for example the evaluation as “too crusty” of the currently tasted loaf. Therefore, we propose either:

  • an analytical solution in the reciprocal of the derivative \(\frac{\partial {o^{(t+1)}}}{\partial {g^{(t)}}}\), as clear from the identification of \(o^{(t+1)}\), or

  • a numerical solution in the ratio of the differences at consecutive time steps of the two quantities, i.e. \(\frac{g^{(t+1)}-g^{(t)}}{o^{(t+1)}-o^{(t)}}\), to bypass identification errors.

The feasibility of the former solution, which in terms of the chain rule (8) reads as:

$$\begin{aligned} \frac{\partial {E}}{\partial { \theta }} = \eta \displaystyle \sum _i^{n_c}\sum _{i'}^{n_f}2g_{i'}^{(t+1)}\left( \frac{\partial o_i^{(t+1)}}{\partial g_{i'}^{(t)}}\right) ^{-1} \frac{\partial o_i^{(t+1)}}{\partial \theta } \end{aligned}$$
(13)

comes directly from the FIS architecture, having in the judgment g at time t one of the inputs of the system, and in the operational parameter o at time \(t+1\) exactly its output. Admittedly, since g refers to a different time step (namely the previous one), this derivative suffers from the typical plague of being identified in a domain that may be far from the one where o will be applied during the control phase, a drawback typically overcome by alternating the two phases (Wu et al. 1992). However, we remark that we were never forced to apply this strategy in our experiments.

The second solution

$$\begin{aligned} \frac{\partial {E}}{\partial { \theta }} = \eta \displaystyle \sum _i^{n_c}\sum _{i'}^{n_f}2g_{i'}^{(t+1)}\frac{g_{i'}^{(t+1)}-g_{i'}^{(t)}}{o_i^{(t+1)}-o_i^{(t)}} \frac{\partial o_i^{(t+1)}}{\partial \theta } \end{aligned}$$
(14)

although temporally consistent, may lead to overfitting, and in any case suffers from the cold start problem.
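
To make the numerical option concrete, the sketch below accumulates the chain-rule terms obtained from the control error (11) when \(\frac{\partial g}{\partial o}\) is replaced by the ratio of consecutive differences. It is an illustration under our own assumptions: the function names are hypothetical, the learning rate \(\eta \) is left out, and terms are simply skipped when no previous step exists (cold start) or when the output did not change.

```python
def numerical_slope(g_now, g_prev, o_now, o_prev, eps=1e-6):
    """Finite-difference estimate of dg/do between two consecutive steps.

    Returns None when the estimate is undefined: at the first interaction
    (cold start) or when the output change is below eps.
    """
    if g_prev is None or o_prev is None:
        return None
    delta_o = o_now - o_prev
    if abs(delta_o) < eps:
        return None
    return (g_now - g_prev) / delta_o


def control_gradient(g_now, g_prev, o_now, o_prev, do_dtheta):
    """Sum over outputs i and evaluations k of
    2 * g_k * (delta g_k / delta o_i) * d o_i / d theta,
    for a single parameter theta."""
    total = 0.0
    for i, o_i in enumerate(o_now):
        for k, g_k in enumerate(g_now):
            slope = numerical_slope(
                g_k, None if g_prev is None else g_prev[k],
                o_i, None if o_prev is None else o_prev[i])
            if slope is None:
                continue  # no usable estimate for this pair
            total += 2.0 * g_k * slope * do_dtheta[i]
    return total
```

Skipping undefined terms is only one possible way of handling the cold start; falling back on the analytical estimate for the first steps would be another.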

3 The experimental framework

Granular computing gathers a set of constructs and procedures which have arisen with the goal of matching computational intelligence tools with concrete operational frameworks (Pedrycz and Chen 2011). The interactions of information granules with the physical world and their relation to perception of interactions in the physical world covered by the c-granule paradigm pave the way to concrete solutions of real-world problems. Strictly along these lines, the main experiment testbed of our procedure is framed within the European project SandS, which is aimed at building up a physical and computational networked infrastructure allowing household appliances to better meet the needs of their owners. This is achieved through finely tuned instructions dispatched by a social network to the appliances, in turn equipped with intelligent functionalities. In these settings, the goal is to learn a fuzzy rule system capable of producing proper recipes for our household appliances, where a recipe is the sequence of instructions which completely define the working cycle of these appliances. As an example, in the context of a loaf taken out of the bread maker, an excerpt of a recipe could be identified by the following \(\langle \)parameter, value\(\rangle \) pairs:

$$\begin{aligned} \begin{array}{ll} \langle \mathrm{first\;leavening\;time,}\;60 \mathrm{min} \rangle \\ {\ldots }\\ \langle \mathrm{baking\;time,}\;150 \mathrm{min} \rangle \\ \langle \mathrm{baking\;temperature,}\;190^\circ \rangle \\ {\ldots }\\ \end{array} \end{aligned}$$

The goal is to produce recipes that exactly meet needs and preferences of the appliance owner, where the latter are expressed in terms of evaluation on the product/service provided by the appliance, such as loaf moistness and crustiness in the lead example.

In doing this, we have a drawback to eliminate and a challenge to meet.

Fig. 5

The online training cycle

The drawback concerns the dynamic feature of the entire training procedure. Indeed, since we aim to learn how to get a better user evaluation, we must embed the training phase inside an overall cycle where:

  1. a user asks for a task execution;

  2. the social network computes the related recipe as a result of previous recipes and related evaluations, and dispatches it to the appliance;

  3. the appliance executes the recipe; and

  4. the user issues an evaluation on the executed task, so closing the loop (see Fig. 5).

The problem is that step 3 is definitely time consuming (generally, in the order of hours), thus slowing the entire procedure down. This is a pitfall and a severe benchmark as well, since it is met very frequently when we want to apply the soft computing paradigm to real scenarios and not to simulated ones. Hence, the success in managing this problem will decree the success of the entire paradigm; in turn, rendering this procedure successful is the aforementioned challenge that we met with a two-step strategy. On the one hand, we assessed and validated our FIS paradigm on a case study—namely, the face beautifying task—which has the same features yet almost null recipe execution time (see Sect. 3.1). On the other hand, after a brief explanation of the SandS project and architecture, we apply in Sect. 3.2 the procedure to the actual task of producing recipes for our appliances.

3.1 The case study: face beautification

Fig. 6

Morphing of a face

The face beautifying task is described in the following algorithm.

  • Start from a picture of a slightly deformed face like in Fig. 6a;

  • do

    1. a tester is requested to express a \(\{-5,\ldots ,+5\}\) Likert evaluation (Trochim 2006) of the picture w.r.t. the four criteria listed in Table 3b;

    2. on the basis of these evaluations, the FIS algorithm (replaced by a human operator in the first 120 runs) may change the anthropomorphic parameters reported in Table 3a;

    3. a graphical routine updates the face picture according to these changes, obtaining variations like in Fig. 6b;

    4. the new picture is supplied to the tester for a new evaluation, so that the procedure continues from step 1;

  • until either a satisfactory picture is achieved (not necessarily close to the original one, see Fig. 6c), or a given stopping condition is reached.

Table 3 List of anthropomorphic parameters and evaluation criteria employed in the face beautification task (Van Dongen 2014)

Interested readers may play with the online tool (see Footnote 2), thereby contributing to the enrichment of the continually growing benchmark. At the same time, they may gain first-hand experience of the intrinsic complexity of the task. In fact, despite the immediacy and the apparently local scope of the anthropomorphic parameters, they have actually been designed so as to be responsible for a global morphing of the face. They are also highly correlated with one another, so that a slight modification of one parameter has an effect on face elements in principle ruled by other parameters. Analogously, the employed evaluation criteria are not simply a matter of subjective interpretations. In addition, their relation with anthropomorphic parameters is highly non-linear and non-univocal, in the sense that various parameter settings may collapse into the same evaluation, even when performed by the same user.

Thanks to the online tool, at a rate of 20 iterations per hour, we may rely on circa 1000 triplets \(\langle \)task, recipe, evaluation\(\rangle \) in approximately one week, where:

  • tasks may be differentiated on the basis of both the involved tester and the face to be enhanced: we may in fact suppose that the evaluation answers differ both from tester to tester on the same picture, and from face to face for the same tester;

  • recipes will be generated through slight modifications of the current anthropomorphic parameters, thanks to fuzzy rules properly tuned by the learning algorithm on the basis of the tester fuzzified evaluations. An excerpt of a prototypical recipe could be:

    $$\begin{aligned} \begin{array}{l} \langle \mathrm{face\;width,}\;+3.26 \rangle \\ \langle \mathrm{face\;height,}\;-2.41 \rangle \\ \ldots \\ \langle \mathrm{nose\;length,}\;+1.63 \rangle \\ \end{array} \end{aligned}$$

    where we used as value of each pair the shift the corresponding anthropomorphic parameter should be affected by;

  • users provide their evaluations at each image modification through a suitable Likert scale.

In this way we may plan to gather operational indications for the training of FIS on our actual appliances’ testbed. Namely, in both cases the evaluation criteria, expressed in the discrete range \(\{-5,\ldots ,+5\}\), guide the learning system in the selection of a dozen operational parameters, each affecting the results of the process (pleasant face in one case, good bread, soft laundry, etc. in the other one). Analogously, since the relation underlying user evaluation and operational parameters is unknown, we will rely on a set of rules, synthesizing a general wisdom, whose premises are mainly fuzzy while their consequences are metric. Also for the face experiment we will use the proposed special Sugeno system where some of the metric variables in input to the premises (precisely those referred to the user evaluations) are hidden.

3.2 The goal testbed

In brief, we are setting up the SandS ecosystem (Apolloni et al. 2013, 2015), a social network aimed at producing recipes with tools of computational intelligence, to be dispatched to household appliances grouped in homes via a domestic WiFi network. A recipe is a set of scheduled, possibly conditional, instructions (hence a sequence of evaluated parameters such as water temperature or soak duration) which completely define the running of an appliance. Recipes are managed by a home middleware, which transmits them to the appliance via suitable protocols.

Fig. 7

The SandS super-mom

The entire contrivance is devised to optimally carry out usual housekeeping tasks through a proper running of home appliances with minimal user intervention. Feedback is sent by the housekeepers and the appliances themselves to a networked intelligence module (in the role of a virtual electronic super-mom in Fig. 7) to close the continuous recipe optimization loop. Within this framework, experts and appliance manufacturers may contribute with off-line advice and suggestions. An electronic board interfaces each single appliance to the home middleware (see Footnote 3).

Table 4 An excerpt of washing machine recipe

An excerpt of a recipe for a washing machine is reported in Table 4. It consists of two tables: with the former we fix the parameters that characterize the working cycle to be executed by the machine; with the latter we list the sequence of instructions that are sent step by step to the machine to implement this working cycle.

A relevant aspect concerns the network intelligence. In our approach, starting from the most suitable baseline recipe, we modify some of its parameters on the basis of the feedback it received on its execution. We do so through a rule set that has been trained on the log of the triples \(\langle \)task, recipe, evaluation\(\rangle \). In this way we translate our inference problem on the recipe parameter variations into the one of identifying the rules generating the recipes. In addition, since the underlying rules are based in great part on vague evaluations expressed by the user both on the past recipe performance and on the current task specification, also in this case we fall in the field of fuzzy rule systems with hidden variables, proper to the learning from nowhere framework. Actually, analogously to Table 3, in Table 5 we list the operational parameters and evaluation criteria in the bread-making task.

Table 5 List of operational parameters and evaluation criteria employed in the bread-making task (Mondal and Datta 2008)

4 Numerical results

4.1 The face beautification experiment

At present, we have performed two kinds of experiments by computing either a subset or the entire set of face parameters. Moreover, since the procedure we propose is relatively new, we focused on the same face beautifying task to conduct many numerical investigations on the constituent steps of the algorithm, with special focus both on the positioning of the hidden variables under the fuzzy sets and on the identification of the Sugeno functions and their inverses too.

4.1.1 Turing-like test

As for the former, like in the Turing Test (Turing 1950), in our experiment we let 6 parameters, mainly related to nose and eyes shaping, be tuned by the user, and the other 5, regulating the overall face dimensions and mouth, be produced by our fuzzy rule system, with the aim that the two families of parameters be indistinguishable to an external observer. First of all, we obtained a suitable training set by letting users interact with the online face beautification tool (see Footnote 4). Namely, each user drove the beautification process by cyclically adjusting the 11 anthropomorphic parameters and then evaluating the pleasantness of the resulting face. In this way, we got a collection of 121 records, each describing a beautification task through a list of \(\langle \)face parameters, evaluation\(\rangle \) histories. On the basis of the above training set, the identification phase proved satisfactory, as shown by the Mean Square Error (MSE) curves reported in Fig. 8.

Fig. 8

Course of the Mean Square Error (MSE) during FIS identification phase

Accordingly, Fig. 9 shows the correspondence between the original parameters set by the user in the 121 records of the training set and the companion ones computed by the fuzzy rule systems. The 45 degree straight lines highlight the good approximation of the recovered points.

Fig. 9

Comparison between the original parameters set by the user in the 121 records of the training set (x axis), and the companion ones computed by the fuzzy rule systems (y axis)

As for the FIS architecture used throughout this experimentation, we picked randomly the components \(\varsigma _i\)s of each Sugeno function f among linear and quadratic functions, leaving to the FIS the proper identification of the related coefficients. We used the asymmetric Gaussian-like m.f.s in (4) and the affine transformation (9) to let the coordinates of the linguistic variables fill the support of the Gaussian bells. Clusters of rules were used to generate, one by one, the shifts to be applied to the current values of the anthropomorphic parameters. Each cluster was composed of \(3^5\) rules, where 3 is the number of levels of the adopted fuzzy quantifiers (e.g., with reference to the approachability evaluation, they read as “low”, “somewhat”, and “very approachable”, standing, respectively, for “aloof”, “apathetic”, and “friendly”), and 5 is the number of antecedents in the rules: the 4 evaluations plus the questioned parameter computed in the previous run. Namely, for purposes of homogeneity with the other antecedents, we decided to group into three fuzzy sets the values assumed by the single questioned parameter in the training set (using the same three-level quantifiers as the ones adopted for evaluation variables), and to specialize the rules for each fuzzy quantifier and for each cluster as well.
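
The size of each cluster follows from enumerating every combination of quantifier levels over the five antecedents. The short sketch below makes the count explicit; the antecedent names are placeholders of ours, since only the approachability criterion is named in the text.

```python
from itertools import product

# Three quantifier levels per antecedent, as in the approachability example.
LEVELS = ("low", "somewhat", "very")

# Hypothetical antecedent names: four user evaluations plus the value of
# the questioned parameter computed in the previous run.
ANTECEDENTS = ("evaluation_1", "evaluation_2", "evaluation_3",
               "evaluation_4", "previous_parameter")

# One rule premise per combination of levels over the antecedents.
rule_premises = [dict(zip(ANTECEDENTS, combo))
                 for combo in product(LEVELS, repeat=len(ANTECEDENTS))]

assert len(rule_premises) == 3 ** 5  # 243 premises per cluster
```

Each premise would then be paired with its own Sugeno consequent, so that the number of regression coefficients to identify grows with the same factor.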

Fig. 10

Parameters and evaluations trajectories observed during the beautification process carried out with the FIS support

The control phase consists of a recall of the identified system, still subject to a continuous training based on the sole user evaluations. Namely, having as target the judgment the user makes on the pleasant appearance of the current face, and modifying its parameters on the basis of the history of user interaction in terms of pairs \(\langle \)face parameters, evaluations\(\rangle \), the fuzzy system is called upon to generate new face parameters, with the aim of moving user judgments around their set points. Prior to a more rigorous evaluation procedure, we let a dozen users evaluate the pleasantness of the face generated by the FIS after about 15 iterations. They all said that they were highly satisfied. Fig. 10 reports the trajectories of parameters and evaluations along a session where three different faces were submitted to one of these users. In all instances both parameters and evaluations converged to the optimal value (set equal to 0), after some wandering in the respective spaces. In fact, observing a full user satisfaction in correspondence to no modifications, in output to the FIS, of the values of the operational parameters, means that a suitable equilibrium point has been achieved by the system.

Fig. 11

Parameters trajectories observed during the beautification process carried out without the FIS support

A different trend is denoted by the parameters when their setting is left in the hands of the user. In Fig. 11 we observe less spread in their values but no convergence to the neutral settings at the end of the three training sessions outlined in the picture.

4.1.2 The complete task

The second experiment, involving the entire set of parameters, has similar operational features but weaker identification performances, as denoted, for instance, by the two recovering graphs in Fig. 12.

Fig. 12

Same graphs as in Fig. 9, when all 11 parameters are jointly identified

Fig. 13

Face beautification when all 11 parameters are jointly identified: a the face obtained after 23 evaluations performed by the user; b the companion parameters computed by the fuzzy rule system

However, the user is able to drive the image toward the face in Fig. 13a, which he assumed to be satisfactory, through the parameters’ paths in Fig. 13b. The fact is that, during the face beautifying session, a two-way adaptation process occurs: FIS adapts to the user evaluations and user adapts to the FIS behavior. This process is shorter (around 15 steps) when the system is well identified in the batch phase, and longer (23 steps) when starting from a less accurate FIS. It is exactly this reciprocal adaptation process that we rely on in the transfer from the case study to the appliances’ regulation instances.

For a preliminary comparative study, we considered a simple Actor-Critic Reinforcement Learning procedure (Grondman et al. 2012) as a competitor of our method. In essence, the Actor is the system, replacing the user in deciding how to improve a face, while the Critic is the user, who evaluates the effects of the parameter changes chosen to improve the face. Per se, the procedure results in a random walk in the parameter space with two precautions:

  1. it gives very small perturbations to the already trained system;

  2. it drives the chance in a very conservative way, relying on a backtracking procedure as the last option.

The pseudocode of the entire procedure is reported in Algorithm 1.

Algorithm 1

The lead strategy is to encourage the exploration of the parameter space each time this produces a good result and to compress the exploration in the opposite case. When the exploration gets stuck in a local optimum, so that explorations are more and more inhibited (parameter variation ranges close to 0), we call for a long jump by resetting the variation ranges to their default values. If the jump is inappropriate (negative evaluation) then a backtracking procedure is performed, and another jump is realized. Finally, the algorithm stops after too many negative jumps.

The face beautification experiment shows improvements in a trend passing from phenotypes to genotypes. The Critic bases its actions on the user’s likes/dislikes (the phenotype). However, it roots its computations on the ranges of the changes applied to the used parameters (the genotypes). The core of the above procedure lies in the following criterion: if changes give rise to improvements (hence further likes), then maintain a large range, exactly in the directions where the modifications have been effected. Hence, if a parameter has been decremented with success (a new user like), decrease the negative extreme of the range of changes. If, instead, the decrement gave rise to a dislike, then increase the negative extreme. The same holds, symmetrically, for the positive extreme. The local minimum trap may be discovered simply by thresholding the change ranges: when the range of each parameter goes below a given threshold (hence, after many dislikes), we reset the ranges to the initial values, thus favoring a long jump into another region of the parameter space. If no jump is effective, then stop.
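
The criterion just described can be sketched as follows. This is our own reading of the textual description, not a transcription of Algorithm 1: the Critic is modeled by a numeric score whose increase counts as a like, and the growth and shrink factors, the threshold and the stopping limits are hypothetical.

```python
import random


def explore(critic, start, default_range=1.0, grow=1.5, shrink=0.5,
            threshold=0.05, max_failed_jumps=5, max_steps=500, seed=0):
    """Random walk with per-parameter change ranges [neg, pos]."""
    rng = random.Random(seed)
    n = len(start)
    params = list(start)
    score = critic(params)
    neg = [-default_range] * n   # negative extremes of the change ranges
    pos = [default_range] * n    # positive extremes of the change ranges
    jumping = False              # True right after a range reset
    failed_jumps = 0
    for _ in range(max_steps):
        deltas = [rng.uniform(lo, hi) for lo, hi in zip(neg, pos)]
        candidate = [p + d for p, d in zip(params, deltas)]
        candidate_score = critic(candidate)
        liked = candidate_score > score
        if liked:
            params, score = candidate, candidate_score
            failed_jumps = 0
        elif jumping:
            # Inappropriate long jump: backtrack (keep params), jump again.
            failed_jumps += 1
            if failed_jumps >= max_failed_jumps:
                break
            continue
        jumping = False
        # Widen the extreme in the tried direction after a like (capped at
        # the default), squeeze it toward 0 after a dislike.
        factor = grow if liked else shrink
        for k, d in enumerate(deltas):
            if d < 0:
                neg[k] = max(neg[k] * factor, -default_range)
            else:
                pos[k] = min(pos[k] * factor, default_range)
        # Local optimum trap: every range is below threshold, so reset.
        if all(hi - lo < threshold for lo, hi in zip(neg, pos)):
            neg = [-default_range] * n
            pos = [default_range] * n
            jumping = True
    return params, score


# Toy Critic preferring parameters close to 0 (hypothetical).
best, best_score = explore(lambda p: -sum(x * x for x in p), [3.0, -2.0])
```

Since a candidate is accepted only after a like, the returned score can never be worse than the initial one; the price, as noted in the text, is a longer sequence of trials.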

We pay for the simplicity of the algorithm with a longer set of trials before getting satisfactory results. Fig. 14 shows this aspect as a companion of Fig. 13. We may see the squeezed thread of the parameters due to the reset of their range after the sticking in a local minimum.

Fig. 14

Same pictures as in Fig. 13 when the parameter exploration is drawn by Reinforcement Learning

4.2 Adequacy, rationality and generality of the proposed procedure

Fig. 15

Monotonicity preservation in the hidden variables

We used the same dataset collected for the face beautifying task to check the reasonableness and correctness of the various steps constituting our learning algorithm. In particular, we focus here on the adequacy of the procedure in discovering the hidden variables, which in turn represents the main novelty of the proposed FIS. Namely, with reference to the same control phase described in detail in the previous sections, for the judgments gs we looked for regularities with the corresponding hidden variables hs learned by the system. To this aim, we randomly drew initial values for the latter in the range \([-5,5]\) and coupled each h with a judgment g, ranging in the integer lattice \(\{-5,\ldots ,+5\}\). Representing each judgment g through three fuzzy quantifiers (namely, “low”, “medium” and “high”), we ran the learning algorithm by adjusting both the hidden variables hs and the parameters of the triangular m.f.s so as to achieve full user satisfaction. Fig. 15 shows the good performance of the procedure in discovering the hidden variables, highlighting the preservation of the monotonicity between the values expressed by the users (y axis), namely the judgments gs, and the level of the fuzzy set, among the three m.f.s, where h attains its maximum membership (x axis). In view of the initial random coupling between hidden variables hs and user judgments gs, this is a remarkable result reflecting the general requirement of monotonicity on the maps considered in Pedrycz et al. (1997). These experiments and emerging properties suggest that our method is general and robust enough to be employable in many operational fields for the purpose of overcoming our blindness regarding the universe of discourse. Indeed, we simply transfer the above procedures to the goal testbed without noteworthy changes.

4.3 Facing the original task

Moving on to our recipe generation task, we applied our two-phase procedure to the bread maker experiment dataset (see Footnote 5). It was generated by letting the partners of the SandS project prepare the loaf with the bread machine and then asking a suitable set of tasters to evaluate the quality of the home-baked bread. Repeating this experiment each day over a period of 4 months, we obtained a total of 200 pairs \(\langle \)operational parameters, evaluations\(\rangle \). As is clear from Table 5, in this case we have 10 parameters (time and temperature in the 5 baking phases) and 4 evaluations. Clusters of rules were used to generate the operational parameters one by one. The overall FIS architecture was the same as the one adopted for the face beautification task (refer to Sect. 4.1 for more details).

As our first training instance, we omitted training the FIS on parameter 6, due to its scarce variability, and on parameter 10, since temperature settings above the safety threshold (set to 150 in the microcontroller) are dummy. In contrast, the identification on the remaining parameters (after proper normalization) proved sufficiently satisfactory (see Fig. 16).

Fig. 16

Plots of the tested operational parameter values and their recovering in the bread-making experiment

However, many records in the training set have been collected during a sort of users’ training phase, so that their suggestions about the new recipe—which constitutes the FIS target on the identification phase—may definitely prove misleading. Hence we decided to dope the training set with a set of examples made up of the original instances as to the premises and of the optimal parameter settings (which we learned during the experimental campaign) as to the consequences. This expedient was profitable, as confirmed by the mean square error curves reported in Fig. 17.

Fig. 17

Mean square error descent on the 10 operational parameters of the doped training set

Essentially the system is pushed toward the optimal parameters in the consequents, profiting from a well-biased noise represented by the original examples. With this FIS configuration the system produces a good tuning of the bread maker parameters even on new inputs (hence in generalization). However, to render the system really adaptive to the judgments of the single user so as to learn how to get a better evaluation from her/him, we needed to train the FIS following the single user interactions depicted in Fig. 5, as we did with the face beautification case study. Here the problem is a little trickier due to the slowdown incurred by the entire process, which in turn is caused by the delays introduced by the appliance during the recipe execution. Hence we devised the following expedient. After a good system identification (performed as before), we may assume that the FIS computes the correct parameters in the doped part of the training set, so that the evaluation given in correspondence with these parameters properly applies to the computed parameters as well. So, we may use this part of the training set to train the system on the evaluations as well (the control phase). Actually, we adopted an intermediate strategy by pairing the identification error with the control error through a mutual weighting based on a convex linear combination. Namely, with reference to Table 1, the error becomes:

$$\begin{aligned} E=\xi \sum _i^{n_f} g_i^2 + (1-\xi ) \sum _i^{n_c} (\tau _i-o_i)^2 \end{aligned}$$
(15)

for a proper (small enough) \(\xi \in [0,1]\). Now that the FIS has been weaned, it is ready to work online. In this case the error expression remains the same, but the second term acts as a regularization term, since the target \(\tau \) is now the previous parameter value, and \(\xi \) is larger or smaller depending on the evaluation values. The criterion is: if the overall evaluation is close to 0, then restrain the FIS from producing new parameters that are far from the current ones. Conversely, do not hesitate to move far from them if the current evaluation is poor (nowhere near 0). Table 6 reports a log of these online interactions, which lead to a gentle reduction of the leavening and baking parameters in response to user evaluations expressed in terms of slight defects of softness, baking, and crustiness.
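As a rough illustration of Eq. (15) and of the online criterion, the sketch below computes the composite error and derives \(\xi \) from the evaluations. It reads \(g_i\) as the evaluation values whose target is 0, which is an assumption since Table 1 is not reproduced here. The schedule `online_xi` and its `scale` parameter are hypothetical: the text only states that \(\xi \) depends on the evaluation values, not how.

```python
def composite_error(g, tau, o, xi):
    """Eq. (15): xi * sum(g_i^2) + (1 - xi) * sum((tau_i - o_i)^2)."""
    if not 0.0 <= xi <= 1.0:
        raise ValueError("xi must lie in [0, 1]")
    if len(tau) != len(o):
        raise ValueError("tau and o must have the same length")
    first_term = sum(gi ** 2 for gi in g)
    second_term = sum((t - oi) ** 2 for t, oi in zip(tau, o))
    return xi * first_term + (1.0 - xi) * second_term


def online_xi(g, scale=1.0):
    """Hypothetical schedule: xi grows with the mean squared evaluation.

    Evaluations close to 0 give xi close to 0, so the regularization term
    dominates and the parameters stay near the previous ones. Poor
    evaluations push xi toward 1 and free the parameters to move.
    """
    mean_sq = sum(gi ** 2 for gi in g) / len(g)
    return mean_sq / (mean_sq + scale)


# Made-up values: two evaluations, one parameter.
error = composite_error(g=[1.0, 2.0], tau=[3.0], o=[1.0], xi=0.25)
```

Any monotone map from the evaluation magnitude to \([0,1]\) would serve the same purpose. The ratio used here is only one simple choice.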

Table 6 Log of online user interactions with the system

5 Conclusions

In this paper, we solve a new learning instance, which we denote learning from complex granules, in response to a new characterization of information granules in terms of fuzzy sets on a hidden universe of discourse and their relations with the physical environment. What emerges is a new Granular Construct that, by means of cognitive tools, solves the problem of learning the parameters of a single c-granule. It is framed between the two constructs investigated so far: one where crisp variables are fuzzified to enter a fuzzy rule, and the other where fuzzy sets are dealt with as a whole within a fuzzy rule. The learning procedure is an extension of the back-propagation algorithm running on an architecture that is rather complex as regards both the variables in question and the two (identification/control) operational phases. Robustness of the numerical results and weakness of the theoretical guarantees are inherited from the original algorithm. Currently, we claim no comparative efficiency w.r.t. other approaches to the problem, apart from a trivial comparison with Reinforcement Learning in the case study. Rather, we stress its suitability as a very general method for concretely solving problems that are frequently met in real life. In the white goods sector this is witnessed by the operation of household appliances in a truly fuzzy framework, where the principal inputs come from user reactions. Keeping a tight focus on concreteness, we considered all operational aspects of the specific problem of training a bread maker, and found the proof of its solution in a record of people preparing their loaves of bread daily according to the suggestions of the trained system.

Future work will be devoted to exploring in greater depth the theoretical aspects of the paradigm we have introduced. The main questions to answer are:

  1. To what extent may we reduce the data inside the information granule? Obviously, this question may be read in many ways, depending on what we denote by datum and what by information granule. In a very pragmatic approach, we may ask ourselves whether it makes sense to rule a bread maker on the basis of feedback from a kook. In this case we may have a virtual reality where the user is satisfied, along with an actual reality where the same user may be poisoned by the bread his machine has baked.

  2. To what extent may we make our c-granule a more complex granule? We must recall that a distinguishing feature of our granule is that the soft_suit is rooted in a hidden universe of discourse, a drawback that we try to overcome thanks to its strong relationship with the signals coming from the hard_suit and an efficient link_suit. Parameterizing the granules in terms of approximation spaces, for instance, requires measures on the universe of discourse that would constitute an additional target of our inference system. The additional complexity opens the path to interesting abstract operations on granules, such as composition, hierarchy, etc., which become crucial in fields such as Big Data (Pedrycz and Chen 2015). As we mentioned in Sect. 2, for now we have favored a purely subsymbolic way based on the cognitive adjustment of non-symbolic parameters. However, the exploration of a hybrid approach will be a relevant part of our future work.