
Open Access

Smooth Approximation of Lipschitz Maps and Their Subgradients

Author:
Abbas Edalat

Imperial College London, London


0000-0002-6211-1991

Journal of the ACM, Volume 69, Issue 1, Article No. 8, pp. 1–32. https://doi.org/10.1145/3481805

Published: 22 December 2021


Abstract

We derive new representations for the generalised Jacobian of a locally Lipschitz map between finite dimensional real Euclidean spaces as the lower limit (i.e., limit inferior) of the classical derivative of the map where it exists. The new representations lead to significantly shorter proofs for the basic properties of the subgradient and the generalised Jacobian including the chain rule. We establish that a sequence of locally Lipschitz maps between finite dimensional Euclidean spaces converges to a given locally Lipschitz map in the L-topology—that is, the weakest refinement of the sup norm topology on the space of locally Lipschitz maps that makes the generalised Jacobian a continuous functional—if and only if the limit superior of the sequence of directional derivatives of the maps in a given vector direction coincides with the generalised directional derivative of the given map in that direction, with the convergence to the limit superior being uniform for all unit vectors. We then prove our main result that the subspace of Lipschitz C^∞ maps between finite dimensional Euclidean spaces is dense in the space of Lipschitz maps equipped with the L-topology, and, for a given Lipschitz map, we explicitly construct a sequence of Lipschitz C^∞ maps converging to it in the L-topology, allowing global smooth approximation of a Lipschitz map and its differential properties. As an application, we obtain a short proof of the extension of Green’s theorem to interval-valued vector fields. For infinite dimensions, we show that the subgradient of a Lipschitz map on a Banach space is upper continuous, and, for a given real-valued Lipschitz map on a separable Banach space, we construct a sequence of Gateaux differentiable functions that converges to the map in the sup norm topology such that the limit superior of the directional derivatives in any direction coincides with the generalised directional derivative of the Lipschitz map in that direction.

1 INTRODUCTION

Lipschitz maps between metric spaces (i.e., maps named after Rudolf Lipschitz that increase distances by at most a given factor) provide a fundamental class of functions in pure and applied mathematics as well as in a variety of areas of computation, including optimisation, control theory, geometric modelling and machine learning.

In all these areas, we often need to deal with functions that are indeed locally Lipschitz but not differentiable. Lipschitz maps naturally arise as any composition of functions consisting of piecewise continuously differentiable functions, the absolute value function or the maximum or minimum of a finite set of such functions [11]. In machine learning, the least absolute deviation method or loss gives a non-differentiable but Lipschitz map as an objective function. In deep learning, currently the most widely used activation function in feed forward neural networks is the Rectified Linear Unit (ReLU), which like the absolute value function fails to be differentiable at 0 but is Lipschitz with Lipschitz constant 1 [39].

Lipschitz maps, closed under composition, contain the important class of piecewise polynomial functions that are supported in basic mathematical software such as MATLAB [14] and are widely used in geometric modelling, approximation and interpolation [14]. They are uniformly continuous and, in contrast to differentiable maps, are closed under the fundamental min and max operations and the absolute value on functions. Lipschitz maps with uniformly bounded Lipschitz constants are also closed under convergence with respect to the sup norm topology. In addition, there is the distinguished property in the theory and applications of differential equations that a Lipschitz vector field in $\mathbb{R}^n$ has a unique solution to the initial value problem [13].

On a more theoretical level, Lipschitz maps between finite dimensional Euclidean spaces are, by Rademacher’s theorem, differentiable almost everywhere [12, p. 148]. By Kirszbraun’s theorem [32, p. 202], a Lipschitz map from a subset of a finite dimensional Euclidean space to another such space can be extended to the whole space with the same Lipschitz constant. Lipschitz maps are also at the very foundation of non-linear functional analysis [9].

It is against this background that Frank Clarke introduced, for locally Lipschitz maps, the notion of generalised directional derivative that he used to define the subgradient, as a set-valued derivative with values that are non-empty, compact and convex sets; the subgradient extends a similar concept for convex functions [11]. A closely related notion of generalised Jacobian was introduced by him for vector Lipschitz maps. The emerging subject of set-valued analysis and non-smooth optimisation has grown to impact many areas of computation in engineering and applied mathematics [3]. The subgradient, also called subdifferential, plays a key role in several areas of mathematical computation which deal with Lipschitz maps including non-smooth dynamical systems, differential inclusions, calculus of variations and optimal control [2, 4, 11].

A well-established method to tackle non-smooth optimisation problems is to find a suitable smooth approximation to the minimisation problem and then use standard techniques such as the gradient descent algorithm [45]. A major challenge in optimisation is therefore to find suitable smooth approximations to non-smooth minimisation problems [5]. For convex optimisation, there is a large literature on designing smooth approximations that use the convex conjugate of the underlying function [7].

In this article, adopting a new inter-disciplinary approach, several new representations for the subgradient and the generalised Jacobian of locally Lipschitz maps are introduced which are the basis of a framework for smooth approximation of Lipschitz maps and their differential properties. From an optimisation point of view, we develop a smooth approximation method for problems that are Lipschitz but may not be smooth or convex. We focus here on the foundational mathematical results and leave the algorithmic consequences in various areas of optimisation to future work.

We first apply domain theory, a branch of order theory, to derive some new representations and basic results about the subgradient and the generalised Jacobian of locally Lipschitz maps. Domain theory was developed, on the one hand, by Dana Scott as a mathematical model of computation for the denotational semantics of programming languages [42, 43], particularly functional programming languages, and, on the other hand, independently by several groups of mathematicians, in various mathematical contexts, including Karl Hofmann, Jimmie Lawson, Mike Mislove, Al Stralka, Klaus Keimel, Gerhard Gierz and Yuri Ershov (see [33]). It was later used to build new computational models in several other areas including exact real number computation, computational geometry, measure and integration theory and solutions of ODEs [16, 17, 18, 23, 24, 28, 29, 30]. In addition, employing domain-theoretic methods, the so-called L-derivative was introduced for real and complex Lipschitz maps [22, 25], and it was subsequently shown that the L-derivative coincides with the subgradient for real-valued functions on finite dimensional Euclidean spaces [19]. This latter result was later extended by Hertling to real-valued Lipschitz maps on Banach spaces [34].

Domains are partially ordered sets equipped with notions of completeness and approximation. For example, the upper space of a finite dimensional Euclidean space, namely the collection of non-empty compact subsets of the space partially ordered with reverse inclusion, is a domain, as is its subcollection of convex subsets, the fundamental domain under consideration in the present work. Domains are T₀-topological spaces endowed with their Scott topology. This topology for the upper space coincides with the upper topology, which is widely used in optimisation theory: set-valued functions continuous with respect to this topology are called upper semicontinuous multifunctions [11, p. 29]. Elements of domain theory required in this article are outlined in Section 2.

Here, we employ the notion of the lower limit or limit inferior of a map from a dense subset of a topological space without isolated points into a bounded complete domain. It is closely related to the notion of lower envelope of such a map as introduced in the work of Gierz et al. [33, Exercise II-3.19] in connection with bounded complete domains as densely injective spaces [33, p. 182]. We show in Section 3 that the subgradient and the generalised Jacobian of locally Lipschitz maps are the lower limits, equivalently the lower envelopes, of the classical derivative of the map, which exists on a dense subset. This new representation leads to significantly shorter proofs for several basic properties of the subgradient and the generalised Jacobian.

Our key results, however, use the L-topology on the space of locally Lipschitz maps which was introduced in the work of Edalat [21] and is defined using the Scott topology on the space of Scott (equivalently upper) continuous functions from a finite dimensional Euclidean space to the domain of non-empty, convex and compact subsets of the space. In fact, this function space inherits the partial order of the ambient domain, by pointwise ordering of functions, which makes the function space itself into a bounded complete domain. Given that the subgradient of a locally Lipschitz map belongs to this function space, the L-topology is defined as the weakest refinement of the sup norm topology on the space of locally Lipschitz maps that makes the subgradient a continuous functional into this function space equipped with its Scott topology. The L-topology admits a complete metric and is weaker than the Lipschitz norm topology [21].

We derive simple necessary and sufficient conditions for a sequence of locally Lipschitz maps between finite dimensional Euclidean spaces to converge in the L-topology to a given locally Lipschitz map: the limit superior of the directional derivatives of the maps in the sequence in a given vector direction must coincide with the generalised directional derivative of the given map in that direction, with the convergence to the limit superior being uniform for all unit vectors. We will then prove that Lipschitz C^∞ maps are dense in the space of Lipschitz maps between finite dimensional Euclidean spaces with respect to the L-topology.

It is well known that convolution with a smooth function can make a given function smooth [10]. It is also well known that given a Lipschitz map $f: \mathbb{R}^d \to \mathbb{R}$, with Lipschitz constant c, one can construct a sequence of C^∞ functions $f_n$ with Lipschitz constant c such that the sequence converges in the sup norm to f. In fact, by taking a sequence of non-negative C^∞ test functions $\varphi_n$ with $\varphi_n(x) = 0$ for $\|x\| \ge 1/n$ and $\int \varphi_n = 1$ for all $n \ge 1$, the sequence of convolutions $f_n = f * \varphi_n$ converges to f in the sup norm and each function $f_n$ is C^∞ with Lipschitz constant c [6, p. 1]. It is easy to see that this result also holds for Lipschitz maps of type $\mathbb{R}^d \to \mathbb{R}^m$. In this article, we will extend these results and show that the limit superior of the sequence of directional derivatives of the maps f_n in a given vector direction v always coincides with the generalised directional derivative of the Lipschitz map in that direction v, with the convergence to the limit superior being uniform for all unit vectors. This will therefore give us a notion of approximation of Lipschitz maps by a sequence of C^∞ maps which entails convergence in the sup norm topology to the Lipschitz map and, in addition, preserves the differential properties of the sequence in the limit. This result can then be viewed as the basis for a method for smooth global approximation of a Lipschitz map. We will provide an elementary self-contained account of these results for the reader.

To this end, given a Lipschitz map between finite dimensional Euclidean spaces, we take its convolution with a sequence of Gaussian probability distributions, as our test functions, and explicitly construct a sequence of C^∞ maps convergent to it in the L-topology, implying that the limit superior of the directional derivatives of the functions in the sequence in a given vector direction coincides with the generalised directional derivative of the Lipschitz map in that direction, with the convergence to the limit superior being uniform for all unit vectors. This result does not depend on the particular choice of the sequence of Gaussian probability distributions we use as our test functions. We first present the proof of the preceding property for scalar Lipschitz maps in Section 5, which is elementary, and formulate the more sophisticated proof in the case of vector Lipschitz maps in Section 7, which uses Imbert’s expression for the generalised Jacobian [36]. These approximation properties of Lipschitz maps give credence to the claim that the subgradient of a Lipschitz map and the generalised Jacobian of a Lipschitz vector map are the generalisations of the classical derivative for the respective Lipschitz maps.
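As a rough illustration of Gaussian smoothing, the following sketch treats the absolute value function on the real line, for which the convolution with a centred Gaussian of standard deviation σ has a closed form (the mean of a folded normal distribution). The function names and the choice σ = 1/n are assumptions made for this example only and are not taken from the construction in the article.

```python
import math

def smooth_abs(x: float, sigma: float) -> float:
    """Convolution of abs with the N(0, sigma^2) density, in closed form."""
    z = x / (sigma * math.sqrt(2.0))
    return x * math.erf(z) + sigma * math.sqrt(2.0 / math.pi) * math.exp(-z * z)

def smooth_abs_derivative(x: float, sigma: float) -> float:
    """Classical derivative of smooth_abs with respect to x."""
    return math.erf(x / (sigma * math.sqrt(2.0)))

def sup_error(sigma: float, radius: float = 5.0, steps: int = 2001) -> float:
    """Grid estimate of sup |smooth_abs - abs| on [-radius, radius]."""
    grid = [-radius + 2.0 * radius * i / (steps - 1) for i in range(steps)]
    return max(abs(smooth_abs(x, sigma) - abs(x)) for x in grid)

# Hypothetical choice of smoothing widths: sigma = 1/n.
sigmas = [1.0 / n for n in (1, 10, 100)]

# Sup-norm errors shrink proportionally to sigma.
errors = [sup_error(s) for s in sigmas]

# Largest derivative value seen on [-0.1, 0.1]; it approaches 1, the
# generalised directional derivative of abs at 0 in the direction +1.
near_zero_sup = [
    max(smooth_abs_derivative(k / 1000.0, s) for k in range(-100, 101))
    for s in sigmas
]
```

In this one-dimensional case every smoothed function has derivative values in [-1, 1], so the Lipschitz constant 1 of the absolute value is preserved, while the derivative values near the origin spread out towards the endpoints of the subgradient [-1, 1] as σ decreases.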

As an application of the preceding results, we present a short proof of the extension of Green’s theorem to interval-valued vector fields in Section 6.

Last, in Section 8, we consider real-valued Lipschitz maps on Banach spaces, which include all L^p function spaces () used widely in optimisation. The subgradient at a point of the Banach space is now a non-empty, convex and compact set of the dual of the Banach space equipped with its weak* topology. Using the equivalence of the L-derivative with the subgradient, we provide a short proof that the subgradient of a Lipschitz map is upper continuous. Then, for a given Lipschitz map on a separable Banach space, we use a non-degenerate Gaussian measure on the space to construct a sequence of Gateaux differentiable maps that converges to the Lipschitz map in the sup norm topology such that the limit superior of the sequence of directional derivatives of the maps in a given direction coincides with the generalised directional derivative of the Lipschitz map in that direction. We conclude the article in Section 9.

Since the L-topology is defined using the Scott topology of a function space, which is only T₀ with no classical counterpart, we argue that the results in this article are based fundamentally on an interdisciplinary approach bridging domain theory as developed in the Scott theory of computation on the one hand and non-smooth optimisation on the other. An attempt has been made to use more elementary mathematical notions and results in the earlier sections related to real-valued Lipschitz maps on finite dimensional Euclidean spaces and employ more advanced and recent results in analysis only in the later sections relating to vector Lipschitz maps and Lipschitz maps on Banach spaces.

1.1 Notation and Terminology

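We first recall that a map $f: X \to Y$ of two metric spaces $(X, d_X)$ and $(Y, d_Y)$ is Lipschitz if there exists $c \ge 0$ such that $d_Y(f(x), f(y)) \le c\, d_X(x, y)$ for all $x, y \in X$; the map f is locally Lipschitz if any point in X has an open neighbourhood in which f is Lipschitz. For any positive integer n, we equip $\mathbb{R}^n$ with the Euclidean norm $\|\cdot\|$. The usual inner product of vectors $u, v \in \mathbb{R}^n$ is denoted by $u \cdot v = \sum_{i=1}^n u_i v_i$ in the standard Cartesian coordinate system of $\mathbb{R}^n$. If $C \subset \mathbb{R}^n$ is a non-empty compact convex set and $v \in \mathbb{R}^n$, we write $C \cdot v = \{u \cdot v : u \in C\}$, which is a compact interval of $\mathbb{R}$. For a vector $v \in \mathbb{R}^n$ and a linear map of type $A: \mathbb{R}^n \to \mathbb{R}^m$, represented in the standard coordinate system by the matrix $(A_{ij})$, the value of the linear map A at v is written as usual by $Av$ with $(Av)_i = \sum_{j=1}^n A_{ij} v_j$. The transpose of a matrix $M$ is denoted by M^T, the unit sphere centred at the origin in $\mathbb{R}^n$ by $S^{n-1}$ and the closed ball centred at $x \in \mathbb{R}^n$ and radius $r > 0$ by $\overline{B}(x, r)$. The closure, respectively, the interior, complement and boundary of a subset $S$ of a topological space X are denoted by $\overline{S}$, respectively, $S^\circ$, S^c and S^b. For two topological spaces X and Y, the set of continuous functions from X to Y is denoted by $(X \to Y)$. When it is convenient and there is no ambiguity, we identify x and $\{x\}$ for $x \in \mathbb{R}^n$. For a non-empty, compact convex set $C \subset \mathbb{R}^n$, the ε-open neighbourhood of C is denoted by $C_\varepsilon = \{x \in \mathbb{R}^n : \|x - y\| < \varepsilon \text{ for some } y \in C\}$, which is convex. For subsets $A, B \subset \mathbb{R}^n$, the Minkowski sum and difference are defined by $A \pm B = \{a \pm b : a \in A,\ b \in B\}$.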
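The Lipschitz condition can be probed numerically by taking the largest difference quotient over sample pairs, which gives a lower bound for the best Lipschitz constant on the sampled set. The sketch below is only an illustration on the real line; the helper name `lipschitz_estimate` and the sample grid on [-3, 3] are assumptions of this example.

```python
import itertools

def relu(x: float) -> float:
    """Rectified Linear Unit."""
    return max(0.0, x)

def lipschitz_estimate(f, points):
    """Largest quotient |f(x) - f(y)| / |x - y| over distinct sample pairs.

    This is a lower bound for the Lipschitz constant of f on the samples.
    """
    return max(
        abs(f(x) - f(y)) / abs(x - y)
        for x, y in itertools.combinations(points, 2)
    )

samples = [i / 10.0 for i in range(-30, 31)]

relu_const = lipschitz_estimate(relu, samples)
abs_const = lipschitz_estimate(abs, samples)
# x -> x^2 is locally Lipschitz; its constant on [-3, 3] is 6, and the
# sampled quotient |x + y| reaches 5.9 at the pair (2.9, 3.0).
square_const = lipschitz_estimate(lambda x: x * x, samples)
```

The estimates for ReLU and the absolute value agree with the Lipschitz constant 1, whereas the estimate for the squaring map grows with the size of the sampled interval, as expected for a map that is locally but not globally Lipschitz.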

The collection of Borel subsets of a topological space X (i.e., the smallest σ-algebra containing the open sets of X) is denoted by $\mathcal{B}(X)$. Given a map $f: X \to Y$ of spaces X and Y, we denote by $f[S]$ the image of a subset $S \subset X$ under f. If f is continuous and μ is a Borel measure on X, then the induced pushforward measure $f_*\mu$ on Y is given by $(f_*\mu)(B) = \mu(f^{-1}(B))$ for $B \in \mathcal{B}(Y)$.

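We equip the vector space of $m \times n$ matrices over $\mathbb{R}$ with the Frobenius norm given by

$$\|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} A_{ij}^2}$$

for $A \in \mathbb{R}^{m \times n}$. Note that the Frobenius norm, which for vectors coincides with the Euclidean norm, is subordinate to the Euclidean norm of vectors (i.e., $\|Av\| \le \|A\|_F \|v\|$). For convenience, we write $\|A\|_F$ as simply $\|A\|$. Recall also that for matrices of type $m \times n$, the Frobenius norm, like any other matrix norm, induces the Euclidean topology on $\mathbb{R}^{m \times n}$, which in this article is taken as the topology of linear maps of type $\mathbb{R}^n \to \mathbb{R}^m$.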
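The compatibility of the Frobenius norm with the Euclidean norm, namely that the Euclidean norm of Av is at most the Frobenius norm of A times the Euclidean norm of v, follows from the Cauchy-Schwarz inequality applied row by row. A small randomised check, with helper names that are assumptions of this sketch, may look as follows.

```python
import math
import random

def frobenius(A):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(a * a for row in A for a in row))

def euclid(v):
    """Euclidean norm of a vector given as a list."""
    return math.sqrt(sum(x * x for x in v))

def matvec(A, v):
    """Value of the linear map represented by A at the vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

random.seed(0)
subordinate = True
for _ in range(1000):
    # Random 2 x 3 matrix and random vector in R^3.
    A = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(2)]
    v = [random.uniform(-1.0, 1.0) for _ in range(3)]
    # Small tolerance guards against floating-point rounding only.
    if euclid(matvec(A, v)) > frobenius(A) * euclid(v) + 1e-12:
        subordinate = False
```

For a matrix with a single column, `frobenius` reduces to `euclid` applied to that column, which reflects the remark that the two norms coincide on vectors.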

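Let X and Y be normed vector spaces and $L(X, Y)$ the normed vector space of bounded linear operators from X to Y with the operator norm. The one-sided directional derivative of $f: X \to Y$ at $x \in X$ in the direction $v \in X$ is given by

$$f'_+(x; v) = \lim_{t \to 0^+} \frac{f(x + tv) - f(x)}{t}$$

when the limit exists. The directional derivative $f'(x; v)$ in the direction $v$ is given by

$$f'(x; v) = \lim_{t \to 0} \frac{f(x + tv) - f(x)}{t}$$

when the limit exists, in which case $f'(x; v) = f'_+(x; v)$. The Gateaux derivative of f at a point $x \in X$, if it exists, is a bounded linear operator $Df(x) \in L(X, Y)$ such that for all $v \in X$ we have $f'(x; v) = Df(x)(v)$ [9]. The Fréchet derivative [9, 48] of f at $x \in X$, if it exists, is a bounded linear map $f'(x) \in L(X, Y)$ with

$$\lim_{\|h\| \to 0} \frac{\|f(x + h) - f(x) - f'(x)(h)\|}{\|h\|} = 0.$$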

If f is Fréchet differentiable, then it is Gateaux differentiable and the two linear maps coincide.
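The difference between the one-sided and the two-sided directional derivative can be seen on the absolute value function at the origin. The following sketch, whose helper name and step sizes are assumptions of this example, evaluates the difference quotients (f(x + tv) - f(x)) / t for positive and negative t.

```python
def quotients(f, x, v, ts):
    """Difference quotients (f(x + t v) - f(x)) / t for the given nonzero t."""
    return [(f(x + t * v) - f(x)) / t for t in ts]

# Step sizes t = 10^-1, ..., 10^-7.
ts = [10.0 ** (-k) for k in range(1, 8)]

# One-sided quotients (t > 0) of abs at 0 in the directions +1 and -1.
right = quotients(abs, 0.0, 1.0, ts)
left = quotients(abs, 0.0, -1.0, ts)

# Quotients in the direction +1 with t < 0: they disagree with `right`,
# so the two-sided directional derivative of abs at 0 does not exist.
negative_t = quotients(abs, 0.0, 1.0, [-t for t in ts])
```

Both one-sided directional derivatives of the absolute value at 0 equal 1, so the map from a direction to its one-sided directional derivative is not linear and the function has no Gateaux derivative there.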

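The convex hull of a subset A of a topological vector space X is denoted by $\operatorname{co}_X(A)$, or simply $\operatorname{co}(A)$ if no ambiguity arises. We write the closure of $\operatorname{co}(A)$ by $\overline{\operatorname{co}}(A)$, which is called the closed convex hull of A and is equal to the intersection of closed convex sets that contain A [35, p. 31]. Recall that the Hausdorff metric $d_H$ on the set of non-empty compact subsets of $\mathbb{R}^n$ is defined by $d_H(A, B) = \max\{\sup_{a \in A} \inf_{b \in B} \|a - b\|,\ \sup_{b \in B} \inf_{a \in A} \|a - b\|\}$. The support function $S_C$ of a non-empty convex set $C \subset \mathbb{R}^n$ is defined by $S_C(v) = \sup\{u \cdot v : u \in C\}$. We state some of its basic properties here [41, 1.7.1 and 1.8.14]: The support function is convex, thus continuous, and, for non-empty compact and convex sets C₁ and C₂, we have

$$d_H(C_1, C_2) = \|S_{C_1} - S_{C_2}\|_\infty, \tag{1}$$

where $\|\cdot\|_\infty$ is the uniform norm on the sphere $S^{n-1}$; moreover, $C_1 = C_2$ iff $S_{C_1} = S_{C_2}$, that is, any non-empty convex compact set is completely determined by its support function.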
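In dimension one, compact convex sets are compact intervals and the unit sphere consists of the two directions -1 and +1, so the relation between the Hausdorff metric and the uniform distance of support functions can be checked by hand. The sketch below does this for two intervals; the representation of an interval as a pair (a, b) and the helper names are assumptions of this example.

```python
def support(interval, v):
    """Support function of the compact interval [a, b] in the direction v."""
    a, b = interval
    return max(a * v, b * v)

def hausdorff(i1, i2):
    """Hausdorff distance between the compact intervals [a, b] and [c, d]."""
    (a, b), (c, d) = i1, i2
    return max(abs(a - c), abs(b - d))

def support_sup_distance(i1, i2):
    """Uniform distance of the support functions over the unit sphere {-1, +1}."""
    return max(abs(support(i1, v) - support(i2, v)) for v in (-1.0, 1.0))

C1 = (-1.0, 2.0)
C2 = (0.5, 2.25)

d_hausdorff = hausdorff(C1, C2)            # max(1.5, 0.25)
d_support = support_sup_distance(C1, C2)   # direction -1 gives 1.5
```

The two numbers coincide, and the pair of values of the support function at -1 and +1 recovers the endpoints of the interval, which is the one-dimensional instance of a convex compact set being determined by its support function.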

2 DOMAIN THEORY

Domain theory was introduced by Dana Scott in computer science as a mathematical model of computation, in particular for developing denotational semantics for programming languages [42, 43]. The so-called algebraic domains were used to develop mathematical models of λ-calculus for the denotational semantics of functional programming languages [47]. It was later shown that non-algebraic domains, the so-called continuous domains which will be employed in this work, can model mathematical computation in a variety of areas, including exact real number computation [30], computational geometry [25, 26], measure and integration theory [16, 17, 20], differential calculus [25], solution of ODEs [29] and hybrid systems [28].

We review the elements of domain theory we need in this article [1, 33]. A directed complete partial order (dcpo) is a partial order $(D, \sqsubseteq)$ in which every directed subset has a supremum. The partial order relation $x \sqsubseteq y$ is considered to imply that y has more information than x. A subset $O \subset D$ is an open subset of the Scott topology of D if O is an upper set (i.e., $x \in O$ and $x \sqsubseteq y$ imply $y \in O$) and inaccessible by suprema of directed sets (i.e., if $A \subset D$ is directed with $\sup A \in O$, then $A \cap O \neq \emptyset$). This topology is T₀. A map $f: D \to E$ of dcpo’s is Scott continuous iff it is monotone (i.e., $x \sqsubseteq y$ implies $f(x) \sqsubseteq f(y)$) and preserves suprema of directed sets (i.e., if $A \subset D$ is directed then $f(\sup A) = \sup f[A]$).

For $x, y \in D$, we have x way-below y, denoted $x \ll y$, if for every directed subset $A \subset D$ the relation $y \sqsubseteq \sup A$ implies there exists $a \in A$ with $x \sqsubseteq a$, where $\sqsubseteq$ denotes the partial order of D. The way-below relation refines the partial order (i.e., $x \ll y$ implies $x \sqsubseteq y$). The idea is that when $x \ll y$, x is a finitary approximation to y in the sense that if the subset A represents a set of finitary information whose total aggregate provides more information than y, then there is already a piece of information in A that exceeds the information in x. We say $B \subset D$ is a basis of D if the set of elements in B way-below any element $y \in D$ is directed with supremum y, that is, $\{b \in B : b \ll y\}$ is directed with $\sup\{b \in B : b \ll y\} = y$. A (countably based) domain is a dcpo with a (countable) basis. If D is a domain, then D itself is a basis for D. Two given elements $x, y \in D$ of a domain have the following simple property widely used in this work: if for all $z \in D$ the relation $z \ll x$ implies $z \sqsubseteq y$, then we have $x \sqsubseteq y$ (since in fact x is the supremum of all such z). The way-below relation in a domain D has the so-called interpolation property, very useful in practice: if $x \ll y$, then there exists $z \in B$, where $B \subset D$ is any given basis, such that $x \ll z \ll y$. A domain is bounded complete if every bounded set of elements has a least upper bound. Since the empty set is trivially bounded, a bounded complete domain has a least element, denoted ⊥. The Scott topology of a domain with basis B has basic open sets given by $\{x \in D : b \ll x\}$ for $b \in B$.

Some of the basic examples of countably based domains in mathematical analysis and computation are related to finite dimensional Euclidean spaces. The lattice of open subsets of , ordered by subset inclusion, is a bounded complete domain with a greatest and a least element (i.e., a continuous lattice) in which iff is compact with (i.e., iff an ε-neighbourhood of O₁ is contained in O₂). The interiors of rational polytopes provide a countable basis for . The upper space of , consisting of non-empty compact subsets of ordered by reverse inclusion and augmented with as the least element, is a bounded complete countably based domain with iff —that is, iff an ε-neighbourhood of C₂ is contained in C₁ [17]. A countable basis is given by rational polytopes. The Scott topology on coincides with the upper topology which has basic open subsets of the form , for any open set . The singleton map with is an embedding of onto the set of maximal elements of equipped with the Scott topology. For convenience, we identify with (i.e., we consider ). The upper space has several sub-domains relevant to our work: (i) consisting of the convex subsets in , which is the fundamental domain for the results of this work; (ii) consisting of axes-aligned hyper-rectangles with for , where a non-empty compact interval is written as ; and (iii) , for an open set , consisting of axes-aligned hyper-rectangles contained in A. We will widely use the interpolation property of the way-below relation in the lattice of open sets of and that of the way-below relation in in this work; the reader unfamiliar with domain theory can simply regard the way-below relation on such pairs of subsets as a shorthand notation for the preceding relation between open sets or between compact sets.
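For readers who prefer a concrete model, the order, the way-below relation and the interpolation property can be sketched for compact intervals of the real line ordered by reverse inclusion. The code below is only an illustration: intervals are represented as pairs (a, b), the bottom element is omitted, and all function names are assumptions of this example.

```python
def below(x, y):
    """x is below y in the reverse-inclusion order: y is contained in x."""
    return x[0] <= y[0] and y[1] <= x[1]

def way_below(x, y):
    """x is way-below y: y is contained in the interior of x."""
    return x[0] < y[0] and y[1] < x[1]

def sup_of_chain(intervals):
    """Supremum of a nested decreasing chain of intervals: their intersection."""
    return (max(a for a, _ in intervals), min(b for _, b in intervals))

def interpolate(x, y):
    """An interval z with x way-below z way-below y, assuming x way-below y."""
    return ((x[0] + y[0]) / 2.0, (x[1] + y[1]) / 2.0)

x = (-1.0, 1.0)
y = (0.0, 0.0)   # a maximal element, identified with the real number 0
z = interpolate(x, y)

# A finite part of the chain [-2^-k, 2^-k], whose supremum is its last element.
chain = [(-1.0 / 2 ** k, 1.0 / 2 ** k) for k in range(4)]
chain_sup = sup_of_chain(chain)
```

The interval [0, 1] is below the point 0 but not way-below it, since 0 is not an interior point of [0, 1]; this is the sense in which the way-below relation is strictly finer than the order.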

From basic domains, one can construct higher-order domains, including domains of functions. In particular, if X is a topological space with a continuous lattice of open sets and is a bounded complete domain with basis , then the function space consists of the collection of Scott continuous functions partially ordered by pointwise ordering (i.e., if ). Then is itself a bounded complete domain with a basis of step functions which we will now define [33, p. 200]. A single-step function is given by and a basic open subset with if and ⊥ otherwise. A step function is the least upper bound of a finite bounded set of single-step functions. We note that a basis of can be obtained by taking single-step functions constructed from the basis elements and basic open subsets . We have iff [33, Proposition II-4.20(iv)], which characterises the way-below relation in . The fundamental function space we will deal with in this article consists of Scott continuous functions of type , where U is an open subset.

Finally, consider a real Banach space X and recall that the dual of X is defined as the set of bounded real-valued linear functionals on X. The weak* topology on is the weakest topology on that makes all functionals , where with , continuous. This topology is Hausdorff and by Banach-Alaoglu’s theorem, the closed ball of radius c centred at the origin is compact with respect to the weak* topology for any . Let denote the set of non-empty weak* compact and convex subsets of , augmented with , with partial ordering induced by reverse inclusion. Then, is a bounded complete domain in which iff , where the interior is with respect to the relative weak* subspace topology on . Given , the closed line segment between them is denoted by .

3 LOWER EXTENSION, LOWER ENVELOPE AND LOWER LIMIT OF DOMAIN MAPS

In any bounded complete dcpo D, in particular in any bounded complete domain, any non-empty set $S \subset D$ has an infimum given by $\inf S = \sup\{x \in D : x \text{ is a lower bound of } S\}$ as the latter set is directed. In addition, in such a dcpo any net $(x_i)_{i \in I}$ has a limit inferior or liminf defined by Gierz et al. [33, p. 133]:

$$\liminf_{i \in I} x_i = \sup_{i \in I} \inf_{j \ge i} x_j.$$

This can be used to define the notion of the lower limit of a map as well as its lower envelope, which is in fact the construction given in the work of Gierz et al. [33, Exercise II-3.19]. Recall that an isolated point of a topological space is a point x for which $\{x\}$ is open.

Definition 3.1.

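Let $f: X \to Z$ where X is a dense subset of a topological space Y and Z is a bounded complete domain with partial order $\sqsubseteq$. The lower envelope (cf. [33, Exercise II-3.19]) of f is given by the map $f_\star: Y \to Z$ with

$$f_\star(y) = \sup\{\inf f[O \cap X] : O \subset Y \text{ open},\ y \in O\}.$$

A continuous map $g: Y \to Z$ is a lower extension of f if $g \sqsubseteq f$ on X. If Y has no isolated points, then the lower limit of f is given by the map $\liminf f: Y \to Z$ with

$$\liminf f(y) = \sup\{\inf f[(O \setminus \{y\}) \cap X] : O \subset Y \text{ open},\ y \in O\}.$$

Note that the lower limit or $\liminf$ and the lower envelope of a real-valued function on Euclidean spaces are well-known constructions; see, for example, the work of Yeh [49, pp. 144–145].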
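The two constructions can be compared numerically for maps from the real line into the domain of compact intervals, where the infimum of a set of real values is the smallest compact interval containing them. The sketch below uses a hypothetical step map chosen for this illustration and replaces the supremum over all open neighbourhoods by a single small neighbourhood, which suffices for a piecewise constant map; the names `hull`, `step`, `lower_envelope` and `lower_limit` are assumptions of this example.

```python
def hull(values):
    """Infimum in the interval domain: smallest compact interval containing the values."""
    return (min(values), max(values))

def step(x):
    """Hypothetical test map: 0 left of the origin, 1 right of it, 2 at the origin."""
    if x < 0:
        return 0.0
    if x > 0:
        return 1.0
    return 2.0

def lower_envelope(f, y, delta=1e-3, n=200):
    """Hull of the values of f on a small neighbourhood of y, including y."""
    pts = [y + delta * k / n for k in range(-n, n + 1)]
    return hull([f(p) for p in pts])

def lower_limit(f, y, delta=1e-3, n=200):
    """Hull of the values of f on a small punctured neighbourhood of y."""
    pts = [y + delta * k / n for k in range(-n, n + 1) if k != 0]
    return hull([f(p) for p in pts])

envelope_at_0 = lower_envelope(step, 0.0)   # sees the value 2 at the origin
limit_at_0 = lower_limit(step, 0.0)         # ignores the value at the origin
```

At the origin the lower envelope is the interval [0, 2] while the lower limit is [0, 1]; since the value 2 of the map at the origin does not lie in [0, 1], the lower limit in this example is not a lower extension, whereas the lower envelope is.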

Proposition 3.2.

Let where X is a dense subset of a topological space Y and Z is a bounded complete domain:

The map is a lower extension of f with for any lower extension g of f.
If f is continuous at then .
If Y has no isolated points, then . If Y is also T₁, in particular Hausdorff, then is continuous.

Proof.

(i) From the definition, it follows directly that . Let and . By the interpolation property, take with . Then, by the definition of , there exists an open set with such that . Thus, for all , we have , which shows that is continuous. To show that for any , let . Then by the continuity of g at y, there exists an open set with such that and thus .

(ii) Let . By continuity, there exists open set such that implies and thus . Since is arbitrary, it follows that and thus by (i).

(iii) The relation follows from the definitions since for any open set O with we have , where the latter infimum exists because, by assumption, Y has no isolated points. Suppose Y is in addition T₁. Let and . Then there exists an open subset with such that . We claim that implies . Let . By the separation property, there exists an open set with and . Therefore, and thus , which proves the continuity of at y₀.□

The properties of the lower envelope then give us the following.

Corollary 3.3.

Suppose Z and U are bounded complete domains with where X is a dense subset of Y and where V is a dense subset of Z with . Then in the partial ordering of the function space .

Proof.

The map is continuous as it is the composition of two continuous functions. Moreover, and and thus . Hence, is a lower extension of and the result follows from Proposition 3.2(i).□

We can also deduce the following property of the lower limit and lower envelope.

Proposition 3.4.

Suppose X is a dense subset of Y and a map where Z is a bounded complete domain. If U is a bounded complete domain and is Scott continuous and preserves non-empty infima, then and, if Y has no isolated points, .

Proof.

We give the proof for the lower envelope as the proof for the lower limit is similar. For and any open set with , we have since g preserves non-empty infima. Thus, by Scott continuity of g, we obtain

□

There are many examples of lower limits and lower envelopes of maps with sets of discontinuities of various cardinality in analysis. We start with examples related to the domain of non-empty compact intervals of partially ordered with reverse inclusion. The first shows that the lower envelope and the lower limit can be different, and that the lower limit need not be a lower extension.

Example 3.5.

Consider the step function defined by if , if and . Then, for but .

Example 3.6.

Consider the periodic sawtooth wave defined by . Clearly, S has a discontinuity at each . We have is given by if and for . Moreover, .

Example 3.7.

Consider any function that is bounded (i.e., from below and above). Note that is dense with respect to the Scott topology: in fact, given , any open interval containing x generates the Scott (equivalently upper) open set which contains . It follows that . It is easily seen that —that is, the smallest compact interval that contains for all .

Example 3.8.

More generally, consider any bounded function —that is, there exists such that for all . Since is dense in , the lower limit and lower envelope exist and —that is, the smallest convex and compact set containing .

Example 3.9.

We now give an example, presented in [25, Example 6.6], of a map whose set of discontinuities is uncountable with positive Lebesgue measure μ. Take any positive real number . The construction in the work of Edalat and Lieutier [25] gives two disjoint open subsets with and —that is, the interior of the complement C^c of C. Define with if and if . Then is the set of discontinuities of f and has Lebesgue measure . The lower limit and the lower envelope coincide and if , if and if .

Furthermore, we note the following well-known fact.

Lemma 3.10 ([35, p. 32]).

If is a bounded set, then

Thus, if is a bounded set, then , where is the embedding , and it can be computed as . We will also need the following result later on.

Proposition 3.11.

Suppose X is a dense subset of a topological space Y without any isolated points and is continuous. Then, the lower limit and lower envelope of g coincide: as maps of type .

Proof.

Given , for any open set , we note that whether or not we have since g is continuous. Thus, we have

The result now follows from the definitions of the lower envelope and lower limit.□

Example 3.12.

Let be a differentiable function. Then is continuous on a dense set but the set of discontinuities of can be dense, of positive Lebesgue measure or of full Lebesgue measure [8, Chapter 1.3.2, Proposition, 1.10, p. 30]. We have by Proposition 3.11.

4 GENERALISED JACOBIAN AS LOWER LIMIT OF DERIVATIVES

The notions of lower limit and lower envelope can be applied to non-smooth analysis and optimisation. We recall the notion of generalised Jacobians of real locally Lipschitz vector functions, as introduced by Clarke and presented in that work [11, section 2.6]. First, note that by Rademacher’s theorem [12, page 148], a locally Lipschitz map is differentiable almost everywhere with respect to the Lebesgue measure.

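Let $\Omega_f$ be the null set where the locally Lipschitz map $f: U \to \mathbb{R}^m$, with $U \subset \mathbb{R}^n$ open, fails to be differentiable, and let $S \subset U$ be any null set with respect to the Lebesgue measure. The generalised Jacobian $\partial_S f(x)$ for $x \in U$ is defined to be

$$\partial_S f(x) = \operatorname{co}\Big\{\lim_{i \to \infty} f'(x_i) : x_i \to x,\ x_i \notin \Omega_f \cup S\Big\}, \tag{2}$$

where $f'(x_i)$ denotes the Fréchet derivative (Jacobian) of f at $x_i$ with respect to the standard Cartesian coordinates. The right-hand side of the preceding formula is to be interpreted as follows. There are many sequences $(x_i)$ in $U \setminus (\Omega_f \cup S)$ that converge to x such that $f'(x_i)$ also converges to a limit; the generalised Jacobian $\partial_S f(x)$ is the convex hull of all such limits. The first property to note is that the resulting set in Equation (2) does not depend on S.

Theorem 4.1 ([46, Theorem 4]).

The set $\partial_S f(x)$ is independent of the null set S.

We will thus write $\partial f(x)$ for $\partial_S f(x)$ for any null set S. Let the vector space $\mathbb{R}^{m \times n}$ of $m \times n$ matrices over real numbers be equipped with the Frobenius norm.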

Theorem 4.2 ([11, Proposition 2.6.2]).

If is a locally Lipschitz map, then is a non-empty convex compact subset of for each , and the map is upper semi-continuous.

We have , where the latter denotes the set of matrices whose jth row belongs to . For a locally Lipschitz real-valued map (i.e., when $m = 1$), the generalised Jacobian is called the subgradient, which is equivalently defined by the support function of for (see Clarke [11, Proposition 2.1.2(b) and Theorem 2.5.1]):

(3)

Here, is called the generalised directional derivative of f at x in the direction of . We will use each of the four equivalent terms in Equation (3) as convenient in this work.
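To make the generalised directional derivative concrete, recall Clarke's standard definition as the limit superior of the difference quotients (f(y + tv) - f(y))/t as y tends to x and t decreases to 0. The following is a minimal sketch, assuming a real-valued f on the real line; the function name `clarke_directional_derivative` and its sampling parameters are hypothetical choices made for illustration, and random sampling only gives a rough estimate of the limit superior.

```python
import random

def clarke_directional_derivative(f, x, v, radius=1e-3, samples=4000, seed=0):
    """Crude sampled estimate of the generalised directional derivative of
    f: R -> R at x in direction v.

    Takes the largest difference quotient (f(y + t*v) - f(y)) / t over random
    base points y within `radius` of x and step sizes 0 < t <= radius.
    This is an illustrative estimate, not an exact computation.
    """
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(samples):
        y = x + rng.uniform(-radius, radius)
        t = rng.uniform(1e-9, radius)
        best = max(best, (f(y + t * v) - f(y)) / t)
    return best

def neg_abs(x):
    """The concave Lipschitz function x -> -|x|."""
    return -abs(x)

# Both |x| and -|x| have generalised directional derivative 1 at 0 in direction 1.
estimate_abs = clarke_directional_derivative(abs, 0.0, 1.0)
estimate_neg_abs = clarke_directional_derivative(neg_abs, 0.0, 1.0)
```

For the function x ↦ -|x| the ordinary one-sided directional derivative at 0 in direction 1 is -1, whereas the estimate above is close to 1, because the base point y is allowed to vary around x.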

4.1 Representation by Lower Limit

We will now show that for any locally Lipschitz map , the generalised Jacobian is the lower limit, equivalently the lower envelope, of the derivative map , where is the dense subset where f is differentiable. This will provide a new representation for the generalised Jacobian and for the subgradient when . Note that can be considered as a subset of the maximal elements of the bounded complete domain —that is, the set of non-empty compact and convex subsets of the space of real matrices ordered by reverse subset inclusion and augmented with a bottom element that can be regarded as the whole space . Thus, we can consider as a map of type . We need the following theorem of Carathéodory on convex hulls.

Theorem 4.3 ([15]).

Any point of the convex hull of a subset $S \subseteq \mathbb{R}^n$ lies in the convex hull of at most $n+1$ points in S.

By allowing some points in Carathéodory’s theorem to coincide if necessary, we can assume that any point of the convex hull of a subset $S \subseteq \mathbb{R}^n$ lies in the convex hull of exactly $n+1$ points in S.

We now have our main result in this section.

Theorem 4.4.

For any locally Lipschitz map , the generalised Jacobian coincides with the lower envelope and the lower limit of the derivative map:

Proof.

Let . By Proposition 3.2(iii), we already know that . We will now show that . By Theorem 4.1, we have where ; hence, it is sufficient to prove that . Suppose we have a sequence , , with , such that exists and thus . Let be an open set with on which f is Lipschitz. Then there exists such that implies —that is, for all . If is a Lipschitz constant for f in then for any and thus , where is the compact unit ball of radius c around the origin in . Hence, by the comment after Lemma 3.10, is the convex hull of the closure of , which implies —that is, . Since this holds for any sufficiently small open set O containing x, it follows that —that is, . But is the convex hull of points such as y and is convex. Thus, .

Next, we will show that . Suppose . For any integer , let O_k be the open ball of radius centred at x. Since for , , it follows from Carathéodory’s Theorem 4.3, applied to the point y of the mn dimensional Euclidean space , that there exist points , for , such that . By the definition of the closure of a set, let be such that for each and . Since the subset is compact for each , there is a subsequence such that the limits exist for all i with . By continuity, we have . In fact, the polyhedron converges to in the Hausdorff metric on which implies since for each , where is the minimum distance from the point y to a compact set A. By construction, we have and . Thus, for , and by convexity, . Hence, , which completes the proof.□

Since the lower limit or the limit inferior of a map is more widely used in analysis, we will formulate our results in this article in terms of the lower limit.
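In one dimension the representation of Theorem 4.4 can be checked by hand: the generalised Jacobian at x is the interval spanned by the values of the classical derivative at nearby points of differentiability. The sketch below is an illustration under that assumption; `derivative_hull` and its grid parameters are hypothetical names, and a finite grid only approximates the closure taken over a whole neighbourhood.

```python
def derivative_hull(df, x, radius, is_differentiable=lambda y: True, grid=1000):
    """Interval hull (min, max) of the derivative df over grid points within
    `radius` of x at which the function is differentiable."""
    values = []
    for i in range(-grid, grid + 1):
        y = x + radius * i / grid
        if is_differentiable(y):
            values.append(df(y))
    return (min(values), max(values))

def abs_derivative(y):
    """Classical derivative of |y|, defined only for y != 0."""
    return 1.0 if y > 0 else -1.0

def away_from_zero(y):
    return y != 0.0

# Hulls over shrinking neighbourhoods of 0: the kink keeps the whole interval.
hulls_at_zero = [derivative_hull(abs_derivative, 0.0, 10.0 ** -k, away_from_zero)
                 for k in range(1, 6)]

# Away from the kink the hull collapses to the classical derivative.
hull_at_half = derivative_hull(abs_derivative, 0.5, 0.1, away_from_zero)
```

The hulls at 0 are all (-1.0, 1.0), matching the subgradient [-1, 1] of the absolute value at the origin, while at 0.5 the hull is the single value 1.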

The following simple property of continuous maps provides a useful tool in computation in different contexts, in particular for the extension of continuous maps, including the elementary functions of type , to intervals, which is at the basis of interval analysis [38].

Proposition 4.5.

Let be a continuous function, where k is a non-negative integer. Then and for compact and convex subsets and .

Proof.

By Proposition 3.11, . Their common value easily follows from the definition of the lower envelope since a continuous map preserves compact subsets.□
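The interval extension described by Proposition 4.5 sends a compact convex set to its image under the continuous map. A minimal sketch for two familiar cases follows, assuming intervals are represented as (lower, upper) pairs; the helper names `extend_increasing` and `multiply` are hypothetical and are not part of the source.

```python
import math

def extend_increasing(g):
    """Extend an increasing continuous g: R -> R to compact intervals.

    For an increasing map the image of [lo, hi] is [g(lo), g(hi)].
    """
    def g_interval(interval):
        lo, hi = interval
        return (g(lo), g(hi))
    return g_interval

def multiply(a, b):
    """Image of the rectangle a x b under (s, t) -> s * t.

    Multiplication is continuous and linear in each argument, so the
    extreme values are attained at the corners of the rectangle.
    """
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

exp_interval = extend_increasing(math.exp)
```

On degenerate intervals (single points) both extensions return the usual point values, which reflects the fact that the extension agrees with the original map on maximal elements.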

If , then the map given by is linear and thus preserves compact and convex sets. By Proposition 4.5, we have given by . We observe that is Lipschitz on the non-bottom elements of with respect to the Hausdorff metric: if and , then implies , since for we have as the Frobenius norm is subordinate to the Euclidean norm. From this relation and its symmetric counterpart, it follows easily that , applied to non-empty compact and convex subsets, has Lipschitz constant .

Corollary 4.6.

If and is Lipschitz, then (i.e., ) for all .

Proof.

Since is Lipschitz with respect to the Hausdorff metric, it is Scott continuous. It is also easy to check, using Theorem 4.3 that for any non-empty family M_i, where , with we have . Therefore, preserves non-empty infima and the result follows from Proposition 3.4 since we have .□

4.2 Basic Properties of Generalised Jacobian

A number of properties of the generalised Jacobian proved in the work of Clarke [11] now follow directly from the basic properties of the lower envelope in Proposition 3.2, in light of the coincidence of the generalised Jacobian with the lower envelope given by Theorem 4.4.

Corollary 4.7.

For any locally Lipschitz map and we have

(i) The set is non-empty, convex and compact.
(ii) If exists, then .
(iii) If f is continuously differentiable at then .
(iv) The map is upper continuous.

Remark 4.8.

We cannot directly use the standard extension of continuous maps as in the work of Gierz et al. [33, p. 181] to obtain Theorem 4.4. In fact, there are Lipschitz functions which are not continuously differentiable at any point, so Corollary 4.7(iii) does not apply. For example, in the work of Lebourg [37, Proposition 1.9], a Lipschitz map has been constructed with for all . It follows that f is not continuously differentiable at any point since at such a point we would have .

Next, we show that the two chain rules for the generalised Jacobian derived in the work of Clarke [11, Theorem 2.6.6 and its corollary] can be deduced with a much shorter proof. We note that if and , then . In fact, it is easy to check that is compact, but it is, in general, not convex. As a counterexample, with and , let and

Then, AB is not convex as

but

. However, we have the following proposition.

Proposition 4.9.

If and , then

Proof.

Since the second and third sets in the preceding equalities are contained between the first and the fourth, it is sufficient to show the equality of the latter two. Clearly, . If and , then by Carathéodory Theorem 4.3, there exists , for with and , for with such that and . It follows that with . Thus, and the result follows.□

We now recall the mean value theorem for the generalised Jacobian.

Proposition 4.10 ([11, Proposition 2.6.5]).

If is Lipschitz in U, then for we have .

Note that in the preceding proposition, we have , where is the compact line segment between the points . By Proposition 4.9, we have . We will actually give an alternative proof of Proposition 4.10 later in Corollary 7.6.

Observe that the composition map with is continuous with respect to the Frobenius norm or, equivalently, the Euclidean topology. By Proposition 4.5, with .

Theorem 4.11.

(Cf. [11, Corollary p. 75]) Suppose is Lipschitz near and is Lipschitz near . Then for any , we have

Proof.

By Corollary 4.6, it follows that

However, the map

of type

, is the composition of upper continuous or continuous functions and is thus upper continuous. Since by Theorem 4.4

, the result will follow by Proposition 3.2(i) if we show that the upper continuous map

, of type

, is indeed a lower extension of the map

of the same type. Assume that the Fréchet derivative

exists for some

. (Note that the existence of

does not imply the existence of

.) We will show that if

, then

which will complete the proof. Let

be given. Put

where

is the unit vector in the direction of v. By the definition of the Fréchet derivative, there exists

such that

implies

(4)

Let and and put . By the upper continuity of at and that of at x as well as the continuity of f at x, there exists such that implies

(5)

with , where is the -open neighbourhood of C as defined in Section 1.1.

By Proposition 4.9 and Proposition 4.10, for , we have

Thus, for

, by Carathéodory Theorem 4.3, there exist

and

together with

and

, for

, and

, with

, such that

and

. From Relation (4), for

, we have

. Therefore, for

, we obtain

It follows that

. Since

is arbitrary and

is compact, we obtain

. The result follows.□

Corollary 4.12.

(Cf. [11, 2.6.2]) For , we have

Proof.

When , the two sets and are compact subsets of . The theorem now says that the support function of is less than that of , from which the result follows by the comment after Equation (1).□

5 SMOOTH APPROXIMATION OF SUBGRADIENT

Let be an open subset, let denote the set of Lipschitz maps , and let denote the set of locally Lipschitz maps of type ; that is, for each , there exists an open set such that $x \in O$ and f is Lipschitz in O. For any function , let be the derivative of f at x when it exists. In particular, we have a map

where is the set of continuously differentiable functions equipped with the $C^1$ norm topology, and is the space of continuous functions equipped with the sup norm topology.

Recall that the L-topology, the weakest refinement of the sup norm topology such that the subgradient operator is continuous, is second countable and admits a complete metric [21]. We will from now on consider equipped with its L-topology.

In this section, we characterise the L-topology in terms of sequences of locally Lipschitz maps and we will show that , the subset of continuously differentiable functions in , is dense with respect to the L-topology. Note that if U is relatively compact (i.e., has a compact closure), then any map in extends by continuity to the closure of U and in this case we can use the closure of U and our results will imply that is a dense subset.

We start by noting the following property.

Lemma 5.1.

Suppose , , is a sequence such that in the sup norm topology as where . Then, for all and , we have

Proof.

Let and , and let , and be given. By the definition of in Equation (3), there exist y and t with , and such that . Since in the sup norm topology, there exists such that . By the compactness of , it follows that there exists an open set with such that is Lipschitz in U₀. By Lebourg’s mean value theorem [11, 2.3.7] applied to in the open line segment , there exists such that (i.e., ), and the result follows.□

5.1 Characterisation of Convergence in L-Topology

Observe that since for , the restriction of the L-topology on is precisely the norm topology.

Proposition 5.2.

The relative subspace L-topology induced on the subset coincides with the norm topology.

We will now obtain one of our main results, which gives a characterisation of the L-topology by classical notions in mathematical analysis. First, we need to fix our terminology. For a sequence of functions , with , and a subset , we say uniformly for if the sequence converges (in fact decreases) to as uniformly for .

Lemma 5.3.

Suppose the sequence of maps converges to in the sup norm topology. Then, we have

(6)

for all

and

, with the convergence being uniform for

, iff for all

, there exists

and

such that

for

and all

Proof.

By Lemma 5.1, . Next, we note that the limsup taken over two variables is converted in the usual way to limsup over a single variable. In fact, with , where O_k is the open ball centred at x and of radius with x removed. The result now follows.□

Theorem 5.4.

A sequence of maps converges to in the L-topology, as , iff in the sup norm topology and for all and we have

(7)

with the convergence being uniform for

Proof.

Suppose that the sequence converges to in the L-topology. Since the L-topology is a refinement of the topology, in the sup norm topology. To invoke Lemma 5.3 to deduce the uniform convergence to the limit superior, let and . Then, by upper continuity of at x, there exists an open set O₀ with such that for . Let with and let be such that open ball O of radius δ and centre x satisfies . Then, and thus [33, Proposition II-4.20(iv)]. Since the sequence h_k converges to f in the L-topology and since is a Scott open neighbourhood of , it follows that there exists N such that for , we have , and hence for . It follows that for all , and , which yields the required uniform convergence by Lemma 5.3.

Next, assume that and that , , is a sequence with as in the sup norm topology and for all , the convergence in (6) is uniform in . Let be a single-step function with in and compact. From , we obtain . By the interpolation property, there exists an open set O with , which implies that for all we have . Thus, for each , there exists and an open neighbourhood of x such that . In addition, using the uniform convergence in (6), by Lemma 5.3, there exists, for each , an open neighbourhood of x and such that for all , and . Consider the open cover of the compact set by open sets with . Let for be a finite cover and put and . Then for all , and , we have for some and hence: . Therefore, for all and —that is, , for all . Thus, (i.e., for all ), which proves the convergence of the sequence h_k to f in the L-topology.□

Corollary 5.5.

A sequence of maps , , converges to a map in the L-topology iff for all and we have

(8)

with the convergence being uniform for

Recall that a Lipschitz map is regular at if the one-sided directional derivative of f exists at x for all and . We say f is a regular map if it is regular for all [11, 2.3.4].

Corollary 5.6.

A sequence of maps , , converges to a regular map in the L-topology iff for all and we have

(9)

with the convergence being uniform for

If , and , we define and when . For a map , we define . Thus, when we have as usual and . Consider now a Lipschitz map with . The partial subgradient of f with respect to the subspace , denoted by , is defined by restricting the vector v in Equation (3) to ; see the work of Clarke [11, p. 48].

Corollary 5.7.

A sequence converges to in the L-topology, as , iff in the sup norm topology and for all , and we have

(10)

where the convergence is uniform for

Let be the projection to with . Since for ,

from Corollary 5.7, we obtain an alternative short proof of a corresponding result in Clarke [11, Proposition 2.3.16].

Corollary 5.8.

If is locally Lipschitz, then for all :

Recall that, given any metric space , the collection of bounded real-valued Lipschitz functions on X is equipped with its Lipschitz norm defined as

(11)

where

is the sup norm and

is complete, then so is the Lipschitz norm [44]. We now present a simple example of a sequence of

functions that converges in the sup norm topology and in the L-topology, but not in the Lipschitz norm, to a Lipschitz map.

Example 5.9.

Consider the sequence of and Lipschitz functions , for , with

and the Lipschitz map

with

. Clearly,

in the sup norm topology. It is easily checked that

for all

and thus h_k does not tend to f as

in the Lipschitz norm topology. However, for

and, trivially, for the two unit vectors

and

the convergence is uniform. Hence, by Corollary 5.5,

in the L-topology. This therefore gives a simple application of our new results in basic mathematical analysis.
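The gap between sup norm convergence and Lipschitz norm convergence can also be seen numerically. The sketch below uses a different, hypothetical sequence chosen for illustration (not necessarily the one in Example 5.9): the smooth functions h_k(x) = sqrt(x² + 1/k²) approximating f(x) = |x| on [-1, 1]. It estimates the sup distance and the Lipschitz seminorm of h_k - f on a grid; the grid spacing is an assumption and must be fine relative to 1/k for the seminorm estimate to be meaningful.

```python
import math

def h(k, x):
    """Smooth approximation of |x| (an illustrative choice)."""
    return math.sqrt(x * x + 1.0 / (k * k))

def sup_distance(k, grid):
    """Grid estimate of the sup norm of h_k - |.|."""
    return max(abs(h(k, x) - abs(x)) for x in grid)

def lipschitz_seminorm(k, grid):
    """Grid lower bound for the best Lipschitz constant of h_k - |.|,
    using difference quotients between adjacent grid points."""
    g = [h(k, x) - abs(x) for x in grid]
    return max(abs(g[i + 1] - g[i]) / (grid[i + 1] - grid[i])
               for i in range(len(grid) - 1))

grid = [i / 1000.0 for i in range(-1000, 1001)]
sup_10, sup_100 = sup_distance(10, grid), sup_distance(100, grid)
lip_10, lip_100 = lipschitz_seminorm(10, grid), lipschitz_seminorm(100, grid)
```

The sup distance is 1/k and tends to 0, while the Lipschitz seminorm of the difference stays close to 1, so any Lipschitz norm built from the sup norm and this seminorm remains bounded away from 0.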

5.2 Construction of Approximations to a Lipschitz Map

Let with

(12)

where

, be the standard multivariate Gaussian (normal) probability density distribution—that is, the product of n independent standard Gaussian distributions each along an axis of

. For any positive integer k, let

be given by

. Then

is the multivariate Gaussian probability density distribution with mean

and variance

for

, and thus

. Then

is a sequence of test functions. For definiteness, we will use this particular sequence of test functions in deriving Theorem 5.10 presented in the following, but it is easy to see that this theorem follows for any sequence of test functions. For a Lipschitz map

, let

be the convolution

for any positive integer k.
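The construction can be sketched numerically in one dimension. The code below assumes that the k-th test function is the Gaussian density rescaled to standard deviation 1/k, so that after a change of variable the convolution is the average of f(x - z/k) over a standard Gaussian variable z; the function names and the trapezoid quadrature on [-8, 8] are hypothetical choices made for this illustration.

```python
import math

def gaussian_density(z):
    """Standard one-dimensional Gaussian density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def smooth(f, k, half_width=8.0, steps=4000):
    """Return an approximation of the convolution of f with a Gaussian of
    standard deviation 1/k, using the trapezoid rule in the variable z."""
    dz = 2.0 * half_width / steps
    nodes = [-half_width + i * dz for i in range(steps + 1)]
    weights = [gaussian_density(z) * dz for z in nodes]
    weights[0] *= 0.5
    weights[-1] *= 0.5

    def h_k(x):
        return sum(w * f(x - z / k) for z, w in zip(nodes, weights))

    return h_k

# Smooth the Lipschitz map |x| (Lipschitz constant 1) with k = 10.
h_10 = smooth(abs, 10)
value_at_kink = h_10(0.0)
```

For f(x) = |x| the smoothed value at the kink is the mean absolute value of a Gaussian with standard deviation 1/k, that is, sqrt(2/π)/k, and the smoothed map keeps Lipschitz constant 1 up to quadrature error.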

Theorem 5.10.

For any map with Lipschitz constant c, the sequence is a sequence of functions with Lipschitz constant c such that in the L-topology.

Proof.

Since is , it follows from the derivative properties of convolutions [10, p. 119] that all partial derivatives

for

exist as do all higher-order partial derivatives and therefore h_k is

. Let

, for

, so that

. Then

, where for subsets

, recall that the Minkowski sum and difference are defined by

. Thus, by the preceding change of variable in the integral, we obtain

(13)

Then, we have

Therefore, h_k has Lipschitz constant c, and we have

(14)

showing that

in the sup norm topology. By Corollary 5.5, it remains to show that for

, where the convergence is uniform for

. To use Lemma 5.3 to deduce the uniform convergence, let

and

. Since the map

is uniformly continuous on the compact set

, there exists

such that for

with

, we have

. Furthermore, since

is upper continuous [11, Proposition 2.1.1(b)], there exist

such that for

and

, we have

. Thus, for

with

and

, we have

(15)

Let O_r be the open ball of radius centred at the origin, and let be such that for ,

(16)

For and , we have . Thus, using Equation (13) to compute and , applying Inequality (15), in which we replace y with , and employing Inequality (16), we obtain for , , , , and :

Hence, for

and

with

, we have

. Now the open cover of the compact set

with balls of centre

and radius β has a finite subcover with, say,

for

. We now put

and also

. Then, for

we obtain

for all

and all

, which completes the proof by Lemma 5.3.□
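The statement of Theorem 5.10 can be verified by hand in the simplest case. The following worked computation is an illustration under stated assumptions: n = 1, f(x) = |x| with Lipschitz constant c = 1, and a Gaussian kernel of standard deviation 1/k, with φ the standard Gaussian density and Φ its distribution function (symbols introduced here for the example).

```latex
\[
h_k(x) = \int_{\mathbb{R}} \lvert x - z/k \rvert \,\phi(z)\,dz,
\qquad
h_k'(x) = \int_{\mathbb{R}} \operatorname{sign}(x - z/k)\,\phi(z)\,dz
        = 2\Phi(kx) - 1
        = \operatorname{erf}\!\left(\frac{kx}{\sqrt{2}}\right),
\]
\[
\lvert h_k'(x) \rvert \le 1,
\qquad
\sup_{\lvert y \rvert < \delta} h_k'(y)\,v
  = \operatorname{erf}\!\left(\frac{k\delta}{\sqrt{2}}\right)\lvert v \rvert
  \;\xrightarrow[k \to \infty]{}\; \lvert v \rvert = f^{\circ}(0; v).
\]
```

So each smoothed map has Lipschitz constant 1, and the limit superior of its directional derivatives near the origin recovers the generalised directional derivative of |x| at 0 for both unit vectors v = 1 and v = -1.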

The following main result characterises the L-topology on Lipschitz maps in terms of the density of the subspace of and maps.

Corollary 5.11.

The subspace , and thus , is dense in with respect to the L-topology.

Corollary 5.11 shows that the L-topology is the appropriate topology for Lipschitz maps when approximating these maps by sequences of functions.

5.3 Subgradient Operator as Lower Limit of Derivative Operator

Now we are able to prove our final result in this section. Note that the differential operator with can be regarded as having type since can be identified as a subset of the maximal elements of . For convenience, let in this section. As we have seen, is, by Corollary 5.11, dense in , and moreover, since the restriction of the L-topology on is the norm topology, D is continuous on . In addition, equipped with the L-topology has clearly no isolated elements since for any we have for any . Therefore, the lower limit exists. Next note that the L-topology on , being the meet of the sup norm topology and the Scott topology, is itself second countable. Thus, in the definition of we can use a countable set of open sets. We will now find a simple expression for the support function of for .

Lemma 5.12.

Let , and . Then we have

where

in the L-topology.

Proof.

Let be a local basis of the L-topology for f. If A_k for is a shrinking sequence of non-empty compact and convex sets, then by Equation (1), it is easy to see that the support function satisfies the following equality:

(17)

We thus have the following derivation:

□

Theorem 5.13.

The lower limit and the lower envelope of the differential operator of the type coincide with the subgradient operator: .

Proof.

Let with Lipschitz constant c and . Take a shrinking sequence W_k of open subsets with for all that form a local basis for the L-topology at f. Using the notation , if , then we immediately have . Otherwise, if , then there is a sequence with in the L-topology, equivalently the norm topology, and again . Hence, . We show that . Since , it is sufficient to show that they have the same support function—that is, for each ,

(18)

Let for be the open ball of radius centred at x. Assume without loss of generality that W₀ contains maps with Lipschitz constant bounded by in O₀. We have given by . Let for . Then as in the L-topology. For and , by the Cauchy-Schwarz inequality, we obtain , which gives us

Taking the limit as

of the two decreasing sequences of suprema, we obtain

(19)

Then, we have

□

If is Lipschitz and , then by using instead of D and instead of in Theorem 5.13, we obtain a similar result for the partial derivatives.

Corollary 5.14.

We have: .

6 EXTENSION OF FUNDAMENTAL THEOREM OF LINE INTEGRALS

We now use the results of Section 5 regarding the lower limit of the classical derivative operator to deduce a simple proof of the interval version of Green’s theorem (i.e., the fundamental theorem of line integrals), which was obtained using interval valued integration in the work of Edalat et al. [27].

Let be an open set and a path in U from a given point to a point . If is a map, then the path integral is independent of the path p. In particular, if , then the path integral is always zero independent of the closed path p. We now define the operator

, where

stands for the function

for any function h. Thus,

gives the derivative of the composition

with respect to

. Since

is dense with respect to the L-topology, we have its lower limit

(20)

where W_k is a shrinking sequence of open subsets for

that form a local basis for the L-topology at f. Recall that any Scott continuous function

is of the form

where

and

are respectively lower and upper continuous functions with

. In addition, if

is a step function with I finite, then

iff

for each

[33, Proposition II-4-20].
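The classical fact recalled at the start of this section, that the path integral of the derivative of a continuously differentiable map depends only on the endpoints, can be checked numerically. The sketch below is an illustration with a hypothetical smooth function f(x, y) = x²y and two hypothetical paths from (0, 0) to (1, 2); the midpoint rule and step count are assumptions of the example.

```python
def line_integral(gradient, path, steps=20000):
    """Midpoint-rule approximation of the line integral of `gradient`
    along `path`, where path maps [0, 1] into the plane."""
    total = 0.0
    for i in range(steps):
        a = path(i / steps)
        b = path((i + 1) / steps)
        mid = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
        gx, gy = gradient(mid)
        total += gx * (b[0] - a[0]) + gy * (b[1] - a[1])
    return total

def grad_f(p):
    """Gradient of f(x, y) = x**2 * y."""
    return (2.0 * p[0] * p[1], p[0] ** 2)

def straight(t):
    """Straight segment from (0, 0) to (1, 2)."""
    return (t, 2.0 * t)

def curved(t):
    """A curved path with the same endpoints."""
    return (t * t, 2.0 * t)

along_straight = line_integral(grad_f, straight)
along_curved = line_integral(grad_f, curved)
```

Both values agree with f(1, 2) - f(0, 0) = 2 up to quadrature error. For a merely Lipschitz map, the interval-valued integral of Theorem 6.2 takes the place of this single number.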

Proposition 6.1.

(i) The function space is dense in with respect to the Scott topology.
(ii) The lower limit of the integral operator with type is given by
where and μ is the Lebesgue measure on .

Proof.

(i) Let , be a step function, where is compact, O_i is an open interval, and I is a finite indexing set with , which implies for all . Since h is a basic Scott open set in , all we need to do is to construct a continuous function with . We have where and are, respectively, lower and upper semicontinuous, piecewise constant maps with as real numbers if , and otherwise and when . The collection of open intervals O_i, for , induces a partition of into a finite number of (open, closed or half-open/half-closed) intervals on each of which and are constant. The Scott continuity of h at q_i for implies that or for and . Let with be the value of in the interiors of the two intervals in P with common boundary q_i. Then u_i is compact while v_i is either compact or . Put for . If is compact, put ; otherwise, let . Similarly, if is compact, put ; otherwise, let . Then, consider the piecewise linear map with , linear in each interval for . By construction, for each we have . In fact, for each , there exists an open set with such that for . It follows that for and hence .

(ii) We first note that the integral is a continuous functional since if then where is the sup norm on . Thus, by the previous part and Proposition 3.11, we know that and both exist and . Consider any Scott continuous function . To compute , assume is a Scott open set with . Let . Since and are respectively lower and upper semi-continuous functions, there exist an increasing sequence of continuous functions and a decreasing sequence of continuous functions , where , such that and [31, Section 1.7.15(c)]. Thus, the sequence , for , is an increasing sequence of Scott continuous functions with and hence there exists such that . Since this holds for all open sets O containing f, it follows that . If is any open interval with , then by the monotone convergence theorem applied to the sequences and , there exists such that —that is, .□

We now obtain a short proof for the interval version of Green’s theorem, a main result in the work of Edalat et al. [27].

Theorem 6.2.

The composition

is given by

(21)

and satisfies

(22)

Proof.

Take and in Equation (18). Then from the proof of Theorem 5.13, the expression for in Equation (20) gives us . Equation (21) now follows from Proposition 6.1(ii). Next consider the operator

with

. Its lower limit has type

and is continuous with respect to the L-topology by Proposition 3.2(iii) since

is a Hausdorff space. Let

, be any sequence with

in the L-topology. Since

is continuous with respect to the L-topology and

is continuous with respect to the relative subspace L-topology, we have

But the composition

of Scott continuous functions is Scott continuous, and if

then

. Thus,

is a lower extension of

. Since D_p and

are both continuous functionals, by Proposition 3.11 we have

as well as

. Thus, Relation (22) follows from Proposition 3.2(i).□

Clearly, the composition in Equation (21) is in general interval valued rather than real valued. For example, let , and and consider the Lipschitz function with for all as in the work of Edalat [19, Lemma 7.8]. Then, but since and for all , we obtain . We have thus constructed an example in which the lower limit of the composition of two higher-order maps is not equal to the composition of the lower limits of the two maps.

7 SMOOTH APPROXIMATION OF GENERALISED JACOBIAN

In this section, we will extend the results of Section 5 to vector Lipschitz maps. Let denote, for an open set , the set of continuously differentiable maps of type , and let , respectively , be the set of Lipschitz maps, respectively locally Lipschitz maps, . We will use the following closed expression for the generalised Jacobian derived by Imbert [36], which can be viewed as a divergence theorem for Lipschitz vector maps. Let denote the collection of real matrices with the inner product , which induces the Frobenius norm. For and , consider the hyper-cube of volume with sides emanating from x in the direction of the canonical unit vectors e_i with , that is,

Let

be the boundary¹ of

with

the outer unit normal at

and

the surface Lebesgue measure on

. Consider

. Then, the following two results are obtained in the work of Imbert [36]. For

, the support function of

in the direction

is given by

(23)

For

, note that

, for

and we have

(24)

where

[36].

Proposition 7.1.

If a sequence of maps , for , converges uniformly to a map , then for all and .

Proof.

Assume first that , , and are given. Then, by Equation (23), there exist , with , and , with , such that and

(25)

Since

uniformly as

, there exists

such that

(26)

However, since

is differentiable almost everywhere by Rademacher’s theorem [12, page 148], using Gauss-Green (divergence) theorem [40, Theorem 2.9], we obtain

(27)

where

is the divergence of the vector field

. Thus, by using Relations (25), (26) and (27),

From the preceding inequality, it follows that

for some

(in fact, for y₀ in a subset of

of positive Lebesgue measure). Then

. Since

and

are arbitrary, the result follows for

For , let , and be given. There exist, by Equation (24), and , with and , such that and , where . Put . Since uniformly as , there exists such that for we have

Thus, for

, we have

By Lebourg’s mean value theorem, there exists

such that

Since

and

are arbitrary, the result follows.□

The L-topology for locally Lipschitz vector maps is defined similarly as for locally Lipschitz scalar maps in Section 5: it is the weakest refinement of the sup norm topology on that makes the generalised Jacobian continuous with respect to the Scott topology on . The following theorem can now be deduced with a proof similar to that of Theorem 5.4. Since the inner product , when , is reduced to , we can state the result below uniformly for all .

Theorem 7.2.

A sequence , , converges to in the L-topology iff h_k converges to f in the sup norm topology and for all and , we have

where the convergence is uniform for

Corollary 7.3.

A sequence , , converges to in the L-topology iff h_k converges to f in the sup norm topology and for all and , we have

where the convergence is uniform for

To show that is dense in , suppose with components is a Lipschitz map with Lipschitz constant and μ is the standard Gaussian probability measure on . For each positive integer k, as in Theorem 5.10 of Section 5.2, consider the convolutions with components given by , where and is the standard Gaussian probability distribution of Equation (12).

Theorem 7.4.

For each , we have and the sequence h_k converges to f in the L-topology.

Proof.

Applying Theorem 5.10 to each component , for and , it follows that with in the sup norm topology. It is now convenient to express , for , in terms of the probability measure μ as follows:

(28)

with

where

. In addition, we express Equation (28) in vector notation—that is,

(29)

To show that the convergence as in Corollary 7.3 is uniform for

, we first compute the following triple limit superior. For

, using Formula (23) for the Jacobian

of h_k, Fubini’s theorem and the reverse Fatou’s lemma, we obtain

We claim that the last term is bounded above by . Let be given. Let be large enough so that . Since is upper semi-continuous at x₀ and the inner product is continuous, it follows that there exists with such that for and with , we have . Thus, for , and , we have which, denoting the open ball of radius r centred at the origin by O_r, implies

Since

is arbitrary, our claim follows. Thus,

(30)

Now, to use Lemma 5.3 to complete the proof, let be given. Since is a compact convex set, the support function is continuous and hence uniformly continuous on . Thus, there exists such that for with , we have . By Relation (30), there exists , and such that for with , and we have . Hence, for , and we have . Consider the open cover of by balls of centre M and radius and take a finite cover given by M_i for , say. It follows that for , and all , we have and the proof is complete for .

For , with , we have

where μ is now the one-dimensional standard Gaussian distribution. Put

for integers

. Since

, we can apply the dominated convergence theorem to obtain

for

. Now using reverse Fatou’s lemma, we deduce

where the latter inequality easily follows from the definition of

. The uniform convergence for

, in view of the compactness of

, now follows with a proof similar to the case of

presented after Equation (30).□

Theorem 7.4 has finally extended Theorem 5.10 to vector Lipschitz maps, and we conclude the following.

Corollary 7.5.

is dense in with respect to the L-topology.

We can now also obtain a simple proof of the mean value theorem for vector Lipschitz functions.

Corollary 7.6.

(cf. [11, 2.6.5]) Suppose is Lipschitz in an open set containing the line segment . Then, .

Proof.

Let with . Then, where the latter set is open. Take an open set , on which f is Lipschitz, with . Then by Gierz et al. [33, Proposition II-4-20]. Suppose is a sequence of functions that converges to f in the L-topology. Thus, there exists N such that implies and in particular . We now apply the classical mean value theorem to obtain . Since in the sup norm topology, it follows that . As is arbitrary, we conclude that .□

Finally, we have the counterpart of Theorem 5.13 for vector Lipschitz maps, which is proved in a similar way. Let denote the vector differential operator with .

Theorem 7.7.

The lower limit and the lower envelope of the differential operator coincide with the subgradient operator—that is, with .

8 SUBGRADIENT ON BANACH SPACES

In this section, we consider Lipschitz maps of type where X is a real Banach space with as the norm of . If is a non-empty weak* compact and convex set, then its support function is given by .

Consider a Lipschitz map with Lipschitz constant and the bounded complete domain of non-empty weak* compact and convex subsets of as presented at the end of Section 2. Since is a compact Hausdorff space with respect to the relative subspace weak* topology on , the upper topology and the Scott topology coincide on it [17]. The subgradient of f at is a compact convex subset of which, as in the finite dimensional case, can be defined by its support function [11, p. 28]:

The subgradient coincides with the so-called L-derivative as introduced in the work of Edalat [19] which we will now define. We start by observing that can be identified with the set of maximal elements of the bounded complete domain . Moreover, , identified as usual with the subset of maximal elements of , is dense in . Hence, for , the linear map , with has continuous extension , with

Given an open set and a weak* compact convex set, the tie is defined to be the set of all Lipschitz maps such that for all , where , which is a compact interval. It was shown in the work of Edalat [19] that the map given by (31) is well defined and upper continuous with , and in addition if X is finite dimensional, then . Hertling [34] showed later that in fact for any Banach space X. This leads to a short proof of the upper continuity of the subgradient.

Proposition 8.1.

For any Lipschitz map on a Banach space X, the subgradient is upper continuous.

Proof.

If c is a Lipschitz constant for f, then we have . We know that is Scott continuous [19] and [34]. But since is a compact Hausdorff space, the Scott topology on coincides with the upper topology [17].□

8.1 Approximation of Subgradient on Separable Banach Spaces

For an infinite dimensional Banach space X, the function space is no longer a domain and the Scott topology on it does not have a simple representation. However, if X is separable, we show that there is a sequence of Gateaux differentiable functions converging to a given Lipschitz map in the sup norm topology such that the limit superior of the sequence of Gateaux derivatives of the functions in any given direction coincides with the generalised directional derivative of the Lipschitz map in that direction. We assume from now on that X is actually a separable Banach space and is Lipschitz with Lipschitz constant .

Recall that a Gaussian measure on the separable Banach space X is a probability Borel measure μ on X such that for every the induced measure on given by L with , for any Borel set , has a Gaussian distribution on ; a non-degenerate Gaussian measure on the separable Banach space X is one such that for every non-zero the induced measure on has a non-degenerate Gaussian distribution on [9, 6.17]. We first describe how such a Gaussian measure can be constructed on a Hilbert space (see [9]). Consider a separable real Hilbert space H with inner product for . Assume is a trace class positive operator—that is, there exists an orthonormal basis of H with such that for , and . If is any given point, then the probability measure μ with the characteristic function

(32)

where

and

, is a Gaussian measure on H. Thus, the coordinates a_n, for

, of x with respect to the basis

are independent Gaussian variables with mean

and variance

We can now use the Gaussian measure μ constructed on H to obtain a Gaussian measure on the separable Banach space X. We fix a sequence with a dense linear span in X such that . Consider the linear map with . It follows that the forward measure on X induced by T is a non-degenerate Gaussian measure on X that is constructed from a simple Gaussian measure on a Hilbert space.
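The Hilbert space step of this construction can be simulated after truncating to finitely many coordinates. The sketch below is only an illustration: the eigenvalues λ_k = 1/k² (a summable, hence trace class, choice), the zero mean, the truncation dimension, and all function names are assumptions rather than values taken from the article. It draws independent Gaussian coordinates with variances λ_k and checks that the mean squared norm is close to the trace, the sum of the λ_k.

```python
import random


def eigenvalues(dim):
    """Hypothetical eigenvalues lambda_k = 1/k^2 of a trace class covariance."""
    return [1.0 / (k * k) for k in range(1, dim + 1)]


def sample(rng, lambdas, mean=None):
    """Draw independent coordinates a_k ~ N(mean_k, lambda_k)."""
    if mean is None:
        mean = [0.0] * len(lambdas)
    return [rng.gauss(m, lam ** 0.5) for m, lam in zip(mean, lambdas)]


def mean_squared_norm(rng, lambdas, samples):
    """Monte Carlo estimate of E[||x||^2] for the centred truncated measure."""
    total = 0.0
    for _ in range(samples):
        total += sum(a * a for a in sample(rng, lambdas))
    return total / samples


rng = random.Random(0)          # fixed seed so the run is reproducible
lambdas = eigenvalues(30)       # truncation to 30 coordinates (an assumption)
trace = sum(lambdas)            # E[||x||^2] equals the trace for mean zero
estimate = mean_squared_norm(rng, lambdas, 5000)
print("trace of covariance:", trace)
print("Monte Carlo E||x||^2:", estimate)
```

Finiteness of the trace is what keeps the squared norm integrable as the truncation dimension grows; with non-summable variances the estimate would keep increasing with the dimension.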

The key property we need to invoke to obtain the result of this section is the following.

Theorem 8.2 ([9, 6.25 and 6.42]).

If is a Lipschitz map on the separable Banach space X, then the Gateaux derivative exists everywhere on X except for x in a null set with respect to any non-degenerate Gaussian measure on X.

We can now deal with the construction of a sequence of Gateaux differentiable functions converging to a given Lipschitz map with similar limiting properties as in the finite dimensional case in Theorem 5.5. We first observe that the proof of Lemma 5.1 extends to the case of a sequence of Gateaux differentiable maps on a Banach space converging in the sup norm topology to a Lipschitz map. This is because the mean value theorem, invoked in that lemma, also holds for any Gateaux differentiable map . In fact, if with , define by . Then g is continuous in and differentiable in , and hence there exists such that with . Thus, we have the following lemma.
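Written out in full, the mean value argument above reads as follows. This is a sketch in assumed notation: h stands for a generic Gateaux differentiable real-valued map on X and h'(z; v) for its Gateaux derivative at z in the direction v; neither symbol is taken from the article.

```latex
% Restriction of h to the segment from x to y
g(t) = h\bigl(x + t\,(y - x)\bigr), \qquad t \in [0, 1],
% Gateaux differentiability of h makes g differentiable
g'(t) = \lim_{s \to 0} \frac{h\bigl(x + (t + s)(y - x)\bigr) - h\bigl(x + t\,(y - x)\bigr)}{s}
      = h'\bigl(x + t\,(y - x);\; y - x\bigr),
% Classical mean value theorem applied to g
h(y) - h(x) = g(1) - g(0) = g'(t_0) = h'\bigl(x + t_0\,(y - x);\; y - x\bigr)
\quad \text{for some } t_0 \in (0, 1).
```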

Lemma 8.3.

Suppose is a sequence of Gateaux differentiable such that in the sup norm topology as where f is a Lipschitz map. Then for all and , we have

Let f: X → ℝ be a Lipschitz map with Lipschitz constant c, where we have assumed for convenience (and without loss of generality) that the domain of f is the whole space X. We consider the construction in [9, 6.43] of a sequence of maps h_n given by

for

, where μ is any non-degenerate Gaussian measure on X.

Lemma 8.4 ([9, 6.43]).

The maps h_n are uniformly Lipschitz with Lipschitz constant c and are Gateaux differentiable for with the sequence h_n converging in the sup norm topology to f.
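The behaviour described in Lemma 8.4 can be illustrated numerically in the simplest case, where X is the real line. The sketch below is an illustration under stated assumptions, not the construction of [9, 6.43] itself: it takes f(x) = |x| (Lipschitz constant c = 1), takes μ to be the standard Gaussian measure on the real line, and assumes a smoothing of the form h_n(x) = ∫ f(x + y/n) dμ(y), for which a closed form is available. The function names are hypothetical.

```python
import math


def f(x):
    """The Lipschitz map f(x) = |x| on the real line (Lipschitz constant 1)."""
    return abs(x)


def h(n, x):
    """Closed form of E[f(x + Z/n)] for Z ~ N(0, 1).

    Assumed smoothing h_n(x) = integral of f(x + y/n) dmu(y); this is an
    illustrative choice and not necessarily the formula used in [9, 6.43].
    """
    s = 1.0 / n  # standard deviation of the perturbation Z/n
    tail = math.exp(-x * x / (2 * s * s))
    return x * math.erf(x / (s * math.sqrt(2))) + s * math.sqrt(2 / math.pi) * tail


def sup_gap(n, grid):
    """Largest value of |h_n - f| over the grid (a proxy for the sup norm)."""
    return max(abs(h(n, x) - f(x)) for x in grid)


def max_slope(n, grid):
    """Largest difference quotient of h_n between neighbouring grid points."""
    return max(abs(h(n, b) - h(n, a)) / (b - a) for a, b in zip(grid, grid[1:]))


grid = [k / 1000.0 for k in range(-3000, 3001)]  # includes x = 0
for n in (1, 10, 100):
    print(n, sup_gap(n, grid), max_slope(n, grid))
```

In this toy case the gap |h_n − f| is largest at x = 0, where it equals √(2/π)/n, so the sup norm distance tends to 0, while no difference quotient exceeds the Lipschitz constant of f.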

We can now obtain our final result.

Theorem 8.5.

There is a sequence of uniformly Lipschitz and Gateaux differentiable maps converging in the sup norm topology to f such that we have for ,

Proof.

Consider the sequence h_n for in Lemma 8.4. By Theorem 8.2, the Gateaux derivative exists everywhere on X except for x in a null set with respect to the non-degenerate Gaussian measure μ on X. We can therefore obtain an integral expression with respect to μ for the Gateaux derivative of h_n as follows. We have

Since the integrand, in absolute value, on the RHS is bounded by

, writing

for integer

, with

, the dominated convergence theorem (as in the proof of Theorem 7.4) implies that

is integrable with respect to μ with

By Lemma 8.3, we already have

where the first inequality follows from the continuity of the bounded linear map

for each

and

Let . Since the map is upper continuous in z and v [11, 2.1.1(b)], there exists such that for and we have . Hence, in the above neighbourhoods of z and v, whenever exists, we have , which implies that . However, we have for . Let be such that . Then, the reverse Fatou’s lemma yields

Therefore, since

is arbitrary, we have

and the result follows.□
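The statement of Theorem 8.5 can be checked by hand in a one-dimensional toy setting. The sketch below assumes f(x) = |x|, μ the standard Gaussian measure on the real line, and a smoothing of the form h_n(x) = ∫ f(x + y/n) dμ(y); it also assumes that the limit superior is taken jointly over n → ∞ and over base points approaching x. Under these assumptions, differentiating under the integral gives h_n'(x; v) = v·erf(nx/√2), which the code compares with a direct numerical integration of the almost-everywhere derivative of f and with the generalised directional derivative f°(x; v). All function names are hypothetical.

```python
import math


def gen_dir_derivative_abs(x, v):
    """Generalised directional derivative of f(x) = |x| at x in direction v."""
    if x > 0:
        return v
    if x < 0:
        return -v
    return abs(v)


def dh(n, x, v):
    """Directional derivative of the assumed smoothing h_n of |x|."""
    return v * math.erf(n * x / math.sqrt(2))


def dh_by_integration(n, x, v, half_width=8.0, steps=16000):
    """Midpoint-rule integral of sign(x + y/n) * v against the N(0, 1) density."""
    step = 2 * half_width / steps
    total = 0.0
    for k in range(steps):
        y = -half_width + (k + 0.5) * step
        z = x + y / n
        sign = (z > 0) - (z < 0)  # derivative of |.| away from the origin
        total += sign * v * math.exp(-y * y / 2) / math.sqrt(2 * math.pi) * step
    return total


def limsup_estimate(x, v, n_min=10, radius=1e-3, doublings=20, points=201):
    """Finite proxy for the limsup: sup over n = n_min * 2^k and |y - x| <= radius."""
    best = -math.inf
    for k in range(doublings + 1):
        n = n_min * 2 ** k
        for j in range(points):
            y = x - radius + 2 * radius * j / (points - 1)
            best = max(best, dh(n, y, v))
    return best


for x, v in [(0.0, 1.0), (0.0, -2.0), (0.5, 1.0), (-0.5, 1.0)]:
    print(x, v, limsup_estimate(x, v), gen_dir_derivative_abs(x, v))
```

At the kink x = 0 the derivative h_n'(0; v) vanishes for every n, so the generalised directional derivative |v| is only recovered when the base point is allowed to vary, which is why the proxy takes a supremum over a small neighbourhood as well as over n.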

Comparing Theorem 8.5 with Theorem 5.10, we see that our result for approximation of a Lipschitz map on a separable Banach space provides a sequence of Gateaux differentiable maps rather than a sequence of C^∞ maps as was the case in finite dimensions. In addition, in finite dimensions, the convergence in Theorem 5.5 and thus in Theorem 5.10 is uniform on the unit sphere, a property that relies fundamentally on compactness in finite dimensions.

9 CONCLUDING REMARKS

We have obtained new representations for the subgradient of a real-valued locally Lipschitz map, defined on a finite dimensional Euclidean space, and also for the generalised Jacobian of a locally Lipschitz vector map between finite dimensional Euclidean spaces. These results lead us to simpler proofs for some of the basic properties of these generalised derivatives. The L-topology on the space of locally Lipschitz maps has been characterised in terms of convergent sequences in this space. We have shown that the set of C^∞ maps is dense in the space of Lipschitz maps equipped with the L-topology and that convergence of a sequence of maps to a Lipschitz map in the L-topology is equivalent to the convergence of the sequence to the Lipschitz map in the sup norm topology together with the uniform convergence of the limit superior of the sequence of the derivatives of the maps in a given unit vector direction to the subgradient of the Lipschitz map in the direction of that unit vector. Given a Lipschitz map between finite dimensional Euclidean spaces, we have constructed a sequence of C^∞ maps that converges to the Lipschitz map in the L-topology. This result confirms that the L-topology is the appropriate topology on the space of Lipschitz maps when approximating a Lipschitz map with smooth maps. For a real-valued Lipschitz map on a separable Banach space, we also explicitly derived a sequence of Gateaux differentiable Lipschitz maps converging in the sup norm to the Lipschitz map such that the limit superior of the Gateaux derivative of the maps in any direction coincides with the subgradient of the Lipschitz map in that direction.

As for future work, the question arises if any of the new representations of the subgradient of a Lipschitz map between finite dimensional Euclidean spaces can be extended to find a new representation of the subgradient of a real-valued Lipschitz map on a separable Banach space.

Footnotes

¹ We avoid the usual notation for the boundary of a set A as it can be confused here with the generalised Jacobian.

REFERENCES

[1] Abramsky S. and Jung A. 1994. Domain theory. In Handbook of Logic in Computer Science, Abramsky S., Gabbay D. M., and Maibaum T. S. E. (Eds.). Vol. 3. Clarendon.
[2] Acary V. and Brogliato B. 2008. Numerical Methods for Nonsmooth Dynamical Systems: Applications in Mechanics and Electronics. Springer Science & Business Media.
[3] Aubin J.-P. and Cellina A. 2012. Differential Inclusions: Set-Valued Maps and Viability Theory. Vol. 264. Springer Science & Business Media.
[4] Aubin J.-P. and Frankowska H. 2009. Set-Valued Analysis. Springer Science & Business Media.
[5] Auslender A. and Teboulle M. 2006. Asymptotic Cones and Functions in Optimization and Variational Inequalities. Springer Science & Business Media.
[6] Azagra Daniel, Ferrera Juan, López-Mesas Fernando, and Rangel Yenny. 2007. Smooth approximation of Lipschitz functions on Riemannian manifolds. Journal of Mathematical Analysis and Applications 326, 2 (2007), 1370–1378.
[7] Beck A. and Teboulle M. 2012. Smoothing and first order methods: A unified framework. SIAM Journal on Optimization 22, 2 (2012), 557–580.
[8] Benedetto John. 2013. Real Variable and Integration: With Historical Notes. Springer-Verlag.
[9] Benyamini Y. and Lindenstrauss J. 2000. Geometric Nonlinear Functional Analysis. American Mathematical Society.
[10] Bracewell R. N. 1986. The Fourier Transform and Its Applications. McGraw-Hill, New York, NY.
[11] Clarke F. H. 1990. Optimization and Nonsmooth Analysis (2nd ed.). Classics in Applied Mathematics, Vol. 5. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.
[12] Clarke F. H., Ledyaev Yu. S., Stern R. J., and Wolenski P. R. 1998. Nonsmooth Analysis and Control Theory. Springer.
[13] Coddington E. A. and Levinson N. 1955. Theory of Ordinary Differential Equations. McGraw-Hill.
[14] Davis T. A. and Sigmon K. 2005. MATLAB Primer (7th ed.). CRC Press, Boca Raton, FL.
[15] Eckhoff J. 1993. Helly, Radon, and Carathéodory type theorems. In Handbook of Convex Geometry. Vol. B. North-Holland, 389–448.
[16] Edalat A. 1995. Domain theory and integration. Theoretical Computer Science 151 (1995), 163–193.
[17] Edalat A. 1995. Dynamical systems, measures and fractals via domain theory. Information and Computation 120, 1 (1995), 32–48.
[18] Edalat A. 1997. Domains for computation in mathematics, physics and exact real arithmetic. Bulletin of Symbolic Logic 3, 4 (1997), 401–452.
[19] Edalat A. 2008. A continuous derivative for real-valued functions. In New Computational Paradigms. Springer, 493–519.
[20] Edalat A. 2009. A computable approach to measure and integration theory. Information and Computation 207 (2009), 642–659.
[21] Edalat A. 2010. A differential operator and weak topology for Lipschitz maps. Topology and Its Applications 157, 9 (2010), 1629–1650.
[22] Edalat A. 2015. A derivative for complex Lipschitz maps with generalised Cauchy–Riemann equations. Theoretical Computer Science 564 (2015), 89–106.
[23] Edalat A. and Heckmann R. 1998. A computational model for metric spaces. Theoretical Computer Science 193, 1–2 (1998), 53–73.
[24] Edalat A. and Heckmann R. 2002. Computing with real numbers: (i) LFT approach to real computation, (ii) Domain-theoretic model of computational geometry. In Applied Semantics: Advanced Lectures, Barthe G., Dybjer P., Pinto L., and Saraiva J. (Eds.). Lecture Notes in Computer Science, Vol. 2395. Springer, 193–267.
[25] Edalat A. and Lieutier A. 2002. Foundation of a computable solid modelling. Theoretical Computer Science 284, 2 (2002), 319–345.
[26] Edalat A., Lieutier A., and Kashefi E. 2001. The convex hull in a new model of computation. In Proceedings of the 13th Canadian Conference on Computational Geometry.
[27] Edalat A., Lieutier A., and Pattinson D. 2013. A computational model for multi-variable differential calculus. Information and Computation 224 (2013), 22–45.
[28] Edalat A. and Pattinson D. 2007. Denotational semantics of hybrid automata. Journal of Logic and Algebraic Programming 73, 1–2 (2007), 3–21.
[29] Edalat A. and Pattinson D. 2007. A domain-theoretic account of Picard’s theorem. LMS Journal of Computation and Mathematics 10 (2007), 83–118.
[30] Edalat A., Potts P. J., and Sünderhauf P. 1998. Lazy computation with exact real numbers. In Proceedings of the 3rd ACM SIGPLAN International Conference on Functional Programming. ACM, New York, NY, 185–194.
[31] Engelking R. 1989. General Topology. Sigma Series in Pure Mathematics, Vol. 6. Heldermann Verlag, Berlin.
[32] Federer H. 1969. Geometric Measure Theory. Springer.
[33] Gierz G., Hofmann K. H., Keimel K., Lawson J. D., Mislove M., and Scott D. S. 2003. Continuous Lattices and Domains. Cambridge University Press.
[34] Hertling P. 2017. Clarke’s generalized gradient and Edalat’s L-derivative. Journal of Logic and Analysis 9 (2017).
[35] Hiriart-Urruty Jean-Baptiste and Lemaréchal Claude. 2012. Fundamentals of Convex Analysis. Springer Science & Business Media.
[36] Imbert C. 2002. Support functions of Clarke generalized Jacobian and of its plenary hull. Nonlinear Analysis: Theory, Methods and Applications 49, 8 (2002), 1111–1125.
[37] Lebourg G. 1979. Generic differentiability of Lipschitzian functions. Transactions of the American Mathematical Society 256 (1979), 125–144.
[38] Moore R. E. 1966. Interval Analysis. Prentice-Hall.
[39] Murphy K. P. 2012. Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge, MA.
[40] Pfeffer Washek F. 2005. The Gauss–Green theorem in the context of Lebesgue integration. Bulletin of the London Mathematical Society 37, 1 (2005), 81–94.
[41] Schneider R. 1993. Convex Bodies: The Brunn-Minkowski Theory. Cambridge University Press.
[42] Scott D. 1970. Outline of a mathematical theory of computation. In Proceedings of the 4th Annual Princeton Conference on Information Sciences and Systems. 169–176.
[43] Scott D. 1976. Data types as lattices. SIAM Journal on Computing 5, 3 (1976), 522–587.
[44] Sherbert D. R. 1963. Banach algebras of Lipschitz functions. Pacific Journal of Mathematics 13, 4 (1963), 1387–1399.
[45] Shor N. Z. 2012. Minimization Methods for Non-Differentiable Functions. Vol. 3. Springer Science & Business Media.
[46] Warga J. 1981. Fat homeomorphisms and unbounded derivate containers. Journal of Mathematical Analysis and Applications 81 (1981), 545–560.
[47] Winskel G. 1993. The Formal Semantics of Programming Languages: An Introduction. MIT Press, Cambridge, MA.
[48] Yamamuro S. 1970. Differential Calculus in Topological Linear Spaces. Lecture Notes in Mathematics, Vol. 374. Springer-Verlag.
[49] Yeh James. 2006. Real Analysis: Theory of Measure and Integration (2nd ed.).

Index Terms

Smooth Approximation of Lipschitz Maps and Their Subgradients
1. Mathematics of computing
  1. Continuous mathematics
    1. Topology
      1. Point-set topology
  2. Mathematical analysis
    1. Functional analysis
      1. Approximation
    2. Mathematical optimization
      1. Continuous optimization
        Nonconvex optimization
2. Theory of computation
  1. Logic
    1. Constructive mathematics

Published in
Journal of the ACM Volume 69, Issue 1
February 2022
358 pages
ISSN:0004-5411
EISSN:1557-735X
DOI:10.1145/3501289
Editor:
Venkatesan Guruswami
University of California, Berkeley, United States
Copyright © 2021 Association for Computing Machinery.
This work is licensed under a Creative Commons Attribution International 4.0 License.
Publisher
Association for Computing Machinery
New York, NY, United States
Publication History
- Published: 22 December 2021
- Accepted: 1 August 2021
- Revised: 1 July 2021
- Received: 1 October 2019
Author Tags
Non-smooth optimisation
generalised Jacobian
subgradient
Lipschitz maps
L-topology
Qualifiers
- research-article
- Refereed