Optimization
Vocabulary :
For an objective function of several variables, a "stationary point"
is a set of values of those variables where all partial derivatives
of the function vanish.
As illustrated below, a local extremum must be
a stationary point but the converse need not hold.
Because of the inclusive meaning
assigned by default to the simplest mathematical terms
(which is the exact opposite of the exclusive meaning often
attributed to simple words in everyday language)
most mathematicians consider "stationary point" and "saddlepoint"
to be synonymous:
At a saddlepoint, the relevant quantity may
rise in some directions and fall in others
but it's not required to do so...
There might very well be an extremum there!
Some authors reserve the term "saddlepoint"
to nonextremal stationary points.
We prefer to call those proper
saddlepoints
(thus following normal mathematical nomenclature).
(2007-09-23)
Optimizing a smooth function f
of a single variable.
'Tis little more than finding where the function's
derivative vanishes.
For a point x in the interior
of the domain of f and a small enough value
of e,
both points x-e
and x+e are in the allowed domain and
yield values of f which are on both sides of
f (x) if we just assume that f '
is nonzero. Therefore, there can be an extremum at point x
only when f ' (x) vanishes.
On the other hand, there's no such requirement for a point x
at the border of the allowed domain, because small displacements
of only one sign are allowed.
Away from the border, a saddlepoint x
(let's use that general term to indicate that
f ' (x) vanishes) will be
the location of a
minimum (resp. a
maximum )
when f '' (x)
is positive (resp. negative). If it's zero, further
analysis is needed to determine whether x
is the location of an extremum or not.
That discussion involving second-order
(or higher) derivatives is typical of what happens with
several variables when it comes to decide whether a saddlepoint
is an actual extremum, as illustrated in the
next article.
Extreme Value Theorem
(real-valued continuous function on a compact set)
(E. M. of Wisconsin Rapids, WI.
2000-11-21)
Saddlepoints & Extrema
Determine the points where the [objective] function
z = 3x3+3y3-9xy
is maximized or minimized. [ Check for
second-order condition. ]
A necessary (but not sufficient) condition for a smooth function
of two variables to be extremal
( "minimized" or "maximized" ) on an open domain
is to be at a saddlepoint
where both partial derivatives vanish.
In this case that means:
9x2-9y = 0
and
9y2-9x = 0
So, extrema of z can only occur when
(x,y) is either (0,0) or (1,1).
To see whether a local extremum actually occurs,
we must examine the second-order behavior of the
function about each such saddlepoint
(in the rare case where the second-order
variation vanishes, further analysis would be required).
Well, if the second-order partial derivatives are L, M and N, then
the second-order variation at point (x+dx, y+dy) is the quantity
½ [ L(dx)2 + 2M(dxdy) + N (dy)2 ]
We recognize the bracket as a quadratic expression whose sign remains the same
(regardless of the ratio dy/dx) if and only if
its (reduced) discriminant
(M2-LN) is negative.
If it's positive, the point in question is not an extremum.
In our example, we have L = 18x, M = -9 and N = 18y.
Therefore:
M2 - LN = 81 (1 - 4xy)
At (0,0) this quantity
is positive (+81).
Thus, there's no extremum there.
On the other hand, at the point (1,1)
this quantity is negative (-243) so the point (1,1) does
correspond to the only local extremum of z.
Is this a maximum or a minimum? Well, just look at the sign of L
(which is always the same as the sign of N for an extremum).
If it's positive, surrounding points yield higher values and,
therefore, you've got a minimum. If it's negative you've got a maximum.
Here, L = 18, so a minimum is achieved at (1,1).
To summarize: z has only one local extremum;
it's a minimum of -3, reached at x=1 and y=1.
Does this mean we have an absolute minimum?
Certainly not!
(When  x and y are negative,
a large enough magnitude of either will make
z fall below any preset threshold.)
(2007-09-22)
Unconstrained (or "absolute") saddlepoints
Saddlepoints of a function of several independent variables.
We're seeking saddlepoints (stationary points) of a
scalar objective function
M of n variables x1 , ... , xn
which we view as components of a vector x.
M ( x1 , ... , xn ) = M ( x )
If those n variables are independent,
then a saddlepoint is obtained only at a point where
the differential form dM vanishes.
This is to say:
0 = dM = grad M . dx
As this relation must hold for any
infinitesimal vectorial displacement
dx, such absolute saddlepoints of M
are thus characterized by:
grad M = 0
That vectorial relation is shorthand for
n separate scalar equations:
| ¶ M |
= 0 |
 |
| ¶ xi |
(2007-09-22)
Lagrange Multipliers
Optimizing entails one Lagrange multiplier per constraint.
Instead of a set of independent variables
(as discussed above) we may have to deal with several
variables that are tied to each other by some constraints.
For example, the variables may be subject to a single constraint :
E ( x1 , ... , xn ) = constant
While an unconstrained saddlepoint of M was obtained when dM = 0.
A constrained saddlepoint is obtained when dM is proportional to dE.
More generally, when k functions
E1 ... Ek are given to be constant,
a constrained saddlepoint of M is achieved when the
differential form dM is a linear
combination of the differentials of E1 ... Ek.
dM =
l1 dE1 +
l2 dE1 + ... +
lk dEk
In other words, there are k constants
li
(each is known as the Lagrange multiplier
associated with the corresponding constraint) such that the following
function has an unconstrained (or absolute)
saddlepoint.
M - (
l1 E1 +
l2 E1 + ... +
lk Ek )
Video:
Lecture 2 | Modern Physics:
Statistical Mechanics by Leonard Susskind (2009-04-06)
(2007-09-23)
The fattest cone of given base...
What apex minimizes the lateral area of a cone of given base and volume?
The volume of a cone is one third of the product of its base area by its
height. So, by imposing the cone's volume, we're actually imposing
the height of the cone and looking for an optimal
apex somewhere in a fixed plane parallel to the base...
(2007-09-23)
Calculus of Variations
(cf. Lagrangian mechanics)
Euler-Lagrange equations hold whenever a path integral
is stationary.
Historically,
the first problem of the type described below
was the brachistochrone problem
(find the shape of the curve of fastest descent)
which had been considered by Galileo
in 1638 and solved by Johann Bernoulli in 1696
(Bernoulli withheld his solution to turn the problem into a public challenge,
which was quickly met by several prestigious mathematicians).
The subject was investigated by Euler in 1744 and
by Lagrange in 1760.
It was named calculus of variations by Euler in 1766
and made fully rigorous by Weierstrass
before 1890.
Let L be a smooth enough function
of 2n+1 variables:
L =
L ( q 1 , q 2 ... q n ,
v1 , v2 ... vn , t )
We ssume that the first n variables (q) are actually functions of the
last one (t) and also that the subsequent variables (v)
are simply their respective derivatives with respect to t:
| v i (t) = |
d |
q i (t) |
 |
| dt |
This makes L a function of t alone and we may consider
the following integral S (called the "action" of L )
for prescribed configurations at both extremities.
In this context, a configuration is defined as a complete set
of values for the q-variables only (irrespective of what the v-variables may be)
so, what we are really considering now are prescribed fixed values of
all the 2n quantities
q i (a) and
q i (b).
The fundamental problem of the calculus of variations is to
find what local conditions make S stationary,
for optimal functions q i .
Namely:
Euler-Lagrange equations
| ¶ L |
= |
d |
( |
¶ L |
) |
 |
 |
 |
| ¶ q i |
dt |
¶ vi |
|
Proof :
From a set of optimal functions q i
and arbitrary (sufficiently smooth) functions
h i which vanish at both extremities
a and b, let's define the following family of
functions, depending on a single parameter e :
| Q i (t) = q i (t) +
e h i (t)
| |
V i (t) = v i (t) +
e |
d |
h i (t) |
 |
| dt |
Those yield a value S(e) of the action which must be
stationary at e = 0
(since the functions q i
are optimal). Thus, the derivative of S(e)
along e does vanish
at e = 0, which is to say:
| 0 = |
ò |
b |
å i |
[ |
h i |
¶ L |
+ |
d h i |
|
¶ L |
] dt |
|
 |
 |
 |
| a |
¶ q i |
dt |
¶ vi |
Since every h i vanishes at both extremities
a and b , we may
integrate by parts
the second term of the square bracket to obtain:
| 0 = |
ò |
b |
å i |
h i |
[ |
¶ L |
- |
d |
( |
¶ L |
) |
] dt |
|
 |
 |
 |
| a |
¶ q i |
dt |
¶ vi |
As h i is arbitrary, the square bracket must vanish
everywhere, for every i.
(If this wasn't so, the whole equality would be violated for a choice
of h i vanishing wherever the square bracket
doesn't have a prescribed sign).
Theoretical and Practical Examples :
The above is most commonly used as the basis for
variational mechanics and
related aspects of theoretical physics based on a
principle of least action.
However, it's also the correct answer to more practical concerns:
On 2008-10-27,
Bill Swearer
wrote: [edited summary]
As a pilot, I’ve always been interested in writing a proper piece of
flight-planning software to optimize the plane’s path with regard to time, fuel,
or any combination thereof. [...]
I’ve always felt the professionally-provided solutions were crude approximations that do
not take into account the full range of possibilities, especially over long distance,
as one crosses varying jet streams at different altitudes, etc. [...]
What is the correct mathematical approach?
Thanks a lot.
Bill Swearer
|
Well, just express carefully
the local cost function
(L) as a function of the position
of the aicraft and of the related derivatives
(to a good approximation,
the latter are only useful to compute horizontal speed).
Predicted meteorological conditions (changing with time
throughout space) can be used for best planning.
The Euler-Lagrange equations then tell the pilot what
to do at all times.
(2009-07-05)
A Proof of Noether's Theorem (1915)
Proving Noether's theorem
for continuous symmetries.
A slight modification of the above proof yields a straight
derivation of one of the
greatest statements of mathematical physics. Let's just do it...
Suppose that, for specific functions
hi , a symmetry exists which leaves L 
unchanged, (to first-order variations of
e about 0)
when q i is replaced by
Q i = q i +
e h i
Formally, this leads to a situation similar to the previous one,
since we still know that the derivative of
S(e) vanishes at
e = 0
(albeit for very different reasons).
However, the h functions need not vanish at the extremities
a and b.
So, an extra "integrated term" appears
in the following expression of the derivative of
S(e)
which results from our integration by parts:
| 0 = |
å i |
h i |
¶ L |
| |
b |
+ |
ò |
b |
å i |
h i |
[ |
¶ L |
- |
d |
( |
¶ L |
) |
] dt |
 |
|
|
 |
 |
 |
| ¶ vi |
a |
a |
¶ q i |
dt |
¶ vi |
Now, as the integral still vanishes (because
the previously established Euler-Lagrange equations make every
square bracket vanish) the extra term must be zero as well.
This means that the following quantity doesn't change:
Conserved Noether Charge
| å i |
h i |
¶ L |
 |
| ¶ vi |
|
Emmy Noether (1882-1935)
Video:
Lagrangian & Field Theory
by Leonard Susskind
blog
(2012-04-02)
The geodesics of a two-dimensional surface.
Paths of least length
On a 2D surface M (u,v) the infinitesimal
distance corresponding to a displacement du,dv is given by
the first fundamental quadratic form:
F1 (du,dv)
= (dM)2
= E (du)2 + 2 F du dv + G (dv)2
The problem of minimizing the length of the path between two points on that surface is
a standard exemple of the calculus of variations.
Let's introduce notations compatible with the above:
q1 = u
q2 = v
v1 = du/dt
v2 = dv/dt
L = [ F1 (v1 , v2 )
]½
=
[ E v12 + 2 F v1 v2
+ G v22
]½
The two Euler-Lagrange equation for this Lagrangian are (i = 1,2) :
| ¶ L |
= |
d |
( |
¶ L |
) |
 |
 |
 |
| ¶ q i |
dt |
¶ vi |
The first of those (i = 1, qi = u) translates into:
| E'u v12 + 2 F'u v1 v2
+ G'u v22 |
= |
d |
( |
E v1 + F v2 |
) |
 |
 |
 |
| 2 L | dt | L |
Since dX/dt =
v1 ¶X/¶u +
v2 ¶X/¶v
the right-hand side is equal to:
Geodesic lines
(2010-07-06)
The brachistochrone curve is a cycloid (1696)
The curve of fastest descent in a uniform gravitational field.
We are now presenting the brachistochrone problem as a simple application
of the calculus of variation. Historically, the relation is reversed,
as this problem actually helped define the need for the latter, which was formalized
many years after the brachistochrone problem was solved.
In June 1696,
Johann
Bernoulli (1667-1748) challenged the readers of
Acta Eruditorum with his
famous brachistochrone problem:
Along what curve would a point-mass fall from one prescribed point to another lower
one (not directly underneath) in the least possible time?
Johann Bernoulli's own solution was published in the same journal
a year later, along with 4 of the 5 solutions sent by famous contributors
(the solution of Guillaume
de l'Hôpital was only published in 1988).
We shall use a coordinate system where the origin is the starting point.
We choose to orient the y-axis downward so that y is positive for
all target points. Let's call u the slope of the trajectory dy/dx.
In a gravitational field g the conservation of mechanical energy for a mass
m dropped from the origin at zero speed tells us that:
m g y = ½ m [ (dx/dt)2 + (dy/dt)2 ]
Therefore, ( 2 g y ) ½ dt = ( 1 + u2 ) ½ dx
To minimize the descent time, we seek to minimize the path integral of dt or, equivalently,
to minimize the integral of (2g)½ dt = L dx
L dx = L (y,u,x) dx = ( 1 + u2 )½ / y½ dx
To minimize this integral, the relevant
Euler-Lagrange equation must hold:
| ¶ L |
= |
d |
( |
¶ L |
) |
 |
 |
 |
| ¶ y |
dx |
¶ u |
With the above expression of L, this translates into:
- ( 1 + u2 )1/2 / 2y3/2
=
d/dx [ u ( 1 + u2 )-1/2 / y1/2 ]
Recalling that u = dy/dx, we reduce this algebraically:
- ( 1 + u2 )-1/2 / 2y3/2
= (du/dx)
( 1 + u2 )-3/2 / y1/2
- ( 1 + u2 ) = 2 y du/dx
= 2 u y du/dy
That last equality holds only when dy/dx doesn't vanish, which need not
always be true. We put v = u2 and separate the
variables to integrate:
d {Log y} = dy/y = - dv / (1+v)
= - d { Log (1+v) }
Therefore, y / 2a = 1/(1+v) which is to say
v = 2a / y-1 for some a.
This characterizes a cycloid
corresponding to a wheel of radius a.
Wikipedia :
Brachistochrone curve
(2008-11-10)
The Isoperimetric Inequality
Among planar loops of unit length, the circle encloses the largest area.
The surface area S enclosed by a closed planar curve
of given perimeter P is largest in the case of a circle.
This ancient result can be expressed by the following relation,
known as the isoperimetric inequality:
S ≤ P 2 / 4p
The three-dimensional equivalent of the isoperimetric inequality says that the
closed surface of area S which encloses the largest volume
V is a sphere.
V 2 ≤ S 3
/ 36 p
The above can be generalized to n dimensions:
No hypersurface of hyperarea S encloses a larger
hypervolume V than the hypersphere.
This makes the relations given in the
oldest Numericana article yield:
V n-1 ≤
G ( 1 + n/2 )
[ S / nÖp ] n
(2008-11-10)
Plateau's Problem
The surface of least area with a given border.
The Plateau rules (1873)
state that the solutions are smooth surfaces of constant mean curvature with only
two possible types of singularities:
lines where 3 such smooth surfaces meet at
120° angles and isolated points
where 4 of those lines meet in a regular tetrahedral configuration.
The first published proof of those rules is due to Jean Taylor (1976).
Wikipedia :
Plateau's Laws
|
Joseph Plateau (1801-1883)
|
Jean Taylor
(1944-)
(2008-11-18)
Borderless embedded surfaces of minimal area
Minimal surfaces without edges or self-intersections in ordinary space.
Such surfaces are technically known as
complete embedded minimal surfaces.
Before 1982, only four examples of these were known:
- The plane.
- The catenoid (Euler, 1741).
In 1776, Meusnier remarked that the catenoid has zero
mean curvature.
- The helicoid
(Meusnier,
1776).
Catalan remarked that the plane and the helicoids are the only minimal
surfaces which are ruled.
- A fourth example was found by
Riemann in 1860 which consists of
infinitely many horizontal planes with slanted tunnels between adjacent pairs.
| |
 Celso José da Costa |
In 1982,
Celso
J. Costa
(then a graduate student in Brazil) stumbled upon a
minimal surface topologically equivalent to a torus with three holes.
Costa suspected that his surface had no self-intersections but,
at first, couldn't prove it...
That task was undertaken at the
University of Massachusetts at Amherst
by David Hoffman
(a 1973 Stanford Ph.D.).
 |
|
David Hoffman teamed up with
William
H. Meeks III .
and
Jim Hoffman,
(then a graduate student) to produce a computer visualization of the
stunning symmetries of the strange surface described by Costa
(which contains two straight lines meeting at a right angle).
Dividing Costa's surface into 8 congruent pieces,
they proved, indeed, that it never intersects itself!
|
Loosely speaking, Costa's surface features two complementary
pairs of "tunnels" through the
equatorial plane which connect the inside of the catenoid's northern half
and the outside of its southern half, or vice-versa.
| |
 David Hoffman |
 William H. Meeks, III |
Subsequently, Dave Hoffman and Bill Meeks discovered that Costa's surface is just the simplest
member of a whole family of complete minimal
embedded surfaces constructed in the same way but with more "tunnels"...
Applied to helicoids, the idea yields yet another family of complete minimal
embedded surfaces where tunnels provide shortcuts between
slices of space which are otherwise connected only by circling around the helicoid's axis.
Minimal Surfaces and
their Classification (pdf 4.44 MB) by William H. Meeks, III.
Denis Viala (2008-02-28)
Connect the dots, without crossings...
Given n blue dots and n
red dots in the plane (no three aligned) there are always
n disjoint segments with blue and red extremities.
HINT: This little puzzle appears among
optimization problems... Proof.
n=2, n=3, etc.
(2008-11-09)
Shortest Way to Connect Three Points
How to connect three dots with lines of least total length?
The three vertices of a triangle ABC
can be connected using just two of its three sides.
If the angle between those two sides is 120°
or larger, this is the most economical way to do so.
On the other hand, for triangles where the largest inside angle is
less than 120°, the most economical connecting network consists of
three straight lines (OA, OB, OC)
connecting the vertices to some optimal center (O)
where the three lines
OA, OB and OC
meet at 120° angles.
(2008-11-09)
The Honeycomb Theorem
In 1999, Thomas Hales finally proved an "obvious" fact.
Nobody questioned
the celebrated Honeycomb conjecture
before 1993, when Phelan and Weaire showed its
3D counterpart (Kelvin's conjecture)
to be false!
The issue was settled in
1999
with a proper proof by Thomas C. Hales :
Thus, the 2D Honeycomb conjecture
is now a theorem! Here it is:
Honeycomb Theorem
In any partition of the plane into tiles of unit area,
each tile cannot have a lesser perimeter than the regular hexagon
of unit area.
(Loosely speaking, the regular honeycomb tiling
is the most economical one.)
The first proof by
Thomas C. Hales
(1999)
was surprisingly intricate.
It was crucial to get rid of the
convexity restriction which earlier authors had reluctantly imposed.
(The isoperimetric inequality
implies that the boundaries between
optimal shapes are either circular arcs or straight lines.
Both sides of such a boundary are convex only in the latter case.)
The reason curved boundaries are ruled out
is not at all obvious; it is false
in the 3D case.
(2008-11-09)
On Minimal Foams
Kelvin's problem & counterexamples to Kelvin's conjecture.
The Kelvin problem asks for a partition of space into cells of equal volume and minimal area.
It has been described by
Simon Lhuilier
(1750-1840) and others as one of the most difficult problems in geometry.
 The uniform truncated octahedron
can tile space without voids. |
|
Lord Kelvin (1824-1907)
observed that a partition of space into
uniform truncated octahedra
was quite economical.
In 1887, he conjectured that the optimal foam would be obtained by curving the faces
of that tetradecahedron in accordance with Plateau's laws.
Kelvin's foam thus consists of a
single type of cell whose vertices are
exactly the same as a uniform truncated octahedron.
The square faces are planar figures with curved edges.
The hexagonal faces are
monkey saddles
with straight [great] diagonals.
In 1993, Denis Weaire and Robert Phelan disproved Kelvin’s conjecture
by describing a structure
which is more economical than Kelvin's:
The A15 Weaire-Phelan
structure
is one example of the so-called Frank-Kasper foams.
It consists of two distinct types of cells having
either pentagonal or hexagonal faces;
one is a squashed dodecahedron, the other is a tetradecahedron.
In a draft
(2008-10-25)
of a paper published on
2009-06-02,
Ruggero Gabbrielli presents
a new type of unit-cell foam (featuring some quadrilateral
faces) whose cost (5.303) is intermediary between
the Weaire-Phelan foam (5.288) and Kelvin's foam (5.306).
Gabbrielli has dubbed this structure "P42a".
Its repeating pattern consists of 14 curved
"polyhedra" of 4 different types
(10 tetradecahedra and
4 tridecahedra, for
an average
of 96/7 = 13.714285... faces per cell).
The above pictures for the A15 and P42a foams were
obtained by Ruggero Gabbrielli using the
GAVROG 3dt visualization software.
Gabbrielli has also posted several related
interactive
3D visualizations
under Java.
The 1993 optimization on A15 and Gabbrielli's recent research both used Ken Brakke’s
Surface Evolver
(with specific code provided by John M. Sullivan, in the latter case at least).
At this writing, the A15 foam described by Weaire and Phelan in
1993 is still conjectured to be the answer to Kelvin's problem.
It is about 0.3% less costly than Kelvin's original foam.
The Weaire-Phelan foam was used as the basis for the design of the celebrated
Water-cube
of the 2008 Olympics
(Beijing National Aquatics Center) which is the largest structure
covered with ETFE film.
The number of faces in a minimal foam (1992)
by Robert B. Kusner
Comparing the Weaire-Phelan
foam to Kelvin's foam (1996)
by Rob Kusner and John M. Sullivan
A new counterexample to
Kelvin’s conjecture on minimal surfaces
by Ruggero Gabbrielli
Comparing the Kelvin,
Gabbrielli and Weaire-Phelan structures (Java)
The
Weaire-Phelan Structure (and the Water-Cube) by
John C. Baez (2008-08-21).
|