Abstract: These notes, a work in progress, explore the relationship between wavelets and approximation theory. I intend to eventually
cover Besov spaces, sequence norms of wavelet coefficients, nonlinear approximation with wavelets, connection to image compression and
noise removal, etc. The notes, based loosely on a graduate course I taught a number of times in the 1990s,
were first produced in Fall 2005 while I was a long-term visitor at the IMA during their emphasis year
on Imaging, and they've been revised a number of times since then, most notably in 2012--2013 when I was on sabbatical
visiting Stacey Levine at Duquesne University.

Review:
This is a survey of the fast-growing area of wavelets from an approximation
theory point of view. It is good reading for those who have an interest
in learning about wavelets for the first time (as an overview and guide
to further reading) as well as for those who know some aspect of wavelets
and want to see what the approximation-theoretic perspective has to offer.
The paper begins with a fairly detailed description of the Haar wavelets
to motivate the material that follows. The construction of wavelets section
begins with an overview of multiresolution analysis and the basic framework
of shift invariant spaces in which context constructions of orthogonal
wavelets and prewavelets are discussed. Primary examples here are cardinal
spline wavelets and prewavelets in one dimension and box spline wavelets
in several dimensions. The authors include a heuristic discussion of the
Daubechies construction of compactly supported orthogonal wavelets highlighting
the major points. There is a section on the fast wavelet transform, a section
that relates the coefficients in a wavelet expansion of a function to its
smoothness properties, and a final section on applications. The connections
to approximation theory are evident throughout: from the role of shift
invariant spaces in the theory, the discussions of smoothness and approximation
order in the constructions, and most decidedly in the relation of spaces
of coefficients, to smoothness spaces and how this relation together with
nonlinear approximation theory techniques can be used in applications.
(Review by S. Riemenschneider in Math. Reviews)

Abstract:
Recently, a theory, developed by DeVore, Jawerth, and Popov, of nonlinear
approximation by both orthogonal and nonorthogonal wavelets has been applied to problems in
surface and image compression by DeVore, Jawerth, and Lucier. This theory relates precisely the
norms in which the error is measured, the rate of decay in that error as the compression decreases,
and the smoothness of the data. In addition, one can interpret the error incurred by the
quantization of wavelet coefficients in terms of this theory. In this talk we give an overview of the previous
results, and expand our argument, made earlier for image compression, that frequency-amplitude
response curves that arise quite naturally in problems involving human visual and audio
perception should be used to decide the quantization strategy for wavelet coefficients and the norm in
which to measure the error in compressed data.

Abstract: In this paper we present certain results about the
compression of images using wavelets. We concentrate on the simplest
case of the Haar decomposition and compression in $L_2$. Further
results about compression in $L_p$, $p\ne2$, are mentioned.
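
As a rough illustration of what compression in $L_2$ with the Haar decomposition means, the sketch below keeps the largest coefficients of an orthonormal Haar transform and discards the rest. It is not the paper's algorithm: it works on a one-dimensional signal for brevity (the paper treats images), and the function names `haar_forward`, `haar_inverse`, and `compress` are hypothetical. Because the transform is orthonormal, the $L_2$ error equals the norm of the discarded coefficients.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal Haar transform of a 1-D array whose length is a power of two."""
    c = np.asarray(x, dtype=float).copy()
    n = c.size
    while n > 1:
        half = n // 2
        evens, odds = c[0:n:2].copy(), c[1:n:2].copy()
        c[:half] = (evens + odds) / np.sqrt(2.0)   # coarse averages
        c[half:n] = (evens - odds) / np.sqrt(2.0)  # detail (wavelet) coefficients
        n = half
    return c

def haar_inverse(c):
    """Inverse of haar_forward."""
    x = np.asarray(c, dtype=float).copy()
    n = 2
    while n <= x.size:
        half = n // 2
        avg, det = x[:half].copy(), x[half:n].copy()
        x[0:n:2] = (avg + det) / np.sqrt(2.0)
        x[1:n:2] = (avg - det) / np.sqrt(2.0)
        n *= 2
    return x

def compress(x, keep):
    """Keep the `keep` largest-magnitude Haar coefficients (0 <= keep <= len(x)).

    Returns the reconstruction and the l2 norm of the discarded coefficients.
    """
    c = haar_forward(x)
    drop = np.argsort(np.abs(c))[: c.size - keep]  # indices of the smallest coefficients
    dropped_norm = np.linalg.norm(c[drop])
    c[drop] = 0.0
    return haar_inverse(c), dropped_norm

if __name__ == "__main__":
    signal = np.array([4.0, 4.0, 4.0, 4.0, 1.0, 1.0, 9.0, 9.0])
    approx, dropped = compress(signal, keep=2)
    print("L2 error:", np.linalg.norm(signal - approx), "discarded norm:", dropped)
```

The two printed numbers agree up to rounding, which is Parseval's identity for the orthonormal Haar basis.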

Abstract: A novel theory is introduced for analyzing image compression
methods that are based on compression of wavelet decompositions. This
theory precisely relates (a) the rate of decay in the error between the
original image and the compressed image as the size of the compressed
image representation increases (i.e., as the amount of compression
decreases) to (b) the smoothness of the image in certain smoothness
classes called Besov spaces. Within this theory, the error incurred by
the quantization of wavelet transform coefficients is explained. Several
compression algorithms based on piecewise constant approximations are
analyzed in some detail. It is shown that, if pictures can be
characterized by their membership in the smoothness classes considered,
then wavelet-based methods are near-optimal within a larger class of
stable transform-based, nonlinear methods of image compression. Based on
previous experimental research it is argued that in most instances the
error incurred in image compression should be measured in the integral
sense instead of the mean-square sense.

Abstract: We propose wavelet decompositions as a technique for compressing the number
of control parameters of surfaces that arise in Computer-Aided Geometric Design.
In addition, we give a specific numerical algorithm for surface compression
based on wavelet decompositions of surfaces into box splines.

Abstract:
DeVore, Jawerth, and Lucier have previously introduced
a definition of the smoothness of images that
is directly related to the performance of wavelet compression
schemes. In this paper we survey previous
results on the equivalence between smoothness, rate
of decay of the wavelet coefficients, and efficiency of
wavelet compression techniques applied to images. We
report on other applications including deciding how
many pixel quantization intervals are needed to preserve
smoothness, and the fast solution of variational
problems that arise naturally in several areas of image
processing.

Abstract: This paper examines the relationship between wavelet-based
image processing algorithms and variational problems. Algorithms are derived
as exact or approximate minimizers of variational problems; in particular,
we show that wavelet shrinkage can be considered the exact minimizer of
the following problem: given an image $F$ defined on a square $I$, minimize
over all $g$ in the Besov space $B^1_1(L_1(I))$ the functional
$\|F-g\|_{L_2(I)}^2+\lambda\|g\|_{B^1_1(L_1(I))}$.
We use the theory of nonlinear wavelet image compression in $L_2(I)$ to
derive accurate error bounds for noise removal through wavelet shrinkage
applied to images corrupted with i.i.d., mean zero, Gaussian noise. A new
signal-to-noise ratio, which we claim more accurately reflects the visual
perception of noise in images, arises in this derivation. We present extensive
computations that support the hypothesis that near-optimal shrinkage parameters
can be derived if one knows (or can estimate) only two parameters about
an image $F$: the largest $\alpha$ for which $F\in B^\alpha_q(L_q(I))$,
$1/q=\alpha/2+1/2$, and the norm $\|F\|_{B^\alpha_q(L_q(I))}$. Both theoretical
and experimental results indicate that our choice of shrinkage parameters
yields uniformly better results than Donoho and Johnstone's VisuShrink
procedure; an example suggests, however, that Donoho and Johnstone's SureShrink
method, which uses a different shrinkage parameter for each dyadic level,
achieves lower error than our procedure.
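
The link between the variational problem and shrinkage can be sketched in a few lines. The sketch assumes an orthonormal wavelet basis $\{\psi_j\}$ of $L_2(I)$ in two dimensions, smooth enough that $\|g\|_{B^1_1(L_1(I))}$ is equivalent to the sum of the absolute values of the $L_2$-normalized wavelet coefficients, and it replaces the Besov norm by that equivalent sequence norm. The coefficient names $f_j$, $g_j$ are notation introduced here; the paper's normalizations may differ.

```latex
% Expand F = \sum_j f_j \psi_j and g = \sum_j g_j \psi_j in the orthonormal basis.
% With the Besov norm replaced by \sum_j |g_j|, the functional decouples:
\[
  \|F-g\|_{L_2(I)}^2 + \lambda \sum_j |g_j|
  \;=\; \sum_j \Bigl( (f_j - g_j)^2 + \lambda\,|g_j| \Bigr).
\]
% Each term is minimized independently. For g_j \ne 0 the stationarity condition is
%   2(g_j - f_j) + \lambda\,\mathrm{sign}(g_j) = 0,
% and g_j = 0 is optimal when |f_j| \le \lambda/2, so the minimizer is soft thresholding:
\[
  g_j \;=\; \mathrm{sign}(f_j)\,\max\bigl(|f_j| - \tfrac{\lambda}{2},\, 0\bigr).
\]
```

Under these assumptions the threshold is $\lambda/2$, applied to every coefficient regardless of dyadic level.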

The version of the paper contained on this web page is longer than the
published version, and contains the proofs of some theorems omitted from
the published paper, an extra section on biorthogonal wavelets, and a section
that claims, through several examples, that the human visual system is
quite sensitive to image processes that introduce changes to the Besov
smoothness of images.

Abstract:
We have used genetic programming to develop
efficient image processing software.
The ultimate goal of our work is to detect certain signs of breast cancer
that cannot be detected with current segmentation and classification methods.
Traditional techniques do a relatively
good job of segmenting and classifying small-scale features of mammograms,
such as micro-calcification clusters.
Our strongly-typed genetic programs work on
a multi-resolution representation of the mammogram,
and they are aimed at handling features at medium and large scales,
such as stellated lesions and architectural distortions.
The main problem is efficiency.
We employ program optimizations
that speed up the evolution process by more than a factor of ten.
In this paper we present our genetic programming system,
and we describe our optimization techniques.

Abstract: We describe the wavelet-vaguelette decomposition (WVD) for
solving a homogeneous equation $Y = Af + Z$, where $A$ satisfies
$\widehat{A^{\ast}Af}(\xi)
= {\vert{\xi}\vert}^{-2\alpha}\widehat{f}(\xi)$ for some $\alpha\ge0$.
We find a sufficient condition on functions to have a WVD. This result
generalizes Daubechies's work on the discrete wavelet transform.
We examine the relation between the WVD-based method and variational problems
for solving a homogeneous equation. Algorithms are derived as exact
minimizers of variational problems of the form: given the observed function
$Y$, minimize over all $g$ in the Besov space $B_{1,1}^{\beta_0}(R^d)$
the functional ${\|Y-Ag\|}_{\mathcal Y}^2+2\gamma{\vert{g}\vert}_{B_{1,1}^{\beta_0}}$,
where $\mathcal Y$ is a separable Hilbert space. We use the theory of nonlinear
wavelet approximation in $L^2(R^d)$ to derive accurate error bounds for
recovering $f$ through wavelet shrinkage applied to observed data $Y$ corrupted
with independent and identically distributed mean zero Gaussian noise $Z$.
We give a new proof of the rate of convergence of wavelet shrinkage that
allows us to estimate rather sharply the best shrinkage parameter. We conduct
tomographic reconstruction computations that support the hypothesis that
near-optimal shrinkage parameters can be derived if one knows (or can estimate)
only two parameters about a phantom image $f$: the largest $\beta$ for
which $f \in B_{p,p}^{\beta}(R^2)$, $p = {\frac{3}{\beta+3/2}}$, and the
seminorm ${\vert{f}\vert}_{B_{p,p}^{\beta}}$. Both theoretical and experimental
results indicate that our choice of shrinkage parameters yields uniformly
better results than Kolaczyk's procedure and the classical filtered backprojection
method.

Abstract: Because the Radon transform is a smoothing transform, any
noise in the Radon data becomes magnified when the inverse Radon transform
is applied. Among the methods used to deal with this problem for the Radon
transform and other homogeneous equations is the Wavelet-Vaguelette Decomposition
(WVD) coupled with Wavelet Shrinkage, as introduced by David Donoho. We
extend several results of Donoho and others here. First, we introduce a
new sufficient condition on wavelets to generate a WVD, which generalizes
a result of Daubechies on the discrete wavelet transform. For a general
homogeneous operator $A$, which class includes the Radon transform, we
show that a variant of Donoho's method for solving inverse problems can
be derived as exact minimizers of variational problems of the form: given
the observed data $Y$, minimize over all $g$ in the Besov space
$B_{1}^{\beta_0}(L_1(\Bbb R^d))$ the functional ${\|Y-Ag\|}_{\mathcal
Y}^2+2\gamma{\vert{g}\vert}_{B_{1}^{\beta_0}(L_1(\Bbb
R^d))}$, where $\mathcal Y$ is a separable Hilbert space containing the range
of $A$. We use the theory of nonlinear wavelet approximation in $L_2(\Bbb
R^d)$ to derive accurate error bounds for recovering $f$ through wavelet
shrinkage applied to observed data $Y$ corrupted with independent and identically
distributed, mean zero, Gaussian noise $Z$. One intriguing result of this
analysis is that there is only one value of $\beta_0$, depending on $\alpha$,
the homogeneity index of $A$, and $d$, for which the error remains bounded
no matter the number of observations or the value of the regularizing parameter
$\gamma$. (For the Radon transform, $\alpha=1/2$, and the optimal value
of $\beta_0$ is $d/2-\alpha=1/2$ in two dimensions.) We give a new proof
of the rate of convergence of wavelet shrinkage that allows us to estimate
rather sharply the best shrinkage parameter. We conduct tomographic reconstruction
computations that support the hypothesis that near-optimal shrinkage parameters
can be derived if one knows (or can estimate) only two parameters about
an image $f$: the largest $\beta$ for which $f \in
B_{p}^{\beta}(L_p(\Bbb R^d))$, $p = {{3}/{(\beta+3/2)}}$, and the semi-norm
${\vert{f}\vert}_{B_{p}^{\beta}(L_p(\Bbb
R^d))}$. Both theoretical and experimental results indicate that our choice
of shrinkage parameters yields uniformly better results than Kolaczyk's
variant of Donoho's method and the classical filtered backprojection method.

Abstract: Ronald Coifman and David Donoho suggested translation-invariant
wavelet shrinkage as a means of removing noise from images. Basically,
this applies wavelet shrinkage to a two-dimensional version of the semi-discrete
wavelet representation of Mallat and Zhong. Coifman and Donoho also showed
how the method could be implemented in $O(N\log N)$ operations, where there
are $N$ pixels, which compares to $O(N)$ operations for ordinary wavelet
shrinkage, and $O(N\log N)$ operations for the Fast Fourier Transform.
In this paper, we provide a mathematical framework for iterated translation-invariant
wavelet shrinkage, and show, using a theorem of Kato and Masuda, that with
orthogonal wavelets it is equivalent to gradient descent in $L_2(I)$ along
the semi-norm for the Besov space $B^1_1(L_1(I))$, which, in turn, can
be interpreted as a new nonlinear wavelet-based image smoothing scale space.
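
For readers unfamiliar with translation-invariant shrinkage, the following sketch shows a single pass of the idea on a one-dimensional periodic signal: shift, shrink, shift back, and average over all shifts. It is an illustration under simplifying assumptions, not the method analyzed in the paper. It uses Haar wavelets for brevity, it averages naively over all $N$ circular shifts (so it costs more than the $O(N\log N)$ implementation mentioned above), and the function names are hypothetical.

```python
import numpy as np

def haar_forward(x):
    """Orthonormal Haar transform of a 1-D array whose length is a power of two."""
    c = np.asarray(x, dtype=float).copy()
    n = c.size
    while n > 1:
        half = n // 2
        evens, odds = c[0:n:2].copy(), c[1:n:2].copy()
        c[:half] = (evens + odds) / np.sqrt(2.0)
        c[half:n] = (evens - odds) / np.sqrt(2.0)
        n = half
    return c

def haar_inverse(c):
    """Inverse of haar_forward."""
    x = np.asarray(c, dtype=float).copy()
    n = 2
    while n <= x.size:
        half = n // 2
        avg, det = x[:half].copy(), x[half:n].copy()
        x[0:n:2] = (avg + det) / np.sqrt(2.0)
        x[1:n:2] = (avg - det) / np.sqrt(2.0)
        n *= 2
    return x

def soft_threshold(c, t):
    """Shrink each coefficient toward zero by t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def shrink(x, t):
    """Ordinary (shift-dependent) wavelet shrinkage."""
    c = haar_forward(x)
    c[1:] = soft_threshold(c[1:], t)  # leave the overall average c[0] untouched
    return haar_inverse(c)

def ti_shrink(x, t):
    """Average of shrinkage applied to every circular shift of x."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    for s in range(x.size):
        out += np.roll(shrink(np.roll(x, s), t), -s)
    return out / x.size

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = np.repeat([0.0, 1.0], 16)
    noisy = clean + 0.1 * rng.standard_normal(clean.size)
    print(np.round(ti_shrink(noisy, 0.2), 2))
```

Averaging over all shifts makes the result commute with circular translations of the input, which ordinary shrinkage in a fixed dyadic basis does not.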

Abstract: We introduce new anisotropic wavelet decompositions
associated with the smoothness $\boldsymbol\beta$,
$\boldsymbol\beta=(\beta_1,\dots,\beta_d)$,
$\beta_1,\dots,\beta_d>0$ of multivariate functions as measured in
anisotropic Besov spaces $B^{\boldsymbol\beta}$. We give the rate of
nonlinear approximation of functions $f\in B^{\boldsymbol\beta}$ by these
wavelets. Finally, we prove that, among a general class of anisotropic
wavelet decompositions of a function $f\in B^{\boldsymbol\beta}$, the
anisotropic wavelet decomposition associated with $\boldsymbol\beta$ gives
the optimal rate of compression of the wavelet decomposition of $f$.

Abstract:
Functional (time-dependent) Magnetic Resonance Imaging can be used to
determine which parts of the brain are active during various limited
activities; these parts of the brain are called activation regions.
In this preliminary
study we describe some experiments that are suggested
by the following questions:
Does one get improved results by analyzing the complex image data rather than
just the real magnitude image data? Does wavelet shrinkage smoothing improve
images? Should one smooth in time as well as within and between slices?
If so, how should one model the relationship between time smoothness (or
correlations) and spatial smoothness (or correlations)? The measured data are
really the Fourier coefficients of the complex image---should we remove noise
in the Fourier domain before computing the complex images? In this preliminary
study we describe some experiments related to these questions.

Purpose: To evaluate the accuracy of a visually lossless,
image-adaptive, wavelet-based compression method for
achievement of high compression rates at mammography.
Materials and Methods:
The study was approved by the institutional review board
of the University of South Florida as a research study with
existing medical records and was exempt from individual
patient consent requirements. Patient identifiers were
obliterated from all images. The study was HIPAA
compliant. An algorithm based on scale-specific quantization of
biorthogonal wavelet coefficients was developed for the
compression of digitized mammograms with high spatial
and dynamic resolution. The method was applied to 500
normal and abnormal mammograms from 278 patients
who were 32-85 years old, 85 of whom had biopsy-proved
cancer. Film images were digitized with a charge-coupled
device-based digitizer. The original and compressed
reconstructed images were evaluated in a localization
response operating characteristic experiment involving
three radiologists with 2-10 years of experience in reading
mammograms.
Results: Compression rates in the range of 14:1 to 2051:1 were
achieved, and the rates were dependent on the degree of
parenchymal density and the type of breast structure.
Ranges of the area under the receiver operating characteristic
curve were 0.70-0.83 and 0.72-0.86 for original and
compressed reconstructed mammograms, respectively.
Ranges of the area under the localization response
operating characteristic curve were 0.39-0.65 and 0.43-0.71
for original and compressed reconstructed mammograms,
respectively. The localization accuracy increased an
average of 6% (0.04 of 0.67) with the compressed
mammograms. Localization performance differences were
statistically significant with P = 0.05 and favored interpretation with the
wavelet-compressed reconstructed images.
Conclusion: The tested wavelet-based
compression method proved to be an accurate approach for digitized
mammography and yielded visually lossless high-rate compression
and improved tumor localization.

Abstract:
Two observer experiments were performed to evaluate the performance
of wavelet enhancement and compression methodologies for digitized
mammography. One experiment was based on the localization response
operating characteristic (LROC) model. The other estimated detection and localization
accuracy rates. The results of both studies showed that the two algorithms
consistently improved radiologists' performance although not always in a
statistically significant way. An important outcome of this work was that lossy wavelet
compression was as successful in improving the quality of digitized
mammograms as the wavelet enhancement technique. The compression algorithm not
only did not degrade the readers' performance but it improved it consistently
while achieving compression rates in the range of 14:1 to 2051:1. The proposed
wavelet algorithms yielded superior results for digitized mammography relative
to conventional processing methodologies. Wavelets are valuable and diverse
tools that could make digitized screen/film mammography equivalent to its
direct digital counterpart leading to a filmless mammography clinic with full
inter- and intra-system integration and real-time telemammography.

Abstract:
In this paper we study finite-difference approximations to the variational problem using the BV smoothness penalty that was introduced in an image smoothing context by Rudin, Osher, and Fatemi. We give a dual formulation for an upwind finite-difference approximation for the BV seminorm; this formulation is in the same spirit as one popularized by the first author for a simpler, less isotropic, finite-difference approximation to the (isotropic) BV seminorm. We introduce a multiscale method for speeding the approximation of both Chambolle's original method and of the new formulation of the upwind scheme. We demonstrate numerically that the multiscale method is effective, and we provide numerical examples that illustrate both the qualitative and quantitative behavior of the solutions of the numerical formulations.

Abstract:
The Rudin-Osher-Fatemi variational model has been extensively studied and used
in image analysis. There have been several very successful numerical algorithms
developed to compute the minimizer of the discrete version of the ROF energy. We
study the convergence of numerical solutions of discrete total variation models to the
solution of the continuous model. We use the discrete ROF energy with a symmetric
discrete TV operator and obtain an error bound between the minimizer for the discrete
ROF model with a symmetric TV operator and the minimizer for the continuous ROF
model. Partial results are also obtained on error bounds of some non-symmetric
discrete TV minimizers.

Abstract:
We bound the difference between the solution to the continuous Rudin-Osher-Fatemi
image smoothing model and the solutions to various finite-difference approximations
to this model. These bounds apply to "typical" images, i.e., images with edges or
with fractal structure. These are the first bounds on the error in numerical methods for ROF smoothing.

Abstract:
A key bottleneck to high-speed chemical analysis, including hyperspectral imaging and monitoring of dynamic chemical processes, is the time required to collect and analyze hyperspectral data. Here we describe, both theoretically and experimentally, a means of greatly speeding up the collection of such data using a new digital compressive detection strategy. Our results demonstrate that detecting as few as ~10 Raman scattered photons (in as little time as ~30 microseconds) can be sufficient to positively distinguish chemical species. This is achieved by measuring the Raman scattered light intensity transmitted through programmable binary optical filters designed to minimize the error in the chemical classification (or concentration) variables of interest. The theoretical results are implemented and validated using a digital compressive detection instrument that incorporates a 785 nm diode excitation laser, digital micromirror spatial light modulator, and photon counting photodiode detector. Samples consisting of pairs of liquids with different degrees of spectral overlap (including benzene/acetone and n-heptane/n-octane) are used to illustrate how the accuracy of the present digital compressive detection method depends on the correlation coefficients of the corresponding spectra. Comparisons of measured and predicted chemical classification score plots, as well as linear and non-linear discriminant analyses, demonstrate that this digital compressive detection strategy is Poisson photon noise limited and outperforms total least squares--based compressive detection with analog filters.

Abstract:
Recent advances allow for the construction of filters with precisely defined frequency response for use in Raman
chemical spectroscopy. In this paper we give a probabilistic interpretation of the output of such filters and use
this to give an algorithm to design optimal filters to minimize the mean squared error in the estimated photon
emission rates for multiple spectra. Experiments using these filters demonstrate that detecting as few as ~10
Raman scattered photons in as little time as ~30 microseconds can be sufficient to positively distinguish chemical species.
This speed should allow "chemical imaging" of samples.

Abstract: Digital compressive detection, implemented using optimized binary (OB) filters, is shown to greatly increase the speed at which Raman spectroscopy can be used to quantify the composition of liquid mixtures and to chemically image mixed solid powders. We further demonstrate that OB filters can be produced using multivariate curve resolution (MCR) to pre-process mixture training spectra, thus facilitating the quantitation of mixtures even when no pure chemical component samples are available for training.

Figures: The journal Analyst embedded the images in the paper using the lossy compression algorithm DCTEncode rather than the lossless compression algorithm FlateEncode.
This introduced certain visual distortions into the images. We offer the interested reader the original figures as sent to the journal.

Abstract: The recently-developed optimized binary compressive
detection (OB-CD) strategy has been shown to be capable of using Raman
spectral signatures to rapidly classify and quantify liquid samples and
to image solid samples. Here we demonstrate that OB-CD can also be
used to quantitatively separate Raman and fluorescence features, and thus
facilitate Raman-based chemical analyses in the presence of fluorescence
background. More specifically, we describe a general strategy for fitting
and suppressing fluorescence background using OB-CD filters trained on
third-degree Bernstein polynomials. We present results that demonstrate
the utility of this strategy by comparing classification and quantitation
results obtained from liquids and powdered mixtures, both with and without
fluorescence. Our results demonstrate high-speed Raman-based quantitation
in the presence of moderate fluorescence. Moreover, we show that this
OB-CD based method is effective in suppressing fluorescence of variable
shape, as well as fluorescence that changes during the measurement process,
as a result of photobleaching.

Abstract: Several methods have been proposed to reduce boundary artifacts
in image deblurring. Some of those methods impose certain assumptions on
image pixels outside the field-of-view; the most important of these assume
reflective or anti-reflective boundary conditions. Boundary condition methods, including reflective and anti-reflective ones, however, often fail to reduce
boundary artifacts, and, in some cases, generate their own artifacts, especially
when the image to be deblurred does not accurately satisfy the imposed condition. To overcome these difficulties, we suggest using free boundary conditions,
which do not impose any restrictions on image pixels outside the field-of-view,
and preconditioned conjugate gradient methods, where preconditioners are designed to compensate for the non-uniformity in contributions from image pixels
to the observation. Our simulation studies show that the proposed method outperforms reflective and anti-reflective boundary condition methods in removing
boundary artifacts. The simulation studies also show that the proposed method
is applicable to arbitrarily shaped images and has the benefit of recovering
damaged parts in blurred images.