scheme consists in the imperceptible embedding of a string of bits,
namely the watermark code b, within a host asset A. Embedding is
achieved by modifying a set of host features F(A) according to the content
of b.
In some cases, it is convenient to transform the watermark code b in
a signal w, called the watermark signal, which can be more easily hidden
within A. In this case, watermark embedding amounts to the insertion of
w within J-(A). In detectable watermarking the detection process consists
in assessing whereas w is contained in A or not. With readable watermarking,
a further step is needed, since the bit string b must be inferred from
w. By following the digital communication analogy, the transformation of
the information string b before its injection within the host asset, can be
paralleled to line coding or digital modulation, where the bits to be transmitted
are associated to a set of suitable waveform signals which are more
easily transmitted through the channel. Unlike in digital communication,
though, we will intend information coding in a wider sense, thus letting the
transformation from b to w encompass operations such as channel coding
or message repetition as well.
The interpretation of the watermarking process as the insertion of a
watermark signal w within the host asset, stems from the analogy with
digital communication, where the to-be-transmitted signal is first injected
within the channel, then it is recovered by the receiver. Though widely
used in the early years of watermarking research, such a viewpoint is not
the only possible. One could first specify how the detector works, defining
exactly the detection region in the asset space, then look at data embedding
as the mapping of the host digital asset into a point inside the detectionDirect embedding paradigm. Given a watermark detection region Rw
(grey region) in the feature space, watermark embedding is simply seen as the
mapping of the host asset within a point inside R^,. The same action can be seen
as the addition of an asset dependent signal Wi (informed embedding).
region. According to this perspective, watermark embedding can be seen
as a direct modification of the host features, rather than their mixing with
a watermark signal.
Figure 3.1 summarizes the above concepts. Instead of being defined by
a signal w, the watermark is seen as a detection region Rw in the feature
space. Marking a host asset Aj, then reduces to mapping it into a point
within Rw (direct embedding). Note that the same action can be seen as
the insertion (here exemplified by a vector addition) of an asset-dependent
signal Wj within the host asset A^.
From the point of view of information coding, the procedure described
in figure 3.1 can be interpreted in two equivalent ways. According to the
former, information coding reduces to the definition of the detection region
Rw and to channel coding1. Alternatively, the role of the watermark signal
w may be kept, and information coding seen as the mapping of b into an
asset-dependent signal w. In this book we will use the first viewpoint, in
that the choice of the asset-dependent signal w» will be seen as part of the
watermark embedding process, rather than as part of information coding.
Accordingly, we will discuss channel coding in this chapter (section 3.4)
and leave the mapping of the host asset into a point in Rw to the next
chapter. As to the definition of detection regions, the discussion of such a
topic is postponed to chapters 4 and 6.In the rest of the book, we will refer to schemes for which watermarking
passes through the definition of an asset-independent signal w as blindembedding
or waveform-based schemes, in contrast to informed embedding
schemes which allows the use of an asset-dependent signal w$. Informed
embedding schemes will also be referred to as direct embedding schemes,
according to the interpretation that sees the watermarking process as the
mere mapping of the host asset A within the proper detection region2.
With regard to this chapter, information coding in detectable watermarking
is discussed in section 3.1, where particular attention is given to
waveform-based coding. Readable watermarking is treated in sections 3.2
and 3.3, where waveform-based and direct embedding schemes are considered
respectively. Finally, the use of channel coding as a mean to improve
the reliability of readable watermarking algorithms is described in section
3.4.
3.1 Information coding in detectable watermarking
In waveform-based detectable watermarking, information coding is straightforward.
Let B = {bj, b 2 , . . . b2fc} be the set of possible watermark codes.
The information coding process corresponds to the definition of a set of
digital waveforms W = {wi, w2, . . . WM} (M > 2f c), arid a coding rule $
that maps each b e B into a distinct element of W. At the same time,
watermark detection can be seen as a classical problem of signal detection
within noise. For each element of W, the detector defines a detection
region and a non-detection region, if the analyzed asset lies within the detection
region, the detector decides for the watermark presence, otherwise
the watermark absence is established. In the following, we will review a set
of different ways of defining W and <3>. More specifically, we will consider
information coding through PN-sequences and orthogonal sequences (sections
3.1.1 through 3.1.3). We will also discuss the use of colored (section
3.1.5) pseudo-random sequences, of periodic self-synchronizing sequences
(section 3.1.4), and information coding by means of chaotic sequences (section
3.1.6). The section ends with a brief discussion of direct embedding
watermarking, leaving a more detailed description on this kind of techniques
to the next chapter.
3.1.1 Spread spectrum watermarking
By relying on the observation that digital communications through very
noisy channels, possibly affected by intentional disturbs such as jammingor interferences, are usually based on spread spectrum technology, many
of the watermarking algorithms developed so far use a similar technique
to code the to-be-hidden information. To be more specific, each message
b € B is transformed into a pseudo-random sequence w of proper length
n (in agreement with the spread spectrum paradigm, n is usually much
larger than fc). In this case, Wi's are random variables drawn from a given
probability density function (pdf) pWi(w)3. In most of the cases t/Vs are
identically distributed variables, that is pWi(w) = pw(w) = p(w), where we
omitted the subscript w for sake of simplicity. As to the choice of p(w),
possible solutions include a normal pdf, N(0, a2), for which:
a uniform pdf:
(
In some applications, the bipolar distribution is conveniently replaced
by a three-valued distribution:
{ —a with probability p
0 with probability l-2p (3.3)
+a with probability p
Each of the above solutions presents a number of advantages and drawbacks,
however no definitive arguments exist demonstrating the superiority
of one pdf over the others. Sometimes, it is necessary that the watermark
signal be limited, e.g. to ensure invisibility, to exactly control the maximum
modification of host features, or to avoid that the host features change sign
thus loosing their meaning4. In this case, the normal pdf can not be used,
thus limiting the choice to the uniform and the bipolar distributions. A
drawback with the bipolar distribution is that, for a given peak distortion,
it results in a higher average distortion than the uniform distribution, due
to its larger variance. On the other side, it also ensures a higher transmission
rate, in that the bipolar pdf is the capacity-achieving distribution
under a peak distortion constraint.
A problem with discrete valued watermarks is their weakness against
the collusion attack. In the collusion attack, it is assumed that t copies
of the same asset, each marked with a different watermark, are available5.The attacker tries to remove the watermark by averaging the t assets in
his/her hands. One possible defense against the collusion attack consists of
making it unfeasible by trying to increase the minimum number of copies
necessary to remove the watermark as much as possible. This is not the case
with a binary valued watermark. Suppose, for example, that watermark
insertion is achieved by either adding a or —a to /j. Then, an attacker
only needs to find out two documents in which fa takes different values,
and averaging them, since in this way the watermark is completely erased.
It is clear that the use of continuous valued watermarks can give greater
robustness to this kind of attack. More specifically, it is found that the
normal distribution greatly outperforms the uniform one, since a much
larger number of copies are needed to effectively remove the watermark.
Such a behavior can be explained by noting that the collusion attack may
be thought of as a problem of signal estimation in noise, where the host
coefficient is the constant-valued signal and the various instances of w»
represent noise. It is known that, at least for the additive case, the gaussian
noise leads to the worst estimation accuracy.
^.Prorn a general point of view, it is essential that p(w) has a zero mean,
since it is known from digital communication theory that in this way the
transmission power can be minimized without increasing the error probability.
In a data hiding scheme this results in a lower distortion, for a given
level of robustness. Moreover, some detection schemes explicitly exploit
the knowledge that the expected value of Wi, nWi = E[wi\, is equal to zero.
In addition to being identically distributed, watermark coefficients are
usually designed so to be independent of each other, i.e. w^'s are independent
random variables. This leads to a white watermark, in that the power
spectrum of the watermark signal is a flat one. Though very popular, the
adoption of a white watermark signal is not always the best choice. In some
cases, in fact, it is preferable to accurately shape the watermark spectrum
in order to make it more robust and less perceptible at the same time.
Power shaping of w is treated in more detail in section 3.1.5.
Continuous pseudo-random sequence generation
According to the above ideal formulation of spread spectrum watermarking,
it is necessary that a sequence {w\, w-i... wn } of random numbers following
a given pdf is generated. This is known to be a very hard problem, that
has received considerable attention due to its importance in many different
fields including computer simulations, Monte Carlo methods, and CDMA6
digital communication.Let us consider first the continuous case. Our aim is to generate a
sequence of random numbers (RN sequence) with a given, continuous, distribution.
In the present state of the art, however, this can not be done
directly. On the contrary, an RN sequence z uniformly distributed in an
interval (0, TO) is first generated, then, a new RN sequence having the desired
pdf is obtained by properly operating on z. Note that z is not truly
continuous, since it only takes integer values (all the integers in (0, TO)),
nevertheless if m is sufficiently large, an RN sequence which closely approximates
an RN sequence uniformly distributed in (0,1) can be obtained
by dividing z by TO:
«i = -. (3.4)
TO
The most general algorithm for generating an RN sequence has the following
form:
Zi = g(zt-i • • • Zi-r) mod TO, (3.5)
where g{zi-\... z»_r) is a function depending on the last r values of the
RN sequence. Note that, due to the presence of the mod operator, z± is
the remainder of the division of g(zi-\... Zj_r) by TO. The simplest and
one of the most effective RN generators is the Lehmer's algorithm based
on the following recursion formula:
f zi = azj_i mod TO, i > 1
(3.6)
zo = 1
where TO is a large prime number and a is an integer. Alternatively, equation
(3.6) can be written as
Zi = a1 mod TO. (3.7)
Due to the mod operator, the sequence z^ takes values between 1 and
TO — 1; hence after TO steps at least two equal values are found, thus permitting
us to conclude that z^ is a periodic sequence with period TOO < TO — 1.
Of course, the periodic nature of Zj does not agree with the randomness requirement,
nevertheless, if the required number of samples is lower than TOO,
periodicity is not a problem. To take into account the imperfect randomness
of Zi, RN sequences generated through mathematical formulas such
as those expressed in equations (3.5) through (3.7) are usually referred to
as pseudo-random or pseudo-noise sequences (PN sequences for short). A
possible choice of m, suggested by Lehmer in 1951, is 231 — 1, leading to a
period TOO which is large enough for most practical applications. To complete
the specification of the Lehmer generator, the value of the multiplier
a must be chosen. A first consideration leads to searching for a number
a which maximizes the period of Zj, i.e. TOO = TO — 1. To do so, let us
introduce the notion of primitive root of TO.m, if the smallest n such that
an = 1 mod m, (3.8)
is n = m — 1.
It can be shown that each prime number has at least one primitive
root7. It is obvious from the above definition, that mo = m — 1 if and only
if a is a primitive root of m. Unfortunately, letting a be a primitive root
of m only ensures that the period of Zi is m — 1, but it does not guarantee
that Zi is a good PN sequence. Without going into much details about
the definition of what a good PN sequence is, we can say that a good PN
sequence should pass a number of randomness tests aiming at assessing
whether the characteristics of the sequence agree with those of an ideal
RN sequence. For a selection of the specific randomness tests z^ has to be
subjected to, the particular application served by z^ must be considered.
An example of a good choice of a and m, which has been used effectively
in a variety of applications, is:
a = 16807, m = 231 - 1 = 2147483647. (3.9)
In the framework of the data hiding scenario, the most important properties
of Zi are the adherence to the uniform distribution, and the lack of
correlation between subsequent samples. In many applications it is also
necessary that a sufficiently large number of sequences can be generated.
This follows directly from the condition that the number K of watermark
signals must be larger than 2fc. The generation of all the sequences w» 6 W
by starting from the Lehmer algorithm goes through the observation that
when a is chosen in such a way that mo = m — 1, the initial state of
the generator z0 can assume any integer value in [1, m — 1], with different
choices resulting in sequences that are cyclically shifted version of the same
sequence. If n is much smaller than m, then the sequences w^'s can be obtained
by running Lehmer's algorithm several times, each time by varying
ZQ. For example, a common way to vary ZQ consists in generating it randomly,
e.g. by using the computer clock. Of course, the larger the number
of sequences, the higher the probability that two sequences are generated
exhibiting large cross-correlation values, possibly invalidating the effectiveness,
and security, of the whole data hiding system. Alternatively, z0 could
be varied systematically. Suppose, for example, that the length n of the
watermark signal is 16,000, and that the Lehmer generator is used withTO = 231 — 1. The generator can be run m/n ~ 128,000 times, each time
producing a completely different sequence, thus accommodating approximately
217 bit-strings.
By relying on the pseudo-random sequence it,;, having a uniform distribution
in (0, 1), a random sequence Wi following any desired pdf can be
built. For example, to generate a uniform sequence taking values in any
finite interval (a, b) it only needs to compute wt = Ui(b — a) + a. The
construction of a normally distributed sequence is more tricky. Due to the
importance that the normal distribution has in many applications, several
methods have been developed to build a normal PN sequence by starting
from a uniform one. We will review the most popular ones, namely the
central-limit method and the Box-Muller method.
Given / independent identically distributed random variables xi:x% . . . x;,
having mean fj,x and variance a^., it is known from the central limit theorem8
that, for / — * co, the random variable:
tends to the standardized normal distribution, i.e.:
lim p(yi] = IV(0, 1). (3.11)
I— too
The above equations suggest that if we take / pseudo-random numbers
uniformly distributed in (0, 1) and we consider
Ei=i
then, for I large enough (commonly / = 12 is used), yt can be considered to
be normally distributed with zero mean, and unitary variance. A drawback
with the above method is that computing time is rather high, since only
one normal sample is obtained from 12 uniformly distributed numbers.
Additionally, the normal sequence produced by equation (3.12) takes values
in (—1, /) thus failing to approximate the normal distribution for large values
of y.
The Box-Muller algorithm represents a valuable alternative to the usage
of the central limit theorem. This method, originally introduced by
G.E.P.Box and M.E.Muller in 1958, relies on the following observation.Given two random variables u and v uniformly distributed in (0, 1), the
random variables
x = (— 2lnu}2cos2TTv
(3.13)
y = (— 21nwj2 senl-nv
follow a standardized normal distribution N(0, 1)9. Then, to generate a
normally distributed sequence, it only needs to generate two uniformly
distributed sequences, and use equations (3.13), to build the normal random
variables x and y. The Box-Muller method is usually preferred to methods
based on the central limit theorem, because of its lower complexity and,
most of all, because it produces a sequence which better approximates the
normal distribution.
Binary PN sequences
A binary PN sequence can be easily obtained by starting from a continuous
pseudo-random number generator. For example, if a PN sequence M» which
is uniformly distributed in (0, 1) is available, one can build a new sequence
u>j by letting:
f +1 if Ui >0.5
Wi = { (3.14)
\-l if Ui<0.5 { '
The sequence Wi is a bipolar i.i.d. sequence taking value —1 or +1 with
equal probability.
The use of pseudo-random generators ensures that the desired characteristics
of the watermark signal are satisfied on a probabilistic basis.
A possible alternative to the use of a continuous pseudo-random number
generator consists in the usage of maximum length sequences (m-sequences
for short).
Maximum length sequences are generated by means of an m-stage shift
register with feedback, as exemplified in figure 3.2. Due to the finite number
of states of the shift register, the output sequence is a periodic one, with
maximum period equal to 2m — 1, where m is the number of stages of
the shift register. The actual period of the output sequence depends on
feedback connections. When the connections are chosen in such a way
that the output period is maximum, the sequence of bits produced by the
shift register is called a maximum-length sequence. The configurations of
feedback connections ensuring a maximum period can be derived by relying
on cyclic coding theory and Galois fields and have been extensively studied
for digital communication applications. In Table 3.1, possible shift registerconnections for generating m-sequences with m ranging from 2 through
34 are shown. Of course, the output of an m-stage shift register such as
that depicted in figure 3.2, takes value in {0,1}. However, it is trivial to
map this sequence into a bipolar sequence with elements in {—1,1}. From
now on, when speaking about m-sequences we will always mean the bipolar
sequence associated to the {0,1} sequence at the output of the shift register.
Table 3.1: Possible shift register connections for generating m-sequences of different
lengths.
Connected
stages
m
Connected
stages m
Connected
stages
2
3
4
5
6
7
8
9
10
11
12
1,2
1,3
1,4
1,4
1,6
1,7,
1,5,6,7
1,6
1,8
10
1,7,9,12
13
14
15
16
17
18
19
20
21
22
23
1,10,11,13
1,5,9,14
1,15
1,5,14,16
1,15
1,12
1,15,18,19
1,18
1,20
1,22
1,19
24
25
26
27
28
29
30
31
32
33
34
1,18,23,24
1,23
1,21,25,26
1,23,26,27
1,26
1,28
1,8,29,30
1,29
1,11,31,32
1,21
1,8,33,34
Maximum length sequences possess a number of interesting properties
resembling those of ideal RN sequences. First, each period has exactly
2m~1 positive and 2m~l — 1 negative coefficients, thus ensuring that the
sequence average is very close to 0 (the average value of a bipolar msequence exactly equals l/(2m — 1)). Ideally a PN sequence should also
have correlation properties that are similar to those of white noise. That
is the autocorrelation Rw(l) should be L for I — 0 and 0 for 1 < / < L — 1,
with L = 2m — 1 denoting the sequence period. In the case of m sequences,
we have
Rw(l) = \L . l/
f [~,* T . (3.15)
1 - 1 if 1 < / < L — 1
It is evident that for large values of m m-sequences are very close to ideal
RN sequences from the autocorrelation point of view. A set of different
m-sequences having length m can be generated by using different feedback
connections. However, the number of possible m-sequences for a given
m is not unlimited, as it is shown in table 3.2. System designers must
take carefully into account the limits expressed in table 3.2, since they
directly impact the maximum number of different watermarks that can be
accommodated by the system.
Table 3.2: Number of different m-sequences for several values of TO.
m
3
4
5
6
7
number of sequences
2
2
6
6
18
m
8
9
10
11
12
number of sequences
16
48
60
176
144
In most cases, the cross-correlation properties of different watermark
signals are as important as the autocorrelation properties. For example,
when two or more watermarks have to be inserted within the same asset,
it is desirable that the two watermarks do not interfere each other, a con-
Table 3.3: Peak cross-correlation of m-sequences.
m
3
4
5
6
7
Rmax/R(Q]
0.71
0.60
0.35
0.36
0.32
) m
8
9
10
11
12
Rmax/R(0)
0.37
0.22
0.37
0.14
0.34dition that is usually met if the PN sequences corresponding to different
watermarks are mutually uncorrelated. Unfortunately, m-sequences do not
meet this constraint, since they may exhibit a rather high cross-correlation.
As it can be seen by observing the values reported in table 3.3, the max
value of R ( l ) for I / 0 , say Rmax, of the cross-correlation function may be
rather high (30% of the peak value, i.e. -R(O), on the average).
A possible solution consists in selecting a sub-set of all possible resequences
that have a smaller cross-correlation, however in this way the
number of admissible watermarks may become too small. Gold sequences
are a valid alternative to m-sequences. Gold sequences are generated by
selecting a pair of m-sequences, called preferred m-sequences, and summing
them modulo-2, as depicted in figure 3.3.
If m is odd, the maximum value of the cross-correlation between any two
pairs of sequences is Rmax = Y/2L, whereas for m even, we have Rmax =
\TL. Given a set of M binary sequences of period L, it is known that a
lower bound on their maximum cross-correlation is given by
Rmax <
M -I
LM-1'
(3.16)
then, for large values of L and M, Gold sequences are almost optimal.
Another advantage of Gold sequences with respect to m-sequences, is
their abundance. For a fixed length L, in fact, up to L + 2 different Gold
sequences can be generated (see table 3.4), a number that greatly outperforms
the number of possible m-sequences of the same length.
3.1.2 Orthogonal waveforms watermarking
Though the use of Gold sequences improves the cross-correlation properties
of binary PN-sequences, better results can be obtained by using watermarking
sequences which are expressly designed to be orthogonal of each other.Table 3.4: Number of different Gold sequences for different values of m.
m
3
4
5
6
7
number of sequences
9
17
33
65
129
m
8
9
10
11
12
number of sequences
257
513
1025
2049
4097
In this way, the cross correlation between different signal is equal to zero
and the interference between two watermarks which are simultaneously
present in the same asset minimized.
A solution that is commonly adopted consists in letting W coincide
with the set of Walsh-Hadamard sequences of length n.
Walsh sequences may be generated in many different ways, however,
the easiest passes through the definition of Hadamard matrices (hence the
name Hadamard-Walsh sequences). Hadamard matrices, are squared matrix,
whose possible orders are limited to the powers of two, hence Walsh
sequences will be limited to lengths of n = 2m. Hadamard matrices are
recursively obtained by starting from the lowest order matrix, defined by:
1
-1
(3.17)
Higher order matrices are obtained through the recursive relationship
Hn = > H-2] (3.18)
where
the matrix Hn/2 is multiplied by the matrix H^. For instance, for n = 4
and n = 8, we have:
14 =
i
-i
i
-i
ii
-i
-i
i—
i
-i
i
(3.19)
The difference between Walsh sequences and Hadamard matrices only consists
in the order in which the codes appear. The rows in the Hadamard
matrices are first re-ordered according to the number of zero-crossings in
each row: Walsh sequences correspond to the rows of these re-ordered matrices.
For instance, for n — 4, Walsh sequences correspond to the rows ofthe matrix:
" 1 1 1 1
1 i _i _i
(3.20)
1 1 1 1
1 1 - 1 - 1
1 - 1 - 1 1
1 - 1 1 - 1
Due to orthogonality, in a perfectly synchronized environment, interference
between Walsh sequences is zero10.
To increase the number of sequences (and hence the number of admissible
users) and the average distance between signals in W, bi-orthogonal
sequences may be used, where each orthogonal signal in W is further modulated
by a binary antipodal symbol ±1. More specifically, given a set of n
orthogonal sequences W, a set W consisting of In bi-orthogonal sequences
is built as follows:
(3.21)
The use of bi-orthogonal sequences permits to double the number of watermarking
signals with performances which are very close to those of orthogonal
sequences.
3.1.3 Orthogonal vs PN watermarking
In the attempt to summarize the pro's and con's of PN and orthogonal
watermarking, we will take into account several points of view, referring to
different practical scenarios.
Single user watermarking
It is well known from digital communications theory that for single user
scenarios, no advantage is gained from the use of spread spectrum modulation,
unless the possible presence of a jammer is considered (see below in
the text). On the contrary, it is advisable to deterministically design the
watermarking signals in such a way that they are as far apart as possible,
thus minimizing the possibility that the presence of the wrong watermark
is erroneously detected. From this perspective, the use of bi-orthogonal
sequences is the most advisable solution.Non-synchronized watermarking
The above conclusion does not hold when the possibility that the watermark
embedder and the watermark detector are not synchronized. As we
will see later on in the book, such a situation is very common, when the
watermarked asset is processed, either by a malevolent or a non-malevolent
user.
Due to their peaked autocorrelation function, PN sequences provide
a simple mean to recover the synchronism between embedder and detector.
In most cases, it only needs to compute the correlation between the
searched watermark and the watermarked asset. A pronounced peak in
the correlation, in fact, indicates that the watermark contained within the
asset and the one the detector is looking for are synchronized.
In general, this is not possible with orthogonal watermarking sequences,
since in this case the autocorrelation function is not necessarily peaked.
Let us consider, for example, the Walsh sequences depicted in figure 3.4
together with their autocorrelation. As it can be seen, in this case the
autocorrelation functions do not present a unique, well pronounced, peak,
thus making synchronization more cumbersome. Actually, this result was
expected, since the good synchronization properties of PN sequences, ultimately, relies on the spreading property (wide spectrum) of these sequences,
a property which does not hold for orthogonal sequences, such as the Walsh-
Hadamard ones.
Multiple watermarking
When the possibility that more than one watermark is embedded within
the same asset is taken into account, the choice of the watermarking sequence
is more critical11. Such a situation resembles a classical multiple
access scenario, where the same communication medium is shared between
different users. By relying on digital communication theory, we can divide
multiple access techniques into two main categories: orthogonal waveform
multiple access (OWMA), including Frequency Division Multiple Access
(FDMA), Time Division Multiple Access (TDMA) and Orthogonal-Code
Division Multiple Access (OCDMA), and Pseudo-Noise Code Division Multiple
Access (PN-CDMA), including conventional direct sequence and frequency
hopping CDMA. In the watermarking framework, it is readily seen
that the use of orthogonal watermarking sequences may be paralleled to
OWMA (OCDMA in particular), whereas spread spectrum watermarking
relies on the operating principle of PN-CDMA.
When the number of users that can be accommodated on a given channel
is taken into account, the two different approaches show a different
behavior. In OWMA no interference is present, up to a number of n users,
which is the hard limit on the maximum number of users admitted by the
system. In PN-CDMA, interference appears as soon as two active users
are present, however no hard limit on the maximum number of users exists,
such a limit depending on the desired performance and the allowed
Signal to Interference Ratio (SIR). As a matter of fact, this is the reason
why PN-CDMA is often used in overloaded environments, but it is only
rarely adopted in situations where the number of simultaneous users is
lower than n. Apart from the above general principles, it must be pointed
out that in some cases it is not easy to generate a large number of spread
sequences having a reasonably low cross-correlation, thus limiting the number
of users that can be practically accommodated by the system. For
example, if Gold sequences are used, the maximum number of users that is
allowed is roughly the same as for orthogonal sequences (n — 1 vs n + 1).
This is not the case for continuous-valued signals, where the number of
spread sequences is much larger than n, which is the maximum number of
possible, continuous-valued, orthogonal sequences of length n.Watermarking in hostile environments
A (heuristic) rationale for using a PN-CDMA comes from the original applications
such a technology was devised for. Actually, PN-CDMA derived
directly from direct sequence spread spectrum systems originally developed
for military communications system. More specifically, three properties of
spread spectrum communication, make it suitable for the hostile environments
typical of military applications: discreteness, low intercept probability
and robustness against jamming. When considering the peculiarities
of security-oriented applications of watermarking, it immediately turns out
that the same properties make PN-CDMA an ideal technology for a watermarking
system to rely on. Of course, many important differences exist
between watermarking and secure digital communications, nevertheless, the
intrinsic security of PN-CDMA technology suggests that some advantages
may be got from its use in terms of robustness and overall system security.
A summary of the main properties of PN- and orthogonal-sequence
watermarking is given in table 3.5.
Table 3.5: Comparison between PN and orthogonal sequences. In both cases
n indicates the sequence length and m the number of simultaneous watermarks
possibly embedded within the host asset.
Multiple access
(m < n)
Multiple access
(m > n)
Synchronization
Robustness/security
Spread spectrum
watermarking
Interference grows linearly
with m
Interference grows linearly
with m
Easy due to wide spectrum
Good security and robustness
(not theoretically
proved)
Orthogonal sequence
watermarking
No interference
Not possible
Cumbersome
Lack of security
In order to overcome some of the limitations of orthogonal sequence
watermarking, this approach may be combined with spread spectrum. To
be more specific, let w be a binary antipodal spread spectrum sequence
and {hj}"=1 be a set of orthogonal sequences, e.g. the Walsh-Hadamardsequences. By starting from w and hi a new set of sequences, {qi}"=i, is
built by multiplying each sequence hj by w, i.e.:
q; = hi • w. (3.22)
Note the the sequences {q^} are still orthogonal since
(q*, q.7-} = hi}khjtkwkwk — hi}khjtk = EhS(i - j ) , (3.23)
k=l k = l
where by <>(•) the Kronecker impulsive function is meant12, and Eh is the
energy of the sequence h. The new set of sequences q; has now a more
peaked autocorrelation function and is more secure than hj since its knowledge
is constrained to the knowledge of the spreading sequence w. It still
remains valid, though, the the maximum number of users that can be
accommodated through orthogonal signalling is upper bounded by the sequence
length n.
3.1.4 Self-synchronizing PN sequences
As it will be discussed later on in the book (chapter 7), loss of synchronization
in one of the most serious threat to watermark retrieval. Though
the watermark is virtually intact, the detector is not able to recover it,
since it does not know its exact geometric configuration, e.g. its position
in the host asset or its scale factor. A possible way to recover the
synchronism between the encoder and the detector, consists in the use of
auto-synchronizing watermark signals. In the most common case, an autosynchronizing
watermark is nothing but the periodic repetition of the same
PN signal13. Note that the exact nature of the periodic repetition depends
on the host asset media, thus in the audio case we have a ID periodic
repetition, whereas in the image case, repetition has to be intended in a
2D sense. The periodic nature of the watermark may be exploited to infer
important geometric properties of the watermark such as orientation14 or
scale. To exemplify how watermark periodicity can be exploited to estimate
the watermark scale, let us consider the simple case of a zero-mean,
mono-dimensional watermark which is added to a host audio asset S. Also
assume that watermark embedding is performed in the time domain, i.e.
the set of host features corresponds to the audio sample Sj's. We have:
Sw = S + w,or, alternatively:
sw,i = Si + Wi, (3.25)
where sWti denotes the i-th watermarked audio sample. If S and w are
assumed to be uncorrelated, the autocorrelation function of 5W is:
^(0 = ^(0 + ^(0- (3-26)
By assuming that S is not periodic and that the period L of the watermark
signal is much lower than the cardinality of S, RSvt (I) will exhibit a set
of periodic peaks corresponding to the periodic peaks of Rw(l) (see figure
3.5). The distance between such peaks is a clear indication of the period of
w. By comparing the original period of w and that estimated by looking
at the autocorrelation function, an estimate of the current watermark scale
with respect to the original one can be obtained, thus making watermark
detection easier.
3.1.5 Power spectrum shaping
The power spectrum of the watermark signal, affects the performance of a
data hiding system in a twofold way: first it determines the perceptibility
of the watermark, second it impacts watermark robustness. Unfortunately,
a direct dependency between the watermark and the signal power spectrum
can not be easily established without taking into account the exact
embedding rule. Consider, for example, the case of an additive watermark
embedded in the time domain of an audio asset S. In this case, the watermark
power spectrum directly modifies the power spectrum of S. If we
assume that w and S are independent of each other, in fact, the power
spectrum of the watermark and that of S sum together to form the power
spectrum of the watermarked asset. It is rather easy, then, to analyze the
influence of the power spectrum shape on watermark audibility and robustness.
This is not the case if a multiplicative embedding rule is adopted,
since the power spectrum of the resulting watermarked asset is a complex
combination of the spectra of 5" and w (also involving spectrum phase).
Similar considerations hold if the set of host features does not coincide
with plain asset samples. If watermark insertion is performed in the frequency
domain, for example, the shape of the spectrum of the watermarked
asset depends on the frequency location of host features, regardless of the
power spectrum of w.
In spite of the above difficulties, properly shaping the power spectrum
of w has received considerable attention in the technical literature, both
because of its impact on watermark robustness, and because of the importance
of time/spatial domain watermarking. If early research focused onwhite watermarks, as a direct extension of spread spectrum communication,
it soon became evident that a superior robustness (and a lower perceptibility)
could be achieved by adopting colored watermark signals. Thus some
authors used high-frequency watermarks, since they can be more easily separated
from the host signal. Others proposed to embed watermarks which
are perceptually similar to the host asset, pointing out that in this way
watermark masking is more easily achieved and that an attacker can not
modify the watermark without severely degrading the host asset as well.
Eventually, by relying on the observation that low-pass watermarks may
result too perceptible and that high-pass watermarks are too susceptible
to attacks, it has been proposed to use band-pass watermarking signals.
As a matter of fact, when a rigorous analysis of the trade-off between perceptibility,
robustness and capacity is carried out, the necessity of carefully
shaping the watermark power spectrum is confirmed. The optimal shape of
w, however, heavily depends on the possible attacks, the embedding rule,
and the distortion metric used to measure asset degradation as a consequence
of attacks and watermark embedding. Possible choices range from
a white watermark to a PSC-compliant15 watermark, where the shape of
the watermark power spectrum is derived through complex optimization
procedures aiming at maximizing a given measure of the performance of
the data hiding system (e.g. watermark capacity or probability of correct
detection). In most of the cases, though, the heuristic rule leading to a
band-pass watermark is confirmed.
Direct generation of PN sequences having a desired power spectrum, or
equivalently a desired autocorrelation function, is not easy to achieve. To
get around the problem, a white PN sequence is usually generated, then
the white sequence is linearly filtered to obtain the desired autocorrelation
characteristics. Note, however, that due to filtering, the pdf of the
PN sequence is changed, unless the PN sequence is normally distributed,
thus leading to possible difficulties in applications where both the power
spectrum and the watermark pdf are exactly specified.
3.1.6 Chaotic sequences
Though the use of pseudo-random sequences is by far the most common
way to generate the watermarking signal w, some alternative possibilities
have been proposed. Among them, chaotic sequences have received a considerable
attention for their good properties in terms of unpredictability,
power spectrum design, and ease of use. More specifically, watermarking
signals generated through n-way Bernoulli shift maps, seem to present someadvantages with respect to conventional PN sequences16.
An n-way Bernoulli shift Bn(x) is denned by the following expression:
x' = Bn(x) = nx mod 1 (3.27)
where n is an integer. Bernoulli shifts generate a piecewise aftine chaotic
map denned in the [0,1] interval (see figure 3.6, for an example of a Bernoulli
shift map with n = 5).
A Bernoulli chaotic watermark is easily generated though the iterative
applications of Bn(x), as described by the following recursive equation:
- Bn(wi) mod 1. (3.28)
Note that despite the similarity with PN sequences, chaotic sequences possess
rather different properties due to the continuous nature of equations
(3.27) and (3.28), whereas the generators at the basis of PN sequences
deal with integer numbers17. The watermark sequence generated through
equation (3.28) heavily depends on the sequence starting point WQ. More
specifically, if WQ is an irrational number the sequence exhibits a chaotic
non-periodic behavior. If WQ is chosen randomly, the sequence itself can
16Any claim about the superiority of chaotic sequences with respect to PN sequences,
though, must be checked with particular care, since the current state of the art in
watermarking is not mature enough to draw any ultimate conclusions on the matter.
17Of course, the practical implementation of Bernoulli shifts must rely on finite precision
arithmetic, nevertheless if double precision, floating point, numbers are used, thebe regarded as a random sequence, whose characteristics depend on the
particular choice of n. From a watermarking perspective, WQ corresponds
to the watermark key K, thus playing a role similar to that played by the
seed used to initialize the PN generator at the basis of PN watermarking.
It can be shown that the uniform distribution is an invariant pdf for the
n-way Bernoulli shift maps, in that if WQ is uniformly distributed, then
Wi is uniformly distributed too. As a consequence /j,Wi = 0.5 regardless of
i. Additionally, it can be shown that the autocorrelation function of w,
Rww(i + k, *) does not depend on i, since we have:
Rww(i + k, i) = E[wi+kWi] = Rww(k) = — -r, k > 0. (3.29)
l2nK
We can conclude that the watermark signal generated by n-way Bernoulli
shift maps are wide-sense stationary.
Since it is usually required that the watermarking signal is zero mean,
the mean value 0.5 of the sequence generated through equation (3.28) must
be subtracted, thus leading to a map defined in the [—0.5, 0.5] interval.
The possibility of controlling the autocorrelation function of a chaotic watermark
by simply varying n, provides a useful way to shape the watermark
power spectrum without changing the watermark pdf. This property may
be used, for example, to generate a PSC-compliant watermark or to tradeoff
between the properties of white and colored watermarks. To be more
specific, let us consider equation (3.29) in more details. It is readily seen
that for small values of n and k, wi+k and w^ exhibit a high correlation,
however. As n increases, Rww(k] approximates a Dirac delta function, that
is, the watermark approximates a white random watermark. Given Rww(k),
the power spectral density Svr(f) of Bernoulli chaotic watermarks can be
easily evaluated. More specifically, by properly exploiting the symmetry of
Rwwk around k = 0, we have:
n2
12(n2-2ncos(27r/) + l ) '
(3.30)
with / 6 [—0.5,0.5]. Note again that for small values of n, a lowpass
spectrum is obtained, whereas for large values of n, Svt(f) becomes approximately
flat. By varying n one can control the power spectrum of the
watermark, trying to achieve the best performance for the application at
hand.With regard to the correlation between two different watermark signals
w0 and W(>, three cases are possible: i) wa and wj, belong to the same
chaotic orbit; ii) wa and w^ belong to different chaotic orbits; iii) w0 and
Wj, belong to the same chaotic orbit after an initial number of iterations, a
situation occurring when the sequences are initialized with two irrational
numbers not belonging to the same chaotic orbit, but map iteration eventually
leads to the same orbit. When considering the actual implementation
of a chaotic map, though, the finite precision of calculators must be taken
into account. Chaotic orbits become periodic (even if the period may be
extremely long), and only two possibilities have to be distinguished: i)
the watermark length is smaller that the separation / between the starting
points of the two sequences; ii) the watermark length is larger than /.
In both cases, cross-correlation becomes an autocorrelation and equation
(3.29) may be used, by paying attention to shift Rww(k) by /.
3.1.7 Direct embedding
So far we have only considered waveform-based schemes, where watermarking
of the host asset A goes through the definition of a watermarking signal
w and its insertion within A. Though such an approach is a very popular
one, in many cases watermark insertion is performed directly, without having
to introduce the signal w. According to this strategy, usually referred
to under the umbrella of informed embedding, substitutive or direct embedding,
watermarking simply corresponds to moving the host asset into
a point within the proper detection region. When the direct embedding
approach is used, information coding loses most of its importance since it
reduces to the definition of the detection regions associated to each b g B.
Informed coding
In the discussion carried out so far we have always assumed that a unique
watermarking signal is associated to each watermark code. When looked at
from the point of view of detection regions, such an approach corresponds to
associating a unique, usually connected, detection region to each watermark
code b. In some cases, though, it may be convenient to associate a pool
of signals (or, equivalently, a pool of disjoint detection regions), to each
watermark code. The rationale for this choice will be clear in subsequent
chapters, after that informed coding/embedding is discussed. Here it only
needs to observe that if the overall detection region is spread all over the
asset space, it is easier for the embedder to map the host asset within
it, since the distance between the host asset and the detection region is,
on the average, lower. The simplest way to achieve such a goal is to let
more than one signal be available to transmit the same codeword, so thatof the host asset, e.g. the signal which results in a lower asset distortion.
This is exactly what the informed embedding/coding principle says: adapt
the watermarking signal to the asset at hand, so that either distortion is
minimized or robustness is maximized. For a more detailed discussion of
the informed embedding/coding approach the reader is referred to chapters
4 and 9.