
Information coding

According to the paradigm described in chapter 1, the aim of any data hiding
scheme consists in the imperceptible embedding of a string of bits,
namely the watermark code b, within a host asset A. Embedding is
achieved by modifying a set of host features F(A) according to the content
of b.
In some cases, it is convenient to transform the watermark code b into
a signal w, called the watermark signal, which can be more easily hidden
within A. In this case, watermark embedding amounts to the insertion of
w within F(A). In detectable watermarking, the detection process consists
in assessing whether w is contained in A or not. With readable watermarking,
a further step is needed, since the bit string b must be inferred from
w. By following the digital communication analogy, the transformation of
the information string b before its injection within the host asset can be
paralleled to line coding or digital modulation, where the bits to be transmitted
are associated with a set of suitable waveform signals which are more
easily transmitted through the channel. Unlike in digital communication,
though, we will intend information coding in a wider sense, thus letting the
transformation from b to w encompass operations such as channel coding
or message repetition as well.
The interpretation of the watermarking process as the insertion of a
watermark signal w within the host asset stems from the analogy with
digital communication, where the to-be-transmitted signal is first injected
into the channel, then recovered by the receiver. Though widely
used in the early years of watermarking research, such a viewpoint is not
the only possible one. One could first specify how the detector works, defining
exactly the detection region in the asset space, then look at data embedding
as the mapping of the host digital asset into a point inside the detection
region. According to this perspective, watermark embedding can be seen
as a direct modification of the host features, rather than their mixing with
a watermark signal.

Figure 3.1: Direct embedding paradigm. Given a watermark detection region R_w
(grey region) in the feature space, watermark embedding is simply seen as the
mapping of the host asset onto a point inside R_w. The same action can be seen
as the addition of an asset-dependent signal w_i (informed embedding).
Figure 3.1 summarizes the above concepts. Instead of being defined by
a signal w, the watermark is seen as a detection region R_w in the feature
space. Marking a host asset A_i then reduces to mapping it into a point
within R_w (direct embedding). Note that the same action can be seen as
the insertion (here exemplified by a vector addition) of an asset-dependent
signal w_i within the host asset A_i.
From the point of view of information coding, the procedure described
in figure 3.1 can be interpreted in two equivalent ways. According to the
former, information coding reduces to the definition of the detection region
R_w and to channel coding. Alternatively, the role of the watermark signal
w may be kept, and information coding seen as the mapping of b into an
asset-dependent signal w_i. In this book we will use the first viewpoint, in
that the choice of the asset-dependent signal w_i will be seen as part of the
watermark embedding process, rather than as part of information coding.
Accordingly, we will discuss channel coding in this chapter (section 3.4)
and leave the mapping of the host asset into a point in R_w to the next
chapter. As to the definition of detection regions, the discussion of such a
topic is postponed to chapters 4 and 6.

In the rest of the book, we will refer to schemes for which watermarking
passes through the definition of an asset-independent signal w as blind embedding
or waveform-based schemes, in contrast to informed embedding
schemes, which allow the use of an asset-dependent signal w_i. Informed
embedding schemes will also be referred to as direct embedding schemes,
according to the interpretation that sees the watermarking process as the
mere mapping of the host asset A within the proper detection region.
With regard to this chapter, information coding in detectable watermarking
is discussed in section 3.1, where particular attention is given to
waveform-based coding. Readable watermarking is treated in sections 3.2
and 3.3, where waveform-based and direct embedding schemes are considered
respectively. Finally, the use of channel coding as a means to improve
the reliability of readable watermarking algorithms is described in section
3.4.
3.1 Information coding in detectable watermarking
In waveform-based detectable watermarking, information coding is straightforward.
Let B = {b_1, b_2, ..., b_{2^k}} be the set of possible watermark codes.
The information coding process corresponds to the definition of a set of
digital waveforms W = {w_1, w_2, ..., w_M} (M ≥ 2^k), and a coding rule Φ
that maps each b ∈ B into a distinct element of W. At the same time,
watermark detection can be seen as a classical problem of signal detection
in noise. For each element of W, the detector defines a detection
region and a non-detection region: if the analyzed asset lies within the detection
region, the detector decides for the watermark presence; otherwise
the watermark absence is established. In the following, we will review a set
of different ways of defining W and Φ. More specifically, we will consider
information coding through PN-sequences and orthogonal sequences (sections
3.1.1 through 3.1.3). We will also discuss the use of colored pseudo-random
sequences (section 3.1.5), of periodic self-synchronizing sequences
(section 3.1.4), and information coding by means of chaotic sequences (section
3.1.6). The section ends with a brief discussion of direct embedding
watermarking, leaving a more detailed description of this kind of techniques
to the next chapter.
3.1.1 Spread spectrum watermarking
By relying on the observation that digital communications through very
noisy channels, possibly affected by intentional disturbances such as jamming
or interference, are usually based on spread spectrum technology, many
of the watermarking algorithms developed so far use a similar technique
to code the to-be-hidden information. To be more specific, each message
b ∈ B is transformed into a pseudo-random sequence w of proper length
n (in agreement with the spread spectrum paradigm, n is usually much
larger than k). In this case, the w_i's are random variables drawn from a given
probability density function (pdf) p_{w_i}(w). In most cases the w_i's are
identically distributed, that is p_{w_i}(w) = p_w(w) = p(w), where we
omitted the subscript w for the sake of simplicity. As to the choice of p(w),
possible solutions include a normal pdf, N(0, σ²), for which:

$$p(w) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-w^2/2\sigma^2}, \qquad (3.1)$$

a uniform pdf over (−a, a):

$$p(w) = \begin{cases} \frac{1}{2a} & |w| \le a \\ 0 & \text{otherwise,} \end{cases} \qquad (3.2)$$

and a bipolar pdf, for which the w_i's take value +a or −a with equal probability.
In some applications, the bipolar distribution is conveniently replaced
by a three-valued distribution:

$$w_i = \begin{cases} -a & \text{with probability } p \\ 0 & \text{with probability } 1-2p \\ +a & \text{with probability } p. \end{cases} \qquad (3.3)$$
Each of the above solutions presents a number of advantages and drawbacks;
however, no definitive arguments exist demonstrating the superiority
of one pdf over the others. Sometimes it is necessary that the watermark
signal be limited, e.g. to ensure invisibility, to exactly control the maximum
modification of host features, or to avoid that the host features change sign,
thus losing their meaning. In this case, the normal pdf cannot be used,
thus limiting the choice to the uniform and the bipolar distributions. A
drawback with the bipolar distribution is that, for a given peak distortion,
it results in a higher average distortion than the uniform distribution, due
to its larger variance. On the other hand, it also ensures a higher transmission
rate, in that the bipolar pdf is the capacity-achieving distribution
under a peak distortion constraint.
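As a concrete illustration, the sketch below draws a length-n watermark sequence from each of the pdfs of equations (3.1)-(3.3); the helper names are ours and NumPy is assumed, with the generator seed playing the role of the watermark key.

```python
import numpy as np

def normal_watermark(n, sigma, rng):
    # normal pdf N(0, sigma^2), eq. (3.1)
    return rng.normal(0.0, sigma, n)

def uniform_watermark(n, a, rng):
    # uniform pdf over (-a, a), eq. (3.2)
    return rng.uniform(-a, a, n)

def bipolar_watermark(n, a, rng):
    # +a or -a with equal probability
    return a * rng.choice([-1.0, 1.0], n)

def three_valued_watermark(n, a, p, rng):
    # -a w.p. p, 0 w.p. 1-2p, +a w.p. p, eq. (3.3)
    return a * rng.choice([-1.0, 0.0, 1.0], n, p=[p, 1.0 - 2.0 * p, p])

rng = np.random.default_rng(seed=2007)   # the seed acts as the watermark key
w = three_valued_watermark(1000, a=1.0, p=0.25, rng=rng)
```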
A problem with discrete valued watermarks is their weakness against
the collusion attack. In the collusion attack, it is assumed that t copies
of the same asset, each marked with a different watermark, are available.
The attacker tries to remove the watermark by averaging the t assets in
his/her hands. One possible defense against the collusion attack consists in
making it unfeasible, by increasing as much as possible the minimum number
of copies necessary to remove the watermark. This is not the case
with a binary valued watermark. Suppose, for example, that watermark
insertion is achieved by either adding a or −a to f_i. Then an attacker
only needs to find two documents in which f_i takes different values,
and average them, since in this way the watermark is completely erased.
It is clear that the use of continuous valued watermarks can give greater
robustness to this kind of attack. More specifically, it is found that the
normal distribution greatly outperforms the uniform one, since a much
larger number of copies is needed to effectively remove the watermark.
Such a behavior can be explained by noting that the collusion attack may
be thought of as a problem of signal estimation in noise, where the host
coefficient is the constant-valued signal and the various instances of w_i
represent noise. It is known that, at least for the additive case, Gaussian
noise leads to the worst estimation accuracy.
From a general point of view, it is essential that p(w) has zero mean,
since it is known from digital communication theory that in this way the
transmission power can be minimized without increasing the error probability.
In a data hiding scheme this results in a lower distortion for a given
level of robustness. Moreover, some detection schemes explicitly exploit
the knowledge that the expected value of w_i, μ_{w_i} = E[w_i], is equal to zero.
In addition to being identically distributed, watermark coefficients are
usually designed so as to be independent of each other, i.e. the w_i's are independent
random variables. This leads to a white watermark, in that the power
spectrum of the watermark signal is flat. Though very popular, the
adoption of a white watermark signal is not always the best choice. In some
cases, in fact, it is preferable to accurately shape the watermark spectrum
in order to make it more robust and less perceptible at the same time.
Power shaping of w is treated in more detail in section 3.1.5.
Continuous pseudo-random sequence generation
According to the above ideal formulation of spread spectrum watermarking,
it is necessary that a sequence {w_1, w_2, ..., w_n} of random numbers following
a given pdf is generated. This is known to be a very hard problem, which
has received considerable attention due to its importance in many different
fields including computer simulations, Monte Carlo methods, and CDMA
digital communication.

Let us consider the continuous case first. Our aim is to generate a
sequence of random numbers (RN sequence) with a given, continuous, distribution.
In the present state of the art, however, this cannot be done
directly. On the contrary, an RN sequence z uniformly distributed in an
interval (0, m) is first generated; then a new RN sequence having the desired
pdf is obtained by properly operating on z. Note that z is not truly
continuous, since it only takes integer values (all the integers in (0, m));
nevertheless, if m is sufficiently large, an RN sequence which closely approximates
an RN sequence uniformly distributed in (0, 1) can be obtained
by dividing z by m:

$$u_i = \frac{z_i}{m}. \qquad (3.4)$$
The most general algorithm for generating an RN sequence has the following
form:

$$z_i = g(z_{i-1}, \ldots, z_{i-r}) \bmod m, \qquad (3.5)$$

where g(z_{i-1}, ..., z_{i-r}) is a function depending on the last r values of the
RN sequence. Note that, due to the presence of the mod operator, z_i is
the remainder of the division of g(z_{i-1}, ..., z_{i-r}) by m. The simplest, and
one of the most effective, RN generators is Lehmer's algorithm, based on
the following recursion formula:

$$z_i = a z_{i-1} \bmod m, \quad i \ge 1; \qquad z_0 = 1, \qquad (3.6)$$

where m is a large prime number and a is an integer. Alternatively, equation
(3.6) can be written as

$$z_i = a^i \bmod m. \qquad (3.7)$$
Due to the mod operator, the sequence z_i takes values between 1 and
m − 1; hence after m steps at least two equal values are found, thus permitting
us to conclude that z_i is a periodic sequence with period m_0 ≤ m − 1.
Of course, the periodic nature of z_i does not agree with the randomness requirement;
nevertheless, if the required number of samples is lower than m_0,
periodicity is not a problem. To take into account the imperfect randomness
of z_i, RN sequences generated through mathematical formulas such
as those expressed in equations (3.5) through (3.7) are usually referred to
as pseudo-random or pseudo-noise sequences (PN sequences for short). A
possible choice of m, suggested by Lehmer in 1951, is 2^31 − 1, leading to a
period m_0 which is large enough for most practical applications. To complete
the specification of the Lehmer generator, the value of the multiplier
a must be chosen. A first consideration leads to searching for a number
a which maximizes the period of z_i, i.e. m_0 = m − 1. To do so, let us
introduce the notion of primitive root: a is said to be a primitive root of
m if the smallest n such that

$$a^n = 1 \bmod m \qquad (3.8)$$

is n = m − 1.
It can be shown that each prime number has at least one primitive
root. It is obvious from the above definition that m_0 = m − 1 if and only
if a is a primitive root of m. Unfortunately, letting a be a primitive root
of m only ensures that the period of z_i is m − 1; it does not guarantee
that z_i is a good PN sequence. Without going into much detail about
the definition of what a good PN sequence is, we can say that a good PN
sequence should pass a number of randomness tests aimed at assessing
whether the characteristics of the sequence agree with those of an ideal
RN sequence. For the selection of the specific randomness tests z_i has to be
subjected to, the particular application served by z_i must be considered.
An example of a good choice of a and m, which has been used effectively
in a variety of applications, is:

$$a = 16807, \quad m = 2^{31} - 1 = 2147483647. \qquad (3.9)$$
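A minimal sketch of the resulting generator, combining equations (3.6), (3.9) and (3.4); the function name is ours.

```python
def lehmer(n, z0=1, a=16807, m=2**31 - 1):
    """Generate n pseudo-random samples in (0, 1) via eqs. (3.6) and (3.4)."""
    z = z0
    u = []
    for _ in range(n):
        z = (a * z) % m    # z_i = a * z_{i-1} mod m, eq. (3.6)
        u.append(z / m)    # u_i = z_i / m, eq. (3.4)
    return u

samples = lehmer(16000, z0=12345)   # z0 acts as the seed / watermark key
```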
In the framework of the data hiding scenario, the most important properties
of z_i are the adherence to the uniform distribution and the lack of
correlation between subsequent samples. In many applications it is also
necessary that a sufficiently large number of sequences can be generated.
This follows directly from the condition that the number K of watermark
signals must be larger than 2^k. The generation of all the sequences w_i ∈ W
by starting from the Lehmer algorithm goes through the observation that,
when a is chosen in such a way that m_0 = m − 1, the initial state of
the generator z_0 can assume any integer value in [1, m − 1], with different
choices resulting in sequences that are cyclically shifted versions of the same
sequence. If n is much smaller than m, then the sequences w_i can be obtained
by running Lehmer's algorithm several times, each time varying
z_0. For example, a common way to vary z_0 consists in generating it randomly,
e.g. by using the computer clock. Of course, the larger the number
of sequences, the higher the probability that two sequences are generated
exhibiting large cross-correlation values, possibly invalidating the effectiveness,
and security, of the whole data hiding system. Alternatively, z_0 could
be varied systematically. Suppose, for example, that the length n of the
watermark signal is 16,000, and that the Lehmer generator is used with
m = 2^31 − 1. The generator can be run m/n ≈ 128,000 times, each time
producing a completely different sequence, thus accommodating approximately
2^17 bit-strings.
By relying on the pseudo-random sequence u_i, having a uniform distribution
in (0, 1), a random sequence w_i following any desired pdf can be
built. For example, to generate a uniform sequence taking values in any
finite interval (a, b) it suffices to compute w_i = u_i(b − a) + a. The
construction of a normally distributed sequence is trickier. Due to the
importance that the normal distribution has in many applications, several
methods have been developed to build a normal PN sequence by starting
from a uniform one. We will review the most popular ones, namely the
central-limit method and the Box-Muller method.
Given l independent identically distributed random variables x_1, x_2, ..., x_l,
having mean μ_x and variance σ_x², it is known from the central limit theorem
that, for l → ∞, the random variable

$$y_l = \frac{\sum_{i=1}^{l} x_i - l\mu_x}{\sigma_x \sqrt{l}} \qquad (3.10)$$

tends to the standardized normal distribution, i.e.:

$$\lim_{l \to \infty} p(y_l) = N(0, 1). \qquad (3.11)$$

The above equations suggest that if we take l pseudo-random numbers
uniformly distributed in (0, 1) and we consider

$$y_i = \frac{\sum_{j=1}^{l} u_j - l/2}{\sqrt{l/12}}, \qquad (3.12)$$

then, for l large enough (commonly l = 12 is used), y_i can be considered to
be normally distributed with zero mean and unit variance. A drawback
with the above method is that its computing time is rather high, since only
one normal sample is obtained from 12 uniformly distributed numbers.
Additionally, the normal sequence produced by equation (3.12) takes values
in the bounded interval (−√(3l), √(3l)), thus failing to approximate the
normal distribution for large values of y.
The Box-Muller algorithm represents a valuable alternative to the usage
of the central limit theorem. This method, originally introduced by
G.E.P. Box and M.E. Muller in 1958, relies on the following observation.
Given two random variables u and v uniformly distributed in (0, 1), the
random variables

$$x = \sqrt{-2\ln u}\,\cos 2\pi v, \qquad y = \sqrt{-2\ln u}\,\sin 2\pi v \qquad (3.13)$$

follow a standardized normal distribution N(0, 1). Then, to generate a
normally distributed sequence, it suffices to generate two uniformly
distributed sequences and use equations (3.13) to build the normal random
variables x and y. The Box-Muller method is usually preferred to methods
based on the central limit theorem, because of its lower complexity and,
most of all, because it produces a sequence which better approximates the
normal distribution.
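A minimal sketch of the transform of equation (3.13), fed with uniforms from the lehmer() helper sketched earlier:

```python
import math

def box_muller(u, v):
    """Map two uniform (0,1) samples to two independent N(0,1) samples, eq. (3.13)."""
    r = math.sqrt(-2.0 * math.log(u))
    return r * math.cos(2.0 * math.pi * v), r * math.sin(2.0 * math.pi * v)

uniforms = lehmer(2000, z0=99991)
normals = []
for u, v in zip(uniforms[0::2], uniforms[1::2]):
    x, y = box_muller(u, v)
    normals.extend((x, y))
```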
Binary PN sequences
A binary PN sequence can be easily obtained by starting from a continuous
pseudo-random number generator. For example, if a PN sequence u_i which
is uniformly distributed in (0, 1) is available, one can build a new sequence
w_i by letting:

$$w_i = \begin{cases} +1 & \text{if } u_i \ge 0.5 \\ -1 & \text{if } u_i < 0.5 \end{cases} \qquad (3.14)$$

The sequence w_i is a bipolar i.i.d. sequence taking value −1 or +1 with
equal probability.
The use of pseudo-random generators ensures that the desired characteristics
of the watermark signal are satisfied on a probabilistic basis.
A possible alternative to the use of a continuous pseudo-random number
generator consists in the usage of maximum length sequences (m-sequences
for short).
Maximum length sequences are generated by means of an m-stage shift
register with feedback, as exemplified in figure 3.2. Due to the finite number
of states of the shift register, the output sequence is periodic, with
maximum period equal to 2^m − 1, where m is the number of stages of
the shift register. The actual period of the output sequence depends on
the feedback connections. When the connections are chosen in such a way
that the output period is maximum, the sequence of bits produced by the
shift register is called a maximum-length sequence. The configurations of
feedback connections ensuring a maximum period can be derived by relying
on cyclic coding theory and Galois fields, and have been extensively studied
for digital communication applications. In Table 3.1, possible shift register
connections for generating m-sequences with m ranging from 2 through
34 are shown. Of course, the output of an m-stage shift register such as
that depicted in figure 3.2 takes values in {0, 1}. However, it is trivial to
map this sequence into a bipolar sequence with elements in {−1, 1}. From
now on, when speaking about m-sequences we will always mean the bipolar
sequence associated with the {0, 1} sequence at the output of the shift register.
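A minimal Fibonacci-LFSR sketch of this construction; the tap positions below are an assumption (like the connections of Table 3.1, they must correspond to a primitive polynomial for the period to be maximal):

```python
def m_sequence(taps, m):
    """One period (2^m - 1 bits) of a bipolar m-sequence from an m-stage LFSR.

    `taps` lists the register stages (1-indexed) whose XOR feeds the input;
    stages (5, 2) are a standard maximal-length choice for m = 5.
    """
    state = [1] * m                     # any non-zero initial state works
    out = []
    for _ in range(2**m - 1):
        out.append(state[-1])           # the last stage is the output
        fb = 0
        for t in taps:
            fb ^= state[t - 1]          # XOR of the connected stages
        state = [fb] + state[:-1]       # shift, feedback enters stage 1
    return [1 - 2 * b for b in out]     # bit b -> (-1)^b: mod-2 sums become products

seq = m_sequence([5, 2], 5)             # bipolar m-sequence of period 31
```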
Table 3.1: Possible shift register connections for generating m-sequences of different
lengths.

 m | Connected stages | m  | Connected stages | m  | Connected stages
 2 | 1,2              | 13 | 1,10,11,13       | 24 | 1,18,23,24
 3 | 1,3              | 14 | 1,5,9,14         | 25 | 1,23
 4 | 1,4              | 15 | 1,15             | 26 | 1,21,25,26
 5 | 1,4              | 16 | 1,5,14,16        | 27 | 1,23,26,27
 6 | 1,6              | 17 | 1,15             | 28 | 1,26
 7 | 1,7              | 18 | 1,12             | 29 | 1,28
 8 | 1,5,6,7          | 19 | 1,15,18,19       | 30 | 1,8,29,30
 9 | 1,6              | 20 | 1,18             | 31 | 1,29
10 | 1,8              | 21 | 1,20             | 32 | 1,11,31,32
11 | 1,10             | 22 | 1,22             | 33 | 1,21
12 | 1,7,9,12         | 23 | 1,19             | 34 | 1,8,33,34
Maximum length sequences possess a number of interesting properties
resembling those of ideal RN sequences. First, each period has exactly
2^{m−1} positive and 2^{m−1} − 1 negative coefficients, thus ensuring that the
sequence average is very close to 0 (the average value of a bipolar m-sequence
exactly equals 1/(2^m − 1)). Ideally, a PN sequence should also
have correlation properties similar to those of white noise, that
is, the autocorrelation R_w(l) should be L for l = 0 and 0 for 1 ≤ l ≤ L − 1,
with L = 2^m − 1 denoting the sequence period. In the case of m-sequences,
we have

$$R_w(l) = \begin{cases} L & \text{if } l = 0 \\ -1 & \text{if } 1 \le l \le L-1. \end{cases} \qquad (3.15)$$

It is evident that, for large values of m, m-sequences are very close to ideal
RN sequences from the autocorrelation point of view. A set of different
m-sequences of a given length can be generated by using different feedback
connections. However, the number of possible m-sequences for a given
m is not unlimited, as shown in table 3.2. System designers must
carefully take into account the limits expressed in table 3.2, since they
directly impact the maximum number of different watermarks that can be
accommodated by the system.
Table 3.2: Number of different m-sequences for several values of m.

 m | number of sequences | m  | number of sequences
 3 | 2                   | 8  | 16
 4 | 2                   | 9  | 48
 5 | 6                   | 10 | 60
 6 | 6                   | 11 | 176
 7 | 18                  | 12 | 144
In most cases, the cross-correlation properties of different watermark
signals are as important as the autocorrelation properties. For example,
when two or more watermarks have to be inserted within the same asset,
it is desirable that the two watermarks do not interfere with each other, a
condition that is usually met if the PN sequences corresponding to different
watermarks are mutually uncorrelated. Unfortunately, m-sequences do not
meet this constraint, since they may exhibit a rather high cross-correlation.
As can be seen from the values reported in table 3.3, the maximum
value of R(l) for l ≠ 0, say R_max, of the cross-correlation function may be
rather high (30% of the peak value, i.e. R(0), on average).

Table 3.3: Peak cross-correlation of m-sequences.

 m | R_max/R(0) | m  | R_max/R(0)
 3 | 0.71       | 8  | 0.37
 4 | 0.60       | 9  | 0.22
 5 | 0.35       | 10 | 0.37
 6 | 0.36       | 11 | 0.14
 7 | 0.32       | 12 | 0.34
A possible solution consists in selecting a subset of all possible m-sequences
that have a smaller cross-correlation; however, in this way the
number of admissible watermarks may become too small. Gold sequences
are a valid alternative to m-sequences. Gold sequences are generated by
selecting a pair of m-sequences, called preferred m-sequences, and summing
them modulo 2, as depicted in figure 3.3.

If m is odd, the maximum value of the cross-correlation between any two
pairs of sequences is R_max = √(2L), whereas for m even we have R_max =
2√L. Given a set of M binary sequences of period L, it is known that a
lower bound on their maximum cross-correlation is given by

$$R_{max} \ge L\,\sqrt{\frac{M-1}{LM-1}}; \qquad (3.16)$$

then, for large values of L and M, Gold sequences are almost optimal.

Another advantage of Gold sequences with respect to m-sequences is
their abundance. For a fixed length L, in fact, up to L + 2 different Gold
sequences can be generated (see table 3.4), a number that greatly exceeds
the number of possible m-sequences of the same length.

Table 3.4: Number of different Gold sequences for different values of m.

 m | number of sequences | m  | number of sequences
 3 | 9                   | 8  | 257
 4 | 17                  | 9  | 513
 5 | 33                  | 10 | 1025
 6 | 65                  | 11 | 2049
 7 | 129                 | 12 | 4097
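A sketch of the construction, reusing the m_sequence() helper above; the preferred pair used for m = 5 is a classic example from the literature, and with the (−1)^b bit mapping the modulo-2 sum of the underlying bits becomes a component-wise product.

```python
def gold_sequences(seq_a, seq_b):
    """Build the Gold family (L + 2 sequences) from two preferred m-sequences."""
    L = len(seq_a)
    family = [list(seq_a), list(seq_b)]
    for shift in range(L):
        shifted = seq_b[shift:] + seq_b[:shift]              # cyclic shift of seq_b
        family.append([x * y for x, y in zip(seq_a, shifted)])
    return family

a = m_sequence([5, 2], 5)          # preferred pair for m = 5
b = m_sequence([5, 4, 3, 2], 5)
gold = gold_sequences(a, b)        # 33 sequences of length 31, cf. table 3.4
```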
3.1.2 Orthogonal waveforms watermarking

Though the use of Gold sequences improves the cross-correlation properties
of binary PN-sequences, better results can be obtained by using watermarking
sequences which are expressly designed to be orthogonal to each other.
In this way, the cross-correlation between different signals is equal to zero,
and the interference between two watermarks which are simultaneously
present in the same asset is minimized.
A solution that is commonly adopted consists in letting W coincide
with the set of Walsh-Hadamard sequences of length n.

Walsh sequences may be generated in many different ways; however,
the easiest passes through the definition of Hadamard matrices (hence the
name Hadamard-Walsh sequences). Hadamard matrices are square matrices
whose possible orders, in this construction, are limited to powers of two;
hence Walsh sequences will be limited to lengths of n = 2^m. Hadamard
matrices are recursively obtained by starting from the lowest order matrix,
defined by:

$$H_2 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \qquad (3.17)$$
Higher order matrices are obtained through the recursive relationship

$$H_n = H_{n/2} \otimes H_2, \qquad (3.18)$$

where ⊗ indicates the Kronecker matrix product, whereby each element of
the matrix H_{n/2} is multiplied by the matrix H_2. For instance, for n = 4
we have:

$$H_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{bmatrix}, \qquad (3.19)$$

and the n = 8 matrix follows in the same way, as H_8 = H_4 ⊗ H_2.
The difference between Walsh sequences and Hadamard matrices only consists
in the order in which the codes appear. The rows of the Hadamard
matrix are first re-ordered according to the number of zero-crossings in
each row: Walsh sequences correspond to the rows of these re-ordered matrices.
For instance, for n = 4, Walsh sequences correspond to the rows of
the matrix:

$$W_4 = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \\ 1 & -1 & 1 & -1 \end{bmatrix}. \qquad (3.20)$$

Due to orthogonality, in a perfectly synchronized environment, interference
between Walsh sequences is zero.
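A small sketch of the recursion (3.18) followed by the zero-crossing re-ordering; NumPy is assumed.

```python
import numpy as np

def walsh_matrix(n):
    """Order-n Hadamard matrix (n a power of two), rows re-ordered by the
    number of zero-crossings, as in eq. (3.20)."""
    assert n > 0 and n & (n - 1) == 0, "order must be a power of two"
    H = np.array([[1]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])        # H_n = H_{n/2} (x) H_2, eq. (3.18)
    crossings = (np.diff(H, axis=1) != 0).sum(axis=1)
    return H[np.argsort(crossings)]            # Walsh (sequency) ordering

W4 = walsh_matrix(4)    # rows reproduce the matrix of eq. (3.20)
```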
To increase the number of sequences (and hence the number of admissible
users) and the average distance between signals in W, bi-orthogonal
sequences may be used, where each orthogonal signal in W is further modulated
by a binary antipodal symbol ±1. More specifically, given a set W of n
orthogonal sequences, a set W' consisting of 2n bi-orthogonal sequences
is built as follows:

$$W' = \{w_1, \ldots, w_n, -w_1, \ldots, -w_n\}. \qquad (3.21)$$

The use of bi-orthogonal sequences permits doubling the number of watermarking
signals with performance very close to that of orthogonal
sequences.
3.1.3 Orthogonal vs PN watermarking
In the attempt to summarize the pros and cons of PN and orthogonal
watermarking, we will take into account several points of view, referring to
different practical scenarios.
Single user watermarking
It is well known from digital communications theory that, for single user
scenarios, no advantage is gained from the use of spread spectrum modulation,
unless the possible presence of a jammer is considered (see below in
the text). On the contrary, it is advisable to deterministically design the
watermarking signals in such a way that they are as far apart as possible,
thus minimizing the possibility that the presence of the wrong watermark
is erroneously detected. From this perspective, the use of bi-orthogonal
sequences is the most advisable solution.

Non-synchronized watermarking
The above conclusion does not hold when the possibility that the watermark
embedder and the watermark detector are not synchronized is taken into
account. As we will see later on in the book, such a situation is very common
when the watermarked asset is processed, either by a malevolent or a
non-malevolent user.

Due to their peaked autocorrelation function, PN sequences provide
a simple means to recover the synchronization between embedder and detector.
In most cases, it suffices to compute the correlation between the
searched watermark and the watermarked asset. A pronounced peak in
the correlation, in fact, indicates that the watermark contained within the
asset and the one the detector is looking for are synchronized.
In general, this is not possible with orthogonal watermarking sequences,
since in this case the autocorrelation function is not necessarily peaked.
Let us consider, for example, the Walsh sequences depicted in figure 3.4
together with their autocorrelation. As can be seen, in this case the
autocorrelation functions do not present a unique, well pronounced peak,
thus making synchronization more cumbersome. Actually, this result was
expected, since the good synchronization properties of PN sequences ultimately
rely on the spreading property (wide spectrum) of these sequences,
a property which does not hold for orthogonal sequences such as the
Walsh-Hadamard ones.
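The contrast can be checked numerically; a sketch assuming NumPy and the walsh_matrix() helper defined earlier:

```python
import numpy as np

def circular_autocorr(w):
    """Circular autocorrelation via the FFT (Wiener-Khinchin theorem)."""
    W = np.fft.fft(w)
    return np.real(np.fft.ifft(W * np.conj(W)))

rng = np.random.default_rng(7)
pn = rng.choice([-1.0, 1.0], 64)             # bipolar PN sequence
walsh = walsh_matrix(64)[5].astype(float)    # one Walsh sequence

print(circular_autocorr(pn))      # sharp peak at lag 0, small values elsewhere
print(circular_autocorr(walsh))   # large secondary peaks at non-zero lags
```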
Multiple watermarking
When the possibility that more than one watermark is embedded within
the same asset is taken into account, the choice of the watermarking sequence
is more critical. Such a situation resembles a classical multiple
access scenario, where the same communication medium is shared between
different users. By relying on digital communication theory, we can divide
multiple access techniques into two main categories: orthogonal waveform
multiple access (OWMA), including Frequency Division Multiple Access
(FDMA), Time Division Multiple Access (TDMA) and Orthogonal-Code
Division Multiple Access (OCDMA), and Pseudo-Noise Code Division Multiple
Access (PN-CDMA), including conventional direct sequence and frequency
hopping CDMA. In the watermarking framework, it is readily seen
that the use of orthogonal watermarking sequences may be paralleled to
OWMA (OCDMA in particular), whereas spread spectrum watermarking
relies on the operating principle of PN-CDMA.
When the number of users that can be accommodated on a given channel
is taken into account, the two approaches behave differently. In OWMA no
interference is present up to n users, n being the hard limit on the maximum
number of users admitted by the
which is the hard limit on the maximum number of users admitted by the
system. In PN-CDMA, interference appears as soon as two active users
are present, however no hard limit on the maximum number of users exists,
such a limit depending on the desired performance and the allowed
Signal to Interference Ratio (SIR). As a matter of fact, this is the reason
why PN-CDMA is often used in overloaded environments, but it is only
rarely adopted in situations where the number of simultaneous users is
lower than n. Apart from the above general principles, it must be pointed
out that in some cases it is not easy to generate a large number of spread
sequences having a reasonably low cross-correlation, thus limiting the number
of users that can be practically accommodated by the system. For
example, if Gold sequences are used, the maximum number of users that is
allowed is roughly the same as for orthogonal sequences (n — 1 vs n + 1).
This is not the case for continuous-valued signals, where the number of
spread sequences is much larger than n, which is the maximum number of
possible, continuous-valued, orthogonal sequences of length n.

Watermarking in hostile environments
A (heuristic) rationale for using PN-CDMA comes from the original applications
such a technology was devised for. Actually, PN-CDMA derives
directly from the direct sequence spread spectrum systems originally developed
for military communications. More specifically, three properties of
spread spectrum communication make it suitable for the hostile environments
typical of military applications: discreteness, low intercept probability,
and robustness against jamming. When considering the peculiarities
of security-oriented applications of watermarking, it immediately turns out
that the same properties make PN-CDMA an ideal technology for a watermarking
system to rely on. Of course, many important differences exist
between watermarking and secure digital communications; nevertheless, the
intrinsic security of PN-CDMA technology suggests that some advantages
may be gained from its use in terms of robustness and overall system security.
A summary of the main properties of PN- and orthogonal-sequence
watermarking is given in table 3.5.

Table 3.5: Comparison between PN and orthogonal sequences. In both cases
n indicates the sequence length and m the number of simultaneous watermarks
possibly embedded within the host asset.

                        | Spread spectrum watermarking       | Orthogonal sequence watermarking
Multiple access (m ≤ n) | Interference grows linearly with m | No interference
Multiple access (m > n) | Interference grows linearly with m | Not possible
Synchronization         | Easy due to wide spectrum          | Cumbersome
Robustness/security     | Good security and robustness       | Lack of security
                        | (not theoretically proved)         |
In order to overcome some of the limitations of orthogonal sequence
watermarking, this approach may be combined with spread spectrum. To
be more specific, let w be a binary antipodal spread spectrum sequence
and {h_i}_{i=1}^n be a set of orthogonal sequences, e.g. the Walsh-Hadamard
sequences. By starting from w and h_i, a new set of sequences {q_i}_{i=1}^n is
built by multiplying each sequence h_i by w component-wise, i.e.:

$$q_i = h_i \cdot w. \qquad (3.22)$$

Note that the sequences {q_i} are still orthogonal, since

$$\langle q_i, q_j \rangle = \sum_{k=1}^{n} h_{i,k} h_{j,k} w_k w_k = \sum_{k=1}^{n} h_{i,k} h_{j,k} = E_h\, \delta(i-j), \qquad (3.23)$$

where δ(·) denotes the Kronecker impulse function and E_h is the energy
of the sequence h. The new set of sequences q_i now has a more peaked
autocorrelation function and is more secure than h_i, since its knowledge
requires the knowledge of the spreading sequence w. It remains true,
though, that the maximum number of users that can be accommodated
through orthogonal signalling is upper bounded by the sequence length n.
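A quick numerical check of equations (3.22) and (3.23), reusing the walsh_matrix() helper and NumPy assumptions above:

```python
import numpy as np

n = 64
rng = np.random.default_rng(3)
w = rng.choice([-1.0, 1.0], n)       # secret bipolar spreading sequence
H = walsh_matrix(n).astype(float)    # orthogonal set {h_i}
Q = H * w                            # q_i = h_i . w, component-wise, eq. (3.22)

# Orthogonality is preserved, eq. (3.23): Q @ Q.T equals n times the identity.
assert np.allclose(Q @ Q.T, n * np.eye(n))
```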
3.1.4 Self-synchronizing PN sequences
As will be discussed later on in the book (chapter 7), loss of synchronization
is one of the most serious threats to watermark retrieval. Though
the watermark is virtually intact, the detector is not able to recover it,
since it does not know its exact geometric configuration, e.g. its position
in the host asset or its scale factor. A possible way to recover the
synchronization between the encoder and the detector consists in the use of
self-synchronizing watermark signals. In the most common case, a self-synchronizing
watermark is nothing but the periodic repetition of the same
PN signal. Note that the exact nature of the periodic repetition depends
on the host asset media: in the audio case we have a 1D periodic
repetition, whereas in the image case repetition has to be intended in a
2D sense. The periodic nature of the watermark may be exploited to infer
important geometric properties of the watermark, such as orientation or
scale. To exemplify how watermark periodicity can be exploited to estimate
the watermark scale, let us consider the simple case of a zero-mean,
mono-dimensional watermark which is added to a host audio asset S. Also
assume that watermark embedding is performed in the time domain, i.e.
the set of host features corresponds to the audio samples s_i. We have:
$$S_w = S + w, \qquad (3.24)$$

or, alternatively:

$$s_{w,i} = s_i + w_i, \qquad (3.25)$$

where s_{w,i} denotes the i-th watermarked audio sample. If S and w are
assumed to be uncorrelated, the autocorrelation function of S_w is:

$$R_{s_w}(l) = R_s(l) + R_w(l). \qquad (3.26)$$

By assuming that S is not periodic and that the period L of the watermark
signal is much lower than the cardinality of S, R_{s_w}(l) will exhibit a set
of periodic peaks corresponding to the periodic peaks of R_w(l) (see figure
3.5). The distance between such peaks is a clear indication of the period of
w. By comparing the original period of w with that estimated by looking
at the autocorrelation function, an estimate of the current watermark scale
with respect to the original one can be obtained, thus making watermark
detection easier.
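A toy sketch of the idea (NumPy assumed): a tiled PN block plays the role of the periodic watermark, white noise stands in for the host audio, and the strongest autocorrelation peak estimates the period.

```python
import numpy as np

def estimate_period(s_w):
    """Estimate the watermark repetition period from autocorrelation peaks."""
    s = s_w - s_w.mean()
    r = np.correlate(s, s, mode='full')[s.size - 1:]   # R(l) for l >= 0
    peaks = np.where((r[1:-1] > r[:-2]) & (r[1:-1] > r[2:]))[0] + 1
    return int(peaks[np.argmax(r[peaks])])             # strongest local maximum

rng = np.random.default_rng(11)
L = 250
w = np.tile(rng.choice([-1.0, 1.0], L), 8)             # periodic watermark
s_w = rng.normal(0.0, 2.0, w.size) + w                 # noisy "host" plus watermark

L_est = estimate_period(s_w)
scale = L_est / L    # ratio of detected to original period, ~1 here
```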
3.1.5 Power spectrum shaping
The power spectrum of the watermark signal affects the performance of a
data hiding system in a twofold way: first, it determines the perceptibility
of the watermark; second, it impacts watermark robustness. Unfortunately,
a direct dependency between the watermark and the signal power spectrum
cannot be easily established without taking into account the exact
embedding rule. Consider, for example, the case of an additive watermark
embedded in the time domain of an audio asset S. In this case, the watermark
power spectrum directly modifies the power spectrum of S. If we
assume that w and S are independent of each other, in fact, the power
spectrum of the watermark and that of S sum together to form the power
spectrum of the watermarked asset. It is rather easy, then, to analyze the
influence of the power spectrum shape on watermark audibility and robustness.
This is not the case if a multiplicative embedding rule is adopted,
since the power spectrum of the resulting watermarked asset is a complex
combination of the spectra of S and w (also involving the spectrum phase).
Similar considerations hold if the set of host features does not coincide
with plain asset samples. If watermark insertion is performed in the frequency
domain, for example, the shape of the spectrum of the watermarked
asset depends on the frequency location of host features, regardless of the
power spectrum of w.
In spite of the above difficulties, properly shaping the power spectrum
of w has received considerable attention in the technical literature, both
because of its impact on watermark robustness and because of the importance
of time/spatial domain watermarking. While early research focused on
white watermarks, as a direct extension of spread spectrum communication,
it soon became evident that superior robustness (and lower perceptibility)
could be achieved by adopting colored watermark signals. Thus some
authors used high-frequency watermarks, since they can be more easily separated
from the host signal. Others proposed to embed watermarks which
are perceptually similar to the host asset, pointing out that in this way
watermark masking is more easily achieved and that an attacker cannot
modify the watermark without severely degrading the host asset as well.
Eventually, by relying on the observation that low-pass watermarks may
turn out to be too perceptible and that high-pass watermarks are too susceptible
to attacks, it has been proposed to use band-pass watermarking signals.
As a matter of fact, when a rigorous analysis of the trade-off between perceptibility,
robustness and capacity is carried out, the necessity of carefully
shaping the watermark power spectrum is confirmed. The optimal shape of
w, however, heavily depends on the possible attacks, the embedding rule,
and the distortion metric used to measure asset degradation as a consequence
of attacks and watermark embedding. Possible choices range from
a white watermark to a PSC-compliant watermark, where the shape of
the watermark power spectrum is derived through complex optimization
procedures aiming at maximizing a given measure of the performance of
the data hiding system (e.g. watermark capacity or probability of correct
detection). In most cases, though, the heuristic rule leading to a
band-pass watermark is confirmed.
Direct generation of PN sequences having a desired power spectrum, or
equivalently a desired autocorrelation function, is not easy to achieve. To
get around the problem, a white PN sequence is usually generated first;
then the white sequence is linearly filtered to obtain the desired autocorrelation
characteristics. Note, however, that due to filtering the pdf of the
PN sequence changes, unless the PN sequence is normally distributed,
thus leading to possible difficulties in applications where both the power
spectrum and the watermark pdf are exactly specified.
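A minimal sketch of the filter-a-white-sequence approach (NumPy assumed; the band-pass kernel is an arbitrary illustration, not an optimized PSC-compliant design):

```python
import numpy as np

rng = np.random.default_rng(5)
white = rng.normal(0.0, 1.0, 4096)                 # white, normally distributed PN

kernel = np.array([-0.25, 0.5, 1.0, 0.5, -0.25])   # crude band-pass shape
colored = np.convolve(white, kernel, mode='same')  # linear filtering colors the spectrum
colored *= white.std() / colored.std()             # restore the original power

# Linear filtering preserves Gaussianity, so here the pdf is unchanged;
# for non-normal sequences the pdf would be altered, as noted above.
```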
3.1.6 Chaotic sequences
Though the use of pseudo-random sequences is by far the most common
way to generate the watermarking signal w, some alternative possibilities
have been proposed. Among them, chaotic sequences have received considerable
attention for their good properties in terms of unpredictability,
power spectrum design, and ease of use. More specifically, watermarking
signals generated through n-way Bernoulli shift maps seem to present some
advantages with respect to conventional PN sequences (any claim about
the superiority of chaotic sequences with respect to PN sequences, though,
must be checked with particular care, since the current state of the art in
watermarking is not mature enough to draw any ultimate conclusions on
the matter).

An n-way Bernoulli shift B_n(x) is defined by the following expression:

$$x' = B_n(x) = nx \bmod 1, \qquad (3.27)$$

where n is an integer. Bernoulli shifts generate a piecewise affine chaotic
map defined on the [0, 1] interval (see figure 3.6 for an example of a Bernoulli
shift map with n = 5).
A Bernoulli chaotic watermark is easily generated through the iterative
application of B_n(x), as described by the following recursive equation:

$$w_{i+1} = B_n(w_i) = n w_i \bmod 1. \qquad (3.28)$$

Note that, despite the similarity with PN sequences, chaotic sequences possess
rather different properties, due to the continuous nature of equations
(3.27) and (3.28), whereas the generators at the basis of PN sequences
deal with integer numbers (of course, the practical implementation of
Bernoulli shifts must rely on finite precision arithmetic; nevertheless, if
double precision floating point numbers are used, the effects of finite
precision are negligible for most practical purposes). The watermark
sequence generated through equation (3.28) heavily depends on the starting
point w_0. More specifically, if w_0 is an irrational number the sequence
exhibits a chaotic non-periodic behavior. If w_0 is chosen randomly, the
sequence itself can be regarded as a random sequence, whose characteristics
depend on the particular choice of n. From a watermarking perspective, w_0
corresponds to the watermark key K, thus playing a role similar to that
played by the seed used to initialize the PN generator at the basis of PN
watermarking. It can be shown that the uniform distribution is an invariant
pdf for the n-way Bernoulli shift maps, in that if w_0 is uniformly
distributed, then w_i is uniformly distributed too. As a consequence,
μ_{w_i} = 0.5 regardless of i. Additionally, it can be shown that the
autocorrelation function of w, R_{ww}(i + k, i), does not depend on i, since
we have:

$$R_{ww}(i+k, i) = E[w_{i+k} w_i] = R_{ww}(k) = \frac{1}{12\, n^k}, \quad k \ge 0. \qquad (3.29)$$

We can conclude that the watermark signals generated by n-way Bernoulli
shift maps are wide-sense stationary.

Since it is usually required that the watermarking signal be zero mean,
the mean value 0.5 of the sequence generated through equation (3.28) must
be subtracted, thus leading to a map defined on the [−0.5, 0.5] interval.
The possibility of controlling the autocorrelation function of a chaotic watermark
by simply varying n provides a useful way to shape the watermark
power spectrum without changing the watermark pdf. This property may
be used, for example, to generate a PSC-compliant watermark or to trade
off between the properties of white and colored watermarks. To be more
specific, let us consider equation (3.29) in more detail. It is readily seen
that for small values of n and k, w_{i+k} and w_i exhibit a high correlation.
As n increases, R_{ww}(k) approximates a Dirac delta function, that
is, the watermark approximates a white random watermark. Given R_{ww}(k),
the power spectral density S_w(f) of Bernoulli chaotic watermarks can be
easily evaluated. More specifically, by properly exploiting the symmetry of
R_{ww}(k) around k = 0, we have:

$$S_w(f) = \frac{n^2}{12\,(n^2 - 2n\cos(2\pi f) + 1)}, \qquad (3.30)$$

with f ∈ [−0.5, 0.5]. Note again that for small values of n a lowpass
spectrum is obtained, whereas for large values of n, S_w(f) becomes approximately
flat. By varying n one can control the power spectrum of the
watermark, trying to achieve the best performance for the application at
hand.
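A minimal sketch of the generator of equation (3.28), with the 0.5 mean subtracted as required; the key value w0 is illustrative.

```python
def bernoulli_watermark(n, length, w0):
    """Zero-mean chaotic watermark via the n-way Bernoulli shift, eq. (3.28).

    w0 in (0, 1) acts as the watermark key; small n gives a lowpass
    watermark, large n an approximately white one, cf. eq. (3.30).
    """
    w, out = w0, []
    for _ in range(length):
        w = (n * w) % 1.0        # w_{i+1} = n * w_i mod 1
        out.append(w - 0.5)      # subtract the 0.5 mean value
    return out

wm = bernoulli_watermark(5, 1000, w0=0.7071067811865476)   # key ~ 1/sqrt(2)
```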
With regard to the correlation between two different watermark signals
w_a and w_b, three cases are possible: i) w_a and w_b belong to the same
chaotic orbit; ii) w_a and w_b belong to different chaotic orbits; iii) w_a and
w_b come to belong to the same chaotic orbit after an initial number of
iterations, a situation occurring when the sequences are initialized with
two irrational numbers not belonging to the same chaotic orbit, but map
iteration eventually leads to the same orbit. When considering the actual
implementation of a chaotic map, though, the finite precision of computers
must be taken into account. Chaotic orbits become periodic (even if the
period may be extremely long), and only two possibilities have to be
distinguished: i) the watermark length is smaller than the separation l
between the starting points of the two sequences; ii) the watermark length
is larger than l. In both cases, the cross-correlation becomes an
autocorrelation, and equation (3.29) may be used, paying attention to shift
R_{ww}(k) by l.
3.1.7 Direct embedding
So far we have only considered waveform-based schemes, where watermarking
of the host asset A goes through the definition of a watermarking signal
w and its insertion within A. Though such an approach is a very popular
one, in many cases watermark insertion is performed directly, without having
to introduce the signal w. According to this strategy, usually referred
to under the umbrella of informed embedding, substitutive or direct embedding,
watermarking simply corresponds to moving the host asset to
a point within the proper detection region. When the direct embedding
approach is used, information coding loses most of its importance, since it
reduces to the definition of the detection regions associated with each b ∈ B.
Informed coding
In the discussion carried out so far, we have always assumed that a unique
watermarking signal is associated with each watermark code. When looked at
from the point of view of detection regions, such an approach corresponds to
associating a unique, usually connected, detection region with each watermark
code b. In some cases, though, it may be convenient to associate a pool
of signals (or, equivalently, a pool of disjoint detection regions) with each
watermark code. The rationale for this choice will become clear in subsequent
chapters, after informed coding/embedding is discussed. Here it only
needs to be observed that if the overall detection region is spread all over the
asset space, it is easier for the embedder to map the host asset within
it, since the distance between the host asset and the detection region is,
on average, lower. The simplest way to achieve such a goal is to let
more than one signal be available to transmit the same codeword, so that
the embedder can choose the one that best fits the characteristics
of the host asset, e.g. the signal which results in a lower asset distortion.
This is exactly what the informed embedding/coding principle says: adapt
the watermarking signal to the asset at hand, so that either distortion is
minimized or robustness is maximized. For a more detailed discussion of
the informed embedding/coding approach, the reader is referred to chapters
4 and 9.

2.3 Data hiding for multimedia transmission

Among the possible applications of data hiding, the exploitation of a hidden
communication channel for improved transmission, particularly video
transmission, is gaining more and more consensus. Data hiding can be
helpful for video transmission in several ways. From the source coding
point of view, it can help to design more powerful compression schemes
where part of the information is transmitted by hiding it in the coded
bit stream. For instance, chrominance data could be hidden within the
bit stream conveying luminance information. Alternatively, the audio data
could be transmitted by hiding it within the video frame sequence (audio in
video). From a channel coding perspective, data hiding can be exploited to
improve the resilience of the coded bit stream with respect to channel errors
(self-correcting or self-healing images/video). As a matter of fact, redundant
information about the transmitted video could be hidden within the
coded bit-stream and used for video reconstruction in case channel errors
impair the bit-stream.
2.3.1 Data compression
Traditionally, data hiding and data compression are considered contradictory
operations, in that each of them seems to obstruct the goal of the
other. As a consequence, a great deal of research has been devoted to finding
an appropriate compromise between the two goals, e.g. by designing
a watermarking algorithm which survives data compression. Despite this
apparent contradiction, some authors started investigating the possibility
of improving the effectiveness of existing data compression schemes by encoding
only part of the information and hiding the remaining information
within the coded bit-stream itself. For instance, the possibility of hiding
the chrominance part of an image within luminance information has been
investigated with rather good results, in that for a given compression rate,
a better fidelity to the original image is obtained. Another possibility consists
in hiding the audio signal of a video within the visual part of the video
stream.
From a theoretical point of view, one may wonder whether, in the presence
of ideal, perceptually lossless compression, data hiding is still possible
or not. The answer to this question is not easy at all. First of all the notion
of ideal perceptually lossless compression must be clarified. For example,
we may say that a perceptually lossless compression algorithm is ideal if it
removes all the information contained in the digital asset which can not be
perceived by a human observer. It is readily seen, that the above definition
precludes the possibility that data hiding and ideal compression coexist to
gether, given that the ultimate goal of any data hiding scheme is to modify
the host asset in such a way that it contains a piece of information which,
though present, can not be perceived by the human senses. Unfortunately
(or fortunately, depending on the point of view), the above definition is by
far too heuristic to lead to a mathematical formulation of the coding problem.
When a more practical definition of perceptually lossless ideal coding
is given, the possibility of designing a data hiding scheme which survives
ideal compression remains an open issue, mainly due to the particular nature
of perceptual equality between assets. As a matter of fact, perceptual
equality is not an equality in a strict mathematical sense since the transitive
property is not satisfied, nor the perceptual distance between assets
is a true distance in a mathematical sense, since the triangular inequality
does not hold, thus making the theoretical analysis of data hiding in the
presence of ideal, perceptually lossless compression extremely difficult.
At a more practical level, data hiding can represent a new way to overcome
the imperfections of current compression algorithms, which, far from
being ideal, do not remove all the perceptual redundancy contained
within the to-be-compressed asset.
From an application perspective, the most stringent requirement is watermark
capacity, since the larger the capacity, the higher the effectiveness
of the source coding algorithm. On the contrary, robustness is not an issue
at all, provided that data hiding is performed in the compressed domain,
or simultaneously with data compression. This is not the case if data hiding
precedes compression, since in this case the hidden data must survive compression.
Protocol level requirements are rather obvious: blind watermark
detection is required, as well as the adoption of a readable watermarking
scheme.
2.3.2 Error recovery
A problem with the transmission of data in compressed form is the vulnerability
of the coded bit stream to transmission errors. This is the case
with most of the compression standards, including JPEG for still images,
MPEG and H.263 for digital video or MP3 for audio. For example, in
MPEG-2 video, a single bit error can cause a loss of synchronization that
will be visible over an entire group of pictures (GOP). To cope with the
fragility of compressed data, channel coding is usually adopted to enable
error detection or correction. This always corresponds to the introduction
of a controlled amount of redundancy. Redundancy can either be introduced
at the transmission level, by relying on error correcting codes, or at
the application level, i.e. by modifying the syntax of the coded bit stream,
in the attempt to make it more resilient against errors. Though the above
solutions considerably improve the quality of the reconstructed data in the
presence of errors, all of them share two basic drawbacks: i) usually they
are not standard-compliant (even if new standards make provision for error
resilient compression, backward compatibility with previous standard
is often lost); ii) the net available bit-rate decreases to make room for the
redundancy.
A possible alternative consists in performing error detection and concealment
at the decoder side. For instance, in video transmission, temporal
concealment may be applied in the attempt to reconstruct the missing information
from past frames, or the data in the present frame may be used to
reconstruct lost information (spatial concealment). Nevertheless, it is obviously
impossible to exactly recover the original content of a video frame,
e.g. in the presence of occlusions, once the corresponding part of the bit
stream has been lost.
Data hiding represents an alternative approach to the problem: the
redundant information is hidden within the compressed stream and, possibly,
used by the decoder to recover from errors. For instance, a low
quality version of the compressed asset may be transmitted through the
hidden channel to enable the reconstruction of the information that was
lost because of channel errors. In some cases it is only important to detect
errors, e.g. to ask for the retransmission of data; in such cases the hidden
data can be used as in authentication applications, with tampering being
replaced by transmission errors. Note that with data hiding, backward
standard compliance is automatically achieved, since the hidden data is
simply ignored by a decoder which is not designed to exploit it. As to the
preservation of the net bit-rate available for payload transmission, it has
to be noted that, though imperceptible, the watermark always introduces
a certain amount of distortion which decreases the PSNR of the encoded
data. Such a loss in PSNR should be compared to the PSNR loss caused
by the reduction of the net bit-rate consequent to the use of conventional
forward error correction techniques, or to the transmission of the redundant
information at the application level.
As for joint source coding and data hiding, even in this case the actual
possibility of replacing error correcting codes with data hiding (or improving
the capability of error correcting codes via data hiding methodologies)
is not easy to assess from a theoretical point of view. As a matter of fact,
results from rate distortion theory and Shannon's theorem on channel coding
seem to indicate that no improvement has to be expected from using
data hiding for error correction. Nevertheless, real data transmission conditions
are far from the ideal conditions assumed in information theory: the
channel is not AWGN, source and channel coding are not ideal, asymptotic
analysis does not always hold, and PSNR is not a correct measure of perceptual
degradation. At a more practical level, then, data hiding is likely to bring
some advantages with respect to conventional error handling techniques.
Even in this case, application requirements are less demanding than in
copyright protection applications. The most stringent requirement regards
capacity, in that the higher the capacity, the larger the amount of redundancy that
can be transmitted, thus increasing robustness against errors. For example,
the transmission of a low resolution version of a 512 x 512 gray level image
may require the transmission of 4096 pixel values, for a total required
capacity of about 10 Kbit (we assumed that each pixel requires at least
2.5 bits to be coded), which is a rather high value. Conversely, robustness
is not a major concern, even if it is obviously required that the hidden
information survives transmission errors. It is also obvious that the use
of a blind watermarking algorithm is required. The adoption of readable
watermarking is also mandatory, unless the hidden information is only used
to detect errors, without attempting to correct them.
2.4 Annotation watermarks
Despite digital watermarking is usually looked at as a mean to increase
data security (be it related to copyright protection, authentication or reliable
data transmission), the ultimate nature of any data hiding scheme
can be simply regarded as the creation of a side transmission channel, associated
to a piece of work. Interestingly, the capability of the watermark
to survive digital to analog and analog to digital conversion leads to the
possibility of associating the side channel to the work itself, rather than to
a particular digital instantiation of the work. This interpretation of digital
watermarking paves the way for many potential applications, in which
the watermark is simply seen as annotation data, inserted within the host
work to enhance its value. The range of possible applications of annotation
watermarks is a very large one; we will just describe a couple of examples
to give the reader a rough idea of the potential of digital watermarking
when this wider perspective is adopted. Note that the requirements
annotation watermarks must satisfy can not be given without carefully
considering application details. In many cases, watermark capacity is the
most important requirement; however, system performance factors such as speed
or complexity may play a predominant role. As to robustness, the requirements
for annotation watermarks are usually much less stringent than those
raised by security or copyright protection applications.
2.4.1 Labelling for data retrieval
Content-based access to digital archives is receiving more and more attention,
due to the difficulties in accessing the information stored in very large,
possibly distributed, archives. By letting the user specify the work he is
looking for, by roughly describing its content at a semantic level, many
of the difficulties usually encountered during the retrieval process can be
overcome. Unfortunately, it is very difficult for a fully automated retrieval
engine to analyze the data at a semantic level, thus virtually all content-based
retrieval systems developed so far fail to provide true access to the
content of the database. A possibility to get around this problem consists
in attaching to each work a description of its semantic content. Of course,
producing a label describing the semantic content of each piece of work is
a very time consuming operation, thus it is essential that such a label is
indissolubly tied to the object it refers to, regardless of the object format,
and its analog or digital nature. In this context, digital watermarking may
provide a way whereby the labelling information is indissolubly tied to the
host work, regardless of the format used to record it. When the work moves
from one archive to another, possibly passing through the analog domain,
the information describing the content of the work travels with the work
itself, thus avoiding information loss due to format modification. To exemplify
the advantages of data hiding with respect to conventional data
labelling, let us consider the archival of video sequences in MPEG-4 format.
An annotation watermark could be hidden within each video object
forming the MPEG-4 stream. For instance, the name of an actor could
be hidden within the corresponding video object. If the marked object is
copy-edited to create a different video sequence, the hidden label is automatically
copied with the object, thus avoiding the necessity of labelling it
again. Similarly, if the object is pasted into a new video after going into the
analog domain and back to the digital one, the annotation watermark is not
lost, thus making the semantic labelling of the new video easier.
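As a toy illustration of the idea, the following Python sketch hides a short
textual label in the least significant bits of a sequence of 8-bit samples.
This is only a minimal sketch under the assumption of an uncompressed host:
a real annotation watermark would also have to survive MPEG-4 coding and
digital-to-analog conversion.

    def embed_label(samples: bytearray, label: str) -> bytearray:
        # write the UTF-8 bits of the label into the sample LSBs
        bits = ''.join(f'{b:08b}' for b in label.encode('utf-8'))
        out = bytearray(samples)
        for i, bit in enumerate(bits):
            out[i] = (out[i] & 0xFE) | int(bit)
        return out

    def extract_label(samples: bytearray, nchars: int) -> str:
        # read the LSBs back and reassemble the label
        bits = ''.join(str(samples[i] & 1) for i in range(8 * nchars))
        return bytes(int(bits[i:i + 8], 2)
                     for i in range(0, len(bits), 8)).decode('utf-8')

    video_object = bytearray(range(256))        # stand-in for a video object
    marked = embed_label(video_object, "Actor Name")
    print(extract_label(marked, len("Actor Name")))   # -> Actor Name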
2.4.2 Bridging the gap between analog and digital objects
A clever way to exploit the side communication channel made available by
digital watermarking, consists in linking any analog piece of work to the
digital world. The smart image concept, derived from the MediaBridge system
developed by Digimarc Corporation, is an example of such a vision of
digital watermarking. According to the smart image paradigm, the value
of any image is augmented by embedding within it a piece of information
that can be used to link the image to additional information stored on the
Internet. For example, such information can be used to link a picture in
a newspaper to a web page further exploring the subject of the article the
image appears in. The actual link to the Internet is activated by showing
the printed picture to a video camera connected to a PC; upon watermark
extraction the URL of the web site with the pertinent information is retrieved
and the connection established. More generally, the information
hidden within the piece of work is dormant until a suitable piece of software
reads it; it may then be used to control the software which retrieved the
watermark, to link the object to additional information, to indicate to the
user how to get additional services, or to provide the user with secret information
to be used only upon the payment of a fee. Watermark retrieval itself
may be conditioned on the payment of a fee, thus providing a conditional
access mechanism that can be exploited in commercial applications, e.g.
bonus programme applications, where the gathering of a certain number of
watermarks is the access key to a discount programme.
2.5 Covert communications
Covert communication is the most ancient application of data hiding, since
it traces back at least to the ancient Greeks, when the art of keeping a message
secret was used for military applications. Indeed, it is often claimed
that the first example of covert communication is narrated by Herodotus,
who tells the story of a message tattooed on the shaved head of a slave:
the slave was sent through the enemy's lines after his hair had grown again,
thus fooling the enemy. Even if it is likely that the history of covert communication
started well before Herodotus' time, the art of keeping a message
secret is called steganography, from the Greek words στεγανός (covered)
and γράφειν (writing). As opposed to cryptography, the ultimate goal of
a covert communication scheme is to hide the very existence of the hidden
message. In this case, the most important requirement is the imperceptibility
requirement, where imperceptibility assumes a wider sense, in that
it is essential that the presence of the message can not be revealed by any
means, e.g. through statistical analysis. In steganography, the most important
requirement after security (undetectability) is capacity, even if it
is obvious that the less information is embedded into the carrier signal, the
lower the probability of introducing detectable artifacts during the embedding
process.
A covert communication scheme is often modelled by considering the
case of a prisoner who wants to communicate with a party outside the
prison. To avoid any illegal communication, the warden inspects all the
messages sent by the prisoner and punishes him every time he discovers
that a secret message was hidden within the cover message (even if he is
not able to understand the meaning of the hidden message). Once cast in
a statistical framework, the prisoner's problem can be analyzed by using tools
derived from information theory, and the possibility of always establishing
a secure covert channel can be demonstrated. The capacity of the covert channel
can also be calculated. It is important to point out that according to
the prisoner-and-warden model, the prisoner is free to design the host
message so as to facilitate the transmission of the hidden message, a condition
which does not hold in many practical applications where the sender is not
allowed to choose the host message.
Despite its ancient origin, and although a great deal of research has
been carried out aiming at designing robust watermarking techniques, very
little attention has been paid to analyzing or evaluating the effectiveness of
such techniques for steganographic applications. Instead, most of the work
developed so far has focused on analyzing watermarking algorithms with
respect to their robustness against various kinds of attacks attempting to
remove or destroy the watermark. However, if digital watermarks are to be
used in steganography applications, the detectability of watermark presence
must be investigated carefully, since detection by an unauthorized agent
would defeat the ultimate purpose of the covert communication channel.
2.6 Further reading
The necessity of considering buyer's rights in addition to those of the seller
in fingerprinting-based copy-protection systems was first pointed out by
[186]. Such a problem was later analyzed by N. Memon and P. W. Wong
in [149], where they first introduced the IBS copy protection protocol.
The DVD copy protection protocol we briefly discussed in section 2.1.3
is part of a complex system devised by an international pool of consumer
electronics companies, to protect the digital distribution of copyrighted
video. More details about such a system may be found in [32, 141].
An early formalization of data authentication relying on (semi-)fragile
watermarking may be found in [126], whereas for a detailed list of requirements
fragile authentication-oriented watermarking must satisfy the reader
is referred to [79].
Authentication of video surveillance data through digital watermarking
is thoroughly discussed in [23]. In the same paper, the general mathematical
framework for data authentication discussed in section 2.2.2 was first
introduced.
The notion of compressive data hiding, i.e. the possibility of exploiting
data hiding technology to improve coding efficiency, was formalized by
P. Campisi, D. Kundur, D. Hatzinakos and A. Neri in [35], where the
advantages obtained by hiding the chrominance components of an image
within the luminance bit stream are shown. Such a concept, though, was
already present in earlier works in which the possibility of hiding the audio
component of a video within its visual component was advanced [209].
The possibility of exploiting data hiding to improve the reliability of
multimedia transmission in the presence of errors has been explored by
several researchers. For some practical examples illustrating the potentiality
of such a strategy, readers may refer to [17, 80, 188, 203].
The potentialities of annotation watermarking are still largely unexplored,
partly because research was mainly focused on security-oriented
applications, partly because for this kind of application watermarking just
represents an additional way of solving problems which could be addressed
through different technologies as well. Readers interested in this particular
kind of application may refer to [68] where a survey of possible applications
of annotation watermarks is given, and [65, 198], where the smart image
concept is illustrated.
An insightful mathematical formalization of the covert communication
problem may be found in the seminal work by C. E. Shannon [197]. For
a good survey of covert communication through digital watermarking, the
reader.

2.2 Authentication

One of the (undesired) effects of the availability of more and more effective
signal processing tools, and of their possible use to modify the visual or
audio content of digital documents without leaving any perceptible traces
of the modification, is the loss of credibility of digital data, since doubts
always exist that they have been tampered with, in a way that substantially
changes the initial data content. To overcome such a problem, it
is necessary that proper countermeasures are taken to authenticate signals
recorded in digital form, i.e. to ensure that signals have not been tampered
with (data integrity) and to prove their true origin. As explained
below, data authentication through digital watermarking is a promising
solution to both the above problems.
2.2.1 Cryptography vs watermarking
A straightforward way to authenticate a digital signal, be it a still image,
an image sequence or an audio signal, is by means of cryptography, namely
through the joint use of asymmetric-key encryption and a digital hash function.
Let us assume that the device used to produce the digital signal, e.g.
a scanner or a video camera, is assigned a public/private key pair, and that
the private key is hardwired within the acquisition device (which, of course,
should be as tamper-proof as possible). Before recording the digital signal,
the acquisition device calculates a digital summary (digest) of the signal
by means of a proper hash function. Then, it encrypts the digest with the
private key, thus obtaining a signed digest which is stored together with
the digital signal. Later on, the digest can be used to prove data integrity
or to trace back to its origin: one only needs to read the signed digest by
using the public key of the electronic device which produced the signal and
check if it corresponds to the actual signal content. For long signals, e.g.
audio or video signals, the digest should be computed on suitable signal
sub-parts, e.g. a video frame, rather than on the whole signal.
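A minimal sketch of this digest-then-sign procedure is given below, using
RSA-PSS over SHA-256 via the third-party Python cryptography package; the
library choice and all parameters are our own illustrative assumptions. In the
scenario above, the private key would be hardwired in the tamper-proof
acquisition device.

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding

    private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    public_key = private_key.public_key()
    pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                      salt_length=padding.PSS.MAX_LENGTH)

    signal = b"...raw samples of the acquired frame..."

    # at acquisition time: hash the signal and sign the digest
    signed_digest = private_key.sign(signal, pss, hashes.SHA256())

    # at verification time: verify() raises InvalidSignature on any mismatch
    public_key.verify(signed_digest, signal, pss, hashes.SHA256())
    print("signal is authentic and its origin is verified")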
Though cryptography may provide a valuable means for digital signal authentication,
the development of alternative approaches is desirable in order
to deal with some potential weaknesses of the cryptographic approach. Let
us consider, for example, the digest-based approach outlined previously.
This approach requires that the signal digest is tied to the signal itself, e.g.
by defining a proper format allowing the usage of authentication tools (see
for example the MPEG21 effort of ISO). In this way, however, the possibility
of authenticating the signal is constrained to the use of a particular
format, thus making impossible to use a different format, or to authenticate
the signal after digital-to-analog conversion. This is not the case if
authentication is achieved through digital data hiding, since the authenticating
information is embedded within the signal itself. Another drawback
with digest-based authentication is that the digest changes dramatically
as soon as any modification, be it a small or a large one, is applied to
the signal, thus making it impossible to distinguish between malicious and
innocuous modifications. Moreover, if the basic scheme outlined above is
used, cryptographic authentication does not allow a precise localization of
tampering.
Data-hiding-based authentication represents a feasible and very elegant
solution to the above problems. It must be remembered, though, that despite
all the reasons usually produced to justify the resort to data hiding
authentication with respect to conventional cryptography, the main difference
between the two approaches is the way the authenticating information
is tied to the to-be-authenticated signal. More specifically, if the data hiding
approach is adopted, no header or separate file has to be used to ensure
data integrity; in addition, digital-to-analog and analog-to-digital conversion
is allowed. Conversely, the main drawbacks of data-hiding-authentication
derive from the relative immaturity of watermarking technology with respect
to cryptography.
In the following section, we describe a general framework for
providing authentication through data hiding. Such a framework
is a very general one since it encompasses both schemes using (semi-) fragile
and robust watermarking.
2.2.2 A general authentication framework
Generally speaking, authentication of the host signal may be accomplished
either by means of (semi-)fragile or robust watermarking.
As we stated in section 1.2.3, with fragile watermarking the hidden
information is lost or altered as soon as the host signal undergoes any
modification: watermark loss or alteration is taken as evidence that data
has been tampered with, whereas the recovery of the information contained
within the data is used to demonstrate data integrity and, if needed, to trace
back to data origin. Interesting variations of the previous paradigm include
the capability to localize tampering, or to discriminate between malicious
and innocuous manipulations (e.g. moderate image compression). In the
latter case, a semi-fragile watermarking scheme has to be used, since it
is necessary that the hidden information survives only a certain kind of
allowed manipulations.
The use of robust watermarking for data authentication relies on a different
mechanism: a summary of the host signal is computed and inserted
within the signal itself by means of a robust watermark. Information about
the data origin is embedded together with the summary. To prove data integrity,
the information conveyed by the watermark is recovered and compared
with the actual content of the sequence: their mismatch is taken as
evidence of data tampering. The capability to localize manipulations
will depend on the accuracy of the embedded summary. If tampering is
so heavy that the watermark is lost, watermark absence is simply taken as
evidence that some manipulation occurred and the output of the authentication
procedure is a negative one. Note that in this case watermark
security is not a pressing requirement, since it is unlikely that someone is
interested in intentionally removing the watermark. On the contrary, pirates
would be interested in modifying the host data without leaving any
trace of the modification.
Though the approaches to data authentication relying on (semi-) fragile
and robust watermarking may seem rather different, it is possible to describe
both of them by means of the same mathematical framework. Let us
start by assuming that the watermark the authentication relies on is a blind
and readable one.
During the embedding phase, the watermark signal is generated by a
suitable watermark generation function G, taking as input a secret key Kg
and, possibly, the to-be-authenticated asset A:
w = G(A, Kg). (2.7)
The watermark signal w is then hidden within A, thus producing a
watermarked asset Aw (for the sake of simplicity we assume that w coincides
with b):
Aw = E(A, w, K), (2.8)
where the secret key K used for watermark embedding must not be confused
with the secret key Kg used to generate the watermark.
To describe the verification procedure, let us indicate by A'w a possibly
corrupted copy of Aw. In order to verify the integrity of A'w, a watermark
signal w' is computed by means of the generation function G:
w' = G(A'w, Kg). (2.9)
Then the watermark embedded within A'w is extracted, producing the watermark
signal w". Finally, the signals w' and w" are compared: if they
are equal the integrity verification procedure succeeds, otherwise it fails:
w" = D(A'w, K), (2.10)
If w' = w" Then
the Asset is authentic (2.11)
Else
the Asset has been tampered with.
Authentication algorithms allowing tampering localization infer the position
of tampering by giving w a suitable form and by looking at the
positions where w' and w" differ.
The above framework is valid both for fragile and robust watermarking.
The difference between the two approaches resides in the mechanism at the
basis of manipulation detection: while fragile techniques assume that any
manipulation modifies the embedded watermark, robust techniques assume
that the watermark is not affected by any manipulation; on the contrary,
it is the watermark generation function that, in this case, produces a watermark
that does not correspond to the embedded one. More formally, we
can say that, when a manipulation occurs, for fragile techniques we expect
that:
w' = w and w" ≠ w, hence w' ≠ w", (2.12)
that is, the generation function is not affected by manipulations, whereas
the decoding function is. Conversely, in the robust watermarking case we
expect that:
w' ≠ w and w" = w, hence w' ≠ w", (2.13)
i.e. manipulations only affect the output of the generation function G.
To introduce a certain degree of tolerance in the integrity verification
phase, e.g. to discriminate between allowed and non-allowed manipulations,
the dependence of G (in the robust watermarking case) or D (in
the fragile watermarking case) upon asset manipulations has to be relaxed.
In the fragile scheme, this leads to the use of semi-fragile watermarking,
whereas in the robust approach, this implies the design of a function G
that depends only on certain asset features. Alternatively, the possibility
of distinguishing between different types of manipulations can rely on a
clever comparison between w' and w". For instance, if w coincides with a
low resolution version of A, the comparison between w' and w" can be performed
manually, thus letting a human operator decide whether revealed
modifications are admissible or not.
As to authentication through fragile watermarking, the easiest way to
achieve the conditions expressed in equation (2.12) is to let G depend only
on Kg. In this way, in fact, the watermark signal w does not depend
on the host asset, hence it does not depend on asset manipulations as
well. In the case of robust watermarking, the most common choice for the
generation function G is to let its output correspond to a summary of the
to-be-authenticated asset. More specifically, to focus the authentication
procedure on meaningful modifications only, it is rather common to design
G so that it grasps the semantic content of A, e.g. by letting G(A, Kg)
coincide with a low resolution version of A.
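The fragile case just described can be captured by a toy Python
implementation of equations (2.7)-(2.11), where the features are the LSBs of
8-bit samples and G depends on Kg only. The function names follow the
framework notation, while the LSB embedding itself is merely an illustrative
choice, not a specific published scheme.

    import hashlib

    def G(Kg: bytes, nbits: int) -> list:
        # generation function: w depends on the key only (fragile case)
        digest = hashlib.sha256(Kg).digest()
        bits = ''.join(f'{byte:08b}' for byte in digest)
        return [int(b) for b in bits[:nbits]]

    def E(asset: bytearray, w: list) -> bytearray:
        # embedding function: write w into the LSBs of the first samples (2.8)
        marked = bytearray(asset)
        for i, bit in enumerate(w):
            marked[i] = (marked[i] & 0xFE) | bit
        return marked

    def D(asset_w: bytearray, nbits: int) -> list:
        # decoding function: read the LSBs back (2.10)
        return [asset_w[i] & 1 for i in range(nbits)]

    A, Kg = bytearray(range(64)), b'secret generation key'
    Aw = E(A, G(Kg, 32))                                         # (2.7)-(2.8)
    print("authentic" if G(Kg, 32) == D(Aw, 32) else "tampered")  # authentic

    Aw[3] ^= 1                                                   # flip one LSB
    print("authentic" if G(Kg, 32) == D(Aw, 32) else "tampered")  # tampered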
The authentication framework described above applies to readable watermarking,
however its extension to detectable watermarking is straightforward.
It only needs to replace equations (2.10) and (2.11) with the
following authenticity check:
If D(A'w, w', K) = yes Then
the Asset is authentic (2.14)
Else
the Asset has been tampered with,
where w' is still computed as in equation (2.9). As for readable watermarking,
the possibility of distinguishing between allowed and non-allowed
manipulations resides in the sensitivity of G or D to asset manipulations.
2.2.3 Requirements of data-hiding-based authentication
As we already noted, it is impossible to define an exact list of requirements
a data hiding algorithm must fulfill without taking into account the
application scenario; nevertheless, when restricting the analysis to data
authentication, the following general considerations hold:
• Blindness: of course, if the original asset A is available, checking the
integrity of a copy of A is a trivial task, since one only needs to compare
the copy with A. As to data origin, when disentangled from integrity
verification, it can be treated by the same standard as annotation
watermarks (see section 2.4).
• Readability/detectability: by following the discussion carried out so far,
it can be concluded that no particular preference can be given to readable
or detectable watermarking with respect to integrity verification.
• Robustness: data authentication can be achieved both by means of
fragile and robust watermarking. Moreover, both approaches permit,
at least in principle, to discriminate between different classes of
manipulations. Trying to summarize the pros and cons of the two
methods, we can say that with (semi-)fragile techniques it is more difficult
to distinguish between malicious and innocuous modifications,
whereas the robust watermarking approach seems more promising,
since the final judgement on tampering usually relies on a visual comparison
between the asset summary conveyed by the watermark and
the to-be-authenticated copy. Conversely, the need of ensuring a high
watermark capacity without losing robustness is the Achilles' heel
of robust techniques; the need for a high capacity deriving from the
large number of bits needed to produce a meaningful asset summary.
• Imperceptibility: due to the particular nature of the authentication
task, it is usually necessary that watermark imperceptibility is guaranteed.
Nevertheless, some applications may exist in which a slightly
perceptible watermark is allowed. This is the case, for example, of
Video Surveillance (VS) data authentication, where authentication is
needed to keep the legal value of VS data intact: in most cases, it
is only necessary that the hidden information does not disturb the
correct behavior of the automatic visual inspection process the VS
system relies on (see section 5.5.1 for more details).

Applications

2.1 IPR protection
The protection of the rights possessed by the creator, or the legitimate
owner, of a multimedia piece of work encompasses many different aspects
including copyright protection and moral rights protection, e.g. the insurance
that the integrity of the work is respected not to violate the moral
beliefs of the owner/creator. In the sequel we will refer globally to such
rights as Intellectually Property Rights (IPR), even if, rigorously speaking,
IPR protection should consider topics such as patents and trademarks as
well. Due to the wide variety of situations encountered in practical applications,
to the large number of objectives possibly pursued by an IPR
protection system, and to the different legislations holding in different countries,
it is impossible (and beyond the scope of this book) to give a unified
treatment of watermarking-based IPR protection. We then present here
only the major tasks watermarking may be used for and the corresponding
watermarking paradigms, by keeping in mind that any practical IPR protection
system will need to address these tasks, and a considerable number
of other security and economic issues all together. (In general, the design
of an IPR protection system goes through the definition of a Business Model
(BM) describing the way electronic transactions are performed, an Electronic
Copyright Management System (ECMS) defining how IPRs are handled within
the BM, and the specification of how the BM and the ECMS are implemented
in practice, e.g. through digital watermarking or cryptography.)
2.1.1 Demonstration of rightful ownership
This is the most classical scenario served by watermarking: the author of a
work wishes to prove that he/she is the only legitimate owner of the work.
To do so, as soon as he/she creates the work, he/she also embeds within it a
watermark identifying him/her unambiguously. Unfortunately, this simple
scheme can not provide a valid proof in front of a court of law, unless the
non-invertibility (non-quasi-invertibility) of the watermarking algorithm is
demonstrated (see section 1.2.7). Nevertheless, the watermark may still
be used by the rightful owner for his/her own purposes. For example, the
author may wish to detect suspicious products existing in the distribution
network. Such products could be individuated by an automated search
engine looking for the watermark presence within all the works accessible
through the network. Then, the author may rely on more secure mechanisms
to prove that he/she was the victim of a fraud, e.g. by depositing
any new creation with a registration authority.
A common way to confer a legal value on the watermark verification
procedure is to introduce the presence of a Trusted Third Party (TTP) in
the watermarking protocol. For example, the watermark identifying the
author may be assigned to him/her by a trusted registration authority, thus
preventing the possibility of using the SWICO attack to fool the ownership
verification procedure. In this way, in fact, it would be by far more difficult
to invert the watermarking operation, especially when blind watermarking
is used, since pirates can not rely on the design of an ad hoc fake original
work.
As to the requirements a watermarking algorithm to be used for rightful
ownership verification must satisfy, it is obvious that for any scheme to
work, the watermark must be a secure one, given that pirates are obviously
interested in removing the watermark, possibly by means of computationally
intensive procedures. In addition, private watermarking is preferable,
due to its inherently superior security. Finally, capacity requirements depend
on the number of different author identification codes the system must
accommodate.
2.1.2 Fingerprinting
A second classical application of digital watermarking is copy protection.
Two scenarios are possible here; according to the first one, a mechanism
is envisaged to make it impossible, or at least very difficult, to make illegal
copies of a protected work (see section 2.1.3 for a discussion on copy
control mechanisms). In the second scenario, a so called copy deterrence
mechanism is adopted to discourage unauthorized duplication and distribution.
Copy deterrence is usually achieved by providing a mechanism to
trace unauthorized copies to the original owner of the work. In the most
common case, distribution tracing is made possible by letting the seller
(owner) insert a distinct watermark, which in this case is called a fingerprint,
identifying the buyer, or any other addressee of the work, within
any copy of the data which is distributed. If, later on, an unauthorized copy of
the protected work is found, then its origin can be recovered by retrieving
the unique watermark contained in it.
(Figure 2.1: to take the buyer's rights into account, it is necessary to avoid
the situation depicted in the figure, where several copies of the host asset
containing the identification code of buyer B1 are distributed to other
purchasers.)
Of course, the watermark must be secure, to prevent any attempt to
remove it, and readable, to make its extraction easier. Note that the readability
requirement may be relaxed if the owner has the possibility to guess
in advance the watermark content.
A problem with the plain fingerprinting protocol described above is
that it does not take into account the buyer's rights, since the watermark is
inserted solely by the seller. Thus, a buyer whose watermark is found
in an unauthorized copy can not be inculpated, since he/she can claim
that the unauthorized copy was created and distributed by the seller. The
possibility exists, in fact, that the seller is interested in fooling the buyer.
Let us consider, for example, the situation depicted in figure 2.1, where
the seller is not the original owner of the work, but an authorized reselling
agent. The seller may distribute many copies of a work containing the
fingerprint of buyer B1 without paying the due royalties to the author, and
claim that such copies were illegally distributed or sold by B1.
As in the case of rightful ownership demonstration, a possible solution
consists in resorting to a trusted third party. The simplest way to exploit
the presence of a TTP to confer a legal value on the fingerprint protocol is
to let the TTP insert the watermark within the to-be-protected work, and
retrieve it in case a dispute resolution protocol has to be run. Despite its
simplicity, such an approach is not feasible in practical applications, mainly
because the TTP must do too much work, and may easily become the
bottleneck of the whole system. In addition, the protected work must be
transmitted from the seller to the TTP and from the TTP to the customer,
or, in an even worse case, from the TTP to the seller and from the seller to
the customer, thus generating a very heavy traffic on the communication
channel.
An ingenious way to avoid the above difficulties and still ensure that
buyer's rights are respected relies on the joint exploitation of watermarking
and cryptography, as suggested by the Interactive Buyer-Seller (IBS)
protocol. Even in this case, the presence of a TTP is envisaged; however,
the TTP's role is minimized, thus making the IBS protocol more suited
to practical applications. Data exchange is kept to a minimum as well,
resulting in a very low communication overhead. The basic idea the IBS
protocol relies on is that attention is paid not to let the seller get to know
the exact watermarked copy received by the buyer, hence he/she can not
distribute or sell copies of the original work containing the buyer's identification
watermark. In spite of this, the seller can identify the buyer
from whom unauthorized copies originated, and prove it by using a dispute
resolution protocol. The same protocol can be used by the buyer to
demonstrate his/her innocence. In order to exemplify the IBS protocol, let
Alice be the author of the work and Bob the buyer. We assume that Alice
and Bob possess a pair of public/private keys, denoted by KA, KB (public
keys) and K'A, K'B (private keys). Let the encryption of a message with a
key K be indicated by EK. After sending a proof of his identity,
Bob requests the TTP to send him a valid watermark w (once again we
assume that w coincides with b). The TTP checks Bob's credentials and
generates the watermark w. It then sends back to Bob w encrypted with
Bob's public key:
EKB(w) = {EKB(w1), EKB(w2), ..., EKB(wn)}, (2.1)
along with a signature of EKB(w), STTP(EKB(w)). For example,
STTP(EKB(w)) = EK'TTP(H(EKB(w))), (2.2)
where H is a proper hash function. Note that we assumed that the watermark
components wi are encrypted independently by using the same
encryption key.
As a second step, Bob sends Alice EKB(w) and STTP(EKB(w)), so
that Alice can verify that EKB(w) is a valid encrypted watermark. Let
A be the digital asset Bob wants to buy. Before sending A to Bob, Alice
inserts within it two distinct watermarks. For the first watermark v,
which conveys a distinct ID univocally identifying the buyer, Alice can use
the watermarking scheme she prefers, since such a watermark is used by
Alice only to identify potentially deceitful customers through plain fingerprinting.
The second watermark is built by relying on EKB(w). As for EK,
we require that the watermarking scheme acts on each host feature
independently, that is we require that:
fAw = {f1 ⊕ w1, f2 ⊕ w2, ..., fn ⊕ wn}, (2.3)
where fA = {f1, f2, ..., fn} represents the set of non-marked host features.
As a second requirement, we ask that the cryptosystem used by the IBS
protocol is a privacy homomorphism with respect to ⊕, that is:
EK(x ⊕ y) = EK(x) ⊕ EK(y), (2.4)
where x and y are any two messages. Strange as it may seem, the privacy
homomorphism requirement is not difficult to satisfy. For instance, it is
known that the popular RSA cryptosystem is a privacy homomorphism
with respect to multiplication.
To insert the second watermark within A, Alice performs the following
steps. First she permutes the encrypted watermark components through a
secret permutation σ:
σ(EKB(w)) = EKB(σ(w)), (2.5)
where the equality immediately follows from equation (2.1). Then she
inserts EKB(σ(w)) within A directly in the encrypted domain. This is
possible due to (2.4) and because Alice knows Bob's public key. Stated
in another way, Alice sends to Bob an encrypted version of Av containing σ(w):
EKB(Av,σ(w)) = EKB(Av) ⊕ EKB(σ(w)). (2.6)
It is worth stressing again that in order to produce EKB(Av,σ(w)), Alice
does not need to access the plain watermark w, since watermark casting is
performed in the encrypted domain.
When Bob receives EKB(Av,σ(w)), he decrypts it by using his private
key K'B, thus obtaining Av,σ(w). Note that Bob can not read the watermark
σ(w), thus it is not necessary to ensure the non-reversibility of the
watermarking scheme.
In order to recover the identity of potential copyright violators, Alice
first looks for the presence of v. Upon detection of an illegal copy of A,
say A', she can use the second watermark to effectively prove that such
a copy originated from Bob. To do so, Alice must reveal to a judge the
permutation σ, the encrypted watermark EKB(w), and STTP(EKB(w)).
After verifying STTP(EKB(w)), the judge asks Bob to reveal his private
key K'B to calculate w (actually it is not necessary that Bob reveals K'B:
it is only necessary that he reveals w, whose validity can be verified by
applying KB to it and checking whether the result equals EKB(w)). Now it is
possible to check A' for the presence of σ(w): if it is found, then Bob is
judged guilty, otherwise Bob's innocence is proven. Note
that if σ(w) is found in A', Bob can not maintain that A' originated from
Alice, since to do so Alice should have known either w, to insert it within
the plain asset A, or K'B, to decrypt EKB(Av,σ(w)) after embedding
the watermark in the encrypted domain.
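The multiplicative privacy homomorphism of RSA mentioned above can be
verified directly. The following sketch uses textbook RSA with toy parameters
and no padding (padded RSA, as deployed in practice, deliberately destroys
this property); it requires Python 3.8+ for the modular inverse.

    p, q = 61, 53
    n, e = p * q, 17                     # toy RSA modulus and public exponent
    d = pow(e, -1, (p - 1) * (q - 1))    # private exponent

    def encrypt(m): return pow(m, e, n)  # EK(m) = m^e mod n
    def decrypt(c): return pow(c, d, n)

    # equation (2.4) with (+) instantiated as modular multiplication:
    x, y = 7, 12
    assert (encrypt(x) * encrypt(y)) % n == encrypt((x * y) % n)

    # Alice can thus mark an encrypted feature without seeing it in the clear:
    f_i, w_i = 42, 3                     # host feature, watermark component
    marked = (encrypt(f_i) * encrypt(w_i)) % n
    assert decrypt(marked) == (f_i * w_i) % n
    print("privacy homomorphism verified")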
The protocols described in this and in the previous section are just two
examples of how illegal copy deterrence can be achieved by relying on watermarking
technology. With regard to the effective value of such mechanisms
as proofs in front of a judge, it must be said that the current state-of-the-art
allows this possibility only if a watermarking/certification authority acting
as a TTP is included within the copyright protection protocol. Nevertheless,
it is important to stress that, even if plain fingerprinting may not
be considered a proof from a legislative point of view, it may be useful in
several situations. For instance, the seller may use it to identify potentially
deceitful customers and break off any further business with them.
2.1.3 Copy control
When copy deterrence is not sufficient to effectively protect legitimate rightholders,
a true copy protection mechanism must be envisaged. Having said
that a comprehensive solution to the copy protection problem goes well
beyond watermarking technology, we describe a mechanism which has been
considered for protection of DVD video. This scenario, in fact, represents
a good example of how watermarking can be integrated in a complex copy
protection system and effectively contribute to its efficacy.
The DVD copy protection system outlined below is the result of the
efforts of many important companies, including IBM, NEC, Sony, Hitachi,
Pioneer, Signafy, Philips, Macrovision and Digimarc. Though the systems
proposed by various companies differ with respect to many important issues
such as, for example, the choice of the underlying watermarking technology,
the overall protection scheme and the role of watermarking within it are
very similar, thus allowing us to briefly describe them without delving into
implementation details.
The mechanism employed to make illegal duplication and distribution
difficult enough to keep the losses caused by missed revenues sustainable relies
on the distinction between copyright compliant devices (CC-devices) and
non compliant devices (NC-devices). In particular, the DVD copy protection
system is designed in such a way that the CC world and the NC world
are kept as distinct as possible, for example, by allowing NC devices to
play only illegal disks and CC devices to play only legal disks. In this way,
users willing to draw from both worlds must buy two series of devices,
one for legal and one for illegal disks, in the hope that this will prevent
massive, unauthorized copying, as happened in the case of audio.
A first important feature of a protected DVD is that its content is scrambled
through a Content Scrambling System (CSS). Descrambling requires
a pair of keys, one of which is unique to the video file, while the other is
unique to the DVD. Keys are stored on the lead-in area of the DVD, an area
that is only read by CC devices. The use of CSS results in the situation
depicted in figure 2.2: a protected DVD can only be played and recorded in
the CC world. It is not possible, in fact, to connect the output of a CC player
to an NC recorder, since CC devices are not allowed to dialog with
NC-devices. On the other side, recording through CC devices is governed
by a Copy Generation Management System (CGMS) which allows copying
only if this is permitted for that particular disk. Simply speaking, CGMS
relies on two bits stored in the header of an MPEG stream, encoding one
of the following three indications: copy-freely, copy-never and copy-once,
where the result of the copy-once indication is that the video can be copied
but after copying, the CGMS bits are changed to copy-never.
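The copy-once logic can be summarized by a small state machine; the
following sketch is illustrative and does not reproduce the actual two-bit
encoding of the three states.

    from enum import Enum, auto

    class CGMS(Enum):
        COPY_FREELY = auto()
        COPY_ONCE = auto()
        COPY_NEVER = auto()

    def make_copy(state: CGMS):
        # returns the CGMS state written into the copy, or None if refused
        if state is CGMS.COPY_FREELY:
            return CGMS.COPY_FREELY
        if state is CGMS.COPY_ONCE:
            return CGMS.COPY_NEVER    # one generation allowed, then locked
        return None                   # copy-never: the CC recorder refuses

    print(make_copy(CGMS.COPY_ONCE))  # -> CGMS.COPY_NEVER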
CSS and CGMS prevent the flow from the legal world toward the NC
world; nevertheless, in order to discourage illegal copying, the reverse must
also be true, i.e., it should not be possible to use a CC device to play or
record an illegal disk. Otherwise the whole protection mechanism would
only succeed in stimulating the diffusion of CC devices. To this aim, the
sole CSS is not sufficient. Consider, for example, the case of a pirate using
the analog RGB output of a compliant player to make an unencrypted copy of the
video by means of a NC recorder. Such a copy can be played, and recorded,
on CC devices as well, since they would mistake the illegal video for a free
video without protection. This is because both scrambling and CGMS bits
are no longer present. Data hiding can help solve this problem: it suffices
that CGMS bits are embedded within the video in the form of a secure
watermark. It is obvious that the presence of the CGMS bits prevents video
recording on a CC recorder, since, upon reading the CGMS bits, the CC
device refuses to copy the video if their indications do not allow it.
At the same time, CC players can be designed so as to recognize as illegal a
DVD copy without CSS, yet containing the CGMS watermark, and refuse
to play it. A summary of the effect of embedding CGMS bits within DVD
video by means of digital watermarking is given in figure 2.3. As desired,
the worlds of CC- and NC-devices are kept separate, since illegal disks can
only be managed by NC devices and legal disks by CC devices.
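Putting the pieces together, the playback decision of a compliant player can
be sketched as follows; the field and function names are illustrative
assumptions, not part of any DVD specification.

    from dataclasses import dataclass

    @dataclass
    class Disk:
        has_css: bool           # content scrambled, keys in the lead-in area
        cgms_watermark: bool    # CGMS bits recovered as a robust watermark

    def cc_player_plays(disk: Disk) -> bool:
        if disk.has_css:
            return True         # legally produced, scrambled disk
        if disk.cgms_watermark:
            return False        # descrambled copy of protected video: refuse
        return True             # genuinely free content

    print(cc_player_plays(Disk(has_css=False, cgms_watermark=True)))  # False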