Wednesday, December 19, 2007

Authentication

One of the (undesired) effects of the availability of more and more effective
signal processing tools, and of their possible use to modify the visual or
audio content of digital documents without leaving any perceptible traces
of the modification, is the loss of credibility of digital data, since doubts
always exist that they have been tampered with, in a way that substantially
changes the initial data content. To overcome such a problem, it
is necessary that proper countermeasures are taken to authenticate signals
recorded in digital form, i.e. to ensure that signals have not been tampered
with (data integrity) and to prove their true origin. As explained
below, data authentication through digital watermarking is a promising
solution to both the above problems.
2.2.1 Cryptography vs watermarking
A straightforward way to authenticate a digital signal, be it a still image,
an image sequence or an audio signal, is by means of cryptography, namely
through the joint use of asymmetric-key encryption and a digital hash function.
Let us assume that the device used to produce the digital signal, e.g.
a scanner or a video camera, is assigned a public/private key pair, and that
the private key is hardwired within the acquisition device (which, of course,
should be as tamper-proof as possible). Before recording the digital signal,
the acquisition device calculates a digital summary (digest) of the signal
by means of a proper hash function. Then, it encrypts the digest with the
private key, thus obtaining a signed digest which is stored together with
the digital signal. Later on, the digest can be used to prove data integrity
or to trace back to its origin: one only needs to read the signed digest by
using the public key of the electronic device which produced the signal and
check if it corresponds to the actual signal content. For long signals, e.g.
audio or video signals, the digest should be computed on suitable signal
sub-parts, e.g. a video frame, rather than on the whole signal.
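
To make the scheme concrete, here is a minimal Python sketch using the pyca/cryptography package: SHA-256 as the hash function and an RSA signature over the digest. The function names, key size and the single-frame placeholder signal are illustrative assumptions, not details taken from the text above.

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Key pair assigned to the acquisition device; in practice the private
# key would be hardwired in a tamper-proof module inside the device.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

PSS = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                  salt_length=padding.PSS.MAX_LENGTH)

def sign_digest(signal: bytes) -> bytes:
    # Hash the recorded signal and sign the digest with the private key,
    # obtaining the "signed digest" stored together with the signal.
    return private_key.sign(signal, PSS, hashes.SHA256())

def verify(signal: bytes, signed_digest: bytes) -> bool:
    # Anyone holding the device's public key can check integrity/origin.
    try:
        public_key.verify(signed_digest, signal, PSS, hashes.SHA256())
        return True
    except InvalidSignature:
        return False

frame = b"raw pixel data of one video frame"   # placeholder signal
sig = sign_digest(frame)
assert verify(frame, sig)                      # untouched frame: accepted
assert not verify(frame + b"!", sig)           # any change: rejected

In a real device the private key would never leave the tamper-proof hardware; the sketch keeps both keys in one process only for brevity.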
Though cryptography may provide a valuable means for digital signal authentication,
the development of alternative approaches is desirable in order
to deal with some potential weaknesses of the cryptographic approach. Let
us consider, for example, the digest-based approach outlined previously.
This approach requires that the signal digest is tied to the signal itself, e.g.
by defining a proper format allowing the usage of authentication tools (see
for example the MPEG-21 effort of ISO). In this way, however, the possibility
of authenticating the signal is constrained to the use of a particular
format, thus making it impossible to use a different format or to authenticate
the signal after digital-to-analog conversion. This is not the case if
authentication is achieved through digital data hiding, since the authenticating
information is embedded within the signal itself. Another drawback
with digest-based authentication is that the digest changes dramatically
as soon as any modification, be it a small or a large one, is applied to
the signal, thus making it impossible to distinguish between malicious and
innocuous modifications. Moreover, if the basic scheme outlined above is
used, cryptographic authentication does not allow a precise localization of
tampering.
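
The avalanche behavior of cryptographic hash functions makes this last point concrete: flipping a single, perceptually irrelevant bit of the signal produces a completely unrelated digest, as a quick check with Python's standard hashlib shows (the toy byte string stands in for a real signal):

import hashlib

frame = bytes(range(256))                       # a toy "signal"
modified = bytes([frame[0] ^ 1]) + frame[1:]    # flip a single LSB

# The two digests share no structure, so the verifier can neither tell
# a malicious edit from an innocuous one, nor localize the change.
print(hashlib.sha256(frame).hexdigest())
print(hashlib.sha256(modified).hexdigest())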
Data-hiding-based authentication represents a feasible and very elegant
solution to the above problems. It must be remembered, though, that despite
all the reasons usually put forward to justify resorting to data-hiding
authentication instead of conventional cryptography, the main difference
between the two approaches is the way the authenticating information
is tied to the to-be-authenticated signal. More specifically, if the data-hiding
approach is adopted, no header or separate file has to be used to ensure
data integrity; in addition, digital-to-analog and analog-to-digital conversions
are allowed. Conversely, the main drawbacks of data-hiding authentication
derive from the relative immaturity of watermarking technology with respect
to cryptography.
In the following section, we describe a general framework for providing
authentication through data hiding. Such a framework is very general,
since it encompasses both schemes using (semi-)fragile and robust
watermarking.
2.2.2 A general authentication framework
Generally speaking, authentication of the host signal may be accomplished
either by means of (semi-)fragile or robust watermarking.
As we stated in section 1.2.3, with fragile watermarking the hidden
information is lost or altered as soon as the host signal undergoes any
modification: watermark loss or alteration is taken as evidence that the data
has been tampered with, whereas the recovery of the information contained
within the data is used to demonstrate data integrity and, if needed, to trace
back to the data origin. Interesting variations of the previous paradigm include
the capability to localize tampering, or to discriminate between malicious
and innocuous manipulations (e.g. moderate image compression). In the
latter case, a semi-fragile watermarking scheme has to be used, since it
is necessary that the hidden information survives only a certain kind of
allowed manipulations.
The use of robust watermarking for data authentication relies on a different
mechanism: a summary of the host signal is computed and inserted
within the signal itself by means of a robust watermark. Information about
the data origin is embedded together with the summary. To prove data integrity,
the information conveyed by the watermark is recovered and compared
with the actual content of the signal: a mismatch is taken as
evidence of data tampering. The capability to localize manipulations
will depend on the accuracy of the embedded summary. If tampering is
so heavy that the watermark is lost, watermark absence is simply taken as
evidence that some manipulation occurred, and the output of the authentication
procedure is negative. Note that in this case watermark
security is not a pressing requirement, since it is unlikely that someone is
interested in intentionally removing the watermark. On the contrary, pirates
would be interested in modifying the host data without leaving any
trace of the modification.
Though the approaches to data authentication relying on (semi-) fragile
and robust watermarking may seem rather different, it is possible to describe
both of them by means of the same mathematical framework. Let us
start by assuming that the watermark on which authentication relies is a blind
and readable one.
During the embedding phase, the watermark signal is generated by a
suitable watermark generation function Q, taking as input a secret key Kg
and, possibly, the to-be-authenticated asset A.
w = Q(A, Kg).    (2.7)
The watermark signal w is then hidden within A, thus producing a
watermarked asset Aw (for the sake of simplicity we assume that w coincides
with b; E denotes the embedding function):

Aw = E(A, w, K),    (2.8)
where the secret key K used for watermark embedding must not be confused
with the secret key Kg used to generate the watermark.
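
As an illustration, the following minimal Python/NumPy sketch instantiates equations (2.7) and (2.8) for a toy fragile scheme. The specific choices, a watermark derived from Kg alone and LSB embedding along a key-dependent pixel permutation, are assumptions made for the example, not requirements of the framework:

import hashlib
import numpy as np

def _perm(key: bytes, n: int) -> np.ndarray:
    # Key-dependent permutation of the n pixel positions.
    seed = int.from_bytes(hashlib.sha256(key).digest()[:8], "big")
    return np.random.default_rng(seed).permutation(n)

def Q(A: np.ndarray, Kg: bytes) -> np.ndarray:
    # Watermark generation, eq. (2.7); in this fragile example w is a
    # keyed pseudo-random bit pattern depending on Kg only.
    seed = int.from_bytes(hashlib.sha256(Kg).digest()[:8], "big")
    return np.random.default_rng(seed).integers(0, 2, size=A.size, dtype=np.uint8)

def E(A: np.ndarray, w: np.ndarray, K: bytes) -> np.ndarray:
    # Embedding, eq. (2.8): write the bits of w into the pixel LSBs,
    # visiting the pixels in an order determined by the embedding key K.
    Aw = A.flatten().copy()
    order = _perm(K, A.size)
    Aw[order] = (Aw[order] & 0xFE) | w
    return Aw.reshape(A.shape)

A = np.random.default_rng(0).integers(0, 256, size=(8, 8), dtype=np.uint8)
Aw = E(A, Q(A, Kg=b"generation-key"), K=b"embedding-key")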
To describe the verification procedure, let us indicate by A'w a possibly
corrupted copy of Aw. In order to verify the integrity of A'w, a watermark
signal w' is computed by means of the generation function Q:

w' = Q(A'w, Kg).    (2.9)
Then the watermark embedded within A'w is extracted by means of the decoding
function D, producing the watermark signal w''. Finally, the signals w' and
w'' are compared: if they are equal the integrity verification procedure
succeeds, otherwise it fails:

w'' = D(A'w, K),    (2.10)

If w' = w'' Then
the Asset is authentic
Else    (2.11)
the Asset has been tampered with.
Authentication algorithms allowing tampering localization infer the position
of tampering by giving w a suitable form and by looking at the
positions where w' and w'' differ.
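
Continuing the sketch, the verification side, equations (2.9) through (2.11), together with the pixel-level tamper localization just mentioned (reusing _perm and Q from the previous snippet):

def D(Aw: np.ndarray, K: bytes) -> np.ndarray:
    # Decoding, eq. (2.10): read the LSBs back in the key-dependent order.
    return (Aw.flatten()[_perm(K, Aw.size)] & 1).astype(np.uint8)

def authenticate(Aw_prime: np.ndarray, Kg: bytes, K: bytes):
    w_prime = Q(Aw_prime, Kg)                            # eq. (2.9)
    w_second = D(Aw_prime, K)                            # eq. (2.10)
    loc = np.zeros(Aw_prime.size, dtype=bool)
    loc[_perm(K, Aw_prime.size)] = w_prime != w_second   # back to pixel positions
    return not loc.any(), loc.reshape(Aw_prime.shape)    # eq. (2.11) + localization

ok, _ = authenticate(Aw, Kg=b"generation-key", K=b"embedding-key")
print(ok)                                  # True: the untouched copy is authentic
tampered = Aw.copy()
tampered[3, 3] ^= 1                        # flip a single least significant bit
ok, where = authenticate(tampered, Kg=b"generation-key", K=b"embedding-key")
print(ok, np.argwhere(where))              # False, [[3 3]]: tampering localized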
The above framework is valid both for fragile and robust watermarking.
The difference between the two approaches resides in the mechanism at the
basis of manipulation detection: while fragile techniques assume that any
manipulation modifies the embedded watermark, robust techniques assume
that the watermark is not affected by manipulations; on the contrary,
it is the watermark generation function that, in this case, produces a watermark
that does not correspond to the embedded one. More formally, we
can say that, when a manipulation occurs, for fragile techniques we expect
that:

w' = w  and  w'' ≠ w,  hence  w' ≠ w'',    (2.12)

that is, the generation function is not affected by manipulations, whereas
the decoding function is. Conversely, in the robust watermarking case we
expect that:

w' ≠ w  and  w'' = w,  hence  w' ≠ w'',    (2.13)

i.e. manipulations only affect the output of the generation function Q.
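
With the toy fragile scheme sketched above, condition (2.12) can be checked directly, reusing Q, D, Aw and tampered from the previous snippets:

# Fragile case, eq. (2.12): generation ignores the manipulation,
# decoding does not.
w = Q(Aw, Kg=b"generation-key")
print(np.array_equal(w, Q(tampered, Kg=b"generation-key")))   # True:  w' = w
print(np.array_equal(w, D(tampered, K=b"embedding-key")))     # False: w'' != w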
To introduce a certain degree of tolerance in the integrity verification
phase, e.g. to discriminate between allowed and non-allowed manipulations,
the dependence of Q (in the robust watermarking case) or D (in
the fragile watermarking case) upon asset manipulations has to be relaxed.
In the fragile scheme, this leads to the use of semi-fragile watermarking,
whereas in the robust approach, this implies the design of a function Q
that depends only on certain asset features. Alternatively, the possibility
of distinguishing between different types of manipulations can rely on a
clever comparison between w' and w''. For instance, if w coincides with a
low resolution version of A, the comparison between w' and w'' can be performed
manually, thus letting a human operator decide whether revealed
modifications are admissible or not.
As to authentication through fragile watermarking, the easiest way to
achieve the conditions expressed in equation (2.12) is to let Q depend only
on Kg. In this way, in fact, the watermark signal w does not depend
on the host asset, hence it does not depend on asset manipulations
either. In the case of robust watermarking, the most common choice for the
generation function Q is to let its output correspond to a summary of the
to-be-authenticated asset. More specifically, to focus the authentication
procedure on meaningful modifications only, it is rather common to design
Q so that it grasps the semantic content of A, e.g. by letting Q(A, Kg)
coincide with a low resolution version of A.
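
A possible sketch of this choice is a generation function returning a block-mean thumbnail of A as its summary, paired with a tolerant comparison of w' and w''. The 8x8 block size and the tolerance threshold are illustrative assumptions, and the robust embedding of such a summary is not shown here:

import numpy as np

def Q_summary(A: np.ndarray, Kg: bytes, block: int = 8) -> np.ndarray:
    # Summary-based generation: a low-resolution version of A obtained
    # by block averaging; Kg could additionally key a scrambling step.
    h, w = A.shape
    crop = A[:h - h % block, :w - w % block].astype(np.float64)
    thumb = crop.reshape(h // block, block, w // block, block).mean(axis=(1, 3))
    return thumb.astype(np.uint8)

def allowed(w1: np.ndarray, w2: np.ndarray, tol: float = 4.0) -> bool:
    # Tolerant comparison of w' and w'': small summary deviations (e.g.
    # moderate compression) pass, large ones are flagged as tampering.
    return float(np.mean(np.abs(w1.astype(float) - w2.astype(float)))) <= tol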
The authentication framework described above applies to readable watermarking;
however, its extension to detectable watermarking is straightforward.
One only needs to replace equations (2.10) and (2.11) with the
following authenticity check:

If D(A'w, w', K) = yes Then
the Asset is authentic
Else    (2.14)
the Asset has been tampered with,

where w' is still computed as in equation (2.9). As for readable watermarking,
the possibility of distinguishing between allowed and non-allowed
manipulations resides in the sensitivity of Q or D to asset manipulations.
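
For the detectable case, here is a minimal sketch of the yes/no check of equation (2.14), assuming, purely for illustration, an additive watermark of strength ALPHA and a blind correlation detector playing the role of D:

import hashlib
import numpy as np

ALPHA = 4.0    # embedding strength, an illustrative assumption

def pattern(shape, Kg: bytes) -> np.ndarray:
    # w' as a keyed +/-1 pattern generated from Kg, as in eq. (2.9).
    seed = int.from_bytes(hashlib.sha256(Kg).digest()[:8], "big")
    return np.random.default_rng(seed).choice([-1.0, 1.0], size=shape)

def embed(A: np.ndarray, w: np.ndarray) -> np.ndarray:
    return A.astype(np.float64) + ALPHA * w   # additive embedding

def detect(Aw_prime: np.ndarray, w_prime: np.ndarray) -> bool:
    # D(A'w, w', K) = yes iff the correlation exceeds a threshold; for
    # large assets the host contribution to rho averages out near zero.
    rho = float(np.mean(Aw_prime * w_prime))
    return rho > ALPHA / 2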
2.2.3 Requirements of data-hiding-based authentication
As we already noted, it is impossible to define an exact list of requirements
a data hiding algorithm must fulfill without taking into account the
application scenario; nevertheless, when restricting the analysis to data
authentication, the following general considerations hold:
• Blindness: of course, if the original asset A is available, checking the
integrity of a copy of A is a trivial task, since one only needs to compare
the copy with A. As to data origin, when disentangled from integrity
verification, it can be treated in the same way as annotation
watermarks (see section 2.4).
• Readability/detectability: by following the discussion carried out so far,
it can be concluded that no particular preference can be given to readable
or detectable watermarking with respect to integrity verification.
• Robustness: data authentication can be achieved both by means of
fragile and robust watermarking. Moreover, both approaches permit,
at least in principle, discriminating between different classes of
manipulations. Trying to summarize the pros and cons of the two
methods, we can say that with (semi-)fragile techniques it is more difficult
to distinguish between malicious and innocuous modifications,
whereas the robust watermarking approach seems more promising,
since the final judgement on tampering usually relies on a visual comparison
between the asset summary conveyed by the watermark and
the to-be-authenticated copy. Conversely, the need to ensure a high
watermark capacity without losing robustness is the Achilles' heel
of robust techniques; the need for high capacity derives from the
large number of bits required to produce a meaningful asset summary.
• Imperceptibility: due to the particular nature of the authentication
task, it is usually necessary that watermark imperceptibility is guaranteed.
Nevertheless, some applications may exist in which a slightly
perceptible watermark is allowed. This is the case, for example, of
Video Surveillance (VS) data authentication, where authentication is
needed to keep the legal value of VS data intact: in most cases, it
is only necessary that the hidden information does not disturb the
correct behavior of the automatic visual inspection process the VS
system relies on (see section 5.5.1 for more details).
