Digital Compositing
Ron M. Brinkmann
Table of Contents
1. Introduction
2. Definitions: What is a Composite?
3. Image Formats and Data Storage
4. Image Manipulation
5. Mattes
6. Image Tracking and Stabilizing
7. Dealing with Non-Square Pixels
8. Efficiency
9. Preparation of Elements
10. Element Repair
11. The Final Touches
Appendix: Common Image File Formats
Bibliography
Introduction: Digital Compositing
1.1 Introduction
Digital compositing, as we will be discussing it, deals primarily with the process of integrating images from
multiple sources into a single, seamless whole. While many of these techniques apply to still images, we will be
looking at tools and methods that are useful and practical for long sequences of images as well.
In the first half of this document, we will deal more with the Science of Digital Compositing. The second half will
deal with a few of the more complex (or at least misunderstood) issues on the topic, as well as look at techniques
that deal with the Art of Digital Compositing.
As you will see, the skills of a good compositor range from those of a technician to those of an artist. Not only
does one need to understand the basic 'tools of the trade', which can include a variety of software, but one must
also be aware of the visual nature of the process.
Remember, the bottom line is that all the technical considerations are unimportant when confronted with the
question "Does it look right?" Obviously this is a subjective judgment, and a good compositor who is able to make
these decisions will always be in high demand.
Definitions
2.1 What is a Composite?
Here's a basic definition:
Composite: The manipulated combination of at least two source
images to produce an integrated result.
By far, the most difficult part of this process is producing the integrated result: an image which doesn't betray
the fact that it was created from multiple source elements.
In particular, we are usually attempting to produce (sequences of) images which could have been believably
photographed without the use of any post-processing. Colloquially, it should look 'real'. Even if the elements in
the scene are obviously not real, one must be able to believe that everything in the scene was photographed at
the same time, by the same camera.
We will be discussing the manipulations needed to achieve this combination, and the various tools necessary to
achieve the desired result. In the digital world, which is the world we're interested in for the bulk of today's
discussion, these tools are specifically the software needed to create a composite. Keep in mind that compositing
was being done long before computers entered the picture (pardon the pun). Optical compositing is still a valid
and often-used process, and many of the techniques and skills developed by optical compositors are directly
applicable to the digital realm (in many cases, digital techniques owe their origins directly to optical
methodologies).
Finally, remember that every person who views an image has a little expert he carries around with him. This
expert, the subconscious, has spent a lifetime learning what looks 'real'. Even if, consciously, the viewer is unable
to say why a composite looks wrong, the subconscious can notice 'warning signs' of artificiality. Beware.
2.2 Digital Compositing Tools
Some of the software tools we'll be discussing include:
Paint Programs
Color Correction utilities
Warping/Morphing Tools
Matte-Extraction software
General-purpose compositing packages
But we'll also be discussing (particularly during the second half of this course) some of the things that should be
done before, during, and after the creation of the original elements to ensure that they are as useful as possible.
2.3 Basic Terms
Before we go any further, let's take a look at a still-frame composite and define some naming conventions.
Example 1 shows a composite image of a mannequin standing in front of some trees.
This composite was created from a couple of different original images. We usually refer to the individual pieces
from which we create our final composite as 'Elements'. Elements in this composite include:
The mannequin. [Example 1a]
The background. [Example 1b]
The matte for the mannequin. [Example 1c]
You may also commonly hear elements referred to as 'layers'. A subset of elements, called 'plates', usually refers
to original scanned sequences. Intermediate elements generated during the creation of a composite are generally
not referred to as plates.
As stated, a composite is the 'manipulated' combination of elements. This 'manipulation' is usually some form of
digital image processing such as color correction or matte creation. We'll discuss various image processing
techniques in Chapter 4. Mattes, which are used either to isolate or remove certain areas of an image before it is
combined with another image, will be discussed in Chapter 5.
There is one final piece of business to be dealt with, before we go any further:
Disclaimer:
Different people, countries and software packages do not always use the same
names to refer to certain tools, operators, or subjects. In addition, due to the need
to simplify certain things, just about every statement we make could probably be
countered with some exception to the rule.
Deal with it.
Chapter 3: Image Formats and Data Storage
3.1 Introduction
Before we get into the basics of compositing, we need to cover a few preliminaries. For the topic of digital
compositing, there's the obvious issue of how the digital data is stored and represented. And before we discuss
that, we need to cover, very quickly, where the images come from in the first place.
3.2 Image Generation
Elements generally come from one of three places: they are hand-painted, synthetic, human-generated elements
(which can range in complexity from a simple black-and-white mask to a photo-realistic matte painting of an
imaginary scene); computer-generated images (rendered elements from a 2-D or 3-D animation package); or
images that have been scanned into the computer from some other source (typically film or video). This is very
simplified: CG elements may contain scanned or painted elements (as texture maps, for instance), matte paintings
often borrow heavily from other sources, and live-action elements may be augmented with hand-painted 'wire
removals' or 'effects animations'.
We'll make the assumption that most everyone here has some idea of how CG elements are rendered; a full
treatment is certainly beyond the scope of this discussion. The major distinguishing factor between CG and
original scanned images (for the purposes of compositing) is the fact that CG elements are usually generated with
a matte channel. Scanned images, unless they're being used solely as a background, will need to have some work
done on them to create a matte channel.
3.3 Image Input Devices
Images which were created by non-digital methods will need to be 'scanned', or 'digitized'.
Sequences of scanned images will probably come from one of two places: either video or film. Video images,
captured with a video camera, can simply be passed through an encoder to create digital data. High-end video
storage formats, such as D-1, are already considered to be digitally encoded, and you merely need the
equipment to transfer the files to your compositing system.
Since even the best video equipment is limited to 8 or fewer bits of information per component, it is generally
unnecessary to store digital images that came from a video source at more than 24 bits, although, as you'll see,
there are some special considerations, even when storing 8-bit video data, to ensure the highest color fidelity.
Digitizing images which originated on film necessitates a somewhat different approach: the use of a film scanner
is required. Until very recently, film scanners were typically custom-made, proprietary systems built by
companies for their own internal use. Within the last few years, companies such as Kodak and Imagica have
begun to provide off-the-shelf scanners to whoever wishes to buy them.
3.4 Digital Image Storage
There are dozens, probably hundreds, of different ways of storing an image digitally. But, in day-to-day usage,
most of these methods have several things in common. First, images are stored as an array of dots, or pixels. The
larger the number of these pixels, the greater the 'resolution' (more properly, 'spatial resolution') of the image.
Each pixel is composed of 3 components: red, green, and blue (usually simplified to R, G, and B). By using a
combination of these 3 primary colors, we can represent most of the visible spectrum. When we need to refer to
the full-sized array of pixels for a single component, it is known as a 'channel'.
Consider an image such as that shown in Example 2.
Example 3 is the 'Green Channel' of this sample image. Note that the brightest areas of the green channel are
the areas that have the highest green component in the original image.
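As a concrete sketch of what a channel is in practice, here is how you might extract the green channel from an RGB image array using Python and NumPy (the tiny hand-built image is purely illustrative):

    import numpy as np

    # A tiny 2x2 RGB image, 8 bits per component (values 0-255).
    image = np.array(
        [[[255,   0,   0], [  0, 255,   0]],
         [[  0,   0, 255], [128, 128, 128]]],
        dtype=np.uint8,
    )

    # The 'green channel' is the full-sized array of green components alone.
    green = image[:, :, 1]
    print(green)
    # [[  0 255]
    #  [  0 128]]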
3.5 Bit Depth
Each component of a pixel can be represented by an arbitrary number of bits. The number of bits per component
is known as the channel's 'Bit-Depth'. Probably the most common bit-depth is 8 bits per channel, which is also
referred to as a 24-bit image (8 bits each for the red, green, and blue channels).
8 bits per component means that each component can have 256 different possible intensities (2^8), and the 3
components together can represent about 16 million (16,777,216, for the geeks) colors. Although this sounds like
a lot of colors, we'll discuss later why it is often still not enough. Feature-film work, for instance, often represents
digital images with as much as 16 bits per component, which gives us a palette of about 281 trillion different
colors. In contrast, lower-end video games may work with only 4 bits per channel, or less!
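The palette sizes quoted above fall straight out of the bit arithmetic; a quick check in Python:

    # Number of representable colors for various bit-depths per component:
    for bits in (4, 8, 16):
        print(bits, 'bits/component ->', (2 ** bits) ** 3, 'colors')
    # 4  bits/component -> 4096 colors
    # 8  bits/component -> 16777216 colors (~16 million)
    # 16 bits/component -> 281474976710656 colors (~281 trillion)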
Example 4 shows our original image decimated to 4 bits per channel. Notice the 'quantizing' that results. This
phenomenon, also known as 'banding', 'contouring', or 'posterization', arises when we do not have the ability to
specify enough values for smooth transitions between colors.
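To see where the banding comes from, here is a minimal sketch (Python with NumPy) that decimates a smooth 8-bit ramp down to 4 bits; the rescale factor of 17 is just one convenient way to map the surviving levels back to the display range:

    import numpy as np

    # A smooth 8-bit grayscale ramp: 256 distinct code values.
    ramp = np.arange(256, dtype=np.uint8)

    # Decimate to 4 bits: only 16 distinct levels survive, so neighboring
    # pixels snap to the same value and visible bands/contours appear.
    four_bit = ramp >> 4          # keep only the top 4 bits (0-15)
    restored = four_bit * 17      # rescale 0-15 back to 0-255 for display

    print(np.unique(restored).size)   # 16 levels where there were 256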
Since different images may be stored at different bit-depths, it is convenient to normalize all values to floating-
point numbers in the range of 0 to 1. Throughout this document we will assume that an RGB triplet of (1,1,1)
refers to a 100% white pixel, a pixel that is (0,0,0) is pure black, and (0.5, 0.5, 0.5) is 50% gray.
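Converting a stored code value into this normalized convention is a simple division; a sketch, assuming 8-bit data:

    # Map an 8-bit code value (0-255) into the normalized 0-to-1 range.
    def normalize(code_value, bit_depth=8):
        return code_value / (2 ** bit_depth - 1)

    print(normalize(255))   # 1.0    -> 100% white
    print(normalize(0))     # 0.0    -> pure black
    print(normalize(128))   # ~0.502 -> roughly 50% gray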
In addition to the 3 color channels, there is often a 4th channel, the alpha channel, which can be stored with an
image. It is used to determine the transparency of various pixels in the image. This channel is also known as the
'matte' channel, and as you'll come to see, it is the concept of 'matte' upon which nearly all compositing is based.
3.6 File Formats
Now that we have a digital representation of an image, we need to store it in a file. There is a huge array of
different formats in which one may choose to store an image. Formats vary in their ability to handle things like:
Differing bit-depths.
Different spatial resolutions.
Compression.
'Comment' information in a header.
Matte channel.
Z-depth information.
We've included a list of the more popular file formats, along with some of their different characteristics, at the
end of this paper.
Because high-resolution images can take up a huge amount of disk space, it is often desirable to compress them.
There are a number of techniques for this, some of which will decrease the quality of the image (referred to as
'lossy compression') and some of which maintain the full quality of the image.
Always be aware of whether you are storing in an image format, or using a form of compression, that could
degrade the quality of your image!
In addition to standard data-compression algorithms, there is also a technique whereby images are pre-processed
to be in Non-Linear Color Space.
3.7 Non-Linear Color Spaces
In order to fully understand all of the terms involved with the concept of storing images in 'linear' or
'logarithmic' format, we need to go over a few more basic concepts about how images are stored digitally.
If we had an unlimited number of bits-per-pixel, nonlinear representations would become unnecessary. However,
practical considerations of available disk space, memory usage, speed of calculations and even
transfer/transmission methods, all dictate that we attempt to store images as efficiently as possible, keeping the
minimum amount of data necessary to realize the image quality we find acceptable.
Encoding an image into 'nonlinear' space is driven by the need to store the maximum amount of useful
information in the precision or bit-depth we have decided upon. Note that we have made the distinction that we
wish to store useful information. How do we decide what is useful and what isn't? The decision is usually based
(at least partially) on the knowledge of how the human eye works. In particular, the human eye is far more
sensitive to color- and brightness-differences in the low- to mid-range than it is to changes in very bright areas.
Nonlinear encoding is useful in a variety of situations, and whether you work in film or video, you will
undoubtedly need to deal with the process. In the video world, this nonlinear conversion is known as a gamma
correction. Typically, video images are stored with a gamma correction of 2.2, and re-conversion to linear space is
done by merely applying the inverse, gamma 0.45, to the image.
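As a sketch of that round trip, using the convention in the text where 'applying gamma G' means raising each normalized value to the power 1/G (note that 0.45 is an approximation of 1/2.2, so the round trip is only nearly exact):

    # 'Apply gamma G' in the sense used above: value ** (1/G).
    def apply_gamma(value, gamma):
        return value ** (1.0 / gamma)

    linear = 0.5
    encoded = apply_gamma(linear, 2.2)     # ~0.73, the stored video value
    decoded = apply_gamma(encoded, 0.45)   # ~0.50, back to (nearly) linear
    print(encoded, decoded)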
For film images, a more complex conversion is often used, which takes into account various idiosyncrasies of how
film stock responds to varying exposure levels. The most common of these conversions is specified by Kodak for
their 'Cineon' file format, and is colloquially known as storing images in 'Logarithmic Color Space', or simply 'Log
Space'. Kodak's Log Space also includes room for data that may be outside the range of 'White' or 'Black' when
the digital data is actually put back on film, but that needs to be preserved for intermediate color correction. The
Cineon format additionally reduces file size by packing three 10-bit channels into 32 bits of data.
(Shake converts between logarithmic and linear color space with the LogC and DelogC nodes. -prw)
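The 10-bits-in-32 packing can be sketched with a little bit-twiddling. The exact layout below (red in the high bits, two unused bits of padding at the bottom) is an assumption for illustration; a real Cineon reader should follow Kodak's format specification, including its byte-order rules:

    # Pack three 10-bit code values (0-1023) into one 32-bit word.
    # Assumed layout: R in bits 31-22, G in bits 21-12, B in bits 11-2.
    def pack_pixel(r, g, b):
        return (r << 22) | (g << 12) | (b << 2)

    def unpack_pixel(word):
        return ((word >> 22) & 0x3FF,
                (word >> 12) & 0x3FF,
                (word >> 2) & 0x3FF)

    print(unpack_pixel(pack_pixel(95, 470, 685)))   # (95, 470, 685)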
For our example of nonlinear encoding, we'll look at the extremely simplified case of wishing to take an image
which originated as a 4-bit grayscale image and store it as efficiently as possible in a 3-bit file format. Once you
understand this scenario, you can mentally extrapolate the process to real-world situations where we deal with
greater bit-depths.
If our original image starts as a 4-bit grayscale image, we have 16 different grayscale values that we can
represent. Our 3-bit destination image has only eight different grayscale values. The simplest conversion would
be merely to take colors 1 and 2 from the input range and convert them both to color value 1 in the output
image. Colors 3 and 4 would both become color 2, 5 and 6 would become 3, and so on. This mapping is shown as
follows:
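In code, this naive mapping is just integer halving; a quick sketch, numbering the 16 input values and 8 output values from 1 as the text does:

    # Naive linear requantization from 4 bits (16 values) to 3 bits (8 values):
    # input values are simply paired off: 1,2 -> 1; 3,4 -> 2; ...; 15,16 -> 8.
    for color in range(1, 17):
        print(color, '->', (color + 1) // 2)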
The problem with this method is that it ignores the fact that the human eye is less sensitive to differences in
tone as brightness increases. It is hard to demonstrate with only 16 colors, but consider if we had 100 colors to
choose from. You would find that, visually, it would be almost impossible to distinguish the difference between
99% and 100% white. In the darker colors, however, the difference between 0% and 1% would still remain
noticeable. Thus, a better way to convert from 16 colors to 8 colors would be to consolidate a few more of the
upper-range colors together, while preserving as many of the steps as possible in the lower range. The next
graph shows this type of mapping.
The small inset shows a graph (solid line) of this mapping, as well as an approximation (dotted line) of a lookup
table curve that would accomplish this same color-correction. (Note the similarity between this curve's shape and
a gamma-correction curve). If we were to view this new image directly, it would appear to be far too bright,
since we've shifted mid-range colors in the original image to be bright colors in our encoded image. To properly
view this image, we would either need to re-convert back to linear color space or modify our viewing device (i.e.
the video- or computer-monitor) so that it compensates for this compression.
The bottom line, and the most commonly misunderstood fact about representing data in Logarithmic or other
nonlinear formats, is that:
The conversion between Log and Linear is simply a custom color-correction.
It is a method of consolidating certain color ranges together so that they take up less of your color palette, thus
leaving room for the other, more important ranges to keep their full resolution, even when reducing the number
of bits used to store a given image.
Because log-encoding is only a color-correction, it means that any file format can be used to store logarithmic
data. It also means that there is effectively no way to determine if an image is in log or linear space, other than
to visually compare it with the original scene.
3.8 Color-Correcting in Nonlinear Space
Although you may have chosen to store images in a nonlinear format, when it comes to actually working with
these images, you will almost always want to convert them back to linear space first. The reason has to do with
the need to perform additional color-correction on these elements. Here's another warning:
Color correction in log space produces unpredictable results. Always color-correct in linear space.
Consider Example 5, and a simple color correction where the red channel has been multiplied by a constant
value of 0.8. The left side of the image was corrected in linear space, and comparing any pixel with the original
image in Example 2 will show a red value that has been reduced by 20%. The right side of the image, however,
was first encoded into log-space, then the red channel was multiplied by 0.8. The image was then restored to
linear space. As you can see, a fairly slight color correction in linear space has become rather drastic when
applied to a log-encoded image. In particular, notice how the mid-tones have changed quite a bit more than the
darker areas of the image. This problem can be particularly vexing when working with bluescreen elements:
Attempts to reduce blue spill may result in undesirable shifts in flesh-tones, for instance.
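A sketch of why the two halves differ, using a generic log-style encode/decode pair (this curve is a stand-in for illustration, not Kodak's actual Cineon transform):

    import math

    def encode(v):
        # A generic log-style encoding that maps 0..1 onto 0..1 nonlinearly.
        return math.log10(1 + 9 * v)

    def decode(v):
        # The exact inverse of encode().
        return (10 ** v - 1) / 9

    red = 0.5

    linear_result = red * 0.8                  # correct: 0.40
    log_result = decode(encode(red) * 0.8)     # encode, scale, decode: ~0.32

    # The same 'multiply by 0.8' lands very differently on log-encoded data,
    # and the mid-tones shift far more than the dark end does.
    print(linear_result, log_result)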