Digital Compositing
Ron M. Brinkmann

Table of Contents

1. Introduction
2. Definitions: What is a Composite?
3. Image Formats and Data Storage
4. Image Manipulation
5. Mattes
6. Image Tracking and Stabilizing
7. Dealing with Non-Square Pixels
8. Efficiency
9. Preparation of Elements
10. Element Repair
11. The Final Touches
Appendix: Common Image File Formats
Bibliography

Chapter 1: Introduction

1.1 Introduction

Digital compositing, as we are going to be discussing it, deals primarily with the process of integrating images from multiple sources into a single, seamless whole. While many of these techniques apply to still images, we will be looking at tools and methods that are useful and reasonable for large sequences of images as well.

In the first half of this document, we will deal more with the Science of Digital Compositing. The second half will deal with a few of the more complex (or at least misunderstood) issues on the topic, as well as look at techniques that deal with the Art of Digital Compositing. As you will see, the skills of a good compositor range from technician to artist. Not only does one need to understand the basic 'tools of the trade', which can include a variety of software, one must also be aware of the visual nature of the process. Remember, the bottom line is that all the technical considerations are unimportant when confronted with the question "Does it look right?" Obviously this is a subjective judgement, and a good compositor who is able to make these decisions will always be in high demand.

Chapter 2: Definitions

2.1 What is a Composite?

Here's a basic definition:

Composite: The manipulated combination of at least two source images to produce an integrated result.

By far, the most difficult part of this process is producing the integrated result: an image which doesn't betray that its creation is owed to multiple source elements. In particular, we are usually attempting to produce (sequences of) images which could have been believably photographed without the use of any post-processing. Colloquially, it should look 'real'. Even if the elements in the scene are obviously not real, one must be able to believe that everything in the scene was photographed at the same time, by the same camera.

We will be discussing the manipulations needed to achieve this combination, and the various tools necessary to achieve the desired result. In the digital world, which is the world we're interested in for the bulk of today's discussion, these tools are specifically the software needed to create a composite. Keep in mind that compositing was being done long before computers entered the picture (pardon the pun). Optical compositing is still a valid and often-used process, and many of the techniques and skills developed by optical compositors are directly applicable to the digital realm (in many cases, digital techniques owe their origin directly to optical methodologies).

Finally, remember that every person who views an image has a little expert he carries around with him. This expert, the subconscious, has spent a lifetime learning what looks 'real'. Even if, consciously, the viewer is unable to say why a composite looks wrong, the subconscious can notice 'warning signs' of artificiality. Beware.
2.2 Digital Compositing Tools

Some of the software tools we'll be discussing include:

  Paint programs
  Color correction utilities
  Warping/morphing tools
  Matte-extraction software
  General-purpose compositing packages

But we'll also be discussing (particularly during the second half of this course) some of the things that should be done before, during and after the creation of the original elements to ensure that they are as useful as possible.

2.3 Basic Terms

Before we go any further, let's take a look at a still-frame composite and define some naming conventions. Example 1 shows a composite image of a mannequin standing in front of some trees. This composite was created from a couple of different original images. We usually refer to the individual pieces from which we create our final composite as 'elements'. Elements in this composite include:

  The mannequin. [Example 1a]
  The background. [Example 1a]
  The matte for the mannequin. [Example 1a]

You may also commonly hear elements referred to as 'layers'. A subset of elements, called 'plates', usually refers to original scanned sequences. Intermediate elements generated during the creation of a composite are generally not referred to as plates.

As stated, a composite is the 'manipulated' combination of elements. This 'manipulation' is usually some form of digital image processing, such as color correction or matte creation. We'll discuss various image processing techniques in Chapter 4. Mattes, which are used either to isolate or remove certain areas of an image before it is combined with another image, will be discussed in Chapter 5.

There is one final piece of business to be dealt with before we go any further:

Disclaimer: Different people, countries and software packages do not always use the same names to refer to certain tools, operators, or subjects. In addition, due to the need to simplify certain things, just about every statement we make could probably be countered with some exception to the rule. Deal with it.

Chapter 3: Image Formats and Data Storage

3.1 Introduction

Now, before we get into the basics of compositing, we need to talk about a few things first. For the topic of digital compositing, there's the obvious issue of how the digital data is stored and represented. And before we discuss that, we need to cover, very quickly, where the images come from in the first place.

3.2 Image Generation

Elements generally come from one of three places: they are either hand-painted, synthetic, human-generated elements (which can range in complexity from a simple black and white mask to a photo-realistic matte painting of an imaginary scene), computer-generated images (rendered elements from a 2-D or 3-D animation package), or images that have been scanned into the computer from some other source (typically film or video). This is very simplified: CG elements may contain scanned or painted elements (as texture maps, for instance), matte paintings often borrow heavily from other sources, and live-action elements may be augmented with hand-painted 'wire removals' or 'effects animations'.

We'll make the assumption that most everyone here has some idea of how CG elements are rendered; the topic is certainly beyond the scope of this discussion. The major distinguishing factor between CG and original scanned images (for the purposes of compositing) is the fact that CG elements are usually generated with a matte channel. Scanned images, unless they're being used solely as a background, will need to have some work done with them to create a matte channel.
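As a preview of how a matte channel is used once it exists, here is a minimal sketch in Python with NumPy. Nothing here comes from the text itself: the function name matte_combine, the array shapes and the simple weighted-sum formula are illustrative assumptions, and mattes are covered properly in Chapter 5.

    import numpy as np

    def matte_combine(fg, bg, matte):
        """Place a foreground over a background using a matte.

        fg, bg : float arrays of shape (height, width, 3), values in 0..1
        matte  : float array of shape (height, width); 1 = opaque foreground,
                 0 = background shows through, in-between = partial blend
        """
        m = matte[..., np.newaxis]       # broadcast the matte across R, G, B
        return fg * m + bg * (1.0 - m)   # weighted sum of the two elements

    # Tiny synthetic test: a white square held out over a 50% gray background.
    bg = np.full((4, 4, 3), 0.5)
    fg = np.ones((4, 4, 3))
    matte = np.zeros((4, 4))
    matte[1:3, 1:3] = 1.0                # the foreground is opaque only here
    comp = matte_combine(fg, bg, matte)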
3.3 Image Input Devices

Images which were created by non-digital methods will need to be 'scanned', or 'digitized'. Sequences of scanned images will probably come from one of two places: either video or film.

Video images, captured with a video camera, can simply be passed through an encoder to create digital data. High-end video storage formats, such as D-1, are already considered to be digitally encoded, and you merely need to have the equipment to transfer the files to your compositing system. Since even the best video equipment is limited to 8 bits (or less) of information per component, it is generally unnecessary to store digital images which came from a video source at more than 24 bits, although as you'll see, there are some special considerations, even when storing 8-bit video data, to ensure the highest color fidelity.

Digitizing images which originated on film necessitates a somewhat different approach: the use of a film scanner is required. Until very recently, film scanners were typically custom-made, proprietary systems that were built by companies for their own internal use. Within the last few years, companies such as Kodak and Imagica have begun to provide off-the-shelf scanners to whoever wishes to buy them.

3.4 Digital Image Storage

There are dozens, probably hundreds, of different ways of storing an image digitally, but in day-to-day usage most of these methods have several things in common. First, images are stored as an array of dots, or pixels. The larger the number of these pixels, the greater the 'resolution' (more properly, the 'spatial resolution') of the image. Each pixel is composed of three components: red, green and blue (usually simplified to R, G, and B). By using a combination of these three primary colors, we can represent a large portion of the visible spectrum. When we need to refer to the full-sized array of pixels for a single component, it is known as a 'channel'.

Consider an image such as that shown in Example 2. Example 3 is the green channel of this sample image. Note that the brightest areas of the green channel are the areas that have the highest green component in the original image.

3.5 Bit Depth

Each component of a pixel can be represented by an arbitrary number of bits. The number of bits-per-component is known as the channel's 'bit depth'. Probably the most common bit depth is 8 bits per channel, which is also referred to as a 24-bit image (8 bits each for the red, green and blue channels). 8 bits per component means that each component can have 256 different possible intensities (2^8), and the three components together can represent about 16 million (16,777,216 for the geeks) colors. Although this sounds like a lot of colors, we'll discuss later why it is often still not enough. Feature film work, for instance, often represents digital images with as much as 16 bits per component, which gives us a palette of about 281 trillion different colors. In contrast, lower-end video games may work with only 4 bits per channel, or less!

Example 4 shows our original image decimated to 4 bits per channel. Notice the 'quantizing' that arises. This phenomenon, also known as 'banding', 'contouring' or 'posterization', arises when we do not have the ability to specify enough values for smooth transitions between colors.
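To make the banding effect concrete, here is a small Python/NumPy sketch (not from the text; the function decimate_bit_depth and the ramp test image are illustrative assumptions) that requantizes a smooth 8-bit ramp down to 4 bits per channel, as in Example 4:

    import numpy as np

    def decimate_bit_depth(channel, bits):
        """Requantize an 8-bit channel (values 0..255) to a lower bit depth.

        With only 2**bits levels available, smooth gradients break up into
        visible steps: the banding/contouring described above.
        """
        step = 256 // (2 ** bits)            # width of each quantization bin
        return (channel // step) * step      # snap every value to its bin

    # A smooth horizontal ramp: every gray level from 0 to 255.
    ramp = np.tile(np.arange(256, dtype=np.uint8), (64, 1))
    banded = decimate_bit_depth(ramp, 4)     # only 16 gray levels survive
    print(len(np.unique(ramp)), "levels before,", len(np.unique(banded)), "after")

Running this reports 256 distinct gray levels before and only 16 after; that shortage of in-between values is exactly what shows up visually as contouring.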
Since different images may be stored at different bit depths, it is convenient to normalize all values to floating-point numbers in the range of 0 to 1. Throughout this document we will assume that an RGB triplet of (1, 1, 1) refers to a 100% white pixel, a pixel that is (0, 0, 0) is pure black, and (0.5, 0.5, 0.5) is 50% gray.

In addition to the three color channels, there is often a fourth channel, the alpha channel, which can be stored with an image. It is used to determine the transparency of various pixels in the image. This channel is also known as the 'matte' channel, and as you'll come to see, it is the concept of the 'matte' upon which nearly all compositing is based.

3.6 File Formats

Now that we have a digital representation of an image, we need to store it in a file. There is a huge array of different formats in which one may choose to store an image. Formats vary in their ability to handle things like:

  Differing bit depths
  Different spatial resolutions
  Compression
  'Comment' information in a header
  A matte channel
  Z-depth information

We've included a list of the more popular file formats, along with some of their different characteristics, at the end of this paper.

Because high-resolution images can take up a huge amount of disk space, it is often desirable to compress them. There are a number of techniques for this, some of which will decrease the quality of the image (referred to as 'lossy compression') and some of which maintain the full quality of the image. Always be aware of whether or not you are storing in an image format, or using a form of compression, which could degrade the quality of your image! In addition to standard data-compression algorithms, there is also a technique whereby images are pre-processed to be in a nonlinear color space.

3.7 Non-Linear Color Spaces

In order to fully understand all of the terms involved with the concept of storing images in 'linear' or 'logarithmic' format, we need to go over a few more basic concepts about how images are stored digitally. If we had an unlimited number of bits per pixel, nonlinear representations would be unnecessary. However, practical considerations of available disk space, memory usage, speed of calculations and even transfer/transmission methods all dictate that we attempt to store images as efficiently as possible, keeping the minimum amount of data necessary to realize the image quality we find acceptable.

Encoding an image into 'nonlinear' space is driven by the need to store the maximum amount of useful information in the precision, or bit depth, we have decided upon. Note that we have made the distinction that we wish to store useful information. How do we decide what is useful and what isn't? The decision is usually based (at least partially) on knowledge of how the human eye works. In particular, the human eye is far more sensitive to color and brightness differences in the low to mid range than it is to changes in very bright areas.

Nonlinear encoding is useful in a variety of situations, and whether you work in film or video, you will undoubtedly need to deal with the process. In the video world, this nonlinear conversion is known as a gamma correction. Typically, video images are stored with a gamma correction of 2.2, and re-conversion to linear space is done by merely applying the inverse, a gamma of 0.45, to the image.
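Here is a minimal sketch of that video-style conversion in Python/NumPy. It assumes a pure power-law curve (real video standards also add a short linear segment near black), and the function names are illustrative:

    import numpy as np

    def gamma_encode(linear, gamma=2.2):
        """Encode linear values (floats, 0..1) into gamma-corrected space."""
        return linear ** (1.0 / gamma)

    def gamma_decode(encoded, gamma=2.2):
        """Return gamma-encoded values to linear space."""
        return encoded ** gamma

    # A linear 18% gray is stored as roughly 0.46: the curve hands more of
    # the limited code values to the dark end, where the eye is sensitive.
    linear = np.array([0.18])
    encoded = gamma_encode(linear)
    print(encoded)                    # ~[0.459]
    print(gamma_decode(encoded))      # round-trips to ~[0.18]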
For film images, a more complex conversion is often used, which takes into account various idiosyncrasies of how film stock responds to varying exposure levels. The most common of these conversions is specified by Kodak for their 'Cineon' file format, and is colloquially known as storing images in 'Logarithmic Color Space', or simply 'Log Space'. Kodak's Log Space also includes room for data that may be outside the range of 'white' or 'black' when the digital data is actually put back on film, but that needs to be preserved for intermediate color correction. The Cineon format additionally compresses the file's size by placing three 10-bit channels into 32 bits of data. (Shake converts between logarithmic and linear color space with the LogC and DelogC nodes. -prw)

For our example of nonlinear encoding, we'll look at the extremely simplified case of wishing to take an image which originated as a 4-bit grayscale image and store it as efficiently as possible in a 3-bit file format. Once you understand this scenario, you can mentally extrapolate the process to real-world situations where we deal with greater bit depths.

If our original image starts as a 4-bit grayscale image, we have 16 different grayscale values that we can represent. Our 3-bit destination image has only eight different grayscale values. The simplest conversion would be merely to take colors 1 and 2 from the input range and convert them to color value 1 in the output image. Colors 3 and 4 would both become color 2, colors 5 and 6 would become color 3, and so on, up through colors 15 and 16, which would become color 8.

The problem with this method is that it ignores the fact that the human eye is less sensitive to differences in tone as brightness increases. This is hard to demonstrate with only 16 colors, but consider if we had 100 colors to choose from. You would find that, visually, it would be almost impossible to distinguish the difference between 99% and 100% white. In the darker colors, however, the difference between 0% and 1% would still remain noticeable. Thus, a better way to convert from 16 colors to 8 colors would be to try to consolidate a few more of the upper-range colors together, while preserving as many of the steps as possible in the lower range. The next graph shows this type of mapping. The small inset shows a graph (solid line) of this mapping, as well as an approximation (dotted line) of a lookup-table curve that would accomplish this same color correction. (Note the similarity between this curve's shape and a gamma-correction curve.)

If we were to view this new image directly, it would appear to be far too bright, since we've shifted mid-range colors in the original image to be bright colors in our encoded image. To properly view this image, we would either need to re-convert it back to linear color space or modify our viewing device (i.e., the video or computer monitor) so that it compensates for this compression.

The bottom line, and the most commonly misunderstood fact about representing data in logarithmic or other nonlinear formats, is that:

The conversion between Log and Linear is simply a custom color correction.

It is a method of consolidating certain color ranges together so that they take up less of your color palette, thus leaving room for the other, more important ranges to keep their full resolution, even when reducing the number of bits used to store a given image. Because log encoding is only a color correction, any file format can be used to store logarithmic data. It also means that there is effectively no way to determine whether an image is in log or linear space, other than to visually compare it with the original scene.
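The 4-bit to 3-bit example can be sketched in a few lines of Python/NumPy. This assumes zero-based codes 0-15 and 0-7 rather than the text's 1-16 and 1-8, and the gamma-style exponent of 0.45 is an illustrative stand-in for the curve in the text's graph, not the exact mapping shown there:

    import numpy as np

    codes = np.arange(16)                    # every 4-bit gray value, 0..15

    # Naive linear requantization: each pair of inputs collapses to one output.
    linear_out = codes // 2

    # Nonlinear requantization: normalize, bend with a gamma-style curve,
    # then quantize to 8 levels. Dark inputs keep more distinct output steps,
    # while several bright inputs get consolidated into a single step.
    nonlinear_out = np.round((codes / 15.0) ** 0.45 * 7).astype(int)

    for c, lin, nl in zip(codes, linear_out, nonlinear_out):
        print(f"in {c:2d} -> linear {lin}, nonlinear {nl}")

In the printed table, the nonlinear column keeps the darkest few inputs on distinct output values while lumping inputs 9 through 12 onto one value and 13 through 15 onto another, which is the consolidation of the upper range described above.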
3.8 Color-Correcting in Nonlinear Space

Although you may have chosen to store images in a nonlinear format, when it comes to actually working with these images, you will almost always want to convert them back to linear space first. The reason has to do with the need to perform additional color correction on these elements. Here's another warning:

Color correction in Log Space produces unpredictable results. Always color-correct in Linear space.

Consider Example 5, and a simple color correction where the red channel has been multiplied by a constant value of 0.8. The left side of the image was corrected in linear space, and comparing any pixel with the original image in Example 2 will show a red value that has been reduced by 20%. The right side of the image, however, was first encoded into log space, then the red channel was multiplied by 0.8, and the image was then restored to linear space. As you can see, a fairly slight color correction in linear space has become rather drastic when applied to a log-encoded image. In particular, notice how the mid-tones have changed quite a bit more than the darker areas of the image. This problem can be particularly vexing when working with bluescreen elements: attempts to reduce blue spill may result in undesirable shifts in flesh tones, for instance.
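The Example 5 experiment is easy to reproduce numerically. In this Python/NumPy sketch, a simple mu-law-style log curve stands in for a true Cineon conversion (an assumption; the real Cineon math differs), so the exact numbers are illustrative, but the behavior is the same: the 'same' 0.8 multiply is far more severe, and uneven across the tonal range, when applied to log-encoded data.

    import numpy as np

    A = 255.0   # steepness of the illustrative log curve

    def log_encode(linear):
        """Map linear values (0..1) into a simple mu-law-style log space."""
        return np.log(1.0 + A * linear) / np.log(1.0 + A)

    def log_decode(encoded):
        """Invert log_encode, returning values to linear space."""
        return ((1.0 + A) ** encoded - 1.0) / A

    red = np.array([0.1, 0.5, 0.9])              # dark, mid, bright pixels

    in_linear = red * 0.8                        # correct in linear space
    in_log = log_decode(log_encode(red) * 0.8)   # same multiply in log space

    print(in_linear)   # [0.08 0.4  0.72]: a uniform 20% reduction
    print(in_log)      # ~[0.05 0.19 0.30]: much larger, uneven shifts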