Original site: http://www.bluesnews.com/abrash/contents.shtml

Surface Caching Revisited, Quake’s Triangle Models, and More

by Michael Abrash

In the late 70’s, I spent a summer doing contract programming at a government-funded installation called the Northeast Solar Energy Center (NESEC). Those were heady times for solar energy, what with the oil shortages, and there was lots of money being thrown at places like NESEC, which was growing fast.

NESEC was across the street from MIT, which made for good access to resources. Unfortunately, it also meant that NESEC was in a severely parking-impaired part of the world, what with the student population and Boston’s chronic parking shortage. The NESEC building did have its own parking lot, but it wasn’t nearly big enough, because students parked in it at every opportunity. The lot was posted, and cars periodically got towed, but King Canute stood a better chance against the tide than NESEC did against the student hordes, and late arrivals to work often had to park blocks away and hike to work, to their considerable displeasure.

Back then, I drove an aging Volvo sedan that was sorely in need of a ring job. It ran fine but burned a quart of oil every 250 miles, so I carried a case of oil in the trunk, and checked the level frequently. One day, walking to the computer center a couple of blocks away, I cut through the parking lot and checked the oil in my car. It was low, so I topped it off, left the empty oil can next to the car so I would see it and remember to pick it up to throw out on my way back, and headed toward the computer center.

I’d gone only a few hundred feet when I heard footsteps and shouting behind me, and a wild-eyed man in a business suit came running up to me, screaming. “It’s bad enough you park in our lot, but now you’re leaving your garbage lying around!” he yelled. “Don’t you people have any sense of decency?” I told him I worked at NESEC and was going to pick up the can on my way back, and he shouted, “Don’t give me that!” I repeated my statements, calmly, and told him who I worked for and where my office was, and he said “Don’t give me that” again, but with a little less certainty. I kept adding detail until it was obvious that I was telling the truth, and he suddenly said “Oh, my God,” turned red, and started to apologize profusely. A few days later, we passed in the hallway, and he didn’t look me in the eye.

The interesting point is that there was really no useful outcome that could have resulted from his outburst. Suppose I had been a student--what would he have accomplished by yelling at me? He let his emotions overrule his common sense, and as a result, did something he later wished he hadn’t. I’ve seen many programmers do the same thing, especially when they’re working long hours and not feeling adequately appreciated. For example, a few months back I got mail from a programmer who complained bitterly that although he was critical to his company’s success, management didn’t appreciate his hard work and talent, and asked if I could help him find a better job. I suggested several ways that he might look for another job, but also asked if he had tried working his problems out with his employers; if he really was that valuable, what did he have to lose? He admitted he hadn’t, and recently he wrote back and said that he had talked to his boss, and now he was getting paid a lot more money, was getting credit for his work, and was just flat-out happy.

We programmers think of ourselves as rational creatures, but most of us get angry at times, and when we do, like everyone else, we tend to be driven by our emotions instead of our minds. It’s my experience that thinking rationally under those circumstances can be difficult, but produces better long-term results every time--so if you find yourself in that situation, stay cool and think your way through it, and odds are you’ll be happier down the road.

Of course, most of the time programmers really are rational creatures, and the more information we have, the better. In that spirit, let’s look at more of the stuff that makes Quake tick, starting with what I’ve recently learned about surface caching.

More on surface caching

Last time, I discussed in detail the surface caching technique that Quake uses to do detailed, high-quality lighting without lots of polygons. Since then, I’ve spent a considerable amount of time working on the port of Quake to Rendition’s Verite 3-D accelerator chip, so I’ll start off this month by discussing what I’ve learned about using surface caching in conjunction with hardware.

As you’ll recall, the key to surface caching is that lighting information and polygon detail are stored separately, with lighting not tied to polygon vertices, then combined on demand into what I call surfaces: lit, textured rectangles that are used as the input to the texture mapper. Building surfaces takes time, so performance is enhanced by caching the surfaces from one frame to the next. As I pointed out last time, 3-D hardware accelerators are designed to optimize Gouraud shading, but surface caching can also work on accelerators, with some significant quality advantages.
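To make the recap concrete, here is a minimal Python sketch of the idea, not Quake's actual code. It assumes integer texels and light values from 0 to 255, picks the nearest light sample rather than interpolating between samples, and uses a hypothetical spacing of one light sample per 16 texels; the names build_surface and SurfaceCache are invented for illustration.

```python
def build_surface(texture, light_map, width, height, texels_per_sample=16):
    """Build a lit surface from a tiled texture and a coarse light map.

    texture and light_map are lists of rows of integers; light values run
    from 0 (dark) to 255 (full brightness). All names here are hypothetical.
    """
    tex_h, tex_w = len(texture), len(texture[0])
    surface = []
    for y in range(height):
        light_row = light_map[y // texels_per_sample]
        row = []
        for x in range(width):
            texel = texture[y % tex_h][x % tex_w]      # tile the texture
            light = light_row[x // texels_per_sample]  # nearest light sample
            row.append(texel * light // 255)           # scale texel by light
        surface.append(row)
    return surface


class SurfaceCache:
    """Keep built surfaces around so later frames can reuse them."""

    def __init__(self):
        self.surfaces = {}
        self.builds = 0

    def get(self, key, build):
        if key not in self.surfaces:   # cache miss: pay the build cost once
            self.surfaces[key] = build()
            self.builds += 1
        return self.surfaces[key]
```

The lighting lives in its own grid, independent of polygon vertices, and the expensive combining step runs only when the cache misses.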

The surface-caching architecture of the Verite version of Quake (VQuake) is essentially the same as in software Quake: The CPU builds surfaces on demand, which are then downloaded to the accelerator’s memory and cached there. There are a couple of key differences, however: the need to download surfaces, and the requirement that the surfaces be in 16-bit-per-pixel format.

Downloading surfaces to the accelerator is a performance hit that doesn’t exist in the software version. Although Verite uses DMA to download surfaces, DMA does steal performance from the CPU. This cost is increased by the requirement for 16-bpp surfaces, because twice as much data must be downloaded. Worse still, it takes about twice as long to build 16-bpp surfaces as 8-bpp surfaces, so a surface-cache miss is well over twice as expensive in VQuake as in Quake. Fortunately, there’s 4 MB of memory on Verite-based adapters, so the surface cache doesn’t miss too often and VQuake runs fine (and looks very good, thanks to bilinear texture filtering, which by itself is pretty much worth the cost of 3-D hardware), but it’s nonetheless true that a completely straightforward port of the surface-caching model is not as appealing for hardware as for software. This is especially true at high resolutions, where the needs of the surface cache increase due to more detailed surfaces but available memory decreases due to frame buffer size.

Does my recent experience indicate that as the PC market moves to hardware, there’s no choice but to move to Gouraud shading, despite the quality issues? Not at all. First, surface caching does still work well; its advantage over Gouraud shading is just smaller on hardware than it is in software. Second, there are at least two alternatives that preserve the advantages of surface caching without many of the disadvantages noted above.

The obvious solution is to have the accelerator card build the textures, rather than having the CPU build and then download them. This eliminates downloading completely, and lets the accelerator, which should be faster at such things, do the texel manipulation. Whether this is actually faster depends on whether the CPU or the accelerator is doing more of the work overall, but it eliminates download time, which is a big help. This approach retains the ability to composite other effects, such as splatters and dents, onto surfaces, but by the same token retains the high memory requirements and dynamic lighting performance impact of the surface cache. It also requires that the 3-D API and accelerator being used allow drawing into a texture, which is not universally true. Neither do all APIs or accelerators allow applications enough control over the texture heap so that an efficient surface cache can be implemented, a point that favors non-caching approaches. (A similar option that wasn’t possible due to time limitations is downloading 8-bpp surfaces and having the accelerator expand them to 16-bpp surfaces as it stores them in texture memory. Better yet, some accelerators support 8-bpp palettized hardware textures that are expanded to 16-bpp on the fly during texturing.)

One appealing non-caching approach is doing unlit texture-mapping in one pass, then lighting from the light map as a second pass, using the light map as an alpha texture. In other words, the textured polygon is drawn first, with no lighting, then the light map is textured on top of the polygon, with the light map intensity used as an alpha value to determine how brightly to light each texel. The hardware’s texture-mapping circuitry is used for both passes, so the lighting comes out perspective-correct and consistent under all viewing conditions, just as with the surface cache. The lighting polygons don’t even have to match the texture polygons, so they can represent dynamically-changing lighting. Two-pass lighting not only looks good, but has no memory footprint other than texture and light map storage, and provides level performance, because it’s not dependent on surface cache hit rate. The primary downside to two-pass lighting is that it requires at least twice as much performance from the accelerator as single-pass drawing; the current crop of 3-D accelerators is not particularly fast, and few of them are up to the task of doing two passes at high resolution, although that will change soon. Another potential problem is that some accelerators don’t implement true alpha blending. Nonetheless, as accelerators get better, I expect two-pass (or three-or-more-pass, for adding splatters and the like by overlaying sprite polygons) drawing to be widely used. I also expect Gouraud shading to be widely used; it’s easy to use and fast. Also, speedier CPUs and accelerators will enable much more detailed geometry to be used, and the smaller polygons become, the better Gouraud shading looks compared to surface caching and two-pass lighting.
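The two passes can be sketched in a few lines of Python. This is only an illustration of the arithmetic, under the assumption that both passes have already been rasterized into same-sized grids of integers (texel intensities and 0-to-255 alpha values); real hardware would do both passes through its texture mapper, and draw_two_pass is a hypothetical name.

```python
def draw_two_pass(texture_pass, light_pass):
    """Pass 1 writes unlit texels; pass 2 modulates them by light-map alpha."""
    frame = [row[:] for row in texture_pass]       # pass 1: unlit polygon
    for y, light_row in enumerate(light_pass):     # pass 2: light map on top
        for x, alpha in enumerate(light_row):
            frame[y][x] = frame[y][x] * alpha // 255
    return frame
```

Nothing is cached between frames here, which is why the cost is steady but the fill work per pixel doubles.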

Our next engine at id Software will be oriented heavily toward hardware accelerators, and at this point it’s a toss-up whether we’ll use surface caching, Gouraud shading, or two-pass lighting. I’ll keep you posted.

Drawing triangle models

I’ve spent a number of columns over the past year discussing how Quake works. If you look closely, though, you’ll see that almost all of the information was about drawing the world--the static walls, floors, ceilings, and such. There are several reasons for this, in particular that it’s hard to get a world renderer working well, and that the world is the base on which everything else is drawn. However, moving entities, such as monsters, are essential to a useful game engine. Traditionally, these have been done with sprites, but when we set out to build Quake, we knew that it was time to move on to polygon-based models (in the case of Quake, the models are composed of triangles). We didn’t know exactly how we were going to make these models fast enough, though, and went through quite a bit of experimentation and learning in the process of doing so. For the rest of this column I’ll discuss some interesting aspects of our triangle-model architecture, and present code for one useful approach for rapid drawing of triangle models.

Drawing triangle models fast

We would have liked to have had one rendering model, and hence one graphics pipeline, for all drawing in Quake; this would have simplified the code and tools, and would have made it much easier to focus our optimization efforts. However, when we tried adding polygon models to Quake’s global edge table, edge processing slowed down unacceptably. This isn’t that surprising, because the edge table was designed to handle 200-300 large polygons, not the 2000-3000 tiny triangles that a dozen triangle models in a scene can add. Restructuring the edge list to use trees rather than linked lists would have helped with the larger data sets, but the basic problem is that the edge table requires a considerable amount of overhead per edge per scan line, and triangle models have too few pixels per edge to justify that overhead. Also, the much larger edge table generated by adding triangle models doesn’t fit well in the CPU cache.

Consequently, we implemented a separate drawing pipeline for triangle models, as shown in Figure One. Unlike the world pipeline, the triangle-model pipeline is in most respects a traditional one, with a few exceptions, noted below. The entire world is drawn first, and then the triangle models are drawn, using z-buffering for proper visibility. For each triangle model, all vertices are transformed and projected first, and then each triangle is drawn separately.

Figure One: Quake's triangle-model drawing pipeline.


Triangle models are stored quite differently from the world. Each model consists of front and back skins stretched around a triangle mesh, and contains a full set of vertex coordinates for each animation frame, so animation is performed by simply using the correct set of coordinates for the desired frame. No interpolation, morphing, or other runtime vertex calculations are performed.

Early on, we decided to allow lower drawing quality for triangle models than for the world, in the interests of speed. For example, the triangles in the models are small, and usually distant--and generally part of a moving monster that’s trying its best to do you in--so the quality benefits of perspective texture mapping would add little value. Consequently, we chose to draw the triangles with affine texture mapping, avoiding the work required for perspective. Mind you, the models are perspective correct at the vertices; it’s just the pixels between the vertices that suffer slight warping.
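The difference between the two mappings can be sketched as follows. This is a generic illustration in Python, not Quake's rasterizer: t is the fraction of the way across a span in screen space, u is a texture coordinate, z is depth, and the function names are made up for the example.

```python
def affine_u(u0, u1, t):
    """Affine mapping: step the texture coordinate linearly in screen space."""
    return u0 + (u1 - u0) * t


def perspective_u(u0, z0, u1, z1, t):
    """Perspective mapping: step u/z and 1/z linearly, then divide per pixel."""
    u_over_z = affine_u(u0 / z0, u1 / z1, t)
    one_over_z = affine_u(1.0 / z0, 1.0 / z1, t)
    return u_over_z / one_over_z
```

Both functions agree at the endpoints (t of 0 or 1), which is the sense in which the models are correct at the vertices. Halfway across a span running from depth 1 to depth 3, the affine result is 0.5 while the perspective result is 0.25; for a tiny triangle that spans only a few pixels and very little depth, the disagreement is far smaller, and the per-pixel divide is avoided.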

Another sacrifice at the altar of performance was subpixel precision. Before each triangle is drawn, we snap its vertices to the nearest integer screen coordinates, rather than doing the extra calculations to handle fractional vertex coordinates. This causes some jumping of triangle edges, but again, is not a problem in normal gameplay. One interesting benefit of integer coordinates is that it lets us do backface culling and rejection of degenerate triangles in one operation, because the cross-product z component used for backface culling returns zero for degenerate triangles. Conveniently, that cross-product component is also the denominator for the lighting and texture gradient calculations used in drawing each triangle, so as soon as we check the cross-product z value and determine that the triangle is drawable, we immediately start the FDIV to calculate the reciprocal. By the time we get around to calculating the gradients, the FDIV has completed, effectively taking only the one cycle required to issue it, because the integer execution pipes can process independently while FDIV executes.
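A small Python sketch shows how one cross-product value can serve all three purposes. It is an illustration only: the choice that a positive value means front-facing is an assumption (it depends on winding order and axis directions), s stands for any per-vertex quantity such as a texture coordinate or light level, and triangle_gradients is a hypothetical name. Python also cannot express the overlap of the divide with integer work; the sketch only shows that a single reciprocal feeds every gradient.

```python
def cross_z(v0, v1, v2):
    """z component of (v1 - v0) x (v2 - v0) for integer screen vertices."""
    return (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v2[0] - v0[0]) * (v1[1] - v0[1])


def triangle_gradients(v0, v1, v2, s0, s1, s2):
    """Return (ds/dx, ds/dy), or None if the triangle should not be drawn."""
    d = cross_z(v0, v1, v2)
    if d <= 0:            # negative: back-facing (assumed); zero: degenerate
        return None
    inv_d = 1.0 / d       # the one division, shared by all gradients
    ds_dx = ((s1 - s0) * (v2[1] - v0[1]) - (s2 - s0) * (v1[1] - v0[1])) * inv_d
    ds_dy = ((s2 - s0) * (v1[0] - v0[0]) - (s1 - s0) * (v2[0] - v0[0])) * inv_d
    return ds_dx, ds_dy
```

With integer vertices a sliver that snaps to a line or a point produces exactly zero, so the single comparison rejects it along with the back faces.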

Finally, we decided to Gouraud-shade triangle models, because this makes them look considerably more 3-D. However, we can’t afford to calculate where all the relevant light sources for each model are in each frame, or even which is the primary light source. Instead, we select each model’s lighting level based on how brightly the floor point it is standing on is lit, and use that lighting level for both ambient lighting (so all parts of the model have some illumination) and Gouraud shading--but the lighting vector for Gouraud shading is a fixed vector, so the model is always lit from the same direction. Somewhat surprisingly, in practice this looks considerably better than pure ambient lighting.
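One way to picture this scheme is the Python sketch below. It is a guess at the general shape, not Quake's formula: the fixed direction LIGHT_DIR, the way the ambient and directional terms are added, and the name vertex_light are all assumptions made for illustration.

```python
LIGHT_DIR = (1.0, 0.0, 0.0)   # hypothetical fixed lighting direction


def vertex_light(floor_light, normal):
    """Light one vertex from the floor light level and a fixed direction."""
    n_dot_l = sum(n * l for n, l in zip(normal, LIGHT_DIR))
    directional = floor_light * max(n_dot_l, 0.0)   # facing the fixed light
    return floor_light + directional                # ambient + directional
```

A vertex whose normal faces the fixed direction comes out brighter than one facing away, which gives the model visible shape, while the overall level still follows the lighting of the spot where the model stands.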

An idea that didn’t work

As we implemented triangle models, we tried several ideas that didn’t work out. One that’s notable because it seems so appealing is caching a model’s image from one frame and reusing it in the next frame as a sprite. Our thinking was that clipping, transforming, projecting, and drawing a several-hundred-triangle model was going to be a lot more expensive than drawing a sprite, too expensive to allow very many models to be visible at once. We wanted to be able to have at least a dozen simultaneous models, so the idea was that for all but the closest models, we’d draw into a sprite, then reuse that sprite at the model’s new locations for the next two or three frames, amortizing the 3-D drawing cost over several frames and boosting overall model-drawing performance. The rendering wouldn’t be exactly right when the sprite was reused, because the view of the model would change from frame to frame as the viewer and model moved, but it didn’t seem likely that that slight inaccuracy would be noticeable for any but the nearest and largest models.

As it turns out, though, we were wrong; the repeated frames were sometimes painfully visible, looking like jerky cardboard cutouts--in fact, a lot like the sprites used in DOOM, precisely the effect we were trying to avoid. This was especially true if we reused them more than once--and if we reused them only once, then we had to do one full 3-D rendering plus two sprite renderings every two frames, which wasn’t much faster than just doing two 3-D renderings. The sprite architecture also introduced considerable code complexity, increased memory footprint because of the need to cache the sprites, and made it difficult to get hidden surfaces exactly right because sprites are 2-D. The performance of drawing the sprites dropped sharply as models got closer, and that’s also where the sprites looked worse when they were reused, limiting sprites to use at a considerable distance. All these problems could have been worked out reasonably well if necessary, but the sprite architecture had the feeling of being fundamentally not the right approach, so we tried thinking along different lines.

An idea that did work

John Carmack had the notion that it was just way too much effort per pixel to do all the work of scanning out the tiny triangles in distant models. After all, distant models are just indistinct blobs of pixels, suffering heavily from texture aliasing and pixel quantization, he reasoned, so it should work just as well if we could come up with another way of drawing blobs of approximately equal quality. The trick was to come up with such an alternative approach. We tossed around half-formed ideas like flood-filling the model’s image within its silhouette, or encoding the model as a set of deltas, picking a visible seed point, and working around the visible side of the model according to the deltas. The first approach that seemed practical enough to try was drawing the pixel at each vertex replicated to form a 2x2 box, with all the vertices together forming the approximate shape of the model. Sometimes this worked quite well, but there were gaps where the triangles were large, and the quality was very erratic. However, it did point the way to something that did the trick.

One morning I came in to find that overnight (and well into the morning), John had designed and implemented a technique I’ll call subdivision rasterization, which scans out approximately the right pixels for each triangle, with almost no overhead, as follows. First, all vertices in the model are drawn. Ideally, only the vertices on the visible side of the model would be drawn, but determining which those are would take time, and the occasional error from a visible back vertex is lost in the noise.

Once the vertices are drawn, the triangles are processed one at a time. Each triangle that makes it through backface culling is then drawn with recursive subdivision. If any of the triangle’s sides is more than one pixel long in either x or y--that is, if the triangle contains any pixels that aren’t at vertices--then that side is split in half as nearly as possible given integer coordinates, and a new vertex is created at the split, with texture and screen coordinates that are halfway between those of the vertices at the endpoints. (The same splitting could be done for lighting, but we found that for small triangles--the sort that subdivision works well on--it was adequate to flat-shade each triangle at the light level of the first vertex, so we didn’t bother with Gouraud shading.) The halfway values can be calculated very quickly with shifts. This vertex is drawn, and then each of the two resulting triangles is processed recursively in the same way, as shown in Figure Two. There are some details, such as the fill rule that ensures that each pixel is drawn only once (except for backside vertices, as noted above), but basically subdivision rasterization boils down to taking a triangle, splitting a side that has at least one undrawn pixel and drawing the vertex at the split, and repeating the process for each of the two new triangles. The code to do this, shown in Listing One, is very simple and easily optimized, especially by comparison with a general triangle rasterizer.
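For readers without the listing at hand, the recursion described above can be sketched in Python. This is not Listing One and makes several simplifying assumptions: vertices are hypothetical (x, y, s, t) integer tuples, the frame buffer is a dictionary keyed by pixel, and the fill rule, z-buffering, backface culling, and lighting are all left out, so a pixel may be written more than once.

```python
def subdivide(v0, v1, v2, plot):
    """Recursively split a triangle until every side is at most 1 pixel long."""
    for a, b, c in ((v0, v1, v2), (v1, v2, v0), (v2, v0, v1)):
        if abs(a[0] - b[0]) > 1 or abs(a[1] - b[1]) > 1:
            # Halfway screen and texture coordinates, computed with a shift.
            mid = tuple((p + q) >> 1 for p, q in zip(a, b))
            plot(mid)
            subdivide(c, a, mid, plot)
            subdivide(c, mid, b, plot)
            return
    # All sides are short: every pixel of this triangle is at a vertex.


def draw_model(vertices, triangles):
    """Draw all vertices, then subdivide each triangle (given as index triples)."""
    framebuffer = {}

    def plot(v):
        x, y, s, t = v
        framebuffer[(x, y)] = (s, t)   # stand-in for a texel lookup at (s, t)

    for v in vertices:
        plot(v)
    for i, j, k in triangles:
        subdivide(vertices[i], vertices[j], vertices[k], plot)
    return framebuffer
```

Calling draw_model([(0, 0, 0, 0), (4, 0, 8, 0), (0, 4, 0, 8)], [(0, 1, 2)]) fills in pixels such as (2, 0) and (1, 1) inside the triangle without any edge stepping or span setup, which is where the low overhead comes from.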

Figure Two: One recursive subdivision triangle-drawing step.

Subdivision rasterization introduces considerably more error than affine texture mapping, and doesn’t draw exactly the right triangle shape, but the difference is very hard to detect for triangles that contain only a few pixels. We found that the point at which the difference between the two rasterizers becomes noticeable was surprisingly close: 30 or 40 feet for the Ogres, and about 12 feet for the Zombies. This means that most of the triangle models that are visible in a typical Quake scene are drawn with subdivision rasterization, not affine texture mapping.

How much does subdivision rasterization help performance? When John originally implemented it, it more than doubled triangle-model drawing speed, because the affine texture mapper was not yet optimized. However, I took it upon myself to see how fast I could make the mapper, so now affine texture mapping is only about 20% slower than subdivision rasterization. While 20% may not sound impressive, it includes clipping, transform, projection, and backface-culling time, so the rasterization difference alone is more than 50%. Besides, 20% overall means that we can have 12 monsters where we could only have had 10 before, so we count subdivision rasterization as a clear success.

Some more ideas that might work

Useful as subdivision rasterization proved to be, we by no means think that we’ve maxed out triangle model drawing, if only because we spent far less design and development time on subdivision than on the affine rasterizer, so it’s likely that there’s quite a bit more performance to be found for drawing small triangles. For example, it could be faster to precalculate drawing masks or even precompile drawing code for all possible small triangles (say, up to 4x4 or 5x5), and the memory footprint looks reasonable. (It’s worth noting that both precalculated drawing and subdivision rasterization are only possible because we snap to integer coordinates; none of this stuff works with fixed-point vertices.)

More interesting still is the stack-based rendering described in the article “Time/Space Tradeoffs for Polygon Mesh Rendering,” by Bar-Yehuda and Gotsman, in the April 1996 ACM Transactions on Graphics. Unfortunately, the article is highly abstract and slow going, but the bottom line is that it’s possible to represent a triangle mesh as a stream of commands that place vertices in a stack, remove them from the stack, and draw triangles using the vertices in the stack. The result is similar to a tristrip, but with excellent CPU cache coherency, because rather than indirecting all over a vertex pool to retrieve vertex data, all vertices reside in a tiny stack that’s guaranteed to be in the cache. Local variables used while drawing can be stored in a small block next to the stack, and the stream of commands representing the model is accessed sequentially from start to finish, so cache utilization should be very high. As processors speed up at a much faster rate than main memory access, cache optimizations of this sort will become steadily more important in improving performance.
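The general flavor of such a command stream can be sketched in Python. The encoding below is invented purely for illustration and is not the scheme from the paper: "push" places a vertex on the stack, "pop" removes the top one, and "draw" names three stack slots counted back from the top.

```python
def run_mesh_stream(commands, vertices):
    """Interpret a hypothetical push/pop/draw stream; return the triangles."""
    stack, triangles = [], []
    for op, *args in commands:
        if op == "push":
            stack.append(vertices[args[0]])     # the only vertex-pool access
        elif op == "pop":
            stack.pop()
        elif op == "draw":
            # Slots count back from the top of the stack: 0 is the top.
            triangles.append(tuple(stack[-1 - slot] for slot in args))
        else:
            raise ValueError("unknown command: %r" % (op,))
    return triangles
```

For example, pushing vertices 0, 1, and 2, drawing, popping once, pushing vertex 3, and drawing again yields two triangles that share an edge, much as a tristrip would, while the drawing step only ever touches the small stack and the sequential stream.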

As with so many aspects of 3-D, there is no one best approach to drawing triangle models, and no such thing as the fastest code. In a way, that’s frustrating, but the truth is, it’s these nearly-infinite possibilities that make 3-D so interesting; not only is it an endlessly varied challenge, but there’s almost always a better solution waiting to be found.

Quake's Lighting Model: Surface Caching

by Michael Abrash

It was during my senior year in college that I discovered computer games. Not Wizardry, or Choplifter, or Ultima, because none of those existed yet--the game that hooked me was the original Star Trek game, in which you navigated from one 8x8 quadrant to another in search of starbases, occasionally firing phasers or photon torpedoes. This was less exciting than it sounds; after each move, the current quadrant had to be reprinted from scratch, along with the current stats--and the output device was a 10 cps printball console. A typical game took over an hour, during which nothing particularly stimulating ever happened (Klingons appeared periodically, but they politely waited for your next move before attacking, and your photon torpedoes never missed, so the outcome was never in doubt), but none of that mattered; nothing could detract from the sheer thrill of being in a computer-simulated universe.

Then the college got a PDP-11 with four CRT terminals, and suddenly Star Trek could redraw in a second instead of a minute. Better yet, I found the source code for the Star Trek program in the recesses of the new system, the first time I'd ever seen any real-world code other than my own, and excitedly dove into it. One evening, as I was looking through the code, a really cute girl at the next terminal asked me for help getting a program to run. After I had helped her, eager to get to know her better, I said, "Want to see something? This is the actual source for the Star Trek game!" and proceeded to page through the code, describing each subroutine. We got to talking, and eventually I worked up the nerve to ask her out. She said sure, and we ended up having a good time, although things soon fell apart because of her two or three other boyfriends (I never did get an exact count). The interesting thing, though, was her response when I finally got around to asking her out. She said, "It's about time!" When I asked what she meant, she said, "I've been trying to get you to ask me out all evening--but it took you forever! You didn't actually think I was interested in that Star Trek program, did you?"

Actually, yes, I had thought that, because I was interested in it. One thing I learned from that experience, and have had reinforced countless times since, is that we--you, me, anyone who programs because they love it, who would do it for free if necessary--are a breed apart. We're different, and luckily so; while everyone else is worrying about downsizing, we're in one of the hottest industries in the world. And, so far as I can see, the biggest reason we’re in such a good situation isn't intelligence, or hard work, or education, although those help; it's that we actually like this stuff.

It's important to keep it that way. I've seen far too many people start to treat programming like a job, forgetting the joy of doing it, and burn out. So keep an eye on how you feel about the programming you're doing, and if it's getting stale, it's time to learn something new; there's plenty of interesting programming of all sorts to be done. Follow your interests--and don't forget to have fun!


As I've mentioned in previous columns, I've spent the last year and a half working with John Carmack on Quake's 3-D graphics engine. John faced several fundamental design issues while architecting Quake. I've written in past columns about some of those issues, including eliminating non-visible polygons quickly via a precalculated potentially visible set (PVS), and improving performance by inserting potentially visible polygons into a global edge list and scanning out only the nearest polygon at each pixel.

For the rest of this column, I'm going to talk about another, equally crucial design issue: how we developed our lighting approach for the part of the Quake engine that draws the world itself, the static walls and floors and ceilings. Monsters and players are drawn using completely different rendering code, with speed the overriding factor. A primary goal for the world, on the other hand, was to be as precise as possible, getting everything right so that polygons, textures, and sophisticated lighting would be pegged in place, with no visible shifting or distortion under all viewing conditions, for maximum player immersion--all with good performance, of course. As I’ll discuss, the twin goals of performance and rock-solid, complex lighting proved to be difficult to achieve with traditional lighting approaches; ultimately, a dramatically different approach was required.

Gouraud shading

The traditional way to do realistic lighting in polygon pipelines is Gouraud shading (also known as smooth shading). Gouraud shading involves generating a lighting value at each polygon vertex by applying all relevant world lighting, linearly interpolating between lighting values down the edges of the polygon, and then linearly interpolating between the edges of the polygon across each span. If texture mapping is desired (all polygons are texture mapped in Quake), then at each pixel in each span, the pixel's corresponding texture map location (texel) is determined, and the interpolated lighting is applied to the texel to generate a final, lit pixel. Texels are generally taken from a 32x32 or 64x64 texture that's tiled repeatedly across the polygon, for several reasons: performance (a 64x64 texture sits nicely in the 486 or Pentium cache), database size, and less artwork.
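The two interpolation steps can be sketched in Python as follows. This is a generic illustration with floating-point lighting between 0.0 and 1.0 and hypothetical function names; a real rasterizer would step these values incrementally in fixed point rather than recompute them per pixel.

```python
def edge_light(light_top, light_bottom, y_top, y_bottom, y):
    """Interpolate vertex lighting down one polygon edge to scan line y."""
    t = (y - y_top) / (y_bottom - y_top)
    return light_top + (light_bottom - light_top) * t


def gouraud_span(texels, light_left, light_right):
    """Light one span of texels, interpolating between its two edge values."""
    count = len(texels)
    step = (light_right - light_left) / count if count else 0.0
    return [texel * (light_left + step * i) for i, texel in enumerate(texels)]
```

edge_light produces the lighting at each end of a span, and gouraud_span then spreads that lighting across the span in equal steps per pixel, applying it to each texel.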

The interpolated lighting can consist of either a color intensity value or three separate red, green, and blue values. RGB lighting produces more sophisticated results, such as colored lights, but is slower and best suited to RGB modes. Games like Quake that are targeted at palettized 256-color modes generally use intensity lighting; each pixel is lit by looking up the pixel color in a table, using the texel color and the lighting intensity as the look-up indices.
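A table of this kind can be sketched in Python. The sketch assumes, for simplicity, a palette described by one brightness value per entry and a small number of light levels; the table layout and the name build_light_table are illustrative assumptions, not Quake's actual color map format.

```python
def build_light_table(palette, levels):
    """table[light][color] -> index of the palette entry nearest the lit color.

    palette is a list of brightness values (one per palette index); light
    runs from 0 (dark) to levels - 1 (full brightness).
    """
    table = []
    for light in range(levels):
        row = []
        for brightness in palette:
            target = brightness * light // (levels - 1)
            nearest = min(range(len(palette)),
                          key=lambda i: abs(palette[i] - target))
            row.append(nearest)
        table.append(row)
    return table


def light_pixel(table, texel_color, light):
    """Lighting a pixel is then a single two-index table lookup."""
    return table[light][texel_color]
```

All the searching for a nearest palette color happens once, when the table is built; at draw time each pixel costs one lookup.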

Gouraud shading allows for decent lighting effects with a relatively small amount of calculation and a compact data set that's a simple extension of the basic polygon model. However, there are several important drawbacks to Gouraud shading, as well.

Problems with Gouraud shading

The quality of Gouraud shading depends heavily on the average size of the polygons being drawn. Linear interpolation is used, so highlights can only occur at vertices, and color gradients are monotonic across the face of each polygon. This can make for bland lighting effects if polygons are large, and makes it difficult to do spotlights and other detailed or dramatic lighting effects. After John brought the initial, primitive Quake engine up using Gouraud shading for lighting, the first thing he tried to improve lighting quality was adding a single vertex and creating new polygons wherever a spotlight was directly over a polygon, with the new vertex added directly underneath the light, as shown in Figure One. This produced fairly attractive highlights, but simultaneously made evident several problems.

Figure One: Adding an extra vertex directly beneath a light.

A primary problem with Gouraud shading is that it requires the vertices used for world geometry to serve as lighting sample points as well, even though there isn't necessarily a close relationship between lighting and geometry. This artificial coupling often forces the subdivision of a single polygon into several polygons purely for lighting reasons, as with the spotlights mentioned above; these extra polygons increase the world database size, and the extra transformations and projections that they induce can harm performance considerably.

Similar problems occur with overlapping lights, and with shadows, where additional polygons are required in order to approximate lighting detail well. In particular, good shadow edges need small polygons, because otherwise the gradient between light and dark gets spread across too wide an area. Worse still, the rate of lighting change across a shadow edge can vary considerably as a function of the geometry the edge crosses; wider polygons stretch and diffuse the transition between light and shadow. A related problem is that lighting discontinuities can be very visible at t-junctions (although ultimately we had to add edges to eliminate t-junctions anyway, because otherwise dropouts can occur along polygon edges). These problems can be eased by adding extra edges, but that increases the rasterization load.

Another problem is that Gouraud shading isn't perspective correct. With Gouraud shading, lighting varies linearly across the face of a polygon, in equal increments per pixel--but unless the polygon is parallel to the screen, the same sort of perspective correction is needed to step lighting across the polygon properly as is required for texture mapping. Lack of perspective correction is not as visibly wrong for lighting as it is for texture mapping, because smooth lighting gradients can tolerate considerably more warping than can the detailed bitmapped images used in texture mapping, but it nonetheless shows up in several ways.
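
To make the mismatch concrete, the following C sketch compares the two ways of stepping a lighting value between two span endpoints. It relies on the standard result that value/z and 1/z are linear in screen space; the function names and the sample depths in the note below are assumptions for illustration only.

```c
/* Screen-linear interpolation, as Gouraud shading does it: equal
   increments per pixel. t is the fraction of the way across the span. */
static double lerp_screen(double l0, double l1, double t)
{
    return l0 + (l1 - l0) * t;
}

/* Perspective-correct interpolation: l/z and 1/z are each linear in
   screen space, so interpolate those and divide. z0 and z1 are the
   viewspace depths of the two endpoints. */
static double lerp_perspective(double l0, double z0,
                               double l1, double z1, double t)
{
    double l_over_z   = lerp_screen(l0 / z0, l1 / z1, t);
    double one_over_z = lerp_screen(1.0 / z0, 1.0 / z1, t);
    return l_over_z / one_over_z;
}
```

With lighting 0 at depth 1 and lighting 1 at depth 3, the screen midpoint gets 0.5 from the linear version but 0.25 from the perspective-correct one; when both endpoints are at the same depth, the two agree.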

First, the extent of the mismatch between Gouraud shading and perspective lighting varies with the angle and orientation of the polygon being lit. As a polygon turns to become more on-edge, for example, the lighting warps more and therefore shifts relative to the perspective-texture mapped texels it’s shading, an effect I'll call viewing variance. Lighting can similarly shift as a result of clipping, for example if one or more polygon edges are completely clipped; I'll refer to this as clipping variance.

These are fairly subtle effects; more pronounced is the rotational variance that occurs when Gouraud shading any polygon with more than three vertices. Consistent lighting for a polygon is fully defined by three lighting values; taking four or more vertices and interpolating between them, as Gouraud shading does, is basically a hack, and does not reflect any consistent underlying model. If you view a Gouraud-shaded quad head-on, then rotate it like a pinwheel, the lighting will shift as the quad turns, as shown in Figure Two. The extent of the lighting shift can be quite drastic, depending on how different the colors at the vertices are.

Figure Two: Gouraud shading varies with polygon screen orientation.

It was rotational variance that finally brought the lighting issue to a head for Quake. We'd look at the floors, which were Gouraud-shaded quads; then we'd pivot, and the lighting would shimmy and shift, especially where there were spotlights and shadows. Given the goal of rendering the world as accurately and convincingly as possible, this was unacceptable.

The obvious solution to rotational variance is to use only triangles, but that brings with it a new set of problems. It takes twice as many triangles as quads to describe the same scene, increasing the size of the world database and requiring extra rasterization, at a performance cost. Triangles still don't provide perspective lighting; their lighting is rotationally invariant, but it's still wrong--just more consistently wrong. Gouraud-shaded triangles still result in odd lighting patterns, and require lots of triangles to support shadowing and other lighting detail. Finally, triangles don't solve clipping or viewing variance.

Yet another problem is that while it may work well to add extra geometry so that spotlights and shadows show up well, that's feasible only for static lighting. Dynamic lighting--light cast by sources that move--has to work with whatever geometry the world has to offer, because its needs are constantly changing.

These issues led us to conclude that if we were going to use Gouraud shading, we would have to build Quake levels from many small triangles, with sufficiently finely-detailed geometry so that complex lighting could be supported and the inaccuracies of Gouraud shading wouldn't be too noticeable. Unfortunately, that line of thinking brought us back to the problem of a much larger world database and a much heavier rasterization load (all the worse because Gouraud shading requires an additional interpolant, slowing the inner rasterization loop), so that not only would the world still be less than totally solid, because of the limitations of Gouraud shading, but the engine would also be too slow to support the complex worlds we had hoped for in Quake.

The quest for alternative lighting

None of which is to say that Gouraud shading isn't useful in general. Descent uses it to excellent effect, and in fact Quake uses Gouraud shading for moving entities, because these consist of small triangles and are always in motion, which helps hide the relatively small lighting errors. However, Gouraud shading didn't seem capable of meeting our design goals for rendering quality and speed for drawing the world as a whole, so it was time to look for alternatives.

There are many alternative lighting approaches, most of them higher-quality than Gouraud, starting with Phong shading, in which the surface normal is interpolated across the polygon's surface, and going all the way up to ray-tracing lighting techniques in which full illumination calculations are performed for all direct and reflected paths from each light source for each pixel. What all these approaches have in common is that they're slower than Gouraud shading, too slow for our purposes in Quake. For weeks, we kicked around and rejected various possibilities and continued working with Gouraud shading for lack of a better alternative--until the day John came into work and said, "You know, I have an idea..."

Decoupling lighting from rasterization

John's idea came to him while he was looking at a wall that had been carved into several pieces because of a spotlight, with an ugly lighting glitch due to a t-junction. He thought to himself that if only there were some way to treat it as one surface, it would look better and draw faster--and then he realized that there was a way to do that.

The insight was to split lighting and rasterization into two separate steps. In a normal Gouraud-based rasterizer, there's first an off-line preprocessing step when the world database is built, during which polygons are added to support additional lighting detail as needed, and lighting values are calculated at the vertices of all polygons. At runtime, the lighting values are modified if dynamic lighting is required, and then the polygons are drawn with Gouraud shading.

Quake’s approach, which I'll call surface-based lighting, preprocesses differently, and adds an extra rendering step. During off-line preprocessing, a grid, called a light map, is calculated for each polygon in the world, with a lighting value every 16 texels horizontally and vertically. This lighting is done by casting light from all the nearby lights in the world to each of the grid points on the polygon, and summing the results for each grid point. The Quake preprocessor filters the values, so shadow edges don't have a stairstep appearance (a technique suggested by Billy Zelsnack); additional preprocessing could be done, for example Phong shading to make surfaces appear smoothly curved. Then, at runtime, the polygon's texture is tiled into a buffer, with each texel lit according to the weighted average intensities of the four nearest light map points, as shown in Figure Three. If dynamic lighting is needed, the light map is modified accordingly before the buffer, which I'll call a surface, is built. Then the polygon is drawn with perspective texture mapping, with the surface serving as the input texture, and with no lighting performed during the texture mapping.

Figure Three: A surface is built by tiling the texture and lighting the texels from the light map.
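
The surface-building step can be sketched in C as follows. This is a plain, unoptimized illustration, not Quake's assembly-language inner loop: the function name build_surface, the buffer layouts, the 0..255 intensity range, and the use of a simple gray-scale multiply in place of the color table lookup are all assumptions made for the example.

```c
#include <stdint.h>

#define LIGHT_STEP 16  /* one light map sample every 16 texels */

/* Build a lit surface of width x height texels.
   tex:      tex_w x tex_h source texture, tiled across the surface
   lightmap: (width/LIGHT_STEP + 1) x (height/LIGHT_STEP + 1) samples,
             each an intensity in 0..255 (assumed range)
   out:      width x height lit texels
   width and height are assumed to be multiples of LIGHT_STEP. */
static void build_surface(const uint8_t *tex, int tex_w, int tex_h,
                          const uint8_t *lightmap,
                          uint8_t *out, int width, int height)
{
    int lm_w = width / LIGHT_STEP + 1;  /* light map samples per row */

    for (int y = 0; y < height; y++) {
        int gy = y / LIGHT_STEP, fy = y % LIGHT_STEP;
        for (int x = 0; x < width; x++) {
            int gx = x / LIGHT_STEP, fx = x % LIGHT_STEP;
            const uint8_t *p = &lightmap[gy * lm_w + gx];

            /* Weighted average of the four surrounding samples. */
            int top = p[0]    * (LIGHT_STEP - fx) + p[1]        * fx;
            int bot = p[lm_w] * (LIGHT_STEP - fx) + p[lm_w + 1] * fx;
            int light = (top * (LIGHT_STEP - fy) + bot * fy)
                        / (LIGHT_STEP * LIGHT_STEP);

            /* Tile the texture across the surface. */
            uint8_t texel = tex[(y % tex_h) * tex_w + (x % tex_w)];

            /* Stand-in for the color table lookup: scale a gray texel. */
            out[y * width + x] = (uint8_t)(texel * light / 255);
        }
    }
}
```

Once built, the output buffer is just a texture; the texture mapper that draws it needs no knowledge of lighting at all.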

So what does surface-based lighting buy us? First and foremost, it provides consistent, perspective-correct lighting, eliminating all rotational, viewing, and clipping variance, because lighting is done in surface space rather than in screen space. By lighting in surface space, we bind the lighting to the texels in an invariant way, and then the lighting gets a free ride through the perspective texture mapper and ends up perfectly matched to the texels. Surface-based lighting also supports good, although not perfect, detail for overlapping lights and shadows. The 16-texel grid has a resolution of two feet in the Quake frame of reference, and this relatively fine resolution, together with the filtering performed when the light map is built, is sufficient to support complex shadows with smoothly fading edges. Additionally, surface-based lighting eliminates lighting glitches at t-junctions, because lighting is unrelated to vertices. In short, surface-based lighting meets all of Quake's visual quality goals, which leaves only one question: How does it perform?

Size and speed

As it turns out, the raw speed of surface-based lighting is pretty good. Although an extra step is required to build the surface, moving lighting and tiling into a separate loop from texture mapping allows each of the two loops to be optimized very effectively, with almost all variables kept in registers. The surface-building inner loop is particularly efficient, because it consists of nothing more than interpolating intensity, combining it with a texel and using the result to look up a lit texel color, and storing the results with a dword write every four texels. In assembly language, we've gotten this code down to 2.25 cycles per lit texel in Quake. Similarly, the texture-mapping inner loop, which overlaps an FDIV for floating-point perspective correction with integer pixel drawing in 16-pixel bursts, has been squeezed down to 7.5 cycles per pixel on a Pentium, so the combined inner loop time for building and drawing a surface is roughly in the neighborhood of 10 cycles per pixel. It's certainly possible to write a Gouraud-shaded perspective-correct texture mapper that's somewhat faster than 10 cycles, but 10 cycles/pixel is fast enough to do 40 frames/second at 640x400 on a Pentium/100, so the cycle counts of surface-based lighting are acceptable. It's worth noting that it's possible to write a one-pass texture mapper that does approximately perspective-correct lighting. However, I have yet to hear of or devise such an inner loop that isn't complicated and full of special cases, which makes it hard to optimize; worse, this approach doesn't work well with the procedural and post-processing techniques I'll discuss shortly.
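
The frame-rate figure follows from simple arithmetic, sketched here in C. The helper name frames_per_second is an assumption, and the model is deliberately crude: it counts inner-loop cycles only, ignoring cache misses, span setup, and everything else a real frame has to do.

```c
/* Frames per second if every screen pixel costs a fixed number of
   inner-loop cycles and nothing else takes any time. */
static double frames_per_second(double clock_hz, int width, int height,
                                double cycles_per_pixel)
{
    return clock_hz / ((double)width * height * cycles_per_pixel);
}
```

At 100 MHz and 640x400, a flat 10 cycles per pixel works out to about 39 frames per second, and the 2.25 + 7.5 = 9.75 cycles quoted for the two inner loops to about 40.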

Moreover, surface-based lighting tends to spend more of its time in inner loops, because polygons can have any number of sides and don't need to be split into multiple smaller polygons for lighting purposes; this reduces the amount of transformation and projection that is required, and makes polygon spans longer. So the performance of surface-based lighting stacks up very well indeed--except for caching.

I mentioned earlier that a 64x64 texture tile fits nicely in the processor cache. A typical surface doesn't. Every texel in every surface is unique, so even at 320x200 resolution, something on the rough order of 64,000 texels must be read in order to draw a single scene. (The number actually varies quite a bit, as discussed below, but 64,000 is in the ballpark.) This means that on a Pentium, we're guaranteed to miss the cache once every 32 texels, and the number can be considerably worse than that if the texture access patterns are such that we don't use every texel in a given cache line before that data gets thrown out of the cache. Then, too, when a surface is built, the surface buffer won't be in the cache, so the writes will be uncached writes that have to go to main memory, then get read back from main memory at texture mapping time, potentially slowing things further still. All this together makes the combination of surface building and unlit texture mapping a potential performance problem, but that never posed a problem during the development of Quake, thanks to surface caching.

Surface caching

When he thought of surface-based lighting, John immediately realized that surface building would be relatively expensive. (In fact, he assumed it would be considerably more expensive than it actually turned out to be with full assembly-language optimization.) Consequently, his design included the concept of caching surfaces, so that if the same surface was visible in the next frame, it could be reused without having to be rebuilt.
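
The reuse test at the heart of such a cache might look like the C sketch below. Everything here is hypothetical: the structure cached_surface_t, the idea of stamping each surface with a lighting version number, and the function use_surface are illustrative assumptions, and the sketch leaves out memory allocation and eviction entirely.

```c
#include <stdbool.h>

/* Bookkeeping for one surface's cached texels (hypothetical layout). */
typedef struct {
    bool valid;        /* true once the surface has been built */
    int  mip_level;    /* mip level the cached texels were built at */
    int  light_stamp;  /* lighting version the cached texels reflect */
    int  builds;       /* how many times this surface was (re)built */
} cached_surface_t;

/* Returns true if the cached copy can be reused as is. Otherwise the
   surface is marked as rebuilt for the requested state and false is
   returned. */
static bool use_surface(cached_surface_t *s, int mip_level, int light_stamp)
{
    if (s->valid && s->mip_level == mip_level
                 && s->light_stamp == light_stamp)
        return true;   /* cache hit: draw straight from cached texels */

    /* Cache miss: a real engine would allocate cache space and run the
       surface-building loop here. */
    s->valid = true;
    s->mip_level = mip_level;
    s->light_stamp = light_stamp;
    s->builds++;
    return false;
}
```

In this sketch a surface is rebuilt only when it is first seen, when its mip level changes, or when its lighting changes.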

With surface rebuilding needed only rarely, thanks to surface caching, Quake's rasterization speed is generally the speed of the unlit, perspective-correct texture-mapping inner loop, which suffers from more cache misses than Gouraud-shaded, tiled texture mapping, but doesn't have the overhead of Gouraud shading, and allows the use of larger polygons. In the worst case, where everything in a frame is a new surface, the speed of the surface-caching approach is somewhat slower than Gouraud shading, but generally surface caching provides equal or better performance, so once surface caching was implemented in Quake, performance was no longer a problem--but size became a concern.

The amount of memory required for surface caching looked forbidding at first. Surfaces are large relative to texture tiles, because every texel of every surface is unique. Also, a surface can contain many texels relative to the number of pixels actually drawn on the screen, because due to perspective foreshortening, distant polygons have only a few pixels relative to the surface size in texels. Surfaces associated with partly hidden polygons must be fully built, even though only part of the polygon is visible, and if polygons are drawn back to front with overdraw, some polygons won't even be visible, but will still require surface building and caching. What all this meant was that the surface cache initially looked to be very large, on the order of several megabytes, even at 320x200--too much for a game intended to run on an 8 MB machine.

Mipmapping to the rescue

Two factors combined to solve this problem. First, polygons are drawn through an edge list with no overdraw, as discussed a few columns back, so no surface is ever built unless at least part of it is visible. Second, surfaces are built at four mipmap levels, depending on distance, with each mipmap level having one-quarter as many texels as the preceding level, as shown in Figure Four. The mipmap level for a given surface is selected to result in a texel:pixel ratio approximately between 1:1 and 1:2, so texels map roughly to pixels, and more distant surfaces are correspondingly smaller. As a result, the number of surface texels required to draw a scene at 320x200 is on the rough order of 64,000; the number is actually somewhat higher, because of portions of surfaces that are obscured and viewspace-tilted polygons, which have high texel-to-pixel ratios along one axis, but not a whole lot higher. Thanks to mipmapping and the edge list, 600K has proven to be plenty for the surface cache at 320x200, even in the most complex scenes, and at 640x480, a little more than 1 MB suffices.

Figure Four: Each texel at a given mipmap level corresponds to four texels at the preceding mipmap level.
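
One plausible way to express that selection in C is sketched below. It assumes the caller already knows how many screen pixels one full-resolution texel covers for the surface; how that figure is derived from distance is not shown, and the names select_mip_level and mip_texels are hypothetical.

```c
#define NUM_MIP_LEVELS 4

/* Pick the first mip level at which one texel covers at least one
   screen pixel, so the texel:pixel ratio lands between about 1:1 and
   1:2. pixels_per_texel is measured at mip level 0. */
static int select_mip_level(double pixels_per_texel)
{
    int level = 0;
    while (level < NUM_MIP_LEVELS - 1 && pixels_per_texel < 1.0) {
        pixels_per_texel *= 2.0;  /* each level halves resolution per axis */
        level++;
    }
    return level;
}

/* Texel count of a surface at a given mip level: each level has
   one-quarter as many texels as the one before it. */
static int mip_texels(int width, int height, int level)
{
    return (width >> level) * (height >> level);
}
```

A 256x256 surface drops from 65,536 texels at level 0 to 1,024 at level 3, which is where most of the cache savings come from.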

All mipmapped texture tiles are generated as a preprocessing step, and loaded from disk at runtime. One interesting point is that a key to making mipmapping look good turned out to be box-filtering down from one level to the next by averaging four adjacent pixels, then using error diffusion dithering to generate the mipmapped texels.
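
The box-filtering half of that process is easy to sketch in C. The example below works on plain 8-bit intensity values and stops there: mapping the averaged colors back to a 256-color palette and the error diffusion dithering are not shown, and the function name box_filter_half is an assumption.

```c
#include <stdint.h>

/* Produce the next mip level by averaging each 2x2 block of pixels.
   src is w x h (both assumed even); dst is (w/2) x (h/2). */
static void box_filter_half(const uint8_t *src, int w, int h, uint8_t *dst)
{
    for (int y = 0; y < h / 2; y++) {
        for (int x = 0; x < w / 2; x++) {
            const uint8_t *p = &src[(2 * y) * w + 2 * x];
            int sum = p[0] + p[1] + p[w] + p[w + 1];
            dst[y * (w / 2) + x] = (uint8_t)((sum + 2) / 4);  /* rounded */
        }
    }
}
```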

Also, mipmapping is done on a per-surface basis; the mipmap level for a whole surface is selected based on the distance from the viewer of the nearest vertex. This led us to limit surface size to a maximum of 256x256. Otherwise, surfaces such as floors would extend for thousands of texels, all at the mipmap level of the nearest vertex, and would require huge amounts of surface cache space while displaying a great deal of aliasing in distant regions due to a high texel:pixel ratio.

One final issue with surface caching involves 3-D hardware accelerators. Surfaces are effectively large textures (and larger at the mipmap levels typically used at the high resolutions of accelerators than they are at 320x200), and texture memory tends to be a limited resource on accelerators. Worse, accelerators are built for 16- or 32-bpp graphics, and surfaces are twice as large at 16-bpp as they are at 8-bpp, and correspondingly slower to build. Although the edge list can still be used to cull invisible polygons, it's nonetheless true that a surface cache around 2 MB is best on a hardware accelerator.

The first generation of accelerators was originally designed for 2 MB of RAM, which would have been a squeeze, but plummeting memory prices seem to have solved the problem; 4 MB is fast becoming the standard. And given sufficient memory, surface caching runs at about the same speed on accelerators as Gouraud shading (slower because of building and downloading surfaces, but faster because of fewer, larger polygons), and still offers the same advantage as in software: detailed and consistently correct lighting.

Two final notes on surface caching

Dynamic lighting has a significant impact on the performance of surface caching, because whenever the lighting on a surface changes, the surface has to be rebuilt. In the worst case, where the lighting changes on every visible surface, the surface cache provides no benefit, and rendering runs at the combined speed of surface building and texture mapping. This worst-case slowdown is tolerable but certainly noticeable, so it's best to design games that use surface caching so only some of the surfaces change lighting at any one time. If necessary, you could alternate surface relighting so that half of the surfaces change on even frames, and half on odd frames, but large-scale, constant relighting is not surface caching's strongest suit.

Finally, Quake barely begins to tap surface caching's potential. All sorts of procedural texturing and post-processing effects are possible. If a wall is shot, a sprite of pockmarks could be attached to the wall's data structure, and the sprite could be drawn into the surface each time the surface is rebuilt. The same could be done for splatters, or graffiti, with translucency easily supported. These effects would then be cached and drawn as part of the surface, so the performance cost would be much less than effects done by on-screen overdraw every frame. Basically, the surface is a handy repository for all sorts of effects, because multiple techniques can be composited, because it caches the results for reuse without rebuilding, and because the texels constructed in a surface are automatically drawn in perspective.

Sorted Spans in Action

by Michael Abrash

Last time, we dove headlong into the intricacies of hidden surface removal by way of z-sorted (actually, 1/z-sorted) spans. At the end, I noted that we were currently using 1/z-sorted spans in Quake, but it was unclear whether we’d switch back to BSP order. Well, it’s clear now: We’re back to sorting spans by BSP order.

In Robert A. Heinlein’s wonderful story “The Man Who Sold the Moon,” the chief engineer of the Moon rocket project tries to figure out how to get a payload of three astronauts to the Moon and back. He starts out with a four-stage rocket design, but finds that it won’t do the job, so he adds a fifth stage. The fifth stage helps, but not quite enough, “Because,” he explains, “I’ve had to add in too much dead weight, that’s why.” (The dead weight is the control and safety equipment that goes with the fifth stage.) He then tries adding yet another stage, only to find that the sixth stage actually results in a net slowdown. In the end, he has to give up on the three-person design and build a one-person spacecraft instead.

1/z-sorted spans in Quake turned out pretty much the same way, as we’ll see in a moment. First, though, I’d like to note up front that this column is very technical and builds heavily on previously-covered material; reading the last column is strongly recommended, and reading the six columns before that, which cover BSP trees, 3-D clipping, and 3-D math, might be a good idea as well. I regret that I can’t make this column stand completely on its own, but the truth is that commercial-quality 3-D graphics programming requires vastly more knowledge and code than did the 2-D graphics I’ve written about in years past. And make no mistake about it, this is commercial quality stuff; in fact, the code in this column uses the same sorting technique as the test version of Quake, qtest1.zip, that we just last week placed on the Internet. These columns are the Real McCoy, reports from the leading edge, and I trust that you’ll be patient if careful rereading and some catch-up reading of prior columns are required to absorb everything contained herein. Besides, the ultimate reference for any design is working code, which you’ll find in part in Listing 1 and in its entirety in ftp.idsoftware.com/mikeab/ddjzsort.zip.

Quake and sorted spans

As you’ll recall from last time, Quake uses sorted spans to get zero overdraw while rendering the world, thereby both improving overall performance and leveling frame rates by speeding up scenes that would otherwise experience heavy overdraw. Our original design used spans sorted by BSP order; because we traverse the world BSP tree from front to back relative to the viewpoint, the order in which BSP nodes are visited is a guaranteed front to back sorting order. We simply gave each node an increasing BSP sequence number as it was visited, set each polygon’s sort key to the BSP sequence number of the node (BSP splitting plane) it lay on, and used those sort keys when generating spans.

(In a change from earlier designs, polygons now are stored on nodes, rather than leaves, which are the convex subspaces carved out by the BSP tree. Visits to potentially-visible leaves are used only to mark that the polygons that touch those leaves are visible and need to be drawn, and each marked-visible polygon is then drawn after everything in front of its node has been drawn. This results in less BSP splitting of polygons, which is A Good Thing, as explained below.)
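
The key assignment described above amounts to an in-order walk of the tree, nearest side first. Here is a minimal C sketch of the idea; the node layout, the field names, and the function walk_front_to_back are assumptions for illustration, leaves are reduced to NULL children, and the visibility marking described in the parenthetical is omitted.

```c
#include <stddef.h>

typedef struct bsp_node {
    double normal[3];        /* splitting plane normal */
    double dist;             /* plane equation: dot(normal, p) = dist */
    struct bsp_node *front;  /* child on the side the normal points to */
    struct bsp_node *back;   /* child on the opposite side */
    int sort_key;            /* BSP sequence number, set by the walk */
} bsp_node_t;

/* Visit nodes front to back relative to the eye point, giving each an
   increasing sequence number. Polygons stored on a node would take that
   number as their sort key. */
static void walk_front_to_back(bsp_node_t *node, const double eye[3],
                               int *next_key)
{
    if (node == NULL)
        return;

    double side = node->normal[0] * eye[0] + node->normal[1] * eye[1]
                + node->normal[2] * eye[2] - node->dist;
    bsp_node_t *near_child = (side >= 0) ? node->front : node->back;
    bsp_node_t *far_child  = (side >= 0) ? node->back  : node->front;

    walk_front_to_back(near_child, eye, next_key);  /* nearer space first */
    node->sort_key = (*next_key)++;                 /* then this plane */
    walk_front_to_back(far_child, eye, next_key);   /* then farther space */
}
```

Because the keys are plain integers, comparing two polygons during span generation is a single integer compare.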

This worked flawlessly for the world, but had a couple of downsides. First, it didn’t address the issue of sorting small, moving BSP models such as doors; those models could be clipped into the world BSP tree’s leaves and assigned sort keys corresponding to the leaves into which they fell, but there was still the question of how to sort multiple BSP models in the same world leaf against each other. Second, strict BSP order requires that polygons be split so that every polygon falls entirely within a single leaf. This can be stretched by putting polygons on nodes, allowing for larger polygons on average, but even then, polygons still need to be split so that every polygon falls within the bounding volume for the node on which it lies. The end result, in either case, is more and smaller polygons than if BSP order weren’t used--and that, in turn, means lower performance, because more polygons must be clipped, transformed, and projected, more sorting must be done, and more spans must be drawn.

We figured that if only we could avoid those BSP splits, Quake would get a lot faster. Accordingly, we switched from sorting on BSP order to sorting on 1/z, and left our polygons unsplit. Things did get faster at first, but not as much as we had expected, for two reasons.

First, as the world BSP tree is descended, we clip each node’s bounding box in turn to see if it’s inside or outside each plane of the view frustum. The clipping results can be remembered, and often allow the avoidance of some or all clipping for the node’s polygons. For example, all polygons in a node that has a trivially accepted bounding box are likewise guaranteed to be unclipped and in the frustum, since they all lie within the node’s volume, and need no further clipping. This efficient clipping mechanism vanished as soon as we stepped out of BSP order, because a polygon was no longer necessarily confined to its node’s volume.
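
The bounding-box test against one frustum plane can be sketched in C as below. The types, the names, and the convention that a point is inside when dot(normal, p) - dist is non-negative are assumptions for the example; the corner-selection trick is a common way to classify an axis-aligned box, not necessarily the one Quake uses.

```c
typedef enum { BOX_OUTSIDE, BOX_INSIDE, BOX_STRADDLES } box_side_t;

typedef struct {
    double normal[3];  /* points toward the inside of the frustum */
    double dist;       /* inside when dot(normal, p) - dist >= 0 */
} plane_t;

/* Classify an axis-aligned box against one frustum plane by testing
   only the two corners that are farthest behind and farthest in front
   of the plane. */
static box_side_t classify_box(const double mins[3], const double maxs[3],
                               const plane_t *plane)
{
    double min_dist = -plane->dist;  /* corner farthest behind the plane */
    double max_dist = -plane->dist;  /* corner farthest in front */

    for (int i = 0; i < 3; i++) {
        double n = plane->normal[i];
        min_dist += n * (n >= 0 ? mins[i] : maxs[i]);
        max_dist += n * (n >= 0 ? maxs[i] : mins[i]);
    }
    if (max_dist < 0)
        return BOX_OUTSIDE;    /* trivially rejected */
    if (min_dist >= 0)
        return BOX_INSIDE;     /* trivially accepted: no clipping needed */
    return BOX_STRADDLES;      /* polygons must be clipped to this plane */
}
```

A node whose box comes back BOX_INSIDE for every frustum plane can pass its polygons on unclipped, which is the saving that was lost once polygons were no longer confined to their node's volume.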

Second, sorting on 1/z isn’t as cheap as sorting on BSP order, because floating-point calculations and comparisons are involved, rather than integer compares. So Quake got faster, but, like Heinlein’s fifth rocket stage, there was clear evidence of diminishing returns.

That wasn’t the bad part; after all, even a small speed increase is a good thing. The real problem was that our initial 1/z sorting proved to be unreliable. We first ran into problems when two forward-facing polygons started at a common edge, because it was hard to tell which one was really in front (as discussed below), and we had to do additional floating-point calculations to resolve these cases. This fixed the problems for a while, but then odd cases started popping up where just the right combination of polygon alignments caused new sorting errors. We tinkered with those too, adding more code and incurring additional slowdowns in the process. Finally, we had everything working smoothly again, although by this point Quake was back to pretty much the same speed it had been with BSP sorting.

And then yet another crop of sorting errors popped up.

We could have fixed those errors too; we’ll take a quick look at how to deal with such cases shortly. However, like the sixth rocket stage, the fixes would have made Quake slower than it had been with BSP sorting. So we gave up and went back to BSP order, and now the code is simpler and sorting works reliably. It’s too bad our experiment didn’t work out, but it wasn’t wasted time, because we learned quite a bit. In particular, we learned that the information provided by a simple, reliable world ordering mechanism such as a BSP tree can do more good than is immediately apparent, in terms of both performance and solid code.

Nonetheless, sorting on 1/z can be a valuable tool, used in the right context; drawing a Quake world just doesn’t happen to be such a case. In fact, sorting on 1/z is how we’re now handling the sorting of multiple BSP models that lie within the same world leaf in Quake; here we don’t have the option of using BSP order (because we’re drawing multiple independent trees), so we’ve set restrictions on the BSP models to avoid running into the types of 1/z sorting errors we encountered drawing the Quake world. Below, we’ll look at another application in which sorting on 1/z is quite useful, one where objects move freely through space. As is so often the case in 3-D, there is no one “right” technique, but rather a great many different techniques, each one handy in the right situations. Often, a combination of techniques is beneficial, as for example the combination in Quake of BSP sorting for the world and 1/z sorting for BSP models in the same world leaf.

For the remainder of this column, I’m going to look at the three main types of 1/z span sorting, then discuss a sample 3-D app built around 1/z span sorting.

Types of 1/z span sorting

As a quick refresher, with 1/z span sorting, all the polygons in a scene are treated as sets of screenspace pixel spans, and 1/z (where z is distance from the viewpoint in viewspace, as measured along the viewplane normal) is used to sort the spans so that the nearest span overlapping each pixel is drawn. As discussed last time, in the sample program we’re actually going to do all our sorting with polygon edges, which represent spans in an implicit form.

There are three types of 1/z span sorting, each requiring a different implementation. In order of increasing speed and decreasing complexity, they are: intersecting, abutting, and independent. (These are names of my own devising; I haven’t come across any standard nomenclature.)

Intersecting span sorting

Intersecting span sorting occurs when polygons can interpenetrate. Thus, two spans may cross such that part of each span is visible, in which case the spans have to be split and drawn appropriately, as shown in Figure 1.

Figure 1: Intersecting span sorting. Polygons A and B are viewed from above.


Intersecting is the slowest and most complicated type of span sorting, because it is necessary to compare 1/z values at two points in order to detect interpenetration, and additional work must be done to split the spans as necessary. Thus, although intersecting span sorting certainly works, it’s not the first choice for performance.
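
The two-point test can be sketched in C as follows. It assumes each span's 1/z is described as a linear function of screen x along the scan line; the structure span_depth_t and the function names are hypothetical, and the actual splitting and drawing of the spans is not shown.

```c
/* 1/z along one scan line for one polygon: invz(x) = base + step * x. */
typedef struct {
    double base;
    double step;
} span_depth_t;

static double invz_at(const span_depth_t *s, double x)
{
    return s->base + s->step * x;
}

/* Compare 1/z at both ends of the region where spans a and b overlap.
   If the nearer span differs between the two ends, the polygons
   interpenetrate; returns 1 and stores the screen x of the crossing,
   where the spans must be split. Returns 0 otherwise. */
static int spans_cross(const span_depth_t *a, const span_depth_t *b,
                       double x_left, double x_right, double *x_cross)
{
    double d_left  = invz_at(a, x_left)  - invz_at(b, x_left);
    double d_right = invz_at(a, x_right) - invz_at(b, x_right);

    if (d_left * d_right >= 0)
        return 0;  /* same order at both ends (or touching): no split */

    /* The difference is linear in x, so solve for its zero. */
    *x_cross = x_left + (x_right - x_left) * d_left / (d_left - d_right);
    return 1;
}
```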

Abutting span sorting

Abutting span sorting occurs when polygons that are not part of a continuous surface can butt up against each other, but don’t interpenetrate, as shown in Figure 2. This is the sorting used in Quake, where objects like doors often abut walls and floors, and turns out to be more complicated than you might think. The problem is that when an abutting polygon starts on a given scan line, as with polygon B in Figure 2, it starts at exactly the same 1/z value as the polygon it abuts, in this case, polygon A, so additional sorting is needed when these ties happen. Of course, the two-point sorting used for intersecting polygons would work, but we’d like to find something faster.

Figure 2: Abutting span sorting. Polygons A and B are viewed from above.

As it turns out, the additional sorting for abutting polygons is actually quite simple; whichever polygon has a greater 1/z gradient with respect to screen x (that is, whichever polygon is heading fastest toward the viewer along the scan line) is the front one. The hard part is identifying when ties--that is, abutting polygons--occur; due to floating-point imprecision, as well as fixed-point edge-stepping imprecision that can move an edge slightly on the screen, calculations of 1/z from the combination of screen coordinates and 1/z gradients (as discussed last time) can be slightly off, so most tie cases will show up as near matches, not exact matches. This imprecision makes it necessary to perform two comparisons, one with an adjust-up by a small epsilon and one with an adjust-down, creating a range in which near-matches are considered matches. Fine-tuning this epsilon to catch all ties without falsely reporting close-but-not-abutting edges as ties proved to be troublesome in Quake, and the epsilon calculations and extra comparisons slowed things down.
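
In C, the comparison with its epsilon range and gradient tie-break might be sketched like this. The epsilon value is purely an assumption (the right value depends on the precision of the edge stepping), and the function name and parameter layout are hypothetical.

```c
#define INVZ_EPSILON 1e-6  /* assumed tolerance, for illustration only */

/* Decide whether a polygon starting on this scan line is in front of
   the polygon currently on top at that point. new_invz and cur_invz are
   the two polygons' 1/z values at the starting pixel; the *_stepx
   values are their 1/z gradients with respect to screen x. */
static int new_edge_in_front(double new_invz, double new_invz_stepx,
                             double cur_invz, double cur_invz_stepx)
{
    if (new_invz > cur_invz + INVZ_EPSILON)
        return 1;  /* clearly nearer */
    if (new_invz < cur_invz - INVZ_EPSILON)
        return 0;  /* clearly farther */

    /* Near match: treat the polygons as abutting. The one heading
       toward the viewer fastest along the scan line is in front. */
    return new_invz_stepx > cur_invz_stepx;
}
```

The two range comparisons are exactly the extra work the text describes; an epsilon that is too small misses real ties, while one that is too large treats merely close polygons as abutting.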

I do think that abutting 1/z span sorting could have been made reliable enough for production use in Quake, were it not that we share edges between adjacent polygons in Quake, so that the world is a large polygon mesh. When a polygon ends and is followed by an adjacent polygon that shares the edge that just ended, we simply assume that the adjacent polygon sorts relative to other active polygons in the same place as the one that ended (because the mesh is continuous and there’s no interpenetration), rather than doing a 1/z sort from scratch. This speeds things up by saving a lot of sorting, but it means that if there is a sorting error, a whole string of adjacent polygons can be sorted incorrectly, pulled in by the one missorted polygon. Missorting is a very real hazard when a polygon is very nearly perpendicular to the screen, so that the 1/z calculations push the limits of numeric precision, especially in single-precision floating point.

Many caching schemes are possible with abutting span sorting, because any given pair of polygons, being noninterpenetrating, will sort in the same order throughout a scene. However, in Quake at least, the benefits of caching sort results were outweighed by the additional overhead of maintaining the caching information, and every caching variant we tried actually slowed Quake down.

Independent span sorting

Finally, we come to independent span sorting, the simplest and fastest of the three, and the type the sample code in Listing 1 uses. Here, polygons never intersect or touch any other polygons except adjacent polygons with which they form a continuous mesh. This means that when a polygon starts on a scan line, a single 1/z comparison between that polygon and the polygons it overlaps on the screen is guaranteed to produce correct sorting, with no extra calculations or tricky cases to worry about.

Independent span sorting is ideal for scenes with lots of moving objects that never actually touch each other, such as a space battle. Next, we’ll look at an implementation of independent 1/z span sorting.

1/z span sorting in action

Listing 1 is a portion of a program that demonstrates independent 1/z span sorting. This program is based on the sample 3-D clipping program from the March column; however, the earlier program did hidden surface removal (HSR) by simply z-sorting whole objects and drawing them back to front, while Listing 1 draws all polygons by way of a 1/z-sorted edge list. Consequently, where the earlier program worked only so long as object centers correctly described sorting order, Listing 1 works properly for all combinations of non-intersecting and non-abutting polygons. In particular, Listing 1 correctly handles concave polyhedra; a new L-shaped object (the data for which is not included in Listing 1) has been added to the sample program to illustrate this capability. The ability to handle complex shapes makes Listing 1 vastly more useful for real-world applications than the earlier 3-D clipping demo.

By the same token, Listing 1 is quite a bit more complicated than the earlier code. The earlier code’s HSR consisted of a z-sort of objects, followed by the drawing of the objects in back-to-front order, one polygon at a time. Apart from the simple object sorter, all that was needed was backface culling and a polygon rasterizer.

Listing 1 replaces this simple pipeline with a three-stage HSR process. After backface culling, the edges of each of the polygons in the scene are added to the global edge list, by way of AddPolygonEdges(). After all edges have been added, the edges are turned into spans by ScanEdges(), with each pixel on the screen being covered by one and only one span (that is, there’s no overdraw). Once all the spans have been generated, they’re drawn by DrawSpans(), and rasterization is complete.

There’s nothing tricky about AddPolygonEdges(), and DrawSpans(), as implemented in Listing 1, is very straightforward as well. In an implementation that supported texture mapping, however, all the spans wouldn’t be put on one global span list and drawn at once, as is done in Listing 1, because that would result in drawing spans from all the surfaces in no particular order. (A surface is a drawing object that’s originally described by a polygon, but in ScanEdges() there is no polygon in the classic sense of a set of vertices bounding an area, but rather just a set of edges and a surface that describes how to draw the spans outlined by those edges.) That would mean constantly skipping from one texture to another, which in turn would hurt processor cache coherency a great deal, and would also incur considerable overhead in setting up gradient and perspective calculations each time a surface was drawn. In Quake, we have a linked list of spans hanging off each surface, and draw all the spans for one surface before moving on to the next surface.

The core of Listing 1, and the most complex aspect of 1/z-sorted spans, is ScanEdges(), where the global edge list is converted into a set of spans describing the nearest surface at each pixel. This process is actually pretty simple, though, if you think of it as follows.

For each scan line, there is a set of active edges, those edges that intersect the scan line. A good part of ScanEdges() is dedicated to adding any edges that first appear on the current scan line (scan lines are processed from the top scan line on the screen to the bottom), removing edges that reach their bottom on the current scan line, and x-sorting the active edges so that the active edges for the next scan can be processed from left to right. All this is per-scan-line maintenance, and is basically just linked list insertion, deletion, and sorting.

The heart of the action is the loop in ScanEdges() that processes the edges on the current scan line from left to right, generating spans as needed. The best way to think of this loop is as a surface event processor, where each edge is an event with an associated surface. Each leading edge is an event marking the start of its surface on that scan line; if the surface is nearer than the current nearest surface, then a span ends for the nearest surface, and a span starts for the new surface. Each trailing edge is an event marking the end of its surface; if its surface is currently nearest, then a span ends for that surface, and a span starts for the next-nearest surface (the surface with the next-largest 1/z at the coordinate where the edge intersects the scan line). One handy aspect of this event-oriented processing is that leading and trailing edges do not need to be explicitly paired, because they are implicitly paired by pointing to the same surface. This saves the memory and time that would otherwise be needed to track edge pairs.

One more element is required in order for ScanEdges() to work efficiently. Each time a leading or trailing edge occurs, it must be determined whether its surface is nearest (at a larger 1/z value than any currently active surface); in addition, for leading edges, the currently topmost surface must be known, and for trailing edges, it may be necessary to know the currently next-to-topmost surface. The easiest way to accomplish this is with a surface stack; that is, a linked list of all currently active surfaces, starting with the nearest surface and progressing toward the farthest surface, which, as described below, is always the background surface. (The operation of this sort of edge event-based stack was described and illustrated in the May column.) Each leading edge causes its surface to be 1/z-sorted into the surface stack, with a span emitted if necessary. Each trailing edge causes its surface to be removed from the surface stack, again with a span emitted if necessary. As you can see from Listing 1, it takes a fair bit of code to implement this, but all that’s really going on is a surface stack driven by edge events.
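To make the event-processor idea concrete, here is a stripped-down sketch of a surface stack for a single scan line. It is not the code from Listing 1: the names (`scan_line()`, `edge_event_t`, `emit()`) are invented for the example, and each surface carries one constant 1/z value, where Listing 1 evaluates 1/z from gradients at each edge.

```c
#include <assert.h>
#include <stddef.h>

typedef struct surf_s {
    struct surf_s *next, *prev;  /* surface stack links, nearest first */
    double zinv;                 /* assumed constant for this sketch */
    int xstart;                  /* x at which this surface became nearest */
    int id;
} surf_t;

typedef struct { int x; surf_t *surf; int leading; } edge_event_t;
typedef struct { int x, count, id; } span_t;

/* Emits the span from s->xstart up to (not including) xend, if non-empty. */
static int emit(span_t *out, int n, const surf_t *s, int xend)
{
    if (xend > s->xstart) {
        out[n].x = s->xstart;
        out[n].count = xend - s->xstart;
        out[n].id = s->id;
        n++;
    }
    return n;
}

/* Processes x-sorted edge events for one scan line; out needs nev + 1 slots.
   Returns the number of spans generated. */
int scan_line(const edge_event_t *ev, int nev, surf_t *background,
              int width, span_t *out)
{
    int n = 0;
    surf_t *top = background;    /* the stack starts with only the background */
    assert(background != NULL);
    background->next = background->prev = NULL;
    background->xstart = 0;

    for (int i = 0; i < nev; i++) {
        surf_t *s = ev[i].surf;
        int x = ev[i].x;
        if (ev[i].leading) {
            /* 1/z-sort s into the stack; the background is always last. */
            surf_t *p = top;
            while (p != background && p->zinv >= s->zinv)
                p = p->next;
            s->next = p;
            s->prev = p->prev;
            if (p->prev != NULL) {
                p->prev->next = s;       /* hidden behind something nearer */
            } else {
                n = emit(out, n, top, x); /* s is the new nearest surface */
                s->xstart = x;
                top = s;
            }
            p->prev = s;
        } else {
            if (s == top) {              /* nearest surface ends here */
                n = emit(out, n, s, x);
                top = s->next;
                top->xstart = x;
            }
            if (s->prev != NULL)
                s->prev->next = s->next;
            s->next->prev = s->prev;
        }
    }
    return emit(out, n, top, width);     /* finish out to the right edge */
}
```

Note that the leading and trailing events are never paired up explicitly; each one simply names its surface, and the stack does the rest.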

Implementation notes

Finally, a few notes on Listing 1. First, you’ll notice that although we clip all polygons to the view frustum in worldspace, we nonetheless later clamp them to valid screen coordinates before adding them to the edge list. This catches any cases where arithmetic imprecision results in clipped polygon vertices that are a bit outside the frustum. I’ve only found such imprecision to be significant at very small z distances, so clamping would probably be unnecessary if there were a near clip plane, and might not even be needed in Listing 1, because of the slight nudge inward that we give the frustum planes, as described in the March column. However, my experience has consistently been that relying on worldspace or viewspace clipping to produce valid screen coordinates 100 percent of the time leads to sporadic and hard-to-debug errors.
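The clamping step amounts to something like the following sketch. The function name, the parameter list, and the exact clamp bounds are assumptions for illustration; Listing 1's own projection code may use different limits.

```c
#include <assert.h>

typedef struct { double x, y; } screen_pt_t;

static double clamp(double v, double lo, double hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* Projects a viewspace point and forces the result onto the screen, so that
   slight clipping imprecision can never produce an out-of-range coordinate.
   The bounds [0, width-1] and [0, height-1] are assumed for this sketch. */
screen_pt_t project_and_clamp(double vx, double vy, double vz,
                              double scale, double xcenter, double ycenter,
                              int width, int height)
{
    screen_pt_t p;
    assert(vz > 0.0);                      /* clipping must guarantee this */
    double zrecip = 1.0 / vz;
    p.x = clamp(vx * zrecip * scale + xcenter, 0.0, width - 1.0);
    p.y = clamp(ycenter - vy * zrecip * scale, 0.0, height - 1.0);
    return p;
}
```

The clamp costs a couple of compares per vertex, which is cheap insurance against the sporadic errors mentioned above.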

There is no separate clear of the background in Listing 1. Instead, a special background surface at an effectively infinite distance is added, so whenever no polygons are active the background color is drawn. If desired, it’s a simple matter to flag the background surface and draw the background specially. For example, the background could be drawn as a starfield or a cloudy sky.

The edge-processing code in Listing 1 is fully capable of handling concave polygons as easily as convex polygons, and can handle an arbitrary number of vertices per polygon, as well. One change is needed for the latter case: Storage for the maximum number of vertices per polygon must be allocated in the polygon structures. In a fully polished implementation, vertices would be linked together or pointed to, and would be allocated dynamically from a vertex pool, so each polygon wouldn’t have to contain enough space for the maximum possible number of vertices.

Each surface has a field named state, which is incremented when a leading edge for that surface is encountered, and decremented when a trailing edge is reached. A surface is activated by a leading edge only if state increments to 1, and is deactivated by a trailing edge only if state decrements to 0. This is another guard against arithmetic problems, in this case quantization during the conversion of vertex coordinates from floating point to fixed point. Due to this conversion, it is possible, although rare, for a polygon that is viewed nearly edge-on to have a trailing edge that occurs slightly before the corresponding leading edge, and the span-generation code will behave badly if it tries to emit a span for a surface that hasn’t started yet. It would help performance if this sort of fix-up could be eliminated by careful arithmetic, but I haven’t yet found a way to do so for 1/z-sorted spans.
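The state logic boils down to a pair of counter tests, sketched here with a reduced `surf_t` and hypothetical function names (the full surface structure has many more fields):

```c
#include <assert.h>

typedef struct {
    int state;   /* leading edges seen minus trailing edges seen */
} surf_t;

/* Returns nonzero if this leading edge should activate the surface. */
int on_leading_edge(surf_t *s)
{
    assert(s != 0);
    return ++s->state == 1;
}

/* Returns nonzero if this trailing edge should deactivate the surface. */
int on_trailing_edge(surf_t *s)
{
    assert(s != 0);
    return --s->state == 0;
}
```

If quantization puts the trailing edge first, state goes to -1 and then back to 0, so neither test fires and no span is ever emitted for the inverted sliver.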

Lastly, as discussed last time, Listing 1 uses the gradients for 1/z with respect to changes in screen x and y to calculate 1/z for active surfaces each time a leading edge needs to be sorted into the surface stack. The natural origin for gradient calculations is the center of the screen, which is (x,y) coordinate (0,0) in viewspace. However, when the gradients are calculated in AddPolygonEdges(), the origin value is calculated at the upper left corner of the screen. This is done so that screen x and y coordinates can be used directly to calculate 1/z, with no need to adjust the coordinates to be relative to the center of the screen. Also, the screen gradients grow more extreme as a polygon is viewed closer to edge-on. In order to keep the gradient calculations from becoming meaningless or generating errors, a small epsilon is applied to backface culling, so that polygons that are very nearly edge-on are culled. This calculation would be more accurate if it were based directly on the viewing angle, rather than on the dot product of a viewing ray to the polygon with the polygon normal, but that would require a square root, and in my experience the epsilon used in Listing 1 works fine.
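As a sketch of the origin shift, the following C fragment moves the 1/z origin value from the screen center to the upper left corner, after which raw screen coordinates can be plugged straight in. The names `zgrad_t`, `make_zgrad()`, and `zinv_at()` are made up for this example.

```c
#include <assert.h>

typedef struct {
    double zinv00;     /* 1/z at the upper left corner, screen (0,0) */
    double zinvstepx;  /* change in 1/z per pixel step in screen x */
    double zinvstepy;  /* change in 1/z per pixel step in screen y */
} zgrad_t;

/* zinv_center is 1/z at the center of the screen, (xcenter, ycenter). */
zgrad_t make_zgrad(double zinv_center, double stepx, double stepy,
                   double xcenter, double ycenter)
{
    zgrad_t g;
    g.zinvstepx = stepx;
    g.zinvstepy = stepy;
    /* Step back from the center to the corner once, at setup time. */
    g.zinv00 = zinv_center - xcenter * stepx - ycenter * stepy;
    return g;
}

/* 1/z at screen pixel (x, y), with no per-lookup recentering needed. */
double zinv_at(const zgrad_t *g, int x, int y)
{
    assert(g != 0);
    return g->zinv00 + x * g->zinvstepx + y * g->zinvstepy;
}
```

The subtraction in `make_zgrad()` is paid once per surface, while the saving in `zinv_at()` is collected every time an edge is sorted.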

Bretton Wade’s BSP Web page has moved

A while back, I mentioned that Bretton Wade was constructing a promising Web site on BSPs. He has moved that site, which has grown to contain a lot of useful information, to http://www.qualia.com/bspfaq/; alternatively, mail bspfaq@qualia.com with a subject line of “help”.

Consider the Alternatives: Quake’s Hidden-Surface Removal

by Michael Abrash

Okay, I admit it: I’m sick and tired of classic rock. Admittedly, it’s been a while--about 20 years--since last I was excited to hear anything by the Cars or Boston, and I was never particularly excited in the first place about Bob Seger or Queen--to say nothing of Elvis--so some things haven’t changed. But I knew something was up when I found myself changing the station on the Allman Brothers and Steely Dan and Pink Floyd and, God help me, the Beatles (just stuff like “Hello Goodbye” and “I’ll Cry Instead,” though, not “Ticket to Ride” or “A Day in the Life”; I’m not that far gone). It didn’t take long to figure out what the problem was; I’d been hearing the same songs for a quarter-century, and I was bored.

I tell you this by way of explaining why it was that when my daughter and I drove back from dinner the other night, the radio in my car was tuned, for the first time ever, to a station whose slogan is “There is no alternative.”

Now, we’re talking here about a ten-year-old who worships the Beatles and has been raised on a steady diet of oldies. She loves melodies, catchy songs, and good singers, none of which you’re likely to find on an alternative rock station. So it’s no surprise that when I turned on the radio, the first word out of her mouth was “Yuck!”

What did surprise me was that after listening for a while, she said, “You know, Dad, it’s actually kind of interesting.”

Apart from giving me a clue as to what sort of music I can expect to hear blasting through our house when she’s a teenager, her quick uptake on alternative rock (versus my decades-long devotion to the music of my youth) reminded me of something that it’s easy to forget as we become older and more set in our ways. It reminded me that it’s essential to keep an open mind, and to be willing--better yet, eager--to try new things. Programmers tend to become attached to familiar approaches, and are inclined to stick with whatever is currently doing the job adequately well, but in programming there are always alternatives, and I’ve found that they’re often worth considering.

Not that I should have needed any reminding, considering the ever-evolving nature of Quake.

Creative flux

Back in January, I described the creative flux that led to John Carmack’s decision to use a precalculated potentially visible set (PVS) of polygons for each possible viewpoint in Quake, the game we’re developing at id Software. The precalculated PVS meant that instead of having to spend a lot of time searching through the world database to find out which polygons were visible from the current viewpoint, we could simply draw all the polygons in the PVS from back to front (getting the ordering courtesy of the world BSP tree; check out the May, July, and November 1995 columns for a discussion of BSP trees), and get the correct scene drawn with no searching at all, letting the back-to-front drawing perform the final stage of hidden-surface removal (HSR). This was a terrific idea, but it was far from the end of the road for Quake’s design.

Drawing moving objects

For one thing, there was still the question of how to sort and draw moving objects properly; in fact, this is the question I’ve been asked most often since the January column came out, so I’ll take a moment to address it. The primary problem is that a moving model can span multiple BSP leaves, with the leaves that are touched varying as the model moves; that, together with the possibility of multiple models in one leaf, means there’s no easy way to use BSP order to draw the models in correctly sorted order. When I wrote the January column, we were drawing sprites (such as explosions), moveable BSP models (such as doors), and polygon models (such as monsters) by clipping each into all the leaves it touched, then drawing the appropriate parts as each BSP leaf was reached in back-to-front traversal. However, this didn’t solve the issue of sorting multiple moving models in a single leaf against each other, and also left some ugly sorting problems with complex polygon models.

John solved the sorting issue for sprites and polygon models in a startlingly low-tech way: We now z-buffer them. (That is, before we draw each pixel, we compare its distance, or z, value with the z value of the pixel currently on the screen, drawing only if the new pixel is nearer than the current one.) First, we draw the basic world--walls, ceilings, and the like. No z-buffer testing is involved at this point (the world visible surface determination is done in a different way, as we’ll see soon); however, we do fill the z-buffer with the z values (actually, 1/z values, as discussed below) for all the world pixels. Z-filling is a much faster process than z-buffering the entire world would be, because no reads or compares are involved, just writes of z values. Once drawing and z-filling the world is done, we can simply draw the sprites and polygon models with z-buffering and get perfect sorting all around.
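The difference between z-filling and z-buffering is easiest to see side by side. The sketch below is illustrative only: the function names are invented, the 16-bit depth format is an assumption, and each span is given a single 1/z value for brevity, where real code would step 1/z across the span.

```c
#include <assert.h>
#include <stdint.h>

/* World spans: write color and depth unconditionally. There are no reads or
   compares, because the world's visibility has already been resolved. */
void zfill_span(uint8_t *pix, uint16_t *zbuf, int count,
                uint8_t color, uint16_t zinv)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        pix[i] = color;
        zbuf[i] = zinv;
    }
}

/* Sprite and model spans: draw a pixel only where it is nearer than what is
   already there. The buffer holds 1/z, so a larger value means nearer. */
void zbuffered_span(uint8_t *pix, uint16_t *zbuf, int count,
                    uint8_t color, uint16_t zinv)
{
    assert(count >= 0);
    for (int i = 0; i < count; i++) {
        if (zinv > zbuf[i]) {
            pix[i] = color;
            zbuf[i] = zinv;
        }
    }
}
```

The inner loop of `zfill_span()` has no branch and never reads the z-buffer, which is where the speed advantage comes from.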

Whenever a z-buffer is involved, the questions inevitably are: What’s the memory footprint, and what’s the performance impact? Well, the memory footprint at 320x200 is 128K, not trivial but not a big deal for a game that requires 8 MB. The performance impact is about 10% for z-filling the world, and roughly 20% (with lots of variation) for drawing sprites and polygon models. In return, we get a perfectly sorted world, and also the ability to do additional effects, such as particle explosions and smoke, because the z-buffer lets us flawlessly sort such effects into the world. All in all, the use of the z-buffer vastly improved the visual quality and flexibility of the Quake engine, and also simplified the code quite a bit, at an acceptable memory and performance cost.

Leveling and improving performance

As I said above, in the Quake architecture, the world itself is drawn first--without z-buffer reads or compares, but filling the z-buffer with the world polygons’ z values--and then the moving objects are drawn atop the world, using full z-buffering. Thus far, I’ve discussed how to draw moving objects. For the rest of this column, I’m going to talk about the other part of the drawing equation, how to draw the world itself, where the entire world is stored as a single BSP tree and never moves.

As you may recall from the January column, we’re concerned with both raw performance and level performance. That is, we want the drawing code to run as fast as possible, but we also want the difference in drawing speed between the average scene and the slowest-drawing scene to be as small as possible. It does little good to average 30 frames per second if 10% of the scenes draw at 5 fps, because the jerkiness in those scenes will be extremely obvious by comparison with the average scene, and highly objectionable. It would be better to average 15 fps 100% of the time, even though the average drawing speed is only half as much.

The precalculated PVS was an important step toward both faster and more level performance, because it eliminated the need to identify visible polygons, a relatively slow step that tended to be at its worst in the most complex scenes. Nonetheless, in some spots in real game levels the precalculated PVS contains five times more polygons than are actually visible; together with the back-to-front HSR approach, this created hot spots in which the frame rate bogged down visibly as hundreds of polygons were drawn back to front, most of those immediately getting overdrawn by nearer polygons. Raw performance in general was also reduced by the typical 50% overdraw resulting from drawing everything in the PVS. So, although drawing the PVS back to front as the final HSR stage worked and was an improvement over previous designs, it was not ideal. Surely, John thought, there’s a better way to leverage the PVS than back-to-front drawing.

And indeed there is.

Sorted spans

The ideal final HSR stage for Quake would reject all the polygons in the PVS that are actually invisible, and draw only the visible pixels of the remaining polygons, with no overdraw--that is, with every pixel drawn exactly once--all at no performance cost, of course. One way to do that (although certainly not at zero cost) would be to draw the polygons from front to back, maintaining a region describing the currently occluded portions of the screen and clipping each polygon to that region before drawing it. That sounds promising, but it is in fact nothing more or less than the beam tree approach I described in the January column, an approach that we found to have considerable overhead and serious leveling problems.

We can do much better if we move the final HSR stage from the polygon level to the span level and use a sorted-spans approach. In essence, this approach consists of turning each polygon into a set of spans, as shown in Figure 1, and then sorting and clipping the spans against each other until only the visible portions of visible spans are left to be drawn, as shown in Figure 2. This may sound a lot like z-buffering (which is simply too slow for use in drawing the world, although it’s fine for smaller moving objects, as described earlier), but there are crucial differences. By contrast with z-buffering, only visible portions of visible spans are scanned out pixel by pixel (although all polygon edges must still be rasterized). Better yet, the sorting that z-buffering does at each pixel becomes a per-span operation with sorted spans, and because of the coherence implicit in a span list, each edge is sorted only against some of the spans on the same line, and clipped only to the few spans that it overlaps horizontally. Although complex scenes still take longer to process than simple scenes, the worst case isn’t as bad as with the beam tree or back-to-front approaches, because there’s no overdraw or scanning of hidden pixels, because complexity is limited to pixel resolution, and because span coherence tends to limit the worst-case sorting in any one area of the screen. As a bonus, the output of sorted spans is in precisely the form that a low-level rasterizer needs, a set of span descriptors, each consisting of a start coordinate and a length.

Figure 1: Span generation.

In short, the sorted spans approach meets our original criteria pretty well; although it isn’t zero-cost, it’s not horribly expensive, it completely eliminates both overdraw and pixel scanning of obscured portions of polygons, and it tends to level worst-case performance. We wouldn’t want to rely on sorted spans alone as our hidden-surface mechanism, but the precalculated PVS reduces the number of polygons to a level that sorted spans can handle quite nicely.

So we’ve found the approach we need; now it’s just a matter of writing some code and we’re on our way, right? Well, yes and no. Conceptually, the sorted-spans approach is simple, but it’s surprisingly difficult to implement, with a couple of major design choices to be made, a subtle mathematical element, and some tricky gotchas that we’ll see in the next column. Let’s look at the design choices first.

Edges versus spans

The first design choice is whether to sort spans or edges (both of which fall into the general category of “sorted spans”). Although the results are the same both ways--a list of spans to be drawn, with no overdraw--the implementations and performance implications are quite different, because the sorting and clipping are performed using very different data structures.

With span-sorting, spans are stored in x-sorted linked list buckets, typically with one bucket per scan line. Each polygon in turn is rasterized into spans, as shown in Figure 1, and each span is sorted and clipped into the bucket for the scan line the span is on, as shown in Figure 2, so that at any time each bucket contains the nearest spans encountered thus far, always with no overlap. This approach involves generating all spans for each polygon in turn, with each span immediately being sorted, clipped, and added to the appropriate bucket.
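Here is one way the sort-and-clip step for a single bucket might look. To keep the sketch short it departs from the description above in two labeled ways: the bucket is a plain x-sorted array rather than a linked list, and each span has one constant 1/z value. The names `span_t`, `push()`, and `add_span()` are hypothetical.

```c
#include <assert.h>

typedef struct {
    int x0, x1;    /* covers pixels x0 through x1-1 */
    double zinv;   /* assumed constant across the span for this sketch */
    int poly;      /* id of the owning polygon */
} span_t;

/* Appends the piece [x0, x1) of src, skipping empty pieces and merging with
   an adjoining piece of the same polygon. Returns the new count. */
static int push(span_t *out, int n, int x0, int x1, const span_t *src)
{
    if (x0 >= x1) return n;
    if (n > 0 && out[n - 1].poly == src->poly && out[n - 1].x1 == x0) {
        out[n - 1].x1 = x1;
        return n;
    }
    out[n] = *src;
    out[n].x0 = x0;
    out[n].x1 = x1;
    return n + 1;
}

/* Sorts and clips span s into an x-sorted, non-overlapping bucket, writing
   the result to out (which needs room for 4 * count + 1 spans). */
int add_span(const span_t *bucket, int count, span_t s, span_t *out)
{
    int n = 0;
    int cur = s.x0;               /* leftmost part of s not yet resolved */
    assert(s.x0 < s.x1);
    for (int i = 0; i < count; i++) {
        const span_t *e = &bucket[i];
        if (cur < s.x1 && e->x0 > cur) {      /* uncovered gap: s shows */
            int gap_end = e->x0 < s.x1 ? e->x0 : s.x1;
            n = push(out, n, cur, gap_end, s.x0 <= cur ? &s : &s);
            cur = gap_end;
        }
        if (e->x1 <= s.x0 || e->x0 >= s.x1 || e->zinv >= s.zinv) {
            n = push(out, n, e->x0, e->x1, e);  /* e is untouched by s */
            if (e->x1 > cur) cur = e->x1;
        } else {                                /* s is nearer: clip e */
            int lo = e->x0 > s.x0 ? e->x0 : s.x0;
            int hi = e->x1 < s.x1 ? e->x1 : s.x1;
            n = push(out, n, e->x0, s.x0, e);
            n = push(out, n, lo, hi, &s);
            n = push(out, n, s.x1, e->x1, e);
            cur = hi;
        }
    }
    return push(out, n, cur, s.x1, &s);         /* whatever is left of s */
}
```

With a span of polygon A covering x = 10 to 29 at z = 100 and a span of polygon B covering x = 20 to 39 at z = 50, the bucket ends up with A clipped to 10 through 19 and B intact, whichever of the two is added first.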

Figure 2: The spans from polygon A from Figure 1 sorted and clipped with the spans from polygon B, where polygon A is at a constant z distance of 100 and polygon B is at a constant z distance of 50 (polygon B is closer).


With edge-sorting, edges are stored in x-sorted linked list buckets according to their start scan line. Each polygon in turn is decomposed into edges, cumulatively building a list of all the edges in the scene. Once all edges for all polygons in the view frustum have been added to the edge list, the whole list is scanned out in a single top-to-bottom, left-to-right pass. An active edge list (AEL) is maintained. With each step to a new scan line, edges that end on that scan line are removed from the AEL, active edges are stepped to their new x coordinates, edges starting on the new scan line are added to the AEL, and the edges are sorted by current x coordinate.
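The per-scan-line AEL maintenance might be sketched as follows. This is a simplification under stated assumptions: the AEL is an array of pointers rather than a linked list, new edges are found by a linear search rather than through per-scan-line buckets, and the names `edge_t` and `step_ael()` are invented for the example.

```c
#include <assert.h>

typedef struct {
    double x;      /* current x on the active scan line */
    double xstep;  /* change in x per scan line */
    int ystart;    /* first scan line the edge covers */
    int yend;      /* first scan line the edge no longer covers */
} edge_t;

/* Advances the AEL to scan line y and returns the new active count.
   ael must have room for nedges pointers. */
int step_ael(edge_t **ael, int nactive, edge_t *edges, int nedges, int y)
{
    int n = 0;
    assert(nactive <= nedges);

    /* Remove edges that end on this line; step the survivors in x. */
    for (int i = 0; i < nactive; i++) {
        edge_t *e = ael[i];
        if (e->yend > y) {
            e->x += e->xstep;
            ael[n++] = e;
        }
    }
    /* Add edges that start on this line. */
    for (int i = 0; i < nedges; i++) {
        if (edges[i].ystart == y)
            ael[n++] = &edges[i];
    }
    /* Re-sort by x. The list is nearly sorted already, so an insertion sort
       usually does very little work. */
    for (int i = 1; i < n; i++) {
        edge_t *e = ael[i];
        int j = i - 1;
        while (j >= 0 && ael[j]->x > e->x) {
            ael[j + 1] = ael[j];
            j--;
        }
        ael[j + 1] = e;
    }
    return n;
}
```

Calling `step_ael()` once per scan line, from the top of the screen to the bottom, yields the x-sorted edge sequence that drives span generation.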

For each scan line, a z-sorted active polygon list (APL) is maintained. The x-sorted AEL is stepped through in order. As each new edge is encountered (that is, as each polygon starts or ends as we move left to right), the associated polygon is activated and sorted into the APL, as shown in Figure 3, or deactivated and removed from the APL, as shown in Figure 4, for a leading or trailing edge, respectively. If the nearest polygon has changed (that is, if the new polygon is nearest, or if the nearest polygon just ended), a span is emitted for the polygon that just stopped being the nearest, starting at the point where the polygon first became nearest and ending at the x coordinate of the current edge, and the current x coordinate is recorded in the polygon that is now the nearest. This saved coordinate later serves as the start of the span emitted when the new nearest polygon ceases to be in front.

Figure 3: Activating a polygon when a leading edge is encountered in the AEL.

Figure 4: Deactivating a polygon when a trailing edge is encountered in the AEL.


Don’t worry if you didn’t follow all of that; the above is just a quick overview of edge-sorting to help make the rest of this column clearer. There will be a thorough discussion in the next column.

The spans that are generated with edge-sorting are exactly the same spans that ultimately emerge from span-sorting; the difference lies in the intermediate data structures that are used to sort the spans in the scene. With edge-sorting, the spans are kept implicit in the edges until the final set of visible spans is generated, so the sorting, clipping, and span emission is done as each edge adds or removes a polygon, based on the span state implied by the edge and the set of active polygons. With span-sorting, spans are immediately made explicit when each polygon is rasterized, and those intermediate spans are then sorted and clipped against the other spans on the scan line to generate the final spans, so the states of the spans are explicit at all times, and all work is done directly with spans.

Both span-sorting and edge-sorting work well, and both have been employed successfully in commercial projects. We’ve chosen to use edge-sorting in Quake partly because it seems inherently more efficient, with excellent horizontal coherence that makes for minimal time spent sorting, in contrast with the potentially costly sorting into linked lists that span-sorting can involve. A more important reason, though, is that with edge-sorting we’re able to share edges between adjacent polygons, and that cuts the work involved in sorting, clipping, and rasterizing edges nearly in half, while also shrinking the world database quite a bit due to the sharing.

One final advantage of edge-sorting is that it makes no distinction between convex and concave polygons. That’s not an important consideration for most graphics engines, but in Quake, edge clipping, transformation, projection, and sorting have become a major bottleneck, so we’re doing everything we can to get the polygon and edge counts down, and concave polygons help a lot in that regard. While it’s possible to handle concave polygons with span-sorting, that can involve significant performance penalties.

Nonetheless, there’s no cut-and-dried answer as to which approach is better. In the end, span-sorting and edge-sorting amount to the same functionality, and the choice between them is a matter of whatever you feel most comfortable with. In the next column, I’ll go into considerable detail about edge-sorting, complete with a full implementation. I’m going to spend the rest of this column laying the foundation for next time by discussing sorting keys and 1/z calculation. In the process, I’m going to have to make a few forward references to aspects of edge-sorting that I haven’t covered in detail; my apologies, but it’s unavoidable, and all should become clear by the end of the next column.

Edge-sorting keys

Now that we know we’re going to sort edges, using them to emit spans for the polygons nearest the viewer, the question becomes how we can tell which polygons are nearest. Ideally, we’d just store a sorting key in each polygon, and whenever a new edge came along, we’d compare its surface’s key to the keys of other currently active polygons, and could easily tell which polygon was nearest.

That sounds too good to be true, but it is possible. If, for example, your world database is stored as a BSP tree, with all polygons clipped into the BSP leaves, then BSP walk order is a valid drawing order. So, for example, if you walk the BSP back to front, assigning each polygon an incrementally higher key as you reach it, polygons with higher keys are guaranteed to be in front of polygons with lower keys. This is the approach Quake used for a while, although a different approach is now being used, for reasons I’ll explain shortly.
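A back-to-front key-assigning walk can be sketched as follows. The tree layout here is an assumption chosen for brevity, with polygons hanging off the nodes whose planes they lie on, rather than being stored in leaves as described above; `node_t`, `poly_t`, and `assign_keys()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

typedef struct poly_s {
    int key;                 /* sorting key; a higher key means nearer */
    struct poly_s *next;
} poly_t;

typedef struct node_s {
    double normal[3];        /* splitting plane normal */
    double dist;             /* plane distance: dot(normal, p) == dist */
    struct node_s *front;    /* subtree on the normal's side; NULL if empty */
    struct node_s *back;     /* subtree on the other side; NULL if empty */
    poly_t *polys;           /* polygons lying on this node's plane */
} node_t;

/* Walks the tree back to front relative to eye, giving each polygon the next
   higher key as it is reached. */
void assign_keys(node_t *node, const double eye[3], int *next_key)
{
    if (node == NULL) return;
    assert(next_key != NULL);

    double side = node->normal[0] * eye[0] + node->normal[1] * eye[1] +
                  node->normal[2] * eye[2] - node->dist;
    node_t *far_side = side >= 0.0 ? node->back : node->front;
    node_t *near_side = side >= 0.0 ? node->front : node->back;

    assign_keys(far_side, eye, next_key);       /* farthest subtree first */
    for (poly_t *p = node->polys; p != NULL; p = p->next)
        p->key = (*next_key)++;
    assign_keys(near_side, eye, next_key);      /* nearest subtree last */
}
```

Once the walk is done, deciding which of two active polygons is in front is a single integer compare of their keys.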

If you don’t happen to have a BSP or similar data structure handy, or if you have lots of moving polygons (BSPs don’t handle moving polygons very efficiently), another way to accomplish our objectives would be to sort all the polygons against one another before drawing the scene, assigning appropriate keys based on their spatial relationships in viewspace. Unfortunately, this is generally an extremely slow task, because every polygon must be compared to every other polygon. There are techniques to improve the performance of polygon sorts, but I don’t know of anyone who’s doing general polygon sorts of complex scenes in realtime on a PC.

An alternative is to sort by z distance from the viewer in screenspace, an approach that dovetails nicely with the excellent spatial coherence of edge-sorting. As each new edge is encountered on a scan line, the corresponding polygon’s z distance can be calculated and compared to the other polygons’ distances, and the polygon can be sorted into the APL accordingly.

Getting z distances can be tricky, however. Remember that we need to be able to calculate z at any arbitrary point on a polygon, because an edge may occur and cause its polygon to be sorted into the APL at any point on the screen. We could calculate z directly from the screen x and y coordinates and the polygon’s plane equation, but unfortunately this can’t be done very quickly, because the z for a plane doesn’t vary linearly in screenspace; however, 1/z does vary linearly, so we’ll use that instead. (See Chris Hecker’s series of columns on texture mapping over the past year in Game Developer magazine for a discussion of screenspace linearity and gradients for 1/z.) Another advantage of using 1/z is that its resolution increases with decreasing distance, meaning that by using 1/z, we’ll have better depth resolution for nearby features, where it matters most.

The obvious way to get a 1/z value at any arbitrary point on a polygon is to calculate 1/z at the vertices, interpolate it down both edges of the polygon, and interpolate between the edges to get the value at the point of interest. Unfortunately, that requires doing a lot of work along each edge, and worse, requires division to calculate the 1/z step per pixel across each span.

A better solution is to calculate 1/z directly from the plane equation and the screen x and y of the pixel of interest. The equation is:

1/z = (a/d)x’ - (b/d)y’ + c/d

where z is the viewspace z coordinate of the point on the plane that projects to screen coordinate (x’,y’) (the origin for this calculation is the center of projection, the point on the screen straight ahead of the viewpoint), [a b c] is the plane normal in viewspace, and d is the distance from the viewspace origin to the plane along the normal. Division is done only once per plane, because a, b, c, and d are per-plane constants.

The full 1/z calculation requires two multiplies and two adds, all of which should be floating-point to avoid range errors. That much floating-point math sounds expensive but really isn’t, especially on a Pentium, where a plane’s 1/z value at any point can be calculated in as little as six cycles in assembly language.
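In C, the calculation might look something like the sketch below. The structure and function names are made up for this example; the point is simply that the three divides happen once per plane, after which each lookup is two multiplies and two adds.

```c
#include <assert.h>

typedef struct { double a, b, c, d; } plane_t;  /* ax + by + cz - d = 0 */

typedef struct {
    double a_over_d, b_over_d, c_over_d;        /* per-plane constants */
} zinv_coeffs_t;

/* Does the per-plane divides. The plane must not pass through the viewpoint. */
zinv_coeffs_t make_coeffs(plane_t p)
{
    zinv_coeffs_t k;
    assert(p.d != 0.0);
    k.a_over_d = p.a / p.d;
    k.b_over_d = p.b / p.d;
    k.c_over_d = p.c / p.d;
    return k;
}

/* 1/z at projected coordinate (xp, yp), measured from the center of
   projection, with yp increasing downward as in screenspace. */
double zinv_at(const zinv_coeffs_t *k, double xp, double yp)
{
    assert(k != 0);
    return k->a_over_d * xp - k->b_over_d * yp + k->c_over_d;
}
```

As a quick sanity check, a plane facing the viewer head-on at a distance of 4 (normal [0 0 1], d = 4) yields 1/z = 0.25 at every point, as it should.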

For those who are interested, here’s a quick derivation of the 1/z equation. The plane equation for a plane is

ax + by + cz - d = 0,

where x and y are viewspace coordinates, and a, b, c, d, and z are defined above. If we substitute x=x’z and y=-y’z (from the definition of the perspective projection, with y inverted because y increases upward in viewspace but downward in screenspace), and do some rearrangement, we get

z = d / (ax’ - by’ + c).

Inverting and distributing yields

1/z = ax’/d - by’/d + c/d.
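For anyone who wants the rearrangement spelled out, the intermediate steps look like this, using the same symbols as above and nothing beyond the substitution x=x’z, y=-y’z:

```latex
\begin{aligned}
a(x'z) + b(-y'z) + cz - d &= 0 \\
z\,(ax' - by' + c) &= d \\
z &= \frac{d}{ax' - by' + c} \\
\frac{1}{z} &= \frac{a}{d}\,x' - \frac{b}{d}\,y' + \frac{c}{d}
\end{aligned}
```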

We’ll see 1/z sorting in action next time.

Quake and z-sorting

I mentioned above that Quake no longer uses BSP order as the sorting key; in fact, it uses 1/z as the key now. Elegant as the gradients are, calculating 1/z from them is clearly slower than just doing a compare on a BSP-ordered key, so why have we switched Quake to 1/z?

The primary reason is to reduce the number of polygons. Drawing in BSP order means following certain rules, including the rule that polygons must be split if they cross BSP planes. This splitting increases the numbers of polygons and edges considerably. By sorting on 1/z, we’re able to leave polygons unsplit but still get correct drawing order, so we have far fewer edges to process and faster drawing overall, despite the added cost of 1/z sorting.

Another advantage of 1/z sorting is that it solves the sorting issues I mentioned at the start involving moving models that are themselves small BSP trees. Sorting in world BSP order wouldn’t work here, because these models are separate BSPs, and there’s no easy way to work them into the world BSP’s sequence order. We don’t want to use z-buffering for these models because they’re often large objects such as doors, and we don’t want to lose the overdraw-reduction benefits that closed doors provide when drawn through the edge list. With sorted spans, the edges of moving BSP models are simply placed in the edge list (first clipping polygons so they don’t cross any solid world surfaces, to avoid complications associated with interpenetration), along with all the world edges, and 1/z sorting takes care of the rest.
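As a rough illustration of why a 1/z key needs no BSP relationship between surfaces, here is a small Python sketch. It is an assumption-laden toy, not the edge-list code: planes are (a, b, c, d) tuples as in the equation above, the dictionary of active polygons and the names inv_z and nearest_surface are invented for the example, and a real implementation would keep the active polygon list sorted rather than rescanning it.

```python
def inv_z(plane, sx, sy):
    """1/z of a plane (a, b, c, d) at projected screen point (sx, sy)."""
    a, b, c, d = plane
    return (a * sx - b * sy + c) / d


def nearest_surface(active, sx, sy):
    """Return the name of the active polygon with the largest 1/z at (sx, sy).

    Larger 1/z means nearer, so this polygon is the one that should be drawn
    at this pixel, whichever BSP tree (world or model) it came from.
    """
    return max(active, key=lambda name: inv_z(active[name], sx, sy))


# A world wall 10 units away and a door (a separate BSP model) 6 units away.
active = {
    "world_wall": (0.0, 0.0, 1.0, 10.0),
    "door": (0.0, 0.0, 1.0, 6.0),
}
print(nearest_surface(active, 0.0, 0.0))  # door
```

The comparison uses only per-polygon plane constants and the screen position of the edge, which is what lets world and model edges share one edge list.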

Onward to next time

There is, without a doubt, an awful lot of information in the preceding pages, and it may not all connect together yet in your mind. The code and accompanying explanation next time should help; if you want to peek ahead, the code should be available from ftp.idsoftware.com/mikeab/ddjsort.zip by the time you read this column. You may also want to take a look at Foley & van Dam’s Computer Graphics or Rogers’ Procedural Elements for Computer Graphics.

As I write this, it’s unclear whether Quake will end up sorting edges by BSP order or 1/z. Actually, there’s no guarantee that sorted spans in any form will be the final design. Sometimes it seems like we change graphics engines as often as they play Elvis on the ‘50s oldies stations (but, one would hope, with more aesthetically pleasing results!), and no doubt we’ll be considering the alternatives right up until the day we ship.

Inside Quake: Visible-Surface Determination

by Michael Abrash

Years ago, I was working at Video Seven, a now-vanished video adapter manufacturer, helping to develop a VGA clone. The fellow who was designing Video Seven’s VGA chip, Tom Wilson, had worked around the clock for months to make his VGA run as fast as possible, and was confident he had pretty much maxed out its performance. As Tom was putting the finishing touches on his chip design, however, news came fourth-hand that a competitor, Paradise, had juiced up the performance of the clone they were developing, by putting in a FIFO.

That was it; there was no information about what sort of FIFO, or how much it helped, or anything else. Nonetheless, Tom, normally an affable, laid-back sort, took on the wide-awake, haunted look of a man with too much caffeine in him and no answers to show for it, as he tried to figure out, from hopelessly thin information, what Paradise had done. Finally, he concluded that Paradise must have put a write FIFO between the system bus and the VGA, so that when the CPU wrote to video memory, the write immediately went into the FIFO, allowing the CPU to keep on processing instead of stalling each time it wrote to display memory.

Tom couldn’t spare the gates or the time to do a full FIFO, but he could implement a one-deep FIFO, allowing the CPU to get one write ahead of the VGA. He wasn’t sure how well it would work, but it was all he could do, so he put it in and taped out the chip.

The one-deep FIFO turned out to work astonishingly well; for a time, Video Seven’s VGAs were the fastest around, a testament to Tom’s ingenuity and creativity under pressure. However, the truly remarkable part of this story is that Paradise’s FIFO design turned out to bear not the slightest resemblance to Tom’s, and didn’t work as well. Paradise had stuck a read FIFO between display memory and the video output stage of the VGA, allowing the video output to read ahead, so that when the CPU wanted to access display memory, pixels could come from the FIFO while the CPU was serviced immediately. That did indeed help performance--but not as much as Tom’s write FIFO.

What we have here is as neat a parable about the nature of creative design as one could hope to find. The scrap of news about Paradise’s chip contained almost no actual information, but it forced Tom to push past the limits he had unconsciously set in coming up with his original design. And, in the end, I think that the single most important element of great design, whether it be hardware or software or any creative endeavor, is precisely what the Paradise news triggered in Tom: The ability to detect the limits you have built into the way you think about your design, and transcend those limits.

The problem, of course, is how to go about transcending limits you don’t even know you’ve imposed. There’s no formula for success, but two principles can stand you in good stead: simplify, and keep on trying new things.

Generally, if you find your code getting more complex, you’re fine-tuning a frozen design, and it’s likely you can get more of a speed-up, with less code, by rethinking the design. A really good design should bring with it a moment of immense satisfaction in which everything falls into place, and you’re amazed at how little code is needed and how all the boundary cases just work properly.

As for how to rethink the design, do it by pursuing whatever ideas occur to you, no matter how off-the-wall they seem. Many of the truly brilliant design ideas I’ve heard over the years sounded like nonsense at first, because they didn’t fit my preconceived view of the world. Often, such ideas are in fact off-the-wall, but just as the news about Paradise’s chip sparked Tom’s imagination, aggressively pursuing seemingly-outlandish ideas can open up new design possibilities for you.

Case in point: The evolution of Quake’s 3-D graphics engine.

The toughest 3-D challenge of all

I’ve spent most of my waking hours for the last seven months working on Quake, id Software’s successor to DOOM, and after spending the next three months in much the same way, I expect Quake will be out as shareware around the time you read this.

In terms of graphics, Quake is to DOOM as DOOM was to its predecessor, Wolfenstein 3D. Quake adds true, arbitrary 3-D (you can look up and down, lean, and even fall on your side), detailed lighting and shadows, and 3-D monsters and players in place of DOOM’s sprites. Sometime soon, I’ll talk about how all that works, but this month I want to talk about what is, in my book, the toughest 3-D problem of all, visible surface determination (drawing the proper surface at each pixel), and its close relative, culling (discarding non-visible polygons as quickly as possible, a way of accelerating visible surface determination). In the interests of brevity, I’ll use the abbreviation VSD to mean both visible surface determination and culling from now on.

Why do I think VSD is the toughest 3-D challenge? Although rasterization issues such as texture mapping are fascinating and important, they are tasks of relatively finite scope, and are being moved into hardware as 3-D accelerators appear; also, they only scale with increases in screen resolution, which are relatively modest.

In contrast, VSD is an open-ended problem, and there are dozens of approaches currently in use. Even more significantly, the performance of VSD, done in an unsophisticated fashion, scales directly with scene complexity, which tends to increase as a square or cube function, so this very rapidly becomes the limiting factor in doing realistic worlds. I expect VSD increasingly to be the dominant issue in realtime PC 3-D over the next few years, as 3-D worlds become increasingly detailed. Already, a good-sized Quake level contains on the order of 10,000 polygons, about three times as many polygons as a comparable DOOM level.

The structure of Quake levels

Before diving into VSD, let me note that each Quake level is stored as a single huge 3-D BSP tree. This BSP tree, like any BSP, subdivides space, in this case along the planes of the polygons. However, unlike the BSP tree I presented last time, Quake’s BSP tree does not store polygons in the tree nodes, as part of the splitting planes, but rather in the empty (non-solid) leaves, as shown in overhead view in Figure 1.

Figure 1: In Quake, polygons are stored in the empty leaves. Shaded areas are solid leaves (solid volumes, such as the insides of walls).

Correct drawing order can be obtained by drawing the leaves in front-to-back or back-to-front BSP order, again as discussed last time. Also, because BSP leaves are always convex and the polygons are on the boundaries of the BSP leaves, facing inward, the polygons in a given leaf can never obscure one another and can be drawn in any order. (This is a general property of convex polyhedra.)
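Here is a minimal Python sketch of that front-to-back walk over a leaf-storing BSP tree. The Leaf and Node classes, the plane representation (normal and distance), and the tiny three-leaf tree are assumptions made for illustration; Quake's actual data structures are not shown in this column.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union


@dataclass
class Leaf:
    name: str
    solid: bool = False                      # solid leaves hold no polygons
    polygons: List[str] = field(default_factory=list)


@dataclass
class Node:
    normal: Tuple[float, float, float]       # splitting plane: dot(normal, p) = dist
    dist: float
    front: Union["Node", Leaf]
    back: Union["Node", Leaf]


def walk_front_to_back(tree, viewpoint, out):
    """Append the names of empty leaves to out, nearest first."""
    if isinstance(tree, Leaf):
        if not tree.solid:
            out.append(tree.name)
        return
    side = sum(n * p for n, p in zip(tree.normal, viewpoint)) - tree.dist
    # Visit the child on the viewer's side of the plane first.
    near, far = (tree.front, tree.back) if side >= 0 else (tree.back, tree.front)
    walk_front_to_back(near, viewpoint, out)
    walk_front_to_back(far, viewpoint, out)


# Three empty leaves split by the planes x = 0, x = 5, and x = 8.
world = Node((1, 0, 0), 0.0,
             front=Node((1, 0, 0), 5.0,
                        front=Node((1, 0, 0), 8.0,
                                   front=Leaf("wall", solid=True),
                                   back=Leaf("C")),
                        back=Leaf("B")),
             back=Leaf("A"))

order = []
walk_front_to_back(world, (2.0, 0.0, 0.0), order)
print(order)  # ['B', 'C', 'A']
```

Reversing the order in which the two children are visited gives the back-to-front walk that the painter's algorithm needs.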

Culling and visible surface determination

The process of VSD would ideally work as follows. First, you would cull all polygons that are completely outside the view frustum (view pyramid), and would clip away the irrelevant portions of any polygons that are partially outside. Then you would draw only those pixels of each polygon that are actually visible from the current viewpoint, as shown in overhead view in Figure 2, wasting no time overdrawing pixels multiple times; note how little of the polygon set in Figure 2 actually needs to be drawn. Finally, in a perfect world, the tests to figure out what parts of which polygons are visible would be free, and the processing time would be the same for all possible viewpoints, giving the game a smooth visual flow.

Figure 2: An ideal VSD architecture would draw only visible parts of visible polygons.


As it happens, it is easy to determine which polygons are outside the frustum or partially clipped, and it’s quite possible to figure out precisely which pixels need to be drawn. Alas, the world is far from perfect, and those tests are far from free, so the real trick is how to accelerate or skip various tests and still produce the desired result.

As I discussed at length last time, given a BSP, it’s easy and inexpensive to walk the world in front-to-back or back-to-front order. The simplest VSD solution, which I in fact demonstrated last time, is to simply walk the tree back-to-front, clip each polygon to the frustum, and draw it if it’s facing forward and not entirely clipped (the painter’s algorithm). Is that an adequate solution?

For relatively simple worlds, it is perfectly acceptable. It doesn’t scale very well, though. One problem is that as you add more polygons in the world, more transformations and tests have to be performed to cull polygons that aren’t visible; at some point, that will bog performance down considerably.

Happily, there’s a good workaround for this particular problem. As discussed earlier, each leaf of a BSP tree represents a convex subspace, with the nodes that bound the leaf delimiting the space. Perhaps less obvious is that each node in a BSP tree also describes a subspace--the subspace composed of all the node’s children, as shown in Figure 3. Another way of thinking of this is that each node splits into two pieces the subspace created by the nodes above it in the tree, and the node’s children then further carve that subspace into all the leaves that descend from the node.

Figure 3: Node E describes the shaded subspace, which contains leaves 5, 6, and 7, and node F.


Since a node’s subspace is bounded and convex, it is possible to test whether it is entirely outside the frustum. If it is, all of the node’s children are certain to be fully clipped, and can be rejected without any additional processing. Since most of the world is typically outside the frustum, many of the polygons in the world can be culled almost for free, in huge, node-subspace chunks. It’s relatively expensive to perform a perfect test for subspace clipping, so instead bounding spheres or boxes are often maintained for each node, specifically for culling tests.
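A bounding-sphere version of this bulk culling can be sketched in a few lines of Python. Everything named here (BspNode, sphere_outside, collect_visible, and the four-plane frustum with inward-pointing unit normals) is an assumption for illustration. The test is conservative: a sphere that is outside the frustum but not fully behind any single plane is kept, which costs some wasted work later but never drops anything visible.

```python
from dataclasses import dataclass
from typing import Tuple

Vec = Tuple[float, float, float]


@dataclass
class BspNode:
    name: str
    center: Vec                            # bounding sphere around everything below
    radius: float
    children: Tuple["BspNode", ...] = ()   # empty for a leaf


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def sphere_outside(center, radius, planes):
    """True if the sphere lies wholly behind any one frustum plane."""
    return any(dot(normal, center) - dist < -radius for normal, dist in planes)


def collect_visible(node, planes, out):
    if sphere_outside(node.center, node.radius, planes):
        return                             # whole subtree rejected with one test
    if not node.children:
        out.append(node.name)
        return
    for child in node.children:
        collect_visible(child, planes, out)


# 90-degree frustum at the origin looking down +z; unit normals point inward.
s = 2 ** -0.5
frustum = [((s, 0, s), 0.0), ((-s, 0, s), 0.0), ((0, s, s), 0.0), ((0, -s, s), 0.0)]

rear = BspNode("rear", (0, 0, -12), 6,
               (BspNode("r1", (0, 0, -10), 2), BspNode("r2", (0, 0, -14), 2)))
root = BspNode("root", (0, 0, 0), 20, (BspNode("ahead", (0, 0, 10), 2), rear))

visible = []
collect_visible(root, frustum, visible)
print(visible)  # ['ahead']
```

In this example the two leaves behind the viewer are never examined individually; the single test on their parent's sphere rejects both.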

So culling to the frustum isn’t a problem, and the BSP can be used to draw back to front. What’s the problem?


The problem John Carmack, the driving technical force behind DOOM and Quake, faced when he designed Quake was that in a complex world, many scenes have an awful lot of polygons in the frustum. Most of those polygons are partially or entirely obscured by other polygons, but the painter’s algorithm described above requires that every pixel of every polygon in the frustum be drawn, often only to be overdrawn. In a 10,000-polygon Quake level, it would be easy to get a worst-case overdraw level of 10 times or more; that is, in some frames each pixel could be drawn 10 times or more, on average. No rasterizer is fast enough to compensate for an order of magnitude more work than is actually necessary to show a scene; worse still, the painter’s algorithm will cause a vast difference between best-case and worst-case performance, so the frame rate can vary wildly as the viewer moves around.

So the problem John faced was how to keep overdraw down to a manageable level, preferably drawing each pixel exactly once, but certainly no more than two or three times in the worst case. As with frustum culling, it would be ideal if he could eliminate all invisible polygons in the frustum with virtually no work. It would also be a plus if he could manage to draw only the visible parts of partially-visible polygons, but that was a balancing act, in that it had to be a lower-cost operation than the overdraw that would otherwise result.

When I arrived at id at the beginning of March, John already had an engine prototyped and a plan in mind, and I assumed that our work was a simple matter of finishing and optimizing that engine. If I had been aware of id’s history, however, I would have known better. John had done not only DOOM, but also the engines for Wolf 3D and several earlier games, and had actually done several different versions of each engine in the course of development (once doing four engines in four weeks), for a total of perhaps 20 distinct engines over a four-year period. John’s tireless pursuit of new and better designs for Quake’s engine, from every angle he could think of, would end only when we shipped.

By three months after I arrived, only one element of the original VSD design was anywhere in sight, and John had taken “try new things” farther than I’d ever seen it taken.

The beam tree

John’s original Quake design was to draw front to back, using a second BSP tree to keep track of what parts of the screen were already drawn and which were still empty and therefore drawable by the remaining polygons. Logically, you can think of this BSP tree as being a 2-D region describing solid and empty areas of the screen, as shown in Figure 4, but in fact it is a 3-D tree, of the sort known as a beam tree. A beam tree is a collection of 3-D wedges (beams), bounded by planes, projecting out from some center point, in this case the viewpoint, as shown in Figure 5.

Figure 4: Quake's beam tree effectively partitioned the screen into 2-D regions.

Figure 5: Quake's beam tree was composed of 3-D wedges, or beams, projecting out from the viewpoint to polygon edges.


In John’s design, the beam tree started out consisting of a single beam describing the frustum; everything outside that beam was marked solid (so nothing would draw there), and the inside of the beam was marked empty. As each new polygon was reached while walking the world BSP tree front to back, that polygon was converted to a beam by running planes from its edges through the viewpoint, and any part of the beam that intersected empty beams in the beam tree was considered drawable and added to the beam tree as a solid beam. This continued until either there were no more polygons or the beam tree became entirely solid. Once the beam tree was completed, the visible portions of the polygons that had contributed to the beam tree were drawn.

The advantage to working with a 3-D beam tree, rather than a 2-D region, is that determining which side of a beam plane a polygon vertex is on involves only checking the sign of the dot product of the ray to the vertex and the plane normal, because all beam planes run through the origin (the viewpoint). Also, because a beam plane is completely described by a single normal, generating a beam from a polygon edge requires only a cross-product of the edge and a ray from the edge to the viewpoint. Finally, bounding spheres of BSP nodes can be used to do the aforementioned bulk culling to the frustum.
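Both operations are short enough to show directly. The following Python sketch assumes viewspace coordinates with the viewpoint at the origin; the helper names (beam_plane, side) are invented for the example, and the sign convention for which side counts as "inside" is left open because it depends on the polygon's winding order.

```python
def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def beam_plane(v0, v1):
    """Normal of the plane through the viewpoint (origin) and the edge v0 -> v1.

    The plane passes through the origin, so the normal alone describes it.
    """
    edge = (v1[0] - v0[0], v1[1] - v0[1], v1[2] - v0[2])
    to_viewpoint = (-v0[0], -v0[1], -v0[2])   # ray from the edge to the viewpoint
    return cross(edge, to_viewpoint)


def side(normal, vertex):
    """Positive on one side of the beam plane, negative on the other, 0 on it."""
    return dot(normal, vertex)                # the vertex is already a ray from the origin


# A vertical polygon edge at x = 1, five units in front of the viewer.
n = beam_plane((1, -1, 5), (1, 1, 5))
print(n)                      # (-10, 0, 2)
print(side(n, (0, 0, 5)))     # 10: on one side of the beam plane
print(side(n, (3, 0, 5)))     # -20: on the other side
```

No plane distance term and no normalization are needed for the side test, since only the sign of the dot product matters.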

The early-out feature of the beam tree--stopping when the beam tree becomes solid--seems appealing, because it appears to cap worst-case performance. Unfortunately, there are still scenes where it’s possible to see all the way to the sky or the back wall of the world, so in the worst case, all polygons in the frustum will still have to be tested against the beam tree. Similar problems can arise from tiny cracks due to numeric precision limitations. Beam tree clipping is fairly time-consuming, and in scenes with long view distances, such as views across the top of a level, the total cost of beam processing slowed Quake’s frame rate to a crawl. So, in the end, the beam-tree approach proved to suffer from much the same malady as the painter’s algorithm: The worst case was much worse than the average case, and it didn’t scale well with increasing level complexity.

3-D engine du jour

Once the beam tree was working, John relentlessly worked at speeding up the 3-D engine, always trying to improve the design, rather than tweaking the implementation. At least once a week, and often every day, he would walk into my office and say “Last night I couldn’t get to sleep, so I was thinking...” and I’d know that I was about to get my mind stretched yet again. John tried many ways to improve the beam tree, with some success, but more interesting was the profusion of wildly different approaches that he generated, some of which were merely discussed, others of which were implemented in overnight or weekend-long bursts of coding, in both cases ultimately discarded or further evolved when they turned out not to meet the design criteria well enough. Here are some of those approaches, presented in minimal detail in the hopes that, like Tom Wilson with the Paradise FIFO, your imagination will be sparked.

Subdividing raycast: Rays are cast in an 8x8 screen-pixel grid; this is a highly efficient operation because the first intersection with a surface can be found by simply clipping the ray into the BSP tree, starting at the viewpoint, until a solid leaf is reached. If adjacent rays don’t hit the same surface, then a ray is cast halfway between, and so on until all adjacent rays either hit the same surface or are on adjacent pixels; then the block around each ray is drawn from the polygon that was hit. This scales very well, being limited by the number of pixels, with no overdraw. The problem is dropouts; it’s quite possible for small polygons to fall between rays and vanish.

Vertex-free surfaces: The world is represented by a set of surface planes. The polygons are implicit in the plane intersections, and are extracted from the planes as a final step before drawing. This makes for fast clipping and a very small data set (planes are far more compact than polygons), but it’s time-consuming to extract polygons from planes.

Draw-buffer: Like a z-buffer, but with 1 bit per pixel, indicating whether the pixel has been drawn yet. This eliminates overdraw, but at the cost of an inner-loop buffer test, extra writes and cache misses, and, worst of all, considerable complexity. Variations are testing the draw-buffer a byte at a time and completely skipping fully-occluded bytes, or branching off each draw-buffer byte to one of 256 unrolled inner loops for drawing 0-8 pixels, in the process possibly taking advantage of the ability of the x86 to do the perspective floating-point divide in parallel while 8 pixels are processed.

Span-based drawing: Polygons are rasterized into spans, which are added to a global span list and clipped against that list so that only the nearest span at each pixel remains. Little sorting is needed with front-to-back walking, because if there’s any overlap, the span already in the list is nearer. This eliminates overdraw, but at the cost of a lot of span arithmetic; also, every polygon still has to be turned into spans.

Portals: The holes where polygons are missing on surfaces are tracked, because it’s only through such portals that line-of-sight can extend. Drawing goes front-to-back, and when a portal is encountered, polygons and portals behind it are clipped to its limits, until no polygons or portals remain visible. Applied recursively, this allows drawing only the visible portions of visible polygons, but at the cost of a considerable amount of portal clipping.
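To give a feel for the "span arithmetic" in the span-based approach listed above, here is a hedged Python sketch for a single scanline. It assumes polygons arrive front to back, so anything already in the list is nearer and simply wins; the function name add_span and the half-open (start, end) pixel ranges are choices made for this example, and a real span list would also record which polygon owns each span.

```python
def add_span(covered, start, end):
    """Clip the span [start, end) against spans already drawn on this scanline.

    covered is a sorted list of non-overlapping (start, end) pairs; it is
    updated in place. Returns the pieces of the new span that are visible.
    """
    visible = []
    x = start
    for s, e in covered:
        if e <= x:
            continue                 # existing span is entirely to the left
        if s >= end:
            break                    # existing span is entirely to the right
        if s > x:
            visible.append((x, s))   # gap before the existing span is visible
        x = max(x, e)                # skip past the nearer, already-drawn span
        if x >= end:
            break
    if x < end:
        visible.append((x, end))
    covered.extend(visible)
    covered.sort()
    return visible


scanline = []
print(add_span(scanline, 10, 20))  # [(10, 20)]
print(add_span(scanline, 5, 15))   # [(5, 10)]
print(add_span(scanline, 0, 30))   # [(0, 5), (20, 30)]
print(add_span(scanline, 12, 18))  # []
```

Every pixel is claimed exactly once, which is the zero-overdraw property, but every polygon still has to be rasterized into spans and run through this clipping.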


In the end, John decided that the beam tree was a sort of second-order structure, reflecting information already implicitly contained in the world BSP tree, so he tackled the problem of extracting visibility information directly from the world BSP tree. He spent a week on this, as a byproduct devising a perfect DOOM (2-D) visibility architecture, whereby a single, linear walk of a DOOM BSP tree produces zero-overdraw 2-D visibility. Doing the same in 3-D turned out to be a much more complex problem, though, and by the end of the week John was frustrated by the increasing complexity and persistent glitches in the visibility code. Although the direct-BSP approach was getting closer to working, it was taking more and more tweaking, and a simple, clean design didn’t seem to be falling out. When I left work one Friday, John was preparing to try to get the direct-BSP approach working properly over the weekend.

When I came in on Monday, John had the look of a man who had broken through to the other side--and also the look of a man who hadn’t had much sleep. He had worked all weekend on the direct-BSP approach, and had gotten it working reasonably well, with insights into how to finish it off. At 3:30 AM Monday morning, as he lay in bed, thinking about portals, he thought of precalculating and storing in each leaf a list of all leaves visible from that leaf, and then at runtime just drawing the visible leaves back-to-front for whatever leaf the viewpoint happens to be in, ignoring all other leaves entirely.

Size was a concern; initially, a raw, uncompressed potentially visible set (PVS) was several megabytes in size. However, the PVS could be stored as a bit vector, with 1 bit per leaf, a structure that shrank a great deal with simple zero-byte compression. Those steps, along with changing the BSP heuristic to generate fewer leaves (contrary to what I said a few months back, choosing as the next splitter the polygon that splits the fewest other polygons is clearly the best heuristic, based on the latest data) and sealing the outside of the levels so the BSPer can remove the outside surfaces, which can never be seen, eventually brought the PVS down to about 20 KB for a good-size level.
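The column doesn't spell out the compression format, so the following Python sketch is only one plausible reading of "simple zero-byte compression": nonzero bytes are copied as-is, and each run of zero bytes is replaced by a 0 followed by a run length of up to 255. The function names and the 2,000-leaf example are assumptions for illustration.

```python
def make_pvs_row(num_leaves, visible_leaves):
    """Bit vector with 1 bit per leaf; bit i is set if leaf i is visible."""
    row = bytearray((num_leaves + 7) // 8)
    for leaf in visible_leaves:
        row[leaf >> 3] |= 1 << (leaf & 7)
    return bytes(row)


def compress(row):
    out = bytearray()
    i = 0
    while i < len(row):
        if row[i] != 0:
            out.append(row[i])          # nonzero bytes are stored literally
            i += 1
            continue
        run = 0
        while i < len(row) and row[i] == 0 and run < 255:
            run += 1
            i += 1
        out += bytes((0, run))          # a zero marker followed by the run length
    return bytes(out)


def decompress(data, row_bytes):
    out = bytearray()
    i = 0
    while len(out) < row_bytes:
        if data[i] != 0:
            out.append(data[i])
            i += 1
        else:
            out += bytes(data[i + 1])   # bytes(n) yields n zero bytes
            i += 2
    return bytes(out)


def is_visible(row, leaf):
    return bool(row[leaf >> 3] & (1 << (leaf & 7)))


# One leaf's row in a 2,000-leaf level from which only four leaves are visible.
row = make_pvs_row(2000, [3, 4, 900, 1999])
packed = compress(row)
print(len(row), len(packed))  # 250 7
```

Since most leaves can't see most other leaves, the rows are mostly zero bytes, which is why such a simple scheme pays off so well.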

In exchange for that 20 KB, culling leaves outside the frustum is speeded up (because only leaves in the PVS are considered), and culling inside the frustum costs nothing more than a little overdraw (the PVS for a leaf includes all leaves visible from anywhere in the leaf, so some overdraw, typically on the order of 50% but ranging up to 150%, generally occurs). Better yet, precalculating the PVS results in a leveling of performance; worst case is no longer much worse than best case, because there’s no longer extra VSD processing--just more polygons and perhaps some extra overdraw--associated with complex scenes. The first time John showed me his working prototype, I went to the most complex scene I knew of, a place where the frame rate used to grind down into the single digits, and spun around smoothly, with no perceptible slowdown.

John says precalculating the PVS was a logical evolution of the approaches he had been considering, that there was no moment when he said “Eureka!”. Nonetheless, it was clearly a breakthrough to a brand-new, superior design, a design that, together with a still-in-development sorted-edge rasterizer that completely eliminates overdraw, comes remarkably close to meeting the “perfect-world” specifications we laid out at the start.

Simplify, and keep on trying new things

What does it all mean? Exactly what I said up front: Simplify, and keep trying new things. The precalculated PVS is simpler than any of the other schemes that had been considered (although precalculating the PVS is an interesting task that I’ll discuss another time). In fact, at runtime the precalculated PVS is just a constrained version of the painter’s algorithm. Does that mean it’s not particularly profound?

Not at all. All really great designs seem simple and even obvious--once they’ve been designed. But the process of getting there requires incredible persistence, and a willingness to try lots of different ideas until the right one falls into place, as happened here.

My friend Chris Hecker has a theory that all approaches work out to the same thing in the end, since they all reflect the same underlying state and functionality. In terms of underlying theory, I’ve found that to be true; whether you do perspective texture mapping with a divide or with incremental hyperbolic calculations, the numbers do exactly the same thing. When it comes to implementation, however, my experience is that simply time-shifting an approach, or matching hardware capabilities better, or caching can make an astonishing difference. My friend Terje Mathisen likes to say that “almost all programming can be viewed as an exercise in caching,” and that’s exactly what John did. No matter how fast he made his VSD calculations, they could never be as fast as precalculating and looking up the visibility, and his most inspired move was to yank himself out of the “faster code” mindset and realize that it was in fact possible to precalculate (in effect, cache) and look up the PVS.

The hardest thing in the world is to step outside a familiar, pretty good solution to a difficult problem and look for a different, better solution. The best ways I know to do that are to keep trying new, wacky things, and always, always, always try to simplify. One of John’s goals is to have fewer lines of code in each 3-D game than in the previous game, on the assumption that as he learns more, he should be able to do things better with less code.

So far, it seems to have worked out pretty well for him.

Learn now, pay forward

There’s one other thing I’d like to mention before I close up shop for this month. As far back as I can remember, DDJ has epitomized the attitude that sharing programming information is A Good Thing. I know a lot of programmers who were able to leap ahead in their development because of Hendrix’s Tiny C, or Stevens’ D-Flat, or simply by browsing through DDJ’s annual collections. (Me, for one.) Most companies understandably view sharing information in a very different way, as potential profit lost--but that’s what makes DDJ so valuable to the programming community.

It is in that spirit that id Software is allowing me to describe in these pages how Quake works, even before Quake has shipped. That’s also why id has placed the full source code for Wolfenstein 3D on ftp.idsoftware.com/idstuff/source; you can’t just recompile the code and sell it, but you can learn how a full-blown, successful game works; check wolfsrc.txt in the above-mentioned directory for details on how the code may be used.

So remember, when it’s legally possible, sharing information benefits us all in the long run. You can pay forward the debt for the information you gain here and elsewhere by sharing what you know whenever you can, by writing an article or book or posting on the Net. None of us learns in a vacuum; we all stand on the shoulders of giants such as Wirth and Knuth and thousands of others. Lend your shoulders to building the future!


Foley, James D., et al., Computer Graphics: Principles and Practice, Addison Wesley, 1990, ISBN 0-201-12110-7 (beams, BSP trees, VSD).

Teller, Seth, Visibility Computations in Densely Occluded Polyhedral Environments (dissertation), available on


along with several other papers relevant to visibility determination.

Teller, Seth, Visibility Preprocessing for Interactive Walkthroughs, SIGGRAPH 91 proceedings, pp. 61-69.

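With a sensual figure and coquettish banter aimed at men and women alike, she was meant by `devCAT` to toy with players and crush them by an overwhelming margin; instead she is toyed with by the users and loses a piece of clothing every time a counterattack lands. She is the ill-fated boss character, the `Succubus`.

From here on, I will explain the true nature of the `Succubus`, who has become an ill-fated NPC in Mabinogi. Before reading on, anyone with a weak heart, pregnant women, the elderly and infirm, and minors should quietly press Alt + F4.

The succubus in Seal Online (her lines are, unsurprisingly, similar)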

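Incubus and succubus

The `Incubus` and `Succubus` are commonly called "dream demons (nightmares)" and are a kind of demon that corrupts humans.

The `Incubus` takes the form of a man, and the `Succubus` appears in the form of a woman. According to one account, they are angels who fell because of their excessive sensual desire for human women; according to another, the `Succubus` is a "forest nymph." The name `Incubus` derives from the Latin word "incubo" (to sleep on top of, to mount). Likewise, succubus derives from the word "succubo" (to sleep underneath, to lie below).

The `Incubus` appears in this world in order to lure human women into you-know-what (oh, let's not ask!), or to corrupt them. Its methods vary: sometimes it takes the form of a beautiful youth and seduces a woman to achieve its aim, and sometimes it steals into the bed of a sleeping woman at night, shows her lewd dreams or sits on her so that she cannot move, and achieves its aim before she realizes what is happening.

Besides taking the form of an adult man, the `Incubus` may appear in the form of a `satyr`; before those called witches it may take the `form of a lecherous goat`, and sometimes it transforms into the `form of a woman`.

There is also a view that the `Incubus` and `Succubus` are fundamentally the same being, called `Incubus` when it takes male form and `Succubus` when it takes female form.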

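Incubus and succubus

The `Succubus`, the reverse of the `Incubus`, aims to seduce human men into you-know-what (ahem, censored) and to corrupt them. When she does you-know-what (there it is again, argh!) with a human man, she obtains his bleep (oh Lord, why did I ever say I'd explain this...), and the `Incubus` is said to use that bleep (this just keeps getting worse) to impregnate a human woman. On this matter, `Thomas Aquinas` concluded that the father of a child born this way is not the `Incubus` who did you-know-what (enough already) with the mother, but the man whose bleep (oh Lord) was taken by the succubus.

The `Incubus` and `Succubus` also sometimes work as assistants to `witches` and `sorcerers`. According to one account, only the lowest-ranking demons work as an `Incubus`. The `Incubus` and `Succubus` haunt villages, towns, and other places where people live, or the areas around them.

Once they have chosen a target to seduce, they cling to that person until the target dies, they grow tired of the target, or they are driven off by some effective means. Sometimes they also appear at places such as witches' gatherings and mingle with the witches there.

To resist an `Incubus` or `Succubus`, the power of the church, especially incantations and prayer, is believed to be effective, but even that power apparently can only barely drive them away, not eliminate them completely.

The most effective defense is probably the faith, morality, and mental strength of the afflicted person. Even if you succeed in driving an `Incubus` away, it may find and attach itself to another target or, if you are unlucky, come back to the original one, so you can never let your guard down.

In Goethe's <Faust>, among the incantations of the four (elemental spirits) there is the line "Strive to help with the housework, Incubus, Incubus"; the Incubus that appears here is regarded as having the same meaning as the `kobold`, the spirit of earth.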


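Don't get any funny ideas!

Well, for now you will probably remember nothing but "you-know-what" and "bleep," but in any case, MMORPGs to date have not portrayed succubi and incubi with these traits; they appear only as creatures that drain life force. Now that you know about the succubus, stop humiliating her with the undressing game and just kill her. You never know, she might appear after you fall asleep... (-_ㅡㆀ)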

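A huge body that seems to touch the dungeon ceiling, fearsome destructive power that puts you into Deadly status with a single blow, a face so featureless you can't tell where it is, and muscles that have turned to stone from who knows how much exercise. Some readers have surely guessed who I am describing. That's right: I was describing the golem.

The golem lives in the boss rooms of Ciar Dungeon and Fiodh Dungeon. In other online games it appears as a mid-boss or an ordinary monster, but in Mabinogi it appears as a boss boasting enormous toughness and brute power.

Anyone who plays fantasy games knows how often this monster appears without my having to explain it. This time I want to talk about the golem of the fantasy world. It may be somewhat dull, but it is meant to give readers some knowledge of fantasy, so I ask you to appreciate the effort and read on.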

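The `golem` is a monster born of Jewish legend. As the story is told here, Jews suffering under a king's tyranny made a giant clay figure as a guardian, and a demon breathed life into it so that it would stand against the king; that was the golem.

In medieval Europe the `golem` went beyond legend and developed into an idea in its own right. It became the subject of what is, for humans, the ultimate theme: creating life from nothing. Only alchemists and followers of esoteric faiths took up this subject.

The word `golem` is said to have been coined by an esoteric believer, a `Kabbalist`. He rearranged the letters of scripture to solve a certain riddle, and the result is said to have been the word `golem`. A `golem` made by a word is likewise said to return to its original clay by a word.

Apart from the `Kabbalists` of medieval Europe studying the `golem`, research into creating life from nothing went on elsewhere in the world too. The 13th-century German alchemist `Albertus` is said to have spent 30 years making a `golem` out of clay.

The `golem` that `Albertus` made was not a giant monster that walks around smashing buildings but something like a human-sized robot. It was reportedly built well enough to walk, talk, answer people's questions, and solve math problems. But because it chattered too much, a disciple of `Albertus` is said to have smashed it with a hammer.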

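The `golem` is generally said to be made and controlled by an evil wizard, but in the original Jewish legend it was a good being that helped the people. So why did the golem become an evil monster? The cause seems closely tied to `Christian thought`.

In Christianity, no one but God can bestow life. If there were such a being, it would have to be the devil. In other words, creating a `golem` is the devil's work. That is how the equation golem = evil came about.

Golem stories are familiar not only from medieval Europe but in the modern day as well. A representative example is the monster made by Dr. `Frankenstein`. From antiquity to the present, human attempts to create living things seem endless. As a monster, one characteristic of the `golem` is that it has no intelligence, since it is inorganic matter such as mud or rock given a semblance of life.

So when a wizard commands a `golem`, it can carry out only simple orders such as "guard this place" or "kill XX." The order is a kind of seal (封印), so if the order lapses or the wizard dies, the golem runs wild. Once that happens, like an `elemental`, it does not stop its rampage until someone brings it down.

It varies by type, but to defeat a `golem` you must destroy the `golem` itself. Alternatively, if a spell for destroying the `golem` was made along with it when it was created, it can be defeated without a fight.

A `golem` mostly attacks with its bare fists. It occasionally uses a weapon, but at most a club. The types of `golem` are as follows.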

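Clay Golem

This is the golem of Jewish legend. Not only there: most golems in legends of any kind are of this type. Its body is made of clay, so stabbing it with a sword or other weapon has almost no effect unless you inflict very serious damage, such as cutting off an arm.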


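Flesh Golem

A golem made from human flesh. The monster Frankenstein made was a flesh golem. It resembles a zombie, but whereas a zombie is about the size of an ordinary person, a flesh golem stands around 2 meters tall. Because it is made of human flesh, it can be damaged. It does not, however, die of blood loss and the like as a human would. Mind-affecting magic has no effect at all on flesh golems, or on any other golem.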

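Iron Golem

An iron golem is made of iron. It therefore has the greatest strength of all golems and stands 5 to 6 meters tall. The bronze giant Talos of Greek mythology, made by Hephaestus, is the most famous golem made of metal. Because an iron golem's body is made of iron, the strongest metal of the Middle Ages, ordinary weapons cannot wound it. It takes a magic weapon, and a very powerful one at that. Iron golems have the strongest attack of all golems.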


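Stone Golem

A golem made of stone. It seems similar to a living statue, but differs fundamentally in that it has no will of its own. Being made of stone, it has high defense, but it cannot move quickly and so cannot dodge attacks well. Also, unlike an iron golem, it never attacks with a weapon, so its attack power is not very high. Even so, its body is the same size as an iron golem's, so it is not easy to bring down.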

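Think of the golems that appear in Ciar Dungeon and Fiodh Dungeon. Mabinogi currently has only stone golems. It is not just golems: the overall variety of monsters feels meager, contrary to the motto of pursuing a fantasy life. If we are nitpicking, items are meager too, and magic is woefully so. Only six spells in a fantasy world? Isn't that absurd?

I will end this article hoping for the day when, instead of just saying `Fantasy Life`, they keep filling in what is missing so that it really feels like living in a fantasy world.

The boss monster investigation must go on;;;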

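First, I expect many players will be stumped by the 'Mores' keyword obtained at the end of 므미's previous walkthrough.

In particular, you might assume it is meant for Tarlach and make the hard trip out to him, but this keyword must be raised not with Tarlach or Meven but with `Duncan`, the chief of Tir Chonaill.

Duncan then gives you a new keyword, `Mores Gwydion`, and tells you that Mores died heroically in the Second Fomor War after infiltrating to stop a spell being prepared by Jabchiel, a wizard on the Fomor side.

Then if you take this keyword to `Tarlach`...

He says that the great wizard Mores, author of that demon book, was his teacher. Tarlach then looks at the book again and puzzles over what the 'lost token' mentioned at the end of the Book of Wrath could mean. He suddenly thinks of the town office where lost property is kept, the keyword 'Mores's lost item' appears, and he asks you to fetch the item from the town office. If you go straight to Eavan, she tells you that the priestess Kristell picked up that lost item just a moment ago.

Go on to Kristell and she gives you a Red Wings of a Goddess (bound for Math Dungeon) and the Broken Torque (the Mores RP item).

Put in this Broken Torque and enter the dungeon, and you get to play as the great wizard Mores.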

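Mores has Icebolt, Firebolt, and Lightning Bolt all at rank 1, plus his exclusive skill 'Chain Casting', which gives 5 charges from a single cast, so play is easy. (If I could learn that skill, I would raise a mage without a second thought -_-)

The dungeon's difficulty is the same as the regular Math Dungeon. Kobolds and archers show up, but one or two of Mores's magic charges finish off a whole room, so it should not be very hard.

Partway through, Mores takes a rest and resolves to find the exit. Then, when you enter the boss room, instead of the usual Hellhound with its 'X개' title, a horde of Ghost Armors appears and an event begins.

※ A powerful line from the handsome middle-aged man (?). Very cool +_+

A conversation between the Ghost Armors and Mores plays out, and before long a battle begins. The point of this battle, however, is not to wipe out the Ghost Armors but to move the story along. Once you take out two or three Ghost Armors (a single Firebolt charge killed one each time), the Dark Lord appears and stops the fight.

While the Dark Lord and Mores talk, the Dark Lord shows him the 'truth'. Having seen it, Mores wishes to meet the goddess Morrighan, and Morrighan, holder of titles such as 'bare-back maniac' and 'mouthless goddess', appears.

Then yet another keyword appears and the RP dungeon ends.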

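Take the keyword to Duncan and he is surprised that Mores is alive. He says he needs to look at the Book of Wrath again and tells you to fetch it from Tarlach. (He didn't read it when he had it... -_-) A quest called 'Retrieve the Book of Wrath' then appears. Tarlach, figuring that the elderly Duncan probably knows a great deal, hands the book back.

(Honestly, I was nervous that Tarlach would kick off a 'stall for a day' or 'make the player run errands' quest along the lines of 'I sent that book to Kristell' or 'I'm copying the book, so come back in a month and I'll return it.' -_-)

Take the book to Duncan and he drops the bombshell that the Book of Revenge is probably a trilogy. You may not realize it yet, but as you will see later, 'trilogy' means it demands an enormous amount of time.

Wait. Then the book will arrive.

Wait. Then it will be translated.

That is the sort of thing that happens -_-

For now, Duncan asks you to find out about the next volume from the person who translated this one.

Yes... the translator is in Dunbarton. Use Wax Wings, take the bus, or walk there for the exercise. When you reach Kristell, she says she doesn't know, but Aeira at the bookstore might.

Visit Aeira and the 'Retrieve the Book of Wrath' quest ends. Aeira grumbles that since you already have volume 1 she would have to sell the volumes separately, and when you tell her the book is written in the demon language she says a distributor would never print something people can't read. Still, she takes an interest in tracking down such a book and asks you to wait a while so she can ask around. (And what does 'wait a while' mean?)

Of course, it means come back the next day in real time -_-;;;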

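Now on to the quest for Book of Wrath volume 2.

When an owl flies in with word that information on Book of Wrath volume 2 has come in, go see Aeira. She says that Leslie, author of The Land of Eternity, Tir Na Nog, once saw the book in Ciar Dungeon, and that to see it you must enter the dungeon using an item called the Sage's Memo.

She gives you the Sage's Memo together with a Red Wings of a Goddess for Ciar Dungeon. Note that it is fine to clear this Ciar Dungeon as a party. The Book of Wrath comes out of the Ciar Dungeon reward room (in place of the 역인챈 thick tree branch that normally drops there). Take it to Duncan and, with a word of thanks, he asks you yet again to get it translated!

Go to Kristell in Dunbarton and she tells you to leave the book so that she can translate it. Naturally, the translation takes one day in real time. -_-

After a real-time day passes, an owl arrives to say the translation is finished. Go to Kristell and she gives you the translated Book of Wrath volume 2. Take it to Duncan, who reads it through and this time asks for volume 3. Of course you have to go back to Aeira at the Dunbarton bookstore. She says she will try to get it, but that it will be hard.

Still, an owl comes after a day, so relax and wait -ㅅ-;;; (By this point I expect many players doing the main scenario will have attained enlightenment in the way of patience. -_-)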

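Log in after a day has passed and an owl arrives from Aeira. Visit her and she says she has seen 'Book of Wrath volume 3' in the possession of Lassar, the school-uniform maniac. Go straight to Lassar and she says the book was obtained by Seumas, who works at the Gairech excavation site, and that she had been asked to translate it.

Lassar then says she has already sent the book back to Seumas. In other words, you have to head for the Dragon Ruins, Mabinogi's number-one area for visitations from the god of lag. It is close to Bangor, so getting hold of Wax Wings, going to Bangor, and coming up from there will save time compared with going down from Dunbarton.

Visit Seumas and he says that since he can't read it anyway, he will give you the book if you do him one favor. The favor is to deliver a present to his son Sion, who lives in Bangor and squeezes money (?) out of many players by dangling the furnace they need for refining.

I wholeheartedly agree with what is written on the present -_-

Deliver the present to Sion in Bangor, go back to Seumas, and he gives you Book of Wrath volume 3.

Take it to Kristell and she says she will finish the translation as quickly as possible. (Even so, it takes a day. -_-) Log in after a day and an owl arrives; go to Kristell and she hands over the translated Book of Wrath volume 3. Then, once again, you have to read the book through.

Maybe my character is just dim... -_- Three read-throughs, and every one failed. (Watching my character jump for joy at finally succeeding after three failures, I felt rather pathetic...)

Who on earth did you take after to be that dim? -_-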

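Reading it through gives you the keyword Glas Ghaibhleann. In the Book of Wrath it referred to the demons' means of invasion, but when you go to Duncan he is startled, asks where you heard it, and says he wants to see Book of Wrath volume 3.

He tells you that Glas Ghaibhleann is a legendary giant, a monster that destroys everything in sight with hatred and rage. He adds that he lent a book about Glas Ghaibhleann to Bryce in Bangor and tells you to look for it. Which of course means going to Bangor. Talk to Bryce there and you get the keyword 'Bone of Glas Ghaibhleann'. He says that reviving the monster requires a mineral called adamantium, which comes only from the Bangor mine.

However, he says the mine stopped yielding that mineral some time ago, and that the presumed cause is either that the vein ran out or that someone mined it all (트X스트? 48 hours -ㅅ-).

Now take this keyword to Tarlach and he says that to make Glas Ghaibhleann, its skeleton must be excavated and assembled, and any missing bones must be cast from adamantium. He adds that once adamantium hardens it never breaks, so don't even think about smashing the bones.

Tarlach then asks a favor: he wants you to bring him Magic Powder of Preservation so that the memory held in his broken glasses is kept intact. The powder is said to keep an object from breaking any further (would even Ferghus's weapon-breaking fail against it?), and he says that if you get it for him he will cast the preservation spell on his glasses and give them to you.

To get it you must head to Fiodh Dungeon. Fiodh Dungeon has a single floor with the boss room on that floor, so you might expect a quick raid, but monsters with auto-defense make it quite difficult. I recommend going in a party with people you know or, as I did, hiring mercenaries.

The tough monsters are a bull monster called the Gorgon and a monster called the Werewolf. Both have auto-defense, and with the Werewolf most of the damage from each hit is taken as wounds, so getting by on potions alone is very hard. Note that the Magic Powder of Preservation comes from the reward room, not from a monster.

Once you have the powder, go back to Tarlach, hand it over, and collect the memorial item and a Red Wings of a Goddess. Note that this memorial item must be used in Rabbie Dungeon, and moreover while in a party of three.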

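Put this memorial item in at Rabbie Dungeon while in a three-person party and you once again get to play as the Three Warriors. In this RP the roles of Ruairi and Mari are especially important, and Mari plays with a new skill. The dungeon you play in is labeled 'Albey Dungeon', and its theme music differs from that of other dungeons.

The Three Warriors call this place 'Tir Na Nog', but it is unclear whether Albey Dungeon is the road to Tir Na Nog or whether the lost Three Warriors mistook Albey Dungeon for Tir Na Nog. It has two floors in all; the first floor has a fair number of rooms, but the second has only two.

Entering the boss room switches to an event scene. A very shocking fact comes out in that event (what is it? That's a secret!), you obtain a new keyword, and the RP ends.

It took me a mere 3 seconds to ruin Tarlach's image. (runs away)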

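Now you have to ask Tarlach, Duncan, and the other NPCs of Erinn about this keyword. How the story proceeds from here is not spelled out. One person said they got the next step only after talking to every NPC in Erinn and then logging in about 50 times; another got it after 23 logins.

In my case, though, I asked only the NPCs of Tir Chonaill and Dunbarton rather than all of Erinn, and the goddess Morrighan appeared right after I happened to log out and back in just once (emphasis!!!) because of the camping penalty. That suggests two guesses: one is that it happened because I relogged while in a resting state, and the other is that I am beloved of the goddess Morrighan. (-_-)

After this dream, the Goddess's Pendant is placed in your hand and a new keyword appears. Ask Duncan about it and you get the keyword 'How to get to Tir Na Nog', and he tells you to find out from Tarlach how to get there. But when you go to Tarlach, he won't tell you.

Go back to Duncan and he tells you to find out quickly, whether from Tarlach or from Kristell. Talk to Kristell and she says you must be level 40 or higher. (In short, it seems to mean 'get lost, noob.' -_-) Kristell and Tarlach each also say something telling... Kristell says 'not even the guide of the Soul Stream can help you.'

Interpreted, that presumably means Nao's revival will not work. Tarlach, for his part, says 'it is a place that works differently from Erinn, and no one but a companion who entered with you at the start can revive you.' So presumably reviving at the last goddess statue is not possible and only revival by a companion works.

Since my level is still 34, I could not take the scenario any further. (Currently on an endless pilgrimage through Ciar on Mandolin channels 4 and 5.) One thing I definitely learned from playing the main scenario is that it takes considerable patience and that a lot of money (for Wax Wings) will drain away.

Still, the main scenario can offer something fresh to players worn out by endless hunting. (Though once you reach the Book of Revenge quests it may backfire...)

Recommended for players with a great deal of patience. ^^

- Trainee reporter 마나신궁, dragged over from Lineage GA to write walkthroughs -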

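Since the official service launch on June 22, many new outfits have been added to Mabinogi.

As everyone knows, all but a handful of those outfits fell short of players' expectations in design and function, and players are ignoring them.

According to the July 6 update, outfits actually sold by `Eigenpost`, a partner of `devCAT`, will be sold in the game as unique items for a limited time (until July 20).

The items added this time are one for men and one for women, and although they have no options, they are expensive simply because they are unique items, so players' reaction has not been very welcoming.

※ Original outfits : http://www.mabinogi.com/4th/event_flff(b03).asp

Menswear: Eigen Boyish Casual

Womenswear: Eigen Girlish Casual

Oddly, although the men's and women's outfits cost the same, their durability differs by as much as 10.

Lines and location of the NPC selling Eigenpost items

Ads for real companies have often been inserted into games, but bringing a clothing maker's products directly into a game is very rare. Is this kind of advertising possible because Mabinogi sets its world somewhere between the medieval and the modern era? If the ads work well, it seems quite likely that more Eigenpost outfits will be added.

`devCAT`'s habit of always trying something new, with ideas that stand out from other online games, deserves high marks. Players are split, however: some ask, "Why not spend the time you put into adding items like this on stabilizing the servers and reducing lag?", while others say, "Maybe the partner's support is meant to give us a better environment to play in."

Some call Mabinogi a second-generation MMORPG in which community matters. They say that character outfits therefore hold a very important place in the game, and that players will welcome it only if there is a wide variety of outfits for dressing up their characters.

But isn't stabilizing the game servers and adjusting balance more urgent than offering varied events and new items? It will be worth watching which path `devCAT`, which aims for a true fantasy life, chooses from here.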

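** 3D Engines **
copyrightⓒ 김성완(찐빵귀신) [December 21, 1999]

1. Commercial 3D Engines

In Korea, most 3D games are developed on top of Direct3D, and few studios have a 3D engine in the full sense of the term.

Whether you buy a commercial 3D engine or use a freely available one, you will struggle to make proper use of it unless you have the knowledge and skill to develop a basic 3D engine yourself.

In other words, unless you are capable of writing a 3D engine yourself and are buying a commercial one purely to save time, you will buy it, flounder because you don't know how to use it, and end up spending about as much time as developing an engine would have taken. Even with a grounding in 3D programming, it takes a good six months for the whole development team to become familiar enough with a commercial 3D engine to use it to the full.
A commercial 3D engine does not automatically crank out a 3D game for you...

There is the sad story that the Rainbow Six team, too, licensed a 3D engine, developed with it, and in the end wound up writing their own.

Unless you are given a generous 2-3 years of development time, trying to make do with someone else's engine means you barely have time to analyze it properly. You have to apply and test its features one by one, time is hopelessly short, and programs inevitably have bugs. It is hard enough to debug code you wrote yourself, never mind someone else's...

Also, when you license a commercial 3D engine, you must license the source as well.
But licensing the source usually sends the license fee way up.
For reference, Monolith's engine runs around $250,000...
The Quake or Unreal engine is rumored to cost around a million dollars,
but my guess is more like $500,000.
At the very low end there is 'PowerRender', which costs $10,000 with the source license.
But you get what you pay for.

Even if you use someone else's engine for now, taking the long view you will need to invest at least a year or so in developing a 3D engine of your own.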
