Games Look Bad, Part 1: HDR and Tone Mapping

This is Part 1 of a series examining techniques used in game graphics and how those techniques fail to deliver a visually appealing end result. See Part 0 for a more thorough explanation of the idea behind it.

High dynamic range. First experienced by most consumers in late 2005, with Valve’s Half-Life 2: Lost Coast demo. Largely faked at the time due to technical limitations, but it laid the groundwork for something we take for granted in nearly every blockbuster title. The contemporaneous reviews were nothing short of gushing. We’ve been busy making a complete, god-awful mess of it ever since.

Let’s review, very quickly. In the real world, the total contrast ratio between the brightest highlights and darkest shadows during a sunny day is on the order of 1,000,000:1. We would need 20 bits of just luminance to represent those illumination ranges, before even including color in the mix. A typical DSLR can record 12-14 bits (16,000:1 in ideal conditions). A typical screen can show 8 (curved to 600:1 or so). Your eyes… well, it’s complicated. Wikipedia claims 6.5 (100:1) static. Others disagree.
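
As a sanity check on those numbers, the bit counts fall straight out of a base-2 logarithm, one bit per doubling of intensity. This is a rough sketch; the exact figures depend on how you define the ranges and on the encoding curve.

#include <cmath>
#include <cstdio>

int main()
{
    // Bits (or stops) needed to span a given contrast ratio, assuming a
    // linear encoding with one bit per doubling of intensity.
    const double ratios[] = { 1000000.0, 16000.0, 600.0, 100.0 };
    for (double ratio : ratios)
        printf("%9.0f:1 -> %.1f bits\n", ratio, std::log2(ratio));
    // Prints roughly 19.9, 14.0, 9.2, and 6.6 bits respectively.
    return 0;
}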

Graphics programmers came up with HDR and tone mapping to solve the problem. Both film and digital cameras have this same issue, after all. They have to take enormous contrast ratios at the input, and generate sensible images at the output. So we use HDR to store the giant range for lighting computations, and tone maps to collapse the range to screen. The tone map acts as our virtual “film”, and our virtual camera is loaded with virtual film to make our virtual image. Oh, and we also throw in some eye-related effects that make no sense in cameras and don’t appear in film for good measure. Of course we do.
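
In code, that whole virtual-film pipeline is surprisingly small. Here is a minimal sketch of its shape, not any particular engine’s implementation: lighting accumulates in a linear floating-point buffer, an exposure is applied, a tone curve collapses the range, and the result is gamma encoded and quantized to 8 bits. The curve shown is the classic Reinhard operator, which comes up again below.

#include <algorithm>
#include <cmath>
#include <cstdint>

// Collapse an open-ended linear HDR value into [0, 1).
// Reinhard's classic operator: the simplest "virtual film" you can load.
float TonemapReinhard(float c)
{
    return c / (1.0f + c);
}

// One channel of the HDR -> display chain: exposure, tone map, gamma, quantize.
uint8_t HdrToDisplay(float linearHdr, float exposure)
{
    float exposed = linearHdr * exposure;          // virtual camera exposure
    float mapped  = TonemapReinhard(exposed);      // virtual film response
    float encoded = std::pow(mapped, 1.0f / 2.2f); // rough gamma encode (sRGB-ish)
    return (uint8_t)std::lround(std::clamp(encoded, 0.0f, 1.0f) * 255.0f);
}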

And now, let’s marvel at the ways it goes spectacularly wrong.

[Screenshots: Battlefield 1, Uncharted: Lost Legacy, Call of Duty: Infinite Warfare, Horizon Zero Dawn]

In order: Battlefield 1, Uncharted: Lost Legacy, Call of Duty: Infinite Warfare, and Horizon Zero Dawn. HZD is a particular offender in the “terrible tone map” category and it’s one I could point to all day long. And so we run head first into the problem that plagues games today and will drive this series throughout: at first glance, these are all very pretty 2017 games and there is nothing obviously wrong with the screenshots. But all of them feel videogamey and none of them would pass for a film or a photograph. Or even a reasonably good offline render. Or a painting. They are instantly recognizable as video games, because only video games try to pass off these trashy contrast curves as aesthetically pleasing. These images look like a kid was playing around in Photoshop and maxed the Contrast slider. Or maybe that kid was just dragging the Curves control around at random.

The funny thing is, this actually has happened to movies before.

[Screenshot: Smaug in The Hobbit]

Hahaha. Look at that Smaug. He looks terrible. Not terrifying. This could be an in-game screenshot any day. Is it easy to pick on Peter Jackson’s The Hobbit? Yes, it absolutely is. But I think it serves to highlight that while technical limitations are something we absolutely struggle with in games, there is a fundamental artistic component here that is actually not that easy to get right even for film industry professionals with nearly unlimited budgets.

Allow me an aside here into the world of film production. In 2006, the founder of Oakley sunglasses decided the movie world was disingenuous in their claims of what digital cameras could and could not do, and set out to produce a new class of cinema camera with higher resolution, higher dynamic range, higher everything than the industry had, one that would exceed the technical capabilities of film in every regard. The RED One 4K was born, largely accomplishing its stated goals and being adopted almost immediately by one Peter Jackson. Meanwhile, a cine supply company founded in 1917 called Arri decided it didn’t give a damn about resolution, and shipped the 2K Arri Alexa camera in 2010. How did it go? At the 2015 Oscars, four of the five nominees in the cinematography category were photographed using the ARRI Alexa. Happy belated 100th birthday, Arri.

So what gives? Well, in the days of film there was a lot of energy expended on developing the look of a particular film stock. It’s not just chemistry; color science and artistic qualities played heavily into designing film stocks, and good directors/cinematographers would (and still do) choose particular stocks to get the right feel for their productions. RED focused on exceeding the technical capabilities of film, leaving the actual color rendering largely in the hands of the studio. But Arri? Arri focused on achieving the distinctive feel and visual appeal of high quality films. They better understood that even in the big budget world of motion pictures, color rendering and luminance curves are extraordinarily difficult to nail. They perfected that piece of the puzzle and it paid off for them.

Let’s bring it back to games. The reality is, the tone maps we use in games are janky, partly due to technical limitations. We’re limited to a 1D luminance response where real film produces both hue and saturation shifts. The RGB color space is a bad choice to be doing this in the first place. And because nobody in the game industry has an understanding of film chemistry, we’ve all largely settled on blindly using the same function that somebody somewhere came up with. It was Reinhard in years past, then it was Hable, now it’s ACES RRT. And it’s stop #1 on the train of Why does every game this year look exactly the goddamn same?
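
For reference, here are the curves being talked about: John Hable’s Uncharted 2 “filmic” operator and a widely circulated analytic fit to the ACES RRT+ODT (commonly attributed to Krzysztof Narkowicz). Coefficients are as commonly published, so treat this as a sketch of the shapes under discussion rather than a drop-in copy of any particular game’s pipeline.

#include <algorithm>

// Hable / Uncharted 2 filmic operator (coefficients as published in his talk).
float HableCurve(float x)
{
    const float A = 0.15f, B = 0.50f, C = 0.10f, D = 0.20f, E = 0.02f, F = 0.30f;
    return ((x * (A * x + C * B) + D * E) / (x * (A * x + B) + D * F)) - E / F;
}

float TonemapHable(float x)
{
    const float whitePoint = 11.2f;                 // linear value that maps to white
    return HableCurve(x) / HableCurve(whitePoint);  // normalize so whitePoint -> 1.0
}

// Widely used analytic fit to the ACES RRT+ODT curve.
float TonemapAcesFit(float x)
{
    const float a = 2.51f, b = 0.03f, c = 2.43f, d = 0.59f, e = 0.14f;
    return std::clamp((x * (a * x + b)) / (x * (c * x + d) + e), 0.0f, 1.0f);
}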

The craziest part is we’re now at the point of real HDR televisions showing game renders with wider input ranges. Take this NVIDIA article which sees the real problem and walks right past it. The ACES tone map is destructive to chroma. Then they post a Nikon DSLR photo of a TV in HDR mode as a proxy for how much true HDR improves the viewing experience. Which is absolutely true – but then why does the LDR photo of your TV look so much better than the LDR tone map image? There’s another tone map in this chain which nobody thought to examine: Nikon’s. They have decades of expertise in doing this. Lo and behold, their curve makes a mockery of the ACES curve used in the reference render. Wanna know why that is? It’s because the ACES RRT was never designed to be an output curve in the first place. Its primary design goal is to massage differences between cameras and lenses used on set so they match better. You’re not supposed to send it to screen! It’s a preview/baseline curve which is supposed to receive a film LUT and color grading over top of it.

“Oh, but real games do use a post-process LUT color grade!” Yeah, and we screwed that up too. We don’t have the technical capability to run real film industry LUTs in the correct color spaces, we don’t have good tools to tune our own, they’re stuck doing double duty for both the “filmic look” and the color grading, and the person doing it usually doesn’t have the training or background for it. Meanwhile, it’s extraordinary what an actual trained human can do after the fact to fix these garbage colors. Is he cheating by doing per-shot color tuning that a dynamic scene can’t possibly accomplish? Yes, obviously. But are you really going to tell me that any of these scenes from any of these games look like they are well balanced in color, contrast, and overall feel?
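
For anyone who hasn’t touched one, the post-process grading LUT itself is mechanically trivial; the entire fight is over what gets baked into it and in which color space. Here is a minimal sketch of the lookup, assuming a small N×N×N table sampled with trilinear interpolation (real pipelines do this on the GPU with a 3D texture):

#include <algorithm>
#include <vector>

struct Color { float r, g, b; };

// A cubic lookup table of size N*N*N mapping input RGB -> graded RGB.
struct Lut3D
{
    int size;                 // N, commonly 16, 32, or 64
    std::vector<Color> data;  // indexed as [r + N * (g + N * b)]

    Color At(int r, int g, int b) const
    {
        return data[r + size * (g + size * b)];
    }
};

static Color Lerp(Color a, Color b, float t)
{
    return { a.r + (b.r - a.r) * t, a.g + (b.g - a.g) * t, a.b + (b.b - a.b) * t };
}

// Trilinear sample of the LUT; input channels assumed to be in [0, 1].
Color ApplyLut(const Lut3D& lut, Color c)
{
    float fr = std::clamp(c.r, 0.0f, 1.0f) * (lut.size - 1);
    float fg = std::clamp(c.g, 0.0f, 1.0f) * (lut.size - 1);
    float fb = std::clamp(c.b, 0.0f, 1.0f) * (lut.size - 1);
    int r0 = (int)fr, g0 = (int)fg, b0 = (int)fb;
    int r1 = std::min(r0 + 1, lut.size - 1);
    int g1 = std::min(g0 + 1, lut.size - 1);
    int b1 = std::min(b0 + 1, lut.size - 1);
    float tr = fr - r0, tg = fg - g0, tb = fb - b0;

    // Interpolate along r, then g, then b.
    Color c00 = Lerp(lut.At(r0, g0, b0), lut.At(r1, g0, b0), tr);
    Color c10 = Lerp(lut.At(r0, g1, b0), lut.At(r1, g1, b0), tr);
    Color c01 = Lerp(lut.At(r0, g0, b1), lut.At(r1, g0, b1), tr);
    Color c11 = Lerp(lut.At(r0, g1, b1), lut.At(r1, g1, b1), tr);
    return Lerp(Lerp(c00, c10, tg), Lerp(c01, c11, tg), tb);
}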

Of course, while we’re all running left, Nintendo has always had a fascinating habit of running right. I can show any number of their games for this, but Zelda: Breath of the Wild probably exemplifies it best when it comes to HDR.

[Screenshot: The Legend of Zelda: Breath of the Wild]

No HDR. No tone map. The bloom and volumetrics are being done entirely in LDR space. (Or possibly in 10 bit. Not sure.) Because in Nintendo’s eyes, if you can’t control the final outputs of the tone mapped render in the first place, why bother? There’s none of that awful heavy handed contrast. No crushed blacks. No randomly saturated whites in the sunset, and saturation overall stays where it belongs across the luminance range. The game doesn’t do that dynamic exposure adjustment effect that nobody actually likes. Does stylized rendering help? Sure. But you know what? Somebody would paint this. It’s artistic. It’s aesthetically pleasing. It’s balanced in its transition from light to dark tones, and the over-brightness is used tastefully without annihilating half the sky in the process.

Now I don’t think that everybody should walk away from HDR entirely. (Probably.) There’s too much other stuff we’ve committed to which requires it. But for god’s sake, we need to fix our tone maps. We need to find curves that are not so aggressively desaturating. We need curves that transition contrast better from crushed blacks to mid-tones to blown highlights. LUTs are garbage in, garbage out and they cannot be used to fix bad tone maps. We also need to switch to industry standard tools for authoring and using LUTs, so that artists have better control over what’s going on and can verify those LUTs outside of the rendering engine.
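
To make the desaturation complaint concrete, here is a sketch of the two basic ways a scalar curve gets applied. Run it per channel and bright saturated colors skew toward white, which is the hue and saturation shift being complained about; run it on luminance only and the chroma ratios survive, at the cost of having to handle out-of-range channels some other way. Neither is a complete answer; that gap is exactly where a real film-style response would live.

struct Color { float r, g, b; };

// Any scalar tone curve, e.g. the Reinhard or ACES-fit sketches above.
typedef float (*ToneCurve)(float);

// Per-channel application: simple, but compresses the channels unevenly,
// which is where the hue shifts and desaturated highlights come from.
Color TonemapPerChannel(Color c, ToneCurve curve)
{
    return { curve(c.r), curve(c.g), curve(c.b) };
}

// Luminance-only application: compute a luma, map it, and rescale the color,
// preserving the chroma ratios (Rec. 709 luma weights).
Color TonemapLuminance(Color c, ToneCurve curve)
{
    float luma = 0.2126f * c.r + 0.7152f * c.g + 0.0722f * c.b;
    if (luma <= 0.0f)
        return { 0.0f, 0.0f, 0.0f };
    float scale = curve(luma) / luma;
    return { c.r * scale, c.g * scale, c.b * scale }; // may still exceed 1.0 per channel
}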

In the meantime, the industry’s heavy hitters are just going to keep releasing this kind of over-contrasty garbage.

[Screenshot]

Before I finish up, I do want to take a moment to highlight some games that I think actually handle HDR very well. First up is Resident Evil 7, which benefits from a heavily stylized look that over-emphasizes contrast by design.

[Screenshot: Resident Evil 7]

That’s far too much contrast for any normal image, but because we’re dealing with a horror game it’s effective in giving the whole thing an unsettling feel that fits the setting wonderfully. The player should be uncomfortable with how the light and shadows collide. This particular scene places the jarring transition right in your face, and it’s powerful.

Next, at risk of seeming hypocritical I’m going to say Deus Ex: Mankind Divided (as well as its predecessor).

[Screenshot: Deus Ex: Mankind Divided]

The big caveat with DX is that some scenes work really well. The daytime outdoors scenes do not. The night time or indoor scenes that fully embrace the surrealistic feeling of the world, though, are just fantastic. Somehow the weird mix of harsh blacks and glowing highlights serves to reinforce the differences between the bright and dark spots that the game is playing with thematically throughout. It’s not a coincidence that Blade Runner 2049 has many similarities. Still too much contrast though.

Lastly, I’m going to give props to Forza Horizon 3.

[Screenshot: Forza Horizon 3]


Let’s be honest: cars are “easy mode” for HDR. They love it. But there is a specific reason this image works so well. It is low contrast. Nearly all of it lives in the mid-tones, with only a few places wandering into deep shadow (notably the trees) and almost nothing in the bright highlights. But the image is low contrast because cars themselves tend to use a lot of black accents and dark regions which are simply not visible when you crush the blacks as we’ve seen in other games. Thus the toe section of the curve is lifted much more than we normally see. Similarly, overblown highlights mean whiting out the car in the specular reflections, which are big and pretty much always image based lighting for cars. It does no good to lose all of that detail, but the entire scene benefits from the requisite decrease in contrast. The exposure level is also noticeably lower, which actually leaves room for better mid-tone saturation. (This is also a trick used by Canon cameras, whose images you see every single day.) The whole image ends up with a much softer and more pleasant look that doesn’t carry the inherent stress we find in the images I criticized at the top. If we’re looking for an exemplar for how to HDR correctly in a non-stylized context, this is the model to go by.

Where does all this leave us? With a bunch of terrible looking games, mostly. There are a few technical changes we need to make right up front, from basic decreases in contrast to simple tweaks to the tone map to improved tools for LUT authoring. But as the Zelda and Forza screenshots demonstrate, and as the Hobbit screenshot warns us, this is not just a technical problem. Bad aesthetic choices are being made in the output stages of the engine that are then forced on the rest of the creative process. Engine devs are telling art directors that their choice of tone map is one of three presets, two of which are legacy options. Is it bad art direction or bad graphics engineering? It’s both, and I suspect both departments are blaming the other for it. The tone map may be at the end of the graphics pipeline, but in film production it’s the first choice you make. You can’t make a movie without loading film stock in the camera, and you only get to make that choice once (digital notwithstanding). Don’t treat your tone map as something to tweak around the edges when balancing the final output LUT. Don’t just take someone else’s conveniently packaged function. The tone map’s role exists at the beginning of the visual development process and it should be treated as part of the foundation for how the game will look and feel. Pay attention to the aesthetics and visual quality of the map upfront. In today’s games these qualities are an afterthought, and it shows.

UPDATE: User “vinistois” on HackerNews shared a screenshot from GTA 5 and I looked up a few others. It’s very nicely done tone mapping. Good use of mid-tones and contrast throughout with great transitions into both extremes. You won’t quite mistake it for film, I don’t think, but it’s excellent for something that is barely even a current gen product. This is proof that we can do much better from an aesthetic perspective within current technical and stylistic constraints. Heck, this screenshot isn’t even from a PC – it’s the PS4 version.

[Screenshot: Grand Theft Auto V (PS4)]

Games Look Bad, Part 0: Explanation and Defense

I’m about to start a series of blog posts called Games Look Bad. Before I start throwing stones from my glass house over here, I wanted to offer an explanation of what I’m doing and a defense of why I’m doing it.

There’s no doubt that we’ve seen a sustained and significant period of improvement in real-time computer graphics over the past three decades. We’ve made significant advances in nearly every aspect of visual look and feel,  drawing quite a bit from the film industry in the process. So why the heck do most games look so bad?

Games are technically much more sophisticated than ever before, but I’m going to stake out a claim: aesthetically something has gone quite wrong, and the products don’t live up to the hype. Show me a next-gen, cutting edge game and I will show you an image that no competent film industry professional would ever deem acceptable. Why not? The answer lives at the crossroads of art and technology, a strange neglected intermediary which we in the industry tend to avoid talking about. Particularly in the last ten years, several new techniques have appeared that are foundational to practically every high end game on the market. These are well documented from a technical standpoint, and it’s generally assumed that graphics programmers who have stayed current are fluent in at least the basic goals and implementations of these techniques, if not the finer points of them. I won’t labor to build a complete list, but you likely know them: normal maps, HDR/tonemaps, physically based shading, volumetrics, DoF/bokeh, etc.

What’s extremely difficult to find, though, is a discussion of how to make these techniques visually appealing. Oh sure, we’ll sort of handwave it from time to time, but graphics programmers as a set don’t like talking about visual appeal in the way that artists do. It’s much easier to build the tools and then let the artists make it pretty. Except the artists, even the tech artists, don’t always have the know-how or mathematical tools to solve that problem. Sometimes we end up borrowing our looks from someone else – how many of you have googled FilmLut.tga? How many of you are using Unreal’s tone map operator, tweaked or even verbatim?

This series is going to take a sharply critical tone towards most AAA games being shipped today, because it’s my belief that there are fundamental problems with many of the techniques we’re using today that reach beyond strictly technical constraints. Graphics programmers and engines are implementing many techniques for new effects without taking the time or energy to properly refine the visual and aesthetic aspects of those effects. Marketing tells us we should be impressed by all the new features, yet when you take a step back from the fact that these are games and evaluate the images without that context, they look horrible. This is a problem that is fixable today, with current technology.

I don’t know if my thesis here is particularly well developed, but it’s a good excuse for the meat of this series. I don’t want to talk about how to implement techniques. There are many people who have done an excellent job of that and you should have that background coming in. I’m going to talk about the visual choices we make in these techniques, how they make our games better, how they make our games worse, and whether we’re using them well. I’m going to encourage everyone to think critically about why and how we’re implementing the things that make modern games tick, and examine the tunnel vision that has afflicted that process maybe since the beginning. And in the process, I’m going to criticize people’s work which far exceeds my own in every respect, while largely failing to provide solutions to problems. I know that and I accept it. And that is where we shall start.

Quick tip: Retina mode in iOS OpenGL rendering is not all-or-nothing

Some of you are probably working on Retina support and performance for your OpenGL based game for iOS devices. If you’re like us, you’re probably finding that a few of the devices (*cough* iPad 3) don’t quiiite have the GPU horsepower to drive your fancy graphics at retina resolutions. So now you’re stuck with 1x and 4x MSAA, which performs decently well but frankly looks kind of bad. It’s a drastic step down in visual fidelity, especially with all the alpha blend stuff that doesn’t antialias. (Text!) Well it turns out you don’t have to choose such a drastic step. Here’s the typical enable-retina code you’ll find on StackOverflow or whatever:

if([[UIScreen mainScreen] respondsToSelector:@selector(scale)] && [[UIScreen mainScreen] scale] == 2)
{
    self.contentScaleFactor = 2.0;
    eaglLayer.contentsScale = 2.0;
}


//some GL setup stuff
...

//get the correct backing framebuffer size
int fbWidth, fbHeight;
glGetRenderbufferParameteriv(GL_RENDERBUFFER, GL_RENDERBUFFER_WIDTH, &fbWidth);
glGetRenderbufferParameteriv(GL_RENDERBUFFER, GL_RENDERBUFFER_HEIGHT, &fbHeight);

The respondsToSelector bit is pretty token nowadays – what was that, iOS 3? But there’s not much to it. Is the screen a 2x scaled screen? Great, set our view to 2x scale also. Boom, retina. Then we ask the GL runtime what we are running at, and set everything up from there. The trouble is it’s a very drastic increase in resolution, and many of the early retina devices don’t have the GPU horsepower to really do nice rendering. The pleasant surprise is, the scale doesn’t have to be 2.0. Running just a tiny bit short on fill?

if([[UIScreen mainScreen] respondsToSelector:@selector(scale)] && [[UIScreen mainScreen] scale] == 2)
{
    self.contentScaleFactor = 1.8;
    eaglLayer.contentsScale = 1.8;
}

Now once you create the render buffers for your game, they’ll appear at 1.8x resolution in each direction, which is very slightly softer than 2.0 but much, much crisper than 1.0. I waited until after I Am Dolphin cleared the Apple App Store approval process to make sure that they wouldn’t red-flag this usage. Now that it’s out, I feel fairly comfortable sharing it. This can also be layered with multisampling (which I’m also doing) to fine tune the look of poly edges that would otherwise give away the trick. I use this technique to get high resolution, high quality sharp rendering at 60 fps across the entire range of Apple devices, from the lowly iPhone 4S, iPod 5, and iPad 3 on up.
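
For a sense of the savings, here is a rough sketch of the fill-rate math, assuming the iPad 3’s 1024×768 point canvas (UIKit will round the backing store to whole pixels, so treat the exact figures as approximate):

#include <cstdio>

int main()
{
    // Point-space size of a full-screen iPad view; the backing store scales
    // with contentScaleFactor. Fractional scales trade sharpness for fill rate.
    const int pointsW = 1024, pointsH = 768;
    const float scales[] = { 1.0f, 1.8f, 2.0f };
    for (float s : scales)
    {
        int w = (int)(pointsW * s), h = (int)(pointsH * s);
        printf("scale %.1f -> %4d x %4d = %7d pixels\n", s, w, h, w * h);
    }
    // A scale of 1.8 renders roughly 81% of the pixels of 2.0 (1.8^2 / 2.0^2),
    // while staying far closer to native retina sharpness than 1.0.
    return 0;
}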

A Glimpse of What I’m Working On

I’ve decided to focus a little less on complaining and a little more on the actual work I do. Here’s a teaser:
Monitor array
I had a substantial amount of help with the over-water environmental rendering (not pictured) from a friend of mine, Nauful Shaikh. See his site for some great graphics work.

This wall of monitors was graciously made available to us by the Computer Science department for a presentation to the President of the University as well as a healthy mix of department chairs from Neuroscience, Neurology, Brain Sciences Institute, Computer Science, and Electrical/Computer Engineering at Johns Hopkins. I’m driving it at 60fps off a single 7970 in Eyefinity 6. It was supposed to be Crossfire but somebody’s driver is broken *cough cough* so I had to gut the render pipeline somewhat. Total resolution is 5760×2160 plus some margins for bezel compensation. The actual app is Kinect and PS Move enabled, and maybe I can share more about it this summer. The focus is a dolphin which we’ve developed with significant help and guidance from the National Aquarium in Baltimore, who let us work directly with their dolphins to better understand the animals, how they move and think, etc.

We’re planning to launch an iPad version this year on the iTunes App Store, and create a large scale interactive installation version for aquariums, hospitals, museums and similar at 4K resolution in stereoscopic 3D.

Follow-up on DirectX/XNA

Received today, and hopefully the “you can quote me” part means this is an exception to NDA because it’s important:

The message said “DirectX is no longer evolving as a technology.” That is definitely not true in any way, shape or form. Microsoft is actively investing in DirectX as the unified graphics foundation for our key platforms, including Xbox 360, Windows Phone and Windows. DirectX is evolving and will continue to evolve. For instance, right now we’re investing in some very cool graphics code authorizing [sic] technology in Visual Studio. We have absolutely no intention of stopping innovation with DirectX, and you can quote me on that. 🙂

My intent was not to start a firestorm of questioning on DirectX’s future viability, and I said up-front that I felt that communication was poorly worded with regards to intent. My frustrations were also apparently poorly worded. Since I accidentally launched this, let’s clear up a few things.

Number One: In the absolute (and implausible) worst case scenario that MS really scales back their Direct3D support to a minimum, that situation is still better than OpenGL. The Direct3D system is a technically superior piece of technology, and support for working with it is still better than OpenGL whether you’re a hobbyist or a pro. I cannot emphasize this point enough, so for the love of god stop bringing up OpenGL. It’s a badly designed API and has been since I started doing this in 2000.
Number Two: A new picture is coming into focus that shifts a lot of the DirectX SDK’s burden onto VS. This hasn’t been made previously clear to us on the MVP side. As I’ve begun to explore the tools already inside VS 2012, I like what I’m seeing. It’ll take some time to see how it all plays out, but in a very real way having Direct3D integrated into core VS development is a serious promotion.
Number Three: There’s more content in today’s email regarding XNA which I don’t care to share, thanks to a stern NDA reminder. (Ironically, when MS finally gives us what they should be saying to the public all along, I can’t share it.) But this is very much a case of “put up or shut up” and defending XNA’s status as a serious technology seems patently ridiculous to me right now. The community, whether it’s my work or someone else’s, has stepped in to integrate .NET and DirectX for many wonderful use cases. But there are things we can’t do (like Xbox) and it’s clear that matters to a lot of people. It’s not clear that it matters to Microsoft.

That said, I am not walking back my actual complaints about how DirectX and XNA are being handled. I like the work that’s been done in integrating VS and DirectX, which is arguably many years overdue. That doesn’t make everything else okay. The fact that we’re having this discussion, the fact that my dashed off blog post exploded on Twitter, the fact that clarification had to be written up behind the scenes — this is a problem. Which brings me at long last to the actual point I was trying to make yesterday:

As developers, we need Microsoft to communicate clearly with us, in public. As MVPs we were asked to act as community representatives, to guide everyone interested in the tech and have an open line on future development. Apparently that means we get half-hearted, vague emails from time to time that dodge our serious questions and cast further doubt about the status of the technology and teams, all covered by an NDA agreement. And then, shockingly enough, people get the wrong idea. We’re sitting on the outside, trying to play this stupid guessing game of “which Microsoft technology is alive?” XNA doesn’t support DirectX 10+ or Windows 8, but it’s still a “supported product”, as if that means anything in the real world. Windows XP is still a “supported product” too.

It shouldn’t take a leaked email to force a straight answer.

DirectX/XNA Phase Out Continues


Please read the follow up post.

This email was sent out to DirectX/XNA MVPs today:

The XNA/DirectX expertise was created to recognize community leaders who focused on XNA Game Studio and/or DirectX development. Presently the XNA Game Studio is not in active development and DirectX is no longer evolving as a technology. Given the status within each technology, further value and engagement cannot be offered to the MVP community. As a result, effective April 1, 2014 XNA/DirectX will be fully retired from the MVP Award Program.

There’s actually a fair bit of information packed in there, and I think some of it is poorly worded. The most stunning part of it was this: “DirectX is no longer evolving as a technology.” That is a phrase I did not expect to hear from Microsoft. Before going to “the sky is falling” proclamations, I don’t think this is a death sentence for DirectX, per se. It conveys two things. Number one, DirectX outside of Direct3D is completely dead. I hope this is not a shock to you. Number two, it’s a reminder that Direct3D has been absorbed into Windows core, and thus is no more a “technology” than GDI or Winsock.

Like I said, poorly worded.

There are a few other things packed in there. XNA Game Studio is finished. That situation has been obvious for years now, so it also should not really come as a surprise either. And finally the critical point for me: our “MVP” role as community representatives and assistants is appreciated but no longer necessary. On this point, the writing has been on the wall for some time and so I should not be surprised. But I am. Maybe dismayed is a better word.

As I’ve said previously, I don’t feel that the way DirectX has been handled in recent years has been a positive thing. A number of technical decisions were made that were unfortunate, and then a number of business and marketing type decisions were made that compounded the problem. Many of the technologies (DirectInput, DirectSound, DirectShow) have splayed into a mess of intersecting fragments intended to replace them. The amount of developer support for Direct3D from Microsoft has been unsatisfactory, and anecdotal reports of internal team status have not been promising. Somebody told me a year or two back that the HLSL compiler team was one person. That’s not something you want to hear, true or not. Worst of all, though, was the communication. That’s the part that bugs me.

When you are in charge of a platform, whatever that platform may be, developers invest in your platform tech. That’s time and money spent, and opportunity costs lost elsewhere. This is an expected aspect of software development. As developers and managers, we want as much information as possible in order to make the best short and long term decisions on what to invest in. We don’t want to rewrite our systems from scratch every few years. We don’t want to fall behind competitors due to platform limitations. Navigating these pitfalls is crucial to survival for us. Microsoft has a vested interest in some level of non-disclosure and secrecy about what they’re doing. All companies do. I understand that. But some back and forth is necessary in order for the relationship to be productive.

Look at XNA — there have been a variety of questions surrounding it for years, about the extent to which the technology and its associated marketplace were going to be taken seriously and forward into the future. It is clear at this juncture that there was no future and the tech was being phased out. Direct3D 10 was launched in late 2006, a bit over six years ago, yet XNA was apparently never going to be brought along with the major improvements in DWM and Direct3D. How long was it known internally at Microsoft that XNA was a dead-end? How many people would’ve passed over XNA if MS had admitted circa 2008 (or even 2010, when 4.0 was released) that there was no future for the tech? The official response, of course, was always something vague and generic: “XNA is a supported technology.” That means nothing in Microsoft world, because “it will continue to work in its current state for a while” is not a viable way for developers to stay current with their competition.

Just to be clear, I don’t attribute any of this fumbling to malice or bad faith. There’s a lot of evidence that this type of behavior is merely a delayed reflection of internal forces at Microsoft which are wreaking havoc on the company’s ability to compete in any space. But the simple ground truth is that we’re entering an era where Windows’ domination is openly in question, and a lot of us have the flexibility and inclination to choose between a range of platforms, whether those platforms are personal computers, game consoles, or mobile devices. Microsoft’s offer in that world is lock-in to Windows, in exchange for powerful integrated platforms like .NET which are far more capable than their competitors (eg Java, which is just pathetic). That was an excellent trade-off for many years. Looking back now, though? The Windows tech hegemony is a graveyard. XNA. Silverlight. WPF. DirectX. Managed C++. C++/CLI. Managed DirectX. Visual Basic. So when you guys come knocking and ask us to commit to Metro — sorry, the Windows 8 User Experience — and its associated tech?

You’ll understand if I am not in a hurry to start coding for your newest framework.

Before things get out of hand: No, you should not switch to OpenGL. I get to use it professionally every day and it sucks. Direct3D 11 with the Win8 SDK is a perfectly viable choice, much more so than OpenGL for high end development. None of the contents of my frequent complaints should imply in any way that OpenGL is a good thing.

Gamma FAQ

I am working on Part 2 of my Digital Color posts, but it won’t be ready for a while yet. The goal of that post is to talk all about luminance, brightness, gamma, and the various other attributes and properties of how light a color is, rather than what shade it is.

In the meantime, please accept my apology and consider reading this page I found: the Gamma FAQ by Charles Poynton.

C++-JSON Serialization

I’ve decided to share some code today, just because I’m such a nice guy. Those of you who enjoy the more perverse ways to apply C++ tricks will enjoy this. Those who prefer simpler, more primitive approaches (that’s not a bad thing) may not appreciate this creation as much. What I’ve got here is a utility class that makes it fairly straightforward to serialize C++ objects to and from JSON using the generally decent JsonCpp library. Hierarchies are properly saved and loaded with no real effort. It works well for us, and probably has plenty of limitations too. Maybe some of you out there will find it useful. It seems to be difficult to find decent serialization code that isn’t also somehow awful to use.

This lives in a single file, but the bad news is it takes boost dependencies in order to get type traits. I think everything I’m using from boost is added to C++ core as of TR1, but I haven’t checked. It also depends on JsonCpp, but changing it over to use other JSON, XML, binary, etc libraries shouldn’t be terribly difficult. I don’t know how this compares to other serialization libraries, but boost::serialization sounded like a train-wreck so I wrote my own.

Let’s cover usage first. Generally speaking, you’ll simply add a member function to a structure that declares the members to be serialized (free functions are allowed too). Each declaration is a string name for the value, and the variable to be serialized under that value. There are a few macros to combine those via the preprocessor. Values can also be serialized read-only or write-only. The serializer is able to traverse vectors and structures, and will produce nicely structured JSON. Here’s a sample:

void Serialize(Vector3D& vec, JsonSerializer& s)
{
	//each Vector3D is written as an array
	s.Serialize(0, vec.x);
	s.Serialize(1, vec.y);
	s.Serialize(2, vec.z);
}

struct PathSave {

	bool LeftWall;
	bool RightWall;
	float LeftWallHeight;
	float RightWallHeight;

	vector<Vector3D>	c_Center,
				c_Left,
				c_Right;

	vector<int> hitPillarSingle_type;
	vector<struct PCC> hitPillarSingle_pcc;
	vector<Vector3D> hitPillarSingle_hh;

	int pathPointDensity;
	int pillarNum;
	
	void Serialize(JsonSerializer& s)
	{
		s.SerializeNVP(LeftWall);
		s.SerializeNVP(RightWall);
		s.SerializeNVP(LeftWallHeight);
		s.SerializeNVP(RightWallHeight);
		
		s.Serialize("Center", c_Center);
		s.Serialize("Left", c_Left);
		s.Serialize("Right", c_Right);
		
		s.SerializeNVP(hitPillarSingle_pcc);
		s.SerializeNVP(hitPillarSingle_hh);
		s.SerializeNVP(hitPillarSingle_type);
		
		s.WriteOnly(NVP(pathPointDensity));
		s.ReadOnly(NVP(pillarNum));
	}
};

void SaveToFile(PathSave& path)
{
	JsonSerializer s(true);
	path.Serialize(s);
	std::string styled = s.JsonValue.toStyledString();
	printf("Saved data:\n%s\n", styled.c_str());
}

bool LoadFromFile(const char* filename, PathSave& path)
{
	std::string levelJson;
	bool result = PlatformHelp::ReadDocument(filename, levelJson);
	if(!result)
		return false;

	JsonSerializer s(false);
	Json::Reader jsonReader;
	bool parse = jsonReader.parse(levelJson, s.JsonValue);
	if(!parse)
		return false;
	
	path.Serialize(s);
	return true;
}

And that will generally produce something that looks like this:

{
    "LeftWall" : true,
    "LeftWallHeight" : 4.50,
    "RightWall" : true,
    "RightWallHeight" : 4.50,
    "Center" : [
    [ 0.05266714096069336, 0.0, -15.13085746765137 ],
    [ 0.1941599696874619, 0.0, 1.553306341171265 ],
    [ 0.5984783172607422, 0.0, 50.54330444335938 ]
    ],
    "Left" : [
    [ -10.44694328308105, 0.0, -15.04044914245605 ],
    [ -15.55506420135498, 0.0, 1.709598302841187 ],
    [ -8.466680526733398, 2.896430828513985e-07, 42.68054962158203 ]
    ],
    "Right" : [
    [ 10.55227851867676, 0.0, -15.22126579284668 ],
    [ 15.94338321685791, 0.0, 1.397014379501343 ],
    [ 9.663637161254883, -1.829234150818593e-07, 58.40605926513672 ]
    ],
    "hitPillarSingle_hh" : null,
    "hitPillarSingle_pcc" : null,
    "hitPillarSingle_type" : null,
    "pathPointDensity" : 24,
    "pillarNum" : 0,
}

Now I happen to think that’s fairly tidy, as far as C++ serialization goes. Symmetry is maintained between read and write steps, and there’s very little in the way of syntax magic. I do have a few macros in there (the stuff that says NVP), but they’re optional and I find that they clean things up. Now shield your eyes, because here is the actual implementation.

/*
 * Copyright (c) 2011-2012 Promit Roy
 * 
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 * copies of the Software, and to permit persons to whom the Software is
 * furnished to do so, subject to the following conditions:
 * 
 * The above copyright notice and this permission notice shall be included in
 * all copies or substantial portions of the Software.
 * 
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
 * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
 * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
 * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
 * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
 * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
 * THE SOFTWARE.
 */

#ifndef JSONSERIALIZER_H
#define JSONSERIALIZER_H

#include <json/json.h>
#include <boost/utility.hpp>
#include <boost/type_traits.hpp>
#include <string>
#include <vector>

class JsonSerializer
{
private:
	//SFINAE garbage to detect whether a type has a Serialize member
	typedef char SerializeNotFound;
	struct SerializeFound { char x[2]; };
	struct SerializeFoundStatic { char x[3]; };
	
	template<typename T, void (T::*)(JsonSerializer&)>
	struct SerializeTester { };
	template<typename T, void(*)(JsonSerializer&)>
	struct SerializeTesterStatic { };
	template<typename T>
	static SerializeFound SerializeTest(SerializeTester<T, &T::Serialize>*);
	template<typename T>
	static SerializeFoundStatic SerializeTest(SerializeTesterStatic<T, &T::Serialize>*);
	template<typename T>
	static SerializeNotFound SerializeTest(...);
	
	template<typename T>
	struct HasSerialize
	{
		static const bool value = sizeof(SerializeTest<T>(0)) == sizeof(SerializeFound);
	};
	
	//Serialize using a free function defined for the type (default fallback)
	template<typename TValue>
	void SerializeImpl(TValue& value,
						typename boost::disable_if<HasSerialize<TValue> >::type* dummy = 0)
	{
		//prototype for the serialize free function, so we will get a link error if it's missing
		//this way we don't need a header with all the serialize functions for misc types (eg math)
		void Serialize(TValue&, JsonSerializer&);
		
		Serialize(value, *this);
	}

	//Serialize using a member function Serialize(JsonSerializer&)
	template<typename TValue>
	void SerializeImpl(TValue& value, typename boost::enable_if<HasSerialize<TValue> >::type* dummy = 0)
	{
		value.Serialize(*this);
	}
	
public:
	JsonSerializer(bool isWriter)
	: IsWriter(isWriter)
	{ }
	
	template<typename TKey, typename TValue>
	void Serialize(TKey key, TValue& value, typename boost::enable_if<boost::is_class<TValue> >::type* dummy = 0)
	{
		JsonSerializer subVal(IsWriter);
		if(!IsWriter)
			subVal.JsonValue = JsonValue[key];
		
		subVal.SerializeImpl(value);
		
		if(IsWriter)
			JsonValue[key] = subVal.JsonValue;
	}
		
	//Serialize a string value
	template<typename TKey>
	void Serialize(TKey key, std::string& value)
	{
		if(IsWriter)
			Write(key, value);
		else
			Read(key, value);
	}
	
	//Serialize a non class type directly using JsonCpp
	template<typename TKey, typename TValue>
	void Serialize(TKey key, TValue& value, typename boost::enable_if<boost::is_fundamental<TValue> >::type* dummy = 0)
	{
		if(IsWriter)
			Write(key, value);
		else
			Read(key, value);
	}
	
	//Serialize an enum type to JsonCpp 
	template<typename TKey, typename TEnum>
	void Serialize(TKey key, TEnum& value, typename boost::enable_if<boost::is_enum<TEnum> >::type* dummy = 0)
	{
		int ival = (int) value;
		if(IsWriter)
		{
			Write(key, ival);
		}
		else
		{
			Read(key, ival);
			value = (TEnum) ival;
		}
	}
	
	//Serialize only when writing (saving), useful for r-values
	template<typename TKey, typename TValue>
	void WriteOnly(TKey key, TValue value, typename boost::enable_if<boost::is_fundamental<TValue> >::type* dummy = 0)
	{
		if(IsWriter)
			Write(key, value);
	}
	
	//Serialize a series of items by start and end iterators
	template<typename TKey, typename TItor>
	void WriteOnly(TKey key, TItor first, TItor last)
	{
		if(!IsWriter)
			return;
		
		JsonSerializer subVal(IsWriter);
		int index = 0;
		for(TItor it = first; it != last; ++it)
		{
			subVal.Serialize(index, *it);
			++index;
		}
		JsonValue[key] = subVal.JsonValue;
	}
	
	template<typename TKey, typename TValue>
	void ReadOnly(TKey key, TValue& value, typename boost::enable_if<boost::is_fundamental<TValue> >::type* dummy = 0)
	{
		if(!IsWriter)
			Read(key, value);
	}

	template<typename TValue>
	void ReadOnly(std::vector<TValue>& vec)
	{
		if(IsWriter)
			return;
		if(!JsonValue.isArray())
			return;
		
		vec.clear();
		vec.reserve(vec.size() + JsonValue.size());
		for(int i = 0; i < JsonValue.size(); ++i)
		{
			TValue val;
			Serialize(i, val);
			vec.push_back(val);
		}
	}
	
	template<typename TKey, typename TValue>
	void Serialize(TKey key, std::vector<TValue>& vec)
	{
		if(IsWriter)
		{
			WriteOnly(key, vec.begin(), vec.end());
		}
		else
		{
			JsonSerializer subVal(IsWriter);
			subVal.JsonValue = JsonValue[key];
			subVal.ReadOnly(vec);
		}
	}
	
	//Append a Json::Value directly
	template<typename TKey>
	void WriteOnly(TKey key, const Json::Value& value)
	{
		Write(key, value);
	}
	
	//Forward a pointer
	template<typename TKey, typename TValue>
	void Serialize(TKey key, TValue* value, typename boost::disable_if<boost::is_fundamental<TValue> >::type* dummy = 0)
	{
		Serialize(key, *value);
	}
	
	template<typename TKey, typename TValue>
	void WriteOnly(TKey key, TValue* value, typename boost::disable_if<boost::is_fundamental<TValue> >::type* dummy = 0)
	{
		Serialize(key, *value);
	}
	
	template<typename TKey, typename TValue>
	void ReadOnly(TKey key, TValue* value, typename boost::disable_if<boost::is_fundamental<TValue> >::type* dummy = 0)
	{
		ReadOnly(key, *value);
	}
	
	//Shorthand operator to serialize
	template<typename TKey, typename TValue>
	void operator()(TKey key, TValue& value)
	{
		Serialize(key, value);
	}
	
	Json::Value JsonValue;
	bool IsWriter;
	
private:
	template<typename TKey, typename TValue>
	void Write(TKey key, TValue value)
	{
		JsonValue[key] = value;
	}
				  
	template<typename TKey, typename TValue>
	void Read(TKey key, TValue& value, typename boost::enable_if<boost::is_arithmetic<TValue> >::type* dummy = 0)
	{
		int ival = JsonValue[key].asInt();
		value = (TValue) ival;
	}
	
	template<typename TKey>
	void Read(TKey key, bool& value)
	{
		value = JsonValue[key].asBool();
	}
	
	template<typename TKey>
	void Read(TKey key, int& value)
	{
		value = JsonValue[key].asInt();
	}
	
	template<typename TKey>
	void Read(TKey key, unsigned int& value)
	{
		value = JsonValue[key].asUInt();
	}
	
	template<typename TKey>
	void Read(TKey key, float& value)
	{
		value = JsonValue[key].asFloat();
	}
	
	template<typename TKey>
	void Read(TKey key, double& value)
	{
		value = JsonValue[key].asDouble();
	}
	
	template<typename TKey>
	void Read(TKey key, std::string& value)
	{
		value = JsonValue[key].asString();
	}
};

//"name value pair", derived from boost::serialization terminology
#define NVP(name) #name, name
#define SerializeNVP(name) Serialize(NVP(name))

#endif

Now that’s not so bad, is it? A bit under three hundred lines of type traits and template games and we’re ready to get on with our lives. A lot of the code is just fussing about what type it’s being applied to and drilling down to the correct read or write function. The SFINAE based block at the top of the class is used to locate the correct Serialize function for any given type, which can be an instance member function, static member function, or free function.

There is your free C++ to JSON serializer utility class for the day, complete with ultra permissive license. Enjoy.

Cinematic Color

I chose not to go to SIGGRAPH 2012, and I’m starting to wish I had. Via Julien Guertault, I found the course on Cinematic Color.

I’ve mentioned this in the past: I believe that as a graphics programmer, a thorough understanding of photography and cinematography through the entire production pipeline is necessary. Apparently I am not alone in this regard. Interesting corollary: should cinematographers understand computer graphics? Hmm.

Digital Color Part 1

What do you know about how computers read, store, process, and display colors? If your answer is R, G, and B color channels in the range of [0, 255], go hang your head in shame — and no credit for alpha/transparency channels. If you said spanning [0, 1], that’s very slightly better. More points if you mentioned HSV, YUV, YCbCr, etc. Fewer points if you didn’t mention sRGB. Extra credit if you thought about gamma curves or color temperatures, and a highly approving nod if the word “gamut” crossed your mind. Yes, today we’re going to be talking about color spaces, gamuts, color management, bit depth, and all the fun stuff that defines digital color. I’ve been doing a lot of photography work recently, and it’s brought a number of things to the forefront which I did not previously understand and of which I haven’t seen concise, centralized discussion. Most of this applies equally well to digital image capture (cameras, scanners), digital image rendering (offline or real time), and digital image reproduction (monitors, printers). Understand in advance that I’m trying to distill a book’s worth of theory into a blog post. This will be fast, loose, and cursory at best, but hopefully it’s enough of a glimpse to be enlightening. I’m multi-targeting this post for software engineers (game developers mainly), artists and photographers. We’ll see how that goes.

I decided to write this post after an incident where a self-portrait turned my skin distinctly pink when I uploaded it to Facebook. (I’m brown-skinned.) I went into significant depth trying to understand why, and found out that there is a lot of complexity in digital color that is not well explained in a unified format. This was originally intended to be a single monster post, but it’s just too much material to pack into one blog entry. I think I could write a small eBook if I really went into detail. In this first part, I’ll just be talking about representation of color tones, independent of brightness. Part 2 will talk about brightness, luminance, gamma, dynamic range, and so on. Part 3 will discuss the details of how devices reproduce colors and how we can work with multiple devices correctly.

Colors in Real Life

I do not want to talk about real life colors. That is an enormously complicated (and fairly awesome) topic that integrates physics, optics, biology, neurology, and cognitive studies. I just want to cover enough of the basics to allow us to discuss digital colors. Real life has a lot of colors. An infinite number of colors. Humans can perceive a limited number of these colors which exist in the visible spectrum. Each color represents a spectrum of light, which our eyes receive as an R/G/B triplet of information (for the purposes of this discussion). There are many discrete spectrums that we cannot differentiate and thus appear as the same color even though they’re not. This ambiguity shows up very strongly in people suffering from any type of color blindness. For a normal person, we can describe the total range of colors perceived and graph it on what’s known as a chromaticity diagram:
CIE 1931 color diagram
This is called the CIE 1931 color space. The full range of color perception forms a 3D volume region. The X and Y axes shown above describe the chromaticity of color, and the Z axis is brightness. This diagram represents a slice through that volume at 50% brightness. As an added bonus, this diagram will look different in different browsers if you have any type of calibrated or high end monitor. That’s a hint about the rabbit hole we’re about to enter. The colors on the diagram are merely an illustrated aid, not actual colors. It’s also a single 2D slice through a 3D volume of colors, with the third axis being brightness. We’ll talk more about brightness in Part 2. For now, just assume that the graph describes the total range of colors we can perceive, and values outside the colored area may as well not exist. This chart will be our basis for the discussion of digital color.

Colors on Digital Devices

You might know that most computer monitors are able to express about 16.8 million discrete color values in 24 bits per pixel. That sounds like a lot of colors, but it isn’t. It translates to 256 discrete values from 8 bits for each of red, blue, and green color channels, and 256 total levels of luminance covering 8 stops (one stop represents a doubling of light intensity). That means that for any given luminance level, you can describe 65536 different colors. So already a lot of our possible values have been spent on just describing luminance, leaving us very few to encode the actual shade of color. It’s not even adequate to express real luminance values; a typical high end digital camera can capture 12-14 stops in a single scene, and human perception can span 20+ stops. In short, we’d need nearly all of our 24 bits just to express all of the levels of luminance that humans can perceive, let alone the color range.
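
A quick back-of-the-envelope check on those figures; this is just the arithmetic, since 8-bit-per-channel RGB doesn’t literally store luminance and chroma separately:

#include <cmath>
#include <cstdio>

int main()
{
    const int bitsPerChannel = 8;
    const long long perChannel = 1LL << bitsPerChannel;               // 256 levels
    const long long total = perChannel * perChannel * perChannel;     // ~16.8 million
    printf("values per channel: %lld\n", perChannel);
    printf("total 24-bit values: %lld\n", total);
    printf("stops spanned by 256 linear levels (log2 256): %.1f\n",
           std::log2((double)perChannel));
    printf("colors per luminance level (rough split): %lld\n", total / perChannel); // 65536
    return 0;
}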

Because we’re talking about digital color, we have another problem — the mediums we work with cannot hope to cover the totality of human vision. Cameras, scanners, monitors, and printers have limitations in the colors they can understand and reproduce. More infuriatingly, each device has its own independent gamut of colors that does not match up with any other device. You’ve probably seen a printer vomit out colors that don’t match your screen. In many cases, the printer can never match your screen. When you take a photo and import it to a computer, you get two color shifts, first in the camera’s capture and processing and second in your computer’s processing and display. Send the image to someone else and it shifts again. Is the color on screen really related to the color in real life anymore? Take a photo of a leaf, then bring it inside and put it up against your computer screen. Go ahead and scan it too. Odds are the scan, photo, and leaf are all radically different colors when you see them side by side.

In Part 3 of this series, I’ll go into detail about how the different categories of digital devices detect or reproduce colors at a hardware level. These engineering details have a very real effect on our color workflow, and will be important in understanding how to compromise effectively across different hardware. There’s no point getting an image perfect on a computer screen if its final destination is print. Reconciling the differences in hardware and producing your desired colors anywhere will be our overarching goal for this series.

One last footnote: most digital cameras output JPEG images by default. These images are not relevant to a serious discussion of color, as they tend to interpret the digital sensor’s data rather creatively. Instead we will be talking about the RAW format data that higher quality digital cameras can optionally produce directly from the image sensor. These files are usually proprietary but there are many common software packages that can read them, and produce more even-handed representations of the sensor data. These color-accurate conversions will be the focus for the photography aspects of this discussion. The same applies to video data, with the caveat that consumers don’t have cameras that can produce RAW video data files at all.

Terminology

With that series of observations, it’s time to get a little bit more formal and look at what is going on. Let’s start with some proper definitions of terms I’ve been throwing around:

  • Color space: This is a mathematically defined subset of colors. It’s a theoretical construct, independent of any digital device or bit depth.
  • Color gamut: This is a description of the colors a given device can actually produce, and will vary per device. The gamut can be adjusted on many devices.
  • Color calibration: The process of matching a device’s gamut to a desired color space.
  • Luminance: An absolute measure of the intensity of light, independent of any device or perception.
  • Brightness: The subjective perception of luminance.
  • Stop: A difference of one exposure value. This is a measure of light, and an increase of one stop represents a doubling of the photon flux (roughly, density).
  • Chromaticity: An objective, physical description of a color independent of any device or perception.
  • Bit depth/bits per pixel: The number of bits we use to describe the value of an individual image pixel. With n bits, we can express 2^n different values.
  • RGB: The general idea of encoding a pixel value using a set of three values that describe the strength of red, blue, and green to be combined. This is the native operation mode of nearly all digital devices.
  • White point/balance: The physical definition of the color considered “white”, independent of luminance.
  • Color temperature: A thermodynamics-derived value expressing a pure color of light that would be emitted by a body at that temperature. These are the colors we associate with stars. Trust me, you don’t want more detail.

That’s just to start us off, we’ll meet more terms along the way.

Spaces and Gamuts

Let’s start with computer monitors, as they are probably the easiest to understand. Above, I showed you the CIE 1931 color space describing the totality of human color perception. Monitors cannot express anything close to that range. Instead, monitors have traditionally tried to match an alternate, much smaller space known as sRGB. If you graph sRGB, it forms a triangle on top of CIE 1931 like this:

sRGB was created in 1996 by Microsoft and HP in order to match the capabilities of existing CRT displays, applying a formal mathematical structure. When people talk about RGB colors, they are almost certainly referring to values within the sRGB color space, where each value represents the weight applied to a weighted average of the three points of the sRGB triangle. Thus for any RGB value, you can pinpoint one location on the graph which is the desired color. A perfectly calibrated monitor will generate exactly that color when given the matching RGB value. In reality, the monitor’s triangle tends to be slightly misaligned. In any case, this is the color range that nearly the entire world uses for nearly everything. Most of the software on your computer doesn’t understand that anything else exists.

It should be blatantly obvious at this point that sRGB is very small. Compared to our full perceptual range, it misses an awful lot and you can begin to see why our leaf doesn’t match anything the monitor can display. There are a number of other color spaces out there:

AdobeRGB in particular has gained significant popularity, as a number of cameras and monitors support it and it is nearly identical to the NTSC color space standard for televisions. When we’re talking about monitors, we typically express the gamut as a percentage coverage of the NTSC space. The sRGB space represents a 70% gamut; a modern high end Dell UltraSharp will do about 110%. These monitors still take those same 24 color bits, but spread them over a wider area (actually, volume). These high end monitors are called wide gamut displays and they come with a very nasty catch.

Color values appear completely different on wide gamut displays. Those [0, 255] values for each channel represent different points in the color spectrum, spread farther apart. A wide gamut is a double edged sword because it represents a larger, more saturated space with less detail within the space. sRGB can describe smaller changes in color than AdobeRGB, but AdobeRGB can express more extreme colors than sRGB. This leads to nasty, unpleasant accidents if you’re not careful. Here’s a screenshot of two applications displaying exactly the same image:

Notice the massive shift in the red? The application on the left is MS Paint; the application on the right is Adobe Lightroom. Lightroom is a photo post-processing tool which is fully color-aware. The pixels of this image are stored in the sRGB color space, but my monitor is not in sRGB. Windows knows the model of my monitor and has downloaded a color profile, which tells it the attributes of my monitor’s color rendition. Lightroom knows this, and alters the image using the color profile to look correct on my monitor. Paint, however, has no clue about color profiles, and simply forwards the pixel data blindly to the monitor. The monitor’s wider color space causes a massive boost in saturation, changing my neighbor’s tastefully red house into an eye-searing abomination.

This can happen in reverse too. If you’ve got a nice image in AdobeRGB, it will look washed out and generally bad when interpreted as sRGB. It will look even worse if you print it without converting correctly. Even when everything is interpreted correctly, there are problems. AdobeRGB is a larger space than sRGB, so colors you can see on a wide gamut monitor simply won’t exist for an sRGB monitor and color saturation will get squished. Because so few people have wide gamut monitors, and because print gamuts are so much smaller, working on a wide gamut AdobeRGB display can be a dicey proposition. Making use of those extra colors may not pay dividends, and you may wind up with an image that cannot even be displayed correctly for your intended audience. As a result, it’s extremely important to understand which applications are color managed, what color space you’re working in, and what will happen when you produce the final image for other people to view. I call applications that use color management profiles color-aware, and the others color-stupid (not a technical term).

Color Aware Software (on Windows 7)

Mac is traditionally much better about color management than Windows, thanks to its long history in graphic design. Windows 7 does have full color management support, but following Windows tradition, most applications blithely ignore it. The first step is making sure you have a full color profile for your monitor. I won’t provide instructions on that here; it is usually automatic or derived from color calibration, which we’ll discuss shortly. The second, somewhat more difficult step is making sure that your applications are all color-aware. On Windows 7, this is the situation:

  • Windows Explorer: Fully color aware. Your thumbnails are correct.
  • Windows Photo Viewer: Fully color aware. This is the default image preview tool, so when you’re previewing images all is well.
  • MS Office 2010: Fully color-aware, pleasantly enough.
  • MS Paint: Completely color-stupid.
  • Internet Explorer 9: Aware of image profiles, but ignores monitor profiles and blindly outputs everything as sRGB. Your IE colors are all rendered wrong on a wide gamut display. This despite the fact that IE specifically advertises color-awareness.
  • Mozilla Firefox: Fully color aware, but images that don’t specify a profile explicitly are assumed to match the monitor. You probably want FF to assume they’re sRGB, which is a hidden setting in about:config. Change gfx.color_management.mode to 2.
  • Google Chrome: Completely color-stupid.
  • Google Picasa: Fully color-aware, but not by default. Enable it in the View menu of Picasa and both the organizer tool and the preview tool become fully aware. You want to do this.
  • Adobe Anything: Fully color-aware, and pretty much the standard for color management — EXCEPT PREMIERE.
  • Corel Paintshop Pro: Fully color-aware, but glitchy for no apparent reason in the usual Corel way.
  • Blender: Almost completely color-stupid.
  • The GIMP: Fully color-aware, but ignores the system settings by default. Go into Edit->Preferences, Color Management tab, and check “Try to use the system monitor profile”. This is an important step if you’re using GIMP on a wide gamut or calibrated monitor.
  • Visual Studio: Color-stupid, which is disappointing but not surprising.
  • Video players: Blatant color-stupidity across the board. WMP, Quicktime, VLC, and MPC all failed my test.
  • Video games/3D rendering: Hah, not a chance. All color-stupid. Don’t count on 3D modeling tools to be color-aware in 3D mode either. The Autodesk tools (3DS Max, Maya, Softimage, Mudbox) are all incorrect in this respect.

Given Mac’s longer legacy in graphic design, it is probably safe to assume that all image applications are color-aware. I know Chrome and Safari both handle colors correctly on Mac, for example. I have not yet tested video or 3D on Mac with a wide gamut display, but I suspect that they will not handle colors correctly.

Bit Depth

We’ve covered the idea that colors are expressed in a color space. Mathematically, a color space is a polygon on the chromaticity diagram; we’ll assume it’s always a triangle. A color represents a point inside this triangle, and any color can be represented as a weighted average of the three points of the triangle. These weights form what we commonly refer to as the RGB value. In its purest form, the RGB value is a normalized vector of three coordinates in the interval [0, 1]. We traditionally also have an intensity value which gives the overall color luminance. In practical terms, we store the colors differently; the RGB value that most people are familiar with expresses the intensity of each of the red, green, and blue color values (called channels). These are ratios between 0 (black) and 1 (fully saturated single color), and together they describe both a color and an intensity for that color. This is not the only representation; many televisions use YCbCr, which stores the luminance Y and two color-difference values, Cb and Cr. You can recover the three RGB weights from those quite easily, and so these different representations (and many others, like HSV) are all basically equivalent. Hardware devices natively work with RGB intensities though, so that’s the color representation we will stick to.
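To make the “basically equivalent” claim concrete, here is a tiny sketch of the RGB/YCbCr round trip using the BT.601 full-range coefficients (many TVs actually use BT.709, but the idea is identical; the function names are mine):

```python
def rgb_to_ycbcr(r, g, b):
    """BT.601 full-range: luminance plus two color-difference channels."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = (b - y) / 1.772
    cr = (r - y) / 1.402
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Recover all three RGB intensities; the 'missing' green falls out of Y."""
    b = y + 1.772 * cb
    r = y + 1.402 * cr
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

print(ycbcr_to_rgb(*rgb_to_ycbcr(0.8, 0.4, 0.2)))  # round-trips to (0.8, 0.4, 0.2)
```

Y carries the luminance, Cb and Cr carry “how far from gray in the blue and red directions”, and the green information falls straight out of the arithmetic.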

Because computers don’t really like working with decimal numbers, we usually transform the [0, 1] range for the RGB channels into a wider range that we can describe using only integers. Most people are familiar with the [0, 255] range seen in many art programs such as Photoshop. This representation assigns an 8 bit integer to each color channel, which can store up to 256 values. With three channels of 256 values each, we have a total of 256³ colors, 16,777,216 in all. Computers have used this specification for many years, calling it TrueColor, 24 bit color, millions of colors, or something along those lines. I’ve already mentioned that this is a very limiting space of colors in many ways. It’s often adequate for our final images, but we really want much better color fidelity, especially in the process of editing or rendering our images. Otherwise, every image adjustment introduces rounding errors that make our colors subtly drift, eventually doing significant damage to color accuracy.
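Here is a deliberately brutal little demonstration of that drift (plain Python, values invented): apply one strong tone adjustment, store the result at 8 bits, undo the adjustment, store again. At 8 bits per channel the shadows are simply gone; in float (or 16 bit) they would survive.

```python
def roundtrip_8bit(v, gamma=2.2):
    """Apply a strong tone adjustment and undo it, quantizing to 8 bits in between."""
    adjusted = round((v / 255) ** gamma * 255)           # darken, store as 8-bit
    return round((adjusted / 255) ** (1 / gamma) * 255)  # undo, store as 8-bit again

for v in (2, 5, 10, 30, 200):
    print(v, "->", roundtrip_8bit(v))  # the dark values collapse to 0 or shift
```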

If you’ve worked with digital camera data, you probably know that most cameras use more than 8 bits per color channel. It’s typical for high end cameras to use 12 or even 14 bits for their internal data, yielding raw image files of 36 or 42 bits per pixel. Modern computer graphics applications and games use 16 or even 32 bits per color channel, totaling 48 or 96 bits of color information per pixel. Even though that level of detail is not necessary for the final image, it is important that we store the images as accurately as possible while working on them, to avoid losing data before we are ready.

This problem extends to monitors, too. The vast majority of LCD monitors on the market only have 6 — that’s six — bits per color channel, and use various tricks to display the missing colors. (Yes, even many high end IPS type screens.) For many years, this meant that doing serious imaging work on LCDs was out of the question; you either used an older CRT or a very expensive design grade LCD. Nowadays, the color replication on quality 6 bit monitors like my Dell UltraSharp U2311H is excellent, and I don’t have any qualms in recommending one of these monitors for serious graphics work. I’ve compared the output side by side to my real 8 bit monitors and there is a difference, but it is minute and only visible in a direct comparison or test charts.

However, there is another consideration. I hinted earlier that wide gamut can hurt color accuracy. When using a wide gamut monitor, those color bits are stretched over a wider range of colors than normal. Because the bit depth hasn’t changed, we can no longer represent as many colors within the smaller sRGB triangle, and sRGB images will have some of their colors “crushed” when processed by the monitor’s color profile in a color-aware application. In order to combat this, high end modern monitors like the Dell U2711H actually process and display colors at 10 bits per channel, 30 bits total. 30, 36, and 48 bit color representations are known as Deep Color and they allow the monitor to be significantly more precise in its color rendition, even if the physical panel is still limited to 8 bits per color. It also allows more precise color calibration. If your monitor and graphics card support it, applications like Photoshop can take advantage of deep color to display extremely accurate wide gamut colors. And that brings me to an unfortunate caveat.

UPDATE: I previously claimed that Radeons could output 30 bit Deep Color. This appears not to be the case; more to come. The paragraph below has been revised.
Only AMD FirePro and NVIDIA Quadro chips support deep color, and only under Windows. Intel chips do not have deep color support at all. NVIDIA GeForce and AMD Radeon chips have the necessary hardware for 30 bit output, but the drivers do not support it. Mac OS X, up to and including 10.7 Lion, cannot do 30 bit in any situation, no matter what hardware you have. This is despite the fact that both AMD and NVIDIA explicitly advertise 30-bit support in a number of these cases.

Color Calibration

Color calibration is the process of aligning the gamut of a display to a desired color space. This applies both to devices that capture images (cameras, scanners) and devices that generate them (monitors, printers). These devices frequently have adjustable settings that will alter their color gamut, but those controls are not usually adequate to match a color space. In the case of computer monitors, calibration actually refers to two discrete steps. The first step, calibration, corrects the monitor’s basic settings (brightness, contrast, hardware color settings) and graphics card settings to optimal values. The second step, profiling, measures the error between the gamut and the space, and encodes how to convert different color spaces to the actual gamut of the display as a software calibration profile. Manufacturers provide a default profile that describes the monitor, but calibration corrects the settings for your specific environment and screen. Monitor gamuts can shift over time, and calibration depends on the ambient conditions as well. Thus for truly accurate work, it is necessary to calibrate the monitor periodically, rather than set-and-forget.

Within the graphics card or monitor, there is a look-up table (LUT) that is used to pick the hardware output for a certain input. The LUT is used to provide basic calibration control for colors. For example, the red channel on your monitor may be too strong, so a calibrator could set the LUT entries for red 251-255 to output a red value of 250. In this case our color gamut has been corrected, but we’ve also lost color accuracy, since 256 input colors are now mapped to only 251 output colors. Depending on the hardware, this correction can happen at 6, 8, or 10 bit precision. 10 bit allows much more color detail, and so even in 8 bit mode, a 10 bit monitor’s expanded LUT makes it much more capable of responding to color calibration accurately. The LUT is a global hardware setting that lives outside of any particular software, and so it will provide calibration to all programs regardless of whether they are color aware. However, the LUT only operates within the native gamut of the monitor. That is, it can correct an AdobeRGB image to display correctly on an AdobeRGB monitor, but it cannot convert between sRGB and AdobeRGB.
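In code, that red-channel example is nothing more than a 256-entry table (a toy sketch; a real LUT lives in the graphics card or monitor hardware, not in Python):

```python
# The panel's red runs hot, so the calibrator caps everything above 250.
red_lut = [min(v, 250) for v in range(256)]

print(len(set(red_lut)))  # 251 distinct outputs remain for 256 inputs
print(red_lut[255])       # 250
```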

Color conversion and correction is handled by an ICC profile, sometimes called an ICM profile on Windows. The monitor typically has a default profile, and the profiling step of a color calibrator can create one customized to your display and environment. The profile describes how to convert from various color spaces to the gamut of the monitor correctly. On a perfectly calibrated monitor, we would expect the ICC profile to have no effect on colors that are in the same color space as the monitor. In reality the monitor’s gamut never matches the color space perfectly, so the ICC profile may specify corrections even within the same color space. Its primary purpose, however, is to describe how to convert various color spaces to the monitor’s color gamut. We can never display an AdobeRGB image correctly on an sRGB monitor, because the space is too wide. Instead the display must decide how to convert colors that are out of gamut. One option is to simply force out of gamut colors to the nearest edge of the target gamut. This preserves color accuracy within the smaller space, but destroys detail in the out of gamut areas entirely. Alternately we can scale the entire space, which will lead to inaccurate colors everywhere but better preservation of color detail. I’ll look at this dilemma in more detail in a future post.
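The two strategies at the end of that paragraph look roughly like this in code (a simplification of what ICC rendering intents do; real profiles work through a connection space and are far more sophisticated than a clamp or a divide):

```python
def clip_to_gamut(rgb):
    """Clamp out-of-gamut channels: in-gamut colors stay exact, out-of-gamut detail is lost."""
    return [min(max(c, 0.0), 1.0) for c in rgb]

def scale_to_gamut(rgb):
    """Scale the whole color until it fits: detail survives, but every channel shifts."""
    peak = max(rgb)
    return [c / peak for c in rgb] if peak > 1.0 else list(rgb)

wild_green = [0.2, 1.4, 0.1]       # a color the target gamut cannot reach
print(clip_to_gamut(wild_green))   # [0.2, 1.0, 0.1]
print(scale_to_gamut(wild_green))  # [0.14..., 1.0, 0.07...]
```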

The ICC profile is necessary to reproduce accurate colors on the monitor. This brings up an important point: hardware color calibration is ineffective for color-stupid applications when image and monitor color spaces do not match. Consider the case of an sRGB image on an AdobeRGB monitor. A color-stupid application will tell the monitor that these are sRGB colors. The LUT only specifies how to correct AdobeRGB colors, so for sRGB it simply changes one incorrect color to another. No amount of expense on calibration hardware will fix this problem.

In the case of a digital camera, it is not possible to alter the color response of the internal sensor. Instead, the sensor needs to be measured and corrected according to a profile. Tools like Adobe Camera Raw ship a set of default camera profiles containing this data. Unfortunately the correct calibration varies with lighting conditions, camera settings like ISO, the lens in use, and so on, in unpredictable ways. For highly color-critical work (e.g. studio photography), it’s common to use a product like the X-Rite ColorChecker to acquire the correct colors for the shooting conditions. Either way, the calibration data is used in RAW conversion to determine the final colors (along with other settings like white balance). The details of this process are at the discretion of the RAW conversion software. Adobe uses the profile (whether it’s the built-in default or an X-Rite calibrated alternative) to move everything to the enormous ProPhotoRGB color space at 16 bits per color channel, 48 bits per pixel. This gives them the widest possible flexibility in editing and color output, but it is critical to understand what will happen when the data is baked into a more common output format. We’ll see more of that in Part 3.

White Balance

What color is white? It’s a tricky question, because it depends on lighting conditions and to some extent is a subjective choice. Day to day, the brain automatically corrects our perception of white for the environment. Digital devices have to pick a specific rendition of white, based on their hardware and processing algorithms. Mathematically, white is the dead center of our color space, the point where R, G, and B all balance perfectly. But that point itself is adjustable, controlled by a value we call the white balance. White balance is a range of tones that encompass “white”, and it is defined primarily by a color temperature. You were probably told at some point that “white” light contains every color. Although it’s true, the balance of those colors varies. The color temperature is actually a value from thermodynamic physics, and it describes a particular color spectrum emitted by any “black body” at a particular temperature in Kelvins. We’ll ignore the apparent contradiction of terms and the physics in general. In short, lower temperatures, 4000K and below, tend towards orange and red. Higher temperatures, 6000K and above, tend towards blue and eventually violet. 5000K is generally considered to be an even medium white and matches the average color of daylight (not to be confused with the color of the sky or sun). We can graph color temperatures on a chromaticity diagram:

The white point of a color space is the color temperature that it expects to correspond to mathematical white. In the case of sRGB and most digital devices, the white point is a particular illuminant known as D65, a theoretical white value roughly equivalent to a color temperature of 6504K. There’s no point agonizing about the details here; simply remember that the standard white is 6500K.
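If you want to convince yourself of the red-to-blue trend without trusting me, Planck’s law is enough. This sketch compares the raw spectral radiance of a black body at a blue wavelength against a red one; it says nothing about perceptual white, only which end of the spectrum dominates:

```python
import math

H, C, K = 6.626e-34, 2.998e8, 1.381e-23  # Planck, speed of light, Boltzmann constants

def planck(wavelength_m, temp_k):
    """Black body spectral radiance (Planck's law); the absolute scale is irrelevant here."""
    a = 2 * H * C**2 / wavelength_m**5
    return a / (math.exp(H * C / (wavelength_m * K * temp_k)) - 1)

for temp in (3000, 5000, 9000):
    blue, red = planck(450e-9, temp), planck(650e-9, temp)
    print(temp, "K  blue/red ratio:", round(blue / red, 2))  # climbs as temperature rises
```

The ratio climbs from well under 1 at 3000K to roughly 2 at 9000K: low temperatures lean red, high temperatures lean blue.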

All digital devices have a native white point, derived from their physical parameters. In an LCD monitor, it comes from the color of the backlight. This color is usually close to, but not exactly 6500K. Correcting the white balance of the monitor is one of the biggest benefits of calibration, especially in multi-monitor situations. Because the hardware white point cannot be changed, these adjustments operate by correcting the individual channel intensities downwards, which reduces the color gamut. Thus a wider gamut display is more tolerant of color calibration, because it has more flexibility to compensate for shifts in white balance. Similarly, digital cameras capture all colors relative to a physical white point and white balance adjustments to photos will shift the entire gamut.

White balance is probably the most common color adjustment on photographs. As I said earlier, our brains constantly color correct our environments. Cameras don’t have that luxury, and have to make a best guess about what to do with the incoming light. They store that guess along with the RAW data, but it can be changed later in processing for the final output. Most cameras also allow the user to specify a particular white balance at capture time. Either way, white balance adjustment typically happens along two axes: color temperature (which we’ve already discussed) and green-magenta hue. The hue adjustment moves the colors perpendicular to the color temperature, functioning as a separate, independent axis. You can see this most clearly on the diagram above where 6000K is marked, but depending on the color temperature in use, the hue shift will not always be between green and magenta. For example, at 1500K it appears to be between orange and green instead. If you skip back up, the chart of the sRGB space has its central D65 point marked. You can imagine that point shifting, with the whole triangle pinned to it, as we change the white balance. All of the points of the triangle will move in color space to center around the white point.
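The simplest possible version of that adjustment is per-channel scaling, so that whatever you decide is “white” comes out as RGB (1, 1, 1). Real raw converters parameterize this by temperature and tint and work in a camera-specific space, but the sketch below (hypothetical numbers, plain Python) shows the essential move: rescale relative to the chosen white, and every other color in the frame shifts with it.

```python
def white_balance(pixel, measured_white):
    """Scale each channel so that `measured_white` becomes neutral (R = G = B)."""
    return [c / w for c, w in zip(pixel, measured_white)]

warm_white = [1.00, 0.80, 0.55]  # a gray card shot under a warm lamp reads orange
print(white_balance(warm_white, warm_white))          # [1.0, 1.0, 1.0] -- now neutral
print(white_balance([0.40, 0.35, 0.30], warm_white))  # a shadow tone picks up blue
```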

Be careful in how you use “warmer” and “cooler” in describing white balance, because it can get confusing quickly. If you’ve done photo work, you might notice that the chart displays the temperatures reversed from what you expect. The colors of a photo shift in the opposite direction of the white point, which leads to our common description of warm and cool colors. If you set the white point to a very cold value, neutral white is now considered to be a very yellow color, and all the colors of the photo are pushed into blue territory. If you pick a very warm color, blue is considered neutral and all our colors shift towards yellow. This is because the rest of our colors are effectively described relative to the white point, and the color temperature of the photograph is the physical temperature that gets mapped to RGB (1, 1, 1).

Summary

In this post, I talked about the basics of digital color representation. We looked at color spaces, the mathematical ranges of color, and gamuts, the actual ranges that devices can work in. We talked about the implications of images and devices in different spaces, and the importance of color-aware applications. Next I explained bit depth and color calibration, and closed with an overview of white point and white balance.

What’s more interesting is what we did not cover yet. The discussion covered color, but not brightness. We know how to express various shades and tints, but not how to describe how bright they are (or the differences between brightness, luminance, and saturation). We also don’t know how to put any of this knowledge into practical use in our graphics work. Those will be the subjects of Parts 2 and 3, respectively.