SlimDX August Release Plan

I probably don’t need to tell you that it has been an incredibly long time since the last SlimDX release. That’s because it has been an incredibly long time since the last DirectX release. The reason seems to be somewhat quirky schedule planning on Microsoft’s part; basically they’ve decided to hold out on the next SDK until they have a completely stable Windows 7 compatible release. (IMO they should’ve done a release in June but okay fine.)

I do know two things. One, the SDK is branded as August, so unless they screw this up really badly it should arrive soon. Today is Wednesday; my guess is it’ll show up Friday mid-day. Two, I have confirmed that there’s a bunch of new functionality coming, which of course we’ll have to wrap. That will take some time. Not sure exactly how much, but it will be at least a few days to write and check that code. IOW, our August release will actually come out in early September sometime, hopefully before Labor Day. It will have fully merged support for 10.1, 11, and Direct2D. DirectWrite remains as a maybe.

SlimDX Performance Tips and Tricks

Previously, I discussed some of the inherent performance costs that SlimDX suffers. Although that’s somewhat educational if you’re evaluating SlimDX, it’s pretty useless if you’re already using it and would like to get the most out of your code. This time, I’ll go over what you can actually do to make sure you’re running optimally.

There are two big problems with managed vector/matrix math, and this applies just as well to XNA. First, all of the types are value types, and passed by value to operators. That means when you multiply two matrices via operator*, two matrices (32 floats, 128 bytes) are copied onto the stack, and then another one is copied back into your result. This can get quite expensive, and the solution is to pass by reference, not by value. Unfortunately that means operators are a problem for performance sensitive code; you’ll have to use functions like Add and Multiply instead.
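To make the copy cost concrete, here is a minimal sketch of the two calling styles. It is written in C++ as a stand-in: `const Matrix4&` and `Matrix4&` play the role that `ref` and `out` parameters play in managed code. The `Matrix4` type and all function names are hypothetical and are not SlimDX's API, and a native optimizer may elide copies that the managed JIT would not.

```cpp
#include <cstddef>

// Hypothetical 4x4 row-major matrix: 16 floats, 64 bytes.
// It stands in for a managed value type such as a math library's Matrix.
struct Matrix4 {
    float m[16];
};

// Builds an identity matrix (illustration helper, not a real library call).
Matrix4 Identity() {
    Matrix4 r{};
    r.m[0] = r.m[5] = r.m[10] = r.m[15] = 1.0f;
    return r;
}

// Operator-style signature: both operands are passed by value (two 64-byte
// copies, conceptually), and the result is returned by value (a third copy).
Matrix4 MultiplyByValue(Matrix4 a, Matrix4 b) {
    Matrix4 r{};
    for (std::size_t row = 0; row < 4; ++row)
        for (std::size_t col = 0; col < 4; ++col)
            for (std::size_t k = 0; k < 4; ++k)
                r.m[row * 4 + col] += a.m[row * 4 + k] * b.m[k * 4 + col];
    return r;
}

// Function-style signature: operands are passed by reference and the result
// is written straight into the caller's storage, so nothing is copied.
// 'result' must not alias 'a' or 'b'.
void Multiply(const Matrix4& a, const Matrix4& b, Matrix4& result) {
    for (std::size_t row = 0; row < 4; ++row) {
        for (std::size_t col = 0; col < 4; ++col) {
            float sum = 0.0f;
            for (std::size_t k = 0; k < 4; ++k)
                sum += a.m[row * 4 + k] * b.m[k * 4 + col];
            result.m[row * 4 + col] = sum;
        }
    }
}
```

Both functions compute the same product; only the calling convention differs, which is exactly the trade you make when you give up `operator*` for a by-reference `Multiply`.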

There’s also the problem that generally speaking, vector operations are not candidates for inlining. They’re too big for the JIT’s metrics to pick them up as inlining candidates (the 3.5 SP1 revision may have changed this). For small vector operations, this can again become a substantial cost. Unfortunately this is a messy one to deal with, as you can’t ask the compiler or JIT to inline things for you. The most effective approach I’ve seen is to replace vector operations in stable code with hand-inlined code. Farseer Physics uses this method, and wraps the inlined blocks in #region to clarify what’s going on. Yes it’s incredibly tedious, but if that’s what you have to do, then there it is.

Don’t use strings as effect handles if you can help it. We have to convert from Unicode to ANSI internally, and create a temporary handle. This gets slow and can cause other bugs as well. In future releases, this problem will actually be alleviated somewhat, but it’s best to avoid it completely.

Also make sure that SlimDX itself is configured correctly. These settings live in the Configuration class. For example, object tracking is an incredibly useful debugging feature that tells you what objects you’re leaking and where they came from. But because it records call stacks, it’s also quite expensive. The default setting is for it to be active; turn it off for production builds. Also consider disabling exceptions for return codes you don’t care about (device lost and device not reset are common ones), instead of catching and ignoring.

Be careful with get properties and functions. An object’s Description property will always call GetDesc() on the underlying object, and then return a whole struct. This can get expensive quickly, especially if you casually access the property multiple times. We’ve chosen not to cache much of anything in SlimDX for the time being due to some nasty bugs early on. Querying data is expensive as a result.

Anything involving callbacks and callback interfaces is bad news, and it’d be best to avoid them for performance critical code. Every crossing of the boundary from managed to unmanaged or back again involves overhead, and for callbacks we end up bouncing multiple times, all while doing various kinds of fix up and data marshaling. Texture.Fill in particular is incredibly slow.

If you’re working with large amounts of raw data that will be sent to SlimDX, consider using DataStream, especially as a replacement for (Unmanaged)MemoryStream. When you give SlimDX a generic Stream to work with, it has to allocate a buffer large enough to hold the data, read all the data into the buffer, and then copy that into the target native DirectX buffer. This is quite inefficient for certain types of data that are already in memory. If you hand us a DataStream, we can skip the allocation and read, doing a fast memory copy only.

Hopefully that’s helpful. I’ll update this post as I remember more tips.

Windows API Code Pack — Is It Any Good?

Just to be clear, I’m restricting my comments to the DirectX section of the codepack. I’ll probably be integrating some of the other bits (jump list support etc) into SlimTune, so we’ll see how that goes later. But around a week ago, they finally released 1.0. Same day as Windows 7 was out on MSDN actually — I’m pretty sure that’s not a coincidence. SlimDX’s release with Direct3D 11, Direct2D, and DirectWrite should come later this month, if all goes well. Now, the code pack does cover a few things we don’t, like the Windows Imaging Component.

So it’s not beta anymore. Now, I’m fairly sure that they haven’t looked at our code for legal reasons, but I did make some harsh comments about their work. They’ve made some changes that seem to follow directly from those comments. I’m about to make some more. I’ve been perusing the release and frankly, it’s just not any good. They seem to have spent more of their time implementing equally shoddy support for D3D 10 than fixing the actual problems. I’m going to run through all the reasons I see that this thing is not well done.

Okay, so they added a math library. I very pointedly slammed them for not having one in the 0.90 release, so let’s start with the bread and butter of graphics — Vector3F, as it’s called in the Code Pack. It offers X, Y, Z, Normalize, NormalizeInPlace, static Dot, static Cross, operator +, operator -, and (in)equality comparisons. Yes, that’s the entire class. No ref overloads, which are important for performance. No other helper methods of any kind. No PropertyGrid or System.ComponentModel compatibility. Matrix is even sadder — it has operator * (by-value only) and Identity. That’s it. Compare to ours, or XNA.

Functions that return object references still create brand new instances, which then not only have to be garbage collected, but if you don’t remember to Dispose them, they’ll be queued for finalization too. (These USED to be properties…it’s an improvement, I guess.) This is a similar effect to the original MDX’s event problems, just smeared out over time and difficult to track. There’s certainly no leak tracking functionality like SlimDX has. (OTOH they will be released eventually, which SlimDX does not promise.) These are lots of small allocations, which the .NET GC is good at handling, but if you don’t remember to Dispose them, or have a lot of them in general, this could really sour your day. It’s a problem that just doesn’t exist in SlimDX.

As for 64 bit support, it’s simply not configured at all in the solution (remember, the code pack is source code only, no binaries). I set up an x64 configuration and the build went off without a hitch, and there’s no inherent reason for x64 to not work. I haven’t tested the resulting binaries though, and neither has anyone else apparently.

Lastly, even though they’ve added D3D 10 support, there’s no D3DX support in here at all. They basically invite you to go ahead and write what you need yourself, but none of it is done for you. For something that’s intended to make your job easier by letting you use managed code, this is another odd omission.

The 1.0 of the code pack IS dramatically improved in several respects — 0.90 had no math code at all, the memory situation was far worse, etc. Even so, this really doesn’t inspire a lot of confidence. Although the core APIs are wrapped, the support code is basically non-existent. There’s no binary distribution or redistributable, so you’re on your own there. I know this is probably a small team at Microsoft with nowhere near the level of resources it needs, and I’m sorry that I’m continually trashing your work. But if this is what constitutes the successor to Managed DirectX, I don’t think SlimDX is in any danger and I can’t say I mind.

SlimTune Profiler 0.1.5 Released!

Let’s recap. For about two months now, I’ve been working on a brand new profiling tool for .NET, C#, CLR, and all that jazz. It’s open source, completely free, and supports frameworks 2.0 and later (no 1.x, sorry). Some of the notable features include remote profiling, real time results analysis, and multiple visualizers. Today, the first public release, version 0.1.5, is available.

Project Homepage
Direct Link to Installer

Although this is still an early version, it is already quite capable. It supports sampling mode profiling for both x86 and x64 applications, and provides views that will be familiar to users of NProf or dotTrace. Speaking of NProf, it’s my belief that this completely replaces it for .NET 2.0+, with a better UI and more features too. (And a far more lenient source code license as well.) There is still a lot to come, of course, but with this release I finally feel that this is ready for the general public.

I’m looking forward to getting lots of feedback, both positive and negative, and I hope that this is a useful tool for everyone.

(P.S. If you want to build from source, you’ll need to do it with a non-express version of VC++ 2008 SP1 and VC# 08, with a full boost installation. Also install the SQL Server Compact redist, which is in the repository under trunk\install\ExtraFiles.)

SlimTune’s Hybrid Mode

I decided to try out the dotTrace Profiler, which runs $200 for a personal license and $500 per developer for organizations. IOW, it’s expensive. That $200 license makes it the cheapest of the commercial options, and I ran the trial on one of my games. They have some nice UI touches I like. The data is valuable as well — I would not have guessed that my MainGame.Update function takes five billion percent of total program time.

Generally speaking, profilers operate in one of two modes: sampling (statistical), and tracing (instrumenting). Sampling operates by suspending the process at high frequency and examining the program state. It converges to a decent overview of where your code is spending time, but it doesn’t produce meaningful timing information. The frequency simply isn’t high enough. Tracing injects calls to the profiler every time a function is entered or exited, allowing it to monitor the complete progression of your code. You get accurate results with fairly reliable timings, but it’s incredibly slow. (Oh, and it crashes dotTrace. People pay for this?)

I’ve been looking at doing a hybrid mode since the beginning of the project; I finally came up with a concrete approach when a friend gave me a rough overview of SN Tuner, the profiler for the PS3. The idea is fairly simple: you don’t generally want tracing-level accuracy for the vast majority of the code. All you really need is an overview, which sampling does a good job of, and then tracing when you’re focusing on one specific piece. You also don’t really need detailed profiling of framework code (everything in System, for example). Although I’m still working on the front-end, SlimTune is now able to do this type of selective instrumentation at runtime.

Using it is pretty simple. When you start up the target, select Hybrid in the SlimTune UI. This will cause the program to run in sampling mode, and you’ll get your overview results. Then, you can select a function from the overview and ask it to be traced, and then results will flow in from that function and its children only. You can also turn it off again, and you can ask for entire namespaces to be skipped in either tracing or hybrid mode. Hybrid mode is a little slower than sampling overall, but it allows you to get very detailed results without the huge performance hit that normally accompanies that level of detail.
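The "that function and its children only" behavior can be sketched roughly as follows. This is a minimal single-threaded illustration in C++, not SlimTune's actual implementation: `SelectiveTracer`, `FunctionId`, and the method names are all hypothetical, and a real profiler would keep the depth counter per thread and record timestamps rather than bare IDs.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

using FunctionId = std::uint32_t;  // hypothetical ID the runtime assigns to each function

// Records enter events only for functions the user asked to trace,
// plus anything called while one of those functions is on the stack.
class SelectiveTracer {
public:
    // The user picked a function in the overview and asked for it to be traced.
    void TraceFunction(FunctionId id) { roots_.insert(id); }

    // The user turned tracing off again for that function.
    void StopTracing(FunctionId id) { roots_.erase(id); }

    // Hook called on every function entry.
    void OnEnter(FunctionId id) {
        // Record if we are already inside a traced call tree,
        // or if this function is itself a traced root.
        if (depth_ > 0 || roots_.count(id) != 0) {
            ++depth_;
            recorded_.push_back(id);
        }
    }

    // Hook called on every function exit.
    void OnLeave(FunctionId /*id*/) {
        // Calls outside a traced tree never incremented depth_, so they
        // also see depth_ == 0 here and leave it untouched.
        if (depth_ > 0) --depth_;
    }

    const std::vector<FunctionId>& Recorded() const { return recorded_; }

private:
    std::unordered_set<FunctionId> roots_;  // functions selected for tracing
    std::vector<FunctionId> recorded_;      // enter events that were kept
    int depth_ = 0;                         // nesting depth inside a traced tree
};
```

Everything outside the selected call tree falls through both hooks with a single comparison, which is where the savings over full tracing would come from in a design like this.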

Internally, it’s a little tricky to pull this off but it’s not too bad. I discovered early on that taking a lock on the function hooks is hideously expensive, even at zero contention. I use a few lockfree tricks to get the necessary data much faster. It’s also very important not to let the sampling profiler attempt to sample inside the hooks, as this leads to some nasty deadlocks; again, lockfree code is used to lay out some unsafe zones that the sampler can detect and avoid. SuspendThread is one messy son of a bitch.
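One way to picture the unsafe zones is a per-thread counter that the hooks bump without taking a lock, and that the sampler checks after suspending the thread. The sketch below is an assumption-laden illustration, not the code SlimTune ships: `ThreadState`, `UnsafeZone`, and `TrySample` are hypothetical names, and it uses C++11 `std::atomic`, which VC++ 2008 did not have (interlocked intrinsics would fill that role there).

```cpp
#include <atomic>

// Hypothetical per-thread state shared between the function hooks and the sampler.
struct ThreadState {
    std::atomic<int> hookDepth{0};  // > 0 while the thread is inside a profiler hook
    int samplesTaken = 0;           // touched only by the sampler
    int samplesSkipped = 0;         // touched only by the sampler
};

// RAII marker placed at the top of each hook. It flags the unsafe zone with
// a single atomic increment instead of taking a lock.
class UnsafeZone {
public:
    explicit UnsafeZone(ThreadState& state) : state_(state) {
        state_.hookDepth.fetch_add(1);
    }
    ~UnsafeZone() { state_.hookDepth.fetch_sub(1); }

    UnsafeZone(const UnsafeZone&) = delete;
    UnsafeZone& operator=(const UnsafeZone&) = delete;

private:
    ThreadState& state_;
};

// Called by the sampler after it has suspended the target thread.
// Returns true if a sample was taken, false if the thread was in an unsafe zone.
bool TrySample(ThreadState& state) {
    if (state.hookDepth.load() != 0) {
        // The thread was stopped inside a hook; touching its state now is
        // the situation that risks a deadlock, so skip this sample.
        ++state.samplesSkipped;
        return false;
    }
    ++state.samplesTaken;  // a real sampler would walk the stack here
    return true;
}
```

The sampler loses the occasional sample this way, but the hooks never block, which matters given how expensive even an uncontended lock turned out to be.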

So there are at least three features I’m giving you for free which a five hundred dollar competitor doesn’t have. Sure they have a much cleaner interface, VS integration, memory profiling, and so forth…but I’ve only been doing this for a month. Kinda makes me wonder. Oh, and guess what I spent the last day or so doing…

Cleaned up version of dotTrace style visualizer

More SlimTune Pictures

Hey, why not?

More pies, courtesy of my other monitor.

SlimTune’s NProf Style and Pie Visualizers

SlimTune’s UI supports pluggable visualizers. What I realized was people were going to want to see their data sliced in different ways, and there would be no sane way to anticipate all those needs, let alone fuse them into a single viewing style. You’ll actually be able to drop in .NET assemblies of your own as plugins and have the option to view the profiling data using YOUR plugins, or the ones I ship. Multiple plugins at once on the same data? Check. Real-time view? As long as the plugin supports it. Remote profiling? Absolutely check.

I spent today working on the support for visualizers, and threw together an NProf style visualizer. It’s sampling-only, doesn’t handle real time yet, and is generally a bit rough around the edges. But seeing as how I was typing in SQL by hand before, this is a pretty useful step up. It looks pretty decent, I think:

And then there’s this thing that took two or three hours, is a pain in the ass to navigate, but does update in real time:

Pie chart visualizer :D