3.11.12

Voxel Performance

In my previous post about the voxel prototype, performance was not a concern. This time I'd like to share a brief explanation of the two basic optimizations I made, which boosted performance significantly:
  • y-buffer: only draw visible pixels
  • setPixels and direct memory access

Previously I drew from back to front, and for each map pixel I drew the whole column, even if the next (closer) column covered most of it. This creates an enormous amount of overdraw, so removing it is the most obvious way to improve performance.

y-buffer
Now I iterate from front to back, compare each column's height with the height of the previous (closer) column, and only continue if at least one pixel is visible. Strictly speaking, I compare the screen y-coordinate of the highest pixel of the current column with the y-coordinate of the last drawn pixel, but the idea is the same.

In the illustrations to the right, the drawn pixels are shown in green. The second column is entirely hidden, and only the top 2 pixels of the third are drawn. It might seem confusing that each column is offset by 1 in y, but it becomes clearer if you imagine the terrain as flat. Although the illustration is isometric, the calculation is done purely in 2D, in screen coordinates.
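Here is a minimal sketch of the y-buffer idea for a single screen column. It is written in Python only for brevity (the prototype itself is Haxe) and is not my actual renderer. Assumptions: `tops` holds the projected screen y of each column's highest pixel, ordered from front to back; y grows downward; every column is assumed to reach the bottom of the screen; and the column's depth index stands in for its color. The function names are hypothetical. The old back-to-front version is included only to compare the number of pixel writes.

```python
def render_back_to_front(tops, height):
    """Old approach: paint every column completely, farthest column first.

    Returns (frame, pixel_writes); frame[y] holds the depth index of the
    column that ends up visible in row y (None = nothing drawn)."""
    frame = [None] * height
    writes = 0
    for depth in reversed(range(len(tops))):   # farthest column first
        for y in range(max(tops[depth], 0), height):
            frame[y] = depth                   # closer columns overwrite farther ones
            writes += 1
    return frame, writes


def render_front_to_back(tops, height):
    """y-buffer approach: closest column first, draw only the still-visible rows."""
    frame = [None] * height
    writes = 0
    ybuffer = height                           # y of the last drawn (highest) pixel; nothing drawn yet
    for depth, top in enumerate(tops):
        top = max(top, 0)                      # clamp columns that reach above the screen
        if top >= ybuffer:
            continue                           # completely hidden behind closer columns
        for y in range(top, ybuffer):          # only rows above the last drawn pixel
            frame[y] = depth
            writes += 1
        ybuffer = top
        if ybuffer == 0:
            break                              # this screen column is full


    return frame, writes


# Three columns as in the illustration: the second is hidden, the third shows 2 pixels.
frame, writes = render_front_to_back([5, 6, 3], 10)
print(frame, writes)                               # [None, None, None, 2, 2, 0, 0, 0, 0, 0] 7
print(render_back_to_front([5, 6, 3], 10)[1])      # 16
```

Both functions produce the same frame; the front-to-back version simply skips every pixel that a closer column would have overwritten anyway.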

The second step is to improve how the pixel data is read and written. Previously I called getPixel and setPixel on every iteration step, but there are better ways.

For instance, it is noticeably faster to read the color information of each map once via getVector and then access the colors through that vector. The same goes for writing: put the new colors into a vector first and draw it at the end via setVector. The only big change to the algorithm is that the vector is one-dimensional, so you have to convert between x, y and the vector index (index = y * width + x) on each iteration step. You can also do this with ByteArrays instead of Vectors. On its own the speed difference compared to Vectors is not that significant, but it gets more interesting combined with direct memory access.
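The index arithmetic for a flat pixel vector can be sketched like this, again in Python for brevity. Python lists stand in for the Flash `Vector` returned by getVector and passed to setVector; the helper names and the horizontal mirroring (used only as an example of reading from one vector and writing into another) are hypothetical.

```python
def index_of(x, y, width):
    """Row-major index of pixel (x, y) in a flat, one-dimensional pixel vector."""
    return y * width + x


def coords_of(i, width):
    """Inverse mapping: flat vector index back to (x, y)."""
    return i % width, i // width


def mirror_horizontally(src, width, height):
    """Read every pixel from one flat vector and write it into another.

    `src` stands in for the vector read once via getVector; the returned
    list stands in for the vector that is drawn in one go via setVector.
    """
    dst = [0] * (width * height)
    for y in range(height):
        for x in range(width):
            dst[index_of(x, y, width)] = src[index_of(width - 1 - x, y, width)]
    return dst


width, height = 3, 2
src = [0xFF000000 | i for i in range(width * height)]   # opaque ARGB test colors
dst = mirror_horizontally(src, width, height)
print([hex(c) for c in dst])
# ['0xff000002', '0xff000001', '0xff000000', '0xff000005', '0xff000004', '0xff000003']
```

When the loop walks the pixels in row-major order anyway, the multiplication can be replaced by a running index that is incremented by one per pixel.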

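The idea is to use one big ByteArray that stores the data for each map and also reserves enough bytes for the output image. I then select this block of memory and read from and write to it using Haxe's Memory API. Unfortunately, Memory uses little-endian rather than big-endian byte order, which means the color information comes out "in reverse" when a pixel is read as a 32-bit integer.

// Requires: import flash.Memory; and a prior Memory.select(byteArray)
// Byte offset of pixel (x, y); 4 bytes = 1 color, stored as a, r, g, b
var i:Int = (y * width + x) * 4;
// Memory is little-endian, so the 32-bit value reads as 0xBBGGRRAA
var px_color:Int = Memory.getI32(i);

var a:Int = px_color & 0xFF;
var r:Int = (px_color >> 8) & 0xFF;
var g:Int = (px_color >> 16) & 0xFF;
var b:Int = (px_color >> 24) & 0xFF;

// Write back using the same reversed layout
Memory.setI32(i, a | (r << 8) | (g << 16) | (b << 24));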
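To make the byte order concrete, here is a small Python sketch. It is only an illustration and not Haxe's API: a single `bytearray` stands in for the big ByteArray, `struct` with the `<I` format emulates the little-endian `Memory.getI32`/`Memory.setI32`, and the map size and region offsets (`COLOR_MAP`, `HEIGHT_MAP`, `OUTPUT`) are hypothetical. It assumes each pixel is stored as the four bytes a, r, g, b.

```python
import struct

W, H = 4, 4                          # hypothetical map/screen size
PIXEL = 4                            # bytes per color
COLOR_MAP = 0                        # hypothetical region offsets inside one big buffer
HEIGHT_MAP = COLOR_MAP + W * H * PIXEL
OUTPUT = HEIGHT_MAP + W * H          # one byte of height per map cell
memory = bytearray(OUTPUT + W * H * PIXEL)


def get_i32(addr):
    """Little-endian 32-bit read, emulating Memory.getI32 (unsigned here)."""
    return struct.unpack_from("<I", memory, addr)[0]


def set_i32(addr, value):
    """Little-endian 32-bit write, emulating Memory.setI32."""
    struct.pack_into("<I", memory, addr, value & 0xFFFFFFFF)


def read_argb(base, x, y):
    v = get_i32(base + (y * W + x) * PIXEL)
    # bytes a, r, g, b come out reversed in the integer: v == 0xBBGGRRAA
    return v & 0xFF, (v >> 8) & 0xFF, (v >> 16) & 0xFF, (v >> 24) & 0xFF


def write_argb(base, x, y, a, r, g, b):
    set_i32(base + (y * W + x) * PIXEL, a | (r << 8) | (g << 16) | (b << 24))


# Store one opaque orange pixel as the bytes a, r, g, b at (1, 2) of the color map.
addr = COLOR_MAP + (2 * W + 1) * PIXEL
memory[addr:addr + 4] = bytes([0xFF, 0xFF, 0x80, 0x00])

print(hex(get_i32(addr)))            # 0x80ffff
print(read_argb(COLOR_MAP, 1, 2))    # (255, 255, 128, 0)

# Copy it into the output region; the bytes keep their a, r, g, b order.
write_argb(OUTPUT, 1, 2, *read_argb(COLOR_MAP, 1, 2))
```

Because reading and writing use the same reversed layout, a pixel copied this way lands in the output region with its bytes in the original order.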

I'm happy with the new performance, but it is still not fast enough for my taste. The only way I can see to boost the speed significantly is to shift the work to the GPU using shaders. Haxe does provide its own way to write shaders, and although it claims to support platforms other than Flash, I can't see how.

I'm not nearly deep enough into Haxe to really understand how its shaders work, but the developers seem to be working on a big Haxe update, including an update to the shader language to support a newer version of OpenGL. Before I write any shaders, I think I'll wait for that update, and in my opinion it's time to move on to a different topic anyway.

Demo
Source
