Today I wanted to share two versions of a function that I wrote, in order to show how my iterative approach to software development works when I am doing code optimization to improve performance.
This example comes from my iMprOVE_WRAP asset. It’s a function that returns the shortest distance (taking into account the wrap capabilities of the calling object) between the calling instance and a target object.
The first implementation works, in that it correctly does what it’s supposed to do, but I never released it, because I wasn’t satisfied that it was good enough code to ship.
///iw_distance_to_object(target_obj, x1, y1, x2, y2, do_wrap_h, do_wrap_v,)
///@description Returns the distance_to_object from an improve_wrap object calling this function to another instance.
///Compares all relevant points for the iw_object and returns the nearest distance, taking the wrap range into account.
///@param target_obj id of the target object to determine the distance to.
///@param x1 left x boundary of wrap range
///@param y1 top y boundary of wrap range
///@param x2 right x boundary of wrap range
///@param y2 bottom y boundary of wrap range
///@param do_wrap_h set whether the horizontal wrap is on (true) or off (false)
///@param do_wrap_v set whether the vertical wrap is on (true) or off (false)
//get the distance from the nine virtual positions
//return the shortest distance
var obj = argument[0];
var iw_distance, iw_distance_up, iw_distance_down, iw_distance_left, iw_distance_right,
iw_distance_up_left, iw_distance_up_right, iw_distance_down_left, iw_distance_down_right;
var tempx, tempy, shortest;
var x1, y1, x2, y2, range_width, range_height, do_wrap_h, do_wrap_v;
//keep track of original location of target object
tempx = x;
tempy = y;
//set up wrap range
x1 = min(argument[1], argument[3]);
y1 = min(argument[2], argument[4]);
x2 = max(argument[1], argument[3]);
y2 = max(argument[2], argument[4]);
range_width = x2 - x1;
range_height = y2 - y1;
do_wrap_h = argument[5];
do_wrap_v = argument[6];
//check distances
//check center
iw_distance = distance_to_object(obj);
if do_wrap_h && do_wrap_v //wrap vertical and horizontal
{
//check corners
x = tempx - range_width;
y = tempx - range_height;
iw_distance_up_left = distance_to_object(obj);
y = tempx + range_height;
iw_distance_down_left = distance_to_object(obj);
x = tempx + range_width;
iw_distance_down_right = distance_to_object(obj);
y = tempy - range_height;
iw_distance_up_right = distance_to_object(obj);
//check left and right
y = tempy;
x = tempx - range_width;
iw_distance_left = distance_to_object(obj);
x = tempx + range_width;
iw_distance_right = distance_to_object(obj);
//check up and down
x = tempx;
y = tempy - range_height;
iw_distance_up = distance_to_object(obj);
y = tempy + range_height;
iw_distance_down = distance_to_object(obj);
shortest = min(iw_distance, iw_distance_up, iw_distance_down, iw_distance_left, iw_distance_right,
iw_distance_up_left, iw_distance_up_right, iw_distance_down_left, iw_distance_down_right);
}
if do_wrap_h && !do_wrap_v //do_wrap_h
{
//check left and right
x = tempx - range_width;
iw_distance_left = distance_to_object(obj);
x = tempx + range_width;
iw_distance_right = distance_to_object(obj);
shortest = min(iw_distance, iw_distance_left, iw_distance_right);
}
if do_wrap_v && !do_wrap_h //do_wrap_v
{
//check up and down
y = tempy - range_height;
iw_distance_up = distance_to_object(obj);
y = tempy + range_height;
iw_distance_down = distance_to_object(obj);
shortest = min(iw_distance, iw_distance_up, iw_distance_down);
}
if !do_wrap_h && !do_wrap_v
{
shortest = iw_distance;
}
//return calling instance to original location
x = tempx;
y = tempy;
return shortest;
Let’s take a moment to appreciate this function as it’s written. It’s well-structured, documented, and expressive. First we declare a bunch of variables, then we do stuff with the variables, then we get our answer and return it. And this gives a correct result…
So what’s wrong with the above? It’s an inefficient approach, which checks each virtual position of the wrapping object. If the calling instance wraps vertically and horizontally, it has to temporarily move the calling instance 9 times and check the distance from each of 9 virtual positions, then return it back to its original position, only to return the shortest of those 9 virtual positions.
There’s also a lot of code duplication.
Still, it’s not horrible code. But it’s up to 9x slower than the distance_to_object()
function it’s based on, if you’re wrapping in both directions, which will probably be common. I didn’t think that was good enough.
Rather than check each virtual location to see which is the shortest distance, we just need to know whether the horizontal and vertical distances are more than half of the width and height of the wrap region. If they are, then it’s shorter to go around the wrap. To know this, you simply take the x
and y
values of the two positions, subtract one from the other, and compare to the size of the wrap range. Once you know which virtual position is the closest one, you can temporarily place the calling instance there, and use distance_to_object()
to get that distance. Put the calling instance back where it was, and then return the distance.
I realized as well that depending on whether the calling object wraps in both directions, you may not need to check for a wrap shortcut in the horizontal or vertical. So we can potentially avoid doing some or all of the checks depending on whether the do_wrap_h
and do_wrap_v
arguments are true or false. As well, this means we can avoid declaring certain variables if they’re not needed, which conserves both execution time as well as RAM.
I usually create local script variables in a var
declaration, and assign the arguments to them so the code will be more readable, but I wanted to avoid doing that so that this function could be as lean and fast as possible. This might be an unnecessary optimization, but that’s hard to predict since I have no way of knowing ahead of time how this function might be used in a future project. In a project with many wrapping instances, it could very well be called many times per step, and every optimization could be critical. Since the script is intended to be included as a function in an extension, once I have it working properly it shouldn’t be opened for future maintenance, so making the script readable is not as important. So I opted to remove the local variable declarations as much as possible and just use the argument[]
variables directly.
Also, to ensure that the wrap range is defined properly, in the non-optimized version of this function, I declare x1, y1, x2, y2
and assign their values using min()
and max()
so that (x1, y1)
is always the top left corner, and (x2, y2)
is always the bottom right corner of the wrap range. Technically for this function, we don’t care precisely where the wrap range is, only what the width and height of the wrap range are. That being the case, I can further optimize what I have here, and rather than use min and max, I can just take the absolute value of the difference of these two values.
It turns out that the process I went through to optimize this function is pretty interesting, if you care about optimizing. So I’ll go into greater detail at the end of this article about the approach I took to get there. But for now, let’s skip ahead and look at the finished, optimized function. Here it is, re-implemented, this time doing only the minimum amount of work needed:
///iw_distance_to_object(obj, x1, y1, x2, y2, do_wrap_h, do_wrap_v)
///@description iw_distance_to_object returns the shortest distance in room pixels between two objects in the wrap range,
///taking into account the horizontal and/or vertical wrap properites of the calling object.
///@param obj the id of the target object
///@param x1 left x boundary of wrap range
///@param y1 top y boundary of wrap range
///@param x2 right x boundary of wrap range
///@param y2 bottom y boundary of wrap range
///@param do_wrap_h set whether the horizontal wrap is on (true) or off (false)
///@param do_wrap_v set whether the vertical wrap is on (true) or off (false)
if !(argument[5] || argument[6]) //not wrapping actually
{
return distance_to_object(argument[0]);
}
else
{
//We're going to figure out which virtual position is the nearest to measure from
//To do that, we have to compare the h-distance and v-distance of the calling instance and the target position
//If this distance is <half the range size, then the original position of the calling instance is closest
//Otherwise we have to use one of the virtual positions
//Then we're going to temporarily put the calling instance in that location, get the distance, and put it back
//arguments
var tempx = x, tempy = y;
if argument[5] //do_wrap_h
{
var range_width = abs(argument[3] - argument[1]);
if abs(x - argument[0].x) > (range_width * 0.5)
{
x -= sign(x - argument[0].x) * range_width;
}
}
if argument[6] //do_wrap_v
{
var range_height = abs(argument[4] - argument[2]);
if abs(y - argument[0].y) > (range_height * 0.5)
{
y -= sign(y - argument[0].y) * range_height;
}
}
var d = distance_to_object(argument[0]);
//return calling instance to where it was
x = tempx;
y = tempy;
return d;
}
We don’t need to measure all nine distances to know which is the shortest; we can tell by comparing the direct distance to the size of the wrap zone — if it’s less than half as big as the wrap zone, the direct distance is the shortest. If not, then we need to wrap. We can check the x and y axes separately, and if both are called for then we can just combine them.
The second function should be much faster to execute, and uses less RAM. How much faster? Well, let’s do a test project using my Simple Performance Test and compare.
Download the iMprOVE_WRAP distance_to_object test project
It turns out that the improved code runs about 50% faster than the old code! That’s a measurable and worthwhile improvement. Although, that said, the old function ran well enough that I could have released it, and likely it would not have been a problem for many uses, particularly in Windows YYC builds.
Appendix: Optimization through Iteration
In re-implementing this function, I went through an iterative process, making one change at a time. This enabled me to avoid making mistakes and getting confused by too many changes happening all at once. Each iteration, I did a simple, small change, and then tested it. If the function stopped working properly, I knew exactly what change was causing the problem.
Stage 1: Better algorithm
The first, biggest, and most important stage is to re-think the algorithm. If you’re doing micro-optimizations on an inefficient algorithm, there’s not much point to it. In this case, I knew that checking nine distances in order to compare them and find the smallest was not efficient. Armed with a tip my friend Sam gave me, I knew a better way was to compare the vertical and horizontal distances and if they were greater than half the height and width of the wrap range, wrapping was the shorter distance.
Rather than try to edit the working script to get this algorithm in place, I started over with a fresh script. I just find it less confusing to not have the old code in the way, particularly when I’m in the middle of changing it. I did keep the function’s argument declaration block, so I could still work with the same script-local variables. At this point, the code looked like this:
///iw_distance_to_object(target_obj, x1, y1, x2, y2, do_wrap_h, do_wrap_v,)
var obj = argument[0];
var iw_distance, iw_distance_up, iw_distance_down, iw_distance_left, iw_distance_right,
iw_distance_up_left, iw_distance_up_right, iw_distance_down_left, iw_distance_down_right;
var tempx, tempy, shortest;
var x1, y1, x2, y2, range_width, range_height, do_wrap_h, do_wrap_v;
//keep track of original location of target object
tempx = x;
tempy = y;
//set up wrap range
x1 = min(argument[1], argument[3]);
y1 = min(argument[2], argument[4]);
x2 = max(argument[1], argument[3]);
y2 = max(argument[2], argument[4]);
range_width = x2 - x1;
range_height = y2 - y1;
do_wrap_h = argument[5];
do_wrap_v = argument[6];
//TODO: BUILD A MORE EFFICIENT ALGORITHM
//Return calling instance to original location
x = tempx;
y = tempy;
return shortest;
In that TODO space, I wrote out the code for the new algorithm:
if do_wrap_h
{
if abs(x - obj.x) > (wrap_width * 0.5)
{
x -= sign(x - obj.x) * range_width;
}
}
if do_wrap_v
{
if abs(y - obj.y) > (wrap_height * 0.5)
{
y -= sign(y - obj.y) * range_height;
}
}
var d = distance_to_object(obj);
Tested it out, and it is working. So far, so good. I could have called this “done” and moved on, but by now I was interested in seeing just how far I could go. And immediately it was obvious that there was still some fairly easy things that I could take care of.
Stage 2: Rewrite the flow to avoid unnecessary work.
To get the optimal flow, ask yourself, “What’s the least amount of work I can possibly do to get to a return
statement?”
Let’s take a moment to appreciate that our new algorithm can handle the wrap axes independently of each other. In the first implementation, I had four flows to code:
!do_wrap_h && !do_wrap_v
, where I’m doing a regulardistance_to_object()
call, because no wrapping is happeningdo_wrap_h and !do_wrap_v
, where I need to get the distances for the horizontal virtual positions,!do_wrap_h and do_wrap_v
, where I need to get the distances for the vertical virtual positions, anddo_wrap_h && do_wrap_v
, where I need to get the distances for the corner virtual positions.
Right away, we can see that I have some duplicated lines of code that I had to write because I have to check each of these four cases separately.
Because the new algorithm is able to handle each wrap direction independently, we don’t need a separate if
block for all four of these cases. We just check the horizontal and vertical components of the distance, and apply the wrap distance to them if we need to. This means only one check for both do_wrap_h
and do_wrap_v
if do_wrap_h
{
//check if the normal distance is bigger than half the wrap range
if abs(x - obj.x) > (range_width * 0.5)
{
//if it is, shift x position to the closest virtual position
x -= sign(x - obj.x) * range_width;
}
}
//now do the same for y
if do_wrap_v
{
if abs(y - obj.y) > (range_height * 0.5)
{
y -= sign(y - obj.y) * range_height;
}
}
Stage 3: Avoid unnecessary variables and calculations.
Let’s review what the function looks like so far:
///iw_distance_to_object(target_obj, x1, y1, x2, y2, do_wrap_h, do_wrap_v,)
var obj = argument[0];
var iw_distance, iw_distance_up, iw_distance_down, iw_distance_left, iw_distance_right,
iw_distance_up_left, iw_distance_up_right, iw_distance_down_left, iw_distance_down_right;
var tempx, tempy, shortest;
var x1, y1, x2, y2, range_width, range_height, do_wrap_h, do_wrap_v;
//keep track of original location of target object
tempx = x;
tempy = y;
//set up wrap range
x1 = min(argument[1], argument[3]);
y1 = min(argument[2], argument[4]);
x2 = max(argument[1], argument[3]);
y2 = max(argument[2], argument[4]);
range_width = x2 - x1;
range_height = y2 - y1;
do_wrap_h = argument[5];
do_wrap_v = argument[6];
if do_wrap_h
{
if abs(x - obj.x) > (wrap_width * 0.5)
{
x -= sign(x - obj.x) * wrap_width;
}
}
if do_wrap_v
{
if abs(y - obj.y) > (wrap_height * 0.5)
{
y -= sign(y - obj.y) * wrap_height;
}
}
var d = distance_to_object(obj);
//Return calling instance to original location
x = tempx;
y = tempy;
return shortest;
Looking at this new code, I realized a few things can be eliminated:
- We aren’t using any of these variables at all any more, and can eliminate them:
var iw_distance, iw_distance_up, iw_distance_down, iw_distance_left, iw_distance_right, iw_distance_up_left, iw_distance_up_right, iw_distance_down_left, iw_distance_down_right;
- If the
do_wrap_h
anddo_wrap_v
arguments are bothfalse
, the answer is simply toreturn distance_to_object(obj)
. In that case, I don’t need any of the other arguments, so declaring them and setting their values is a complete waste of time. Of course, it’s pretty unlikely that someone using the iw_distance_to_object function would pass “false” into both arguments, instead of just use the non-wrapping version of the function, distance_to_object(). But we can add a line of code at the top of the function that will act as a shortcut in this case.if !do_wrap_h && !do_wrap_v {return distance_to_object(obj);} //skip everything else; we're done!
The only thing is, if we do this as written above, we still need to declare and assign do_wrap an obj, which doesn’t take much time at all, but it’s still unnecessary and uses a little bit of memory that we don’t need to. So we can avoid the declarations altogether, and do this:
if !argument[5] && !argument[6] return distance_to_object(argument[0]);
Another way to say
!argument[5] && !argument[6]
is!(argument[5] && argument[6])
, which should evaluate a bit faster because ifargument[5]
is true, short-circuiting kicks in and returns true without even looking atargument[6]
, and we only need to use oneNot
operation.
This change does make the code harder to read, and more brittle in the event we decide to change the order of the arguments or add new arguments, so we’re sacrificing a bit of maintainability here. I don’t know that it’s worth it for the tiny performance gain, but in a game where there’s a lot of instances calling this function every step, every little bit might count. And once the function is complete and proven correct through testing, I don’t anticipate needing to maintain it. - The variables
x1, y1, x2, y2
aren’t really needed, either; they were just there for my benefit so I could use them to determinerange_width
andrange_height
. But we can consolidate a step, and calculate the range width and height directly from theargument[]
variables. In some of my functions, it was important to establish the wrap range in such a way thatx1
was always to the left of (less than)x2
, and so too withy1
andy2
. So to guarantee this in some of my iMprOVE_WRAP functions, I force this by usingmin()
andmax()
on the arguments to guarantee the larger and smaller of the two values is assigned to the correct variable. I was simply following convention and re-using this code from elsewhere, but we don’t need to do that here. We can just get the width and height by subtracting:x2 - x1
andy2 - y1
. But we still need to ensure thatrange_width
andrange_height
are positive values, so we can do this by usingabs()
. - Also, we don’t need
range_width
orrange_height
unless we are able to wrap in that dimension. So rather than declaring all variables in a big block of declaration statements at the top of the function, we can avoid declaring them until we know we actually need them. Doing this makes the code a bit less organized from a readability standpoint, but it makes more sense from an execution standpoint. Don’t declare variables until you’re sure you’re going to use them for something!
This gives us:///iw_distance_to_object(target_obj, x1, y1, x2, y2, do_wrap_h, do_wrap_v,) if !(argument[5] || argument[6]) { return distance_to_object(argument[0]); } else { var tempx = x, tempy = y; if argument[5] //do_wrap_h { var range_width = abs(argument[3] - argument[1]); if abs(x - argument[0].x) > (range_width * 0.5) { x -= sign(x - argument[0].x) * range_width; } } if argument[6] //do_wrap_v { var range_height = abs(argument[4] - argument[2]); if abs(y - argument[0].y) > (range_height * 0.5) { y -= sign(y - argument[0].y) * range_height; } } var d = distance_to_object(argument[0]); //Return calling instance to original location x = tempx; y = tempy; return d; }
And that’s as optimized as I know how to make it.