Today I wanted to share two versions of a function that I wrote, in order to show how my iterative approach to software development works when I am doing code optimization to improve performance.
This example comes from my iMprOVE_WRAP asset. It’s a function that returns the shortest distance (taking into account the wrap capabilities of the calling object) between the calling instance and a target object.
The first implementation works, in that it correctly does what it’s supposed to do, but I never released it, because I wasn’t satisfied that it was good enough code to ship.
///iw_distance_to_object(target_obj, x1, y1, x2, y2, do_wrap_h, do_wrap_v,)
///@description Returns the distance_to_object from an improve_wrap object calling this function to another instance.
///Compares all relevant points for the iw_object and returns the nearest distance, taking the wrap range into account.
///@param target_obj id of the target object to determine the distance to.
///@param x1 left x boundary of wrap range
///@param y1 top y boundary of wrap range
///@param x2 right x boundary of wrap range
///@param y2 bottom y boundary of wrap range
///@param do_wrap_h set whether the horizontal wrap is on (true) or off (false)
///@param do_wrap_v set whether the vertical wrap is on (true) or off (false)
//get the distance from the nine virtual positions
//return the shortest distance
var obj = argument[0];
var iw_distance, iw_distance_up, iw_distance_down, iw_distance_left, iw_distance_right,
iw_distance_up_left, iw_distance_up_right, iw_distance_down_left, iw_distance_down_right;
var tempx, tempy, shortest;
var x1, y1, x2, y2, range_width, range_height, do_wrap_h, do_wrap_v;
//keep track of original location of target object
tempx = x;
tempy = y;
//set up wrap range
x1 = min(argument[1], argument[3]);
y1 = min(argument[2], argument[4]);
x2 = max(argument[1], argument[3]);
y2 = max(argument[2], argument[4]);
range_width = x2 - x1;
range_height = y2 - y1;
do_wrap_h = argument[5];
do_wrap_v = argument[6];
//check distances
//check center
iw_distance = distance_to_object(obj);
if do_wrap_h && do_wrap_v //wrap vertical and horizontal
{
//check corners
x = tempx - range_width;
y = tempx - range_height;
iw_distance_up_left = distance_to_object(obj);
y = tempx + range_height;
iw_distance_down_left = distance_to_object(obj);
x = tempx + range_width;
iw_distance_down_right = distance_to_object(obj);
y = tempy - range_height;
iw_distance_up_right = distance_to_object(obj);
//check left and right
y = tempy;
x = tempx - range_width;
iw_distance_left = distance_to_object(obj);
x = tempx + range_width;
iw_distance_right = distance_to_object(obj);
//check up and down
x = tempx;
y = tempy - range_height;
iw_distance_up = distance_to_object(obj);
y = tempy + range_height;
iw_distance_down = distance_to_object(obj);
shortest = min(iw_distance, iw_distance_up, iw_distance_down, iw_distance_left, iw_distance_right,
iw_distance_up_left, iw_distance_up_right, iw_distance_down_left, iw_distance_down_right);
}
if do_wrap_h && !do_wrap_v //do_wrap_h
{
//check left and right
x = tempx - range_width;
iw_distance_left = distance_to_object(obj);
x = tempx + range_width;
iw_distance_right = distance_to_object(obj);
shortest = min(iw_distance, iw_distance_left, iw_distance_right);
}
if do_wrap_v && !do_wrap_h //do_wrap_v
{
//check up and down
y = tempy - range_height;
iw_distance_up = distance_to_object(obj);
y = tempy + range_height;
iw_distance_down = distance_to_object(obj);
shortest = min(iw_distance, iw_distance_up, iw_distance_down);
}
if !do_wrap_h && !do_wrap_v
{
shortest = iw_distance;
}
//return calling instance to original location
x = tempx;
y = tempy;
return shortest;
Let’s take a moment to appreciate this function as it’s written. It’s well-structured, documented, and expressive. First we declare a bunch of variables, then we do stuff with the variables, then we get our answer and return it. And this gives a correct result…
So what’s wrong with the above? It’s an inefficient approach, which checks each virtual position of the wrapping object. If the calling instance wraps vertically and horizontally, it has to temporarily move the calling instance 9 times and check the distance from each of 9 virtual positions, then return it back to its original position, only to return the shortest of those 9 virtual positions.
There’s also a lot of code duplication.
Still, it’s not horrible code. But it’s up to 9x slower than the distance_to_object()
function it’s based on, if you’re wrapping in both directions, which will probably be common. I didn’t think that was good enough.
Rather than check each virtual location to see which is the shortest distance, we just need to know whether the horizontal and vertical distances are more than half of the width and height of the wrap region. If they are, then it’s shorter to go around the wrap. To know this, you simply take the x
and y
values of the two positions, subtract one from the other, and compare to the size of the wrap range. Once you know which virtual position is the closest one, you can temporarily place the calling instance there, and use distance_to_object()
to get that distance. Put the calling instance back where it was, and then return the distance.
I realized as well that depending on whether the calling object wraps in both directions, you may not need to check for a wrap shortcut in the horizontal or vertical. So we can potentially avoid doing some or all of the checks depending on whether the do_wrap_h
and do_wrap_v
arguments are true or false. As well, this means we can avoid declaring certain variables if they’re not needed, which conserves both execution time as well as RAM.
I usually create local script variables in a var
declaration, and assign the arguments to them so the code will be more readable, but I wanted to avoid doing that so that this function could be as lean and fast as possible. This might be an unnecessary optimization, but that’s hard to predict since I have no way of knowing ahead of time how this function might be used in a future project. In a project with many wrapping instances, it could very well be called many times per step, and every optimization could be critical. Since the script is intended to be included as a function in an extension, once I have it working properly it shouldn’t be opened for future maintenance, so making the script readable is not as important. So I opted to remove the local variable declarations as much as possible and just use the argument[]
variables directly.
Also, to ensure that the wrap range is defined properly, in the non-optimized version of this function, I declare x1, y1, x2, y2
and assign their values using min()
and max()
so that (x1, y1)
is always the top left corner, and (x2, y2)
is always the bottom right corner of the wrap range. Technically for this function, we don’t care precisely where the wrap range is, only what the width and height of the wrap range are. That being the case, I can further optimize what I have here, and rather than use min and max, I can just take the absolute value of the difference of these two values.
It turns out that the process I went through to optimize this function is pretty interesting, if you care about optimizing. So I’ll go into greater detail at the end of this article about the approach I took to get there. But for now, let’s skip ahead and look at the finished, optimized function. Here it is, re-implemented, this time doing only the minimum amount of work needed:
///iw_distance_to_object(obj, x1, y1, x2, y2, do_wrap_h, do_wrap_v)
///@description iw_distance_to_object returns the shortest distance in room pixels between two objects in the wrap range,
///taking into account the horizontal and/or vertical wrap properites of the calling object.
///@param obj the id of the target object
///@param x1 left x boundary of wrap range
///@param y1 top y boundary of wrap range
///@param x2 right x boundary of wrap range
///@param y2 bottom y boundary of wrap range
///@param do_wrap_h set whether the horizontal wrap is on (true) or off (false)
///@param do_wrap_v set whether the vertical wrap is on (true) or off (false)
if !(argument[5] || argument[6]) //not wrapping actually
{
return distance_to_object(argument[0]);
}
else
{
//We're going to figure out which virtual position is the nearest to measure from
//To do that, we have to compare the h-distance and v-distance of the calling instance and the target position
//If this distance is <half the range size, then the original position of the calling instance is closest
//Otherwise we have to use one of the virtual positions
//Then we're going to temporarily put the calling instance in that location, get the distance, and put it back
//arguments
var tempx = x, tempy = y;
if argument[5] //do_wrap_h
{
var range_width = abs(argument[3] - argument[1]);
if abs(x - argument[0].x) > (range_width * 0.5)
{
x -= sign(x - argument[0].x) * range_width;
}
}
if argument[6] //do_wrap_v
{
var range_height = abs(argument[4] - argument[2]);
if abs(y - argument[0].y) > (range_height * 0.5)
{
y -= sign(y - argument[0].y) * range_height;
}
}
var d = distance_to_object(argument[0]);
//return calling instance to where it was
x = tempx;
y = tempy;
return d;
}
We don’t need to measure all nine distances to know which is the shortest; we can tell by comparing the direct distance to the size of the wrap zone — if it’s less than half as big as the wrap zone, the direct distance is the shortest. If not, then we need to wrap. We can check the x and y axes separately, and if both are called for then we can just combine them.
The second function should be much faster to execute, and uses less RAM. How much faster? Well, let’s do a test project using my Simple Performance Test and compare.
Download the iMprOVE_WRAP distance_to_object test project
It turns out that the improved code runs about 50% faster than the old code! That’s a measurable and worthwhile improvement. Although, that said, the old function ran well enough that I could have released it, and likely it would not have been a problem for many uses, particularly in Windows YYC builds.