INSPIRED RUBBISH: NEED FOR SPEED

Hi ya:).. don't ye any time came across time overshooting problems while coding some real cool stuff for programming contests like ACM ICPC? Yes, I know you certainly would have atleast once. And then nothing but a sigh - "Oh my-Oh my!! .. dhat wazz a correct code, dude; the logic I uzzed waz deadly, juss dose extra millisecondzz program took screwed it ahl! Damn! :X" And that was a very trivial example - to say the least.

In real world, while designing sophisticated real time response systems (RTRS) we do need to optimise our code again and again to make it as dazzling fast as we can. These days, Machines and Robos are almost everywhere performing numerous highly spohisticated tasks, the automotives equipped with highly intelligent DAS - Driver Assistant Systems are being launched, coolest video games with eye candy graphics and acrobatic combats are rocketing the gaming market: all having one thing in common - NEED FOR SPEED - and here, that's not the name of the famous racing game, in case you just got astrayed a bit, it is, but the commmon uproar these gadgets have - TO OPTIMIZE AGAINST TIME - JUST SPEED UP EVERYTHING!

So,right time, friends, to share some tips to optimize our programs against time. What follows is the compilation of several tricks to optimize C code. Most of them are practical and hold true even if developer has programmed the project in C++. Check them out-

Before we start Optimizing

So..Are we ready to optimize our C code..Wait..do you recall Ahmdal’s Law.This is what the law says -
Speedup = Told/Tnew

= 1 / ( (1-fruntime) + fruntime/fspeedup)

where fruntime is the percentage of program run time used by the function 'f' and fspeedup is the factor by which we speed up this function. Now, consider a practical case. Suppose we are working on a module (say, function 'f') with the goal to optimize it against time and the whole project takes 0.250 milliseconds and we make the module run twice as fast (i.e. fspeedup=2) ; then we will get 25% speedup assuming the module(function 'f') takes 40% of the execution time of the whole project. Henceforth, the project will take 0.200 milliseconds. The overall gain is of just 50 microseconds despite doubling the speed of the module. So, optimizing has an essential pre-requisite - we must first check which modules take the most of the project's execution time. Need is to attack the code within intensive loops or that part of project which is used more frequently. Else, it's all futile spending our energy and time and we may eventually employ the coolest optimization trick from our inventory stock but that may not reflect any significant time conservation in the overall project. Checkpoint: Before we start optimizing - we need to do Profiling stuff - that is, to check the bottlenecks in our code - that is, to identify the modules we would target for optimization.

Is Optimization necessary?

The nerds have something to share - "You may write very efficient code but spend at least twice as long optimizing it." Hence, don't do it unless absolutely necessary. Necessity may arise when there is need to get a real time response as in Games, DAS gadgets, Robos etc.

Manual vs. Automatic Optimization & Algorithm Selection

The modern compilers are smart!! Thanks to the cutting edge technology - we now have really smart compilers that can do most of our work required for optimization. However, what if the requisite is to optimize something that is beyond the scope of the compiler? That's really challenging!! We need to be sure we don't baffle the optimizer by our tweaking methods. Hell lot of algorithm analysis and code understanding is required to think of changes at the algorithm level. Surely, if the code is using linear search somewhere in the module, the automatic optimizer won't implement binary search on it's own. Well, that was a very simple case. Real life projects and applications have extremely sophisticated algorithms that have a lot of interdependencies with other algorithms, and bringing about a significant change in them is not that easy; Well, even saying not that easy is an understatement.;) So good so far. So, it's like the best method of optimizing a project is to use a mixed optimization scheme - use smart and intelligent compilers to do the automatic stuff and then think something beyond the scope of the compiler. Checkpoint: The first step in Optimization is to double check that the code design is complete and then attack algorithm level changes subsequently. That's the toughest phase but at the same time, a prudently selected algorithm may conserve considerable amount of time; that cannot be obtained even after application of zillion of optimization tricks on the old algorithm.

Can a programmer outperform an optimizer?

YES

This is possible by
i - choosing a better algorithm

ii - knowing more about the definitions and uses of data items in the program eg. a programmer may know relative probabilities of branches being taken. Also An optimiser has to ensure correctness of the optimised program under all conditions hence it has to be conservative.
An optimiser may miss optimisation opportunities because of this..!
Also NO Because -
i - It takes too much time to perform some optimisations by hands viz

strength reduction, copy propagation, dead code elimination

ii - Certain machine level details are beyond the control of a programmer viz. instructions and addressing modes supported by the target machine

Tips and Tricks

Using Array Indices

If wished to set a variable to a particular character, depending upon the value of something, might do this:
switch ( queue ) {
case 0 : letter = 'W';
break;
case 1 : letter = 'S';
break;
case 2 : letter = 'U';
break;
}

Alternative method is to simply use the value as an index into a character array, eg.
static char *classes="WSU";
letter = classes[queue];

Also, note that switch is always better alternative in case there are multiple if else statements. Long if...else if...else if...else if... chains require lots of jumps for cases near the end of the chain (in addition to testing each condition). If possible, convert to a switch statement, which the compiler sometimes optimizes into a table lookup with a single jump.

Integers

Use unsigned ints instead of ints if known the value will never be negative. The best declaration for an int variable in a tight loop would be:
register unsigned int var_name;
(Although it is not guaranteed that the compiler will take any notice of "register", and "unsigned" may make no difference to the processor)
Remember, integer arithmetic is much faster than floating-point arithmetic, as it can be usually be done directly by the processor, rather than relying on external FPUs or floating point maths libraries.

Passing structures

Whenever possible, pass structures by reference (i.e. pass a pointer to the structure), otherwise the whole darn thing will be copied onto the stack and passed, which will slow things down. Functions receiving pointers to structures as arguments should declare them as pointer to const if the function is not going to alter the contents of the structure.

void print_data( const bigstruct *data_pointer)
{
...display contents of structure...

}
This example informs the compiler that the function does not alter the contents (pointer to constant structure) of the external structure, and does not need to keep re-reading the contents each time they are accessed. It also ensures that the compiler will trap any accidental attempts by the code to write to the read-only structure.

Minimize the use of global variables

Global variables are never allocated to registers. Global variables can be changed by assigning them indirectly using a pointer, or by a function call. Hence, the compiler cannot cache the value of a global variable in a register, resulting in extra (often unnecessary) loads and stores when globals are used. We should therefore not use global variables inside critical loops.
If a function uses global variables heavily, it is beneficial to copy those global variables into local variables so that they can be assigned to registers.

Don't use recursion

Recursion can be very elegant and neat, but creates many more function calls which can become a large overhead.

Inline Functions

There are several advantages to using inline functions:No function call overhead. As the code is substituted directly, there is no overhead, like saving and restoring registers. Lower argument evaluation overhead. The overhead of parameter passing is generally lower, since it is not necessary to copy variables. If some of the parameters are constants, the compiler can optimize the resulting code even further. The big disadvantage of inline functions is that the code sizes increase if the function is used in many places. This can vary significantly depending on the size of the function, and the number of places where it is used.

Operations' selection

Addition is quicker than multiplication –
Do use val + val + val instead of val * 3

If necessary, make it easy on the C compiler and use bit shifts, and divisors that are a power of 2. Divisors that are a power of 2, such as 256=2^8, can be optimized into a bit shift by the C compiler.

Sometimes, such expressions can be rewritten by replacing the division by a multiplication. For example, (a / b) > c can be rewritten as a > (c * b) if it is known that b is positive and b *c fits in an integer. It will be better to use unsigned division by ensuring that one of the operands is unsigned, as this is faster than signed division.

Use shift operations >> and <<>

Avoiding Memory reads

typedef struct { int x, y, z; } Point3;
typedef struct { Point3 *pos, *direction; } Object;
void InitPos1(Object *p)
{
p->pos->x = 0;
p->pos->y = 0;
p->pos->z = 0;
}
Here better to store Object *P=p->pos;
Then P->x, P->y, P->z used.
At any cost, avoid memory reads as far as possible.

Lazy Evaluation

In if (a>10 && b=4) type of thing, make sure that the first part of the AND expression is the most likely to give a false answer (or the easiest/quickest to calculate), therefore the second part will be less likely to be executed.

Loops

Loops are a common construct in most programs; a significant amount of the execution time is often spent in loops. It is therefore worthwhile to pay attention to time-critical loops. If(a==10)
Else if(a==9)
Else if(a==8)
Else if(a==7)
Else if(a==6)
…
May be changed to
If(a<5)>5) ... etc.

Loop Unrolling and Loop peeling may be automatically handled by the complier in case the optimizer is ON.

Parameters to Functions

To minimize the overhead of passing parameters to functions:
•Try to ensure that small functions take four or fewer arguments. These will not use the stack for argument passing.
•If a function needs more than four arguments, try to ensure that it does a significant amount of work, so that the cost of passing the stacked arguments is outweighed.
•Pass pointers to structures instead of passing the structure itself.
•Put related arguments in a structure, and pass a pointer to the structure to functions. This will reduce the number of parameters and increase readability.

Fixed Point Arithmetic

When the range of values needed is sufficiently small, fixed-point arithmetic is more accurate and much faster than floating-point arithmetic.

Instruction-level-parallelism

Even though many applications still rely on single threaded execution, modern CPUs already have a significant amount of parallelism inside a single core. This means a single CPU might be simultaneously executing 4 floating point multiplies, waiting for 4 memory requests, and performing a comparison for an upcoming branch. So, must know and exploit the architecture well.

Avoid casting wherever possible

Try to avoid casting where possible. Integer and floating point instructions often operate on different registers, so a cast requires a copy. Shorter integer types (char and short) still require the use of a full-sized register, and they need to be padded to 32/64-bits and then converted back to the smaller size before storing back in memory. (However, this
cost must be weighed against the additional memory cost of a larger data type.)

Simplify mathematical equations on paper

In many equations, terms cancel out, either always or in some special cases. The compiler cannot find these simplifications, but we can. Eliminating a few expensive operations inside an inner loop can speed program more than days working on other parts.

this way, we save two divisions and even MIN macro function usage is avoided. However, this is code specific optimization and needs careful scrutiny from the optimizer's side.

Avoid redundant conditional checks

Often, many if conditions are unnecessary and can be avoided. Also, whenever possible move conditional checks outside the loops rather than keeping it inside (in case the check is independent of the index of the loop)

Club conditional checks

Two consecutive if conditional checks may be clubbed into one using logical && as this translates two JUMPS in corresponding assembly to one JUMP. Also, consecutively occuring for loop and if conditional check may be clubbed into one by removing the check within if condition and clubbing it with for loop conditional check, as follows:so, for...if... can be changed to for...only.

INSPIRED RUBBISH

Tuesday, November 17, 2009

NEED FOR SPEED

No comments:

About Me

Blog Archive