What Registers Store Parameters In Assembly

The basics of programming in assembly: the architecture of the processor, registers, memory, instructions, and the use of assembly language inside C++ and Delphi.

1. Introduction to assembly

Assembly language, a low-level programming language that lets you use all the features of a computer's processor, is nowadays somewhat forgotten by "modern" developers.

The main reason for this is that writing in assembly is not the simplest of tasks, and it is very time-consuming (testing code, finding bugs, etc.).

Yet in some situations assembly may be the ideal solution. An example is any kind of algorithm where speed is essential, such as cryptographic (e.g. encryption) algorithms.

Despite the incredible advances in compilers in recent years, algorithms such as Blowfish, Rijndael and IDEA written in assembly and optimised "by hand" show significant speed advantages over their counterparts written e.g. in C++ and compiled at the maximum optimisation level.

Besides cryptography, assembly is also often used by game developers. The best example may be the game Quake 2: after its source code was published, it turned out that all the algorithms that require speed were written in assembly.

So let's get started. To be clear, in this article I will focus on assembly for x86 processors and its use in a Windows environment.

2. Fundamentals of assembly

If you have never written in assembly, then before you can create even the simplest program you must first learn several fundamentals, such as the CPU registers, instructions, and the stack.

From the programmer's perspective, a standard processor (I will use the Intel Pentium MMX as an example, as it is all I've got :-) has a large range of instructions: 8-, 16- and 32-bit x86 instructions, as well as floating-point and MMX instructions.

2.1. CPU registers

The processor has eight 32-bit general-purpose registers and a flags register, as well as eight 80-bit coprocessor (FPU) registers (st0 - st7) and an equal number of 64-bit MMX registers (mm0 - mm7). The processor also has several control registers, which we generally don't use.

What is a register? A register is like a memory cell that can temporarily store data; we can exchange data between registers, and perform logical and arithmetic operations on them. The Pentium processor is 32-bit, which means that each of the general-purpose registers is 32 bits wide (corresponding to unsigned int in a 32-bit C compiler). Every 32-bit register has a 16-bit lower half (a remnant of 16-bit processors such as the 286), while the 16-bit halves of registers EAX, EBX, ECX and EDX are each further divided into two 8-bit halves:

Register name	16-bit half	8-bit halves	Description
EAX	AX	AH and AL	Accumulator
EBX	BX	BH and BL	Base
ECX	CX	CH and CL	Counter for string operations and loops
EDX	DX	DH and DL	Data
ESI	SI	n/a	Source register for string instructions
EDI	DI	n/a	Destination register for string instructions
EBP	BP	n/a	Pointer to data within the stack, used by functions to locate parameters saved on the stack
ESP	SP	n/a	Stack pointer

2.2. General-purpose registers

When writing a program, or inline assembly code under Windows, you can use all the general-purpose registers, but using the special registers ESP and EBP can interfere with the operation of the program. For example, if you set the ESP register to zero within a function, the program will almost certainly crash afterwards (e.g. when it tries to return from the function).

2.3. The stack

The stack is an area of memory reserved for the needs of the program. These include passing parameters to functions (as 32-bit values), temporary data storage, and all local variables. When the program starts, the ESP register (stack pointer) points to the top of the stack, which is its highest address, because the x86 stack grows downwards. When data is stored on the stack, the ESP register is first decremented, and the data is then stored in the memory location that ESP points to. To store data on the stack, the push instruction is used, for example:

    __asm {
        push   5                 // store the number 5 (32 bits) on the stack
        push   eax               // save the contents of register EAX on the stack
        push   dword ptr[edx]    // save the contents of memory referenced by
                                 // the EDX register

        sub    esp,4             // equivalent to 'push 5'
        mov    dword ptr[esp],5

        sub    esp,4             // equivalent to 'push eax'
        mov    dword ptr[esp],eax

        add    esp,5*4           // discard the five values again, so that ESP
                                 // is restored before the block ends
    }

To retrieve and remove a value from the stack, the pop instruction is used, which works in the reverse way to push: first the value is read from the address held in the ESP register, then ESP is incremented (by 4 for a 32-bit value):

    __asm {
        push   5                 // store four 32-bit values on the stack
        push   eax
        push   dword ptr[edx]
        push   13B0C032h

        pop    eax               // remove the most recent value from the stack,
                                 // which in this case is the number 13B0C032h
        pop    dword ptr[edx]    // this operation does not change anything, since
                                 // the value stored on the stack came from the
                                 // location referenced by EDX and is simply being
                                 // returned there
        pop    edx               // put the value originally held by EAX into EDX
        pop    ecx               // put the value 5 into register ECX

        push   5                 // store the value 5 on the stack
                                 // the following instructions simulate 'pop eax'
        mov    eax,dword ptr[esp]
        add    esp,4
    }
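The push/pop rules above can be modelled in a few lines of C++. This is only an illustration, not how the CPU is built: the struct `StackModel` and its members are hypothetical names, a small byte array stands in for stack memory, and `esp` is a byte offset into it that moves down by 4 on every push and up by 4 on every pop, as the real ESP does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical model of the 32-bit x86 stack (an illustration, not the CPU):
// 'mem' plays the role of stack memory, 'esp' is a byte offset into it.
// Like the real stack, it grows downwards, towards lower offsets.
struct StackModel {
    static constexpr std::size_t kSize = 64;
    unsigned char mem[kSize] = {};
    std::size_t esp = kSize;                 // empty stack: ESP at the top

    void push(std::uint32_t value) {
        assert(esp >= 4);                    // no room left: stack overflow
        esp -= 4;                            // 1. ESP is decremented by 4
        std::memcpy(&mem[esp], &value, 4);   // 2. the value is stored at [ESP]
    }

    std::uint32_t pop() {
        assert(esp + 4 <= kSize);            // nothing left to pop
        std::uint32_t value;
        std::memcpy(&value, &mem[esp], 4);   // 1. the value is read from [ESP]
        esp += 4;                            // 2. ESP is incremented by 4
        return value;
    }
};
```

Pushing 5 and then 0x13B0C032 and popping twice returns 0x13B0C032 first and 5 second, and leaves `esp` back at its starting value: the last-in, first-out behaviour the pop example relies on.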

2.4. Limitations in Windows

If you have written assembly programs under MS-DOS, where there were no such limitations, you will need to be aware of some differences under Windows. As I said before, in assembly we can use all the instructions that the CPU supports; however, some instructions are not permitted by the operating system, in our case Windows. For example, if we use I/O port instructions, the compiler will not report an error, but the program will most likely crash when these instructions are executed under Windows.

Instructions that can cause the program to be terminated include the above-mentioned I/O port instructions, as well as instructions that refer to interrupts, segment registers and control registers.

Regarding the segment registers, Windows uses the flat memory model, which means that all code and data exist in the same memory space, ranging from 0 up to 0xFFFFFFFF. So, when accessing memory, there is no need to bother with segment registers: unlike in MS-DOS, there is no need to use segment prefixes like DS:.

3. Using assembly language

To take advantage of assembly, you must first check whether your development tools allow its use. Products such as Borland Delphi, C++Builder, Watcom C++ or Microsoft Visual C++ allow you to use (compile) assembly code; Visual Basic is the only popular RAD package that does not allow writing code in assembly. These products support assembly code in two ways. The first is called inline assembly, where the assembly code is inserted into the regular code written in e.g. C++. The second method is linking modules (i.e. separate files) written in assembly with modules written e.g. in Delphi or C++.

3.1. Inline assembly

Before you start writing assembly code, you must check how to write it, because there are two types of syntax for assembly code. The first type is called "Intel syntax", and is used in products such as Delphi, C++Builder, MSVC, Borland TASM and Microsoft MASM (assemblers). This syntax is now the standard and is used in about 90% of sources. The second type is called "AT&T syntax", and is used e.g. in C compilers such as GCC (Linux platform), DJGPP and LCC.
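For comparison, here is a single instruction written in AT&T syntax, using the extended inline assembly of GCC and Clang (so this sketch does not apply to MSVC, Delphi or C++Builder). In Intel syntax the instruction would read `add eax,ecx`; AT&T puts the source first and the destination last, prefixes registers with `%`, and encodes the operand size in a mnemonic suffix (`l` for 32 bits). The function name `add_att` is hypothetical, and a plain C++ fallback is compiled on non-x86 targets.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: adds two 32-bit values with one x86 'add' instruction
// written in AT&T syntax. %0 refers to 'a' (read and written), %1 to 'b'.
inline std::uint32_t add_att(std::uint32_t a, std::uint32_t b) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    // AT&T "addl source, destination" == Intel "add destination, source"
    __asm__("addl %1, %0" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;  // portable fallback for other compilers or CPUs
#endif
}
```

Like the real instruction, the addition wraps around at 32 bits, so `add_att(0xFFFFFFFF, 1)` yields 0.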

Inline assembly is the easiest way to write asm code. In Delphi, the assembly code must be enclosed between the asm keyword, which marks the start of the assembly code, and the end; keyword after the code. For example:

    // our first 'hello world' in assembly, Delphi version
    asm                          // start of assembly code

        mov    eax,1             // move the value 0x00000001 into register EAX
                                 // the C++ equivalent of this instruction is the
                                 // assignment operator '=', e.g.
                                 // x = 1;
                                 // the Delphi equivalent is the assignment
                                 // operator ':=', e.g.
                                 // y := 1;

        mov    ecx,eax           // move the contents of register EAX into
                                 // register ECX, that is, the value 0x00000001
                                 // will end up in ECX

        shl    ecx,2             // this 'Shift Left' instruction will shift the
                                 // contents of register ECX to the left by 2 bits
                                 // As you may know, left shifting serves to
                                 // multiply values by successive powers of two
                                 // Shifting 0x00000001 to the left by two bits
                                 // will result in the value 0x00000001 * 4 = 0x00000004
                                 // saved to ECX
                                 // in C++, bit shifts are accomplished with the '<<'
                                 // operator, e.g.
                                 // x = y << 2;
                                 // in Delphi, bit shifts use the same keywords
                                 // as assembly code, namely 'shl' or 'shr', e.g.
                                 // x := y shl 2;

        shr    eax,1             // this 'Shift Right' instruction will shift the
                                 // EAX register to the right by 1 bit

        and    eax,0             // 'And' is a logical multiplication of bits
                                 // according to the following table:
                                 // 0 * 0 = 0
                                 // 1 * 0 = 0
                                 // 0 * 1 = 0
                                 // 1 * 1 = 1
                                 // Any value multiplied by 0 gives 0; in this
                                 // case, the EAX register will be zeroed out
                                 // The C++ equivalent of this instruction is
                                 // the '&' operator, e.g.
                                 // x = y & 0;
                                 // in Delphi:
                                 // x := y and 0;

        or     eax,0FFFFFFFFh    // 'Or' is a logical sum of bits according
                                 // to the following table:
                                 // 0 + 0 = 0
                                 // 1 + 0 = 1
                                 // 0 + 1 = 1
                                 // 1 + 1 = 1
                                 // in this case EAX will be ORed with the value
                                 // 0xFFFFFFFF, which results in the value
                                 // 0xFFFFFFFF no matter what EAX contains
                                 // The C++ equivalent of this operation is the
                                 // '|' operator, e.g.
                                 // x = y | 0xFFFFFFFF;
                                 // in Delphi:
                                 // x := y or $FFFFFFFF;

        sub    edx,edx           // 'Subtract' subtracts the value of one register
                                 // from another. In this case, EDX becomes zero
                                 // The C++ equivalent is '-', e.g.
                                 // x = x - x;

        xor    eax,eax           // 'exclusive Or' follows this table:
                                 // 0 ^ 0 = 0
                                 // 1 ^ 0 = 1
                                 // 0 ^ 1 = 1
                                 // 1 ^ 1 = 0
                                 // This operation yields 1 when its two inputs are
                                 // different; if they are the same it gives 0
                                 // Hence the instruction 'xor eax,eax' will zero
                                 // out the EAX register
                                 // The C++ equivalent is the '^' operator, e.g.
                                 // x = x ^ y;
                                 // in Delphi:
                                 // x := x xor y;
    end;                         // end of assembly code
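The comments in the Delphi example map each instruction onto a C++ operator. The sketch below replays the same sequence on 32-bit unsigned variables named after the registers (the struct `Regs` and the function `replay` are illustrative names, not part of any API), so the value each instruction leaves behind can be checked directly.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-ins for the three registers used in the example.
struct Regs { std::uint32_t eax, ecx, edx; };

// Replays the Delphi instructions with C++ operators.
// 'edx_in' stands for whatever EDX held before the block ran.
inline Regs replay(std::uint32_t edx_in) {
    Regs r{0, 0, edx_in};
    r.eax = 1;             // mov eax,1
    r.ecx = r.eax;         // mov ecx,eax
    r.ecx <<= 2;           // shl ecx,2         -> 1 * 4 = 4
    r.eax >>= 1;           // shr eax,1         -> 1 >> 1 = 0
    r.eax &= 0;            // and eax,0         -> always 0
    r.eax |= 0xFFFFFFFFu;  // or eax,0FFFFFFFFh -> always 0xFFFFFFFF
    r.edx -= r.edx;        // sub edx,edx       -> always 0
    r.eax ^= r.eax;        // xor eax,eax       -> always 0
    return r;
}
```

Whatever EDX held beforehand, the block ends with ECX = 4 and both EAX and EDX equal to zero.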

Writing inline assembly in MSVC differs only in how the assembly code is introduced to the compiler, namely with the __asm keyword followed by a block in curly braces:

    // our second 'hello world' in assembly
    __asm {                      // start of assembly code

        push   5                 // save the value 0x00000005 on the stack
        pop    eax               // remove 0x00000005 from the stack and write
                                 // it to register EAX

        push   eax               // save the contents of register EAX on the stack
                                 // (in this case the value 5)
        pop    edx               // remove the value 5 from the stack and write it
                                 // to register EDX

        mov    ax,0FFFFh         // write the value 0FFFFh to the 16-bit lower
                                 // half of register EAX
        mov    dx,ax             // write the value from register AX to the 16-bit
                                 // lower half of register EDX
        mov    al,11             // write the value 11 (decimal) to the 8-bit
                                 // lower half of register AX
        mov    ah,11h            // write the value 11 (hex) to the 8-bit upper
                                 // half of register AX, which is 17 in decimal
    }                            // end of assembly code
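Because AX, AH and AL are just parts of EAX, writing to a smaller register changes only the corresponding bits of the full 32-bit register. The sketch below models that with bit masks; the helpers `set_ax`, `set_al`, `set_ah`, `get_ax` and `replay_halves` are hypothetical names. Replaying the MSVC example this way shows that it ends with EAX = 0x0000110B and EDX = 0x0000FFFF.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers: writing a 16-bit or 8-bit sub-register leaves the
// other bits of the 32-bit register unchanged.
inline std::uint32_t set_ax(std::uint32_t reg, std::uint16_t v) {
    return (reg & 0xFFFF0000u) | v;                         // bits 0-15
}
inline std::uint32_t set_al(std::uint32_t reg, std::uint8_t v) {
    return (reg & 0xFFFFFF00u) | v;                         // bits 0-7
}
inline std::uint32_t set_ah(std::uint32_t reg, std::uint8_t v) {
    return (reg & 0xFFFF00FFu) | (std::uint32_t(v) << 8);   // bits 8-15
}
inline std::uint16_t get_ax(std::uint32_t reg) {
    return static_cast<std::uint16_t>(reg & 0xFFFFu);
}

// Replays the MSVC example, leaving the final register values in eax/edx.
inline void replay_halves(std::uint32_t& eax, std::uint32_t& edx) {
    eax = 5;                          // push 5 / pop eax
    edx = eax;                        // push eax / pop edx
    eax = set_ax(eax, 0xFFFF);        // mov ax,0FFFFh -> 0x0000FFFF
    edx = set_ax(edx, get_ax(eax));   // mov dx,ax     -> 0x0000FFFF
    eax = set_al(eax, 11);            // mov al,11     -> 0x0000FF0B
    eax = set_ah(eax, 0x11);          // mov ah,11h    -> 0x0000110B
}
```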

3.2. Using variables in assembly

When writing in assembly, you have access to all global variables, and if the code is inside a procedure, it also has access to the local variables and parameters of that procedure/function, so its capabilities are practically the same as those of normal code. An example of the use of global and local variables:

    // global variables
    var
        ByteVar: Byte;           // byte - 8 bits
        WordVar: Word;           // word - 16 bits
        IntVar: Integer;         // double word - 32 bits

    ...

    procedure noop;
    // local variables of procedure 'noop'
    var
        LocalByte: Byte;
        LocalWord: Word;
        LocalInt: Integer;
    begin
        // initialise global variables
        ByteVar := $FF;          // 8-bit value
        WordVar := $FFFF;        // 16-bit value
        IntVar  := $FFFFFFFF;    // 32-bit value

        asm
            mov    al,ByteVar    // write an 8-bit value to an 8-bit register
            mov    LocalByte,al  // write an 8-bit value to a local variable

            mov    ax,WordVar    // 16-bit value to 16-bit register
            mov    LocalWord,ax

            mov    eax,IntVar    // 32-bit value to 32-bit register
            mov    LocalInt,eax
        end;
    end;

The example for MSVC is not much different from the Delphi one:

    // global variables
    char ByteVar;
    short WordVar;
    int IntVar;

    ...

    void noop()
    {
        // local variables
        char LocalByte;
        short LocalWord;
        int LocalInt;

        // initialise global variables
        ByteVar = 0xFF;          // 8-bit value
        WordVar = 0xFFFF;        // 16-bit value
        IntVar  = 0xFFFFFFFF;    // 32-bit value

        __asm {
            mov    al,ByteVar    // write an 8-bit value to an 8-bit register
            mov    LocalByte,al  // write an 8-bit value to a local variable

            mov    ax,WordVar    // 16-bit value to 16-bit register
            mov    LocalWord,ax

            mov    eax,IntVar    // 32-bit value to 32-bit register
            mov    LocalInt,eax
        }
    }

You can write entire functions in assembly language. When doing this, there are a few things to keep in mind. If the function returns a value, we must ensure that the return value is stored in the EAX register before leaving the function. A simple example:

    // Delphi version
    function add(x, y:integer):integer;
    asm
        mov    ecx,y             // copy the function's second parameter to ECX
        mov    edx,x             // copy the function's first parameter to EDX
                                 // (y is copied first: Delphi's 'register'
                                 // convention passes x in EAX and y in EDX, so
                                 // writing EDX first would overwrite y)
        add    edx,ecx           // add x and y together
        mov    eax,edx           // write the result to register EAX
                                 // this becomes the function's return value
    end;

    // C++ version
    int mult(int x, int y)
    {
        __asm {
            mov    edx,x         // copy the function's first parameter to EDX
            mov    ecx,y         // copy the function's second parameter to ECX
            imul   edx,ecx       // multiply x by y
            mov    eax,edx       // write the result to register EAX
                                 // this becomes the function's return value
        }
    }
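Compilers other than MSVC and Delphi express the same rule differently. In GCC and Clang extended inline assembly, the output constraint `"=a"` tells the compiler that a result lives in EAX, which is also where the x86 calling conventions return an `int`. The sketch below is a GCC/Clang-only variant of `mult` under that assumption (the name `mult_gnu` is hypothetical), with a plain C++ fallback for non-x86 targets.

```cpp
#include <cassert>

// Hypothetical GCC/Clang variant of 'mult': multiplies x by y with 'imul'
// and takes the result from EAX.
inline int mult_gnu(int x, int y) {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    int result;
    // "=a"(result): the output is in EAX; "0"(x): x is loaded into that same
    // register; "r"(y): y in any general-purpose register; "cc": flags change
    __asm__("imull %2, %0" : "=a"(result) : "0"(x), "r"(y) : "cc");
    return result;
#else
    return x * y;  // portable fallback
#endif
}
```

Here the compiler, not the programmer, moves the value out of EAX afterwards, which is why no explicit `mov eax,...` is needed.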

We already know that functions written in assembly must place the return value in the EAX register, but what about the other registers?

In short, registers EAX, EDX and ECX may contain any value when the function exits, but registers EDI, ESI, EBX and EBP generally must not change (their values must be the same as they were before the call). You may wonder why this is the case. Well, the code produced by compilers of HLLs (high-level languages) uses this second group of registers throughout the program to hold e.g. addresses of functions, constants, etc., and if a function changes them, code that runs later may use invalid values, which can cause anything from data corruption to a crash. It is easy to prevent such errors:

    // Delphi version
    function count(w,x,y,z:integer):integer;
    asm
        push   edi               // save the contents of registers EDI, ESI and EBX
        push   esi               // on the stack
        push   ebx

        mov    edi,w             // copy each function parameter to a register
        mov    esi,x
        mov    edx,y
        mov    ebx,z

        add    edi,esi           // w + x
        add    edx,ebx           // y + z

        imul   edi,edx           // (w+x) * (y+z)

        xchg   eax,edi           // 'exchange' swaps the contents of two registers,
                                 // in this case EAX and EDI; in other words,
                                 // the old value of EAX is now in EDI, and the
                                 // old value of EDI is now in EAX, which becomes
                                 // the function's return value

        pop    ebx               // Remove the saved values of the registers from
        pop    esi               // the stack, and put them back in the registers
        pop    edi               // We must remove the values in reverse order:
                                 // looking at the code we can see that it is
                                 // 'symmetrical'. If the values were saved in the
                                 // order EDI, ESI, EBX, then they must be removed
                                 // in the order EBX, ESI, EDI
    end;

In addition to the registers EDI, ESI, EBX and EBP, the direction flag DF is expected to be zero (cleared) before and after any call. If your function changes it (e.g. with the STD instruction), clear it again with the CLD instruction before returning.

When writing assembly code that uses the stack, special attention should be paid to ensuring that the stack pointer ESP is always restored. E.g. if a procedure or function stores something on the stack, that item must be removed before the function exits. This time we'll look at an example in MSVC:

    // example of an encryption function
    void crypt(unsigned char *string)
    {
        __asm {
            push   edx               // save the contents of register EDX on the stack
            mov    edx,string        // take the parameter from the stack; in this case
                                     // a pointer to the string we must encrypt

            cmp    edx,0             // check whether the parameter is valid
            je     _exit_encrypt     // if invalid, leave the function

        _encrypt_loop:

            mov    al,byte ptr[edx]  // load the next byte of the string
            cmp    al,0              // check for the end of the string
                                     // strings are represented as ASCII; byte 00h
                                     // means end-of-string

            je     _exit_encrypt     // once we reach the end of the string, leave

            xor    al,7              // encrypt the byte with a simple xor
            mov    byte ptr[edx],al  // store the encrypted byte in the string
            inc    edx               // set the string pointer to the next byte

            jmp    _encrypt_loop     // go to the start of the loop so that the
                                     // process repeats

        _exit_encrypt:

            pop    edx               // Important: correct the stack, and restore the
                                     // register EDX to its original value
        }
    }
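For reference, the same loop in plain C++ might look like this (the name `crypt_c` is hypothetical). Because XOR-ing with the same key twice gives the original byte back, running the function a second time decrypts the string. One caveat applies to both versions: a byte equal to 7 becomes 0, which would end the string early on the next pass.

```cpp
#include <cassert>

// Hypothetical C++ counterpart of 'crypt': XOR every byte of a
// zero-terminated string with 7, in place.
inline void crypt_c(unsigned char* s) {
    if (s == nullptr) return;       // cmp edx,0 / je _exit_encrypt
    for (; *s != 0; ++s) {          // cmp al,0 / je _exit_encrypt, inc edx
        *s ^= 7;                    // xor al,7 / mov byte ptr[edx],al
    }
}
```

For example, "abc" becomes "fed", and a second call turns it back into "abc".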

3.3. Calling functions from assembly

Sometimes in assembly code you will need to call a function written in another language. How is this done? Very simply: a function is called with the instruction call func_name. It is worth noting, however, that there are several conventions for passing parameters to a function and "cleaning up" after it:

Name	In C code	Parameters	Return values	Modified registers	Info
cdecl	__cdecl	passed on the stack; the parameters are not removed by the function (the caller removes them)	eax; 8-byte values in edx:eax	eax, ecx, edx, st(0)-st(7), mm0-mm7, xmm0-xmm7	The default convention of C compilers, used by C library functions. On 32-bit Linux, system library functions also use this convention
fastcall	__fastcall	ecx, edx; any remaining parameters are passed on the stack	eax; 8-byte values in edx:eax	eax, ecx, edx, st(0)-st(7), mm0-mm7, xmm0-xmm7	Microsoft introduced this standard, but later switched to the cdecl convention in its products
watcom	__declspec (wcall)	eax, ebx, ecx, edx	eax; 8-byte values in edx:eax	eax	This calling convention was introduced by Watcom in their C++ compiler
stdcall	__stdcall	passed on the stack; the parameters are removed by the function	eax; 8-byte values in edx:eax	eax, ecx, edx, st(0)-st(7), mm0-mm7, xmm0-xmm7	The default calling convention for Windows API functions in DLLs
register	n/a	eax, edx, ecx; any remaining parameters are passed on the stack	eax	eax, ecx, edx, st(0)-st(7), mm0-mm7, xmm0-xmm7	This is the calling convention used by Borland's Delphi

The correct calling convention for functions in our own programs (as opposed to the WinApi) often depends on the options with which the program was compiled. In Delphi the default convention is "register", while for most programs written in C the default is "cdecl".

WinApi functions (Windows system functions) use the stdcall convention: the function parameters are first stored on the stack, and then the function is called. After the function returns, there is no need to adjust the stack (remove the previously saved parameters), since the called function does it for us. Interestingly, a few WinApi functions do not use stdcall but cdecl instead: the parameters are stored on the stack, then the function is called, but afterwards the stack must be cleaned up manually. An example of such a function is wsprintfA from the Windows system library user32.dll (whose counterpart in the C standard library is sprintf). cdecl was probably chosen because these functions do not have a fixed number of parameters:

    // global string
    unsigned char title[] = "The values of x and y";

    ...

    // this function converts the values x and y into ASCII form, after which
    // a message box is displayed showing x and y as text
    unsigned int int2str(unsigned char *buffer, unsigned int x, unsigned int y)
    {
        // local string, accessible only by the function int2str
        unsigned char format[] = "x = %lu\ny = 0x%X\n";

        __asm {

            // Note the way in which the parameters of the function are passed.
            // In C++, the function call would look like this:
            // wsprintf(buffer, "x = %lu\ny = 0x%X\n", x, y);
            // In assembly the parameters are pushed onto the stack in reverse
            // order, after which the function is called.

            push    y            // save y on the stack
            push    x            // save x on the stack
            lea     eax,format   // load the address of the local string into EAX
            push    eax          // save the address of this string on the stack
            push    buffer       // save the pointer to the output buffer, where
                                 // the formatted text will end up
            call    wsprintfA    // call this WinApi function
            add     esp,4*4      // clean up the stack - 4*4 = 16 bytes. This is
                                 // how much space was taken by the parameters
                                 // saved on the stack before the function was called
                                 // When writing code e.g. in C++, the compiler
                                 // takes care of this for you, but in assembly you
                                 // must do this yourself

            push    MB_ICONINFORMATION // specifies the icon that will appear
                                 // next to the text in the message box
            push    offset title // the window title (a global variable); we use
                                 // the keyword 'offset' because we want to write
                                 // the address of the string to the stack
            push    buffer       // the text which will appear in the message box
            push    0            // handle of the parent window
            call    MessageBoxA  // show the message box; its result stays in EAX,
                                 // which MSVC uses as int2str's return value
        }
    }
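Why cdecl for functions like wsprintfA? Only the caller knows how many arguments it pushed, so only the caller can remove them. The C++ sketch below (the function `sum_ints` is hypothetical) has the same shape: the callee learns the argument count from its first parameter, much as wsprintfA learns it from the format string. On 32-bit x86, compilers call such variadic functions with the caller-cleans-up cdecl convention.

```cpp
#include <cassert>
#include <cstdarg>

// Hypothetical variadic function: adds 'count' int arguments. Its signature
// alone does not say how many values follow; the caller supplies 'count'.
inline int sum_ints(int count, ...) {
    va_list args;
    va_start(args, count);
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += va_arg(args, int);   // read the next argument
    }
    va_end(args);
    return total;
}
```

A call such as `sum_ints(3, 1, 2, 3)` returns 6; the caller, which pushed four values, is the one that knows how much stack to release afterwards.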

4. MMX instructions

MMX is the name of an extension to the Pentium series of processors, introduced by Intel. The name is said to be an abbreviation of "MultiMedia eXtensions", but Intel denies this and has never explained it. The MMX extension includes a set of new instructions (57, to be exact) and eight 64-bit registers (mm0 - mm7).

The MMX registers are shared with the FPU registers. This means that you cannot freely mix FPU (Floating Point Unit) instructions with MMX instructions: before switching from MMX code back to FPU code, the EMMS instruction must be executed, otherwise the contents of the registers will be corrupted. MMX instructions operate on data in SIMD fashion (Single Instruction, Multiple Data). This means that one operation can be performed simultaneously on several data items, which is not possible using standard x86 instructions.

MMX instructions are ideal for processing multimedia data, e.g. video, graphics and sound. For example, programs such as DivX or Winamp make intensive use of MMX code. Currently, nearly all processors produced by Intel, AMD and Cyrix support MMX.

Although MMX has been practically standard for quite a few years, HLL compilers generally do not generate MMX code (except specialised compilers like VectorC). The natural solution, then, is to program MMX in assembly.

Writing procedures with MMX can sometimes give a 100% speed increase compared to the original code. This is possible precisely because of the SIMD model. Imagine a situation where we have two tables of 8 bytes, and we want to add the corresponding bytes of both tables together. In C++ we would do it this way:

    unsigned char table1[] = { 0x0A,0x1A,0x2A,0x3A,0x4A,0x5A,0x6A,0x7A };
    unsigned char table2[] = { 0xA7,0xA6,0xA5,0xA4,0xA3,0xA2,0xA1,0xA0 };

    ...

    for (int i = 0; i < 8; i++) {
        table1[i] += table2[i];
    }

There's nothing wrong with this, but the addition is performed 8 separate times. Let's look at how this can be done much more efficiently using MMX:

    __asm {
        movq   mm0,qword ptr[table1]    // load 8 bytes from the first table
                                        // into register MM0
        movq   mm1,qword ptr[table2]    // 8 bytes from the second table into MM1
        paddb  mm0,mm1                  // add the bytes of MM1 to those of MM0
        movq   qword ptr[table1],mm0    // write the result back to table1
    }
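To see exactly what `paddb` computes, it can be emulated without MMX. The sketch below packs the eight bytes into one 64-bit integer and adds them lane by lane so that, as with `paddb`, each byte wraps around on overflow and no carry leaks into the neighbouring byte. This "SIMD within a register" trick is a portable illustration, not a replacement for the MMX code; the names `paddb_emulated` and `add_tables` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical emulation of 'paddb': adds 8 pairs of bytes at once,
// with per-byte wrap-around.
inline std::uint64_t paddb_emulated(std::uint64_t a, std::uint64_t b) {
    const std::uint64_t low7 = 0x7F7F7F7F7F7F7F7Full;  // low 7 bits of each byte
    const std::uint64_t high = 0x8080808080808080ull;  // top bit of each byte
    // Add the low 7 bits of every byte (the sum never crosses a byte boundary),
    // then fold in the top bits with XOR so the carry out of each byte is dropped.
    return ((a & low7) + (b & low7)) ^ ((a ^ b) & high);
}

// Same effect as the MMX block: table1[i] += table2[i] for i = 0..7.
inline void add_tables(unsigned char table1[8], const unsigned char table2[8]) {
    std::uint64_t a, b;
    std::memcpy(&a, table1, 8);   // like movq mm0,qword ptr[table1]
    std::memcpy(&b, table2, 8);   // like movq mm1,qword ptr[table2]
    a = paddb_emulated(a, b);     // like paddb mm0,mm1
    std::memcpy(table1, &a, 8);   // like movq qword ptr[table1],mm0
}
```

With the two tables from the C++ example, `add_tables` leaves table1 = { 0xB1,0xC0,0xCF,0xDE,0xED,0xFC,0x0B,0x1A }; the last two bytes show the wrap-around.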

In the MMX version, only one addition instruction (paddb) is executed instead of 8 separate additions. Neat, isn't it? And, more importantly, efficient. Here are a few examples of graphical functions:

            #define IMG_WIDTH 640 #define IMG_HEIGHT 320  ...  // // this function initialises the MMX unit // information technology should be chosen: // - before using the MMX unit for the first time // - afterwards using MMX when nosotros intend to make use of the FPU // - after using the FPU when we intend to make use of MMX // void InitMMX() {     __asm emms;                      // Empty MultiMedia State; }                                    // initialises the MMX unit  // // a fadeout effect of the screen (fullscreen) // void fadeout(DWORD *lpScreen,DWORD iRounds) {     __asm {          mov          edx,iRounds     // load the total number of repetitions          mov          eax,03030303h   // mask for each component of a pixel;                                      // reducing the value of each RGB                                      // component gives the impression of a                                      // fading prototype          movd         mm0,eax         // transfer the mask to the lower half                                      // of annals MM0          punpckldq    mm0,mm0         // copy the mask to the upper half of MM0                                      // such that its full value becomes                                      // 0x0303030303030303                                      // (call back that MM0 is a 64-bit register)          pxor         mm1,mm1         // cipher out register MM1      _fadeout_max:          paddb        mm1,mm0         // multiply the mask, which will be                                      // subtracted from the components of         dec          edx             // pixels by the number of rounds         jne          _fadeout_max    //          mov          eax,lpScreen    // load the arrow to the image buffer                                      // into register EAX                                       // the number of pixels divided by 2                                      // we divide by two because past 
                                     // using MMX we can process two pixels
                                     // simultaneously (MM1 is an 8-byte
                                     // register, but each pixel is only
                                     // 4 bytes)
        mov          ecx,(IMG_WIDTH*IMG_HEIGHT) / 2

    _clear_screen_2_mmx:
                                     // load two pixels from the image buffer
                                     // into MM0
        movq         mm0,qword ptr[eax]
        psubusb      mm0,mm1         // subtract our mask from all components
                                     // (bytes) of those 2 pixels, clamping
                                     // at 0; both the mask and the pixels
                                     // are treated as arrays of 8 separate
                                     // bytes, SIMD-style

                                     // write the two modified pixels back to
                                     // the image buffer
        movq         qword ptr[eax],mm0

        add          eax,8           // update the pointer to the image buffer,
                                     // ready for the next 2 pixels
        dec          ecx             // decrement the loop counter (the loop
                                     // repeats for the number of pixels / 2)
        jne          _clear_screen_2_mmx

        emms                         // clear the MMX state so that x87 FPU
                                     // code can run safely afterwards
    }
}

//
// image negative effect
//
void negative(DWORD *lpScreen)
{
    __asm {
        mov          eax,lpScreen    // load the pointer to the image buffer
                                     // into EAX

                                     // write the pixel count / 4 into ECX,
                                     // since we process four pixels per pass
        mov          ecx,(IMG_WIDTH*IMG_HEIGHT) / 4

        pcmpeqb      mm7,mm7         // set register MM7 to 0xFFFFFFFFFFFFFFFF

    _neg_mmx:
                                     // load 2 pixels from the image into MM0
        movq         mm0,qword ptr[eax]
        pxor         mm0,mm7         // XOR-ing with all 1s works like the
                                     // logical 'NOT' function
        movq         qword ptr[eax],mm0

                                     // repeat with the next 2 pixels
        movq         mm0,qword ptr[eax+8]
        pxor         mm0,mm7
        movq         qword ptr[eax+8],mm0

        add          eax,16          // update the pointer to the image
        dec          ecx             // and the loop counter
        jne          _neg_mmx

        emms                         // clear the MMX state for the FPU
    }
}

//
// image blur effect
//
void blur(DWORD *lpScreen)
{
    __asm {
        push         esi             // save registers ESI and EDI
        push         edi

        mov          esi,lpScreen    // load the pointer to the image buffer
                                     // into ESI

        mov          ecx,( (IMG_WIDTH*IMG_HEIGHT) - (IMG_WIDTH*8) + 4 )
        mov          eax,IMG_WIDTH*4 // the width of a line in the image
        mov          edx,IMG_WIDTH*8 // the width of two lines

        lea          esi,[esi+eax+4] // set the pointer to the second pixel
                                     // of the second line of the image, so
                                     // that it has a left neighbour

        pxor         mm7,mm7         // zero out MM7
        movd         mm0,[esi-4]     // read pixel to the left into MM0

    _blur_more:
        movd         mm1,[esi+4]     // read pixel to the right into MM1

        mov          edx,esi
        sub          edx,eax
        movd         mm2,[edx]       // read pixel above into MM2

        movd         mm3,[esi+eax]   // read pixel below into MM3

        punpcklbw    mm0,mm7         // unpack the components of the 4
        punpcklbw    mm1,mm7         // neighbouring pixels into WORDs
        punpcklbw    mm2,mm7
        punpcklbw    mm3,mm7

        paddusw      mm0,mm1         // add the components of the 4 pixels
        paddusw      mm0,mm2
        paddusw      mm0,mm3

        psrlw        mm0,2           // divide this sum by 4; in this way
                                     // we find the 'average' of the 4 pixels
        packuswb     mm0,mm7         // pack the components (each of which is
                                     // a WORD) back into a single DWORD
        movd         [esi],mm0       // write the pixel to the image buffer

        add          esi,4
        dec          ecx
        jne          _blur_more

        emms                         // clear the MMX state for the FPU

        pop          edi
        pop          esi
    }
}
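To check what the MMX loops compute, the same per-byte operations can be written in plain, portable C++. This is a minimal sketch, not the original code: the names `clear_screen_ref`, `negative_ref` and `blur_ref`, the `std::uint32_t` pixel type and the explicit `width`/`height` parameters (standing in for `IMG_WIDTH`/`IMG_HEIGHT`) are assumptions for illustration. It mirrors the saturating subtraction of `psubusb`, the XOR with all ones of `pxor`, and the four-neighbour average of the blur loop, including its use of the already-blurred left and upper neighbours. One simplification: it skips the left and right border columns instead of wrapping from the end of one line to the start of the next, as the linear MMX loop does.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar reference for the MMX routines above.
// Each 32-bit pixel is treated as 4 independent bytes, as MMX does.

// Like PSUBUSB: subtract each mask byte from each pixel byte,
// clamping at 0 instead of wrapping around.
void clear_screen_ref(std::vector<std::uint32_t>& img, std::uint32_t mask) {
    for (std::uint32_t& px : img) {
        std::uint32_t out = 0;
        for (int shift = 0; shift < 32; shift += 8) {
            int a = static_cast<int>((px >> shift) & 0xFF);
            int b = static_cast<int>((mask >> shift) & 0xFF);
            int d = a - b;
            if (d < 0) d = 0;  // unsigned saturation
            out |= static_cast<std::uint32_t>(d) << shift;
        }
        px = out;
    }
}

// Like PXOR with an all-ones register: bitwise NOT of every byte.
void negative_ref(std::vector<std::uint32_t>& img) {
    for (std::uint32_t& px : img) px ^= 0xFFFFFFFFu;
}

// Like the blur loop: each inner pixel becomes the per-byte average of its
// left, right, upper and lower neighbours. Pixels are updated in place, so
// the left and upper neighbours have already been blurred.
void blur_ref(std::vector<std::uint32_t>& img, std::size_t width, std::size_t height) {
    for (std::size_t y = 1; y + 1 < height; ++y) {
        for (std::size_t x = 1; x + 1 < width; ++x) {
            std::size_t i = y * width + x;
            const std::uint32_t n[4] = {img[i - 1], img[i + 1],
                                        img[i - width], img[i + width]};
            std::uint32_t out = 0;
            for (int shift = 0; shift < 32; shift += 8) {
                unsigned sum = 0;  // at most 4 * 255, like the WORD lanes
                for (std::uint32_t p : n) sum += (p >> shift) & 0xFF;
                out |= static_cast<std::uint32_t>(sum / 4) << shift;  // PSRLW by 2
            }
            img[i] = out;
        }
    }
}
```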

5. When to use assembly

As I mentioned at the beginning of the article, assembly is used mainly where speed is important. When writing an algorithm, we should sometimes stop and ask ourselves whether our program could be improved if, at some critical points (for instance in loops), we were to use, say, MMX.

Imagine that you have just written an MP3 encoder, and a competitor did the same, but you used hand-written MMX code which is three times faster than the competitor's. Which product will users choose, when they can complete a task in 10 minutes instead of 30? The answer is obvious.

Besides being ideal for writing algorithms that require speed, assembly is also used to write special-purpose programs such as EXE compressors. I'll bet that most people will think of programs like UPX or Aspack, which are used to shrink executables. Put simply, if you write a program which occupies, let's say, 700 kB, when compressed by UPX its size will decrease to approx. 300 kB, but the program will still be an EXE file, and will be just as functional as before compression. This is achieved by using assembly to write a loader for the code. The loader is a fragment of code stored in the EXE file (much like a virus); when you start such a program, the loader decompresses the rest of the EXE file and allows it to run. Writing a loader in a high-level language (HLL), whether it be C++, Delphi or even Power Basic, is virtually impossible.
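The decompression step that such a loader performs can be illustrated in portable C++. This is a toy sketch under stated assumptions: `pack` and `unpack` are hypothetical names, and run-length encoding is swapped in as a stand-in compressor purely for readability. It is not the algorithm UPX or Aspack use, and it leaves out everything that makes a real loader hard to write in a HLL: running without a runtime library, rebuilding the import table, applying relocations and jumping to the original entry point.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy "packer": encode the payload as (count, byte) pairs, count <= 255.
// RLE is only a stand-in for the much stronger compressors real packers use.
std::vector<std::uint8_t> pack(const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> out;
    std::size_t i = 0;
    while (i < payload.size()) {
        const std::uint8_t value = payload[i];
        std::size_t run = 1;
        while (i + run < payload.size() && payload[i + run] == value && run < 255)
            ++run;
        out.push_back(static_cast<std::uint8_t>(run));
        out.push_back(value);
        i += run;
    }
    return out;
}

// Toy "loader" step: rebuild the original bytes in memory before anything
// else runs. A real EXE loader would then also fix up imports and
// relocations and jump to the original entry point.
std::vector<std::uint8_t> unpack(const std::vector<std::uint8_t>& packed) {
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i + 1 < packed.size(); i += 2)
        out.insert(out.end(), static_cast<std::size_t>(packed[i]), packed[i + 1]);
    return out;
}
```

Usage: `unpack(pack(data)) == data` for any byte vector; only the packed form would be stored in the EXE.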

It could be said that assembly programming is only useful for speed and unusual applications, but this is not entirely true. Writing in assembly language can be more than just inline routines and a few procedures here and there. Entire programs can be written in assembly language! Sometimes I hear people say that it is impossible; that you can't write big applications in assembly from scratch. Often these are people who have only dabbled in assembly for a few hours. If you are a competent programmer, there is nothing stopping you from building professional applications in assembly language. Writing programs in assembly gives us full control over them. Everything is up to us, the program is executed according to our will, and we are not at the mercy of the compiler.

These days, writing in assembly is reasonably simple and convenient. A lot of people around the globe are starting to see the magic of this language. People are creating many projects, and you can find plenty of tutorials and sample source code, thanks to which many challenges have ceased to be problems. Writing entire applications in assembly also has the advantage that a project with 5 MB of source code can compile to an executable of only about 90 kB. Compare an application written in Delphi 6, containing one window, which takes approx. 300 kB compiled, to a program written in assembly language which does exactly the same thing, and works on every Windows release from 95 to XP, with just a 4 kB executable. Why the big difference? It's simple: the compiler and the runtime library it links in add a lot of code the program never uses, "just in case". Why isn't this made more efficient? We should ask the companies who make compilers.

Although assembly can be used for many useful things, it is also used to write malicious programs, such as viruses, ransomware, or exploits, but in the words of Winnie the Pooh, that is a story for another day...

6. Summary

These examples represent only a small range of what is possible with assembly. There is a lot to discover, just as much for me as there is for you, because contrary to what they say, assembly is not dead; it is constantly changing and evolving, giving us possibilities which do not exist in any high-level language. The terms we hear in the press, SSE, SSE2, 3DNow!, are not fiction. Everything is out there. We just have to reach for it.
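As a small illustration of SSE2, the negative effect from the MMX listing can be rewritten with the SSE2 compiler intrinsics that MSVC, GCC and Clang provide through `<emmintrin.h>`. This is a sketch assuming an x86 or x86-64 target with SSE2; the function name `negative_sse2` and its pointer/count parameters are hypothetical. A 128-bit SSE2 register holds four 32-bit pixels instead of the two that fit in an MMX register, the compiler takes care of register allocation, and because SSE2 integer code does not touch the x87 register file, no `emms` is needed afterwards.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (x86 / x86-64 only)

// Hypothetical SSE2 version of the "negative" effect: invert every byte of
// `count` 32-bit pixels, four pixels per 128-bit register.
void negative_sse2(std::uint32_t* pixels, std::size_t count) {
    const __m128i all_ones = _mm_set1_epi32(-1);  // like PCMPEQB mm7,mm7
    std::size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(pixels + i));
        v = _mm_xor_si128(v, all_ones);           // like PXOR: bitwise NOT
        _mm_storeu_si128(reinterpret_cast<__m128i*>(pixels + i), v);
    }
    for (; i < count; ++i) pixels[i] = ~pixels[i];  // leftover pixels, scalar
}
```

The unaligned load/store intrinsics are used so the sketch works on any buffer; with a 16-byte-aligned image buffer, `_mm_load_si128`/`_mm_store_si128` could be used instead.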

For my part, writing in assembly language gives me a feeling of freedom which I have never found when writing in any other language. I hope that your journey into assembly doesn't end with this article!
