Registers and Stacks in NASM

https://softwareengineering.stackexchange.com/questions/379683

14-02-2021

Question

So, I am more or less voluntarily learning NASM, and I have problems finding sources that really explain it. Unlike with Java or C# I can't just use Google as well, since Assembly just isn't used by many anymore. I need to prepare a presentation about an "Introduction to Assembly/NASM" without explaining code but rather how the language works, including how to roughly access registers, what they do, and how to access the stack and store stuff there.

Now, I got that the names of the registers are E(A-D)X as well as some others, that each register has its own function or something (which, honestly, I haven't really understood; I would appreciate it if someone could explain further, but that is not that necessary for me yet), and that they are what gets "stored"/used by the CPU. But then some sources talk about RAX, and that completely threw me off. Are there more registers than just E[]X, or is that just the syntax of another assembly language?

Second of all, my biggest problem so far: the stack. Nowhere is it properly explained how the stack works, what it does, how to address it, etc. How do I store data on the stack, and what are the names/addresses of the individual stacks? The code also gets stored on the stack; how do I know how many places of the stack get occupied by that? If I understood correctly, you just save everything on top of the stack (ESP shows the highest value saved on the stack, if I am not wrong), and you use that or the base pointer to go where you know you stored the value. E.g., I save 7, 8, 9, Apple, Help. [Does each word (or even number) get its own place on the stack, or is it by letters?] If I want to access 9, it is the 3rd lowest thing on the stack. So if 7 was at 011001 (25), would 9 be at 011011 (27)? Would I need to know that address, or would I say EBP + 2?

Okay, that is all I haven't gotten so far, only the two probably most important elements of Assembly.

However, thank all of you who will be able to help me, I will go on now, trying to make sense of the manual and looking for other information. I will be very happy though if any of you actually know the answer.

Solution

"Unlike with Java or C# I can't just use Google as well, since Assembly just isn't used by many anymore."

I don't think this is accurate: I found dozens of helpful articles and presentations by searching "understanding assembly language".

Further, you will find the search terms x64 and "instruction set" helpful. The following describes additional search terms you might use to dig deeper.

There are many different kinds of CPUs. Each uses an instruction set architecture.

The instruction set architecture describes the various instructions that the CPU can execute. These instructions have encodings and so are stored as bit patterns. A program consists of sequences of instructions that a given CPU can execute. The language of a program using these bit patterns is called machine code.

Assembly language refers to a human-readable version of machine code. Instructions are specified using mnemonic instruction names and operands that can be read and edited as text. Assembly language is translated (assembled) into machine code by a program called an assembler. Assembly language has many features that make the source code more readable and maintainable than machine code. For example, assembly language uses labels, whereas machine code uses offsets. Inserting a new instruction into a program in assembly language is easy, but doing so in machine code is hard, as it will throw off other offsets being used nearby. So, assembly language is much preferred.

The Application Binary Interface (ABI) for a given instruction set architecture (ISA) determines conventions of register and stack usage, such that a function written by one author can "call" a function written by another author (provided they both adhere to the convention). Another useful term here is "calling convention", which is the part of an ABI that specifically describes how parameters are passed from one function to another. Also relevant is the term "stack frame".

It is useful to understand the difference between what is allowed/supported by the hardware (the ISA) and what is allowed/supported by software convention (the ABI).

This Wikipedia article lists the registers on x64 (https://en.wikipedia.org/wiki/X86-64#Architectural_features), and this article illustrates the x86's overlapping register names (https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/x64-architecture).

If you don't want to adhere to the standard conventions (ABI), you can create your own, which is often done in simple programs by students learning to write assembly language.

Beyond these search terms, you should consider writing some simple C programs that illustrate your questions, compile them (with and/or without optimization) and look at the compiler output as disassembly or in a debugger to see how the instruction set is used to manipulate data.
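As a rough sketch of that workflow, here is a tiny C function you could compile and inspect. The function name `add_three` and the file name `add_three.c` are made up for illustration. Assuming GCC on x86/x64, something like `gcc -O0 -S -masm=intel add_three.c` writes the generated assembly to `add_three.s`; repeating with `-O2` shows how differently the same code can use registers and the stack. Note that the output is in GNU assembler syntax (Intel flavor), which is close to NASM syntax but not identical.

```c
#include <stdint.h>

/* Hypothetical example: a few parameters and one local variable, so the
   compiler has values it must place in registers or on the stack. */
int32_t add_three(int32_t a, int32_t b, int32_t c)
{
    int32_t partial = a + b;   /* without optimization, usually a stack slot */
    return partial + c;        /* the result is handed back in a register */
}
```

Without optimization you will typically see the parameters and `partial` stored in the stack frame and reloaded; with optimization the whole function often collapses to a couple of register-only instructions.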

NASM is a specific assembler for the x86/x64 architecture, but certainly not the only one. Questions specifically about NASM would concern the syntax and expressions you can write in that assembler.

Your question about registers and the stack should be directed more to the instruction set architecture and the calling convention than toward any specific assembler. While the instruction set allows certain operations regardless of the operating system, the ABI differs somewhat between Linux and Windows, so there are some differences in register usage and stack usage.
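To make the Linux/Windows difference concrete, here is a small sketch. The function `scale_and_add` is a made-up example; the register assignments in the comment are those of the two common 64-bit conventions (System V AMD64 and Microsoft x64) for integer arguments.

```c
#include <stdint.h>

/* Hypothetical function with three integer arguments.
 *
 * System V AMD64 (Linux and most Unix-like systems):
 *     a -> RDI, b -> RSI, c -> RDX, return value -> RAX
 * Microsoft x64 (Windows):
 *     a -> RCX, b -> RDX, c -> R8,  return value -> RAX
 *
 * The C source is identical; only the compiled code differs. */
int64_t scale_and_add(int64_t a, int64_t b, int64_t c)
{
    return a * b + c;
}
```

If you write the caller or the callee in assembly yourself, these are the registers you would have to load or read to interoperate with compiled C code on each platform.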

A stack frame can use a single stack pointer or both a stack pointer and a frame pointer. The stack pointer can move during the execution of a function, so the offset of stack allocated variables relative to the stack pointer can change. A frame pointer remains fixed during the execution of a function, and thus variables located in the stack can be referred to by a fixed offset from the frame pointer even as the stack pointer moves from pushing and popping.
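The difference between the two pointers can be illustrated with a toy model in C. This is only a sketch: `ToyStack` and its functions are invented for illustration, the "memory" is an array of 32-bit slots rather than bytes, and there is no bounds checking. It does mirror two real properties of the x86 stack: it grows toward lower addresses, and a push first moves the stack pointer and then stores the value.

```c
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 16

/* Toy model of a downward-growing stack of 32-bit slots (illustrative only). */
typedef struct {
    int32_t mem[STACK_SLOTS];
    size_t  sp;  /* index of the most recently pushed slot (the "top") */
    size_t  bp;  /* frame pointer: stays fixed while the function runs */
} ToyStack;

/* An empty stack: sp starts just past the highest slot. */
void toy_init(ToyStack *s) { s->sp = STACK_SLOTS; s->bp = STACK_SLOTS; }

/* Like PUSH: move sp down one slot, then store the value there. */
void toy_push(ToyStack *s, int32_t v) { s->sp -= 1; s->mem[s->sp] = v; }

/* Like POP: read the top slot, then move sp back up. */
int32_t toy_pop(ToyStack *s) { int32_t v = s->mem[s->sp]; s->sp += 1; return v; }

/* Like "mov ebp, esp": remember where this function's frame starts. */
void toy_enter(ToyStack *s) { s->bp = s->sp; }

/* Like [ebp - 4*n]: the n-th slot pushed after toy_enter (n starts at 1). */
int32_t toy_local(const ToyStack *s, size_t n) { return s->mem[s->bp - n]; }

/* Like [esp + 4*n]: n slots above the current top (n = 0 is the top). */
int32_t toy_from_top(const ToyStack *s, size_t n) { return s->mem[s->sp + n]; }
```

After `toy_enter` and pushing 7, 8, 9, the value 9 is `toy_local(&s, 3)` and also `toy_from_top(&s, 0)`. Push one more value and the frame-pointer offset is still 3, while the stack-pointer offset has changed to 1. In the question's terms: each pushed value gets its own slot, and you address it by an offset from a pointer register, not by knowing its absolute address.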

The frame pointer approach is easier to use, makes debugging easier, and may also support stack unwinding and exception handling. However, it is somewhat less efficient (as it ties up a second register and needs a few extra instructions to save, establish, and restore the frame pointer).

The x86 architecture has a long lineage. If you see RAX, RBX, RSP, RBP, these are names of registers in the 64-bit extension of this architecture. EAX, EBX are names of the 32-bit registers, and you may see these in 32-bit code or 64-bit code. Any given program should be intended for either the 64-bit architecture (x64) or the 32-bit architecture (x86), but not both mixed together. So you can tell which one a program targets by looking at how the stack pointer and base pointer are named (RSP/RBP for 64-bit, ESP/EBP for 32-bit).
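The names RAX, EAX, AX, AH and AL do not denote separate registers; the shorter names refer to the low-order parts of the same 64-bit register. A minimal sketch in C, modelling the register as a `uint64_t` (the function names are invented for illustration):

```c
#include <stdint.h>

/* Reading the narrower names: each is a slice of the same 64-bit value. */
uint32_t view_eax(uint64_t rax) { return (uint32_t)rax; }        /* bits 0..31 */
uint16_t view_ax(uint64_t rax)  { return (uint16_t)rax; }        /* bits 0..15 */
uint8_t  view_al(uint64_t rax)  { return (uint8_t)rax; }         /* bits 0..7  */
uint8_t  view_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); }  /* bits 8..15 */

/* Writing AX replaces only the low 16 bits; the rest of RAX is kept. */
uint64_t write_ax(uint64_t rax, uint16_t v)
{
    return (rax & ~(uint64_t)0xFFFF) | v;
}

/* Writing EAX in 64-bit mode clears the upper 32 bits of RAX. */
uint64_t write_eax(uint64_t rax, uint32_t v)
{
    (void)rax;            /* the old contents do not survive */
    return (uint64_t)v;
}
```

For example, if RAX holds 0x1122334455667788, then EAX reads as 0x55667788, AX as 0x7788, AH as 0x77 and AL as 0x88.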

In the original 16-bit 8086, AX was a favored register, since encodings that target that register are shorter than other instructions. Further, multiplies and divides target the AX/DX register pair. Many of these special register uses have been removed in favor of the registers being more general purpose as the architecture has evolved to 32 bits and 64 bits. This evolution is friendlier toward compilers and hence high-level languages. These architectures still have a dedicated stack pointer and instructions that implicitly target this register. However, the other registers today are largely general-purpose registers. Once again I bring up the calling convention, which will tell you which register is used, for example, to pass the first argument or to return a return value.
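The AX/DX pairing can be sketched in C. In the 16-bit unsigned `MUL`, AX is multiplied by the operand and the 32-bit product is split across DX (high half) and AX (low half); the 16-bit unsigned `DIV` takes the 32-bit dividend from DX:AX and leaves the quotient in AX and the remainder in DX. The type `AxDx` and the function names are invented for this model, and the caller is assumed to avoid the cases where the real instruction would fault (a zero divisor, or a quotient that does not fit in 16 bits).

```c
#include <stdint.h>

/* Illustrative pair of 16-bit values standing in for the AX and DX registers. */
typedef struct {
    uint16_t ax;
    uint16_t dx;
} AxDx;

/* Model of 16-bit unsigned MUL: AX * operand -> DX:AX. */
AxDx mul16(uint16_t ax, uint16_t operand)
{
    uint32_t product = (uint32_t)ax * (uint32_t)operand;
    AxDx r;
    r.ax = (uint16_t)(product & 0xFFFFu);  /* low half goes to AX */
    r.dx = (uint16_t)(product >> 16);      /* high half goes to DX */
    return r;
}

/* Model of 16-bit unsigned DIV: DX:AX / operand -> quotient AX, remainder DX.
   Assumes operand != 0 and that the quotient fits in 16 bits. */
AxDx div16(uint16_t dx, uint16_t ax, uint16_t operand)
{
    uint32_t dividend = ((uint32_t)dx << 16) | ax;
    AxDx r;
    r.ax = (uint16_t)(dividend / operand);
    r.dx = (uint16_t)(dividend % operand);
    return r;
}
```

For instance, `mul16(1000, 1000)` yields 1,000,000 = 0x000F4240, so DX receives 0x000F and AX receives 0x4240. The 32-bit and 64-bit forms of these instructions follow the same pattern with EDX:EAX and RDX:RAX.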

Licensed under: CC-BY-SA with attribution
