ALERT: If there are grammatical errors or logical errors, please let us know, thank you.
If we really want to find vulnerabilities in a computer system, we need to know how the system is built: how memory is allocated and how software is managed, with and without layers of abstraction. If we did not know that the basic unit of memory is the byte, for instance, we would not understand how many software vulnerabilities arise and are exploited. In this article I will therefore cover the theoretical basics of system architecture and then move on to Assembly. I advise you to study the topics covered in this section in more depth on your own: the more you know, the easier it is to understand and exploit a vulnerability.
Here are some facts you need to know:
- The basic unit of memory is the byte. Each byte in memory is labeled with a unique number, which is its address.
- Memory is often used in chunks larger than a single byte, and these larger units have names:
  - Word: 2 bytes
  - Double Word: 4 bytes
  - Quad Word: 8 bytes
  - Paragraph: 16 bytes
- All data in memory is numeric. Characters are stored using a character code that maps each character to a number.
- The CPU can access data in registers much faster than data in memory.
- Computer hardware provides a mechanism called interrupts, which interrupts the ordinary flow of a program to process events that require a rapid response.
- Each assembly instruction represents exactly one machine language instruction.
- Each type of CPU understands only its own machine language.
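The first points of this list can be illustrated with a minimal C sketch. C is used here only for illustration, the function names are hypothetical, and the character code check assumes an ASCII-based system.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: the number stored in memory for a character.
   On ASCII-based systems 'A' is stored as 65. */
int char_code(char c) {
    return (int)(unsigned char)c;
}

/* Hypothetical helper: distance, in bytes, between two neighbouring
   bytes of an array. Consecutive bytes have consecutive addresses. */
ptrdiff_t byte_address_step(void) {
    unsigned char bytes[2] = {0, 0};
    return &bytes[1] - &bytes[0];
}

/* Sizes in bytes of the fixed-width types that correspond to a
   word, a double word and a quad word (assuming 8-bit bytes). */
size_t word_size(void)        { return sizeof(uint16_t); }
size_t double_word_size(void) { return sizeof(uint32_t); }
size_t quad_word_size(void)   { return sizeof(uint64_t); }
```

On a typical PC, word_size() returns 2, double_word_size() returns 4 and quad_word_size() returns 8, matching the names in the list.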
In this article we will discuss Assembly with 32-, 16- and 8-bit registers, because it is important to know how to work with the different register sizes and, above all, because it is useful when converting some exploits.
Keep in mind that each instruction consists of a mnemonic (a name that humans can understand) plus optional operands (arguments), whose number and type depend on the mnemonic.
The computer, on the other hand, understands only machine code: an opcode, which the mnemonic represents, followed by the encoded operands.
Fortunately, this conversion is the task of the assembler.
A directive is an artifact of the assembler, not of the CPU. Directives are generally used to instruct the assembler to do something or to inform it of something.
They are not translated into machine code.
Data directives use a letter to identify the size of the object:
- Byte -> B
- Word -> W
- Double Word -> D
- Quad Word -> Q
- 10 Bytes -> T
Data directives are used in data segments to define space in memory. There are two ways to do this:
- The RESX directive, where X is replaced by the letter identifying the size of the object to store. It only reserves space.
- The DX directive, where X has the same meaning as in RESX and can take the same values. It reserves space and also sets an initial value.
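For example, with DX:
    L1 dw 4                   ; word initialized to 4
    L2 db 10000000b           ; byte initialized to binary 10000000 (128 decimal)
    L3 db "w","o","r","d", 0  ; the characters of "word" followed by a 0 byte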
And for RESX:
    L4 resb 10 ; reserves space for 10 bytes
Another very important point is the following:
    L5 db 0, 1, 2, 3
    L6 db "h","a","c","k", 0
    L7 db 'hack', 0
In L5 I define 4 bytes.
In L6 I define a string: "hack", written one character at a time and ending with a 0 byte.
In L7 I define the same string as in L6.
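The equivalence between the two ways of writing the string can be checked with a small C sketch. This is only an analogy under stated assumptions: the array names and the function same_bytes are hypothetical, and C adds the terminating 0 byte to a string literal automatically, whereas in assembly it is written explicitly.

```c
#include <string.h>

/* The string written character by character, with an explicit 0 byte. */
static const char by_chars[] = {'h', 'a', 'c', 'k', 0};

/* The string written as a literal; C appends the 0 byte itself. */
static const char by_string[] = "hack";

/* Returns 1 if both definitions produce exactly the same bytes. */
int same_bytes(void) {
    return sizeof(by_chars) == sizeof(by_string)
        && memcmp(by_chars, by_string, sizeof(by_chars)) == 0;
}
```

Both arrays occupy 5 bytes: the four characters plus the terminating 0.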
There are many things to take into account when programming in assembly. For example, take special care with the following:
- The DD directive can be used with both integers and single-precision floating-point constants (single-precision floating point is equivalent to a float variable in C).
- The DQ directive is used to define double-precision floating-point constants (current versions of NASM also accept 64-bit integer constants with DQ).
If we want to refer to data in the code, we can use labels. There are two ways in which a label can be used:
- Plain label: it is interpreted as the offset (or address) of the data. Example: mov eax, msg
- Label in square brackets: it is interpreted as the value of the data at that address. Example: mov eax, [msg]
You can therefore think of a label as a pointer, while the square brackets dereference the pointer, like the asterisk in C.
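The pointer analogy can be made concrete with a minimal C sketch. The variable msg stands in for the data behind the label, and its value and the two function names are hypothetical, chosen only for illustration.

```c
#include <stdint.h>

/* Plays the role of the data that the label msg refers to. */
static uint32_t msg = 0x12345678u;

/* Like "mov eax, msg": the result is the address of the data. */
const uint32_t *label_as_address(void) {
    return &msg;
}

/* Like "mov eax, [msg]": the result is the value stored at that address. */
uint32_t label_dereferenced(void) {
    return *label_as_address();
}
```

The first function hands back where the data lives, the second hands back what is stored there, which is the same distinction the square brackets make in assembly.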
Working with integers
Integers come in two types:
- Signed: they can be either positive or negative.
- Unsigned: they cannot be negative and are represented directly in binary.
But how does the computer represent the sign in binary? There are various methods, including signed magnitude. With this method the integer is split into two parts: the first part is the sign bit, and the second part is the magnitude of the integer. Link: https://en.wikipedia.org/wiki/Signed_number_representations
This method is not advisable, because it produces two values for zero: +0 (00000000) and -0 (10000000). This is where the "game" falls apart. Why? Mathematically speaking, 0 is neither positive nor negative; it is a neutral value, and as such it cannot take a sign. I will not explain this in detail, because the information, although dated, is easy to find and you can look it up on your own.
Another method is one's complement, in which a number is negated by inverting each of its bits, for example:
00011110 -> 30
11100001 -> -30
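Both representations can be tried out on raw bytes with a short C sketch. The function names are hypothetical, and the bytes are treated as plain bit patterns, since C itself does not use these representations for its signed types on common hardware.

```c
#include <stdint.h>

/* One's complement negation: invert every bit of the byte. */
uint8_t ones_complement_negate(uint8_t value) {
    return (uint8_t)~value;
}

/* Signed magnitude negation: flip only the sign bit (bit 7)
   and leave the magnitude bits untouched. */
uint8_t signed_magnitude_negate(uint8_t value) {
    return (uint8_t)(value ^ 0x80u);
}
```

Negating 00011110 (30) in one's complement gives 11100001, and negating 00000000 in signed magnitude gives 10000000, which is the "negative zero" described above.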
In assembly all data has a specified size, and in many cases it may be necessary to change this size to make room for other data.
For example: we have 8 containers, and each of them can hold at most one mobile phone. If we now have 16 mobile phones to store, we clearly need more containers, 16 in total to be exact.
Clearly the choice is not entirely arbitrary, in the sense that we programmers have to stick to the sizes of the registers and memory locations. In this part we will use the 8-bit and 16-bit registers, just to show why sign extension is useful.
We will treat this part only superficially, but I strongly advise you to study it further.
Decrease the size of a data element:
To decrease the size of a piece of data, the most significant bits are removed. Here is an example:
    mov ax, 00F2h ; 242 decimal, 0000000011110010 stored in 16 bits
    mov cl, al    ; 00F2h -> F2h -> 11110010
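The same truncation can be sketched in C. The function name truncate_to_byte is hypothetical, and the mask makes explicit that only the low 8 bits survive.

```c
#include <stdint.h>

/* Keep only the least significant 8 bits of a 16-bit value,
   discarding the most significant byte. */
uint8_t truncate_to_byte(uint16_t word) {
    return (uint8_t)(word & 0xFFu);
}
```

With 00F2h the result is F2h and the value 242 is preserved, because the removed bits were all zero. With 0134h (308 decimal) the result is 34h (52 decimal), so the value changes when the removed bits were in use.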
Increasing the size of a piece of data, instead, is slightly more complicated.
Consider the byte F2h. If it is extended to a word, what value should the word have? The result depends on how F2h is interpreted. If F2h is an unsigned byte (242 decimal), then the word should be 00F2h. If it is a signed byte (-14 decimal in two's complement, the representation used by x86 CPUs), then the word should be FFF2h, so that the sign is preserved.
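The two ways of extending a byte to a word can be sketched in C as follows. The function names are hypothetical, and the signed case assumes the two's complement representation used by x86 CPUs.

```c
#include <stdint.h>

/* Unsigned interpretation: fill the new upper byte with zeros. */
uint16_t zero_extend(uint8_t byte) {
    return (uint16_t)byte;
}

/* Signed interpretation: copy the sign bit (bit 7) of the byte
   into every bit of the new upper byte. */
uint16_t sign_extend(uint8_t byte) {
    return (byte & 0x80u) ? (uint16_t)(0xFF00u | byte) : (uint16_t)byte;
}
```

For the byte F2h, zero_extend gives 00F2h while sign_extend gives FFF2h. For a byte whose sign bit is clear, such as 1Eh, both give 001Eh.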
Here the discussion becomes increasingly complex, and in my opinion this article should not go beyond its ultimate purpose, so look the topic up on DuckDuckGo.
In the next article we will look at comparisons and the related registers, how to build a loop, and other basic notions.
For this article, and in general, I would like to thank Neetx, who has supported and continues to support this section.