Pointer arithmetic

I do consider assignment statements and pointer variables to be among computer science's "most valuable treasures."

Donald Knuth, Structured Programming, with go to Statements^[1]

A pointer a pointing to the memory address associated with a variable b, i.e., a contains the memory address 1008 of the variable b. In this diagram, the computing architecture uses the same address space and data primitive for both pointers and non-pointers; this need should not be the case.

In computer science, a pointer is an object in many programming languages that stores a memory address. This can be that of another value located in computer memory, or in some cases, that of memory-mapped computer hardware. A pointer references a location in memory, and obtaining the value stored at that location is known as dereferencing the pointer. As an analogy, a page number in a book's index could be considered a pointer to the corresponding page; dereferencing such a pointer would be done by flipping to the page with the given page number and reading the text found on that page. The actual format and content of a pointer variable is dependent on the underlying computer architecture.

Using pointers significantly improves performance for repetitive operations, like traversing iterable data structures (e.g. strings, lookup tables, control tables and tree structures). In particular, it is often much cheaper in time and space to copy and dereference pointers than it is to copy and access the data to which the pointers point.

Pointers are also used to hold the addresses of entry points for called subroutines in procedural programming and for run-time linking to dynamic link libraries (DLLs). In object-oriented programming, pointers to functions are used for binding methods, often using virtual method tables.

A pointer is a simple, more concrete implementation of the more abstract reference data type. Several languages, especially low-level languages, support some type of pointer, although some have more restrictions on their use than others. While "pointer" has been used to refer to references in general, it more properly applies to data structures whose interface explicitly allows the pointer to be manipulated (arithmetically via pointer arithmetic) as a memory address, as opposed to a magic cookie or capability which does not allow such.^{[citation needed]} Because pointers allow both protected and unprotected access to memory addresses, there are risks associated with using them, particularly in the latter case. Primitive pointers are often stored in a format similar to an integer; however, attempting to dereference or "look up" such a pointer whose value is not a valid memory address could cause a program to crash (or contain invalid data). To alleviate this potential problem, as a matter of type safety, pointers are considered a separate type parameterized by the type of data they point to, even if the underlying representation is an integer. Other measures may also be taken (such as validation and bounds checking), to verify that the pointer variable contains a value that is both a valid memory address and within the numerical range that the processor is capable of addressing.

History

In 1955, Soviet Ukrainian computer scientist Kateryna Yushchenko invented Address (programming language) that made possible indirect addressing and addresses of the highest rank – analogous to pointers. This language was widely used on the Soviet Union computers. However, it was unknown outside the Soviet Union and usually Harold Lawson is credited with the invention, in 1964, of the pointer.^[2] In 2000, Lawson was presented the Computer Pioneer Award by the IEEE "or inventing the pointer variable and introducing this concept into PL/I, thus providing for the first time, the capability to flexibly treat linked lists in a general-purpose high-level language".^[3] His seminal paper on the concepts appeared in the June 1967 issue of CACM entitled: PL/I List Processing. According to the Oxford English Dictionary, the word pointer first appeared in print as a stack pointer in a technical memorandum by the System Development Corporation.

Formal description

In computer science, a pointer is a kind of reference.

A data primitive (or just primitive) is any datum that can be read from or written to computer memory using one memory access (for instance, both a byte and a word are primitives).

A data aggregate (or just aggregate) is a group of primitives that are logically contiguous in memory and that are viewed collectively as one datum (for instance, an aggregate could be 3 logically contiguous bytes, the values of which represent the 3 coordinates of a point in space). When an aggregate is entirely composed of the same type of primitive, the aggregate may be called an array; in a sense, a multi-byte word primitive is an array of bytes, and some programs use words in this way.

A pointer is a programming concept used in computer science to reference or point to a memory location that stores a value or an object. It is essentially a variable that stores the memory address of another variable or data structure rather than storing the data itself.

Pointers are commonly used in programming languages that support direct memory manipulation, such as C and C++. They allow programmers to work with memory directly, enabling efficient memory management and more complex data structures. By using pointers, you can access and modify data located in memory, pass data efficiently between functions, and create dynamic data structures like linked lists, trees, and graphs.

In simpler terms, you can think of a pointer as an arrow that points to a specific spot in a computer's memory, allowing you to interact with the data stored at that location.

A memory pointer (or just pointer) is a primitive, the value of which is intended to be used as a memory address; it is said that a pointer points to a memory address. It is also said that a pointer points to a datum when the pointer's value is the datum's memory address.

More generally, a pointer is a kind of reference, and it is said that a pointer references a datum stored somewhere in memory; to obtain that datum is to dereference the pointer. The feature that separates pointers from other kinds of reference is that a pointer's value is meant to be interpreted as a memory address, which is a rather low-level concept.

References serve as a level of indirection: A pointer's value determines which memory address (that is, which datum) is to be used in a calculation. Because indirection is a fundamental aspect of algorithms, pointers are often expressed as a fundamental data type in programming languages; in statically (or strongly) typed programming languages, the type of a pointer determines the type of the datum to which the pointer points.

Architectural roots

Pointers are a very thin abstraction on top of the addressing capabilities provided by most modern architectures. In the simplest scheme, an address, or a numeric index, is assigned to each unit of memory in the system, where the unit is typically either a byte or a word – depending on whether the architecture is byte-addressable or word-addressable – effectively transforming all of memory into a very large array. The system would then also provide an operation to retrieve the value stored in the memory unit at a given address (usually utilizing the machine's general-purpose registers).

In the usual case, a pointer is large enough to hold more addresses than there are units of memory in the system. This introduces the possibility that a program may attempt to access an address which corresponds to no unit of memory, either because not enough memory is installed (i.e. beyond the range of available memory) or the architecture does not support such addresses. The first case may, in certain platforms such as the Intel x86 architecture, be called a segmentation fault (segfault). The second case is possible in the current implementation of AMD64, where pointers are 64 bit long and addresses only extend to 48 bits. Pointers must conform to certain rules (canonical addresses), so if a non-canonical pointer is dereferenced, the processor raises a general protection fault.

On the other hand, some systems have more units of memory than there are addresses. In this case, a more complex scheme such as memory segmentation or paging is employed to use different parts of the memory at different times. The last incarnations of the x86 architecture support up to 36 bits of physical memory addresses, which were mapped to the 32-bit linear address space through the PAE paging mechanism. Thus, only 1/16 of the possible total memory may be accessed at a time. Another example in the same computer family was the 16-bit protected mode of the 80286 processor, which, though supporting only 16 MB of physical memory, could access up to 1 GB of virtual memory, but the combination of 16-bit address and segment registers made accessing more than 64 KB in one data structure cumbersome.

In order to provide a consistent interface, some architectures provide memory-mapped I/O, which allows some addresses to refer to units of memory while others refer to device registers of other devices in the computer. There are analogous concepts such as file offsets, array indices, and remote object references that serve some of the same purposes as addresses for other types of objects.

Uses

Pointers are directly supported without restrictions in languages such as PL/I, C, C++, Pascal, FreeBASIC, and implicitly in most assembly languages. They are primarily used for constructing references, which in turn are fundamental to constructing nearly all data structures, as well as in passing data between different parts of a program.

In functional programming languages that rely heavily on lists, data references are managed abstractly by using primitive constructs like cons and the corresponding elements car and cdr, which can be thought of as specialised pointers to the first and second components of a cons-cell. This gives rise to some of the idiomatic "flavour" of functional programming. By structuring data in such cons-lists, these languages facilitate recursive means for building and processing data—for example, by recursively accessing the head and tail elements of lists of lists; e.g. "taking the car of the cdr of the cdr". By contrast, memory management based on pointer dereferencing in some approximation of an array of memory addresses facilitates treating variables as slots into which data can be assigned imperatively.

When dealing with arrays, the critical lookup operation typically involves a stage called address calculation which involves constructing a pointer to the desired data element in the array. In other data structures, such as linked lists, pointers are used as references to explicitly tie one piece of the structure to another.

Pointers are used to pass parameters by reference. This is useful if the programmer wants a function's modifications to a parameter to be visible to the function's caller. This is also useful for returning multiple values from a function.

Pointers can also be used to allocate and deallocate dynamic variables and arrays in memory. Since a variable will often become redundant after it has served its purpose, it is a waste of memory to keep it, and therefore it is good practice to deallocate it (using the original pointer reference) when it is no longer needed. Failure to do so may result in a memory leak (where available free memory gradually, or in severe cases rapidly, diminishes because of an accumulation of numerous redundant memory blocks).

C pointers

The basic syntax to define a pointer is:^[4]

int *ptr;

This declares ptr as the identifier of an object of the following type:

pointer that points to an object of type int

This is usually stated more succinctly as "ptr is a pointer to int."

Because the C language does not specify an implicit initialization for objects of automatic storage duration,^[5] care should often be taken to ensure that the address to which ptr points is valid; this is why it is sometimes suggested that a pointer be explicitly initialized to the null pointer value, which is traditionally specified in C with the standardized macro NULL:^[6]

int *ptr = NULL;

Dereferencing a null pointer in C produces undefined behavior,^[7] which could be catastrophic. However, most implementations^{[citation needed]} simply halt execution of the program in question, usually with a segmentation fault.

However, initializing pointers unnecessarily could hinder program analysis, thereby hiding bugs.

In any case, once a pointer has been declared, the next logical step is for it to point at something:

int a = 5;
int *ptr = NULL;

ptr = &a;

This assigns the value of the address of a to ptr. For example, if a is stored at memory location of 0x8130 then the value of ptr will be 0x8130 after the assignment. To dereference the pointer, an asterisk is used again:

*ptr = 8;

This means take the contents of ptr (which is 0x8130), "locate" that address in memory and set its value to 8. If a is later accessed again, its new value will be 8.

This example may be clearer if memory is examined directly. Assume that a is located at address 0x8130 in memory and ptr at 0x8134; also assume this is a 32-bit machine such that an int is 32-bits wide. The following is what would be in memory after the following code snippet is executed:

int a = 5;
int *ptr = NULL;

Address	Contents
0x8130	0x00000005
0x8134	0x00000000

(The NULL pointer shown here is 0x00000000.) By assigning the address of a to ptr:

ptr = &a;

yields the following memory values:

Address	Contents
0x8130	0x00000005
0x8134	0x00008130

Then by dereferencing ptr by coding:

*ptr = 8;

the computer will take the contents of ptr (which is 0x8130), 'locate' that address, and assign 8 to that location yielding the following memory:

Address	Contents
0x8130	0x00000008
0x8134	0x00008130

Clearly, accessing a will yield the value of 8 because the previous instruction modified the contents of a by way of the pointer ptr.

Use in data structures

When setting up data structures like lists, queues and trees, it is necessary to have pointers to help manage how the structure is implemented and controlled. Typical examples of pointers are start pointers, end pointers, and stack pointers. These pointers can either be absolute (the actual physical address or a virtual address in virtual memory) or relative (an offset from an absolute start address ("base") that typically uses fewer bits than a full address, but will usually require one additional arithmetic operation to resolve).

Relative addresses are a form of manual memory segmentation, and share many of its advantages and disadvantages. A two-byte offset, containing a 16-bit, unsigned integer, can be used to provide relative addressing for up to 64 KiB (2¹⁶ bytes) of a data structure. This can easily be extended to 128, 256 or 512 KiB if the address pointed to is forced to be aligned on a half-word, word or double-word boundary (but, requiring an additional "shift left" bitwise operation—by 1, 2 or 3 bits—in order to adjust the offset by a factor of 2, 4 or 8, before its addition to the base address). Generally, though, such schemes are a lot of trouble, and for convenience to the programmer absolute addresses (and underlying that, a flat address space) is preferred.

A one byte offset, such as the hexadecimal ASCII value of a character (e.g. X'29') can be used to point to an alternative integer value (or index) in an array (e.g., X'01'). In this way, characters can be very efficiently translated from 'raw data' to a usable sequential index and then to an absolute address without a lookup table.

C arrays

In C, array indexing is formally defined in terms of pointer arithmetic; that is, the language specification requires that array be equivalent to *(array + i).^[8] Thus in C, arrays can be thought of as pointers to consecutive areas of memory (with no gaps),^[8] and the syntax for accessing arrays is identical for that which can be used to dereference pointers. For example, an array array can be declared and used in the following manner:

int array;      /* Declares 5 contiguous integers */
int *ptr = array;  /* Arrays can be used as pointers */
ptr = 1;        /* Pointers can be indexed with array syntax */
*(array + 1) = 2;  /* Arrays can be dereferenced with pointer syntax */
*(1 + array) = 2;  /* Pointer addition is commutative */
2 = 4;      /* Subscript operator is commutative */

This allocates a block of five integers and names the block array, which acts as a pointer to the block. Another common use of pointers is to point to dynamically allocated memory from malloc which returns a consecutive block of memory of no less than the requested size that can be used as an array.

While most operators on arrays and pointers are equivalent, the result of the sizeof operator differs. In this example, sizeof(array) will evaluate to 5*sizeof(int) (the size of the array), while sizeof(ptr) will evaluate to sizeof(int*), the size of the pointer itself.

Default values of an array can be declared like:

int array = {2, 4, 3, 1, 5};

If array is located in memory starting at address 0x1000 on a 32-bit little-endian machine then memory will contain the following (values are in hexadecimal, like the addresses):

	0	1	2	3
1000	2	0	0	0
1004	4	0	0	0
1008	3	0	0	0
100C	1	0	0	0
1010	5	0	0	0

Represented here are five integers: 2, 4, 3, 1, and 5. These five integers occupy 32 bits (4 bytes) each with the least-significant byte stored first (this is a little-endian CPU architecture) and are stored consecutively starting at address 0x1000.

The syntax for C with pointers is:

array means 0x1000;
array + 1 means 0x1004: the "+ 1" means to add the size of 1 int, which is 4 bytes;
*array means to dereference the contents of array. Considering the contents as a memory address (0x1000), look up the value at that location (0x0002);
array means element number i, 0-based, of array which is translated into *(array + i).

The last example is how to access the contents of array. Breaking it down:

array + i is the memory location of the (i)^th element of array, starting at i=0;
*(array + i) takes that memory address and dereferences it to access the value.

C linked list

Below is an example definition of a linked list in C.

/* the empty linked list is represented by NULL
 * or some other sentinel value */
#define EMPTY_LIST  NULL

struct link {
    void        *data;  /* data of this link */
    struct link *next;  /* next link; EMPTY_LIST if there is none */
};

This pointer-recursive definition is essentially the same as the reference-recursive definition from the Haskell programming language:

 data Link a = Nil
             | Cons a (Link a)

Nil is the empty list, and Cons a (Link a) is a cons cell of type a with another link also of type a.

The definition with references, however, is type-checked and does not use potentially confusing signal values. For this reason, data structures in C are usually dealt with via wrapper functions, which are carefully checked for correctness.

Pass-by-address using pointers

Pointers can be used to pass variables by their address, allowing their value to be changed. For example, consider the following C code:

/* a copy of the int n can be changed within the function without affecting the calling code */
void passByValue(int n) {
    n = 12;
}

/* a pointer m is passed instead. No copy of the value pointed to by m is created */
void passByAddress(int *m) {
    *m = 14;
}

int main(void) {
    int x = 3;

    /* pass a copy of x's value as the argument */
    passByValue(x);
    // the value was changed inside the function, but x is still 3 from here on

    /* pass x's address as the argument */
    passByAddress(&x);
    // x was actually changed by the function and is now equal to 14 here

    return 0;
}

Dynamic memory allocation

In some programs, the required amount of memory depends on what the user may enter. In such cases the programmer needs to allocate memory dynamically. This is done by allocating memory at the heap rather than on the stack, where variables usually are stored (although variables can also be stored in the CPU registers). Dynamic memory allocation can only be made through pointers, and names – like with common variables – cannot be given.

Pointers are used to store and manage the addresses of dynamically allocated blocks of memory. Such blocks are used to store data objects or arrays of objects. Most structured and object-oriented languages provide an area of memory, called the heap or free store, from which objects are dynamically allocated.

The example C code below illustrates how structure objects are dynamically allocated and referenced. The standard C library provides the function malloc() for allocating memory blocks from the heap. It takes the size of an object to allocate as a parameter and returns a pointer to a newly allocated block of memory suitable for storing the object, or it returns a null pointer if the allocation failed.

/* Parts inventory item */
struct Item {
    int         id;     /* Part number */
    char *      name;   /* Part name   */
    float       cost;   /* Cost        */
};

/* Allocate and initialize a new Item object */
struct Item * make_item(const char *name) {
    struct Item * item;

    /* Allocate a block of memory for a new Item object */
    item = malloc(sizeof(struct Item));
    if (item == NULL)
        return NULL;

    /* Initialize the members of the new Item */
    memset(item, 0, sizeof(struct Item));
    item->id =   -1;
    item->name = NULL;
    item->cost = 0.0;

    /* Save a copy of the name in the new Item */
    item->name = malloc(strlen(name) + 1);
    if (item->name == NULL) {
        free(item);
        return NULL;
    }
    strcpy(item->name, name);

    /* Return the newly created Item object */
    return item;
}

The code below illustrates how memory objects are dynamically deallocated, i.e., returned to the heap or free store. The standard C library provides the function free() for deallocating a previously allocated memory block and returning it back to the heap.

/* Deallocate an Item object */
void destroy_item(struct Item *item) {
    /* Check for a null object pointer */
    if (item == NULL)
        return;

    /* Deallocate the name string saved within the Item */
    if (item->name != NULL) {
        free(item->name);
        item->name = NULL;
    }

    /* Deallocate the Item object itself */
    free(item);
}

Memory-mapped hardware

On some computing architectures, pointers can be used to directly manipulate memory or memory-mapped devices.

Assigning addresses to pointers is an invaluable tool when programming microcontrollers. Below is a simple example declaring a pointer of type int and initialising it to a hexadecimal address in this example the constant 0x7FFF:

int *hardware_address = (int *)0x7FFF;

In the mid 80s, using the BIOS to access the video capabilities of PCs was slow. Applications that were display-intensive typically used to access CGA video memory directly by casting the hexadecimal constant 0xB8000 to a pointer to an array of 80 unsigned 16-bit int values. Each value consisted of an ASCII code in the low byte, and a colour in the high byte. Thus, to put the letter 'A' at row 5, column 2 in bright white on blue, one would write code like the following:

#define VID ((unsigned short (*))0xB8000)

void foo(void) {
    VID = 0x1F00 | 'A';
}

Use in control tables

Control tables that are used to control program flow usually make extensive use of pointers. The pointers, usually embedded in a table entry, may, for instance, be used to hold the entry points to subroutines to be executed, based on certain conditions defined in the same table entry. The pointers can however be simply indexes to other separate, but associated, tables comprising an array of the actual addresses or the addresses themselves (depending upon the programming language constructs available). They can also be used to point to earlier table entries (as in loop processing) or forward to skip some table entries (as in a switch or "early" exit from a loop). For this latter purpose, the "pointer" may simply be the table entry number itself and can be transformed into an actual address by simple arithmetic.

Typed pointers and casting

In many languages, pointers have the additional restriction that the object they point to has a specific type. For example, a pointer may be declared to point to an integer; the language will then attempt to prevent the programmer from pointing it to objects which are not integers, such as floating-point numbers, eliminating some errors.

For example, in C

int *money;
char *bags;

money would be an integer pointer and bags would be a char pointer. The following would yield a compiler warning of "assignment from incompatible pointer type" under GCC

bags = money;

because money and bags were declared with different types. To suppress the compiler warning, it must be made explicit that you do indeed wish to make the assignment by typecasting it

bags = (char *)money;

which says to cast the integer pointer of money to a char pointer and assign to bags.

A 2005 draft of the C standard requires that casting a pointer derived from one type to one of another type should maintain the alignment correctness for both types (6.3.2.3 Pointers, par. 7):^[9]

char *external_buffer = "abcdef";
int *internal_data;

internal_data = (int *)external_buffer;  // UNDEFINED BEHAVIOUR if "the resulting pointer
                                         // is not correctly aligned"

Zdroj:https://en.wikipedia.org?pojem=Pointer_arithmetic
Text je dostupný za podmienok Creative Commons Attribution/Share-Alike License 3.0 Unported; prípadne za ďalších podmienok. Podrobnejšie informácie nájdete na stránke Podmienky použitia.

Navigácia: Veda >

Analytika
Antropológia
Aplikované vedy
Bibliometria
Dejiny vedy
Encyklopédie
Filozofia vedy
Forenzné vedy
Humanitné vedy
Knižničná veda
Kryogenika
Kryptológia
Kulturológia
Literárna veda
Medzidisciplinárne oblasti
Metódy kvantitatívnej analýzy
Metavedy
Metodika

Metodológia vedy
Náboženstvo a veda
Náučná literatúra
Podvody vo vede
Popularizácia vedy
Potravinárstvo
Prírodné vedy
Pseudoveda
Scientometria
Spoločenské vedy
Teórie
Teatrológia
Technické vedy
Technika
Terminológia
Umenie
Výskum

Veda
Veda a technika podľa štátu
Veda a technika podľa kontinentu
Veda a technika podľa roka
Veda v kozme
Vedci
Vedecká literatúra
Vedecké databázy
Vedecké experimenty
Vedecké konferencie
Vedecké metódy
Vedecké ocenenia
Vedecké organizácie
Vedecké parky
Vedeckí spisovatelia
Vzdelávanie
Záhady

Príbuzné výrazy:

Text je dostupný za podmienok Creative Commons Attribution/Share-Alike License 3.0 Unported; prípadne za ďalších podmienok.
Podrobnejšie informácie nájdete na stránke Podmienky použitia.

[1]

[2]

[3]

[4]

[5]

[6]

[7]

[8]

[9]