
Consider a simple jump instruction (jmp) in assembly, where destination is a pre-defined label.

jmp destination

According to Kip Irvine's "Assembly Language for x86 Processors", when the CPU executes an unconditional transfer, the offset of destination is moved into the instruction pointer.

Could someone explain this? I thought the address to which we want to jump must be moved into the instruction pointer.

What's the difference between "the address to which we want to jump" and "the offset (address) of destination"? – Margaret Bloom Jul 20, 2016 at 10:12

Well, is the absolute address stored in the instruction pointer, or an offset to another value? – Dennis Jul 20, 2016 at 10:24

IIRC the value in IP is the address of the code segment + the address of the jmp label. This is always the case. Nowadays this may already be done automatically, but in former times it meant: address where the code segment starts + address of the jmp label (hence the word offset). – icbytes Jul 20, 2016 at 10:25

x86 is a segmented architecture. While in long mode segmentation is forced to a flat model, the mechanism is still there. So technically ip/eip/rip always holds the offset part of the logical (segment:offset) address. In practice, since the segment starts at 0, the offset is also the linear address. Don't be confused by the offset encoded in the immediate; that is the number added to the current value of IP to reach the target. – Margaret Bloom Jul 20, 2016 at 10:56

Can you edit your question to include the quote from the book? Without context, it is difficult to answer this question. – user1354557 Jul 20, 2016 at 17:07

4.5.1 JMP Instruction

The JMP instruction causes an unconditional transfer to a destination, identified by a code label that is translated by the assembler into an offset. The syntax is

JMP destination

When the CPU executes an unconditional transfer, the offset of destination is moved into the instruction pointer, causing execution to continue at the new location.

Your confusion is understandable; this is poorly explained.

First of all, if an instruction says jmp destination, then it will set the instruction pointer equal to destination. You're right about that.

But the instruction behavior is being confused with the instruction encoding.

Instructions of the form jmp address are encoded using relative offsets in x86. The offsets are relative to the address immediately following the jmp instruction.

Such a jump can be encoded either as an EB followed by a signed byte offset, or (in 32- and 64-bit code) as an E9 followed by a signed dword offset. (Integers are little-endian in x86.)
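To make the arithmetic concrete, here is a minimal Python sketch (not any real assembler's code) of how the bytes for a direct near jmp could be computed. The helper name `encode_jmp` and its `allow_short` flag are hypothetical, and it assumes 32- or 64-bit code, where E9 carries a 32-bit displacement (in 16-bit code E9 carries a 16-bit one).

```python
import struct

def encode_jmp(jmp_addr: int, target: int, allow_short: bool = True) -> bytes:
    """Return machine code for a direct near jmp at jmp_addr that lands on target.

    The displacement is target minus the address of the *next* instruction,
    so it depends on the length of the encoding that is chosen.
    """
    if allow_short:
        rel8 = target - (jmp_addr + 2)          # EB rel8 is 2 bytes long
        if -128 <= rel8 <= 127:
            return b"\xEB" + struct.pack("<b", rel8)
    rel32 = target - (jmp_addr + 5)             # E9 rel32 is 5 bytes long
    return b"\xE9" + struct.pack("<i", rel32)   # "<i": signed 32-bit, little-endian
```

For instance, `encode_jmp(0x10000, 0x10003)` gives `EB 01`, and `encode_jmp(0x10000, 0x10006, allow_short=False)` gives `E9 01 00 00 00`, the same bytes as the listings that follow.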

For example,

00010000:  EB 01 CC 90

Disassembles to

loc_10000:
    jmp loc_10003  ; EB 01
    int3           ; CC
loc_10003:
    nop            ; 90

Similarly,

00010000:  E9 01 00 00 00 CC 90

Disassembles to

loc_10000:
    jmp loc_10006  ; E9 01 00 00 00
    int3           ; CC
loc_10006:
    nop            ; 90

Note that this means instructions written the same way may have different encodings when located at different addresses. For example,

00010000:  EB 02 EB 00 CC EB FD EB FB

Disassembles to

loc_10000:
    jmp loc_10004  ; EB 02
    jmp loc_10004  ; EB 00
loc_10004:
    int3           ; CC
    jmp loc_10004  ; EB FD   (FD == -3)
    jmp loc_10004  ; EB FB   (FB == -5)
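The same rule can be checked in reverse. This sketch uses a hypothetical helper, `jmp_targets`, that only understands `EB`, `E9`, `CC` (int3) and `90` (nop). It walks the bytes of a listing and computes where each jump lands.

```python
def jmp_targets(base: int, code: bytes) -> list[tuple[int, int]]:
    """Return (address, target) for every EB/E9 jmp found in code loaded at base."""
    results = []
    i = 0
    while i < len(code):
        addr = base + i
        op = code[i]
        if op in (0xCC, 0x90):          # int3 / nop: one byte, not a jump
            i += 1
            continue
        if op == 0xEB:                  # jmp rel8
            length = 2
        elif op == 0xE9:                # jmp rel32 (32/64-bit code)
            length = 5
        else:
            raise ValueError(f"unsupported opcode {op:#04x} at {addr:#x}")
        if i + length > len(code):
            raise ValueError(f"truncated jmp at {addr:#x}")
        rel = int.from_bytes(code[i + 1:i + length], "little", signed=True)
        # The displacement is relative to the address of the next instruction.
        results.append((addr, addr + length + rel))
        i += length
    return results
```

Running it on `EB 02 EB 00 CC EB FD EB FB` at base `0x10000` reports jumps at 0x10000, 0x10002, 0x10005 and 0x10007, all landing on 0x10004, even though each one has a different displacement byte.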

Side note: There are several different forms of the jmp instruction, but the type you are speaking of can only be encoded with a relative offset.

Anyway, what the author is saying is that, for an assembler to generate machine code for an instruction like jmp destination, it must convert destination to a byte offset relative to the end of the jmp instruction. Most of the time, you don't need to worry about this process, however. You can just define a label in your assembly and write jmp my_label, and the assembler will take care of everything for you.
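What "the assembler will take care of" amounts to can be sketched as a toy two-pass assembler: the first pass assigns an address to every label, and the second turns each label into a displacement measured from the end of the jmp. This is a simplification under stated assumptions: the input format and the function name `assemble` are made up, and every jmp is emitted in the 5-byte `E9` form so instruction sizes are known up front. Real assemblers also try to use the 2-byte `EB` form, which needs extra sizing passes (often called branch relaxation).

```python
import struct

# Hypothetical toy input: each line is "label:", "jmp <label>", "int3" or "nop".
SIZES = {"jmp": 5, "int3": 1, "nop": 1}  # jmp always uses the 5-byte E9 rel32 form here

def assemble(lines: list[str], origin: int) -> bytes:
    # Pass 1: walk the code and record the address of every label.
    labels = {}
    addr = origin
    for line in lines:
        if line.endswith(":"):
            labels[line[:-1]] = addr
        else:
            addr += SIZES[line.split()[0]]
    # Pass 2: emit bytes, converting each jmp target into a displacement
    # relative to the end of the jmp instruction.
    out = bytearray()
    addr = origin
    for line in lines:
        if line.endswith(":"):
            continue
        mnemonic = line.split()[0]
        if mnemonic == "jmp":
            target = labels[line.split()[1]]
            out += b"\xE9" + struct.pack("<i", target - (addr + 5))
        elif mnemonic == "int3":
            out += b"\xCC"
        elif mnemonic == "nop":
            out += b"\x90"
        addr += SIZES[mnemonic]
    return bytes(out)
```

For example, `assemble(["jmp my_label", "int3", "my_label:", "nop"], 0x10000)` produces `E9 01 00 00 00 CC 90`, the same bytes as the second listing above.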

When they say "offset" there, they mean "offset from the CS segment base" (which is normally 0 outside of 16-bit code), not the relative displacement used in the machine encoding. All addresses in x86 are called offsets, except for "far pointers", which include a segment and an offset, e.g. the jmp ptr16:32 form that has an immediate absolute address. – Peter Cordes Jul 21, 2016 at 7:25

Consider this fake (non-x86) machine:

address:    bytes:       comment:
0x0004      01 20 00     ; jmp destination  ; here ip = 0x0004
0x0007      ?? repeated 0x19 times
destination:
0x0020      02           ; hlt  ; here ip = 0x0020

Compiled from this source:

    .code
    org  0x0004
    jmp destination
    org  0x0020
destination:
    hlt

So the symbol destination here means absolute address 0x0020 in section .code (I won't give the section any special meaning, but you can imagine whatever complex construction you wish; see, for example, the segment registers in 16-bit x86 mode).

Then, if the jmp instruction with opcode 0x01 is "near", only the offset part of that absolute address is used, which is 0x0020 in this simple fake example.
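A few lines of Python can model this fake machine and show that a "near" jump here simply copies the offset into ip. The opcodes 0x01 (jmp with a little-endian 16-bit offset, as in the bytes `01 20 00`) and 0x02 (hlt) come from the listing above. Everything else, including the function name `run_fake_cpu` and the filler byte 0xFF standing in for `??`, is an assumption.

```python
def run_fake_cpu(memory: bytes, ip: int) -> list[int]:
    """Run the fake CPU from the example; return the ip of each executed instruction.

    Hypothetical encoding taken from the example listing:
      0x01 lo hi -> near jmp: ip = 16-bit offset (little-endian), copied as-is
      0x02       -> hlt
    """
    trace = []
    while True:
        trace.append(ip)
        opcode = memory[ip]
        if opcode == 0x01:
            ip = memory[ip + 1] | (memory[ip + 2] << 8)  # the offset replaces ip
        elif opcode == 0x02:
            return trace
        else:
            raise ValueError(f"unknown opcode {opcode:#04x} at ip={ip:#06x}")

# Memory image from the listing; 0xFF stands in for the unknown "??" bytes.
memory = bytearray(0x21)
memory[0x0004:0x0007] = b"\x01\x20\x00"     # jmp destination
memory[0x0007:0x0020] = b"\xFF" * 0x19      # ?? repeated 0x19 times
memory[0x0020] = 0x02                       # destination: hlt
print([hex(ip) for ip in run_fake_cpu(bytes(memory), 0x0004)])
```

This prints `['0x4', '0x20']`: execution starts at 0x0004 and the jmp loads 0x0020 straight into ip.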

You can still have other variants of jmp on your CPU, like a "relative" 0x03 jmp rel8, capable of jumping -128..+127 bytes from the current ip, or a "far" 0x04 jmp bank/segment:offset, which would set not only ip but also some banking/segment mechanism.

So the word "offset" points back to the era of segment:offset addressing, where the full instruction pointer on x86 is cs:ip, not just ip (cs = code segment).

In a modern 32/64-bit x86 OS you usually don't have to touch cs, and you work only with offsets inside a flat 32/64-bit virtual memory mapping, so "address" means the same thing as "offset of address".
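The segment:offset arithmetic can be sketched as follows. This assumes 16-bit real mode, where a segment's base is the cs value times 16, and a flat 32/64-bit model, where the CS base is 0 (64-bit mode treats it as 0). Protected-mode descriptor tables and the 8086's 1 MiB wraparound are left out, and the function names are hypothetical.

```python
def real_mode_base(cs: int) -> int:
    # In 16-bit real mode the segment base is simply the segment value * 16.
    return cs << 4

def linear_address(segment_base: int, offset: int) -> int:
    # The CPU forms a linear address as segment base + offset (the value in ip/eip/rip).
    # With a flat model the CS base is 0, so the linear address equals the offset.
    return segment_base + offset
```

Note that `1000:0020` and `1002:0000` name the same linear address 0x10020, while with a base of 0 the offset `0x401000` is also the linear address `0x401000`. A near jmp changes only ip, which is why the book speaks of loading an offset.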

I don't see the value in creating a fake architecture like the one above to answer this question when 1) he already mentioned x86, and 2) absolute direct jumps do not exist in x86. – user1354557 Jul 20, 2016 at 16:15

@user1354557 I have to confess... I can't hold a sh*t in my head (or maybe only that I can), so whenever I do x86, I have to imitate everything I see around in older code, or re-read the instruction guide. I was lazy this time, so I added a fake machine just to explain what the word "offset" means in this context. (btw JMP m16:32 is quite close to it, except that it is something completely different and indirect.) – Ped7g Jul 20, 2016 at 16:39
