What makes MOVEQ quicker than a normal MOVE in 68000 assembly?
I'm "re-learning" 68000 assembly language and came across the "MOVEQ" command that is labeled "MOVE QUICK".
According to the NXP Programmers Reference Manual (reference below), the command MOVEQ
(MOVE QUICK) is described as:
Moves a byte of immediate data to a 32-bit data register. The data in an 8-bit
field within the operation word is sign- extended to a long operand in the data
register as it is transferred.
I've searched the manual and cannot find why it's "quick".
Meaning, what's the difference (in performance) in the following instructions?
MOVEQ #100, D0
MOVE #100, D0
I gather the MOVEQ
is a better fit for moving 8-bit data. Or, is it ONLY 8-bits of data as I cannot seem to confirm.
REF:
https://www.nxp.com/files-static/archives/doc/ref_manual/M68000PRM.pdf
Tags: assembly, motorola-68000
I'm "re-learning" 68000 assembly language and came across the "MOVEQ" command that is labeled "MOVE QUICK".
According to the NXP Programmers Reference Manual (reference below), the command MOVEQ
(MOVE QUICK) is described as:
Moves a byte of immediate data to a 32-bit data register. The data in an 8-bit
field within the operation word is sign- extended to a long operand in the data
register as it is transferred.
I've searched the manual and cannot find why it's "quick".
Meaning, what's the difference (in performance) in the following instructions?
MOVEQ #100, D0
MOVE #100, D0
I gather the MOVEQ
is a better fit for moving 8-bit data. Or, is it ONLY 8-bits of data as I cannot seem to confirm.
REF:
https://www.nxp.com/files-static/archives/doc/ref_manual/M68000PRM.pdf
assembly motorola-68000
add a comment |
I'm "re-learning" 68000 assembly language and came across the "MOVEQ" command that is labeled "MOVE QUICK".
According to the NXP Programmers Reference Manual (reference below), the command MOVEQ
(MOVE QUICK) is described as:
Moves a byte of immediate data to a 32-bit data register. The data in an 8-bit
field within the operation word is sign- extended to a long operand in the data
register as it is transferred.
I've searched the manual and cannot find why it's "quick".
Meaning, what's the difference (in performance) in the following instructions?
MOVEQ #100, D0
MOVE #100, D0
I gather the MOVEQ
is a better fit for moving 8-bit data. Or, is it ONLY 8-bits of data as I cannot seem to confirm.
REF:
https://www.nxp.com/files-static/archives/doc/ref_manual/M68000PRM.pdf
assembly motorola-68000
I'm "re-learning" 68000 assembly language and came across the "MOVEQ" command that is labeled "MOVE QUICK".
According to the NXP Programmers Reference Manual (reference below), the command MOVEQ
(MOVE QUICK) is described as:
Moves a byte of immediate data to a 32-bit data register. The data in an 8-bit
field within the operation word is sign- extended to a long operand in the data
register as it is transferred.
I've searched the manual and cannot find why it's "quick".
Meaning, what's the difference (in performance) in the following instructions?
MOVEQ #100, D0
MOVE #100, D0
I gather the MOVEQ
is a better fit for moving 8-bit data. Or, is it ONLY 8-bits of data as I cannot seem to confirm.
REF:
https://www.nxp.com/files-static/archives/doc/ref_manual/M68000PRM.pdf
assembly motorola-68000
assembly motorola-68000
edited Jul 19 at 22:31
Community♦
1
1
asked Jul 18 at 15:29
cbmeekscbmeeks
4,05913 silver badges61 bronze badges
4,05913 silver badges61 bronze badges
add a comment |
add a comment |
3 Answers
The MOVE immediate instruction takes 8 cycles in byte and word modes: there are two memory reads, one for the instruction word and one for the immediate value.
The MOVEQ instruction encodes the immediate value in the instruction opcode itself, so it takes only 4 cycles and 1 memory read. It can only take a byte immediate value.
MOVEQ #1, D0 (4 clocks, 1 memory read)
MOVE.b #1, D0 (8 clocks, 2 memory reads)
MOVE.w #1000, D0 (8 clocks, 2 memory reads)
Note that, unlike byte- and word-sized MOVEs, the value loaded by MOVEQ overwrites the entire 32 bits of the register; it is sign-extended as it is loaded.
As such, for loading values $00-$FF, it is twice as fast in instruction cycles and uses half as much memory bandwidth (important on systems where the bus is shared with DMA).
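For reference, the machine-code encodings of these three instructions are shown below (read off the 68000 opcode maps rather than taken from the answer itself, so worth double-checking against an assembler):
7001            MOVEQ  #1, D0      ; immediate value lives in the low byte of the opcode word
103C 0001       MOVE.b #1, D0      ; opcode word plus one extension word
303C 03E8       MOVE.w #1000, D0   ; opcode word plus one extension word
The single-word encoding is exactly why only one bus read is needed.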
"for loading values 0-255..." – more precisely, it loads a 32-bit value between -128 and +127. This is 3 times faster than doing it the normal way with move.l #...
– Bruce Abbott, Jul 19 at 1:20
@BruceAbbott that's a good point, it does sign extend to 32 bits.
– user, Jul 19 at 7:54
Showing the actual hex codes of the three instructions will improve the answer.
– Leo B., Jul 19 at 23:17
To give the exact cycle-by-cycle breakdown:
MOVEQ is a one-word instruction, so it nominally completes in four cycles; in practice the operation can occur immediately after decoding, because all the necessary information is within the instruction word. Four cycles are then spent fetching the next word to feed into the instruction prefetch queue.
Both MOVE.b and MOVE.w are two-word instructions. The 68000 actually knows both words before either instruction begins, so both can occur pretty much immediately, but both then require a further two words to be fetched to repopulate the instruction prefetch queue, which occupies eight cycles before the next instruction can begin.
MOVE.l is a three-word instruction, and the 68000's prefetch queue is only two words long. So after decoding it can't actually be completed until a further word has been fetched, and after that fetch another two are needed to repopulate the queue. Twelve cycles in total.
MOVEs are the most primitive operation available; the general rule is that the number of words needed to complete an operation, plus the number then needed to repopulate the prefetch queue, is only a floor for cycle counting. See Yacht.txt for a more detailed summary of the work each instruction does. Bear in mind that things like RTS are only one word long but imply two further prefetches, since the whole queue needs to be replenished, and anything that might change the supervisor flag will often result in a refetch of data that's ostensibly already in the queue, in case the memory subsystem is designed to serve different results to user and supervisor accesses.
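As a rough illustration of that word-count floor, here's a sketch with figures from the standard 68000 timing tables (they're not quoted in the answer above, so treat them as approximations to verify):
MOVEQ  #1, D0    ; 1 word,  1 bus read   ->  4 cycles
MOVE.w #1, D0    ; 2 words, 2 bus reads  ->  8 cycles
MOVE.l #1, D0    ; 3 words, 3 bus reads  -> 12 cycles
RTS              ; 1 word, but 2 stack reads + 2 prefetches -> 16 cycles
Each 16-bit bus read takes four clocks with no wait states, which is where the multiples of four come from.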
(obiter: this answer was offered despite the other answer already being present because I felt the fact of the prefetch queue makes it a different answer, technically. Even if very similar)
– Tommy, Jul 19 at 20:56
Comment only: Your last paragraph doesn't seem to quite 'scan' correctly - or I'm still half asleep :-). I think "that anything that things" contains at least one typo (but may not) and "in case the ..." may not say what you want as precisely as it could (but may :-) ).
– Russell McMahon, Jul 19 at 22:32
Are there any circumstances in which the time to execute a 68000 instruction would vary with context? For example, how would the timing of muls r0,r1 / moveq r0,#0 / rts compare to muls r0,r1 / move.l r0,#0 / rts?
– supercat, Jul 19 at 22:33
@supercat I can't think of anything, as every instruction is microcoded to make sure the prefetch queue is exactly full again within its execution time. It's not intelligent like, say, the instruction queue on an 8086.
– Tommy, Jul 20 at 13:40
@Tommy: I can't think of any advantage to waiting until the second fetch is complete before starting instruction execution, but could easily imagine that instruction decode couldn't start until the cycle after the first fetch was complete; starting the fetch of the second word immediately, without regard for whether it's needed, would allow it to begin a cycle or two sooner than would otherwise be possible. It may have been possible to design the 68000 to shave two cycles off an RTS if the attached memory system could process a two-cycle "ignored value" read, but...
– supercat, Jul 20 at 16:34
I've searched the manual and cannot find why it's "quick".
Simply because MOVEQ is a single-word (two-byte) instruction, which can be fetched in a single memory cycle, while an equivalent constant move will be 2 (MOVE.W) or 3 (MOVE.L) words long and needs one or two additional memory cycles, each taking four clocks.
So effectively you'll get the following execution timing:
MOVEQ #5,D0  - 4 clocks
MOVE.B #5,D0 - 8 clocks
MOVE.W #5,D0 - 8 clocks
MOVE.L #5,D0 - 12 clocks
so MOVEQ needs only half to a third of the execution time.
MOVEQ even got its own opcode group (7) to squeeze it all into a single word.
ADDQ and SUBQ work similarly (*1), except they are mixed into the Scc/DBcc/TRAPcc group (5).
I gather the MOVEQ is a better fit for moving 8-bit data. Or, is it ONLY 8-bits of data as I cannot seem to confirm.
Only. There is no room for more than 8 bits of constant within the 16-bit instruction word (*2), as the encoding is
| Opcode | Dest. | Res. ||    Data     |
| group  | reg.  |      ||             |
|  0111  |  xxx  |   0  ||  yyyy yyyy  |
*1 - Not exactly the same, as they may take additional parameters.
*2 - Well, in the original 68000 encoding there is one unused bit, but that won't get you far.
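A practical aside (timings from the standard 68000 tables, not from this answer, so verify before relying on them): this single-word, 4-clock encoding is why MOVEQ is the idiomatic way to clear a data register or seed it with a small constant:
MOVEQ #0,D0    ; 4 clocks, 1 word  - the usual way to clear D0
CLR.L D0       ; 6 clocks, 1 word  - slower on a plain 68000
MOVEQ #-1,D1   ; 4 clocks          - loads $FFFFFFFF thanks to the sign extension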
Don't forget the move.b instruction...
– UncleBod, Jul 18 at 16:14
@UncleBod MOVE.B is exactly like MOVE.W
– Raffzahn, Jul 18 at 16:40
"making MOVEQ about 50/66% faster." - math.stackexchange.com/questions/1404234/…
– Bruce Abbott, Jul 20 at 23:20
@BruceAbbott And your point is?
– Raffzahn, Jul 21 at 10:19