What makes MOVEQ quicker than a normal MOVE in 68000 assembly?


I'm "re-learning" 68000 assembly language and came across the "MOVEQ" command that is labeled "MOVE QUICK".



According to the NXP M68000 Programmer's Reference Manual (reference below), the MOVEQ (MOVE QUICK) instruction is described as:



Moves a byte of immediate data to a 32-bit data register. The data in an 8-bit field within the operation word is sign-extended to a long operand in the data register as it is transferred.


I've searched the manual and cannot find why it's "quick".



In other words, what's the difference (in performance) between the following instructions?



MOVEQ #100, D0
MOVE #100, D0


I gather MOVEQ is a better fit for moving 8-bit data. Or can it ONLY move 8 bits of data? I cannot seem to confirm this from the manual.



REF:



https://www.nxp.com/files-static/archives/doc/ref_manual/M68000PRM.pdf










          3 Answers
          The MOVE immediate instruction takes 8 cycles in byte and word modes. There are two memory reads, one for the instruction and one for the immediate value.



          The MOVEQ instruction encodes the immediate value into the instruction op-code itself, so it only takes 4 cycles and 1 memory read. It can only take a byte immediate value.



          MOVEQ #1, D0 (4 clocks, 1 memory read)
          MOVE.b #1, D0 (8 clocks, 2 memory reads)
          MOVE.w #1000, D0 (8 clocks, 2 memory reads)



          Note that the immediate value loaded for byte and word size moves overwrites the entire 32 bits of the register, and is sign extended.



          As such, for loading values $00-$FF, it is twice as fast in instruction cycles and uses half as much memory bandwidth (important on systems where it is shared with DMA).
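
          For reference (and as one of the comments below asks for), these are the opcode words the assembler produces, assuming the standard 68000 encoding - worth double-checking against the PRM:

          MOVEQ  #1, D0      ; $7001             (1 word:  0111 rrr0 dddddddd)
          MOVE.b #1, D0      ; $103C $0001       (2 words: opcode + immediate word)
          MOVE.w #1000, D0   ; $303C $03E8       (2 words: opcode + immediate word)
          MOVE.l #1, D0      ; $203C $0000 $0001 (3 words: opcode + two immediate words)
          MOVEQ  #-1, D0     ; $70FF             (sign-extended, so D0 becomes $FFFFFFFF)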






          • "for loading values 0-255..." - more precisely, it loads a 32 bit value between -128 and +127. This is 3 times faster than doing it the normal way with move.l #... – Bruce Abbott, Jul 19 at 1:20

          • @BruceAbbott that's a good point, it does sign extend to 32 bits. – user, Jul 19 at 7:54

          • Showing the actual hex codes of the three instructions will improve the answer. – Leo B., Jul 19 at 23:17
































          To give the exact cycle-by-cycle breakdown:



          MOVEQ is a one word instruction so will nominally perform in four cycles; in practice it can occur immediately following operation decoding because all necessary information is within the instruction word. Four cycles are then expended fetching the next value to feed into the instruction prefetch queue.



          Both MOVE.b and MOVE.w are two-word instructions. The 68000 actually knows both words before either instruction begins, so both can occur pretty much immediately, but both then require that a further two words be fetched to repopulate the instruction prefetch queue, which occupies eight cycles before the next instruction can begin.



          MOVE.l is a three-word instruction. The 68000's prefetch queue is only two words long. So after decoding it can't actually be completed until a further word has been fetched, and after that fetch a further two will be needed to repopulate the queue. So twelve cycles total.



          MOVEs are the most primitive operation available; the general rule is that the number of words needed to complete an operation plus the number needed then to [re]populate the prefetch queue is only a floor for cycle counting. See Yacht.txt for a more detailed summary of the work each instruction does; bear in mind that things like RTS are only one word long but imply two further prefetches since the whole queue needs to be replenished, and anything that might change the supervisor flag will often result in a refetch of data that's ostensibly already in the queue, in case the memory subsystem is designed to serve conditional results.
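
          As a rough illustration of that floor (a sketch only, assuming no wait states and using the commonly published 68000 timing figures):

          MOVEQ  #100, D0    ; 1 word  - 4 cycles  (1 bus read, to keep the prefetch queue full)
          MOVE.w #100, D0    ; 2 words - 8 cycles  (2 bus reads)
          MOVE.l #100, D0    ; 3 words - 12 cycles (3 bus reads)
          RTS                ; 1 word  - 16 cycles (2 reads to pop the PC, 2 more to refill the queue)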






          • (obiter: this answer was offered despite the other answer already being present because I felt the fact of the prefetch queue makes it a different answer, technically. Even if very similar) – Tommy, Jul 19 at 20:56

          • Comment only: Your last paragraph doesn't seem to quite 'scan' correctly - or I'm still half asleep :-). I think "that anything that things" contains at least one typo (but may not) and "in case the ..." may not say what you want as precisely as it could (but may :-) ). – Russell McMahon, Jul 19 at 22:32

          • Are there any circumstances in which the time to execute a 68000 instruction would vary with context? For example, how would the timing of muls r0,r1 / moveq r0,#0 / rts compare to muls r0,r1 / move.l r0,#0 / rts? – supercat, Jul 19 at 22:33

          • @supercat I can't think of anything, as every instruction is microcoded to make sure the prefetch queue is exactly full again within its execution time. It's not intelligent like, say, the instruction queue on an 8086. – Tommy, Jul 20 at 13:40

          • @Tommy: I can't think of any advantage to waiting until the second fetch is complete before starting instruction execution, but could easily imagine that instruction decode couldn't start until the cycle after the first fetch was complete; starting the fetch of the second word immediately, without regard for whether it's needed, would allow it to begin a cycle or two sooner than would otherwise be possible. It may have been possible to design the 68000 to shave two cycles off an RTS if the attached memory system could process a two-cycle "ignored value" read, but... – supercat, Jul 20 at 16:34

































          I've searched the manual and cannot find why it's "quick".




          Simply because MOVEQ is a single-word (two-byte) instruction, which can be fetched in a single memory cycle, while an equivalent constant move will be 2 (MOVE.W) or 3 words (MOVE.L) and needs one or two additional memory cycles - each taking four clocks.



          So effectively you'll get the following execution timing:




          • MOVEQ #5,D0 - 4 Clocks,


          • MOVE.B #5,D0 - 8 Clocks,


          • MOVE.W #5,D0 - 8 Clocks,


          • MOVE.L #5,D0 - 12 Clocks,

          making MOVEQ about 50/66% faster.



          MOVEQ even got its own opcode group (7) to squeeze it all into a single word.



          ADDQ and SUBQ work similarly (*1) - except that they are mixed into the Scc/DBcc/TRAPcc group (5).




          I gather the MOVEQ is a better fit for moving 8-bit data. Or, is it ONLY 8-bits of data as I cannot seem to confirm.




          Only. There is no room for more than 8 bits of constant within the 16 bit instruction word (*2), as the encoding is



          | OPCODE | Dest. | Res. |    Data    |
          | Group  | Reg.  |      |            |
          |  0111  |  xxx  |  0   | yyyy yyyy  |



          *1 - Not exactly the same, as they may have additional parameters.



          *2 - Well, in the original 68000 encoding there was one unused bit, but that wouldn't get you far.
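
          A small usage sketch of where that single-word range matters in practice (cycle counts per the usual 68000 timing tables, data-register destinations assumed):

          MOVEQ  #0,D0       ;  4 clocks - the usual idiom for clearing a data register
          CLR.L  D0          ;  6 clocks - the "obvious" alternative is slower
          MOVEQ  #127,D1     ;  4 clocks - largest positive value MOVEQ can hold
          MOVE.L #128,D1     ; 12 clocks - one past MOVEQ's range, so a full move is needed for a 32-bit load
          MOVEQ  #-1,D2      ;  4 clocks - sign-extends to $FFFFFFFF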






          • Don't forget the move.b instruction... – UncleBod, Jul 18 at 16:14

          • @UncleBod MOVE.B is exactly like MOVE.W – Raffzahn, Jul 18 at 16:40

          • "making MOVEQ about 50/66% faster." - math.stackexchange.com/questions/1404234/… – Bruce Abbott, Jul 20 at 23:20

          • @BruceAbbott And your point is? – Raffzahn, Jul 21 at 10:19













