Embedded C - Most elegant way to insert a delay




7
I'm working on a project involving a Cortex-M4 MCU (LPC4370), and I need to insert a delay while the compiler's optimization is turned on. So far my workaround has been to toggle a digital output inside a for-loop:

    for (int i = 0; i < 50000; i++)
    {
        LPC_GPIO_PORT->B[DEBUGPIN_PORT][DEBUG_PIN1] = TRUE;
        LPC_GPIO_PORT->B[DEBUGPIN_PORT][DEBUG_PIN1] = FALSE;
    }

But I wonder if there's a better way to fool GCC.



















  • 7 – Marko Buršič (Jul 31 at 9:59): This is not a good way of doing delays.
  • 3 – Elliot Alderson (Jul 31 at 11:31): How long do you want the delay to be? How precise must it be?
  • 4 – Peter Smith (Jul 31 at 13:06): You could set a timer to interrupt on underflow/overflow for your desired delay and just enter sleep mode. The processor will wake up at the interrupt, whose handler could simply be a single return statement.
  • 4 – Chris Stratton (Jul 31 at 14:28): I'm voting to close this question as off-topic because it is an XY problem: using a delay in communication code is essentially always incorrect. You need to understand and solve your actual problem. In non-communication cases where a delay is appropriate, you'll find that most MCU software setups have a busy-wait delay mechanism.
  • 3 – Marc.2377 (Aug 1 at 2:47): @ChrisStratton I'm not aware that XY-type posts are a valid close reason anywhere across the network.

















c arm delay gcc

asked Jul 31 at 9:21 by a_bet










4 Answers


















21

The context of this inline no-dependency delay is missing here, but I assume you need a short delay during initialization or in another part of the code where blocking is acceptable.

Your question shouldn't be how to fool GCC. You should tell GCC what you want:

    #pragma GCC push_options
    #pragma GCC optimize ("O0")

    for (uint32_t i = 0; i < T; i++) { __NOP(); }

    #pragma GCC pop_options

Off the top of my head, this loop will take approximately 5*T clocks.

(source)

Edit: Fair comment by Colin on another answer: a NOP is not guaranteed to consume cycles on an M4. If you want to slow things down, perhaps ISB (flush the pipeline) is a better option. See the Generic User Guide.

– Jeroen3, answered Jul 31 at 11:54, edited Jul 31 at 19:00














  • a_bet (Jul 31 at 13:22): OK, so this is the first time I've seen a #pragma. If I understood correctly, this kind of setting applies only to this small section of code. Would you advise this over an implementation which uses a timer?
  • 2 – Harry Beadle (Jul 31 at 14:03): You can also use a non-NOP instruction instead, which will not be removed from the pipeline.
  • 4 – Groo (Aug 1 at 3:28): @Jeroen3: -O0 does a bit worse than 5*T, something like 8 instructions with a bunch of overhead. It would be better to create a short optimized loop (or at least one which compiles the same way without pragmas) and use __asm__ __volatile__(""); to prevent GCC from optimizing the loop away.
  • 1 – Jeroen3 (Aug 1 at 5:42): @Groo I can't believe we are discussing the effectiveness of delay code in the most dirty way known to man. But yes, a volatile inline assembly line will work just as well. I believe the pragma expresses the intention better to any new readers.
  • 1 – Navin (Aug 1 at 11:40): asm volatile is the correct way to do this if you don't have a vendor-provided delay function/macro. Don't disable optimizations, even for one line; it messes with the for loop.
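The barrier approach Groo and Navin describe can be sketched as follows. This is an illustrative, uncalibrated busy-wait (the function name `delay_loops` is made up): the empty `__asm__ __volatile__("")` statement is a GCC compiler barrier, so the loop survives -O2/-O3 without disabling optimization for the surrounding code.

```c
#include <stdint.h>

/* Crude busy-wait that survives optimization: the empty volatile asm
   statement forces GCC to keep every iteration of the loop.
   'count' must be calibrated against the target clock by measurement. */
static inline void delay_loops(uint32_t count)
{
    for (uint32_t i = 0; i < count; i++) {
        __asm__ __volatile__("");  /* barrier: loop body cannot be elided */
    }
}
```

Unlike the #pragma variant, only the loop itself is pessimized; everything around it still optimizes normally.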


















12

Use a timer if you have one available. The SysTick is very simple to configure, with documentation in the Cortex-M4 User Guide (or M0 if you're on the M0 part). Increment a number in its interrupt, and in your delay function block until the number has incremented a certain number of steps.

Your part contains many other timers if the SysTick is already in use, and the principle remains the same. If you use a different timer, you could configure it as a counter and just read its count register, avoiding the interrupt entirely.

If you really want to do it in software, you can put asm("nop"); inside your loop. A nop doesn't have to take time — the processor can remove NOPs from its pipeline without executing them — but the compiler should still generate the loop.
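The interrupt-counting scheme above can be sketched like this, assuming a CMSIS-style SysTick interrupt configured for a 1 kHz tick. The names `g_ms_ticks`, `systick_tick`, and `delay_ms` are invented for illustration; the unsigned subtraction keeps the elapsed-time check correct even when the tick counter wraps around.

```c
#include <stdint.h>

/* Millisecond tick, incremented from the SysTick interrupt.
   In real firmware: void SysTick_Handler(void) { g_ms_ticks++; } */
static volatile uint32_t g_ms_ticks;

void systick_tick(void) { g_ms_ticks++; }  /* stand-in for the ISR body */

/* Wraparound-safe elapsed check: unsigned subtraction stays correct
   when g_ms_ticks rolls over from 0xFFFFFFFF to 0. */
static int elapsed_at_least(uint32_t start, uint32_t now, uint32_t ms)
{
    return (uint32_t)(now - start) >= ms;
}

/* Block until 'ms' milliseconds of ticks have passed. */
void delay_ms(uint32_t ms)
{
    uint32_t start = g_ms_ticks;
    while (!elapsed_at_least(start, g_ms_ticks, ms)) {
        /* optionally __WFI() here to sleep until the next interrupt */
    }
}
```

Because `g_ms_ticks` is volatile, the compiler re-reads it on every pass and cannot optimize the wait loop away.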














  • DKNguyen (Jul 31 at 15:47): SysTick is very simple to configure, but I recommend using another timer as soon as you can, since SysTick has limitations with regard to counter size and interrupts when used for delays.
  • Evil Dog Pie (Aug 1 at 12:37): You don't even need to use interrupts — just poll the count register. It should be declared volatile, so the compiler will not optimise it out. IMO SysTick is a good choice, as it's often configured to give an 'O/S timer', e.g. a microsecond timer. You then have simple wait_microseconds(100); kinds of calls in the code.
  • TripeHound (Aug 1 at 13:57): @EvilDogPie Isn't "just poll the count register" almost as bad as having a tight loop? (Although probably easier to stop GCC optimizing it away.)
  • Evil Dog Pie (Aug 1 at 14:12): @TripeHound Yes, it's exactly a tight loop. That's what the OP is asking for: a tight loop for a short delay that doesn't get removed by compiler optimisation. There are places where a tight loop is not a bad way to do a short delay, particularly in an embedded system that's not multitasking.


















10

Not to detract from the other answers here, but exactly what length of delay do you need? Some datasheets mention nanoseconds; others microseconds; still others milliseconds.

  • Nanosecond delays are usually best served by adding "time-wasting" instructions. Indeed, sometimes the very speed of the microcontroller means the delay is already satisfied between the "set the pin high, then set the pin low" instructions you show. Otherwise, one or more NOP, jump-to-next-instruction, or other time-wasting instructions are sufficient.
  • Short microsecond delays can be done with a for loop (depending on CPU clock rate), but longer ones may warrant waiting on an actual timer.
  • Millisecond delays are usually best served by doing something else entirely while waiting for the process to complete, then checking that it has actually completed before continuing.

In short, it all depends on the peripheral.
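The millisecond strategy above ("do something else while waiting") is usually implemented as a non-blocking software timer polled from the main loop. A minimal sketch — the `soft_timer` type and function names are invented for illustration, and `now` would come from a hardware tick counter:

```c
#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint32_t deadline;  /* tick value at which the timer fires */
    bool     armed;
} soft_timer;

/* Arm the timer to fire 'delay' ticks after 'now'. */
void timer_start(soft_timer *t, uint32_t now, uint32_t delay)
{
    t->deadline = now + delay;  /* unsigned overflow is well-defined */
    t->armed = true;
}

/* Poll from the main loop; returns true exactly once when the deadline
   passes. The signed view of the difference is wraparound-safe. */
bool timer_expired(soft_timer *t, uint32_t now)
{
    if (t->armed && (int32_t)(now - t->deadline) >= 0) {
        t->armed = false;
        return true;
    }
    return false;
}
```

The main loop keeps servicing other work and calls `timer_expired` each pass, so nothing blocks while the delay elapses.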






















3

The best way is to use on-chip timers: SysTick, the RTC, or peripheral timers. These have the advantage that the timing is precise, deterministic, and easily adapted if the CPU clock speed changes. Optionally, you can even let the CPU sleep and use a wake-up interrupt.

Dirty "busy-delay" loops, on the other hand, are rarely accurate and come with various problems, such as tight coupling to a specific CPU instruction set and clock.

Some things of note:

  • Toggling a GPIO pin repeatedly is a bad idea, since it draws current needlessly and can also cause EMC issues if the pin is connected to traces.
  • Using NOP instructions might not work. Many architectures (like Cortex-M, IIRC) are free to skip NOPs at the CPU level and not actually execute them.

If you insist on generating a dirty busy-loop, it is sufficient to volatile-qualify the loop iterator. For example:

    void dirty_delay (void)
    {
        for (volatile uint32_t i = 0; i < 50000u; i++)
            ;
    }

This is guaranteed to generate various crap code. For example, ARM gcc -O3 -ffreestanding gives:

    dirty_delay:
            mov     r3, #0
            sub     sp, sp, #8
            str     r3, [sp, #4]
            ldr     r3, [sp, #4]
            ldr     r2, .L7
            cmp     r3, r2
            bhi     .L1
    .L3:
            ldr     r3, [sp, #4]
            add     r3, r3, #1
            str     r3, [sp, #4]
            ldr     r3, [sp, #4]
            cmp     r3, r2
            bls     .L3
    .L1:
            add     sp, sp, #8
            bx      lr
    .L7:
            .word   49999

From there you can in theory calculate how many ticks each instruction takes and change the magic number 50000 accordingly. Pipelining, branch prediction, etc. may mean the code executes faster than the sum of the clock cycles, though. Since the compiler decided to involve the stack, data caching could also play a part.

My whole point here is that accurately calculating how much time this code will take is difficult. Trial-and-error benchmarking with a scope is probably more sensible than attempting theoretical calculations.

















      Your Answer






      StackExchange.ifUsing("editor", function ()
      return StackExchange.using("schematics", function ()
      StackExchange.schematics.init();
      );
      , "cicuitlab");

      StackExchange.ready(function()
      var channelOptions =
      tags: "".split(" "),
      id: "135"
      ;
      initTagRenderer("".split(" "), "".split(" "), channelOptions);

      StackExchange.using("externalEditor", function()
      // Have to fire editor after snippets, if snippets enabled
      if (StackExchange.settings.snippets.snippetsEnabled)
      StackExchange.using("snippets", function()
      createEditor();
      );

      else
      createEditor();

      );

      function createEditor()
      StackExchange.prepareEditor(
      heartbeatType: 'answer',
      autoActivateHeartbeat: false,
      convertImagesToLinks: false,
      noModals: true,
      showLowRepImageUploadWarning: true,
      reputationToPostImages: null,
      bindNavPrevention: true,
      postfix: "",
      imageUploader:
      brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
      contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
      allowUrls: true
      ,
      onDemand: true,
      discardSelector: ".discard-answer"
      ,immediatelyShowMarkdownHelp:true
      );



      );













      draft saved

      draft discarded


















      StackExchange.ready(
      function ()
      StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2felectronics.stackexchange.com%2fquestions%2f450971%2fembedded-c-most-elegant-way-to-insert-a-delay%23new-answer', 'question_page');

      );

      Post as a guest















      Required, but never shown

























      4 Answers
      4






      active

      oldest

      votes








      4 Answers
      4






      active

      oldest

      votes









      active

      oldest

      votes






      active

      oldest

      votes









      21












      $begingroup$

      The context of this inline no-dependency delay is missing here. But I'm assuming you need a short delay during initialization or other part of the code where it is allowed to be blocking.



      Your question shouldn't be how to fool GCC. You should tell GCC what you want.



      #pragma GCC push_options
      #pragma GCC optimize ("O0")

      for(uint i=0; i<T; i++)__NOP()

      #pragma GCC pop_options


      From the top of my head, this loop will be approximately 5*T clocks.



      (source)




      Fair comment by Colin on another answer. A NOP is not guaranteed to take cycles on an M4. If you want to slow things down, perhaps ISB (flush pipeline) is a better option. See the Generic User Guide.






      share|improve this answer











      $endgroup$














      • $begingroup$
        Ok, so it's the first time I see a #pragma. If I understood correctly this kind of settings applies only to this small section of code. Would you advice this over an implementation which uses a timer?
        $endgroup$
        – a_bet
        Jul 31 at 13:22






      • 2




        $begingroup$
        You can also uses a non-nop instruction instead that will not be removed from the pipeline.
        $endgroup$
        – Harry Beadle
        Jul 31 at 14:03






      • 4




        $begingroup$
        @Jeroen3: -O0 does a bit worse than 5*T, something like 8 instructions with a bunch of overhead. It would be better to create a short optimized loop (or at least one which compiles the same way without using pragmas) and use __asm__ __volatile__(""); to prevent GCC from optimizing the loop away, i.e. something like this.
        $endgroup$
        – Groo
        Aug 1 at 3:28






      • 1




        $begingroup$
        @Groo I can't believe we are discussing the effectiveness of delay code in the most dirty way known to man. But yes, a volatile inline assembly line will work just as well. I believe the pragma expresses the intention better to any new readers.
        $endgroup$
        – Jeroen3
        Aug 1 at 5:42






      • 1




        $begingroup$
        asm volatile is the correct way to do this if you don't have a vendor-provided delay function/macro. Don't disable optimizations, even for 1 line, it messes with the for loop.
        $endgroup$
        – Navin
        Aug 1 at 11:40















      21












      $begingroup$

      The context of this inline no-dependency delay is missing here. But I'm assuming you need a short delay during initialization or other part of the code where it is allowed to be blocking.



      Your question shouldn't be how to fool GCC. You should tell GCC what you want.



      #pragma GCC push_options
      #pragma GCC optimize ("O0")

      for(uint i=0; i<T; i++)__NOP()

      #pragma GCC pop_options


      From the top of my head, this loop will be approximately 5*T clocks.



      (source)




      Fair comment by Colin on another answer. A NOP is not guaranteed to take cycles on an M4. If you want to slow things down, perhaps ISB (flush pipeline) is a better option. See the Generic User Guide.






      share|improve this answer











      $endgroup$














      • $begingroup$
        Ok, so it's the first time I see a #pragma. If I understood correctly this kind of settings applies only to this small section of code. Would you advice this over an implementation which uses a timer?
        $endgroup$
        – a_bet
        Jul 31 at 13:22






      • 2




        $begingroup$
        You can also uses a non-nop instruction instead that will not be removed from the pipeline.
        $endgroup$
        – Harry Beadle
        Jul 31 at 14:03






      • 4




        $begingroup$
        @Jeroen3: -O0 does a bit worse than 5*T, something like 8 instructions with a bunch of overhead. It would be better to create a short optimized loop (or at least one which compiles the same way without using pragmas) and use __asm__ __volatile__(""); to prevent GCC from optimizing the loop away, i.e. something like this.
        $endgroup$
        – Groo
        Aug 1 at 3:28






      • 1




        $begingroup$
        @Groo I can't believe we are discussing the effectiveness of delay code in the most dirty way known to man. But yes, a volatile inline assembly line will work just as well. I believe the pragma expresses the intention better to any new readers.
        $endgroup$
        – Jeroen3
        Aug 1 at 5:42






      • 1




        $begingroup$
        asm volatile is the correct way to do this if you don't have a vendor-provided delay function/macro. Don't disable optimizations, even for 1 line, it messes with the for loop.
        $endgroup$
        – Navin
        Aug 1 at 11:40













      21












      21








      21





      $begingroup$

      The context of this inline no-dependency delay is missing here. But I'm assuming you need a short delay during initialization or other part of the code where it is allowed to be blocking.



      Your question shouldn't be how to fool GCC. You should tell GCC what you want.



      #pragma GCC push_options
      #pragma GCC optimize ("O0")

      for(uint i=0; i<T; i++)__NOP()

      #pragma GCC pop_options


      From the top of my head, this loop will be approximately 5*T clocks.



      (source)




      Fair comment by Colin on another answer. A NOP is not guaranteed to take cycles on an M4. If you want to slow things down, perhaps ISB (flush pipeline) is a better option. See the Generic User Guide.






      share|improve this answer











      $endgroup$



      The context of this inline no-dependency delay is missing here. But I'm assuming you need a short delay during initialization or other part of the code where it is allowed to be blocking.



      Your question shouldn't be how to fool GCC. You should tell GCC what you want.



      #pragma GCC push_options
      #pragma GCC optimize ("O0")

      for(uint i=0; i<T; i++)__NOP()

      #pragma GCC pop_options


      From the top of my head, this loop will be approximately 5*T clocks.



      (source)




      Fair comment by Colin on another answer. A NOP is not guaranteed to take cycles on an M4. If you want to slow things down, perhaps ISB (flush pipeline) is a better option. See the Generic User Guide.







      share|improve this answer














      share|improve this answer



      share|improve this answer








      edited Jul 31 at 19:00

























      answered Jul 31 at 11:54









      Jeroen3Jeroen3

      13.5k20 silver badges53 bronze badges




      13.5k20 silver badges53 bronze badges














      • $begingroup$
        Ok, so it's the first time I see a #pragma. If I understood correctly this kind of settings applies only to this small section of code. Would you advice this over an implementation which uses a timer?
        $endgroup$
        – a_bet
        Jul 31 at 13:22






      • 2




        $begingroup$
        You can also uses a non-nop instruction instead that will not be removed from the pipeline.
        $endgroup$
        – Harry Beadle
        Jul 31 at 14:03






      • 4




        $begingroup$
        @Jeroen3: -O0 does a bit worse than 5*T, something like 8 instructions with a bunch of overhead. It would be better to create a short optimized loop (or at least one which compiles the same way without using pragmas) and use __asm__ __volatile__(""); to prevent GCC from optimizing the loop away, i.e. something like this.
        $endgroup$
        – Groo
        Aug 1 at 3:28






      • 1




        $begingroup$
        @Groo I can't believe we are discussing the effectiveness of delay code in the most dirty way known to man. But yes, a volatile inline assembly line will work just as well. I believe the pragma expresses the intention better to any new readers.
        $endgroup$
        – Jeroen3
        Aug 1 at 5:42






      • 1




        $begingroup$
        asm volatile is the correct way to do this if you don't have a vendor-provided delay function/macro. Don't disable optimizations, even for 1 line, it messes with the for loop.
        $endgroup$
        – Navin
        Aug 1 at 11:40
















      • $begingroup$
        Ok, so it's the first time I see a #pragma. If I understood correctly this kind of settings applies only to this small section of code. Would you advice this over an implementation which uses a timer?
        $endgroup$
        – a_bet
        Jul 31 at 13:22






      • 2




        $begingroup$
        You can also uses a non-nop instruction instead that will not be removed from the pipeline.
        $endgroup$
        – Harry Beadle
        Jul 31 at 14:03






      • 4




        $begingroup$
        @Jeroen3: -O0 does a bit worse than 5*T, something like 8 instructions with a bunch of overhead. It would be better to create a short optimized loop (or at least one which compiles the same way without using pragmas) and use __asm__ __volatile__(""); to prevent GCC from optimizing the loop away, i.e. something like this.
        $endgroup$
        – Groo
        Aug 1 at 3:28






      • 1




        $begingroup$
        @Groo I can't believe we are discussing the effectiveness of delay code in the most dirty way known to man. But yes, a volatile inline assembly line will work just as well. I believe the pragma expresses the intention better to any new readers.
        $endgroup$
        – Jeroen3
        Aug 1 at 5:42






      • 1




        $begingroup$
        asm volatile is the correct way to do this if you don't have a vendor-provided delay function/macro. Don't disable optimizations, even for 1 line, it messes with the for loop.
        $endgroup$
        – Navin
        Aug 1 at 11:40















      12












      $begingroup$

      Use a timer if you have one available. The SysTick is very simple to configure, with documentation in the Cortex-M4 User Guide (or the M0 guide if you're on an M0 part). Increment a counter in its interrupt handler, and in your delay function block until that counter has advanced the required number of steps.



      If the SysTick is already in use, your part contains many other timers and the principle remains the same. With a different timer you could configure it as a free-running counter and just read its count register, avoiding the interrupt altogether.



      If you really want to do it in software, then you can put asm("nop"); inside your loop. A nop is not guaranteed to take time (the processor may remove it from its pipeline without executing it), but the compiler should still generate the loop.
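
      The interrupt-counter pattern described here can be sketched like this. A hedged sketch: SysTick_Handler is the standard CMSIS handler name, and the 1 ms tick rate is an assumption you would set up via the SysTick reload register:

```c
#include <stdint.h>

/* Tick counter incremented from the timer interrupt.  volatile is
 * required: the delay loop must re-read it on every pass. */
static volatile uint32_t g_ticks;

/* CMSIS SysTick interrupt handler; assume a 1 ms tick is configured. */
void SysTick_Handler(void)
{
    g_ticks++;
}

/* Block until the requested number of ticks has elapsed.  Unsigned
 * subtraction keeps the comparison correct across counter wrap-around. */
void delay_ms(uint32_t ms)
{
    uint32_t start = g_ticks;
    while ((uint32_t)(g_ticks - start) < ms) {
        /* spin; on Cortex-M you could sleep with __WFI() instead */
    }
}
```

      The unsigned subtraction is the important detail: it stays correct even when g_ticks wraps past 0xFFFFFFFF.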















      $endgroup$














      • $begingroup$
        Systick is very simple to configure but I recommend using another timer as soon as you can since Systick has its limitations with regard to counter size and interrupts when being used for delays.
        $endgroup$
        – DKNguyen
        Jul 31 at 15:47










      • $begingroup$
        You don't even need to use interrupts; just poll the count register. It should be declared volatile, so the compiler will not optimise it out. IMO, SysTick is a good choice, as it's often configured to give an 'O/S timer', e.g. a microsecond timer. You will then have simple wait_microseconds(100); kind of things in the code.
        $endgroup$
        – Evil Dog Pie
        Aug 1 at 12:37










      • $begingroup$
        @EvilDogPie Isn't "just poll the count register" almost as bad as just having a tight loop? (although probably easier to stop GCC optimizing it away).
        $endgroup$
        – TripeHound
        Aug 1 at 13:57











      • $begingroup$
        @TripeHound Yes, it's exactly having a tight loop. That's what the o/p is asking for: a tight loop for a short delay that doesn't get removed by the compiler optimisation. There are places where a tight loop is not a bad way to do a short delay, particularly in an embedded system that's not multitasking.
        $endgroup$
        – Evil Dog Pie
        Aug 1 at 14:12















      answered Jul 31 at 9:35









      Colin

      3,556 · 2 gold badges · 11 silver badges · 26 bronze badges


















      10












      $begingroup$

      Not to detract from other answers here, but exactly what length delay do you need? Some datasheets mention nanoseconds; others microseconds; and still others milliseconds.



      • Nanosecond delays are usually best served by adding "time-wasting" instructions. Indeed, sometimes the very speed of the microcontroller means that the delay has been satisfied between the "set the pin high, then set the pin low" instructions that you show. Otherwise, one or more NOP, JMP-to-next-instruction, or other time-wasting instructions are sufficient.

      • Short microsecond delays could be done by a for loop (depending on CPU rate), but longer ones may warrant waiting on an actual timer.

      • Millisecond delays are usually best served by doing something else completely while waiting for the process to complete, then going back to ensure that it has actually been completed before continuing.

      In short, it all depends on the peripheral.
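
      As a sketch of the nanosecond tier above (GCC inline-asm syntax; the GPIO helper names in the comments are hypothetical):

```c
/* A few time-wasting instructions for ns-scale setup/hold delays.
 * On many cores each nop costs about one cycle, but some pipelines
 * may fold nops, so verify the real pulse width on a scope. */
#define SHORT_DELAY() __asm__ __volatile__("nop\n\tnop\n\tnop\n\tnop")

void pulse_example(void)
{
    /* gpio_set_high() / gpio_set_low() are hypothetical pin helpers */
    /* gpio_set_high(); */
    SHORT_DELAY();   /* hold the pin for the peripheral's minimum width */
    /* gpio_set_low(); */
}
```

      For the millisecond tier, prefer a timer or a state machine that checks a tick count, rather than burning that many cycles in a loop.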















      $endgroup$



















          answered Jul 31 at 15:19









          John Burger

          1,111 · 1 gold badge · 5 silver badges · 14 bronze badges




























              3












              $begingroup$

              The best way is to use on-chip timers: SysTick, the RTC, or peripheral timers. These have the advantage that the timing is precise, deterministic, and easily adapted if the CPU clock speed changes. Optionally, you can even let the CPU sleep and use a wake-up interrupt.



              Dirty "busy-delay" loops, on the other hand, are rarely accurate and come with various problems, such as tight coupling to a specific CPU instruction set and clock.



              Some things of note:



              • Toggling a GPIO pin repeatedly is a bad idea since this will draw current needlessly, and potentially also cause EMC issues if the pin is connected to traces.

              • Using NOP instructions might not work. Many architectures (Cortex-M, IIRC) are free to drop NOPs at the CPU level and never actually execute them.

              If you insist on generating a dirty busy-loop, then it is sufficient to volatile-qualify the loop iterator. For example:



              void dirty_delay (void)
              {
                  for (volatile uint32_t i = 0; i < 50000u; i++)
                      ;
              }



              This is guaranteed to generate various crap code. For example ARM gcc -O3 -ffreestanding gives:



              dirty_delay:
              mov r3, #0
              sub sp, sp, #8
              str r3, [sp, #4]
              ldr r3, [sp, #4]
              ldr r2, .L7
              cmp r3, r2
              bhi .L1
              .L3:
              ldr r3, [sp, #4]
              add r3, r3, #1
              str r3, [sp, #4]
              ldr r3, [sp, #4]
              cmp r3, r2
              bls .L3
              .L1:
              add sp, sp, #8
              bx lr
              .L7:
              .word 49999


              From there on you can in theory calculate how many ticks each instruction takes and change the magic number 50000 accordingly. Pipelining, branch prediction, etc. mean the code might execute faster than the simple sum of the clock cycles, though. Since the compiler decided to involve the stack, data caching could also play a part.



              My whole point here is that accurately calculating how much time this code will actually take is difficult. Trial & error benchmarking with a scope is probably a more sensible idea than attempting theoretical calculations.















              $endgroup$



















                  answered Aug 5 at 9:26









                  Lundin

                  4,976 · 11 silver badges · 32 bronze badges



