What happens if I start too many background jobs?


I need to do some work on 700 network devices using an expect script. I can get it done sequentially, but so far the runtime is around 24 hours. This is mostly due to the time it takes to establish a connection and the delay in the output from these devices (old ones). I'm able to establish two connections and have them run in parallel just fine, but how far can I push that?



I don't imagine I could do all 700 of them at once; surely there's some limit to the number of telnet connections my VM can manage.



If I did try to start 700 of them in some sort of loop like this:



for node in `ls ~/sagLogs/`; do
    foo &
done


With



  • CPU 12 CPUs x Intel(R) Xeon(R) CPU E5649 @ 2.53GHz


  • Memory 47.94 GB


My questions are:



  1. Could all 700 instances possibly run concurrently?

  2. How far could I get until my server reaches its limit?

  3. When that limit is reached, will it just wait to begin the next iteration of foo or will the box crash?

I'm running in a corporate production environment unfortunately, so I can't exactly just try and see what happens.










bash background-process expect telnet jobs

asked Apr 29 at 17:30 by KuboMD



  • I've had good luck with parallel, using around 50 concurrent jobs. It's a great medium between parallelism of 1 and 700. The other nice thing is that it's batchless. A single stalled connection will only stall itself, not any of the others. The main downside is error management. None of these shell-based approaches will gracefully handle errors. You'll have to manually check for success yourself, and do your own retries. – Adam, Apr 29 at 21:38

  • Your task queue may be 700 today, but can the size expand? Watch for swap space to grow - that is an indication you have reached the memory limit. And CPU % is not a good measure (for Linux/Unix); better to consider load average (run queue length). – ChuckCottrill, Apr 29 at 21:45

  • The most recent way I broke production at my still-kinda-new job was by accidentally running a million plus short-lived background jobs at once. They involved JVMs (wait wait, put the pitchforks down), so the consequences were 'limited' to hundreds of thousands of error report files saying that threads couldn't be started. – michaelb958, Apr 29 at 23:04

  • Nitpick: Don't parse ls output. – l0b0, Apr 30 at 1:18

  • @KuboMD And as long as nobody else ever wants to use your code. – l0b0, 2 days ago


4 Answers

Could all 700 instances possibly run concurrently?




That depends on what you mean by concurrently. If we're being picky, then no, they can't unless you have 700 threads of execution on your system you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and its various children are remarkably good at managing huge levels of concurrency; that's part of why they're so popular for large-scale HPC usage.




How far could I get until my server reaches its limit?




This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:



  • The entire run-time memory requirements of one job, times 700.

  • The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).

  • Any other memory requirements on the system.

Assuming you meet that (again, with only 50 GB of RAM), you still have to deal with other issues:



  • How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.

  • How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.

  • Many other things I probably haven't thought of.


When that limit is reached, will it just wait to begin the next iteration of foo or will the box crash?




It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue; it'll just be impossible to do much else on the system. If it's the network, though, you might crash other systems or services.
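
If it is the memory limit you hit, the kernel's OOM killer leaves traces you can check afterwards. A minimal check, assuming a Linux box where you can read the kernel log (the exact wording varies by kernel version):

dmesg -T | grep -iE 'out of memory|killed process'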




What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).
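
For illustration, here is a minimal bash sketch of that batching idea. It is an assumption-laden sketch rather than a drop-in script: it assumes foo takes the node name as its argument (the question leaves that unspecified), and the batch size of 25 is arbitrary.

#!/usr/bin/env bash
# Run the jobs in fixed-size batches; wait for each batch before starting the next.
batch_size=25                       # arbitrary; tune for your host and network

nodes=( ~/sagLogs/* )               # glob instead of parsing ls output
for (( i = 0; i < ${#nodes[@]}; i += batch_size )); do
    for path in "${nodes[@]:i:batch_size}"; do
        foo "$(basename "$path")" &   # the original loop used bare file names
    done
    wait                            # block until every job in this batch has exited
done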






answered Apr 29 at 19:50 by Austin Hemmelgarn

  • There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways... – ChuckCottrill, Apr 29 at 21:41

  • Does this also include unix-like systems? And what is "GUN parallel"? – Biswapriyo, 2 days ago

  • @ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale. – Austin Hemmelgarn, 2 days ago

  • @Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux – pipe, 2 days ago

  • @forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system). – Austin Hemmelgarn, 2 days ago


It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.



May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages over the background job approach:



  • You can easily change the number of concurrent sessions.

  • And it will wait until sessions complete before it starts new ones.

  • It is easier to abort.

Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source
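
For example, a minimal invocation might look like this (an assumption: foo takes the node name as its single argument, as in the question's loop). Changing -j is all it takes to dial the concurrency up or down:

# Run at most 50 sessions at a time; {/} is the basename of each input path.
parallel -j 50 foo {/} ::: ~/sagLogs/*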






answered Apr 29 at 17:53 by laenkeio (new contributor)

  • Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor? – KuboMD, Apr 29 at 18:15

  • @KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :) – hobbs, 2 days ago

  • As an aside, web servers often use threading or event-based processing (example: gunicorn.org). – ChuckCottrill, 2 days ago


Using & for parallel processing is fine when running a few jobs, and when you monitor progress. But if you are running in a corporate production environment you need something that gives you better control.



ls ~/sagLogs/ | parallel --delay 0.5 --memfree 1G -j0 --joblog my.log --retries 10 foo 


This will run foo for each file in ~/sagLogs. It starts a job every 0.5 seconds and will run as many jobs in parallel as possible as long as 1 GB of RAM is free, but it will respect the limits on your system (e.g. the number of open files and processes). Typically this means you will be running 250 jobs in parallel if you have not adjusted the number of open files allowed. If you adjust the number of open files, you should have no problem running 32000 in parallel - as long as you have enough memory.



If a job fails (i.e. returns with an error code) it will be retried 10 times.



my.log will tell you whether a job succeeded (possibly after retries) or not.
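
For example, to list the jobs that still failed after all retries (assuming the default tab-separated joblog layout, where column 7 is the exit value and column 9 the command):

awk -F'\t' 'NR > 1 && $7 != 0 {print $9}' my.log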






  • This looks very promising, thank you. – KuboMD, 2 days ago

  • Ran a simple test doing cat ~/sagLogs/* >> ~/woah | parallel and holy moly that was fast. 1,054,552 lines in the blink of an eye. – KuboMD, 2 days ago

  • The command you gave has dual redirection, so I do not think it does what you intend it to do. GNU Parallel has an overhead of 10 ms per job, so 1M jobs should take on the order of 3 hours. – Ole Tange, 2 days ago

  • It is not applicable at all if all you want to do is simply to concatenate the files. – Ole Tange, 2 days ago

  • @KuboMD a trivial CPU busy loop like awk 'BEGIN{for(i=rand()*10000000; i<100000000;i++){}}' would work for playing around with. Or try it on a task like sleep 10 to see it keep n jobs in flight without using much CPU time. e.g. time parallel sleep ::: {100..1} to run sleeps from 100 down to 1 second. – Peter Cordes, 2 days ago


What happens if I start too many background jobs?




The system will become slow and unresponsive; in the worst case it is so unresponsive that it would be best to just push the power button and do a hard reboot... that would be the case when running something as root, where it has the privilege to get away with doing that. If your bash script is running under regular user privileges, then the first things that come to mind are /etc/security/limits.conf and /etc/systemd/system.conf and all the variables therein, which [ideally speaking] prevent user(s) from overloading the system.




  • CPU = Xeon E5649, which is a 12-core CPU; so you have 12 cores for 12 processes to run concurrently, each utilizing one of the twelve cores at 100%. If you kick off 24 processes, then each would run at 50% utilization across the twelve cores; 700 processes would get about 1.7% each. But it's a computer: as long as everything completes properly in an acceptable amount of time, that = success; being efficient is not always relevant.



    1. Could all 700 instances possibly run concurrently? Certainly; 700 is not a large number. My /etc/security/limits.conf maxproc default is 4,135,275, for example.


    2. How far could I get until my server reaches its limit? Much farther than 700, I'm sure.


    3. Limits... what will happen if the script is kicked off under a user account [and generally for root as well; limits.conf pretty much applies to everyone] is that the script will just exit after having tried to do foo & 700 times; you would expect to then see 700 foo processes, each with a different PID, but you might only see 456 (random number choice) while the other 244 never started, because they got blocked by some security or systemd limit.



Million $ question: how many should you run concurrently?



Since this involves the network and you said each one will make a telnet connection, my educated guess is that you will run into network limits and overhead before you hit CPU and RAM limits. But I don't know what you are doing specifically. What will likely happen is that you can kick off all 700 at once, but things will automatically block until previous processes and network connections finish and close, based on various system limits; or something like the first 500 will kick off and then the remaining 200 won't, because system or kernel limits prevent it. But however many run at once, there will be some sweet spot for getting things done as fast as possible... minimizing overhead and increasing efficiency. Being 12 cores (or 24 if you have 2 CPUs), start with 12 (or 24) at once and then increase that concurrent batch number by 12 or 24 until you don't see a run-time improvement.



Hint: google max telnet connections and see how this applies to your system(s). Also don't forget about firewalls. Also do a quick calculation of the memory needed per process, times 700; make sure it is less than the available RAM (about 50 GB in your case), otherwise the system will start using swap and basically become unresponsive. So kick off 12, 24, N processes at a time, monitor free RAM, and then increase N, already having some knowledge of what's happening.




By default, RHEL limits the number of telnet connections from a single host to 10 simultaneous sessions. This is a security feature... it is set to 10 in /etc/xinetd.conf; change the “per_source” value.
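
For reference, a few quick ways to check the limits mentioned above (a sketch only; file locations and defaults vary by distribution):

ulimit -u                                              # max user processes for the current shell
ulimit -n                                              # max open files for the current shell
grep -E 'nproc|nofile' /etc/security/limits.conf       # admin-configured per-user caps
grep -Rs 'per_source' /etc/xinetd.conf /etc/xinetd.d/  # xinetd's per-source connection cap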







share|improve this answer

























    Your Answer








    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "106"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: false,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: null,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f516203%2fwhat-happens-if-i-start-too-many-background-jobs%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    4 Answers
    4






    active

    oldest

    votes








    4 Answers
    4






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    16















    Could all 700 instances possibly run concurrently?




    That depends on what you mean by concurrently. If we're being picky, then no, they can't unless you have 700 threads of execution on your system you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and it's various children are remarkably good at managing huge levels of concurrency, that's part of why they're so popular for large-scale HPC usage.




    How far could I get until my server reaches its limit?




    This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:



    • The entire run-time memory requirements of one job, times 700.

    • The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).

    • Any other memory requirements on the system.

    Assuming you meet that (again, with only 50GB of RAM, you still ahve to deal with other issues:



    • How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.

    • How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.

    • Many other things I probably haven't thought of.


    When that limit is reached, will it just wait to begin the next iteration off foo or will the box crash?




    It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue, it'll just be impossible to do much else on the system. If it's the network though, you might crash other systems or services.




    What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).






    share|improve this answer

























    • There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...

      – ChuckCottrill
      Apr 29 at 21:41











    • Does this also include unix-like systems? And what is "GUN parallel"?

      – Biswapriyo
      2 days ago






    • 2





      @ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.

      – Austin Hemmelgarn
      2 days ago






    • 2





      @Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux

      – pipe
      2 days ago






    • 3





      @forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).

      – Austin Hemmelgarn
      2 days ago















    16















    Could all 700 instances possibly run concurrently?




    That depends on what you mean by concurrently. If we're being picky, then no, they can't unless you have 700 threads of execution on your system you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and it's various children are remarkably good at managing huge levels of concurrency, that's part of why they're so popular for large-scale HPC usage.




    How far could I get until my server reaches its limit?




    This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:



    • The entire run-time memory requirements of one job, times 700.

    • The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).

    • Any other memory requirements on the system.

    Assuming you meet that (again, with only 50GB of RAM, you still ahve to deal with other issues:



    • How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.

    • How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.

    • Many other things I probably haven't thought of.


    When that limit is reached, will it just wait to begin the next iteration off foo or will the box crash?




    It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue, it'll just be impossible to do much else on the system. If it's the network though, you might crash other systems or services.




    What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).






    share|improve this answer

























    • There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...

      – ChuckCottrill
      Apr 29 at 21:41











    • Does this also include unix-like systems? And what is "GUN parallel"?

      – Biswapriyo
      2 days ago






    • 2





      @ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.

      – Austin Hemmelgarn
      2 days ago






    • 2





      @Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux

      – pipe
      2 days ago






    • 3





      @forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).

      – Austin Hemmelgarn
      2 days ago













    16












    16








    16








    Could all 700 instances possibly run concurrently?




    That depends on what you mean by concurrently. If we're being picky, then no, they can't unless you have 700 threads of execution on your system you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and it's various children are remarkably good at managing huge levels of concurrency, that's part of why they're so popular for large-scale HPC usage.




    How far could I get until my server reaches its limit?




    This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:



    • The entire run-time memory requirements of one job, times 700.

    • The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).

    • Any other memory requirements on the system.

    Assuming you meet that (again, with only 50GB of RAM, you still ahve to deal with other issues:



    • How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.

    • How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.

    • Many other things I probably haven't thought of.


    When that limit is reached, will it just wait to begin the next iteration off foo or will the box crash?




    It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue, it'll just be impossible to do much else on the system. If it's the network though, you might crash other systems or services.




    What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).






    share|improve this answer
















    Could all 700 instances possibly run concurrently?




    That depends on what you mean by concurrently. If we're being picky, then no, they can't unless you have 700 threads of execution on your system you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and it's various children are remarkably good at managing huge levels of concurrency, that's part of why they're so popular for large-scale HPC usage.




    How far could I get until my server reaches its limit?




    This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:



    • The entire run-time memory requirements of one job, times 700.

    • The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).

    • Any other memory requirements on the system.

    Assuming you meet that (again, with only 50GB of RAM, you still ahve to deal with other issues:



    • How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.

    • How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.

    • Many other things I probably haven't thought of.


    When that limit is reached, will it just wait to begin the next iteration off foo or will the box crash?




    It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue, it'll just be impossible to do much else on the system. If it's the network though, you might crash other systems or services.




    What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).







    share|improve this answer














    share|improve this answer



    share|improve this answer








    edited 2 days ago

























    answered Apr 29 at 19:50









    Austin HemmelgarnAustin Hemmelgarn

    6,47111120




    6,47111120












    • There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...

      – ChuckCottrill
      Apr 29 at 21:41











    • Does this also include unix-like systems? And what is "GUN parallel"?

      – Biswapriyo
      2 days ago






    • 2





      @ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.

      – Austin Hemmelgarn
      2 days ago






    • 2





      @Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux

      – pipe
      2 days ago






    • 3





      @forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).

      – Austin Hemmelgarn
      2 days ago

















    • There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...

      – ChuckCottrill
      Apr 29 at 21:41











    • Does this also include unix-like systems? And what is "GUN parallel"?

      – Biswapriyo
      2 days ago






    • 2





      @ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.

      – Austin Hemmelgarn
      2 days ago






    • 2





      @Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux

      – pipe
      2 days ago






    • 3





      @forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).

      – Austin Hemmelgarn
      2 days ago
















    There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...

    – ChuckCottrill
    Apr 29 at 21:41





    There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...

    – ChuckCottrill
    Apr 29 at 21:41













    Does this also include unix-like systems? And what is "GUN parallel"?

    – Biswapriyo
    2 days ago





    Does this also include unix-like systems? And what is "GUN parallel"?

    – Biswapriyo
    2 days ago




    2




    2





    @ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.

    – Austin Hemmelgarn
    2 days ago





    @ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.

    – Austin Hemmelgarn
    2 days ago




    2




    2





    @Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux

    – pipe
    2 days ago





    @Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux

    – pipe
    2 days ago




    3




    3





    @forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).

    – Austin Hemmelgarn
    2 days ago





    @forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).

    – Austin Hemmelgarn
    2 days ago













    12














    It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.



    May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages to the background job approach:



    • You can easily change the number of concurrent sessions.

    • And it will wait until sessions complete before it starts new ones.

    • It it easier to abort.

    Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source






    share|improve this answer








    New contributor




    laenkeio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.















    • 1





      Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?

      – KuboMD
      Apr 29 at 18:15






    • 2





      @KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)

      – hobbs
      2 days ago











    • as an aside, web servers often use threading or event-based processing (example: gunicorn.org)

      – ChuckCottrill
      2 days ago















    12














    It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.



    May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages to the background job approach:



    • You can easily change the number of concurrent sessions.

    • And it will wait until sessions complete before it starts new ones.

    • It it easier to abort.

    Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source






    share|improve this answer








    New contributor




    laenkeio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.















    • 1





      Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?

      – KuboMD
      Apr 29 at 18:15






    • 2





      @KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)

      – hobbs
      2 days ago











    • as an aside, web servers often use threading or event-based processing (example: gunicorn.org)

      – ChuckCottrill
      2 days ago













    12












    12








    12







    It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.



    May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages to the background job approach:



    • You can easily change the number of concurrent sessions.

    • And it will wait until sessions complete before it starts new ones.

    • It it easier to abort.

    Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source






    share|improve this answer








    New contributor




    laenkeio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.










    It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.



    May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages to the background job approach:



    • You can easily change the number of concurrent sessions.

    • And it will wait until sessions complete before it starts new ones.

    • It it easier to abort.

    Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source







    share|improve this answer








    New contributor




    laenkeio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.









    share|improve this answer



    share|improve this answer






    New contributor




    laenkeio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.









    answered Apr 29 at 17:53









    laenkeiolaenkeio

    1866




    1866




    New contributor




    laenkeio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.





    New contributor





    laenkeio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.






    laenkeio is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
    Check out our Code of Conduct.







    • 1





      Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?

      – KuboMD
      Apr 29 at 18:15






    • 2





      @KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)

      – hobbs
      2 days ago











    • as an aside, web servers often use threading or event-based processing (example: gunicorn.org)

      – ChuckCottrill
      2 days ago












    9














    Using & for parallel processing is fine when doing a few jobs, and when you monitor progress. But if you are running in a corporate production environment you need something that gives you better control.

    ls ~/sagLogs/ | parallel --delay 0.5 --memfree 1G -j0 --joblog my.log --retries 10 foo

    This will run foo for each file in ~/sagLogs. It starts a job every 0.5 seconds and will run as many jobs in parallel as possible as long as 1 GB of RAM is free, but it will respect the limits on your system (e.g. the number of open files and processes). Typically this means you will be running 250 jobs in parallel if you have not adjusted the number of open files allowed. If you adjust the number of open files, you should have no problem running 32000 in parallel - as long as you have enough memory.

    If a job fails (i.e. returns with an error code) it will be retried 10 times.

    my.log will tell you whether a job succeeded (possibly after retries) or not.
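
    As a rough sketch of how that joblog can be used afterwards - the column layout and the --retry-failed option are as documented in the GNU parallel manual, but double-check against your installed version:

    # List the commands whose final exit value was non-zero.
    # In the tab-separated --joblog format, field 7 is Exitval and the
    # command is the last field (assuming the command contains no tabs).
    awk -F'\t' 'NR > 1 && $7 != 0 {print $NF}' my.log

    # Re-run only the jobs recorded as failed in my.log.
    parallel --retry-failed --joblog my.log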






    answered Apr 29 at 22:52 by Ole Tange
    • This looks very promising, thank you.
      – KuboMD
      2 days ago

    • Ran a simple test doing cat ~/sagLogs/* >> ~/woah | parallel and holy moly that was fast. 1,054,552 lines in the blink of an eye.
      – KuboMD
      2 days ago

    • 3  The command you gave has dual redirection, so I do not think it does what you intend it to do. GNU Parallel has an overhead of 10 ms per job, so 1M jobs should take on the order of 3 hours.
      – Ole Tange
      2 days ago

    • 1  It is not applicable at all if all you want to do is simply to concatenate the files.
      – Ole Tange
      2 days ago

    • 1  @KuboMD a trivial CPU busy loop like awk 'BEGIN{for(i=rand()*10000000; i<100000000;i++){}}' would work for playing around with. Or try it on a task like sleep 10 to see it keep n jobs in flight without using much CPU time, e.g. time parallel sleep ::: {100..1} to run sleeps from 100 down to 1 second.
      – Peter Cordes
      2 days ago
















    1















    What happens if I start too many background jobs?

    The system will become slow and unresponsive; in the worst case it becomes so unresponsive that it is best to just push the power button and do a hard reboot. That is what happens when you run something as root with the privilege to get away with it. If your bash script runs under regular user privileges, the first things that come to mind are /etc/security/limits.conf and /etc/systemd/system.conf, and all the variables therein, which [ideally speaking] prevent users from overloading the system.

    • CPU = Xeon E5649 with 12 CPUs, so you have 12 cores for 12 processes to run concurrently, each utilizing one of the twelve cores at 100%. If you kick off 24 processes, each would run at about 50% utilization across the twelve cores; with 700 processes it is roughly 1.7% each. But it's a computer: as long as everything completes properly in an acceptable amount of time, that counts as success; being efficient is not always relevant.

      1. Could all 700 instances possibly run concurrently? Certainly; 700 is not a large number. My /etc/security/limits.conf maxproc default is 4,135,275, for example.

      2. How far could I get until my server reaches its limit? Much farther than 700, I'm sure.

      3. Limits... What will happen if the script is kicked off under a user account [and generally as root as well; limits.conf pretty much applies to everyone] is that the script will simply exit after having tried to do foo & 700 times; you would then expect to see 700 foo processes, each with a different pid, but you might only see 456 (random number choice) because the other 244 never started - they were blocked by some security or systemd limit.

    Million-dollar question: how many should you run concurrently?

    Being involved with networking, and since you said each job will make a telnet connection, my educated guess is that you will run into network limits and overhead before you hit CPU and RAM limits. But I don't know what you are doing specifically. What will likely happen is that you can kick off all 700 at once, but things will automatically block until previous processes and network connections finish and close, based on various system limits; or perhaps the first 500 will kick off and the remaining 200 won't because system or kernel limits prevent it. However many run at once, there will be some sweet spot for getting things done as fast as possible while minimizing overhead and increasing efficiency. With 12 cores (or 24 if you have 2 CPUs), start with 12 (or 24) at once and then increase that concurrent batch size by 12 or 24 until you see no further run-time improvement (see the sketch below).

    Hint: search for the maximum number of telnet connections and see how it applies to your system(s). Also don't forget about firewalls, and do a quick calculation of the memory needed per process times 700; make sure it is less than the available RAM (about 50 GB in your case), otherwise the system will start using swap and basically become unresponsive. So kick off 12, 24, ..., N processes at a time, monitor free RAM, then increase N once you have some knowledge of what is happening.

    By default, RHEL limits the number of telnet connections from a single host to 10 simultaneous sessions. This is a security feature; the limit is set to 10 in /etc/xinetd.conf - change the "per_source" value.
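
    A minimal sketch of the "N at a time" batching idea described above, using only bash built-ins. Here foo stands for the per-device expect wrapper from the question (whether it takes the node name as an argument depends on your script), and N is the knob to raise while watching free RAM and connection counts:

    # First check the per-user limits that would cap a burst of background jobs:
    #   ulimit -u   -> max user processes
    #   ulimit -n   -> max open file descriptors (each telnet session needs a socket)

    N=24                                  # start near the core count, then raise it
    for node in ~/sagLogs/*; do
        foo "$node" &
        # Throttle: whenever N jobs are already running, wait for one to finish.
        while [ "$(jobs -rp | wc -l)" -ge "$N" ]; do
            wait -n                       # needs bash >= 4.3; waits for any one job
        done
    done
    wait                                  # wait for the last batch to drain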







    answered 2 days ago (edited 2 days ago) by ron

