What happens if I start too many background jobs?
I need to do some work on 700 network devices using an expect script. I can get it done sequentially, but so far the runtime is around 24 hours. This is mostly due to the time it takes to establish a connection and the delay in the output from these devices (old ones). I'm able to establish two connections and have them run in parallel just fine, but how far can I push that?
I don't imagine I could do all 700 of them at once; surely there's some limit to the number of telnet connections my VM can manage.
If I did try to start 700 of them in some sort of loop like this:
for node in `ls ~/sagLogs/`; do
foo &
done
With:
CPU: 12 CPUs x Intel(R) Xeon(R) CPU E5649 @ 2.53GHz
Memory: 47.94 GB
My questions are:
- Could all 700 instances possibly run concurrently?
- How far could I get until my server reaches its limit?
- When that limit is reached, will it just wait to begin the next iteration of foo, or will the box crash?
I'm running in a corporate production environment unfortunately, so I can't exactly just try and see what happens.
bash background-process expect telnet jobs
I've had good luck with parallel, using around 50 concurrent jobs. It's a great medium between parallelism of 1 and 700. The other nice thing is that it's batchless: a single stalled connection will only stall itself, not any of the others. The main downside is error management. None of these shell-based approaches will gracefully handle errors; you'll have to check for success yourself and do your own retries. – Adam, Apr 29 at 21:38
Your task queue may be 700 today, but can the size expand? Watch for swap space growing - that is an indication you have reached the memory limit. And CPU % is not a good measure (for Linux/Unix); it is better to look at the load average (run queue length). – ChuckCottrill, Apr 29 at 21:45
The most recent way I broke production at my still-kinda-new job was by accidentally running a million-plus short-lived background jobs at once. They involved JVMs (wait, wait, put the pitchforks down), so the consequences were 'limited' to hundreds of thousands of error report files saying that threads couldn't be started. – michaelb958, Apr 29 at 23:04
Nitpick: don't parse ls output. – l0b0, Apr 30 at 1:18
@KuboMD And as long as nobody else ever wants to use your code. – l0b0, 2 days ago
4 Answers
Could all 700 instances possibly run concurrently?
That depends on what you mean by concurrently. If we're being picky, then no, they can't, unless you have 700 threads of execution on your system that you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and its various children are remarkably good at managing huge levels of concurrency; that's part of why they're so popular for large-scale HPC usage.
How far could I get until my server reaches its limit?
This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:
- The entire run-time memory requirements of one job, times 700.
- The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).
- Any other memory requirements on the system.
Assuming you meet that (and again, that's with only 50 GB of RAM), you still have to deal with other issues:
- How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.
- How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.
- Many other things I probably haven't thought of.
When that limit is reached, will it just wait to begin the next iteration of foo, or will the box crash?
It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue, it'll just be impossible to do much else on the system. If it's the network though, you might crash other systems or services.
What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).
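A minimal sketch of that batching idea in plain bash, assuming foo is the per-node expect wrapper from the question; the batch size of 25 is an arbitrary illustration, not a recommendation:
batch_size=25                 # tune to what the devices and the network tolerate
nodes=( ~/sagLogs/* )         # glob instead of parsing ls output
for (( i = 0; i < ${#nodes[@]}; i += batch_size )); do
    for node in "${nodes[@]:i:batch_size}"; do
        foo "$node" &         # hypothetical per-node job
    done
    wait                      # let the whole batch finish before starting the next
done
GNU Parallel or Ansible, as mentioned above, give you the same throttling with far less hand-rolled code.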
There are ways to run a limited number of background tasks (using bash, perl, python, et al.), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways... – ChuckCottrill, Apr 29 at 21:41
Does this also include unix-like systems? And what is "GUN parallel"? – Biswapriyo, 2 days ago
@ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing, though, it's almost always better to just get a real orchestration tool than to try to roll your own solution, especially once you're past a few dozen systems in terms of scale. – Austin Hemmelgarn, 2 days ago
@Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux – pipe, 2 days ago
@forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know the resource requirements of the tasks beforehand), and it doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system). – Austin Hemmelgarn, 2 days ago
It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.
May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages over the background-job approach:
- You can easily change the number of concurrent sessions.
- It will wait until sessions complete before it starts new ones.
- It is easier to abort.
Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source
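For example, a minimal sketch of capping the session count (the script name run_device and the file devices.txt are hypothetical, and 50 concurrent sessions is only an illustration):
# Keep at most 50 sessions running; a new one starts as each finishes.
parallel -j 50 ./run_device {} < devices.txt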
Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor? – KuboMD, Apr 29 at 18:15
@KuboMD If you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :) – hobbs, 2 days ago
As an aside, web servers often use threading or event-based processing (example: gunicorn.org). – ChuckCottrill, 2 days ago
Using & for parallel processing is fine when you are only running a few jobs and when you monitor progress. But if you are running in a corporate production environment, you need something that gives you better control.
ls ~/sagLogs/ | parallel --delay 0.5 --memfree 1G -j0 --joblog my.log --retries 10 foo
This will run foo for each file in ~/sagLogs/. It starts a job every 0.5 seconds and runs as many jobs in parallel as possible as long as 1 GB of RAM is free, but it will respect the limits on your system (e.g. the number of open files and processes). Typically this means you will be running 250 jobs in parallel if you have not adjusted the number of open files allowed. If you raise the open-file limit, you should have no problem running 32,000 in parallel - as long as you have enough memory.
If a job fails (i.e. returns a non-zero exit code), it will be retried 10 times.
my.log will tell you whether a job succeeded (possibly after retries) or not.
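If you want to pull out just the failures afterwards, a sketch (assuming the usual tab-separated joblog layout, where Exitval is the seventh column and the command is the last):
# Print the commands whose final exit status (after retries) was non-zero.
awk -F '\t' 'NR > 1 && $7 != 0 {print $NF}' my.log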
This looks very promising, thank you. – KuboMD, 2 days ago
Ran a simple test doing cat ~/sagLogs/* >> ~/woah | parallel and holy moly that was fast. 1,054,552 lines in the blink of an eye. – KuboMD, 2 days ago
The command you gave has dual redirection, so I do not think it does what you intend it to do. GNU Parallel has an overhead of 10 ms per job, so 1M jobs should take on the order of 3 hours. – Ole Tange, 2 days ago
It is not applicable at all if all you want to do is simply to concatenate the files. – Ole Tange, 2 days ago
@KuboMD A trivial CPU busy loop like awk 'BEGIN{for(i=rand()*10000000; i<100000000; i++){}}' would work for playing around with. Or try it on a task like sleep 10 to see it keep n jobs in flight without using much CPU time, e.g. time parallel sleep ::: {100..1} to run sleeps from 100 down to 1 second. – Peter Cordes, 2 days ago
What happens if I start too many background jobs?
The system will become slow and unresponsive; in the worst case it gets so unresponsive that it would be best to just push the power button and do a hard reboot... and that assumes you were running something as root that had the privilege to get away with it. If your bash script is running under regular user privileges, then the first things that come to mind are /etc/security/limits.conf and /etc/systemd/system.conf, and all the variables therein, which [ideally speaking] prevent user(s) from overloading the system.
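A hypothetical limits.conf entry of that kind, capping how many processes one account may spawn (the user name saguser and the value 800 are purely illustrative):
# /etc/security/limits.conf
# <domain>   <type>   <item>   <value>
saguser      hard     nproc    800     # at most 800 processes for this account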
CPU = Xeon E5649; that is a 12-core CPU, so you have 12 cores for 12 processes to run concurrently, each utilizing one of the twelve cores at 100%. If you kick off 24 processes, each would run at about 50% utilization of a core; with 700 processes each gets roughly 1.7%. But it's a computer: as long as everything completes properly in an acceptable amount of time, that = success; being efficient is not always relevant.
Could all 700 instances possibly run concurrently? Certainly, 700 is not a large number; my /etc/security/limits.conf maxproc default is 4,135,275, for example.
How far could I get until my server reaches its limit? Much farther than 700, I'm sure.
Limits... what will happen if the script is kicked off under a user account [and generally as root as well, since limits.conf pretty much applies to everyone] is that the script will just exit after having tried to do foo & 700 times; you would then expect to see 700 foo processes, each with a different PID, but you might only see 456 (random number choice), with the other 244 never started because they got blocked by some security or systemd limit.
Million-dollar question: how many should you run concurrently?
Since this involves the network and you said each job makes a telnet connection, an educated guess is that you will run into network limits and overhead before you hit CPU and RAM limits. But I don't know what you are doing specifically. What will likely happen is that you can kick off all 700 at once, but things will automatically block until previous processes and network connections finish and close, based on various system limits; or perhaps the first 500 will kick off and the remaining 200 won't, because system or kernel limits prevent it. However many run at once, there will be some sweet spot for getting things done as fast as possible while minimizing overhead and increasing efficiency. With 12 cores (or 24 if you have 2 CPUs), start with 12 (or 24) at once, then increase that concurrent batch number by 12 or 24 until you no longer see a run-time improvement.
Hint: google "max telnet connections" and see how this applies to your system(s). Also don't forget about firewalls. And do a quick calculation of the memory needed per process times 700; make sure it is less than the available RAM (about 50 GB in your case), otherwise the system will start using swap and basically become unresponsive. So kick off 12, 24, ..., N processes at a time, monitor free RAM, and then increase N once you have some knowledge of what's happening.
By default, RHEL limits the number of telnet connections from a single host to 10 simultaneous sessions. This is a security feature; it is set to 10 in /etc/xinetd.conf, via the "per_source" value, which you can change.
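A sketch of where that setting lives (the value 64 is only an illustration; on many systems the stanza sits in a service file under /etc/xinetd.d/ instead):
# /etc/xinetd.conf, defaults section
defaults
{
        # (other defaults unchanged)
        per_source = 64    # simultaneous connections allowed from one source IP
}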
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f516203%2fwhat-happens-if-i-start-too-many-background-jobs%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
Could all 700 instances possibly run concurrently?
That depends on what you mean by concurrently. If we're being picky, then no, they can't unless you have 700 threads of execution on your system you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and it's various children are remarkably good at managing huge levels of concurrency, that's part of why they're so popular for large-scale HPC usage.
How far could I get until my server reaches its limit?
This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:
- The entire run-time memory requirements of one job, times 700.
- The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).
- Any other memory requirements on the system.
Assuming you meet that (again, with only 50GB of RAM, you still ahve to deal with other issues:
- How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.
- How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.
- Many other things I probably haven't thought of.
When that limit is reached, will it just wait to begin the next iteration off foo or will the box crash?
It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue, it'll just be impossible to do much else on the system. If it's the network though, you might crash other systems or services.
What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).
There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...
– ChuckCottrill
Apr 29 at 21:41
Does this also include unix-like systems? And what is "GUN parallel"?
– Biswapriyo
2 days ago
2
@ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.
– Austin Hemmelgarn
2 days ago
2
@Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux
– pipe
2 days ago
3
@forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).
– Austin Hemmelgarn
2 days ago
|
show 5 more comments
Could all 700 instances possibly run concurrently?
That depends on what you mean by concurrently. If we're being picky, then no, they can't unless you have 700 threads of execution on your system you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and it's various children are remarkably good at managing huge levels of concurrency, that's part of why they're so popular for large-scale HPC usage.
How far could I get until my server reaches its limit?
This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:
- The entire run-time memory requirements of one job, times 700.
- The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).
- Any other memory requirements on the system.
Assuming you meet that (again, with only 50GB of RAM, you still ahve to deal with other issues:
- How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.
- How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.
- Many other things I probably haven't thought of.
When that limit is reached, will it just wait to begin the next iteration off foo or will the box crash?
It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue, it'll just be impossible to do much else on the system. If it's the network though, you might crash other systems or services.
What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).
There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...
– ChuckCottrill
Apr 29 at 21:41
Does this also include unix-like systems? And what is "GUN parallel"?
– Biswapriyo
2 days ago
2
@ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.
– Austin Hemmelgarn
2 days ago
2
@Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux
– pipe
2 days ago
3
@forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).
– Austin Hemmelgarn
2 days ago
|
show 5 more comments
Could all 700 instances possibly run concurrently?
That depends on what you mean by concurrently. If we're being picky, then no, they can't unless you have 700 threads of execution on your system you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and it's various children are remarkably good at managing huge levels of concurrency, that's part of why they're so popular for large-scale HPC usage.
How far could I get until my server reaches its limit?
This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:
- The entire run-time memory requirements of one job, times 700.
- The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).
- Any other memory requirements on the system.
Assuming you meet that (again, with only 50GB of RAM, you still ahve to deal with other issues:
- How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.
- How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.
- Many other things I probably haven't thought of.
When that limit is reached, will it just wait to begin the next iteration off foo or will the box crash?
It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue, it'll just be impossible to do much else on the system. If it's the network though, you might crash other systems or services.
What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).
Could all 700 instances possibly run concurrently?
That depends on what you mean by concurrently. If we're being picky, then no, they can't unless you have 700 threads of execution on your system you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and it's various children are remarkably good at managing huge levels of concurrency, that's part of why they're so popular for large-scale HPC usage.
How far could I get until my server reaches its limit?
This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:
- The entire run-time memory requirements of one job, times 700.
- The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).
- Any other memory requirements on the system.
Assuming you meet that (again, with only 50GB of RAM, you still ahve to deal with other issues:
- How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.
- How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.
- Many other things I probably haven't thought of.
When that limit is reached, will it just wait to begin the next iteration off foo or will the box crash?
It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue, it'll just be impossible to do much else on the system. If it's the network though, you might crash other systems or services.
What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).
edited 2 days ago
answered Apr 29 at 19:50
Austin HemmelgarnAustin Hemmelgarn
6,47111120
6,47111120
There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...
– ChuckCottrill
Apr 29 at 21:41
Does this also include unix-like systems? And what is "GUN parallel"?
– Biswapriyo
2 days ago
2
@ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.
– Austin Hemmelgarn
2 days ago
2
@Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux
– pipe
2 days ago
3
@forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).
– Austin Hemmelgarn
2 days ago
|
show 5 more comments
There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...
– ChuckCottrill
Apr 29 at 21:41
Does this also include unix-like systems? And what is "GUN parallel"?
– Biswapriyo
2 days ago
2
@ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.
– Austin Hemmelgarn
2 days ago
2
@Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux
– pipe
2 days ago
3
@forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).
– Austin Hemmelgarn
2 days ago
There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...
– ChuckCottrill
Apr 29 at 21:41
There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...
– ChuckCottrill
Apr 29 at 21:41
Does this also include unix-like systems? And what is "GUN parallel"?
– Biswapriyo
2 days ago
Does this also include unix-like systems? And what is "GUN parallel"?
– Biswapriyo
2 days ago
2
2
@ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.
– Austin Hemmelgarn
2 days ago
@ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.
– Austin Hemmelgarn
2 days ago
2
2
@Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux
– pipe
2 days ago
@Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux
– pipe
2 days ago
3
3
@forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).
– Austin Hemmelgarn
2 days ago
@forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).
– Austin Hemmelgarn
2 days ago
|
show 5 more comments
It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.
May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages to the background job approach:
- You can easily change the number of concurrent sessions.
- And it will wait until sessions complete before it starts new ones.
- It it easier to abort.
Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source
New contributor
1
Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?
– KuboMD
Apr 29 at 18:15
2
@KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)
– hobbs
2 days ago
as an aside, web servers often use threading or event-based processing (example: gunicorn.org)
– ChuckCottrill
2 days ago
add a comment |
It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.
May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages to the background job approach:
- You can easily change the number of concurrent sessions.
- And it will wait until sessions complete before it starts new ones.
- It it easier to abort.
Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source
New contributor
1
Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?
– KuboMD
Apr 29 at 18:15
2
@KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)
– hobbs
2 days ago
as an aside, web servers often use threading or event-based processing (example: gunicorn.org)
– ChuckCottrill
2 days ago
add a comment |
It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.
May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages to the background job approach:
- You can easily change the number of concurrent sessions.
- And it will wait until sessions complete before it starts new ones.
- It it easier to abort.
Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source
New contributor
It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.
May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages to the background job approach:
- You can easily change the number of concurrent sessions.
- And it will wait until sessions complete before it starts new ones.
- It it easier to abort.
Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source
New contributor
New contributor
answered Apr 29 at 17:53
laenkeiolaenkeio
1866
1866
New contributor
New contributor
1
Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?
– KuboMD
Apr 29 at 18:15
2
@KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)
– hobbs
2 days ago
as an aside, web servers often use threading or event-based processing (example: gunicorn.org)
– ChuckCottrill
2 days ago
add a comment |
1
Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?
– KuboMD
Apr 29 at 18:15
2
@KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)
– hobbs
2 days ago
as an aside, web servers often use threading or event-based processing (example: gunicorn.org)
– ChuckCottrill
2 days ago
1
1
Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?
– KuboMD
Apr 29 at 18:15
Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?
– KuboMD
Apr 29 at 18:15
2
2
@KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)
– hobbs
2 days ago
@KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)
– hobbs
2 days ago
as an aside, web servers often use threading or event-based processing (example: gunicorn.org)
– ChuckCottrill
2 days ago
as an aside, web servers often use threading or event-based processing (example: gunicorn.org)
– ChuckCottrill
2 days ago
add a comment |
Using &
for parallel processing is fine when doing a few, and when you monitor progress. But if you are running in a corporate production environment you need something that gives you better control.
ls ~/sagLogs/ | parallel --delay 0.5 --memfree 1G -j0 --joblog my.log --retries 10 foo
This will run foo
for each file in ~/sagLogs
. It start a job every 0.5 seconds, it will run as many jobs in parallel as possible as long as 1 GB RAM is free, but will respect the limits on your system (e.g. number of files and processes). Typically this means you will be running 250 jobs in parallel if you have not adjusted the number of open files allowed. If you adjust the number of open files, you should have no problem running 32000 in parallel - as long as you have enough memory.
If a job fails (i.e. returns with an error code) it will be retried 10 times.
my.log
will tell you if a job succeed (after possibly retries) or not.
This looks very promising, thank you.
– KuboMD
2 days ago
Ran a simple test doingcat ~/sagLogs/* >> ~/woah | parallel
and holy moly that was fast. 1,054,552 lines in the blink of an eye.
– KuboMD
2 days ago
3
The command you gave has dual redirection, so I donot think it does what you intend it to do. GNU Parallel has an overhead of 10 ms per job, so 1M jobs should take in the order of 3 hours.
– Ole Tange
2 days ago
1
It is not applicable at all if all you want to do is simply to concatenate the files.
– Ole Tange
2 days ago
1
@KuboMD a trivial CPU busy loop likeawk 'BEGINfor(i=rand()*10000000; i<100000000;i++)'
would work for playing around with. Or try it on a task likesleep 10
to see it keepn
jobs in flight without using much CPU time. e.g.time parallel sleep ::: 100..1
to run sleeps from 100 down to 1 second.
– Peter Cordes
2 days ago
|
show 7 more comments
Using &
for parallel processing is fine when doing a few, and when you monitor progress. But if you are running in a corporate production environment you need something that gives you better control.
ls ~/sagLogs/ | parallel --delay 0.5 --memfree 1G -j0 --joblog my.log --retries 10 foo
This will run foo
for each file in ~/sagLogs
. It start a job every 0.5 seconds, it will run as many jobs in parallel as possible as long as 1 GB RAM is free, but will respect the limits on your system (e.g. number of files and processes). Typically this means you will be running 250 jobs in parallel if you have not adjusted the number of open files allowed. If you adjust the number of open files, you should have no problem running 32000 in parallel - as long as you have enough memory.
If a job fails (i.e. returns with an error code) it will be retried 10 times.
my.log
will tell you if a job succeed (after possibly retries) or not.
This looks very promising, thank you.
– KuboMD
2 days ago
Ran a simple test doingcat ~/sagLogs/* >> ~/woah | parallel
and holy moly that was fast. 1,054,552 lines in the blink of an eye.
– KuboMD
2 days ago
3
The command you gave has dual redirection, so I donot think it does what you intend it to do. GNU Parallel has an overhead of 10 ms per job, so 1M jobs should take in the order of 3 hours.
– Ole Tange
2 days ago
1
It is not applicable at all if all you want to do is simply to concatenate the files.
– Ole Tange
2 days ago
1
@KuboMD a trivial CPU busy loop likeawk 'BEGINfor(i=rand()*10000000; i<100000000;i++)'
would work for playing around with. Or try it on a task likesleep 10
to see it keepn
jobs in flight without using much CPU time. e.g.time parallel sleep ::: 100..1
to run sleeps from 100 down to 1 second.
– Peter Cordes
2 days ago
|
show 7 more comments
Using &
for parallel processing is fine when doing a few, and when you monitor progress. But if you are running in a corporate production environment you need something that gives you better control.
ls ~/sagLogs/ | parallel --delay 0.5 --memfree 1G -j0 --joblog my.log --retries 10 foo
This will run foo
for each file in ~/sagLogs
. It start a job every 0.5 seconds, it will run as many jobs in parallel as possible as long as 1 GB RAM is free, but will respect the limits on your system (e.g. number of files and processes). Typically this means you will be running 250 jobs in parallel if you have not adjusted the number of open files allowed. If you adjust the number of open files, you should have no problem running 32000 in parallel - as long as you have enough memory.
If a job fails (i.e. returns with an error code) it will be retried 10 times.
my.log
will tell you if a job succeed (after possibly retries) or not.
Using &
for parallel processing is fine when doing a few, and when you monitor progress. But if you are running in a corporate production environment you need something that gives you better control.
ls ~/sagLogs/ | parallel --delay 0.5 --memfree 1G -j0 --joblog my.log --retries 10 foo
This will run foo
for each file in ~/sagLogs
. It start a job every 0.5 seconds, it will run as many jobs in parallel as possible as long as 1 GB RAM is free, but will respect the limits on your system (e.g. number of files and processes). Typically this means you will be running 250 jobs in parallel if you have not adjusted the number of open files allowed. If you adjust the number of open files, you should have no problem running 32000 in parallel - as long as you have enough memory.
If a job fails (i.e. returns with an error code) it will be retried 10 times.
my.log
will tell you if a job succeed (after possibly retries) or not.
answered Apr 29 at 22:52
Ole TangeOle Tange
13.2k1658107
13.2k1658107
This looks very promising, thank you.
– KuboMD
2 days ago
Ran a simple test doingcat ~/sagLogs/* >> ~/woah | parallel
and holy moly that was fast. 1,054,552 lines in the blink of an eye.
– KuboMD
2 days ago
3
The command you gave has dual redirection, so I donot think it does what you intend it to do. GNU Parallel has an overhead of 10 ms per job, so 1M jobs should take in the order of 3 hours.
– Ole Tange
2 days ago
1
It is not applicable at all if all you want to do is simply to concatenate the files.
– Ole Tange
2 days ago
1
@KuboMD a trivial CPU busy loop likeawk 'BEGINfor(i=rand()*10000000; i<100000000;i++)'
would work for playing around with. Or try it on a task likesleep 10
to see it keepn
jobs in flight without using much CPU time. e.g.time parallel sleep ::: 100..1
to run sleeps from 100 down to 1 second.
– Peter Cordes
2 days ago
|
show 7 more comments
What happens if I start too many background jobs?
The system will become slow and unresponsive; in the worst case it is so unresponsive that it would be best to just push the power button and do a hard reboot. That is what happens when something runs as root with the privilege to get away with it. If your bash script is running under regular user privileges, then the first things that come to mind are /etc/security/limits.conf and /etc/systemd/system.conf, and all the variables therein that [ideally speaking] prevent user(s) from overloading the system.
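A minimal sketch (not from the answer) of checking the limits that usually bite first on a typical RHEL-style system:
ulimit -u                          # max user processes for this shell
ulimit -n                          # max open file descriptors
cat /proc/sys/kernel/pid_max       # system-wide cap on process IDs
grep -h nproc /etc/security/limits.conf /etc/security/limits.d/*.conf 2>/dev/null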
cpu = Xeon E5649, which is a 12-core CPU, so you have 12 cores for 12 processes to run concurrently, each utilizing one of the twelve cores at 100%. If you kick off 24 processes, each would run at roughly 50% utilization across the twelve cores; with 700 processes each gets on the order of 1.7% of a core (12 / 700). But it's a computer: as long as everything completes properly in an acceptable amount of time, that counts as success; being efficient is not always relevant.
Could all 700 instances possibly run concurrently? Certainly, 700 is not a large number; my /etc/security/limits.conf maxproc default is 4,135,275, for example.
How far could I get until my server reaches its limit? Much farther than 700, I'm sure.
Limits... what will happen if the script is kicked off under a user account [and generally as root too; limits.conf pretty much applies to everyone] is that the script will just exit after having tried to do foo & 700 times; you would expect to then see 700 foo processes, each with a different pid, but you might only see 456 (random number choice) while the other 244 never started because they were blocked by some security or systemd limit.
Million $ question: how many should you run concurrently?
Since this is network work and you said each job will make a telnet connection, an educated guess is that you will run into network limits and overhead before you hit CPU and RAM limits. But I don't know what you are doing specifically; what will likely happen is that you can kick off all 700 at once, but things will automatically block until previous processes and network connections finish and close, based on various system limits, or perhaps the first 500 will kick off and the remaining 200 won't because system or kernel limits prevent it. However many run at once, there will be some sweet spot for getting things done as fast as possible, minimizing overhead and maximizing efficiency. With 12 cores (or 24 if you have 2 CPUs), start with 12 (or 24) at once, then increase that concurrent batch number by 12 or 24 until you no longer see a run-time improvement.
Hint: google max telnet connections and see how this applies to your system(s). Also don't forget about firewalls. Also do a quick calculation of memory needed per process x 700 and make sure it is less than the available RAM (about 50 GB in your case), otherwise the system will start using swap and basically become unresponsive. So kick off 12, 24, N processes at a time, monitor free RAM, then increase N once you know what is happening; a rough batching sketch follows.
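A minimal sketch of the "N at a time" idea in plain bash, assuming foo takes the node's log path as its argument (BATCH is a placeholder to tune):
BATCH=12                       # start with one job per core and tune upward
count=0
for node in ~/sagLogs/*; do
    foo "$node" &
    count=$((count + 1))
    if [ "$count" -ge "$BATCH" ]; then
        wait                   # let the current batch finish before starting more
        count=0
    fi
done
wait                           # wait for the final, possibly partial batch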
By default, RHEL limits the number of telnet connections from a single host to 10 simultaneous sessions. This is a security feature; the limit is set to 10 in /etc/xinetd.conf, so change the "per_source" value there.
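For illustration only, the relevant xinetd setting lives in the defaults block; the numbers here are made up:
# /etc/xinetd.conf (defaults section)
defaults
{
        per_source      = 25    # simultaneous connections allowed per source IP
        instances       = 50    # total simultaneous connections for a service
}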
edited 2 days ago
answered 2 days ago
ron
3
I've had good luck with parallel, using around 50 concurrent jobs. It's a great medium between parallelism of 1 and 700. The other nice thing is that it's batchless: a single stalled connection will only stall itself, not any of the others. The main downside is error management. None of these shell-based approaches will gracefully handle errors; you'll have to manually check for success yourself and do your own retries. – Adam
Apr 29 at 21:38
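As a minimal sketch of such manual retry handling (illustrative only; foo, the retry count, and the failure log path are placeholders):
run_with_retries() {
    local node=$1 tries=0
    until foo "$node"; do                     # keep retrying until foo exits with status 0
        tries=$((tries + 1))
        if [ "$tries" -ge 10 ]; then
            echo "FAILED: $node" >> ~/failed_nodes
            return 1
        fi
        sleep 5                               # brief pause before retrying
    done
}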
1
Your task queue may be 700 today, but can the size expand? Watch for swap space to grow; that is an indication you have reached the memory limit. And CPU % is not a good measure (for Linux/Unix); it is better to consider the load average (run queue length).
– ChuckCottrill
Apr 29 at 21:45
1
The most recent way I broke production at my still-kinda-new job was by accidentally running a million-plus short-lived background jobs at once. They involved JVMs (wait, wait, put the pitchforks down), so the consequences were 'limited' to hundreds of thousands of error report files saying that threads couldn't be started.
– michaelb958
Apr 29 at 23:04
4
Nitpick: Don't parse ls output. – l0b0
Apr 30 at 1:18
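For example (a sketch, not from the comment), the file list can be fed without parsing ls at all, either with a shell glob or with find piped to GNU Parallel:
for node in ~/sagLogs/*; do foo "$node" & done                    # shell glob instead of `ls`
find ~/sagLogs/ -maxdepth 1 -type f -print0 | parallel -0 foo    # NUL-separated list into parallel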
1
@KuboMD And as long as nobody else ever wants to use your code.
– l0b0
2 days ago