What happens if I start too many background jobs?
I need to do some work on 700 network devices using an expect script. I can get it done sequentially, but so far the runtime is around 24 hours. This is mostly due to the time it takes to establish a connection and the delay in the output from these devices (old ones). I'm able to establish two connections and have them run in parallel just fine, but how far can I push that?
I don't imagine I could do all 700 of them at once; surely there's some limit to the number of telnet connections my VM can manage.
If I did try to start 700 of them in some sort of loop like this:
for node in `ls ~/sagLogs/`; do
foo &
done
With:
CPU: 12 CPUs x Intel(R) Xeon(R) CPU E5649 @ 2.53GHz
Memory: 47.94 GB
My questions are:
- Could all 700 instances possibly run concurrently?
- How far could I get until my server reaches its limit?
- When that limit is reached, will it just wait to begin the next iteration of foo, or will the box crash?
I'm running in a corporate production environment unfortunately, so I can't exactly just try and see what happens.
bash background-process expect telnet jobs
I've had good luck with parallel, using around 50 concurrent jobs. It's a great medium between parallelism of 1 and 700. The other nice thing is that it's batchless: a single stalled connection will only stall itself, not any of the others. The main downside is error management. None of these shell-based approaches will gracefully handle errors; you'll have to check for success yourself and do your own retries. – Adam, Apr 29 at 21:38
Your task queue may be 700 today, but can the size expand? Watch for swap space growing - that is an indication you have reached the memory limit. And CPU % is not a good measure (for Linux/Unix); it is better to look at the load average (run queue length). – ChuckCottrill, Apr 29 at 21:45
The most recent way I broke production at my still-kinda-new job was by accidentally running a million-plus short-lived background jobs at once. They involved JVMs (wait, wait, put the pitchforks down), so the consequences were 'limited' to hundreds of thousands of error report files saying that threads couldn't be started. – michaelb958, Apr 29 at 23:04
Nitpick: don't parse ls output. – l0b0, Apr 30 at 1:18
@KuboMD And as long as nobody else ever wants to use your code. – l0b0, 2 days ago
4 Answers
Could all 700 instances possibly run concurrently?
That depends on what you mean by concurrently. If we're being picky, then no, they can't, unless you have 700 threads of execution on your system that you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and its various children are remarkably good at managing huge levels of concurrency; that's part of why they're so popular for large-scale HPC usage.
How far could I get until my server reaches its limit?
This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:
- The entire run-time memory requirements of one job, times 700.
- The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).
- Any other memory requirements on the system.
Assuming you meet that (and again, that's with only 50 GB of RAM), you still have to deal with other issues:
- How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.
- How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.
- Many other things I probably haven't thought of.
When that limit is reached, will it just wait to begin the next iteration of foo, or will the box crash?
It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue, it'll just be impossible to do much else on the system. If it's the network though, you might crash other systems or services.
What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).
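A minimal sketch of that batching idea in plain bash, assuming foo is the per-node expect wrapper from the question; the batch size of 25 is an arbitrary illustration, not a recommendation:
batch_size=25                 # tune to what the devices and the network tolerate
nodes=( ~/sagLogs/* )         # glob instead of parsing ls output
for (( i = 0; i < ${#nodes[@]}; i += batch_size )); do
    for node in "${nodes[@]:i:batch_size}"; do
        foo "$node" &         # hypothetical per-node job
    done
    wait                      # let the whole batch finish before starting the next
done
GNU Parallel or Ansible, as mentioned above, give you the same throttling with far less hand-rolled code.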
There are ways to run a limited number of background tasks (using bash, perl, python, et al.), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways... – ChuckCottrill, Apr 29 at 21:41
Does this also include unix-like systems? And what is "GUN parallel"? – Biswapriyo, 2 days ago
@ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing, though, it's almost always better to just get a real orchestration tool than to try to roll your own solution, especially once you're past a few dozen systems in terms of scale. – Austin Hemmelgarn, 2 days ago
@Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux – pipe, 2 days ago
@forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know the resource requirements of the tasks beforehand), and it doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system). – Austin Hemmelgarn, 2 days ago
It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.
May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages over the background-job approach:
- You can easily change the number of concurrent sessions.
- It will wait until sessions complete before it starts new ones.
- It is easier to abort.
Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source
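For example, a minimal sketch of capping the session count (the script name run_device and the file devices.txt are hypothetical, and 50 concurrent sessions is only an illustration):
# Keep at most 50 sessions running; a new one starts as each finishes.
parallel -j 50 ./run_device {} < devices.txt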
Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor? – KuboMD, Apr 29 at 18:15
@KuboMD If you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :) – hobbs, 2 days ago
As an aside, web servers often use threading or event-based processing (example: gunicorn.org). – ChuckCottrill, 2 days ago
Using & for parallel processing is fine when you are only running a few jobs and when you monitor progress. But if you are running in a corporate production environment, you need something that gives you better control.
ls ~/sagLogs/ | parallel --delay 0.5 --memfree 1G -j0 --joblog my.log --retries 10 foo
This will run foo for each file in ~/sagLogs/. It starts a job every 0.5 seconds and runs as many jobs in parallel as possible as long as 1 GB of RAM is free, but it will respect the limits on your system (e.g. the number of open files and processes). Typically this means you will be running 250 jobs in parallel if you have not adjusted the number of open files allowed. If you raise the open-file limit, you should have no problem running 32,000 in parallel - as long as you have enough memory.
If a job fails (i.e. returns a non-zero exit code), it will be retried 10 times.
my.log will tell you whether a job succeeded (possibly after retries) or not.
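If you want to pull out just the failures afterwards, a sketch (assuming the usual tab-separated joblog layout, where Exitval is the seventh column and the command is the last):
# Print the commands whose final exit status (after retries) was non-zero.
awk -F '\t' 'NR > 1 && $7 != 0 {print $NF}' my.log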
This looks very promising, thank you. – KuboMD, 2 days ago
Ran a simple test doing cat ~/sagLogs/* >> ~/woah | parallel and holy moly that was fast. 1,054,552 lines in the blink of an eye. – KuboMD, 2 days ago
The command you gave has dual redirection, so I do not think it does what you intend it to do. GNU Parallel has an overhead of 10 ms per job, so 1M jobs should take on the order of 3 hours. – Ole Tange, 2 days ago
It is not applicable at all if all you want to do is simply to concatenate the files. – Ole Tange, 2 days ago
@KuboMD A trivial CPU busy loop like awk 'BEGIN{for(i=rand()*10000000; i<100000000; i++){}}' would work for playing around with. Or try it on a task like sleep 10 to see it keep n jobs in flight without using much CPU time, e.g. time parallel sleep ::: {100..1} to run sleeps from 100 down to 1 second. – Peter Cordes, 2 days ago
What happens if I start too many background jobs?
The system will become slow and unresponsive; in the worst case it gets so unresponsive that it would be best to just push the power button and do a hard reboot... and that assumes you were running something as root that had the privilege to get away with it. If your bash script is running under regular user privileges, then the first things that come to mind are /etc/security/limits.conf and /etc/systemd/system.conf, and all the variables therein, which [ideally speaking] prevent user(s) from overloading the system.
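A hypothetical limits.conf entry of that kind, capping how many processes one account may spawn (the user name saguser and the value 800 are purely illustrative):
# /etc/security/limits.conf
# <domain>   <type>   <item>   <value>
saguser      hard     nproc    800     # at most 800 processes for this account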
CPU = Xeon E5649; that is a 12-core CPU, so you have 12 cores for 12 processes to run concurrently, each utilizing one of the twelve cores at 100%. If you kick off 24 processes, each would run at about 50% utilization of a core; with 700 processes each gets roughly 1.7%. But it's a computer: as long as everything completes properly in an acceptable amount of time, that = success; being efficient is not always relevant.
Could all 700 instances possibly run concurrently? Certainly, 700 is not a large number; my /etc/security/limits.conf maxproc default is 4,135,275, for example.
How far could I get until my server reaches its limit? Much farther than 700, I'm sure.
Limits... what will happen if the script is kicked off under a user account [and generally as root as well, since limits.conf pretty much applies to everyone] is that the script will just exit after having tried to do foo & 700 times; you would then expect to see 700 foo processes, each with a different PID, but you might only see 456 (random number choice), with the other 244 never started because they got blocked by some security or systemd limit.
Million-dollar question: how many should you run concurrently?
Since this involves the network and you said each job makes a telnet connection, an educated guess is that you will run into network limits and overhead before you hit CPU and RAM limits. But I don't know what you are doing specifically. What will likely happen is that you can kick off all 700 at once, but things will automatically block until previous processes and network connections finish and close, based on various system limits; or perhaps the first 500 will kick off and the remaining 200 won't, because system or kernel limits prevent it. However many run at once, there will be some sweet spot for getting things done as fast as possible while minimizing overhead and increasing efficiency. With 12 cores (or 24 if you have 2 CPUs), start with 12 (or 24) at once, then increase that concurrent batch number by 12 or 24 until you no longer see a run-time improvement.
Hint: google "max telnet connections" and see how this applies to your system(s). Also don't forget about firewalls. And do a quick calculation of the memory needed per process times 700; make sure it is less than the available RAM (about 50 GB in your case), otherwise the system will start using swap and basically become unresponsive. So kick off 12, 24, ..., N processes at a time, monitor free RAM, and then increase N once you have some knowledge of what's happening.
By default, RHEL limits the number of telnet connections from a single host to 10 simultaneous sessions. This is a security feature; it is set to 10 in /etc/xinetd.conf, via the "per_source" value, which you can change.
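A sketch of where that setting lives (the value 64 is only an illustration; on many systems the stanza sits in a service file under /etc/xinetd.d/ instead):
# /etc/xinetd.conf, defaults section
defaults
{
        # (other defaults unchanged)
        per_source = 64    # simultaneous connections allowed from one source IP
}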
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f516203%2fwhat-happens-if-i-start-too-many-background-jobs%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
4 Answers
4
active
oldest
votes
4 Answers
4
active
oldest
votes
active
oldest
votes
active
oldest
votes
Could all 700 instances possibly run concurrently?
That depends on what you mean by concurrently. If we're being picky, then no, they can't unless you have 700 threads of execution on your system you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and it's various children are remarkably good at managing huge levels of concurrency, that's part of why they're so popular for large-scale HPC usage.
How far could I get until my server reaches its limit?
This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:
- The entire run-time memory requirements of one job, times 700.
- The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).
- Any other memory requirements on the system.
Assuming you meet that (again, with only 50GB of RAM, you still ahve to deal with other issues:
- How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.
- How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.
- Many other things I probably haven't thought of.
When that limit is reached, will it just wait to begin the next iteration off foo or will the box crash?
It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue, it'll just be impossible to do much else on the system. If it's the network though, you might crash other systems or services.
What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).
There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...
– ChuckCottrill
Apr 29 at 21:41
Does this also include unix-like systems? And what is "GUN parallel"?
– Biswapriyo
2 days ago
2
@ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.
– Austin Hemmelgarn
2 days ago
2
@Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux
– pipe
2 days ago
3
@forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).
– Austin Hemmelgarn
2 days ago
|
show 5 more comments
Could all 700 instances possibly run concurrently?
That depends on what you mean by concurrently. If we're being picky, then no, they can't unless you have 700 threads of execution on your system you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and it's various children are remarkably good at managing huge levels of concurrency, that's part of why they're so popular for large-scale HPC usage.
How far could I get until my server reaches its limit?
This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:
- The entire run-time memory requirements of one job, times 700.
- The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).
- Any other memory requirements on the system.
Assuming you meet that (again, with only 50GB of RAM, you still ahve to deal with other issues:
- How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.
- How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.
- Many other things I probably haven't thought of.
When that limit is reached, will it just wait to begin the next iteration off foo or will the box crash?
It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue, it'll just be impossible to do much else on the system. If it's the network though, you might crash other systems or services.
What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).
There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...
– ChuckCottrill
Apr 29 at 21:41
Does this also include unix-like systems? And what is "GUN parallel"?
– Biswapriyo
2 days ago
2
@ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.
– Austin Hemmelgarn
2 days ago
2
@Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux
– pipe
2 days ago
3
@forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).
– Austin Hemmelgarn
2 days ago
|
show 5 more comments
Could all 700 instances possibly run concurrently?
That depends on what you mean by concurrently. If we're being picky, then no, they can't unless you have 700 threads of execution on your system you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and it's various children are remarkably good at managing huge levels of concurrency, that's part of why they're so popular for large-scale HPC usage.
How far could I get until my server reaches its limit?
This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:
- The entire run-time memory requirements of one job, times 700.
- The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).
- Any other memory requirements on the system.
Assuming you meet that (again, with only 50GB of RAM, you still ahve to deal with other issues:
- How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.
- How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.
- Many other things I probably haven't thought of.
When that limit is reached, will it just wait to begin the next iteration off foo or will the box crash?
It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue, it'll just be impossible to do much else on the system. If it's the network though, you might crash other systems or services.
What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).
Could all 700 instances possibly run concurrently?
That depends on what you mean by concurrently. If we're being picky, then no, they can't unless you have 700 threads of execution on your system you can utilize (so probably not). Realistically though, yes, they probably can, provided you have enough RAM and/or swap space on the system. UNIX and it's various children are remarkably good at managing huge levels of concurrency, that's part of why they're so popular for large-scale HPC usage.
How far could I get until my server reaches its limit?
This is impossible to answer concretely without a whole lot more info. Pretty much, you need to have enough memory to meet:
- The entire run-time memory requirements of one job, times 700.
- The memory requirements of bash to manage that many jobs (bash is not horrible about this, but the job control isn't exactly memory efficient).
- Any other memory requirements on the system.
Assuming you meet that (again, with only 50GB of RAM, you still ahve to deal with other issues:
- How much CPU time is going to be wasted by bash on job control? Probably not much, but with hundreds of jobs, it could be significant.
- How much network bandwidth is this going to need? Just opening all those connections may swamp your network for a couple of minutes depending on your bandwidth and latency.
- Many other things I probably haven't thought of.
When that limit is reached, will it just wait to begin the next iteration off foo or will the box crash?
It depends on what limit is hit. If it's memory, something will die on the system (more specifically, get killed by the kernel in an attempt to free up memory) or the system itself may crash (it's not unusual to configure systems to intentionally crash when running out of memory). If it's CPU time, it will just keep going without issue, it'll just be impossible to do much else on the system. If it's the network though, you might crash other systems or services.
What you really need here is not to run all the jobs at the same time. Instead, split them into batches, and run all the jobs within a batch at the same time, let them finish, then start the next batch. GNU Parallel (https://www.gnu.org/software/parallel/) can be used for this, but it's less than ideal at that scale in a production environment (if you go with it, don't get too aggressive, like I said, you might swamp the network and affect systems you otherwise would not be touching). I would really recommend looking into a proper network orchestration tool like Ansible (https://www.ansible.com/), as that will not only solve your concurrency issues (Ansible does batching like I mentioned above automatically), but also give you a lot of other useful features to work with (like idempotent execution of tasks, nice status reports, and native integration with a very large number of other tools).
edited 2 days ago
answered Apr 29 at 19:50
Austin HemmelgarnAustin Hemmelgarn
6,47111120
6,47111120
There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...
– ChuckCottrill
Apr 29 at 21:41
Does this also include unix-like systems? And what is "GUN parallel"?
– Biswapriyo
2 days ago
2
@ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.
– Austin Hemmelgarn
2 days ago
2
@Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux
– pipe
2 days ago
3
@forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).
– Austin Hemmelgarn
2 days ago
|
show 5 more comments
There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...
– ChuckCottrill
Apr 29 at 21:41
Does this also include unix-like systems? And what is "GUN parallel"?
– Biswapriyo
2 days ago
2
@ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.
– Austin Hemmelgarn
2 days ago
2
@Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux
– pipe
2 days ago
3
@forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).
– Austin Hemmelgarn
2 days ago
There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...
– ChuckCottrill
Apr 29 at 21:41
There are ways to run a limited number of background tasks (using bash, perl, python, et al), monitor for task completion, and run more tasks as prior tasks complete. A simple approach would be to collect batches of tasks represented by files in subdirectories, and process a batch at a time. There are other ways...
– ChuckCottrill
Apr 29 at 21:41
Does this also include unix-like systems? And what is "GUN parallel"?
– Biswapriyo
2 days ago
Does this also include unix-like systems? And what is "GUN parallel"?
– Biswapriyo
2 days ago
2
2
@ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.
– Austin Hemmelgarn
2 days ago
@ChuckCottrill Yes, there are indeed other ways this could be done. Given my own experience dealing with this type of thing though, it's almost always better to just get a real orchestration tool than to try and roll your own solution, especially once you're past a few dozen systems in terms of scale.
– Austin Hemmelgarn
2 days ago
2
2
@Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux
– pipe
2 days ago
@Baldrickk geekz.co.uk/lovesraymond/archive/gun-linux
– pipe
2 days ago
3
3
@forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).
– Austin Hemmelgarn
2 days ago
@forest Yes, you could use rlimits to prevent the system from crashing, but getting them right in a case like this is not easy (you kind of need to know what the resource requirements for the tasks are beforehand) and doesn't protect the rest of the network from any impact these jobs may cause (which is arguably a potentially much bigger issue than crashing the local system).
– Austin Hemmelgarn
2 days ago
|
show 5 more comments
It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.
May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages to the background job approach:
- You can easily change the number of concurrent sessions.
- And it will wait until sessions complete before it starts new ones.
- It it easier to abort.
Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source
New contributor
1
Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?
– KuboMD
Apr 29 at 18:15
2
@KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)
– hobbs
2 days ago
as an aside, web servers often use threading or event-based processing (example: gunicorn.org)
– ChuckCottrill
2 days ago
add a comment |
It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.
May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages to the background job approach:
- You can easily change the number of concurrent sessions.
- And it will wait until sessions complete before it starts new ones.
- It it easier to abort.
Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source
New contributor
1
Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?
– KuboMD
Apr 29 at 18:15
2
@KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)
– hobbs
2 days ago
as an aside, web servers often use threading or event-based processing (example: gunicorn.org)
– ChuckCottrill
2 days ago
add a comment |
It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.
May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages to the background job approach:
- You can easily change the number of concurrent sessions.
- And it will wait until sessions complete before it starts new ones.
- It it easier to abort.
Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source
New contributor
It's hard to say specifically how many instances could be run as background jobs in the manner you describe. But a normal server can certainly maintain 700 concurrent connections as long as you do it correctly. Webservers do this all the time.
May I suggest that you use GNU parallel (https://www.gnu.org/software/parallel/) or something similar to accomplish this? It would give you a number of advantages to the background job approach:
- You can easily change the number of concurrent sessions.
- And it will wait until sessions complete before it starts new ones.
- It it easier to abort.
Have a look here for a quick start: https://www.gnu.org/software/parallel/parallel_tutorial.html#A-single-input-source
New contributor
New contributor
answered Apr 29 at 17:53
laenkeiolaenkeio
1866
1866
New contributor
New contributor
1
Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?
– KuboMD
Apr 29 at 18:15
2
@KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)
– hobbs
2 days ago
as an aside, web servers often use threading or event-based processing (example: gunicorn.org)
– ChuckCottrill
2 days ago
add a comment |
1
Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?
– KuboMD
Apr 29 at 18:15
2
@KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)
– hobbs
2 days ago
as an aside, web servers often use threading or event-based processing (example: gunicorn.org)
– ChuckCottrill
2 days ago
1
1
Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?
– KuboMD
Apr 29 at 18:15
Interesting! I'll take a look at this. Do you know if attempting this kind of operation (without the help of Parallel) would risk crashing the hypervisor?
– KuboMD
Apr 29 at 18:15
2
2
@KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)
– hobbs
2 days ago
@KuboMD if you can crash the hypervisor with something so mundane, it's a bug in the hypervisor :)
– hobbs
2 days ago
as an aside, web servers often use threading or event-based processing (example: gunicorn.org)
– ChuckCottrill
2 days ago
as an aside, web servers often use threading or event-based processing (example: gunicorn.org)
– ChuckCottrill
2 days ago
add a comment |
Using &
for parallel processing is fine when doing a few, and when you monitor progress. But if you are running in a corporate production environment you need something that gives you better control.
ls ~/sagLogs/ | parallel --delay 0.5 --memfree 1G -j0 --joblog my.log --retries 10 foo
This will run foo
for each file in ~/sagLogs
. It start a job every 0.5 seconds, it will run as many jobs in parallel as possible as long as 1 GB RAM is free, but will respect the limits on your system (e.g. number of files and processes). Typically this means you will be running 250 jobs in parallel if you have not adjusted the number of open files allowed. If you adjust the number of open files, you should have no problem running 32000 in parallel - as long as you have enough memory.
If a job fails (i.e. returns with an error code) it will be retried 10 times.
my.log
will tell you if a job succeed (after possibly retries) or not.
This looks very promising, thank you.
– KuboMD
2 days ago
Ran a simple test doingcat ~/sagLogs/* >> ~/woah | parallel
and holy moly that was fast. 1,054,552 lines in the blink of an eye.
– KuboMD
2 days ago
3
The command you gave has dual redirection, so I donot think it does what you intend it to do. GNU Parallel has an overhead of 10 ms per job, so 1M jobs should take in the order of 3 hours.
– Ole Tange
2 days ago
1
It is not applicable at all if all you want to do is simply to concatenate the files.
– Ole Tange
2 days ago
1
@KuboMD a trivial CPU busy loop likeawk 'BEGINfor(i=rand()*10000000; i<100000000;i++)'
would work for playing around with. Or try it on a task likesleep 10
to see it keepn
jobs in flight without using much CPU time. e.g.time parallel sleep ::: 100..1
to run sleeps from 100 down to 1 second.
– Peter Cordes
2 days ago
|
show 7 more comments
Using &
for parallel processing is fine when doing a few, and when you monitor progress. But if you are running in a corporate production environment you need something that gives you better control.
ls ~/sagLogs/ | parallel --delay 0.5 --memfree 1G -j0 --joblog my.log --retries 10 foo
This will run foo
for each file in ~/sagLogs
. It start a job every 0.5 seconds, it will run as many jobs in parallel as possible as long as 1 GB RAM is free, but will respect the limits on your system (e.g. number of files and processes). Typically this means you will be running 250 jobs in parallel if you have not adjusted the number of open files allowed. If you adjust the number of open files, you should have no problem running 32000 in parallel - as long as you have enough memory.
If a job fails (i.e. returns with an error code) it will be retried 10 times.
my.log
will tell you if a job succeed (after possibly retries) or not.
This looks very promising, thank you.
– KuboMD
2 days ago
Ran a simple test doingcat ~/sagLogs/* >> ~/woah | parallel
and holy moly that was fast. 1,054,552 lines in the blink of an eye.
– KuboMD
2 days ago
3
The command you gave has dual redirection, so I donot think it does what you intend it to do. GNU Parallel has an overhead of 10 ms per job, so 1M jobs should take in the order of 3 hours.
– Ole Tange
2 days ago
1
It is not applicable at all if all you want to do is simply to concatenate the files.
– Ole Tange
2 days ago
1
@KuboMD a trivial CPU busy loop likeawk 'BEGINfor(i=rand()*10000000; i<100000000;i++)'
would work for playing around with. Or try it on a task likesleep 10
to see it keepn
jobs in flight without using much CPU time. e.g.time parallel sleep ::: 100..1
to run sleeps from 100 down to 1 second.
– Peter Cordes
2 days ago
|
show 7 more comments
Using &
for parallel processing is fine when doing a few, and when you monitor progress. But if you are running in a corporate production environment you need something that gives you better control.
ls ~/sagLogs/ | parallel --delay 0.5 --memfree 1G -j0 --joblog my.log --retries 10 foo
This will run foo
for each file in ~/sagLogs
. It start a job every 0.5 seconds, it will run as many jobs in parallel as possible as long as 1 GB RAM is free, but will respect the limits on your system (e.g. number of files and processes). Typically this means you will be running 250 jobs in parallel if you have not adjusted the number of open files allowed. If you adjust the number of open files, you should have no problem running 32000 in parallel - as long as you have enough memory.
If a job fails (i.e. returns with an error code) it will be retried 10 times.
my.log
will tell you if a job succeed (after possibly retries) or not.
Using &
for parallel processing is fine when doing a few, and when you monitor progress. But if you are running in a corporate production environment you need something that gives you better control.
ls ~/sagLogs/ | parallel --delay 0.5 --memfree 1G -j0 --joblog my.log --retries 10 foo
This will run foo
for each file in ~/sagLogs
. It start a job every 0.5 seconds, it will run as many jobs in parallel as possible as long as 1 GB RAM is free, but will respect the limits on your system (e.g. number of files and processes). Typically this means you will be running 250 jobs in parallel if you have not adjusted the number of open files allowed. If you adjust the number of open files, you should have no problem running 32000 in parallel - as long as you have enough memory.
If a job fails (i.e. returns with an error code) it will be retried 10 times.
my.log
will tell you if a job succeed (after possibly retries) or not.
answered Apr 29 at 22:52
Ole TangeOle Tange
13.2k1658107
13.2k1658107
This looks very promising, thank you.
– KuboMD
2 days ago
Ran a simple test doingcat ~/sagLogs/* >> ~/woah | parallel
and holy moly that was fast. 1,054,552 lines in the blink of an eye.
– KuboMD
2 days ago
3
The command you gave has dual redirection, so I donot think it does what you intend it to do. GNU Parallel has an overhead of 10 ms per job, so 1M jobs should take in the order of 3 hours.
– Ole Tange
2 days ago
1
It is not applicable at all if all you want to do is simply to concatenate the files.
– Ole Tange
2 days ago
1
@KuboMD a trivial CPU busy loop likeawk 'BEGINfor(i=rand()*10000000; i<100000000;i++)'
would work for playing around with. Or try it on a task likesleep 10
to see it keepn
jobs in flight without using much CPU time. e.g.time parallel sleep ::: 100..1
to run sleeps from 100 down to 1 second.
– Peter Cordes
2 days ago
|
show 7 more comments
What happens if I start too many background jobs?
The system will become slow and unresponsive; in the worst case it is so unresponsive that it would be best to just push the power button and do a hard reboot. That is what happens when something runs as root with the privilege to get away with it. If your bash script is running under regular user privileges, then the first things that come to mind are /etc/security/limits.conf and /etc/systemd/system.conf, and all the variables therein that [ideally speaking] prevent user(s) from overloading the system.
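A minimal sketch (not from the answer) of checking the limits that usually bite first on a typical RHEL-style system:
ulimit -u                          # max user processes for this shell
ulimit -n                          # max open file descriptors
cat /proc/sys/kernel/pid_max       # system-wide cap on process IDs
grep -h nproc /etc/security/limits.conf /etc/security/limits.d/*.conf 2>/dev/null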
cpu = Xeon E5649, which is a 12-core CPU, so you have 12 cores for 12 processes to run concurrently, each utilizing one of the twelve cores at 100%. If you kick off 24 processes, each would run at roughly 50% utilization across the twelve cores; with 700 processes each gets on the order of 1.7% of a core (12 / 700). But it's a computer: as long as everything completes properly in an acceptable amount of time, that counts as success; being efficient is not always relevant.
Could all 700 instances possibly run concurrently? Certainly, 700 is not a large number; my /etc/security/limits.conf maxproc default is 4,135,275, for example.
How far could I get until my server reaches its limit? Much farther than 700, I'm sure.
Limits... what will happen if the script is kicked off under a user account [and generally as root too; limits.conf pretty much applies to everyone] is that the script will just exit after having tried to do foo & 700 times; you would expect to then see 700 foo processes, each with a different pid, but you might only see 456 (random number choice) while the other 244 never started because they were blocked by some security or systemd limit.
Million $ question: how many should you run concurrently?
Since this is network work and you said each job will make a telnet connection, an educated guess is that you will run into network limits and overhead before you hit CPU and RAM limits. But I don't know what you are doing specifically; what will likely happen is that you can kick off all 700 at once, but things will automatically block until previous processes and network connections finish and close, based on various system limits, or perhaps the first 500 will kick off and the remaining 200 won't because system or kernel limits prevent it. However many run at once, there will be some sweet spot for getting things done as fast as possible, minimizing overhead and maximizing efficiency. With 12 cores (or 24 if you have 2 CPUs), start with 12 (or 24) at once, then increase that concurrent batch number by 12 or 24 until you no longer see a run-time improvement.
Hint: google max telnet connections and see how this applies to your system(s). Also don't forget about firewalls. Also do a quick calculation of memory needed per process x 700 and make sure it is less than the available RAM (about 50 GB in your case), otherwise the system will start using swap and basically become unresponsive. So kick off 12, 24, N processes at a time, monitor free RAM, then increase N once you know what is happening; a rough batching sketch follows.
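A minimal sketch of the "N at a time" idea in plain bash, assuming foo takes the node's log path as its argument (BATCH is a placeholder to tune):
BATCH=12                       # start with one job per core and tune upward
count=0
for node in ~/sagLogs/*; do
    foo "$node" &
    count=$((count + 1))
    if [ "$count" -ge "$BATCH" ]; then
        wait                   # let the current batch finish before starting more
        count=0
    fi
done
wait                           # wait for the final, possibly partial batch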
By default, RHEL limits the number of telnet connections from a single host to 10 simultaneous sessions. This is a security feature; the limit is set to 10 in /etc/xinetd.conf, so change the "per_source" value there.
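For illustration only, the relevant xinetd setting lives in the defaults block; the numbers here are made up:
# /etc/xinetd.conf (defaults section)
defaults
{
        per_source      = 25    # simultaneous connections allowed per source IP
        instances       = 50    # total simultaneous connections for a service
}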
edited 2 days ago
answered 2 days ago
ron
3
I've had good luck with parallel, using around 50 concurrent jobs. It's a great medium between parallelism of 1 and 700. The other nice thing is that it's batchless: a single stalled connection will only stall itself, not any of the others. The main downside is error management. None of these shell-based approaches will gracefully handle errors; you'll have to manually check for success yourself and do your own retries. – Adam
Apr 29 at 21:38
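As a minimal sketch of such manual retry handling (illustrative only; foo, the retry count, and the failure log path are placeholders):
run_with_retries() {
    local node=$1 tries=0
    until foo "$node"; do                     # keep retrying until foo exits with status 0
        tries=$((tries + 1))
        if [ "$tries" -ge 10 ]; then
            echo "FAILED: $node" >> ~/failed_nodes
            return 1
        fi
        sleep 5                               # brief pause before retrying
    done
}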
1
Your task queue may be 700 today, but can the size expand? Watch for swap space to grow; that is an indication you have reached the memory limit. And CPU % is not a good measure (for Linux/Unix); it is better to consider the load average (run queue length).
– ChuckCottrill
Apr 29 at 21:45
1
The most recent way I broke production at my still-kinda-new job was by accidentally running a million-plus short-lived background jobs at once. They involved JVMs (wait, wait, put the pitchforks down), so the consequences were 'limited' to hundreds of thousands of error report files saying that threads couldn't be started.
– michaelb958
Apr 29 at 23:04
4
Nitpick: Don't parse ls output. – l0b0
Apr 30 at 1:18
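For example (a sketch, not from the comment), the file list can be fed without parsing ls at all, either with a shell glob or with find piped to GNU Parallel:
for node in ~/sagLogs/*; do foo "$node" & done                    # shell glob instead of `ls`
find ~/sagLogs/ -maxdepth 1 -type f -print0 | parallel -0 foo    # NUL-separated list into parallel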
1
@KuboMD And as long as nobody else ever wants to use your code.
– l0b0
2 days ago