I have large batches of bash processes. Each bash script invokes executables whose stdout is redirected to a distinct log file. About 5% of the runs fail with:

    sh: [name of log]: Resource temporarily unavailable

I tried reducing the number of jobs running in parallel, but the error persisted in some of the bash scripts.
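For concreteness, each job script follows roughly this shape (the executable name and log path are placeholders, and ${JOB_ID} stands in for whatever makes each log file name distinct):

    #!/bin/bash
    # stdout of the invocation goes to a log file unique to this run;
    # the error above appears when sh opens the log for the redirection
    ./some_executable arg1 arg2 > "logs/run_${JOB_ID}.log"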
Additional info:
- Ubuntu 14.04 LTS running in a VM on ESXi
- Happens on a new partition, allocated with gparted and LVM (a new logical volume spanning the entire partition)
- The LV is exported using nfs-kernel-server
- The LV is also shared with Windows using Samba
- The LV is formatted using ext4
- I have admin rights on this machine
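In case it is relevant, this is how I can confirm the layout (standard tools on Ubuntu 14.04; /path/to/mount is a placeholder for the LV's mount point):

    lsblk                  # partition layout
    sudo lvs; sudo vgs     # LVM logical volumes and volume groups
    sudo exportfs -v       # active NFS exports and their options
    testparm -s            # effective Samba share configuration
    df -T /path/to/mount   # shows ext4 as the filesystem type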
More detailed info:
- Everything runs in a cluster managed by Sun Grid Engine (SGE)
- There are 4 virtual machines: m1, m2, m3, m4
- m1 runs the SGE master, an SGE exec daemon, and an LDAP server
- m2, m3, m4 run SGE exec daemons
- m3 runs nfs-kernel-server, exporting a home folder to m1, m2, m4; the folder sits on an LVM logical volume backed by a partition on a local disk
- m3 has a soft link to the home folder
- m1, m2, m4 mount the home folder through fstab, so all machines end up pointing to the same home folder (a sketch of the mount entry is after this list)
- m2, m3, m4 run LDAP clients, connecting to m1
- All jobs are submitted to the cluster through m1 (configured as a submission host)
- Jobs fail exclusively on m3 (the machine that exports the disk). Most of the jobs on m3 pass, though; failures are random, but consistently on m3 alone
- m3 also shares the home folder with Windows clients via Samba
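For reference, the mount entries on m1, m2, m4 look roughly like this (the export path and mount options here are illustrative, not copied from the actual fstab):

    # /etc/fstab on m1, m2, m4 -- mount m3's exported home folder
    m3:/export/home  /home  nfs  rw,hard,intr  0  0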
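One idea I had for getting more information out of the system: run a failing batch under strace on m3 and capture the errno when the shell opens the log file (assuming strace is installed; job_script.sh is a placeholder for one of my scripts):

    # follow forks, trace only file-open syscalls, keep the trace on local disk
    strace -f -e trace=open,openat -o /tmp/job_trace.log bash job_script.sh
    # EAGAIN is the errno behind "Resource temporarily unavailable"
    grep EAGAIN /tmp/job_trace.log

Writing the trace to /tmp keeps it off the NFS-exported volume, so the tracing itself does not add load to the path being debugged.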
Any help would be greatly appreciated :) (how to debug, which logs are relevant, how to get more info out of the system, etc.)

Thank you in advance!