Barley info

From FarmShare

Revision as of 20:31, 20 September 2013

Follow the FarmShare tutorial or the User Guide.

Current barley policies

  • 480 max jobs per user (look for max_u_jobs in output of 'qconf -sconf')
  • 3000 max jobs in the system (look for max_jobs in output of 'qconf -sconf')
  • 48hr max runtime for any job in regular queue (look for h_rt in output of 'qconf -sq precise.q')
  • 30 days max runtime for the long queue (look for h_rt in output of 'qconf -sq precise-long.q')
  • 15min max runtime in test.q
  • 4GB default mem_free request per slot ('qconf -sc | grep mem_free')
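The limits above can be checked directly once you are logged into a barley node. A quick sketch, assuming the Grid Engine / Open Grid Scheduler client tools are on your PATH (they are cluster-side commands, so this won't run elsewhere):

```shell
# Inspect the scheduler limits listed above (run on a barley node).
qconf -sconf | grep -E 'max_u_jobs|max_jobs'   # per-user and system-wide job caps
qconf -sq precise.q | grep h_rt                # regular-queue runtime limit
qconf -sq precise-long.q | grep h_rt           # long-queue runtime limit
qconf -sq test.q | grep h_rt                   # test-queue runtime limit
qconf -sc | grep mem_free                      # default mem_free request per slot
```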

Technical details

  • 19 new machines, AMD Magny Cours 24 cores each, 96GB RAM
  • 1 new machine, AMD Magny Cours 24 cores, 192GB RAM
  • ~450GB local scratch on each
  • ~100TB in /farmshare/user_data shared across all barley and corn systems (introduced summer 2013)
  • Open Grid Scheduler 2011.11p1
  • 10GbE interconnect (Juniper QFX3500 switch)

How to use the barley machines

To start using these new machines, begin with the 'sge_intro' man page, or the man pages for the 'qhost', 'qstat', 'qsub', and 'qdel' commands.

Initial issues:

  • You are limited in space to your AFS homedir ($HOME) and local scratch disk on each node ($TMPDIR)
  • The execution hosts don't accept interactive jobs, only batch jobs for now.
  • You'll want to make sure you have your Kerberos TGT and your AFS token.
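On the last point, a minimal sketch of checking and renewing credentials before submitting jobs, assuming the standard MIT Kerberos and OpenAFS client tools:

```shell
klist           # list current Kerberos tickets; look for an unexpired krbtgt entry
tokens          # list current AFS tokens and their expiry times
kinit && aklog  # renew the Kerberos TGT, then derive a fresh AFS token from it
```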

If you want to use the newer bigger storage:

  1. log into any FarmShare machine: ssh sunetid@corn.stanford.edu
  2. cd to /farmshare/user_data/<your username> (or wait 5 minutes if it doesn't exist yet)
  3. write a job script: "$EDITOR test_job.script"
    1. see 'man qsub' for more info
    2. use env var $TMPDIR for local scratch
    3. use /farmshare/user_data/<your username> for shared data directory
  4. submit the job for processing: "qsub -cwd test_job.script"
  5. monitor the jobs with "qstat -f -j JOBID"
    1. see 'man qstat' for more info
  6. check the output files that you specified in your job script (the input and output files must be in /farmshare/user_data/)
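A hypothetical test_job.script tying the steps above together; the job name, resource requests, and 'sunetid' path are placeholders for illustration, not prescribed values:

```shell
#!/bin/sh
# Hypothetical example job script; replace 'sunetid' with your own username.
#$ -N test_job            # job name, as shown in qstat output
#$ -l h_rt=00:10:00       # request 10 minutes of runtime (well under the 48hr cap)
#$ -l mem_free=1G         # request 1GB of memory per slot
cd "$TMPDIR"              # do temporary work on fast local scratch
hostname > /farmshare/user_data/sunetid/test_job.out   # write final output to shared storage
```

Submit it from your /farmshare/user_data directory with "qsub -cwd test_job.script", then monitor it with "qstat -f -j JOBID".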

For any questions, please email 'farmshare-discuss@lists.stanford.edu'. Some good introductory usage examples are here: http://gridscheduler.sourceforge.net/howto/basic_usage.html
