Barley info
From FarmShare
Revision as of 14:41, 30 July 2014
Follow the FarmShare tutorial or the User Guide.
current barley policies
- 480 max jobs per user ('qconf -sconf | grep max_u_jobs')
- 3000 max jobs in the system ('qconf -sconf | grep max_jobs')
- 48hr max runtime for any job in the regular queue ('qconf -sq saucy.q | grep h_rt')
- 30 days max runtime for the long queue ('qconf -sq saucy-long.q | grep h_rt')
- 15min max runtime in test.q ('qconf -sq test.q | grep h_rt')
- 4GB default mem_free request per slot ('qconf -sc | grep mem_free')
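As a sanity check on the runtime limits above: SGE's 'h_rt' is accepted by 'qsub -l h_rt=...' either as HH:MM:SS or as a plain number of seconds, so the three queue limits work out to (this arithmetic is illustrative, not taken from 'qconf' output):

```shell
# The h_rt limits above, expressed in seconds -- the other form
# that 'qsub -l h_rt' accepts besides HH:MM:SS.
regular=$((48 * 3600))       # saucy.q: 48 hours
long=$((30 * 24 * 3600))     # saucy-long.q: 30 days
testq=$((15 * 60))           # test.q: 15 minutes
echo "$regular $long $testq"
```

So a job that needs more than 172800 seconds belongs in saucy-long.q, not saucy.q.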
Technical details
- 19 new machines, AMD Magny-Cours, 24 cores each, 96GB RAM
- 1 new machine, AMD Magny-Cours, 24 cores, 192GB RAM
- ~450GB local scratch on each
- ~100TB in /farmshare/user_data shared across all barley and corn systems (introduced summer 2013)
- Open Grid Scheduler 2011.11p1
- 10GbE interconnect (Juniper QFX3500 switch)
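In aggregate, the node counts above give the cluster's total capacity (simple arithmetic on the listed figures):

```shell
# Aggregate capacity of the barley nodes listed above:
# 19 nodes x 24 cores plus 1 node x 24 cores, and
# 19 nodes x 96GB plus 1 node x 192GB.
cores=$((19 * 24 + 1 * 24))     # total cores
ram=$((19 * 96 + 1 * 192))      # total RAM in GB
echo "$cores cores, ${ram}GB RAM"
```

Note that the 480-core total matches the 480 max jobs per user policy above.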
how to use the barley machines
To start using these new machines, check out the 'sge_intro' man page and the man pages for the 'qhost', 'qstat', 'qsub' and 'qdel' commands.
Initial issues:
- You are limited in space to your AFS homedir ($HOME) and local scratch disk on each node ($TMPDIR)
- The execution hosts don't accept interactive jobs, only batch jobs for now.
- You'll want to make sure you have your Kerberos TGT and your AFS token.
If you want to use the newer bigger storage:
- log into any FarmShare machine: ssh sunetid@corn.stanford.edu
- cd to /farmshare/user_data/<your username> (or wait 5 minutes if it doesn't exist yet)
- write a job script: "$EDITOR test_job.script"
- see 'man qsub' for more info
- use env var $TMPDIR for local scratch
- use /farmshare/user_data/<your username> for shared data directory
- submit the job for processing: "qsub -cwd test_job.script"
- monitor the jobs with "qstat -f -j JOBID"
- see 'man qstat' for more info
- check the output files that you specified in your job script (the input and output files must be in /farmshare/user_data/)
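The steps above can be sketched with a minimal job script (the resource requests and filenames are illustrative placeholders, not site policy; SGE sets $TMPDIR to node-local scratch on the execution host, so the /tmp fallback below exists only so the sketch also runs outside SGE):

```shell
#!/bin/bash
# Minimal SGE job script sketch -- save as test_job.script and
# submit with 'qsub -cwd test_job.script'.
#$ -cwd                 # run from the submission directory
#$ -l h_rt=00:10:00     # request 10 minutes of runtime
#$ -l mem_free=4G       # request the default 4GB per slot

# SGE sets $TMPDIR to local scratch on the node; fall back to
# /tmp only so this sketch is runnable outside the cluster.
SCRATCH="${TMPDIR:-/tmp}"

# Do the work in local scratch, then copy results out to
# /farmshare/user_data/<your username> at the end of the job.
echo "hello from $(hostname)" > "$SCRATCH/result.txt"
cat "$SCRATCH/result.txt"
```

After 'qsub -cwd test_job.script', watch the job with 'qstat -f -j JOBID' until it leaves the queue, then collect the output files.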
For any questions, please email 'farmshare-discuss@lists.stanford.edu'. Some good introductory usage examples are here: http://gridscheduler.sourceforge.net/howto/basic_usage.html