Barley info
From FarmShare
Revision as of 14:41, 30 July 2014
Follow the FarmShare tutorial or the User Guide.
current barley policies
- 480 max jobs per user ('qconf -sconf | grep max_u_jobs')
- 3000 max jobs in the system ('qconf -sconf | grep max_jobs')
- 48hr max runtime for any job in the regular queue ('qconf -sq saucy.q | grep h_rt')
- 30 days max runtime for the long queue ('qconf -sq saucy-long.q | grep h_rt')
- 15min max runtime in test.q ('qconf -sq test.q | grep h_rt')
- 4GB default mem_free request per slot ('qconf -sc | grep mem_free')
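As a sanity check on the runtime limits above: SGE's 'h_rt' is accepted by 'qsub -l h_rt=...' either as HH:MM:SS or as a plain number of seconds, so the three queue limits work out to (this arithmetic is illustrative, not taken from 'qconf' output):

```shell
# The h_rt limits above, expressed in seconds -- the other form
# that 'qsub -l h_rt' accepts besides HH:MM:SS.
regular=$((48 * 3600))       # saucy.q: 48 hours
long=$((30 * 24 * 3600))     # saucy-long.q: 30 days
testq=$((15 * 60))           # test.q: 15 minutes
echo "$regular $long $testq"
```

So a job that needs more than 172800 seconds belongs in saucy-long.q, not saucy.q.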
Technical details
- 19 new machines, AMD Magny-Cours, 24 cores each, 96GB RAM
- 1 new machine, AMD Magny-Cours, 24 cores, 192GB RAM
- ~450GB local scratch on each
- ~100TB in /farmshare/user_data shared across all barley and corn systems (introduced summer 2013)
- Open Grid Scheduler 2011.11p1
- 10GbE interconnect (Juniper QFX3500 switch)
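In aggregate, the node counts above give the cluster's total capacity (simple arithmetic on the listed figures):

```shell
# Aggregate capacity of the barley nodes listed above:
# 19 nodes x 24 cores plus 1 node x 24 cores, and
# 19 nodes x 96GB plus 1 node x 192GB.
cores=$((19 * 24 + 1 * 24))     # total cores
ram=$((19 * 96 + 1 * 192))      # total RAM in GB
echo "$cores cores, ${ram}GB RAM"
```

Note that the 480-core total matches the 480 max jobs per user policy above.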
how to use the barley machines
To start using these new machines, check out the 'sge_intro' man page and the man pages for the 'qhost', 'qstat', 'qsub' and 'qdel' commands.
Initial issues:
- You are limited in space to your AFS homedir ($HOME) and local scratch disk on each node ($TMPDIR)
- The execution hosts don't accept interactive jobs, only batch jobs for now.
- You'll want to make sure you have your Kerberos TGT and your AFS token.
If you want to use the newer bigger storage:
- log into any FarmShare machine: ssh sunetid@corn.stanford.edu
- cd to /farmshare/user_data/<your username> (or wait 5 minutes if it doesn't exist yet)
- write a job script: "$EDITOR test_job.script"
- see 'man qsub' for more info
- use env var $TMPDIR for local scratch
- use /farmshare/user_data/<your username> for shared data directory
- submit the job for processing: "qsub -cwd test_job.script"
- monitor the jobs with "qstat -f -j JOBID"
- see 'man qstat' for more info
- check the output files that you specified in your job script (the input and output files must be in /farmshare/user_data/)
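The steps above can be sketched with a minimal job script (the resource requests and filenames are illustrative placeholders, not site policy; SGE sets $TMPDIR to node-local scratch on the execution host, so the /tmp fallback below exists only so the sketch also runs outside SGE):

```shell
#!/bin/bash
# Minimal SGE job script sketch -- save as test_job.script and
# submit with 'qsub -cwd test_job.script'.
#$ -cwd                 # run from the submission directory
#$ -l h_rt=00:10:00     # request 10 minutes of runtime
#$ -l mem_free=4G       # request the default 4GB per slot

# SGE sets $TMPDIR to local scratch on the node; fall back to
# /tmp only so this sketch is runnable outside the cluster.
SCRATCH="${TMPDIR:-/tmp}"

# Do the work in local scratch, then copy results out to
# /farmshare/user_data/<your username> at the end of the job.
echo "hello from $(hostname)" > "$SCRATCH/result.txt"
cat "$SCRATCH/result.txt"
```

After 'qsub -cwd test_job.script', watch the job with 'qstat -f -j JOBID' until it leaves the queue, then collect the output files.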
For any questions, please email 'farmshare-discuss@lists.stanford.edu'. Some good introductory usage examples are here: http://gridscheduler.sourceforge.net/howto/basic_usage.html