Barley info

From FarmShare

Revision as of 20:31, 20 September 2013

Follow the FarmShare tutorial or the User Guide.

Current barley policies

  • 480 max jobs per user (look for max_u_jobs in output of 'qconf -sconf')
  • 3000 max jobs in the system (look for max_jobs in output of 'qconf -sconf')
  • 48hr max runtime for any job in regular queue (look for h_rt in output of 'qconf -sq precise.q')
  • 30 days max runtime for the long queue (look for h_rt in output of 'qconf -sq precise-long.q')
  • 15min max runtime in test.q
  • 4GB default mem_free request per slot ('qconf -sc | grep mem_free')
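The limits above can be checked directly once you are logged into a barley node. A quick sketch, assuming the Grid Engine / Open Grid Scheduler client tools are on your PATH (they are cluster-side commands, so this won't run elsewhere):

```shell
# Inspect the scheduler limits listed above (run on a barley node).
qconf -sconf | grep -E 'max_u_jobs|max_jobs'   # per-user and system-wide job caps
qconf -sq precise.q | grep h_rt                # regular-queue runtime limit
qconf -sq precise-long.q | grep h_rt           # long-queue runtime limit
qconf -sq test.q | grep h_rt                   # test-queue runtime limit
qconf -sc | grep mem_free                      # default mem_free request per slot
```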

Technical details

  • 19 new machines, AMD Magny Cours 24 cores each, 96GB RAM
  • 1 new machine, AMD Magny Cours 24 cores, 192GB RAM
  • ~450GB local scratch on each
  • ~100TB in /farmshare/user_data shared across all barley and corn systems (introduced summer 2013)
  • Open Grid Scheduler 2011.11p1
  • 10GbE interconnect (Juniper QFX3500 switch)

How to use the barley machines

To start using these new machines, begin with the 'sge_intro' man page, or the man pages for the 'qhost', 'qstat', 'qsub', and 'qdel' commands.

Initial issues:

  • You are limited in space to your AFS homedir ($HOME) and local scratch disk on each node ($TMPDIR)
  • The execution hosts don't accept interactive jobs, only batch jobs for now.
  • You'll want to make sure you have your Kerberos TGT and your AFS token.
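On the last point, a minimal sketch of checking and renewing credentials before submitting jobs, assuming the standard MIT Kerberos and OpenAFS client tools:

```shell
klist           # list current Kerberos tickets; look for an unexpired krbtgt entry
tokens          # list current AFS tokens and their expiry times
kinit && aklog  # renew the Kerberos TGT, then derive a fresh AFS token from it
```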

If you want to use the newer bigger storage:

  1. log into any FarmShare machine: ssh sunetid@corn.stanford.edu
  2. cd to /farmshare/user_data/<your username> (or wait 5 minutes if it doesn't exist yet)
  3. write a job script: "$EDITOR test_job.script"
    1. see 'man qsub' for more info
    2. use env var $TMPDIR for local scratch
    3. use /farmshare/user_data/<your username> for shared data directory
  4. submit the job for processing: "qsub -cwd test_job.script"
  5. monitor the jobs with "qstat -f -j JOBID"
    1. see 'man qstat' for more info
  6. check the output files that you specified in your job script (the input and output files must be in /farmshare/user_data/)
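A hypothetical test_job.script tying the steps above together; the job name, resource requests, and 'sunetid' path are placeholders for illustration, not prescribed values:

```shell
#!/bin/sh
# Hypothetical example job script; replace 'sunetid' with your own username.
#$ -N test_job            # job name, as shown in qstat output
#$ -l h_rt=00:10:00       # request 10 minutes of runtime (well under the 48hr cap)
#$ -l mem_free=1G         # request 1GB of memory per slot
cd "$TMPDIR"              # do temporary work on fast local scratch
hostname > /farmshare/user_data/sunetid/test_job.out   # write final output to shared storage
```

Submit it from your /farmshare/user_data directory with "qsub -cwd test_job.script", then monitor it with "qstat -f -j JOBID".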

For any questions, please email 'farmshare-discuss@lists.stanford.edu'. Some good introductory usage examples are here: http://gridscheduler.sourceforge.net/howto/basic_usage.html
