We're waging war against COVID-19 and this is our call to arms!
Thanks to folding@home's software, we all have a chance to be part of an (unofficial) record-breaking global initiative, which could soon create the world's first ExaFLOP supercomputer*! To find out more about the initiative and the research being carried out on COVID-19, read our blog post here.
As of a few days ago, folding@home reported having over 470 petaFLOPS of compute available for researchers working on projects including COVID-19, and we're imploring the HPC community to join us in a bid to get it to 1 ExaFLOP (1000 petaFLOPS).
We've already documented the instructions for CentOS 7 environments, and we were pleasantly surprised by some of the feedback asking how this could be run on an HPC system as a standard user, rather than as the root / admin account. Below we've added instructions on how to run the software without admin or root access, while working with a job scheduler (in this instance SLURM).
Let's create a local area to work from.
```
mkdir scratch/fah -p
cd scratch/fah
```
Next, let's grab the RPM for the client.
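(The exact download URL below is an assumption based on the v7.4.4 CentOS packages published at the time of writing; check foldingathome.org for the current release before running.)

```shell
# Hypothetical mirror path for the v7.4.4 client RPM -- verify against
# the official foldingathome.org download page first.
FAH_RPM_URL="https://download.foldingathome.org/releases/public/release/fahclient/centos-6.7-64bit/v7.4/fahclient-7.4.4-1.x86_64.rpm"
wget -q "$FAH_RPM_URL" || echo "Download failed -- check the URL against the current release page"
```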
Extract the files from the RPM:
```
[david@vcontroller fah]$ rpm2cpio ./fahclient-7.4.4-1.x86_64.rpm | cpio -idmv
./etc/init.d/FAHClient
./usr/bin/FAHClient
./usr/bin/FAHCoreWrapper
./usr/share/applications/FAHWebControl.desktop
./usr/share/doc/fahclient/ChangeLog
./usr/share/doc/fahclient/README
./usr/share/doc/fahclient/copyright
./usr/share/doc/fahclient/sample-config.xml
./usr/share/pixmaps/FAHClient.icns
./usr/share/pixmaps/FAHClient.png
19617 blocks
```
Now let's take a look at the pre/post scripts in the RPM:
```
[david@vcontroller fah]$ rpm -qp --scripts ./fahclient-7.4.4-1.x86_64.rpm > fahclient-postinstall.sh
```
Let's investigate the post-install script dumped above. In short, all looks OK and nothing major is being done (although please have a peek if you're curious).
Check the init script to see what it's doing when it starts a job.
```
# file ./etc/init.d/FAHClient
# Changed line 26 to:
26     QUIET=false
# Changed line 120 to:
120    echo "$EXEC $OPTS &"
```
So here's our output:

```
[david@vcontroller fah]$ ./etc/init.d/FAHClient start
Starting fahclient ...
/usr/bin/FAHClient /etc/fahclient/config.xml --run-as fahclient --pid-file=/var/run/fahclient.pid --daemon &
cat: /var/run/fahclient.pid: No such file or directory
FAIL
```
We're missing our fah config.xml file and a local var/run directory, but we can recreate both from the previous blog entry, where we covered installing this as root.
Next up, let's create some of the directories required for the fah client to run locally:
```
[david@node0001 fah]$ mkdir var/run -p
[david@node0001 fah]$ mkdir etc/fahclient -p
[david@node0001 fah]$ cat ./etc/fahclient/config.xml
<config>
  <!-- Folding Slot Configuration -->
  <gpu v='false'/>
  <!-- Slot Control -->
  <power v='full'/>
  <!-- User Information -->
  <user v='vscaler'/>
  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
</config>
```
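The transcript above cats a config.xml that's already in place; if you're following along from scratch, the same minimal CPU-only config can be dropped in with a heredoc (substitute your own user name for `vscaler`):

```shell
# Recreate the minimal CPU-only config shown above; replace 'vscaler'
# with the user/team name you fold under.
mkdir -p etc/fahclient
cat > etc/fahclient/config.xml <<'EOF'
<config>
  <!-- Folding Slot Configuration -->
  <gpu v='false'/>
  <!-- Slot Control -->
  <power v='full'/>
  <!-- User Information -->
  <user v='vscaler'/>
  <!-- Folding Slots -->
  <slot id='0' type='CPU'/>
</config>
EOF
```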
Now let's try this as an interactive job:
```
[david@vcontroller ~]$ salloc -N 1
salloc: Granted job allocation 11470
[david@vcontroller ~]$ ssh $SLURM_JOB_NODELIST
Last login: Tue Mar 24 17:51:11 2020
[david@node0001 ~]$ cd scratch/fah
```
OK, we went through the FAHClient --help output to check a few things, and changed the exec command to the following:
```
[david@node0001 fah]$ ./usr/bin/FAHClient ./etc/fahclient/config.xml --run-as david --pid-file=./var/run/fahclient.pid --pid --service --respawn --log='./fahclient.log' --fork
18:09:41:INFO(1):Read GPUs.txt
18:09:41:************************* Folding@home Client *************************
18:09:41:      Website: http://folding.stanford.edu/
18:09:41:    Copyright: (c) 2009-2014 Stanford University
18:09:41:       Author: Joseph Coffland <email@example.com>
18:09:41:         Args: ./etc/fahclient/config.xml --run-as david
18:09:41:               --pid-file=./var/run/fahclient.pid --pid --service --respawn
18:09:41:               --log=./fahclient.log --fork
18:09:41:       Config: /home/david/scratch/fah/./etc/fahclient/config.xml
18:09:41:******************************** Build ********************************
...
```
Looking good! Let's check the logfile:
```
[david@node0001 fah]$ tail fahclient.log
18:10:50:WU00:FS00:0xa7:Completed 1 out of 500000 steps (0%)
18:11:04:WU00:FS00:0xa7:Completed 5000 out of 500000 steps (1%)
18:11:10:WU00:FS00:0xa7:Completed 10000 out of 500000 steps (2%)
18:11:16:WU00:FS00:0xa7:Completed 15000 out of 500000 steps (3%)
18:11:22:WU00:FS00:0xa7:Completed 20000 out of 500000 steps (4%)
18:11:28:WU00:FS00:0xa7:Completed 25000 out of 500000 steps (5%)
18:11:33:WU00:FS00:0xa7:Completed 30000 out of 500000 steps (6%)
18:11:39:WU00:FS00:0xa7:Completed 35000 out of 500000 steps (7%)
18:11:44:WU00:FS00:0xa7:Completed 40000 out of 500000 steps (8%)
18:11:52:WU00:FS00:0xa7:Completed 45000 out of 500000 steps (9%)
```
Happy days - we are up and running without root access 🙂
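To hand this off to the scheduler instead of holding an interactive allocation open, the same invocation drops straight into a batch script. A sketch, assuming the layout above (job name, node count and walltime are illustrative guesses; tune for your site):

```shell
# Write a Slurm batch wrapper for the client; submit it with: sbatch fah.sbatch
# The #SBATCH values below are placeholders -- adjust for your cluster.
cat > fah.sbatch <<'EOF'
#!/bin/bash
#SBATCH --job-name=fah
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --time=24:00:00

cd "$HOME/scratch/fah"
# Same command as the interactive run, minus --fork/--respawn/--service,
# so the client stays in the foreground and Slurm owns the process.
exec ./usr/bin/FAHClient ./etc/fahclient/config.xml \
    --run-as "$USER" \
    --pid-file=./var/run/fahclient.pid \
    --log=./fahclient.log
EOF
```

When the walltime runs out (or you scancel the job), Slurm signals the client; FAHClient checkpoints work units as it goes, so the next submission should pick up roughly where it left off.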
PS: Get permission from (or, if need be, bribe) your local HPC admin before you fill up Slurm with jobs!
(*As HPC fans will no doubt point out, this "supercomputer" is not really a supercomputer: it's more of a distributed architecture, which would probably be appalling at running tightly coupled workloads such as Linpack, and would be nowhere on the Top500 supercomputer ranks. However, it's still a really cool achievement and a testament to what's possible when a community pulls together.)