Analysing Data on Armis or Great Lakes using Rmd Files

From Bridges Lab Protocols

Revision as of 17:15, 10 July 2023

Software

On OSX

On Windows

Access

At Michigan there are two clusters, one for human data (Armis) and one for non-PHI data (Great Lakes). Both of these are connected to a storage service called Turbo. These are available as part of the UM Research Computing Package.

Permissions


You can access the server either through the command line or, if you prefer a GUI, through a remote desktop. Limited Armis2 functionality is also available here (see these instructions).

If you submit a deidentified data request via https://datadirect.precisionhealth.umich.edu, your data will be sent as a 7zip file to the turbo storage in a folder named for your IRB. You will be prompted to enter a password to protect this file.

Accessing the Server - Command Line

On OSX, open a terminal and enter ssh <UNIQNAME>@armis2.arc-ts.umich.edu, replacing <UNIQNAME> with your uniqname. Enter your level 1 password, then authenticate with Duo following the instructions. You will be placed in your home folder. Your data should be in /nfs/turbo/precision-health/DataDirect/<YOUR-IRB>. I find it convenient to make a symlink so that I can navigate quickly to the data folder. First, find your folder name:

ls /nfs/turbo/precision-health/DataDirect/

Locate the name of your folder (<FOLDER_NAME>) in the listing.

Create the symlink:

ln -s /nfs/turbo/precision-health/DataDirect/<FOLDER_NAME> <LINK_NAME>

Now you can navigate from your home folder to your data folder by typing

cd <LINK_NAME>
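
The symlink steps above can also be wrapped in a small helper that checks the folder exists before linking. This is only a minimal sketch and not part of the protocol itself: the function name link_data_folder and its three arguments are hypothetical, and the base path is passed in rather than hard-coded.

```shell
#!/bin/bash
# Hypothetical helper: link a data folder to a convenient location.
# Usage: link_data_folder <BASE_DIR> <FOLDER_NAME> <LINK_PATH>
# e.g.   link_data_folder /nfs/turbo/precision-health/DataDirect MY_FOLDER ~/mydata
link_data_folder() {
    local base_dir="$1" folder_name="$2" link_path="$3"
    local target="${base_dir%/}/${folder_name}"

    # Refuse to link to a folder that does not exist (e.g. a mistyped name)
    if [ ! -d "$target" ]; then
        echo "No such folder: $target" >&2
        return 1
    fi

    # Do not overwrite an existing file, folder or link
    if [ -e "$link_path" ] || [ -L "$link_path" ]; then
        echo "Already exists: $link_path" >&2
        return 1
    fi

    # -s creates a symbolic link rather than a hard link
    ln -s "$target" "$link_path"
    echo "Linked $link_path -> $target"
}
```

Checking the target first means a mistyped folder name is reported immediately instead of leaving a dangling link in your home folder.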

Accessing the Server - Remote Desktop

Extracting Data

Submitting Scripts to the Server

Configuring R

Some R packages may need to be installed in your home folder. To do this, go to your home folder (cd ~/) and enter the following to load the R module, start an R shell, install the packages and exit. You only have to do this once for each R package your scripts use.

module load R
R
install.packages("PACKAGE_NAME")
q()

Follow the prompts, agreeing to create a personal library and selecting the Michigan mirror (71).
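
If you would rather avoid the interactive prompts, the same install can probably be done in one non-interactive command. The sketch below is an assumption-laden alternative, not the method described above: the helper names r_install_expr and install_r_package are hypothetical, and it uses the https://cloud.r-project.org mirror by default instead of the Michigan mirror so that no mirror prompt appears.

```shell
#!/bin/bash
# Hypothetical helper: build the R expression that installs one package
# into the personal library (the location R reports as R_LIBS_USER).
# ASSUMPTION: cloud.r-project.org is the default mirror here; pass a second
# argument to use a different one.
r_install_expr() {
    local pkg="$1"
    local repo="${2:-https://cloud.r-project.org}"
    printf 'lib <- Sys.getenv("R_LIBS_USER"); dir.create(lib, recursive = TRUE, showWarnings = FALSE); install.packages("%s", lib = lib, repos = "%s")' "$pkg" "$repo"
}

# Hypothetical wrapper: run the install only when Rscript is available,
# i.e. after `module load R` on the cluster.
install_r_package() {
    if ! command -v Rscript >/dev/null 2>&1; then
        echo "Rscript not found; run 'module load R' first" >&2
        return 1
    fi
    Rscript -e "$(r_install_expr "$@")"
}
```

On the server this would be used as install_r_package PACKAGE_NAME after module load R. Creating the personal library directory up front is what removes the need to answer the library prompt.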

Creating a Rmd Script

I prefer to generate Rmd scripts on my own computer and then transfer them to the data folder to run (see Transferring Files to move scripts onto the server).

Creating a Batch Script

To submit a job to the server you will need to create a batch script. The Armis2 and Great Lakes servers use Slurm to schedule jobs.

Details of how to configure a slurm file can be found in the Armis2 Slurm user guide (https://arc.umich.edu/armis2/slurm-user-guide). Each script can include multiple Rmd files. A sample script is below. Make sure to replace <USER> with your slurm user id, <JOB_NAME> with a name for the job and <YOUR_EMAIL> with your email.

#!/bin/bash
#SBATCH --job-name=<JOB_NAME>
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=24G
#SBATCH --time=00:55:00
#SBATCH --account=<USER>
#SBATCH --partition=standard
#SBATCH --mail-user=<YOUR_EMAIL>
#SBATCH --mail-type=END,FAIL
#SBATCH --output=/home/%u/%x-%j.log
#SBATCH --error=/home/%u/error-%x-%j.log

# Start from a clean environment, then load the modules needed to render
module purge
module load R
module load RStudio

# Record the working directory in the job log
echo "Running from $(pwd)"

# requires 30 mins and 64G (adjust --time and --mem-per-cpu above if your job needs more than requested)
Rscript -e "rmarkdown::render('<FIRST_SCRIPT>.Rmd')"
Rscript -e "rmarkdown::render('<SECOND_SCRIPT>.Rmd')"

Save this as <JOB_NAME>.slurm in your working folder on Turbo.
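
If you run many analyses, it may be easier to generate the batch file than to edit it by hand each time. The following is a minimal sketch under stated assumptions: the function write_slurm_script and its argument order are hypothetical, and the resource values are simply copied from the sample script above, so adjust them to fit your own job.

```shell
#!/bin/bash
# Hypothetical helper: write <JOB_NAME>.slurm in the current folder.
# Usage: write_slurm_script <JOB_NAME> <ACCOUNT> <EMAIL> <FILE.Rmd>...
write_slurm_script() {
    local job_name="$1" account="$2" email="$3"
    shift 3
    local out="${job_name}.slurm"

    # Header: the same directives as the sample script.
    # \$(pwd) is escaped so that it is evaluated when the job runs,
    # not when this file is generated.
    cat > "$out" <<EOF
#!/bin/bash
#SBATCH --job-name=${job_name}
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=24G
#SBATCH --time=00:55:00
#SBATCH --account=${account}
#SBATCH --partition=standard
#SBATCH --mail-user=${email}
#SBATCH --mail-type=END,FAIL
#SBATCH --output=/home/%u/%x-%j.log
#SBATCH --error=/home/%u/error-%x-%j.log

module purge
module load R
module load RStudio

echo "Running from \$(pwd)"

EOF

    # One render line per Rmd file, in the order given
    local rmd
    for rmd in "$@"; do
        printf 'Rscript -e "rmarkdown::render('\''%s'\'')"\n' "$rmd" >> "$out"
    done

    # Print the name of the file that was written
    echo "$out"
}
```

Once the file is saved, it can be submitted from the working folder with sbatch <JOB_NAME>.slurm, and squeue -u <UNIQNAME> lists your queued and running jobs; both are standard Slurm commands.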

Transferring Files