Difference between revisions of "Analysing Data on Armis or Great Lakes using Rmd Files"
Davebridges (Talk | contribs) (→Configuring R: added notes about creating R scripts)
Davebridges (Talk | contribs) (Added a slurm script.)
Revision as of 17:15, 10 July 2023
Software
On OSX
- Install Filezilla for file transfers https://filezilla-project.org/
On Windows
- Install Putty for command line access https://www.chiark.greenend.org.uk/~sgtatham/putty/latest.html
Access
At Michigan there are two clusters, one for human data (Armis) and one for non-PHI data (Great Lakes). Both of these are connected to a storage service called Turbo. These are available as part of the UM Research Computing Package.
Permissions
- For access to https://datadirect.precisionhealth.umich.edu/, you need to complete these two online classes and have the VPN installed on your device. For more details on access follow these instructions.
- PEERS - Human Subjects Research Protections
- HIPAA Training for Self Service Tools.
- You also need an approved IRB noting that you will use the data direct self-service tools.
- To get access to Armis2 fill out this form.
You can access the server either through the command line or, if you prefer a GUI, through a remote desktop. There is also limited functionality for Armis2 available here (see these instructions).
If you submit a deidentified data request via https://datadirect.precisionhealth.umich.edu, your data will be sent as a 7zip file to the turbo storage in a folder named for your IRB. You will be prompted to enter a password to protect this file.
Accessing the Server - Command Line
On OSX open a terminal and enter
ssh <UNIQNAME>@armis2.arc-ts.umich.edu
replacing <UNIQNAME> with your uniqname. Enter your level 1 password, then authenticate using Duo following the instructions. You will be placed in your home folder. Your data should be in
/nfs/turbo/precision-health/DataDirect/<YOUR-IRB>
I find it convenient to make a symlink to quickly navigate to the data folder. First, find your folder name:
ls /nfs/turbo/precision-health/DataDirect/
Locate the name of your folder (<FOLDER_NAME>), then create the symlink:
ln -s /nfs/turbo/precision-health/DataDirect/<FOLDER_NAME> <LINK_NAME>
Now you can navigate from your home folder to your data folder by typing
cd <LINK_NAME>
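The symlink steps above can be tried in a scratch directory before touching the real data folder. This is a minimal sketch under stated assumptions: `make_data_link` is a hypothetical helper, and `HUM00000000` and `mydata` are made-up stand-ins for <FOLDER_NAME> and <LINK_NAME>. It uses a temporary directory rather than the real Turbo path.

```shell
#!/bin/bash
# Sketch only: directory names are hypothetical stand-ins, not real Turbo paths.
set -euo pipefail

make_data_link() {
    # $1 = target data folder, $2 = name of the link to create
    local target="$1" link="$2"
    if [ ! -d "$target" ]; then
        echo "target folder not found: $target" >&2
        return 1
    fi
    # -s creates a symbolic link; -f and -n replace an existing link in place
    ln -sfn "$target" "$link"
}

scratch="$(mktemp -d)"
mkdir -p "$scratch/DataDirect/HUM00000000"                          # stand-in for <FOLDER_NAME>
make_data_link "$scratch/DataDirect/HUM00000000" "$scratch/mydata"  # stand-in for <LINK_NAME>
readlink "$scratch/mydata"                                          # prints the folder the link points to
```

On the server the same call would take the real folder under /nfs/turbo/precision-health/DataDirect/ as its first argument and a link name in your home folder as its second.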
Accessing the Server - Remote Desktop
Extracting Data
Submitting Scripts to the Server
Configuring R
Some R packages may need to be installed in your home folder. To do this, go to your home folder with cd ~/
and enter the following to load the R module, start an R shell, install the packages and exit. You only have to do this once for each R package your scripts need.
module load R
R
install.packages("PACKAGE_NAME")
q()
Follow the prompts agreeing to make a personal library and using the Michigan mirror (71)
Creating a Rmd Script
I prefer to generate Rmd scripts on my own computer and then transfer them to the data folder to run (see Transferring Files to move scripts onto the server).
Creating a Batch Script
To submit a job to the server you will need to create a batch script. The Armis2 and Great Lakes servers use Slurm to schedule jobs.
Details of how to configure a Slurm file can be found here. Each script can include multiple Rmd files. A sample script is below. Make sure to replace <USER> with your Slurm user id, and enter a name for the job and your email.
#!/bin/bash
#SBATCH --job-name=<JOB_NAME>
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=24G
#SBATCH --time=00:55:00
#SBATCH --account=<USER>
#SBATCH --partition=standard
#SBATCH --mail-user=<YOUR_EMAIL>
#SBATCH --mail-type=END,FAIL
#SBATCH --output=/home/%u/%x-%j.log
#SBATCH --error=/home/%u/error-%x-%j.log

module purge
module load R
module load RStudio

echo "Running from $(pwd)"

# requires 30 mins and 64G
Rscript -e "rmarkdown::render('<FIRST_SCRIPT>.Rmd')"
Rscript -e "rmarkdown::render('<SECOND_SCRIPT>.Rmd')"
Save this as a <JOB_NAME>.slurm file in your working folder on Turbo.
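Because the sample script contains several placeholders that must be replaced by hand, it may help to check the saved file before submitting it. The following is a minimal sketch, not part of Slurm: `check_slurm_script` and `submit_if_ready` are hypothetical helper names, and the check simply looks for any leftover `<UPPER_CASE>` token. The only Slurm command used is `sbatch`, which submits a batch script.

```shell
#!/bin/bash
# Sketch only: check_slurm_script and submit_if_ready are hypothetical helpers.

check_slurm_script() {
    # $1 = path to the .slurm file to inspect
    local script="$1"
    if [ ! -f "$script" ]; then
        echo "no such file: $script" >&2
        return 2
    fi
    # Print (with line numbers) any <UPPER_CASE> placeholder left unreplaced
    if grep -nE '<[A-Z_]+>' "$script"; then
        echo "unreplaced placeholders found in $script" >&2
        return 1
    fi
    echo "$script looks ready"
}

submit_if_ready() {
    # Submit only if the check passes and sbatch exists on this machine
    check_slurm_script "$1" || return $?
    if command -v sbatch >/dev/null 2>&1; then
        sbatch "$1"
    else
        echo "sbatch not found; run this on the cluster" >&2
    fi
}
```

After sourcing these functions on the server, `submit_if_ready <JOB_NAME>.slurm` would submit the job from your working folder, and `squeue -u $USER` lists your queued and running jobs.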