Storage/Data
You can store your files in your home, work, and/or scratch directories. All directories are mounted on both submit nodes and all compute nodes. Each directory is assigned a variable when you log in that you can use for quicker access to that space and to reference in your scripts.
NOT BACKED UP: No data, anywhere, is backed up. We recommend using Research File Storage if you need your data backed up.
Tailing output - Slow down
Due to the nature of the parallel file system used for the cluster, tailing the output of a job could cause the job to perform slower.
This is due to the file system having to update your tail as well as accept the writes from the job.
If your job is mostly CPU intensive, you may not notice a difference, but if it is I/O intensive, you may see a performance drop.
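If you only need an occasional look at the output, a one-shot `tail` avoids holding the file open. A minimal sketch (the output file name `slurm-12345.out` is hypothetical):

```
# Print the last 20 lines once, then release the file
tail -n 20 slurm-12345.out

# Poll at a gentle interval (every 300 seconds) instead of
# holding the file open continuously with tail -f
watch -n 300 tail -n 20 slurm-12345.out
```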
To see any of the information below, run the command `crctool`.
- Home Directory (`$HOME`): Your home directory is a place only you have access to. If you wish to share your data, see the `$WORK` directory below. You have a 100 GB disk and 100,000 file limit quota. The path to your home directory is (replace `username` with your KU Online ID): `/home/username`
- Work Directory (`$WORK`): Your work directory is a shared space for you to collaborate with your group. 1 TB is given for free to all owner groups. The quota is based on how much storage the owner of the group has purchased, directly or indirectly, through purchasing nodes. Additional `$WORK` space can be purchased for $100 per TB per year. To find your `$WORK` directory, run the command `crctool`. The path to your `$WORK` directory is: `/panfs/pfs.local/work/groupname/username`
- Scratch Directory (`$SCRATCH`): Your scratch directory is a temporary space for your data processing. The quota for scratch is a finite amount, but it is set for the whole volume, which CRC staff maintain. Files 60 or more days old will be deleted (see the sketch after this list for checking file ages). To find your `$SCRATCH` directory, run the command `crctool`. The path to your `$SCRATCH` directory is: `/panfs/pfs.local/scratch/groupname/username`
- 30 Day Temp Directory (`$TEMP`): Your temp directory is a temporary space for your data processing. The quota for temp is a finite amount, but it is set for the whole volume, which CRC staff maintain. The temp storage system is off warranty, and if broken beyond repair it will be recycled. Files 30 or more days old will be deleted daily. To find your `$TEMP` directory, run the command `crctool`. The path to your `$TEMP` directory is: `/temp/30day/groupname/username`
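Because `$SCRATCH` and `$TEMP` are purged by age, it can be useful to see which of your files are approaching the cutoff before the purge runs. A minimal sketch using standard `find` options (the purge itself may use different criteria; this approximates it with modification time):

```
# Files in $SCRATCH not modified in the last 60 days (purge candidates)
find $SCRATCH -type f -mtime +60

# Same idea for $TEMP and its 30-day window
find $TEMP -type f -mtime +30
```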
You can use `$HOME`, `$WORK`, `$SCRATCH`, and `$TEMP` in your submit scripts to make it easier to get around the file systems.
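For illustration, here is a minimal sketch of a submit script that stages data through these variables. It assumes the SLURM scheduler, and the program and file names (`my_program`, `input.dat`, `results.dat`) are hypothetical:

```
#!/bin/bash
#SBATCH --job-name=stage-example
#SBATCH --time=01:00:00

# Stage input from the group's work space and run out of scratch,
# keeping heavy I/O off $HOME and $WORK
cd $SCRATCH
cp $WORK/input.dat .

# Hypothetical program; writes its output to scratch
./my_program input.dat > results.dat

# Copy results back to $WORK before the 60-day scratch purge removes them
cp results.dat $WORK/
```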
Quota
Location | Hard Quota | Hard File Limit
---|---|---
$HOME | 100 GB | 100,000
$WORK | Varies | No Limit
$SCRATCH | 85 TB | No Limit
$TEMP | 164 TB | No Limit
Each of the above locations, `$HOME`, `$WORK`, `$SCRATCH`, and `$TEMP`, has an enforced quota. To determine how much of your quota you are using for each of these volumes, log in to the cluster and run `crctool`.
This will produce output similar to the following:
```
------------------------------- Storage Variables ------------------------------
| Variable   Path                                                              |
| $HOME      /home/username                                                    |
| $WORK      /panfs/pfs.local/work/groupname/username                          |
| $SCRATCH   /panfs/pfs.local/scratch/groupname/username                       |
| $TEMP      /temp/30day/groupname/username                                    |
--------------------------------------------------------------------------------
--------------------------------- Disk Quotas ----------------------------------
| Disk       Usage (GB)   Limit       %Used   File Usage   Limit    %Used     |
| $HOME      33.51        100.00      33.51   99054        100000   99.05     |
| $WORK      6436.80      13969.84    46.08   296488       0        0         |
| $SCRATCH   39533.15     55879.35    70.75   1            0        0         |
| $TEMP      0.04         167605.89   0.00    0            0        0         |
--------------------------------------------------------------------------------
```
Violation
Users will receive an email from the Panasas File System when a quota for `$HOME`, `$WORK`, or `$SCRATCH` has been exceeded (Hard Quota) or is about to be exceeded (Soft Quota). These emails all start with the information below:
```
PanActive Manager Warning: User Quota Violation Soft (bytes)
Date: Wed Feb 07 00:00:16 CST 2018
System Name: <name>
System IP: <ip range>
Version <version>
Customer ID: <custid>
```
- Soft Quota: A warning email is sent to the user that the specified resource is about to exceed its size or file limit quota. The example below is for a user's `$HOME` that has crossed the 85 GB threshold:

```
User Quota Violation Soft (bytes): Limit reached on volume /home for Unix User: (Id: uid:<uid>)
Limit = 85.00 GB.
The above message applies to the following component: Volume: /home
```

This example is for a Soft Quota violation of the file limit in `$HOME`:

```
User Quota Violation Soft (files): Limit reached on volume /home for Unix User: (Id: uid:<uid>)
Limit = 85.00 K.
The above message applies to the following component: Volume: /home
```
- Hard Quota: The maximum allotted space has been reached for that volume. This could be for any of the locations above, including the system-wide `$SCRATCH` volume. No further writes are allowed, and you must remove files before creating any new ones:

```
User Quota Violation Hard (bytes): Limit reached on volume /home for Unix User: (Id: uid:<uid>)
Limit = 100.00 GB.
No further writes allowed for this Unix User in this volume unless it has at most 95.00 GB of data.
The above message applies to the following component: Volume: /home
```
When you have removed enough files to drop below the Hard or Soft Quota violation, an Event CLEARED email will be sent to you. At the top of the email, you will notice the below:
```
Event CLEARED: PanActive Manager Warning: User Quota Violation Hard (bytes)
```
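When a quota is exceeded and you need to find what to remove, standard tools can show where the space and file counts are going. A minimal sketch for `$HOME`:

```
# Space used by each top-level item in your home directory, largest last
du -sh $HOME/* | sort -h

# Individual files larger than 1 GB
find $HOME -type f -size +1G

# Total file count, to compare against the 100,000-file limit
find $HOME -type f | wc -l
```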
Transferring Files
Files on Research File Storage (ResFS) and the KU Community Cluster may be accessed using the Data Transfer Node (`dtn.ku.edu`). Beginning paths to access files:

- ResFS: `/resfs/GROUPS/`
- RAS: `/resfs/ARCHIVE/`
- KU Community Cluster: `/panfs/pfs.local/`
Globus
Non Cluster Users: You will receive a Permission Denied warning when accessing the KU Data Transfer Node if you do not have an account on the KU Community Cluster. This is due to not having a home directory set up. You may ignore this and input the path to your files on `/resfs/GROUPS`, or we can create a home directory for you if you email crchelp@ku.edu.
Globus is a mechanism to transfer files between managed endpoints and personal endpoints. It does not store any data. Globus works through your internet browser, so therefore works on Windows, Linux, and MacOS.
To start using Globus, navigate to the Globus File Manager in your browser. From there, choose University of Kansas as your institution. You will be redirected to login with your KU Online ID.
Follow the Globus getting started documentation to access the KU Data Transfer Node. Instead of the Globus Tutorial Endpoint, search for KU and choose KU Data Transfer Node.
Transfer files to desktop or laptop
To transfer files from the KU Data Transfer Node to your desktop or laptop, you first need to create a personal endpoint on that machine.
Full documentation for setting up a Globus Connect Personal endpoint.
On the File Manager page, choose the two endpoints you'd like to copy between. The cluster endpoint is named KU Data Transfer Node. The other endpoint will be the name you gave in the previous step. You can now drag and drop the files you wish to transfer.
If you wish to share data from your personal endpoint or transfer data between two personal endpoints, you will need to request access to the CRC Globus group.
Transfer between ResFS, RAS, and KU Community Cluster
On each panel under the File Manager, choose the KU Data Transfer Node endpoint. You can then navigate to ResFS on one panel and the KU Community Cluster storage on the other. Simply drag and drop the files or folders you wish to transfer.
Sharing Data
If you are not on the KU Community Cluster and you wish to share data, please send an email to crchelp@ku.edu to get your account set up.
The recipient will need to set up Globus Connect Personal if their institution does not have an endpoint to transfer the data to. The Globus documentation on sharing has a great walkthrough of the steps to share data.
SCP
You must be on KU's network or connected to KU Anywhere to access the Data Transfer Node (DTN).
The KU Community Cluster supports SCP, SFTP, and Rsync for transferring files:

- Windows: Filezilla
- Macintosh and Linux: Filezilla, or use command line SCP
```
Host: dtn.ku.edu
Port: 22

scp username@host1:~/file1 username@host2:~/file1_copy
```
For example, to copy a file from your home directory on your local computer (e.g., `~/foo.txt`) to your home directory on the HPC, enter on the command line (replace `username` with your KU Online ID username):

```
scp ~/foo.txt username@dtn.ku.edu:~/foo.txt
```
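To copy an entire directory, `scp` accepts the standard `-r` (recursive) flag; the `~/project` directory here is hypothetical:

```
scp -r ~/project username@dtn.ku.edu:~/project
```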
Rsync
Rsync is a command-line tool for Linux and Mac only. It is used to transfer files back and forth from the Terminal and comes with no GUI.
You must be on KU's network or connected to KU Anywhere to access the Data Transfer Node (DTN).
The KU Community Cluster supports SCP, SFTP and Rsync for transferring files:
```
Host: dtn.ku.edu
Port: 22

rsync -avP username@host1:~/file1 username@host2:~/file1_copy
```
For example, to copy a file from your home directory on your local computer (e.g., `~/foo.txt`) to your home directory on the HPC, enter on the command line (replace `username` with your KU Online ID username):

```
rsync -avP ~/foo.txt username@dtn.ku.edu:~/foo.txt
```
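Rsync works in the other direction as well. For example, to pull a hypothetical `results` directory from your cluster home directory back to your local machine:

```
rsync -avP username@dtn.ku.edu:~/results/ ~/results/
```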
Recovering your files
One of the features of our cluster filesystem is the concept of snapshots. Snapshots are a daily capture of the files in a given directory. All snapshots are user accessible, but only for volumes owned by a group the user is part of. Snapshots are read-only, but if you accidentally delete a file, you can use them to retrieve that file for up to seven days afterward.
Snapshots are stored in the `.snapshot` directory in the root of your work or home directory, but this directory is hidden and won't be displayed in listings (`ls`) of that directory. Snapshots are captured for `$HOME` and `$WORK` directories but not `$SCRATCH`.
For example, say you're working in your work directory (i.e. `/panfs/pfs.local/work/groupname/username`) and you accidentally delete a file named `oops.txt`. To restore that file from a previous snapshot, navigate to the `.snapshot` directory for your group's work space, where you will find directories containing snapshots from the past seven days. Each of these directories contains a file structure similar to that of `/panfs/pfs.local/work/groupname`, with a snapshot of what was in those files when the snapshot was taken. You can navigate into those directories and copy the file(s) you accidentally deleted back to your work directory.
```
cd /panfs/pfs.local/work/groupname/.snapshot
ls
cd date-of-snapshot.automatic
cd username
cp oops.txt /panfs/pfs.local/work/groupname/username
```
If one particular file was heavily modified, the snapshot may not contain the most recent changes, but it will have the files as they were in those directories when that day's snapshot was taken.
Snapshots of home directories can also be found in `/home/.snapshot/date-of-snapshot.automatic/username`. Due to the way that directory is set up, you cannot `ls` inside the `date-of-snapshot.automatic` directory; instead, you must go directly to your own home directory as shown above.
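For example, to restore a hypothetical `oops.txt` from a home directory snapshot, copy it directly by its full path:

```
cp /home/.snapshot/date-of-snapshot.automatic/username/oops.txt /home/username/
```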
Snapshots are on a rolling seven day purge, so if you accidentally delete a file you will need to restore it within seven days or it will be gone forever.
Research File Storage
Research File Storage (ResFS) provides easily accessible file sharing services to KU research projects, research groups and service labs. Any tenure track faculty member, designated research center, department or principal investigator on a research grant can purchase ResFS storage.
Stored data is accessible within the KU network or remotely via the KU Anywhere virtual private network (VPN). ResFS is protected by a RAID system. Data that is inadvertently deleted or otherwise lost can be recovered through self-service snapshots. A Snapshot is a 'frozen,' read-only view of a volume that provides easy access to previous versions of files and directories for up to 30 days. To recover data, follow our Knowledge Base articles for PC or Mac. ResFS data is also mirrored in the KUMC datacenter for major disaster recovery purposes only.
Research File Storage cost:
The Office of Research subsidizes ResFS, lowering the cost for researchers, faculty, departments, research projects and principal investigators.
- 250 gigabytes (GB): No cost for the first 250 GB.
- More than 250 GB: Additional storage can be purchased in 1 TB increments at a cost of $75 per terabyte, per year. Minimum purchase is 1 TB.
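For example, a group needing 3 TB beyond the free 250 GB would purchase 3 TB at 3 × $75 = $225 per year.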
Request New Research File Storage or Increase to Existing Storage
For a new storage allotment or for an increase in your current storage allotment, please complete the Research File Storage Request Form in ServiceNow.