Storage

There are four types of storage directly connected to the cluster. All directories are mounted on the submit nodes and all compute nodes. Each directory is assigned an environment variable at login that you may use in your scripts.

NOT BACKED UP: No data on any of these volumes is backed up. We recommend using Research File Storage or Research Archive Storage for backups.

Tailing output files - Slow down
Following the tail of output files (e.g., with tail -f) may cause your job to run slower.

To see any of the information below, run the command crctool.

Home ($HOME)
    Quota: 100 GB and 100,000 files
    Purpose: Personal storage assigned to every user.

Work ($WORK)
    Quota: Based on group allocation (use crctool to see usage)
    Purpose: Shared storage for a research group to collaborate within the group. Store raw data sets here.

Scratch ($SCRATCH)
    Quota: 85 TB total, shared between all users
    Purpose: Temporary storage used to process raw data sets. Subject to a 60-day purge with notice.

Temporary ($TEMP30)
    Quota: 140 TB total, shared between all users
    Purpose: Temporary storage used to process raw data sets. Subject to a 30-day purge with no notice. (Older hardware)

You can use $HOME, $WORK, $SCRATCH, and $TEMP30 in your submit scripts to make it easier to get around the file systems, as in the sketch below.
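For example, a minimal submit-script sketch, assuming the Slurm scheduler (the job name, file names, and processing step are hypothetical placeholders):

#!/bin/bash
#SBATCH --job-name=process-data
#SBATCH --time=01:00:00

# Stage a raw data set from group storage into scratch for processing
# (input.dat and process.sh are placeholder names)
cp $WORK/input.dat $SCRATCH/
cd $SCRATCH
./process.sh input.dat > results.out

# Copy results back to group storage before the scratch purge
cp results.out $WORK/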


Transfer Data

The Data Transfer Node (dtn.ku.edu) provides access to files on Research File Storage (ResFS), Research Archive Storage (RAS), and the KU Community Cluster.

SCP/SFTP
    Purpose: Transfer small data sets to and from the cluster.
    KU Anywhere required: Yes. If the source or destination is off-campus, you must use KU Anywhere.
    Notes: You must keep the connection open while transferring.

Globus
    Purpose: Transfer large data sets between storage and the world. Share data sets with anyone to be downloaded or uploaded.
    KU Anywhere required: No. Uses a web application.
    Notes: Can transfer data outside of KU easily. The destination must also use the Globus software.
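For example, a minimal SCP sketch (username, groupname, and the file names are placeholders):

# Copy a local file to your work directory via the Data Transfer Node
scp data.tar.gz username@dtn.ku.edu:/panfs/pfs.local/work/groupname/username/

# Copy a results file from the cluster back to the current local directory
scp username@dtn.ku.edu:/panfs/pfs.local/work/groupname/username/results.out .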

Path to access files

Storage                 Path
KU Community Cluster    /panfs/pfs.local
ResFS                   /resfs/GROUPS
RAS                     /resfs/ARCHIVE
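Because all three paths are accessible from the Data Transfer Node, you can move data between them with an ordinary copy; a sketch (groupname, username, and dataset.tar.gz are placeholders):

# While logged in to dtn.ku.edu, copy a data set from ResFS group
# storage into your cluster work directory
cp /resfs/GROUPS/groupname/dataset.tar.gz /panfs/pfs.local/work/groupname/username/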

Quota

Each type of storage has an enforced quota. To determine how much of your quota you are using on each of these volumes, log in to the cluster and run crctool.

This will produce output similar to the following:

------------------------------- Storage Variables ------------------------------
| Variable     Path                                                            |
| $HOME        /home/username                                                  |
| $WORK        /panfs/pfs.local/work/groupname/username                        |
| $SCRATCH     /panfs/pfs.local/scratch/groupname/username                     |
| $TEMP30      /temp/30day/groupname/username                                  |
--------------------------------------------------------------------------------

--------------------------------- Disk Quotas ----------------------------------
| Disk         Usage (GB)     Limit    %Used   File Usage       Limit   %Used  |
| $HOME             33.51    100.00    33.51        99054      100000   99.05  |
| $WORK           6436.80  13969.84    46.08       296488           0       0  |
| $SCRATCH       39533.15  55879.35    70.75            1           0       0  |
| $TEMP30            0.04 167605.89     0.00                        0       0  |
--------------------------------------------------------------------------------

Violation

Users will receive an email from the Panasas File System when a quota for $HOME, $WORK, or $SCRATCH has been exceeded (Hard Quota) or is about to be exceeded (Soft Quota). All of these emails start with the information below:

PanActive Manager Warning: User Quota Violation Soft (bytes)

Date:        Wed Feb 07 00:00:16 CST 2018           
System Name: <name>                              
System IP:   <ip range>
Version      <version>                    
Customer ID: <custid>           
  • Soft Quota: A warning email is sent to the user that the specified resource is about to exceed its size or file-count quota. The first example below is for a user's $HOME that has crossed the 85 GB size threshold.

    User Quota Violation Soft (bytes):  Limit reached on volume /home for Unix User:   (Id:  uid:<uid>) Limit = 85.00 GB.
    
    The above message applies to the following component:
        Volume: /home

    This example is for a Soft Quota Violation of the file-count limit in $HOME:

    User Quota Violation Soft (files):  Limit reached on volume /home for Unix User:   (Id:  uid:<uid>) Limit = 85.00 K.
    
    The above message applies to the following component:
        Volume: /home 
  • Hard Quota: The maximum allotted space has been reached for that volume. This could be any of the locations above, including the system-wide $SCRATCH. No further writes are allowed, and you must remove files before creating any new ones.

    User Quota Violation Hard (bytes):  Limit reached on volume /home for Unix User:   (Id:  uid:<uid>) Limit = 100.00 GB.  No further writes allowed for this Unix User in this volume unless it has at most 95.00 GB of data.
    
    The above message applies to the following component:
        Volume: /home

When you have removed enough files to drop below the Hard or Soft Quota threshold, an Event CLEARED email will be sent to you. At the top of the email, you will see the following:

Event CLEARED: PanActive Manager Warning: User Quota Violation Hard (bytes)
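To find what is consuming your quota, standard tools such as du and find can help; a minimal sketch (the size thresholds and list length are arbitrary examples):

# Show per-directory usage under your home directory, largest first
du -sh $HOME/* | sort -rh | head -20

# List individual files over 1 GB in your work directory
find $WORK -type f -size +1G -exec ls -lh {} \;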

Recovering your files

One of the features of our cluster filesystem is the concept of snapshots. Snapshots are a daily capture of the files in a given directory. All snapshots are user accessible, but only for volumes owned by a group the user is part of. Snapshots are read-only, but if you accidentally delete a file, you can retrieve it from a snapshot for up to seven days.

Snapshots are stored in the .snapshot directory in the root of your work or home directory. This directory is hidden and won't be displayed in listings (ls) of that directory. Snapshots are captured for $HOME and $WORK directories but not $SCRATCH.

For example, say you're working in your work directory (i.e., /panfs/pfs.local/work/groupname/username) and you accidentally delete a file named oops.txt. To restore that file from a previous snapshot, navigate to the .snapshot directory for your group's work; there you will find directories containing snapshots from the past seven days. Each of these directories contains a file structure similar to that of /panfs/pfs.local/work/groupname, holding the files as they existed when that snapshot was taken. You can navigate into those directories and copy the file(s) you accidentally deleted back to your work directory:

# List the available snapshots for your group's work directory
cd /panfs/pfs.local/work/groupname/.snapshot
ls
# Enter the snapshot from the date you need, then your own directory
cd date-of-snapshot.automatic
cd username
# Copy the deleted file back to your work directory
cp oops.txt /panfs/pfs.local/work/groupname/username

If a particular file was heavily modified, the snapshot may not contain the most recent changes, but it will have the files as they existed when that day's snapshot was taken.

Snapshots of home directories can also be found in

/home/.snapshot/date-of-snapshot.automatic/username

Due to the way that directory is set up, you cannot ls inside the date-of-snapshot.automatic directory; instead, you must go directly to your own home directory as shown above.
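For example, to restore the hypothetical oops.txt from a home-directory snapshot, copy it by its full path:

# Copy the deleted file from a dated snapshot back into your home directory
cp /home/.snapshot/date-of-snapshot.automatic/username/oops.txt $HOME/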

Snapshots are on a rolling seven-day purge, so if you accidentally delete a file, you will need to restore it within seven days or it will be gone forever.


Research File Storage

Research File Storage (ResFS) provides easily accessible file sharing services to KU research projects, research groups and service labs. Any tenure track faculty member, designated research center, department or principal investigator on a research grant can purchase ResFS storage.

Stored data is accessible within the KU network or remotely via the KU Anywhere virtual private network (VPN). ResFS is protected by a RAID system. Data that is inadvertently deleted or otherwise lost can be recovered through self-service snapshots. A Snapshot is a 'frozen,' read-only view of a volume that provides easy access to previous versions of files and directories for up to 30 days. To recover data, follow our Knowledge Base articles for PC or Mac. ResFS data is also mirrored in the KUMC datacenter for major disaster recovery purposes only.

Research File Storage cost:

The Office of Research subsidizes ResFS, lowering the cost for researchers, faculty, departments, research projects and principal investigators.

250 gigabytes (GB):
No cost for the first 250GB.

More than 250GB:
Additional storage can be purchased in 1TB increments at a cost of $75 per terabyte, per year. Minimum purchase is 1TB.

Request New Research File Storage or Increase to Existing Storage

For a new storage allotment, please complete the Research File Storage Request Form in ServiceNow. For an increase to your current storage allotment, please complete the ServiceNow Research File Storage Increase Request Form.


Cluster Support

If you need any help with the cluster or have general questions related to the cluster, please contact crchelp@ku.edu.

In your email, please include your submission script, any relevant log files, and the steps you took to reproduce the problem.
