Storage


There are three types of storage directly connected to the cluster. All directories are mounted on both submit nodes and all compute nodes. When you log in, each directory is assigned an environment variable that you can use in your scripts.

NOT BACKED UP: No data on any of these storage types is backed up. If you need backups, we recommend using the Research File Storage.

To see any of the information below, run the command crctool.

 
Name | Quota | Variable | Purpose
Home | 100 GB; 1,000,000 files | $HOME | Personal storage assigned to every user.
Work | Based on group allocation; use crctool to see usage | $WORK | Shared storage for collaboration within a research group. Store raw data sets here.
Scratch | 85 TB total, shared between all users | $SCRATCH | Temporary storage used to process raw data sets. Subject to a 60-day purge with notice.
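
Because $SCRATCH is purged, it can help to check which of your files are getting old. The sketch below is only an illustration: old_scratch_files is a hypothetical helper name, and it assumes the purge is based on file modification time, which this page does not state.

```shell
# Hypothetical helper: list files under a directory (default: $SCRATCH)
# that have not been modified in more than N days (default: 50).
# Assumption: the purge looks at modification time; this page does not say.
old_scratch_files() {
    local days="${1:-50}"
    local root="${2:-$SCRATCH}"
    find "$root" -type f -mtime +"$days" -print
}
```

Running old_scratch_files with no arguments would list files in $SCRATCH older than 50 days, leaving some margin before a 60-day cutoff.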

You can use $HOME, $WORK, and $SCRATCH in your submit scripts to make it easier to get around the file systems.
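
As one possible use of these variables, the sketch below copies a data set from $WORK into a fresh run directory under $SCRATCH. The function name stage_to_scratch and the run.XXXXXX directory naming are hypothetical, not something the cluster provides.

```shell
# Hypothetical helper: copy a data set from $WORK into a new run
# directory under $SCRATCH and print the path of that directory.
stage_to_scratch() {
    local dataset="$1"    # path relative to $WORK
    local run_dir
    # mktemp creates a uniquely named directory so runs do not collide.
    run_dir="$(mktemp -d "$SCRATCH/run.XXXXXX")" || return 1
    cp -r "$WORK/$dataset" "$run_dir/" || return 1
    printf '%s\n' "$run_dir"
}
```

In a submit script this might be called as run_dir="$(stage_to_scratch raw.csv)" followed by cd "$run_dir", where raw.csv stands in for your own data set.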

All owner groups receive 1 TB of $WORK storage for free. Additional $WORK space can be purchased for $100 per TB per year. Invoices for storage are sent one year after the storage's start date.
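
The pricing above works out as simple arithmetic. The sketch below assumes storage is billed in whole-TB increments, which this page does not confirm; work_cost_per_year is a hypothetical name.

```shell
# Yearly $WORK cost in dollars for a group allocation of N TB.
# Assumption: storage is billed in whole-TB increments.
work_cost_per_year() {
    local total_tb="$1"
    local free_tb=1     # first TB is free for owner groups
    local rate=100      # dollars per additional TB per year
    local billable=$(( total_tb > free_tb ? total_tb - free_tb : 0 ))
    echo $(( billable * rate ))
}
```

For example, work_cost_per_year 5 prints 400: four billable TB at $100 each.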


Transfer Data

Files on the Research File Storage (ResFS) and the KU Community Cluster can be accessed through the Data Transfer Node (dtn.ku.edu).

 
Name | Purpose | KU Anywhere Required? | Notes
SCP / SFTP | Transfer small data sets to and from the cluster. | Yes | If the source or destination is off-campus, you must use KU Anywhere. You must keep the connection open while transferring.
Globus | Transfer large data sets between storage and the world. Share data sets with anyone for download or upload. | No | Uses a web application. Can transfer data outside of KU easily. The destination must use Globus software.

Path to Access Files

Storage | Path
KU Community Cluster | /panfs/pfs.local
ResFS | /resfs/GROUPS
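
Putting the host name and paths above together, an SCP upload to your work directory might look like the command built below. This is a sketch: dtn_upload_cmd is a hypothetical helper that only prints the command, and groupname and username are placeholders as elsewhere on this page.

```shell
# Print (not run) an scp command that uploads a local file to a user's
# work directory through the Data Transfer Node.
dtn_upload_cmd() {
    local file="$1" user="$2" group="$3"
    printf 'scp %s %s@dtn.ku.edu:/panfs/pfs.local/work/%s/%s/\n' \
        "$file" "$user" "$group" "$user"
}
```

Printing the command first lets you check the destination path before starting a transfer that must stay connected until it finishes.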

Quota

Each type of storage has an enforced quota. To determine how much of your quota you are using for each of these volumes, log in to the cluster and run crctool.

This will produce output similar to the following:

------------------------------- Storage Variables ------------------------------
| Variable     Path                                                            |
| $HOME        /home/username                                                  |
| $WORK        /panfs/pfs.local/work/groupname/username                        |
| $SCRATCH     /panfs/pfs.local/scratch/groupname/username                     |
--------------------------------------------------------------------------------

--------------------------------- Disk Quotas ----------------------------------
| Disk         Usage (GB)     Limit    %Used   File Usage       Limit   %Used  |
| $HOME             33.51    100.00    33.51        99054      100000   99.05  |
| $WORK           6436.80  13969.84    46.08       296488           0       0  |
| $SCRATCH       39533.15  55879.35    70.75            1           0       0  |
--------------------------------------------------------------------------------
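
If you want to check this output from a script, the Disk Quotas rows can be filtered with awk. The sketch below assumes the column layout shown above stays the same; quota_over is a hypothetical helper name.

```shell
# Read crctool-style output on stdin and print each disk whose space or
# file-count %Used is at or above a threshold (default: 85).
# Output columns: variable, space %Used, file %Used.
quota_over() {
    local threshold="${1:-85}"
    awk -v t="$threshold" '
        # Quota rows start with "|", then a $VARIABLE, then numeric usage.
        $1 == "|" && $2 ~ /^\$/ && $3 ~ /^[0-9.]+$/ {
            if (($5 + 0) >= (t + 0) || ($8 + 0) >= (t + 0))
                print $2, $5, $8
        }
    '
}
```

It would be used as crctool | quota_over 85. With the sample output above, this prints the $HOME row, because its file usage is at 99.05 percent.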

Violation

Users will receive an email from the Panasas File System when a quota for $HOME, $WORK, or $SCRATCH has been exceeded (Hard Quota) or is about to be exceeded (Soft Quota). These emails all start with the information below:

PanActive Manager Warning: User Quota Violation Soft (bytes)

Date:        Wed Feb 07 00:00:16 CST 2018           
System Name: <name>                              
System IP:   <ip range>
Version      <version>                    
Customer ID: <custid>           
  • Soft Quota: A warning email is sent to the user when the specified resource is about to exceed its size or file-count quota. In the example below, the user's usage of $HOME has crossed the 85 GB threshold.

    User Quota Violation Soft (bytes):  Limit reached on volume /home for Unix User:   (Id:  uid:<uid>) Limit = 85.00 GB.
    
    The above message applies to the following component:
        Volume: /home

    This example is for a Soft Quota violation of the file-count limit in $HOME:

    User Quota Violation Soft (files):  Limit reached on volume /home for Unix User:   (Id:  uid:<uid>) Limit = 85.00 K.
    
    The above message applies to the following component:
        Volume: /home 
  • Hard Quota: The maximum allotted space has been reached for that volume. This could be for any of the locations above, including the system-wide location of $SCRATCH. No further writes are allowed, and you must remove files before creating any new ones.

    User Quota Violation Hard (bytes):  Limit reached on volume /home for Unix User:   (Id:  uid:<uid>) Limit = 100.00 GB.  No further writes allowed for this Unix User in this volume unless it has at most 95.00 GB of data.
    
    The above message applies to the following component:
        Volume: /home

When you have removed enough files to drop below the Hard or Soft Quota threshold, an Event CLEARED email will be sent to you. The email begins with the following line:

Event CLEARED: PanActive Manager Warning: User Quota Violation Hard (bytes)
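
When a file-count quota is the problem, it helps to see which directories hold the most files before deciding what to remove. The sketch below is one way to do that; files_per_dir is a hypothetical helper name, it skips hidden directories, and it counts regular files only, which may differ slightly from how the file system counts toward the quota.

```shell
# For each top-level subdirectory of a directory (default: $HOME), print
# the number of regular files it contains, largest count first.
files_per_dir() {
    local root="${1:-$HOME}"
    local d
    for d in "$root"/*/; do
        [ -d "$d" ] || continue    # skip if the glob matched nothing
        printf '%s\t%s\n' "$(find "$d" -type f | wc -l | tr -d ' ')" "${d%/}"
    done | sort -rn
}
```

Running files_per_dir "$HOME" prints one line per subdirectory, so the directories at the top of the list are the first candidates for cleanup.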

Recovering your files

One of the features of our cluster filesystem is the concept of snapshots. Snapshots are a daily capture of files in a given directory. All snapshots are user accessible, but only for volumes that are owned by a group the user is part of. Snapshots are read-only. If you accidentally delete a file, you can retrieve it from a snapshot for up to seven days afterward.

Snapshots are stored in the .snapshot directory in the root of your work or home directory. This directory is hidden and won't be displayed in listings (ls) of that directory. Snapshots are captured for $HOME and $WORK directories but not for $SCRATCH.

For example, say you're working in your work directory (i.e. /panfs/pfs.local/work/groupname/username) and you accidentally delete a file named oops.txt. To restore that file from a previous snapshot, navigate to the .snapshot directory for your group's work directory. There you will find directories containing snapshots from the past seven days. Each of these directories contains a file structure similar to that of /panfs/pfs.local/work/groupname and holds the files as they were when that snapshot was taken. You can navigate into those directories and copy the file(s) you accidentally deleted back to your work directory.

cd /panfs/pfs.local/work/groupname/.snapshot
ls                               # list the available snapshots
cd date-of-snapshot.automatic    # enter the snapshot you want to restore from
cd username
cp oops.txt /panfs/pfs.local/work/groupname/username

If a file was modified several times in one day, the snapshot may not contain its most recent changes. It holds the files as they were in those directories when that day's snapshot was taken.

Snapshots of home directories can also be found in:

/home/.snapshot/date-of-snapshot.automatic/username

Due to the way that directory is set up, you cannot ls inside the date-of-snapshot.automatic directory. Instead, you must go directly to your own home directory as shown above.
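
To see which snapshots still hold a given file, the lookup described above can be sketched as a small helper. find_in_snapshots is a hypothetical name. It assumes the .snapshot directory itself can be listed, and it tests each full path directly, so it never needs to ls inside a date-of-snapshot.automatic directory.

```shell
# Print the full path of a file in every snapshot that still contains it.
#   $1: a .snapshot directory, e.g. /panfs/pfs.local/work/groupname/.snapshot
#   $2: the file's path relative to a snapshot, e.g. username/oops.txt
find_in_snapshots() {
    local snap_root="$1" rel_path="$2" snap
    for snap in "$snap_root"/*/; do
        if [ -e "$snap$rel_path" ]; then
            printf '%s\n' "$snap$rel_path"
        fi
    done
    return 0
}
```

For a work directory this might be called as find_in_snapshots /panfs/pfs.local/work/groupname/.snapshot username/oops.txt, and for a home directory with /home/.snapshot as the first argument. Any path it prints can then be copied back with cp.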

Snapshots are on a rolling seven-day purge, so if you accidentally delete a file, you must restore it within seven days or it will be gone forever.