Linux Cluster Disk Guide 
The most important things to learn from this page: 
-  /home and /share partitions are backed up.
-  /data and /scratch are not.
-  The disk quota for temporary and guest accounts is 10GB.
This is a guide to issues with disk storage on the Linux cluster.
-  To find out how disks on one system can be accessed from another, see the automount page. 
-  To understand different partition names (e.g., why there are /share and /scratch directories), see the disk sharing page.
-  To learn about the server for temporary and guest accounts, see the student file server page. 
 How much disk space do I have? 
To find out how much disk space you have available, use the df command. You'll probably always want to use the -h option, so the sizes appear in human-readable form:
df -h
You'll almost certainly see disks in the list that are mounted via automount. If you find the automounted disks distracting, add -l to the command:
df -hl
Bear in mind that you don't want to use the -l option if your home directory is not on the machine to which you've logged in: -l limits the list to local filesystems, so a remotely mounted home directory will not appear. (As of Jan-2017, this mainly applies to ATLAS users logged onto xenia.)
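If you're not sure whether your home directory is local to the machine you're on, one quick check is to ask df which filesystem holds it. The sketch below assumes the common convention that remote (NFS-style) mounts show up as host:/path in the first column of the df output; the function names are made up for illustration.

```shell
#!/bin/sh
# Print the device or remote source of the filesystem that holds a path.
# -P selects the POSIX output format, so each filesystem stays on one line.
fs_source() {
    df -P "$1" | awk 'NR == 2 { print $1 }'
}

# Assumption: a source of the form host:/path means a remote mount.
looks_remote() {
    case "$1" in
        *:*) return 0 ;;
        *)   return 1 ;;
    esac
}

src=$(fs_source "${HOME:-/}")
if looks_remote "$src"; then
    echo "Home directory is on $src (remote): use df -h, not df -hl"
else
    echo "Home directory is on $src (local): df -hl is fine"
fi
```

This is only a heuristic; if in doubt, plain df -h always shows everything.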
Here's the result of executing df -hl on the machine tanya on 28-Jan-2017:
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/VG-root   20G  9.2G  9.5G  50% /
tmpfs                494M  760K  493M   1% /dev/shm
/dev/md0             461M   98M  340M  23% /boot
/dev/mapper/VG-home   50G   16G   32G  33% /home
/dev/mapper/VG-data  222G  6.1G  205G   3% /data
If we ignore partitions that relate to the operating system, we're left with two key user-accessible filesystems: /home and /data.  (Many systems have other key partitions, such as /share and /scratch.)
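To skip the operating-system partitions entirely, you can hand df the directories you care about. The following is a minimal sketch; the list of four partitions is an assumption, since not every system has all of them, and report_space is just an illustrative name.

```shell
#!/bin/sh
# Show free space for each listed directory that exists on this machine.
report_space() {
    for dir in "$@"; do
        if [ -d "$dir" ]; then
            df -h "$dir"
        else
            echo "$dir: not present on this machine"
        fi
    done
}

# The list below is an assumption; not every system has all four partitions.
report_space /home /data /share /scratch
```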
 It's not much 
Your first reaction may be: "There's not much disk space for my home directory, and I have to share that space with other people in the collaboration. Why, my watch has more storage than that!"
You're right. It's intended that /home be used for "source" files (program code, scientific papers, plots, etc.); /data or /scratch should be used for large and re-creatable files (compiled binaries, data summaries, temporary work files, etc.). We have to ask you to use judgement and discipline, and to be aware that you're sharing space with your fellow scientists.
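To see how much of the shared space you're using yourself, du is the usual tool. This is a small sketch rather than a recommended procedure; usage_summary is an illustrative name, and note that the shell glob skips hidden (dot) files and directories.

```shell
#!/bin/sh
# Print the total size of a directory, then its ten largest top-level entries.
usage_summary() {
    target="${1:-$HOME}"
    du -sh "$target"
    # Sizes in KB (-k) so that sort -n orders them numerically.
    du -sk "$target"/* 2>/dev/null | sort -rn | head -n 10
}

usage_summary "${HOME:-.}"
```

Anything large and re-creatable that turns up near the top of the list is a candidate for moving to /data or /scratch.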
 If you're just skimming this page, stop and read this 
The reason why /home is small and /data is big is that the /home partition is backed up; /data is not.  In fact, it goes one step further: the /data partition is always considered expendable for any type of system maintenance activity. If a system is being repaired, upgraded, or restored, the /data partition may be erased.
There's more about this in the section on backups below. 
 What do I do if I need more disk space? 
First, look to /data partitions on other systems in your working group. The /data partitions on all the systems that belong to a group are intended to be a shared resource; if you don't have enough space on /nevis/yourmachine/data, cd to /nevis/othermachine/data on another machine in your group and see how much free space it has.
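You can also check several machines in one go without changing directories. The sketch below assumes the /nevis/<machine>/data path pattern described above; yourmachine and othermachine are placeholders, not real host names, and check_group_data is an illustrative name.

```shell
#!/bin/sh
# For each host name given, report free space on its /data partition
# as seen through the /nevis automount path.
check_group_data() {
    for host in "$@"; do
        path="/nevis/$host/data"
        if [ -d "$path" ]; then
            df -h "$path"
        else
            echo "$path: not available"
        fi
    done
}

# "yourmachine" and "othermachine" are placeholders, not real host names.
check_group_data yourmachine othermachine
```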
I strongly advise you to exercise common courtesy as you're scrounging for disk space. If I found someone had used a big chunk of my server's /data partition without asking, I might be annoyed.
If you still don't have enough disk space on all your group's machines to satisfy your needs, you may have to request more disks be added to the existing systems (or buy a new box).
 Backups 
The Nevis Linux cluster systems are backed up periodically onto backup.nevis.columbia.edu, the Nevis backup server.
For speed, we don't copy every file from every system; we use a program called backintime to copy only those files that have changed over time. The backintime program functions similarly to Apple's Time Machine: periodically, only those files that have changed since the last snapshot are copied to the backup server.
We don't back up every file on every system on the cluster.  The practice is: /home and /share partitions are backed up; /data is not.  There is a web page that contains the list of which partitions are backed up.
The frequency with which we take backintime snapshots varies between systems and disk partitions. For example, /home and /share partitions have snapshots made every four hours; the cluster library directory /usr/nevis (whose contents change rarely) has a snapshot made once a day. Old snapshots are kept for a time that depends on how much space is available on the backup server; for example, as of Feb-2025 old files from /home partitions are kept for years, while mail files (which can grow very large depending on user preferences) are kept for only a few days.
- Why are /home partitions so small?
The answer is backup. There are roughly 100 systems on the Linux cluster, and we back them all up. Some of the backintime jobs take hours to run. Even if we had more disk space, as a practical matter we can't have a regular backup that takes more than 4 hours to run. 
We therefore have to ask users to segregate their files into key files that will be backed up, and re-creatable files that won't. The relative sizes of /home versus /data partitions help enforce this segregation.
- Why don't you back up /data partitions?
We have vastly more disk storage on the Nevis cluster than we can hope to back up on any system that we can afford.  As of Jan-2025 we have over 500TB of storage assigned to /data partitions on different systems.
- I've got files in a /data partition that would be a pain to re-create.  How can I back them up myself?
The simplest thing to do is to make copies on other /data partitions in your workgroup's cluster.  After all, that's all a backup is: a second copy of your files.
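One way to make such a second copy, and to confirm that it matches the original, is sketched here. The function name copy_tree and the paths in the commented-out usage line are hypothetical; substitute your own directory and a machine in your group.

```shell
#!/bin/sh
# Copy a directory tree to a second location, preserving timestamps and
# permissions (-a), then compare the two trees to confirm the copy.
copy_tree() {
    src="$1"
    dest="$2"
    mkdir -p "$dest" || return 1
    cp -a "$src"/. "$dest"/ || return 1
    diff -r "$src" "$dest" >/dev/null
}

# Hypothetical paths: substitute your own directory and a machine in your group.
# copy_tree /data/$USER/results /nevis/othermachine/data/$USER/results
```

The function returns a non-zero status if the copy fails or the two trees differ, so it can be used in a script that should stop on error.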
- Why don't we make backups more frequently, or keep files for longer?
We're doing what we can with the resources we have available.  We don't have the disk space on our backup server for a year's worth of mail backups, for example.
- I've got a brilliant idea! I'll put a link in my home directory to a directory on a /data disk. That way, when my home directory is backed up --
Stop right there. It won't work. The backup procedure does not follow links. We've had at least one student who lost critical files because they tried this trick. You can't magically increase your available backed-up disk space in this way.
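If you're not sure whether something in your home directory is a real directory or just a link, you can list the links and see where they point. A minimal sketch follows; list_links is an illustrative name, and it only looks at the top level of the directory you give it.

```shell
#!/bin/sh
# List symbolic links directly inside a directory, with their targets.
# Since the backup does not follow links, only the link itself is saved.
list_links() {
    dir="${1:-$HOME}"
    find "$dir" -maxdepth 1 -type l | while IFS= read -r link; do
        printf '%s -> %s\n' "$link" "$(readlink "$link")"
    done
}

list_links "${HOME:-.}"
```

Any target that lies outside a backed-up partition should be treated as not backed up.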
- I've got critical files that are not in my /home directory, or that I want backed up even more often. What can I do?
Supplement our backups with your own.  You can run the program backintime-qt yourself, saving files from any source to any destination for which you have read/write permission. You may also want to look at the documentation for command-line backintime and details of the backintime configuration file.
Take care! Think carefully about what you include in your snapshots and how long you want to keep them. Without limits, backintime will fill up an entire drive before it deletes excess snapshots. If other users share the same disk space, this may irritate them if you're doing four-times-an-hour backups of your frequently-refreshed 10GB files.
 Long-term data storage 
For the purposes of this section, "long-term" means more than six months or so.
By the above definition, there is no long-term data storage at Nevis. As noted above: 
-  we back up /home directories, but we don't keep snapshots indefinitely;
-  /data directories are not backed up at all;
-  RAID arrays can and do fail.  (This section is being written on 25-Apr-2006; on that day, we lost the contents of a RAID5 array.)
If you need long-term storage for any of your files, I suggest you consider the facilities at BNL, FNAL, or CERN.