Hidden directories and package management
If you use either
conda or
pip
to manage software packages, or work with
keras
, you may have encountered something strange: your home directory runs out of disk space, even though your files are small.
The reason is likely
hidden directories
. By default, in UNIX, files and directories whose names begin with a period (
.
) are not listed when using the
ls
command. To see these files, use
ls -a
. For example:
> ls
CreateSubdirectories.C Untitled.ipynb
> ls -a
. .condarc .ipython .profile .viminfo
.. .condor .jupyter .root_hist .vimrc
.astropy .config .local .python_history .spamassassin .Xauthority
.bash_history CreateSubdirectories.C .myprofile .ssh .zprofile
.bashrc .npm Untitled.ipynb
.cache .ipynb_checkpoints .pyzor .vim
In UNIX, files and directories created for bookkeeping purposes often are hidden, so you can focus on what you've created for your own work. As you can see from the example above, many programs create additional files that you don't usually see. For example, you can use hidden files like
.myprofile
to configure your
shell environment.
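If the full listing from ls -a is too cluttered, you can filter it down to just the hidden entries. Here is a minimal sketch; the scratch directory and the file names in it are made up for the demonstration, so nothing in your home directory is touched.

```shell
# Work in a scratch directory so the demonstration is harmless.
demo=$(mktemp -d)
touch "${demo}/.myprofile" "${demo}/Untitled.ipynb"
mkdir "${demo}/.cache"

# -A is like -a, but leaves out the entries . and ..
ls -A "${demo}"

# Keep only the names that begin with a period.
hidden=$(ls -A "${demo}" | grep '^\.')
echo "${hidden}"
```

To try it on your own account, replace "${demo}" with ~ in the two ls commands.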
For the most part, these hidden files are small. Among the exceptions are the directories used by
conda
and
pip
to store the packages you download to create custom software environments.
Here are a couple of examples:
The conda packages are typically stored in
~/.conda
. I use the
du
command (for "disk usage") to see how much disk space is used by the directory:
> cd
> du -shx .conda
4.0G .conda
pip normally keeps its download cache in
~/.cache
(under ~/.cache/pip), and installs your user-level python packages in
~/.local
:
> cd
> du -shx .cache
4.8G .cache
> du -shx .local
1.3G .local
As you can see from the above examples, a substantial fraction of your available disk storage can be used by these utilities, in a place where you may not easily see them.
The rest of this page discusses how to store these directories outside of their default locations, presumably on your group's file servers, which have much more free space. You only have to go through the following steps
once; any changes you've made to your configuration files will be preserved between login sessions.
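Before moving a large directory, it may be worth checking that the destination has room for it. The following is a rough sketch, not a tested site procedure: fits_in is a hypothetical helper name, and /tmp merely stands in for a directory on your group's file server.

```shell
# fits_in is a hypothetical helper: succeed only if the directory tree
# at $1 would fit in the free space of the filesystem that holds $2.
fits_in() {
    src="$1"
    dest="$2"
    # Size of the source in kilobytes (-x: stay on one filesystem).
    need=$(du -skx "${src}" | awk '{print $1}')
    # Free kilobytes at the destination; -P keeps each filesystem on one line.
    avail=$(df -Pk "${dest}" | awk 'NR==2 {print $4}')
    [ "${need}" -lt "${avail}" ]
}

# Example: would ~/.conda fit on the filesystem that holds /tmp?
if [ -d ~/.conda ]; then
    if fits_in ~/.conda /tmp; then
        echo "enough room"
    else
        echo "not enough room"
    fi
fi
```

The comparison ignores quotas and anything other users write in the meantime, so treat a "yes" as a hint rather than a guarantee.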
If you don't understand what an example path like
/nevis/olga/data
means, you may want to review the
automount page. Although I suggest storing your package-management directories on a file server, don't forget the warning from the
disk guide: files in
/data
partitions are
not backed up. You may want to periodically save the names of the packages you've downloaded so that you can re-create your environments should the worst occur, e.g., with
conda list
or
pip freeze
.
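One way to save those package names is sketched below. The directory ~/package-lists is my own invention, and I'm assuming your home directory is backed up; pick any backed-up location you like. The command -v tests simply skip a tool that isn't installed.

```shell
# Hypothetical location for the saved lists; edit to taste.
backupDir="${HOME}/package-lists"
mkdir -p "${backupDir}"

# Date stamp such as 2023-07-15, so older lists are not overwritten.
stamp=$(date +%Y-%m-%d)

# conda list reports on the currently active environment only.
if command -v conda >/dev/null 2>&1; then
    conda list --export > "${backupDir}/conda-${stamp}.txt"
fi

# pip freeze writes the list in the format used by requirements files.
if command -v pip >/dev/null 2>&1; then
    pip freeze > "${backupDir}/pip-${stamp}.txt"
fi
```

If you have several conda environments, activate each one in turn and run the conda list line again with a different file name.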
Maybe it's already there
Before creating a custom environment, consider looking at the packages already installed on the Nevis
Linux cluster. It might be that all the packages you need are already there. There's a list in the
conda topic.
Conda file management
This is covered in more detail in the
conda wiki page. Here's the short version.
# Define a handy variable for these operations. The variable
# $USER is your account name on the UNIX system.
# Note that /nevis/olga/data/$USER/conda does not exist at
# Nevis; this is an example that you'll have to edit to suit
# your group's file servers.
prefix=/nevis/olga/data/$USER/conda
# Create the new package environment directory.
mkdir -p ${prefix}/envs
# If you have an existing environment, clone it to the new location;
# assume your original environment is named 'myenv'.
conda create --prefix ${prefix}/envs/myenv --clone myenv
# Move the package directory to the new location
(cd ~/.conda; tar -cf - ./pkgs) | (cd ${prefix}; tar -xf -)
# Delete the old environment
conda remove --name myenv --all
# Delete the old package directory
rm -rf ~/.conda/pkgs
# Add the new definitions to ~/.condarc
conda config --add envs_dirs ${prefix}/envs
conda config --add pkgs_dirs ${prefix}/pkgs
pip and python locations
Note: This section is meant for those who use pip outside of a conda environment. If you're using pip within a conda environment (which I don't recommend if you can avoid it; see the conda topic), then the version of pip within that environment will automatically store files within that environment's directory.
There are two aspects to pip's package management: The location of its cache, and the location in which it installs packages. Both of these directories can grow quite large.
If you're using multiple python versions (this is
extremely likely, even if you're not aware of it), you want the path of the package directory to include the python version somehow. In the example below, I'm using fancy UNIX tricks to make that happen.
The command below looks cryptic. What it does is detect the version of Python you're running, and set the variable
$pvers
to 'pythonM.N', where M.N is the major and minor version of Python. For example, if you're using Python 3.9.6,
$pvers
will be set to 'python3.9'.
pvers=$(python --version 2>&1 | sed "s/Python \([0-9]\+\.[0-9]\+\).*/python\1/")
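To see what the sed expression does without involving Python at all, you can feed it a fixed string. This is only an illustration: the sample string and the variable demo_pvers are made up, and I've written the pattern with sed -E (extended regular expressions), which is equivalent here but needs fewer backslashes.

```shell
# A fixed string stands in for the output of "python --version".
sample="Python 3.9.6"

# Capture "major.minor", discard the rest, and prefix it with "python".
demo_pvers=$(echo "${sample}" | sed -E 's/Python ([0-9]+\.[0-9]+).*/python\1/')

echo "${demo_pvers}"    # prints python3.9
```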
For these examples, assume you want to store pip's cache in
/nevis/olga/data/$USER/cache
, and have pip store the python libraries in
/nevis/olga/data/$USER/${pvers}/site-packages
. Note that these names are examples; in particular, there's no machine named
olga
at Nevis.
To change the default location of pip's directories, you have to modify pip's configuration file. Use variables (to refer to the directories later) and
pip config
to set new values:
# Define variables for our pip/python directories.
cacheDir=/nevis/olga/data/$USER/cache
pkgDir=/nevis/olga/data/$USER/${pvers}/site-packages
# Modify pip's configuration
pip config set global.cache-dir ${cacheDir}
pip config set global.target ${pkgDir}
Create these directories. You can copy your existing files if you wish:
mkdir -p ${cacheDir}
mkdir -p ${pkgDir}
# Copy the old cache files to the new locations. This is
# optional.
(cd ~/.cache/pip && tar -cf - .) | (cd ${cacheDir} && tar -xvf -)
# For the libraries, you may have to reinstall them. The following
# line may work, but I make no promises:
( cd $(python -m site --user-site) && tar -cf - . ) | (cd ${pkgDir} && tar -xvf -)
# Delete the old pip cache directory
rm -rf ~/.cache/pip
# Erase the contents of your python packages directory (see below).
( cd $(python -m site --user-site) && rm -rf * )
You've instructed pip to install the python packages in a new location. Now you have to tell python to search this location. You can do this by modifying the
$PYTHONPATH
shell variable, but you have to do this for every login session. What I did was to create a
.pth file
. To find out where to put the file:
python -m site --user-site
To create the file, I got fancy with shell scripting (the command in a
$()
expression is executed and its output is put into the command line):
pthFile=$(python -m site --user-site)/my-packages.pth
Note that you can use any name of the form
name.pth
for the path-name file. In
my-packages.pth
, you would put in the location of your relocated python library; here that's what's in the variable
${pkgDir}
. I'm using another UNIX shell trick to add the name of the new package directory to the end of my
.pth
file:
echo ${pkgDir} | cat >> ${pthFile}
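Because >> appends every time it runs, repeating the command above leaves duplicate lines in the file. If you expect to re-run these steps, a guard like the following may help. It's a sketch: add_pth_entry is a hypothetical helper name, and the demonstration writes to a scratch file rather than your real .pth file.

```shell
# add_pth_entry is a hypothetical helper: append the line $1 to the
# file $2 unless the file already contains exactly that line.
add_pth_entry() {
    entry="$1"
    file="$2"
    mkdir -p "$(dirname "${file}")"
    touch "${file}"
    # -x: match the whole line; -F: treat the entry as a fixed string.
    grep -qxF -- "${entry}" "${file}" || printf '%s\n' "${entry}" >> "${file}"
}

# Demonstration with a scratch file; calling it twice adds only one line.
demoPth=$(mktemp -d)/my-packages.pth
add_pth_entry /nevis/olga/data/example/python3.9/site-packages "${demoPth}"
add_pth_entry /nevis/olga/data/example/python3.9/site-packages "${demoPth}"
cat "${demoPth}"
```

With the variables defined earlier, the real call would be add_pth_entry ${pkgDir} ${pthFile}.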
Again, I'm aware that all of the above is cryptic to anyone unfamiliar with UNIX shells. Assuming that you cut-and-paste the above commands, and edit the name
olga
to something suitable for your servers, you will hopefully end up with a consistent configuration for current and future
pip install
commands. You can check with:
pip config list
keras downloads
There is one other hidden directory that I've seen that can grow unexpectedly large:
~/.keras
. It's used to download pre-trained models. Here's an example:
> cd
> du -shx .keras
2.6G .keras
At the time of this writing (Jul-2023), there is only one way to
re-direct keras downloads to a different directory
. That is to set the shell variable
$KERAS_HOME
to a different value
before you start a program that invokes keras. For example:
export KERAS_HOME="/nevis/olga/data/$USER/keras"
# Then start jupyter or run python or whatever.
You can put the above statement (suitably edited, since the machine
olga
does not exist at Nevis) into your
~/.myprofile
file so you can be assured that the variable will be set at every login session.
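Setting the variable does not move anything you've already downloaded; the files in ~/.keras stay where they are. If you'd rather copy them than download them again, something like the following may work. It's a sketch using the same tar pipe as the earlier examples on this page; copy_tree is a hypothetical helper name, and the demonstration uses scratch directories in place of ~/.keras and the directory named by $KERAS_HOME.

```shell
# copy_tree is a hypothetical helper: copy everything under $1 into $2,
# creating $2 if necessary.
copy_tree() {
    from="$1"
    to="$2"
    mkdir -p "${to}" || return 1
    (cd "${from}" && tar -cf - .) | (cd "${to}" && tar -xf -)
}

# Scratch directories standing in for the old and new locations.
old=$(mktemp -d)
new=$(mktemp -d)/keras
mkdir -p "${old}/models"
echo "weights" > "${old}/models/demo.h5"

copy_tree "${old}" "${new}"
ls -R "${new}"
```

For the real thing, the call would be copy_tree ~/.keras "${KERAS_HOME}". Check that the copy looks right before deleting ~/.keras.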
Is that all?
No. This page focuses on unexpectedly large hidden directories that have occurred on the Nevis particle-physics Linux cluster. Depending on your work and the programs you use, there may be other "hidden" directories that store files that you did not expect, such as in the
Library
folder on MacOS. If you use python
virtualenv
in a project, you may want to look at the directories
venv
or
.venv
in the project's directory, or
~/.virtualenvs
in your home directory.
Treat this page as a guide to finding other such directories:
ls -a
to find hidden files/directories,
du
to see how much space they use.
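The two commands can be combined into a single report. Here is a minimal sketch, assuming the GNU versions of find, du, and sort found on Linux (sort -h in particular is a GNU feature); hidden_usage is a hypothetical helper name.

```shell
# hidden_usage is a hypothetical helper: report the disk usage of every
# hidden file or directory directly inside $1 (default: your home
# directory), smallest first and largest last.
hidden_usage() {
    dir="${1:-$HOME}"
    # -name '.*' selects names that begin with a period; -mindepth 1
    # keeps the directory itself out of the results.
    find "${dir}" -mindepth 1 -maxdepth 1 -name '.*' \
        -exec du -shx {} + 2>/dev/null | sort -h
}

# Show the ten largest hidden entries in your home directory.
hidden_usage "${HOME}" | tail -n 10
```

Anything that shows up with a size in gigabytes is a candidate for the kind of relocation described on this page.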