Jupyter/IPython at Nevis

Jupyter (formerly IPython) has become a popular tool for interactive physics analysis. Here are some examples of what you can do with notebooks. There is a dedicated Jupyter server, notebook, available for the users of the Nevis Linux cluster. To use it, visit https://notebook.nevis.columbia.edu and enter your Nevis cluster account name and password.

The basics

When you visit notebook for the first time, you'll see your home directory. You can perform some elementary file operations from this screen: check the box next to a filename, and you'll see an option near the top of the screen to rename or delete the file. The "Upload" button near the top left allows you to copy files from the computer you're using to the Nevis cluster.

To start a notebook, click on the "New" button near the top left and select one of the "kernels". (Underneath the kernel list: selecting "Text File" will give you a basic text editor and "Terminal" will give you a terminal emulator; see this page for more information.)

To "execute" the contents of a given cell within the notebook, hit SHIFT-ENTER with your cursor in that cell. I strongly suggest that you look at the Help menu. The User Interface Tour only takes a minute, and the Keyboard Shortcuts will be handy.

Why notebooks?

Type some commands in the language of the kernel you chose in the first cell. Hit SHIFT-ENTER to execute them. If there's an error, make an appropriate fix and hit SHIFT-ENTER again.

Continue editing lines in that first cell until you have finished some small task (e.g., creating a histogram). Execute the cell (SHIFT-ENTER) to demonstrate to yourself that you've got it right.

Go to the next cell and continue your work. The variables and functions you defined in the first cell are still available to you. Again, you can iteratively execute and debug that new set of code until it does what you want.

In the File menu, select "Rename..." (otherwise your notebook will have the name "Untitled"). Again from the File menu, select "Close and Halt". On your main Jupyter page, you'll see your new notebook in your home directory, with the suffix .ipynb. Click on it to start up the notebook again.

Explore the menus. Note how you can save, rename, checkpoint, switch kernels, execute some or all of the cells.

Click in an empty cell. Go to the pop-up menu near the top of the page that reads "Code". Select "Markdown" from that menu. Now you can type plain text in that cell. You can also include Markdown, LaTeX, and HTML commands to format the text. When you're done, hit SHIFT-ENTER to see the formatted result.

So notebooks:

  • let you quickly prototype, save, and update code;
  • let you make plots, then fiddle with the code that created the plots and quickly refresh them;
  • let you easily document your work, and update the documentation as quickly as you update your code and plots;
  • let you do all of this from within your web browser.

That's the answer to "why notebooks".

Magic commands

In addition to the kernel languages listed below, in any notebook cell you can type "magic" commands that have effects on your system beyond what the kernel's language normally provides. The magic commands I use most often are:

!ls
%cd <directory>
%cp <file> <new-file-loc>
%lsmagic
%jsroot on

That last magic command, %jsroot on, is only available in ROOT notebooks (the first two kernels listed below). JSROOT adds some interactivity to ROOT plots.

Handy Jupyter links

Kernels

These are the "kernels" (active interpreters/compilers) available on the notebook server. The first two listed are the ones most likely to be used; the rest are listed in alphabetical order.

The kernels inherit the environment variables that you set in your shell initialization scripts. This can be convenient, but be sure to read the Limitations section below.
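
As a quick check of what a kernel actually inherited, a Python cell can read the environment directly. This is only a minimal sketch: the helper name show_environment and the default list of variables are illustrative assumptions, not part of the notebook setup.

```python
import os

def show_environment(names=("HOME", "PATH", "LD_LIBRARY_PATH", "PYTHONPATH")):
    """Return a dict mapping each variable name to its value in this kernel.

    Variables that are not set map to None. The default names are only
    examples; substitute whatever your shell startup scripts define.
    """
    return {name: os.environ.get(name) for name in names}

# Print one variable per line so long path lists stay readable.
for name, value in show_environment().items():
    print("{0} = {1}".format(name, value if value is not None else "(not set)"))
```

Running the same cell from a terminal session and from a notebook is one way to spot differences between the two environments.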

Python 2

Python is an interpreted scripting language. It's becoming more widely used in physics for both scripting and analysis. Here's a Python tutorial. You'll probably also be interested in the commonly-used scientific packages NumPy (which implements arrays), SciPy, and matplotlib. If there's some standard Python package that's not included on the notebook server, let WilliamSeligman know.

The Python 2 set up on notebook includes PyROOT, a Python-based interface to ROOT. Here's an example of how to use it.

You can copy-n-paste the following example directly from this web page into a Python 2 notebook cell:

import ROOT

# You may want this if you'd like your ROOT plots to be interactive in the notebook.
%jsroot on

# Define a canvas
my_canvas = ROOT.TCanvas("my_canvas","my_canvas",800,600)

hist=ROOT.TH1F("hist","example histogram",100,-3,3)
hist.FillRandom("gaus",100000)
hist.Draw()

# You have to draw the canvas to see it in the web page.
my_canvas.Draw()

ROOT C++

This kernel is the ROOT C++ interpreter, cling. In addition to working with ROOT, it also provides the C++ language within a notebook. Here's an example of using ROOT within a notebook (the C++ example is near the bottom).

A simple test, which you can copy-n-paste directly from this web page into a ROOT C++ notebook cell:

%jsroot on
TCanvas mycanvas("name","title",800,600);
TH1D test("test","example title",200,-3,3);
test.FillRandom("gaus",10000);
test.Draw();
// Unlike interactive ROOT, once you've drawn on a canvas,
// you must draw the canvas explicitly to see it in the notebook. 
mycanvas.Draw();

Bash

Bash (from "Bourne-Again SHell") is a shell language for UNIX systems. There's a good chance it's the shell you use when you login to the Nevis Linux cluster. With this kernel, you can develop shell scripts.

Fortran

Fortran (from "FORmula TRANslation") is a mathematical computer language. For decades it was the backbone of computer programming in physics, and many say that it's still the most efficient language for implementing mathematical tasks. This kernel provides an interface to the GNU gfortran compiler, which is fully compliant with the Fortran 95 Standard and includes some Fortran 2003 and Fortran 2008 features.

Note:

  • The Fortran compiler provided within the notebook server does not include CERNLIB.
  • You can also create Fortran functions that can be called by Python routines using Fortran magic.

Gnuplot

Gnuplot is a graphing utility for visualizing mathematical functions and data interactively. There are Gnuplot cell magics that let you use Gnuplot graphics to create plots within some of the other kernels on this page, such as Julia and Octave.

Julia

Julia is a high-level, high-performance dynamic programming language developed at MIT for technical computing. It combines the ease-of-use of Python with the speed of Fortran. Here's a Julia tutorial, though the plotting examples won't work in Jupyter unless you use PyPlot; e.g.:
using PyPlot
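x=linspace(0,2*pi,1000)   # in Julia 1.0 and later: range(0, stop=2*pi, length=1000)
y=sin.(3x + 3cos.(2x))    # the dots apply sin and cos element by element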
plot(x,y,color="red",linewidth=2.0,linestyle="--")
title("plot of oscillatory function")
xlabel("the x axis")

Octave

Octave is a scientific programming language, with nice features for handling vectors and matrices, and good visualization tools. It's an open-source equivalent of Matlab. Here's an Octave tutorial.

Python 3

Python 3 is also available as a kernel. Python 3 is the future of the Python language, but not all Python packages have Python 3 versions yet. Most of the critical scientific packages (e.g., NumPy, SciPy, matplotlib, ROOT) are available in Python 3.

As with Python 2, if there's some standard package you'd like available to run Python 3 scripts on notebook, let WilliamSeligman know.

R

R is a language for statistical computing and graphics; it's the open-source version of S+. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, …) and graphical techniques. Here is an R tutorial.

Ruby

Ruby is a dynamic, open source programming language with a focus on simplicity. Many prefer it to Python as a first programming language. The SciRuby packages are part of this installation, so that Ruby can be used for scientific computation. Unfortunately, there is no current working link between ROOT and Ruby (though one used to exist and might exist again someday). Here's a link to tutorials.

Plotting in Ruby notebooks is limited. The GnuplotRB package is available for plots; you'll have to search through the examples for x-y plots and histograms. Use the svg format for your plots; for some reason GnuplotRB won't display in png or jpeg within a Ruby kernel.

No, Ruby on Rails is not installed. We do not want you building web applications on the notebook server.

SageMath

SageMath is an open-source mathematics software system. It's a wrapper around different symbolic math and statistical packages. It's intended as an open-source replacement for Maple, Mathematica, and Matlab. Here's a SageMath tutorial.

If you ask me if we have Mathematica at Nevis, this is where I'll send you.

Tcl

Tcl is another scripting language, frequently used with the cross-platform graphical user interface package Tk. The latter is included with Python, but it will not function properly in the web-browser environment of Jupyter (there's no X-windows environment inside a web browser). Here's a Tcl tutorial.

...and more

You don't necessarily need an explicit kernel to develop scripts. Jupyter has "cell magics" that let you redefine the language being used within a given cell. If you execute

%lsmagic
you'll see a list of available cell magic commands. Among the commands are those that switch between different languages within a single notebook.

Perl

For example, if you want to work on a Perl script, you can put lines like this in a cell in any kernel:

%%perl
# An uninteresting example: display all the regular files in my home directory
use strict;
use warnings;

use Path::Class;

my $dir = dir($ENV{'HOME'});

# Iterate over the content of my home directory
while (my $file = $dir->next) {
    
    # Skip if it is a directory
    next if $file->is_dir();
    
    # Skip if the filename ends with a ~ (emacs work file)
    my $filename = $file->stringify;
    next if $filename =~ /~$/;
   
    # Print out the file name and path
    print $filename . "\n";
}

Cython

Cython is a superset of Python in which the commands are compiled into C. For example:

%load_ext Cython
# Next cell:
%%cython -a
def geo_prog_cython(double alpha, int n):
    cdef double current = 1.0
    cdef double sum = current
    cdef int i
    for i in range(n):
        current = current * alpha
        sum = sum + current
    return sum
# Test in next cell:
geo_prog_cython(4.0,5)
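
For comparison, here is a plain-Python version of the same loop together with the closed-form geometric sum, as a way to sanity-check the Cython result. The names geo_prog_python and geo_prog_closed_form are hypothetical; this is a sketch for cross-checking, not a benchmark.

```python
def geo_prog_python(alpha, n):
    """Sum 1 + alpha + alpha**2 + ... + alpha**n with an explicit loop."""
    current = 1.0
    total = current
    for _ in range(n):
        current = current * alpha
        total = total + current
    return total

def geo_prog_closed_form(alpha, n):
    """Same sum from the closed form; valid only when alpha != 1."""
    return (alpha ** (n + 1) - 1.0) / (alpha - 1.0)

print(geo_prog_python(4.0, 5))       # 1365.0
print(geo_prog_closed_form(4.0, 5))  # 1365.0
```

In a notebook, timing the Cython and plain-Python functions with the %timeit magic gives a rough idea of what the compilation step buys you.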

Want even more?

It is my intention to include every available Jupyter kernel that might have an application in physics, as long as there's a clear installation method for adding it to Jupyter. If you have a request for an additional kernel, or for a library or extension to be added to an existing language, please let WilliamSeligman know.

Before you ask: I've already tried to install Jupyter kernels for Forth, Haskell, and Perl. Each presented a technical issue, ranging from "simply doesn't work" to "not compatible with being invoked from Jupyter".

Limitations and workarounds

  • The notebook server is a shared resource for use by anyone in the Nevis particle-physics groups and/or the REU students to do light development tasks. If you need to run long, CPU-intensive, or multi-threaded parallel processes via Jupyter, notebook is not a good choice. You'll potentially interfere with everyone else trying to use it at the same time. For these high-resource tasks, you can run Jupyter on your workgroup's server instead. (You may interfere with everyone else in your workgroup, but that's between you and them, not you and everyone else with a Nevis Linux cluster account.)

  • The Jupyter notebooks inherit your user environment, that is, the variables that you define in your shell startup scripts. However, if you modify certain variables such as $LD_LIBRARY_PATH or run customization programs (such as module load root) in your initialization, it can affect the execution of the notebook server. The typical symptoms are a notebook kernel that refuses to start, or library load errors.

  • The software on notebook was compiled under CentOS 7, but the software loaded by the environment modules was compiled under Scientific Linux 6. Also, some physics software is a “chimera”, a blend of software compiled in two languages; for example, the Neutrino Deep Learning group uses Python to call pre-compiled C++ routines. If you need libraries that were compiled for your workgroup server, you'll probably have to use them on your workgroup server. You'll know if this is the case if you get library errors when trying to use your own compiled libraries via notebook.
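
When a kernel refuses to start or reports library load errors, one thing worth looking at is what your startup scripts put into $LD_LIBRARY_PATH. The sketch below, which you can run with python from a terminal, splits the variable into its entries and flags directories that do not exist. The function name check_library_path is an assumption for illustration, and a missing directory is only a hint, not a diagnosis.

```python
import os

def check_library_path(value=None):
    """Return (entry, exists) pairs for a path list such as $LD_LIBRARY_PATH.

    If value is None, read $LD_LIBRARY_PATH from the environment.
    Empty entries (e.g. from a trailing colon) are skipped.
    """
    if value is None:
        value = os.environ.get("LD_LIBRARY_PATH", "")
    entries = [entry for entry in value.split(os.pathsep) if entry]
    return [(entry, os.path.isdir(entry)) for entry in entries]

# One line per entry: "ok" if the directory exists, "MISSING" otherwise.
for entry, exists in check_library_path():
    print("{0}  {1}".format("ok     " if exists else "MISSING", entry))
```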

The solution to most of these issues is to run Jupyter on your workgroup server.

Jupyter on your workgroup server

Jupyter has been made part of the Python 3.6 distribution at Nevis, which is automatically set up when you type the environment modules command at the terminal:

module load root

This will load ROOT 06.12 or later. See the environment modules page for more information, including how to look up available ROOT versions.

Once you set up ROOT, in theory you'll be able to run Jupyter:

jupyter notebook

This will start up a web browser on the system on which you execute the command (not your laptop!), with the web page open to localhost:8888. This is not what you'll want to do normally.

Remote access

You probably want to see Jupyter via a web browser on your laptop. To do this, you must port-forward a connection via ssh. The complete instructions are here. What follows is a brief summary.

On the workgroup server, this command will start up Jupyter for you:

jupyter notebook --no-browser --port=XXXX

... where XXXX is an unused port on the server; e.g., 7000. If multiple users want to run Jupyter on your server, you'll have to coordinate with them so that you don't use the same port. You will see a message on your terminal that includes something like this:

    Copy/paste this URL into your browser when you connect for the first time,
    to login with a token:
        http://localhost:XXXX/?token=<string-of-hex-digits>

...where XXXX is your argument to the --port option. Copy that entire URL string.
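
One way to check whether a port is unused before passing it to --port is to try binding to it. This is a sketch to run on the workgroup server; the helper name port_is_free is hypothetical, and another user could still claim the port between the check and the jupyter command.

```python
import socket

def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        return True
    except socket.error:
        # The bind failed: typically the port is in use, or it is a
        # privileged port that an ordinary user may not bind.
        return False
    finally:
        sock.close()

# 7000 is just the example port used in the text.
print(port_is_free(7000))
```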

On your laptop, forward port XXXX on the server to a port YYYY on your laptop:

ssh -N -f -L localhost:YYYY:localhost:XXXX <username@server.nevis.columbia.edu>

...where <username@server.nevis.columbia.edu> is your Nevis account and the server on which you ran the jupyter command. YYYY can be any unused port on your laptop; often users pick YYYY=XXXX, but you don't have to.

Then to access the notebook, go to the web browser on your laptop and visit http://localhost:YYYY/?token=<string-of-hex-digits>. Note that this is almost the same as the URL you copied above, except that you'll have to substitute the laptop port YYYY for the XXXX in the message.
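
If you'd rather not edit the URL by hand, the port substitution can be sketched in a few lines of Python. The function name laptop_url is hypothetical, and the token in the example is a made-up placeholder.

```python
try:
    from urllib.parse import urlsplit, urlunsplit  # Python 3
except ImportError:
    from urlparse import urlsplit, urlunsplit      # Python 2

def laptop_url(server_url, local_port):
    """Swap the port in the URL Jupyter printed for the forwarded laptop port.

    The path and the ?token=... query string are kept unchanged.
    """
    parts = urlsplit(server_url)
    netloc = "{0}:{1}".format(parts.hostname, local_port)
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))

# Server port 7000 forwarded to laptop port 8080; the token is not a real one.
print(laptop_url("http://localhost:7000/?token=0123abcd", 8080))
# http://localhost:8080/?token=0123abcd
```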

There's a potential problem with just using any value for YYYY: Most current web browsers won't let you visit random ports anymore. In Firefox, there is a workaround; in Safari there isn't. The simplest solution is to let YYYY be a generally-recognized port for internet access; e.g., 8080.

After all that...

You'll hopefully see a Jupyter home window similar to the one you see when using the notebook server. The chief difference is that you won't have the full range of exotic kernels available on that server, just Python 2 and ROOT C++. As long as your jupyter command keeps running, you can login again without the token by making sure the ssh port forwarding is running on your laptop, then visiting http://localhost:YYYY in your browser.

If you want to keep your Jupyter process running even after you've closed the terminal window on your workgroup server, you may want to use the UNIX tmux command. The commands would look something like this:

tmux
module load root
jupyter notebook --no-browser --port=XXXX
# Copy the URL
# Open a new tmux window to work in (or detach with <Ctrl-b d>)
<Ctrl-b c>

You can close the terminal window whenever you wish; your processes (including jupyter) will continue to run. When you login to your workgroup server again, the command

tmux attach
will reconnect you with the screen(s) you created before, including the jupyter screen.

Jupyter on your laptop

You can install Python, ROOT, and Jupyter on your laptop. In fact, Jupyter is meant to be a laptop tool; the server installations I've prepared are to save you time, and to give you access to the Nevis cluster resources without copying files to and from your laptop. If you want to try your own installation:

  • These are not applications that you can just double-click to install. The process requires some knowledge of the UNIX shell.
  • You'll need to read the documentation for the package installations and use some thought and initiative. The links in the previous paragraph point to the installation documentation.
Topic revision: r38 - 2018-07-18 - WilliamSeligman
 