Nathaniel R. Stickley

Software Engineer - Astrophysicist - Data Scientist

Projects

NebulOS

Dates

Primarily August, 2014 – May, 2015

Type

Job assignment

Languages

C++11, Python 2.7

Libraries & Frameworks

  • Apache Mesos
  • libHDFS
  • Boost (string algorithms & Python)
  • Linux system calls
  • Curlpp
  • SQLite (for caching old task info to disk)

Summary

NebulOS is a flexible, user-friendly Big Data analysis platform. It can be thought of as a cluster operating system that allows a user to treat a group of Linux systems (e.g., a typical data center) as a single machine. Apache Mesos and the Hadoop Distributed File System (HDFS) act as the OS kernel and file system, respectively. The component that differentiates NebulOS from other Big Data systems is its Mesos-based framework, which allows the user to...

  • run pre-existing software on the cluster, without modification.
  • easily write monitoring code in any language to examine the standard error and output streams, memory usage, and CPU usage of tasks launched on the system.
  • write code that performs actions based on the behavior of individual tasks. For instance, tasks that meet certain user-defined conditions can be terminated and automatically relaunched with modified parameters or modified input data.
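The monitor-and-relaunch idea in the last bullet can be sketched with nothing but the Python standard library. This is not the NebulOS API: `run_monitored`, `make_command`, and the "diverged" message are hypothetical names invented for illustration, the sketch runs a single local process rather than tasks on a cluster, and it is written for Python 3.

```python
import subprocess
import sys


def run_monitored(make_command, params, should_restart, adjust, max_attempts=3):
    """Run a command, watch its output line by line, and relaunch it with
    adjusted parameters whenever should_restart(line) returns True."""
    for attempt in range(1, max_attempts + 1):
        proc = subprocess.Popen(make_command(params), stdout=subprocess.PIPE,
                                stderr=subprocess.STDOUT, text=True)
        restart = False
        for line in proc.stdout:
            if should_restart(line):
                proc.terminate()  # stop the misbehaving task
                restart = True
                break
        proc.stdout.close()
        proc.wait()
        if not restart:
            return params, attempt
        params = adjust(params)  # e.g. use a smaller time step next time
    raise RuntimeError("task still failing after %d attempts" % max_attempts)


def make_command(params):
    # Hypothetical stand-in for an existing program: it reports "diverged"
    # when its time step is too large.
    script = ("import sys; dt = float(sys.argv[1]); "
              "print('diverged' if dt > 0.15 else 'ok')")
    return [sys.executable, "-c", script, str(params["dt"])]


final_params, attempts = run_monitored(
    make_command,
    {"dt": 0.4},
    should_restart=lambda line: "diverged" in line,
    adjust=lambda p: {"dt": p["dt"] / 2},
    max_attempts=5,
)
print(final_params, attempts)
```

The monitoring condition and the parameter adjustment are ordinary callables, which mirrors the idea that the user, not the framework, decides what counts as bad behavior and how to react to it.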

The system also handles node failures seamlessly and is aware of data locality: tasks are preferentially run on the nodes that contain the greatest amount of relevant data. The user interface is Python-based, so the user can issue commands interactively or write Python scripts to build more complex analysis routines. More details can be found here.
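The data-locality preference can be illustrated with a short sketch. It assumes a hypothetical `block_locations` table that maps each input file to the number of bytes of that file stored on each node; the function name, node names, and sizes are made up, and the real scheduler is certainly more involved.

```python
def pick_node(task_files, block_locations, free_nodes):
    """Return the free node that holds the most bytes of the task's input.
    Ties are broken alphabetically so the result is deterministic."""
    local_bytes = {node: 0 for node in free_nodes}
    for path in task_files:
        for node, nbytes in block_locations.get(path, {}).items():
            if node in local_bytes:
                local_bytes[node] += nbytes
    return max(sorted(local_bytes), key=lambda node: local_bytes[node])


# Illustrative layout: two replicated single-block files (sizes in MB).
block_locations = {
    "snap_001": {"node-a": 110, "node-b": 110},
    "snap_002": {"node-b": 110, "node-c": 110},
}
best = pick_node(["snap_001", "snap_002"], block_locations,
                 ["node-a", "node-b", "node-c"])
print(best)  # node-b holds both files
```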

Motivation

Researchers at UC Riverside (primarily Miguel A. Aragon-Calvo) needed a Big Data framework to efficiently analyze cosmological simulation data. They desired a fault-tolerant system with high data throughput, capable of being used with existing software. Automated task monitoring was also highly desired, since the system would also be used to perform a large number of simulations with a simulation code prone to hanging. Rather than building a solution that would only work for a specific application, the researchers decided that a general-purpose framework would be more valuable.

Existing tools such as TORQUE, Hadoop MapReduce, Hama, and Spark are not ideal for analyzing terabytes or petabytes of scientific data in custom binary formats, because these tools are either difficult to use or do not simultaneously allow high data throughput, fault tolerance, and flexibility. Ideally, scientists would like to use pre-existing analysis software so that time, resources, and effort can be spent on doing science rather than writing software. The framework therefore needed to handle pre-existing software and get out of the user's way.

I was hired to develop the framework described above. I evaluated several available technologies and eventually decided to use Apache Mesos and HDFS as the core components of the system. I then began implementing and testing the software and receiving feedback from potential users. As of March 2015, new features are being planned and user interface improvements are being made. A draft paper announcing the software can be found here.

An architectural overview, showing how Mesos, HDFS, and the NebulOS application framework are related to one another.

Summary of the interaction between Mesos and the NebulOS framework.

Example usage of the NebulOS Python module interface in streaming / interactive mode. The command 'from nebulos import Processor' must be executed first.

Performance scaling on Amazon Web Services' d2.xlarge EC2 instances when reading single-block files (approx. 110 MB per file). The reading speed reported is the net speed, which includes the overhead of submitting the tasks to the NebulOS framework, launching the tasks, and retrieving the output of each task.

Pretty Parametric Plots

Dates

February, 2014 – April, 2014

Type

Personal project, for fun

Languages

JavaScript, PHP, HTML5, CSS3, SVG

Libraries & Frameworks

  • jQuery
  • jQueryTOOLS
  • Spectrum Colorpicker

Summary

Pretty Parametric Plots is a small web application that generates artistic-looking parametric plots as in-line SVG images. The plotting algorithm adaptively spaces the SVG control points (i.e., Bézier spline control points) along the path so that fine details of the plot are faithfully represented without wasting control points on the regions of low curvature. This allows the web browser's SVG rendering engine to render high-quality plots quickly. Without adaptive spacing, there is a significant trade-off between quality and rendering time.
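The idea of adaptive spacing can be sketched as follows. This is not the algorithm used by Pretty Parametric Plots, which places Bézier control points; it is a simpler stand-in that produces polyline vertices by recursive bisection, adding a point wherever the curve bends by more than a tolerance. The function name and the `max_angle`, `initial`, and `max_depth` parameters are assumptions made for the example.

```python
import math


def adaptive_points(f, t0, t1, max_angle=0.05, initial=8, max_depth=12):
    """Sample the parametric curve f(t) -> (x, y) on [t0, t1], refining an
    interval whenever its two half-chords differ in direction by more
    than max_angle radians. Returns a list of (t, (x, y)) pairs."""
    def bend(a, m, b):
        # Unsigned angle between chord a->m and chord m->b.
        ux, uy = m[0] - a[0], m[1] - a[1]
        vx, vy = b[0] - m[0], b[1] - m[1]
        return abs(math.atan2(ux * vy - uy * vx, ux * vx + uy * vy))

    def refine(ta, pa, tb, pb, depth):
        tm = 0.5 * (ta + tb)
        pm = f(tm)
        if depth >= max_depth or bend(pa, pm, pb) <= max_angle:
            return [(tb, pb)]  # flat enough: keep only the end point
        return (refine(ta, pa, tm, pm, depth + 1) +
                refine(tm, pm, tb, pb, depth + 1))

    # A coarse uniform pass first, so symmetric curves are not missed.
    ts = [t0 + (t1 - t0) * i / initial for i in range(initial + 1)]
    points = [(ts[0], f(ts[0]))]
    for ta, tb in zip(ts, ts[1:]):
        points += refine(ta, points[-1][1], tb, f(tb), 0)
    return points


line = adaptive_points(lambda t: (t, 2.0 * t), 0.0, 1.0)
circle = adaptive_points(lambda t: (math.cos(t), math.sin(t)), 0.0, 2.0 * math.pi)
print(len(line), len(circle))  # the straight line needs far fewer points
```

A straight segment keeps only the coarse initial samples, while the circle is subdivided until each piece turns by less than the tolerance, so the point budget goes where the curvature is.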

The underlying algorithm could easily be used in a general-purpose plotting library, but I have not yet done so. A more detailed description can be found here.

Motivation

While playing with the parametric plotter plugin that comes with Inkscape, I stumbled upon an interesting class of functions whose plots were especially visually pleasing. The quality and performance of the Inkscape plotter (and other plotters) were disappointing, so I wrote a C++ program capable of generating much higher-quality plots and saving them as SVG images. A few days later, I decided to learn JavaScript, and I ported the C++ code to JavaScript as a learning exercise. In the process, I learned to use jQuery and a few other JavaScript libraries. I also became familiar with the DOM and CSS3.

Click the image to view this plot Live! (Internet Explorer is not supported)

GSnap

Dates

December, 2011 – September, 2013

Type

Ph.D. research tool (and for fun)

Languages

C++11

Libraries & Frameworks

  • Qt Framework
  • OpenMP

Summary

GSnap is a tool for analyzing, viewing, and manipulating snapshots from galaxy simulations. It was initially written to measure the velocity dispersion of particles in galaxy simulations, but it is now also useful for interactively rotating and zooming snapshots, measuring distances between objects (and sizes of objects), interpolating between snapshots, editing snapshots, and creating high-quality visualizations of the stars and gas in galaxy simulation snapshots. In addition to a GUI, GSnap offers a powerful command line interface, which allows the user to operate the program from a script, and a built-in ECMAScript interpreter, which allows the user to extend GSnap's functionality.

Currently, only very specialized GADGET-2 and GADGET-3 type 1 snapshots are supported; however, adding a new file format is a fairly straightforward task. No significant work has been done on the project since September of 2013, but I intend to officially announce the code to the astronomy community within the next year so that other people can begin to contribute. For more information, visit GSnap's web page. My most recent research blog post about GSnap and the blog post about the interpolation scheme are also helpful.

Refer to the paper, Stellar Velocity Dispersion in Dissipative Galaxy Mergers with Star Formation, for examples of the types of analysis that can be performed with GSnap.
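As a concrete illustration of the simplest such analysis, the sketch below computes a mass-weighted line-of-sight velocity dispersion for a set of particles. It is a textbook formula written out in plain Python, not GSnap's implementation (which is C++ and considerably more elaborate); the function name and the sample particles are made up.

```python
import math


def los_velocity_dispersion(masses, velocities, direction):
    """Mass-weighted standard deviation of the velocity component along
    the viewing direction. velocities is a list of (vx, vy, vz)."""
    norm = math.sqrt(sum(c * c for c in direction))
    unit = [c / norm for c in direction]
    # Project every particle velocity onto the line of sight.
    v_los = [sum(vc * uc for vc, uc in zip(v, unit)) for v in velocities]
    total_mass = sum(masses)
    mean = sum(m * v for m, v in zip(masses, v_los)) / total_mass
    variance = sum(m * (v - mean) ** 2
                   for m, v in zip(masses, v_los)) / total_mass
    return math.sqrt(variance)


# Two equal-mass particles moving in opposite directions along x.
sigma_x = los_velocity_dispersion([1.0, 1.0],
                                  [(10.0, 0.0, 0.0), (-10.0, 0.0, 0.0)],
                                  direction=(2.0, 0.0, 0.0))
print(sigma_x)
```

Viewed along x, the two particles give a dispersion of 10 in whatever velocity units the snapshot uses; viewed along y, the same particles give zero, which is why the measured dispersion depends on the viewing direction.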

Motivation

I began working on GSnap because I needed to efficiently analyze thousands of snapshots from galaxy merger simulations, as part of my Ph.D. research. No existing software was able to perform the types of analysis that I needed, so I developed the code myself. I continued adding features to the software for fun after its minimal functionality had been implemented. Most notably, the volume rendering and snapshot interpolation features were irrelevant to my actual research.

A visualization of a simulation snapshot.

  • The GUI

A plot showing how stellar velocity dispersion varies with stellar age after a galactic merger. The merger leaves an imprint on the dynamics of the remnant galaxy. For more info, see the paper.

PNG Tagger

Dates

Primarily February, 2013 – March, 2013

Type

Personal productivity

Languages

C++11

Libraries & Frameworks

  • Qt Framework

Summary

PNG Tagger is a photo organization tool that allows Facebook-like tags, descriptions, and date information to be stored in the metadata of a PNG image. This allows people to share photos while retaining the tag information, without the need to update a separate database. Furthermore, the photos are viewed and tagged offline, so an Internet connection is not needed. Private photos can remain private because they never need to be uploaded to a public server.

In total, less than two full weeks of work have gone into PNG Tagger, but I am interested in adding a few features eventually. For instance, the program will eventually make it easy to filter photos by date range and search for keywords, specific people, and locations. The only way to do this at the moment is to use a program like pngmeta along with grep to do the search from the command line.
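Until such a search feature exists, the pngmeta-plus-grep workaround can also be approximated in a few lines of Python. The sketch assumes the tags live in standard PNG tEXt chunks, which the description above does not actually confirm; the "Description" keyword and the sample text are illustrative only.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_text_chunks(data):
    """Return {keyword: text} for every tEXt chunk in the bytes of a PNG file."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    texts, pos = {}, 8
    while pos + 8 <= len(data):
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            keyword, _, text = body.partition(b"\x00")
            texts[keyword.decode("latin-1")] = text.decode("latin-1")
        if ctype == b"IEND":
            break
        pos += 12 + length  # 4 length + 4 type + data + 4 CRC
    return texts


def photo_matches(data, term):
    """Case-insensitive search of all tEXt values, like grep -i."""
    term = term.lower()
    return any(term in text.lower() for text in read_text_chunks(data).values())


def make_chunk(ctype, body):
    crc = zlib.crc32(ctype + body) & 0xFFFFFFFF
    return struct.pack(">I", len(body)) + ctype + body + struct.pack(">I", crc)


# A made-up chunk stream without pixel data: enough to exercise the
# parser, but not a viewable image.
sample = (PNG_SIGNATURE
          + make_chunk(b"IHDR", struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0))
          + make_chunk(b"tEXt", b"Description\x00Grandma at the lake, 1962")
          + make_chunk(b"IEND", b""))
print(read_text_chunks(sample), photo_matches(sample, "LAKE"))
```

To search a folder of real photos, the same `photo_matches` function could be applied to the bytes of each file read with `open(path, "rb")`.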

Motivation

While I was visiting my family in Virginia during February of 2013, we scanned many family photos. We have thousands of old family photos, and we often have to look at the backs of photos to see who appears in them, as well as when and where they were taken. I wanted an easy way to find all of that information while looking at the scanned images, so I began working on PNG Tagger during my vacation.

Direct N-Body

Dates

June, 2010 – August, 2010

Type

Ph.D. research

Languages

C++03

Libraries & Frameworks

  • Boost
  • Magick++
  • OpenMP

Summary

As part of my dissertation research, I wrote an N-body code for studying the dynamics of galaxy mergers. The code performed the following tasks:

  • Constructed dynamically stable model galaxies.
  • Placed model galaxies on a user-defined collision course.
  • Evolved the system of particles forward in time, using an adaptive time-step leapfrog integrator.
  • Computed mass-weighted statistics on the stellar dynamics of the system at fixed time intervals. Optionally, a toy model for dust attenuation could be used to perform flux-weighted statistics.
  • Generated simple visualizations of the merger from various directions while the statistics were being computed.
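The time-integration step in the list above can be sketched for a much smaller problem: one particle orbiting a fixed point mass in two dimensions. The kick-drift-kick leapfrog scheme is standard, but the step-size rule used here (a fraction `eta` of the local dynamical time, sqrt(r/|a|)) is an assumption chosen for the example, not necessarily the criterion used in the original code.

```python
import math


def acceleration(pos, gm=1.0):
    """Acceleration due to a point mass at the origin (units with GM = 1)."""
    r = math.hypot(pos[0], pos[1])
    return [-gm * pos[0] / r ** 3, -gm * pos[1] / r ** 3]


def leapfrog_adaptive(pos, vel, t_end, eta=0.01):
    """Kick-drift-kick leapfrog with a time step that shrinks where the
    acceleration is strong. Returns the final position and velocity."""
    pos, vel, t = list(pos), list(vel), 0.0
    acc = acceleration(pos)
    while t < t_end:
        r = math.hypot(pos[0], pos[1])
        a = math.hypot(acc[0], acc[1])
        dt = min(eta * math.sqrt(r / a), t_end - t)  # never overshoot t_end
        vel = [v + 0.5 * dt * ac for v, ac in zip(vel, acc)]  # half kick
        pos = [x + dt * v for x, v in zip(pos, vel)]          # drift
        acc = acceleration(pos)
        vel = [v + 0.5 * dt * ac for v, ac in zip(vel, acc)]  # half kick
        t += dt
    return pos, vel


def energy(pos, vel, gm=1.0):
    """Specific orbital energy: kinetic plus potential."""
    return 0.5 * (vel[0] ** 2 + vel[1] ** 2) - gm / math.hypot(pos[0], pos[1])


# An eccentric orbit started at its farthest point from the central mass.
start_pos, start_vel = (1.0, 0.0), (0.0, 0.8)
end_pos, end_vel = leapfrog_adaptive(start_pos, start_vel, t_end=10.0)
print(energy(start_pos, start_vel), energy(end_pos, end_vel))
```

Comparing the energy before and after the run is a quick sanity check on the integrator. Note that a step size that depends on the current state makes the scheme no longer strictly symplectic, so some slow energy drift is expected over long runs.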

The code is described in more detail in the resulting publication.

Motivation

When I began my Ph.D. research, I did not have access to any of the standard tools for building model galaxies, nor to the codes used for analyzing the output of the standard simulation codes. Additionally, I wanted a better understanding of how galaxies are modeled and how N-body simulations of galaxy mergers work, so I wrote my own tools from scratch. It was a great learning experience in both software engineering and astrophysics.

An example of a merger simulation performed by the code.

The actual density profile of a model galaxy (black) and the analytic Hernquist profile (red).

OpenConvection

Dates

January, 2008 – April, 2008

Type

Master's degree research

Languages

MATLAB

Libraries & Frameworks

Built-in MATLAB functionality

Summary

Given solar wind conditions and a magnetic field model, OpenConvection computes the electrical currents, potential distribution, particle drift velocities, and pressure in the inner magnetosphere and ionosphere. It is an implementation of the Rice Convection Algorithm that I generalized to include the hemispheric asymmetry caused by seasonal variation.

Motivation

The fact that Earth's axis of rotation is not perpendicular to its orbital plane causes the conductivity of the northern and southern hemispheres of the ionosphere to differ considerably throughout the year. The effect is similar to the seasonal variation in the average surface temperature. The conductivity changes because UV radiation from the Sun is responsible for ionizing the upper atmosphere; when there are more ions, the conductivity increases. The existing version of the Rice Convection Model did not account for this asymmetry and the authors were unwilling to share their source code, so I was forced to implement the algorithm myself, based on the published descriptions.

Magnetic field lines on the outer shell of the modeling region.

The electric potential during quiescent times (low cross-polar-cap potential). The equi-potentials roughly correspond to pure E × B drift paths. Thus, this plot hints at the well-known duskward bulge of the plasmasphere.
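The reason equipotentials trace drift paths is that the E × B drift velocity, v = E × B / |B|², is always perpendicular to E, and E points along the gradient of the potential. The short sketch below (plain Python rather than the MATLAB used in OpenConvection, with made-up field values) checks this numerically.

```python
def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def exb_drift(e_field, b_field):
    """Drift velocity v = E x B / |B|^2 (SI units: V/m and T give m/s)."""
    b_squared = sum(c * c for c in b_field)
    return tuple(c / b_squared for c in cross(e_field, b_field))


e_field = (1.0, 0.0, 0.0)  # illustrative values only
b_field = (0.0, 0.0, 2.0)
v = exb_drift(e_field, b_field)
v_dot_e = sum(vc * ec for vc, ec in zip(v, e_field))
print(v, v_dot_e)  # v has no component along E
```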

The Birkeland (i.e. magnetic field-aligned) current density in the northern hemisphere, mapped along field lines to the equatorial plane. Red indicates that the current is flowing parallel to the field; blue indicates anti-parallel flow. The position of Earth is indicated with a circle.

Nathaniel R. Stickley
nrs@nrstickley.com
626-269-9830