List of Modules and Sub-Packages¶
Note
Once the API documentation is fixed (by cleaning up the import * statements), we can add links to each module here.
core¶
Provides the core components of pyspecdata. Currently, this is a very large file that we will slowly break down into separate modules or packages.
The classes nddata, nddata_hdf, and ndshape, the function plot(), and the class fitdata are the core components of the N-Dimensional processing routines.
Start by familiarizing yourself with those.
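For example, a minimal sketch of the core objects in action (the one-dimensional nddata(data, dimension-name) constructor form and the top-level imports are assumptions based on recent versions of the package – check the API docs for exact signatures):

    import numpy as np
    from pyspecdata import nddata, ndshape, plot

    # build a 1D dataset with a named, labeled time axis
    t = np.r_[0:1:500j]
    d = nddata(np.exp(-t / 0.1), 't').labels('t', t)
    d.set_units('t', 's')  # units travel with the data
    print(ndshape(d))      # the shape, together with dimension names
    plot(d)                # axis labels and units come from the nddata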
The figlist is the base class for “Figure lists.”
Figure lists allow you to organize plots and text and to refer to plots by name, rather than number.
They are designed so that the same code can be used seamlessly from within ipython, jupyter, a python script, or a python environment within latex (JMF can also distribute latex code for this – a nice python-based installer is planned).
The user does not initialize the figlist class directly, but rather initializes figlist_var.
At the end of this file, there is a snippet of code that sets figlist_var to a choice that’s appropriate for the working environment (i.e., python, latex environment, etc.).
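In practice, a minimal sketch looks something like this (the one-dimensional nddata constructor form is an assumption; see the core classes above):

    import numpy as np
    from pyspecdata import figlist_var, nddata

    mydata = nddata(np.exp(-np.r_[0:1:100j] / 0.1), 't')
    fl = figlist_var()      # resolves to the class appropriate for the environment
    fl.next('decay curve')  # refer to the figure by name, not number
    fl.plot(mydata)
    fl.show()               # renders to the gui, latex, etc., as appropriate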
There are many helper and utility functions that need to be sorted and documented by JMF, and can be ignored for now.
These are somewhat wide-ranging in nature.
For example, box_muller() is a helper function (based on Numerical Recipes) used by nddata.add_noise(), while the h5 functions are helper functions for using pytables in a fashion that will hopefully be intuitive to those familiar with SQL, etc.
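For example, a minimal sketch of add_noise() (the single scale argument shown here is an assumption – check the API for the exact signature):

    import numpy as np
    from pyspecdata import nddata

    d = nddata(np.ones(100), 't')
    d.add_noise(0.1)  # Gaussian noise, generated internally via box_muller()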
figlist¶
Contains the figure list class
The figure list gives us several things:
- Automatic handling of the display and scaling of nddata units.
- The ability to refer to plots by name, rather than number (matplotlib has a mechanism for this, which we ignore).
- A “basename” allowing us to generate multiple sets of plots for different datasets – e.g. 5 plots with 5 names plotted for 3 different datasets and labeled by 3 different basenames to give 15 plots total (see the sketch after this list).
- The ability to run the same code from the command line or from within a python environment inside latex.
  - This is achieved by choosing figlist (default gui) and figlistl (inherits from figlist – renders to latex – the figlist.show() method is changed).
  - Potential planned future ability to handle html.
- The ability to handle mayavi plots and matplotlib plots (switch to glumpy, etc.?).
  - Potential planned future ability to handle gnuplot.
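A sketch of the basename mechanism mentioned above (load_and_process is a hypothetical stand-in for whatever routine produces your nddata):

    from pyspecdata import figlist_var

    fl = figlist_var()
    for sample in ['sample_A', 'sample_B', 'sample_C']:
        fl.basename = sample          # prefixes every plot name that follows
        d = load_and_process(sample)  # hypothetical loading routine
        fl.next('raw data')           # yields "sample_A raw data", etc.
        fl.plot(d)
    fl.show()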
Todo
Currently the “items” that the list tracks correspond to either plot formatting directives (see figlist.setprops()), text, or figures.
We should scrap most elements of the current implementation of figlist and rebuild it:
- Currently the figlist is set up to use a context block. We will not only keep this, but also make it so that the individual axes use context blocks. The syntax (following fl = figlist_var()) should look like this: with fl['my plot name'] as p: – and the contents of the block would then be p.plot(...), etc.
- Define an “organization” function of the figlist block. This allows us to use standard matplotlib commands (twinx, subplot, etc.) to set up and organize the axes.
- figlist will still have a “next” function, but its purpose will be simply to:
  - grab the current axis using matplotlib gca() (assuming the id of the axis isn’t yet assigned to an existing figlist_axis – see below)
  - otherwise, if the name argument to “next” has not yet been called, call matplotlib’s figure(), followed by subplot(111), then do the previous bullet point
  - (the next function is only intended to be called explicitly from within the organization function)
- figlist will consist simply of a list of figlist_axis objects (a new object type), which have the following attributes:
  - type – indicating the type of object:
    - axis (default)
    - text (raw latex (or html))
    - H1 (first-level header – translates to latex section)
    - H2 (second-level…)
  - the name of the plot
  - a matplotlib or mayavi axes object
  - the units associated with the axes
  - a collections.OrderedDict giving the nddata that are associated with the plot, by name.
    - If these do not have a name, they will be automatically assigned one.
    - The name should be used by the new “plot” method to generate the “label” for the legend, and can be subsequently used to quickly replace data – e.g. in a Qt application.
  - a dictionary giving any arguments to the pyspecdata.core.plot (or contour, waterfall, etc.) function
  - the title – by default the name of the plot – can be a setter
  - the result of the id(…) function, called on the axes object –> this can be used to determine if the axes has been used yet
  - do not use check_units – the plot method (or contour, waterfall, etc.) will only add the nddata objects to the OrderedDict, add the arguments to the argument dictionary, then exit.
    - In the event that more than one plot method is called, the name of the underlying nddata should be changed.
  - a boolean legend_suppress attribute
  - a boolean legend_internal attribute (to place the legend internally, rather than outside the axis)
  - a show method that is called by the figlistl show method. This will determine the appropriate units and use them to set the units and scale of the axes, and then go through and call pyspecdata.core.plot on each dataset (in matplotlib, this should be done with a formatting statement rather than by manipulating the axes themselves) and finally call autolegend, unless the legend is suppressed.
- The “plottype” (currently an argument to the plot function) should be an attribute of the axis object.
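To make the proposal concrete, here is a sketch of the planned (not yet implemented) context-block syntax described above – every call on p is hypothetical:

    import numpy as np
    from pyspecdata import figlist_var, nddata

    mydata = nddata(np.ones(10), 't')  # any nddata instance
    fl = figlist_var()
    with fl['my plot name'] as p:  # planned: a named axis as a context block
        p.plot(mydata)             # planned: registers the data + plot arguments only
    fl.show()                      # planned: units, scaling, legends resolved here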
general_functions¶
These are general functions that need to be accessible to everything inside pyspecdata.core. I can’t just put these inside pyspecdata.core, because that would lead to cyclic imports – and then, e.g., submodules of pyspecdata couldn’t find them.
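For example, a sketch of the layering (strm is one such general-purpose helper; check the module for the full list):

    # inside pyspecdata/core.py (or any submodule of pyspecdata):
    from .general_functions import strm  # general_functions imports nothing from
                                         # the rest of the package, so no cycle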
datadir¶
Allows the user to run the same code on different machines, even though the location of the raw spectral data might change.
This is controlled by the ~/.pyspecdata or ~/_pyspecdata config file.
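For illustration, such a config file might look roughly like this (the paths are invented, and the exact section/key names may differ – the file is normally written for you, e.g. by pyspecdata_register_dir or by the automatic rclone search described under find_file(), rather than by hand):

    [General]
    data_directory = /home/me/exp_data

    [ExpTypes]
    nmr_data = /home/me/exp_data/NMR_Data

    [RcloneRemotes]
    ; entries added automatically when a remote search succeeds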
load_files¶
This subpackage holds all the routines for reading raw data in proprietary
formats.
It’s intended to be accessed entirely through the function find_file(), which uses datadir to search for the filename, then automatically identifies the file type and calls the appropriate module to load the data into an nddata.
Currently, Bruker file formats (both ESR and NMR) are supported, as well as (at least some earlier iterations of) Magritek file formats.
Users/developers are very strongly encouraged to add support for new file types.
- pyspecdata.load_files.find_file(searchstring, exp_type=None, postproc=None, print_result=True, verbose=False, prefilter=None, expno=None, dimname='', return_acq=False, add_sizes=[], add_dims=[], use_sweep=None, indirect_dimlabels=None, lookup={}, return_list=False, **kwargs)¶
Find the file given by the regular expression searchstring inside the directory identified by exp_type, load the nddata object, and postprocess with the function postproc.
Used to find data in a way that works seamlessly across different computers (and operating systems). The basic scheme we assume is that:
- Laboratory data is stored on the cloud (on something like Microsoft Teams or Google Drive, etc.).
- The user wants to seamlessly access the data on their laptop.
The .pyspecdata config file stores all the info about where the data lives + is stored locally. You have basically two options:
- Point the source directories for the different data folders (exp_type) to a synced folder on your laptop.
- Recommended: Point the source directories to a local directory on your computer, where local copies of files are stored, and then also set up one or more remotes using rclone (which is an open source cloud access tool).
  - pyspecdata can automatically search all your rclone remotes when you try to load a file. This is obviously slow.
  - After the auto-search, it adds a line to .pyspecdata so that it knows how to find that directory in the future.
  - It will tell you when it’s searching the remotes. If you know what you’re doing, we highly recommend pressing ctrl-C and then manually adding the appropriate line to RcloneRemotes. (Once you allow it to auto-search and add a line once, the format should be obvious.)
Supports the case where data is processed both on a laboratory computer and (e.g. after transferring via ssh or a syncing client) on a user’s laptop. While it will return a default directory without any arguments, it is typically used with the keyword argument exp_type, described below.
It looks at the top level of the directory first, and if that fails, starts to look recursively. Whenever it finds a file in the current directory, it will not return data from files in the directories underneath. (For a more thorough description, see getDATADIR().) Note that all loaded files will be logged in the data_files.log file in the directory that you run your python scripts from (so that you can make sure they are properly synced to the cloud, etc.).
It calls load_indiv_file(), which finds the specific routine from inside one of the modules (sub-packages) associated with a particular file-type. If it can’t find any files matching the criterion, it logs the missing file and throws an exception.
- Parameters:
searchstring (str) –
If you don’t know what a regular expression is, you probably want to wrap your filename with re.escape, like this: re.escape(filename), and use that for your searchstring. (You will have to import the re module.)
If you know what a regular expression is, pass one here, and it will find any filenames that match.
exp_type (str) – Gives the name of a directory, known to be pyspecdata, that contains the file of interest. For a directory to be known to pyspecdata, it must be registered with the (terminal/shell/command prompt) command pyspecdata_register_dir or in a directory contained inside (underneath) such a directory.
expno (int) – For Bruker NMR and Prospa files, where the files are stored in numbered subdirectories, give the number of the subdirectory that you want. Currently, this parameter is needed to load Bruker and Kea files. If it finds multiple files that match the regular expression, it will try to load this experiment number from all the directories.
postproc (function, str, or None) –
This function is fed the nddata data and the remaining keyword arguments (kwargs) as arguments. It’s assumed that each module for each different file type provides a dictionary called postproc_lookup (some are already available in pySpecData; see also the lookup argument, below).
If postproc is a string, it looks up the string inside the postproc_lookup dictionary that’s appropriate for the file type.
If postproc is None, it checks to see if any of the loading functions that were called set the postproc_type property – i.e. it checks the value of data.get_prop('postproc_type') – and if this is set, it uses it as a key to pull the corresponding value from postproc_lookup. For example, if this is a bruker file, it sets postproc to the name of the pulse sequence. For instance, when the acert module loads an ACERT HDF5 file, it sets postproc_type to the value of (h5 root).experiment.description['class']. This, in turn, is used to choose the type of post-processing.
dimname, return_acq, add_sizes, add_dims, use_sweep, indirect_dimlabels – each of these is passed through to load_indiv_file().
lookup (dict of str: function pairs) – types of postprocessing to add to the postproc_lookup dictionary.
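As a usage sketch (the filename and exp_type are hypothetical – exp_type must name a directory registered via pyspecdata_register_dir – and the top-level import assumes find_file is re-exported by the package, per the import * note above):

    import re
    from pyspecdata import find_file

    # search for a literal filename: escape it so it's treated as plain
    # text rather than as a regular expression
    d = find_file(re.escape('240110_my_sample'), exp_type='NMR_Data', expno=1)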
fornotebook¶
This provides figlistl, the Latex figure list.
Any other functions here are helper functions for the class.
figlistl is generally not chosen manually; rather, figlist_var will be assigned to figlistl when python code is embedded in a python environment inside latex.
latexscripts¶
Provides the pdflatex_notebook_wrapper shell/dos command, which you run instead of your normal Latex command to build a lab notebook.
The results of python environments are cached and only re-run if the code changes,
even if the python environments are moved around.
This makes the compilation of a Latex lab notebook extremely efficient.
ipy¶
Provides the jupyter extension:
%load_ext pyspecdata.ipy
which allows for fancy representation of nddata instances – i.e. you can type the name of an instance and hit shift-Enter, and a plot will appear rather than some text representation.
Also overrides plain text representation of numpy arrays with latex representation that we build ourselves or pull from sympy.
Also known as “generalized jupyter awesomeness” in only ~150 lines of code!
See the O’Reilly blog post (https://www.safaribooksonline.com/blog/2014/02/11/altering-display-existing-classes-ipython/) for minimal guidance if you’re interested.
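In a jupyter notebook, usage looks something like this (a sketch – the one-dimensional nddata constructor form is an assumption):

    %load_ext pyspecdata.ipy
    import numpy as np
    from pyspecdata import nddata

    d = nddata(np.sin(np.r_[0:2 * np.pi:100j]), 't')
    d  # evaluating the instance displays a plot, not a text repr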
ndshape¶
The ndshape class allows you to allocate arrays and determine the shape of existing arrays.
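A quick sketch of both uses (the constructor forms shown here are assumptions – check the API docs for exact signatures):

    import numpy as np
    from pyspecdata import nddata, ndshape

    d = nddata(np.zeros((3, 4)), ['x', 'y'])
    print(ndshape(d))  # the shape of an existing array, with dimension names

    new_data = ndshape([3, 4], ['x', 'y']).alloc()  # allocate a fresh nddata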
units¶
Not yet implemented – a preliminary idea for how to handle actual unit conversion. (Currently, we only convert s to Hz during FT and apply order-of-magnitude prefixes when plotting.)