Note
Go to the end to download the full example code
Time-Domain Noise¶
Here, we want to calculate the time-domain variance to use in error propagation. But, to make sure we calculate only noise, we want to mask out portions of the frequency domain. We propose that if we use a unitary transform, Parseval’s theorem tells us we can calculate \(\sigma_t^2\) (time domain variance) directly from the frequency-domain variance, i.e. \(\sigma_ u^2=\sigma_t^2\). To confirm this, we construct a “spectrum” of pure noise and generate a frequency-masked noise, and show that is the same as the unmasked time-domain.

If we apply just FT as we normally would the std in the frequency domain is: array([0.03133493, 0.0310249 , 0.03087452, 0.03153816, 0.03131391,
0.03096058, 0.03160122, 0.03117801, 0.03126264, 0.03076478,
0.03088133, 0.03103172, 0.03112032, 0.03136118, 0.03158249,
0.03082116, 0.03087223, 0.0311701 , 0.0313617 , 0.03084312,
0.03140348, 0.03140812, 0.03147682, 0.0314111 , 0.03162665,
0.03113172, 0.03115235, 0.0311152 , 0.03142608, 0.03146255,
0.03131619, 0.03134668, 0.03131779, 0.03099557, 0.03135103,
0.03127095, 0.03112738, 0.03150568, 0.03130068, 0.03121569,
0.03128878, 0.03144053, 0.03136608, 0.03161975, 0.03089907,
0.03176012, 0.03131556, 0.03136479, 0.03108606, 0.03149692])
+/-None
dimlabels=['repeats']
axes={`repeats':array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
+/-None}
These values are NOT the same so we need a unitary FT for this to work
When we apply a unitary FT the std over all the frequency domain is: array([1.00173849, 0.99182734, 0.98701987, 1.00823557, 1.00106652,
0.989771 , 1.01025153, 0.99672207, 0.99942747, 0.98351153,
0.98723756, 0.99204514, 0.99487786, 1.00257757, 1.00965265,
0.98531386, 0.98694674, 0.99646927, 1.00259427, 0.98601599,
1.00393005, 1.00407837, 1.00627466, 1.00417353, 1.01106454,
0.99524206, 0.9959016 , 0.99471398, 1.00465262, 1.0058185 ,
1.00113949, 1.00211427, 1.00119074, 0.9908895 , 1.00225328,
0.99969313, 0.99510333, 1.00719706, 1.00064366, 0.99792673,
1.00026323, 1.00511447, 1.00273429, 1.01084399, 0.98780474,
1.01533137, 1.00111916, 1.00269298, 0.99378239, 1.00691723])
+/-None
dimlabels=['repeats']
axes={`repeats':array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
+/-None}
Because we have no signal, this again corresponds to our noise.
1: show the mask in white |||(None, None)
/home/jmfranck/base/lib/python3.11/site-packages/numpy/core/fromnumeric.py:3787: RuntimeWarning: Degrees of freedom <= 0 for slice
return _methods._var(a, axis=axis, dtype=dtype, out=out, ddof=ddof,
/home/jmfranck/base/lib/python3.11/site-packages/numpy/core/_methods.py:163: RuntimeWarning: invalid value encountered in divide
arrmean = um.true_divide(arrmean, div, out=arrmean,
/home/jmfranck/base/lib/python3.11/site-packages/numpy/core/_methods.py:198: RuntimeWarning: invalid value encountered in scalar divide
ret = ret.dtype.type(ret / rcount)
The std when using the mask on unitary data is: array([0.98993434, 1.00085091, 0.98313161, 1.00818705, 0.99758744,
0.98960387, 1.00969641, 0.99388606, 0.99319969, 0.98895011,
0.99828855, 0.98790134, 0.99655692, 1.00998001, 1.00569159,
0.97099462, 0.99955595, 1.00269788, 1.0102701 , 0.98713458,
1.00421123, 1.00629517, 1.00671573, 1.01306883, 1.0086519 ,
0.99513845, 1.00149426, 1.00092035, 1.01551376, 1.00895013,
0.99727754, 0.99891376, 0.99659213, 0.98923538, 0.9984164 ,
0.9980498 , 0.99789894, 1.00615097, 1.00834723, 0.99443605,
1.01354014, 1.02519006, 1.00213205, 1.01778102, 0.98497144,
1.00780609, 0.98811374, 0.99560624, 0.99601896, 1.00558236])
+/-None
dimlabels=['repeats']
axes={`repeats':array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
+/-None}
Because we can use the mask in the DCCT domain to exclude signal, that is the number we will want, in general.
However, here we know that all of our data is noise, and so we should make sure that this matches the naive, direct time-domain calculation.
If it does, all the following numbers will be about 1.0:
array([0.98775915, 1.00947173, 0.99606845, 0.99935323, 0.9966833 ,
0.9999185 , 0.99950514, 0.99657521, 0.99319529, 1.00530952,
1.01121127, 0.99539303, 1.00205525, 1.0075366 , 0.99588587,
0.98549159, 1.0123384 , 1.00606177, 1.00747714, 1.000892 ,
1.00004356, 1.00219001, 1.00034167, 1.00916587, 0.99764565,
0.99975116, 1.00522464, 1.00620426, 1.01129103, 1.00285814,
0.99662417, 0.99671768, 0.99562997, 0.99826647, 0.99611734,
0.99848401, 1.00280985, 0.99922003, 1.00829592, 0.99655859,
1.01348139, 1.01963677, 0.99939527, 1.00772439, 0.99724153,
0.99259741, 0.98700995, 0.9928987 , 1.00186831, 0.99861617])
+/-None
dimlabels=['repeats']
axes={`repeats':array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33,
34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])
+/-None}
import numpy as np
from numpy import r_
from pyspecdata import *
from pyspecProcScripts import *
N = 1024
n_repeats = 50
signal_window = (-100, 200) # wherever my "peak" shows up
# {{{ we know how to write a masked mean or std only along 1 dimension, so
# use numpy apply_along_axis to make it a function that works along 1
# dimension of multidimensional data
def masked_mean_multi(x, axis=None):
"Calculates the mean of nan-masked data on a 1D axis"
assert axis is not None
def masked_mean(x):
"this only works for 1D data"
return np.mean(x[np.isfinite(x)])
return np.apply_along_axis(masked_mean, axis, x)
def masked_var_multi(x, axis=None, var_has_imag=True):
"calculates the variance of nan-masked data along a 1D axis"
assert axis is not None
def masked_var(x):
"this only works for 1D data"
if var_has_imag: # take average of variance along real and image
return np.var(x[np.isfinite(x)], ddof=1) / 2
else:
return np.var(x[np.isfinite(x)], ddof=1)
return np.apply_along_axis(masked_var, axis, x)
# }}}
# {{ {generate data with just noise with a phase cycling dimension and repeats dimension
signal_pathway = {"ph": 1}
example_data = nddata(
np.random.normal(size=4 * n_repeats * N)
+ 1j * np.random.normal(size=4 * n_repeats * N),
[4, n_repeats, N],
["ph", "repeats", "t"],
)
example_data.setaxis("ph", r_[0:4] / 4)
example_data.setaxis("repeats", r_[0:n_repeats])
example_data.setaxis("t", r_[0 : 1 : 1j * N])
# }}}
# calculate the variance directly in the time domain.
# Because the data has no signal, know that this actually corresponds to the noise level:
direct_t_dom_std = sqrt(example_data.C.run(np.var, "t").mean("ph") / 2)
# the way that we do FT is parseval preserved?
temp = example_data.C.ft("t", shift=True)
print(
"If we apply just FT as we normally would the std in the frequency domain is:",
sqrt(temp.run(np.var, "t").mean("ph") / 2),
)
# it's not! I need to use a unitary FT for this to work
print("These values are NOT the same so we need a unitary FT for this to work")
example_data.ft("t", shift=True, unitary=True)
example_data.ft("ph", unitary=True)
freq_dom_std = sqrt(example_data.C.run(np.var, "t").mean("ph") / 2)
print(
"When we apply a unitary FT the std over all the frequency domain is:",
freq_dom_std,
)
print("Because we have no signal, this again corresponds to our noise.")
# now, I can just calculate the "time domain" noise variance in the
# frequency domain, where it's easier to mask out regions of the coherence
# domain where I expect there is signal (or phase cycling noise)
# {{{ I'm doing a mildly odd thing where I'm using "nan" to identify signal I
# want to exclude from the variance calculation -- i.e. to mask it. This
# is assuming that I have signal that I'm not interested in including in
# the calculation.
temp = select_pathway(example_data, signal_pathway)
temp.data[:] = nan # note how I am NOT acting on a copy -- I am trying to
# manipulate the data at its original memory position!
# for the most complicated case I'll also say I want to exclude phase cycling
# noise -- so also exclude everything from the signal bandwidth
# this will give a conservative (small) estimate of the noise
temp = example_data["t":signal_window]
temp.data[:] = nan
# }}}
with figlist_var(black=True) as fl:
fl.next("show the mask in white")
forplot = example_data.C
# in pyspecdata, nan shows up as the opposite (black vs. white) color vs. 0
fl.image(forplot)
# {{{ Calculate the variance using new functions
# now, I can do this:
example_data.run(masked_var_multi, "t")
example_data.run(masked_mean_multi, "ph")
example_data.run(
lambda x: sqrt(x)
) # convert variance to std for subsequent comparison
print("The std when using the mask on unitary data is:", example_data)
print(
"Because we can use the mask in the DCCT domain to exclude signal, that is the number we will want, in general."
)
print(
"However, here we know that all of our data is noise, and so we should make sure that this matches the naive, direct time-domain calculation."
)
print("If it does, all the following numbers will be about 1.0:")
print(example_data / direct_t_dom_std)
# }}}
Total running time of the script: (0 minutes 1.428 seconds)