API Reference ¶

This reference provides detailed documentation for all modules, classes, and methods in the current release of Neurolearn.
`nltools.data` : Data Types ¶

class nltools.data.Brain_Data(data=None, Y=None, X=None, mask=None, **kwargs)[source]¶
Brain_Data is a class to represent neuroimaging data in Python as a vector rather than a 3-dimensional matrix. This makes it easier to perform data manipulation and analyses.
Parameters :
data – nibabel data instance or list of files
Y – Pandas DataFrame of training labels
X – Pandas DataFrame Design Matrix for running univariate models
mask – binary nifti file to mask brain data
**kwargs – Additional keyword arguments to pass to the prediction algorithm
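As a rough illustration of what "a vector rather than a 3-dimensional matrix" means, the sketch below flattens a tiny volume into a 1-D list of in-mask voxels and then restores it. It uses plain Python lists rather than nltools or nibabel, and `flatten_with_mask` and `unflatten_with_mask` are hypothetical helper names, not part of the library.

```python
# Hypothetical helpers for illustration only; not part of nltools.
def flatten_with_mask(volume, mask):
    """Return the in-mask voxels of a 3-D nested list as a flat vector."""
    return [
        volume[i][j][k]
        for i in range(len(mask))
        for j in range(len(mask[i]))
        for k in range(len(mask[i][j]))
        if mask[i][j][k]
    ]


def unflatten_with_mask(vector, mask, fill=0.0):
    """Put vector values back at in-mask positions; use `fill` elsewhere."""
    values = iter(vector)
    return [
        [[next(values) if m else fill for m in row] for row in plane]
        for plane in mask
    ]


volume = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]]
mask = [[[1, 0], [1, 1]], [[0, 0], [1, 0]]]

vector = flatten_with_mask(volume, mask)      # [1.0, 3.0, 4.0, 7.0]
restored = unflatten_with_mask(vector, mask)  # out-of-mask voxels become 0.0
```

Working on the flat vector means that stacking several images gives an ordinary images-by-voxels table, which is the shape most statistical routines expect.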
align(target, method='procrustes', axis=0, *args, **kwargs)[source]¶
Align Brain_Data instance to target object using functional alignment
Alignment type can be hyperalignment or Shared Response Model. When using hyperalignment, target image can be another subject or an already estimated common model. When using SRM, target must be a previously estimated common model stored as a numpy array. Transformed data can be back projected to original data using the transformation matrix.
See nltools.stats.align for aligning multiple Brain_Data instances
Parameters :
target – (Brain_Data) object to align to.
method – (str) alignment method to use [‘probabilistic_srm’,’deterministic_srm’,’procrustes’]
axis – (int) axis to align on
Returns :
(dict) a dictionary containing transformed object,
transformation matrix, and the shared response matrix
data – (Brain_Data) Brain_Data instance to append
kwargs – optional inputs to Design_Matrix append
Returns :
(Brain_Data) new appended Brain_Data instance
Return type :
apply_mask(mask, resample_mask_to_brain=False)[source]¶
Mask Brain_Data instance
Note target data will be resampled into the same space as the mask. If you would like the mask resampled into the Brain_Data space, then set resample_mask_to_brain=True.
Parameters :
mask – (Brain_Data or nifti object) mask to apply to Brain_Data object.
resample_mask_to_brain – (bool) Will resample mask to brain space before applying mask (default=False).
Returns :
(Brain_Data) masked Brain_Data object
Return type :
masked
bootstrap(function, n_samples=5000, save_weights=False, n_jobs=-1, random_state=None, *args, **kwargs)[source]¶
Bootstrap a Brain_Data method.
Parameters :
function – (str) method to apply to data for each bootstrap
n_samples – (int) number of samples to bootstrap with replacement
save_weights – (bool) Save each bootstrap iteration (useful for aggregating many bootstraps on a cluster)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
Returns :
summarized studentized bootstrap output
Return type :
output
Examples
>>>  b = dat.bootstrap('mean', n_samples=5000)
>>>  b = dat.bootstrap('predict', n_samples=5000, algorithm='ridge')
>>>  b = dat.bootstrap('predict', n_samples=5000, save_weights=True)
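To make "bootstrap with replacement" concrete, here is a minimal pure-Python sketch of the idea applied to a mean. It does not call nltools. The function name `bootstrap_mean` and the summary `Z = mean / std` are assumptions chosen for illustration, and the library's own summary may differ.

```python
import random
import statistics


def bootstrap_mean(samples, n_samples=5000, random_state=None):
    """Resample observations with replacement and summarise the mean."""
    rng = random.Random(random_state)
    n = len(samples)
    boot_means = []
    for _ in range(n_samples):
        # Draw n observations with replacement from the original data.
        resampled = [samples[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(resampled))
    mean = statistics.fmean(boot_means)
    std = statistics.stdev(boot_means)
    # Assumed summary: bootstrap mean divided by bootstrap standard deviation.
    return {"mean": mean, "std": std, "Z": mean / std, "samples": boot_means}


observations = [2.1, 1.8, 2.5, 2.2, 1.9, 2.4, 2.0, 2.3]
result = bootstrap_mean(observations, n_samples=1000, random_state=0)
```

Keeping the per-iteration values (here `result["samples"]`) corresponds to the purpose of `save_weights=True`: results from separate runs can be pooled before summarising.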
decompose(algorithm='pca', axis='voxels', n_components=None, *args, **kwargs)[source]¶
Decompose Brain_Data object
Parameters:
algorithm – (str) Algorithm to perform decomposition
types=[‘pca’,’ica’,’nnmf’,’fa’,’dictionary’,’kernelpca’]
axis – dimension to decompose [‘voxels’,’images’]
n_components – (int) number of components. If None then retain
as many as possible.
Returns:
a dictionary of decomposition parameters
Return type:
output
distance(metric='euclidean', **kwargs)[source]¶
Calculate distance between images within a Brain_Data() instance.
Parameters:
metric – (str) type of distance metric (can use any scikit-learn or
scipy metric)
Returns:
(Adjacency) Outputs a 2D distance matrix.
Return type:
extract_roi(mask, metric='mean', n_components=None)[source]¶
Extract activity from mask
Parameters:
mask – (nifti) nibabel mask can be binary or numbered for
different rois
metric – type of extraction method [‘mean’, ‘median’, ‘pca’], (default=mean)
NOTE: Only mean currently works!
n_components – if metric=’pca’, number of components to return (takes any input into sklearn.decomposition.PCA)
Returns:
mean within each ROI across images
Return type:
filter(sampling_freq=None, high_pass=None, low_pass=None, **kwargs)[source]¶
Apply 5th order Butterworth filter to data. Wraps nilearn
functionality. Does not default to detrending and standardizing like
nilearn implementation, but this can be overridden using kwargs.
Parameters:
sampling_freq – sampling freq in hertz (i.e. 1 / TR)
high_pass – high pass cutoff frequency
low_pass – low pass cutoff frequency
kwargs – other keyword arguments to nilearn.signal.clean
Returns:
Filtered Brain_Data instance
Return type:
Brain_Data
find_spikes(global_spike_cutoff=3, diff_spike_cutoff=3)[source]¶
Function to identify spikes from Time Series Data
Parameters:
global_spike_cutoff – (int,None) cutoff to identify spikes in global signal
in standard deviations, None indicates do not calculate.
diff_spike_cutoff – (int,None) cutoff to identify spikes in average frame difference
in standard deviations, None indicates do not calculate.
Returns:
pandas dataframe with spikes as indicator variables
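The following sketch shows one way the two cutoffs could be applied, assuming the global signal is the mean over voxels at each time point and the frame difference is the mean absolute voxel-wise change from the previous frame. It works on nested lists rather than a Brain_Data instance, and `flag_outliers` and `find_spikes_sketch` are hypothetical names.

```python
import statistics


def flag_outliers(series, cutoff):
    """Return 1 where a value lies more than `cutoff` SDs from the series mean."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    if sd == 0:
        return [0] * len(series)
    return [int(abs(x - mean) > cutoff * sd) for x in series]


def find_spikes_sketch(frames, global_spike_cutoff=3, diff_spike_cutoff=3):
    """frames: list of time points, each a list of voxel values."""
    out = {}
    if global_spike_cutoff is not None:
        global_signal = [statistics.fmean(frame) for frame in frames]
        out["global_spike"] = flag_outliers(global_signal, global_spike_cutoff)
    if diff_spike_cutoff is not None:
        # Mean absolute change from the previous frame (0 for the first frame).
        frame_diff = [0.0] + [
            statistics.fmean(abs(b - a) for a, b in zip(prev, cur))
            for prev, cur in zip(frames, frames[1:])
        ]
        out["diff_spike"] = flag_outliers(frame_diff, diff_spike_cutoff)
    return out


# 30 stable frames with one sudden jump at index 12.
frames = [[100.0, 101.0, 99.0] for _ in range(30)]
frames[12] = [200.0, 202.0, 198.0]
spikes = find_spikes_sketch(frames)
```

In this toy series the global-signal check flags only frame 12, while the frame-difference check flags both the jump into the spike (frame 12) and the return to baseline (frame 13).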
icc(icc_type='icc2')[source]¶
Calculate intraclass correlation coefficient for data within
Brain_Data class
ICC Formulas are based on:
Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: uses in
assessing rater reliability. Psychological bulletin, 86(2), 420.
icc1:  x_ij = mu + beta_j + w_ij
icc2/3:  x_ij = mu + alpha_i + beta_j + (ab)_ij + epsilon_ij
Code modified from nipype algorithms.icc
https://github.com/nipy/nipype/blob/master/nipype/algorithms/icc.py
Parameters:
icc_type – type of icc to calculate (icc1: voxel random effect,
icc2: voxel and column random effect, icc3: voxel and
column fixed effect)
Returns:
(np.array) intraclass correlation coefficient
Return type:
iplot(threshold=0, surface=False, anatomical=None, **kwargs)[source]¶
Create an interactive brain viewer for the current brain data instance.
Parameters:
threshold – (float/str) two-sided threshold to initialize the
visualization, may be a percentile string; default 0
surface – (bool) whether to create a surface-based plot; default False
anatomical – nifti image or filename to overlay
kwargs – optional arguments to nilearn.view_img or
nilearn.view_img_on_surf
Returns:
interactive brain viewer widget
multivariate_similarity(images, method='ols')[source]¶
Predict spatial distribution of Brain_Data() instance from linear
combination of other Brain_Data() instances or Nibabel images
Parameters:
self – Brain_Data instance of data to be applied
images – Brain_Data instance of weight map
Returns:
dictionary of regression statistics in Brain_Data
instances {‘beta’,’t’,’p’,’df’,’residual’}
plot(limit=5, anatomical=None, view='axial', colorbar=False, black_bg=True, draw_cross=False, threshold_upper=None, threshold_lower=None, figsize=(15, 2), axes=None, **kwargs)[source]¶
Create a quick plot of self.data.  Will plot each image separately
Parameters:
limit – (int) max number of images to return
anatomical – (nifti, str) nifti image or file name to overlay
view – (str) ‘axial’ for limit number of axial slices;
‘glass’ for ortho-view glass brain; ‘mni’ for
multi-slice view mni brain; ‘full’ for both glass and
mni views
threshold_upper – (str/float) threshold if view is ‘glass’,
‘mni’, or ‘full’
threshold_lower – (str/float) threshold if view is ‘glass’,
‘mni’, or ‘full’
save – (str/bool): optional string file name or path for saving; only applies if view is ‘mni’, ‘glass’, or ‘full’.
Filenames will be appended with the orientation they belong to
predict(algorithm=None, cv_dict=None, plot=True, verbose=True, **kwargs)[source]¶
Run prediction
Parameters:
algorithm – Algorithm to use for prediction.  Must be one of ‘svm’,
‘svr’, ‘linear’, ‘logistic’, ‘lasso’, ‘ridge’,
‘ridgeClassifier’,’pcr’, or ‘lassopcr’
cv_dict – Type of cross_validation to use. A dictionary of
{‘type’: ‘kfolds’, ‘n_folds’: n},
{‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y},
{‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or
{‘type’: ‘loso’, ‘subject_id’: holdout}
where ‘n’ = number of folds, and ‘holdout’ = vector of
subject ids that corresponds to self.Y
plot – Boolean indicating whether or not to create plots.
verbose (bool) – print performance; Default True
**kwargs – Additional keyword arguments to pass to the prediction
algorithm
Returns:
a dictionary of prediction parameters
Return type:
output
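The four cv_dict layouts listed above can be written as ordinary Python dictionaries. In this sketch the fold count, labels, and subject ids are placeholder values chosen for illustration; only the dictionary keys come from the parameter description.

```python
# Placeholder values for illustration only.
n_folds = 5
labels = [0, 1, 0, 1, 0, 1]                          # one entry per image in self.Y
subject_ids = ["s1", "s1", "s2", "s2", "s3", "s3"]   # one entry per image in self.Y

# Plain k-fold cross-validation.
cv_kfold = {"type": "kfolds", "n_folds": n_folds}

# k-fold stratified by the training labels.
cv_stratified = {"type": "kfolds", "n_folds": n_folds, "stratified": labels}

# k-fold that keeps all images from one subject in the same fold.
cv_grouped = {"type": "kfolds", "n_folds": n_folds, "subject_id": subject_ids}

# Leave-one-subject-out.
cv_loso = {"type": "loso", "subject_id": subject_ids}
```

Any one of these would then be passed as the `cv_dict` argument.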
predict_multi(algorithm=None, cv_dict=None, method='searchlight', rois=None, process_mask=None, radius=2.0, scoring=None, n_jobs=1, verbose=0, **kwargs)[source]¶
Perform multi-region prediction. This can be a searchlight analysis or multi-roi analysis if provided a Brain_Data instance with labeled non-overlapping rois.
Parameters:
algorithm (string) – algorithm to use for prediction. Must be one of ‘svm’,
‘svr’, ‘linear’, ‘logistic’, ‘lasso’, ‘ridge’,
‘ridgeClassifier’,’pcr’, or ‘lassopcr’
cv_dict – Type of cross_validation to use. Default is 3-fold. A dictionary of
{‘type’: ‘kfolds’, ‘n_folds’: n},
{‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y},
{‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or
{‘type’: ‘loso’, ‘subject_id’: holdout}
where ‘n’ = number of folds, and ‘holdout’ = vector of
subject ids that corresponds to self.Y
method (string) – one of ‘searchlight’ or ‘roi’
rois (string/nltools.Brain_Data) – nifti file path or Brain_data instance containing non-overlapping regions-of-interest labeled by integers
process_mask (nib.Nifti1Image/nltools.Brain_Data) – mask to constrain where to perform analyses; only applied if method = ‘searchlight’
radius (float) – radius of searchlight in mm; default 2mm
scoring (function) – callable scoring function; see sklearn documentation; defaults to estimator’s default scoring function
n_jobs (int) – The number of CPUs to use to do permutation; default 1 because this can be very memory intensive
verbose (int) – whether parallelization progress should be printed; default 0
Returns:
image of results
Return type:
output
randomise(n_permute=5000, threshold_dict=None, return_mask=False, **kwargs)[source]¶
Run mass-univariate regression at each voxel with inference performed
via permutation testing ala randomise in FSL. Operates just like
.regress(), but intended to be used for second-level analyses.
Parameters:
n_permute (int) – number of permutations
threshold_dict – (dict) a dictionary of threshold parameters
{‘unc’:.001} or {‘fdr’:.05}
return_mask – (bool) optionally return the thresholding mask
Returns:
dictionary of maps for betas, tstats, and pvalues
Return type:
regions(min_region_size=1350, extract_type='local_regions', smoothing_fwhm=6, is_mask=False)[source]¶
Extract brain connected regions into separate regions.
Parameters:
min_region_size (int) – Minimum volume in mm3 for a region to be
kept.
extract_type (str) – Type of extraction method
[‘connected_components’, ‘local_regions’].
If ‘connected_components’, each component/region
in the image is extracted automatically by
labelling each region based upon the presence of
unique features in their respective regions.
If ‘local_regions’, each component/region is
extracted based on their maximum peak value to
define a seed marker and then using random
walker segmentation algorithm on these
markers for region separation.
smoothing_fwhm (scalar) – Smooth an image to extract sparser
regions. Only works for extract_type
‘local_regions’.
is_mask (bool) – Whether the Brain_Data instance should be treated
as a boolean mask and if so, calls
connected_label_regions instead.
Returns:
Brain_Data instance with extracted ROIs as data.
Return type:
Brain_Data
regress(mode='ols', **kwargs)[source]¶
Run a mass-univariate regression across voxels. Three types of regressions can be run:
1) Standard OLS (default)
2) Robust OLS (heteroscedasticity and/or auto-correlation robust errors), i.e. OLS with “sandwich estimators”
3) ARMA (auto-regressive and moving-average lags = 1 by default; experimental)
For more information see the help for nltools.stats.regress
ARMA notes: This experimental mode is similar to AFNI’s 3dREMLFit but without spatial smoothing of voxel auto-correlation estimates. It can be very computationally intensive, so parallelization is used by default to try to speed things up. Speed is limited because a unique ARMA model is fit to each voxel (like AFNI/FSL), unlike SPM, which assumes the same AR parameters (~0.2) at each voxel. While coefficient results are typically very similar to OLS, standard errors, and therefore t-stats, dfs, and p-vals, can differ greatly depending on how much auto-correlation is explaining the response in a voxel
relative to other regressors in the design matrix.
Parameters:
mode (str) – kind of model to fit; must be one of ‘ols’ (default), ‘robust’, or ‘arma’
kwargs (dict) – keyword arguments to nltools.stats.regress
Returns:
dictionary of regression statistics in Brain_Data instances
{‘beta’,’t’,’p’,’df’,’residual’}
scale(scale_val=100.0)[source]¶
Scale all values such that they are on the range [0, scale_val],
via grand-mean scaling. This is NOT global-scaling/intensity
normalization. This is useful for ensuring that data is on a
common scale (e.g. good for multiple runs, participants, etc)
and if the default value of 100 is used, can be interpreted as
something akin to (but not exactly) “percent signal change.”
This is consistent with default behavior in AFNI and SPM.
Change this value to 10000 to make consistent with FSL.
Parameters:
scale_val – (int/float) what value to send the grand-mean to;
default 100
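A minimal sketch of grand-mean scaling, assuming it amounts to multiplying every value by `scale_val` divided by the mean over all images and voxels. This is written with plain lists, and `grand_mean_scale` is a hypothetical name rather than the library routine.

```python
import statistics


def grand_mean_scale(images, scale_val=100.0):
    """Rescale so the mean over all images and voxels equals scale_val."""
    grand_mean = statistics.fmean(v for image in images for v in image)
    factor = scale_val / grand_mean
    return [[v * factor for v in image] for image in images]


# Two images of three voxels each; the grand mean is 1000.
images = [[800.0, 1000.0, 1200.0], [900.0, 1000.0, 1100.0]]
scaled = grand_mean_scale(images)
```

A single factor is applied to the whole dataset, so differences between images and between voxels are preserved. This is what separates grand-mean scaling from per-image intensity normalization.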
similarity(image, method='correlation')[source]¶
Calculate similarity of Brain_Data() instance with single
Brain_Data or Nibabel image
Parameters:
image – (Brain_Data, nifti)  image to evaluate similarity
method – (str) Type of similarity
[‘correlation’,’dot_product’,’cosine’]
Returns:
(list) Outputs a vector of pattern expression values
Return type:
standardize(axis=0, method='center')[source]¶
Standardize Brain_Data() instance.
Parameters:
axis – 0 for observations 1 for voxels
method – [‘center’,’zscore’]
Returns:
Brain_Data Instance
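The two methods can be sketched as follows, under the assumption that standardizing along the observations axis means each voxel (column) is centered, or z-scored, across images (rows). The helper `standardize_columns` is hypothetical and operates on nested lists.

```python
import statistics


def standardize_columns(data, method="center"):
    """Center or z-score each column (voxel) across rows (observations)."""
    if method not in ("center", "zscore"):
        raise ValueError("method must be 'center' or 'zscore'")
    columns = list(zip(*data))
    means = [statistics.fmean(col) for col in columns]
    # Centering leaves the spread untouched, so divide by 1 in that case.
    sds = [statistics.pstdev(col) if method == "zscore" else 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(row, means, sds)]
        for row in data
    ]


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]  # 3 observations x 2 voxels
centered = standardize_columns(data, method="center")
zscored = standardize_columns(data, method="zscore")
```

After centering, each voxel has mean 0 but keeps its original units. After z-scoring, each voxel also has unit standard deviation, so a constant voxel would need special handling to avoid dividing by zero.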
temporal_resample(sampling_freq=None, target=None, target_type='hz')[source]¶
Resample Brain_Data timeseries to a new target frequency or number of samples
using Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) interpolation.
This function can up- or down-sample data.
Note: this function can use quite a bit of RAM.
Parameters:
sampling_freq – (float) sampling frequency of data in hertz
target – (float) resampling target
target_type – (str) type of target can be [samples,seconds,hz]
Returns:
resampled Brain_Data instance
threshold(upper=None, lower=None, binarize=False, coerce_nan=True)[source]¶
Threshold Brain_Data instance. Provide upper and lower values or
percentages to perform two-sided thresholding. Binarize will return
a mask image respecting thresholds if provided, otherwise respecting
every non-zero value.
Parameters:
upper – (float or str) Upper cutoff for thresholding. If string
will interpret as percentile; can be None for one-sided
thresholding.
lower – (float or str) Lower cutoff for thresholding. If string
will interpret as percentile; can be None for one-sided
thresholding.
binarize (bool) – return binarized image respecting thresholds if
provided, otherwise binarize on every non-zero value;
default False
coerce_nan (bool) – coerce nan values to 0s; default True
Returns:
Thresholded Brain_Data object.
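The two-sided logic can be sketched on a plain list of values. This assumes that values at or beyond a cutoff are kept and everything strictly between the cutoffs is set to zero. Percentile strings and NaN handling are left out, and `threshold_values` is a hypothetical name.

```python
def threshold_values(values, upper=None, lower=None, binarize=False):
    """Zero out values that fall strictly between `lower` and `upper`."""
    def keep(v):
        if upper is not None and v >= upper:
            return True
        if lower is not None and v <= lower:
            return True
        return False

    if upper is None and lower is None:
        kept = list(values)  # no thresholds: leave values untouched
    else:
        kept = [v if keep(v) else 0.0 for v in values]
    if binarize:
        # Mask of every surviving non-zero value.
        return [1 if v != 0 else 0 for v in kept]
    return kept


data = [-3.0, -1.0, 0.5, 2.0, 4.0]
two_sided = threshold_values(data, upper=2.0, lower=-2.0)
# → [-3.0, 0.0, 0.0, 2.0, 4.0]
one_sided_mask = threshold_values(data, upper=2.0, binarize=True)
# → [0, 0, 0, 1, 1]
```

Passing only `upper` or only `lower` gives one-sided thresholding, and `binarize=True` with no cutoffs marks every non-zero value.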
transform_pairwise()[source]¶
Transform Brain_Data instance into pairwise comparisons.
Returns:
Brain_Data instance transformed into pairwise comparisons
Return type:
Brain_Data
ttest(threshold_dict=None, return_mask=False)[source]¶
Calculate one sample t-test across each voxel (two-sided)
Parameters:
threshold_dict – (dict) a dictionary of threshold parameters
{‘unc’:.001} or {‘fdr’:.05}
return_mask – (bool) if thresholding is requested, optionally return the mask of voxels that exceed threshold, e.g. for use with another map
Returns:
(dict) dictionary of regression statistics in Brain_Data
instances {‘t’,’p’}
upload_neurovault(access_token=None, collection_name=None, collection_id=None, img_type=None, img_modality=None, **kwargs)[source]¶
Upload Data to Neurovault.  Will add any columns in self.X to image
metadata. Index will be used as image name.
Parameters:
access_token – (str, Required) Neurovault api access token
collection_name – (str, Optional) name of new collection to create
collection_id – (int, Optional) neurovault collection_id if adding images
to existing collection
img_type – (str, Required) Neurovault map_type
img_modality – (str, Required) Neurovault image modality
Returns:
(pd.DataFrame) neurovault collection information
Return type:
collection
class nltools.data.Adjacency(data=None, Y=None, matrix_type=None, labels=None, **kwargs)[source]¶
Adjacency is a class to represent Adjacency matrices as a vector rather
than a 2-dimensional matrix. This makes it easier to perform data
manipulation and analyses.
Parameters:
data – pandas data instance or list of files
matrix_type – (str) type of matrix.  Possible values include:
[‘distance’,’similarity’,’directed’,’distance_flat’,
‘similarity_flat’,’directed_flat’]
Y – Pandas DataFrame of training labels
**kwargs – Additional keyword arguments
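To illustrate storing a matrix "as a vector", the sketch below keeps only the upper triangle of a symmetric matrix and rebuilds the full matrix from it. This is an assumption about the symmetric case only (directed matrices need every off-diagonal entry), and both helper names are hypothetical.

```python
def matrix_to_vector(matrix):
    """Upper triangle (excluding the diagonal) of a square matrix, row by row."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def vector_to_matrix(vector, n, diagonal=0.0):
    """Rebuild the symmetric n x n matrix from its upper-triangle vector."""
    matrix = [[diagonal] * n for _ in range(n)]
    values = iter(vector)
    for i in range(n):
        for j in range(i + 1, n):
            matrix[i][j] = matrix[j][i] = next(values)
    return matrix


distance = [
    [0.0, 0.2, 0.7],
    [0.2, 0.0, 0.5],
    [0.7, 0.5, 0.0],
]
vector = matrix_to_vector(distance)    # [0.2, 0.7, 0.5]
rebuilt = vector_to_matrix(vector, 3)  # identical to `distance`
```

An n x n symmetric matrix collapses to n(n-1)/2 values, so several matrices can be stacked as rows of one table.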
bootstrap(function, n_samples=5000, save_weights=False, n_jobs=-1, random_state=None, *args, **kwargs)[source]¶
Bootstrap an Adjacency method.
Parameters:
function – (str) method to apply to data for each bootstrap
n_samples – (int) number of samples to bootstrap with replacement
save_weights – (bool) Save each bootstrap iteration
(useful for aggregating many bootstraps on a cluster)
n_jobs – (int) The number of CPUs to use to do the computation.
-1 means all CPUs.
Returns:
summarized studentized bootstrap output
Examples
>>>  b = dat.bootstrap('mean', n_samples=5000)
>>>  b = dat.bootstrap('predict', n_samples=5000, algorithm='ridge')
>>>  b = dat.bootstrap('predict', n_samples=5000, save_weights=True)
cluster_summary(clusters=None, metric='mean', summary='within')[source]¶
This function provides summaries of clusters within Adjacency matrices.
It can compute mean/median of within and between cluster values. Requires a
list of cluster ids indicating the row/column of each cluster.
Parameters:
clusters – (list) list of cluster labels
metric – (str) method to summarize mean or median. If None, then return all r values
summary – (str) summarize within cluster or between clusters
Returns:
(dict) within cluster means
Return type:
distance(metric='correlation', **kwargs)[source]¶
Calculate distance between images within an Adjacency() instance.
Parameters:
metric – (str) type of distance metric (can use any scikit-learn or
scipy metric)
Returns:
(Adjacency) Outputs a 2D distance matrix.
Return type:
distance_to_similarity(metric='correlation', beta=1)[source]¶
Convert distance matrix to similarity matrix.
Note: currently only implemented for correlation and euclidean.
Parameters:
metric – (str) Can only be correlation or euclidean
beta – (float) parameter to scale exponential function (default: 1) for euclidean
Returns:
(Adjacency) Adjacency object
Return type:
generate_permutations(n_perm, random_state=None)[source]¶
Generate n_perm permuted versions of Adjacency in a lazy fashion. Useful for iterating over.
Parameters:
n_perm (int) – number of permutations
random_state (int, np.random.seed, optional) – random seed for reproducibility. Defaults to None.
Examples
>>> for perm in adj.generate_permutations(1000):
...     out = neural_distance_mat.similarity(perm)
Yields:
Adjacency – permuted version of self
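The "lazy" behaviour can be pictured with a plain Python generator that shuffles the rows and columns of a square matrix together and yields one permuted copy at a time. This is a sketch of the general technique on nested lists, not the library's code, and `permuted_matrices` is a hypothetical name.

```python
import random


def permuted_matrices(matrix, n_perm, random_state=None):
    """Lazily yield copies of a square matrix with rows and columns shuffled together."""
    rng = random.Random(random_state)
    n = len(matrix)
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Apply the same reordering to rows and columns so pairs stay intact.
        yield [[matrix[i][j] for j in order] for i in order]


distance = [
    [0.0, 0.2, 0.7],
    [0.2, 0.0, 0.5],
    [0.7, 0.5, 0.0],
]
permutations = list(permuted_matrices(distance, n_perm=3, random_state=0))
```

Because each permutation is produced only when requested, a loop over thousands of permutations never holds more than one permuted matrix in memory (the `list(...)` call above is only there to inspect the results).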
isc(n_samples=5000, metric='median', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]¶
Compute intersubject correlation.
This implementation uses the subject-wise bootstrap method from Chen et al., 2016.
Instead of recomputing the pairwise ISC using circle_shift or phase_randomization methods,
this approach uses the computationally more efficient method of bootstrapping the subjects
and computing a new pairwise similarity matrix with randomly selected subjects with replacement.
If the same subject is selected multiple times, we set the perfect correlation to a nan with
(exclude_self_corr=True). As recommended by Chen et al., 2016, we compute the median pairwise ISC
by default. However, if the mean is preferred, we compute the mean correlation after performing
the fisher r-to-z transformation and then convert back to correlations to minimize artificially
inflating the correlation values. We compute the p-values using the percentile method using the same
method in Brainiak.
Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B.,
& Cox, R. W. (2016). Untangling the relatedness among correlations, part I:
nonparametric approaches to inter-subject correlation analysis at the group level.
NeuroImage, 142, 248-259.
Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing.
Biometrics, 757-762.
Parameters:
n_samples – (int) number of bootstraps
metric – (str) type of isc summary metric [‘mean’,’median’]
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_null – (bool) Return the null distribution along with the p-value; default False
Returns:
(dict) dictionary of permutation results [‘correlation’,’p’]
Return type:
stats
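The mean-based summary described above (Fisher r-to-z, average, convert back) can be sketched in a few lines of standard-library Python. `fisher_mean` is a hypothetical helper name and the correlation values are made up for illustration.

```python
import math


def fisher_mean(correlations):
    """Average correlations in Fisher z space, then map back to r."""
    z_values = [math.atanh(r) for r in correlations]  # r-to-z
    return math.tanh(sum(z_values) / len(z_values))   # z-to-r


pairwise_r = [0.2, 0.5, 0.9]
naive = sum(pairwise_r) / len(pairwise_r)
transformed = fisher_mean(pairwise_r)
```

Note that `math.atanh` is undefined at exactly 1 or -1. A perfect self-correlation therefore has to be dropped before averaging, which is the role the text assigns to `exclude_self_corr=True`.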
isc_group(group, n_samples=5000, metric='median', method='permute', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]¶
Compute intersubject correlation differences between groups.
This function computes pairwise intersubject correlations (ISC) using the median, as recommended by Chen
et al. (2016). However, if the mean is preferred, we compute the mean correlation after performing
the fisher r-to-z transformation and then convert back to correlations to minimize artificially
inflating the correlation values.
There are currently two different methods to compute p-values. By default, we use the subject-wise permutation
method recommended by Chen et al., 2016. This method combines the two groups and computes pairwise similarity both
within and between the groups. Then the group labels are permuted and the mean difference between the two groups
is recomputed to generate a null distribution. The second method uses subject-wise bootstrapping, where a new
pairwise similarity matrix with randomly selected subjects with replacement is created separately for each group
and the ISC difference between these groups is used to generate a null distribution. If the same subject is
selected multiple times, we set the perfect correlation to a nan with (exclude_self_corr=True). We compute the
p-values using the percentile method (Hall & Wilson, 1991).
Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B.,
& Cox, R. W. (2016). Untangling the relatedness among correlations, part I:
nonparametric approaches to inter-subject correlation analysis at the group level.
NeuroImage, 142, 248-259.
Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing.
Biometrics, 757-762.
Parameters:
group – (np.array) vector of group ids corresponding to subject data in Adjacency instance
n_samples – (int) number of samples for permutation or bootstrapping
metric – (str) type of isc summary metric [‘mean’,’median’]
method – (str) method to compute p-values [‘permute’, ‘bootstrap’] (default: permute)
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_null – (bool) Return the permutation distribution along with the p-value; default False
Returns:
(dict) dictionary of permutation results [‘correlation’,’p’]
Return type:
stats
Parameters:
axis – (int) calculate mean over features (0) or data (1).
For data it will be on upper triangle.
Returns:
float if single, adjacency if axis=0, np.array if axis=1
and multiple
Parameters:
axis – (int) calculate median over features (0) or data (1).
For data it will be on upper triangle.
Returns:
float if single, adjacency if axis=0, np.array if axis=1
and multiple
plot(limit=3, axes=None, *args, **kwargs)[source]¶
Create Heatmap of Adjacency Matrix
Can pass in any sns.heatmap argument
Parameters:
limit – (int) number of heatmaps to plot if object contains multiple adjacencies (default: 3)
axes – matplotlib axis handle
plot_label_distance(labels=None, ax=None)[source]¶
Create a violin plot indicating within and between label distance
Parameters:
labels (np.array) – numpy array of labels to plot
Returns:
violin plot handles
Return type:
plot_mds(n_components=2, metric=True, labels=None, labels_color=None, cmap=<matplotlib.colors.LinearSegmentedColormap object>, n_jobs=-1, view=(30, 20), figsize=[12, 8], ax=None, *args, **kwargs)[source]¶
Plot Multidimensional Scaling
Parameters:
n_components – (int) Number of dimensions to project (can be 2 or 3)
metric – (bool) Perform metric or non-metric dimensional scaling; default True
labels – (list) Can override labels stored in Adjacency Class
labels_color – (str) list of colors for labels, if len(1) then make all same color
n_jobs – (int) Number of parallel jobs
view – (tuple) view for 3-Dimensional plot; default (30,20)
plot_silhouette(labels=None, ax=None, permutation_test=True, n_permute=5000, **kwargs)[source]¶
Create a silhouette plot
regress(X, mode='ols', **kwargs)[source]¶
Run a regression on an adjacency instance.
You can decompose an adjacency instance with another adjacency instance.
You can also decompose each pixel by passing a design_matrix instance.
Parameters:
X – Design matrix can be an Adjacency or Design_Matrix instance
mode – type of regression (default: ols)
Returns:
(dict) dictionary of stats outputs.
Return type:
stats
similarity(data, plot=False, perm_type='2d', n_permute=5000, metric='spearman', ignore_diagonal=False, **kwargs)[source]¶
Calculate similarity between two Adjacency matrices. Default is to use spearman
correlation and permutation test.
Parameters:
data (Adjacency or array) – Adjacency data, or 1-d array same size as self.data
perm_type – (str) ‘1d’,’2d’, or None
metric – (str) ‘spearman’,’pearson’,’kendall’
ignore_diagonal – (bool) only applies to ‘directed’ Adjacency types using perm_type=None or perm_type='1d'
social_relations_model(summarize_results=True, nan_replace=True)[source]¶
Estimate the social relations model from a matrix for a round-robin design.
X_{ij} = m + \alpha_i + \beta_j + g_{ij} + \epsilon_{ijl}
where X_{ij} is the score for person i rating person j, m is the group mean,
\alpha_i is person i’s actor effect, \beta_j is person j’s partner effect, g_{ij}
is the relationship effect and \epsilon_{ijl} is the error in measure l for actor i and partner j.
This model is primarily concerned with partitioning the variance of the various effects.
Code is based on implementation presented in Chapter 8 of Kenny, Kashy, & Cook (2006).
Tests replicate examples presented in the book. Note that this method assumes that
actor scores are rows (lower triangle), while partner scores are columns (upper triangle).
The minimal sample size to estimate these effects is 4.
Model Assumptions:

Social interactions are exclusively dyadic
People are randomly sampled from population
No order effects
The effects combine additively and relationships are linear
In the future we might update the formulas and standard errors based on
Bond and Lashley, 1996
Parameters:
self – (adjacency) can be a single matrix or many matrices for each group
summarize_results – (bool) will provide a formatted summary of model results
nan_replace – (bool) will replace nan values with row and column means
Returns:
(pd.Series/pd.DataFrame) All of the effects estimated using SRM
Return type:
estimated effects
stats_label_distance(labels=None, n_permute=5000, n_jobs=-1)[source]¶
Calculate permutation tests on within and between label distance.
Parameters:
labels (np.array) – numpy array of labels to plot
n_permute (int) – number of permutations to run (default=5000)
Returns:
dictionary of within and between group differences
and p-values
Parameters:
axis – (int) calculate std over features (0) or data (1).
For data it will be on upper triangle.
Returns:
float if single, adjacency if axis=0, np.array if axis=1 and
multiple
Parameters:
axis – (int) calculate mean over features (0) or data (1).
For data it will be on upper triangle.
Returns:
float if single, adjacency if axis=0, np.array if axis=1
and multiple
threshold(upper=None, lower=None, binarize=False)[source]¶
Threshold Adjacency instance. Provide upper and lower values or
percentages to perform two-sided thresholding. Binarize will return
a mask image respecting thresholds if provided, otherwise respecting
every non-zero value.
Parameters:
upper – (float or str) Upper cutoff for thresholding. If string
will interpret as percentile; can be None for one-sided
thresholding.
lower – (float or str) Lower cutoff for thresholding. If string
will interpret as percentile; can be None for one-sided
thresholding.
binarize (bool) – return binarized image respecting thresholds if
provided, otherwise binarize on every non-zero value;
default False
Returns:
thresholded Adjacency instance
Return type:
Adjacency
split(data, mask)[source]¶
Split Brain_Data instance into separate masks and store as a
dictionary.
class nltools.data.Design_Matrix(*args, **kwargs)[source]¶
Design_Matrix is a class to represent design matrices with special methods for data processing (e.g. convolution, upsampling, downsampling) and also intelligent and flexible appending (e.g. automatically keeping certain columns or polynomial terms separated during concatenation). It plays nicely with Brain_Data and can be used to build an experimental design to pass to Brain_Data’s X attribute. It is essentially an enhanced pandas DataFrame, with extra attributes and methods. Methods always return a new design matrix instance (copy). Column names are always string types.
Parameters:
sampling_freq (float) – sampling rate of each row in hertz; to convert seconds to hertz (e.g. in the case of TRs for neuroimaging) use hertz = 1 / TR
convolved (list, optional) – on what columns convolution has been performed; defaults to None
polys (list, optional) – list of polynomial terms in design matrix, e.g. intercept, polynomial trends, basis functions, etc; default None
add_dct_basis(duration=180, drop=0)[source]¶
Adds unit scaled cosine basis functions to Design_Matrix columns,
based on spm-style discrete cosine transform for use in
high-pass filtering. Does not add intercept/constant. Care is recommended if using this along with .add_poly(), as some columns will be highly-correlated.
Parameters:
duration (int) – length of filter in seconds
drop (int) – index of which early/slow bases to drop if any; will always drop constant (i.e. intercept) like SPM. Unlike SPM, retains first basis (i.e. linear/sigmoidal). Will cumulatively drop bases up to and inclusive of index provided (e.g. 2, drops bases 1 and 2); default 0
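For orientation, a discrete cosine basis of this kind can be generated as below. The sketch assumes the textbook DCT-II form with unit-norm columns and an SPM-style rule for how many bases fall below the 1/duration cutoff. The function name `dct_basis` and both of those choices are assumptions, not a transcription of the library code.

```python
import math


def dct_basis(n_rows, sampling_freq, duration=180):
    """Unit-norm cosine regressors slower than 1/duration Hz (constant excluded)."""
    # Assumed SPM-style count of bases, including the constant term.
    n_bases = int(math.floor(2 * n_rows / (sampling_freq * duration) + 1))
    scale = math.sqrt(2.0 / n_rows)
    return [
        [
            scale * math.cos(math.pi * (2 * t + 1) * k / (2 * n_rows))
            for t in range(n_rows)
        ]
        for k in range(1, n_bases)  # k = 0 (the constant) is skipped
    ]


# 200 volumes at TR = 2 s (0.5 Hz) with a 180 s filter.
basis = dct_basis(n_rows=200, sampling_freq=0.5, duration=180)
```

With these settings the sketch returns four columns. Each has unit length and they are mutually orthogonal, so adding them to a design matrix removes slow drifts without the new regressors competing with each other.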
add_poly(order=0, include_lower=True)[source]¶
Add nth order Legendre polynomial terms as columns to design matrix. Good for adding constant/intercept to model (order = 0) and accounting for slow-frequency nuisance artifacts e.g. linear, quadratic, etc drifts. Care is recommended when using this with .add_dct_basis() as some columns will be highly correlated.
Parameters:
order (int) – what order terms to add; 0 = constant/intercept
(default), 1 = linear, 2 = quadratic, etc
include_lower – (bool) whether to add lower order terms if order > 0
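The Legendre terms themselves can be generated with Bonnet's recursion on points spread evenly over [-1, 1]. This is a standalone sketch in which `legendre_columns` is a hypothetical name, and any extra scaling the library applies to the columns is not reproduced.

```python
def legendre_columns(n_rows, order, include_lower=True):
    """Legendre polynomials P_0..P_order evaluated on n_rows points in [-1, 1]."""
    x = [-1.0 + 2.0 * i / (n_rows - 1) for i in range(n_rows)]
    polys = [[1.0] * n_rows]      # P_0: constant / intercept
    if order >= 1:
        polys.append(list(x))     # P_1: linear trend
    for n in range(1, order):
        # Bonnet recursion: (n + 1) P_{n+1} = (2n + 1) x P_n - n P_{n-1}
        polys.append([
            ((2 * n + 1) * xi * pn - n * pm) / (n + 1)
            for xi, pn, pm in zip(x, polys[n], polys[n - 1])
        ])
    return polys if include_lower else [polys[order]]


columns = legendre_columns(n_rows=5, order=2)
quadratic_only = legendre_columns(n_rows=5, order=2, include_lower=False)
```

With `order=2` the result holds a constant, a linear, and a quadratic column. Setting `include_lower=False` keeps only the highest-order term.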
append(dm, axis=0, keep_separate=True, unique_cols=None, fill_na=0, verbose=False)[source]¶
Method for concatenating another design matrix row or column-wise. When concatenating row-wise, has the ability to keep certain columns separated if they exist in multiple design matrices (e.g. keeping separate intercepts for multiple runs). This is on by default and will automatically separate out polynomial columns (i.e. anything added with the add_poly or add_dct_basis methods). Additional columns can be separated by run using the unique_cols parameter. Can also add new polynomial terms during vertical concatenation (when axis == 0). This will by default create new polynomial terms separately for each design matrix.
Parameters:
dm (Design_Matrix or list) – design_matrix or list of design_matrices to append
axis (int) – 0 for row-wise (vert-cat), 1 for column-wise (horz-cat); default 0
keep_separate (bool,optional) – whether to try to uniquify columns; defaults to True; only applies when axis = 0
unique_cols (list,optional) – what additional columns to try to keep separated by uniquifying; only applies when axis = 0; defaults to None
fill_na (str/int/float) – if provided will fill NaNs with this value during row-wise appending (when axis = 0) if separate columns are desired; default 0
verbose (bool) – print messages during append about how polynomials are going to be separated
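The idea of keeping columns separate during row-wise appending can be sketched without pandas. In the toy example below each run is a list of row dictionaries; the function name `vertical_append` and the `0_intercept` / `1_intercept` naming scheme are hypothetical and are not claimed to match the names Design_Matrix generates.

```python
def vertical_append(run_a, run_b, separate=("intercept",), fill_na=0):
    """Stack two runs, giving each run its own copy of the `separate` columns.

    Hypothetical helper for illustration; not part of nltools.
    """
    def rename(rows, run_index):
        # Prefix separated columns with the run index so they do not merge.
        return [{(f"{run_index}_{key}" if key in separate else key): value
                 for key, value in row.items()} for row in rows]

    stacked = rename(run_a, 0) + rename(run_b, 1)
    columns = []
    for row in stacked:
        for key in row:
            if key not in columns:
                columns.append(key)
    # Cells a run did not define are filled, mirroring the role of fill_na.
    return [{col: row.get(col, fill_na) for col in columns} for row in stacked]


run1 = [{"stim": 1, "intercept": 1}, {"stim": 0, "intercept": 1}]
run2 = [{"stim": 0, "intercept": 1}]
combined = vertical_append(run1, run2)
```

The shared `stim` column is stacked as usual, while each run ends up with its own intercept column that is zero outside that run.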
clean(fill_na=0, exclude_polys=False, thresh=0.95, verbose=True)[source]¶
Method to fill NaNs in Design Matrix and remove duplicate columns based on data values, NOT names. Columns are dropped if they are correlated >= the requested threshold (default = .95). In this case, only the first instance of that column will be retained and all others will be dropped.
Parameters:
fill_na (str/int/float) – value to fill NaNs with; set to None to retain NaNs; default 0
exclude_polys (bool) – whether to skip checking of polynomial terms (i.e. intercept, trends, basis functions); default False
thresh (float) – correlation threshold to use to drop redundant columns; default .95
verbose (bool) – print what column names were dropped; default True
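The value-based deduplication described above can be sketched as follows. This is a simplified stand-in under stated assumptions: columns are plain lists, none of them is constant, and the signed Pearson correlation is compared with the threshold. The names `pearson` and `drop_redundant` are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def drop_redundant(columns, thresh=0.95):
    """Keep the first column of any group correlated >= thresh; drop the rest."""
    kept, dropped = {}, []
    for name, values in columns.items():
        if any(pearson(values, other) >= thresh for other in kept.values()):
            dropped.append(name)
        else:
            kept[name] = values
    return kept, dropped


columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8.1], "c": [4, 1, 3, 2]}
kept, dropped = drop_redundant(columns)
```

Column `b` is nearly a scaled copy of `a`, so only the first instance (`a`) is retained even though the two columns have different names.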
convolve(conv_func='hrf', columns=None)[source]¶
Perform convolution using an arbitrary function.
Parameters:
conv_func (ndarray or string) – either a 1d numpy array containing output of a function that you want to convolve; a samples by kernel 2d array of several kernels to convolve; or the string ‘hrf’, which defaults to a Glover HRF function at the Design_Matrix’s sampling_freq
columns (list) – what columns to perform convolution on; defaults to all non-polynomial columns
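To make the 1d-kernel case concrete, the sketch below convolves an onset column with a short kernel and truncates the result to the original number of samples. The kernel values are made up for illustration and are not a real HRF; the helper name `convolve_column` and the truncation behavior are assumptions rather than a description of the library's internals.

```python
def convolve_column(column, kernel):
    """Convolve a column with a kernel, keeping only the first len(column) samples."""
    n = len(column)
    out = [0.0] * n
    for i, x in enumerate(column):
        if x == 0:
            continue
        for j, k in enumerate(kernel):
            if i + j < n:  # discard the tail that extends past the run
                out[i + j] += x * k
    return out


onsets = [1, 0, 0, 1, 0]          # stimulus onsets at samples 0 and 3
toy_kernel = [0.5, 1.0, 0.25]     # illustrative response shape, not an HRF
convolved = convolve_column(onsets, toy_kernel)
```

Each onset is replaced by a copy of the kernel, and overlapping responses would add together.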
downsample(target, **kwargs)[source]¶
Downsample columns of design matrix. Relies on nltools.stats.downsample, but ensures that returned object is a design matrix.
Parameters:
target (float) – desired frequency in hz
kwargs – additional inputs to nltools.stats.downsample
heatmap(figsize=(8, 6), **kwargs)[source]¶
Visualize Design Matrix spm style. Use .plot() for typical pandas
plotting functionality. Can pass optional keyword args to seaborn
heatmap.
replace_data(data, column_names=None)[source]¶
Convenient method to replace all data in Design_Matrix with new data while keeping attributes and polynomial columns untouched.
Parameters:
column_names (list) – list of column names for new data
upsample(target, **kwargs)[source]¶
Upsample columns of design matrix. Relies on nltools.stats.upsample, but ensures that returned object is a design matrix.
Parameters:
target (float) – desired frequency in hz
kwargs – additional inputs to nltools.stats.upsample
vif(exclude_polys=True)[source]¶
Compute variance inflation factor amongst columns of design matrix, ignoring polynomial terms. Much faster than statsmodels and more reliable too. Uses the same method as Matlab and R (diagonal elements of the inverted correlation matrix).
Parameters:
exclude_polys (bool) – whether to skip checking of polynomial terms (i.e. intercept, trends, basis functions); default True
Returns:
list with length == number of columns - intercept
Return type:
vifs (list)
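The "diagonal elements of the inverted correlation matrix" approach can be sketched in plain Python. The helper names below are hypothetical, the matrix inverse is a small Gauss-Jordan routine written for this example, and columns are assumed to be non-constant and not perfectly collinear.

```python
import math


def correlation_matrix(columns):
    """Pairwise Pearson correlations between equal-length, non-constant columns."""
    def pearson(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
        va = sum((x - ma) ** 2 for x in a)
        vb = sum((y - mb) ** 2 for y in b)
        return cov / math.sqrt(va * vb)
    return [[pearson(a, b) for b in columns] for a in columns]


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Variance inflation factors: diagonal of the inverted correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


vifs = vif([[1, 2, 3, 4], [4, 1, 3, 2]])
```

With two predictors correlated at r = -0.4, each VIF equals 1 / (1 - r^2), which is the familiar textbook formula.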
zscore(columns=[])[source]¶
Z-score specific columns of design matrix and return the result as a design matrix.
Parameters:
columns (list) – columns to z-score; defaults to all columns
class nltools.analysis.Roc(input_values=None, binary_outcome=None, threshold_type='optimal_overall', forced_choice=None, **kwargs)[source]¶
Roc Class
The Roc class is based on Tor Wager’s Matlab roc_plot.m function and allows a user to easily compute different types of receiver operating characteristic (ROC) curves. For example, one might be interested in single interval or forced choice.
Parameters:
input_values – nibabel data instance
binary_outcome – vector of training labels
threshold_type – [‘optimal_overall’, ‘optimal_balanced’,
‘minimum_sdt_bias’]
**kwargs – Additional keyword arguments to pass to the prediction
algorithm
calculate(input_values=None, binary_outcome=None, criterion_values=None, threshold_type='optimal_overall', forced_choice=None, balanced_acc=False)[source]¶
Calculate Receiver Operating Characteristic plot (ROC) for
single-interval classification.
Parameters:
input_values – nibabel data instance
binary_outcome – vector of training labels
criterion_values – (optional) criterion values for calculating fpr
threshold_type – [‘optimal_overall’, ‘optimal_balanced’,
‘minimum_sdt_bias’]
forced_choice – index indicating position for each unique subject
(default=None)
balanced_acc – (bool) balanced accuracy for single-interval classification. This is not completely implemented because it affects accuracy estimates, but not p-values or the threshold at which to evaluate sens/spec
**kwargs – Additional keyword arguments to pass to the prediction
algorithm
plot(plot_method='gaussian', balanced_acc=False, **kwargs)[source]¶
Create ROC Plot
Create a specific kind of ROC curve plot, based on input values
along a continuous distribution and a binary outcome variable (logical)
Parameters:
plot_method – type of plot [‘gaussian’,’observed’]
binary_outcome – vector of training labels
**kwargs – Additional keyword arguments to pass to the prediction
algorithm
Returns:
nltools.stats.align(data, method='deterministic_srm', n_features=None, axis=0, *args, **kwargs)[source]¶
Align subject data into a common response model.
Can be used to hyperalign source data to target data using
Hyperalignment from Dartmouth (i.e., procrustes transformation; see
nltools.stats.procrustes) or Shared Response Model from Princeton (see
nltools.external.srm). (see nltools.data.Brain_Data.align for aligning
a single Brain object to another). Common Model is shared response
model or centered target data. Transformed data can be back projected to
original data using the transformation matrix. Inputs must be a list of Brain_Data
instances or numpy arrays (observations by features).
Parameters:
data – (list) A list of Brain_Data objects
method – (str) alignment method to use
[‘probabilistic_srm’,’deterministic_srm’,’procrustes’]
n_features – (int) number of features to align to common space.
If None then will select number of voxels
axis – (int) axis to align on
Returns:
(dict) a dictionary containing a list of transformed subject
matrices, a list of transformation matrices, the shared
response matrix, and the intersubject correlation of the shared responses
nltools.stats.align_states(reference, target, metric='correlation', return_index=False, replace_zero_variance=False)[source]¶
Align state weight maps using the Hungarian algorithm by minimizing pairwise distance between group states.
Parameters:
reference – (np.array) reference pattern x state matrix
target – (np.array) target pattern x state matrix to align to reference
metric – (str) distance metric to use
return_index – (bool) return index if True, return remapped data if False
replace_zero_variance – (bool) transform a vector with zero variance to random numbers from a uniform distribution. Useful for when using correlation as a distance metric to avoid NaNs.
Returns:
(list) a list of reordered state X pattern matrices
Return type:
ordered_weights
nltools.stats.calc_bpm(beat_interval, sampling_freq)[source]¶
Calculate instantaneous BPM from beat to beat interval
Parameters:
beat_interval – (int) number of samples in between each beat
(typically R-R Interval)
sampling_freq – (float) sampling frequency in Hz
Returns:
(float) beats per minute for time interval
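The conversion from a beat-to-beat interval in samples to beats per minute follows from the sampling frequency. The sketch below shows the arithmetic under that reading of the parameters; the function name `beats_per_minute` is hypothetical.

```python
def beats_per_minute(beat_interval, sampling_freq):
    """Instantaneous BPM from an interval given in samples and a rate in Hz."""
    if beat_interval <= 0 or sampling_freq <= 0:
        raise ValueError("beat_interval and sampling_freq must be positive")
    seconds_per_beat = beat_interval / sampling_freq
    return 60.0 / seconds_per_beat


resting = beats_per_minute(beat_interval=500, sampling_freq=500.0)   # 1 s apart
elevated = beats_per_minute(beat_interval=250, sampling_freq=500.0)  # 0.5 s apart
```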
nltools.stats.correlation(data1, data2, metric='pearson')[source]¶
This function calculates the correlation between data1 and data2
Parameters:
data1 – (np.array) x
data2 – (np.array) y
metric – (str) type of correlation [“spearman” or “pearson” or “kendall”]
Returns:
(np.array) correlations
p: (float) p-value
nltools.stats.correlation_permutation(data1, data2, method='permute', n_permute=5000, metric='spearman', tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]¶
Compute correlation and calculate p-value using permutation methods.
‘permute’ method randomly shuffles one of the vectors. This method is recommended
for independent data. For timeseries data we recommend using ‘circle_shift’ or
‘phase_randomize’ methods.
Parameters:
data1 – (pd.DataFrame, pd.Series, np.array) dataset 1 to permute
data2 – (pd.DataFrame, pd.Series, np.array) dataset 2 to permute
n_permute – (int) number of permutations
metric – (str) type of association metric [‘spearman’,’pearson’,
‘kendall’]
method – (str) type of permutation [‘permute’, ‘circle_shift’, ‘phase_randomize’]
random_state – (int, None, or np.random.RandomState) Initial random seed (default: None)
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation.
-1 means all CPUs.
return_perms – (bool) Return the permutation distribution along with the p-value; default False
Returns:
(dict) dictionary of permutation results [‘correlation’,’p’]
Return type:
stats
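The ‘permute’ method can be sketched in a few lines: shuffle one vector many times and count how often the shuffled correlation is at least as extreme as the observed one. This standalone example uses Pearson correlation and a two-tailed test; the helper names and the (count + 1) / (n_permute + 1) p-value convention are assumptions made for the illustration.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def permutation_correlation(x, y, n_permute=2000, seed=0):
    """Two-tailed permutation p-value for the correlation between x and y."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(x)
    extreme = 0
    for _ in range(n_permute):
        rng.shuffle(shuffled)  # breaks any pairing between x and y
        if abs(pearson(shuffled, y)) >= abs(observed):
            extreme += 1
    return {"correlation": observed, "p": (extreme + 1) / (n_permute + 1)}


x = list(range(10))
y = [2 * value + 1 for value in x]
result = permutation_correlation(x, y)
```

Shuffling assumes exchangeable observations, which is why the text above recommends ‘circle_shift’ or ‘phase_randomize’ for autocorrelated time series.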
nltools.stats.distance_correlation(x, y, bias_corrected=True, ttest=False)[source]¶
Compute the distance correlation between 2 arrays to test for multivariate dependence (linear or non-linear). Arrays must match on their first dimension. It’s almost always preferable to compute the bias_corrected version, which can also optionally perform a ttest. This ttest operates on a statistic that’s ~dcorr^2 and will also be returned.
Explanation:
Distance correlation involves computing the normalized covariance of two centered euclidean distance matrices. Each distance matrix is the euclidean distance between rows (if x or y are 2d) or scalars (if x or y are 1d). Each matrix is centered prior to computing the covariance either using double-centering or u-centering, which corrects for bias as the number of dimensions increases. U-centering is almost always preferred in all cases. It also permits inference of the normalized covariance between each distance matrix using a one-tailed directional t-test. (Szekely & Rizzo, 2013). While distance correlation is normally bounded between 0 and 1, u-centering can produce negative estimates, which are never significant.
Validated against the dcor and dcor.ttest functions in the ‘energy’ R package and the dcor.distance_correlation, dcor.udistance_correlation_sqr, and dcor.independence.distance_correlation_t_test functions in the dcor Python package.
Parameters:
x (ndarray) – 1d or 2d numpy array of observations by features
y (ndarray) – 1d or 2d numpy array of observations by features
bias_corrected (bool) – if False, use double-centering, which produces a biased estimate that converges to 1 as the number of dimensions increases. Otherwise use u-centering to correct this bias. Note this must be True if ttest=True; default True
ttest (bool) – perform a ttest using the bias_corrected distance correlation; default False
Returns:
dictionary of results (correlation, t, p, and df.) Optionally, covariance, x variance, and y variance
Return type:
results (dict)
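For intuition, the double-centered (biased) variant described in the explanation above can be written out for 1d inputs in plain Python. This sketch does not implement u-centering or the t-test, and the helper names are hypothetical.

```python
import math


def centered_distances(values):
    """Double-center the pairwise absolute-distance matrix of a 1d sequence."""
    n = len(values)
    dist = [[abs(a - b) for b in values] for a in values]
    row_means = [sum(row) / n for row in dist]
    grand_mean = sum(row_means) / n
    # The matrix is symmetric, so row means double as column means.
    return [[dist[i][j] - row_means[i] - row_means[j] + grand_mean
             for j in range(n)] for i in range(n)]


def distance_correlation_biased(x, y):
    """Biased (double-centered) sample distance correlation for 1d data."""
    a, b = centered_distances(x), centered_distances(y)
    n = len(x)

    def mean_product(p, q):
        return sum(p[i][j] * q[i][j] for i in range(n) for j in range(n)) / (n * n)

    dcov2 = max(mean_product(a, b), 0.0)  # guard against tiny negative rounding
    dvar_x, dvar_y = mean_product(a, a), mean_product(b, b)
    return math.sqrt(dcov2 / math.sqrt(dvar_x * dvar_y))


x = [-2, -1, 0, 1, 2]
linear = distance_correlation_biased(x, [2 * v + 3 for v in x])
parabola = distance_correlation_biased(x, [v ** 2 for v in x])
```

The parabola has a Pearson correlation of zero with `x`, yet its distance correlation is clearly positive, which is the kind of non-linear dependence this statistic is meant to detect.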
nltools.stats.downsample(data, sampling_freq=None, target=None, target_type='samples', method='mean')[source]¶
Downsample pandas data to a new target frequency or number of samples using averaging.
Parameters:
data – (pd.DataFrame, pd.Series) data to downsample
sampling_freq – (float) Sampling frequency of data in hertz
target – (float) downsampling target
target_type – type of target can be [samples,seconds,hz]
method – (str) type of downsample method [‘mean’,’median’],
default: mean
Returns:
(pd.DataFrame, pd.Series) downsampled data
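Downsampling by averaging amounts to grouping consecutive samples into windows and taking the mean of each window. The sketch below handles the case where the target is given in hz; the function name `downsample_mean` and the treatment of a trailing partial window are assumptions for illustration.

```python
def downsample_mean(values, sampling_freq, target_hz):
    """Average consecutive samples so the output approximates target_hz."""
    if target_hz <= 0 or target_hz > sampling_freq:
        raise ValueError("target_hz must be positive and no larger than sampling_freq")
    window = int(round(sampling_freq / target_hz))  # samples per output value
    result = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        result.append(sum(chunk) / len(chunk))  # last chunk may be shorter
    return result


downsampled = downsample_mean([1, 2, 3, 4, 5, 6], sampling_freq=2.0, target_hz=1.0)
```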
nltools.stats.fdr(p, q=0.05)[source]¶
Determine FDR threshold given a p value array and desired false
discovery rate q. Written by Tal Yarkoni
Parameters:
p – (np.array) vector of p-values
q – (float) false discovery rate level
Returns:
(float) p-value threshold based on independence or positive
dependence
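A threshold "based on independence or positive dependence" corresponds to the Benjamini-Hochberg step-up rule. The sketch below is a plain-Python version of that rule; the function name is hypothetical, and returning None when nothing survives is a choice made for this example rather than a statement about what nltools returns.

```python
def fdr_threshold(p_values, q=0.05):
    """Largest sorted p-value p_(k) satisfying p_(k) <= q * k / n, else None."""
    ordered = sorted(p_values)
    n = len(ordered)
    threshold = None
    for rank, p in enumerate(ordered, start=1):
        if p <= q * rank / n:
            threshold = p  # keep the largest p-value that passes
    return threshold


p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
threshold = fdr_threshold(p_values, q=0.05)
```

Every p-value less than or equal to the returned threshold would be declared significant at the chosen false discovery rate.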
nltools.stats.find_spikes(data, global_spike_cutoff=3, diff_spike_cutoff=3)[source]¶
Function to identify spikes from fMRI Time Series Data
Parameters:
data – Brain_Data or nibabel instance
global_spike_cutoff – (int,None) cutoff to identify spikes in global signal
in standard deviations, None indicates do not calculate.
diff_spike_cutoff – (int,None) cutoff to identify spikes in average frame difference
in standard deviations, None indicates do not calculate.
Returns:
pandas dataframe with spikes as indicator variables
nltools.stats.holm_bonf(p, alpha=0.05)[source]¶
Compute corrected p-values based on the Holm-Bonferroni method, i.e. a step-down procedure applying iteratively less correction to the highest p-values. A bit more conservative than fdr, but much more powerful than vanilla bonferroni.
Parameters:
p – (np.array) vector of p-values
alpha – (float) alpha level
Returns:
(float) p-value threshold based on bonferroni
step-down procedure
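The step-down logic can be illustrated as follows: sort the p-values, compare the smallest against alpha / n, the next against alpha / (n - 1), and so on, stopping at the first failure. This sketch returns per-test rejection decisions rather than a single threshold, and the function name is hypothetical.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm-Bonferroni rejection decisions, in the original order of p_values."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    reject = [False] * n
    for step, index in enumerate(order):
        # The correction relaxes as we move toward larger p-values.
        if p_values[index] <= alpha / (n - step):
            reject[index] = True
        else:
            break  # once one test fails, all larger p-values fail too
    return reject


decisions = holm_reject([0.01, 0.04, 0.03, 0.005])
```

A plain Bonferroni correction would compare every p-value against 0.05 / 4 = 0.0125 and reject only the last test here, whereas the step-down procedure also rejects the first.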
nltools.stats.isc(data, n_samples=5000, metric='median', method='bootstrap', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]¶
Compute pairwise intersubject correlation from observations by subjects array.
This function computes pairwise intersubject correlations (ISC) using the median (as recommended by Chen et al., 2016). However, if the mean is preferred, we compute the mean correlation after performing the fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values.
There are currently three different methods to compute p-values. These include the classic methods for
computing permuted time-series by either circle-shifting the data or phase-randomizing the data
(see Lancaster et al., 2018). These methods create random surrogate data while preserving the temporal
autocorrelation inherent to the signal. By default, we use the subject-wise bootstrap method from
Chen et al., 2016. Instead of recomputing the pairwise ISC using circle_shift or phase_randomization methods,
this approach uses the computationally more efficient method of bootstrapping the subjects
and computing a new pairwise similarity matrix with randomly selected subjects with replacement.
If the same subject is selected multiple times, we set the perfect correlation to a nan with
(exclude_self_corr=True). We compute the p-values using the percentile method (Hall & Wilson, 1991), the same method used in Brainiak.
Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B.,
& Cox, R. W. (2016). Untangling the relatedness among correlations, part I:
nonparametric approaches to inter-subject correlation analysis at the group level.
NeuroImage, 142, 248-259.
Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing.
Biometrics, 757-762.
Lancaster, G., Iatsenko, D., Pidde, A., Ticcinelli, V., & Stefanovska, A. (2018).
Surrogate data for hypothesis testing of physical systems. Physics Reports, 748, 1-60.
Parameters:
data – (pd.DataFrame, np.array) observations by subjects where isc is computed across subjects
n_samples – (int) number of random samples/bootstraps
metric – (str) type of isc summary metric [‘mean’,’median’]
method – (str) method to compute p-values [‘bootstrap’, ‘circle_shift’,’phase_randomize’] (default: bootstrap)
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_null – (bool) Return the permutation distribution along with the p-value; default False
Returns:
(dict) dictionary of permutation results [‘correlation’,’p’]
Return type:
stats
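The two summary metrics discussed above can be sketched on a list of pairwise correlations: the median is taken directly, while the mean is computed in Fisher z space and transformed back. The helper names are hypothetical, and correlations are assumed to lie strictly between -1 and 1.

```python
import math
import statistics


def fisher_mean(correlations):
    """Mean correlation computed via the Fisher r-to-z transformation."""
    z_values = [math.atanh(r) for r in correlations]   # r-to-z
    return math.tanh(sum(z_values) / len(z_values))    # z-to-r


def summarize_isc(pairwise_correlations, metric="median"):
    """Summarize pairwise correlations with the median or the Fisher-z mean."""
    if metric == "median":
        return statistics.median(pairwise_correlations)
    if metric == "mean":
        return fisher_mean(pairwise_correlations)
    raise ValueError("metric must be 'mean' or 'median'")


pairs = [0.1, 0.5, 0.9]
isc_median = summarize_isc(pairs, metric="median")
isc_mean = summarize_isc(pairs, metric="mean")
```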
nltools.stats.isc_group(group1, group2, n_samples=5000, metric='median', method='permute', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]¶
Compute difference in intersubject correlation between groups.
This function computes pairwise intersubject correlations (ISC) using the median (as recommended by Chen
et al., 2016). However, if the mean is preferred, we compute the mean correlation after performing
the fisher r-to-z transformation and then convert back to correlations to minimize artificially
inflating the correlation values.
There are currently two different methods to compute p-values. By default, we use the subject-wise permutation
method recommended by Chen et al., 2016. This method combines the two groups and computes pairwise similarity both
within and between the groups. Then the group labels are permuted and the mean difference between the two groups
is recomputed to generate a null distribution. The second method uses subject-wise bootstrapping, where a new
pairwise similarity matrix with randomly selected subjects with replacement is created separately for each group
and the ISC difference between these groups is used to generate a null distribution. If the same subject is
selected multiple times, we set the perfect correlation to a nan with (exclude_self_corr=True). We compute the
p-values using the percentile method (Hall & Wilson, 1991).
Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B.,
& Cox, R. W. (2016). Untangling the relatedness among correlations, part I:
nonparametric approaches to inter-subject correlation analysis at the group level.
NeuroImage, 142, 248-259.
Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing.
Biometrics, 757-762.
Parameters:
group1 – (pd.DataFrame, np.array) observations by subjects where isc is computed across subjects
group2 – (pd.DataFrame, np.array) observations by subjects where isc is computed across subjects
n_samples – (int) number of samples for permutation or bootstrapping
metric – (str) type of isc summary metric [‘mean’,’median’]
method – (str) method to compute p-values [‘permute’, ‘bootstrap’] (default: permute)
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.
return_null – (bool) Return the permutation distribution along with the p-value; default False
Returns:
(dict) dictionary of permutation results [‘correlation’,’p’]
Return type:
stats
nltools.stats.isfc(data, method='average')[source]¶
Compute intersubject functional connectivity (ISFC) from a list of observation x feature matrices
This function uses the leave one out approach to compute ISFC (Simony et al., 2016).
For each subject, compute the cross-correlation between each voxel/roi
with the average of the rest of the subjects data. In other words,
compute the mean voxel/ROI response for all participants except the
target subject. Then compute the correlation between each ROI within
the target subject with the mean ROI response in the group average.
Simony, E., Honey, C. J., Chen, J., Lositsky, O., Yeshurun, Y., Wiesel, A., & Hasson, U. (2016).
Dynamic reconfiguration of the default mode network during narrative comprehension.
Nature communications, 7, 12141.
Parameters:
data – list of subject matrices (observations x voxels/rois)
method – approach to computing ISFC. ‘average’ uses the leave-one-out approach described above
Returns:
list of subject ISFC matrices
nltools.stats.isps(data, sampling_freq=0.5, low_cut=0.04, high_cut=0.07, order=5, pairwise=False)[source]¶
Compute Dynamic Intersubject Phase Synchrony (ISPS) from an observation by subject array.
This function computes the instantaneous intersubject phase synchrony for a single voxel/roi
timeseries. Requires multiple subjects. This method is largely based on that described by Glerean
et al., 2012 and performs a hilbert transform on narrow bandpass filtered timeseries (butterworth)
data to get the instantaneous phase angle. The function returns a dictionary containing the
average phase angle, the average vector length, and parametric p-values computed using the rayleigh test using circular
statistics (Fisher, 1993). If pairwise=True, then it will compute these on the pairwise phase angle differences,
if pairwise=False, it will compute these on the actual phase angles. This is called inter-site phase coupling
or inter-trial phase coupling, respectively, in the EEG literature.
This function requires narrow band filtering your data. As a default we use the recommendations
by (Glerean et al., 2012) of .04-.07Hz. This is similar to the “slow-4” band (0.025–0.067 Hz)
described by (Zuo et al., 2010; Penttonen & Buzsáki, 2003), but excludes the .03 band, which has been
demonstrated to contain aliased respiration signals (Birn, 2006).
Birn, R. M., Smith, M. A., Bandettini, P. A., & Diamond, J. B. (2006). Separating respiratory-variation-related
fluctuations from neuronal-activity-related fluctuations in fMRI. Neuroimage, 31, 1536–1548.
Buzsáki, G., & Draguhn, A. (2004). Neuronal oscillations in cortical networks. Science,
304(5679), 1926-1929.
Fisher, N. I. (1995). Statistical analysis of circular data. Cambridge University Press.
Glerean, E., Salmi, J., Lahnakoski, J. M., Jääskeläinen, I. P., & Sams, M. (2012).
Functional magnetic resonance imaging phase synchronization as a measure of dynamic
functional connectivity. Brain connectivity, 2(2), 91-101.
Parameters:
data – (pd.DataFrame, np.ndarray) observations x subjects data
sampling_freq – (float) sampling frequency of data in Hz
low_cut – (float) lower bound cutoff for high pass filter
high_cut – (float) upper bound cutoff for low pass filter
order – (int) filter order for butterworth bandpass
pairwise – (bool) compute phase angle coherence on pairwise phase angle differences
or on raw phase angle.
Returns:
dictionary with mean phase angle, vector length, and rayleigh statistic
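The summary step (average phase angle and average vector length across subjects at one time point) can be sketched with complex arithmetic. This example assumes the instantaneous phase angles have already been obtained, so it skips the bandpass filter, the Hilbert transform, and the Rayleigh test; the function name is hypothetical.

```python
import cmath
import math


def phase_summary(angles):
    """Mean phase angle and resultant vector length for angles in radians."""
    # Each angle becomes a unit vector on the complex plane; average them.
    mean_vector = sum(cmath.exp(1j * angle) for angle in angles) / len(angles)
    return {
        "average_angle": cmath.phase(mean_vector),
        "vector_length": abs(mean_vector),  # 1 = perfect synchrony, 0 = none
    }


synchronized = phase_summary([0.3, 0.3, 0.3])
opposed = phase_summary([0.0, math.pi])
```

Subjects sharing the same phase give a vector length of 1, while subjects in anti-phase cancel out to a length near 0.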
nltools.stats.make_cosine_basis(nsamples, sampling_freq, filter_length, unit_scale=True, drop=0)[source]¶
Create a series of cosine basis functions for a discrete cosine
transform. Based off of implementation in spm_filter and spm_dctmtx
because scipy dct can only apply transforms but not return the basis
functions. Like SPM, does not add constant (i.e. intercept), but does
retain first basis (i.e. sigmoidal/linear drift)
Parameters:
nsamples (int) – number of observations (e.g. TRs)
sampling_freq (float) – sampling frequency in hertz (i.e. 1 / TR)
filter_length (int) – length of filter in seconds
unit_scale (bool) – ensure that the basis functions are on the normalized range [-1, 1]; default True
drop (int) – index of which early/slow bases to drop if any; default is
to drop constant (i.e. intercept) like SPM. Unlike SPM, retains
first basis (i.e. linear/sigmoidal). Will cumulatively drop bases
up to and inclusive of index provided (e.g. 2, drops bases 1 and 2)
Returns:
nsamples x number of basis sets numpy array
Return type:
out (ndarray)
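A minimal sketch of how such a basis set can be built is shown below. It uses the textbook DCT-II formula and derives the number of bases from the run duration and filter length in the way commonly attributed to spm_filter and spm_dctmtx; both the formula for the number of bases and the helper name `cosine_basis` are assumptions, and the unit_scale and drop options are not reproduced.

```python
import math


def cosine_basis(nsamples, sampling_freq, filter_length):
    """Return DCT-II basis columns, excluding the constant term."""
    duration = nsamples / sampling_freq                  # run length in seconds
    n_basis = int(2 * duration / filter_length + 1)      # count includes constant
    columns = []
    for k in range(1, n_basis):                          # k = 0 (constant) skipped
        columns.append([
            math.sqrt(2.0 / nsamples)
            * math.cos(math.pi * (2 * n + 1) * k / (2 * nsamples))
            for n in range(nsamples)
        ])
    return columns


# 100 TRs sampled at 0.5 Hz (TR = 2 s) with a 100 s filter
basis = cosine_basis(nsamples=100, sampling_freq=0.5, filter_length=100)
```

The columns are mutually orthogonal and orthogonal to a constant, so they capture slow drifts without duplicating an intercept.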
nltools.stats.matrix_permutation(data1, data2, n_permute=5000, metric='spearman', how='upper', include_diag=False, tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]¶
Permute 2-dimensional matrix correlation (Mantel test).
Chen, G. et al. (2016). Untangling the relatedness among correlations,
part I: nonparametric approaches to inter-subject correlation analysis
at the group level. Neuroimage, 142, 248-259.
Parameters:
data1 – (pd.DataFrame, np.array) square matrix
data2 – (pd.DataFrame, np.array) square matrix
n_permute – (int) number of permutations
metric – (str) type of association metric [‘spearman’,’pearson’,
‘kendall’]
how – (str) whether to use the ‘upper’ (default), ‘lower’, or ‘full’ matrix. The
default of ‘upper’ assumes both matrices are symmetric
include_diag (bool) – only applies if how=’full’. Whether to include the
diagonal elements in the comparison
tail – (int) either 1 for one-tail or 2 for two-tailed test
(default: 2)
n_jobs – (int) The number of CPUs to use to do the computation.
-1 means all CPUs.
return_perms – (bool) Return the permutation distribution along with the p-value; default False
Returns:
(dict) dictionary of permutation results [‘correlation’,’p’]
Return type:
stats
nltools.stats.multi_threshold(t_map, p_map, thresh)[source]¶
Threshold test image by multiple p-values from p image
Parameters:
t_map – (Brain_Data) Brain_Data instance of arbitrary statistic metric (e.g., beta, t, etc)
p_map – (Brain_Data) Brain_Data instance of p-values
thresh – (list) list of p-values to threshold stat image
Returns:
Thresholded Brain_Data instance
nltools.stats.one_sample_permutation(data, n_permute=5000, tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]¶
One sample permutation test using randomization.
Parameters:
data – (pd.DataFrame, pd.Series, np.array) data to permute
n_permute – (int) number of permutations
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation.
-1 means all CPUs.
return_perms – (bool) Return the permutation distribution along with the p-value; default False
random_state – (int, None, or np.random.RandomState) Initial random seed (default: None)
Returns:
(dict) dictionary of permutation results [‘mean’,’p’]
Return type:
stats
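One common way to run a one-sample randomization test is sign flipping: under the null hypothesis of a zero-centered symmetric distribution, each observation's sign is arbitrary. The sketch below follows that approach in plain Python; whether one_sample_permutation uses exactly this scheme is an assumption, as are the function name and the (count + 1) / (n_permute + 1) p-value convention.

```python
import random


def sign_flip_test(data, n_permute=2000, seed=0):
    """Two-tailed one-sample randomization test on the mean via sign flipping."""
    rng = random.Random(seed)
    n = len(data)
    observed = sum(data) / n
    extreme = 0
    for _ in range(n_permute):
        # Randomly flip the sign of each observation and recompute the mean.
        flipped = sum(v if rng.random() < 0.5 else -v for v in data) / n
        if abs(flipped) >= abs(observed):
            extreme += 1
    return {"mean": observed, "p": (extreme + 1) / (n_permute + 1)}


data = [1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.1, 0.95, 1.05, 1.15, 0.9, 1.0]
result = sign_flip_test(data)
```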
nltools.stats.pearson(x, y)[source]¶
Correlates row vector x with each row vector in 2D array y.
From neurosynth.stats.py - author: Tal Yarkoni
nltools.stats.procrustes(data1, data2)[source]¶
Procrustes analysis, a similarity test for two data sets.
Each input matrix is a set of points or vectors (the rows of the matrix).
The dimension of the space is the number of columns of each matrix. Given
two identically sized matrices, procrustes standardizes both such that:
- \(tr(AA^{T}) = 1\).
- Both sets of points are centered around the origin.
Procrustes then applies the optimal transform to the second
matrix (including scaling/dilation, rotations, and reflections) to minimize
\(M^{2}=\sum(data1-data2)^{2}\), or the sum of the squares of the
pointwise differences between the two input datasets.
This function was not designed to handle datasets with different numbers of
datapoints (rows).  If two data sets have different dimensionality
(different number of columns), this function will add columns of zeros to
the smaller of the two.
Parameters:
data1 – array_like
Matrix, n rows represent points in k (columns) space data1 is the
reference data, after it is standardised, the data from data2
will be transformed to fit the pattern in data1 (must have >1
unique points).
data2 – array_like
n rows of data in k space to be fit to data1.  Must be the  same
shape (numrows, numcols) as data1 (must have >1 unique points).
Returns:
mtx1 – (array_like) A standardized version of data1.
mtx2 – (array_like) The orientation of data2 that best fits data1. Centered, but not necessarily \(tr(AA^{T}) = 1\).
disparity – (float) \(M^{2}\) as defined above.
R – ((N, N) ndarray) The matrix solution of the orthogonal Procrustes problem. Minimizes the Frobenius norm of dot(data1, R) - data2, subject to dot(R.T, R) == I.
scale – (float) Sum of the singular values of dot(data1.T, data2).
nltools.stats.procrustes_distance(mat1, mat2, n_permute=5000, tail=2, n_jobs=-1, random_state=None)[source]¶
Use procrustes super-position to perform a similarity test between 2 matrices. Matrices need to match in size on their first dimension only, as the smaller matrix on the second dimension will be padded with zeros. After aligning two matrices using the procrustes transformation, use the computed disparity between them (sum of squared error of elements) as a similarity metric. Shuffle the rows of one of the matrices and recompute the disparity to perform inference (Peres-Neto & Jackson, 2001).
Parameters:
mat1 (ndarray) – 2d numpy array; must have same number of rows as mat2
mat2 (ndarray) – 1d or 2d numpy array; must have same number of rows as mat1
n_permute (int) – number of permutation iterations to perform
tail (int) – either 1 for one-tailed or 2 for two-tailed test; default 2
n_jobs (int) – The number of CPUs to use to do permutation; default -1 (all)
Returns:
similarity (float): similarity between matrices bounded between 0 and 1
pval (float): permuted p-value
nltools.stats.regress(X, Y, mode='ols', stats='full', **kwargs)[source]¶
This is a flexible function to run several types of regression models provided X and Y numpy arrays. Y can be a 1d numpy array or 2d numpy array. In the latter case, results will be output with shape 1 x Y.shape[1], in other words fitting a separate regression model to each column of Y.
Does NOT add an intercept automatically to the X matrix before fitting like some other software packages. This is left up to the user.
This function can compute regression in 3 ways:
1. Standard OLS
2. OLS with robust sandwich estimators for standard errors. 3 robust types of estimators exist:
- ‘hc0’ - classic huber-white estimator robust to heteroscedasticity (default)
- ‘hc3’ - a variant on huber-white estimator slightly more conservative when sample sizes are small
- ‘hac’ - an estimator robust to both heteroscedasticity and auto-correlation; auto-correlation lag can be controlled with the nlags keyword argument; default 1
3. ARMA (auto-regressive moving-average) model (experimental). This model is fit through statsmodels.tsa.arima_model.ARMA, so more information about options can be found there. Any settings can be passed in as kwargs. By default fits a (1,1) model with starting lags of 2. This mode is computationally intensive and can take quite a while if Y has many columns. If Y is a 2d array, joblib.Parallel is used for faster fitting by parallelizing fits across columns of Y. Parallelization can be controlled by passing in kwargs. Defaults to multi-threading using 10 separate threads, as threads don’t require large arrays to be duplicated in memory. Defaults are also set to enable memory-mapping for very large arrays if backend=’multiprocessing’ to prevent crashes and hangs. Various levels of progress can be monitored using the ‘disp’ (statsmodels) and ‘verbose’ (joblib) keyword arguments with integer values > 0.
Parameters:
X (ndarray) – design matrix; assumes intercept is included
Y (ndarray) – dependent variable array; if 2d, a model is fit to each column of Y separately
mode (str) – kind of model to fit; must be one of ‘ols’ (default), ‘robust’, or ‘arma’
stats (str) – one of ‘full’, ‘betas’, ‘tstats’. Useful to speed up calculation if you know you only need some statistics and not others. Defaults to ‘full’.
robust_estimator (str,optional) – kind of robust estimator to use if mode = ‘robust’; default ‘hc0’
nlags (int,optional) – auto-correlation lag correction if mode = ‘robust’ and robust_estimator = ‘hac’; default 1
order (tuple,optional) – auto-regressive and moving-average orders for mode = ‘arma’; default (1,1)
kwargs (dict) – additional keyword arguments to statsmodels.tsa.arima_model.ARMA and joblib.Parallel
Returns:
coefficients
se: standard error of coefficients
t: t-statistics (coef/sterr)
p : p-values
df: degrees of freedom
res: residuals
Examples
Standard OLS
>>> results = regress(X,Y,mode='ols')
Robust OLS with heteroscedasticity (hc0) robust standard errors
>>> results = regress(X,Y,mode='robust')
Robust OLS with heteroscedasticty and auto-correlation (with lag 2) robust standard errors
>>> results = regress(X,Y




    
,mode='robust',robust_estimator='hac',nlags=2)
Auto-regressive model with auto-regressive and moving-average lags = 1
>>> results = regress(X,Y,mode='arma',order=(1,1))
Auto-regressive model with auto-regressive lag = 2, moving-average lag = 3, and multi-processing instead of multi-threading using 8 cores (this can use a lot of memory if input arrays are very large!).
>>> results = regress(X,Y,mode='arma',order=(2,3),backend='multiprocessing',n_jobs=8)
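To make the returned quantities concrete, the following is a minimal NumPy sketch of what a plain OLS fit computes for a 1d Y: coefficients, standard errors, t-statistics, degrees of freedom, and residuals. The function name ols_sketch is hypothetical and this is not the nltools implementation; p-values are omitted because they would need a t-distribution from another library.

```python
import numpy as np


def ols_sketch(X, Y):
    """Illustrative OLS for a 1d Y; assumes X already contains an intercept column."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    b = np.linalg.pinv(X) @ Y                   # coefficients
    res = Y - X @ b                             # residuals
    df = X.shape[0] - X.shape[1]                # degrees of freedom
    sigma2 = (res ** 2).sum() / df              # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))  # standard errors
    t = b / se                                  # t-statistics (coef / sterr)
    return {"b": b, "se": se, "t": t, "df": df, "res": res}


# Toy data: intercept column plus one regressor
X = [[1, 0], [1, 1], [1, 2], [1, 3]]
Y = [1.1, 2.9, 5.2, 6.8]
out = ols_sketch(X, Y)
```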
nltools.stats.summarize_bootstrap(data, save_weights=False)[source]¶
Calculate summary of bootstrap samples
Parameters:
data – (Brain_Data) Brain_Data instance of samples
save_weights – (bool) save bootstrap weights
Returns:
(dict) dictionary of Brain_Data summary images
Return type:
output
nltools.stats.threshold(stat, p, thr=0.05, return_mask=False)[source]¶
Threshold test image by p-value from p image
Parameters:
stat – (Brain_Data) Brain_Data instance of arbitrary statistic metric
(e.g., beta, t, etc)
p – (Brain_Data) Brain_data instance of p-values
thr – (float) p-value to threshold stat image
return_mask – (bool) optionally return the thresholding mask; default False
Returns:
Thresholded Brain_Data instance
Return type:
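The idea of thresholding one image by the p-values of another can be sketched on plain arrays. This is an illustration only: threshold_sketch is a hypothetical name, it operates on NumPy arrays rather than Brain_Data instances, and the strict `<` comparison is an assumption.

```python
import numpy as np


def threshold_sketch(stat, p, thr=0.05, return_mask=False):
    """Zero out entries of `stat` whose p-value is not below `thr`."""
    stat = np.asarray(stat, dtype=float)
    p = np.asarray(p, dtype=float)
    mask = p < thr                      # True where the statistic survives
    out = np.where(mask, stat, 0.0)     # keep surviving values, zero the rest
    return (out, mask) if return_mask else out


thresholded = threshold_sketch([2.5, 0.3, -3.1], [0.01, 0.6, 0.002])
```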
nltools.stats.transform_pairwise(X, y)[source]¶
Transforms data into pairs with balanced labels for ranking
Transforms a n-class ranking problem into a two-class classification
problem. Subclasses implementing particular strategies for choosing
pairs should override this method.
In this method, all pairs are chosen, except for those that have the
same target value. The output is an array of balanced classes, i.e.
there are the same number of -1 as +1
Reference: “Large Margin Rank Boundaries for Ordinal Regression”,
R. Herbrich, T. Graepel, K. Obermayer. Authors: Fabian Pedregosa
<fabian@fseoane.net> Alexandre Gramfort <alexandre.gramfort@inria.fr>
Parameters:
X – (np.array), shape (n_samples, n_features)
The data
y – (np.array), shape (n_samples,) or (n_samples, 2)
Target labels. If it’s a 2D array, the second column represents
the grouping of samples, i.e., samples with different groups will
not be considered.
Returns:
(np.array), shape (k, n_features)
Data as pairs, where k = n_samples * (n_samples-1) / 2 if grouping
values were not passed. If grouping variables exist, then returns
values computed for each group.
y_trans: (np.array), shape (k,)
Output class labels, where classes have values {-1, +1}
If y was shape (n_samples, 2), then returns (k, 2) with groups on
the second dimension.
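The pairwise transform described above can be sketched in plain Python for the case without grouping. The name pairwise_sketch is hypothetical; the sign of every other retained pair is flipped so that +1 and -1 labels appear about equally often, which is one simple way to balance the classes and is not necessarily what the library does internally.

```python
from itertools import combinations


def pairwise_sketch(X, y):
    """Turn (X, y) into difference vectors with labels in {-1, +1}."""
    X_pairs, y_pairs = [], []
    for i, j in combinations(range(len(y)), 2):
        if y[i] == y[j]:
            continue  # pairs with the same target carry no ranking information
        diff = [a - b for a, b in zip(X[i], X[j])]
        label = 1 if y[i] > y[j] else -1
        # Alternate the desired label between successive pairs to balance classes
        wanted = 1 if len(y_pairs) % 2 == 0 else -1
        if label != wanted:
            diff = [-d for d in diff]
            label = -label
        X_pairs.append(diff)
        y_pairs.append(label)
    return X_pairs, y_pairs


X_trans, y_trans = pairwise_sketch([[1.0], [2.0], [3.0]], [1, 2, 3])
```

With three samples and no ties, this yields 3 * 2 / 2 = 3 pairs, matching the formula for k above.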
nltools.stats.trim(data, cutoff=None)[source]¶
Trim a Pandas DataFrame or Series by replacing outlier values with NaNs
Parameters:
data – (pd.DataFrame, pd.Series) data to trim
cutoff – (dict) a dictionary with keys {‘std’:[low,high]} or
{‘quantile’:[low,high]}
Returns:
(pd.DataFrame, pd.Series) trimmed data
Return type:
nltools.stats.two_sample_permutation(data1, data2, n_permute=5000, tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]¶
Independent sample permutation test.
Parameters:
data1 – (pd.DataFrame, pd.Series, np.array) dataset 1 to permute
data2 – (pd.DataFrame, pd.Series, np.array) dataset 2 to permute
n_permute – (int) number of permutations
tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)
n_jobs – (int) The number of CPUs to use to do the computation.
-1 means all CPUs.
return_perms – (bool) Return the permutation distribution along with the p-value; default False
Returns:
(dict) dictionary of permutation results [‘mean’,’p’]
Return type:
stats
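As a rough illustration of an independent-samples permutation test, the sketch below shuffles the pooled observations and compares the permuted mean differences against the observed one. The function name, the seed argument, and the (count + 1) / (n + 1) p-value convention are assumptions for this example, and the one-tailed branch only tests the upper tail.

```python
import random


def two_sample_permutation_sketch(data1, data2, n_permute=5000, tail=2, seed=None):
    """Permutation test on the difference in means of two independent samples."""
    rng = random.Random(seed)

    def mean(values):
        return sum(values) / len(values)

    observed = mean(data1) - mean(data2)
    pooled = list(data1) + list(data2)
    n1 = len(data1)
    extreme = 0
    for _ in range(n_permute):
        rng.shuffle(pooled)  # random relabeling of the pooled observations
        diff = mean(pooled[:n1]) - mean(pooled[n1:])
        if tail == 2:
            extreme += int(abs(diff) >= abs(observed))
        else:
            extreme += int(diff >= observed)
    p = (extreme + 1) / (n_permute + 1)
    return {"mean": observed, "p": p}


stats = two_sample_permutation_sketch([5, 6, 7, 8], [1, 2, 3, 4], n_permute=2000, seed=0)
```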
nltools.stats.u_center(mat)[source]¶
U-center a 2d array. U-centering is a bias-corrected form of double-centering
Parameters:
mat (ndarray) – 2d numpy array
Returns:
u-centered version of input
Return type:
mat (ndarray)
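For reference, U-centering of an n x n matrix (n > 2) subtracts row and column sums scaled by 1/(n-2), adds the grand sum scaled by 1/((n-1)(n-2)), and sets the diagonal to zero. The NumPy sketch below follows that standard definition; u_center_sketch is a hypothetical name and is not taken from the nltools source.

```python
import numpy as np


def u_center_sketch(mat):
    """U-center a square 2d array with more than two rows."""
    mat = np.asarray(mat, dtype=float)
    n = mat.shape[0]
    row = mat.sum(axis=1, keepdims=True) / (n - 2)    # scaled row sums
    col = mat.sum(axis=0, keepdims=True) / (n - 2)    # scaled column sums
    total = mat.sum() / ((n - 1) * (n - 2))           # scaled grand sum
    out = mat - row - col + total
    np.fill_diagonal(out, 0.0)                        # diagonal is defined as zero
    return out


# Symmetric distance matrix with zero diagonal
D = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]
U = u_center_sketch(D)
```

For a symmetric input with a zero diagonal, each row of the result sums to zero, which is a quick sanity check.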
nltools.stats.upsample(data, sampling_freq=None, target=None, target_type='samples', method='linear')[source]¶
Upsample a pandas object to a new target frequency or number of samples using interpolation.
Parameters:
data – (pd.DataFrame, pd.Series) data to upsample
(Note: will drop non-numeric columns from DataFrame)
sampling_freq – Sampling frequency of data in hertz
target – (float) upsampling target
target_type – (str) type of target can be [samples,seconds,hz]
method – (str) [‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’]
where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’
refer to a spline interpolation of zeroth, first,
second or third order  (default: linear)
Returns:
upsampled pandas object
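Linear upsampling to a higher frequency can be illustrated with NumPy interpolation. This sketch assumes a 1d signal and a target expressed in hertz; upsample_sketch and its argument names are hypothetical and do not reproduce the pandas handling of the function above.

```python
import numpy as np


def upsample_sketch(data, sampling_freq, target_freq):
    """Linearly interpolate a 1d signal from sampling_freq to target_freq (both in Hz)."""
    data = np.asarray(data, dtype=float)
    duration = (len(data) - 1) / sampling_freq          # time of the last sample
    n_new = int(round(duration * target_freq)) + 1      # samples on the new grid
    old_t = np.arange(len(data)) / sampling_freq        # original sample times
    new_t = np.linspace(0.0, duration, n_new)           # upsampled sample times
    return np.interp(new_t, old_t, data)


upsampled = upsample_sketch([0, 2, 4], sampling_freq=1, target_freq=2)
```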
nltools.stats.winsorize(data, cutoff=None, replace_with_cutoff=True)[source]¶
Winsorize a Pandas DataFrame or Series, replacing outliers with the largest/lowest values not considered outliers
Parameters:
data – (pd.DataFrame, pd.Series) data to winsorize
cutoff – (dict) a dictionary with keys {‘std’:[low,high]} or
{‘quantile’:[low,high]}
replace_with_cutoff – (bool) If True, replace outliers with the cutoff.
If False, replace outliers with the closest
existing values (default: True)
Returns:
(pd.DataFrame, pd.Series) winsorized data
Return type:
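The difference between trimming and winsorizing is easiest to see side by side. The sketch below uses quantile cutoffs on a NumPy array: trimming replaces outliers with NaN, while winsorizing clips them to the cutoff values (the behaviour described for replace_with_cutoff=True). The function names and the default quantiles are assumptions for this example.

```python
import numpy as np


def trim_sketch(data, low=0.1, high=0.9):
    """Replace values outside the [low, high] quantiles with NaN."""
    data = np.asarray(data, dtype=float)
    lo, hi = np.quantile(data, [low, high])
    return np.where((data < lo) | (data > hi), np.nan, data)


def winsorize_sketch(data, low=0.1, high=0.9):
    """Clip values outside the [low, high] quantiles to the cutoff values."""
    data = np.asarray(data, dtype=float)
    lo, hi = np.quantile(data, [low, high])
    return np.clip(data, lo, hi)


values = [1, 2, 3, 4, 100]
trimmed = trim_sketch(values)
winsorized = winsorize_sketch(values)
```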
nltools.stats.zscore(df)[source]¶
zscore every column in a pandas dataframe or series.
Parameters:
df – (pd.DataFrame) Pandas DataFrame instance
Returns:
(pd.DataFrame) z-scored pandas DataFrame or series instance
Return type:
z_data
nltools.datasets.download_collection(collection=None, data_dir=None, overwrite=False, resume=True, verbose=1)[source]¶
Download images and metadata from Neurovault collection
Parameters:
collection (int, optional) – collection id. Defaults to None.
data_dir (str, optional) – data directory. Defaults to None.
overwrite (bool, optional) – overwrite data directory. Defaults to False.
resume (bool, optional) – resume download. Defaults to True.
verbose (int, optional) – print diagnostic messages. Defaults to 1.
Returns:
(DataFrame of image metadata, list of files from downloaded collection)
Return type:
(pd.DataFrame, list)
nltools.datasets.fetch_emotion_ratings(data_dir=None, resume=True, verbose=1)[source]¶
Download and load the emotion rating dataset from Neurovault
Parameters:
data_dir – (string, optional). Path of the data directory. Used to force data storage in a specified location. Default: None
Returns:
(Brain_Data) Brain_Data object with downloaded data. X=metadata
Return type:
nltools.datasets.fetch_pain(data_dir=None, resume=True, verbose=1)[source]¶
Download and load the pain dataset from Neurovault
Parameters:
data_dir – (string, optional) Path of the data directory. Used to force data storage in a specified location. Default: None
Returns:
(Brain_Data) Brain_Data object with downloaded data. X=metadata
Return type:
nltools.datasets.get_collection_image_metadata(collection=None, data_dir=None, limit=10)[source]¶
Get image metadata associated with collection
Parameters:
collection (int, optional) – collection id. Defaults to None.
data_dir (str, optional) – data directory. Defaults to None.
limit (int, optional) – number of images to increment. Defaults to 10.
Returns:
Dataframe with full image metadata from collection
Return type:
pd.DataFrame
class nltools.cross_validation.KFoldStratified(n_splits=3, shuffle=False, random_state=None)[source]¶
K-Folds cross validation iterator which stratifies continuous data
(unlike scikit-learn equivalent).
Provides train/test indices to split data in train test sets. Split
dataset into k consecutive folds while ensuring that same subject is
held out within each fold. Each fold is then used as a validation set
once while the k - 1 remaining folds form the training set.
Extension of KFold from scikit-learn cross_validation model
Parameters:
n_splits – int, default=3
Number of folds. Must be at least 2.
shuffle – boolean, optional
Whether to shuffle the data before splitting into batches.
random_state – None, int or RandomState
Pseudo-random number generator state used for random
sampling. If None, use default numpy RNG for shuffling
split(X, y, groups=None)[source]¶
Generate indices to split data into training and test set.
Parameters:
X – array-like, shape (n_samples, n_features)
Training data, where n_samples is the number of samples
and n_features is the number of features.
Note that providing y is sufficient to generate the splits
and hence np.zeros(n_samples) may be used as a placeholder
for X instead of actual training data.
y – array-like, shape (n_samples,)
The target variable for supervised learning problems.
Stratification is done based on the y labels.
groups – (object) Always ignored, exists for compatibility.
Returns:
train : (ndarray) The training set indices for that split.
test : (ndarray) The testing set indices for that split.
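One simple way to stratify folds on a continuous target is to sort the samples by y and deal them out to folds in turn, so every fold spans the full range of y. The generator below is a sketch of that idea under this assumption; stratified_folds_sketch is a hypothetical name and the class above may differ in detail (for example in how shuffling is handled).

```python
def stratified_folds_sketch(y, n_splits=3):
    """Yield (train, test) index lists with each test fold spread across sorted y."""
    order = sorted(range(len(y)), key=lambda i: y[i])   # sample indices sorted by target
    for k in range(n_splits):
        test = sorted(order[k::n_splits])               # every n_splits-th sorted sample
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        yield train, test


y = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2]
folds = list(stratified_folds_sketch(y, n_splits=3))
```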
nltools.cross_validation.set_cv(Y=None, cv_dict=None, return_generator=True)[source]¶
Helper function to create a scikit-learn compatible cv object using
common parameters for prediction analyses.
Parameters:
Y – (pd.DataFrame) Pandas Dataframe of Y labels
cv_dict – (dict) Type of cross_validation to use. A dictionary of
{‘type’: ‘kfolds’, ‘n_folds’: n},
{‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y},
{‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or
{‘type’: ‘loso’, ‘subject_id’: holdout}
return_generator (bool) – return a cv generator instead of an instance; default True
Returns:
a scikit-learn model-selection generator
Return type:
mask – nibabel or Brain_Data instance
custom_mask – nibabel instance or string to file path; optional
Returns:
Brain_Data instance of a mask with different integers indicating
different masks
nltools.mask.create_sphere(coordinates, radius=5, mask=None)[source]¶
Generate a set of spheres in the brain mask space
Parameters:
radius – vector of radii. Will create multiple spheres if
len(radius) > 1
coordinates – a vector of sphere centers of the form [px, py, pz] or
[[px1, py1, pz1], …, [pxn, pyn, pzn]]
nltools.mask.expand_mask(mask, custom_mask=None)[source]¶
expand a mask with multiple integers into separate binary masks
Parameters:
mask – nibabel or Brain_Data instance
custom_mask – nibabel instance or string to file path; optional
Returns:
Brain_Data instance of multiple binary masks
Return type:
nltools.mask.roi_to_brain(data, mask_x)[source]¶
This function converts an expanded binary mask of ROIs
(see expand_mask) into a single Brain_Data instance based on a vector of values. The dataframe of values
must correspond to ROI numbers.
This is useful for populating a parcellation scheme with a vector of values.
Parameters:
data – Pandas series, dataframe, list, np.array of ROI by observation
mask_x – an expanded binary mask
Returns:
(Brain_Data) Brain_Data instance where each ROI is now populated
with a value
nltools.file_reader.onsets_to_dm(F, sampling_freq, run_length, header='infer', sort=False, keep_separate=True, add_poly=None, unique_cols=None, fill_na=None, **kwargs)[source]¶
This function can assist in reading in one or several 2-3 column onsets files, specified in seconds, and converting them to a Design Matrix organized as samples X Stimulus Classes. sampling_freq should be specified in hertz; for TRs use hertz = 1/TR. Onsets files must be organized with columns in one of the following 4 formats:
‘Stim, Onset’
‘Onset, Stim’
‘Stim, Onset, Duration’
‘Onset, Duration, Stim’
No other file organizations are currently supported. Note: Stimulus offsets (onset + duration) that fall into an adjacent TR include that full TR. E.g. offset of 10.16s with TR = 2 has an offset of TR 5, which spans 10-12s, rather than an offset of TR 4, which spans 8-10s.
Parameters:
F (str/Path/pd.DataFrame) – filepath or pandas dataframe
sampling_freq (float) – sampling frequency in hertz, i.e. 1 / TR
run_length (int) – run length in number of TRs
header (str/None, optional) – whether there’s an additional header row in the supplied file/dataframe. See pd.read_csv for more details. Defaults to "infer".
sort (bool, optional) – whether to sort dataframe columns alphabetically. Defaults to False.
keep_separate (bool, optional) – if a list of files or dataframes is supplied, whether to create separate polynomial columns per file. Defaults to True.
add_poly (bool/int, optional) – whether to add Nth order polynomials to design matrix. Defaults to None.
unique_cols (list/None, optional) – if a list of files or dataframes is supplied, what additional columns to keep separate per file
fill_na (Any, optional) – what to replace NaNs with. Defaults to None (no filling).
Returns:
design matrix organized as TRs x Stims
Return type:
nltools.data.Design_Matrix
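The note on stimulus offsets can be checked with a little arithmetic. The sketch below assumes zero-based TR indices obtained by flooring times divided by the TR, which reproduces the example in the note (an offset of 10.16s with TR = 2 lands in TR 5, spanning 10-12s). The helper stim_tr_span is hypothetical and is not part of nltools.

```python
import math


def stim_tr_span(onset, duration, tr):
    """Return the (first, last) zero-based TR indices covered by a stimulus."""
    first = math.floor(onset / tr)                # TR containing the onset
    last = math.floor((onset + duration) / tr)    # TR containing the offset, included in full
    return first, last


# Stimulus starting at 4.0s and lasting 6.16s, so its offset is 10.16s
first_tr, last_tr = stim_tr_span(onset=4.0, duration=6.16, tr=2.0)
```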
nltools.utils.set_algorithm(algorithm, *args, **kwargs)[source]¶
Setup the algorithm to use in subsequent prediction analyses.
Parameters:
algorithm – The prediction algorithm to use. Either a string or an
(uninitialized) scikit-learn prediction object. If string,
must be one of ‘svm’, ‘svr’, ‘linear’, ‘logistic’, ‘lasso’,
‘lassopcr’, ‘lassoCV’, ‘ridge’, ‘ridgeCV’, ‘ridgeClassifier’,
‘randomforest’, or ‘randomforestClassifier’
kwargs – Additional keyword arguments to pass onto the scikit-learn
prediction object.
Returns:
dictionary of settings for prediction
Return type:
predictor_settings
nltools.utils.set_decomposition_algorithm(algorithm, n_components=None, *args, **kwargs)[source]¶
Setup the algorithm to use in subsequent decomposition analyses.
Parameters:
algorithm – The decomposition algorithm to use. Either a string or an
(uninitialized) scikit-learn decomposition object.
If string, must be one of ‘pca’, ‘nnmf’, ‘ica’, ‘fa’,
‘dictionary’, ‘kernelpca’.
kwargs – Additional keyword arguments to pass onto the scikit-learn
decomposition object.
Returns:
dictionary of settings for prediction
Return type:
predictor_settings
nltools.prefs: Preferences¶
This module can be used to adjust the default MNI template settings that are used
internally by all Brain_Data operations. By default all operations are performed in
MNI152 2mm space. Thus any files loaded will be resampled to this space by default. You can control this on a per-file loading basis using the mask argument of Brain_Data, e.g.
from nltools.data import Brain_Data
# my_brain will be resampled to 2mm
brain = Brain_Data('my_brain.nii.gz')
# my_brain will now be resampled to the same space as my_mask
brain = Brain_Data('my_brain.nii.gz', mask='my_mask.nii.gz') # will be resampled
Alternatively this module can be used to switch between 2mm or 3mm MNI spaces with and without ventricles:
from nltools.prefs import MNI_Template, resolve_mni_path
from nltools.data import Brain_Data
# Update the resolution globally
MNI_Template['resolution'] = '3mm'
# This works too:
MNI_Template.resolution = 3
# my_brain will be resampled to 3mm and future operation will be in 3mm space
brain = Brain_Data('my_brain.nii.gz')
# get the template nifti files
resolve_mni_path(MNI_Template)
# will print like:
{
    'resolution': '3mm',
    'mask_type': 'with_ventricles',
    'mask': '/Users/Esh/Documents/pypackages/nltools/nltools/resources/MNI152_T1_3mm_brain_mask.nii.gz',
    'plot': '/Users/Esh/Documents/pypackages/nltools/nltools/resources/MNI152_T1_3mm.nii.gz',
    'brain': '/Users/Esh/Documents/pypackages/nltools/nltools/resources/MNI152_T1_3mm_brain.nii.gz'
}
nltools.plotting.dist_from_hyperplane_plot(stats_output)[source]¶
Plot SVM Classification Distance from Hyperplane
Parameters:
stats_output – a pandas file with prediction output
Returns:
Will return a seaborn plot of distance from hyperplane
Return type:
nltools.plotting.plot_between_label_distance(distance, labels, ax=None, permutation_test=True, n_permute=5000, fontsize=18, **kwargs)[source]¶
Create a heatmap indicating average between label distance
Parameters:
distance – (pandas dataframe) brain_distance matrix
labels – (pandas dataframe) group labels
ax – axis to plot (default=None)
permutation_test – (boolean)
n_permute – (int) number of samples for permutation test
fontsize – (int) size of font for plot
Returns:
heatmap
out: pandas dataframe of pairwise distance between conditions
within_dist_out: average pairwise distance matrix
mn_dist_out: (optional if permutation_test=True) average difference in distance between conditions
p_dist_out: (optional if permutation_test=True) p-value for difference in distance between conditions
Return type:
nltools.plotting.plot_brain(objIn, how='full', thr_upper=None, thr_lower=None, save=False, **kwargs)[source]¶
More complete brain plotting of a Brain_Data instance
Parameters:
objIn (Brain_Data) – object to plot
how (str) – whether to plot a glass brain ‘glass’, 3 view-multi-slice mni ‘mni’, or both ‘full’
thr_upper (str/float) – thresholding of image. Can be string for percentage, or float for data units (see Brain_Data.threshold())
thr_lower (str/float) – thresholding of image. Can be string for percentage, or float for data units (see Brain_Data.threshold())
save (str) – if a string file name or path is provided plots will be saved into this directory appended with the orientation they belong to
kwargs – optional args to nilearn plot functions (e.g. vmax)
nltools.plotting.plot_interactive_brain(brain, threshold=1e-06, surface=False, percentile_threshold=False, anatomical=None, **kwargs)[source]¶
This function leverages nilearn’s new javascript based brain viewer functions to create interactive plotting functionality.
Parameters:
brain (nltools.Brain_Data) – a Brain_Data instance of 1d or 2d shape (i.e. 3d or 4d volume)
threshold (float/str) – threshold to initialize the visualization, may be a percentile string; default 1e-06
surface (bool) – whether to create a surface-based plot; default False
percentile_threshold (bool) – whether to interpret threshold values as percentiles
kwargs – optional arguments to nilearn.view_img or nilearn.view_img_on_surf
Returns:
interactive brain viewer widget
nltools.plotting.plot_mean_label_distance(distance, labels, ax=None, permutation_test=False, n_permute=5000, fontsize=18, **kwargs)[source]¶
Create a violin plot indicating within and between label distance.
Parameters:
distance – pandas dataframe of distance
labels – labels indicating columns and rows to group
ax – matplotlib axis to plot on
permutation_test – (bool) indicates whether to run permutation test or not
n_permute – (int) number of permutations to run
fontsize – (int) fontsize for plot labels
Returns:
heatmap
stats: (optional if permutation_test=True) permutation results
Return type:
nltools.plotting.plot_silhouette(distance, labels, ax=None, permutation_test=True, n_permute=5000, **kwargs)[source]¶
Create a silhouette plot indicating between relative to within label distance
Parameters:
distance – (pandas dataframe) brain_distance matrix
labels – (pandas dataframe) group labels
ax – axis to plot (default=None)
permutation_test – (boolean)
n_permute – (int) number of samples for permutation test
heatmap
# out: pandas dataframe of pairwise distance between conditions
# within_dist_out: average pairwise distance matrix
# mn_dist_out: (optional if permutation_test=True) average difference in distance between conditions
# p_dist_out: (optional if permutation_test=True) p-value for difference in distance between conditions
Return type:
nltools.plotting.plot_stacked_adjacency(adjacency1, adjacency2, normalize=True, **kwargs)[source]¶
Create stacked adjacency to illustrate similarity.
Parameters:
matrix1 – Adjacency instance 1
matrix2 – Adjacency instance 2
normalize – (boolean) Normalize matrices.
Returns:
matplotlib figure
nltools.plotting.plot_t_brain(objIn, how='full', thr='unc', alpha=None, nperm=None, cut_coords=[], **kwargs)[source]¶
Takes a brain data object and computes a 1 sample t-test across its first axis. If a list is provided, will compute the difference between brain data objects in the list (i.e. paired samples t-test).
Parameters:
objIn (list/Brain_Data) – if list will compute difference map first
how (str) – whether to plot a glass brain ‘glass’, 3 view-multi-slice mni ‘mni’, or both ‘full’
thr (str) – what method to use for multiple comparisons correction: unc, fdr, or tfce
alpha (float) – p-value threshold
nperm (int) – number of permutations for tfce; default 1000
cut_coords (list) – x,y,z coords to plot brain slice
kwargs – optional args to nilearn plot functions (e.g. vmax)
nltools.plotting.probability_plot(stats_output)[source]¶
Plot Classification Probability
Parameters:
stats_output – a pandas file with prediction output
Returns:
Will return a seaborn scatterplot
Return type:
class nltools.simulator.Simulator(brain_mask=None, output_dir=None, random_state=None)[source]¶
create_cov_data(cor, cov, sigma, mask=None, reps=1, n_sub=1, output_dir=None)[source]¶
create continuous simulated data with covariance
Parameters:
cor – amount of covariance between each voxel and Y variable
cov – amount of covariance between voxels
sigma – amount of noise to add
radius – vector of radius.  Will create multiple spheres if len(radius) > 1
center – center(s) of sphere(s) of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]
reps – number of data repetitions
n_sub – number of subjects to simulate
output_dir – string path of directory to output data.  If None, no data will be written
**kwargs – Additional keyword arguments to pass to the prediction algorithm
create_data(levels, sigma, radius=5, center=None, reps=1, output_dir=None)[source]¶
create simulated data with integers
Parameters:
levels – vector of intensities or class labels
sigma – amount of noise to add
radius – vector of radius.  Will create multiple spheres if len(radius) > 1
center – center(s) of sphere(s) of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]
reps – number of data repetitions useful for trials or subjects
output_dir – string path of directory to output data.  If None, no data will be written
**kwargs – Additional keyword arguments to pass to the prediction algorithm
create_ncov_data(cor, cov, sigma, masks=None, reps=1, n_sub=1, output_dir=None)[source]¶
create continuous simulated data with covariance
Parameters:
cor – amount of covariance between each voxel and Y variable (an int or a vector)
cov – amount of covariance between voxels (an int or a matrix)
sigma – amount of noise to add
mask – region(s) where we will have activations (list if more than one)
reps – number of data repetitions
n_sub – number of subjects to simulate
output_dir – string path of directory to output data.  If None, no data will be written
**kwargs – Additional keyword arguments to pass to the prediction algorithm
gaussian(mu, sigma, i_tot)[source]¶
create a 3D gaussian signal normalized to a given intensity
Parameters:
mu – average value of the gaussian signal (usually set to 0)
sigma – standard deviation
i_tot – sum total of activation (numerical integral over the gaussian returns this value)
n_spheres(radius, center)[source]¶
generate a set of spheres in the brain mask space
Parameters:
radius – vector of radius.  Will create multiple spheres if len(radius) > 1
centers – a vector of sphere centers of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]
normal_noise(mu, sigma)[source]¶
produce a normal noise distribution for all points in the brain mask
Parameters:
mu – average value of the gaussian signal (usually set to 0)
sigma – standard deviation
sphere(r, p)[source]¶
create a sphere of given radius at some point p in the brain mask
Parameters:
r – radius of the sphere
p – point (in coordinates of the brain mask) of the center of the sphere
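Building a sphere inside a 3D volume amounts to marking every voxel whose distance from the center is at most the radius. The NumPy sketch below works in voxel coordinates on a bare array; sphere_mask_sketch and its shape argument are hypothetical and stand in for the brain mask used by the Simulator.

```python
import numpy as np


def sphere_mask_sketch(shape, r, p):
    """Boolean volume that is True within radius r (in voxels) of center p."""
    grid = np.indices(shape)                                    # one index array per axis
    dist2 = sum((g - c) ** 2 for g, c in zip(grid, p))          # squared distance to p
    return dist2 <= r ** 2


mask = sphere_mask_sketch((5, 5, 5), r=1, p=(2, 2, 2))
```

With a radius of 1 voxel the sphere contains the center and its six face neighbours.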
to_nifti(m)[source]¶
convert a numpy matrix to the nifti format and assign to it the brain_mask’s affine matrix
Parameters:
m – the 3D numpy matrix we wish to convert to .nii