          API Reference

          This reference provides detailed documentation for all modules, classes, and methods in the current release of Neurolearn.

          nltools.data : Data Types

          class nltools.data.Brain_Data(data=None, Y=None, X=None, mask=None, **kwargs)[source]

          Brain_Data is a class to represent neuroimaging data in Python as a vector rather than a 3-dimensional matrix. This makes it easier to perform data manipulation and analyses.

          Parameters :
        • data – nibabel data instance or list of files

        • Y – Pandas DataFrame of training labels

        • X – Pandas DataFrame Design Matrix for running univariate models

        • mask – binary nifti file to mask brain data

        • **kwargs – Additional keyword arguments to pass to the prediction algorithm

          align(target, method='procrustes', axis=0, *args, **kwargs)[source]

          Align Brain_Data instance to target object using functional alignment

          Alignment type can be hyperalignment or Shared Response Model. When using hyperalignment, target image can be another subject or an already estimated common model. When using SRM, target must be a previously estimated common model stored as a numpy array. Transformed data can be back projected to original data using the transformation matrix.

          See nltools.stats.align for aligning multiple Brain_Data instances

          Parameters :
        • target – (Brain_Data) object to align to.

        • method – (str) alignment method to use [‘probabilistic_srm’,’deterministic_srm’,’procrustes’]

        • axis – (int) axis to align on

        • Returns :

          (dict) a dictionary containing transformed object, transformation matrix, and the shared response matrix

        • data – (Brain_Data) Brain_Data instance to append

        • kwargs – optional inputs to Design_Matrix append

        • Returns :

          (Brain_Data) new appended Brain_Data instance

          apply_mask(mask, resample_mask_to_brain=False)[source]

          Mask Brain_Data instance

          Note target data will be resampled into the same space as the mask. If you would like the mask resampled into the Brain_Data space, then set resample_mask_to_brain=True.

          Parameters :
        • mask – (Brain_Data or nifti object) mask to apply to Brain_Data object.

        • resample_mask_to_brain – (bool) Will resample mask to brain space before applying mask (default=False).

        • Returns :

          (Brain_Data) masked Brain_Data object

          Return type :

          masked

          bootstrap(function, n_samples=5000, save_weights=False, n_jobs=-1, random_state=None, *args, **kwargs)[source]

          Bootstrap a Brain_Data method.

          Parameters :
        • function – (str) method to apply to data for each bootstrap

        • n_samples – (int) number of samples to bootstrap with replacement

        • save_weights – (bool) Save each bootstrap iteration (useful for aggregating many bootstraps on a cluster)

        • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

        • Returns :

          summarized studentized bootstrap output

          Return type :

          output

          Examples

          >>>  b = dat.bootstrap('mean', n_samples=5000)
          >>>  b = dat.bootstrap('predict', n_samples=5000, algorithm='ridge')
          >>>  b = dat.bootstrap('predict', n_samples=5000, save_weights=True)
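
The resampling idea behind bootstrap can be sketched in plain Python. This is a minimal illustration, not the nltools implementation: the helper name `bootstrap_mean` is hypothetical, and the sketch assumes that the "studentized" summary is the bootstrap mean divided by the bootstrap standard deviation.

```python
import random
import statistics


def bootstrap_mean(values, n_samples=1000, random_state=None):
    """Resample `values` with replacement and summarize the bootstrap means."""
    rng = random.Random(random_state)
    n = len(values)
    boot_means = []
    for _ in range(n_samples):
        # Draw n observations with replacement and record their mean.
        resample = [values[rng.randrange(n)] for _ in range(n)]
        boot_means.append(statistics.fmean(resample))
    mean = statistics.fmean(boot_means)
    std = statistics.stdev(boot_means)
    # Assumed "studentized" summary: bootstrap mean over bootstrap std.
    return {"mean": mean, "std": std, "Z": mean / std}


summary = bootstrap_mean([2.1, 1.9, 2.4, 2.0, 2.2, 1.8], n_samples=500, random_state=0)
print(summary)
```

Fixing `random_state` makes the resampling reproducible, which mirrors the role of the random_state argument in the signature above.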

          decompose(algorithm='pca', axis='voxels', n_components=None, *args, **kwargs)[source]

          Decompose Brain_Data object

          Parameters:
        • algorithm – (str) Algorithm to perform decomposition types=[‘pca’,’ica’,’nnmf’,’fa’,’dictionary’,’kernelpca’]

        • axis – dimension to decompose [‘voxels’,’images’]

        • n_components – (int) number of components. If None then retain as many as possible.

        • Returns:

          a dictionary of decomposition parameters

          Return type:

          output

          distance(metric='euclidean', **kwargs)[source]

          Calculate distance between images within a Brain_Data() instance.

          Parameters:

          metric – (str) type of distance metric (can use any scikit-learn or scipy metric)

          Returns:

          (Adjacency) Outputs a 2D distance matrix.

          extract_roi(mask, metric='mean', n_components=None)[source]

          Extract activity from mask

          Parameters:
        • mask – (nifti) nibabel mask can be binary or numbered for different rois

        • metric – type of extraction method [‘mean’, ‘median’, ‘pca’], (default=mean) NOTE: Only mean currently works!

        • n_components – if metric='pca', number of components to return (takes any input into sklearn.decomposition.PCA)

        • Returns:

          mean within each ROI across images

          filter(sampling_freq=None, high_pass=None, low_pass=None, **kwargs)[source]

          Apply a 5th-order Butterworth filter to data. Wraps nilearn functionality. Unlike the nilearn implementation, it does not detrend or standardize by default, but this can be overridden using kwargs.

          Parameters:
        • sampling_freq – sampling freq in hertz (i.e. 1 / TR)

        • high_pass – high pass cutoff frequency

        • low_pass – low pass cutoff frequency

        • kwargs – other keyword arguments to nilearn.signal.clean

        • Returns:

          Filtered Brain_Data instance

          Return type:

          Brain_Data

          find_spikes(global_spike_cutoff=3, diff_spike_cutoff=3)[source]

          Function to identify spikes from Time Series Data

          Parameters:
        • global_spike_cutoff – (int,None) cutoff to identify spikes in global signal in standard deviations, None indicates do not calculate.

        • diff_spike_cutoff – (int,None) cutoff to identify spikes in average frame difference in standard deviations, None indicates do not calculate.

        • Returns:

          pandas dataframe with spikes as indicator variables
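
The cutoff logic can be illustrated with a small plain-Python sketch. It assumes that a spike is a time point whose value lies more than `cutoff` standard deviations from the mean of the series, applied once to a global signal and once to its frame-to-frame difference. The helper `find_spike_indices` and the toy `global_signal` are hypothetical, and the library's exact definition of the signals may differ.

```python
import statistics


def find_spike_indices(series, cutoff=3):
    """Return indices whose deviation from the mean exceeds `cutoff` standard deviations."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    return [i for i, v in enumerate(series) if abs(v - mean) > cutoff * sd]


# Hypothetical global signal: mean intensity of each volume over time.
global_signal = [100.0] * 30
global_signal[12] = 140.0  # one artifactual volume

# Spikes in the global signal itself.
global_spikes = find_spike_indices(global_signal, cutoff=3)

# Spikes in the frame-to-frame difference; difference i is attributed to frame i + 1.
frame_diff = [b - a for a, b in zip(global_signal, global_signal[1:])]
diff_spikes = [i + 1 for i in find_spike_indices(frame_diff, cutoff=3)]

print(global_spikes, diff_spikes)
```

Each flagged index would correspond to one indicator column in the returned dataframe.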

          icc(icc_type='icc2')[source]

          Calculate intraclass correlation coefficient for data within Brain_Data class

          ICC Formulas are based on: Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: uses in assessing rater reliability. Psychological bulletin, 86(2), 420.

          icc1: x_ij = mu + beta_j + w_ij

          icc2/3: x_ij = mu + alpha_i + beta_j + (ab)_ij + epsilon_ij

          Code modified from nipype algorithms.icc https://github.com/nipy/nipype/blob/master/nipype/algorithms/icc.py

          Parameters:

          icc_type – type of icc to calculate (icc1: voxel random effect, icc2: voxel and column random effect, icc3: voxel and column fixed effect)

          Returns:

          (np.array) intraclass correlation coefficient
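
As a rough illustration of the two-way models above, the single-rater ICC(2,1) and ICC(3,1) estimates of Shrout & Fleiss (1979) can be computed from the ANOVA mean squares. This is a plain-Python sketch on a small illustrative ratings table (rows are targets, columns are raters). The function name `icc_single` is hypothetical and the code is not taken from nltools or nipype.

```python
def icc_single(ratings):
    """Return (ICC2, ICC3) single-rater estimates for an n-targets x k-raters table."""
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-targets mean square
    jms = ss_cols / (k - 1)               # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square

    icc2 = (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)
    icc3 = (bms - ems) / (bms + (k - 1) * ems)
    return icc2, icc3


ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc2, icc3 = icc_single(ratings)
print(round(icc2, 2), round(icc3, 2))
```

ICC3 is larger here because treating raters as fixed removes the between-rater variance from the denominator.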

          iplot(threshold=0, surface=False, anatomical=None, **kwargs)[source]

          Create an interactive brain viewer for the current brain data instance.

          Parameters:
        • threshold – (float/str) two-sided threshold to initialize the visualization, may be a percentile string; default 0

        • surface – (bool) whether to create a surface-based plot; default False

        • anatomical – nifti image or filename to overlay

        • kwargs – optional arguments to nilearn.view_img or nilearn.view_img_on_surf

        • Returns:

          interactive brain viewer widget

          multivariate_similarity(images, method='ols')[source]

          Predict spatial distribution of Brain_Data() instance from linear combination of other Brain_Data() instances or Nibabel images

          Parameters:
        • self – Brain_Data instance of data to be applied

        • images – Brain_Data instance of weight map

        • Returns:
          dictionary of regression statistics in Brain_Data instances {'beta', 't', 'p', 'df', 'residual'}

          plot(limit=5, anatomical=None, view='axial', colorbar=False, black_bg=True, draw_cross=False, threshold_upper=None, threshold_lower=None, figsize=(15, 2), axes=None, **kwargs)[source]

          Create a quick plot of self.data. Will plot each image separately

          Parameters:
        • limit – (int) max number of images to return

        • anatomical – (nifti, str) nifti image or file name to overlay

        • view – (str) ‘axial’ for limit number of axial slices; ‘glass’ for ortho-view glass brain; ‘mni’ for multi-slice view mni brain; ‘full’ for both glass and mni views

        • threshold_upper – (str/float) threshold if view is ‘glass’, ‘mni’, or ‘full’

        • threshold_lower – (str/float) threshold if view is ‘glass’, ‘mni’, or ‘full’

        • save – (str/bool) optional string file name or path for saving; only applies if view is ‘mni’, ‘glass’, or ‘full’. Filenames will be appended with the orientation they belong to

          predict(algorithm=None, cv_dict=None, plot=True, verbose=True, **kwargs)[source]

          Run prediction

          Parameters:
        • algorithm – Algorithm to use for prediction. Must be one of ‘svm’, ‘svr’, ‘linear’, ‘logistic’, ‘lasso’, ‘ridge’, ‘ridgeClassifier’,’pcr’, or ‘lassopcr’

        • cv_dict – Type of cross_validation to use. A dictionary of {‘type’: ‘kfolds’, ‘n_folds’: n}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or {‘type’: ‘loso’, ‘subject_id’: holdout} where ‘n’ = number of folds, and ‘holdout’ = vector of subject ids that corresponds to self.Y

        • plot – Boolean indicating whether or not to create plots.

        • verbose (bool) – print performance; Default True

        • **kwargs – Additional keyword arguments to pass to the prediction algorithm

        • Returns:

          a dictionary of prediction parameters

          Return type:

          output
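
The cv_dict forms listed in the parameters can be written out as ordinary Python dictionaries. In this sketch only the dictionary keys come from the parameter description. The values of `n_folds`, `labels`, and `holdout`, and the `dat` object in the commented call, are illustrative assumptions.

```python
# Hypothetical inputs: six observations from three subjects.
n_folds = 3
labels = [0, 1, 0, 1, 0, 1]    # stand-in for the training labels Y
holdout = [1, 1, 2, 2, 3, 3]   # subject id for each row of Y

# Plain k-fold cross-validation.
cv_kfold = {"type": "kfolds", "n_folds": n_folds}

# K-fold stratified by the training labels.
cv_stratified = {"type": "kfolds", "n_folds": n_folds, "stratified": labels}

# K-fold that keeps each subject's observations in the same fold.
cv_grouped = {"type": "kfolds", "n_folds": n_folds, "subject_id": holdout}

# Leave-one-subject-out.
cv_loso = {"type": "loso", "subject_id": holdout}

# With a Brain_Data instance `dat` (hypothetical) whose Y is set, a call might look like:
# stats = dat.predict(algorithm="ridge", cv_dict=cv_kfold)
```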

          predict_multi(algorithm=None, cv_dict=None, method='searchlight', rois=None, process_mask=None, radius=2.0, scoring=None, n_jobs=1, verbose=0, **kwargs)[source]

          Perform multi-region prediction. This can be a searchlight analysis or multi-roi analysis if provided a Brain_Data instance with labeled non-overlapping rois.

          Parameters:
        • algorithm (string) – algorithm to use for prediction Must be one of ‘svm’, ‘svr’, ‘linear’, ‘logistic’, ‘lasso’, ‘ridge’, ‘ridgeClassifier’,’pcr’, or ‘lassopcr’

        • cv_dict – Type of cross_validation to use. Default is 3-fold. A dictionary of {‘type’: ‘kfolds’, ‘n_folds’: n}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or {‘type’: ‘loso’, ‘subject_id’: holdout} where ‘n’ = number of folds, and ‘holdout’ = vector of subject ids that corresponds to self.Y

        • method (string) – one of ‘searchlight’ or ‘roi’

        • rois (string/nltools.Brain_Data) – nifti file path or Brain_data instance containing non-overlapping regions-of-interest labeled by integers

        • process_mask (nib.Nifti1Image/nltools.Brain_Data) – mask to constrain where to perform analyses; only applied if method = ‘searchlight’

        • radius (float) – radius of searchlight in mm; default 2mm

        • scoring (function) – callable scoring function; see sklearn documentation; defaults to estimator’s default scoring function

        • n_jobs (int) – The number of CPUs to use to do permutation; default 1 because this can be very memory intensive

        • verbose (int) – whether parallelization progress should be printed; default 0

        • Returns:

          image of results

          Return type:

          output

          randomise(n_permute=5000, threshold_dict=None, return_mask=False, **kwargs)[source]

          Run mass-univariate regression at each voxel with inference performed via permutation testing, à la randomise in FSL. Operates just like .regress(), but is intended for second-level analyses.

          Parameters:
        • n_permute (int) – number of permutations

        • threshold_dict – (dict) a dictionary of threshold parameters {‘unc’:.001} or {‘fdr’:.05}

        • return_mask – (bool) optionally return the thresholding mask

        • Returns:

          dictionary of maps for betas, tstats, and pvalues

          regions(min_region_size=1350, extract_type='local_regions', smoothing_fwhm=6, is_mask=False)[source]

          Extract brain connected regions into separate regions.

          Parameters:
        • min_region_size (int) – Minimum volume in mm3 for a region to be kept.

        • extract_type (str) – Type of extraction method [‘connected_components’, ‘local_regions’]. If ‘connected_components’, each component/region in the image is extracted automatically by labelling each region based upon the presence of unique features in their respective regions. If ‘local_regions’, each component/region is extracted based on their maximum peak value to define a seed marker and then using random walker segmentation algorithm on these markers for region separation.

        • smoothing_fwhm (scalar) – Smooth an image to extract sparser regions. Only works for extract_type ‘local_regions’.

        • is_mask (bool) – Whether the Brain_Data instance should be treated as a boolean mask and if so, calls connected_label_regions instead.

        • Returns:

          Brain_Data instance with extracted ROIs as data.

          Return type:

          Brain_Data

          regress(mode='ols', **kwargs)[source]

          Run a mass-univariate regression across voxels. Three types of regressions can be run: 1) Standard OLS (default) 2) Robust OLS (heteroscedasticity and/or auto-correlation robust errors), i.e. OLS with “sandwich estimators” 3) ARMA (auto-regressive and moving-average lags = 1 by default; experimental)

          For more information see the help for nltools.stats.regress

          ARMA notes: This experimental mode is similar to AFNI’s 3dREMLFit but without spatial smoothing of voxel auto-correlation estimates. It can be very computationally intensive so parallelization is used by default to try to speed things up. Speed is limited because a unique ARMA model is fit to each voxel (like AFNI/FSL), unlike SPM, which assumes the same AR parameters (~0.2) at each voxel. While coefficient results are typically very similar to OLS, std-errors, and therefore t-stats, dfs, and p-vals, can differ greatly depending on how much auto-correlation is explaining the response in a voxel relative to other regressors in the design matrix.

          Parameters:
        • mode (str) – kind of model to fit; must be one of ‘ols’ (default), ‘robust’, or ‘arma’

        • kwargs (dict) – keyword arguments to nltools.stats.regress

        • Returns:
          dictionary of regression statistics in Brain_Data instances {'beta', 't', 'p', 'df', 'residual'}

          scale(scale_val=100.0)[source]

          Scale all values such that they are on the range [0, scale_val], via grand-mean scaling. This is NOT global-scaling/intensity normalization. This is useful for ensuring that data is on a common scale (e.g. good for multiple runs, participants, etc) and if the default value of 100 is used, can be interpreted as something akin to (but not exactly) “percent signal change.” This is consistent with default behavior in AFNI and SPM. Change this value to 10000 to make consistent with FSL.

          Parameters:

          scale_val – (int/float) what value to send the grand-mean to; default 100
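
Grand-mean scaling can be sketched as dividing every value by the mean over all images and voxels, then multiplying by scale_val. The snippet below is a minimal plain-Python illustration under that assumption. The helper name `grand_mean_scale` is hypothetical and the library may handle masking or negative values differently.

```python
def grand_mean_scale(data, scale_val=100.0):
    """Rescale a 2-D list (images x voxels) so that its grand mean equals scale_val."""
    values = [v for row in data for v in row]
    grand_mean = sum(values) / len(values)
    return [[v / grand_mean * scale_val for v in row] for row in data]


data = [[2.0, 4.0], [6.0, 8.0]]   # toy data with grand mean 5.0
scaled = grand_mean_scale(data, scale_val=100.0)
print(scaled)
```

Because a single constant is applied to the whole dataset, relative differences between images and voxels are preserved.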

          similarity(image, method='correlation')[source]

          Calculate similarity of Brain_Data() instance with single Brain_Data or Nibabel image

          Parameters:
        • image – (Brain_Data, nifti) image to evaluate similarity

        • method – (str) Type of similarity [‘correlation’,’dot_product’,’cosine’]

        • Returns:

          (list) Outputs a vector of pattern expression values
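
The three similarity options can be illustrated on two flattened images. This is a plain-Python sketch with a hypothetical helper `pattern_similarity`. It assumes that 'correlation' means the Pearson correlation across voxels, and it is not the library's code.

```python
import math


def pattern_similarity(a, b, method="correlation"):
    """Similarity between two equal-length voxel vectors."""
    if method == "correlation":
        # Pearson correlation is the cosine similarity of the mean-centered vectors.
        mean_a = sum(a) / len(a)
        mean_b = sum(b) / len(b)
        a = [x - mean_a for x in a]
        b = [y - mean_b for y in b]
        method = "cosine"
    dot = sum(x * y for x, y in zip(a, b))
    if method == "dot_product":
        return dot
    if method == "cosine":
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b)
    raise ValueError(f"unknown method: {method}")


weights = [1.0, 2.0, 3.0, 4.0]   # toy weight map
image = [2.0, 1.0, 4.0, 3.0]     # toy image

dot = pattern_similarity(image, weights, "dot_product")
cos = pattern_similarity(image, weights, "cosine")
corr = pattern_similarity(image, weights, "correlation")
print(dot, cos, corr)
```

The dot product depends on the scale of both inputs, whereas cosine and correlation are scale-invariant, which is why the three values differ for the same pair of images.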

          standardize(axis=0, method='center')[source]

          Standardize Brain_Data() instance.

          Parameters:
        • axis – 0 for observations 1 for voxels

        • method – [‘center’,’zscore’]

        • Returns:

          Brain_Data Instance
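
The difference between the two methods can be shown on a small images-by-voxels table. The sketch assumes that each voxel column is standardized across observations and that z-scoring uses the population standard deviation. Both choices, and the helper name `standardize_columns`, are assumptions for illustration.

```python
import statistics


def standardize_columns(data, method="center"):
    """Center (or z-score) each column of a 2-D list of observations x voxels."""
    if method not in ("center", "zscore"):
        raise ValueError(f"unknown method: {method}")
    out_cols = []
    for col in zip(*data):
        mean = statistics.fmean(col)
        new_col = [v - mean for v in col]
        if method == "zscore":
            # Assumed: divide by the population standard deviation of the column.
            sd = statistics.pstdev(col)
            new_col = [v / sd for v in new_col]
        out_cols.append(new_col)
    # Transpose back to observations x voxels.
    return [list(row) for row in zip(*out_cols)]


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
centered = standardize_columns(data, method="center")
zscored = standardize_columns(data, method="zscore")
print(centered)
print(zscored)
```

After z-scoring, both columns are identical even though their original scales differ by a factor of ten.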

          temporal_resample(sampling_freq=None, target=None, target_type='hz')[source]

          Resample Brain_Data timeseries to a new target frequency or number of samples using Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) interpolation. This function can up- or down-sample data.

          Note: this function can use quite a bit of RAM.

          Parameters:
        • sampling_freq – (float) sampling frequency of data in hertz

        • target – (float) upsampling target

        • target_type – (str) type of target can be [samples,seconds,hz]

        • Returns:

          upsampled Brain_Data instance

          threshold(upper=None, lower=None, binarize=False, coerce_nan=True)[source]

          Threshold Brain_Data instance. Provide upper and lower values or percentages to perform two-sided thresholding. Binarize will return a mask image respecting thresholds if provided, otherwise respecting every non-zero value.

          Parameters:
        • upper – (float or str) Upper cutoff for thresholding. If string will interpret as percentile; can be None for one-sided thresholding.

        • lower – (float or str) Lower cutoff for thresholding. If string will interpret as percentile; can be None for one-sided thresholding.

        • binarize (bool) – return binarized image respecting thresholds if provided, otherwise binarize on every non-zero value; default False

        • coerce_nan (bool) – coerce nan values to 0s; default True

        • Returns:

          Thresholded Brain_Data object.

          transform_pairwise()[source]

          Transform Brain_Data instance into pairwise comparisons.

          Returns:

          Brain_Data instance transformed into pairwise comparisons

          Return type:

          Brain_Data

          ttest(threshold_dict=None, return_mask=False)[source]

          Calculate one sample t-test across each voxel (two-sided)

          Parameters:
        • threshold_dict – (dict) a dictionary of threshold parameters {‘unc’:.001} or {‘fdr’:.05}

        • return_mask – (bool) if thresholding is requested, optionally return the mask of voxels that exceed threshold, e.g. for use with another map

        • Returns:
          (dict) dictionary of regression statistics in Brain_Data

          instances {‘t’,’p’}
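
The statistic computed at each voxel is the ordinary one-sample t: the mean across images divided by its standard error. A minimal plain-Python sketch follows, with a hypothetical helper `one_sample_t` and toy data. P-values and thresholding are omitted.

```python
import math
import statistics


def one_sample_t(values, popmean=0.0):
    """Return the one-sample t statistic and degrees of freedom for one voxel."""
    n = len(values)
    # Standard error uses the sample (n - 1) standard deviation.
    se = statistics.stdev(values) / math.sqrt(n)
    return (statistics.fmean(values) - popmean) / se, n - 1


# Toy data: five images (rows) by two voxels (columns).
data = [
    [1.0, -0.5],
    [2.0, 0.5],
    [3.0, -0.2],
    [4.0, 0.3],
    [5.0, -0.1],
]

# Apply the test independently to each voxel column.
t_map = [one_sample_t(col)[0] for col in zip(*data)]
print(t_map)
```

The first voxel has a consistently positive signal and a large t value, while the second averages to zero and yields a t value near zero.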

          upload_neurovault(access_token=None, collection_name=None, collection_id=None, img_type=None, img_modality=None, **kwargs)[source]

          Upload Data to Neurovault. Will add any columns in self.X to image metadata. Index will be used as image name.

          Parameters:
        • access_token – (str, Required) Neurovault api access token

        • collection_name – (str, Optional) name of new collection to create

        • collection_id – (int, Optional) neurovault collection_id if adding images to existing collection

        • img_type – (str, Required) Neurovault map_type

        • img_modality – (str, Required) Neurovault image modality

        • Returns:

          (pd.DataFrame) neurovault collection information

          Return type:

          collection

          class nltools.data.Adjacency(data=None, Y=None, matrix_type=None, labels=None, **kwargs)[source]

          Adjacency is a class to represent Adjacency matrices as a vector rather than a 2-dimensional matrix. This makes it easier to perform data manipulation and analyses.

          Parameters:
        • data – pandas data instance or list of files

        • matrix_type – (str) type of matrix. Possible values include: [‘distance’,’similarity’,’directed’,’distance_flat’, ‘similarity_flat’,’directed_flat’]

        • Y – Pandas DataFrame of training labels

        • **kwargs – Additional keyword arguments

          bootstrap(function, n_samples=5000, save_weights=False, n_jobs=-1, random_state=None, *args, **kwargs)[source]

          Bootstrap an Adjacency method.

          Parameters:
        • function – (str) method to apply to data for each bootstrap

        • n_samples – (int) number of samples to bootstrap with replacement

        • save_weights – (bool) Save each bootstrap iteration (useful for aggregating many bootstraps on a cluster)

        • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

        • Returns:

          summarized studentized bootstrap output

          Examples

          >>>  b = dat.bootstrap('mean', n_samples=5000)
          >>>  b = dat.bootstrap('predict', n_samples=5000, algorithm='ridge')
          >>>  b = dat.bootstrap('predict', n_samples=5000, save_weights=True)

          cluster_summary(clusters=None, metric='mean', summary='within')[source]

          This function provides summaries of clusters within Adjacency matrices.

          It can compute mean/median of within and between cluster values. Requires a list of cluster ids indicating the row/column of each cluster.

          Parameters:
        • clusters – (list) list of cluster labels

        • metric – (str) method to summarize: mean or median. If None, return all r values

        • summary – (str) summarize within cluster or between clusters

        • Returns:

          (dict) within cluster means
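
The within/between distinction can be sketched on a small symmetric matrix. For each cluster, "within" averages the off-diagonal cells whose row and column both belong to that cluster, and "between" averages the cells that pair a member with a non-member. The helper `summarize_clusters` is hypothetical, and this plain-Python version only illustrates the idea.

```python
import statistics


def summarize_clusters(matrix, clusters, summary="within"):
    """Mean similarity per cluster, within the cluster or between it and the rest."""
    if summary not in ("within", "between"):
        raise ValueError(f"unknown summary: {summary}")
    n = len(matrix)
    out = {}
    for label in sorted(set(clusters)):
        vals = []
        for i in range(n):
            if clusters[i] != label:
                continue
            for j in range(n):
                if i == j:
                    continue  # skip the diagonal
                same_cluster = clusters[j] == label
                if same_cluster == (summary == "within"):
                    vals.append(matrix[i][j])
        out[label] = statistics.fmean(vals)
    return out


matrix = [
    [1.0, 0.8, 0.1, 0.2],
    [0.8, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.6],
    [0.2, 0.1, 0.6, 1.0],
]
clusters = [0, 0, 1, 1]

within = summarize_clusters(matrix, clusters, summary="within")
between = summarize_clusters(matrix, clusters, summary="between")
print(within, between)
```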

          distance(metric='correlation', **kwargs)[source]

          Calculate distance between images within an Adjacency() instance.

          Parameters:

          metric – (str) type of distance metric (can use any scikit-learn or scipy metric)

          Returns:

          (Adjacency) Outputs a 2D distance matrix.

          distance_to_similarity(metric='correlation', beta=1)[source]

          Convert distance matrix to similarity matrix.

          Note: currently only implemented for correlation and euclidean.

          Parameters:
        • metric – (str) Can only be correlation or euclidean

        • beta – (float) parameter to scale exponential function (default: 1) for euclidean

        • Returns:

          (Adjacency) Adjacency object
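
A rough sketch of the two conversions follows. For correlation distance, defined as 1 - r, the similarity is recovered as 1 - d. For euclidean distance the sketch assumes a plain exponential kernel exp(-beta * d). The library may rescale distances before applying the exponential, so treat that branch as an assumption. The helper name `to_similarity` is hypothetical.

```python
import math


def to_similarity(distances, metric="correlation", beta=1.0):
    """Convert a flat list of distances to similarities."""
    if metric == "correlation":
        # Correlation distance is 1 - r, so r = 1 - d.
        return [1.0 - d for d in distances]
    if metric == "euclidean":
        # Assumed kernel: similarity decays exponentially with distance.
        return [math.exp(-beta * d) for d in distances]
    raise ValueError(f"unknown metric: {metric}")


corr_sim = to_similarity([0.0, 0.5, 2.0], metric="correlation")
eucl_sim = to_similarity([0.0, 1.0, 3.0], metric="euclidean", beta=1.0)
print(corr_sim, eucl_sim)
```

Larger beta values make the euclidean similarity fall off more quickly with distance.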

          generate_permutations(n_perm, random_state=None)[source]

          Generate n_perm permuted versions of Adjacency in a lazy fashion. Useful for iterating against.

          Parameters:
        • n_perm (int) – number of permutations

        • random_state (int, np.random.seed, optional) – random seed for reproducibility. Defaults to None.

        • Examples

          >>> for perm in adj.generate_permutations(1000):
          ...     out = neural_distance_mat.similarity(perm)
          
          Yields:

          Adjacency – permuted version of self
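
The lazy behaviour can be sketched with a Python generator that shuffles the rows and columns of a square matrix with the same ordering, so that each permuted matrix is only built when the loop asks for it. The name `lazy_permutations` and the list-of-lists representation are assumptions for illustration. The library operates on Adjacency objects.

```python
import random


def lazy_permutations(matrix, n_perm, random_state=None):
    """Lazily yield copies of a square matrix with rows and columns shuffled together."""
    rng = random.Random(random_state)
    n = len(matrix)
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Applying one ordering to both axes keeps the matrix symmetric.
        yield [[matrix[i][j] for j in order] for i in order]


matrix = [
    [0, 1, 2],
    [1, 0, 3],
    [2, 3, 0],
]
for perm in lazy_permutations(matrix, n_perm=3, random_state=0):
    print(perm)
```

Because nothing is computed until iteration, memory use stays constant no matter how many permutations are requested.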

          isc(n_samples=5000, metric='median', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]

          Compute intersubject correlation.

          This implementation uses the subject-wise bootstrap method from Chen et al., 2016. Instead of recomputing the pairwise ISC using circle_shift or phase_randomization methods, this approach uses the computationally more efficient method of bootstrapping the subjects and computing a new pairwise similarity matrix with randomly selected subjects with replacement. If the same subject is selected multiple times, we set the perfect correlation to a nan with (exclude_self_corr=True). As recommended by Chen et al., 2016, we compute the median pairwise ISC by default. However, if the mean is preferred, we compute the mean correlation after performing the fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values. We compute the p-values using the percentile method using the same method in Brainiak.

          Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259.

          Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 757-762.

          Parameters:
        • n_samples – (int) number of bootstrap samples

        • metric – (str) type of isc summary metric [‘mean’,’median’]

        • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

        • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

        • return_null – (bool) Return the null distribution along with the p-value; default False

        • Returns:

          (dict) dictionary of permutation results [‘correlation’,’p’]

          Return type:

          stats
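
When the mean is requested, the description above says that correlations are averaged in Fisher z space and then converted back. A minimal plain-Python sketch of that step follows, assuming the pairwise correlations are already available as a list. The helper name `fisher_mean` is hypothetical.

```python
import math
import statistics


def fisher_mean(correlations):
    """Average correlations via the Fisher r-to-z transform and back."""
    z_values = [math.atanh(r) for r in correlations]   # r -> z
    return math.tanh(statistics.fmean(z_values))       # mean z -> r


pairwise_r = [0.9, 0.1, 0.5]
naive = statistics.fmean(pairwise_r)
via_fisher = fisher_mean(pairwise_r)
median = statistics.median(pairwise_r)
print(naive, via_fisher, median)
```

The transform is undefined at r = 1 or r = -1, which is one reason perfect self-correlations are set to NaN and excluded before summarizing.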

          isc_group(group, n_samples=5000, metric='median', method='permute', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]

          Compute intersubject correlation differences between groups.

          This function computes pairwise intersubject correlations (ISC) using the median, as recommended by Chen et al. (2016). However, if the mean is preferred, we compute the mean correlation after performing the Fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values.

          There are currently two different methods to compute p-values. By default, we use the subject-wise permutation method recommended by Chen et al. (2016). This method combines the two groups and computes pairwise similarity both within and between the groups. Then the group labels are permuted and the mean difference between the two groups is recomputed to generate a null distribution. The second method uses subject-wise bootstrapping, where a new pairwise similarity matrix with randomly selected subjects with replacement is created separately for each group and the ISC difference between these groups is used to generate a null distribution. If the same subject is selected multiple times, we set the perfect correlation to a nan (with exclude_self_corr=True). We compute the p-values using the percentile method (Hall & Wilson, 1991).

          Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259.

          Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 757-762.

          Parameters:
        • group – (np.array) vector of group ids corresponding to subject data in Adjacency instance

        • n_samples – (int) number of samples for permutation or bootstrapping

        • metric – (str) type of isc summary metric [‘mean’,’median’]

        • method – (str) method to compute p-values [‘permute’, ‘bootstrap’] (default: permute)

        • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

        • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

        • return_null – (bool) Return the permutation distribution along with the p-value; default False

        • Returns:

          (dict) dictionary of permutation results [‘correlation’,’p’]

          Return type:

          stats

          Parameters:

          axis – (int) calculate mean over features (0) or data (1). For data it will be on upper triangle.

          Returns:
          float if single, adjacency if axis=0, np.array if axis=1

          and multiple

          Parameters:

          axis – (int) calculate median over features (0) or data (1). For data it will be on upper triangle.

          Returns:
          float if single, adjacency if axis=0, np.array if axis=1

          and multiple

          plot(limit=3, axes=None, *args, **kwargs)[source]

          Create Heatmap of Adjacency Matrix

          Can pass in any sns.heatmap argument

          Parameters:
        • limit – (int) number of heatmaps to plot if object contains multiple adjacencies (default: 3)

        • axes – matplotlib axis handle

          plot_label_distance(labels=None, ax=None)[source]

          Create a violin plot indicating within and between label distance

          Parameters:

          labels (np.array) – numpy array of labels to plot

          Returns:

          violin plot handles

          plot_mds(n_components=2, metric=True, labels=None, labels_color=None, cmap=<matplotlib.colors.LinearSegmentedColormap object>, n_jobs=-1, view=(30, 20), figsize=[12, 8], ax=None, *args, **kwargs)[source]

          Plot Multidimensional Scaling

          Parameters:
        • n_components – (int) Number of dimensions to project (can be 2 or 3)

        • metric – (bool) Perform metric or non-metric dimensional scaling; default True

        • labels – (list) Can override labels stored in Adjacency Class

        • labels_color – (str) list of colors for labels, if len(1) then make all same color

        • n_jobs – (int) Number of parallel jobs

        • view – (tuple) view for 3-Dimensional plot; default (30,20)

          plot_silhouette(labels=None, ax=None, permutation_test=True, n_permute=5000, **kwargs)[source]

          Create a silhouette plot

          regress(X, mode='ols', **kwargs)[source]

          Run a regression on an adjacency instance. You can decompose an adjacency instance with another adjacency instance. You can also decompose each pixel by passing a design_matrix instance.

          Parameters:
        • X – Design matrix can be an Adjacency or Design_Matrix instance

        • mode – type of regression (default: ols)

        • Returns:

          (dict) dictionary of stats outputs.

          Return type:

          stats

          similarity(data, plot=False, perm_type='2d', n_permute=5000, metric='spearman', ignore_diagonal=False, **kwargs)[source]

          Calculate similarity between two Adjacency matrices. Default is to use spearman correlation and permutation test.

          Parameters:
        • data (Adjacency or array) – Adjacency data, or 1-d array same size as self.data

        • perm_type – (str) ‘1d’,’2d’, or None

        • metric – (str) ‘spearman’,’pearson’,’kendall’

        • ignore_diagonal – (bool) only applies to 'directed' Adjacency types using perm_type=None or perm_type='1d'

          social_relations_model(summarize_results=True, nan_replace=True)[source]

          Estimate the social relations model from a matrix for a round-robin design.

          X_{ij} = m + \alpha_i + \beta_j + g_{ij} + \epsilon_{ijl}

          where X_{ij} is the score for person i rating person j, m is the group mean, \alpha_i is person i’s actor effect, \beta_j is person j’s partner effect, g_{ij} is the relationship effect and \epsilon_{ijl} is the error in measure l for actor i and partner j.

          This model is primarily concerned with partitioning the variance of the various effects.

          Code is based on the implementation presented in Chapter 8 of Kenny, Kashy, & Cook (2006). Tests replicate examples presented in the book. Note that this method assumes that actor scores are rows (lower triangle), while partner scores are columns (upper triangle). The minimal sample size to estimate these effects is 4.

          Model Assumptions:
          • Social interactions are exclusively dyadic

          • People are randomly sampled from population

          • No order effects

          • The effects combine additively and relationships are linear

          • In the future we might update the formulas and standard errors based on Bond and Lashley, 1996

            Parameters:
          • self – (adjacency) can be a single matrix or many matrices for each group

          • summarize_results – (bool) will provide a formatted summary of model results

          • nan_replace – (bool) will replace nan values with row and column means

          • Returns:

            (pd.Series/pd.DataFrame) All of the effects estimated using SRM

            Return type:

            estimated effects

            stats_label_distance(labels=None, n_permute=5000, n_jobs=-1)[source]

            Calculate permutation tests on within and between label distance.

            Parameters:
          • labels (np.array) – numpy array of labels to plot

          • n_permute (int) – number of permutations to run (default=5000)

          • Returns:
            dictionary of within and between group differences

            and p-values

            Parameters:

            axis – (int) calculate std over features (0) or data (1). For data it will be on upper triangle.

            Returns:
            float if single, adjacency if axis=0, np.array if axis=1 and multiple

            Parameters:

            axis – (int) calculate mean over features (0) or data (1). For data it will be on upper triangle.

            Returns:
            float if single, adjacency if axis=0, np.array if axis=1 and multiple

            threshold(upper=None, lower=None, binarize=False)[source]
            Threshold Adjacency instance. Provide upper and lower values or percentages to perform two-sided thresholding. Binarize will return a mask image respecting thresholds if provided, otherwise respecting every non-zero value.

            Parameters:
          • upper – (float or str) Upper cutoff for thresholding. If string will interpret as percentile; can be None for one-sided thresholding.

          • lower – (float or str) Lower cutoff for thresholding. If string will interpret as percentile; can be None for one-sided thresholding.

          • binarize (bool) – return binarized image respecting thresholds if provided, otherwise binarize on every non-zero value; default False

          • Returns:

            thresholded Adjacency instance

            Return type:

            Adjacency

            split(data, mask)[source]

            Split Brain_Data instance into separate masks and store as a dictionary.

            class nltools.data.Design_Matrix(*args, **kwargs)[source]

            Design_Matrix is a class to represent design matrices with special methods for data processing (e.g. convolution, upsampling, downsampling) and intelligent, flexible appending (e.g. automatically keeping certain columns or polynomial terms separated during concatenation). It plays nicely with Brain_Data and can be used to build an experimental design to pass to Brain_Data’s X attribute. It is essentially an enhanced pandas DataFrame, with extra attributes and methods. Methods always return a new design matrix instance (copy). Column names are always string types.

            Parameters:
          • sampling_freq (float) – sampling rate of each row in hertz; to convert seconds to hertz (e.g. in the case of TRs for neuroimaging), use hertz = 1 / TR

          • convolved (list, optional) – on what columns convolution has been performed; defaults to None

          • polys (list, optional) – list of polynomial terms in design matrix, e.g. intercept, polynomial trends, basis functions, etc; default None

          • add_dct_basis(duration=180, drop=0)[source]

            Adds unit scaled cosine basis functions to Design_Matrix columns, based on spm-style discrete cosine transform for use in high-pass filtering. Does not add intercept/constant. Care is recommended if using this along with .add_poly(), as some columns will be highly-correlated.

            Parameters:
          • duration (int) – length of filter in seconds

          • drop (int) – index of which early/slow bases to drop if any; will always drop constant (i.e. intercept) like SPM. Unlike SPM, retains first basis (i.e. linear/sigmoidal). Will cumulatively drop bases up to and inclusive of index provided (e.g. 2, drops bases 1 and 2); default 0

          • add_poly(order=0, include_lower=True)[source]

            Add nth order Legendre polynomial terms as columns to design matrix. Good for adding constant/intercept to model (order = 0) and accounting for slow-frequency nuisance artifacts e.g. linear, quadratic, etc drifts. Care is recommended when using this with .add_dct_basis() as some columns will be highly correlated.

            Parameters:
          • order (int) – what order terms to add; 0 = constant/intercept (default), 1 = linear, 2 = quadratic, etc

          • include_lower – (bool) whether to add lower order terms if order > 0
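To make the polynomial terms concrete, here is a minimal pure-Python sketch of how Legendre columns of a given order could be generated for a design with `n_rows` observations. It is an illustration only, not the nltools implementation: the function name `legendre_columns` and the `poly_<k>` column labels are hypothetical, and the sketch assumes the polynomials are evaluated on evenly spaced points spanning [-1, 1].

```python
def legendre_columns(n_rows, order=0, include_lower=True):
    """Evaluate Legendre polynomials P_0..P_order on n_rows points spanning [-1, 1].

    Hypothetical helper for illustration; column names are an assumption.
    """
    if n_rows < 2:
        raise ValueError("need at least two rows to span [-1, 1]")
    x = [-1 + 2 * i / (n_rows - 1) for i in range(n_rows)]
    polys = [[1.0] * n_rows]              # P_0: constant / intercept
    if order >= 1:
        polys.append(list(x))             # P_1: linear trend
    for k in range(1, order):
        # Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        polys.append([((2 * k + 1) * xi * cur - k * prev) / (k + 1)
                      for xi, cur, prev in zip(x, polys[k], polys[k - 1])])
    wanted = range(order + 1) if include_lower else [order]
    return {f"poly_{k}": polys[k] for k in wanted}


# order=2 with include_lower=True yields intercept, linear, and quadratic columns
cols = legendre_columns(5, order=2)
print(list(cols))
```

With `include_lower=False` only the requested order is returned, which mirrors the behaviour described for the `include_lower` parameter.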

          • append(dm, axis=0, keep_separate=True, unique_cols=None, fill_na=0, verbose=False)[source]

            Method for concatenating another design matrix row or column-wise. When concatenating row-wise, has the ability to keep certain columns separated if they exist in multiple design matrices (e.g. keeping separate intercepts for multiple runs). This is on by default and will automatically separate out polynomial columns (i.e. anything added with the add_poly or add_dct_basis methods). Additional columns can be separated by run using the unique_cols parameter. Can also add new polynomial terms during vertical concatenation (when axis == 0). This will by default create new polynomial terms separately for each design matrix.

            Parameters:
          • dm (Design_Matrix or list) – design_matrix or list of design_matrices to append

          • axis (int) – 0 for row-wise (vert-cat), 1 for column-wise (horz-cat); default 0

          • keep_separate (bool,optional) – whether try and uniquify columns; defaults to True; only applies when axis==0

          • unique_cols (list,optional) – what additional columns to try to keep separated by uniquifying, only applies when axis = 0; defaults to None

          • fill_na (str/int/float) – if provided will fill NaNs with this value during row-wise appending (when axis = 0) if separate columns are desired; default 0

          • verbose (bool) – print messages during append about how polynomials are going to be separated
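The idea of keeping columns separate during row-wise appending can be sketched with plain dictionaries. This is a simplified illustration under stated assumptions, not the library's code: `append_rows` is a hypothetical helper, each run is a dict of column name to list of values, and the `<run index>_<name>` prefixing scheme is an assumption chosen for the example.

```python
def append_rows(runs, separate, fill_na=0):
    """Stack runs row-wise, keeping the columns named in `separate` distinct per run.

    Hypothetical helper: `runs` is a list of dicts (column name -> list of values).
    """
    def out_name(run_index, name):
        # Columns flagged as separate get a per-run prefix so they never merge.
        return f"{run_index}_{name}" if name in separate else name

    # Collect output column names in first-seen order.
    out_cols = []
    for i, run in enumerate(runs):
        for name in run:
            if out_name(i, name) not in out_cols:
                out_cols.append(out_name(i, name))

    out = {name: [] for name in out_cols}
    for i, run in enumerate(runs):
        n_rows = len(next(iter(run.values())))
        renamed = {out_name(i, k): v for k, v in run.items()}
        for name in out_cols:
            # Rows belonging to other runs are padded with fill_na.
            out[name].extend(renamed.get(name, [fill_na] * n_rows))
    return out


run1 = {"stim": [1, 0], "intercept": [1, 1]}
run2 = {"stim": [0, 1], "intercept": [1, 1]}
stacked = append_rows([run1, run2], separate={"intercept"})
print(stacked)
```

Shared regressors such as `stim` are stacked into one column, while each run keeps its own intercept column padded with `fill_na` elsewhere.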

          • clean(fill_na=0, exclude_polys=False, thresh=0.95, verbose=True)[source]

            Method to fill NaNs in Design Matrix and remove duplicate columns based on data values, NOT names. Columns are dropped if they are correlated >= the requested threshold (default = .95). In this case, only the first instance of that column will be retained and all others will be dropped.

            Parameters:
          • fill_na (str/int/float) – value to fill NaNs with; set to None to retain NaNs; default 0

          • exclude_polys (bool) – whether to skip checking of polynomial terms (i.e. intercept, trends, basis functions); default False

          • thresh (float) – correlation threshold to use to drop redundant columns; default .95

          • verbose (bool) – print what column names were dropped; default True
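The column-dropping rule described above (keep the first column, drop later ones whose correlation with a kept column meets the threshold) can be sketched as follows. This is a pure-Python illustration, not the nltools code: `drop_redundant` and `pearson` are hypothetical helpers, and the sketch compares the signed correlation against the threshold, which is one reading of "correlated >= the requested threshold".

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def drop_redundant(columns, thresh=0.95):
    """Keep the first instance of each column; drop later ones correlated >= thresh."""
    kept, dropped = {}, []
    for name, values in columns.items():
        if any(pearson(values, other) >= thresh for other in kept.values()):
            dropped.append(name)
        else:
            kept[name] = values
    return kept, dropped


columns = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [4, 1, 3, 2]}
kept, dropped = drop_redundant(columns)
print(list(kept), dropped)
```

Because the comparison is on data values, column `b` is dropped even though its name differs from `a`.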

          • convolve(conv_func='hrf', columns=None)[source]

            Perform convolution using an arbitrary function.

            Parameters:
          • conv_func (ndarray or string) – either a 1d numpy array containing output of a function that you want to convolve; a samples by kernel 2d array of several kernels to convolve; or the string ‘hrf’ which defaults to a Glover HRF function at the Design_Matrix’s sampling_freq

          • columns (list) – what columns to perform convolution on; defaults to all non-polynomial columns

          • downsample(target, **kwargs)[source]

            Downsample columns of design matrix. Relies on nltools.stats.downsample, but ensures that returned object is a design matrix.

            Parameters:
          • target (float) – desired frequency in hz

          • kwargs – additional inputs to nltools.stats.downsample

          • heatmap(figsize=(8, 6), **kwargs)[source]

            Visualize Design Matrix spm style. Use .plot() for typical pandas plotting functionality. Can pass optional keyword args to seaborn heatmap.

            replace_data(data, column_names=None)[source]

            Convenient method to replace all data in Design_Matrix with new data while keeping attributes and polynomial columns untouched.

            Parameters:

            column_names (list) – list of column names for new data

            upsample(target, **kwargs)[source]

            Upsample columns of design matrix. Relies on nltools.stats.upsample, but ensures that returned object is a design matrix.

            Parameters:
          • target (float) – desired frequency in hz

          • kwargs – additional inputs to nltools.stats.upsample

          • vif(exclude_polys=True)[source]

            Compute variance inflation factor amongst columns of design matrix, ignoring polynomial terms. Much faster than statsmodels and more reliable too. Uses the same method as Matlab and R (diagonal elements of the inverted correlation matrix).

            Parameters:

            exclude_polys (bool) – whether to skip checking of polynomial terms (i.e intercept, trends, basis functions); default True

            Returns:

            list with length == number of columns - intercept

            Return type:

            vifs (list)
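The "diagonal elements of the inverted correlation matrix" approach can be illustrated in pure Python. The sketch below is not the library's implementation: `vif`, `pearson`, and `invert` are hypothetical helpers, the input is assumed to be a list of non-constant columns, and the Gauss-Jordan inversion is only meant for small examples.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular (perfectly collinear columns)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns):
    """Variance inflation factors: diagonal of the inverse correlation matrix."""
    corr = [[pearson(a, b) for b in columns] for a in columns]
    inverse = invert(corr)
    return [inverse[i][i] for i in range(len(columns))]


print(vif([[1, 2, 3, 4], [4, 1, 3, 2]]))
```

For two columns this reduces to the familiar 1 / (1 - r^2), which is a convenient sanity check.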

            zscore(columns=[])[source]

            Z-score specific columns of design matrix; the returned object is a design matrix.

            Parameters:

            columns (list) – columns to z-score; defaults to all columns

            class nltools.analysis.Roc(input_values=None, binary_outcome=None, threshold_type='optimal_overall', forced_choice=None, **kwargs)[source]

            Roc Class

            The Roc class is based on Tor Wager’s Matlab roc_plot.m function and allows a user to easily run different types of receiver operating characteristic curves. For example, one might be interested in single interval or forced choice.

            Parameters:
          • input_values – nibabel data instance

          • binary_outcome – vector of training labels

          • threshold_type – [‘optimal_overall’, ‘optimal_balanced’, ‘minimum_sdt_bias’]

          • **kwargs – Additional keyword arguments to pass to the prediction algorithm

          • calculate(input_values=None, binary_outcome=None, criterion_values=None, threshold_type='optimal_overall', forced_choice=None, balanced_acc=False)[source]

            Calculate Receiver Operating Characteristic plot (ROC) for single-interval classification.

            Parameters:
          • input_values – nibabel data instance

          • binary_outcome – vector of training labels

          • criterion_values – (optional) criterion values for calculating fpr

          • threshold_type – [‘optimal_overall’, ‘optimal_balanced’, ‘minimum_sdt_bias’]

          • forced_choice – index indicating position for each unique subject (default=None)

          • balanced_acc – balanced accuracy for single-interval classification (bool). THIS IS NOT COMPLETELY IMPLEMENTED BECAUSE IT AFFECTS ACCURACY ESTIMATES, BUT NOT P-VALUES OR THRESHOLD AT WHICH TO EVALUATE SENS/SPEC

          • **kwargs – Additional keyword arguments to pass to the prediction algorithm

          • plot(plot_method='gaussian', balanced_acc=False, **kwargs)[source]

            Create ROC Plot

            Create a specific kind of ROC curve plot, based on input values along a continuous distribution and a binary outcome variable (logical)

            Parameters:
          • plot_method – type of plot [‘gaussian’,’observed’]

          • binary_outcome – vector of training labels

          • **kwargs – Additional keyword arguments to pass to the prediction algorithm

          • Returns:
            nltools.stats.align(data, method='deterministic_srm', n_features=None, axis=0, *args, **kwargs)[source]

            Align subject data into a common response model.

            Can be used to hyperalign source data to target data using Hyperalignment from Dartmouth (i.e., procrustes transformation; see nltools.stats.procrustes) or Shared Response Model from Princeton (see nltools.external.srm). See nltools.data.Brain_Data.align for aligning a single Brain_Data object to another. The common model is the shared response model or the centered target data. Transformed data can be back projected to original data using the transformation matrix. Inputs must be a list of Brain_Data instances or numpy arrays (observations by features).

            Parameters:
          • data – (list) A list of Brain_Data objects

          • method – (str) alignment method to use [‘probabilistic_srm’,’deterministic_srm’,’procrustes’]

          • n_features – (int) number of features to align to common space. If None then will select number of voxels

          • axis – (int) axis to align on

          • Returns:
            (dict) a dictionary containing a list of transformed subject matrices, a list of transformation matrices, the shared response matrix, and the intersubject correlation of the shared responses

            nltools.stats.align_states(reference, target, metric='correlation', return_index=False, replace_zero_variance=False)[source]

            Align state weight maps using hungarian algorithm by minimizing pairwise distance between group states.

            Parameters:
          • reference – (np.array) reference pattern x state matrix

          • target – (np.array) target pattern x state matrix to align to reference

          • metric – (str) distance metric to use

          • return_index – (bool) return index if True, return remapped data if False

          • replace_zero_variance – (bool) transform a vector with zero variance to random numbers from a uniform distribution. Useful for when using correlation as a distance metric to avoid NaNs.

          • Returns:

            (list) a list of reordered state X pattern matrices

            Return type:

            ordered_weights

            nltools.stats.calc_bpm(beat_interval, sampling_freq)[source]

            Calculate instantaneous BPM from beat to beat interval

            Parameters:
          • beat_interval – (int) number of samples in between each beat (typically R-R Interval)

          • sampling_freq – (float) sampling frequency in Hz

          • Returns:

            (float) beats per minute for time interval

            Return type:
            nltools.stats.correlation(data1, data2, metric='pearson')[source]

            This function calculates the correlation between data1 and data2

            Parameters:
          • data1 – (np.array) x

          • data2 – (np.array) y

          • metric – (str) type of correlation [“spearman” or “pearson” or “kendall”]

          • Returns:

            (np.array) correlations p: (float) p-value

            Return type:
            nltools.stats.correlation_permutation(data1, data2, method='permute', n_permute=5000, metric='spearman', tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]

            Compute correlation and calculate p-value using permutation methods.

            ‘permute’ method randomly shuffles one of the vectors. This method is recommended for independent data. For timeseries data we recommend using ‘circle_shift’ or ‘phase_randomize’ methods.

            Parameters:
          • data1 – (pd.DataFrame, pd.Series, np.array) dataset 1 to permute

          • data2 – (pd.DataFrame, pd.Series, np.array) dataset 2 to permute

          • n_permute – (int) number of permutations

          • metric – (str) type of association metric [‘spearman’,’pearson’, ‘kendall’]

          • method – (str) type of permutation [‘permute’, ‘circle_shift’, ‘phase_randomize’]

          • random_state – (int, None, or np.random.RandomState) Initial random seed (default: None)

          • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

          • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

          • return_perms – (bool) Return the permutation distribution along with the p-value; default False

          • Returns:

            (dict) dictionary of permutation results [‘correlation’,’p’]

            Return type:

            stats
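As an illustration of the ‘permute’ method, the following pure-Python sketch shuffles one vector repeatedly and compares each permuted correlation with the observed one. It is not the nltools implementation: `permutation_test` and `pearson` are hypothetical helpers, Pearson correlation stands in for the default Spearman metric, the one-tailed case tests the upper tail only, and the `+ 1` in the p-value is a common small-sample correction assumed here.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def permutation_test(data1, data2, n_permute=5000, tail=2, random_state=None):
    """Permutation p-value for a correlation, shuffling data2 on each iteration."""
    rng = random.Random(random_state)
    observed = pearson(data1, data2)
    shuffled = list(data2)
    extreme = 0
    for _ in range(n_permute):
        rng.shuffle(shuffled)
        r = pearson(data1, shuffled)
        if tail == 2:
            extreme += abs(r) >= abs(observed)   # two-tailed: compare magnitudes
        else:
            extreme += r >= observed             # one-tailed: upper tail only
    # Count the observed statistic as one member of the null distribution.
    return {"correlation": observed, "p": (extreme + 1) / (n_permute + 1)}


x = list(range(10))
y = [2 * v + 1 for v in x]
print(permutation_test(x, y, n_permute=200, random_state=0))
```

Shuffling assumes exchangeable observations, which is why the text above recommends ‘circle_shift’ or ‘phase_randomize’ for autocorrelated time series instead.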

            nltools.stats.distance_correlation(x, y, bias_corrected=True, ttest=False)[source]

            Compute the distance correlation between 2 arrays to test for multivariate dependence (linear or non-linear). Arrays must match on their first dimension. It’s almost always preferable to compute the bias_corrected version, which can also optionally perform a t-test. This t-test operates on a statistic that is ~dcorr^2, which will also be returned.

            Explanation: Distance correlation involves computing the normalized covariance of two centered euclidean distance matrices. Each distance matrix is the euclidean distance between rows (if x or y are 2d) or scalars (if x or y are 1d). Each matrix is centered prior to computing the covariance either using double-centering or u-centering, which corrects for bias as the number of dimensions increases. U-centering is almost always preferred in all cases. It also permits inference of the normalized covariance between each distance matrix using a one-tailed directional t-test. (Szekely & Rizzo, 2013). While distance correlation is normally bounded between 0 and 1, u-centering can produce negative estimates, which are never significant.

            Validated against the dcor and dcor.ttest functions in the ‘energy’ R package and the dcor.distance_correlation, dcor.udistance_correlation_sqr, and dcor.independence.distance_correlation_t_test functions in the dcor Python package.

            Parameters:
          • x (ndarray) – 1d or 2d numpy array of observations by features

          • y (ndarray) – 1d or 2d numpy array of observations by features

          • bias_corrected (bool) – if False, use double-centering, which produces a biased estimate that converges to 1 as the number of dimensions increases. Otherwise use u-centering to correct this bias. Note this must be True if ttest=True; default True

          • ttest (bool) – perform a ttest using the bias_corrected distance correlation; default False

          • Returns:

            dictionary of results (correlation, t, p, and df). Optionally, covariance, x variance, and y variance

            Return type:

            results (dict)
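The explanation above can be made concrete with a small pure-Python sketch of the double-centered (biased) distance correlation for 1d inputs, i.e. the variant corresponding to bias_corrected=False. This is an illustration only and not the library's code: `distance_correlation_biased`, `distance_matrix`, and `double_center` are hypothetical names, and u-centering and the t-test are deliberately left out.

```python
import math


def distance_matrix(values):
    """Pairwise absolute differences between scalar observations."""
    return [[abs(a - b) for b in values] for a in values]


def double_center(dist):
    """Subtract row and column means and add back the grand mean."""
    n = len(dist)
    row_means = [sum(row) / n for row in dist]   # symmetric, so also column means
    grand = sum(row_means) / n
    return [[dist[i][j] - row_means[i] - row_means[j] + grand for j in range(n)]
            for i in range(n)]


def distance_correlation_biased(x, y):
    """Biased (double-centered) distance correlation for two 1d sequences."""
    a = double_center(distance_matrix(x))
    b = double_center(distance_matrix(y))
    n = len(x)
    dcov2 = sum(a[i][j] * b[i][j] for i in range(n) for j in range(n)) / n ** 2
    dvar_x = sum(v * v for row in a for v in row) / n ** 2
    dvar_y = sum(v * v for row in b for v in row) / n ** 2
    denom = math.sqrt(dvar_x * dvar_y)
    return math.sqrt(max(dcov2, 0.0) / denom) if denom > 0 else 0.0


x = [-2, -1, 0, 1, 2]
print(distance_correlation_biased(x, [v * v for v in x]))   # non-linear dependence
```

In this example the Pearson correlation between `x` and its square is zero, yet the distance correlation is clearly positive, which is the kind of non-linear dependence the function is designed to detect.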

            nltools.stats.downsample(data, sampling_freq=None, target=None, target_type='samples', method='mean')[source]

            Downsample pandas to a new target frequency or number of samples using averaging.

            Parameters:
          • data – (pd.DataFrame, pd.Series) data to downsample

          • sampling_freq – (float) Sampling frequency of data in hertz

          • target – (float) downsampling target

          • target_type – type of target can be [samples,seconds,hz]

          • method – (str) type of downsample method [‘mean’,’median’], default: mean

          • Returns:

            (pd.DataFrame, pd.Series) downsampled data

            Return type:
            nltools.stats.fdr(p, q=0.05)[source]

            Determine FDR threshold given a p value array and desired false discovery rate q. Written by Tal Yarkoni

            Parameters:
          • p – (np.array) vector of p-values

          • q – (float) false discovery rate level

          • Returns:
            (float) p-value threshold based on independence or positive dependence
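A threshold of this kind is typically obtained with the Benjamini-Hochberg step-up rule, sketched below in pure Python. This is an illustrative sketch rather than the library's code: `fdr_threshold` is a hypothetical name, and returning None when no p-value survives is a choice made for the example.

```python
def fdr_threshold(p_values, q=0.05):
    """Largest sorted p-value p_(i) satisfying p_(i) <= i * q / m (Benjamini-Hochberg)."""
    ordered = sorted(p_values)
    m = len(ordered)
    passing = [p for rank, p in enumerate(ordered, start=1) if p <= rank * q / m]
    return max(passing) if passing else None


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(fdr_threshold(p, q=0.05))
```

All p-values at or below the returned threshold would be declared significant at false discovery rate q.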

            nltools.stats.find_spikes(data, global_spike_cutoff=3, diff_spike_cutoff=3)[source]

            Function to identify spikes from fMRI Time Series Data

            Parameters:
          • data – Brain_Data or nibabel instance

          • global_spike_cutoff – (int,None) cutoff to identify spikes in global signal in standard deviations, None indicates do not calculate.

          • diff_spike_cutoff – (int,None) cutoff to identify spikes in average frame difference in standard deviations, None indicates do not calculate.

          • Returns:

            pandas dataframe with spikes as indicator variables

            nltools.stats.holm_bonf(p, alpha=0.05)[source]

            Compute corrected p-values based on the Holm-Bonferroni method, i.e. step-down procedure applying iteratively less correction to highest p-values. A bit more conservative than fdr, but much more powerful than vanilla bonferroni.

            Parameters:
          • p – (np.array) vector of p-values

          • alpha – (float) alpha level

          • Returns:
            (float) p-value threshold based on bonferroni step-down procedure
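The step-down logic can be sketched in a few lines of pure Python. This follows the textbook Holm procedure (compare the i-th smallest p-value with alpha / (m - i) and stop at the first failure); it is not taken from the nltools source, `holm_threshold` is a hypothetical name, and returning None when nothing survives is an assumption of the example.

```python
def holm_threshold(p_values, alpha=0.05):
    """Largest p-value still rejected by the Holm-Bonferroni step-down procedure."""
    ordered = sorted(p_values)
    m = len(ordered)
    threshold = None
    for i, p in enumerate(ordered):
        # The smallest p-value faces alpha / m, the next alpha / (m - 1), and so on.
        if p > alpha / (m - i):
            break                      # step-down: stop at the first failure
        threshold = p
    return threshold


print(holm_threshold([0.01, 0.04, 0.03, 0.005], alpha=0.05))
```

The correction relaxes as the procedure moves to larger p-values, which is why it is more powerful than applying alpha / m to every test.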

            nltools.stats.isc(data, n_samples=5000, metric='median', method='bootstrap', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]

            Compute pairwise intersubject correlation from observations by subjects array.

            This function computes pairwise intersubject correlations (ISC) using the median (as recommended by Chen et al., 2016). However, if the mean is preferred, we compute the mean correlation after performing the fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values.

            There are currently three different methods to compute p-values. These include the classic methods for computing permuted time-series by either circle-shifting the data or phase-randomizing the data (see Lancaster et al., 2018). These methods create random surrogate data while preserving the temporal autocorrelation inherent to the signal. By default, we use the subject-wise bootstrap method from Chen et al., 2016. Instead of recomputing the pairwise ISC using circle_shift or phase_randomization methods, this approach uses the computationally more efficient method of bootstrapping the subjects and computing a new pairwise similarity matrix with randomly selected subjects with replacement. If the same subject is selected multiple times, we set the perfect correlation to a nan with (exclude_self_corr=True). We compute the p-values using the percentile method using the same method in Brainiak.

            Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259.

            Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 757-762.

            Lancaster, G., Iatsenko, D., Pidde, A., Ticcinelli, V., & Stefanovska, A. (2018). Surrogate data for hypothesis testing of physical systems. Physics Reports, 748, 1-60.

            Parameters:
          • data – (pd.DataFrame, np.array) observations by subjects where isc is computed across subjects

          • n_samples – (int) number of random samples/bootstraps

          • metric – (str) type of isc summary metric [‘mean’,’median’]

          • method – (str) method to compute p-values [‘bootstrap’, ‘circle_shift’,’phase_randomize’] (default: bootstrap)

          • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

          • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

          • return_null – (bool) Return the permutation distribution along with the p-value; default False

          • Returns:

            (dict) dictionary of permutation results [‘correlation’,’p’]

            Return type:

            stats
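The two summary metrics can be illustrated with a small pure-Python sketch: the median of the pairwise correlations, or the mean computed in Fisher z space and transformed back. It is a sketch under stated assumptions, not the library's implementation: `pairwise_isc` and `pearson` are hypothetical helpers, the input is a list with one time series per subject (the columns of an observations by subjects array), and no p-values are computed.

```python
import math
from itertools import combinations
from statistics import median


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def pairwise_isc(subjects, metric="median"):
    """Summarize all subject-pair correlations with the median or a Fisher-z mean."""
    rs = [pearson(a, b) for a, b in combinations(subjects, 2)]
    if metric == "median":
        return median(rs)
    if metric == "mean":
        # r -> z (atanh), average, then z -> r (tanh); undefined for |r| == 1.
        return math.tanh(sum(math.atanh(r) for r in rs) / len(rs))
    raise ValueError("metric must be 'median' or 'mean'")


subjects = [[1, 2, 3, 4, 5], [1, 3, 2, 5, 4], [2, 1, 4, 3, 5]]
print(pairwise_isc(subjects, "median"), pairwise_isc(subjects, "mean"))
```

Averaging in z space weights strong correlations differently from a plain arithmetic mean of r values, so the two summaries generally do not coincide.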

            nltools.stats.isc_group(group1, group2, n_samples=5000, metric='median', method='permute', ci_percentile=95, exclude_self_corr=True, return_null=False, tail=2, n_jobs=-1, random_state=None)[source]

            Compute difference in intersubject correlation between groups.

            This function computes pairwise intersubject correlations (ISC) using the median (as recommended by Chen et al., 2016). However, if the mean is preferred, we compute the mean correlation after performing the fisher r-to-z transformation and then convert back to correlations to minimize artificially inflating the correlation values.

            There are currently two different methods to compute p-values. By default, we use the subject-wise permutation method recommended by Chen et al., 2016. This method combines the two groups and computes pairwise similarity both within and between the groups. Then the group labels are permuted and the mean difference between the two groups is recomputed to generate a null distribution. The second method uses subject-wise bootstrapping, where a new pairwise similarity matrix with randomly selected subjects with replacement is created separately for each group and the ISC difference between these groups is used to generate a null distribution. If the same subject is selected multiple times, we set the perfect correlation to a nan with (exclude_self_corr=True). We compute the p-values using the percentile method (Hall & Wilson, 1991).

            Chen, G., Shin, Y. W., Taylor, P. A., Glen, D. R., Reynolds, R. C., Israel, R. B., & Cox, R. W. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. NeuroImage, 142, 248-259.

            Hall, P., & Wilson, S. R. (1991). Two guidelines for bootstrap hypothesis testing. Biometrics, 757-762.

            Parameters:
          • group1 – (pd.DataFrame, np.array) observations by subjects where isc is computed across subjects

          • group2 – (pd.DataFrame, np.array) observations by subjects where isc is computed across subjects

          • n_samples – (int) number of samples for permutation or bootstrapping

          • metric – (str) type of isc summary metric [‘mean’,’median’]

          • method – (str) method to compute p-values [‘permute’, ‘bootstrap’] (default: permute)

          • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

          • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

          • return_null – (bool) Return the permutation distribution along with the p-value; default False

          • Returns:

            (dict) dictionary of permutation results [‘correlation’,’p’]

            Return type:

            stats

            nltools.stats.isfc(data, method='average')[source]

            Compute intersubject functional connectivity (ISFC) from a list of observation x feature matrices

            This function uses the leave-one-out approach to compute ISFC (Simony et al., 2016). For each subject, compute the cross-correlation between each voxel/ROI with the average of the rest of the subjects’ data. In other words, compute the mean voxel/ROI response for all participants except the target subject. Then compute the correlation between each ROI within the target subject with the mean ROI response in the group average.

            Simony, E., Honey, C. J., Chen, J., Lositsky, O., Yeshurun, Y., Wiesel, A., & Hasson, U. (2016). Dynamic reconfiguration of the default mode network during narrative comprehension. Nature communications, 7, 12141.

            Parameters:
          • data – list of subject matrices (observations x voxels/rois)

          • method – approach to computing ISFC. ‘average’ uses leave one

          • Returns:

            list of subject ISFC matrices
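The leave-one-out procedure described above can be sketched in pure Python as follows. This is an illustration, not the nltools implementation: `isfc_leave_one_out` and `pearson` are hypothetical helpers, each subject is represented as a list of rows (observations) by columns (ROIs), and all subjects are assumed to share the same shape.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def isfc_leave_one_out(data):
    """Return one ROI x ROI matrix per subject: target ROIs vs. mean of the other subjects."""
    results = []
    for s, target in enumerate(data):
        others = [subj for i, subj in enumerate(data) if i != s]
        # Element-wise average of the remaining subjects (observations x ROIs).
        mean_others = [[sum(subj[t][r] for subj in others) / len(others)
                        for r in range(len(target[0]))]
                       for t in range(len(target))]
        target_rois = list(zip(*target))       # ROI time series, held-out subject
        group_rois = list(zip(*mean_others))   # ROI time series, group average
        results.append([[pearson(a, b) for b in group_rois] for a in target_rois])
    return results


data = [
    [[1, 4], [2, 3], [3, 1], [4, 2]],
    [[2, 3], [1, 4], [4, 2], [3, 1]],
    [[1, 3], [3, 4], [2, 1], [4, 2]],
]
for matrix in isfc_leave_one_out(data):
    print(matrix)
```

Entry (i, j) of each matrix relates ROI i in the held-out subject to ROI j in the average of everyone else, so the matrices need not be symmetric.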

            nltools.stats.isps(data, sampling_freq=0.5, low_cut=0.04, high_cut=0.07, order=5, pairwise=False)[source]

            Compute Dynamic Intersubject Phase Synchrony (ISPS) from an observation by subject array

            This function computes the instantaneous intersubject phase synchrony for a single voxel/roi timeseries. Requires multiple subjects. This method is largely based on that described by Glerean et al., 2012 and performs a hilbert transform on narrow bandpass filtered timeseries (butterworth) data to get the instantaneous phase angle. The function returns a dictionary containing the average phase angle, the average vector length, and parametric p-values computed using the rayleigh test using circular statistics (Fisher, 1993). If pairwise=True, then it will compute these on the pairwise phase angle differences, if pairwise=False, it will compute these on the actual phase angles. This is called inter-site phase coupling or inter-trial phase coupling respectively in the EEG literatures.

            This function requires narrow band filtering your data. As a default we use the recommendations by (Glerean et al., 2012) of .04-.07Hz. This is similar to the “slow-4” band (0.025–0.067 Hz) described by (Zuo et al., 2010; Penttonen & Buzsáki, 2003), but excludes the .03 band, which has been demonstrated to contain aliased respiration signals (Birn, 2006).

            Birn RM, Smith MA, Bandettini PA, Diamond JB. 2006. Separating respiratory-variation-related fluctuations from neuronal-activity- related fluctuations in fMRI. Neuroimage 31:1536–1548.

            Buzsáki, G., & Draguhn, A. (2004). Neuronal oscillations in cortical networks. Science, 304(5679), 1926-1929.

            Fisher, N. I. (1995). Statistical analysis of circular data. Cambridge University Press.

            Glerean, E., Salmi, J., Lahnakoski, J. M., Jääskeläinen, I. P., & Sams, M. (2012). Functional magnetic resonance imaging phase synchronization as a measure of dynamic functional connectivity. Brain connectivity, 2(2), 91-101.

            Parameters:
          • data – (pd.DataFrame, np.ndarray) observations x subjects data

          • sampling_freq – (float) sampling frequency of data in Hz

          • low_cut – (float) lower bound cutoff for high pass filter

          • high_cut – (float) upper bound cutoff for low pass filter

          • order – (int) filter order for butterworth bandpass

          • pairwise – (bool) compute phase angle coherence on pairwise phase angle differences or on raw phase angle.

          • Returns:

            dictionary with mean phase angle, vector length, and rayleigh statistic

            nltools.stats.make_cosine_basis(nsamples, sampling_freq, filter_length, unit_scale=True, drop=0)[source]
            Create a series of cosine basis functions for a discrete cosine transform. Based off of implementation in spm_filter and spm_dctmtx because scipy dct can only apply transforms but not return the basis functions. Like SPM, does not add constant (i.e. intercept), but does retain first basis (i.e. sigmoidal/linear drift).

            Parameters:
          • nsamples (int) – number of observations (e.g. TRs)

          • sampling_freq (float) – sampling frequency in hertz (i.e. 1 / TR)

          • filter_length (int) – length of filter in seconds

          • unit_scale (bool) – ensure that the basis functions are on the normalized range [-1, 1]; default True

          • drop (int) – index of which early/slow bases to drop if any; default is to drop constant (i.e. intercept) like SPM. Unlike SPM, retains first basis (i.e. linear/sigmoidal). Will cumulatively drop bases up to and inclusive of index provided (e.g. 2, drops bases 1 and 2)

          • Returns:

            nsamples x number of basis sets numpy array

            Return type:

            out (ndarray)
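A minimal sketch of what a cosine basis set looks like, assuming the textbook DCT-II definition. This is not the nltools implementation: `cosine_basis` and `n_basis` are hypothetical names, and the sketch does not derive the number of bases from `filter_length` or apply SPM-style scaling.

```python
import math


def cosine_basis(nsamples, n_basis):
    """Return an nsamples x n_basis nested list of DCT-II cosine regressors.

    The constant (k = 0) basis is skipped, so the first column is the
    slowest half-cosine drift term.
    """
    basis = []
    for n in range(nsamples):
        row = [
            math.cos(math.pi * (2 * n + 1) * k / (2 * nsamples))
            for k in range(1, n_basis + 1)
        ]
        basis.append(row)
    return basis


B = cosine_basis(nsamples=100, n_basis=4)
```

Each column sums to (numerically) zero and the columns are mutually orthogonal, which is why such a set can be added to a design matrix as slow-drift regressors.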

            nltools.stats.matrix_permutation(data1, data2, n_permute=5000, metric='spearman', how='upper', include_diag=False, tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]

            Permute 2-dimensional matrix correlation (mantel test).

            Chen, G. et al. (2016). Untangling the relatedness among correlations, part I: nonparametric approaches to inter-subject correlation analysis at the group level. Neuroimage, 142, 248-259.

            Parameters:
          • data1 – (pd.DataFrame, np.array) square matrix

          • data2 – (pd.DataFrame, np.array) square matrix

          • n_permute – (int) number of permutations

          • metric – (str) type of association metric [‘spearman’,’pearson’, ‘kendall’]

          • how – (str) whether to use the ‘upper’ (default), ‘lower’, or ‘full’ matrix. The default of ‘upper’ assumes both matrices are symmetric

          • include_diag (bool) – only applies if how=’full’. Whether to include the diagonal elements in the comparison

          • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

          • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

          • return_perms – (bool) Return the permutation distribution along with the p-value; default False

          • Returns:

            (dict) dictionary of permutation results [‘correlation’,’p’]

            Return type:

            stats
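The idea behind a Mantel test can be sketched in plain Python: correlate the upper triangles of the two matrices, then rebuild the null distribution by permuting the rows and columns of one matrix together. This is an illustrative sketch, not the nltools code; `mantel_sketch` and its helpers are hypothetical names, and it uses a Pearson correlation for brevity where the function above defaults to Spearman.

```python
import random


def upper_triangle(m):
    """Flatten the strictly upper triangle of a square matrix."""
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def mantel_sketch(m1, m2, n_permute=1000, seed=0):
    """Two-tailed Mantel test on two symmetric square matrices."""
    rng = random.Random(seed)
    observed = pearson_r(upper_triangle(m1), upper_triangle(m2))
    n = len(m1)
    idx = list(range(n))
    extreme = 0
    for _ in range(n_permute):
        rng.shuffle(idx)
        # Permute rows and columns of m1 with the same index order
        permuted = [[m1[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        r = pearson_r(upper_triangle(permuted), upper_triangle(m2))
        if abs(r) >= abs(observed):
            extreme += 1
    return {"correlation": observed, "p": (extreme + 1) / (n_permute + 1)}


points = [0.0, 1.0, 2.5, 4.0, 7.0, 11.0]
m1 = [[abs(a - b) for b in points] for a in points]
m2 = [[2.0 * v for v in row] for row in m1]
mantel_result = mantel_sketch(m1, m2, n_permute=500)
```

Permuting rows and columns jointly keeps each matrix internally consistent while breaking the correspondence between the two, which is what distinguishes this from shuffling individual cells.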

            nltools.stats.multi_threshold(t_map, p_map, thresh)[source]

            Threshold a test image by multiple p-values from a p-value image

            Parameters:
          • t_map – (Brain_Data) Brain_Data instance of arbitrary statistic metric (e.g., beta, t, etc)

          • p_map – (Brain_Data) Brain_Data instance of p-values

          • thresh – (list) list of p-values to threshold stat image

          • Returns:

            Thresholded Brain_Data instance

            nltools.stats.one_sample_permutation(data, n_permute=5000, tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]

            One sample permutation test using randomization.

            Parameters:
          • data – (pd.DataFrame, pd.Series, np.array) data to permute

          • n_permute – (int) number of permutations

          • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

          • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

          • return_perms – (bool) Return the permutation distribution along with the p-value; default False

          • random_state – (int, None, or np.random.RandomState) Initial random seed (default: None)

          • Returns:

            (dict) dictionary of permutation results [‘mean’,’p’]

            Return type:

            stats
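A one-sample randomization test is commonly built by randomly flipping the sign of each observation and recomputing the mean. The sketch below illustrates that approach in plain Python; it is an assumption about the general technique rather than a copy of the nltools code, and `sign_flip_test` is a hypothetical name.

```python
import random


def sign_flip_test(data, n_permute=5000, seed=0):
    """Two-tailed one-sample permutation test against a mean of zero."""
    rng = random.Random(seed)
    n = len(data)
    observed = sum(data) / n
    extreme = 0
    for _ in range(n_permute):
        # Under the null, each observation is equally likely to be +x or -x
        flipped_mean = sum(x * rng.choice((-1, 1)) for x in data) / n
        if abs(flipped_mean) >= abs(observed):
            extreme += 1
    return {"mean": observed, "p": (extreme + 1) / (n_permute + 1)}


sample = [1.2, 0.8, 1.5, 0.9, 1.1, 1.3, 0.7, 1.0, 1.4, 1.6]
one_sample_result = sign_flip_test(sample, n_permute=2000)
```

Adding one to both the count and the number of permutations keeps the p-value strictly above zero, a common convention for permutation tests.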

            nltools.stats.pearson(x, y)[source]

            Correlates row vector x with each row vector in 2D array y. From neurosynth.stats.py - author: Tal Yarkoni

            nltools.stats.procrustes(data1, data2)[source]

            Procrustes analysis, a similarity test for two data sets.

            Each input matrix is a set of points or vectors (the rows of the matrix). The dimension of the space is the number of columns of each matrix. Given two identically sized matrices, procrustes standardizes both such that:

          • \(tr(AA^{T}) = 1\).

          • Both sets of points are centered around the origin.

            Procrustes then applies the optimal transform to the second matrix (including scaling/dilation, rotations, and reflections) to minimize \(M^{2}=\sum(data1-data2)^{2}\), or the sum of the squares of the pointwise differences between the two input datasets.

            This function was not designed to handle datasets with different numbers of datapoints (rows). If two data sets have different dimensionality (different number of columns), this function will add columns of zeros to the smaller of the two.

            Parameters:
          • data1 – (array_like) Matrix; n rows represent points in k (columns) space. data1 is the reference data; after it is standardised, the data from data2 will be transformed to fit the pattern in data1 (must have >1 unique points).

          • data2 – (array_like) n rows of data in k space to be fit to data1. Must be the same shape (numrows, numcols) as data1 (must have >1 unique points).

          • Returns:

            mtx1 (array_like) – A standardized version of data1.

            mtx2 (array_like) – The orientation of data2 that best fits data1. Centered, but not necessarily \(tr(AA^{T}) = 1\).

            disparity (float) – \(M^{2}\) as defined above.

            R ((N, N) ndarray) – The matrix solution of the orthogonal Procrustes problem. Minimizes the Frobenius norm of dot(data1, R) - data2, subject to dot(R.T, R) == I.

            scale (float) – Sum of the singular values of dot(data1.T, data2).

            nltools.stats.procrustes_distance(mat1, mat2, n_permute=5000, tail=2, n_jobs=-1, random_state=None)[source]

            Use procrustes super-position to perform a similarity test between 2 matrices. Matrices need to match in size on their first dimension only, as the smaller matrix on the second dimension will be padded with zeros. After aligning two matrices using the procrustes transformation, use the computed disparity between them (sum of squared error of elements) as a similarity metric. Shuffle the rows of one of the matrices and recompute the disparity to perform inference (Peres-Neto & Jackson, 2001).

            Parameters:
          • mat1 (ndarray) – 2d numpy array; must have same number of rows as mat2

          • mat2 (ndarray) – 1d or 2d numpy array; must have same number of rows as mat1

          • n_permute (int) – number of permutation iterations to perform

          • tail (int) – either 1 for one-tailed or 2 for two-tailed test; default 2

          • n_jobs (int) – The number of CPUs to use to do permutation; default -1 (all)

          • Returns:

            similarity (float) – similarity between matrices, bounded between 0 and 1

            pval (float) – permuted p-value

            nltools.stats.regress(X, Y, mode='ols', stats='full', **kwargs)[source]

            This is a flexible function to run several types of regression models provided X and Y numpy arrays. Y can be a 1d numpy array or 2d numpy array. In the latter case, results will be output with shape 1 x Y.shape[1], in other words fitting a separate regression model to each column of Y.

            Does NOT add an intercept automatically to the X matrix before fitting like some other software packages. This is left up to the user.
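Because no intercept is added automatically, a column of ones has to be included in X by the caller. The sketch below illustrates this in plain Python; it is not the nltools implementation, and `ols_two_column` is a hypothetical helper that solves the normal equations for a design matrix with exactly two columns.

```python
def ols_two_column(X, y):
    """Least-squares coefficients for a design matrix with two columns.

    Solves the 2 x 2 normal equations (X'X) b = X'y in closed form.
    """
    a = sum(row[0] * row[0] for row in X)
    b = sum(row[0] * row[1] for row in X)
    d = sum(row[1] * row[1] for row in X)
    e = sum(row[0] * yi for row, yi in zip(X, y))
    f = sum(row[1] * yi for row, yi in zip(X, y))
    det = a * d - b * b
    return [(d * e - b * f) / det, (a * f - b * e) / det]


predictor = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 + 3.0 * x for x in predictor]

# The intercept is an explicit column of ones in the design matrix
X = [[1.0, x] for x in predictor]
betas = ols_two_column(X, y)
```

Here the first coefficient recovers the intercept and the second the slope; leaving out the column of ones would force the fitted line through the origin.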

            This function can compute regression in 3 ways:

          • Standard OLS

          • OLS with robust sandwich estimators for standard errors. 3 robust types of estimators exist:

              • ‘hc0’ - classic Huber-White estimator robust to heteroscedasticity (default)

              • ‘hc3’ - a variant of the Huber-White estimator that is slightly more conservative when sample sizes are small

              • ‘hac’ - an estimator robust to both heteroscedasticity and auto-correlation; the auto-correlation lag can be controlled with the nlags keyword argument; default 1

          • ARMA (auto-regressive moving-average) model (experimental). This model is fit through statsmodels.tsa.arima_model.ARMA, so more information about options can be found there. Any settings can be passed in as kwargs. By default fits a (1,1) model with starting lags of 2. This mode is computationally intensive and can take quite a while if Y has many columns. If Y is a 2d array, joblib.Parallel is used for faster fitting by parallelizing fits across columns of Y. Parallelization can be controlled by passing in kwargs. Defaults to multi-threading using 10 separate threads, as threads don’t require large arrays to be duplicated in memory. Defaults are also set to enable memory-mapping for very large arrays if backend=‘multiprocessing’ to prevent crashes and hangs. Various levels of progress can be monitored using the ‘disp’ (statsmodels) and ‘verbose’ (joblib) keyword arguments with integer values > 0.

          • Parameters:
          • X (ndarray) – design matrix; assumes intercept is included

          • Y (ndarray) – dependent variable array; if 2d, a model is fit to each column of Y separately

          • mode (str) – kind of model to fit; must be one of ‘ols’ (default), ‘robust’, or ‘arma’

          • stats (str) – one of ‘full’, ‘betas’, ‘tstats’. Useful to speed up calculation if you know you only need some statistics and not others. Defaults to ‘full’.

          • robust_estimator (str,optional) – kind of robust estimator to use if mode = ‘robust’; default ‘hc0’

          • nlags (int,optional) – auto-correlation lag correction if mode = ‘robust’ and robust_estimator = ‘hac’; default 1

          • order (tuple,optional) – auto-regressive and moving-average orders for mode = ‘arma’; default (1,1)

          • kwargs (dict) – additional keyword arguments to statsmodels.tsa.arima_model.ARMA and joblib.Parallel

          • Returns:

            b: coefficients

            se: standard error of coefficients

            t: t-statistics (coef/sterr)

            p: p-values

            df: degrees of freedom

            res: residuals

            Examples

            Standard OLS

            >>> results = regress(X,Y,mode='ols')
            

            Robust OLS with heteroscedasticity (hc0) robust standard errors

            >>> results = regress(X,Y,mode='robust')
            

            Robust OLS with heteroscedasticity and auto-correlation (with lag 2) robust standard errors

            >>> results = regress(X,Y,mode='robust',robust_estimator='hac',nlags=2)
            

            Auto-regressive model with auto-regressive and moving-average lags = 1

            >>> results = regress(X,Y,mode='arma',order=(1,1))
            

            Auto-regressive model with auto-regressive lag = 2, moving-average lag = 3, and multi-processing instead of multi-threading using 8 cores (this can use a lot of memory if input arrays are very large!).

            >>> results = regress(X,Y,mode='arma',order=(2,3),backend='multiprocessing',n_jobs=8)

            nltools.stats.summarize_bootstrap(data, save_weights=False)[source]

            Calculate summary of bootstrap samples

            Parameters:
          • data – (Brain_Data) Brain_Data instance of samples

          • save_weights – (bool) save bootstrap weights

          • Returns:

            (dict) dictionary of Brain_Data summary images

            Return type:

            output

            nltools.stats.threshold(stat, p, thr=0.05, return_mask=False)[source]

            Threshold test image by p-value from p image

            Parameters:
          • stat – (Brain_Data) Brain_Data instance of arbitrary statistic metric (e.g., beta, t, etc)

          • p – (Brain_Data) Brain_data instance of p-values

          • thr – (float) p-value to threshold stat image

          • return_mask – (bool) optionally return the thresholding mask; default False

          • Returns:

            Thresholded Brain_Data instance

            nltools.stats.transform_pairwise(X, y)[source]

            Transforms data into pairs with balanced labels for ranking. Transforms an n-class ranking problem into a two-class classification problem. Subclasses implementing particular strategies for choosing pairs should override this method. In this method, all pairs are chosen, except for those that have the same target value. The output is an array of balanced classes, i.e. there are the same number of -1 as +1.

            Reference: “Large Margin Rank Boundaries for Ordinal Regression”, R. Herbrich, T. Graepel, K. Obermayer.

            Authors: Fabian Pedregosa <fabian@fseoane.net>, Alexandre Gramfort <alexandre.gramfort@inria.fr>

            Parameters:
          • X – (np.array), shape (n_samples, n_features) The data

          • y – (np.array), shape (n_samples,) or (n_samples, 2) Target labels. If it’s a 2D array, the second column represents the grouping of samples, i.e., samples with different groups will not be considered.

          • Returns:
            X_trans: (np.array), shape (k, n_features)

            Data as pairs, where k = n_samples * (n_samples - 1) / 2 if grouping values were not passed. If grouping variables exist, then returns values computed for each group.

            y_trans: (np.array), shape (k,)

            Output class labels, where classes have values {-1, +1} If y was shape (n_samples, 2), then returns (k, 2) with groups on the second dimension.
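The pairwise transform described above can be sketched in plain Python for the ungrouped case (1-D y). This is an illustration of the technique, not the nltools code; `pairwise_transform` is a hypothetical name, and balancing is done by alternating the sign of successive retained pairs.

```python
from itertools import combinations


def pairwise_transform(X, y):
    """Turn a ranking problem into balanced two-class difference vectors."""
    X_trans, y_trans = [], []
    for i, j in combinations(range(len(y)), 2):
        if y[i] == y[j]:
            continue  # ties carry no ranking information
        diff = [a - b for a, b in zip(X[i], X[j])]
        label = 1 if y[i] > y[j] else -1
        # Alternate the target sign so +1 and -1 occur equally often
        wanted = (-1) ** len(y_trans)
        if label != wanted:
            diff = [-d for d in diff]
            label = -label
        X_trans.append(diff)
        y_trans.append(label)
    return X_trans, y_trans


X_rank = [[1.0], [2.0], [3.0]]
y_rank = [1, 2, 3]
X_pairs, y_pairs = pairwise_transform(X_rank, y_rank)
```

With three samples and no ties this yields 3 * 2 / 2 = 3 pairs, and flipping a pair negates both the difference vector and its label, so the ranking information is preserved.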

            nltools.stats.trim(data, cutoff=None)[source]

            Trim a Pandas DataFrame or Series by replacing outlier values with NaNs

            Parameters:
          • data – (pd.DataFrame, pd.Series) data to trim

          • cutoff – (dict) a dictionary with keys {‘std’:[low,high]} or {‘quantile’:[low,high]}

          • Returns:

            (pd.DataFrame, pd.Series) trimmed data

            nltools.stats.two_sample_permutation(data1, data2, n_permute=5000, tail=2, n_jobs=-1, return_perms=False, random_state=None)[source]

            Independent sample permutation test.

            Parameters:
          • data1 – (pd.DataFrame, pd.Series, np.array) dataset 1 to permute

          • data2 – (pd.DataFrame, pd.Series, np.array) dataset 2 to permute

          • n_permute – (int) number of permutations

          • tail – (int) either 1 for one-tail or 2 for two-tailed test (default: 2)

          • n_jobs – (int) The number of CPUs to use to do the computation. -1 means all CPUs.

          • return_perms – (bool) Return the permutation distribution along with the p-value; default False

          • Returns:

            (dict) dictionary of permutation results [‘mean’,’p’]

            Return type:

            stats
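For independent samples, the usual randomization scheme shuffles group membership rather than signs. A minimal sketch in plain Python, assuming the test statistic is the difference in group means; `label_shuffle_test` is a hypothetical name and this is not the nltools implementation.

```python
import random


def label_shuffle_test(data1, data2, n_permute=5000, seed=0):
    """Two-tailed permutation test on the difference of group means."""
    rng = random.Random(seed)
    n1 = len(data1)
    pooled = list(data1) + list(data2)
    n2 = len(pooled) - n1
    observed = sum(data1) / n1 - sum(data2) / n2
    extreme = 0
    for _ in range(n_permute):
        # Reassign observations to the two groups at random
        rng.shuffle(pooled)
        diff = sum(pooled[:n1]) / n1 - sum(pooled[n1:]) / n2
        if abs(diff) >= abs(observed):
            extreme += 1
    return {"mean": observed, "p": (extreme + 1) / (n_permute + 1)}


group_a = [5.1, 4.9, 5.3, 5.0, 5.2, 5.4]
group_b = [3.9, 4.1, 4.0, 3.8, 4.2, 4.0]
two_sample_result = label_shuffle_test(group_a, group_b, n_permute=2000)
```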

            nltools.stats.u_center(mat)[source]

            U-center a 2d array. U-centering is a bias-corrected form of double-centering

            Parameters:

            mat (ndarray) – 2d numpy array

            Returns:

            u-centered version of input

            Return type:

            mat (ndarray)
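For reference, the standard U-centering formula (Székely & Rizzo) for an n x n symmetric matrix with n > 2 subtracts row and column sums scaled by 1/(n-2), adds the grand sum scaled by 1/((n-1)(n-2)), and sets the diagonal to zero. The sketch below implements that formula in plain Python; `u_center_sketch` is a hypothetical name and the code is an illustration, not the nltools source.

```python
def u_center_sketch(mat):
    """U-center a square matrix given as a nested list (requires n > 2)."""
    n = len(mat)
    row_sums = [sum(row) for row in mat]
    col_sums = [sum(mat[i][j] for i in range(n)) for j in range(n)]
    total = sum(row_sums)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:  # the diagonal stays at zero
                out[i][j] = (
                    mat[i][j]
                    - row_sums[i] / (n - 2)
                    - col_sums[j] / (n - 2)
                    + total / ((n - 1) * (n - 2))
                )
    return out


points_1d = [0.0, 1.0, 3.0, 6.0]
dist = [[abs(a - b) for b in points_1d] for a in points_1d]
centered = u_center_sketch(dist)
```

For a symmetric input with a zero diagonal, every row of the result sums to zero, which is a quick sanity check on any implementation.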

            nltools.stats.upsample(data, sampling_freq=None, target=None, target_type='samples', method='linear')[source]

            Upsample a pandas object to a new target frequency or number of samples using interpolation.

            Parameters:
          • data – (pd.DataFrame, pd.Series) data to upsample (Note: will drop non-numeric columns from DataFrame)

          • sampling_freq – Sampling frequency of data in hertz

          • target – (float) upsampling target

          • target_type – (str) type of target can be [samples,seconds,hz]

          • method – (str) [‘linear’, ‘nearest’, ‘zero’, ‘slinear’, ‘quadratic’, ‘cubic’] where ‘zero’, ‘slinear’, ‘quadratic’ and ‘cubic’ refer to a spline interpolation of zeroth, first, second or third order (default: linear)

          • Returns:

            upsampled pandas object
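To make the method='linear', target_type='hz' case concrete, here is a small plain-Python sketch of linear interpolation onto a denser time grid. It is an assumption about the general approach, not the nltools code; `upsample_linear` is a hypothetical name and it expects at least two evenly spaced samples.

```python
def upsample_linear(values, sampling_freq, target_hz):
    """Linearly interpolate evenly spaced samples to a higher sampling rate."""
    duration = (len(values) - 1) / sampling_freq  # time of the last sample (s)
    n_out = int(round(duration * target_hz)) + 1
    out = []
    for k in range(n_out):
        pos = (k / target_hz) * sampling_freq  # fractional index into values
        lo = min(int(pos), len(values) - 2)    # left neighbour, clamped at the end
        frac = pos - lo
        out.append(values[lo] * (1 - frac) + values[lo + 1] * frac)
    return out


upsampled = upsample_linear([0.0, 2.0, 4.0], sampling_freq=1.0, target_hz=2.0)
```

Doubling the rate of a three-sample series spanning two seconds gives five samples, with each new point halfway between its neighbours.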

            nltools.stats.winsorize(data, cutoff=None, replace_with_cutoff=True)[source]

            Winsorize a Pandas DataFrame or Series by replacing outliers with the largest/lowest value not considered an outlier

            Parameters:
          • data – (pd.DataFrame, pd.Series) data to winsorize

          • cutoff – (dict) a dictionary with keys {‘std’:[low,high]} or {‘quantile’:[low,high]}

          • replace_with_cutoff – (bool) If True, replace outliers with the cutoff. If False, replace outliers with the closest existing values (default: True)

          • Returns:

            (pd.DataFrame, pd.Series) winsorized data

            nltools.stats.zscore(df)[source]

            zscore every column in a pandas dataframe or series.

            Parameters:

            df – (pd.DataFrame) Pandas DataFrame instance

            Returns:

            (pd.DataFrame) z-scored pandas DataFrame or series instance

            Return type:

            z_data
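A column-wise z-score can be sketched with the standard library alone. The sketch assumes the sample standard deviation (ddof = 1, the pandas default); whether nltools uses exactly that convention is an assumption here, and `zscore_columns` is a hypothetical name.

```python
import statistics


def zscore_columns(table):
    """Z-score every column of a dict mapping column name -> list of values."""
    out = {}
    for name, col in table.items():
        mu = statistics.mean(col)
        sd = statistics.stdev(col)  # sample standard deviation (ddof = 1)
        out[name] = [(v - mu) / sd for v in col]
    return out


z = zscore_columns({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 60.0]})
```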

            nltools.datasets.download_collection(collection=None, data_dir=None, overwrite=False, resume=True, verbose=1)[source]

            Download images and metadata from Neurovault collection

            Parameters:
          • collection (int, optional) – collection id. Defaults to None.

          • data_dir (str, optional) – data directory. Defaults to None.

          • overwrite (bool, optional) – overwrite data directory. Defaults to False.

          • resume (bool, optional) – resume download. Defaults to True.

          • verbose (int, optional) – print diagnostic messages. Defaults to 1.

          • Returns:

            (DataFrame of image metadata, list of files from downloaded collection)

            Return type:

            (pd.DataFrame, list)

            nltools.datasets.fetch_emotion_ratings(data_dir=None, resume=True, verbose=1)[source]

            Downloads and loads the emotion rating dataset from Neurovault

            Parameters:

            data_dir – (string, optional). Path of the data directory. Used to force data storage in a specified location. Default: None

            Returns:

            (Brain_Data) Brain_Data object with downloaded data. X=metadata

            nltools.datasets.fetch_pain(data_dir=None, resume=True, verbose=1)[source]

            Downloads and loads the pain dataset from Neurovault

            Parameters:

            data_dir – (string, optional) Path of the data directory. Used to force data storage in a specified location. Default: None

            Returns:

            (Brain_Data) Brain_Data object with downloaded data. X=metadata

            nltools.datasets.get_collection_image_metadata(collection=None, data_dir=None, limit=10)[source]

            Get image metadata associated with collection

            Parameters:
          • collection (int, optional) – collection id. Defaults to None.

          • data_dir (str, optional) – data directory. Defaults to None.

          • limit (int, optional) – number of images to increment. Defaults to 10.

          • Returns:

            Dataframe with full image metadata from collection

            Return type:

            pd.DataFrame

            class nltools.cross_validation.KFoldStratified(n_splits=3, shuffle=False, random_state=None)[source]

            K-Folds cross validation iterator which stratifies continuous data (unlike scikit-learn equivalent).

            Provides train/test indices to split data into train and test sets. Splits the dataset into k consecutive folds while ensuring that the same subject is held out within each fold. Each fold is then used as a validation set once while the k - 1 remaining folds form the training set. Extension of KFold from the scikit-learn cross_validation module.

            Parameters:
          • n_splits – int, default=3 Number of folds. Must be at least 2.

          • shuffle – boolean, optional Whether to shuffle the data before splitting into batches.

          • random_state – None, int or RandomState Pseudo-random number generator state used for random sampling. If None, use default numpy RNG for shuffling

          • split(X, y, groups=None)[source]

            Generate indices to split data into training and test set.

            Parameters:
          • X – array-like, shape (n_samples, n_features) Training data, where n_samples is the number of samples and n_features is the number of features. Note that providing y is sufficient to generate the splits and hence np.zeros(n_samples) may be used as a placeholder for X instead of actual training data.

          • y – array-like, shape (n_samples,) The target variable for supervised learning problems. Stratification is done based on the y labels.

          • groups – (object) Always ignored, exists for compatibility.

          • Returns:

            (ndarray) The training set indices for that split. test : (ndarray) The testing set indices for that split.

            Return type:

            train
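One common way to stratify folds on a continuous target is to sort samples by y and deal them out to folds in turn, so every fold spans the full range of y. The sketch below shows that idea in plain Python; it is an assumption about the general technique rather than the class's actual code, and `stratified_continuous_folds` is a hypothetical name.

```python
def stratified_continuous_folds(y, n_splits=3):
    """Yield (train, test) index lists with folds balanced over sorted y."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    for k in range(n_splits):
        # Every n_splits-th sample of the sorted order goes into test fold k
        test = sorted(order[k::n_splits])
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        yield train, test


y_cont = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2]
folds = list(stratified_continuous_folds(y_cont, n_splits=3))
```

Each test fold here contains one low and one high value of y, and every sample is held out exactly once.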

            nltools.cross_validation.set_cv(Y=None, cv_dict=None, return_generator=True)[source]

            Helper function to create a scikit-learn compatible cv object using common parameters for prediction analyses.

            Parameters:
          • Y – (pd.DataFrame) Pandas Dataframe of Y labels

          • cv_dict – (dict) Type of cross_validation to use. A dictionary of {‘type’: ‘kfolds’, ‘n_folds’: n}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘stratified’: Y}, {‘type’: ‘kfolds’, ‘n_folds’: n, ‘subject_id’: holdout}, or {‘type’: ‘loso’, ‘subject_id’: holdout}

          • return_generator (bool) – return a cv generator instead of an instance; default True

          • Returns:

            a scikit-learn model-selection generator

          • mask – nibabel or Brain_Data instance

          • custom_mask – nibabel instance or string to file path; optional

          • Returns:
            Brain_Data instance of a mask with different integers indicating different masks

            nltools.mask.create_sphere(coordinates, radius=5, mask=None)[source]

            Generate a set of spheres in the brain mask space

            Parameters:
          • radius – vector of radius. Will create multiple spheres if len(radius) > 1

          • coordinates – a vector of sphere centers of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]

            nltools.mask.expand_mask(mask, custom_mask=None)[source]

            expand a mask with multiple integers into separate binary masks

            Parameters:
          • mask – nibabel or Brain_Data instance

          • custom_mask – nibabel instance or string to file path; optional

          • Returns:

            Brain_Data instance of multiple binary masks

            nltools.mask.roi_to_brain(data, mask_x)[source]

            This function converts an expanded binary mask of ROIs (see expand_mask) into a brain image based on a vector of values. The dataframe of values must correspond to ROI numbers.

            This is useful for populating a parcellation scheme with a vector of values

            Parameters:
          • data – Pandas series, dataframe, list, np.array of ROI by observation

          • mask_x – an expanded binary mask

          • Returns:
            (Brain_Data) Brain_Data instance where each ROI is now populated with a value

            nltools.file_reader.onsets_to_dm(F, sampling_freq, run_length, header='infer', sort=False, keep_separate=True, add_poly=None, unique_cols=None, fill_na=None, **kwargs)[source]

            This function can assist in reading in one or several 2-3 column onsets files, specified in seconds, and converting them to a Design Matrix organized as samples X Stimulus Classes. sampling_freq should be specified in hertz; for TRs use hertz = 1/TR. Onsets files must be organized with columns in one of the following 4 formats:

          • ‘Stim, Onset’

          • ‘Onset, Stim’

          • ‘Stim, Onset, Duration’

          • ‘Onset, Duration, Stim’

            No other file organizations are currently supported. Note: Stimulus offsets (onset + duration) that fall into an adjacent TR include that full TR. E.g. offset of 10.16s with TR = 2 has an offset of TR 5, which spans 10-12s, rather than an offset of TR 4, which spans 8-10s.

            Parameters:
          • F (str/Path/pd.DataFrame) – filepath or pandas dataframe

          • sampling_freq (float) – sampling frequency in hertz, i.e. 1 / TR

          • run_length (int) – run length in number of TRs

          • header (str/None, optional) – whether there’s an additional header row in the supplied file/dataframe. See pd.read_csv for more details. Defaults to "infer".

          • sort (bool, optional) – whether to sort dataframe columns alphabetically. Defaults to False.

          • keep_separate (bool, optional) – if a list of files or dataframes is supplied, whether to create separate polynomial columns per file. Defaults to True.

          • add_poly (bool/int, optional) – whether to add Nth order polynomials to design matrix. Defaults to None.

          • unique_cols (list/None, optional) – if a list of files or dataframes is supplied, what additional columns to keep separate per file

          • fill_na (Any, optional) – what to replace NaNs with. Defaults to None (no filling).

          • Returns:

            design matrix organized as TRs x Stims

            Return type:

            nltools.data.Design_Matrix
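The conversion from onsets in seconds to TR-indexed columns, including the offset rule from the note above, can be sketched in plain Python. This is an illustration under stated assumptions, not the nltools implementation: `onsets_to_columns` is a hypothetical name, events are (stim, onset, duration) tuples, and both onset and offset are mapped to a TR index by flooring.

```python
import math


def onsets_to_columns(events, sampling_freq, run_length):
    """Build one 0/1 column per stimulus from (stim, onset_s, duration_s) events."""
    dm = {}
    for stim, onset, duration in events:
        col = dm.setdefault(stim, [0] * run_length)
        first = math.floor(onset * sampling_freq)
        # An offset falling inside a TR marks that whole TR as "on"
        last = math.floor((onset + duration) * sampling_freq)
        for tr in range(first, min(last, run_length - 1) + 1):
            col[tr] = 1
    return dm


# TR = 2 s, so sampling_freq = 1 / TR = 0.5 Hz
events = [("face", 0.0, 3.0), ("house", 8.0, 2.16)]
dm = onsets_to_columns(events, sampling_freq=0.5, run_length=8)
```

The "house" event ends at 10.16 s, which falls inside TR 5 (10-12 s), so TRs 4 and 5 are both marked, matching the example in the note.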

            nltools.utils.set_algorithm(algorithm, *args, **kwargs)[source]

            Setup the algorithm to use in subsequent prediction analyses.

            Parameters:
          • algorithm – The prediction algorithm to use. Either a string or an (uninitialized) scikit-learn prediction object. If string, must be one of ‘svm’, ‘svr’, ‘linear’, ‘logistic’, ‘lasso’, ‘lassopcr’, ‘lassoCV’, ‘ridge’, ‘ridgeCV’, ‘ridgeClassifier’, ‘randomforest’, or ‘randomforestClassifier’

          • kwargs – Additional keyword arguments to pass onto the scikit-learn prediction object.

          • Returns:

            dictionary of settings for prediction

            Return type:

            predictor_settings

            nltools.utils.set_decomposition_algorithm(algorithm, n_components=None, *args, **kwargs)[source]

            Setup the algorithm to use in subsequent decomposition analyses.

            Parameters:
          • algorithm – The decomposition algorithm to use. Either a string or an (uninitialized) scikit-learn decomposition object. If string, must be one of ‘pca’, ‘nnmf’, ‘ica’, ‘fa’, ‘dictionary’, ‘kernelpca’.

          • kwargs – Additional keyword arguments to pass onto the scikit-learn decomposition object.

          • Returns:

            dictionary of settings for prediction

            Return type:

            predictor_settings

            nltools.prefs: Preferences

            This module can be used to adjust the default MNI template settings that are used internally by all Brain_Data operations. By default all operations are performed in MNI152 2mm space. Thus any files loaded will be resampled to this space by default. You can control this on a per-file loading basis using the mask argument of Brain_Data, e.g.

            from nltools.data import Brain_Data
            # my_brain will be resampled to 2mm
            brain = Brain_Data('my_brain.nii.gz')
            # my_brain will now be resampled to the same space as my_mask
            brain = Brain_Data('my_brain.nii.gz', mask='my_mask.nii.gz') # will be resampled
            

            Alternatively this module can be used to switch between 2mm or 3mm MNI spaces with and without ventricles:

            from nltools.prefs import MNI_Template, resolve_mni_path
            from nltools.data import Brain_Data
            # Update the resolution globally
            MNI_Template['resolution'] = '3mm'
            # This works too:
            MNI_Template.resolution = 3
            # my_brain will be resampled to 3mm and future operation will be in 3mm space
            brain = Brain_Data('my_brain.nii.gz')
            # get the template nifti files
            resolve_mni_path(MNI_Template)
            # will print like:
            {
                'resolution': '3mm',
                'mask_type': 'with_ventricles',
                'mask': '/Users/Esh/Documents/pypackages/nltools/nltools/resources/MNI152_T1_3mm_brain_mask.nii.gz',
                'plot': '/Users/Esh/Documents/pypackages/nltools/nltools/resources/MNI152_T1_3mm.nii.gz',
                'brain': '/Users/Esh/Documents/pypackages/nltools/nltools/resources/MNI152_T1_3mm_brain.nii.gz'
            }

            nltools.plotting.dist_from_hyperplane_plot(stats_output)[source]
            

            Plot SVM Classification Distance from Hyperplane

            Parameters:

            stats_output – a pandas file with prediction output

            Returns:

            Will return a seaborn plot of distance from hyperplane

            nltools.plotting.plot_between_label_distance(distance, labels, ax=None, permutation_test=True, n_permute=5000, fontsize=18, **kwargs)[source]

            Create a heatmap indicating average between label distance

            Parameters:
          • distance – (pandas dataframe) brain_distance matrix

          • labels – (pandas dataframe) group labels

          • ax – axis to plot (default=None)

          • permutation_test – (boolean)

          • n_permute – (int) number of samples for permutation test

          • fontsize – (int) size of font for plot

          • Returns:

            heatmap

            out: pandas dataframe of pairwise distance between conditions

            within_dist_out: average pairwise distance matrix

            mn_dist_out: (optional if permutation_test=True) average difference in distance between conditions

            p_dist_out: (optional if permutation_test=True) p-value for difference in distance between conditions

            nltools.plotting.plot_brain(objIn, how='full', thr_upper=None, thr_lower=None, save=False, **kwargs)[source]

            More complete brain plotting of a Brain_Data instance

            Parameters:
          • objIn (Brain_Data) – object to plot

          • how (str) – whether to plot a glass brain ‘glass’, 3 view-multi-slice mni ‘mni’, or both ‘full’

          • thr_upper (str/float) – thresholding of image. Can be string for percentage, or float for data units (see Brain_Data.threshold())

          • thr_lower (str/float) – thresholding of image. Can be string for percentage, or float for data units (see Brain_Data.threshold())

          • save (str) – if a string file name or path is provided plots will be saved into this directory appended with the orientation they belong to

          • kwargs – optional args to nilearn plot functions (e.g. vmax)
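The two accepted threshold forms (a percentage string or a float in data units) can be pictured with the small sketch below. It assumes that a string such as '95%' is read as a percentile of the data values. `resolve_threshold` is a hypothetical helper written for this example, not an nltools function, and Brain_Data.threshold() may resolve thresholds differently.

```python
import numpy as np

def resolve_threshold(values, thr):
    """Turn a threshold given as '95%' (percentile) or 2.5 (data units) into a number."""
    if isinstance(thr, str):
        # assumption: the string is a percentile, with an optional trailing '%'
        return float(np.percentile(values, float(thr.rstrip("%"))))
    return float(thr)

values = np.arange(101, dtype=float)       # 0, 1, ..., 100
upper = resolve_threshold(values, "95%")   # percentile of the data
lower = resolve_threshold(values, 2.5)     # already in data units
```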

          • nltools.plotting.plot_interactive_brain(brain, threshold=1e-06, surface=False, percentile_threshold=False, anatomical=None, **kwargs)[source]

            This function leverages nilearn’s new javascript based brain viewer functions to create interactive plotting functionality.

            Parameters:
          • brain (nltools.Brain_Data) – a Brain_Data instance of 1d or 2d shape (i.e. 3d or 4d volume)

          • threshold (float/str) – threshold to initialize the visualization, may be a percentile string; default 1e-06

          • surface (bool) – whether to create a surface-based plot; default False

          • percentile_threshold (bool) – whether to interpret threshold values as percentiles

          • kwargs – optional arguments to nilearn.plotting.view_img or nilearn.plotting.view_img_on_surf

          • Returns:

            interactive brain viewer widget

            nltools.plotting.plot_mean_label_distance(distance, labels, ax=None, permutation_test=False, n_permute=5000, fontsize=18, **kwargs)[source]

            Create a violin plot indicating within and between label distance.

            Parameters:
          • distance – pandas dataframe of distance

          • labels – labels indicating columns and rows to group

          • ax – matplotlib axis to plot on

          • permutation_test – (bool) indicates whether to run permutation test or not

          • n_permute – (int) number of permutations to run

          • fontsize – (int) fontsize for plot labels

          • Returns:

            heatmap

            stats: (optional if permutation_test=True) permutation results
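To make the optional permutation test concrete, here is a minimal sketch that compares mean between-label distance with mean within-label distance and builds a null distribution by shuffling the labels. The function name, the choice of test statistic, and the one-sided p-value are assumptions made for illustration. The statistics nltools actually reports may be defined differently.

```python
import numpy as np

def within_between_permutation(distance, labels, n_permute=1000, seed=0):
    """Permutation test on mean(between) - mean(within) distance (illustrative)."""
    distance = np.asarray(distance, dtype=float)
    labels = np.asarray(labels)
    upper = np.triu_indices(distance.shape[0], k=1)  # each pair counted once
    pair_dist = distance[upper]

    def statistic(lab):
        same = lab[upper[0]] == lab[upper[1]]
        return pair_dist[~same].mean() - pair_dist[same].mean()

    observed = statistic(labels)
    rng = np.random.default_rng(seed)
    null = np.array([statistic(rng.permutation(labels)) for _ in range(n_permute)])
    # one-sided p-value with the usual +1 correction
    p = (np.sum(null >= observed) + 1) / (n_permute + 1)
    return observed, p

# Toy example: two well separated groups of three items each
points = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
distance = np.abs(points[:, None] - points[None, :])
observed, p = within_between_permutation(distance, ["a"] * 3 + ["b"] * 3)
```

The sketch assumes every label occurs at least twice, so that both the within and the between sets of pairs are non-empty after shuffling.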

            nltools.plotting.plot_silhouette(distance, labels, ax=None, permutation_test=True, n_permute=5000, **kwargs)[source]

            Create a silhouette plot indicating between relative to within label distance

            Parameters:
          • distance – (pandas dataframe) brain_distance matrix

          • labels – (pandas dataframe) group labels

          • ax – axis to plot (default=None)

          • permutation_test – (boolean) whether to run a permutation test

          • n_permute – (int) number of samples for permutation test

          • Returns:

            heatmap

            out: pandas dataframe of pairwise distance between conditions

            within_dist_out: average pairwise distance matrix

            mn_dist_out: (optional if permutation_test=True) average difference in distance between conditions

            p_dist_out: (optional if permutation_test=True) p-value for difference in distance between conditions
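A silhouette score relates each item's mean distance to its own label group (a) to its mean distance to the nearest other group (b) as (b - a) / max(a, b). The sketch below computes that textbook quantity from a precomputed distance matrix with NumPy. It is an illustration of the general definition, not the code nltools runs, and `silhouette_from_distance` is a hypothetical name.

```python
import numpy as np

def silhouette_from_distance(distance, labels):
    """Per-item silhouette score from a precomputed distance matrix.

    Assumes at least two distinct labels and non-degenerate distances.
    """
    distance = np.asarray(distance, dtype=float)
    labels = np.asarray(labels)
    scores = np.zeros(len(labels))
    for i, lab in enumerate(labels):
        same = labels == lab
        same[i] = False  # exclude the item itself
        if not same.any():
            continue  # singleton group: score is left at 0 by convention
        a = distance[i, same].mean()  # mean distance to own group
        b = min(distance[i, labels == other].mean()
                for other in np.unique(labels) if other != lab)
        scores[i] = (b - a) / max(a, b)
    return scores

# Toy example: two tight, well separated groups
points = np.array([0.0, 1.0, 10.0, 11.0])
distance = np.abs(points[:, None] - points[None, :])
scores = silhouette_from_distance(distance, ["a", "a", "b", "b"])
```

Scores close to 1 indicate that items sit much closer to their own label group than to any other group.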

            nltools.plotting.plot_stacked_adjacency(adjacency1, adjacency2, normalize=True, **kwargs)[source]

            Create stacked adjacency to illustrate similarity.

            Parameters:
          • adjacency1 – Adjacency instance 1

          • adjacency2 – Adjacency instance 2

          • normalize – (boolean) Normalize matrices.

          • Returns:

            matplotlib figure

            nltools.plotting.plot_t_brain(objIn, how='full', thr='unc', alpha=None, nperm=None, cut_coords=[], **kwargs)[source]

            Takes a Brain_Data object and computes a one-sample t-test across its first axis. If a list is provided, computes the difference between the Brain_Data objects in the list (i.e. a paired-samples t-test).

            Parameters:
          • objIn (list/Brain_Data) – if a list, the difference map is computed first

          • how (str) – whether to plot a glass brain ‘glass’, 3 view-multi-slice mni ‘mni’, or both ‘full’

          • thr (str) – method to use for multiple comparisons correction: unc, fdr, or tfce

          • alpha (float) – p-value threshold

          • nperm (int) – number of permutations for tfce; default 1000

          • cut_coords (list) – x, y, z coords to plot brain slice

          • kwargs – optional args to nilearn plot functions (e.g. vmax)
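The relationship between the two modes (a one-sample test on a single stack of images, and a paired test when two stacks are given) can be sketched with plain NumPy arrays, where rows stand in for images and columns for voxels. The function `one_sample_t` and the toy arrays are illustrative only. The sketch leaves out thresholding, multiple comparisons correction, and plotting.

```python
import numpy as np

def one_sample_t(data):
    """t-statistic against zero for each column (axis 0 = observations)."""
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    return data.mean(axis=0) / (data.std(axis=0, ddof=1) / np.sqrt(n))

# rows = images (e.g. subjects), columns = voxels
cond_a = np.array([[2.0, 1.0], [3.0, 0.0], [4.0, 2.0]])
cond_b = np.array([[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]])

t_single = one_sample_t(cond_a)           # one-sample test on a single stack
t_paired = one_sample_t(cond_a - cond_b)  # paired test = one-sample test on differences
```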

            nltools.plotting.probability_plot(stats_output)[source]

            Plot Classification Probability

            Parameters:

            stats_output – a pandas DataFrame with prediction output

            Returns:

            Will return a seaborn scatterplot

            class nltools.simulator.Simulator(brain_mask=None, output_dir=None, random_state=None)[source]

            create_cov_data(cor, cov, sigma, mask=None, reps=1, n_sub=1, output_dir=None)[source]

            create continuous simulated data with covariance

            Parameters:
          • cor – amount of covariance between each voxel and Y variable

          • cov – amount of covariance between voxels

          • sigma – amount of noise to add

          • mask – region where we will have activations

          • reps – number of data repetitions

          • n_sub – number of subjects to simulate

          • output_dir – string path of directory to output data. If None, no data will be written

          • **kwargs – Additional keyword arguments to pass to the prediction algorithm
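One plausible way to read the `cor` and `cov` parameters is as entries of a joint covariance matrix over the outcome Y and the voxels, from which samples are drawn. The sketch below follows that reading with unit variances and no extra noise term. The function name, the unit-variance choice, and the use of a multivariate normal are assumptions made for illustration, not a description of the Simulator internals.

```python
import numpy as np

def simulate_cov_data(n_voxels, cor, cov, n_samples, seed=0):
    """Draw (y, voxels) jointly from a zero-mean multivariate normal (illustrative)."""
    size = n_voxels + 1
    full = np.full((size, size), float(cov))  # voxel-voxel covariance
    full[0, 1:] = cor                         # y-voxel covariance
    full[1:, 0] = cor
    np.fill_diagonal(full, 1.0)               # unit variance everywhere
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(size), full, size=n_samples)
    return draws[:, 0], draws[:, 1:]          # y, voxel data

y, voxels = simulate_cov_data(n_voxels=3, cor=0.6, cov=0.5, n_samples=20000)
sample_corr = np.corrcoef(np.column_stack([y, voxels]), rowvar=False)
```

With enough samples, `sample_corr[0, 1:]` approaches `cor` and the off-diagonal voxel entries approach `cov`. Not every combination of `cor` and `cov` yields a valid (positive semi-definite) covariance matrix.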

          • create_data(levels, sigma, radius=5, center=None, reps=1, output_dir=None)[source]

            create simulated data with integers

            Parameters:
          • levels – vector of intensities or class labels

          • sigma – amount of noise to add

          • radius – vector of radius. Will create multiple spheres if len(radius) > 1

          • center – center(s) of sphere(s) of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]

          • reps – number of data repetitions useful for trials or subjects

          • output_dir – string path of directory to output data. If None, no data will be written

          • **kwargs – Additional keyword arguments to pass to the prediction algorithm
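The parameters above suggest a simple generative recipe: place a signal of a given level inside a sphere and add noise, once per level and repetition. The following NumPy sketch follows that recipe on a small cubic grid. `simulate_levels`, the grid size, and the returned tuple are hypothetical choices for this example, and the actual method works in the brain mask space and may scale the signal differently.

```python
import numpy as np

def simulate_levels(shape, levels, sigma, radius, center, reps=1, seed=0):
    """One noisy volume per (level, repetition); the signal lives in a sphere."""
    rng = np.random.default_rng(seed)
    grids = np.indices(shape)
    in_sphere = sum((g - c) ** 2 for g, c in zip(grids, center)) <= radius ** 2
    volumes, y = [], []
    for level in levels:
        for _ in range(reps):
            noise = rng.normal(0.0, sigma, size=shape)  # independent voxel noise
            volumes.append(level * in_sphere + noise)
            y.append(level)
    return np.stack(volumes), np.array(y), in_sphere

data, y, in_sphere = simulate_levels(
    shape=(9, 9, 9), levels=[1, 2, 3], sigma=0.1, radius=2, center=(4, 4, 4), reps=5
)
```

Each entry of `y` is the level used for the matching volume in `data`, which is the kind of label vector a prediction analysis would train on.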

          • create_ncov_data(cor, cov, sigma, masks=None, reps=1, n_sub=1, output_dir=None)[source]

            create continuous simulated data with covariance

            Parameters:
          • cor – amount of covariance between each voxel and Y variable (an int or a vector)

          • cov – amount of covariance between voxels (an int or a matrix)

          • sigma – amount of noise to add

          • masks – region(s) where we will have activations (list if more than one)

          • reps – number of data repetitions

          • n_sub – number of subjects to simulate

          • output_dir – string path of directory to output data. If None, no data will be written

          • **kwargs – Additional keyword arguments to pass to the prediction algorithm

          • gaussian(mu, sigma, i_tot)[source]

            create a 3D gaussian signal normalized to a given intensity

            Parameters:
          • mu – average value of the gaussian signal (usually set to 0)

          • sigma – standard deviation

          • i_tot – sum total of activation (numerical integral over the gaussian returns this value)
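The normalization described for `i_tot` can be sketched as follows: build a Gaussian profile over a 3D grid, then rescale it so the voxel values sum to the requested total. The sketch assumes `mu` acts as an offset on the radial distance from the grid centre, which is a guess at the intended meaning. `gaussian_volume` and the explicit `shape` argument are hypothetical, since the real method takes its grid from the brain mask.

```python
import numpy as np

def gaussian_volume(shape, mu, sigma, i_tot):
    """3D isotropic Gaussian bump rescaled so that its voxels sum to i_tot."""
    center = (np.array(shape) - 1) / 2.0
    grids = np.indices(shape).astype(float)
    # distance of every voxel from the centre of the grid
    r = np.sqrt(sum((g - c) ** 2 for g, c in zip(grids, center)))
    g = np.exp(-((r - mu) ** 2) / (2.0 * sigma ** 2))
    return i_tot * g / g.sum()

vol = gaussian_volume((9, 9, 9), mu=0.0, sigma=2.0, i_tot=100.0)
```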

          • n_spheres(radius, center)[source]

            generate a set of spheres in the brain mask space

            Parameters:
          • radius – vector of radius. Will create multiple spheres if len(radius) > 1

          • center – a vector of sphere centers of the form [px, py, pz] or [[px1, py1, pz1], …, [pxn, pyn, pzn]]

          • normal_noise(mu, sigma)[source]

            produce a normal noise distribution for all points in the brain mask

            Parameters:
          • mu – average value of the gaussian signal (usually set to 0)

          • sigma – standard deviation

          • sphere(r, p)[source]

            create a sphere of given radius at some point p in the brain mask

            Parameters:
          • r – radius of the sphere

          • p – point (in coordinates of the brain mask) of the center of the sphere
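A sphere in voxel coordinates is just the set of grid points whose distance from `p` is at most `r`. The sketch below builds such a boolean mask with NumPy and then combines several spheres by taking their union, which mirrors the multi-sphere case described for n_spheres. `sphere_mask` and the explicit `shape` argument are hypothetical, since the real methods work in the space of the brain mask.

```python
import numpy as np

def sphere_mask(shape, r, p):
    """Boolean volume that is True within radius r (in voxels) of point p."""
    grids = np.indices(shape)
    dist_sq = sum((g - c) ** 2 for g, c in zip(grids, p))
    return dist_sq <= r ** 2

shape = (11, 11, 11)
single = sphere_mask(shape, r=2, p=(5, 5, 5))

# several spheres: take the union of one mask per (radius, center) pair
radii = [1, 2]
centers = [(2, 2, 2), (8, 8, 8)]
multi = np.any([sphere_mask(shape, r, c) for r, c in zip(radii, centers)], axis=0)
```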

          • to_nifti(m)[source]

            convert a numpy matrix to the nifti format and assign to it the brain_mask’s affine matrix

            Parameters:

            m – the 3D numpy matrix we wish to convert to .nii
