Preprocessing. For the visualization, the EEG was bandpass filtered between 0.3 and 40 Hz. For the deep learning classifiers and cluster encoders, the EEG was bandpass filtered between 1 and 40 Hz, re-referenced to the common average, and normalized by dividing by the 99th percentile of the absolute amplitude. All filters were implemented in Python as 5th-order Butterworth filters using scipy.signal (Xxxxxxxx et al., 2020) with zero-phase filtering.
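A minimal sketch of this filtering and normalization pipeline, assuming an EEG array of shape (channels, samples) and a known sampling rate (the function and argument names are ours, not from the original code), might look as follows:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_eeg(eeg, fs, low=1.0, high=40.0, order=5):
    """Bandpass filter, common-average reference, and amplitude-normalize EEG.

    eeg : array of shape (n_channels, n_samples); fs : sampling rate in Hz.
    Cutoffs and filter order follow the description above; everything else
    (names, data layout) is an illustrative assumption.
    """
    # 5th-order Butterworth bandpass, applied forward and backward (zero-phase)
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, eeg, axis=-1)

    # Re-reference to the common average across channels
    referenced = filtered - filtered.mean(axis=0, keepdims=True)

    # Normalize by the 99th percentile of the absolute amplitude
    scale = np.percentile(np.abs(referenced), 99)
    return referenced / scale
```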
Preprocessing. Preprocessing was performed in FMRIB's Software Library (FSL 5.0.9; Xxxxxxxxx et al., 2012). The structural and functional MRI data were skull stripped. The functional data was registered to 2 mm MNI standard space via the individual T1-weighted anatomical image (FLIRT). The functional data was motion corrected (MCFLIRT) and smoothed with a 6 mm Gaussian kernel. ICA-AROMA was used to filter out additional motion-related, physiological, and scanner-induced noise while retaining the signal of interest (Pruim, Xxxxxx, Buitelaar, et al., 2015; Xxxxx, Xxxxxx, xxx Xxxxx, et al., 2015). White matter and cerebrospinal fluid signals were regressed out (Pruim, Xxxxxx, xxx Xxxxx, et al., 2015; Varoquaux & Xxxxxxxx, 2013). Lastly, a 128 s high-pass filter was applied to the data. To construct the functional RS connectome, we used the 264 regions of interest (ROIs) presented by Xxxxx et al. (2011), which are based on a meta-analysis of resting-state and task-based fMRI data (Figure 1A). These ROIs represent nodes of common networks such as the default mode network. Calculating the connectivity between all nodes allows us to include connectivity between nodes within the same network as well as connectivity between nodes of different networks. The ROIs were spheres with a radius of 5 mm around the coordinates described by Power et al. (2011). For each participant, the signal within these spheres was averaged and normalized, resulting in 264 time series. Functional connectivity was calculated by correlating each time series with every other time series, resulting in a 264 × 264 correlation matrix and 34,716 unique connectivity estimates, representing the functional RS connectome (Xxxxxxx et al., 2018; Xxx et al., 2018). For further calculations the connectome was vectorized (i.e., the matrix was transformed into a column vector; Figure 2A).
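A minimal sketch of the connectome construction and vectorization step, assuming the ROI time series have already been extracted into an array of shape (timepoints, 264) (function and variable names are illustrative, not from the original analysis code), might be:

```python
import numpy as np

def build_connectome(roi_timeseries):
    """Compute the functional connectome and vectorize its unique entries.

    roi_timeseries : array of shape (n_timepoints, 264), one column per ROI,
    already averaged within each 5 mm sphere.
    """
    # 264 x 264 correlation matrix between all ROI time series
    corr = np.corrcoef(roi_timeseries, rowvar=False)

    # Keep only the upper triangle (excluding the diagonal):
    # 264 * 263 / 2 = 34,716 unique connectivity estimates
    iu = np.triu_indices_from(corr, k=1)
    return corr[iu]
```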
Preprocessing. We start from a set of measured variables X at measurement locations K to learn a local causal graph that is valid for all locations. Trivially, we could run the FCI algorithm. This, however, is suboptimal for several reasons. First, the measurements are not IID, which violates one of the basic assumptions of the FCI algorithm. Second, by neglecting the spatial structure, we would discard information that could potentially be useful (Xxxxxxx et al., 2000). To avoid this loss of information, we construct upstream variables U as outlined in def. 1, using the mean as the function f. We stress that this choice depends on the application; for example, for a system with currents of largely differing discharge volume, a weighted average might be a better choice. Further, we note that not all locations have a preceding location, which results in locations with an incomplete set of variables. In this work, we wanted to evaluate our general approach, which is why we decided to exclude missing data imputation as a potential influence on modeling performance. In principle, however, the missing data problem could be tackled by any appropriate strategy, including regression imputation and Bayesian estimation (Xxxxxx, 2010). The spatial structure of the system could offer additional information here that could allow for better imputation of missing data. Note that our strategy of excluding locations with an incomplete set of variables slightly reduces the size of the data set. Our preprocessing strategy is outlined in alg. 1.

Algorithm 1: Preprocessing
Input: Set of measured variables X at locations K in a system with directional currents, with K_S being the set of locations k ∈ K for which Pre(k) ≠ ∅
Output: Set of variables X_r = {U, O, R, I} at locations K_r
1 Partition variables into subsets I, O, and R following def. 1;
2 repeat
3   Select an unvisited measurement location k ∈ K_S and calculate U(k) following def. 1, using the average as the function f;
4 until all measurement locations k ∈ K_S have been visited;
5 Remove entries of locations k ∈ K for which Pre(k) = ∅;
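A minimal sketch of this construction of upstream variables, assuming measurements and predecessor relations are stored in plain Python dicts (names and data layout are our own, purely illustrative), could look like this:

```python
import numpy as np

def construct_upstream(measurements, predecessors):
    """Build upstream variables U(k) as the mean over a location's predecessors.

    measurements : dict mapping location k -> 1-D array of measured variables X(k)
    predecessors : dict mapping location k -> list of preceding locations Pre(k)
    Locations without predecessors are dropped, mirroring the exclusion of
    incomplete locations described above (no imputation is attempted).
    """
    upstream = {}
    for k, pre in predecessors.items():
        if not pre:  # Pre(k) is empty -> exclude this location
            continue
        # f = mean: average the measured variables of all preceding locations
        upstream[k] = np.mean([measurements[p] for p in pre], axis=0)
    return upstream
```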
Preprocessing. Before the main processing can begin, a number of preprocessing steps are required. These steps standardize the geographic references and create a number of look-up tables that greatly speed the complex data processing. The following pre-processing tasks are performed:
• Import and standardize AVL files
• Create the stop location table
• Update the off-board stop location table
• Create (or update) the quarter-mile look-up table
• Create the subsidy table
• Link the subsidy table to ORCA cards (CSNs)
• Hash the CSNs and Business IDs in the subsidy table, maintaining the link between the subsidy table and the hashed CSNs
• Preprocess date and time values in the transaction data
• Remove duplicate boarding records.
Each of these tasks is described below.
Preprocessing. In order to fuse the input data sets together, geolocate the transactions, and then create the origin/destination and transfer files, a number of preprocessing steps must first be performed. These steps standardize the geographic references and create a number of look-up tables that greatly speed the complex data processing. The pre-processing tasks are as follows:
• Import and standardize the AVL files
• Create a stop location table
• Create a table that correlates the ORCA transaction record's directional variable (i.e., inbound or outbound) with the cardinal directions used by the transit agency's directional variable (i.e., north/south/east/west)
• Create (or update) the quarter-mile look-up table
• Update the off-board stop location table
• Preprocess the ORCA transactions data and reformat date and time variables
• Create a subsidy table
• Link the subsidy table to ORCA cards (CSNs)
• Hash the CSNs and Business IDs in the subsidy table, maintaining the link between the subsidy table and the hashed CSNs (a sketch of this hashing step follows the list)
• Remove duplicate boarding records.
These tasks are described below. The schema for each of the data sets is presented in Appendix C.
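The hashing step must replace raw identifiers while keeping the subsidy table joinable to the transaction data. One way to do this, sketched below under our own assumptions (a keyed hash, hypothetical key and column names; the project's actual hashing procedure is not described above), is a deterministic HMAC over each identifier:

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be managed securely and shared
# only among components that must produce matching hashes.
SECRET_KEY = b"replace-with-project-secret"

def hash_id(value: str) -> str:
    """Deterministically hash a CSN or Business ID with HMAC-SHA256.

    Identical inputs map to identical digests, so the link between the subsidy
    table and the transaction records is preserved, while the raw identifiers
    are no longer stored.
    """
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

# Illustrative usage on a pandas table of subsidy records:
# subsidy["csn_hash"] = subsidy["csn"].astype(str).map(hash_id)
# subsidy["business_id_hash"] = subsidy["business_id"].astype(str).map(hash_id)
```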
Preprocessing. The simple protocol explained above uses the fact that Bob knows more about the value of Xxxxx than Xxx knows. In fact, one can show that while H(X|Z) − H(X|Y) = 0 for the original random variables, either preprocessing operation creates a positive difference: forgetting the second bit yields a variable U with H(U|Z) − H(U|Y) = 1, and sending the second bit (denoted V, subsequently known to both sides) gives H(X|ZV) − H(X|YV) = 1.

[Figure: the joint distributions P_XYZ, P_UYZ, and P_XYZV (each listed outcome having probability 1/4), annotated with the two operations "forget second bit" and "send second bit" and the resulting conditional-entropy differences.]
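As a check on these numbers, here is a small sketch that computes the conditional entropies for one distribution consistent with the probabilities shown; the assumption that X consists of two uniform bits, with Y its first bit (Bob's observation) and Z its second bit (the adversary's observation), is ours and is not stated explicitly above:

```python
from collections import defaultdict
from itertools import product
from math import log2

# Assumed joint distribution: X = (X1, X2) uniform over two bits,
# Y = X1, Z = X2. Each outcome has probability 1/4.
joint = {}
for x1, x2 in product([0, 1], repeat=2):
    joint[((x1, x2), x1, x2)] = 0.25

def cond_entropy(p_joint, target, given):
    """H(target | given) for a joint distribution {outcome_tuple: prob}.

    target and given are functions extracting the relevant parts of an outcome.
    """
    p_ab = defaultdict(float)  # joint of (target, given)
    p_b = defaultdict(float)   # marginal of given
    for outcome, p in p_joint.items():
        a, b = target(outcome), given(outcome)
        p_ab[(a, b)] += p
        p_b[b] += p
    return -sum(p * log2(p / p_b[b]) for (a, b), p in p_ab.items() if p > 0)

X = lambda o: o[0]; Y = lambda o: o[1]; Z = lambda o: o[2]
U = lambda o: o[0][0]          # "forget the second bit"
V = lambda o: o[0][1]          # "send the second bit" as public side information
YV = lambda o: (Y(o), V(o)); ZV = lambda o: (Z(o), V(o))

print(cond_entropy(joint, X, Z) - cond_entropy(joint, X, Y))    # 0.0
print(cond_entropy(joint, U, Z) - cond_entropy(joint, U, Y))    # 1.0
print(cond_entropy(joint, X, ZV) - cond_entropy(joint, X, YV))  # 1.0
```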
Preprocessing. The terrain in forests shows significant variations in height and contains substantial under-canopy vegetation. Our segmentation approach considers no semantics and is aimed solely at identifying trees. We preprocess an input point cloud with the aim of filtering out the ground, bushes, and any small near-ground structures. We first minimally denoise the cloud and apply the cloth simulation algorithm proposed by Xxxxx et al. [45] to compute a ground segmentation. Their method inverts the z-axis of the point cloud P and simulates the interaction of a rigid cloth covering the inverted ground surface, extracting the set of ground points P_G. For points p = [p_x, p_y, p_z]^⊤ ∈ P and p_i ∈ P_G, we interpolate the ground elevation h(p) of a point as

h(p) = ( Σ_{p_i ∈ N} w(p, p_i) · p_{i,z} ) / ( Σ_{p_i ∈ N} w(p, p_i) ),   (1)

where N is the neighborhood of ground points around p and w(p, p_i) is the interpolation weight.

[Fig. 2: Results of ground segmentation and height normalization steps. In the top image, points in red denote identified ground points. The ground segmentation is used to normalize the height, as shown in the image below.]

The remaining, height-normalized points are then clustered with a density-based clustering algorithm [39]. Following is a brief summary of Quickshift++ and how we use it in the context of our problem; for more details, we refer the reader to the work by Xxxxx et al. [14]. Let r_k(p) for a point p ∈ P be the distance of p to its k-th nearest neighbor. For the true density f(p) at a point p, the k-NN density estimate is defined as

f_k(p) = k / ( n · v · r_k(p)^3 ),   (3)

where n is the number of points and v is the volume of the unit ball in three dimensions.
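A small sketch of the interpolation and height-normalization step, assuming inverse-distance weights for w(p, p_i) and a fixed neighborhood size (both our assumptions; the exact weighting used in the original work is not reproduced above), could look like this:

```python
import numpy as np
from scipy.spatial import cKDTree

def normalize_height(points, ground_points, k=8, eps=1e-9):
    """Subtract the interpolated ground elevation h(p) from every point's z value.

    points        : (N, 3) array, full cloud P
    ground_points : (M, 3) array, ground set P_G from the cloth simulation
    k             : number of ground neighbors used as the neighborhood N
    Weights are inverse distance in the x-y plane (an illustrative choice).
    """
    tree = cKDTree(ground_points[:, :2])
    dists, idx = tree.query(points[:, :2], k=k)

    # w(p, p_i) = 1 / distance; eps avoids division by zero for exact hits
    w = 1.0 / (dists + eps)
    ground_z = ground_points[idx, 2]                 # z values of the neighbors
    h = (w * ground_z).sum(axis=1) / w.sum(axis=1)   # weighted average, eq. (1)

    normalized = points.copy()
    normalized[:, 2] -= h                            # height above ground
    return normalized
```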
Preprocessing. Processing of CT scans can be a memory-intensive task, particularly for high-resolution data. For this reason, some of the segmentation tasks described in this work use subsampled data. To subsample the data, the image size was reduced by block averaging to 256 × 256 voxels in the X-Y plane, with the number of slices reduced such that the data were isotropically sampled. Linear interpolation was used to determine gray values between voxel locations. This strategy does not attempt to apply a consistent image spacing for all images but rather aims to retain the best resolution possible for each individual image. In this way, we hope to achieve the highest chance of success for each image in subsequent processing tasks. To reduce memory consumption in processes where full-resolution data are preferred, the scan size was reduced by excluding image regions outside the lungs (after lung segmentation has taken place). A bounding box around the segmented lungs was constructed, with a margin of 5 voxels on each side. Data outside this bounding box were discarded. The resulting smaller image is referred to in this work as the bounded image.
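As an illustration of the bounding-box step, a minimal sketch (assuming a binary lung mask aligned with the image array; names are hypothetical, not from the original implementation) might be:

```python
import numpy as np

def crop_to_lungs(image, lung_mask, margin=5):
    """Crop a CT volume to a bounding box around the segmented lungs.

    image, lung_mask : arrays of identical shape; lung_mask is non-zero inside
    the lungs. A margin of `margin` voxels is kept on each side, clipped to the
    image bounds. The result corresponds to the "bounded image" described above.
    """
    coords = np.argwhere(lung_mask > 0)
    lower = np.maximum(coords.min(axis=0) - margin, 0)
    upper = np.minimum(coords.max(axis=0) + margin + 1, image.shape)
    slices = tuple(slice(lo, hi) for lo, hi in zip(lower, upper))
    return image[slices]
```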
Preprocessing. Before the Bayesian analysis, we cleaned the data and visualized general tendencies present in the data as summary plots using the tidyverse package system in R (Xxxxxxx et al., 2019). In the data-cleaning process, we applied several exclusion criteria. The first criterion was participants' native language: we excluded participants whose native language was not Turkish. The second criterion was their accuracy on the practice items: if they gave wrong answers to more than half of the practice questions, we excluded them from the analysis. We also excluded participants who answered the questions too fast, that is, in under 200 milliseconds. Finally, we excluded participants with too many inaccurate answers in the control conditions. We did not include missing data points or exclusions in our analysis and assumed that data were missing completely at random (Xxx Xxxxxx, 2018). In this thesis, we do not report the rates of missing data, but our raw data are available.
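As an illustration only, the exclusion rules could be expressed as a single filter like the sketch below. Note that the original analysis was done in R with the tidyverse, so this Python/pandas version, its column names, and the control-accuracy cutoff are all hypothetical; only the Turkish-native-language, more-than-half-correct practice, and 200 ms thresholds come from the description above.

```python
import pandas as pd

def apply_exclusions(df: pd.DataFrame, control_accuracy_cutoff: float = 0.8) -> pd.DataFrame:
    """Apply the participant-level exclusion criteria described above.

    df is expected to hold one row per participant with (hypothetical) columns:
    native_language, practice_accuracy, mean_rt_ms, control_accuracy.
    The control-accuracy cutoff is a placeholder; the text does not state
    the exact threshold used.
    """
    keep = (
        (df["native_language"] == "Turkish")
        & (df["practice_accuracy"] > 0.5)   # more than half of practice items correct
        & (df["mean_rt_ms"] >= 200)         # not answering implausibly fast
        & (df["control_accuracy"] >= control_accuracy_cutoff)
    )
    return df[keep]
```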