Data Summary. Questions Answers
Data Summary. The main categories of data foreseen to be collected or generated by MULTI-STR3AM are:
• Underlying research data: This category encompasses the data, including associated metadata, forming the basis of results and conclusions presented in scientific articles and in any potential patents arising from the project. To remove any limitations to review and validation of results by the scientific community, green open access (self-archiving) will be the preferred model of publication for scientific articles. Additionally, the underlying data will be deposited in an open repository (independent of the project), which will be linked to in the resulting article.
• Operational data: This includes raw or curated data arising from the operation of equipment, for example data associated with biomass cultivation, fractionation and purification of microalgae components, and routine analyses of the resultant products (e.g., compositional analyses). Data related to the production process will be used to produce guidelines for optimal performance, quality checks and confirmation checks, which will be of use in the project and in future planned production of algae. This category of data is likely to contain commercially sensitive data; careful consideration will be given to which information can be published openly (e.g., for dissemination purposes) and which should be considered non-open. Some of this data is also of value for scientific or other publications and presentations and will be treated accordingly.
• Impact monitoring data: Primarily in WP5, data will be gathered to assess the social, environmental and economic impact of MULTI-STR3AM and to track the performance of the project against the KPIs set out in the proposal. These data include biorefinery process modelling and data gathered on e.g., feedstock, raw materials, energy, waste and emissions to complete life cycle and social life cycle assessments. Such assessments will be performed according to the methodology defined by ISO 14040/44, and the project impacts will be measured with the help of computer-based tools such as SimaPro v9 (with Ecoinvent v3.5 database, and others).
• Documentation relating to instruments and methods: This category covers documentation needed to implement the project and reproduce its results, including SOPs from each partner for their respective processes and details of tools, methods, instruments and software.
This section will describe the kinds of data that each work package will be handli...
Data Summary. (Outbound Translation) Outbound translation will reuse models for MT and quality estimation from other WPs of the project. Models for detecting problematic words in the source text will be trained using publicly available data for automatic MT post-editing and synthetic training data generated using already trained translation models. Detection of problematic words strongly depends on the underlying MT system used in Outbound Translation, and so do the synthetic training data. Because of that, we do not consider these data re-usable for other purposes and will only publish the software that can be used to generate the synthetic data for a particular translation system. During user testing and experimental deployment of the Outbound Translation system, detailed logs will be collected. We believe it will be possible to use the logs to compile datasets which might be useful for quality estimation of automatic MT post-editing. In that case, the dataset will be anonymized and published in the LINDAT/CLARIN repository.
Data Summary. 2.1 Purpose of data collection and generation
2.2 Types and formats of data
2.2.1 Types and formats of research data collected in the project. Clinical data from HGSOC patients; sequencing data; imaging data; measurement data from experiments and analyses.
Figure 1. Workflow for calling germline short variants from whole genome sequencing data.
2.2.2 Data collected or generated for project management
Data Summary. 2.1 Purpose of data collection and generation
2.2 Types and formats of data
2.2.1 Types and formats of research data collected in the project.
• Prospective clinical data from consented patients: patient personal information (name, social security number, municipality, age) and data on diagnoses, height and weight, surgical procedures, PET/CT imaging results, information on chemotherapy and other treatments, blood sample results, histopathological analyses such as IHC stainings made in diagnostic routine, treatment outcome and survival. Clinical data are stored in a FileMaker Pro database managed by personnel in the TUCH. Pseudonymized clinical data exports for research use are made periodically from the clinical database. After authorisation by the OPM (operational project manager), these exports are shared with the DECIDER members through Eduuni, a collaboration service environment for flexible and secure collaboration across organization and ecosystem boundaries, provided by the government-owned company CSC - IT Center for Science. Eduuni is maintained at the increased level of the Finnish state security regulation (Vahti 2/2010), which is ensured through regular audits by CSC and external auditors. On this basis, the Eduuni service environment can be used for material at protection level IV (Restricted).
• Retrospective clinical data from biobank: survival and recurrence times, stages, ages, routine longitudinal diagnostic laboratory, surgery and treatment data. Data are stored in a FileMaker Pro database maintained by the personnel in HUS. The clinical data for the retrospective cohort are shared via Eduuni in the same way as the prospective clinical data.
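The pseudonymized export described above can be illustrated with a minimal sketch. This is not the project's actual export procedure: the field names, the choice of a keyed hash (HMAC-SHA256) to derive a stable pseudonym, and the `pseudonymize` helper are all assumptions made for illustration, since the text does not describe the database schema or the pseudonymization method.

```python
import hashlib
import hmac

# Hypothetical field names; the real FileMaker Pro schema is not given in the text.
# Treating municipality as a direct identifier is also an assumption.
DIRECT_IDENTIFIERS = {"name", "social_security_number", "municipality"}


def pseudonymize(record: dict, secret_key: bytes) -> dict:
    """Return a copy of `record` without direct identifiers, plus a stable
    pseudonym derived from the social security number with a keyed hash."""
    digest = hmac.new(
        secret_key,
        record["social_security_number"].encode("utf-8"),
        hashlib.sha256,
    ).hexdigest()
    # Keep only the non-identifying fields (e.g. age, diagnoses, treatments).
    export = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # A truncated digest is used as the pseudonym; the length is arbitrary here.
    export["patient_id"] = digest[:16]
    return export
```

With the same secret key, repeated exports map a patient to the same `patient_id`, so periodic exports remain linkable, while the key itself would stay with the personnel who manage the clinical database.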
2.2.2 Types and formats of research data generated within the project
• Tissue, ascites and plasma samples from consented patients are obtained during routine operations and sequenced. We will obtain whole-genome sequencing (WGS), RNA-sequencing, DNA methylation sequencing, circulating tumour DNA (ctDNA), shallow sequencing (plasma samples only) and exome-sequencing (plasma samples only) data. During the project, we may include other data layers pending technological advances in sequencing technologies.
• WGS: FASTQ (raw read data). Downstream formats include BAM (mapped read data, processed read data as input for downstream analyses), VCF (variants), CSV/TSV (comma- or tab-separated tables for various types of data)
• RNA-seq: FASTQ (raw read data). Downstream anal...
Data Summary. 2.1. Purpose of data collection and relation to the project objectives The STAR4BBS Data Management Plan (DMP) aims to provide a strategy for managing key data generated and collected during the project and to optimize access to and re-use of research data. The DMP is intended to be a ‘living’ document that outlines how the STAR4BBS research data will be handled during and after the project; if needed, it will be reviewed and updated at regular intervals. The main purpose of the DMP is to ensure the accessibility and intelligibility of the data generated during the project. Each data set created during the project will be assessed and categorized as open, embargo or restricted by the owners of its content. All data sets, regardless of their categorization, will be stored in the databases of each participant entity and in the TUB Cloud, created as an internal database for the partners. In addition, those categorized as open or embargo will be publicly shared (in the case of embargo, after the embargo period is over) through the public section of the project website and ZENODO (xxxxx://xxxxxx.xxx/).
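The categorization rule above (open, embargo, restricted) can be written down as a small sketch. The class and function names below are hypothetical, and the interpretation that an embargoed data set becomes shareable on the day after its embargo end date is an assumption; the plan itself does not specify how embargo periods are recorded.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Access(Enum):
    OPEN = "open"
    EMBARGO = "embargo"
    RESTRICTED = "restricted"


@dataclass
class DataSet:
    name: str
    access: Access
    embargo_end: Optional[date] = None  # only meaningful for Access.EMBARGO


def is_publicly_shareable(ds: DataSet, today: date) -> bool:
    """Open data sets are always shareable; embargoed ones only after the
    embargo period is over; restricted ones never."""
    if ds.access is Access.OPEN:
        return True
    if ds.access is Access.EMBARGO:
        # An embargoed data set without a recorded end date stays private.
        return ds.embargo_end is not None and today > ds.embargo_end
    return False
```

Internal storage is unaffected by this check: in the plan, every data set is kept in the partners' databases and the TUB Cloud regardless of its category.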
2.2. Data Management Plan (DMP) guiding principles The Data Management Plan of STAR4BBS is implemented within Work Package 8, Project Management & Internal Project Communication. The STAR4BBS project data management plan follows the principles of the Open Access guideline summarized in the diagram below.
Data Summary. 2.1. State the purpose of the data collection/generation
1. Enhance grid observability when monitoring: improve knowledge on demand/generation profiles, power flow computation, etc.
2. Model demand and generation for forecasting purposes: train machine learning algorithms to forecast demand and generation at specific points of the grid.
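As an illustration of the forecasting purpose listed above, the following minimal sketch fits a one-step-ahead model to a demand series. It is a deliberately simple stand-in (an ordinary least-squares regression of each value on the previous one), not the machine learning approach the project would use; the function names and the lag-1 model are assumptions.

```python
from typing import Sequence, Tuple


def fit_lag1(series: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of series[t] = intercept + slope * series[t-1].

    Requires at least three values that are not all identical.
    """
    x = list(series[:-1])  # previous values
    y = list(series[1:])   # next values
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast_next(series: Sequence[float], intercept: float, slope: float) -> float:
    """Predict the next value from the last observed one."""
    return intercept + slope * series[-1]
```

In practice the inputs would be measured demand or generation profiles at a given grid point, and a realistic model would add further features such as time of day or weather.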
Data Summary. 2.1. State the purpose of the data collection/generation
2.2. Explain the relation to the objectives of the project
2.3. Specify the types and formats of data generated/collected
Data Summary. To provide an overview of the different datasets produced over the HECARRUS project life cycle, Table 2 presents the details of the data type, origin and format extension. Data types include numerical datasets, computer codes, text data, technical figures, contact lists, and survey and workshop data. Primary data correspond to the main output, which undergoes the already described confidentiality control before it is made publicly available.
Table 2. Information on the data types that will be used within the project.
Data Summary. In summary, we see that the existence of patterns in which a single conjunct controls (some) agreement processes considerably complicates the array of possible strategies for syntactic agreement with (nominal) coordinate structures. In addition to (18-1) and (18-2) we must accommodate a number of further patterns.
(18) 1. Agreement with resolved coordination-level features