The aim of the EU Child Cohort Network is to bring together data from existing child cohorts into one open, sustainable, multi-disciplinary network. To ensure that the EU Child Cohort Network is both open and sustainable, the consortia are using the data-sharing platform DataSHIELD.
Participating cohorts must first harmonise their data following the data harmonisation manuals, then perform quality-control checks on the harmonised data, upload descriptions of the harmonisation to the cohort catalogue, and finally upload the data into the DataSHIELD backends. The guide below walks you through this process.
Reference documentation:
You can install the package by executing the following commands:
dsUpload version 5.x.x is compatible with Armadillo 3. You should be able to install dsUpload without specifying any additional versions.
Step 1: install devtools
install.packages("devtools")
Step 2: load devtools and install ds-upload
library(devtools)
devtools::install_github("lifecycle-project/ds-upload")
dsUpload version 4.7.x is compatible with Armadillo 2. When installing this version of dsUpload, the install.packages command might install the newest (incompatible) version of MolgenisArmadillo. Run these commands (in RStudio) to install the correct version of MolgenisArmadillo:
Install devtools:
install.packages("devtools")
Load devtools and install ds-upload 4.7.1
library(devtools)
devtools::install_github("lifecycle-project/ds-upload@4.7.1")
You might get the following error message:
namespace ‘MolgenisArmadillo’ 2.0.0 is being loaded, but == 1.1.3 is required
To fix this you need to remove the incompatible version of MolgenisArmadillo:
unloadNamespace("MolgenisArmadillo")
remove.packages("MolgenisArmadillo")
You might have to install these additional packages:
Next we install the previous version of MolgenisArmadillo, 1.1.3:
packageurl <- "https://cran.r-project.org/src/contrib/Archive/MolgenisArmadillo/MolgenisArmadillo_1.1.3.tar.gz"
install.packages(packageurl, repos=NULL, type="source")
Now we (again) install dsUpload 4.7.1:
devtools::install_github("lifecycle-project/ds-upload@4.7.1")
Make sure you do NOT update MolgenisArmadillo to any version other than 1.1.3; if prompted, select option 3:
Downloading GitHub repo lifecycle-project/ds-upload@4.7.1
These packages have more recent versions available.
It is recommended to update all of them.
Which would you like to update?
1: All
2: CRAN packages only
3: None
4: MolgenisA... (1.1.3 -> 2.0.0) [CRAN]
Enter one or more numbers, or an empty line to skip updates: 3
After that you should be able to load dsUpload without any problems. If you are asked to update MolgenisArmadillo to version 2.0.x, please skip the update; dsUpload 4.7.x needs MolgenisArmadillo 1.1.3 to work with Armadillo 2.
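To confirm that the pinned versions survived the install, you can check them with base R's packageVersion():

```r
# Verify the installed versions match the compatibility notes above
packageVersion("dsUpload")           # should be 4.7.x
packageVersion("MolgenisArmadillo")  # should be 1.1.3
```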
Check: troubleshooting guide
To simplify uploading and importing data dictionaries, this package imports and uploads the data dictionaries and data in one run. When running the package, you need to specify the data dictionary version and the data input file. When you use data formats other than CSV, you need to specify the data format as well.
Prerequisites
Upload core variables
Merge all the variables that appear in the dictionary of the core variables. In general this means merging the data of WP1 and WP3 into one set.
Upload outcome variables
Merge all the variables that appear in the dictionary of the outcome variables. In general this means merging the data of WP4, WP5 and WP6 into one set.
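A minimal sketch of such a merge in base R, assuming each work-package export is a CSV file sharing a common child identifier (the file names and the child_id column are hypothetical; adapt them to your cohort's exports):

```r
# Hypothetical file names and key column - adapt to your cohort
wp4 <- read.csv("wp4_outcomes.csv")
wp5 <- read.csv("wp5_outcomes.csv")
wp6 <- read.csv("wp6_outcomes.csv")

# Full outer joins keep children that appear in only some work packages
outcome <- merge(wp4, wp5, by = "child_id", all = TRUE)
outcome <- merge(outcome, wp6, by = "child_id", all = TRUE)

write.csv(outcome, "outcome_variables.csv", row.names = FALSE)
```

The same pattern applies to the core variables, merging the WP1 and WP3 exports into one set.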
Please follow the instructions below to upload the core and outcome variables into Armadillo.
# Armadillo 3.x: only the server URL is needed
login_data <- data.frame(
server = "https://armadillo3.test.molgenis.org",
driver = "ArmadilloDriver")

# Armadillo 2.x: a separate MinIO storage URL is required as well
login_data <- data.frame(
server = "https://armadillo.test.molgenis.org",
storage = "https://armadillo-minio.test.molgenis.org",
driver = "ArmadilloDriver")
# login to the Armadillo server
du.login(login_data = login_data)
#> ***********************************************************************************
#> [WARNING] You are not running the latest version of the dsUpload-package.
#> [WARNING] If you want to upgrade to newest version : [ 4.0.6 ],
#> [WARNING] Please run 'install.packages("dsUpload", repos = "https://registry.molgenis.org/repository/R/")'
#> [WARNING] Check the release notes here: https://github.com/lifecycle-project/analysis-protocols/releases/tag/4.0.6
#> ***********************************************************************************
#> Login to: "https://armadillo.test.molgenis.org"
#> [1] "We're opening a browser so you can log in with code GFL6Q6"
#> Logged on to: "https://armadillo.test.molgenis.org"
# upload the data into the DataSHIELD backend
# these are the core variables
# be advised the default input format is 'CSV'
# you can use STATA, SPSS, SAS, CSV's or R as source files
du.upload(
cohort_id = 'gecko',
dict_version = '2_1',
dict_kind = 'core',
data_version = '1_0',
data_input_format = 'CSV',
data_input_path = 'https://github.com/lifecycle-project/ds-upload/blob/master/inst/examples/data/WP1/data/all_measurements_v1_2.csv?raw=true',
run_mode = "non_interactive"
)
#> ***********************************************************************************
#> [WARNING] You are not running the latest version of the dsUpload-package.
#> [WARNING] If you want to upgrade to newest version : [ 4.0.6 ],
#> [WARNING] Please run 'install.packages("dsUpload", repos = "https://registry.molgenis.org/repository/R/")'
#> [WARNING] Check the release notes here: https://github.com/lifecycle-project/analysis-protocols/releases/tag/4.0.6
#> ***********************************************************************************
#> ######################################################
#> Start upload data into DataSHIELD backend
#> ------------------------------------------------------
#> * Create temporary workdir
#> ######################################################
#> Start download dictionaries
#> ------------------------------------------------------
#> * Download: [ 2_1_monthly_rep.xlsx ]
#> * Download: [ 2_1_non_rep.xlsx ]
#> * Download: [ 2_1_trimester_rep.xlsx ]
#> * Download: [ 2_1_yearly_rep.xlsx ]
#> Successfully downloaded dictionaries
#> ######################################################
#> Start importing data dictionaries
#> ######################################################
#> * Check released dictionaries
#> * Project : gecko already exists
#> ######################################################
#> Start converting and uploading data
#> ######################################################
#> * Setup: load data and set output directory
#> ------------------------------------------------------
#> [WARNING] This is an unmatched column, it will be dropped : [ art ].
#> * Generating: non-repeated measures
#> * Generating: yearly-repeated measures
#> Aggregate function missing, defaulting to 'length'
#> * Generating: monthly-repeated measures
#> Aggregate function missing, defaulting to 'length'
#> * Generating: trimesterly-repeated measures
#> Aggregate function missing, defaulting to 'length'
#> * Start importing: 2_1_core_1_0 into project: gecko
#> Compressing...
#>
|
| | 0%
|
|========================================================================| 100%
#> Uploaded 2_1_core_1_0/trimester
#> * Import finished successfully
#> * Start importing: 2_1_core_1_0 into project: gecko
#> Compressing...
#>
|
| | 0%
|
|========== | 15%
|
|===================== | 29%
|
|=============================== | 44%
|
|========================================== | 58%
|
|==================================================== | 73%
|
|=============================================================== | 87%
|
|========================================================================| 100%
#> Uploaded 2_1_core_1_0/non_rep
#> * Import finished successfully
#> * Start importing: 2_1_core_1_0 into project: gecko
#> Compressing...
#>
|
| | 0%
|
|========================================================================| 100%
#> Uploaded 2_1_core_1_0/yearly_rep
#> * Import finished successfully
#> * Start importing: 2_1_core_1_0 into project: gecko
#> Compressing...
#>
|
| | 0%
|
|========================================================================| 100%
#> Uploaded 2_1_core_1_0/monthly_rep
#> * Import finished successfully
#> ######################################################
#> Converting and import successfully finished
#> ######################################################
#> * Reinstate default working directory
#> * Cleanup temporary directory
# upload the outcome variables
du.upload(
cohort_id = 'gecko',
dict_version = '1_1',
dict_kind = 'outcome',
data_version = '1_0',
data_input_format = 'CSV',
data_input_path = 'https://github.com/lifecycle-project/ds-upload/blob/master/inst/examples/data/WP6/nd_data_wp6.csv?raw=true',
run_mode = "non_interactive"
)
#> ***********************************************************************************
#> [WARNING] You are not running the latest version of the dsUpload-package.
#> [WARNING] If you want to upgrade to newest version : [ 4.0.6 ],
#> [WARNING] Please run 'install.packages("dsUpload", repos = "https://registry.molgenis.org/repository/R/")'
#> [WARNING] Check the release notes here: https://github.com/lifecycle-project/analysis-protocols/releases/tag/4.0.6
#> ***********************************************************************************
#> ######################################################
#> Start upload data into DataSHIELD backend
#> ------------------------------------------------------
#> * Create temporary workdir
#> ######################################################
#> Start download dictionaries
#> ------------------------------------------------------
#> * Download: [ 1_1_monthly_rep.xlsx ]
#> * Download: [ 1_1_non_rep.xlsx ]
#> * Download: [ 1_1_weekly_rep.xlsx ]
#> * Download: [ 1_1_yearly_rep.xlsx ]
#> Successfully downloaded dictionaries
#> ######################################################
#> Start importing data dictionaries
#> ######################################################
#> * Check released dictionaries
#> * Project : gecko already exists
#> ######################################################
#> Start converting and uploading data
#> ######################################################
#> * Setup: load data and set output directory
#> ------------------------------------------------------
#> * Generating: non-repeated measures
#> * Generating: yearly-repeated measures
#> * Generating: monthly-repeated measures
#> * Generating: weekly-repeated measures
#> * Start importing: 1_1_outcome_1_0 into project: gecko
#> Compressing...
#>
|
| | 0%
|
|========================================================================| 100%
#> Uploaded 1_1_outcome_1_0/weekly_rep
#> * Import finished successfully
#> * Start importing: 1_1_outcome_1_0 into project: gecko
#> Compressing...
#>
|
| | 0%
|
|========================================================================| 100%
#> Uploaded 1_1_outcome_1_0/non_rep
#> * Import finished successfully
#> * Start importing: 1_1_outcome_1_0 into project: gecko
#> Compressing...
#>
|
| | 0%
|
|====== | 8%
|
|=========== | 16%
|
|================= | 24%
|
|======================= | 31%
|
|============================ | 39%
|
|================================== | 47%
|
|======================================== | 55%
|
|============================================= | 63%
|
|=================================================== | 71%
|
|========================================================= | 79%
|
|============================================================== | 86%
|
|==================================================================== | 94%
|
|========================================================================| 100%
#> Uploaded 1_1_outcome_1_0/yearly_rep
#> * Import finished successfully
#> * Start importing: 1_1_outcome_1_0 into project: gecko
#> Compressing...
#>
|
| | 0%
|
|========================================================================| 100%
#> Uploaded 1_1_outcome_1_0/monthly_rep
#> * Import finished successfully
#> ######################################################
#> Converting and import successfully finished
#> ######################################################
#> * Reinstate default working directory
#> * Cleanup temporary directory
A video guiding you through the process can be found on the YouTube channel: upload data dictionaries and data into Opal
Alternatively, execute these commands in your R-console:
login_data <- data.frame(
server = "https://opal.edge.molgenis.org",
user = "administrator",
password = "ouf0uPh6",
driver = "OpalDriver")
# login to the DataSHIELD backend
du.login(login_data = login_data)
#> ***********************************************************************************
#> [WARNING] You are not running the latest version of the dsUpload-package.
#> [WARNING] If you want to upgrade to newest version : [ 4.0.6 ],
#> [WARNING] Please run 'install.packages("dsUpload", repos = "https://registry.molgenis.org/repository/R/")'
#> [WARNING] Check the release notes here: https://github.com/lifecycle-project/analysis-protocols/releases/tag/4.0.6
#> ***********************************************************************************
#> Login to: "https://opal.edge.molgenis.org"
#> Logged on to: "https://opal.edge.molgenis.org"
# upload the data into the DataSHIELD backend
# these are the core variables
# be advised the default input format is 'CSV'
# you can use STATA, SPSS, SAS and CSV's as source files
du.upload(
cohort_id = 'gecko',
dict_version = '2_1',
dict_kind = 'core',
data_version = '1_0',
data_input_format = 'CSV',
data_input_path = 'https://github.com/lifecycle-project/ds-upload/blob/master/inst/examples/data/WP1/data/all_measurements_v1_2.csv?raw=true',
run_mode = "non_interactive"
)
#> ***********************************************************************************
#> [WARNING] You are not running the latest version of the dsUpload-package.
#> [WARNING] If you want to upgrade to newest version : [ 4.0.6 ],
#> [WARNING] Please run 'install.packages("dsUpload", repos = "https://registry.molgenis.org/repository/R/")'
#> [WARNING] Check the release notes here: https://github.com/lifecycle-project/analysis-protocols/releases/tag/4.0.6
#> ***********************************************************************************
#> ######################################################
#> Start upload data into DataSHIELD backend
#> ------------------------------------------------------
#> * Create temporary workdir
#> ######################################################
#> Start download dictionaries
#> ------------------------------------------------------
#> * Download: [ 2_1_monthly_rep.xlsx ]
#> * Download: [ 2_1_non_rep.xlsx ]
#> * Download: [ 2_1_trimester_rep.xlsx ]
#> * Download: [ 2_1_yearly_rep.xlsx ]
#> Successfully downloaded dictionaries
#> ######################################################
#> Start importing data dictionaries
#> ######################################################
#> * Check released dictionaries
#> ------------------------------------------------------
#> Start creating project: [ lc_gecko_core_2_1 ]
#> * Project: [ lc_gecko_core_2_1 ] already exists
#> ------------------------------------------------------
#> Start importing dictionaries
#> * Table: [ 1_0_monthly_rep ] already exists
#> * Import variables into: [ 1_0_monthly_rep ]
#> * Table: [ 1_0_non_rep ] already exists
#> * Matched categories for table: [ 1_0_non_rep ]
#> * Import variables into: [ 1_0_non_rep ]
#> * Table: [ 1_0_trimester_rep ] already exists
#> * Matched categories for table: [ 1_0_trimester_rep ]
#> * Import variables into: [ 1_0_trimester_rep ]
#> * Table: [ 1_0_yearly_rep ] already exists
#> * Matched categories for table: [ 1_0_yearly_rep ]
#> * Import variables into: [ 1_0_yearly_rep ]
#> All dictionaries are populated correctly
#> ######################################################
#> Start converting and uploading data
#> ######################################################
#> * Setup: load data and set output directory
#> ------------------------------------------------------
#> [WARNING] This is an unmatched column, it will be dropped : [ art ].
#> * Generating: non-repeated measures
#> * Generating: yearly-repeated measures
#> Aggregate function missing, defaulting to 'length'
#> * Generating: monthly-repeated measures
#> Aggregate function missing, defaulting to 'length'
#> * Generating: trimesterly-repeated measures
#> Aggregate function missing, defaulting to 'length'
#> * Upload: [ 2021-01-29_11-40-58_1_0_trimester_repeated_measures.csv ] to directory [ core ]
#> * Upload: [ 2021-01-29_11-40-58_1_0_non_repeated_measures.csv ] to directory [ core ]
#> * Upload: [ 2021-01-29_11-40-58_1_0_yearly_repeated_measures.csv ] to directory [ core ]
#> * Upload: [ 2021-01-29_11-40-58_1_0_monthly_repeated_measures.csv ] to directory [ core ]
#> ######################################################
#> Converting and import successfully finished
#> ######################################################
#> * Reinstate default working directory
#> * Cleanup temporary directory
# upload the outcome variables
du.upload(
cohort_id = 'gecko',
dict_version = '1_1',
dict_kind = 'outcome',
data_version = '1_0',
data_input_format = 'CSV',
data_input_path = 'https://github.com/lifecycle-project/ds-upload/blob/master/inst/examples/data/WP6/nd_data_wp6.csv?raw=true',
run_mode = "non_interactive"
)
#> ***********************************************************************************
#> [WARNING] You are not running the latest version of the dsUpload-package.
#> [WARNING] If you want to upgrade to newest version : [ 4.0.6 ],
#> [WARNING] Please run 'install.packages("dsUpload", repos = "https://registry.molgenis.org/repository/R/")'
#> [WARNING] Check the release notes here: https://github.com/lifecycle-project/analysis-protocols/releases/tag/4.0.6
#> ***********************************************************************************
#> ######################################################
#> Start upload data into DataSHIELD backend
#> ------------------------------------------------------
#> * Create temporary workdir
#> ######################################################
#> Start download dictionaries
#> ------------------------------------------------------
#> * Download: [ 1_1_monthly_rep.xlsx ]
#> * Download: [ 1_1_non_rep.xlsx ]
#> * Download: [ 1_1_weekly_rep.xlsx ]
#> * Download: [ 1_1_yearly_rep.xlsx ]
#> Successfully downloaded dictionaries
#> ######################################################
#> Start importing data dictionaries
#> ######################################################
#> * Check released dictionaries
#> ------------------------------------------------------
#> Start creating project: [ lc_gecko_outcome_1_1 ]
#> * Project: [ lc_gecko_outcome_1_1 ] already exists
#> ------------------------------------------------------
#> Start importing dictionaries
#> * Table: [ 1_0_monthly_rep ] already exists
#> * Matched categories for table: [ 1_0_monthly_rep ]
#> * Import variables into: [ 1_0_monthly_rep ]
#> * Table: [ 1_0_non_rep ] already exists
#> * Matched categories for table: [ 1_0_non_rep ]
#> * Import variables into: [ 1_0_non_rep ]
#> * Table: [ 1_0_weekly_rep ] already exists
#> * Import variables into: [ 1_0_weekly_rep ]
#> * Table: [ 1_0_yearly_rep ] already exists
#> * Matched categories for table: [ 1_0_yearly_rep ]
#> * Import variables into: [ 1_0_yearly_rep ]
#> All dictionaries are populated correctly
#> ######################################################
#> Start converting and uploading data
#> ######################################################
#> * Setup: load data and set output directory
#> ------------------------------------------------------
#> * Generating: non-repeated measures
#> * Generating: yearly-repeated measures
#> * Generating: monthly-repeated measures
#> * Generating: weekly-repeated measures
#> * Upload: [ 2021-01-29_11-41-21_1_0_weekly_repeated_measures.csv ] to directory [ outcome ]
#> * Upload: [ 2021-01-29_11-41-21_1_0_non_repeated_measures.csv ] to directory [ outcome ]
#> * Upload: [ 2021-01-29_11-41-21_1_0_yearly_repeated_measures.csv ] to directory [ outcome ]
#> * Upload: [ 2021-01-29_11-41-21_1_0_monthly_repeated_measures.csv ] to directory [ outcome ]
#> ######################################################
#> Converting and import successfully finished
#> ######################################################
#> * Reinstate default working directory
#> * Cleanup temporary directory
IMPORTANT: You can run this package for the core variables and for the outcome variables. Each of them requires changing some parameters in the function call: dict_kind specifies 'core' or 'outcome' variables and dict_version specifies the data dictionary version (check the changelogs here: https://github.com/lifecycle-project/ds-dictionaries/tree/master/changelogs).
IMPORTANT: You can specify your upload format, so you do not have to export to CSV first. Supported upload formats are: 'SPSS', 'SAS', 'STATA' and 'CSV'.
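For example, a core-variables upload straight from a STATA file might look like this (the file path is hypothetical; the other parameters mirror the calls above):

```r
# Hypothetical example: upload a STATA export directly, no CSV step needed
du.upload(
  cohort_id = 'gecko',
  dict_version = '2_1',
  dict_kind = 'core',
  data_version = '1_0',
  data_input_format = 'STATA',                     # instead of the default 'CSV'
  data_input_path = 'path/to/core_variables.dta',  # hypothetical local path
  run_mode = "non_interactive"
)
```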
If you run these commands, your data will be uploaded to the DataSHIELD backend. If you use Opal, you can now import these data into the tables manually.
A video guiding you through the process can be found here: import data into Opal
Alternatively, execute these actions for Opal:
IMPORTANT: make sure no NEW variables are introduced.
11. Click on “Finish”.
12. Check the “Task logs” (on the left side of the screen, in the icon bar).
Opal will match the upload against your data dictionary and report which variables matched and which did not. You can re-upload the source files as often as needed.