Title: | OpenAccess TCGA Data on Terra as MultiAssayExperiment |
---|---|
Description: | Leverage the existing open access TCGA data on Terra with well-established Bioconductor infrastructure. Make use of the Terra data model without learning its complexities. With a few functions, you can copy / download and generate a MultiAssayExperiment from the TCGA example workspaces provided by Terra. |
Authors: | Marcel Ramos [aut, cre] |
Maintainer: | Marcel Ramos <[email protected]> |
License: | Artistic-2.0 |
Version: | 1.11.0 |
Built: | 2024-10-30 16:18:52 UTC |
Source: | https://github.com/waldronlab/terraTCGAdata |
Obtain assay datasets from Terra
getAssayData( assayName, sampleCode = "01", tablename = .DEFAULT_TABLENAME, workspace = terraTCGAworkspace(), namespace = .DEFAULT_NAMESPACE, metacols = .PARTICIPANT_METADATA_COLS, sampleIdx = TRUE )
assayName |
character() The name of the assay dataset column from
|
sampleCode |
character(1) The sample code used to filter samples
e.g., "01" for Primary Solid Tumors, see
|
tablename |
The Terra data model table from which to extract the clinical data (default: "sample") |
workspace |
character(1) The Terra Data Resources workspace from which
to pull TCGA data (default: see |
namespace |
character(1) The Terra Workspace Namespace that defaults to "broad-firecloud-tcga" and rarely needs to be changed. |
metacols |
The set of columns that make up the metadata.
See the |
sampleIdx |
numeric() index or TRUE. Specify an index for subsetting the
assay data. This argument is mainly used for example and vignette
purposes. To use all the data, use the default value (default: |
Either a matrix or RaggedExperiment depending on the assay selected
if ( AnVILGCP::gcloud_exists() && identical(AnVILBase::avplatform_namespace(), "AnVILGCP") && nzchar(AnVILGCP::avworkspace_name()) ) getAssayData( assayName = "protein_exp__mda_rppa_core__mdanderson_org__Level_3__protein_normalization__data", sampleCode = c("01", "10"), workspace = "TCGA_ACC_OpenAccess_V1-0_DATA" )
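The effect of sampleCode can be pictured with plain TCGA barcodes. The sketch below is an illustration only and is not the package's internal code: it assumes the standard TCGA barcode layout, in which characters 14 and 15 hold the two-digit sample type code. The barcodes are made up and keep_sample_codes() is a hypothetical helper.

```r
# Made-up barcodes for illustration; real ones come with the assay data.
barcodes <- c(
    "TCGA-OR-A5J1-01A-11D-A29H-01",
    "TCGA-OR-A5J1-10A-01D-A29K-01",
    "TCGA-OR-A5J2-11A-11D-A29H-01",
    "TCGA-OR-A5J3-01A-11D-A29H-01"
)

# Hypothetical helper: keep barcodes whose sample type code
# (characters 14-15 of a standard TCGA barcode) is in `codes`.
keep_sample_codes <- function(barcodes, codes = "01") {
    barcodes[substr(barcodes, 14, 15) %in% codes]
}

# Primary solid tumors only
primary <- keep_sample_codes(barcodes, "01")

# Primary solid tumors and blood derived normals
tumor_and_blood <- keep_sample_codes(barcodes, c("01", "10"))
```

Base R string functions keep the sketch independent of Terra access; the real function works on data pulled from the workspace.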
The column names in the output can be used in the getAssayData function.
getAssayTable( tablename = .DEFAULT_TABLENAME, metacols = .PARTICIPANT_METADATA_COLS, workspace = terraTCGAworkspace(), namespace = .DEFAULT_NAMESPACE )
tablename |
The Terra data model table from which to extract the clinical data (default: "sample") |
metacols |
The set of columns that make up the metadata.
See the |
workspace |
character(1) The Terra Data Resources workspace from which
to pull TCGA data (default: see |
namespace |
character(1) The Terra Workspace Namespace that defaults to "broad-firecloud-tcga" and rarely needs to be changed. |
A tibble of pointers to resources within the Terra data model
if ( AnVILGCP::gcloud_exists() && identical(AnVILBase::avplatform_namespace(), "AnVILGCP") && nzchar(AnVILGCP::avworkspace_name()) ) getAssayTable(workspace = "TCGA_COAD_OpenAccess_V1-0_DATA")
The participant table may contain curated demographic information, e.g., sex and age.
getClinical( columnName, participants = TRUE, tablename = .DEFAULT_TABLENAME, workspace = terraTCGAworkspace(), namespace = .DEFAULT_NAMESPACE, verbose = TRUE, metacols = .PARTICIPANT_METADATA_COLS, participantIds = NULL )
columnName |
The name of the column to extract files, see
|
participants |
logical(1) Whether to merge the participant table
from |
tablename |
The Terra data model table from which to extract the clinical data (default: "sample") |
workspace |
character(1) The Terra Data Resources workspace from which
to pull TCGA data (default: see |
namespace |
character(1) The Terra Workspace Namespace that defaults to "broad-firecloud-tcga" and rarely needs to be changed. |
verbose |
logical(1) Whether to output additional information regarding
the workspace and namespace (default: |
metacols |
The set of columns that make up the metadata.
See the |
participantIds |
character() TCGA participant identifiers usually in the
form of "TCGA-AB-1234". By default, all available participant identifiers
will be used. (default: |
A DataFrame with clinical information from TCGA. The metadata, i.e., metadata(object), includes the columnName used to obtain the data.
if ( AnVILGCP::gcloud_exists() && identical(AnVILBase::avplatform_namespace(), "AnVILGCP") && nzchar(AnVILGCP::avworkspace_name()) ) getClinical( workspace = "TCGA_ACC_OpenAccess_V1-0_DATA", participantIds = c("TCGA-OR-A5J1", "TCGA-OR-A5J2", "TCGA-OR-A5J3", "TCGA-OR-A5J4") )
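Values for participantIds follow the "TCGA-AB-1234" form. As a rough sketch, and assuming the standard TCGA barcode layout in which the first 12 characters identify the participant, such identifiers could be derived from longer sample barcodes and checked before being passed on. The barcodes and the regular expression below are illustrative assumptions, not part of the package.

```r
# Made-up sample barcodes for illustration.
barcodes <- c(
    "TCGA-OR-A5J1-01A-11D-A29H-01",
    "TCGA-OR-A5J1-10A-01D-A29K-01",
    "TCGA-OR-A5J2-01A-11D-A29H-01"
)

# The first 12 characters give the participant identifier.
participant_ids <- unique(substr(barcodes, 1, 12))

# Loose format check for the "TCGA-AB-1234" pattern (assumed pattern).
valid <- grepl("^TCGA-[A-Z0-9]{2}-[A-Z0-9]{4}$", participant_ids)
```

The resulting character vector has the shape expected by the participantIds argument.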
The column names in the output table can be used in the getClinical function.
getClinicalTable( tablename = .DEFAULT_TABLENAME, metacols = .PARTICIPANT_METADATA_COLS, workspace = terraTCGAworkspace(), namespace = .DEFAULT_NAMESPACE, verbose = TRUE )
tablename |
The Terra data model table from which to extract the clinical data (default: "sample") |
metacols |
The set of columns that make up the metadata.
See the |
workspace |
character(1) The Terra Data Resources workspace from which
to pull TCGA data (default: see |
namespace |
character(1) The Terra Workspace Namespace that defaults to "broad-firecloud-tcga" and rarely needs to be changed. |
verbose |
logical(1) Whether to output additional information regarding
the workspace and namespace (default: |
A tibble of Google Storage resource locations, e.g., gs://firecloud...
if ( AnVILGCP::gcloud_exists() && identical(AnVILBase::avplatform_namespace(), "AnVILGCP") && nzchar(AnVILGCP::avworkspace_name()) ) getClinicalTable( workspace = "TCGA_ACC_OpenAccess_V1-0_DATA" )
Import Terra TCGA data as a list
getTCGAdatalist( assayNames, sampleCode, workspace = terraTCGAworkspace(), namespace = .DEFAULT_NAMESPACE, tablename = .DEFAULT_TABLENAME, sampleIdx = TRUE, verbose = TRUE )
assayNames |
character() A vector of assays selected from the colnames
of |
sampleCode |
character(1) The sample code used to filter samples
e.g., "01" for Primary Solid Tumors, see
|
workspace |
character(1) The Terra Data Resources workspace from which
to pull TCGA data (default: see |
namespace |
character(1) The Terra Workspace Namespace that defaults to "broad-firecloud-tcga" and rarely needs to be changed. |
tablename |
The Terra data model table from which to extract the clinical data (default: "sample") |
sampleIdx |
numeric() index or TRUE. Specify an index for subsetting the
assay data. This argument is mainly used for example and vignette
purposes. To use all the data, use the default value (default: |
verbose |
logical(1L) Whether to output additional details about the data retrieval. |
A list of assay datasets
if ( AnVILGCP::gcloud_exists() && identical(AnVILBase::avplatform_namespace(), "AnVILGCP") && nzchar(AnVILGCP::avworkspace_name()) ) getTCGAdatalist( assayNames = c("protein_exp__mda_rppa_core__mdanderson_org__Level_3__protein_normalization__data", "snp__genome_wide_snp_6__broad_mit_edu__Level_3__segmented_scna_minus_germline_cnv_hg18__seg"), sampleCode = c("01", "10"), workspace = "TCGA_COAD_OpenAccess_V1-0_DATA" )
The function provides an overview of samples from the avtables("sample")
table for the current workspace. Along with the sample codes and frequencies,
the output provides a description for each code and the short letter codes.
sampleTypesTable( workspace = terraTCGAworkspace(), namespace = .DEFAULT_NAMESPACE, tablename = .DEFAULT_TABLENAME, verbose = TRUE )
workspace |
character(1) The Terra Data Resources workspace from which
to pull TCGA data (default: see |
namespace |
character(1) The Terra Workspace Namespace that defaults to "broad-firecloud-tcga" and rarely needs to be changed. |
tablename |
The Terra data model table from which to extract the clinical data (default: "sample") |
verbose |
logical(1) Whether to output additional information regarding
the workspace and namespace (default: |
A tibble of sample codes and frequencies along with their definitions and short letter codes
if ( AnVILGCP::gcloud_exists() && identical(AnVILBase::avplatform_namespace(), "AnVILGCP") && nzchar(AnVILGCP::avworkspace_name()) ) sampleTypesTable(workspace = "TCGA_COAD_OpenAccess_V1-0_DATA")
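To give a feel for the kind of overview described here, the following sketch tabulates sample codes from made-up barcodes and joins them to a small lookup of three well-known TCGA sample type codes. It is not the package's implementation: the column names (code, frequency, definition, short_code) are chosen for illustration, and the lookup is deliberately incomplete.

```r
# Partial lookup of TCGA sample type codes (three common entries only).
sample_types <- data.frame(
    code = c("01", "10", "11"),
    definition = c(
        "Primary Solid Tumor", "Blood Derived Normal", "Solid Tissue Normal"
    ),
    short_code = c("TP", "NB", "NT"),
    stringsAsFactors = FALSE
)

# Made-up barcodes; characters 14-15 hold the sample type code.
barcodes <- c(
    "TCGA-OR-A5J1-01A-11D-A29H-01",
    "TCGA-OR-A5J1-10A-01D-A29K-01",
    "TCGA-OR-A5J2-11A-11D-A29H-01",
    "TCGA-OR-A5J3-01A-11D-A29H-01"
)
codes <- substr(barcodes, 14, 15)

# Count each code, then attach its definition and short letter code.
freq <- as.data.frame(
    table(code = codes),
    responseName = "frequency",
    stringsAsFactors = FALSE
)
overview <- merge(freq, sample_types, by = "code", all.x = TRUE)
```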
Workspaces on Terra come pre-loaded with TCGA Data. The examples in the documentation correspond to the TCGA_COAD_OpenAccess_V1 workspace that can be found on app.terra.bio.
terraTCGAdata( clinicalName, assays, participants = TRUE, sampleCode = NULL, split = FALSE, workspace = terraTCGAworkspace(), namespace = .DEFAULT_NAMESPACE, tablename = .DEFAULT_TABLENAME, verbose = TRUE, sampleIdx = TRUE )
clinicalName |
character(1) The column name taken from
|
assays |
character() A character vector of assay names taken from
|
participants |
logical(1) Whether to merge the participant table
from |
sampleCode |
character() A character vector of sample codes from
|
split |
logical(1L) Whether or not to split the |
workspace |
character(1) The Terra Data Resources workspace from which
to pull TCGA data (default: see |
namespace |
character(1) The Terra Workspace Namespace that defaults to "broad-firecloud-tcga" and rarely needs to be changed. |
tablename |
The Terra data model table from which to extract the clinical data (default: "sample") |
verbose |
logical(1) Whether to output additional information regarding
the workspace and namespace (default: |
sampleIdx |
numeric() index or TRUE. Specify an index for subsetting the
assay data. This argument is mainly used for example and vignette
purposes. To use all the data, use the default value (default: |
A MultiAssayExperiment object whose assays correspond to the assays argument.
if ( AnVILGCP::gcloud_exists() && identical(AnVILBase::avplatform_namespace(), "AnVILGCP") ) terraTCGAdata( clinicalName = "clin__bio__nationwidechildrens_org__Level_1__biospecimen__clin", assays = c("protein_exp__mda_rppa_core__mdanderson_org__Level_3__protein_normalization__data", "rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data"), workspace = "TCGA_COAD_OpenAccess_V1-0_DATA", sampleCode = NULL, sampleIdx = 1:4, split = FALSE )
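The functions documented above can be chained: look up the available clinical and assay columns, confirm that the wanted names exist, then build the MultiAssayExperiment. The sketch below is one possible arrangement, not a prescribed workflow. It reuses the column names and workspace from the examples, adds requireNamespace() checks as an extra precaution, and assumes that the table column names can be read with names(). It does nothing unless it runs in a configured Terra session.

```r
ws <- "TCGA_COAD_OpenAccess_V1-0_DATA"
clin_name <- "clin__bio__nationwidechildrens_org__Level_1__biospecimen__clin"
assay_names <- c(
    "protein_exp__mda_rppa_core__mdanderson_org__Level_3__protein_normalization__data",
    "rnaseqv2__illuminahiseq_rnaseqv2__unc_edu__Level_3__RSEM_genes_normalized__data"
)

# Same guard as the package examples, plus checks that the packages exist.
can_run <- requireNamespace("terraTCGAdata", quietly = TRUE) &&
    requireNamespace("AnVILGCP", quietly = TRUE) &&
    requireNamespace("AnVILBase", quietly = TRUE) &&
    AnVILGCP::gcloud_exists() &&
    identical(AnVILBase::avplatform_namespace(), "AnVILGCP") &&
    nzchar(AnVILGCP::avworkspace_name())

mae <- NULL
if (can_run) {
    clin_tbl <- terraTCGAdata::getClinicalTable(workspace = ws)
    assay_tbl <- terraTCGAdata::getAssayTable(workspace = ws)

    # Fail early if a requested column is missing from the workspace tables.
    stopifnot(
        clin_name %in% names(clin_tbl),
        all(assay_names %in% names(assay_tbl))
    )

    mae <- terraTCGAdata::terraTCGAdata(
        clinicalName = clin_name,
        assays = assay_names,
        workspace = ws,
        sampleCode = NULL,
        sampleIdx = 1:4,  # small subset, as in the example
        split = FALSE
    )
}
```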
Terra allows access to about 71 open access TCGA datasets. A dataset workspace can be set using the terraTCGAworkspace function with a projectName input. Use the selectTCGAworkspace function to select a TCGA data workspace from an interactive table.
terraTCGAworkspace(projectName = getOption("terraTCGAdata.workspace", NULL))
selectTCGAworkspace( projectName = getOption("terraTCGAdata.workspace", NULL), verbose = FALSE, ... )
projectName |
character(1) A project code usually in the form of
|
verbose |
logical(1) Whether to provide more informative messages when the "terraTCGAdata.workspace" option is set. |
... |
further arguments passed down to lower level functions, not intended for the end user. |
Note that GDC workspaces are not supported and are excluded from the search results. GDC workspaces use a Terra workflow to download TCGA data rather than providing Google Bucket storage locations for easy data retrieval. To reset the terraTCGAworkspace, use terraTCGAworkspace(NULL) and you will be prompted to select from a list of TCGA workspaces. You may also check the current active workspace by running terraTCGAworkspace() without any inputs.
A Terra TCGA Workspace name
selectTCGAworkspace(): Function to interactively select from the available TCGA data workspaces in Terra. The 'projectName' argument and 'terraTCGAdata.workspace' option must be 'NULL' to enable the interactive gadget.
if ( AnVILGCP::gcloud_exists() && identical(AnVILBase::avplatform_namespace(), "AnVILGCP") && nzchar(AnVILGCP::avworkspace_name()) ) {
    selectTCGAworkspace()
    terraTCGAworkspace()
}
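Both functions take their default projectName from the "terraTCGAdata.workspace" option, as the usage shows. The sketch below only demonstrates that option lookup with base R and does not call the package. Setting the option by hand, and the workspace name used as its value, are assumptions taken from the examples; terraTCGAworkspace() is the documented way to set the workspace.

```r
# Set the option, keeping the previous value so it can be restored.
old <- options(terraTCGAdata.workspace = "TCGA_COAD_OpenAccess_V1-0_DATA")

# The same lookup that supplies the default for `projectName`.
current <- getOption("terraTCGAdata.workspace", NULL)

# Restore the previous setting (the option is removed if it was unset).
options(old)
```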