This document serves as a reporting tool for errors that occur when running our utility functions on the cBioPortal datasets.
cBioPortalData()
Typically, the number of errors encountered via the API is low. Only a handful of datasets fail when we apply the utility functions to produce a MultiAssayExperiment data representation.
First, we load the error dataset, which is stored as a JSON file:
# fromJSON() is provided by jsonlite
library(jsonlite)

api_errs <- system.file(
    "extdata", "api", "err_api_info.json",
    package = "cBioPortalData", mustWork = TRUE
)
err_api_info <- fromJSON(api_errs)
We can now inspect the contents of the data: its class, the number of distinct errors, and the number of studies affected by each error:
## [1] "list"
## [1] 6
## Barcodes must start with 'TCGA'
## 2
## group length is 0 but data length > 0
## 1
## Frequency of NA values higher than the cutoff tolerance
## 2
## Inconsistent build numbers found
## 33
## `n` must be a single number, not an integer `NA`.
## 1
## Argument 1 must be a data frame or a named atomic vector.
## 1
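The chunk that produced this summary is not shown here. The following is a minimal sketch of base R calls that give this kind of summary. It assumes err_api_info is a named list of character vectors (error message to cancer_study_ids), as returned by fromJSON(). The list err_info below is a hypothetical stand-in built from a few of the studies in this report, so the sketch runs on its own:

```r
# Hypothetical stand-in for err_api_info: a named list mapping each error
# message to the cancer_study_ids that raised it (small subset, illustration only)
err_info <- list(
    "Barcodes must start with 'TCGA'" =
        c("blca_msk_tcga_2020", "nsclc_tcga_broad_2016"),
    "group length is 0 but data length > 0" = "glioma_msk_2018",
    "Inconsistent build numbers found" =
        c("msk_ch_2020", "msk_access_2021", "skcm_tcga")
)

class(err_info)    # "list"
length(err_info)   # 3 distinct error messages

# lengths() counts the studies under each error; sort puts the most frequent first
error_counts <- sort(lengths(err_info), decreasing = TRUE)
error_counts
```

Passing err_api_info instead of err_info should print the counts for the full build report.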
There were six unique errors during the last build run:
## [1] "Barcodes must start with 'TCGA'"
## [2] "group length is 0 but data length > 0"
## [3] "Frequency of NA values higher than the cutoff tolerance"
## [4] "Inconsistent build numbers found"
## [5] "`n` must be a single number, not an integer `NA`."
## [6] "Argument 1 must be a data frame or a named atomic vector."
The most common error was "Inconsistent build numbers found", which affected 33 studies. It occurs when a study contains annotations from different build numbers that could not be resolved.
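Rather than reading the counts by eye, we can pick out the most frequent error programmatically. This sketch applies which.max() to the per-error counts and again uses a hypothetical stand-in list, err_info, shaped like err_api_info:

```r
# Hypothetical stand-in with the same shape as err_api_info
err_info <- list(
    "group length is 0 but data length > 0" = "glioma_msk_2018",
    "Inconsistent build numbers found" =
        c("msk_ch_2020", "msk_access_2021", "skcm_tcga"),
    "Frequency of NA values higher than the cutoff tolerance" =
        c("mixed_selpercatinib_2020", "ucec_ccr_msk_2022")
)

# which.max() returns the (named) index of the first maximum count;
# names() extracts the corresponding error message
most_common <- names(which.max(lengths(err_info)))
most_common
```

If two errors tie for the highest count, which.max() keeps the first one in list order.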
To see which datasets (cancer_study_ids) have that error, we can use:
## [1] "msk_ch_2020" "msk_access_2021"
## [3] "mixed_msk_tcga_2021" "mixed_impact_subset_2022"
## [5] "pan_origimed_2020" "prad_msk_stopsack_2021"
## [7] "pancan_pcawg_2020" "prad_pik3r1_msk_2021"
## [9] "skcm_tcga" "stad_tcga"
## [11] "stad_tcga_pub" "skcm_tcga_pan_can_atlas_2018"
## [13] "stad_tcga_pan_can_atlas_2018" "stes_tcga_pub"
## [15] "summit_2018" "cfdna_msk_2019"
## [17] "blca_bcan_hcrn_2022" "nsclc_ctdx_msk_2022"
## [19] "thyroid_mskcc_2016" "skcm_mskcc_2014"
## [21] "tmb_mskcc_2018" "rectal_msk_2019"
## [23] "skcm_tcga_pub_2015" "msk_spectrum_tme_2022"
## [25] "ucec_ccr_cfdna_msk_2022" "paired_bladder_2022"
## [27] "mtnn_msk_2022" "pog570_bcgsc_2020"
## [29] "sarcoma_msk_2023" "bowel_colitis_msk_2022"
## [31] "luad_mskcc_2023_met_organotropism" "coad_silu_2022"
## [33] "paac_msk_jco_2023"
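The call behind this output is not shown. One way to obtain such a vector is to subset the named list by its exact error message with [[. The sketch below uses a hypothetical stand-in list, err_info, and also shows a keyword search over the message names:

```r
# Hypothetical stand-in with the same shape as err_api_info
err_info <- list(
    "Inconsistent build numbers found" =
        c("msk_ch_2020", "msk_access_2021", "skcm_tcga"),
    "group length is 0 but data length > 0" = "glioma_msk_2018"
)

# [[ ]] with the exact message returns the character vector of study IDs;
# for a list it returns NULL when the message is absent
build_errs <- err_info[["Inconsistent build numbers found"]]
build_errs

# grepl() over the names finds errors by keyword instead of exact text
err_info[grepl("build", names(err_info), fixed = TRUE)]
```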
We can also print the entire dataset, which maps each error message to the studies that raised it:
## $`Barcodes must start with 'TCGA'`
## [1] "blca_msk_tcga_2020" "nsclc_tcga_broad_2016"
##
## $`group length is 0 but data length > 0`
## [1] "glioma_msk_2018"
##
## $`Frequency of NA values higher than the cutoff tolerance`
## [1] "mixed_selpercatinib_2020" "ucec_ccr_msk_2022"
##
## $`Inconsistent build numbers found`
## [1] "msk_ch_2020" "msk_access_2021"
## [3] "mixed_msk_tcga_2021" "mixed_impact_subset_2022"
## [5] "pan_origimed_2020" "prad_msk_stopsack_2021"
## [7] "pancan_pcawg_2020" "prad_pik3r1_msk_2021"
## [9] "skcm_tcga" "stad_tcga"
## [11] "stad_tcga_pub" "skcm_tcga_pan_can_atlas_2018"
## [13] "stad_tcga_pan_can_atlas_2018" "stes_tcga_pub"
## [15] "summit_2018" "cfdna_msk_2019"
## [17] "blca_bcan_hcrn_2022" "nsclc_ctdx_msk_2022"
## [19] "thyroid_mskcc_2016" "skcm_mskcc_2014"
## [21] "tmb_mskcc_2018" "rectal_msk_2019"
## [23] "skcm_tcga_pub_2015" "msk_spectrum_tme_2022"
## [25] "ucec_ccr_cfdna_msk_2022" "paired_bladder_2022"
## [27] "mtnn_msk_2022" "pog570_bcgsc_2020"
## [29] "sarcoma_msk_2023" "bowel_colitis_msk_2022"
## [31] "luad_mskcc_2023_met_organotropism" "coad_silu_2022"
## [33] "paac_msk_jco_2023"
##
## $``n` must be a single number, not an integer `NA`.`
## [1] "msk_met_2021"
##
## $`Argument 1 must be a data frame or a named atomic vector.`
## [1] "makeanimpact_ccr_2023"
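The nested list is convenient for looking up an error. To answer "which error did this study hit?", a long table is easier. The following is a sketch using base R's stack(), assuming a hypothetical stand-in list, err_info, with the same shape as err_api_info:

```r
# Hypothetical stand-in with the same shape as err_api_info
err_info <- list(
    "Barcodes must start with 'TCGA'" =
        c("blca_msk_tcga_2020", "nsclc_tcga_broad_2016"),
    "group length is 0 but data length > 0" = "glioma_msk_2018"
)

# stack() turns a named list of vectors into a two-column data frame:
# 'values' (the study ID) and 'ind' (the error message, as a factor)
err_df <- stack(err_info)
names(err_df) <- c("cancer_study_id", "error")
err_df

# Look up the error for a single study
err_df[err_df$cancer_study_id == "glioma_msk_2018", "error"]
```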
cBioDataPack()
Now let’s look at the errors in the packaged datasets that are used by cBioDataPack():
pack_errs <- system.file(
    "extdata", "pack", "err_pack_info.json",
    package = "cBioPortalData", mustWork = TRUE
)
err_pack_info <- fromJSON(pack_errs)
We can do the same for this data:
## [1] 5
## more columns than column names
## 12
## Frequency of NA values higher than the cutoff tolerance
## 5
## invalid class "ExperimentList" object: \n Non-unique names provided
## 2
## non-character argument
## 2
## 'wget' call had nonzero exit status
## 13
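To see how the failures are distributed, we can turn the counts into percentages. This is a small worked sketch that uses the per-error counts printed above, entered by hand as a named vector (pack_counts is an illustrative name, not part of cBioPortalData):

```r
# Per-error study counts for cBioDataPack(), copied from the output above
pack_counts <- c(
    "more columns than column names" = 12L,
    "Frequency of NA values higher than the cutoff tolerance" = 5L,
    "invalid class \"ExperimentList\" object: \n Non-unique names provided" = 2L,
    "non-character argument" = 2L,
    "'wget' call had nonzero exit status" = 13L
)

total <- sum(pack_counts)   # 34 failing study entries

# Share of failing entries per error, as a percentage rounded to 0.1
pct <- round(100 * pack_counts / total, 1)
sort(pct, decreasing = TRUE)
```

With these counts, the download failures ("'wget' call had nonzero exit status", 38.2%) and the "more columns than column names" parsing failures (35.3%) together account for roughly three quarters of the failing entries.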
We can get a list of all the errors present:
## [1] "more columns than column names"
## [2] "Frequency of NA values higher than the cutoff tolerance"
## [3] "invalid class \"ExperimentList\" object: \n Non-unique names provided"
## [4] "non-character argument"
## [5] "'wget' call had nonzero exit status"
Finally, the full list of errors and the studies affected by each:
## $`more columns than column names`
## [1] "ccrcc_utokyo_2013" "gbm_cptac_2021"
## [3] "luad_mskimpact_2021" "mbl_dkfz_2017"
## [5] "pan_origimed_2020" "sarcoma_msk_2022"
## [7] "bowel_colitis_msk_2022" "prad_msk_mdanderson_2023"
## [9] "brca_tcga_pan_can_atlas_2018" "coadread_tcga_pan_can_atlas_2018"
## [11] "ov_tcga_pan_can_atlas_2018" "sarc_tcga_pan_can_atlas_2018"
##
## $`Frequency of NA values higher than the cutoff tolerance`
## [1] "ihch_mskcc_2020" "mixed_selpercatinib_2020"
## [3] "ucec_ccr_msk_2022" "mixed_msk_tcga_2021"
## [5] "ihch_msk_2021"
##
## $`invalid class "ExperimentList" object: \n Non-unique names provided`
## [1] "mpnst_mskcc" "stad_tcga_pub"
##
## $`non-character argument`
## [1] "pcpg_tcga_pub" "mbn_mdacc_2013"
##
## $`'wget' call had nonzero exit status`
## [1] "braf_msk_impact_2024" "braf_msk_archer_2024" "prostate_msk_2024"
## [4] "pcnsl_msk_2024" "pdac_msk_2024" "ucs_msk_2024"
## [7] "asclc_msk_2024" "lms_msk_2024" "crc_orion_2024"
## [10] "brca_aurora_2023" "hcc_msk_2024" "pancreas_msk_2024"
## [13] "pancan_mimsi_msk_2024"
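Some studies appear in both reports, for example mixed_selpercatinib_2020 and ucec_ccr_msk_2022. The following is a sketch for listing the studies that fail in both the API and the packaged pipelines. It flattens each list to study IDs and intersects them. The lists api_info and pack_info are hypothetical stand-ins for err_api_info and err_pack_info, built from a subset of the studies shown above:

```r
# Hypothetical stand-ins for err_api_info and err_pack_info (subsets only)
api_info <- list(
    "Frequency of NA values higher than the cutoff tolerance" =
        c("mixed_selpercatinib_2020", "ucec_ccr_msk_2022"),
    "Inconsistent build numbers found" =
        c("pan_origimed_2020", "skcm_tcga")
)
pack_info <- list(
    "more columns than column names" =
        c("pan_origimed_2020", "gbm_cptac_2021"),
    "Frequency of NA values higher than the cutoff tolerance" =
        c("ihch_mskcc_2020", "mixed_selpercatinib_2020", "ucec_ccr_msk_2022")
)

# use.names = FALSE drops the error messages so only study IDs remain;
# intersect() keeps the order of its first argument
in_both <- intersect(
    unlist(api_info, use.names = FALSE),
    unlist(pack_info, use.names = FALSE)
)
in_both
```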
Session Info
## R version 4.4.2 (2024-10-31)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
## LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Etc/UTC
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] jsonlite_1.8.9 survminer_0.5.0
## [3] ggpubr_0.6.0 ggplot2_3.5.1
## [5] survival_3.8-3 cBioPortalData_2.19.7
## [7] MultiAssayExperiment_1.33.5 SummarizedExperiment_1.37.0
## [9] Biobase_2.67.0 GenomicRanges_1.59.1
## [11] GenomeInfoDb_1.43.2 IRanges_2.41.2
## [13] S4Vectors_0.45.2 BiocGenerics_0.53.3
## [15] generics_0.1.3 MatrixGenerics_1.19.0
## [17] matrixStats_1.5.0 AnVIL_1.19.4
## [19] AnVILBase_1.1.0 dplyr_1.1.4
## [21] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] sys_3.4.3 magrittr_2.0.3
## [3] GenomicFeatures_1.59.1 farver_2.1.2
## [5] rmarkdown_2.29 BiocIO_1.17.1
## [7] zlibbioc_1.53.0 vctrs_0.6.5
## [9] memoise_2.0.1 Rsamtools_2.23.1
## [11] RCurl_1.98-1.16 rstatix_0.7.2
## [13] htmltools_0.5.8.1 S4Arrays_1.7.1
## [15] BiocBaseUtils_1.9.0 lambda.r_1.2.4
## [17] curl_6.1.0 broom_1.0.7
## [19] Formula_1.2-5 SparseArray_1.7.2
## [21] sass_0.4.9 bslib_0.8.0
## [23] htmlwidgets_1.6.4 httr2_1.0.7
## [25] zoo_1.8-12 futile.options_1.0.1
## [27] cachem_1.1.0 commonmark_1.9.2
## [29] buildtools_1.0.0 GenomicAlignments_1.43.0
## [31] mime_0.12 lifecycle_1.0.4
## [33] pkgconfig_2.0.3 Matrix_1.7-1
## [35] R6_2.5.1 fastmap_1.2.0
## [37] GenomeInfoDbData_1.2.13 shiny_1.10.0
## [39] digest_0.6.37 colorspace_2.1-1
## [41] RaggedExperiment_1.31.1 AnnotationDbi_1.69.0
## [43] RSQLite_2.3.9 labeling_0.4.3
## [45] filelock_1.0.3 RTCGAToolbox_2.37.2
## [47] km.ci_0.5-6 RJSONIO_1.3-1.9
## [49] httr_1.4.7 abind_1.4-8
## [51] compiler_4.4.2 bit64_4.5.2
## [53] withr_3.0.2 backports_1.5.0
## [55] BiocParallel_1.41.0 carData_3.0-5
## [57] DBI_1.2.3 ggsignif_0.6.4
## [59] rappdirs_0.3.3 DelayedArray_0.33.3
## [61] rjson_0.2.23 tools_4.4.2
## [63] httpuv_1.6.15 glue_1.8.0
## [65] restfulr_0.0.15 promises_1.3.2
## [67] gridtext_0.1.5 grid_4.4.2
## [69] gtable_0.3.6 KMsurv_0.1-5
## [71] tzdb_0.4.0 tidyr_1.3.1
## [73] data.table_1.16.4 hms_1.1.3
## [75] car_3.1-3 xml2_1.3.6
## [77] utf8_1.2.4 XVector_0.47.2
## [79] markdown_1.13 pillar_1.10.1
## [81] stringr_1.5.1 later_1.4.1
## [83] splines_4.4.2 ggtext_0.1.2
## [85] BiocFileCache_2.15.0 lattice_0.22-6
## [87] rtracklayer_1.67.0 bit_4.5.0.1
## [89] tidyselect_1.2.1 maketools_1.3.1
## [91] Biostrings_2.75.3 miniUI_0.1.1.1
## [93] knitr_1.49 gridExtra_2.3
## [95] futile.logger_1.4.3 xfun_0.50
## [97] DT_0.33 stringi_1.8.4
## [99] UCSC.utils_1.3.0 yaml_2.3.10
## [101] evaluate_1.0.1 codetools_0.2-20
## [103] tibble_3.2.1 BiocManager_1.30.25
## [105] cli_3.6.3 xtable_1.8-4
## [107] munsell_0.5.1 jquerylib_0.1.4
## [109] survMisc_0.5.6 Rcpp_1.0.13-1
## [111] GenomicDataCommons_1.31.0 dbplyr_2.5.0
## [113] png_0.1-8 XML_3.99-0.18
## [115] rapiclient_0.1.8 parallel_4.4.2
## [117] TCGAutils_1.27.6 readr_2.1.5
## [119] blob_1.2.4 bitops_1.0-9
## [121] scales_1.3.0 purrr_1.0.2
## [123] crayon_1.5.3 rlang_1.1.4
## [125] KEGGREST_1.47.0 rvest_1.0.4
## [127] formatR_1.14