Read a GSE SOFT file (contains platform annotation)
read_gse_soft(soft_file, verbose = F)
soft_file: string. Path to the SOFT file.
tibble or NULL. The first variable is ID_REF (probe ID) and the second is the HUGO gene symbol.
For now, probes that haven't been mapped to a symbol are dropped (a probe mapped to multiple symbols is okay).
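The filtering rule described here can be illustrated with a minimal base R sketch. It is not the package's implementation. The `platform` table and its values are hypothetical, the column names only mirror the GPL17590 output in the examples, and treating both `NA` and the empty string as "unmapped", as well as the ` /// ` separator for multiple symbols, are assumptions.

```r
# Hypothetical platform annotation table (illustration only, not rGEO internals).
platform <- data.frame(
  ID = c("1", "100", "101", "1007_s_at"),
  GENE_SYMBOL = c(NA, "NDC80", "", "DDR1 /// MIR4640"),
  stringsAsFactors = FALSE
)

# Assumption: a probe counts as unmapped when its symbol is NA or empty.
has_symbol <- !is.na(platform$GENE_SYMBOL) & nzchar(platform$GENE_SYMBOL)

# Keep mapped probes only; a probe carrying several symbols is kept as-is.
probe2symbol <- data.frame(
  ID_REF = platform$ID[has_symbol],
  symbol = platform$GENE_SYMBOL[has_symbol],
  stringsAsFactors = FALSE
)

print(probe2symbol)
```

In this sketch the first and third probes are dropped, while the probe with two symbols survives unchanged.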
Other read raw data: parse_gse_soft(), read_gse_matrix()
read_gse_soft(system.file('extdata/GSE19161_family.soft.gz', package = 'rGEO'))
#> # A tibble: 588 × 2
#> ID_REF symbol
#> <chr> <chr>
#> 1 121_at PAX8
#> 2 200003_s_at RPL28
#> 3 200004_at EIF4G2
#> 4 200006_at PARK7
#> 5 200009_at GDI2
#> 6 200010_at RPL11
#> 7 200016_x_at HNRNPA1
#> 8 200017_at RPS27A
#> 9 200018_at RPS13
#> 10 200019_s_at FAU
#> # … with 578 more rows
read_gse_soft(system.file('extdata/GSE51280_family.soft.gz', package = 'rGEO'), verbose = TRUE)
#>
#> platform meta:
#> [1] "#ID = "
#> [2] "#GENE_SYMBOL = Gene Symbol"
#> [3] "#GB_ACC = Genebank accession number LINK_PRE:\"http://www.ncbi.nlm.nih.gov/nuccore/?term=\""
#> [4] "#Class Name = Gene Class"
#> [5] "#SPOT_ID = "
#>
#>
#>
#> platform:
#>
#> $accession
#> [1] "GPL17590"
#>
#> $info
#> # A tibble: 5 × 2
#> name description
#> <chr> <chr>
#> 1 ID ""
#> 2 GENE_SYMBOL "Gene Symbol"
#> 3 GB_ACC "Genebank accession number LINK_PRE:\"http://www.ncbi.nlm.nih.gov…
#> 4 Class Name "Gene Class"
#> 5 SPOT_ID ""
#>
#> $table
#> # A tibble: 142 × 5
#> ID GENE_SYMBOL GB_ACC `Class Name` SPOT_ID
#> <chr> <chr> <chr> <chr> <chr>
#> 1 1 POS_A(128) NA Positive CONTROL
#> 2 2 POS_B(32) NA Positive CONTROL
#> 3 3 POS_C(8) NA Positive CONTROL
#> 4 4 POS_D(2) NA Positive CONTROL
#> 5 5 POS_E(0.5) NA Positive CONTROL
#> 6 6 POS_F(0.125) NA Positive CONTROL
#> 7 7 NEG_A(0) NA Negative CONTROL
#> 8 8 NEG_B(0) NA Negative CONTROL
#> 9 9 NEG_C(0) NA Negative CONTROL
#> 10 10 NEG_D(0) NA Negative CONTROL
#> # … with 132 more rows
#>
#>
#>
#> GPL17590: use "GENE_SYMBOL" as symbol
#>
#>
#>
#> # A tibble: 121 × 2
#> ID_REF symbol
#> <chr> <chr>
#> 1 100 NDC80
#> 2 101 NDRG1
#> 3 102 NEUROG1
#> 4 103 NT5E
#> 5 104 NUF2
#> 6 106 PGR
#> 7 107 PHGDH
#> 8 109 PIK3CA
#> 9 110 PLOD1
#> 10 111 PNP
#> # … with 111 more rows