read GSE SOFT file (contains platform annotation)

read_gse_soft(soft_file, verbose = FALSE)

Arguments

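soft_file

string. path to the SOFT file

verbose

logical. if TRUE, also print the platform metadata, the platform table, and the column chosen as symbol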

Value

tibble or NULL. the first variable is ID_REF (probe ID) and the second is symbol (HUGO gene symbol)
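
Because the return value may be NULL, calling code may want to guard for that case before touching the columns. The following is a minimal sketch only: `symbols_or_empty()` is a hypothetical helper, not part of rGEO, and `mapping` stands in for whatever read_gse_soft() returned.

```r
# Hypothetical helper (not part of rGEO): collect the distinct gene symbols
# from a read_gse_soft() result, tolerating a NULL return value.
symbols_or_empty <- function(mapping) {
  if (is.null(mapping)) {
    # no mapping available, so return an empty character vector
    return(character(0))
  }
  unique(mapping$symbol)
}

# Stand-in for a real result; values copied from the first example below.
mapping <- data.frame(
  ID_REF = c("121_at", "200003_s_at"),
  symbol = c("PAX8", "RPL28"),
  stringsAsFactors = FALSE
)

symbols_or_empty(mapping)
symbols_or_empty(NULL)
```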

Details

for now, we drop probes which haven't been mapped to a symbol (a probe mapped to multiple symbols is okay)
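
The dropping rule described above can be sketched in base R as follows. This is an illustration under stated assumptions, not rGEO's actual implementation: the `platform` table is made up, "unmapped" is assumed to mean an NA or empty symbol, and the ` /// ` separator for multiple symbols is an assumption. rGEO's real rules may be stricter.

```r
# Made-up platform table; the column names follow the GPL17590 example below.
platform <- data.frame(
  ID = c("1", "100", "101", "102"),
  GENE_SYMBOL = c(NA, "NDC80", "", "NEUROG1 /// NEUROD1"),
  stringsAsFactors = FALSE
)

# Assumption: a probe counts as unmapped when its symbol is NA or blank.
has_symbol <- !is.na(platform$GENE_SYMBOL) &
  nzchar(trimws(platform$GENE_SYMBOL))

# Keep mapped probes only; a probe listing several symbols is retained as is.
mapped <- data.frame(
  ID_REF = platform$ID[has_symbol],
  symbol = platform$GENE_SYMBOL[has_symbol],
  stringsAsFactors = FALSE
)

mapped
```

Here probes "1" (NA) and "101" (empty string) are dropped, while "102" is kept even though it lists two symbols.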

See also

Other read raw data: parse_gse_soft(), read_gse_matrix()

Examples

read_gse_soft(system.file('extdata/GSE19161_family.soft.gz', package = 'rGEO'))
#> # A tibble: 588 × 2
#>    ID_REF      symbol 
#>    <chr>       <chr>  
#>  1 121_at      PAX8   
#>  2 200003_s_at RPL28  
#>  3 200004_at   EIF4G2 
#>  4 200006_at   PARK7  
#>  5 200009_at   GDI2   
#>  6 200010_at   RPL11  
#>  7 200016_x_at HNRNPA1
#>  8 200017_at   RPS27A 
#>  9 200018_at   RPS13  
#> 10 200019_s_at FAU    
#> # … with 578 more rows

read_gse_soft(system.file('extdata/GSE51280_family.soft.gz', package = 'rGEO'), verbose = TRUE)
#> 
#> platform meta:
#> [1] "#ID = "                                                                                     
#> [2] "#GENE_SYMBOL = Gene Symbol"                                                                 
#> [3] "#GB_ACC = Genebank accession number LINK_PRE:\"http://www.ncbi.nlm.nih.gov/nuccore/?term=\""
#> [4] "#Class Name = Gene Class"                                                                   
#> [5] "#SPOT_ID = "                                                                                
#> 
#> 
#> 
#> platform:
#> 
#> $accession
#> [1] "GPL17590"
#> 
#> $info
#> # A tibble: 5 × 2
#>   name        description                                                       
#>   <chr>       <chr>                                                             
#> 1 ID          ""                                                                
#> 2 GENE_SYMBOL "Gene Symbol"                                                     
#> 3 GB_ACC      "Genebank accession number LINK_PRE:\"http://www.ncbi.nlm.nih.gov…
#> 4 Class Name  "Gene Class"                                                      
#> 5 SPOT_ID     ""                                                                
#> 
#> $table
#> # A tibble: 142 × 5
#>    ID    GENE_SYMBOL  GB_ACC `Class Name` SPOT_ID
#>    <chr> <chr>        <chr>  <chr>        <chr>  
#>  1 1     POS_A(128)   NA     Positive     CONTROL
#>  2 2     POS_B(32)    NA     Positive     CONTROL
#>  3 3     POS_C(8)     NA     Positive     CONTROL
#>  4 4     POS_D(2)     NA     Positive     CONTROL
#>  5 5     POS_E(0.5)   NA     Positive     CONTROL
#>  6 6     POS_F(0.125) NA     Positive     CONTROL
#>  7 7     NEG_A(0)     NA     Negative     CONTROL
#>  8 8     NEG_B(0)     NA     Negative     CONTROL
#>  9 9     NEG_C(0)     NA     Negative     CONTROL
#> 10 10    NEG_D(0)     NA     Negative     CONTROL
#> # … with 132 more rows
#> 
#> 
#> 
#> GPL17590: use "GENE_SYMBOL" as symbol
#> 
#> 
#> 
#> # A tibble: 121 × 2
#>    ID_REF symbol 
#>    <chr>  <chr>  
#>  1 100    NDC80  
#>  2 101    NDRG1  
#>  3 102    NEUROG1
#>  4 103    NT5E   
#>  5 104    NUF2   
#>  6 106    PGR    
#>  7 107    PHGDH  
#>  8 109    PIK3CA 
#>  9 110    PLOD1  
#> 10 111    PNP    
#> # … with 111 more rows