parse GSE SOFT file (contains platform annotation)

parse_gse_soft(soft_file, verbose = T)

Arguments

soft_file

string. path to the SOFT file

verbose

logical. whether to print the platform meta lines while parsing

Value

list

  1. accession: string. platform accession

  2. info: tibble. chip column name information

  3. table: tibble. chip annotation
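
As a rough illustration of how the three elements fit together, the sketch below builds a stand-in list with the same shape (base data frames instead of tibbles, so it runs without rGEO) and looks up a column description. The accession "GPL0000", the gene names, and the variable names `result` and `descriptions` are all made up for illustration; the assumption that every `info$name` matches a column of `table` is taken from the example output, not from a documented guarantee.

```r
# Stand-in for the value returned by parse_gse_soft(); all values are hypothetical.
result <- list(
  accession = "GPL0000",
  info = data.frame(
    name = c("ID", "GENE_SYMBOL"),
    description = c("", "Gene Symbol"),
    stringsAsFactors = FALSE
  ),
  table = data.frame(
    ID = c("1", "2"),
    GENE_SYMBOL = c("GENE_A", "GENE_B"),
    stringsAsFactors = FALSE
  )
)

# Named vector: column name -> description, for quick lookup.
descriptions <- setNames(result$info$description, result$info$name)
print(descriptions[["GENE_SYMBOL"]])

# In the example output each info row names one column of the table
# (assumed here, checked explicitly rather than relied upon).
stopifnot(all(result$info$name %in% colnames(result$table)))
```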

Details

for now, we drop probes that have not been mapped to a symbol (probes mapped to multiple symbols are kept).
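
The filtering rule can be sketched on a toy table. This is not the package's implementation: the column name `symbol`, the probe and gene names, the treatment of both `NA` and empty strings as "unmapped", and the " /// " separator for multiple symbols are all assumptions made for illustration.

```r
# Toy annotation table; every value here is hypothetical.
annotation <- data.frame(
  ID = c("probe_1", "probe_2", "probe_3", "probe_4"),
  symbol = c("GENE_A", "GENE_B /// GENE_C", NA, ""),
  stringsAsFactors = FALSE
)

# A probe counts as mapped when its symbol is neither NA nor blank.
# Probes with several symbols (probe_2) pass this check unchanged.
is_mapped <- !is.na(annotation$symbol) & nzchar(trimws(annotation$symbol))
mapped <- annotation[is_mapped, , drop = FALSE]

print(mapped$ID)
```

Here probe_3 and probe_4 are dropped, while probe_2 stays despite carrying two symbols.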

See also

Other read raw data: read_gse_matrix(), read_gse_soft()

Examples

parse_gse_soft(system.file('extdata/GSE19161_family.soft.gz', package = 'rGEO'), verbose = FALSE)
#> $accession
#> [1] "GPL9717"
#> 
#> $info
#> # A tibble: 11 × 2
#>    name                        description                                      
#>    <chr>                       <chr>                                            
#>  1 ID                          ""                                               
#>  2 Species Scientific Name     ""                                               
#>  3 Sequence Type               ""                                               
#>  4 Sequence Source             ""                                               
#>  5 Transcript ID(Array Design) ""                                               
#>  6 UniGene ID                  ""                                               
#>  7 Gene Title                  ""                                               
#>  8 Gene Symbol                 ""                                               
#>  9 Ensembl                     ""                                               
#> 10 ORF                         "LINK_PRE:\"http://www.ncbi.nlm.nih.gov/gene/?te…
#> 11 SPOT_ID                     ""                                               
#> 
#> $table
#> # A tibble: 658 × 11
#>    ID         `Species Scien…` `Sequence Type` `Sequence Sour…` `Transcript ID…`
#>    <chr>      <chr>            <chr>           <chr>            <chr>           
#>  1 121_at     Homo sapiens     Exemplar seque… GenBank          X69699          
#>  2 200003_s_… Homo sapiens     Exemplar seque… GenBank          g4506626        
#>  3 200004_at  Homo sapiens     Exemplar seque… GenBank          g4503538        
#>  4 200006_at  Homo sapiens     Exemplar seque… GenBank          g6005748        
#>  5 200009_at  Homo sapiens     Exemplar seque… GenBank          g6598322        
#>  6 200010_at  Homo sapiens     Exemplar seque… GenBank          g4506594        
#>  7 200012_x_… Homo sapiens     Exemplar seque… GenBank          g4506610        
#>  8 200016_x_… Homo sapiens     Exemplar seque… GenBank          g4504444        
#>  9 200017_at  Homo sapiens     Exemplar seque… GenBank          g4506712        
#> 10 200018_at  Homo sapiens     Exemplar seque… GenBank          g4506684        
#> # … with 648 more rows, and 6 more variables: `UniGene ID` <chr>,
#> #   `Gene Title` <chr>, `Gene Symbol` <chr>, Ensembl <chr>, ORF <chr>,
#> #   SPOT_ID <chr>
#> 

parse_gse_soft(system.file('extdata/GSE51280_family.soft.gz', package = 'rGEO'))
#> 
#> platform meta:
#> [1] "#ID = "                                                                                     
#> [2] "#GENE_SYMBOL = Gene Symbol"                                                                 
#> [3] "#GB_ACC = Genebank accession number LINK_PRE:\"http://www.ncbi.nlm.nih.gov/nuccore/?term=\""
#> [4] "#Class Name = Gene Class"                                                                   
#> [5] "#SPOT_ID = "                                                                                
#> 
#> 
#> 
#> $accession
#> [1] "GPL17590"
#> 
#> $info
#> # A tibble: 5 × 2
#>   name        description                                                       
#>   <chr>       <chr>                                                             
#> 1 ID          ""                                                                
#> 2 GENE_SYMBOL "Gene Symbol"                                                     
#> 3 GB_ACC      "Genebank accession number LINK_PRE:\"http://www.ncbi.nlm.nih.gov…
#> 4 Class Name  "Gene Class"                                                      
#> 5 SPOT_ID     ""                                                                
#> 
#> $table
#> # A tibble: 142 × 5
#>    ID    GENE_SYMBOL  GB_ACC `Class Name` SPOT_ID
#>    <chr> <chr>        <chr>  <chr>        <chr>  
#>  1 1     POS_A(128)   NA     Positive     CONTROL
#>  2 2     POS_B(32)    NA     Positive     CONTROL
#>  3 3     POS_C(8)     NA     Positive     CONTROL
#>  4 4     POS_D(2)     NA     Positive     CONTROL
#>  5 5     POS_E(0.5)   NA     Positive     CONTROL
#>  6 6     POS_F(0.125) NA     Positive     CONTROL
#>  7 7     NEG_A(0)     NA     Negative     CONTROL
#>  8 8     NEG_B(0)     NA     Negative     CONTROL
#>  9 9     NEG_C(0)     NA     Negative     CONTROL
#> 10 10    NEG_D(0)     NA     Negative     CONTROL
#> # … with 132 more rows
#>