Parse a GSE SOFT file (contains platform annotation)
parse_gse_soft(soft_file, verbose = T)
soft_file: string. path to the SOFT file
list
accession: string. platform accession
info: tibble. chip column name information
table: tibble. chip annotation
For now, we drop probes that haven't been mapped to a symbol (mapping to multiple symbols is okay).
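The filtering rule above can be illustrated with a minimal base R sketch. This is not the package's internal code: the `annotation` data frame, its probe IDs, the placeholder symbols, and the `///` separator for multi-symbol probes are all hypothetical, chosen only to mirror the shape of the `$table` element.

```r
# Hypothetical annotation table (values are placeholders, not real GEO data).
annotation <- data.frame(
  ID = c("probe_1", "probe_2", "probe_3", "probe_4"),
  `Gene Symbol` = c("GENE_A", "GENE_B /// GENE_C", NA, ""),
  check.names = FALSE,       # keep the space in the column name
  stringsAsFactors = FALSE
)

# A probe counts as mapped when its symbol is neither NA nor an empty string.
# Probes listing several symbols are kept unchanged.
symbol <- annotation[["Gene Symbol"]]
mapped <- annotation[!is.na(symbol) & nzchar(symbol), , drop = FALSE]

print(mapped$ID)
```

Treating an empty string the same as `NA` is an assumption of this sketch; the actual function may define "unmapped" differently.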
Other read raw data: read_gse_matrix(), read_gse_soft()
parse_gse_soft(system.file('extdata/GSE19161_family.soft.gz', package = 'rGEO'), verbose = FALSE)
#> $accession
#> [1] "GPL9717"
#>
#> $info
#> # A tibble: 11 × 2
#> name description
#> <chr> <chr>
#> 1 ID ""
#> 2 Species Scientific Name ""
#> 3 Sequence Type ""
#> 4 Sequence Source ""
#> 5 Transcript ID(Array Design) ""
#> 6 UniGene ID ""
#> 7 Gene Title ""
#> 8 Gene Symbol ""
#> 9 Ensembl ""
#> 10 ORF "LINK_PRE:\"http://www.ncbi.nlm.nih.gov/gene/?te…
#> 11 SPOT_ID ""
#>
#> $table
#> # A tibble: 658 × 11
#> ID `Species Scien…` `Sequence Type` `Sequence Sour…` `Transcript ID…`
#> <chr> <chr> <chr> <chr> <chr>
#> 1 121_at Homo sapiens Exemplar seque… GenBank X69699
#> 2 200003_s_… Homo sapiens Exemplar seque… GenBank g4506626
#> 3 200004_at Homo sapiens Exemplar seque… GenBank g4503538
#> 4 200006_at Homo sapiens Exemplar seque… GenBank g6005748
#> 5 200009_at Homo sapiens Exemplar seque… GenBank g6598322
#> 6 200010_at Homo sapiens Exemplar seque… GenBank g4506594
#> 7 200012_x_… Homo sapiens Exemplar seque… GenBank g4506610
#> 8 200016_x_… Homo sapiens Exemplar seque… GenBank g4504444
#> 9 200017_at Homo sapiens Exemplar seque… GenBank g4506712
#> 10 200018_at Homo sapiens Exemplar seque… GenBank g4506684
#> # … with 648 more rows, and 6 more variables: `UniGene ID` <chr>,
#> # `Gene Title` <chr>, `Gene Symbol` <chr>, Ensembl <chr>, ORF <chr>,
#> # SPOT_ID <chr>
#>
parse_gse_soft(system.file('extdata/GSE51280_family.soft.gz', package = 'rGEO'))
#>
#> platform meta:
#> [1] "#ID = "
#> [2] "#GENE_SYMBOL = Gene Symbol"
#> [3] "#GB_ACC = Genebank accession number LINK_PRE:\"http://www.ncbi.nlm.nih.gov/nuccore/?term=\""
#> [4] "#Class Name = Gene Class"
#> [5] "#SPOT_ID = "
#>
#>
#>
#> $accession
#> [1] "GPL17590"
#>
#> $info
#> # A tibble: 5 × 2
#> name description
#> <chr> <chr>
#> 1 ID ""
#> 2 GENE_SYMBOL "Gene Symbol"
#> 3 GB_ACC "Genebank accession number LINK_PRE:\"http://www.ncbi.nlm.nih.gov…
#> 4 Class Name "Gene Class"
#> 5 SPOT_ID ""
#>
#> $table
#> # A tibble: 142 × 5
#> ID GENE_SYMBOL GB_ACC `Class Name` SPOT_ID
#> <chr> <chr> <chr> <chr> <chr>
#> 1 1 POS_A(128) NA Positive CONTROL
#> 2 2 POS_B(32) NA Positive CONTROL
#> 3 3 POS_C(8) NA Positive CONTROL
#> 4 4 POS_D(2) NA Positive CONTROL
#> 5 5 POS_E(0.5) NA Positive CONTROL
#> 6 6 POS_F(0.125) NA Positive CONTROL
#> 7 7 NEG_A(0) NA Negative CONTROL
#> 8 8 NEG_B(0) NA Negative CONTROL
#> 9 9 NEG_C(0) NA Negative CONTROL
#> 10 10 NEG_D(0) NA Negative CONTROL
#> # … with 132 more rows
#>