Read a .sam file, keeping only the 11 mandatory fields

read_sam(file)

Arguments

file

String. Path to the input file, passed on to readr::read_lines().

Value

a tibble of 11 variables

Details

Since a .sam file can contain arbitrary columns after the required 11, and it has no column header line, I found it hard to read with readr::read_tsv(). Instead, I use stringr::str_split_fixed() to split each line into 12 columns; the last column, which holds any extra fields, is discarded.
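
The splitting step described above can be sketched roughly as follows. This is a minimal illustration, not the actual body of read_sam(): the helper name parse_sam_lines(), the sam_cols vector, the integer conversion, and the two made-up alignment lines are all assumptions introduced for the example.

```r
library(stringr)
library(tibble)

# Names of the 11 mandatory SAM fields, in file order.
sam_cols <- c("QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL")

# Hypothetical helper for illustration; not the package's own code.
parse_sam_lines <- function(lines) {
  # Split each line into at most 12 pieces: the 11 mandatory fields plus
  # one catch-all piece holding every optional field (tabs included).
  parts <- str_split_fixed(lines, "\t", n = 12)
  # Keep the 11 mandatory fields and drop the catch-all column.
  parts <- parts[, 1:11, drop = FALSE]
  colnames(parts) <- sam_cols
  out <- as_tibble(parts)
  # Assumption: convert the numeric fields to integer, as in the Examples output.
  int_cols <- c("FLAG", "POS", "MAPQ", "PNEXT", "TLEN")
  out[int_cols] <- lapply(out[int_cols], as.integer)
  out
}

# Made-up alignment lines: the first has two optional fields, the second none.
lines <- c(
  "read1\t0\tneat1\t100\t255\t10M\t*\t0\t0\tACGTACGTAC\tJJJJJJJJJJ\tNH:i:1\tHI:i:1",
  "read2\t16\tneat1\t200\t3\t4S6M\t*\t0\t0\tTTGACCATGA\tAAFFFJJJJJ"
)
parse_sam_lines(lines)
```

Because str_split_fixed() stops splitting once it has n pieces, every optional field ends up in the 12th piece no matter how many there are, so lines with different numbers of extra fields still yield a rectangular result.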

Examples

sam_file <- system.file('extdata', 'Neat1_1.Aligend_trunc.sam', package = 'paristools')
read_sam(sam_file)
#> # A tibble: 4,996 x 11
#>    QNAME      FLAG RNAME   POS  MAPQ CIGAR  RNEXT PNEXT  TLEN SEQ       QUAL
#>    <chr>     <int> <chr> <int> <int> <chr>  <chr> <int> <int> <chr>     <chr>
#>  1 ST-E0031…     0 neat1  2691   255 8S20M… *         0     0 TCGTCCCC… JJAJFAJ…
#>  2 ST-E0031…    16 neat1   388   255 4S31M  *         0     0 ACTCGGCA… <-FF<A-…
#>  3 ST-E0031…     0 neat1  1653   255 2S32M  *         0     0 TTGAGGAA… A<F<-AF…
#>  4 ST-E0031…     0 neat1  3043     3 5S38M  *         0     0 ATTCCAAT… JJJJJJJ…
#>  5 ST-E0031…   256 neat1  2534     3 5M504… *         0     0 ATTCCAAT… JJJJJJJ…
#>  6 ST-E0031…     0 neat1  2979   255 24M9S  *         0     0 TTTTGTGA… JJJJJFJ…
#>  7 ST-E0031…     0 neat1  2995   255 41M    *         0     0 AAAAGTGG… JJJJJJJ…
#>  8 ST-E0031…     0 neat1   184   255 1S29M  *         0     0 ATCCAAAG… JJJJJJJ…
#>  9 ST-E0031…     0 neat1   227     3 26M70… *         0     0 AGACCAGG… JFJJJJ7…
#> 10 ST-E0031…   256 neat1   227     3 27M6S  *         0     0 AGACCAGG… JFJJJJ7…
#> # … with 4,986 more rows