Read a .duplexgroup file

read_duplexgroup(file)

Arguments

file

string. Path to the input file, passed on to readr::read_file().

Value

A tibble of 8 variables: chrom, strand, start, end, pair, type, id and score.

Implementation

read_duplexgroup() runs quite fast because it fully exploits R's vectorisation, at the price of less readable code.

read_duplexgroup_old() is much clearer: it parses each group separately. Reading its source can help you understand the implementation.

In short, the most difficult part is labelling each row with the correct identifier (the group id here) after all loc lines have been concatenated and passed to parse_locs() in a single call.
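The labelling problem can be illustrated with a minimal base R sketch. It is not the package's actual code: the input below is a made-up toy layout (a "group <id>" header followed by a variable number of "chrom:start-end" lines), not the real .duplexgroup format, and the strsplit() call is only a stand-in for parse_locs(). The idea shown is one common vectorised approach: a running count of header lines gives every line the index of the group it belongs to, so ids can be attached without looping over groups.

```r
# Toy input (hypothetical layout, NOT the real .duplexgroup format):
# a header line per group, followed by any number of location lines.
lines <- c(
  "group 0",
  "neat1:1-15",
  "neat1:40-50",
  "group 1",
  "neat1:1-15",
  "neat1:303-316",
  "neat1:1-16"
)

# Which lines start a new group?
is_header <- startsWith(lines, "group ")

# Each header bumps the running count, so every line carries the index
# of the most recent header above it.
group_index <- cumsum(is_header)

# One id per group, taken from the header lines.
ids <- sub("^group ", "", lines[is_header])

# Keep only the location lines and look up the id of their group.
loc_lines <- lines[!is_header]
loc_ids <- ids[group_index[!is_header]]

# Parse all location lines in one go (stand-in for parse_locs()).
parts <- do.call(rbind, strsplit(loc_lines, "[:-]"))
locs <- data.frame(
  chrom = parts[, 1],
  start = as.integer(parts[, 2]),
  end   = as.integer(parts[, 3]),
  id    = loc_ids,
  stringsAsFactors = FALSE
)

locs
```

Because the ids are computed from the same logical mask that selects the location lines, they stay aligned with the parsed rows regardless of how many lines each group contributes.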

Examples

duplexgroup_file <- system.file('extdata', 'Neat1_1.duplexgroup', package = 'paristools')
read_duplexgroup(duplexgroup_file)
#> # A tibble: 15,964 x 8
#>    chrom strand start   end pair  type   id    score
#>    <chr> <chr>  <int> <int> <chr> <chr>  <chr> <dbl>
#>  1 neat1 +          1    15 left  genome 0     0.01
#>  2 neat1 +         40    50 right genome 0     0.01
#>  3 neat1 +          1    15 left  read   0     0.01
#>  4 neat1 +          1    19 left  read   0     0.01
#>  5 neat1 +         40    69 right read   0     0.01
#>  6 neat1 +         27    50 right read   0     0.01
#>  7 neat1 +          1    15 left  genome 1     0.012
#>  8 neat1 +        303   316 right genome 1     0.012
#>  9 neat1 +          1    15 left  read   1     0.012
#> 10 neat1 +          1    16 left  read   1     0.012
#> # … with 15,954 more rows