read .duplexgroup file
read_duplexgroup(file)
file | string. path to input file, passed onto |
---|
a tibble of 8 variables
read_duplexgroup() runs quite fast because it fully exploits R's vectorisation, at the price of less readable code. read_duplexgroup_old() is much clearer: it parses each group separately. Reading its source can help you understand the implementation.
In short, the most difficult part is labelling each row with the correct identifier (the group id here) after we concatenate all loc lines and pass them to parse_locs() at once.
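The labelling problem described above can be illustrated with a minimal base R sketch. Everything in it is an assumption made for illustration: the "chrom:start-end" string layout, the loc_lines and group_ids objects, and the regex that stands in for parse_locs(). It is not the package's actual implementation, only one way to keep group ids aligned after a single vectorised parse.

```r
# Hypothetical input: each group holds a varying number of "loc" strings.
# The "chrom:start-end" layout is an assumption, not the real file format.
loc_lines <- list(
  c("neat1:1-15", "neat1:40-50"),
  c("neat1:1-15", "neat1:303-316", "neat1:1-16")
)
group_ids <- c("0", "1")

# Concatenate every group's locs so they can be parsed in one vectorised pass.
all_locs <- unlist(loc_lines)

# Stand-in for parse_locs(): split all strings at once with a single regex.
pattern <- "^([^:]+):([0-9]+)-([0-9]+)$"
parsed <- data.frame(
  chrom = sub(pattern, "\\1", all_locs),
  start = as.integer(sub(pattern, "\\2", all_locs)),
  end   = as.integer(sub(pattern, "\\3", all_locs)),
  stringsAsFactors = FALSE
)

# Recover the group of each row: repeat each id once per loc in its group.
parsed$id <- rep(group_ids, times = lengths(loc_lines))
parsed
```

The key step is rep(group_ids, times = lengths(loc_lines)): because unlist() preserves the order of the groups, repeating each id by its group's size yields one label per parsed row without any per-group loop.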
duplexgroup_file <- system.file('extdata', 'Neat1_1.duplexgroup', package = 'paristools')
read_duplexgroup(duplexgroup_file)
#> # A tibble: 15,964 x 8
#>    chrom strand start   end pair  type   id    score
#>    <chr> <chr>  <int> <int> <chr> <chr>  <chr> <dbl>
#>  1 neat1 +          1    15 left  genome 0     0.01
#>  2 neat1 +         40    50 right genome 0     0.01
#>  3 neat1 +          1    15 left  read   0     0.01
#>  4 neat1 +          1    19 left  read   0     0.01
#>  5 neat1 +         40    69 right read   0     0.01
#>  6 neat1 +         27    50 right read   0     0.01
#>  7 neat1 +          1    15 left  genome 1     0.012
#>  8 neat1 +        303   316 right genome 1     0.012
#>  9 neat1 +          1    15 left  read   1     0.012
#> 10 neat1 +          1    16 left  read   1     0.012
#> # … with 15,954 more rows