This directory contains the files related to the read dataset BC.454, which was used to compare the accuracy of four trimming programs: AlienTrimmer [1],
cutadapt [2], FLEXBAR [3] and Btrim [4]. See [1] for more details.

Available files:
- bcM644.c30.QF.fq.dsrc:  a simulated FASTQ [5] read file related to SRA:ERR161539 (Sanger Phred quality scores),
- bcM644.aliens.fasta:    a FASTA file containing adapter sequences that contaminate these reads,
- bcM644.c30.QF.info.txt: a text file listing the positions of the alien residues within these reads.

The read file bcM644.c30.QF.fq.dsrc was compressed with DSRC [6]. To uncompress it, use the program dsrc (available at [7]) with the following command line:
  dsrc d bcM644.c30.QF.fq.dsrc bcM644.c30.QF.fq

Each line in bcM644.c30.QF.info.txt corresponds to a read in bcM644.c30.QF.fq and indicates whether this read contains alien residues. Every line starts with
the identifier of the corresponding read. If a line contains only this identifier, the read is not contaminated. If a read starts or ends with alien residues,
the corresponding info line contains the following fields after the identifier:
- the name of the alien sequence (available in the FASTA file bcM644.aliens.fasta),
- the alien prefix or suffix of the read,
- the flag cl= followed by the length of the alien (sub)sequence,
- the flag cm= followed by the number of mismatches within the alien residues,
- the flag ci= followed by the indexes of the alien residues in the read (starting from 0).
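
The field layout described above can be read with a few lines of Python. The following is only a minimal sketch, not part of the original data package: the
names InfoRecord and parse_info_line are hypothetical, and it assumes that fields are separated by whitespace and that alien names contain no spaces.

```python
from typing import List, NamedTuple, Optional


class InfoRecord(NamedTuple):
    """One line of the info file (hypothetical container, for illustration only)."""
    read_id: str
    alien_name: Optional[str]
    alien_seq: Optional[str]
    cl: Optional[int]   # length of the alien (sub)sequence
    cm: Optional[int]   # number of mismatches within the alien residues
    ci: List[int]       # 0-based indexes of the alien residues in the read


def parse_info_line(line: str) -> InfoRecord:
    # Assumption: fields are whitespace-separated and alien names have no spaces.
    tokens = line.split()
    if not tokens:
        raise ValueError("empty info line")
    if len(tokens) == 1:
        # Identifier only: the read is not contaminated.
        return InfoRecord(tokens[0], None, None, None, None, [])
    if len(tokens) < 3:
        raise ValueError("expected alien name and alien sequence after the identifier")
    read_id, alien_name, alien_seq = tokens[:3]
    cl = cm = None
    ci: List[int] = []
    reading_ci = False
    for token in tokens[3:]:
        if token.startswith("cl="):
            cl, reading_ci = int(token[3:]), False
        elif token.startswith("cm="):
            cm, reading_ci = int(token[3:]), False
        elif token.startswith("ci="):
            reading_ci = True
            if token[3:]:           # tolerate "ci=199" as well as "ci= 199"
                ci.append(int(token[3:]))
        elif reading_ci:
            ci.append(int(token))   # indexes listed after the ci= flag
        else:
            raise ValueError("unexpected field: " + token)
    return InfoRecord(read_id, alien_name, alien_seq, cl, cm, ci)


# Made-up line, used only to show the call.
record = parse_info_line("@read.2 Adapter_X ACGT cl=4 cm=0 ci= 10 11 12 13")
print(record.alien_name, record.cl, record.cm, record.ci)
```

An uncontaminated read yields a record whose alien fields are None and whose index list is empty, so contaminated reads can be selected by testing record.ci.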

Examples from bcM644.c30.QF.info.txt:

@SRR032593.81
@SRR032593.82 454_Adapter_B CTGAGACACGCAACAGGG cl=18 cm=1 ci= 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216
@SRR032593.83 454_Adapter_B AGGGGATAGGCAAGGCACACAGGGGATAGG cl=30 cm=0 ci= 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

Here, read SRR032593.81 does not contain any alien residue. The second read, SRR032593.82, contains cl=18 alien residues at its 3' end from 454_Adapter_B (see
the file bcM644.aliens.fasta) with cm=1 mismatch. The third read, SRR032593.83, also contains alien residues from 454_Adapter_B, but at its 5' end.
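
The distinction between 5' and 3' contamination in these examples follows from the ci= indexes alone. The sketch below illustrates this under stated
assumptions: the function names alien_end and strip_alien are hypothetical, and the read length of 217 used for SRR032593.82 is assumed for the example (the
actual length must be taken from the FASTQ file).

```python
from typing import List, Optional


def alien_end(indexes: List[int], read_length: int) -> Optional[str]:
    """Tell at which end of the read the alien residues lie (None if there are none)."""
    if not indexes:
        return None
    if min(indexes) == 0:
        return "5'"          # alien residues form a prefix of the read
    if max(indexes) == read_length - 1:
        return "3'"          # alien residues form a suffix of the read
    return "internal"        # not expected from the description of the info file


def strip_alien(sequence: str, indexes: List[int]) -> str:
    """Return the read sequence without the residues at the given 0-based indexes."""
    drop = set(indexes)
    return "".join(base for i, base in enumerate(sequence) if i not in drop)


# Indexes 0..29, as in the SRR032593.83 example line.
print(alien_end(list(range(0, 30)), 100))
# Indexes 199..216, as in the SRR032593.82 example line; length 217 is an assumption.
print(alien_end(list(range(199, 217)), 217))
# Made-up 8-residue read whose first three residues are alien.
print(strip_alien("ACGTACGT", [0, 1, 2]))
```

The output of a trimming program can then be compared with the result of strip_alien to see whether exactly the alien residues were removed.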


REFERENCES:
[1] Criscuolo A, Brisse S (2012) AlienTrimmer: a tool to quickly and accurately trim off multiple short contaminant sequences from high-throughput sequencing
    reads, Genomics.
[2] Martin M (2011) Cutadapt removes adapter sequences from high-throughput sequencing reads, EMBnet Journal, 17:10-12.
[3] Dodt M, Roehr JT, Ahmed R, Dieterich C (2012) FLEXBAR--Flexible barcode and adapter processing for next-generation sequencing platforms, Biology, 1:895-905.
[4] Kong Y (2011) Btrim: a fast, lightweight adapter and quality trimming program for next-generation sequencing technologies, Genomics, 98:152-153.
[5] Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM (2009) The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ
    variants, Nucleic Acids Research, 38:1767-1771.
[6] Deorowicz S, Grabowski Sz (2011) Compression of DNA sequence reads in FASTQ format, Bioinformatics, 27:860-862.
[7] DSRC -- DNA Sequence Reads Compression, http://sun.aei.polsl.pl/dsrc/index.html
