
PROGRAMS & LIBRARIES :

* Check all output functions results (printf, putc, ...).
* Load aa weights from extern data file.
* Allow extra gaps chars `.', `?', `~' (in all formats ?).
* Must convert all `..' to `. .' in GCG header.
* Make thread safe libraries (msf+gcg).
* Use valid gap characters according to each format.
* Check nucleic NEXUS format names.
* Check errors vs. normal end parsing.
* New `-q' (quiet) flag to suppress format message.
* New `-v' (verbose) flag to display all formats checks and results.
* Fix memory leaks in FASTA and PHYLIPS format detections.
* Make some sequence_t fields private.
* Warn for truncated values (names, accessions, ...).
* Check for sequence names output validity in each format.
* Require at least 2 sequences in alignments, even with non strict mode.
* Cannot handle matched token larger than 16kB.
* Empty file should not match any known format.

FORMATS :

* ASN1: Sequence format from NCBI.
* BAMBE: Alignment format (derived from PHYLIP).
* CLUSTAL: Handle positions range after sequence names.
* EMBL: Restore version parsing with new ID line format.
* FASTA: Add NCBI header format parsing.
* GENAL: Sequence format from GenAl program (may conflict with FASTA).
* GENBANK: Cannot handle sequence bigger than 10Mb.
* MAF: Multiple Alignment Format (from UCSC Genome Bioinformatics).
* MEGA: Only a single dataset is supported.
* NEXUS: Non interleaved format.
* NEXUS: MESQUITE program seems to generate invalid format.
* PHYLIP: Support for multiple alignments in the same file.
* PHYLIP: Exercise sequence names cleanup.
* PSA/XPSA: Alignment format from pftools package, cf. psa(5), xpsa(5).
* RSF: New `Rich Sequence Format' format from GCG.
* SELEX: Alignment format (STOCKHOLM ancestor).
* SPROT: Fix DE line output for new structured datas.

DATABANKS :

* EMBL: Invalid author name (Ref1) (I13016 - rel_pat_unc_13_r121.dat).
* EMBL: Dup title terminator (Ref2) (AJ697806 - rel_std_pln_02_r121.dat).

* GENBANK: Spurious newline in DBLINK (JQ435097 - gbpln77.seq).

* IMGT: Flat file has strange characters.
* IMGT: Many trailing spaces.
* IMGT: Primary accession number duplicated if secondary exists.

* REFSEQ: Multiple SOURCE (WP_018003284 - rsnc.1113.2014.gpff).
* REFSEQ: Multiple SOURCE (WP_018003284 - rsnc.1114.2014.gpff).
* REFSEQ: Multiple SOURCE (WP_018003279 - rsnc.1122.2014.gpff).
* REFSEQ: Multiple SOURCE (WP_018002905 - complete.nonredundant_protein.133.protein.gpff).
* REFSEQ: Multiple SOURCE (WP_018193001 - complete.nonredundant_protein.148.protein.gpff).
* REFSEQ: Multiple SOURCE (WP_001224390 - complete.nonredundant_protein.150.protein.gpff).
* REFSEQ: Multiple SOURCE (WP_003868548 - complete.nonredundant_protein.154.protein.gpff).
* REFSEQ: Multiple SOURCE (WP_018003457 - complete.nonredundant_protein.155.protein.gpff).
* REFSEQ: Multiple SOURCE (WP_018036044 - complete.nonredundant_protein.173.protein.gpff).
* REFSEQ: Multiple SOURCE (WP_018003162 - complete.nonredundant_protein.188.protein.gpff).
* REFSEQ: Multiple SOURCE (WP_020247627 - complete.nonredundant_protein.204.protein.gpff).
* REFSEQ: Multiple SOURCE (WP_018035696 - complete.nonredundant_protein.206.protein.gpff).
* REFSEQ: Multiple SOURCE (WP_008439751 - complete.nonredundant_protein.208.protein.gpff).
* REFSEQ: Multiple SOURCE (WP_018036744 - complete.nonredundant_protein.217.protein.gpff).
* REFSEQ: Multiple SOURCE (WP_008440780 - complete.nonredundant_protein.224.protein.gpff).

* UNIPROT: Ref 1 title has internal `"' (B2CI52 - uniprot_trembl.dat).
* UNIPROT: Ref 2 title has internal `"' (Q50565 - uniprot_trembl.dat).
* UNIPROT: Ref 1 title has internal `"' (Q4GX11 - uniprot_trembl.dat).

BUILD :

lib/align/megay.y: warning: 2 shift/reduce conflicts [-Wconflicts-sr]

