Common Lisp Package: SAM

cl-sam is a Common Lisp toolkit for manipulation of DNA sequence alignment data stored in the Sequence Alignment/Map (SAM) format <http://samtools.sourceforge.net>. The SAM specification describes text (SAM) and binary (BAM) formats. cl-sam uses the BGZF block compression code from the SAMtools C toolkit, but is otherwise an independent Lisp implementation capable of editing every aspect of BAM files or creating BAM data de novo.

The following example implements something similar to samtools flagstat:

;;; (with-bgzf (bgzf "example.bam" :direction :input)
;;;   (multiple-value-bind (header num-refs ref-meta)
;;;       (read-bam-meta bgzf)
;;;     (format t "BAM header: ~s~%" header)
;;;     (format t "Number of references: ~d~%" num-refs)
;;;     (loop
;;;        for (id name length) in ref-meta
;;;        do (format t "Reference id: ~d Name: ~s Length: ~d~%" id name length)))
;;;   (loop
;;;      for alignment = (read-alignment bgzf)
;;;      while alignment
;;;      for flag = (alignment-flag alignment)
;;;      count flag into total
;;;      count (fails-platform-qc-p flag) into qc-fail
;;;      count (pcr/optical-duplicate-p flag) into duplicates
;;;      count (not (query-unmapped-p flag)) into mapped
;;;      count (sequenced-pair-p flag) into seq-paired
;;;      count (first-in-pair-p flag) into read1
;;;      count (second-in-pair-p flag) into read2
;;;      count (mapped-proper-pair-p flag) into proper-paired
;;;      count (and (not (query-unmapped-p flag))
;;;                 (not (mate-unmapped-p flag))) into both-mapped
;;;      count (mate-unmapped-p flag) into singletons
;;;      finally (format t (str "~d in total~%"
;;;                             "~d QC failure~%"
;;;                             "~d duplicates~%"
;;;                             "~d mapped (~$%)~%"
;;;                             "~d paired in sequencing~%"
;;;                             "~d read1~%"
;;;                             "~d read2~%"
;;;                             "~d properly paired (~$%)~%"
;;;                             "~d both mapped~%"
;;;                             "~d singletons (~$%)~%")
;;;                      total qc-fail duplicates mapped
;;;                      (* 100 (/ mapped total))
;;;                      seq-paired read1 read2 proper-paired
;;;                      (* 100 (/ proper-paired total))
;;;                      both-mapped singletons
;;;                      (* 100 (/ singletons total)))))

FUNCTION

Public

ADD-PG-RECORD (HEADER NEW-RECORD)

Returns a copy of HEADER with PG record NEW-RECORD added. If the ID of NEW-RECORD clashes with existing IDs, all IDs are remapped to new, generated sequential integer IDs, starting at 0. PP links are also updated. NEW-RECORD must have its PP field set by the caller.

ALIGNMENT-BIN (ALN)

Returns an integer that indicates the alignment bin to which ALN has been assigned.
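
The bin number follows the binning scheme described in the SAM specification. As a rough illustration of how such a bin can be derived from a region, the sketch below ports the spec's reg2bin calculation. REGION-TO-BIN is a hypothetical name and is not part of cl-sam; the coordinates are assumed to be zero-based and half-open.

```lisp
;; Hypothetical helper, not part of cl-sam: computes the bin for the
;; zero-based, half-open interval [START, END), following the reg2bin
;; calculation given in the SAM specification.
(defun region-to-bin (start end)
  (let ((last-pos (1- end)))            ; closed end coordinate
    (flet ((same-bin-p (shift)
             (= (ash start (- shift)) (ash last-pos (- shift)))))
      (cond ((same-bin-p 14) (+ 4681 (ash start -14))) ; 16 kbp bins
            ((same-bin-p 17) (+ 585 (ash start -17)))  ; 128 kbp bins
            ((same-bin-p 20) (+ 73 (ash start -20)))   ; 1 Mbp bins
            ((same-bin-p 23) (+ 9 (ash start -23)))    ; 8 Mbp bins
            ((same-bin-p 26) (+ 1 (ash start -26)))    ; 64 Mbp bins
            (t 0)))))                                  ; spans everything
```

The smallest bin that wholly contains the region is chosen, so short alignments land in the fine-grained 16 kbp bins and only very long regions fall back to bin 0.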

ALIGNMENT-CIGAR (ALN)

Returns the CIGAR record list of the alignment described by ALN. CIGAR operations are given as a list, each member being a list of a CIGAR operation keyword and an integer operation length.

ALIGNMENT-CORE (ALN &KEY VALIDATE)

Returns a list of the core data described by ALN. The list elements are reference-id, alignment-position, read-name length, mapping-quality, alignment-bin, cigar length, alignment flag, read length, mate reference-id, mate alignment-position and insert length.

ALIGNMENT-CORE-ALIST (ALN &KEY VALIDATE)

Returns the same data as {defun alignment-core} in the form of an alist.

ALIGNMENT-FLAG (ALN &KEY VALIDATE)

Returns an integer whose bits are flags that describe properties of ALN. If the VALIDATE key is T (the default) the flag's bits are checked for internal consistency.

ALIGNMENT-FLAG-ALIST (ALN &KEY VALIDATE)

Returns the bitwise flags of ALN in the form of an alist. The primary purpose of this function is debugging.

ALIGNMENT-GENERATOR (REFERENCE-ID READ-GROUP &KEY (READ-LENGTH 50) NAME-SUFFIX (INSERT-LENGTH 250) (START 0) (END (+ INSERT-LENGTH (* 2 READ-LENGTH) 1)) (STEP-SIZE 10) (MAPPING-QUALITY 50) (SEQ-FN #'DEFAULT-SEQ-STRING) (QUALITY-FN #'DEFAULT-QUAL-STRING))

Returns a new standard generator function which returns pairs of BAM alignment records.

Arguments:

- reference-id (integer): The BAM reference identifier.
- read-group (string): The read group name.

Key:

- read-length (integer): The read length.
- insert-length (integer): The insert length, defined as the distance between the inner boundaries of the paired alignments.
- start (integer): The position at which to start generating alignments.
- end (integer): The position at which to stop generating alignments, expressed in Lisp vector indices (zero-based, half-open), rather than BAM's (zero-based, closed).
- step-size (integer): The distance between each successive pair of alignments.
- mapping-quality (integer): The mapping quality of the alignments.

Returns:

- A function.

ALIGNMENT-NAME< (ALIGNMENT-RECORD1 ALIGNMENT-RECORD2)

Returns T if ALIGNMENT-RECORD1 sorts before ALIGNMENT-RECORD2 by read name. Sorting semantics of read names are not fully defined in the SAM spec; whether you should expect natural order or numeric order of read name strings is not defined. This function compares numerically by read name, then by alignment template region for reads paired in sequencing and finally by alignment strand.

ALIGNMENT-NOT-PRIMARY-P (FLAG)

Returns T if FLAG indicates that the read mapping was not the primary mapping to a reference, or NIL otherwise.

ALIGNMENT-ORPHANIZER (GEN &KEY (TEST (DEFAULT-ORPHANIZER-TEST)))

Given an alignment generator function GEN, returns a new function that omits some alignments, yielding orphans. Any alignments for which TEST returns T will be omitted. The default is to omit alternate last fragment alignments.

ALIGNMENT-POSITION (ALN)

Returns the 0-based coordinate, in the reference sequence, of the first base of the clipped read described by ALN.

ALIGNMENT-RECORD< (ALIGNMENT-RECORD1 ALIGNMENT-RECORD2)

Returns T if ALIGNMENT-RECORD1 sorts before ALIGNMENT-RECORD2. Sorting semantics are not fully defined in the SAM spec. However, an informal consensus on sequence order sorting seems to be:

- mapped reads should first be sorted by their reference in the order in which reference sequences appear in the header
- unmapped reads should sort after mapped reads
- reads mapped to the same reference must appear in ascending order of their alignment position

This function compares first by reference sequence, then by alignment position, then by alignment strand and finally by read name. The output is identical to a coordinate sort performed by Picard 1.07.
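
The comparison order described for this function can be sketched as a standalone predicate. Everything here is an assumption made for illustration: the REC struct and REC< are hypothetical names, cl-sam itself compares encoded BAM records, and placing the forward strand before the reverse strand is a guess rather than documented behaviour.

```lisp
;; Hypothetical record type for illustration only; not part of cl-sam.
(defstruct rec (ref-id -1) (pos -1) (reverse-p nil) (name ""))

(defun rec< (a b)
  "Returns T if A sorts before B: by reference, position, strand, then name."
  (flet ((ref-key (r)
           ;; Unmapped records (ref-id -1) sort after all mapped records.
           (if (minusp (rec-ref-id r)) most-positive-fixnum (rec-ref-id r)))
         (strand-key (r)
           ;; Assumption: forward strand sorts before reverse strand.
           (if (rec-reverse-p r) 1 0)))
    (cond ((/= (ref-key a) (ref-key b)) (< (ref-key a) (ref-key b)))
          ((/= (rec-pos a) (rec-pos b)) (< (rec-pos a) (rec-pos b)))
          ((/= (strand-key a) (strand-key b))
           (< (strand-key a) (strand-key b)))
          ;; STRING< returns an index or NIL, so normalise to T or NIL.
          (t (and (string< (rec-name a) (rec-name b)) t)))))
```

A predicate of this shape can be passed directly to CL:SORT, for example (sort records #'rec<).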

ALIGNMENT-REFERENCE-END (ALN)

Returns the end position of the alignment on the reference. Requires decoding the CIGAR string.
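
To show why the CIGAR must be decoded, here is a minimal sketch of the calculation. The helper names are hypothetical and not part of cl-sam. The CIGAR is assumed to be an alist of (operation . length) pairs with keyword operations, and the result is assumed to be a zero-based, half-open end coordinate; cl-sam's own convention may differ.

```lisp
;; Hypothetical helpers, not part of cl-sam. CIGAR is assumed to be an
;; alist of (operation . length) pairs, e.g. '((:M . 9) (:I . 1) (:M . 25)).
(defun cigar-reference-length (cigar)
  "Sums the lengths of the operations that consume reference bases."
  (loop
     for (op . len) in cigar
     when (member op '(:M :D :N := :X))
     sum len))

(defun reference-end (position cigar)
  "Returns the zero-based, half-open end coordinate on the reference."
  (+ position (cigar-reference-length cigar)))
```

Insertions and clipping operations consume read bases only, so they do not advance the position on the reference.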

ALIGNMENT-TAG-VALUES (ALN)

Returns an alist of tag and values described by ALN.

BAM-INDEX-REFS (INSTANCE)

BGZF-CLOSE (BGZF)

Closes an open block gzip file.

Arguments:

- bgzf (bgzf structure): The file to close.

Returns:

- T on success.

BGZF-EOF-P (BGZF)

Returns T if the BGZF stream is terminated by an empty record.

BGZF-OPEN (BGZFSPEC &REST ARGS &KEY (COMPRESSION *DEFAULT-COMPRESSION*) &ALLOW-OTHER-KEYS)

Opens a block gzip file for reading or writing.

Arguments:

- bgzfspec (pathname designator, open stream or bgzf object): The file to open or an open stream or bgzf object (which is returned unmodified). The *standard-input* or *standard-output* streams may be used.

Key:

- compression (keyword): The zlib compression level (for writing).

Also accepts the keyword arguments applicable to CL:OPEN. However, an :element-type argument will be ignored as the type is always octet.

Returns:

- A BGZF structure.

BGZF-SEEK (BGZF POSITION)

Seeks within the file encapsulated by a block gzip file.

Arguments:

- bgzf (bgzf structure): The handle to seek.
- position (integer): The position to seek to. Only values previously returned by {defun bgzf-tell} may be used. The most significant 48 bits denote the real file position and the least significant 16 bits the offset within the uncompressed gzip member (see the SAM spec).

Returns:

- The new position.
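
The 48-bit and 16-bit split of a position can be illustrated with plain integer operations. This is only a sketch; the three function names are hypothetical and not part of cl-sam, which hands out such positions through its own tell function.

```lisp
;; Hypothetical helpers, not part of cl-sam.
(defun make-virtual-offset (file-position block-offset)
  "Packs a real file position (upper 48 bits) and an offset within the
uncompressed gzip member (lower 16 bits) into one integer."
  (logior (ash file-position 16) (ldb (byte 16 0) block-offset)))

(defun virtual-offset-file-position (virtual-offset)
  "Extracts the real file position from the upper 48 bits."
  (ldb (byte 48 16) virtual-offset))

(defun virtual-offset-block-offset (virtual-offset)
  "Extracts the within-member offset from the lower 16 bits."
  (ldb (byte 16 0) virtual-offset))
```

Because the two parts are packed into one integer, arithmetic on a position is not meaningful, which is one reason to pass only previously returned values back when seeking.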

BGZF-TELL (BGZF)

Returns the current position in the encapsulated file of a block gzip file.

Arguments:

- bgzf (bgzf structure): The file.

Returns:

- The file position.

BGZIP-OPEN (FILESPEC &KEY (DIRECTION :INPUT))

Opens a block gzip stream for FILESPEC.

Key:

- direction (symbol): One of :input (the default) or :output.

Returns:

- A {defclass bgzip-stream}.

BIN-CHUNK (BIN CHUNK-NUM)

Returns the chunk numbered CHUNK-NUM in BIN.

BIN-CHUNKS (INSTANCE)

BIN-NUM (INSTANCE)

CIGAR-LENGTH (ALN)

Returns the number of CIGAR operations in ALN.

COORDINATE-SORTED-P (HEADER)

Returns T if parsed HEADER indicates that the file is sorted by coordinate, or NIL otherwise.

COPY-ALIGNMENT-RECORD (ALN &KEY ALIGNMENT-FLAG (REFERENCE-ID (REFERENCE-ID ALN)) (ALIGNMENT-POS (ALIGNMENT-POSITION ALN)) (MATE-REFERENCE-ID (MATE-REFERENCE-ID ALN)) (MATE-ALIGNMENT-POS (MATE-ALIGNMENT-POSITION ALN)) (MAPPING-QUALITY (MAPPING-QUALITY ALN)) (ALIGNMENT-BIN (ALIGNMENT-BIN ALN)) (INSERT-LENGTH (INSERT-LENGTH ALN)) (CIGAR (ALIGNMENT-CIGAR ALN)) (TAG-VALUES (ALIGNMENT-TAG-VALUES ALN)))

Returns a copy of ALN, optionally setting some fields to new values.

EMPTY-BIN-P (BIN)

Returns T if BIN contains no chunks, or NIL otherwise.

ENSURE-MANDATORY-HEADER-TAGS (RECORD)

Returns RECORD if all mandatory tags are present, or raises a {define-condition malformed-record-error}.

ENSURE-VALID-HEADER-TAGS (RECORD)

Checks list RECORD for tag validity and returns RECORD, or raises a {define-condition malformed-record-error} if invalid tags are present. Ignores any user tags (lower case tags).

ENSURE-VALID-PROGRAMS (HEADER)

Returns HEADER if its PG records have unique ID tag values and valid PP tag values. A valid PP tag must point to the ID of one of the other PG records in the header.

FAILS-PLATFORM-QC-P (FLAG)

Returns T if FLAG indicates that the read failed platform quality control, or NIL otherwise.

FIND-BAM-INDEX (BAM-FILESPEC)

Returns the pathname of an index file for BAM-FILESPEC, if the file is present and conforms to the naming convention of either Picard or Samtools. If both are present, the Picard convention is favoured.

FIND-BINS (INDEX REFERENCE-ID START END)

Returns a list of all the bins for reference REFERENCE-ID in INDEX, between reference positions START and END.

FIRST-FRAG-P (FLAG)

Returns T if FLAG indicates that the read was the first in a pair of reads from one template, or NIL otherwise.

FIRST-IN-PAIR-P (FLAG)

Returns T if FLAG indicates that the read was the first in a pair of reads from one template, or NIL otherwise.

FLAG-BITS (FLAG &REST BIT-NAMES)

Returns integer FLAG with the BAM flag bits named by symbols BIT-NAMES set.

Arguments:

- flag (unsigned-byte 16): a BAM alignment flag.

Rest:

- bit-names (symbols): Any number of valid bit flag names:

;;; :sequenced-pair
;;; :mapped-proper-pair
;;; :query-mapped , :query-unmapped
;;; :mate-mapped , :mate-unmapped
;;; :query-forward , :query-reverse
;;; :mate-forward , :mate-reverse
;;; :first-in-pair , :second-in-pair
;;; :alignment-primary , :alignment-not-primary
;;; :fails-platform-qc
;;; :pcr/optical-duplicate

Returns:

- An (unsigned-byte 16)
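
For orientation, the following sketch shows how named bits map onto an integer flag, using the bit values defined in the SAM specification. It is not cl-sam's implementation: *FLAG-BIT-VALUES*, SET-FLAG-BITS and FLAG-BIT-SET-P are hypothetical names, and only the names that set a bit are covered (the complementary names such as :query-mapped, which would clear a bit, are left out).

```lisp
;; Hypothetical table and helpers, not part of cl-sam. Bit values are
;; those given for the FLAG field in the SAM specification.
(defparameter *flag-bit-values*
  '((:sequenced-pair . #x1) (:mapped-proper-pair . #x2)
    (:query-unmapped . #x4) (:mate-unmapped . #x8)
    (:query-reverse . #x10) (:mate-reverse . #x20)
    (:first-in-pair . #x40) (:second-in-pair . #x80)
    (:alignment-not-primary . #x100) (:fails-platform-qc . #x200)
    (:pcr/optical-duplicate . #x400)))

(defun set-flag-bits (flag &rest bit-names)
  "Returns FLAG with the bits named by BIT-NAMES switched on."
  (reduce #'logior bit-names
          :key (lambda (name)
                 (or (cdr (assoc name *flag-bit-values*))
                     (error "Unknown flag bit ~s" name)))
          :initial-value flag))

(defun flag-bit-set-p (flag bit-name)
  "Returns T if the bit named BIT-NAME is on in FLAG."
  (logtest flag (cdr (assoc bit-name *flag-bit-values*))))
```

Note that the quality-control and duplicate bits lie above the low eight bits, so a flag carrying them needs more than one octet.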

FLAGSTAT (BAM-FILESPEC &OPTIONAL (STREAM T))

Writes SAM flag counts from BAM-FILESPEC to STREAM.

FRAG-FORWARD-P (FLAG)

Returns T if FLAG indicates that the read was mapped to the forward strand of a reference, or NIL if it was mapped to the reverse strand.

FRAG-MAPPED-P (FLAG)

Returns T if FLAG indicates that the read was mapped to a reference, or NIL otherwise.

FRAG-REVERSE-P (FLAG)

Returns T if FLAG indicates that the read was mapped to the reverse strand of a reference, or NIL if it was mapped to the forward strand.

FRAG-UNMAPPED-P (FLAG)

Returns T if FLAG indicates that the read was not mapped to a reference, or NIL otherwise.

GENERATE-BAM-FILE (FILESPEC NUM-REFS REF-LENGTH READ-GROUPS &REST ALN-GENERATORS)

Writes a very uniform BAM file to the file denoted by pathname designator FILESPEC, for testing purposes.

Arguments:

- filespec (pathname designator): The BAM file to be written.
- num-refs (integer): The number of reference sequences in the BAM header.
- ref-length (integer): The length of each reference sequence.
- read-groups (list of string): The read group names in the BAM header. Each reference will have a set of reads generated in each read group.

Rest:

- aln-generators (list of function): A list of alignment generator functions created with {defun alignment-generator}. All of these functions will be called until exhausted, so they should represent finite sequences.

Returns:

- filespec.

GENERATE-REFERENCE-FILE (FILESPEC NAME LENGTH &OPTIONAL (FN (LAMBDA () (CHAR "acgt" (RANDOM 4)))))

Creates an artificial reference genome Fasta file to accompany a generated BAM file.

Arguments:

- filespec (pathname designator): The file to create.
- name (string): The name of the reference sequence.
- length (integer): The length of the reference sequence.

Optional:

- fn (function): A function that returns a single base character. The default randomly returns a, c, g or t in equal proportions.

Returns:

- The filespec.

HD-RECORD (&KEY (VERSION *SAM-VERSION*) (SORT-ORDER :UNSORTED))

Returns a new HD record.

HEADER-RECORDS (HEADER HEADER-TYPE)

Returns a list of all records of HEADER-TYPE from HEADER.

HEADER-TAGS (RECORD)

Returns an alist of the tag-values of RECORD.

HEADER-TYPE (RECORD)

Returns a symbol indicating the header-type of RECORD, being one of :HD, :SQ, :RG or :PG.

HEADER-VALUE (RECORD TAG)

Returns the value associated with TAG in RECORD.

INDEX-BAM-FILE (FILESPEC)

Returns a new BAM-INDEX object, given pathname designator FILESPEC.

INSERT-LENGTH (ALN)

Returns the insert length described by ALN.

LAST-FRAG-P (FLAG)

Returns T if FLAG indicates that the read was the second in a pair of reads from one template, or NIL otherwise.

LAST-PROGRAMS (HEADER)

Returns a list of the PG ID values from HEADER that are the last programs to act on the data, i.e. the leaf programs in the previous program tree.
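
One way to picture the leaf programs: they are the PG records whose ID is not named by the PP tag of any other PG record. The sketch below works on a parsed header in the list-of-records form, with each record being a type keyword followed by an alist of tags. LEAF-PROGRAM-IDS is a hypothetical name, not the cl-sam function, and the header is assumed to be well formed.

```lisp
;; Hypothetical helper, not part of cl-sam. HEADER is assumed to be a list
;; of records such as (:PG (:ID . "bwa") (:PP . "bowtie")).
(defun leaf-program-ids (header)
  "Returns the IDs of PG records that no other PG record names in its PP tag."
  (let* ((pg-records (remove-if-not (lambda (record)
                                      (eq :pg (first record)))
                                    header))
         (ids (mapcar (lambda (record) (cdr (assoc :id (rest record))))
                      pg-records))
         ;; PP values; NIL for records without a previous program.
         (previous (mapcar (lambda (record) (cdr (assoc :pp (rest record))))
                           pg-records)))
    (remove-if (lambda (id) (member id previous :test #'equal)) ids)))
```

With records a, b (PP a) and c (PP a), the leaves are b and c, since a is named as a previous program by both.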

MAKE-ALIGNMENT-RECORD (READ-NAME SEQ-STR ALIGNMENT-FLAG &KEY (REFERENCE-ID +UNKNOWN-REFERENCE+) (ALIGNMENT-POS +UNKNOWN-POSITION+) (MATE-REFERENCE-ID +UNKNOWN-REFERENCE+) (MATE-ALIGNMENT-POS +UNKNOWN-POSITION+) (MAPPING-QUALITY 0) (ALIGNMENT-BIN 0) (INSERT-LENGTH 0) CIGAR QUALITY-STR TAG-VALUES)

Returns a new alignment record array.

Arguments:

- read-name (string): The read name.
- seq-str (string): The read sequence.
- alignment-flag (integer): The binary alignment flag.

Key:

- reference-id (integer): The reference identifier, defaults to -1.
- alignment-pos (integer): The 1-based alignment position, defaults to -1.
- mate-reference-id (integer): The reference identifier of the mate, defaults to -1.
- mate-alignment-pos (integer): The 1-based alignment position of the mate.
- mapping-quality (integer): The mapping quality, defaults to 0.
- alignment-bin (integer): The alignment bin, defaults to 0.
- insert-length (integer): The insert size, defaults to 0.
- cigar (alist): The cigar represented as an alist of operations e.g.

;;; '((:M . 9) (:I . 1) (:M . 25))

- quality-str (string): The read quality string.
- tag-values (alist): The alignment tags represented as an alist e.g.

;;; '((:XT . #\U) (:NM . 1) (:X0 . 1) (:X1 . 0)
;;;   (:XM . 1) (:XO . 0) (:XG . 0) (:MD . "3T31"))

The tags must have been defined with {defmacro define-alignment-tag}.

Returns:

- A vector of '(unsigned-byte 8).

MAKE-BAM-OUTPUT (BAM)

Returns a consumer function that accepts an argument of a BAM record and writes it to BAM output stream BAM. The standard consumer interface function CONSUME may be used in operations on the returned consumer.

MAKE-HEADER-RECORD (STR)

Parses a single SAM header record STR and returns a list. May raise a {define-condition malformed-record-error} or {define-condition malformed-field-error}. SAM tags are converted to keyword symbols. Given a header record of

;;; "@SQ SN:AL096846 LN:6490 SP:Schizosaccharomyces pombe"

the returned list will be

;;; (:SQ (:SN . "AL096846") (:LN . 6490)
;;;      (:SP . "Schizosaccharomyces pombe"))

thus the list's first element is a keyword describing the record type and the rest of the list is itself an alist of record keys and values.
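
The shape of this parse can be approximated in a few lines. The sketch below is not the cl-sam parser: SPLIT-ON and PARSE-HEADER-RECORD are hypothetical names, fields are assumed to be tab-separated as in the SAM format, all values are kept as strings (so LN stays "6490" rather than becoming an integer), and plain ERROR is signalled instead of the library's own conditions.

```lisp
;; Hypothetical helpers, not part of cl-sam.
(defun split-on (char str)
  "Splits STR at each occurrence of CHAR, returning a list of substrings."
  (loop
     for start = 0 then (1+ end)
     for end = (position char str :start start)
     collect (subseq str start end)
     while end))

(defun parse-header-record (str)
  "Parses one tab-separated header line into (TYPE (TAG . VALUE) ...)."
  (let* ((fields (split-on #\Tab str))
         (type (first fields)))
    (unless (and (> (length type) 1) (char= #\@ (char type 0)))
      (error "Malformed header record ~s" str))
    (cons (intern (subseq type 1) :keyword)
          (loop
             for field in (rest fields)
             for colon = (position #\: field)
             unless colon do (error "Malformed field ~s" field)
             ;; Split at the first colon only; values may contain colons.
             collect (cons (intern (subseq field 0 colon) :keyword)
                           (subseq field (1+ colon)))))))
```

Splitting each field at its first colon keeps values such as command lines intact even when they contain further colons.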

MAKE-REFERENCE-TABLE (REF-META-LIST)

Returns a hash-table mapping reference identifiers to reference names for the reference data in REF-META-LIST.

MAKE-SAM-HEADER (STR)

Returns a list containing the data in header STR as Lisp objects. If the header is NIL (there was no header) this function returns NIL. Given a header of

;;; "@HD VN:1.0
;;; @SQ SN:AL096846 LN:6490 SP:Schizosaccharomyces pombe
;;; @PG ID:bwa VN:0.4.6 CL:aln -n 0.04 -o 1 -e -1 -i 5 -d 10 -k 2 -M 3 -O 11 -E 4"

the returned list will be

;;; ((:HD (:VN . "1.0"))
;;;  (:SQ (:SN . "AL096846") (:LN . 6490)
;;;       (:SP . "Schizosaccharomyces pombe"))
;;;  (:PG (:ID . "bwa") (:VN . "0.4.6")
;;;       (:CL . "aln -n 0.04 -o 1 -e -1 -i 5 -d 10 -k 2 -M 3 -O 11 -E 4")))

thus each list element is a list whose first element is a keyword describing the record type. The rest of each list is itself an alist of record keys and values.

MAPPED-PROPER-PAIR-P (FLAG)

Returns T if FLAG indicates that the read was mapped as a member of a properly oriented read-pair, or NIL otherwise.

MAPPING-QUALITY (ALN)

Returns the integer mapping quality of ALN.

MATE-ALIGNMENT-POSITION (ALN)

Returns the 0-based sequence position of the read mate's alignment described by ALN.

MATE-FORWARD-P (FLAG)

Returns T if FLAG indicates that the read's mate was mapped to the forward, or NIL if it was mapped to the reverse strand.

MATE-MAPPED-P (FLAG)

Returns T if FLAG indicates that the read's mate was mapped to a reference, or NIL otherwise.

MATE-REFERENCE-ID (ALN)

Returns the integer reference ID of the mate of the read described by ALN.

MATE-REVERSE-P (FLAG)

Returns T if FLAG indicates that the read's mate was mapped to the reverse, or NIL if it was mapped to the forward strand.

MATE-UNMAPPED-P (FLAG)

Returns T if FLAG indicates that the read's mate was not mapped to a reference, or NIL otherwise.

MEDIAL-FRAG-P (FLAG)

Returns T if FLAG indicates that the read was sequenced as a non-terminal part of a linear template, or NIL otherwise.

MERGE-HEADER-RECORDS (RECORD1 RECORD2)

Returns a new header record created by merging the tags of header-records RECORD1 and RECORD2. Records may be safely merged if they have the same header-type and do not contain any conflicting tag values.
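
The merge rule can be sketched as follows, using records in the parsed form of a type keyword followed by an alist of tags. MERGE-RECORDS is a hypothetical stand-in, not the cl-sam function, and it signals a plain ERROR on a type mismatch or a conflicting tag value.

```lisp
;; Hypothetical helper, not part of cl-sam. Records are assumed to look
;; like (:SQ (:SN . "AL096846") (:LN . 6490)).
(defun merge-records (record1 record2)
  "Returns a new record holding the tags of both RECORD1 and RECORD2."
  (unless (eq (first record1) (first record2))
    (error "Cannot merge records of types ~s and ~s"
           (first record1) (first record2)))
  (let ((merged (copy-alist (rest record1))))
    (loop
       for (tag . value) in (rest record2)
       for existing = (assoc tag merged)
       do (cond ((null existing)
                 ;; New tag: append, preserving the original tag order.
                 (setf merged (append merged (list (cons tag value)))))
                ((not (equal value (cdr existing)))
                 (error "Conflicting values for tag ~s: ~s and ~s"
                        tag (cdr existing) value))))
    (cons (first record1) merged)))
```

Tags present in both records with equal values are kept once, so merging a record with itself returns an equal record.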

MERGE-SAM-HEADERS (&REST HEADERS)

Returns a new SAM header that is the result of merging HEADERS. Headers may be safely merged if none of their constituent records contain conflicting tag values once merged.

MULTIPLE-FRAGS-P (FLAG)

Returns T if FLAG indicates that the read was sequenced as a member of a pair, or NIL otherwise.

NAME-SORTED-P (HEADER)

Returns T if parsed HEADER indicates that the file is sorted by name, or NIL otherwise.

NEXT-REVERSE-P (FLAG)

Returns T if FLAG indicates that the read's mate was mapped to the reverse, or NIL if it was mapped to the forward strand.

NEXT-UNMAPPED-P (FLAG)

Returns T if FLAG indicates that the read's mate was not mapped to a reference, or NIL otherwise.

PARSE-REGION-STRING (STR)

Returns a new region from a string region designator in samtools-style format:

<reference name>:<start coordinate>-<end coordinate>

The reference name must be one of those described in the BAM header metadata. The reference sequence coordinates are zero-based, half open with start < end.
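
A region designator of this form can be taken apart with standard string functions. The sketch below is illustrative only: PARSE-REGION is a hypothetical name, it returns three values instead of a region object, and it does not check the reference name against any BAM header.

```lisp
;; Hypothetical helper, not part of cl-sam.
(defun parse-region (str)
  "Returns the reference name, start and end parsed from STR as three values."
  ;; Search for the colon from the end, in case the name contains colons.
  (let* ((colon (position #\: str :from-end t))
         (dash (and colon (position #\- str :start (1+ colon)))))
    (unless (and colon dash)
      (error "Malformed region string ~s" str))
    (let ((name (subseq str 0 colon))
          (start (parse-integer str :start (1+ colon) :end dash))
          (end (parse-integer str :start (1+ dash))))
      ;; Zero-based, half-open coordinates require 0 <= start < end.
      (unless (and (plusp (length name)) (<= 0 start) (< start end))
        (error "Invalid region ~s" str))
      (values name start end))))
```

For example, (parse-region "chr1:100-200") yields the name "chr1" with start 100 and end 200, while a string whose start is not less than its end signals an error.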

PCR/OPTICAL-DUPLICATE-P (FLAG)

Returns T if FLAG indicates that the read is a PCR or optical duplicate, or NIL otherwise.

PG-RECORD (IDENTITY &KEY PROGRAM-NAME PROGRAM-VERSION PREVIOUS-PROGRAM COMMAND-LINE)

Returns a new PG record.

PREVIOUS-PROGRAMS (HEADER IDENTITY)

Returns a list of PG ID values from HEADER that are previous programs with respect to PG ID IDENTITY. The list is ordered with the most recently used program first i.e. reverse chronological order.

PROPER-ALIGNED-FRAGS-P (FLAG)

Returns T if FLAG indicates that the read was mapped as a member of a properly oriented read-pair, or NIL otherwise.

QUALITY-STRING (ALN)

Returns the sequence quality string described by ALN.

QUERY-FORWARD-P (FLAG)

Returns T if FLAG indicates that the read was mapped to the forward strand of a reference, or NIL if it was mapped to the reverse strand.

QUERY-MAPPED-P (FLAG)

Returns T if FLAG indicates that the read was mapped to a reference, or NIL otherwise.

QUERY-REVERSE-P (FLAG)

Returns T if FLAG indicates that the read was mapped to the reverse strand of a reference, or NIL if it was mapped to the forward strand.

QUERY-UNMAPPED-P (FLAG)

Returns T if FLAG indicates that the read was not mapped to a reference, or NIL otherwise.

READ-ALIGNMENT (BGZF &KEY VALIDATE)

Reads one alignment block from handle BGZF and returns it as a Lisp array of unsigned-byte 8. The handle is advanced to the next alignment. If no more alignments are available, returns NIL. This is the preferred function to use for validation, i.e. validate early. A USE-VALUE restart is provided so that an invalid alignment may be modified and re-read without unwinding the stack.

READ-BAM-HEADER (BGZF)

Returns the unparsed BAM header from the handle BGZF as a Lisp string.

READ-BAM-INDEX (STREAM)

Reads a BAM (.bai) index from STREAM.

READ-BAM-MAGIC (BGZF)

Reads the BAM magic number from the handle BGZF and returns T if it is valid or raises a {define-condition malformed-file-error} if not.

READ-BAM-META (BGZF)

Reads all BAM metadata from handle BGZF, leaving the handle pointing at the first alignment record. Returns the header string, the number of references and a list of reference sequence metadata. The list contains one element per reference sequence, each element being a list of reference identifier, reference name and reference length.

READ-BAM-TERMINATOR (BGZF)

Reads the EOF from handle BGZF, returning T if it is present, or raising a {define-condition malformed-file-error} otherwise.

READ-LENGTH (ALN)

Returns the length of the read described by ALN.

READ-NAME (ALN)

Returns the read name string described by ALN.

READ-NAME-LENGTH (ALN)

Returns the length in ASCII characters of the read name of ALN.

READ-NUM-REFERENCES (BGZF)

Returns the number of reference sequences described in the BAM header of handle BGZF.

READ-REFERENCE-META (BGZF)

Returns the reference sequence metadata for a single reference sequence described in the BAM header of handle BGZF. Two values are returned: the reference name as a string and the reference length as an integer.

REF-INDEX (BAM-INDEX REFERENCE-ID)

Returns the reference index for reference number REFERENCE-ID in BAM-INDEX.

REF-INDEX-BIN (REF-INDEX BIN-NUM)

Returns the bin numbered BIN-NUM in REF-INDEX.

REF-INDEX-BINS (INSTANCE)

REF-INDEX-INTERVALS (INSTANCE)

REFERENCE-ID (ALN)

Returns the reference sequence identifier of ALN. This is an integer locally assigned to a reference sequence within the context of a BAM file.

REPAIR-MAPPING-FLAGS (ALN)

Returns alignment ALN if it has correct mapped-proper-pair, query-mapped and mate-mapped flags, or a modified copy with fixed flags. Also sets the user tag ZF:I:<original flag> if the flags are changed.

Alignments produced by the BWA aligner have been observed to contain invalid flags. This is caused by BWA creating mappings that overhang the end of the reference. The underlying cause is reference concatenation in the Burrows-Wheeler index.

This function attempts to fix invalid mapped-proper-pair flags in cases where query-unmapped and/or mate-unmapped flags are set. Such unmapped reads may also have reference-ids set and good mapping scores. This function does not correct these other fields.

It is necessary to use this function in order to obtain correct flag counts when using samtools flagstat or {defun flagstat}.

RG-RECORD (IDENTITY SAMPLE &KEY LIBRARY DESCRIPTION (PLATFORM-UNIT LANE) (INSERT-SIZE 0) SEQUENCING-CENTRE SEQUENCING-DATE PLATFORM-TECH FLOW-ORDER KEY-SEQUENCE)

Returns a new RG record.

SECOND-IN-PAIR-P (FLAG)

Returns T if FLAG indicates that the read was the second in a pair of reads from one template, or NIL otherwise.

SEQ-STRING (ALN)

Returns the sequence string described by ALN.

SEQUENCED-PAIR-P (FLAG)

Returns T if FLAG indicates that the read was sequenced as a member of a pair, or NIL otherwise.

SORT-BAM-FILE (IN-FILESPEC OUT-FILESPEC &KEY (SORT-ORDER :COORDINATE) (BUFFER-SIZE 1000000))

Sorts a BAM file by coordinate or by read name.

Arguments:

- in-filespec (pathname designator): The input BAM file.
- out-filespec (pathname designator): The output BAM file.

Key:

- sort-order (symbol): The sort order, either :coordinate or :queryname.
- buffer-size (fixnum): The maximum number of reads to sort in memory at any one time, defaulting to 1000000.

Returns:

- The number of alignments sorted.
- The number of files used in the external merge sort.

SQ-RECORD (SEQ-NAME SEQ-LENGTH &KEY ASSEMBLY-IDENTITY SEQ-MD5 SEQ-URI SEQ-SPECIES)

Returns a new SQ record.

SUBST-GROUP-ORDER (HEADER ORDER)

Returns a copy of HEADER with any group order tag initially present substituted by symbol ORDER, which must be one of the valid SAM sort orders. If there is no HD record in HEADER, one is added.

SUBST-SAM-VERSION (HEADER &OPTIONAL (VERSION *SAM-VERSION*))

Returns a copy of HEADER with any header version tag initially present substituted by string VERSION, defaulting to *SAM-VERSION*.

SUBST-SORT-ORDER (HEADER ORDER)

Returns a copy of HEADER with any sort order tag initially present substituted by symbol ORDER, which must be one of the valid SAM sort orders. If there is no HD record in HEADER, one is added.

UNKNOWN-FRAG-P (FLAG)

Returns T if FLAG indicates that the read was sequenced as part of a non-linear template or as part of a linear template where the position has been lost, or NIL otherwise.

VALID-FLAG-P (FLAG)

Returns T if the paired-read-specific bits of FLAG are internally consistent.

VALID-MAPPED-PAIR-P (FLAG)

Returns T if FLAG indicates valid mapping states for a pair of mapped reads, that is both must be mapped, or NIL otherwise.

VALID-MAPPED-PROPER-PAIR-P (FLAG)

Returns T if FLAG indicates valid proper mapping states for a pair of mapped reads, that is both must be mapped and on opposite strands, or NIL otherwise.

VALID-REFERENCE-NAME-P (STR)

Returns T if STR is a valid reference sequence name matching the regex [!-)+-<>-~][!-~]* , or NIL otherwise.

VALID-SAM-VERSION-P (STR)

Returns T if SAM version string STR matches /^[0-9]+\.[0-9]$/, or NIL otherwise.
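
The same check can be written without a regular expression library. This sketch assumes the dot in the pattern is meant as a literal full stop, so a valid string is one or more digits, a dot, then exactly one digit. SAM-VERSION-STRING-P is a hypothetical name, not the cl-sam function.

```lisp
;; Hypothetical helper, not part of cl-sam. Accepts one or more digits,
;; a literal dot, then exactly one digit.
(defun sam-version-string-p (str)
  (let ((dot (position #\. str)))
    (and dot
         (plusp dot)                        ; at least one leading digit
         (= (length str) (+ dot 2))         ; exactly one character after dot
         (every #'digit-char-p (subseq str 0 dot))
         (digit-char-p (char str (1+ dot)))
         t)))                               ; normalise the result to T
```

Under this reading "1.0" and "10.3" are accepted, while "1.10", ".3" and "1" are rejected.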

VIEW-SAM (BAM-FILESPEC SAM-FILESPEC)

Writes the contents of binary BAM file BAM-FILESPEC to SAM text file SAM-FILESPEC.

WRITE-ALIGNMENT (BGZF ALIGNMENT-RECORD)

Writes one ALIGNMENT-RECORD to handle BGZF and returns the number of bytes written.

WRITE-BAM-HEADER (BGZF HEADER &KEY (COMPRESS T) (NULL-PADDING 0) (MTIME 0))

Writes the BAM header string HEADER to handle BGZF, followed by padding of NULL-PADDING null bytes. This function also writes the header length, including padding, in the 4 bytes preceding the header.
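
The length prefix can be pictured as a 4-byte little-endian integer, which is how the BAM format stores its integers. The sketch below is an illustration only: both function names are hypothetical, and the header is assumed to contain single-byte characters so that its length in characters equals its length in bytes.

```lisp
;; Hypothetical helpers, not part of cl-sam.
(defun encode-int32-le (n)
  "Returns N as a vector of 4 octets, least significant byte first."
  (let ((bytes (make-array 4 :element-type '(unsigned-byte 8))))
    (dotimes (i 4 bytes)
      (setf (aref bytes i) (ldb (byte 8 (* 8 i)) n)))))

(defun header-length-bytes (header null-padding)
  "Encodes the header length, including NULL-PADDING trailing null bytes."
  (encode-int32-le (+ (length header) null-padding)))
```

Counting the padding in the stored length is what lets a reader skip the slack space and lets an edited header grow in place without moving the data that follows.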

WRITE-BAM-MAGIC (BGZF &KEY (COMPRESS T))

Writes the BAM magic number to the handle BGZF.

WRITE-BAM-META (BGZF HEADER NUM-REFS REF-META &KEY (COMPRESS T) (NULL-PADDING 0))

Writes BAM magic number and then all metadata to handle BGZF. The metadata consist of the HEADER string, number of reference sequences NUM-REFS and a list of reference sequence metadata REF-META. The list contains one element per reference sequence, each element being a list of reference identifier, reference name and reference length. Returns the number of bytes written.

Key:

- compress (boolean): Compress the header block if T.
- null-padding (fixnum): A number of null bytes to be appended to the end of the header string, as allowed by the SAM spec. This is useful for creating slack space so that BAM headers may be edited in place.

WRITE-HEADER-RECORD (HEADER-RECORD &OPTIONAL (STREAM T))

Writes alist HEADER-RECORD to STREAM.

WRITE-NUM-REFERENCES (BGZF N)

Writes the number of reference sequences N to handle BGZF.

WRITE-REFERENCE-META (BGZF REF-NAME REF-LENGTH)

Writes the metadata for a single reference sequence named REF-NAME, of length REF-LENGTH bases, to handle BGZF.

WRITE-SAM-ALIGNMENT (ALIGNMENT-RECORD REF-TABLE &OPTIONAL (STREAM T))

Writes ALIGNMENT-RECORD to STREAM as SAM. REF-TABLE is a hash-table created by {defun make-reference-table}.

WRITE-SAM-HEADER (ALIST &OPTIONAL (STREAM T))

Writes SAM header ALIST as a string to STREAM.

Undocumented

(SETF BAM-INDEX-REFS) (NEW-VALUE INSTANCE)

BAM-SORT-ERROR (PREVIOUS-REF PREVIOUS-POS REF POS &OPTIONAL MESSAGE &REST MESSAGE-ARGUMENTS)

BGZF-OPEN-P (BGZF)

(SETF BIN-CHUNKS) (NEW-VALUE INSTANCE)

CHUNK-END (INSTANCE)

(SETF CHUNK-END) (NEW-VALUE INSTANCE)

CHUNK-START (INSTANCE)

(SETF CHUNK-START) (NEW-VALUE INSTANCE)

(SETF REF-INDEX-BINS) (NEW-VALUE INSTANCE)

(SETF REF-INDEX-INTERVALS) (NEW-VALUE INSTANCE)

Private

ADJACENTP (CHUNK1 CHUNK2)

Returns T if BAM index CHUNK1 and CHUNK2 are adjacent. In this context this means that they are sequential in a file stream and close enough together for it to be more efficient to continue reading across the intervening bytes, rather than seeking.

ALIGNMENT-INDICES (ALN)

Returns 7 integer values which are byte-offsets within ALN at which the various core data lie. See the SAM spec.

ALIGNMENT-PRIMARY-P (FLAG)

Returns T if FLAG indicates that the read mapping was the primary mapping to a reference, or NIL otherwise.

ALIGNMENT-READ-LENGTH (ALN)

Returns the length of the alignment on the read.

ALIGNMENT-REFERENCE-LENGTH (ALN)

Returns the length of the alignment on the reference.

ALIGNMENT-TAG-BYTES (VALUE)

Returns the number of bytes required to encode VALUE.

ALN-BIN (INSTANCE)

ALN-CIGAR (INSTANCE)

ALN-FLAG (INSTANCE)

ALN-INSERT-LENGTH (INSTANCE)

ALN-MAP-QUAL (INSTANCE)

ALN-MATE-POS (INSTANCE)

ALN-MATE-REF-ID (INSTANCE)

ALN-POS (INSTANCE)

ALN-QUAL-STR (INSTANCE)

ALN-READ-NAME (INSTANCE)

ALN-RECORD (INSTANCE)

ALN-REF-ID (INSTANCE)

ALN-REF-LEN (INSTANCE)

ALN-SEQ-STR (INSTANCE)

BAM-META-SIZE (FILESPEC)

Returns the number of bytes occupied by the metadata in the BAM file FILESPEC. Metadata is defined here as the magic number, header and header padding, reference sequence count and reference sequence names and lengths.

BGZ-MEMBER-BSIZE (INSTANCE)

BGZ-MEMBER-CDATA (INSTANCE)

BGZ-MEMBER-CEND (INSTANCE)

BGZ-MEMBER-CM (INSTANCE)

BGZ-MEMBER-CRC32 (INSTANCE)

BGZ-MEMBER-FLG (INSTANCE)

BGZ-MEMBER-ID1 (INSTANCE)

BGZ-MEMBER-ID2 (INSTANCE)

BGZ-MEMBER-ISIZE (INSTANCE)

BGZ-MEMBER-MTIME (INSTANCE)

BGZ-MEMBER-OS (INSTANCE)

BGZ-MEMBER-SF1 (INSTANCE)

BGZ-MEMBER-SF2 (INSTANCE)

BGZ-MEMBER-SLEN (INSTANCE)

BGZ-MEMBER-UDATA (INSTANCE)

BGZ-MEMBER-XFL (INSTANCE)

BGZ-MEMBER-XLEN (INSTANCE)

BGZF-BUFFER (INSTANCE)

BGZF-COFFSET (POSITION)

Returns the compressed offset (most significant bits) component of BGZF virtual offset POSITION.

BGZF-COMPRESSION (INSTANCE)

BGZF-EMPTY-P (BGZF)

Returns T if all decompressed bytes have been read from the current decompressed BGZ block.

BGZF-EOF (INSTANCE)

BGZF-LOAD-SEEK (INSTANCE)

BGZF-LOADED-P (INSTANCE)

BGZF-OFFSET (INSTANCE)

BGZF-PATHNAME (INSTANCE)

BGZF-POINTER (INSTANCE)

BGZF-POSITION (INSTANCE)

BGZF-STREAM (INSTANCE)

BGZF-UOFFSET (POSITION)

Returns the uncompressed offset (least significant bits) component of BGZF virtual offset POSITION.
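The split between the two components can be sketched as follows. This is a minimal illustration, assuming the usual BGZF layout of a 48-bit compressed offset above a 16-bit within-block offset; the SKETCH- function names are hypothetical and are not part of cl-sam.

```lisp
;; Hypothetical helpers illustrating the BGZF virtual offset layout.
;; Assumption: the low 16 bits hold the within-block (uncompressed) offset
;; and the remaining high bits hold the compressed file offset.

(defun sketch-make-voffset (coffset uoffset)
  "Packs COFFSET and UOFFSET into a single virtual offset integer."
  (logior (ash coffset 16) (logand uoffset #xffff)))

(defun sketch-coffset (voffset)
  "Returns the compressed (most significant bits) component."
  (ash voffset -16))

(defun sketch-uoffset (voffset)
  "Returns the uncompressed (least significant 16 bits) component."
  (logand voffset #xffff))
```

Packing and then unpacking should round-trip, e.g. (sketch-coffset (sketch-make-voffset 1000 42)) gives back 1000.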

BGZF-UTIL-BUFFER (INSTANCE)

DECODE-CIGAR (ALN INDEX NUM-BYTES)

Returns an alist of CIGAR operations from NUM-BYTES bytes within ALN, starting at INDEX. The decoding of the = and X operations is not documented in the spec.
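As a rough illustration of the decoding, the sketch below reads little-endian 32-bit words in which the low 4 bits select the operation and the remaining bits hold its length, as in the BAM layout. The function name and the (operation . length) representation using characters are assumptions made for this example; cl-sam's own alist may use a different representation.

```lisp
;; Hypothetical decoder for BAM-style CIGAR words. Each operation is a
;; little-endian uint32: (length << 4) | op-code, with op-codes indexing
;; into "MIDNSHP=X". An op-code above 8 signals an error via CHAR.

(defun sketch-decode-cigar (octets index num-bytes)
  "Returns a list of (op-char . length) conses decoded from OCTETS."
  (loop
     for i from index below (+ index num-bytes) by 4
     for word = (logior (aref octets i)
                        (ash (aref octets (+ i 1)) 8)
                        (ash (aref octets (+ i 2)) 16)
                        (ash (aref octets (+ i 3)) 24))
     collect (cons (char "MIDNSHP=X" (ldb (byte 4 0) word))
                   (ash word -4))))
```

For instance, the eight bytes 160 0 0 0 33 0 0 0 decode to a 10-base match followed by a 2-base insertion.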

DECODE-QUALITY-STRING (ALN INDEX NUM-BYTES)

Returns a string containing the alignment quality values, of length NUM-BYTES. The quality data must be present in ALN at INDEX. The SAM spec states that quality data are optional, with absence indicated by 0xff. If the first byte of quality data is 0xff, NIL is returned.
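The 0xff convention can be sketched like this. The function name is hypothetical, and the conversion to text assumes the Phred+33 encoding used by SAM text; whether cl-sam returns exactly this string form is an assumption.

```lisp
;; Hypothetical sketch: decode raw BAM quality bytes into a Phred+33 string,
;; returning NIL when the first byte is the 0xff "absent" marker.

(defun sketch-decode-quality (octets index num-bytes)
  (when (and (plusp num-bytes)
             (/= #xff (aref octets index)))
    (map 'string
         (lambda (q) (code-char (+ q 33)))
         (subseq octets index (+ index num-bytes)))))
```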

DECODE-READ-NAME (ALN INDEX NUM-BYTES)

Returns a string containing the template/read name of length NUM-BYTES, encoded at byte INDEX in ALN.

DECODE-SEQ-STRING (ALN INDEX NUM-BYTES)

Returns a string containing the alignment query sequence of length NUM-BYTES. The sequence must be present in ALN at INDEX.
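For orientation, BAM packs two bases per byte, high nibble first, using the code table "=ACMGRSVTWYHKDBN". The sketch below shows that unpacking. It is a hypothetical helper that takes a count of bases rather than bytes, which may differ from how DECODE-SEQ-STRING interprets its arguments.

```lisp
;; Hypothetical sketch of BAM 4-bit sequence unpacking. NUM-BASES is the
;; number of bases to decode (an assumption of this example).

(defun sketch-decode-seq (octets index num-bases)
  (let ((seq (make-string num-bases)))
    (dotimes (i num-bases seq)
      (let* ((octet (aref octets (+ index (floor i 2))))
             ;; Even-numbered bases live in the high nibble, odd in the low.
             (code (if (evenp i)
                       (ldb (byte 4 4) octet)
                       (ldb (byte 4 0) octet))))
        (setf (char seq i) (char "=ACMGRSVTWYHKDBN" code))))))
```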

DECODE-TAG-VALUES (ALN INDEX)

Returns a list of auxiliary data from ALN at INDEX. The BAM two-letter data keys are transformed to Lisp keywords.

DEFAULT-ORPHANIZER-TEST

Returns a new orphanizer test that causes alternate last fragments to be omitted.

ENCODE-CHAR-TAG (VALUE ALN INDEX)

Returns ALN having encoded character VALUE into it, starting at INDEX.

ENCODE-CIGAR (CIGAR ALN INDEX)

Returns ALN having encoded alist CIGAR into it, starting at INDEX.

ENCODE-FLOAT-TAG (VALUE ALN INDEX)

Returns ALN having encoded float VALUE into it, starting at INDEX.

ENCODE-HEX-TAG (VALUE ALN INDEX)

Returns ALN having encoded hex string VALUE into it, starting at INDEX.

ENCODE-INT-TAG (VALUE ALN INDEX)

Returns ALN having encoded integer VALUE into it, starting at INDEX. BAM format is permitted to use more compact integer storage where possible.
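One way to picture the compact storage is as a choice of the narrowest BAM integer type code that can hold the value. The sketch below is an assumption about how such a choice could be made (preferring unsigned types for non-negative values); it is not taken from cl-sam's source and the function name is hypothetical.

```lisp
;; Hypothetical sketch: pick the narrowest BAM integer type code for VALUE.
;; BAM type codes: c int8, C uint8, s int16, S uint16, i int32, I uint32.

(defun sketch-int-tag-type (value)
  (cond ((<= 0 value #xff) #\C)
        ((<= -128 value 127) #\c)
        ((<= 0 value #xffff) #\S)
        ((<= -32768 value 32767) #\s)
        ((<= 0 value #xffffffff) #\I)
        ((<= (- (expt 2 31)) value (1- (expt 2 31))) #\i)
        (t (error "~d does not fit in 32 bits" value))))
```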

ENCODE-PHRED-QUALITY (Q)

Returns the character encoding Phred quality Q.
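A minimal sketch of such an encoding, assuming the Phred+33 (Sanger) convention used in SAM text and clamping at the highest printable character. Both the clamping and the function names are assumptions of this example rather than documented cl-sam behaviour.

```lisp
;; Hypothetical Phred+33 helpers. Q = 0 maps to #\! and Q = 93 to #\~.

(defun sketch-encode-phred (q)
  "Returns the character for Phred quality Q, clamped to the range 0-93."
  (code-char (+ 33 (max 0 (min q 93)))))

(defun sketch-decode-phred (ch)
  "Returns the Phred quality encoded by character CH."
  (- (char-code ch) 33))
```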

ENCODE-QUALITY-STRING (STR ALN INDEX)

Returns ALN having encoded quality string STR into it, starting at INDEX.

ENCODE-READ-NAME (READ-NAME ALN INDEX)

Returns ALN having encoded READ-NAME into it, starting at INDEX.

ENCODE-SEQ-STRING (STR ALN INDEX)

Returns ALN having encoded STR into it, starting at INDEX.

ENCODE-STRING-TAG (VALUE ALN INDEX)

Returns ALN having encoded string VALUE into it, starting at INDEX.

ENCODE-UNKNOWN-QUALITY (N ALN INDEX)

Returns ALN having encoded N unknown qualities into it, starting at INDEX.

ENSURE-ORDER (HEADER DOMAIN)

Returns a copy of HEADER that is guaranteed to contain a sort tag for DOMAIN, which must be one of :sort or :group .

ENSURE-SAM-VERSION (HEADER &OPTIONAL (VERSION *SAM-VERSION*))

Returns a copy of HEADER that is guaranteed to contain a version tag.

ENSURE-UNIQUE-TAG-VALUES (HEADER RECORD-TYPE TAG)

Returns HEADER if all records of RECORD-TYPE have unique TAG values, with respect to each other, or raises a {define-condition malformed-field-error} .

ENSURE-VALID-REFERENCE-NAME (STR)

Returns STR if it is a valid reference sequence name, or raises a {define-condition malformed-field-error} if not.

ENSURE-VALID-REGION (REGION)

Returns REGION if it is a valid region designator, or raises an {define-condition invalid-argument-error}. A region designator may be any of the following:

- A list of a string and two integers, being a reference name, start position and end position, respectively.
- A list of three integers, being a reference identifier, start position and end position, respectively.
- A region struct.

The start position must be less than, or equal to the end position.

FIND-DUPLICATE-HEADER-TAGS (RECORD)

Returns a list of duplicate SAM header tags found in RECORD. Each list element contains the tag, followed by a list of the conflicting values.

FIND-DUPLICATE-RECORDS (HEADER HEADER-TYPE)

Returns a list of any duplicate records of HEADER-TYPE in HEADER.

FLAG-SYMBOL (FLAG)

Returns a 3 character string symbolising the read pairing denoted by FLAG. The left character indicates the query strand, the middle the alignment pairing and the right the mate strand.

Query mapped forward  >
Query mapped reverse  <
Query unmapped        .

Not paired            .
Mapped proper pair    =
Mapped pair           -
Singleton             ~

Mate mapped forward   >
Mate mapped reverse   <
Mate unmapped         .

FLAG-VALIDATION-ERROR (FLAG MESSAGE &OPTIONAL ALN)

Raises a {define-condition malformed-field-error} for alignment FLAG in ALN, with MESSAGE.

GROUP-BY-TAG (RECORDS HEADER-TAG)

Returns a hash-table of header-records taken from list RECORDS. The hash-table keys are HEADER-TAG values taken from the records and the hash-table values are lists of header-records that share that HEADER-TAG value.

HEADER-RECORD (RECORD-TYPE &REST ARGS)

Returns a new header record of RECORD-TYPE. ARGS are tag values in the same order as the tags returned by {defun valid-header-tags} , which is the same order as they are presented in the SAM spec.

MAKE-ALIGNMENT (ALN)

Returns a new ALIGNMENT structure given BAM record ALN.

MAKE-BAM-CHUNKS (INDEX REGION)

Returns a list of read chunks from INDEX covering REGION, sorted by increasing chunk start.

MAKE-BAM-FULL-SCAN-INPUT (BAM)

Returns a generator function that returns BAM alignment records for BAM stream BAM by scanning the entire stream. This is to be used in cases where a full scan is necessary, or as a fallback when neither an index nor regions are available. The standard generator interface functions NEXT and HAS-MORE-P may be used in operations on the returned generator.

MAKE-BAM-INDEX-INPUT (BAM INDEX REGIONS)

Returns a generator function that returns BAM alignment records for BAM stream BAM by scanning for alignments in the list of REGIONS. Note that REGIONS are zero-based, closed base coordinates. The standard generator interface functions NEXT and HAS-MORE-P may be used in operations on the returned generator.

MAKE-BAM-SCAN-INPUT (BAM REGIONS)

Returns a generator function that returns BAM alignment records for BAM stream BAM by scanning for alignments in the list of REGIONS. Note that REGIONS use zero-based, closed base coordinates. This is to be used in cases where a full scan is necessary, or as a fallback when an index is not available. The standard generator interface functions NEXT and HAS-MORE-P may be used in operations on the returned generator.

MAKE-HEADER-STRING (ALIST)

Returns a new SAM header string representing the header data in ALIST.

MAKE-REGION< (REF-META)

Returns a new comparator function which sorts regions according to the BAM reference metadata REF-META. Ranges are sorted first by the reference order in the BAM file, then by region start and finally by region end. Regions are sorted before they are normalised.

MANDATORY-HEADER-TAGS (HEADER-TYPE)

Returns a list of the mandatory tags for SAM header HEADER-TYPE. Both HEADER-TYPE and the returned tags are represented as symbols.

MERGE-CHUNKS (CHUNKS)

Returns a new list of BAM index chunks created by merging members of list CHUNKS that are {defun adjacentp} .

NEXT-FORWARD-P (FLAG)

Returns T if FLAG indicates that the read's mate was mapped to the forward strand, or NIL if it was mapped to the reverse strand.

NEXT-MAPPED-P (FLAG)

Returns T if FLAG indicates that the read's mate was mapped to a reference, or NIL otherwise.
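The two mate predicates above can be pictured as tests on single bits of the SAM flag. The sketch below uses the flag bits from the SAM specification (0x8 for mate unmapped, 0x20 for mate on the reverse strand); the SKETCH- names are hypothetical stand-ins and not cl-sam's definitions.

```lisp
;; Hypothetical bit tests on a SAM flag integer.
;; #x8  = mate (next segment) unmapped
;; #x20 = mate (next segment) mapped to the reverse strand

(defun sketch-next-mapped-p (flag)
  (not (logtest #x8 flag)))

(defun sketch-next-forward-p (flag)
  (not (logtest #x20 flag)))
```

For example, flag 99 (paired, proper pair, mate reverse, first in pair) has a mapped mate on the reverse strand.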

NORMALISE-REGIONS (REGIONS REF-META)

Returns a list of REGIONS, normalised according to the BAM reference metadata REF-META. Reference names in REGIONS are converted to their corresponding reference ID and REGIONS are sorted first by the reference order in the BAM file, then by region start and finally by region end. Overlapping REGIONS are merged.
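The sort-then-merge part of this normalisation can be sketched as below. The example assumes regions are already plain lists of (reference-id start end) with closed coordinates, and it skips the name-to-ID conversion; the function names are hypothetical.

```lisp
;; Hypothetical sketch: sort (ref-id start end) lists by reference, start
;; and end, then merge neighbours on the same reference that overlap.

(defun sketch-region< (a b)
  (destructuring-bind (ref-a start-a end-a) a
    (destructuring-bind (ref-b start-b end-b) b
      (or (< ref-a ref-b)
          (and (= ref-a ref-b)
               (or (< start-a start-b)
                   (and (= start-a start-b) (< end-a end-b))))))))

(defun sketch-merge-regions (regions)
  (let ((merged '()))
    (dolist (region (sort (copy-list regions) #'sketch-region<)
             (nreverse merged))
      (destructuring-bind (ref start end) region
        (let ((prev (first merged)))
          (if (and prev
                   (= ref (first prev))
                   (<= start (third prev)))
              ;; Overlap: extend the previous region instead of adding one.
              (setf (first merged)
                    (list ref (second prev) (max end (third prev))))
              (push region merged)))))))
```

The input list is copied before sorting, so the caller's list is left untouched.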

PARTITION-BY-TYPE (HEADERS)

Collects all the header-records in HEADERS by header-type, sorts them and returns 4 values that are lists of the collected :hd , :sq , :rg and :pg header-records, respectively. Does not modify HEADERS.

PARTITION-REGIONS (REGIONS)

Partitions a list of normalised REGIONS into lists, each pertaining to a specific reference sequence. The overall order of the REGIONS is maintained.

READ-BGZ-MEMBER (STREAM BUFFER)

Reads one BGZ member from STREAM, using BUFFER to hold integers as they are read.

Arguments:

- stream (octet input-stream): An open stream.
- buffer (simple-octet-vector): A re-usable read buffer which must be able to contain at least 4 bytes.

Returns:

- A BGZ structure, or NIL.

READ-BIN (STREAM)

Reads an index bin from STREAM.

READ-BINNING-INDEX (NUM-BINS STREAM)

Reads NUM-BINS bins from STREAM, returning a vector of bins, sorted by increasing bin number.

READ-INDEX-MAGIC (STREAM)

Reads the BAI magic number from STREAM and returns T if it is valid or raises a {define-condition malformed-file-error} if not.

READ-LINEAR-INDEX (NUM-INTERVALS STREAM)

Reads NUM-INTERVALS linear bin intervals from STREAM, returning them in a vector. Identical intervals are run-length compressed.

READ-REF-INDEX (REFERENCE-ID STREAM)

Reads an index for a single reference sequence from STREAM.

READ-STRING (BGZF N &KEY NULL-TERMINATED)

Reads N characters from handle BGZF and returns them as a Lisp string. The NULL-TERMINATED keyword is used to indicate whether the C string is null-terminated so that the terminator may be consumed.

REF-INDEX-ID (INSTANCE)

REGION-END (INSTANCE)

REGION-REF (INSTANCE)

REGION-START (INSTANCE)

REGION-TO-BIN (START END)

Returns a bin number given a 0-based range.
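As a point of reference, the bin calculation published in the SAM specification (reg2bin) can be transcribed into Lisp as shown here. The sketch assumes START is inclusive and END exclusive, as in the specification's C code; whether REGION-TO-BIN uses exactly the same end convention is an assumption, and the function name is hypothetical.

```lisp
;; Hypothetical transcription of the SAM specification's reg2bin.
;; Levels are tried from the smallest bins (16 Kbp, numbered from 4681)
;; up to the largest; bin 0 covers the whole 512 Mbp range.

(defun sketch-region-to-bin (start end)
  (let ((final (1- end)))
    (loop
       for shift in '(14 17 20 23 26)
       for offset in '(4681 585 73 9 1)
       when (= (ash start (- shift)) (ash final (- shift)))
       return (+ offset (ash start (- shift)))
       finally (return 0))))
```

A range that fits inside one 16 Kbp window lands in the finest level, while one that straddles a window boundary moves up to the next level that contains it.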

REGION-TO-BINS (START END)

Returns a bit vector with the bits for relevant bins set, given a 0-based range. Each bin may span 2^29, 2^26, 2^23, 2^20, 2^17 or 2^14 bp. Bin 0 spans a 512 Mbp region, bins 1-8 span 64 Mbp, 9-72 8 Mbp, 73-584 1 Mbp, 585-4680 128 Kbp and bins 4681-37449 span 16 Kbp regions.

REGION-TO-INTERVAL (COORD)

Given a coordinate COORD, returns the array index of its interval in the BAM linear index.
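Assuming the 16 Kbp (2^14 base) linear index window described in the SAM specification, the mapping from coordinate to interval index reduces to a shift. The function name below is hypothetical and the window size is an assumption about the value of +LINEAR-BIN-SIZE+.

```lisp
;; Hypothetical sketch: index of the 16384-base window containing COORD.

(defun sketch-coord-to-interval (coord)
  (ash coord -14))
```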

RUN-LENGTH-ENCODE (INTERVALS)

Returns a run-length encoded list representing INTERVALS, a BGZF linear index. Each element of the list has a car of the run-length and a cdr of the BGZF file position. Used in printing text representations of the index.
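The shape of the result can be illustrated with a small stand-alone sketch. It follows the description above (car is the run length, cdr is the file position) but is a hypothetical re-implementation, not cl-sam's code.

```lisp
;; Hypothetical sketch: collapse consecutive equal positions in INTERVALS
;; (any sequence of integers) into (run-length . position) conses.

(defun sketch-run-length-encode (intervals)
  (let ((runs '()))
    (map nil
         (lambda (pos)
           (if (and runs (eql pos (cdr (first runs))))
               (incf (car (first runs)))
               (push (cons 1 pos) runs)))
         intervals)
    (nreverse runs)))
```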

SAMTOOLS-BAM-INDEX-REFS (INSTANCE)

SAMTOOLS-REF-INDEX-BINS (INSTANCE)

SAMTOOLS-REF-INDEX-ID (INSTANCE)

SAMTOOLS-REF-INDEX-INTERVALS (INSTANCE)

SIMPLIFY-RECORDS (RECORDS HEADER-TAG)

Returns a simplified list of header-records copied from RECORDS. The RECORDS must all be of the same type. The simplifications are: removal of perfect duplicates, grouping of header-records by HEADER-TAG value and subsequent merging of header-records that share that HEADER-TAG value. RECORDS may be an empty list.

SORT-BAM-ALIGNMENTS (BGZF-IN BGZF-OUT PREDICATE &KEY KEY (BUFFER-SIZE 1000000))

Sorts alignments from block gzip input stream BGZF-IN and writes them to block gzip output stream BGZF-OUT, sorted by PREDICATE. A function KEY may be supplied to transform alignments into arguments for PREDICATE. The BUFFER-SIZE argument declares the maximum number of alignments that will be sorted in memory at any time, defaulting to 1000000.

STREAM-VIEW-SAM (BAM &OPTIONAL (STREAM T))

Writes the binary BAM records from BGZF stream BAM to STREAM, in SAM text format.

UPDATE-PG-RECORDS (CURRENT-RECORDS NEW-RECORD)

Returns a copy of CURRENT-RECORDS with NEW-RECORD added.

USER-HEADER-TAG-P (TAG)

Returns T if TAG is a SAM 1.4 user-defined header tag. User-defined tags are recognised by containing lower case letters.
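The lower-case rule can be sketched as a one-line test. This hypothetical helper takes the tag as a string; how cl-sam represents tags internally (for example as symbols) is not assumed here.

```lisp
;; Hypothetical sketch: a tag counts as user-defined when its name
;; contains at least one lower case letter.

(defun sketch-user-tag-p (tag-name)
  (and (some #'lower-case-p tag-name) t))
```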

VALID-HEADER-TAGS (HEADER-TYPE)

Returns a list of the valid tags for SAM header HEADER-TYPE. Both HEADER-TYPE and the returned tags are represented as symbols.

VALID-READ-NAME-P (STR)

Returns T if STR is a valid read name matching the regex [!-?A-~]{1,255} , or NIL otherwise. The length limit is implicit in the BAM format, there being 8 bits to store the read name length. This function is not used as it is not clear that this check is accepted by other implementations.
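The character class and length limit can be checked without a regex library, as in this sketch. It assumes an ASCII-compatible CHAR-CODE, and the function name is hypothetical.

```lisp
;; Hypothetical sketch of the read name check: 1 to 255 characters, each in
;; the range ! to ? (codes 33-63) or A to ~ (codes 65-126). This excludes
;; whitespace and the @ character (code 64).

(defun sketch-valid-read-name-p (str)
  (and (<= 1 (length str) 255)
       (every (lambda (c)
                (let ((code (char-code c)))
                  (or (<= 33 code 63)
                      (<= 65 code 126))))
              str)))
```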

VOFFSET-MERGE-P (VOFFSET1 VOFFSET2)

Returns T if BGZF virtual offsets are close enough to save disk seeks. Continuing to read from the current position may be faster than seeking forward a short distance.

WRITE-BGZ-MEMBER (BGZ STREAM BUFFER)

Writes one BGZ member to STREAM.

Arguments:

- bgz (bgz member): A bgz member to write.
- stream (octet output-stream): An open stream.
- buffer (simple-octet-vector): A re-usable read buffer which must be able to contain at least 4 bytes.

Returns:

- The number of bytes written.

WRITE-BYTES (BGZF BYTES N &KEY (COMPRESS T) (MTIME 0))

Writes N elements from vector BYTES to stream BGZF.

Key:

- compress (boolean): If T, compress at current BGZF compression level when the current block becomes full. Defaults to T.
- mtime (uint32): The zlib mtime. Defaults to 0.

Returns:

- The number of bytes written.

WRITE-CIGAR (ALIGNMENT-RECORD INDEX NUM-BYTES &OPTIONAL (STREAM T))

Writes the CIGAR string of ALIGNMENT-RECORD at INDEX as NUM-BYTES bytes, to STREAM.

WRITE-QUALITY-STRING (ALIGNMENT-RECORD INDEX NUM-BYTES &OPTIONAL (STREAM T))

Writes the quality string of ALIGNMENT-RECORD at INDEX as NUM-BYTES, to STREAM.

WRITE-SEQ-STRING (ALIGNMENT-RECORD INDEX NUM-BYTES &OPTIONAL (STREAM T))

Writes the sequence string of ALIGNMENT-RECORD at INDEX as NUM-BYTES bytes, to STREAM.

WRITE-TAG-VALUES (ALIGNMENT-RECORD INDEX &OPTIONAL (STREAM T))

Writes the auxiliary data of ALIGNMENT-RECORD at INDEX to STREAM.

Undocumented

%ENCODE-STRING-TAG (VALUE ALN INDEX)

%MAKE-BAM-INDEX (NUM-REFS REF-INDICES &OPTIONAL (UNASSIGNED 0))

%MAKE-REF-INDEX (REFERENCE-ID REF-START REF-END CHUNKS INTERVALS &OPTIONAL (MAPPED 0) (UNMAPPED 0))

%READ-BAM-ALIGNMENT (STREAM NUM-BYTES)

%STREAM-READ-SEQUENCE (STREAM SEQUENCE &OPTIONAL (START 0) END)

%WRITE-INT32 (N BUFFER STREAM)

%WRITE-INT64 (N BUFFER STREAM)

ALIGNMENT-NAME-NATURAL< (ALIGNMENT-RECORD1 ALIGNMENT-RECORD2)

ALIGNMENT-P (OBJECT)

(SETF ALN-BIN) (NEW-VALUE INSTANCE)

(SETF ALN-CIGAR) (NEW-VALUE INSTANCE)

(SETF ALN-FLAG) (NEW-VALUE INSTANCE)

(SETF ALN-INSERT-LENGTH) (NEW-VALUE INSTANCE)

(SETF ALN-MAP-QUAL) (NEW-VALUE INSTANCE)

(SETF ALN-MATE-POS) (NEW-VALUE INSTANCE)

(SETF ALN-MATE-REF-ID) (NEW-VALUE INSTANCE)

(SETF ALN-POS) (NEW-VALUE INSTANCE)

(SETF ALN-QUAL-STR) (NEW-VALUE INSTANCE)

(SETF ALN-READ-NAME) (NEW-VALUE INSTANCE)

(SETF ALN-RECORD) (NEW-VALUE INSTANCE)

(SETF ALN-REF-ID) (NEW-VALUE INSTANCE)

(SETF ALN-REF-LEN) (NEW-VALUE INSTANCE)

(SETF ALN-SEQ-STR) (NEW-VALUE INSTANCE)

BAM-INDEX-P (OBJECT)

BAM-META-SIZE2 (HEADER &REST REF-META)

(SETF BGZ-MEMBER-BSIZE) (NEW-VALUE INSTANCE)

(SETF BGZ-MEMBER-CDATA) (NEW-VALUE INSTANCE)

(SETF BGZ-MEMBER-CEND) (NEW-VALUE INSTANCE)

(SETF BGZ-MEMBER-CRC32) (NEW-VALUE INSTANCE)

(SETF BGZ-MEMBER-FLG) (NEW-VALUE INSTANCE)

(SETF BGZ-MEMBER-ISIZE) (NEW-VALUE INSTANCE)

(SETF BGZ-MEMBER-MTIME) (NEW-VALUE INSTANCE)

(SETF BGZ-MEMBER-OS) (NEW-VALUE INSTANCE)

BGZ-MEMBER-P (OBJECT)

(SETF BGZ-MEMBER-UDATA) (NEW-VALUE INSTANCE)

(SETF BGZ-MEMBER-XFL) (NEW-VALUE INSTANCE)

(SETF BGZ-MEMBER-XLEN) (NEW-VALUE INSTANCE)

(SETF BGZF-BUFFER) (NEW-VALUE INSTANCE)

(SETF BGZF-COMPRESSION) (NEW-VALUE INSTANCE)

(SETF BGZF-EOF) (NEW-VALUE INSTANCE)

BGZF-FLUSH (BGZF &KEY (COMPRESS T) (APPEND-EOF T) (MTIME 0))

(SETF BGZF-LOAD-SEEK) (NEW-VALUE INSTANCE)

(SETF BGZF-LOADED-P) (NEW-VALUE INSTANCE)

(SETF BGZF-OFFSET) (NEW-VALUE INSTANCE)

BGZF-P (OBJECT)

(SETF BGZF-PATHNAME) (NEW-VALUE INSTANCE)

(SETF BGZF-POINTER) (NEW-VALUE INSTANCE)

(SETF BGZF-POSITION) (NEW-VALUE INSTANCE)

(SETF BGZF-STREAM) (NEW-VALUE INSTANCE)

(SETF BGZF-UTIL-BUFFER) (NEW-VALUE INSTANCE)

BIN-P (OBJECT)

BLOCK-SIZES (FILESPEC N)

BUFFER-EMPTY-P (STREAM)

CHUNK-P (OBJECT)

COPY-ALIGNMENT (INSTANCE)

COPY-BAM-INDEX (INSTANCE)

COPY-BGZ-MEMBER (INSTANCE)

COPY-BGZF (INSTANCE)

COPY-BIN (INSTANCE)

COPY-CHUNK (INSTANCE)

COPY-REF-INDEX (INSTANCE)

COPY-REGION (INSTANCE)

COPY-SAMTOOLS-BAM-INDEX (INSTANCE)

COPY-SAMTOOLS-REF-INDEX (INSTANCE)

DEFAULT-QUAL-STRING (LENGTH)

DEFAULT-SEQ-STRING (LENGTH)

DEFLATE-BGZ-MEMBER (BGZ &KEY (COMPRESSION -1) (MTIME 0) (BACKOFF 0))

ENSURE-VALID-FLAG (FLAG &OPTIONAL ALN)

ENSURE-VALID-READ-NAME (STR)

FILL-BUFFER (STREAM)

FIND-INTERVAL (REF-INDEX START)

INFLATE-BGZ-MEMBER (BGZ)

MAKE-ALN (READ-NAME SEQ-STR QUAL-STR FLAG REF-ID MATE-REF-ID POS MATE-POS CIGAR REF-LEN MAP-QUAL INSERT-LENGTH BIN RECORD)

MAKE-BAM-INDEX (&KEY ((REFS DUM0) (MAKE-ARRAY 0)))

MAKE-BGZ-MEMBER (&KEY ((ID1 DUM0) +ID1+) ((ID2 DUM1) +ID2+) ((CM DUM2) +CM-DEFLATE+) ((FLG DUM3) +FLAG-EXTRA+) ((MTIME DUM4) 0) ((XFL DUM5) 0) ((OS DUM6) +OS-UNKNOWN+) ((XLEN DUM7) 0) ((ISIZE DUM8) 0) ((CRC32 DUM9) 0) ((CDATA DUM10) (MAKE-ARRAY 0 ELEMENT-TYPE 'OCTET)) ((CEND DUM11) 0) ((SF1 DUM12) +SF1+) ((SF2 DUM13) +SF2+) ((SLEN DUM14) +SLEN+) ((BSIZE DUM15) 0) ((UDATA DUM16) (MAKE-ARRAY 0 ELEMENT-TYPE 'OCTET)))

MAKE-BGZF (&KEY ((PATHNAME DUM119) NIL) ((STREAM DUM120) NIL) ((COMPRESSION DUM121) *DEFAULT-COMPRESSION*) ((BUFFER DUM122) (MAKE-ARRAY +BGZ-MAX-PAYLOAD-LENGTH+ ELEMENT-TYPE 'OCTET)) ((POSITION DUM123) 0) ((OFFSET DUM124) 0) ((POINTER DUM125) 0) ((UTIL-BUFFER DUM126) (MAKE-ARRAY 4 ELEMENT-TYPE 'OCTET INITIAL-ELEMENT 0)) ((LOADED-P DUM127) NIL) ((LOAD-SEEK DUM128) 0) ((EOF DUM129) NIL))

MAKE-BIN (&KEY ((NUM DUM238) 0) ((CHUNKS DUM239) (MAKE-ARRAY 0)))

MAKE-CHUNK (&KEY ((START DUM288) 0) ((END DUM289) MOST-POSITIVE-FIXNUM))

MAKE-REF-INDEX (&KEY ((ID DUM80) +UNKNOWN-REFERENCE+) ((BINS DUM81) (MAKE-ARRAY 0 INITIAL-ELEMENT NIL)) ((INTERVALS DUM82) (MAKE-ARRAY 0 ELEMENT-TYPE 'FIXNUM)))

MAKE-REGION (&KEY (REF NIL) (START 0) (END 0))

MAKE-SAMTOOLS-BAM-INDEX (&KEY ((REFS DUM38) (MAKE-ARRAY 0)) ((UNASSIGNED DUM39) 0))

MAKE-SAMTOOLS-REF-INDEX (&KEY ((ID DUM152) +UNKNOWN-REFERENCE+) ((BINS DUM153) (MAKE-ARRAY 0 INITIAL-ELEMENT NIL)) ((INTERVALS DUM154) (MAKE-ARRAY 0 ELEMENT-TYPE 'FIXNUM)) ((START DUM155) 0) ((END DUM156) 0) ((MAPPED DUM157) 0) ((UNMAPPED DUM158) 0))

MAX-BIN-NUM (REF-LENGTH)

PARSE-DIGITS (BYTES START END)

PARSE-HD-TAG (STR)

PARSE-PG-TAG (STR)

PARSE-RG-TAG (STR)

PARSE-SQ-TAG (STR)

READ-BYTES (BGZF N &KEY BUFFER)

REF-INDEX-P (OBJECT)

REGION (REF START END)

(SETF REGION-END) (NEW-VALUE INSTANCE)

REGION-P (OBJECT)

(SETF REGION-REF) (NEW-VALUE INSTANCE)

(SETF REGION-START) (NEW-VALUE INSTANCE)

SAMTOOLS-BAM-INDEX-P (OBJECT)

(SETF SAMTOOLS-BAM-INDEX-REFS) (NEW-VALUE INSTANCE)

SAMTOOLS-BAM-INDEX-UNASSIGNED (INSTANCE)

(SETF SAMTOOLS-BAM-INDEX-UNASSIGNED) (NEW-VALUE INSTANCE)

(SETF SAMTOOLS-REF-INDEX-BINS) (NEW-VALUE INSTANCE)

SAMTOOLS-REF-INDEX-END (INSTANCE)

(SETF SAMTOOLS-REF-INDEX-END) (NEW-VALUE INSTANCE)

(SETF SAMTOOLS-REF-INDEX-INTERVALS) (NEW-VALUE INSTANCE)

SAMTOOLS-REF-INDEX-MAPPED (INSTANCE)

(SETF SAMTOOLS-REF-INDEX-MAPPED) (NEW-VALUE INSTANCE)

SAMTOOLS-REF-INDEX-P (OBJECT)

SAMTOOLS-REF-INDEX-START (INSTANCE)

(SETF SAMTOOLS-REF-INDEX-START) (NEW-VALUE INSTANCE)

SAMTOOLS-REF-INDEX-UNMAPPED (INSTANCE)

(SETF SAMTOOLS-REF-INDEX-UNMAPPED) (NEW-VALUE INSTANCE)

WRITE-BIN (BIN STREAM)

WRITE-BINNING-INDEX (BINS STREAM)

WRITE-CHUNKED-BYTES (BGZF BYTES N &KEY (COMPRESS T) (MTIME 0))

WRITE-INDEX-MAGIC (STREAM)

WRITE-LINEAR-INDEX (INTERVALS STREAM)

MACRO

Public

DEFINE-ALIGNMENT-TAG (TAG VALUE-TYPE &OPTIONAL DOCSTRING)

Defines a new alignment tag to hold a datum of a particular SAM type.

Arguments:

- tag (symbol): The tag e.g. :rg
- value-type (symbol): The value type, one of :char , :string , :hex , :int32 or :float .

Optional:

- docstring (string): Documentation for the tag.

WITH-BAM ((VAR (&OPTIONAL HEADER NUM-REFS REF-META) FILESPEC &REST ARGS &KEY (COMPRESS-HEADER T) (PAD-HEADER 0) INDEX REGIONS &ALLOW-OTHER-KEYS) &BODY BODY)

Evaluates BODY with VAR bound to a new BAM generator function on pathname designator FILESPEC. The direction (:input versus :output) is determined by the stream-opening arguments in ARGS. The standard generator interface functions NEXT and HAS-MORE-P may be used in operations on the returned generator.

On reading, HEADER, NUM-REFS and REF-META will be automatically bound to the BAM file metadata i.e. the metadata are read automatically and the iterator positioned before the first alignment record.

Optionally, iteration may be restricted to a specific reference, designated by REF-NUM and to alignments that start between reference positions START and END. Furthermore, a BAM-INDEX object INDEX may be provided to allow the stream to seek directly to the desired region of the file.

On writing, HEADER, NUM-REFS and REF-META should be bound to appropriate values for the BAM file metadata, which will be automatically written to the underlying stream.

The COMPRESS-HEADER and PAD-HEADER keyword arguments are only applicable on writing where they control whether the records and header block should be compressed and whether the header string should be padded with nulls to allow space for expansion. Writing an uncompressed, padded header means that a header that fits in the first BGZF block may be updated without re-writing the entire BAM file.

The BGZF COMPRESSION keyword is also accepted, permitting the Zlib compression level to be set for the stream.

A list REGIONS may be supplied to limit the returned alignments to specific references and reference coordinates. REGIONS may be region objects or region designators in the form of list tuples

;;; (<reference designator> start end)

where a reference designator is either the reference name string or its identifier number in the BAM file. REGIONS will be normalised automatically by sorting according to reference position in the BAM file (according to the BAM metadata) and then by start and end. Overlapping regions will be merged.

For example:

To count all records in a BAM file:

;;; (with-bam (in () "in.bam")
;;;   (loop
;;;      while (has-more-p in)
;;;      count (next in)))

To copy only records with a mapping quality of >= 30 to another BAM file:

;;; (with-bam (in (header n ref-meta) "in.bam")
;;;   (with-bam (out (header n ref-meta) "out.bam" :direction :output)
;;;     (let ((q30 (discarding-if (lambda (x)
;;;                                 (< (mapping-quality x) 30)) in)))
;;;       (loop
;;;          while (has-more-p q30)
;;;          do (consume out (next q30))))))

To count records with mapping quality of >= 30 in a set of genomic ranges, using an index:

;;; (with-bam-index (index "index.bai")
;;;   (with-bam (bam () bam-file :index index
;;;              :regions '((0 1000000 1100000) (1 1000000 1100000)))
;;;     (loop
;;;        while (has-more-p bam)
;;;        count (>= (mapping-quality (next bam)) 30))))

WITH-BAM-INDEX ((VAR FILESPEC) &BODY BODY)

Evaluates BODY with VAR bound to a BAM-INDEX read from file denoted by pathname designator FILESPEC.

WITH-BGZF ((VAR FILESPEC &REST ARGS) &BODY BODY)

Executes BODY with VAR bound to a BGZF handle structure created by opening the file denoted by FILESPEC.

Arguments:

- var (symbol): The symbol to be bound.
- filespec (pathname designator): The file to open.

Rest:

- args: Arguments applicable to bgzf-open.

Undocumented

WITH-OPEN-BGZIP ((VAR FILESPEC &REST ARGS) &BODY BODY)

Private

DEFINE-HEADER-TAG-PARSER (NAME (HEADER-TYPE VAR) TAG-SPECS)

Defines a tag parsing function NAME that parses tag values for SAM header HEADER-TYPE.

GENERIC-FUNCTION

Public

ALIGNMENT-TAG-DOCUMENTATION (TAG)

Returns the documentation for TAG or NIL if none is available.

Undocumented

WRITE-BAM-INDEX (INDEX STREAM)

Private

ENCODE-ALIGNMENT-TAG (VALUE TAG VECTOR INDEX)

Performs binary encoding of VALUE into VECTOR under TAG at INDEX, returning VECTOR.

Undocumented

BGZF-OF (CONDITION)

ERRNO-OF (CONDITION)

POSITION-OF (CONDITION)

PREV-POSITION-OF (CONDITION)

PREV-REFERENCE-OF (CONDITION)

REFERENCE-OF (CONDITION)

WRITE-REF-INDEX (REF-INDEX STREAM)

SLOT-ACCESSOR

Private

Undocumented

BUFFER-OF (OBJECT)

VARIABLE

Public

*SAM-VERSION*

The SAM version written by cl-sam.

Private

*BAM-INDEX-MAGIC*

The BAI index file magic header bytes.

*BAM-MAGIC*

The BAM file magic header bytes.

*DEFAULT-COMPRESSION*

The default bgz compression level (Zlib level 5).

*EMPTY-BGZ-RECORD*

SAMtools version >0.1.5 appends an empty BGZF record to allow detection of truncated BAM files. These 28 bytes constitute such a record.
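For reference, the 28-byte empty block published in the SAM specification is reproduced below as a Lisp vector. The variable and function names are hypothetical; the bytes are assumed to match the value cl-sam stores in *EMPTY-BGZ-RECORD*, which this document does not show.

```lisp
;; Hypothetical copy of the BGZF end-of-file marker from the SAM spec:
;; a gzip member with the BC extra subfield and an empty deflate payload.

(defparameter *sketch-bgzf-eof-marker*
  (make-array 28 :element-type '(unsigned-byte 8)
              :initial-contents
              '(#x1f #x8b #x08 #x04 #x00 #x00 #x00 #x00 #x00 #xff #x06 #x00
                #x42 #x43 #x02 #x00 #x1b #x00 #x03 #x00 #x00 #x00 #x00 #x00
                #x00 #x00 #x00 #x00)))

(defun sketch-eof-marker-bsize ()
  "Returns the little-endian BSIZE field stored at bytes 16-17."
  (logior (aref *sketch-bgzf-eof-marker* 16)
          (ash (aref *sketch-bgzf-eof-marker* 17) 8)))
```

The stored BSIZE field is 27, one less than the total member size of 28 bytes.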

*MANDATORY-HEADER-TAGS*

A mapping that describes the mandatory tags for each SAM header record type.

*VALID-GROUP-ORDERS*

Valid values for SAM group order tags. Group order is no longer a valid tag in SAM version 1.3.

*VALID-HEADER-TAGS*

A mapping that describes the valid tags for each SAM header record type.

*VALID-HEADER-TYPES*

A list of valid SAM header types.

*VALID-SORT-ORDERS*

Valid values for SAM sort order tags.

*VOFFSET-MERGE-DISTANCE*

If two index chunks are this number of bytes or closer to each other, they should be merged.

Undocumented

*BAM-INDEX-FILE-TYPE*

*INVALID-READ-NAME-CHARS*

*TREE-DEEPENING-BOUNDARIES*

CLASS

Public

BAM-INDEX

A BAM file index covering all the reference sequences in a BAM file.

BGZF

A block gzip file.

- pathname: The file pathname.
- stream: The file stream.
- buffer: A simple-octet-vector used for buffering reads.
- position: The file position component of the BGZF virtual file offset (most significant 48 bits). This is the position in the stream at which this member starts.
- offset: The within-member offset component of the BGZF virtual file offset (least significant 16 bits). This is a position within the uncompressed data of the member.
- pointer: The buffer index after the last usable data element, i.e. the index at which new data may be written to the buffer prior to deflating. Calling (subseq buffer 0 bgzf-pointer) will extract the current usable bytes.
- util-buffer: A 4 byte buffer used internally to store an integer.
- loaded-p: T if the bgz data has been loaded into the buffer (used internally in decompression and reading).
- load-seek: On loading the bgz data, proceed immediately to this offset (used internally in decompression and reading).
- eof: T if the decompression process has reached EOF (used internally in decompression and reading).

BGZIP-INPUT-STREAM

A stream that reads from a BGZF file.

BGZIP-STREAM

A BGZF stream capable of reading or writing block compressed data.

BIN

A bin within a BAM binning index. A bin is a region on a reference sequence that is identified by a number. The bin location and numbering system used is the UCSC binning scheme by Richard Durbin and Lincoln Stein. A bin contains a vector of chunks.

CHUNK

A chunk within a BAM index bin. Chunks are groups of records that lie within a bin and are close together within the BAM file.

REF-INDEX (BAM-INDEX REFERENCE-ID)

An index for a single reference sequence. An index is composed of a vector of bins (the binning index) and a vector of intervals (the linear index). The bins are sorted by increasing bin number. The binning index is hierarchical, while the linear index is flat, being a projection of reads from all bins onto a single vector.

Private

BGZ-MEMBER

A block-gzip data chunk; a gzip member with extensions, as defined by RFC1952. The extensions are described in RFC1952 and the SAM format specification.

- sf1: RFC1952 first extra SubField.
- sf2: RFC1952 second extra SubField.
- slen: RFC1952 Subfield LENgth.
- bsize: SAM spec total Block (member) SIZE. The serialized value is (1- bsize).
- udata: Uncompressed DATA.

REGION (REF START END)

A range of bases over a reference sequence. A region is expressed in zero-based, half-open, interbase coordinates.

SAMTOOLS-BAM-INDEX

A samtools-specific index containing extra, undocumented data:

- unassigned: number of unmapped reads not assigned to a reference

SAMTOOLS-REF-INDEX

A samtools-specific reference index containing extra, undocumented data:

- start: the start offset of the reference
- end: the end offset of the reference
- mapped: number of reads mapped to the reference
- unmapped: number of unmapped reads assigned to the reference by magic

Undocumented

ALIGNMENT

BAM-MERGE-STREAM

BAM-SORT-INPUT-STREAM

BAM-SORT-OUTPUT-STREAM

BGZF-HANDLE-MIXIN

CONDITION

Public

BAM-ERROR

The parent type of all BAM error conditions.

BGZF-IO-ERROR

A condition raised when an error occurs reading from or writing to a BGZF stream.

Undocumented

BAM-SORT-ERROR (PREVIOUS-REF PREVIOUS-POS REF POS &OPTIONAL MESSAGE &REST MESSAGE-ARGUMENTS)

CONSTANT

Private

+ALIGNMENT-SIZE-TAG+

The size of the BAM alignment size indicator.

+BGZ-MAX-PAYLOAD-LENGTH+

The maximum size in bytes of a BGZ member payload. This is dictated by the fact that the SAM spec makes 16 bits available for addressing positions within the BGZ member.

+BGZIP-BUFFER-SIZE+

Buffer size for {defclass bgzip-input-stream} internal buffer.

+LINEAR-BIN-SIZE+

The size in bases of intervals in the linear bin index.

+MAX-NUM-BINS+

The maximum number of bins possible in the BAM indexing scheme.

+MEMBER-HEADER-LENGTH+

The total number of bytes in the BGZ header.

+NULL-BYTE+

The termination byte for BAM strings.

+SAMTOOLS-KLUDGE-BIN+

Extra bin with different semantics added by samtools.

+SF1+

The value of the RFC1952 first extra subfield used by BGZ members.

+SF2+

The value of the RFC1952 second extra subfield used by BGZ members.

+SLEN+

The value of the RFC1952 subfield length field used by BGZ members.

+TAG-SIZE+

The size of a BAM auxiliary tag in bytes.

+UNKNOWN-POSITION+

The position value for unmapped reads.

+UNKNOWN-QUALITY+

The value for unavailable quality scores.

+UNKNOWN-REFERENCE+

The reference id for unmapped reads.

+XLEN+

The value of the RFC1952 eXtra LENgth field used by BGZ members.