Common Lisp Package: CL-ARFF-PARSER

A parser and some manipulation utilities for Weka machine learning (ARFF) datasets.

README:

cl-arff-parser

A simple and easy-to-use arff (Attribute-Relation File Format) parser for Common Lisp.

For more information on arff take a look at: http://www.cs.waikato.ac.nz/ml/weka/arff.html

It is compatible with all major Lisp implementations (SBCL, CCL, LispWorks, ...).

FUNCTION

Public

PARSE-ARFF (ARFF-PATH)

The arff-path should be a string pointing to an arff-file.

Private

CSV->LIST (STRING &OPTIONAL (SEPARATOR ,))

Given a string like "1,2,3, 6, foo", returns the list ("1" "2" "3" "6" "foo").
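The behaviour described above can be illustrated with a small stand-alone sketch. It is not the library's source: the name split-and-trim is hypothetical, and it assumes that the separator is a character and that surrounding spaces and tabs are trimmed from each field, as the example output suggests.

```lisp
;; Hypothetical stand-in for the private CSV->LIST; not the library's code.
(defun split-and-trim (string &optional (separator #\,))
  "Split STRING at each SEPARATOR and trim spaces and tabs from every field."
  (loop with start = 0
        for end = (position separator string :start start)
        ;; When END is NIL, SUBSEQ takes everything up to the end of STRING.
        collect (string-trim '(#\Space #\Tab) (subseq string start end))
        while end
        do (setf start (1+ end))))

(split-and-trim "1,2,3, 6, foo")
;; => ("1" "2" "3" "6" "foo")
```

Quoted fields that contain the separator are not handled by this sketch.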

PARSE-@ATTRIBUTE (LINE)

Parses a line of the form @attribute <attribute-name> <datatype>. Returns a list containing the attribute-name followed by a list of datatype information as parsed by parse-datatype.

PARSE-ATTRIBUTE-NAME (LINE)

Assumes that the line begins with the attribute-name. If the name contains spaces, the entire name must be quoted. The second return value is the rest of the line, which should be the datatype.
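A minimal sketch of this kind of splitting is shown here. It is an illustration only, under the assumptions that a quoted name may use either double or single quotes, that escaped quotes inside a name do not occur, and that the line is non-empty. The function name split-attribute-name is hypothetical and is not part of the package.

```lisp
;; Hypothetical sketch; the real PARSE-ATTRIBUTE-NAME may behave differently.
(defun split-attribute-name (line)
  "Return the attribute name at the start of LINE and, as a second value,
the rest of the line with leading spaces and tabs removed."
  (let* ((quote-char (find (char line 0) "\"'"))
         ;; A quoted name ends at the matching quote, an unquoted name
         ;; ends at the first space or tab.
         (end (if quote-char
                  (position quote-char line :start 1)
                  (position-if (lambda (c) (member c '(#\Space #\Tab))) line)))
         (name (subseq line (if quote-char 1 0) end))
         (remainder (if end
                        (subseq line (if quote-char (1+ end) end))
                        "")))
    (values name (string-left-trim '(#\Space #\Tab) remainder))))

(split-attribute-name "\"petal length\" numeric")
;; => "petal length", "numeric"
```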

PARSE-DATATYPE (LINE)

Assumes that the line starts with the datatype. See http://www.cs.waikato.ac.nz/~ml/weka/arff.html for information about the datatypes. The date datatype is not supported.

STRING-REPLACE (STR1 SUB1 SUB2)

Nondestructively replaces all occurrences of sub1 in str1 with sub2.
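One way such a nondestructive replacement could be written with standard Common Lisp functions is sketched here. The name replace-all is hypothetical and the library's own implementation may differ. The sketch assumes that sub1 is a non-empty string.

```lisp
;; Hypothetical sketch; not the library's STRING-REPLACE.
(defun replace-all (str1 sub1 sub2)
  "Return a fresh string in which every occurrence of SUB1 in STR1 is
replaced by SUB2. STR1 is left unchanged. SUB1 must be non-empty."
  (with-output-to-string (out)
    (loop with start = 0
          for pos = (search sub1 str1 :start2 start)
          ;; Copy the text before the next match (or the tail when POS is NIL).
          do (write-string str1 out :start start :end pos)
          while pos
          do (write-string sub2 out)
             (setf start (+ pos (length sub1))))))

(replace-all "a-b-c" "-" "+")
;; => "a+b+c"
```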

Undocumented

PARSE-DATA (LINE)

SEARCH-SPACE-OR-TAB (LINE)

TRIM-COMMENTS-AND-SPACES (STRING &OPTIONAL (COMMENT-MARKER %))

GENERIC-FUNCTION

Public

REMOVE-ATTRIBUTE-BY-NAME (ARFF NAME)

Removes the feature with the given name from the arff object (not from the actual file). It is removed both from the @attributes and from the @data.
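The idea of dropping a feature from both the header and the data can be sketched on plain lists. This is not the method defined by the package. The name remove-column is hypothetical, and the sketch assumes that each data row is a list of values in the same order as the attributes, which the source does not state.

```lisp
;; Plain-list sketch; the ARFF class itself is not used here.
(defun remove-column (attributes rows name)
  "Return ATTRIBUTES and ROWS (as two values) without the attribute NAME."
  (let ((index (position name attributes :key #'first :test #'string=)))
    (if (null index)
        (values attributes rows)          ; unknown name: nothing to remove
        (flet ((drop (items)
                 ;; Copy ITEMS, leaving out the element at INDEX.
                 (append (subseq items 0 index) (nthcdr (1+ index) items))))
          (values (drop attributes) (mapcar #'drop rows))))))

(remove-column '(("outlook" ("nominal" "sunny" "rainy"))
                 ("temperature" ("numeric"))
                 ("play" ("nominal" "yes" "no")))
               '(("sunny" "85" "no") ("rainy" "70" "yes"))
               "temperature")
;; => (("outlook" ("nominal" "sunny" "rainy")) ("play" ("nominal" "yes" "no"))),
;;    (("sunny" "no") ("rainy" "yes"))
```

Unlike the generic function described above, this sketch returns new lists instead of modifying an object.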

SLOT-ACCESSOR

Public

ARFF-ATTRIBUTES (OBJECT)

The attributes as specified in the header. Each attribute is a list that looks as follows: ("attribute-name" ("type")). In case of a nominal attribute it looks like this: ("attribute-name" ("nominal" . values)).
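To make the shape of an attribute entry concrete, the following sketch defines a few hypothetical helper functions over that list structure. They are not exported by the package. The type strings used in the examples (such as "numeric") and the case-insensitive comparison are assumptions.

```lisp
;; Hypothetical helpers over the documented attribute shape:
;;   ("attribute-name" ("type"))  or  ("attribute-name" ("nominal" . values))
(defun attribute-name (attribute)
  (first attribute))

(defun attribute-type (attribute)
  (first (second attribute)))

(defun nominal-values (attribute)
  "Return the allowed values of a nominal ATTRIBUTE, or NIL for other types."
  (when (string-equal (attribute-type attribute) "nominal")
    (rest (second attribute))))

(nominal-values '("outlook" ("nominal" "sunny" "overcast" "rainy")))
;; => ("sunny" "overcast" "rainy")
```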

(SETF ARFF-ATTRIBUTES) (NEW-VALUE OBJECT)

The attributes as specified in the header. Each attribute is a list that looks as follows: ("attribute-name" ("type")). In case of a nominal attribute it looks like this: ("attribute-name" ("nominal" . values)).

ARFF-DATA (OBJECT)

All the data. The bulk of the file.

(SETF ARFF-DATA) (NEW-VALUE OBJECT)

All the data. The bulk of the file.

ARFF-PATH (OBJECT)

The path of the arff file as a string, e.g. /home/user/myData/foo.arff

(SETF ARFF-PATH) (NEW-VALUE OBJECT)

The path of the arff file as a string, e.g. /home/user/myData/foo.arff

ARFF-RELATION (OBJECT)

The string after @relation. This is essentially the name of the arff.

(SETF ARFF-RELATION) (NEW-VALUE OBJECT)

The string after @relation. This is essentially the name of the arff.

CLASS

Public

ARFF

An arff object contains all the data found in a parsed arff file.
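A typical REPL session might look like the sketch below. It uses only the public names listed in this document, but it rests on several assumptions: that the system can be loaded through Quicklisp under the name cl-arff-parser, that parse-arff returns an arff instance, that arff-data returns a sequence, and that the file contains an attribute called "temperature" (a made-up name for illustration). The path is the example path from the arff-path description.

```lisp
;; Assumption: the system is available through Quicklisp under this name.
(ql:quickload "cl-arff-parser")

;; Assumption: PARSE-ARFF returns an ARFF instance.
(defparameter *arff*
  (cl-arff-parser:parse-arff "/home/user/myData/foo.arff"))

(cl-arff-parser:arff-relation *arff*)        ; the string after @relation
(cl-arff-parser:arff-attributes *arff*)      ; list of attribute entries
(length (cl-arff-parser:arff-data *arff*))   ; assumes the data is a sequence

;; Drops the feature from the object only; the file on disk is untouched.
;; "temperature" is a hypothetical attribute name.
(cl-arff-parser:remove-attribute-by-name *arff* "temperature")
```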