CsvDataContext

The Apache MetaModel CSV module is one of the most advanced implementations available, especially considering how simple the CSV file format is. Its main features are:

  • Full implementation of DataContext and UpdateableDataContext.
  • Streaming query support without memory leaks, tested on billion-record data sets.
  • Support for parallelized row parsing when multiline values are turned off. In that case the Row objects served for queries have not yet been parsed, so the parsing work can be spread across multiple consuming threads.
  • Support for sample-based COUNT queries when the query's COUNT select item has the "allow function approximation" flag set. This means that applications can get a quick approximation of the number of rows, even in a really big file.
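
The idea behind a sample-based COUNT can be illustrated with plain JDK code. The sketch below is not MetaModel's implementation, whose sampling strategy is not described on this page. It assumes one record per line, Unix line endings, and UTF-8 data; the class and method names are hypothetical. It reads a small sample of lines, measures their average size in bytes, and divides the file size by that average.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ApproximateRowCount {

    /**
     * Estimates the number of lines in a file from the first sampleLines lines.
     * Hypothetical helper for illustration only; not part of MetaModel.
     */
    public static long estimateLineCount(Path file, int sampleLines) throws IOException {
        long fileSize = Files.size(file);
        long sampledBytes = 0;
        int sampled = 0;
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while (sampled < sampleLines && (line = reader.readLine()) != null) {
                // +1 accounts for the '\n' terminator (assumes Unix line endings)
                sampledBytes += line.getBytes(StandardCharsets.UTF_8).length + 1;
                sampled++;
            }
        }
        if (sampled == 0) {
            return 0; // empty file
        }
        if (sampled < sampleLines) {
            return sampled; // the whole file fit in the sample, so the count is exact
        }
        // Extrapolate: total bytes divided by the average bytes per sampled line
        return Math.round((double) fileSize * sampled / sampledBytes);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("sample", ".csv");
        StringBuilder content = new StringBuilder("id,name\n");
        for (int i = 0; i < 1000; i++) {
            content.append(i).append(",row\n");
        }
        Files.write(file, content.toString().getBytes(StandardCharsets.UTF_8));
        System.out.println("Estimated lines: " + estimateLineCount(file, 100));
        Files.delete(file);
    }
}
```

The estimate counts lines, including any header line, and is only as good as the sample: files whose line lengths vary widely between the start and the rest of the file will give a skewed result.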

Creating from plain old Java code - CsvDataContext

This is really simple:

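```java
import org.apache.metamodel.DataContext;
import org.apache.metamodel.csv.CsvConfiguration;
import org.apache.metamodel.csv.CsvDataContext;
import org.apache.metamodel.util.FileResource;
import org.apache.metamodel.util.Resource;

Resource csvResource = new FileResource("/path/to/my/file.csv");
CsvConfiguration configuration = new CsvConfiguration(
  // arguments here to fit the resource
);

DataContext dataContext = new CsvDataContext(csvResource, configuration);
```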

Creating from properties - CsvDataContextFactory

If you wish to construct your CSV DataContext from properties, this is also possible. For instance:

...

| Property | Example value | Required | Description |
| --- | --- | --- | --- |
| type | csv | Yes | Must be set to 'csv' or else another type of DataContext will be constructed. |
| resource | /data/stuff.csv | Yes | Must reference the resource path to read/write CSV data from/to. |
| quote-char | " | No | The enclosing quote character to use for values in the CSV file. |
| separator-char | , | No | The separator character to use for separating values in the CSV file. |
| escape-char | \ | No | The escape character to use for escaping CSV parsing of special characters. |
| encoding | UTF-8 | No | The character set encoding of the data. |
| column-name-line-number | 1 | No | The line number which holds column names / headers. |
| fail-on-inconsistent-row-length | true | No | Whether or not to fail (throw exception) on inconsistent row lengths, or to suppress these parsing issues. |
| multiline-values | false | No | Whether or not the data contains values spanning multiple lines (if this never happens, a faster parsing approach can be applied). |
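
As an illustration, the properties listed above can be collected in a standard java.util.Properties object. This is a minimal sketch using only the JDK: the class and method names are hypothetical, and the values are the example values from the table. Handing the properties over to MetaModel (typically through the factory classes in the org.apache.metamodel.factory package) is deliberately left out so that the sketch runs without the library.

```java
import java.util.Properties;

public class CsvContextProperties {

    /** Builds a property set for a CSV DataContext. Hypothetical helper, not part of MetaModel. */
    public static Properties build(String resourcePath) {
        Properties props = new Properties();

        // Required properties
        props.setProperty("type", "csv");
        props.setProperty("resource", resourcePath);

        // Optional properties, set here to the example values from the table
        props.setProperty("quote-char", "\"");
        props.setProperty("separator-char", ",");
        props.setProperty("escape-char", "\\");
        props.setProperty("encoding", "UTF-8");
        props.setProperty("column-name-line-number", "1");
        props.setProperty("fail-on-inconsistent-row-length", "true");
        props.setProperty("multiline-values", "false");
        return props;
    }

    public static void main(String[] args) {
        Properties props = build("/data/stuff.csv");
        // Print each key/value pair (Properties does not guarantee an order)
        props.forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Only type and resource are required; the remaining entries can be dropped when the file follows the conventions the module expects.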

Updating CSV data

Modifying CSV data is done just like with any other MetaModel module: you implement an update script, which is then submitted to the UpdateableDataContext's executeUpdate(...) method. This approach guarantees isolation and coherence in all update operations. Here is a simple example:

...
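
A minimal sketch of an update script that inserts one row. It assumes the MetaModel CSV module is on the classpath, that the file path is a placeholder, and that the file has columns named id and name (both column names are hypothetical). The inserted values are made up for illustration.

```java
import org.apache.metamodel.UpdateCallback;
import org.apache.metamodel.UpdateScript;
import org.apache.metamodel.UpdateableDataContext;
import org.apache.metamodel.csv.CsvConfiguration;
import org.apache.metamodel.csv.CsvDataContext;
import org.apache.metamodel.schema.Table;
import org.apache.metamodel.util.FileResource;
import org.apache.metamodel.util.Resource;

public class CsvUpdateExample {

    public static void main(String[] args) {
        // Placeholder path; default CsvConfiguration assumed to fit the file
        Resource csvResource = new FileResource("/path/to/my/file.csv");
        UpdateableDataContext dataContext =
                new CsvDataContext(csvResource, new CsvConfiguration());

        // Take the first table of the default schema as the insert target
        final Table table = dataContext.getDefaultSchema().getTable(0);

        // All operations issued through the callback belong to this one update
        dataContext.executeUpdate(new UpdateScript() {
            @Override
            public void run(UpdateCallback callback) {
                callback.insertInto(table)
                        .value("id", 42)            // hypothetical column
                        .value("name", "Jane Doe")  // hypothetical column
                        .execute();
            }
        });
    }
}
```

Keeping every write inside the script's run(...) method is what lets the data context treat the operations as a single unit of work.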