Using Super Csv
Dependencies
Super Csv requires JDK >= 1.5.

Javadocs

The Javadocs can be found here: Super Csv javadoc.

Test coverage

Super Csv is quite heavily tested - more than 150 JUnit tests guard its behavior, with coverage above 95%. While the tests are no guarantee, we hope most bugs have been eliminated. Here you can browse the code annotated with its test coverage: test coverage reports. Please feel free to submit any tests you find missing.

Speed indications
Speed is a relative thing, depending on the underlying hardware, the current load of the system, etc. We compare Super Csv's speed to that of reading a file line by line and doing a split(",") on each line. While this comparison is not completely sound - a simple split does not handle the complexities of CSV parsing, such as quoted fields containing commas - it indicates the overhead of CSV processing.
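To make the baseline concrete, here is a minimal sketch of the naive line-by-line split(",") approach used for comparison (the sample data is illustrative, not from the actual benchmark). Note how it mis-parses a quoted field containing a comma - exactly the kind of case a real CSV parser handles:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class SplitBaseline {

    // Naive baseline: read line by line and split on commas.
    // This mishandles quoted fields with embedded commas.
    static List<String[]> readNaive(BufferedReader reader) throws IOException {
        List<String[]> rows = new ArrayList<String[]>();
        String line;
        while ((line = reader.readLine()) != null) {
            rows.add(line.split(","));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String csv = "John,Doe,1970\n\"Doe, John\",Smith,1970\n";
        List<String[]> rows = readNaive(new BufferedReader(new StringReader(csv)));
        System.out.println(rows.get(0).length); // prints 3 - correct
        System.out.println(rows.get(1).length); // prints 4 - the quoted field was wrongly split
    }
}
```

The second line should parse as 3 fields, but the naive split produces 4 - which is why this baseline is called "insufficient" processing.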
Method | Relative speed | Average execution time (s) |
---|---|---|
split() | 100.00% | 0.37 |
ListReader | 338.92% | 1.24 |
MapReader | 424.24% | 1.55 |
BeanReader | 481.00% | 1.76 |
BeanReader (full) | 1003.66% | 3.68 |
The test implementation can be found in the class ReadingSpeedTest.java in the source folder. The test generates a CSV file consisting of 250,000 lines, each with 6 columns of random size representing numbers, names and dates; the generated file is approximately 7.6 MB. The file is then read 5 times with each reading strategy. The tests were conducted on a slow Dell 1800 with 500 MB RAM (of which 472 MB were in use during testing).
The conclusion of the tests is that Super Csv is only 3-5 times slower than a native read with simple (insufficient) processing of the data. Processing a quarter of a million lines takes less than 2 seconds (with the BeanReader). Those two seconds encompass reading the file, parsing the lines, instantiating a bean instance per line and populating its fields (as strings) using the bean's set-methods. The last row in the table (BeanReader full) is special in that, on top of reading lines, creating objects and populating fields, it converts two columns on each line into longs and one column into a date.
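The extra cost of the "full" run comes from per-line conversions along these lines (a plain-Java sketch of what the long and date conversions amount to; the sample values and the date pattern are assumptions, as the benchmark's actual format is not documented here):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public class ConversionCost {
    public static void main(String[] args) throws ParseException {
        // On each line, the "BeanReader (full)" run additionally performs
        // two long conversions and one date conversion, roughly like this:
        long id = Long.parseLong("250000");
        long amount = Long.parseLong("42");
        // The date pattern "dd/MM/yyyy" is illustrative only.
        Date date = new SimpleDateFormat("dd/MM/yyyy").parse("01/01/2005");
        System.out.println(id + amount); // prints 250042
    }
}
```

Done 750,000 times (three conversions across 250,000 lines), this accounts for roughly the doubling of execution time between BeanReader and BeanReader (full).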
Putting these figures into perspective: I was once hired to fix a business application that spent 16 hours processing a CSV file of around 50,000 lines. So if your application is running slowly, I dare say it is unlikely to be because of Super Csv!
Overall usage explained
When using Super Csv you have to make five decisions:

- Will I read or write? Depending on the choice, use a reader or a writer.
- What data structure will I read the data into (or write it from)? For example, when reading, use CsvBeanReader to read into a bean, CsvMapReader to read into a map, or CsvListReader to read into a list of Strings.
- If reading into maps or beans, either define a list of column names or utilize an existing header from the CSV file (if any). Use getCSVHeader() to easily retrieve the header.
- Decide on what conversions and constraints you want for each column. See processors and constraints.
- Specify preferences for the separation character, newline character, etc. See preferences.
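The five decisions above can be sketched as follows, assuming the CsvBeanReader API (the file name users.csv and the bean class UserBean are hypothetical; class and method names are as commonly documented for Super Csv, so check them against your version's Javadoc):

```java
import java.io.FileReader;
import java.io.IOException;

import org.supercsv.cellprocessor.ParseLong;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

public class ReadIntoBean {
    public static void main(String[] args) throws IOException {
        // Decisions 1+2: we are reading into a bean, so use CsvBeanReader.
        // Decision 5: pick a preference (separator, newline, etc.).
        ICsvBeanReader reader = new CsvBeanReader(
                new FileReader("users.csv"),        // hypothetical file
                CsvPreference.EXCEL_PREFERENCE);
        try {
            // Decision 3: reuse the file's own header as the column-name list.
            String[] header = reader.getCSVHeader(true);
            // Decision 4: one processor per column (null = keep the raw String).
            CellProcessor[] processors = { null, null, new ParseLong() };
            UserBean user; // hypothetical bean with setters matching the header
            while ((user = reader.read(UserBean.class, header, processors)) != null) {
                System.out.println(user);
            }
        } finally {
            reader.close();
        }
    }
}
```

Writing is symmetric: pick the matching writer for your data structure and supply the same kinds of column names, processors and preferences.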