Category Archives: metaprogramming

CCSVLIB 20130307

Update 3/7/13: I forgot to do a collision search on Google for “csvlib” before releasing this. Apparently, everyone calls their CSV handling utilities package “csvlib”, so I’ve renamed mine CCSVLIB.

I keep finding myself in situations where I have data in spreadsheet form, but I want to analyze or transform it in ways that are beyond what I can do with LibreOffice Calc or Microsoft Excel. Other times I want to go in the opposite direction and have my software produce output that I can convert into a spreadsheet. Fortunately, both Calc and Excel can read and write comma-separated values (CSV) files. CSV files are nice to work with: they’re plain text, and thus easy to read or write from my own software.

Well, easy in theory. In practice, CSVs from different sources follow different conventions (some quote every field, some quote none, some quote only when needed; some allow commas inside fields and some don’t; and so on), which makes reading CSVs too painful for small one-off scripts and programs. On top of the inconsistent formats, the code needed to parse a CSV file correctly is often longer than the code that performs whatever analysis or transformation I actually want.
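To make the quoting problem concrete, here is a minimal C++ sketch (not CCSVLIB’s code or API) that contrasts a naive split on every comma with a quote-aware split following the RFC 4180 rules: a field may be wrapped in double quotes, a quoted field may contain commas, and a literal quote inside a quoted field is written as two quotes. The names `naive_split` and `parse_record` are hypothetical, and the sketch assumes each record fits on one line (RFC 4180 also allows line breaks inside quoted fields).

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Naive approach: split on every comma. This breaks on quoted fields
// that contain commas, and it leaves the quote characters in place.
std::vector<std::string> naive_split(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    for (char c : line) {
        if (c == ',') {
            fields.push_back(field);
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);
    return fields;
}

// Quote-aware split of a single record, following RFC 4180 quoting.
// Assumption: the record is one line, and malformed input is handled
// leniently rather than reported as an error.
std::vector<std::string> parse_record(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    bool in_quotes = false;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (in_quotes) {
            if (c == '"') {
                if (i + 1 < line.size() && line[i + 1] == '"') {
                    field += '"';       // "" inside quotes is one literal quote
                    ++i;
                } else {
                    in_quotes = false;  // closing quote
                }
            } else {
                field += c;             // commas are ordinary data here
            }
        } else if (c == '"') {
            in_quotes = true;           // opening quote
        } else if (c == ',') {
            fields.push_back(field);    // field separator
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);
    return fields;
}
```

On the record `42,"Smith, John","said ""hi"""`, `naive_split` returns four fields, while `parse_record` returns the three intended ones: `42`, `Smith, John`, and `said "hi"`.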

I deal with CSV files enough that I decided to write my own CSV parsing library. I suppose I could have searched the Internet for someone else’s solution, but I needed a project, and I think parsing is fun. I also decided that if I was going to implement a CSV parsing library, then I was going to do it right. I started by looking for a standard for the CSV format and found RFC 4180. After a couple of hours at my keyboard, I had a working parser and a decent data structure for pulling data from RFC 4180 CSVs into memory. My library came up in conversation with my supervisor a few days back (I’m a TA for CS 115 at UK), and she mentioned that she wanted a copy. I decided to release the software, so I added the ability to write CSV files, polished the API, and wrote some documentation.
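Writing is the easier direction. The sketch below is an assumption about how an RFC 4180 writer can work, not CCSVLIB’s actual interface: it quotes a field only when the field contains a comma, a double quote, or a line break, doubles any embedded quotes, and ends each record with CRLF as the RFC specifies. `quote_field`, `write_record`, and `format_record` are hypothetical names.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Quote a field only when RFC 4180 requires it: the field contains a
// comma, a double quote, or a line break. Embedded quotes are doubled.
std::string quote_field(const std::string& field) {
    if (field.find_first_of(",\"\r\n") == std::string::npos) {
        return field;  // safe to write as-is
    }
    std::string out = "\"";
    for (char c : field) {
        if (c == '"') out += '"';  // escape a quote by doubling it
        out += c;
    }
    out += '"';
    return out;
}

// Write one record: fields separated by commas, terminated by CRLF.
void write_record(std::ostream& os, const std::vector<std::string>& fields) {
    for (std::size_t i = 0; i < fields.size(); ++i) {
        if (i > 0) os << ',';
        os << quote_field(fields[i]);
    }
    os << "\r\n";
}

// Convenience wrapper: format a single record as a string.
std::string format_record(const std::vector<std::string>& fields) {
    std::ostringstream os;
    write_record(os, fields);
    return os.str();
}
```

For example, `format_record({"42", "Smith, John", "said \"hi\""})` yields `42,"Smith, John","said ""hi"""` followed by CRLF. Quoting only when necessary keeps simple fields readable while still round-tripping through any RFC 4180 reader.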

What makes CCSVLIB a better choice than any of the other CSV parsing libraries for C/C++? Objectively speaking, nothing, or at least nothing that I know of. I haven’t taken a close look at any of the other stuff that’s available. I can say, based on a cursory Google search, that there aren’t many implementations of RFC 4180. CCSVLIB implements RFC 4180 (well, at least mostly), so it should be able to consume most sane CSV files. Also, CCSVLIB is simple, short, and well documented. The current version is 1051 lines of C, about 400 of which are comments.

I’m releasing CCSVLIB under the BSD license. You can download the source tarball from the link below or from the software page. Documentation and an example are included in the download.

CCSVLIB-20130307.tgz

KOAP 20130205

In an effort to sustain momentum going into the semester, I was tentatively scheduled to give a talk about KOAP to our research group on Tuesday afternoon. KOAP is my tool for developing OpenCL applications using the C host API. I took the opportunity yesterday afternoon to change a few things about KOAP that had been bugging me.

First, a little about how KOAP works internally (or rather, how it worked until Tuesday). KOAP takes as input a file containing C code, OpenCL code, and KOAP directives. It expands the directives into OpenCL API calls and combines all of the OpenCL code into a string for compilation at runtime. KOAP does not use formal parsing methods; parsing takes place over multiple, very ad hoc passes. In the first pass, KOAP reads the input into a single string and processes comments and KOAP includes (similar to C preprocessor includes). KOAP then separates the OpenCL source from the C source and breaks each source string into a double-ended queue (STL deque) of strings, using newline characters as delimiters. It then expands directives one line at a time, building a deque of output lines as it goes.
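The line-splitting and line-at-a-time expansion steps can be sketched as follows. This is a hypothetical reconstruction, not KOAP’s source: `split_lines` breaks a string into a `std::deque<std::string>` on newline characters, and `expand_lines` makes one pass over the lines, letting a caller-supplied `expand` function (a stand-in for KOAP’s directive expansion, whose details aren’t shown here) append zero or more output lines per input line.

```cpp
#include <deque>
#include <functional>
#include <string>

// Split a source string into lines, using '\n' as the delimiter.
// A trailing newline does not produce an extra empty line.
std::deque<std::string> split_lines(const std::string& source) {
    std::deque<std::string> lines;
    std::string::size_type start = 0;
    while (start < source.size()) {
        std::string::size_type end = source.find('\n', start);
        if (end == std::string::npos) end = source.size();
        lines.push_back(source.substr(start, end - start));
        start = end + 1;
    }
    return lines;
}

// One pass over the input: each line may expand into zero or more output
// lines. `expand` is a hypothetical stand-in for directive expansion.
using Expander =
    std::function<void(const std::string&, std::deque<std::string>&)>;

std::deque<std::string> expand_lines(const std::deque<std::string>& input,
                                     const Expander& expand) {
    std::deque<std::string> output;
    for (std::deque<std::string>::size_type i = 0; i < input.size(); ++i) {
        expand(input[i], output);  // appends to output via push_back
    }
    return output;
}
```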

Why STL deques, you ask? At one point, the deque was the only STL container that supported the methods I needed (or thought I needed). My first modification on Tuesday was to replace all of the deques with STL vectors. Vectors support all of the needed operations and are better suited to the problem (I’m mostly using the element access operator [] and the push_back method). KOAP has been released for over two years now, and for those two years I’ve thought it was dumb that KOAP used double-ended queues. That’s not bugging me anymore.
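Because code like this touches its containers only through `operator[]`, `size()`, and `push_back`, swapping one container for the other is largely mechanical. The hypothetical function template below, which is not taken from KOAP, compiles unchanged against `std::deque` and `std::vector` for exactly that reason.

```cpp
#include <deque>
#include <string>
#include <vector>

// Copy every line that contains something other than spaces or tabs.
// Only size(), operator[], and push_back are used, so Container can be
// std::vector<std::string> or std::deque<std::string>.
template <typename Container>
Container drop_blank_lines(const Container& lines) {
    Container out;
    for (typename Container::size_type i = 0; i < lines.size(); ++i) {
        if (lines[i].find_first_not_of(" \t") != std::string::npos) {
            out.push_back(lines[i]);
        }
    }
    return out;
}
```

`std::vector` keeps its elements in one contiguous block, whereas `std::deque` adds efficient insertion and removal at the front, which an append-and-index access pattern never uses.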

My other modification is actually visible to users. KOAP understands a handful of command-line arguments for things like setting the flags passed to the OpenCL compiler, choosing the device type to use (OpenCL works on CPUs, GPUs, and other accelerators), and a few other things. All of the command-line arguments came in pairs (-argname argument). I had written a very dumb bit of code to parse the command-line arguments and set the necessary internal flags. The old parser required the KOAP file to be processed to be the last argument, and it would only process one KOAP file. I’ve rewritten the argument parser to be more general: it is smarter about how it parses the arguments, and it accepts as many KOAP input files as you wish to give it.
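A more general parser along these lines might look like the sketch below. It is an assumption, not KOAP’s actual parser: any argument beginning with `-` is treated as an option name that consumes the next argument as its value, and every other argument is collected as an input file, so input files can appear in any position and in any number. The `ParsedArgs` struct and `parse_args` function are hypothetical, and option names are not validated.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Result of parsing: option values keyed by name, plus input files in order.
struct ParsedArgs {
    std::map<std::string, std::string> options;
    std::vector<std::string> inputs;
};

// Parse "-name value" pairs appearing anywhere in the argument list; every
// other argument is an input file. A lone "-" is treated as an input.
// Throws std::invalid_argument if an option has no value after it.
ParsedArgs parse_args(const std::vector<std::string>& args) {
    ParsedArgs result;
    for (std::size_t i = 0; i < args.size(); ++i) {
        const std::string& arg = args[i];
        if (arg.size() > 1 && arg[0] == '-') {
            if (i + 1 >= args.size()) {
                throw std::invalid_argument("missing value for " + arg);
            }
            // The value is taken verbatim, even if it also starts with '-'.
            result.options[arg.substr(1)] = args[++i];
        } else {
            result.inputs.push_back(arg);
        }
    }
    return result;
}
```

In `main`, it could be called as `parse_args(std::vector<std::string>(argv + 1, argv + argc))`. With hypothetical option names, the arguments `a.koap -device gpu b.koap` yield two input files and one option, `device`, set to `gpu`.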

The queues and the argument parser were the two things that bugged me the most about KOAP. Now that they’re fixed, I’m reasonably satisfied with how KOAP is structured internally. I’m not quite to the point of being proud of the codebase, but at least now there’s nothing in KOAP that I find embarrassing.