java - Execute SQL on CSV files via JDBC

Collectives™ on Stack Overflow

Find centralized, trusted content and collaborate around the technologies you use most.

Learn more about Collectives

Teams

Q&A for work

Connect and share knowledge within a single location that is structured and easy to search.

Learn more about Teams

I need to apply an SQL query to CSV files (comma-separated text files). My SQL is predefined from another tool, and is not eligible to change. It may contain embedded selects and table aliases in the FROM part.

For my task I have found two open-source (this is a project requirement) libraries that provide JDBC drivers:

CsvJdbc

XlSQL

JBoss Teiid

Create an Apache Derby DB, load all CSVs as tables and execute the query.

These are the problems I encountered:

it does not accept the syntax of the SQL (it uses internal selects and table aliases). Furthermore, it has not been maintained since 2004.

I could not get it to work, as it has as dependency a SAX Parser that causes exception when parsing other documents. Similarly, no change since 2004.

Have not checked if it supports the syntax, but seems like an overhead. It needs several entities defines (Virtual Databases, Bindings). From the mailing list they told me that last release supports runtime creation of required objects. Has anyone used it for such simple task (normally it can connect to several types of data, like CSV, XML or other DBS and create a virtual, unified one)?

Can this even be done easily?

From the 4 things I considered/tried, only 3 and 4 seem to me viable. Any advice on these, or any other way in which I can query my CSV files?

Cheers

In fact HSQLDB is the only suggested solution that can open an existing CSV file as an SQL table. It allows both performing SQL queries directly on the CSV file and updating the records. – fredt Nov 21, 2010 at 13:43 I did not try files that big with HSQL, but the documentation says the largest one is 8 TB (T, not G). You need to use CREATE CACHED TABLE to not keep the data in memory. hsqldb.org/web/hsqlFAQ.html#FAQ – Vladimir Dyuzhev Jun 2, 2014 at 20:26

If your SQL is predefined and cannot be changed your best option is to load your CSV into a database and run queries against it.

Apache Derby is a viable option, so are MySQL , which even has a CSV storage engine or PostgreSQL .

Does your SQL use any proprietary functions / extensions? If so, that may limit your choices.

If you are wanting to treat csv files as databases from within a Java program, you should look at the h2 database engine. It has really nice support for reading/writing CSV files and working with in-memory databases. It's a successor to hsql, faster and with added features. You can read about the csv support in the h2 tutorial.

You can read how easy this is to do using h2 in a Groovy script in this blog post: bayesianconspiracy.blogspot.com/2010/02/… – Kolmogorov Apr 1, 2010 at 0:10

maybe a bit late, sorry for that.

I've been developing the csvjdbc for over a year now and since a few weeks I've got "administrator" rights on that project so I've been able to release the most recent version I had produced. it does all "we" need (we: me and my current my colleagues) need, and I'm adding things as bugs are filed.

have a look at it now and decide again. (the web documentation still needs reviewing, for better insight, check the test cases, which are very extensive).

embedded selects? table aliases? no, not yet available there. but then again, feel free to file a bug report with a non working query and who knows... – mariotomo May 29, 2010 at 7:28

There is a Groovy script, gcsvsql that lets you treat csv files as database tables, including joins. With gcsvsql you can do things like:

gcsvsql "select * from people.csv where age > 40"

gcsvsql "select people.name,children.child from people.csv,children.csv where people.name=children.name"

gcsvsql "select avg(score) from people.csv where age < 40"

You can find this script, which is based on the h2 database engine, at Google code here:

http://code.google.com/p/gcsvsql/

I know, it's a very old case, but...

CsvJdbc is a cool library, but there are some issues using DbUtils while mapping ResultsSets to PoJos. A second bad thing is, that dosn't have a good support for different Datatypes.

After playing with CSVJdbc I will use a stupid CsvParser to read the Files an pump them into a HsqlDB or something like that.

have you reported the issues with csvjdbc? the current maintainer (not me any more) is very prompt in picking up serious issues. – mariotomo Jul 3, 2015 at 12:12 you should be able to create a ticket there once you log in / register. anonymous users are not allowed to create nor update tickets. I guess in order to avoid spammers... but be more specific, what "some issues" you met, and what "good support" you expect? – mariotomo Jul 11, 2015 at 16:32

Thanks for contributing an answer to Stack Overflow!

Please be sure to answer the question . Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.
Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers .