sql - How do I read Unicode characters from an MS Access 2007 database through Java?

In Java, I have written a program that reads a UTF8 text file. The text file contains a SQL query of the SELECT kind. The program then executes the query on the Microsoft Access 2007 database and writes all fields of the first row to a UTF8 text file.

The problem occurs when the returned row contains Unicode characters, such as "♪". These characters show up as "?" in the text file.

I know that the text files are read and written correctly, because a dummy UTF8 character ("◎") is read from the text file containing the SQL query and written to the text file containing the resulting row. The UTF8 character looks correct when the written text file is opened in Notepad, so the reading and writing of the text files are not part of the problem.

This is how I connect to the database and how I execute the SQL query:

Connection c = DriverManager.getConnection("jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=C:/database.accdb;Pwd=temp");
ResultSet r = c.createStatement().executeQuery(sql);
I have tried passing a charSet property to the connection, but it makes no difference:
Properties p = new Properties();
p.put("charSet", "utf-8");
p.put("lc_ctype", "utf-8");
p.put("encoding", "utf-8");
Connection c = DriverManager.getConnection("...", p);
I tried "utf8", "UTF8" and "UTF-8" as well, with no difference. If I enter "UTF-16" I get the following exception:
java.lang.IllegalArgumentException: Illegal replacement
I have been searching around for hours with no results, and now turn my hope to you. Please help!
I also accept workaround suggestions. =) What I want to be able to do is to run a Unicode query (for example one that searches for posts that contain the "あ" character) and to have results with Unicode characters received and saved correctly.
Thank you!
Update. Here is a self-contained example of the issue:
package test;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.Charset;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.util.Properties;
public class Standalone {
    public static void main(String[] args) {
        try {
            Properties p = new Properties();
            p.put("charSet", "UTF8");
            Connection c = DriverManager.getConnection("jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=./dummy.accdb;Pwd=pass", p);
            ResultSet r = c.createStatement().executeQuery("SELECT TOP 1 * FROM main;");
            r.next();
            OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream(new File("results.txt")), Charset.forName("UTF-8"));
            osw.write(new BufferedReader(new InputStreamReader(new FileInputStream("utf8.txt"), Charset.forName("UTF-8"))).readLine() +" : "+ r.getString("content"));
            osw.close();
            c.close();
            System.out.println("Done.");
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
The example opens the database "dummy.accdb", which is encrypted with the password "pass", and pulls the first post out of the table "main". It then reads the text file "utf8.txt" and writes a text file "results.txt", which will contain the first line of "utf8.txt" plus the value of the field "content" it got from the database.
In the file "utf8.txt" I have stored "♜♞♝♛♚♝♞♜♟♖♘♗♕♔♗♘♖♙".
In the database's "main" table's "content" field I have stored "♫♪あｷﾀℳℴℯ♥∞۞♀♂".
After the application has finished running the "results.txt" has the following content: "♜♞♝♛♚♝♞♜♟♖♘♗♕♔♗♘♖♙ : ?????Moe?8???".
It successfully read and wrote the UTF8 characters of the "utf8.txt" text file, but failed to obtain the correct characters from the database. This is where the problem lies.
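A pattern of "?" substitutions like this is what a lossy charset conversion typically produces. The sketch below only illustrates that general mechanism in plain Java; it does not reproduce the bridge itself, and the assumption that the text passes through a single-byte code page somewhere between Access and the Java String is a guess, not something confirmed here. The class name LossyEncodingSketch and the method roundTrip are made up for the illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LossyEncodingSketch {

    // Encodes the text with the given charset and decodes it again.
    // String.getBytes(Charset) replaces characters the charset cannot
    // represent with that charset's replacement byte ('?' for ISO-8859-1).
    public static String roundTrip(String text, Charset charset) {
        byte[] bytes = text.getBytes(charset);
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // U+266B and U+266A are the two note characters from the question.
        String notes = "\u266B\u266A";

        // A single-byte charset cannot hold them, so they come back as "??".
        System.out.println(roundTrip(notes, StandardCharsets.ISO_8859_1));

        // UTF-8 can represent every Unicode character, so nothing is lost.
        System.out.println(notes.equals(roundTrip(notes, StandardCharsets.UTF_8)));
    }
}
```

Once a character has been replaced this way the original is gone, so no charset setting applied afterwards (for example when writing "results.txt") can bring it back.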
Update. I should mention that the field in the database is of the type "memo". I have tried having "Unicode Compression" set both to "No" and to "Yes" (recreating the post between tries to make sure no compression was applied when "No" was selected). To my understanding Access uses UTF-16 when it saves Unicode characters, however with compression on it changes to UTF-8. In any case this did not make any difference.
Bonus question: does anyone know how to connect to the database using a pure ODBC provider in Java? Or any other kind of method? This would provide me with a good workaround.
Update. I have been trying to feed these four to getConnection:
"Provider=Microsoft.Jet.OLEDB.4.0;Data Source=./dummy.accdb"
"jdbc:odbc:Provider=Microsoft.Jet.OLEDB.4.0;Data Source=./dummy.accdb"
"jdbc:odbc:Driver={Microsoft.Jet.OLEDB.4.0};Data Source=./dummy.accdb"
"jdbc:odbc:Provider=Microsoft.ACE.OLEDB.12.0;Data Source=./dummy.accdb"
The first gives the error "java.sql.SQLException: No suitable driver found for Provider=Microsoft.Jet.OLEDB.4.0;Data Source=./dummy.accdb" and the two in the middle give "java.sql.SQLException: [Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified". The last one gives "java.sql.SQLException: [Microsoft][ODBC Driver Manager] Data source name too long".
I don't understand what getConnection wants. The parameter description is as follows: "url - a database url of the form jdbc:subprotocol:subname". Huh? I clearly don't get what that means.
Anyone know any alternative working ways of connecting to the Access 2007 database through Java? Maybe the providers I tried aren't supported but some other might be?
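As a rough illustration of the "jdbc:subprotocol:subname" form quoted above: the subprotocol selects the JDBC driver, and the subname is whatever that driver expects. The sketch below simply splits a URL into those two parts; JdbcUrlSketch and split are hypothetical names, not part of JDBC. It is consistent with the errors listed above: a string without the "jdbc:" prefix matches no driver at all, and with "jdbc:odbc:" the remainder is presumably handed to the ODBC driver manager, which looks for a data source name or a "Driver={...}" string rather than an OLE DB "Provider=" string.

```java
public class JdbcUrlSketch {

    // Splits "jdbc:subprotocol:subname" into { subprotocol, subname }.
    public static String[] split(String url) {
        String prefix = "jdbc:";
        if (!url.startsWith(prefix)) {
            throw new IllegalArgumentException("Not a JDBC URL: " + url);
        }
        // The subprotocol ends at the first colon after the "jdbc:" prefix.
        int colon = url.indexOf(':', prefix.length());
        if (colon < 0) {
            throw new IllegalArgumentException("Missing subname: " + url);
        }
        return new String[] {
            url.substring(prefix.length(), colon),
            url.substring(colon + 1)
        };
    }

    public static void main(String[] args) {
        String[] parts = split(
            "jdbc:odbc:Driver={Microsoft Access Driver (*.mdb, *.accdb)};DBQ=./dummy.accdb");
        System.out.println("subprotocol = " + parts[0]);
        System.out.println("subname     = " + parts[1]);
    }
}
```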
                You'll need to post a minimalist, self-contained example demonstrating the problem, because according to the answer here: stackoverflow.com/questions/1467412/…, p.put("charSet", "UTF8") should be working, since you're using the JDBC<->ODBC bridge (java.sun.com/j2se/1.4.2/docs/guide/jdbc/bridge.html). And on the file side, you say you're sure the file is being written as UTF8 (which you must be doing on purpose; Java's default will be platform encoding, which almost certainly isn't UTF8).
– T.J. Crowder
                May 9, 2010 at 10:23
                @Crowder: "which almost certainly isn't UTF8" is true for Windows. Most modern Linux distributions use UTF-8. I know it's not directly relevant to the question, I just want to avoid anyone taking that as a statement that's true in all situations.
– Joachim Sauer
                May 9, 2010 at 10:30
                @T.J. Crowder: Ok, I've done a self-contained example, just going to figure out how to share it with you guys.
– Peter
                May 9, 2010 at 12:14
                The original post has been edited to contain the example, please have a look at it and give me your input. Thanks a lot.
– Peter
                May 9, 2010 at 12:23
                @T.J Crowder: By the way, the answer you were referring to refers to java.sun.com/j2se/1.4.2/docs/guide/jdbc/bridge.html  That article is outdated, it was probably written over five years ago (the copyright notice at the bottom says 2002). Today some key classes used in that article cannot be used, for example "sun.jdbc.odbc.JdbcOdbcDriver", "sun.jdbc.odbc.ee.DataSource" and "sun.jdbc.odbc.ee.ConnectionPoolDataSource". They either have access restrictions upon them or simply cannot be resolved.  Setting "charSet" may have worked once in the past, but it sadly doesn't work today.
– Peter
                May 9, 2010 at 14:43
Since you mentioned that switching to some other DB than Access is possible, I urge you to do so. Building software on Microsoft Office products has always been a maintenance nightmare for me, so choose anything else from this list: http://java-source.net/open-source/database-engines.
I would go with Apache Derby for this, or just use Java DB, which comes preinstalled with any current Sun Java installation (and is in fact a repackaged Derby).
Now that the JDBC-ODBC Bridge has been dropped from Java SE 8 and Oracle has confirmed that this issue will never be fixed (ref: here), a good alternative would be to use UCanAccess. For more information, see:
UCanAccess on SourceForge
Manipulating an Access database from Java without ODBC
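A minimal sketch of what the UCanAccess route might look like, assuming the UCanAccess JAR and its dependencies are on the classpath. The "jdbc:ucanaccess://" URL prefix is the one UCanAccess documents; the class name UcanaccessSketch, the helper buildUrl and the path C:/database.accdb are placeholders for illustration. An encrypted, password-protected database like the one in the question needs additional setup that is not shown here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class UcanaccessSketch {

    // Builds a UCanAccess JDBC URL from a path to the .accdb/.mdb file.
    public static String buildUrl(String dbPath) {
        return "jdbc:ucanaccess://" + dbPath;
    }

    public static void main(String[] args) {
        String url = buildUrl("C:/database.accdb"); // placeholder path
        // Table "main" and column "content" are taken from the question.
        try (Connection c = DriverManager.getConnection(url);
             Statement s = c.createStatement();
             ResultSet r = s.executeQuery("SELECT content FROM main")) {
            if (r.next()) {
                // getString returns an ordinary Java String; write it out
                // with an explicit UTF-8 writer as in the question's example.
                System.out.println(r.getString("content"));
            }
        } catch (SQLException e) {
            // Also reached when the driver is missing from the classpath.
            System.err.println("Could not query the database: " + e.getMessage());
        }
    }
}
```

Note that printing to the console may still show "?" if the console itself is not using a Unicode encoding, so checking the value in a UTF-8 file is the more reliable test.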