Introduction
Our customers often have to import data from very simple Excel *.xslx-files into our software product: with some relevant rows and cells in rather primitive worksheets of an Excel workbook, and that's it. But we do not want to use large DLL's or third party software for that. Therefore we produced a small solution for our needs. It could be useful for you, too:
Using the code
Download the "Excel.dll" (12 kByte, you need .net 4.5!) and add it as a reference to your project. Or adapt the compact source code before - with only the relevant classes "
Workbook
", "
Worksheet
", "
Row
" and "
Cell
" and 45 C# code lines doing important work in total. Then read the worksheets, rows and cells from any Excel *.xlsx-file in your program like so:
foreach
(
var
worksheet
in
Workbook.Worksheets(
@"
C:\ExcelFile.xlsx"
)
foreach
(
var
row
in
worksheet.Rows)
foreach
(
var
cell
in
row.Cells)
Here you iterate through the worksheets, the rows (and the cells of each row) of the Excel file within three lines of code.
Points of Interest
This
article
(written by
M I developer
) describes all the theoretical background, if you are interested in it. We based our solution on the integrated ZIP-library in .net 4.5 and the standard XML-serializer of .net.
If you want to adapt our solution to your needs: edit the simple source code for the Excel.dll. This is how it works:
Maybe you did not know that xlsx-files are ZIP-files. And the text strings of the Excel cells of all worksheets per workbook are always stored in a file named "
xl/sharedStrings.xml
", while the worksheets are called "
xl/worksheets/sheet[1...n].xml
".
So we have to unzip and deserialize the relevant XML files in the Excel xlsx-file:
using
System.IO.Compression;
public
static
IEnumerable<worksheet> Worksheets(
string
ExcelFileName)
worksheet ws;
using
(ZipArchive zipArchive = ZipFile.Open(ExcelFileName, ZipArchiveMode.Read))
SharedStrings = DeserializedZipEntry<sst>(GetZipArchiveEntry(zipArchive,
@"
xl/sharedStrings.xml"
));
foreach
(
var
worksheetEntry
in
(WorkSheetFileNames(zipArchive)).OrderBy(x => x.FullName))
ws = DeserializedZipEntry<worksheet>(worksheetEntry);
ws.ExpandRows();
yield
return
ws;
As you see, we also have to find all worksheets of the workbook at the beginning. We filter the ZIP-archive entries for that:
private
static
IEnumerable<ziparchiveentry> WorkSheetFileNames(ZipArchive ZipArchive)
foreach
(
var
zipEntry
in
ZipArchive.Entries)
if
(zipEntry.FullName.StartsWith(
"
xl/worksheets/sheet"
))
yield
return
zipEntry;
For deserialization of each XML formatted ZIP-entry (see also this
article
written by
Md. Rashim uddin
) we use this generic method:
private
static
T DeserializedZipEntry<T>(ZipArchiveEntry ZipArchiveEntry)
using
(Stream stream = ZipArchiveEntry.Open())
return
(T)
new
XmlSerializer(
typeof
(T)).Deserialize(XmlReader.Create(stream));
Therefore the XML-structures have to be reflected in our classes. Here you see the "
sst
"-class and the "
SharedString
"-class for the XML in the "
shared strings table
":
="
1.0"
="
UTF-8"
="
yes"
<
sst
xmlns
="
http://schemas.openxmlformats.org/spreadsheetml/2006/main"
count
="
72"
uniqueCount
="
6"
>
<
t
>
Text A
<
/t
>
<
t
>
Text B
<
/t
>
public
class
sst
[XmlElement(
"
si"
)]
public
SharedString[] si;
public
sst()
public
class
SharedString
public
string
t;
The same strategy we also use for the "
worksheet
" -XML-file in the ZIP-file. There we focus on the XML-elements and -attributes "
row
", "
c
", "
v
", "
r
" and "
t
". All the work is done again by the
XmlSerializer
:
="
1.0"
="
UTF-8"
="
yes"
<
worksheet
xmlns
="
http://schemas.openxmlformats.org/spreadsheetml/2006/main"
>
<
dimension
ref
="
A1:F12"
/
>
<
sheetViews
>
<
sheetView
workbookViewId
="
0"
>
<
/sheetView
>
<
/sheetViews
>
<
sheetFormatPr
baseColWidth
="
10"
defaultRowHeight
="
15"
/
>
<
sheetData
>
<
row
r
="
1"
>
<
c
r
="
A1"
t
="
s"
>
<
v
>
0
<
/v
>
<
c
r
="
B1"
t
="
s"
>
<
v
>
1
<
/v
>
<
c
r
="
C1"
t
="
s"
>
<
v
>
2
<
/v
>
<
/sheetData
>
<
/worksheet
>
public
class
worksheet
[XmlArray(
"
sheetData"
)]
[XmlArrayItem(
"
row"
)]
public
Row[] Rows;
public
class
worksheet
public
class
Row
[XmlElement(
"
c"
)]
public
Cell[] FilledCells;
public
class
Cell
[XmlAttribute(
"
r"
)]
public
string
CellReference;
[XmlAttribute(
"
t"
)]
public
string
tType =
"
"
;
[XmlElement(
"
v"
)]
public
string
Value;
Of course we have to do a little bit in order to convert the usual Excel cell references like "
A1
", "
B1
" and so on to column indices. That is done via the setter of "
CellReference
" in the "
Cell
"-class (here we also derive the maximum column index for the whole worksheet) ...
[
XmlAttribute
(
"
r"
)]
public
string
CellReference
return
ColumnIndex.ToString();
ColumnIndex = worksheet.GetColumnIndex(
value
);
if
(ColumnIndex > worksheet.MaxColumnIndex)
worksheet.MaxColumnIndex = ColumnIndex;
... and a small method named "
GetColumnIndex()
":
private
int
GetColumnIndex(
string
CellReference)
string
colLetter =
new
Regex(
"
[A-Za-z]+"
).Match(CellReference).Value.ToUpper();
int
colIndex =
0
;
for
(
int
i =
0
; i < colLetter.Length; i++)
colIndex *=
26
;
colIndex += (colLetter[i] -
'
A'
+
1
);
return
colIndex -
1
;
The last challenge has to do with the fact, that the Excel file does not contain empty Excel cells. So the tiny methods "
ExpandRows()
" and "
ExpandCells()
" handle that problem:
public
void
ExpandRows()
foreach
(
var
row
in
Rows)
row.ExpandCells(NumberOfColumns);
public
void
ExpandCells(
int
NumberOfColumns)
Cells =
new
Cell[NumberOfColumns];
foreach
(
var
cell
in
FilledCells)
Cells[cell.ColumnIndex] = cell;
FilledCells =
null
;
In the end we have an array of all rows and an array of all cells for each row representing all columns of the specific Excel worksheet. Empty cells are null in the array, but the
ColumnIndex
of each cell in "
Row.Cells[]
" corresponds with the actual Excel column of each cell.
Unfortunately the XML format is not very clear about how to interpret the value of the Excel cells. We tried to do it like so, but any hint for improvement would be appreciated:
if
(tType.Equals(
"
s"
))
Text = Workbook.SharedStrings.si[Convert.ToInt32(_value)].t;
return
;
if
(tType.Equals(
"
str"
))
Text = _value;
return
;
Amount = Convert.ToDouble(_value, CultureInfo.InvariantCulture);
Text = Amount.ToString(
"
#,##0.##"
);
IsAmount =
true
;
catch
(Exception ex)
Amount =
0
;
Text =
String
.Format(
"
Cell Value '{0}': {1}"
, _value, ex.Message);
Besides, when you know that an Excel cell contains a date as its value, you can use this method for conversion:
public
static
DateTime DateFromExcelFormat(
string
ExcelCellValue)
return
DateTime.FromOADate(Convert.ToDouble(ExcelCellValue));
Let me know how the total Excel.DLL works in your environment - and have fun with it!
History
26.7.2014 - Posted initially.
26.7.2014 - Files uploaded here.
26.7.2014 - Explanations added, formatting improved.
28.7.2014 - More explanation added. Deserialization slightly improved.
29.7.2014 - Date conversion from Excel format to DateTime via DateTime.FromOADate()
29.7.2014 - Essence of class "worksheet" added.
31.7.2014 - Comments added in the source code, XML-documentation added to the dll
31.7.2014 - The program now reads all worksheets from a workbook, not only the first one
Is there a way to open the Excel file if it is opened by another process
Member 10561156
22-Apr-21 9:54
Member 10561156
22-Apr-21 9:54
OK. I absolutely love this little nimble library.
But if the file is opened by another process or by another user, it would throw an exception.
Is there a way we deal with this situation?
Thank you.
Sign in
·
View Thread
Hi Dietmar,
Great library, perfect for rudimentary parsing of xlsx files. You saved me from home brewing or having to include some large over-kill library.
I'm using it to parse uploaded xlsx files on an intranet site. In my case, I am reading the file and then making database updates based on it's data. Since I don't need the file afterwards I don't need to write it to the file system and just process it in memory before disposing of it. To that end I overloaded the Workbook.Worksheets function to accept a MemoryStream rather than a file path. It's a minor change but maybe it's useful for someone else and worth including in your source.
public
static
IEnumerable<worksheet> Worksheets(MemoryStream ExcelFileMemoryStream)
worksheet ws;
using
(ZipArchive zipArchive =
new
ZipArchive(ExcelFileMemoryStream, ZipArchiveMode.Read,
true
))
SharedStrings = DeserializedZipEntry<sst>(GetZipArchiveEntry(zipArchive,
@"
xl/sharedStrings.xml"
));
foreach
(
var
worksheetEntry
in
(WorkSheetFileNames(zipArchive)).OrderBy(x => x.FullName))
ws = DeserializedZipEntry<worksheet>(worksheetEntry);
ws.NumberOfColumns = worksheet.MaxColumnIndex +
1
;
ws.ExpandRows();
yield
return
ws;
Thanks again!
Cheers,
-Will
Sign in
·
View Thread
I had the same problem: it seems to be that some files contain "odd" xml data, which means that FilledCells does not get set.
I didn't chase it down, but a simple fix was:
Download the source code (second link)
Edit the "rows.cs" file.
Change the "ExpandCells" method to allow for it:
Cells =
new
Cell[NumberOfColumns];
if
(FilledCells ==
null
)
return
;
foreach
(
var
cell
in
FilledCells)
Cells[cell.ColumnIndex] = cell;
FilledCells =
null
;
It then works with my files, and may do for yours.
Bad command or file name. Bad, bad command! Sit! Stay! Staaaay...
AntiTwitter:
@DalekDave
is now a follower!
Sign in
·
View Thread
Hi. Thank you for your work. Unfortunately it doen't work with some of the xlsx files. See the one where it fails here:
generic.xlsx - Google Drive
[
^
]
The exception happens when trying to enumerate the sheets or just access the first sheet:
System.NullReferenceException:
Object
reference not
set
to an instance of an
object
at Excel.Row.ExpandCells (System.
Int32
NumberOfColumns) [0x0002d]
in
<d6ec7237f753496cbadf28efb7f11940>:
0
at Excel.worksheet.ExpandRows () [0x0000f]
in
<d6ec7237f753496cbadf28efb7f11940>:
0
at Excel.Workbook+<Worksheets>d__2.MoveNext () [0x000c6]
in
<d6ec7237f753496cbadf28efb7f11940>:
0
at System.Linq.Enumerable+<SelectManyIterator>c__Iterator5`3[TSource,TCollection,TResult].MoveNext () [0x000fd]
in
<63992662b765477a898ef49cdcc99ee2>:
0
at System.Linq.Enumerable+WhereEnumerableIterator`1[TSource].MoveNext () [0x00062]
in
<63992662b765477a898ef49cdcc99ee2>:
0
at System.Linq.Buffer`1[TElement]..ctor (System.Collections.Generic.IEnumerable`1[T] source) [0x00087]
in
<63992662b765477a898ef49cdcc99ee2>:
0
at System.Linq.Enumerable.ToArray[TSource] (System.Collections.Generic.IEnumerable`1[T] source) [0x00011]
in
<63992662b765477a898ef49cdcc99ee2>:
0
Sign in
·
View Thread
Try the solution in my post above.
Bad command or file name. Bad, bad command! Sit! Stay! Staaaay...
AntiTwitter:
@DalekDave
is now a follower!
Sign in
·
View Thread
I have .net 4.5 installed
included the dll in my reference list, but when I compiled I get a message saying
Workbook does not exist in the current context. This was compiled using Loadrunner. It seems like the dll was not recognized.
Sign in
·
View Thread
I was looking exactly for this solution! In my company I have internal xlsx files that I need to upload to a database. And this is the best solution I found.
Really I didn't understand some of the posts I read (I'm new to CodeProject), because the solution is perfectly fine for so many environments, and if it does not apply to specific ones - adapt it or discard it. Thanks in advance Dietmar.
Sign in
·
View Thread
Hi Dietmar,
I have used your excel.dll and it works great!
There is only one problem I have. after I have read some Cell-values in my programm, and even closing my program, this excel-file is locked for using in excel. When I try to open it, there comes a warning that I can only read this file because it is still used by another program. After I have shut down my computer and restart it, the excel-file is writeble again (until I use it again im my program).
Even if I only do this part of my program, it is locked:
foreach (var worksheet in Workbook.Worksheets(FileName))
break;
Can you tell me what I can do to solve this?
Sign in
·
View Thread
I actually read about this in another article. Where if the program isn't successfully cleaned up, and closed. You'll run into awkward problems like so.
https://coderwall.com/p/app3ya/read-excel-file-in-c
Sign in
·
View Thread
i need to R\W to excel and i was gonna use MS Office interop to Write and this to read but there where some complication with the same package name..I'm new to C# so help me..
Sign in
·
View Thread
Please check my excel file
here
The library does not work on that file, I run the code but every cell is empty
Sign in
·
View Thread
I included the source code and a detailed description of how one can work with excel-files. But I am afraid every programmer has to adapt this sample code to his own specific case e.g. by understanding the approach, analysing his data, debugging the program in his environment and adapting it by himself.
This is, what programmers usually do.
Sign in
·
View Thread
Nice library.
How would I select a specific worksheet by name in a file with multiple worksheets?
I have a file with 10 or so worksheets and I need to process a specific worksheet.
Thank you.
Sign in
·
View Thread
Pls. look into the first code lines under "Points Of Interest". There you see, that the collection of worksheets is ordered by their name, and the program iterates through the worksheets. Instead of this iteration you can access the worksheet by name also, as you do it with any collection. Does that help you?
Sign in
·
View Thread
I used your 3 line code and the program does not enter the first foreach, so basically the 3 line code is entirely skipped. Even if I change the filename to a non existent file it does the same and I don't even get an error.
What could be the problem?
Thanks
Sign in
·
View Thread
So the program does not find any worksheet in your xlsx-file, I presume. I'm afraid you have to rename your file to .zip, look into into it and try to figure out what is abnormal with it by debugging my code.
Sign in
·
View Thread
How could I read defined names or list of defined names in Name Manager?
LeonOX
21-Jun-16 1:43
LeonOX
21-Jun-16 1:43
Re: How could I read defined names or list of defined names in Name Manager?
dietmar paul schoder
21-Jun-16 3:03
dietmar paul schoder
21-Jun-16 3:03
This article shows, how you simple rename a *.xlsx-file to *.zip and look into its content.
And it shows, how you transform the XML you find in there into C Sharp classes.
So you can possibly figure out, how things like the one you mention, can be programmed.
Sign in
·
View Thread
Re: How could I read defined names or list of defined names in Name Manager?
LeonOX
22-Jun-16 0:11
LeonOX
22-Jun-16 0:11
I finally found the way to read Defined Names. I made some changes in the code, basically I read and serialize the file xl/workbook.xml, which contains Defined Names.
Also I found an error when reading some xslt files and adapted Cell.CellReference (ColumnIndex to Cell reference) to allow relating Defined Names with Cells, since ColumnIndex is still accessible due to its public modifier.
So now, plus having the Defined Names we have the complete workbook.xml data which contains additional info in the new property Workbook.WorkbookDefinition
I posted the complete code on github:
GitHub - Excel.Net: Read Xslx files for .NET plattform (C#)
Thank you
Kind regards
Leonardo
Sign in
·
View Thread
Thanks for posting this it is very helpful for what I am working on. I was just curious if you have found a solution for the following issue. I have been stuck on this for a long time and my attempts to adapt the code to this situation have failed.
When reading an excel sheet with cells that have character level formatting, the text value returns null for that cell. Looking into the sharedstrings.xml I found that the "< si>" elements that return null during the deserialization of the xml file are the ones that contain two "< t>" elements within the "< si>" element to account for separately formatted text within the same cell.
Is this a known issue?
Any direction would be greatly appreciated.
Thanks
modified 7-Feb-16 20:50pm.
Sign in
·
View Thread
Web01
2.8:2024-04-02:1