In Polars, how can one specify a single dtype for all columns in read_csv?
According to the docs, the dtypes argument to read_csv can take either a mapping (dict) in the form of {'column_name': dtype}, or a list of dtypes, one for each column.
However, it is not clear how to specify "I want all columns to be a single dtype".
If you wanted all columns to be pl.Utf8, for example, and you knew the total number of columns, you could do:
pl.read_csv('sample.csv', dtypes=[pl.Utf8]*number_of_columns)
However, this doesn't work if you don't know the total number of columns.
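One possible workaround (a sketch, not from the original post; the file name sample.csv is assumed) is to read just the header first to count the columns. Passing n_rows=0 should parse only the header row:

import polars as pl

# Read zero data rows: only the header is parsed, giving the column names.
header = pl.read_csv('sample.csv', n_rows=0)
df = pl.read_csv('sample.csv', dtypes=[pl.Utf8] * len(header.columns))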
In Pandas, you could do something like:
pd.read_csv('sample.csv', dtype=str)
But this doesn't work in Polars.
Reading all data in a csv as any type other than pl.Utf8 will likely fail with a lot of null values, and we can use expressions to declare how we want to deal with those null values. If you read a csv with infer_schema_length=0, Polars does not infer the schema and reads all columns as pl.Utf8, as that is a supertype of all Polars types. Once everything is read as Utf8, we can use expressions to cast the columns.
import polars as pl

(pl.read_csv("test.csv", infer_schema_length=0)
   .with_columns(pl.all().cast(pl.Int32, strict=False)))
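As a quick illustration (a sketch with made-up data; the column names and values are assumptions, not from the original answer), strict=False turns any value that cannot be parsed into null instead of raising an error:

import io
import polars as pl

# Hypothetical CSV: "age" contains one value that is not a valid integer.
data = io.StringIO("name,age\nalice,30\nbob,n/a\n")

df = (
    pl.read_csv(data, infer_schema_length=0)  # both columns read as Utf8
      .with_columns(pl.all().cast(pl.Int32, strict=False))
)
print(df)
# "name" casts to all nulls, and bob's "age" becomes null:
# values that fail to parse are nulled rather than raising an error.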