So You Want to Write Your Own CSV Code? (2014)

Category: IT · Published: 5 years ago


Fields separated by commas and rows separated by newlines. Easy, right? You can write the code yourself in just a few lines.

Hold on a second…

What if there are commas inside the fields?

You need to enclose the field in quotes ("). Easy, right?

But what if only some of the fields are quoted, and not all of them?

What if there are quotes in the fields?

You need to double each quote inside the field, and god forbid you forget to enclose the field itself in quotes.

Also make sure not to mistake a quoted empty field ( ...,"",... ) for an escaped double quote.
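To make the quoting rules concrete, here is a minimal sketch using Python's standard csv module (the choice of Python and the sample data are mine, not part of the original article). It round-trips a field containing both commas and quotes, then shows the difference between a quoted empty field and an escaped quote.

```python
import csv
import io

# Sample data (made up for illustration): one field holds commas and quotes.
rows = [["id", "comment", "note"],
        ["1", 'She said "hi", then left', ""]]

# Write with the default dialect: fields are quoted only when needed,
# and embedded quotes are doubled.
buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
encoded = buffer.getvalue()

# Read the text back; newline="" keeps line endings untouched for the parser.
decoded = list(csv.reader(io.StringIO(encoded, newline="")))

def parse_line(line):
    """Parse a single CSV record given as one string."""
    return next(csv.reader([line]))

empty_field = parse_line('a,"",b')      # middle field is empty
quote_field = parse_line('a,"""",b')    # middle field is one quote character
```

Here `decoded` equals `rows`, `empty_field` is `['a', '', 'b']` and `quote_field` is `['a', '"', 'b']`. Other tools may not agree with these choices, which is rather the point.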

What if there is a newline inside a field?

Of course you must enclose the field using quotes.

What are the accepted newline characters?

CRLF? CR? LF? What if there are multiple newlines?

What if the newline characters change?

E.g., newlines within a field may differ from the newlines at the end of a line.
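A small sketch of why splitting on newlines is not enough, again assuming Python's csv module and made-up sample data. The record separator here is CRLF while the newline inside the quoted field is a bare LF.

```python
import csv
import io

# Records end with CRLF; the quoted address contains a bare LF.
raw = 'name,address\r\n"Ada","12 Main St\nSpringfield"\r\n'

# Naive approach: split on line breaks, which cuts the quoted field in two.
naive_lines = raw.splitlines()

# The csv module keeps the embedded newline inside the field.
# newline="" stops the text layer from translating line endings first.
rows = list(csv.reader(io.StringIO(raw, newline="")))
```

`naive_lines` has three entries, while `rows` has two records and `rows[1][1]` is `'12 Main St\nSpringfield'`.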

Still with me?

What if there is an extra comma at the end of a line?

Is there an empty field at the end or is that just a superfluous comma?

What if there is a variable number of fields per line?

What if there is an empty line?

Is that an EOF, a single empty field or no field at all?
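None of these questions has a universal answer, so it helps to see what one particular parser decides. The sketch below shows how Python's csv module treats a trailing comma, an empty line and a short row. `find_problems` is a hypothetical helper of mine that flags the records a stricter consumer might want to reject.

```python
import csv

# Made-up input: a trailing comma, an empty line, and a short row.
raw_lines = ["a,b,c", "1,2,", "", "4,5"]
rows = list(csv.reader(raw_lines))

def find_problems(rows):
    """Report records whose field count differs from the header's."""
    header, *body = rows
    problems = []
    # Record numbers start at 2 because the header is record 1.
    for record_number, row in enumerate(body, start=2):
        if not row:
            problems.append((record_number, "empty line"))
        elif len(row) != len(header):
            problems.append(
                (record_number, f"expected {len(header)} fields, got {len(row)}")
            )
    return problems

problems = find_problems(rows)
```

With this parser the trailing comma yields an empty third field (`['1', '2', '']`), the empty line yields an empty list, and `problems` is `[(3, 'empty line'), (4, 'expected 3 fields, got 2')]`. Whether those are errors or acceptable data is a policy decision the code cannot make for you.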

What about whitespace?

What if there is leading/trailing whitespace in the fields?

What if the CSV you get always has a space after a comma but it’s not part of the data?
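As an illustration of the space-after-comma case, Python's csv module has a `skipinitialspace` option. The sketch below, on a made-up line, compares the default behaviour with that option switched on. It only covers leading spaces; trailing whitespace is still yours to deal with.

```python
import csv

# Made-up record where every comma is followed by a space.
line = '1, "Smith, John", 42'

# Default: the space is data, so the quote no longer starts the field
# and the comma inside the name splits it.
default_fields = next(csv.reader([line]))

# skipinitialspace=True ignores spaces right after a delimiter.
lenient_fields = next(csv.reader([line], skipinitialspace=True))
```

`default_fields` comes out as `['1', ' "Smith', ' John"', ' 42']`, while `lenient_fields` is `['1', 'Smith, John', '42']`. Same bytes, two quite different tables.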

What if the character separating fields is not a comma?

Not kidding.

Some countries use a comma as the decimal separator instead of a period. In those countries Excel will generate CSVs with a semicolon as the separator. Some files use tabs instead of commas to avoid this specific issue. Some even use non-printable ASCII characters.

Don’t forget to account for it when reading an arbitrary CSV file. No, there is no indication of which delimiter a file uses.
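Since the file itself does not say which delimiter it uses, the best a program can do is guess. A minimal sketch with Python's `csv.Sniffer`, on made-up semicolon-separated data with decimal commas, is shown below. The sniffer is a heuristic: it can guess wrong or raise `csv.Error`, so treat its answer as a suggestion.

```python
import csv
import io

# Made-up sample in the style of a locale with decimal commas.
sample = "name;price\nwidget;1,50\ngadget;2,75\ngizmo;3,10\n"

# Restrict the candidates to delimiters we are prepared to accept.
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")

# Parse with whatever dialect was guessed.
rows = list(csv.reader(io.StringIO(sample, newline=""), dialect))
```

On this sample the guessed delimiter is `;` and `rows[1]` is `['widget', '1,50']`. On messier input, letting a human confirm the guess is the safer route.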

What if the program reading the CSV uses multiple delimiters?

Some programs (including Excel) will assume different delimiters when reading a file from the disk and when reading it from the web. Make sure to give them the right one!

What if there is non ASCII data?

Just use UTF-8, right? But wait…

What if the program reading the CSV uses an encoding that depends on the locale?

A program can’t magically know what encoding a file is using. Some will use an encoding that depends on the locale of the machine.

This means that if you save a CSV on one machine and open it on another, the data may be silently corrupted.
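The "silently" part is worth seeing. In the sketch below (Python, with sample text I made up), bytes written as UTF-8 are decoded as Windows-1252, standing in for a locale-dependent default on the reading machine. No error is raised; the text is simply wrong.

```python
# Made-up record containing non-ASCII characters.
text = "café,5€\n"

# The writer saves the file as UTF-8.
data = text.encode("utf-8")

# The reader assumes a different encoding; cp1252 is used here as an
# example of a locale-dependent default. Decoding succeeds anyway.
wrong = data.decode("cp1252")

# Decoding with the encoding that was actually used restores the text.
right = data.decode("utf-8")
```

`wrong` starts with `cafÃ©` instead of `café`, and nothing along the way complained.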

What if I put a BOM in my file?

After all, Byte Order Marks can determine the Unicode encoding used; that’s what they are for, right? (Actually they are used to determine the endianness, but I won’t get into that.)

If you include a BOM, Excel will interpret the CSV as a text file, not a CSV. This means line breaks within fields are not handled.
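The BOM also bites on the reading side. This sketch (Python, made-up data) says nothing about Excel; it only shows that a UTF-8 BOM decoded as plain UTF-8 ends up glued to the first field, whereas Python's `utf-8-sig` codec strips it.

```python
import codecs
import csv
import io

# Made-up file contents: a UTF-8 BOM followed by two records.
data = codecs.BOM_UTF8 + "id,name\r\n1,Ada\r\n".encode("utf-8")

# Plain UTF-8 decoding keeps the BOM as U+FEFF at the start of the text.
with_bom = list(csv.reader(io.StringIO(data.decode("utf-8"), newline="")))

# The utf-8-sig codec drops a leading BOM if one is present.
without_bom = list(csv.reader(io.StringIO(data.decode("utf-8-sig"), newline="")))
```

`with_bom[0][0]` is `'\ufeffid'`, so a lookup of the `id` column by name would fail, while `without_bom[0][0]` is `'id'`.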

Do you really still want to roll your own code to handle CSV?

CSV is not a well-defined file format. RFC 4180 does not represent reality. It seems as if every program handles CSV in subtly different ways. Please do not inflict another one onto this world. Use a solid library.

If you have full control over the CSV provider and supplier and the data they emit, you’ll be able to build a reliable automated system.

If a supplied CSV is arbitrary, the only real way to make sure the data is correct is for a user to check it and, if necessary, specify the delimiter, quoting rule, and so on. Barring that, you may end up with an error or, worse, silently corrupted data.

Writing CSV code that works with files out there in the real world is a difficult task. The rabbit hole goes deep. Ruby’s CSV library is 2321 lines long.

Discussion on Hacker News and Reddit.
