The History of Line-Ending Markers on Windows, macOS, and Linux, and How to Handle ^M at the End of Lines in Text Files

September 25, 2023

When you create a text file on Windows, the default line ending is CRLF.

When a CRLF file is opened on Linux, many tools show ^M at the end of each line.

Why Does ^M Appear?

Windows uses CRLF for line endings, a convention modeled on old typewriters and teleprinters. On those machines, Carriage Return (CR, \r) moved the print position back to the start of the line, and Line Feed (LF, \n) advanced the paper by one line. Starting a new line therefore took both operations: a carriage return followed by a line feed.

Linux simplifies this by using a single LF as the end-of-line marker. The leftover CR is displayed as ^M: CR has the value 13 in the ASCII table, ^ is the prefix used for control characters, and M is the 13th letter of the alphabet (in caret notation, a control character with code n is shown as ^ followed by the character with code n + 64, and 13 + 64 = 77 is the code of M).
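The explanation above can be checked from a shell. This is a minimal sketch that assumes a POSIX-like environment with printf, cat, and od available; the sample text and the temporary file are made up for illustration.

```shell
# Build a small CRLF sample in a temporary file (the path comes from mktemp).
sample="$(mktemp)"
printf 'first line\r\nsecond line\r\n' > "$sample"

# cat -v prints control characters in caret notation, so each CR appears as ^M.
cat -v "$sample"

# od -c dumps the raw bytes; every line ends with \r followed by \n.
od -c "$sample"

# The arithmetic behind the name: control code 13 + 64 = 77, the code of "M".
echo "13 + 64 = $((13 + 64))"
```

Comparing the cat -v output with the od -c dump makes it clear that ^M is only a display convention for the single CR byte, not two characters stored in the file.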

Useless Knowledge

Oddly enough, the classic Mac OS releases that preceded Mac OS X (that is, those from before 2001) used CR as the line ending, unlike both Windows' CRLF and Linux's LF. Programmers of that era really suffered when writing scripts that had to work across operating systems.

Mac OS X and its successors (Mac OS X was later renamed OS X and then macOS) adopted LF, which brought them in line with Linux. This is one reason programmers tend to like macOS: the programs they write are deployed to servers, and most servers run Linux.

How to Eliminate ^M?

Method 1: Use VS Code (suitable for a small number of files)

On Windows, the CR behind ^M is part of the line ending and does not need to be removed. If you use a code editor such as VS Code, you can choose the line ending for a file manually, and a file already written with CRLF can be converted to LF directly in VS Code (for example, via the CRLF/LF indicator in the status bar).

Method 2: Use dos2unix (suitable for converting a large number of files)

# Install dos2unix (Debian/Ubuntu)
sudo apt install dos2unix -y

Convert a single file and keep the original

# Keep the original file: dos2unix -n oldfile newfile
dos2unix -n newline-character-换行符.txt newline-character-换行符-for-unix.txt

Convert a single file in place, without keeping the original

# Single file
dos2unix file
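
If dos2unix cannot be installed, the two single-file conversions above can be approximated with standard tools. The following is only a sketch of that substitute technique, using sed rather than dos2unix; the helper name strip_cr and the demo file names are hypothetical.

```shell
# strip_cr SRC DST: copy SRC to DST, dropping one CR at the end of each line.
# (strip_cr is a hypothetical helper for this sketch, not part of dos2unix.)
strip_cr() {
    cr="$(printf '\r')"   # a literal CR byte, so no sed-specific \r escape is needed
    sed "s/${cr}\$//" "$1" > "$2"
}

# Demo on a temporary CRLF file.
workdir="$(mktemp -d)"
printf 'alpha\r\nbeta\r\n' > "$workdir/crlf.txt"

# Keep the original and write an LF copy (similar in spirit to dos2unix -n).
strip_cr "$workdir/crlf.txt" "$workdir/lf.txt"

# Replace the original: convert into a temporary file first, then move it over.
strip_cr "$workdir/crlf.txt" "$workdir/crlf.txt.tmp" &&
    mv "$workdir/crlf.txt.tmp" "$workdir/crlf.txt"
```

Writing to a temporary file before moving it avoids truncating the source while sed is still reading it.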

Batch-convert every file in a folder in place, without keeping the originals (to be safe, copy the folder first as a backup)

# Convert all files in a folder
find /path/to/your/directory -type f -exec dos2unix {} \;

Batch conversion successful!
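
Before a batch conversion, it can help to see which files actually contain CRLF line endings. The sketch below is one possible way to do that with find and grep; the helper name list_crlf_files and the demo directory are hypothetical, and binary files that happen to contain a CR before a newline may be listed too.

```shell
# list_crlf_files DIR: print regular files under DIR that have a line ending in CR.
# (list_crlf_files is a hypothetical helper name used only in this sketch.)
list_crlf_files() {
    cr="$(printf '\r')"   # a literal CR byte
    # grep -l prints only the names of files with at least one matching line.
    find "$1" -type f -exec grep -l "${cr}\$" {} +
}

# Demo: one CRLF file and one LF file in a temporary directory.
demo_dir="$(mktemp -d)"
printf 'one\r\ntwo\r\n' > "$demo_dir/windows.txt"
printf 'one\ntwo\n' > "$demo_dir/unix.txt"

list_crlf_files "$demo_dir"
```

Running the same helper again after the conversion is a quick way to confirm that no CRLF files remain.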

Summary

Windows has the highest market share among consumer operating systems, but developing software on Windows does mean running into peculiar little problems like CRLF. For developers, running a Linux virtual machine on Windows and doing the development inside it may be the lower-effort strategy.

Original article: v2fy.com/p/2023-09-2…
