What is the best way to run dos2unix on a 500k-line file on Windows? [closed]
10-07-2019 - |
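Question
The question says it all. I have a 500,000-line file that is generated as part of an automated build process on a Windows box, and it is riddled with ^M characters. It needs to be *nix-friendly when it goes out the door. What is the best approach here? Is there a handy snippet of code that could do this for me, or do I need to write a little C# or Java app?
Solution
Here is a Perl one-liner, taken from http://www.technocage.com/~caskey/dos2unix/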
#!/usr/bin/perl -pi
s/\r\n/\n/;
You can run it as follows:
perl dos2unix.pl < file.dos > file.unix
Alternatively, you can run it this way (the conversion is done in place):
perl -pi dos2unix.pl file.dos
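Here is my (naive) C version:
#include <stdio.h>
int main(void)
{
    int c;
    /* Copy stdin to stdout, dropping every carriage return. */
    while( (c = fgetc(stdin)) != EOF )
        if(c != '\r')
            fputc(c, stdout);
    return 0;
}
Run it with input and output redirection: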
dos2unix.exe < file.dos > file.unix
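One caveat when building a filter like this with a Windows compiler: as far as I know, Microsoft's C runtime (which MinGW also uses) opens stdin and stdout in text mode, which turns CR-LF into LF on input and LF back into CR-LF on output, so the output can end up with ^M again. Below is a minimal sketch of one way around that, assuming `_setmode`, `_fileno` and `_O_BINARY` from `<io.h>` and `<fcntl.h>` are available. The helper names `set_binary_mode` and `copy_without_cr` are made up for illustration.

```c
#include <stdio.h>

#ifdef _WIN32
#include <fcntl.h>  /* _O_BINARY */
#include <io.h>     /* _setmode */
#endif

/* Hypothetical helper: switch a stream to binary mode on Windows.
   On other platforms there is no text/binary distinction, so do nothing. */
void set_binary_mode(FILE *stream)
{
#ifdef _WIN32
    _setmode(_fileno(stream), _O_BINARY);
#else
    (void)stream;
#endif
}

/* Hypothetical helper: copy `in` to `out`, dropping every carriage return.
   Returns the number of carriage returns dropped. */
long copy_without_cr(FILE *in, FILE *out)
{
    long dropped = 0;
    int c;
    while ((c = fgetc(in)) != EOF) {
        if (c == '\r')
            dropped++;
        else
            fputc(c, out);
    }
    return dropped;
}
```

To use it as a filter, call `set_binary_mode(stdin)` and `set_binary_mode(stdout)` at the top of `main`, then `copy_without_cr(stdin, stdout)`.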
Other tips
If installing a base cygwin is too heavy, there are a number of standalone console-based dos2unix and unix2dos programs for Windows on the net, many with C/C++ source available. If I'm understanding the requirement correctly, either of these solutions would fit nicely into an automated build script.
If you're on Windows and need something run in a batch script, you can compile a simple C program to do the trick.
#include <stdio.h>
int main() {
    while(1) {
        int c = fgetc(stdin);
        if(c == EOF)
            break;
        if(c == '\r')
            continue;
        fputc(c, stdout);
    }
    return 0;
}
Usage:
myprogram.exe < input > output
Editing in-place would be a bit more difficult. Besides, you may want to keep backups of the originals for some reason (in case you accidentally strip a binary file, for example).
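To guard against stripping a binary file by accident, one cheap check is to look for NUL bytes before converting. This is only a heuristic sketch, not a reliable file-type test, and the function name `looks_binary` is made up for illustration. UTF-16 text will also trip it, which is probably what you want, since deleting 0x0D bytes from UTF-16 would corrupt it anyway.

```c
#include <stddef.h>

/* Hypothetical heuristic: treat a buffer as binary if it contains a NUL byte.
   Returns 1 if a NUL byte is found, 0 otherwise. */
int looks_binary(const unsigned char *buf, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (buf[i] == 0)
            return 1;
    }
    return 0;
}
```

Checking only the first few kilobytes of each file is usually enough for a build script, but that cutoff is a judgment call.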
That version removes all CR characters; if you only want to remove the ones that are in a CR-LF pair, you can use (this is the classic one-character-back method :-):
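#include <stdio.h>
int main() {
    int lastc = EOF;
    int c;
    while ((c = fgetc(stdin)) != EOF) {
        /* Emit the previous character unless it is the CR of a CR-LF pair.
           Nothing is emitted on the first pass, when lastc is still EOF. */
        if ((lastc != EOF) && ((lastc != '\r') || (c != '\n'))) {
            fputc (lastc, stdout);
        }
        lastc = c;
    }
    /* Flush the final character, if the input was not empty. */
    if (lastc != EOF) {
        fputc (lastc, stdout);
    }
    return 0;
}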
You can edit the file in-place using mode "r+". Below is a general myd2u program, which accepts file names as arguments. NOTE: This program uses ftruncate to chop off extra characters at the end. If there's any better (standard) way to do this, please edit or comment. Thanks!
#include <stdio.h>
#include <unistd.h> /* ftruncate (POSIX) */
int main(int argc, char **argv) {
    FILE *file;
    if(argc < 2) {
        fprintf(stderr, "Usage: myd2u <files>\n");
        return 1;
    }
    file = fopen(argv[1], "rb+");
    if(!file) {
        perror(argv[1]);
        return 2;
    }
    long readPos = 0, writePos = 0;
    int lastC = EOF;
    while(1) {
        fseek(file, readPos, SEEK_SET);
        int c = fgetc(file);
        readPos = ftell(file); /* For good measure. */
        if(c == EOF)
            break;
        if(c == '\n' && lastC == '\r') {
            /* Move back so we override the \r with the \n. */
            --writePos;
        }
        fseek(file, writePos, SEEK_SET);
        fputc(c, file);
        writePos = ftell(file);
        lastC = c;
    }
    fflush(file); /* Push buffered writes out before truncating. */
    ftruncate(fileno(file), writePos); /* Not in C89/C99/ANSI! */
    fclose(file);
    /* 'cus I'm too lazy to make a loop: recurse on the remaining file names. */
    if(argc > 2)
        return main(argc - 1, argv + 1);
    return 0;
}
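On the question of avoiding ftruncate: one option that sticks to `<stdio.h>` and `<stdlib.h>` is to read the whole file into memory, compact it, and reopen it with mode "wb", which truncates it. Here is a rough sketch under those assumptions. The names `crlf_to_lf` and `convert_file` are made up for illustration, and it relies on `fseek(..., SEEK_END)` working on a binary stream, which is the case on Windows and POSIX but not strictly promised by the C standard.

```c
#include <stdio.h>
#include <stdlib.h>

/* Compact buf in place, turning each CR-LF pair into LF.
   Lone CRs are kept. Returns the new length. */
size_t crlf_to_lf(char *buf, size_t len)
{
    size_t r, w = 0;
    for (r = 0; r < len; r++) {
        if (buf[r] == '\r' && r + 1 < len && buf[r + 1] == '\n')
            continue; /* skip the CR; the LF is copied on the next pass */
        buf[w++] = buf[r];
    }
    return w;
}

/* Convert one file; returns 0 on success, -1 on error. */
int convert_file(const char *path)
{
    FILE *f = fopen(path, "rb");
    char *buf;
    long size;
    size_t len;
    int ok;

    if (!f)
        return -1;
    if (fseek(f, 0, SEEK_END) != 0 || (size = ftell(f)) < 0) {
        fclose(f);
        return -1;
    }
    rewind(f);
    buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf) {
        fclose(f);
        return -1;
    }
    len = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (len != (size_t)size) {
        free(buf);
        return -1;
    }

    len = crlf_to_lf(buf, len);

    f = fopen(path, "wb"); /* "wb" truncates the file to zero length */
    if (!f) {
        free(buf);
        return -1;
    }
    ok = (fwrite(buf, 1, len, f) == len);
    if (fclose(f) != 0)
        ok = 0;
    free(buf);
    return ok ? 0 : -1;
}
```

The trade-off is that the original contents are gone if the write fails partway, so keeping a backup copy first is still a good idea. A 500k-line file should fit in memory comfortably on most build machines.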
tr -d '^M' < infile > outfile
Type the ^M as Ctrl+V followed by Enter.
Edit: You can use '\r' instead of manually entering a carriage return [thanks to @strager]:
tr -d '\r' < infile > outfile
Edit 2: 'tr' is a Unix utility. You can download a native Windows version from http://unxutils.sourceforge.net [thanks to @Rob Kennedy] or use cygwin's Unix emulation.
FTP it from the DOS box to the Unix box as an ASCII file instead of a binary file. FTP will strip the CR-LF and insert an LF. Transfer it back to the DOS box as a binary file, and the LF will be retained.
Some text editors, such as UltraEdit/UEStudio have this functionality built-in.
File > Conversions > DOS to UNIX
If it is just one file, I use Notepad++. Nice because it is free. I have cygwin installed and use a one-liner script I wrote for multiple files. If you're interested in the script, leave a comment. (I don't have it available to me at this moment.)