Objective-C:逐行读取文件
-
20-08-2019 - |
题
在 Objective-C 中处理大文本文件的正确方法是什么?假设我需要单独读取每一行,并希望将每一行视为 NSString。做到这一点最有效的方法是什么?
一种解决方案是使用 NSString 方法:
+ (id)stringWithContentsOfFile:(NSString *)path
encoding:(NSStringEncoding)enc
error:(NSError **)error
然后用换行符分隔行,然后迭代数组中的元素。然而,这似乎效率相当低。是否没有简单的方法将文件视为流,枚举每一行,而不是一次读取全部内容?有点像 Java 的 java.io.BufferedReader。
解决方案
这是一个很好的问题。我认为 @迪德里克 有一个很好的答案,尽管不幸的是 Cocoa 没有一个机制来实现您想要做的事情。
NSInputStream
允许您读取 N 个字节的块(非常类似于 java.io.BufferedReader
),但你必须将其转换为 NSString
自行扫描,然后扫描换行符(或任何其他分隔符)并保存所有剩余字符以供下次读取,或者如果尚未读取换行符则读取更多字符。(NSFileHandle
让您阅读 NSData
然后您可以将其转换为 NSString
, ,但本质上是相同的过程。)
苹果有一个 流编程指南 这可以帮助填写详细信息,并且 这个问题 如果您要处理的话也可能有帮助 uint8_t*
缓冲区。
如果您要经常读取这样的字符串(尤其是在程序的不同部分),最好将此行为封装在一个可以为您处理详细信息的类中,甚至可以子类化 NSInputStream
(它是 设计为子类化)并添加允许您准确阅读所需内容的方法。
根据记录,我认为这将是一个很好的添加功能,我将提交一个增强请求,以实现这一点。:-)
编辑: 结果这个请求已经存在了。有一个 2006 年推出的 Radar(对于 Apple 内部人员来说是 rdar://4742914)。
其他提示
这将一般为读取来自String
一个Text
工作。
如果你想读长文本的(大尺寸的文字)的,然后用这里的其他人被提到如缓冲的方法的(保留文本的内存空间大小)的
说你读的文本文件。
NSString* filePath = @""//file path...
NSString* fileRoot = [[NSBundle mainBundle]
pathForResource:filePath ofType:@"txt"];
您想摆脱新的生产线。
// read everything from text
NSString* fileContents =
[NSString stringWithContentsOfFile:fileRoot
encoding:NSUTF8StringEncoding error:nil];
// first, separate by new line
NSArray* allLinedStrings =
[fileContents componentsSeparatedByCharactersInSet:
[NSCharacterSet newlineCharacterSet]];
// then break down even further
NSString* strsInOneLine =
[allLinedStrings objectAtIndex:0];
// choose whatever input identity you have decided. in this case ;
NSArray* singleStrs =
[currentPointString componentsSeparatedByCharactersInSet:
[NSCharacterSet characterSetWithCharactersInString:@";"]];
有你有它。
此应达到目的:
#include <stdio.h>
NSString *readLineAsNSString(FILE *file)
{
char buffer[4096];
// tune this capacity to your liking -- larger buffer sizes will be faster, but
// use more memory
NSMutableString *result = [NSMutableString stringWithCapacity:256];
// Read up to 4095 non-newline characters, then read and discard the newline
int charsRead;
do
{
if(fscanf(file, "%4095[^\n]%n%*c", buffer, &charsRead) == 1)
[result appendFormat:@"%s", buffer];
else
break;
} while(charsRead == 4095);
return result;
}
使用如下:
FILE *file = fopen("myfile", "r");
// check for NULL
while(!feof(file))
{
NSString *line = readLineAsNSString(file);
// do stuff with line; line is autoreleased, so you should NOT release it (unless you also retain it beforehand)
}
fclose(file);
这个代码从文件一次读取非换行符,高达4095。如果你有一个线长于4095个字符,它一直读取,直到遇到一个换行符或档案结尾。
注意的:我没有测试此代码。在使用它之前,请进行测试。
的Mac OS X是UNIX,Objective-C的是C的超集,所以你可以只使用旧学校fopen
和fgets
从<stdio.h>
。它保证工作。
[NSString stringWithUTF8String:buf]
将转换C字符串到NSString
。也有用于在其它的编码串和没有拷贝创建方法。
您可以使用NSInputStream
这对于文件流的基本实现。您可以将字节读入缓冲区(read:maxLength:
方法)。你必须扫描换行符自己的缓冲区。
在可可/ Objective-C的读取文本文件采取适当的方式在苹果的字符串编程指南记录。 应该是读取和写入文件的部分正是你后。 PS:什么是“行”?字符串的两个部分由“\ n”个分开吗?或 “\ r”?或 “\ r \ n”?或者,也许你的段落后,实际上是?前面提到的导向件还包括在将字符串分割为行或段落的截面。 (这一段被称为“段落和换行”,并链接到我指着上面的页面的左手侧菜单。不幸的是这个网站不允许我发布一个以上的网址为我不可信赖的用户爱好。)
要套用Knuth的:过早的优化是所有罪恶的根源。不要简单地认为“阅读整个文件到内存”是缓慢的。你有没有基准呢?你知道它的真正的读取整个文件到内存?也许它只是返回一个代理对象,并保持阅读幕后为你消费的字符串? (免责声明:我不知道,如果NSString的实际执行这一点,可以想象可能的)问题的关键是:先用做事的记录的路要走。然后,如果基准测试显示,这并没有你想要的性能,优化。
很多这些答案都是长的代码块,或者它们在整个文件中读取。我喜欢用C方式完成本次任务非常
FILE* file = fopen("path to my file", "r");
size_t length;
char *cLine = fgetln(file,&length);
while (length>0) {
char str[length+1];
strncpy(str, cLine, length);
str[length] = '\0';
NSString *line = [NSString stringWithFormat:@"%s",str];
% Do what you want here.
cLine = fgetln(file,&length);
}
请注意fgetln不会让你的换行符。此外,我们的+1 str的长度,因为我们希望让空间的NULL终止。
要读取线(也用于极端大文件)的文件线可以由以下函数来完成:
DDFileReader * reader = [[DDFileReader alloc] initWithFilePath:pathToMyFile];
NSString * line = nil;
while ((line = [reader readLine])) {
NSLog(@"read line: %@", line);
}
[reader release];
或者:
DDFileReader * reader = [[DDFileReader alloc] initWithFilePath:pathToMyFile];
[reader enumerateLinesUsingBlock:^(NSString * line, BOOL * stop) {
NSLog(@"read line: %@", line);
}];
[reader release];
在类DDFileReader,使这是以下内容:
<强>接口文件(.H):强>
@interface DDFileReader : NSObject {
NSString * filePath;
NSFileHandle * fileHandle;
unsigned long long currentOffset;
unsigned long long totalFileLength;
NSString * lineDelimiter;
NSUInteger chunkSize;
}
@property (nonatomic, copy) NSString * lineDelimiter;
@property (nonatomic) NSUInteger chunkSize;
- (id) initWithFilePath:(NSString *)aPath;
- (NSString *) readLine;
- (NSString *) readTrimmedLine;
#if NS_BLOCKS_AVAILABLE
- (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL *))block;
#endif
@end
<强>实施(.M)强>
#import "DDFileReader.h"
@interface NSData (DDAdditions)
- (NSRange) rangeOfData_dd:(NSData *)dataToFind;
@end
@implementation NSData (DDAdditions)
- (NSRange) rangeOfData_dd:(NSData *)dataToFind {
const void * bytes = [self bytes];
NSUInteger length = [self length];
const void * searchBytes = [dataToFind bytes];
NSUInteger searchLength = [dataToFind length];
NSUInteger searchIndex = 0;
NSRange foundRange = {NSNotFound, searchLength};
for (NSUInteger index = 0; index < length; index++) {
if (((char *)bytes)[index] == ((char *)searchBytes)[searchIndex]) {
//the current character matches
if (foundRange.location == NSNotFound) {
foundRange.location = index;
}
searchIndex++;
if (searchIndex >= searchLength) { return foundRange; }
} else {
searchIndex = 0;
foundRange.location = NSNotFound;
}
}
return foundRange;
}
@end
@implementation DDFileReader
@synthesize lineDelimiter, chunkSize;
- (id) initWithFilePath:(NSString *)aPath {
if (self = [super init]) {
fileHandle = [NSFileHandle fileHandleForReadingAtPath:aPath];
if (fileHandle == nil) {
[self release]; return nil;
}
lineDelimiter = [[NSString alloc] initWithString:@"\n"];
[fileHandle retain];
filePath = [aPath retain];
currentOffset = 0ULL;
chunkSize = 10;
[fileHandle seekToEndOfFile];
totalFileLength = [fileHandle offsetInFile];
//we don't need to seek back, since readLine will do that.
}
return self;
}
- (void) dealloc {
[fileHandle closeFile];
[fileHandle release], fileHandle = nil;
[filePath release], filePath = nil;
[lineDelimiter release], lineDelimiter = nil;
currentOffset = 0ULL;
[super dealloc];
}
- (NSString *) readLine {
if (currentOffset >= totalFileLength) { return nil; }
NSData * newLineData = [lineDelimiter dataUsingEncoding:NSUTF8StringEncoding];
[fileHandle seekToFileOffset:currentOffset];
NSMutableData * currentData = [[NSMutableData alloc] init];
BOOL shouldReadMore = YES;
NSAutoreleasePool * readPool = [[NSAutoreleasePool alloc] init];
while (shouldReadMore) {
if (currentOffset >= totalFileLength) { break; }
NSData * chunk = [fileHandle readDataOfLength:chunkSize];
NSRange newLineRange = [chunk rangeOfData_dd:newLineData];
if (newLineRange.location != NSNotFound) {
//include the length so we can include the delimiter in the string
chunk = [chunk subdataWithRange:NSMakeRange(0, newLineRange.location+[newLineData length])];
shouldReadMore = NO;
}
[currentData appendData:chunk];
currentOffset += [chunk length];
}
[readPool release];
NSString * line = [[NSString alloc] initWithData:currentData encoding:NSUTF8StringEncoding];
[currentData release];
return [line autorelease];
}
- (NSString *) readTrimmedLine {
return [[self readLine] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
#if NS_BLOCKS_AVAILABLE
- (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL*))block {
NSString * line = nil;
BOOL stop = NO;
while (stop == NO && (line = [self readLine])) {
block(line, &stop);
}
}
#endif
@end
类通过戴夫德朗
就像@porneL表示,C API是非常方便的。
NSString* fileRoot = [[NSBundle mainBundle] pathForResource:@"record" ofType:@"txt"];
FILE *file = fopen([fileRoot UTF8String], "r");
char buffer[256];
while (fgets(buffer, 256, file) != NULL){
NSString* result = [NSString stringWithUTF8String:buffer];
NSLog(@"%@",result);
}
正如其他人回答两者NSInputStream和NSFileHandle都很好的选择,但它也可以与NSData的和存储器映射相当紧凑的方式完成的:
BRLineReader.h
#import <Foundation/Foundation.h>
@interface BRLineReader : NSObject
@property (readonly, nonatomic) NSData *data;
@property (readonly, nonatomic) NSUInteger linesRead;
@property (strong, nonatomic) NSCharacterSet *lineTrimCharacters;
@property (readonly, nonatomic) NSStringEncoding stringEncoding;
- (instancetype)initWithFile:(NSString *)filePath encoding:(NSStringEncoding)encoding;
- (instancetype)initWithData:(NSData *)data encoding:(NSStringEncoding)encoding;
- (NSString *)readLine;
- (NSString *)readTrimmedLine;
- (void)setLineSearchPosition:(NSUInteger)position;
@end
BRLineReader.m
#import "BRLineReader.h"
static unsigned char const BRLineReaderDelimiter = '\n';
@implementation BRLineReader
{
NSRange _lastRange;
}
- (instancetype)initWithFile:(NSString *)filePath encoding:(NSStringEncoding)encoding
{
self = [super init];
if (self) {
NSError *error = nil;
_data = [NSData dataWithContentsOfFile:filePath options:NSDataReadingMappedAlways error:&error];
if (!_data) {
NSLog(@"%@", [error localizedDescription]);
}
_stringEncoding = encoding;
_lineTrimCharacters = [NSCharacterSet whitespaceAndNewlineCharacterSet];
}
return self;
}
- (instancetype)initWithData:(NSData *)data encoding:(NSStringEncoding)encoding
{
self = [super init];
if (self) {
_data = data;
_stringEncoding = encoding;
_lineTrimCharacters = [NSCharacterSet whitespaceAndNewlineCharacterSet];
}
return self;
}
- (NSString *)readLine
{
NSUInteger dataLength = [_data length];
NSUInteger beginPos = _lastRange.location + _lastRange.length;
NSUInteger endPos = 0;
if (beginPos == dataLength) {
// End of file
return nil;
}
unsigned char *buffer = (unsigned char *)[_data bytes];
for (NSUInteger i = beginPos; i < dataLength; i++) {
endPos = i;
if (buffer[i] == BRLineReaderDelimiter) break;
}
// End of line found
_lastRange = NSMakeRange(beginPos, endPos - beginPos + 1);
NSData *lineData = [_data subdataWithRange:_lastRange];
NSString *line = [[NSString alloc] initWithData:lineData encoding:_stringEncoding];
_linesRead++;
return line;
}
- (NSString *)readTrimmedLine
{
return [[self readLine] stringByTrimmingCharactersInSet:_lineTrimCharacters];
}
- (void)setLineSearchPosition:(NSUInteger)position
{
_lastRange = NSMakeRange(position, 0);
_linesRead = 0;
}
@end
此答案不是ObjC但是C
由于ObjC是基于 'C',为什么不使用与fgets?
是的,我相信ObjC有它自己的方法 - 我只是不精通,还不足以知道它是什么:)
,fscanf
的格式串将被改变象下面这样:
"%4095[^\r\n]%n%*[\n\r]"
将在OSX,LINUX,窗户行结束工作。
使用类别或延伸,使我们的生活更容易一点。
extension String {
func lines() -> [String] {
var lines = [String]()
self.enumerateLines { (line, stop) -> () in
lines.append(line)
}
return lines
}
}
// then
for line in string.lines() {
// do the right thing
}
我发现通过@lukaswelte和代码响应从戴夫德朗的非常有益的。我一直在寻找解决这个问题,但通过\r\n
不仅仅是\n
解析大文件需要。
作为写入的代码包含一个错误,如果解析由一个以上的字符。我已经改变为下面的代码。
h文件:
#import <Foundation/Foundation.h>
@interface FileChunkReader : NSObject {
NSString * filePath;
NSFileHandle * fileHandle;
unsigned long long currentOffset;
unsigned long long totalFileLength;
NSString * lineDelimiter;
NSUInteger chunkSize;
}
@property (nonatomic, copy) NSString * lineDelimiter;
@property (nonatomic) NSUInteger chunkSize;
- (id) initWithFilePath:(NSString *)aPath;
- (NSString *) readLine;
- (NSString *) readTrimmedLine;
#if NS_BLOCKS_AVAILABLE
- (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL *))block;
#endif
@end
.m文件:
#import "FileChunkReader.h"
@interface NSData (DDAdditions)
- (NSRange) rangeOfData_dd:(NSData *)dataToFind;
@end
@implementation NSData (DDAdditions)
- (NSRange) rangeOfData_dd:(NSData *)dataToFind {
const void * bytes = [self bytes];
NSUInteger length = [self length];
const void * searchBytes = [dataToFind bytes];
NSUInteger searchLength = [dataToFind length];
NSUInteger searchIndex = 0;
NSRange foundRange = {NSNotFound, searchLength};
for (NSUInteger index = 0; index < length; index++) {
if (((char *)bytes)[index] == ((char *)searchBytes)[searchIndex]) {
//the current character matches
if (foundRange.location == NSNotFound) {
foundRange.location = index;
}
searchIndex++;
if (searchIndex >= searchLength)
{
return foundRange;
}
} else {
searchIndex = 0;
foundRange.location = NSNotFound;
}
}
if (foundRange.location != NSNotFound
&& length < foundRange.location + foundRange.length )
{
// if the dataToFind is partially found at the end of [self bytes],
// then the loop above would end, and indicate the dataToFind is found
// when it only partially was.
foundRange.location = NSNotFound;
}
return foundRange;
}
@end
@implementation FileChunkReader
@synthesize lineDelimiter, chunkSize;
- (id) initWithFilePath:(NSString *)aPath {
if (self = [super init]) {
fileHandle = [NSFileHandle fileHandleForReadingAtPath:aPath];
if (fileHandle == nil) {
return nil;
}
lineDelimiter = @"\n";
currentOffset = 0ULL; // ???
chunkSize = 128;
[fileHandle seekToEndOfFile];
totalFileLength = [fileHandle offsetInFile];
//we don't need to seek back, since readLine will do that.
}
return self;
}
- (void) dealloc {
[fileHandle closeFile];
currentOffset = 0ULL;
}
- (NSString *) readLine {
if (currentOffset >= totalFileLength)
{
return nil;
}
@autoreleasepool {
NSData * newLineData = [lineDelimiter dataUsingEncoding:NSUTF8StringEncoding];
[fileHandle seekToFileOffset:currentOffset];
unsigned long long originalOffset = currentOffset;
NSMutableData *currentData = [[NSMutableData alloc] init];
NSData *currentLine = [[NSData alloc] init];
BOOL shouldReadMore = YES;
while (shouldReadMore) {
if (currentOffset >= totalFileLength)
{
break;
}
NSData * chunk = [fileHandle readDataOfLength:chunkSize];
[currentData appendData:chunk];
NSRange newLineRange = [currentData rangeOfData_dd:newLineData];
if (newLineRange.location != NSNotFound) {
currentOffset = originalOffset + newLineRange.location + newLineData.length;
currentLine = [currentData subdataWithRange:NSMakeRange(0, newLineRange.location)];
shouldReadMore = NO;
}else{
currentOffset += [chunk length];
}
}
if (currentLine.length == 0 && currentData.length > 0)
{
currentLine = currentData;
}
return [[NSString alloc] initWithData:currentLine encoding:NSUTF8StringEncoding];
}
}
- (NSString *) readTrimmedLine {
return [[self readLine] stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceAndNewlineCharacterSet]];
}
#if NS_BLOCKS_AVAILABLE
- (void) enumerateLinesUsingBlock:(void(^)(NSString*, BOOL*))block {
NSString * line = nil;
BOOL stop = NO;
while (stop == NO && (line = [self readLine])) {
block(line, &stop);
}
}
#endif
@end
我加入这个,因为其他所有的答案我想未达这种或那种方式。以下方法可以处理大文件,任意长的线条,以及空行。它已经过测试与实际内容,并将从输出剔除换行符。
- (NSString*)readLineFromFile:(FILE *)file
{
char buffer[4096];
NSMutableString *result = [NSMutableString stringWithCapacity:1000];
int charsRead;
do {
if(fscanf(file, "%4095[^\r\n]%n%*[\n\r]", buffer, &charsRead) == 1) {
[result appendFormat:@"%s", buffer];
}
else {
break;
}
} while(charsRead == 4095);
return result.length ? result : nil;
}
幸得@Adam罗森菲尔德和@sooop
下面是一个不错的简单的解决方案,我使用了较小的文件:
NSString *path = [[NSBundle mainBundle] pathForResource:@"Terrain1" ofType:@"txt"];
NSString *contents = [NSString stringWithContentsOfFile:path encoding:NSASCIIStringEncoding error:nil];
NSArray *lines = [contents componentsSeparatedByCharactersInSet:[NSCharacterSet characterSetWithCharactersInString:@"\r\n"]];
for (NSString* line in lines) {
if (line.length) {
NSLog(@"line: %@", line);
}
}
使用这个脚本,它的伟大工程:
NSString *path = @"/Users/xxx/Desktop/names.txt";
NSError *error;
NSString *stringFromFileAtPath = [NSString stringWithContentsOfFile: path
encoding: NSUTF8StringEncoding
error: &error];
if (stringFromFileAtPath == nil) {
NSLog(@"Error reading file at %@\n%@", path, [error localizedFailureReason]);
}
NSLog(@"Contents:%@", stringFromFileAtPath);